AI Existential Safety Community
Welcome to our AI existential safety community! On this page, you’ll find a growing group of AI researchers keen to ensure that AI remains safe and beneficial even if it eventually supersedes human ability on essentially all tasks.
How to join:
Vitalik Buterin Fellowships
If you’re considering applying for the Vitalik Buterin postdoctoral fellowships or PhD fellowships, please use this page as a resource for finding a faculty mentor. All awarded fellows receive automatic community membership.
AI Professors
If you’re a professor interested in free funding for grad students or postdocs working on AI existential safety, you can apply for community membership here.
Junior AI researchers
If you’re a more junior researcher working on AI existential safety, you’re also welcome to apply for membership here, to showcase your research areas, qualify for our hassle-free “minigrants” and get invited to our workshops and networking events.
Faculty Members
Prof. Alessandro Abate
Why do you care about AI Existential Safety?
My background in Formal Verification makes me aware of the importance of assuring safety in certain application domains: whilst this is usually done in engineering areas such as Aeronautics, Space, or Critical Infrastructures, modern developments in AI, particularly concerning the interactions of machines and humans, also bring to the fore the topic of safety assurance of AI systems, and of certification of control software for AI. This is being studied for single-agent systems (think of autonomous driving applications), but will become ever more relevant in the near future for newer, multi-agent setups, particularly those involving humans. Understanding the causes of risk, and the potential preventative safety measures, developed in these engineering areas can help us mitigate certain severe and unintended consequences. And whilst it is admittedly perhaps drastic to think of existential risks as we speak, engineers and computer scientists ought to be aware of the potential future escalation of the development and reach of AI systems.
Please give one or more examples of research interests relevant to AI existential safety:
With my research group at Oxford (OXCAV, oxcav.web.ox.ac.uk) I am engaged in a broad initiative on ‘Safe RL’, spanning issues including logically-constrained RL, inverse RL for non-Markovian and sparse tasks, multi-agent RL, and Bayesian or Bayes-adaptive (direct and inverse) RL. All these projects contribute to developing new learning architectures with certificates that in particular can encompass safety assurance. We are active in translating research into applications and in transferring research into new technological solutions in various safety-critical domains, such as Cyber-Physical Systems (CPS).
Prof. Samuel Bowman
Why do you care about AI Existential Safety?
Please give one or more examples of research interests relevant to AI existential safety:
Email: bowman@nyu.edu
Prof. Anca Dragan
Why do you care about AI Existential Safety?
Please give one or more examples of research interests relevant to AI existential safety:
Email: anca@berkeley.edu
Prof. Jaime Fernandez Fisac
Why do you care about AI Existential Safety?
My research focuses on understanding how autonomous systems—from individual robots to large-scale intelligent infrastructure—can actively ensure safety during their operation. This requires us to engage with the complex interactions between AI systems and their human users and stakeholders, which often induce subtle and hard-to-model feedback loops. My group seeks to do this by bringing together analytical foundations from control and dynamic game theory with algorithmic tools from optimization and machine learning.
Our general contention is that AI systems need not achieve the over-debated threshold of superintelligent or human-level capability in order to pose a catastrophic risk to human society. In fact, the rampant ideological polarization spurred by rudimentary but massively deployed content recommendation algorithms already gives us painful evidence of the destabilizing power of large-scale socio-technical feedback loops over time scales of just a handful of years.
Please give one or more examples of research interests relevant to AI existential safety:
Our research group is currently trying to shed light on what we think is one of the most pressing dangers presaged by the increasing power and reach of AI technologies. The conjunction of large-scale language models like GPT-3 with advanced strategic decision-making systems like AlphaZero can bring about a plethora of extremely effective AI text-generation systems with the ability to produce compelling arguments in support of arbitrary ideas, whether true, false, benign or malicious.
Through continued interactions with many millions of users, such systems could quickly learn to produce statements that are highly likely to elicit the desired human response, belief or action. That is, these systems will reliably say whatever they need to say to achieve their goal: we call this Machine Bullshit, after Harry Frankfurt’s excellent 1986 philosophical essay “On Bullshit”. If not properly understood and mitigated, this technology could result in a large-scale behavior manipulation device far more effective than subliminal advertising, and far more damaging than “deep fakes” in the hands of malicious actors.
In order to detect and mitigate future AI systems’ ability to generate false-yet-convincing arguments, we have begun by creating a language model benchmark test called “Convince Me”, as part of the Google/OpenAI-led BIG-bench effort. The task measures a system’s ability to sway the belief of one or multiple interlocutors (whether human or automated) regarding a collection of true and false claims. Although the intended purpose of the benchmark is to evaluate future goal-driven AI text-generation systems, our preliminary results on state-of-the-art language models suggest that even naive (purely imitative) large-scale models like GPT-3 are disturbingly good at producing compelling arguments for false statements.
Email: jfisac@princeton.edu
Prof. Roger Grosse
Why do you care about AI Existential Safety?
Please give one or more examples of research interests relevant to AI existential safety:
- Incentivizing neural networks to give answers which are easily checkable. We are doing this using prover-verifier games for which the equilibrium requires finding a proof system.
- Understanding (in terms of neural net architectures) when mesa-optimizers are likely to arise, their patterns of generalization, and how this should inform the design of a learning algorithm.
- Better tools for understanding neural networks.
- Better understanding of neural net scaling laws (which are an important input to AI forecasting).
Email: rgrosse@cs.toronto.edu
Prof. Dylan Hadfield-Menell
Why do you care about AI Existential Safety?
Please give one or more examples of research interests relevant to AI existential safety:
Prof. The Anh Han
Why do you care about AI Existential Safety?
AI technologies can pose significant global risks to our civilization (which may even be existential) if not safely developed and appropriately regulated. In my research group, we have developed computational models (both analytic and simulated) that capture key factors of an AI development race, revealing which strategic behaviors regarding safety compliance would likely emerge under different conditions and hypothetical scenarios of the race, and how incentives can be used to drive the race in a more positive direction. This research is part of an FLI-funded AI Safety grant (https://futureoflife.org/2018-ai-grant-recipients/#Han).
To develop suitable and realistic models, it is important to capture different scenarios and contexts of AI safety development (e.g., the relationship between safety technologies, AI capacity, and the level of risk of AI systems), so as to inform suitable regulatory actions. On the other hand, our behavioural modelling work informs, for example, what level of risk is acceptable without leading to unnecessary regulation (i.e., over-regulation).
I believe it’s important to be part of this community to learn about AI Safety research and to inform my own research agenda on AI development race/competition modelling.
Please give one or more examples of research interests relevant to AI existential safety:
My relevant research interest is to understand the dynamics of cooperation and competition of AI safety development behaviours (e.g., by companies, governments) and how incentives such as reward of safety-compliant behaviours and punishment of non-compliant ones can improve safety behaviour.
Some of my relevant publications in this direction:
1) T. A. Han, L. M. Pereira, F. C. Santos and T. Lenaerts. To Regulate or Not: A Social Dynamics Analysis of an Idealised AI Race. Journal of Artificial Intelligence Research, vol. 69, pages 881-921, 2020.
Link to publication:
https://jair.org/index.php/jair/article/view/12225
2) T. A. Han, L. M. Pereira, T. Lenaerts and F. C. Santos. Mediating artificial intelligence developments through negative and positive incentives. PloS one 16.1 (2021): e0244592.
Link to publication:
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0244592
3) T. A. Han, L. M. Pereira, T. Lenaerts. Modelling and Influencing the AI Bidding War: A Research Agenda. AAAI/ACM conference on AI, Ethics and Society, pages 5-11, Honolulu, Hawaii, 2019.
Link to publication:
https://dl.acm.org/doi/abs/10.1145/3306618.3314265
4) An article in The Conversation: https://theconversation.com/ai-developers-often-ignore-safety-in-the-pursuit-of-a-breakthrough-so-how-do-we-regulate-them-without-blocking-progress-155825?utm_source=twitter&utm_medium=bylinetwitterbutton
5) A preprint showing the impact of network structures on the AI race dynamics and safety behavioral outcome
Link: https://arxiv.org/abs/2012.15234
6) A preprint showing our analysis of a new proposal for AI regulation and governance through voluntary safety commitments
Link: https://arxiv.org/abs/2104.03741
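To illustrate the kind of dynamics studied in this line of work, here is a deliberately minimal replicator-dynamics toy, not the model from the publications above; the parameters s, c, p, and f (speed advantage, safety cost, detection probability, fine) are hypothetical. It shows how a sufficiently large sanction on unsafe development can flip the race from unsafe to safe behaviour:

```python
# Toy replicator-dynamics sketch of an AI development race (illustrative only,
# not the model from the publications above). Two strategies compete:
#   SAFE   - complies with safety norms, paying a development cost c
#   UNSAFE - skips safety, gaining a speed advantage s but incurring an
#            expected sanction p * f (detection probability times fine)
def simulate(f, s=0.5, c=0.2, p=0.3, x0=0.5, steps=500, dt=0.1):
    """Return the final share of SAFE players under replicator dynamics."""
    x = x0
    for _ in range(steps):
        pi_safe = 1.0 - c              # baseline payoff minus safety cost
        pi_unsafe = 1.0 + s - p * f    # speed bonus minus expected sanction
        # Replicator update: dx/dt = x * (1 - x) * (pi_safe - pi_unsafe)
        x += dt * x * (1.0 - x) * (pi_safe - pi_unsafe)
        x = min(max(x, 0.0), 1.0)      # numerical guard: keep x a valid share
    return x

print(round(simulate(1.0), 2), round(simulate(3.0), 2))
```

With a small fine, the unsafe strategy dominates and safe behaviour dies out; above a threshold fine (here f > (s + c) / p ≈ 2.33, where the expected sanction outweighs the net speed advantage), the population converges to safe development.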
Email: t.han@tees.ac.uk
Prof. David Krueger
Why do you care about AI Existential Safety?
I got into AI because I was worried about the societal impacts of advanced AI systems, and x-risk in particular. We are not prepared – as a field, society, or species – for AGI, prepotent AI, or many other possible forms of transformative AI. This is an unprecedented global coordination challenge. Technical research may play an important role, but is unlikely to play a decisive one. I consider addressing this problem an ethical priority.
Please give one or more examples of research interests relevant to AI existential safety:
The primary goal of my work is to increase AI existential safety. My main areas of expertise are Deep Learning and AI Alignment. I am also interested in governance and technical areas relevant for global coordination, such as mechanism design.
I am interested in any areas relevant to AI x-safety. My main interests at the moment are in:
- New questions and possibilities presented by large “foundation models” and other putative “proto-AGI” systems. For instance, Machine Learning-based Alignment researchers have emphasized our ability to inspect and train models. But foundation models roughly match the “classic” threat model of “there is a misaligned black-box AI agent that we need to somehow do something aligned with”. An important difference is that these models do not appear to be “agentic” and are trained offline. Will foundation models exhibit emergent forms of agency, e.g. due to mesa-optimization? Will models trained offline understand the world properly, or will they suffer from spurious dependencies and causal confusion? How can we safely leverage the capabilities of misaligned foundation models?
- Understanding Deep Learning, especially learning and generalization, especially systematic and out-of-distribution generalization, and especially invariant prediction. (How) can we get Deep Learning systems to understand and view the world the same way humans do? How can we get them to generalize in the ways we intend?
- Preference Learning, especially Reward Modelling. I think reward modelling is among the most promising approaches to alignment in the short term, although it would likely require good generalization (2), and still involves using Reinforcement Learning, with the attendant concerns about instrumental goals (4).
- Controlling instrumental goals, e.g. to manipulate users of content recommendation systems, e.g. by studying and managing incentives of AI systems. Can we find ways for AI systems to do long-term planning that don’t engender dangerous instrumental goals?
Some quick thoughts on governance and global coordination:
- A key challenge seems to be: clearly defining which categories of systems should be subject to which kind of oversight, standards, or regulation. For instance, “automated decision-making (ADM)” seems like a crisper concept than “Artificial Intelligence” at the moment, but neither category is fine-grained enough.
- I think we will need substantial involvement from AI experts in governance, and expect most good work to be highly interdisciplinary. I would like to help promote such research.
- I am optimistic about the vision of RadicalXChange as a direction for solving coordination problems in the longer run.
Prof. Sharon Li
Why do you care about AI Existential Safety?
As artificial intelligence reaches society at large, the need for safe and reliable decision-making is increasingly critical. This requires intelligent systems to have an awareness of uncertainty and a mandate to confront unknown situations with caution. Yet for many decades, machine learning methods have commonly made the closed-world assumption—that the test data are drawn from the same distribution as the training data (i.e., in-distribution data). Such an idealistic assumption rarely holds true in the open world, where test inputs can naturally arise from unseen categories that were not in the training data, i.e., out-of-distribution (OOD) data. When such a discrepancy occurs, classifying OOD samples as one of the in-distribution (ID) classes can be catastrophic. For example, a medical AI system trained on a certain set of diseases (ID) may encounter a different disease (OOD) and can cause mistreatment if it is not handled cautiously. Unfortunately, modern deep neural networks can produce overconfident predictions on OOD data, which raises significant reliability concerns. In my research, I deeply care about improving the safety and reliability of modern machine learning models in deployment.
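To make the problem concrete, here is a minimal sketch of a classic OOD-detection baseline, the maximum softmax probability (MSP) score of Hendrycks and Gimpel: abstain when the model’s top-class probability falls below a threshold. This is an illustrative baseline, not Prof. Li’s method, and the threshold value is arbitrary; as noted above, overconfidence on OOD inputs is exactly why such simple scores are often insufficient on their own.

```python
import numpy as np

def softmax(logits):
    z = logits - np.max(logits)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def msp_score(logits):
    """Maximum softmax probability: higher means 'looks in-distribution'."""
    return float(np.max(softmax(logits)))

def flag_ood(logits, threshold=0.7):
    """Abstain (return True) when the classifier's top-class confidence is low."""
    return msp_score(logits) < threshold

confident = np.array([8.0, 0.5, 0.2])   # peaked logits: treated as ID
uncertain = np.array([1.1, 1.0, 0.9])   # flat logits: flagged as possible OOD
print(flag_ood(confident), flag_ood(uncertain))
```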
Please give one or more examples of research interests relevant to AI existential safety:
My broad research interests are in deep learning and machine learning. My time in both academia and industry has shaped my view and approach in research. The goal of my research is to enable transformative algorithms and practices towards safe and reliable open-world learning, which can function safely and adaptively in the presence of evolving and unpredictable data streams. My works explore, understand, and mitigate the many challenges where failure modes can naturally occur in deploying machine learning models in the open world. Research topics that I am currently focusing on include: (1) Out-of-distribution uncertainty estimation for reliable decision-making; (2) Uncertainty-aware deep learning in healthcare and computer vision; (3) Open-world deep learning.
My research stands to benefit a wide range of societal activities and systems that range from AI services (e.g., content understanding) to transportation (e.g., autonomous vehicles), finance (e.g., risk management), and healthcare (e.g., medical diagnosis).
Email: sharonli@cs.wisc.edu
Prof. Tegan Maharaj
Why do you care about AI Existential Safety?
Life on earth has evolved over such a very long time, robust to such a huge range of earth conditions, that it’s easy to feel it will always be there, that it will continue in some way or another no matter what happens. But it might not. Everything *could* go wrong.
And in fact a lot is already going wrong — humans’ actions are changing the world more rapidly than it’s ever changed, and we are decimating the diversity of earth’s ecosystem. Time to adapt and multiple redundancy have been crucial to the adaptability and robustness of life in the past. AI systems afford the possibility of changing things even more rapidly, in ways we have decreasing understanding of and control over.
It’s not pleasant to think about everything going wrong, but once one accepts that it could, it sure feels better to try to do something to help make sure it doesn’t.
Please give one or more examples of research interests relevant to AI existential safety:
We are in a critical period of the development of AI systems, where we are beginning to see important societal issues with their use, but also great promise for societal good, generating widespread will to regulate & govern AI systems responsibly. I think there’s a real possibility of doing this right if we act now, and I hope to help make that happen.
These are my short (1-5 year) research foci:
(1) Theoretical results and experiments which help better understand robustness and generalization behaviour in more realistic settings, with a focus on representation learning and out-of-distribution data. E.g. average-case generalization and sample-complexity bounds, measuring OOD robustness, time-to-failure analysis, measuring ‘representativeness’.
(2) Practical methods for safe and responsible development of AI, with a focus on alignment and dealing with distributional shift. E.g. unit tests for particular (un)desirable behaviours that could enable 3rd-party audits, sandboxes for evaluating AI systems prior to deployment and guiding design of randomized control trials, generalization suites.
(3) Popularization and specification of novel problem settings, with baseline results, for AI systems addressing important societal problems (e.g. pricing negative externalities or estimating individual-level impact of climate change, pollution, epidemic disease, or polarization in content recommendation), with a focus on common-good problems.
Email: tegan.jrm@gmail.com
Prof. Michael Osborne
Why do you care about AI Existential Safety?
Please give one or more examples of research interests relevant to AI existential safety:
I aim to apply the probabilistic numerics framework to the identification and communication of computational errors within composite AI systems. Probabilistic numerical methods offer the promise of monitoring the assumptions underlying running computations, yielding a monitoring regime that can safely interrupt algorithms overwhelmed by the complexity of their task. This approach will allow AI systems to monitor the extent to which their own internal model matches external data, and to respond with appropriate caution.
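As a toy caricature of this monitoring idea (not the probabilistic numerics framework itself; the function names and tolerances here are illustrative), consider a Monte Carlo integrator that tracks its own standard error and interrupts itself when it cannot meet its error tolerance within its sample budget:

```python
import math
import random

def monitored_mc_integral(f, n_max=10_000, tol=1e-2, batch=500, seed=0):
    """Monte Carlo estimate of the integral of f over [0, 1] that tracks its
    own standard error and interrupts itself if the error budget is blown."""
    rng = random.Random(seed)
    total, total_sq, n = 0.0, 0.0, 0
    mean = stderr = float("nan")
    while n < n_max:
        for _ in range(batch):
            y = f(rng.random())
            total += y
            total_sq += y * y
            n += 1
        mean = total / n
        var = max(total_sq / n - mean * mean, 0.0)   # sample variance of f(X)
        stderr = math.sqrt(var / n)                  # standard error of the mean
        if stderr < tol:
            return mean, stderr, "converged"
    return mean, stderr, "interrupted: uncertainty above tolerance at budget"

est, err, status = monitored_mc_integral(lambda x: x * x)   # true value: 1/3
print(status, round(est, 2))
```

The point is not the integrator but the contract: the computation reports its own uncertainty alongside its answer, so a supervising system can decide whether the result is trustworthy or whether to halt and escalate.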
Email: mosb@robots.ox.ac.uk
Prof. Stuart Russell
Why do you care about AI Existential Safety?
It is increasingly important to ask, “What if we succeed?” Our intelligence gives us power over the world and over other species; we will eventually build systems with superhuman intelligence; therefore, we face the problem of retaining power, forever, over entities that are far more powerful than ourselves.
Please give one or more examples of research interests relevant to AI existential safety:
Rebuilding AI on a new and broader foundation, with the goal of creating AI systems that are provably beneficial to humans.
Email: russell@berkeley.edu
Prof. Bart Selman
Why do you care about AI Existential Safety?
AI capabilities are increasing rapidly. This opens up exciting opportunities to address many of society’s challenges. However, we also need to recognize that we cannot fully understand the future path of AI. So we need to devote research resources to guard against potential existential risks.
Please give one or more examples of research interests relevant to AI existential safety:
My most closely related research interests are in deep RL, particularly concerning challenges of safety and interpretability.
Email: selman@cs.cornell.edu
Prof. Jacob Noah Steinhardt
Why do you care about AI Existential Safety?
Please give one or more examples of research interests relevant to AI existential safety:
Email: jsteinhardt@berkeley.edu
Prof. Max Tegmark
Why do you care about AI Existential Safety?
Please give one or more examples of research interests relevant to AI existential safety:
Email: tegmark@mit.edu
Prof. Victor Veitch
Why do you care about AI Existential Safety?
I’m generally concerned with doing work that has the greatest impact on human wellbeing. I think it’s plausible that we can achieve strong AI in the near-term future. This will have a major impact on the rest of human history – so, we should get it right. As a pleasant bonus, I find that working on AI Safety leads to problems that are of fundamental importance to our understanding of machine learning and AI generally.
Please give one or more examples of research interests relevant to AI existential safety:
My main current interest in this area is the application of causality to trustworthy machine learning. Informally, the causal structure of the world seems key to making sound decisions, and so causal reasoning must be a key component of any future AI system. Accordingly, determining exactly how causal understanding can be baked into systems – and in particular how this affects their trustworthiness – is key. Additionally, this research programme offers insight into near-term trustworthiness problems, which can offer concrete directions for development. For example, the tools of causal inference play a key role in understanding domain shift, the failures of machine-learning models under (apparently) benign perturbations of input data, and in explaining (and enforcing) the rationale for decisions made by machine learning systems. For a concrete example of this type of work, see here.
Email: victorveitch@gmail.com