AI Existential Safety Community

Welcome to our AI existential safety community! On this page, you’ll find a growing group of AI researchers keen to ensure that AI remains safe and beneficial even if it eventually supersedes human ability on essentially all tasks.

How to join:

Vitalik Buterin Fellowships

If you’re considering applying for the Vitalik Buterin postdoctoral fellowships or PhD fellowships, please use this page as a resource for finding a faculty mentor. All awarded fellows receive automatic community membership.

AI Professors

If you’re a professor interested in free funding for grad students or postdocs working on AI existential safety, you can apply for community membership here.

Junior AI researchers

If you’re a more junior researcher working on AI existential safety, you’re also welcome to apply for membership here, to showcase your research areas, qualify for our hassle-free “minigrants” and get invited to our workshops and networking events.

Faculty Members


University of Oxford

Prof. Alessandro Abate

New York University

Prof. Samuel Bowman

UC Berkeley

Prof. Anca Dragan

Princeton University

Prof. Jaime Fernandez Fisac

University of Toronto

Prof. Roger Grosse

Massachusetts Institute of Technology

Prof. Dylan Hadfield-Menell

Teesside University

Prof. The Anh Han

University of Cambridge

Prof. David Krueger

University of Wisconsin–Madison

Prof. Sharon Li

University of Toronto

Prof. Tegan Maharaj

University of Oxford

Prof. Michael Osborne

UC Berkeley

Prof. Stuart Russell

Cornell University

Prof. Bart Selman

UC Berkeley

Prof. Jacob Noah Steinhardt

Massachusetts Institute of Technology

Prof. Max Tegmark

University of Chicago

Prof. Victor Veitch

University of Oxford

Prof. Alessandro Abate

Why do you care about AI Existential Safety?

My background in Formal Verification makes me aware of the importance of assuring safety in certain application domains: whilst this is usually done in engineering areas such as Aeronautics, Space, or Critical Infrastructures, modern developments in AI, particularly concerning the interactions of machines and humans, also bring to the fore the topic of safety assurance of AI systems and of certification of control software for AI. This is being studied for single-agent systems (think of autonomous driving applications), but will become ever more relevant in the near future for newer, multi-agent setups, particularly those involving humans. Understanding the causes of risk, and the potential preventative safety measures, developed in these engineering areas can help us mitigate certain severe and unintended consequences. And whilst it is admittedly perhaps premature to think of existential risks as we speak, engineers and computer scientists ought to be aware of the potential future escalation of the development and reach of AI systems.

Please give one or more examples of research interests relevant to AI existential safety:

With my research group at Oxford (OXCAV, oxcav.web.ox.ac.uk), I am engaged in a broad initiative on ‘Safe RL’, spanning issues including logically-constrained RL, inverse RL for non-Markovian and sparse tasks, multi-agent RL, and Bayesian or Bayes-adaptive (direct and inverse) RL. All these projects contribute to developing new learning architectures with certificates that in particular can encompass safety assurance. We are active in translating research into applications and in transferring research into new technological solutions in various safety-critical domains, such as Cyber-Physical Systems (CPS).
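To make the flavour of this ‘Safe RL’ work concrete, here is a minimal sketch of my own (not OXCAV code, and with an invented gridworld and parameters) of tabular Q-learning combined with a simple safety “shield” that enforces a logical constraint, never enter a designated unsafe cell, by masking actions at selection time:

```python
import numpy as np

# Illustrative sketch only: tabular Q-learning with a safety "shield" that masks,
# at action-selection time, any action whose successor state would violate a simple
# logical constraint (here: "never enter the unsafe cell"). Not OXCAV code.
rng = np.random.default_rng(0)

rows, cols = 2, 4
start, goal, unsafe = (0, 0), (0, 3), (0, 2)      # the unsafe cell can be routed around
moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]        # up, down, left, right

def successor(cell, move):
    r = min(max(cell[0] + move[0], 0), rows - 1)
    c = min(max(cell[1] + move[1], 0), cols - 1)
    return (r, c)

def shield(cell):
    """Indices of actions whose successor does not violate the safety constraint."""
    return [i for i, m in enumerate(moves) if successor(cell, m) != unsafe]

def idx(cell):
    return cell[0] * cols + cell[1]

Q = np.zeros((rows * cols, len(moves)))
for _ in range(300):                               # training episodes
    cell = start
    for _ in range(30):
        allowed = shield(cell)
        if rng.random() < 0.2:                     # epsilon-greedy over *allowed* actions only
            a = int(rng.choice(allowed))
        else:
            a = max(allowed, key=lambda i: Q[idx(cell), i])
        nxt = successor(cell, moves[a])
        reward = 1.0 if nxt == goal else 0.0
        Q[idx(cell), a] += 0.5 * (reward + 0.9 * Q[idx(nxt)].max() - Q[idx(cell), a])
        cell = nxt
        if cell == goal:
            break

print("greedy safe action from the start state:", moves[int(np.argmax(Q[idx(start)]))])
```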

New York University

Prof. Samuel Bowman

Why do you care about AI Existential Safety?

I find it likely that state-of-the-art machine learning systems will continue to be deployed in increasingly high-stakes settings as their capabilities continue to improve, and that this trend will continue even if these systems are not conclusively shown to be robust, leading to potentially catastrophic accidents. I also find it plausible that more powerful future systems could share building blocks in common with current technology, making it especially worthwhile to identify potentially dangerous or surprising failure modes in current technology and to develop scalable ways of mitigating these issues.

Please give one or more examples of research interests relevant to AI existential safety:

My group generally works with neural network models for language (and potentially similar multimodal models), with a focus on benchmarking, data collection, human feedback, and empirical analysis, rather than model design, theory, or systems research. Within these constraints, I’m broadly interested in work that helps to document and mitigate potential negative impacts from these systems, especially impacts that we expect may become more serious as models become more capable. I’m also open to co-advising students who are interested in these risks but are looking to pursue a wider range of methods.
UC Berkeley

Prof. Anca Dragan

Why do you care about AI Existential Safety?

I am highly skeptical that we can extrapolate current progress in AI to “general AI” anytime soon. However, I believe that the current AI paradigms fail to enable us to design AI agents in ways that avoid negative side effects — which can be catastrophic for society even when using capable yet narrow AI tools. I believe we need to think about the design of AI systems differently, and empower designers to anticipate and avoid undesired outcomes.

Please give one or more examples of research interests relevant to AI existential safety:

My research interests in AI safety include value alignment or preference learning, including accounting for human biases and suboptimality; assistive agents that empower people without having to infer their intentions; and robustness of learned rewards and of predictive human policies.
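As a toy illustration of preference learning in this spirit (my own sketch with synthetic data, not Prof. Dragan’s method), the snippet below recovers linear reward weights from noisy pairwise comparisons under a Bradley-Terry style choice model:

```python
import numpy as np

# Toy preference learning: infer linear reward weights from noisy pairwise choices.
# Everything below (features, weights, learning rate) is synthetic and illustrative.
rng = np.random.default_rng(0)

true_w = np.array([1.0, -2.0, 0.5])            # hidden reward weights
pairs = rng.normal(size=(500, 2, 3))           # feature vectors of trajectory pairs

# Simulate a (boundedly rational) human choosing between each pair.
score_gap = pairs[:, 0] @ true_w - pairs[:, 1] @ true_w
prefers_first = rng.random(500) < 1 / (1 + np.exp(-score_gap))

# Maximum-likelihood fit of the Bradley-Terry / logistic choice model.
w = np.zeros(3)
diff = pairs[:, 0] - pairs[:, 1]
for _ in range(2000):
    p = 1 / (1 + np.exp(-(diff @ w)))          # model's probability of preferring the first
    w += 0.5 * diff.T @ (prefers_first - p) / len(diff)

print("estimated reward weights:", np.round(w, 2), " true:", true_w)
```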
Princeton University

Prof. Jaime Fernandez Fisac

Why do you care about AI Existential Safety?

My research focuses on understanding how autonomous systems—from individual robots to large-scale intelligent infrastructure—can actively ensure safety during their operation. This requires us to engage with the complex interactions between AI systems and their human users and stakeholders, which often induce subtle and hard-to-model feedback loops. My group seeks to do this by bringing together analytical foundations from control and dynamic game theory with algorithmic tools from optimization and machine learning.

Our general contention is that AI systems need not achieve the over-debated threshold of superintelligent or human-level capability in order to pose a catastrophic risk to human society. In fact, the rampant ideological polarization spurred by rudimentary but massively deployed content recommendation algorithms already gives us painful evidence of the destabilizing power of large-scale socio-technical feedback loops over time scales of just a handful of years.

Please give one or more examples of research interests relevant to AI existential safety:

Our research group is currently trying to shed light on what we think is one of the most pressing dangers presaged by the increasing power and reach of AI technologies. The conjunction of large-scale language models like GPT-3 with advanced strategic decision-making systems like AlphaZero can bring about a plethora of extremely effective AI text-generation systems with the ability to produce compelling arguments in support of arbitrary ideas, whether true, false, benign or malicious.

Through continued interactions with many millions of users, such systems could quickly learn to produce statements that are highly likely to elicit the desired human response, belief or action. That is, these systems will reliably say whatever they need to say to achieve their goal: we call this Machine Bullshit, after Harry Frankfurt’s excellent 1986 philosophical essay “On Bullshit”. If not properly understood and mitigated, this technology could result in a large-scale behavior manipulation device far more effective than subliminal advertising, and far more damaging than “deep fakes” in the hands of malicious actors.

In order to detect and mitigate future AI systems’ ability to generate false-yet-convincing arguments, we have begun by creating a language model benchmark test called “Convince Me”, as part of the Google/OpenAI-led BIG-bench effort. The task measures a system’s ability to sway the belief of one or multiple interlocutors (whether human or automated) regarding a collection of true and false claims. Although the intended purpose of the benchmark is to evaluate future goal-driven AI text-generation systems, our preliminary results on state-of-the-art language models suggest that even naive (purely imitative) large-scale models like GPT-3 are disturbingly good at producing compelling arguments for false statements.
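As a rough sketch of the kind of quantity such a benchmark has to measure (the record format, numbers, and scoring rule below are hypothetical, not the actual BIG-bench task code), one can summarise persuasion by the signed shift in an evaluator’s credence, oriented so that pushing beliefs toward the truth scores positively:

```python
from statistics import mean

def belief_shift(records):
    """Average signed change in an evaluator's credence in a claim after reading an argument.

    Each record is (prior, posterior, claim_is_true). For false claims the shift is negated,
    so a positive score always means the arguer pushed beliefs toward the truth.
    """
    shifts = []
    for prior, posterior, claim_is_true in records:
        delta = posterior - prior
        shifts.append(delta if claim_is_true else -delta)
    return mean(shifts)

# Illustrative evaluations: (credence before, credence after, is the claim true?)
records = [
    (0.5, 0.8, True),    # argument for a true claim raised credence: good
    (0.5, 0.7, False),   # argument for a false claim raised credence: concerning
    (0.6, 0.4, False),   # evaluator resisted a false argument
]
print("truth-aligned persuasion score:", round(belief_shift(records), 3))
```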

University of Toronto

Prof. Roger Grosse

Why do you care about AI Existential Safety?

Humanity has produced some powerful and dangerous technologies, but as of yet none that deliberately pursued long-term goals that may be at odds with our own. If we succeed in building machines smarter than ourselves — as seems likely to happen in the next few decades — our only hope for a good outcome is if we prepare well in advance.

Please give one or more examples of research interests relevant to AI existential safety:

So far, my research has primarily focused on understanding and improving neural networks, and my research style can be thought of as theory-driven empiricism. I’m intending to focus on safety as much as I can while maintaining the quality of the research. Here are some of my group’s current and planned AI safety research directions, which build on our expertise in deep learning:

  • Incentivizing neural networks to give answers which are easily checkable. We are doing this using prover-verifier games for which the equilibrium requires finding a proof system.
  • Understanding (in terms of neural net architectures) when mesa-optimizers are likely to arise, their patterns of generalization, and how this should inform the design of a learning algorithm.
  • Better tools for understanding neural networks.
  • Better understanding of neural net scaling laws (which are an important input to AI forecasting).
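As a toy illustration of the last item, here is a minimal sketch, with made-up sizes and losses, of fitting a saturating power law L(N) = c + a * N^(-alpha) to (parameter count, validation loss) pairs, the basic ingredient of scaling-law analysis:

```python
import numpy as np
from scipy.optimize import curve_fit

# Toy scaling-law fit with made-up numbers: loss as a saturating power law of model size.
def scaling_law(n, a, alpha, c):
    """L(N) = c + a * N^(-alpha), with N measured in millions of parameters."""
    return c + a * n ** (-alpha)

params_millions = np.array([1.0, 10.0, 100.0, 1_000.0, 10_000.0])   # 1M .. 10B parameters
val_loss = np.array([4.2, 3.5, 2.9, 2.5, 2.2])                      # illustrative losses

(a, alpha, c), _ = curve_fit(scaling_law, params_millions, val_loss,
                             p0=[3.0, 0.2, 1.0], maxfev=10_000)
print(f"fit: L(N) = {c:.2f} + {a:.2f} * N^(-{alpha:.3f})")

# Extrapolate (with the usual caveats about extrapolation) to a 100B-parameter model.
print(f"predicted loss at 100B params: {scaling_law(100_000.0, a, alpha, c):.2f}")
```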
Massachusetts Institute of Technology

Prof. Dylan Hadfield-Menell

Why do you care about AI Existential Safety?

With AI systems, you often get what you can measure. This creates a structural bias towards simpler measures of value and runs the risk of diverting more and more resources towards these simple goals. My interest in existential safety comes from a desire to make sure that technology supports and nurtures a rich and diverse set of values.

Please give one or more examples of research interests relevant to AI existential safety:

I work on the theoretical and practical study of machine alignment. This includes: methods for value learning from observations; algorithms to optimize uncertain objectives; formal analysis of design/oversight strategies for AI systems; as well as the study of incomplete goal specifications and corresponding consequences of overoptimization.
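The toy example below (entirely hypothetical numbers, not drawn from this research) illustrates the overoptimization effect described above: when the measured proxy captures only one of two attributes people value, pushing the proxy beyond a point destroys true value.

```python
import numpy as np

# Toy Goodhart/overoptimization example (all numbers illustrative): the true objective
# values two attributes with diminishing returns, but the proxy only measures the first.
def true_utility(x0, x1):
    return np.sqrt(x0) + np.sqrt(x1)

def proxy_reward(x0, x1):
    return x0                      # the attribute that is easy to measure

budget = 10.0
for x0 in np.linspace(0.0, budget, 11):    # shift more and more resources toward the proxy
    x1 = budget - x0
    print(f"x0={x0:4.1f}  proxy={proxy_reward(x0, x1):5.2f}  true={true_utility(x0, x1):5.2f}")

# The proxy is maximized by putting the whole budget on x0, while true utility peaks at
# the balanced allocation: past that point, further proxy gains destroy real value.
```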
Teesside University

Prof. The Anh Han

Why do you care about AI Existential Safety?

AI technologies can pose significant global risks to our civilization (which can even be existential) if not safely developed and appropriately regulated. In my research group, we have developed computational models (both analytic and simulated) that capture key factors of an AI development race, revealing which strategic behaviours regarding safety compliance would likely emerge in different conditions and hypothetical scenarios of the race, and how incentives can be used to drive the race in a more positive direction. This research is part of an FLI-funded AI Safety grant (https://futureoflife.org/2018-ai-grant-recipients/#Han).

For the development of suitable and realistic models, it is important to capture different scenarios and contexts of AI safety development (e.g., the relationship between safety technologies, AI capacity, and the level of risk of AI systems), so as to inform suitable regulatory actions. On the other hand, our behavioural modelling work informs, for example, what level of risk is acceptable without leading to unnecessary regulation (i.e., over-regulation).

I believe it’s important to be part of this community to learn about AI Safety research and to inform my own research agenda on AI development race/competition modelling.

Please give one or more examples of research interests relevant to AI existential safety:

My relevant research interest is to understand the dynamics of cooperation and competition in AI safety development behaviours (e.g., by companies, governments) and how incentives, such as rewarding safety-compliant behaviours and punishing non-compliant ones, can improve safety behaviour.
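To make this concrete, here is a minimal replicator-dynamics sketch with invented payoffs (it is not the model from the publications listed below): a sufficiently large sanction on unsafe development shifts the population of developers toward the safety-compliant strategy.

```python
# Toy replicator-dynamics sketch with invented payoffs (not the published models):
# SAFE developers pay a compliance cost; UNSAFE developers win races but risk
# disaster and institutional sanctions.
b = 4.0       # benefit of winning the development race
c = 1.0       # extra cost of complying with safety precautions
p_risk = 0.3  # probability that an UNSAFE winner loses everything to a disaster

def payoffs(x, sanction):
    """Expected payoffs when a fraction x of developers follow the SAFE strategy."""
    f_safe = x * (b / 2) - c                                      # ties SAFE, loses the race to UNSAFE
    f_unsafe = (x * b + (1 - x) * b / 2) * (1 - p_risk) - sanction
    return f_safe, f_unsafe

def replicator(x=0.5, sanction=0.0, steps=500, dt=0.05):
    """Discrete-time replicator dynamics for the SAFE fraction of the population."""
    for _ in range(steps):
        f_s, f_u = payoffs(x, sanction)
        x += dt * x * (1 - x) * (f_s - f_u)
        x = min(max(x, 0.0), 1.0)
    return x

print("final SAFE fraction, no sanction:  ", round(replicator(sanction=0.0), 2))
print("final SAFE fraction, with sanction:", round(replicator(sanction=2.5), 2))
```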

Some of my relevant publications in this direction:

1) T. A. Han, L. M. Pereira, F. C. Santos and T. Lenaerts. To Regulate or Not: A Social Dynamics Analysis of an Idealised AI Race. Journal of Artificial Intelligence Research, Vol. 69, pages 881-921, 2020.
Link to publication:
https://jair.org/index.php/jair/article/view/12225

2) T. A. Han, L. M. Pereira, T. Lenaerts and F. C. Santos. Mediating artificial intelligence developments through negative and positive incentives. PLoS ONE, 16(1): e0244592, 2021.
Link to publication:
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0244592

3) T. A. Han, L. M. Pereira and T. Lenaerts. Modelling and Influencing the AI Bidding War: A Research Agenda. AAAI/ACM Conference on AI, Ethics, and Society, pages 5-11, Honolulu, Hawaii, 2019.
Link to publication:
https://dl.acm.org/doi/abs/10.1145/3306618.3314265

4) A press article in The Conversation: https://theconversation.com/ai-developers-often-ignore-safety-in-the-pursuit-of-a-breakthrough-so-how-do-we-regulate-them-without-blocking-progress-155825?utm_source=twitter&utm_medium=bylinetwitterbutton

5) A preprint showing the impact of network structures on AI race dynamics and safety behavioural outcomes
Link: https://arxiv.org/abs/2012.15234

6) A preprint showing our analysis of a new proposal for AI regulation and governance through voluntary safety commitments
Link: https://arxiv.org/abs/2104.03741

University of Cambridge

Prof. David Krueger

Why do you care about AI Existential Safety?

I got into AI because I was worried about the societal impacts of advanced AI systems, and x-risk in particular. We are not prepared – as a field, society, or species – for AGI, prepotent AI, or many other possible forms of transformative AI. This is an unprecedented global coordination challenge. Technical research may play an important role, but is unlikely to play a decisive one.  I consider addressing this problem an ethical priority.

Please give one or more examples of research interests relevant to AI existential safety:

The primary goal of my work is to increase AI existential safety. My main areas of expertise are Deep Learning and AI Alignment. I am also interested in governance and technical areas relevant for global coordination, such as mechanism design.

I am interested in any areas relevant to AI x-safety. My main interests at the moment are in:

  1. New questions and possibilities presented by large “foundation models” and other putative “proto-AGI” systems. For instance, Machine Learning-based Alignment researchers have emphasized our ability to inspect and train models. But foundation models roughly match the “classic” threat model of “there is a misaligned black-box AI agent that we need to somehow do something aligned with”. An important difference is that these models do not appear to be “agentic” and are trained offline.  Will foundation models exhibit emergent forms of agency, e.g. due to mesa-optimization?  Will models trained offline understand the world properly, or will they suffer from spurious dependencies and causal confusion?  How can we safely leverage the capabilities of misaligned foundation models?
  2. Understanding Deep Learning, especially learning and generalization, especially systematic and out-of-distribution generalization, and especially invariant prediction.  (How) can we get Deep Learning systems to understand and view the world the same way humans do?  How can we get them to generalize in the ways we intend?
  3. Preference Learning, especially Reward Modelling.  I think reward modelling is among the most promising approaches to alignment in the short term, although it would likely require good generalization (2), and still involves using Reinforcement Learning, with the attendant concerns about instrumental goals (4).
  4. Controlling instrumental goals, e.g. to manipulate users of content recommendation systems, e.g. by studying and managing incentives of AI systems.  Can we find ways for AI systems to do long-term planning that don’t engender dangerous instrumental goals?

Some quick thoughts on governance and global coordination: 

  1. A key challenge seems to be: clearly defining which categories of systems should be subject to which kind of oversight, standards, or regulation.  For instance, “automated decision-making (ADM)” seems like a crisper concept than “Artificial Intelligence” at the moment, but neither category is fine-grained enough. 
  2. I think we will need substantial involvement from AI experts in governance, and expect most good work to be highly interdisciplinary.  I would like to help promote such research.
  3. I ​am optimistic about the vision of RadicalXChange as a direction for solving coordination problems in the longer run.
University of Wisconsin–Madison

Prof. Sharon Li

Why do you care about AI Existential Safety?

As artificial intelligence reaches society at large, the need for safe and reliable decision-making is increasingly critical. This requires intelligent systems to have an awareness of uncertainty and a mandate to confront unknown situations with caution. Yet for many decades, machine learning methods have commonly made the closed-world assumption: the test data is drawn from the same distribution as the training data (i.e., in-distribution data). Such an idealistic assumption rarely holds true in the open world, where test inputs can naturally arise from unseen categories that were not in the training data. When such a discrepancy occurs, algorithms that classify out-of-distribution (OOD) samples as one of the in-distribution (ID) classes can be catastrophic. For example, a medical AI system trained on a certain set of diseases (ID) may encounter a different disease (OOD) and can cause mistreatment if not handled cautiously. Unfortunately, modern deep neural networks can produce overconfident predictions on OOD data, which raises significant reliability concerns. In my research, I deeply care about improving the safety and reliability of modern machine learning models in deployment.

Please give one or more examples of research interests relevant to AI existential safety:

My broad research interests are in deep learning and machine learning. My time in both academia and industry has shaped my view and approach in research. The goal of my research is to enable transformative algorithms and practices towards safe and reliable open-world learning, which can function safely and adaptively in the presence of evolving and unpredictable data streams. My works explore, understand, and mitigate the many challenges where failure modes can naturally occur in deploying machine learning models in the open world. Research topics that I am currently focusing on include: (1) Out-of-distribution uncertainty estimation for reliable decision-making; (2) Uncertainty-aware deep learning in healthcare and computer vision; (3) Open-world deep learning.

My research stands to benefit a wide range of societal activities and systems that range from AI services (e.g., content understanding) to transportation (e.g., autonomous vehicles), finance (e.g., risk management), and healthcare (e.g., medical diagnosis).
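As a small illustration of the kind of OOD scoring this line of work studies (an energy-style score computed from a classifier’s logits; the numbers are made up, and the threshold would in practice be calibrated on in-distribution validation data):

```python
import numpy as np

# Illustrative OOD scoring sketch (not code from the research group): score inputs by the
# energy of their logits and flag high-energy inputs as out-of-distribution.
def energy_score(logits, temperature=1.0):
    """E(x) = -T * logsumexp(logits / T); larger values suggest less ID-like inputs."""
    z = logits / temperature
    z_max = z.max(axis=-1, keepdims=True)
    return -temperature * (z_max.squeeze(-1) + np.log(np.exp(z - z_max).sum(axis=-1)))

def flag_ood(logits, threshold):
    """Flag inputs whose energy exceeds a threshold calibrated on ID validation data."""
    return energy_score(logits) > threshold

# Made-up logits: a confident ID-like prediction vs. a flat, uncertain one.
logits = np.array([[9.0, 0.5, 0.3], [0.4, 0.3, 0.2]])
print("energy scores:", np.round(energy_score(logits), 2))
print("flagged OOD:  ", flag_ood(logits, threshold=-2.0))
```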

University of Toronto

Prof. Tegan Maharaj

Why do you care about AI Existential Safety?

Life on earth has evolved over such a very long time, robust to such a huge range of earth conditions, that it’s easy to feel it will always be there, that it will continue in some way or another no matter what happens. But it might not. Everything *could* go wrong.

And in fact a lot is already going wrong — humans’ actions are changing the world more rapidly than it’s ever changed, and we are decimating the diversity of earth’s ecosystem. Time to adapt and multiple redundancy have been crucial to the adaptability and robustness of life in the past. AI systems afford the possibility of changing things even more rapidly, in ways we have decreasing understanding of and control over.

It’s not pleasant to think about everything going wrong, but once one accepts that it could, it sure feels better to try to do something to help make sure it doesn’t.

Please give one or more examples of research interests relevant to AI existential safety:

We are in a critical period of the development of AI systems, where we are beginning to see important societal issues with their use, but also great promise for societal good, generating widespread will to regulate & govern AI systems responsibly. I think there’s a real possibility of doing this right if we act now, and I hope to help make that happen.

These are my short (1-5 year) research foci:

(1) Theoretical results and experiments which help better understand robustness and generalization behaviour in more realistic settings, with a focus on representation learning and out-of-distribution data. E.g. average-case generalization and sample-complexity bounds, measuring OOD robustness, time-to-failure analysis, measuring ‘representativeness’.

(2) Practical methods for safe and responsible development of AI, with a focus on alignment and dealing with distributional shift. E.g. unit tests for particular (un)desirable behaviours that could enable 3rd-party audits, sandboxes for evaluating AI systems prior to deployment and guiding design of randomized control trials, generalization suites. (A toy sketch of such a behavioural unit test appears after this list.)

(3) Popularization and specification of novel problem settings, with baseline results, for AI systems addressing important societal problems (e.g. pricing negative externalities or estimating individual-level impact of climate change, pollution, epidemic disease, or polarization in content recommendation), with a focus on common-good problems.
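Here is a toy version of the behavioural unit tests mentioned in point (2); the model, perturbation, and tolerance are all hypothetical placeholders:

```python
import numpy as np

# Hypothetical "behavioural unit test": check that a model's decisions are invariant
# to a perturbation that should not matter, so a third party could run it pre-deployment.
def predict(model, x):
    return (model(x) > 0.5).astype(int)

def test_invariance_to_benign_perturbation(model, inputs, perturb, tolerance=0.01):
    """Fail if more than `tolerance` of decisions change under a harmless perturbation."""
    flipped = predict(model, inputs) != predict(model, perturb(inputs))
    assert flipped.mean() <= tolerance, f"{flipped.mean():.1%} of decisions changed"

# Toy model and perturbation, for illustration only.
toy_model = lambda x: 1 / (1 + np.exp(-x.sum(axis=1)))
rescale_units = lambda x: x * 1.0001           # e.g. a harmless change of units
inputs = np.random.default_rng(0).normal(size=(1000, 5))
test_invariance_to_benign_perturbation(toy_model, inputs, rescale_units)
print("behavioural unit test passed")
```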

University of Oxford

Prof. Michael Osborne

Why do you care about AI Existential Safety?

I believe that AI presents a real existential threat, and one that I, as an AI researcher, have a duty to address. Nor is the threat from AI limited to a distant future. As AI algorithms are deployed more widely, within ever more sensitive applications, from healthcare to defence, the need for AI systems to be safer is with us today. In answer to these challenges, I believe that my particular interests – Bayesian models and numeric algorithms – offer a framework for AI that is transparent, performant and safe.

Please give one or more examples of research interests relevant to AI existential safety:

In control engineering for safety-critical areas like aerospace and automotive domains, it has long been a requirement that computer code is verifiably safe: the designers must guarantee that the code will never reach a state in which it might take a catastrophic decision. AI methods, however, are vastly more complex and adaptive than classic control algorithms, meaning that similar guarantees have not yet been achieved. As AI systems begin to have increasing influence on our lives, they must become better monitored and controlled.
I am interested in new, verifiably safe, algorithms for the most elementary computational steps that make up AI systems: numerical methods. Numerical methods, particularly optimisation methods, are well-known to be critical to both the performance and reliability of AI systems. State-of-the-art numerical methods aim to create minimal computational error through conservative assumptions. Unfortunately, in practice, these assumptions are often invalid, leading to unexpectedly high error.
Instead, I aim to develop novel numerical algorithms that explicitly estimate their own error, incorporating all possible error sources, as well as adaptively assigning computation so as to reduce overall risk. Probabilistic Numerics is a new, rigorous framework for the quantification of computational error in numerical tasks; it was born of recent developments in the interpretation of numerical methods, and it provides new tools for ensuring AI safety.

Numerical algorithms estimate latent (non-analytic) quantities from the results of tractable (“observable”) computations. Their task can thus be described as inference in the statistical sense, and numerical algorithms can be cast as learning machines that actively collect (compute) data to infer a non-analytic quantity. Importantly, this notion applies even if the quantity in question is entirely deterministic in nature: uncertainty can be assigned to quantities that are not stochastic, just unknown. Probabilistic Numerics treats numerical computation as inference, yielding algorithms that take in probability distributions over input variables and return probability distributions over their outputs, such that the output distribution reflects uncertainty caused both by the uncertain inputs and by the imperfect internal computation.

Moreover, through its estimates of how uncertain, and hence how valuable, a computation is, Probabilistic Numerics allows the allocation of computation itself to be optimised. As a result, probabilistic numerical algorithms have been shown to offer significantly lower computational costs than alternatives. Intelligent allocation of computation can also improve safety, by forcing computation to explore troublesome edge cases that might otherwise be neglected.

I aim to apply the probabilistic numeric framework to the identification and communication of computational errors within composite AI systems. Probabilistic numerical methods offer the promise of monitoring assumptions in running computations, yielding a monitoring regime that can safely interrupt algorithms overwhelmed by their task’s complexity. This approach will allow AI systems to monitor the extent to which their own internal model matches external data, and to respond appropriately cautiously.
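To make “computation as inference” concrete, here is a toy sketch of my own construction (not the author’s code): modelling an integrand with a Gaussian process turns quadrature into inference, so the integral estimate comes with an explicit uncertainty that could, in principle, guide where to spend further computation.

```python
import numpy as np

# Toy probabilistic-numerics sketch (illustrative only): estimate the integral of f on
# [0, 1] from six evaluations by modelling f with a Gaussian process, so the output is
# a distribution over the integral rather than a single number.
rng = np.random.default_rng(0)

def rbf(a, b, lengthscale=0.2):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale ** 2)

f = lambda x: np.sin(3 * x) + 0.5 * x             # the (notionally expensive) integrand
X = np.linspace(0.0, 1.0, 6)                      # small budget of evaluations
y = f(X)

grid = np.linspace(0.0, 1.0, 200)
K = rbf(X, X) + 1e-8 * np.eye(len(X))             # jitter for numerical stability
Ks, Kss = rbf(grid, X), rbf(grid, grid)
mean = Ks @ np.linalg.solve(K, y)                 # GP posterior mean of f on the grid
cov = Kss - Ks @ np.linalg.solve(K, Ks.T)         # GP posterior covariance

# Draw plausible integrands from the posterior and integrate each with the trapezoid rule.
L = np.linalg.cholesky(cov + 1e-6 * np.eye(len(grid)))
samples = mean[None, :] + (L @ rng.normal(size=(len(grid), 300))).T
dx = grid[1] - grid[0]
integrals = 0.5 * dx * (samples[:, :-1] + samples[:, 1:]).sum(axis=1)

reference = 0.5 * dx * (f(grid)[:-1] + f(grid)[1:]).sum()
print(f"integral estimate: {integrals.mean():.4f} +/- {integrals.std():.4f}")
print(f"dense reference:   {reference:.4f}")
```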

UC Berkeley

Prof. Stuart Russell

Why do you care about AI Existential Safety?

It is increasingly important to ask, “What if we succeed?” Our intelligence gives us power over the world and over other species; we will eventually build systems with superhuman intelligence; therefore, we face the problem of retaining power, forever, over entities that are far more powerful than ourselves.

Please give one or more examples of research interests relevant to AI existential safety:

Rebuilding AI on a new and broader foundation, with the goal of creating AI systems that are provably beneficial to humans.

Cornell University

Prof. Bart Selman

Why do you care about AI Existential Safety?

AI capabilities are increasing rapidly. This opens up exciting opportunities to address many of society’s challenges. However, we also need to recognize that we cannot fully understand the future path of AI. So we need to devote research resources to guard against potential existential risks.

Please give one or more examples of research interests relevant to AI existential safety:

My most closely related research interests are in deep RL, particularly concerning challenges of safety and interpretability.

UC Berkeley

Prof. Jacob Noah Steinhardt

Why do you care about AI Existential Safety?

In the coming decades, AI will likely have a transformative effect on society, including potentially automating and then surpassing almost all human labor. For these effects to be beneficial, we need better forecasting of AI capabilities, better tools for understanding and aligning AI systems, and a community of researchers, engineers, and policymakers prepared to implement necessary responses. I aim to help with all of these, starting from a foundation of basic research.

Please give one or more examples of research interests relevant to AI existential safety:

I have written several position papers on research agendas for AI safety, including “Concrete Problems in AI Safety”, “AI Alignment Research Overview”, and “Unsolved Problems in ML Safety”. Current projects study robustness, reward learning and reward hacking, unintended consequences of ML (especially in economic or many-to-many contexts), interpretability, forecasting, and safety from the perspective of complex systems theory.
Massachusetts Institute of Technology

Prof. Max Tegmark

Why do you care about AI Existential Safety?

I’m convinced that AI will become the most powerful technology in human history, and end up being either the best or worst thing ever to happen to humanity. I therefore feel highly motivated to work on research that can tip the balance toward the former outcome.

Please give one or more examples of research interests relevant to AI existential safety:

I believe that our best shot at beneficial AGI involves replacing black-box neural networks by intelligible intelligence. The only way I’ll trust a superintelligence to be beneficial is if I can prove it, since no matter how smart it is, it can’t do the impossible. My MIT research group therefore focuses on using tools from physics and information theory to transform black-box neural networks into more understandable systems. Recent applications have included auto-discovery of symbolic formulas and invariants as well as hidden symmetries and modularities.
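As a toy illustration of the “auto-discovery of symbolic formulas” direction (entirely a sketch with a made-up target, not the group’s actual tools), even a brute-force search over a tiny space of candidate expressions can replace a black-box fit with an intelligible formula:

```python
import itertools
import numpy as np

# Toy symbolic regression: exhaustively score a small space of candidate formulas on
# synthetic data whose hidden ground truth is x0 * x1^2. Illustrative only.
rng = np.random.default_rng(0)
x = rng.uniform(0.5, 2.0, size=(200, 2))
y = x[:, 0] * x[:, 1] ** 2 + 0.01 * rng.normal(size=200)

unary = {"id": lambda a: a, "sq": lambda a: a ** 2, "sqrt": np.sqrt, "log": np.log}
binary = {"+": np.add, "*": np.multiply, "/": np.divide}

best = None
for (n0, f0), (n1, f1), (nb, fb) in itertools.product(unary.items(), unary.items(), binary.items()):
    pred = fb(f0(x[:, 0]), f1(x[:, 1]))
    err = np.mean((pred - y) ** 2)
    if best is None or err < best[0]:
        best = (err, f"{nb}({n0}(x0), {n1}(x1))")

print("best formula:", best[1], " mse:", round(best[0], 4))
# Expected to recover something equivalent to *(id(x0), sq(x1)), i.e. x0 * x1^2.
```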
University of Chicago

Prof. Victor Veitch

Why do you care about AI Existential Safety?

I’m generally concerned with doing work that has the greatest impact on human wellbeing. I think it’s plausible that we can achieve strong AI in the near-term future. This will have a major impact on the rest of human history – so, we should get it right. As a pleasant bonus, I find that working on AI Safety leads to problems that are of fundamental importance to our understanding of machine learning and AI generally.

Please give one or more examples of research interests relevant to AI existential safety:

My main current interest in this area is the application of causality to trustworthy machine learning. Informally, the causal structure of the world seems key to making sound decisions, and so causal reasoning must be a key component of any future AI system. Accordingly, determining exactly how causal understanding can be baked into systems – and in particular how this affects their trustworthiness – is key. Additionally, this research programme offers insight into near-term trustworthiness problems, which can offer concrete directions for development. For example, the tools of causal inference play a key role in understanding domain shift, the failures of machine-learning models under (apparently) benign perturbations of input data, and in explaining (and enforcing) the rationale for decisions made by machine learning systems. For a concrete example of this type of work, see here.
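A small hypothetical sketch of the domain-shift point (my own toy, not taken from this work): a predictor that leans on a spurious, environment-dependent feature looks fine in-distribution but fails under an apparently benign shift, while a causal feature stays reliable.

```python
import numpy as np

# Toy spurious-correlation example: the causal feature is stable across environments,
# the spurious feature only correlates with the label in the training environment.
rng = np.random.default_rng(0)

def make_data(n, spurious_agreement):
    y = rng.integers(0, 2, size=n)
    s = 2 * y - 1                                           # label in {-1, +1}
    causal = s + 0.8 * rng.normal(size=n)                   # stable causal mechanism
    agree = rng.random(n) < spurious_agreement              # environment-dependent feature
    spurious = np.where(agree, s, -s) + 0.2 * rng.normal(size=n)
    return np.stack([causal, spurious], axis=1), y

def fit_logistic(X, y, steps=2000, lr=0.1):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w)))
        w += lr * X.T @ (y - p) / len(y)                    # gradient ascent on log-likelihood
    return w

X_train, y_train = make_data(5000, spurious_agreement=0.95)  # training environment
X_shift, y_shift = make_data(5000, spurious_agreement=0.10)  # shifted test environment
w = fit_logistic(X_train, y_train)

accuracy = lambda X, y: np.mean(((X @ w) > 0) == y)
print("learned weights [causal, spurious]:", np.round(w, 2))
print("accuracy, training environment:", round(accuracy(X_train, y_train), 3))
print("accuracy, shifted environment: ", round(accuracy(X_shift, y_shift), 3))
```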


