AI Safety Research

“What we really need to do is make sure that life continues into the future. […] It’s best to try to prevent a negative circumstance from occurring than to wait for it to occur and then be reactive.”

-Elon Musk on keeping AI safe and beneficial

In spring of 2018, FLI launched our second AI Safety Research program, this time focusing on Artificial General Intelligence (AGI) and how to keep it safe and beneficial. By the summer, 10 researchers were awarded over $2 million to tackle the technical and strategic questions related to preparing for AGI, funded by generous donations from Elon Musk and the Berkeley Existential Risk Initiative. You can read about their projects in the table below.

This research program comes as a sequel to our AI Safety grants competition in 2015, where generous donations from Elon Musk and the Open Philanthropy Project funded 37 researchers to begin various projects to help ensure that artificial intelligence remains safe and beneficial. Now, three years later, our grant winners have produced over 45 scientific publications and a host of conference events, which you can also read about below.

AGI Safety Researchers 2018

Click on any of the researchers below for more information about their work.

AI Safety Researchers 2015

Click on any of the researchers below for more information about their work.

Primary Investigator | Project Title | Amount Recommended
Alex Aiken, Stanford University | Verifying Deep Mathematical Properties of AI Systems | $100,813
Peter Asaro, The New School | Regulating Autonomous Artificial Agents: A Systematic Approach to Developing AI & Robot Policy (Research Overview, Podcast) | $116,974
Seth Baum, Social & Environmental Entrepreneurs | Evaluation of Safe Development Pathways for Artificial Superintelligence (Research Overview) | $100,000
Nick Bostrom, University of Oxford | Strategic Research Center for Artificial Intelligence (Research Overview) | $1,500,000
Paul Christiano, University of California, Berkeley | Counterfactual Human Oversight (Research Overview) | $50,000
Vincent Conitzer, Duke University | How to Build Ethics into Robust Artificial Intelligence (Research Overview) | $200,000
Owen Cotton-Barratt, Centre for Effective Altruism, Oxford | Decision-relevant uncertainty in AI safety (Research Overview) | $119,670
Thomas Dietterich, Oregon State University | Robust and Transparent Artificial Intelligence Via Anomaly Detection and Explanation (Research Overview) | $200,000
Stefano Ermon, Stanford University | Robust probabilistic inference engines for autonomous agents (Research Overview) | $250,000
Owain Evans, University of Oxford | Inferring Human Values: Learning “Ought”, not “Is” | $227,212
Benja Fallenstein, Machine Intelligence Research Institute | Aligning Superintelligence With Human Interests (Research Overview) | $250,000
Katja Grace, Machine Intelligence Research Institute | AI Impacts (Research Overview) | $49,310
Seth Herd, University of Colorado | Stability of Neuromorphic Motivational Systems | $98,400
Ramana Kumar, University of Cambridge | Applying Formal Verification to Reflective Reasoning | $36,750
Fuxin Li, Georgia Institute of Technology | Understanding when a deep network is going to be wrong (Research Overview) | $121,642
Percy Liang, Stanford University | Predictable AI via Failure Detection and Robustness (Research Overview) | $255,160
Long Ouyang, Thesis Research | Democratizing Programming: Synthesizing Valid Programs with Recursive Bayesian Inference (Research Overview) | $99,750
David Parkes, Harvard University | Mechanism Design for AI Architectures (Research Overview) | $200,000
André Platzer, Carnegie Mellon University | Faster Verification of AI-based Cyber-physical Systems | $200,000
Heather Roff, University of Denver | Lethal Autonomous Weapons, Artificial Intelligence and Meaningful Human Control (Research Overview, Podcast) | $136,918
Francesca Rossi, University of Padova | Safety Constraints and Ethical Principles in Collective Decision Making Systems | $275,000
Benjamin Rubinstein, The University of Melbourne | Security Evaluation of Machine Learning Systems (Research Overview) | $98,532
Stuart Russell, University of California, Berkeley | Value Alignment and Moral Metareasoning (Research Overview) | $342,727
Anna Salamon, Center for Applied Rationality | Specialized rationality skills for the AI research community | $111,757
Bart Selman, Cornell University | Scaling-up AI Systems: Insights From Computational Complexity (Research Overview) | $24,950
Kaj Sotala, Thesis Research | Teaching AI Systems Human Values Through Human-Like Concept Learning | $20,000
Bas Steunebrink, IDSIA | Experience-based AI (EXPAI) (Research Overview) | $196,650
Jacob Steinhardt, Stanford University | Summer Program in Applied Rationality and Cognition (Research Overview) | $88,050
Moshe Vardi, Rice University | Artificial Intelligence and the Future of Work (Research Overview) | $69,000
Manuela Veloso, Carnegie Mellon University | Explanations for Complex AI Systems (Research Overview) | $200,000
Wendell Wallach, Yale | Control and Responsible Innovation in the Development of Autonomous Machines (Research Overview) | $180,000
Michael Webb, Stanford University | Optimal Transition to the AI Economy | $76,318
Daniel Weld, University of Washington | Computational Ethics for Probabilistic Planning (Research Overview) | $200,000
Adrian Weller, University of Cambridge | Investigation of Self-Policing AI Agents | $50,000
Michael Wellman, University of Michigan | Understanding and Mitigating AI Threats to the Financial System (Research Overview) | $200,000
Michael Wooldridge, University of Oxford | Towards a Code of Ethics for AI Research (Research Overview) | $125,000
Brian Ziebart, University of Illinois at Chicago | Towards Safer Inductive Learning (Research Overview) | $134,247


Publications

  1. Achim, T, et al. Beyond parity constraints: Fourier analysis of hash functions for inference. Proceedings of The 33rd International Conference on Machine Learning, pages 2254–2262, 2016.
  2. Armstrong, Stuart and Orseau, Laurent. Safely Interruptible Agents. Uncertainty in Artificial Intelligence (UAI) 2016.
  3. Asaro, P. The Liability Problem for Autonomous Artificial Agents, Proceedings of the AAAI Symposium on Ethical and Moral Considerations in Non-Human Agents, Stanford University, Stanford, CA, March 21-23, 2016.
  4. Bai, Aijun and Russell, Stuart. Markovian State and Action Abstractions in Monte Carlo Tree Search. In Proc. IJCAI-16, New York, 2016.
  5. Boddington, Paula. EPSRC Principles of Robotics: Commentary on safety, robots as products, and responsibility. Ethical Principles of Robotics, special issue, 2016.
  6. Boddington, Paula. The Distinctiveness of AI Ethics, and Implications for Ethical Codes. Presented at IJCAI-16 Workshop 6 Ethics for Artificial Intelligence, New York, July 2016.
  7. Bostrom, N. Strategic Implications of Openness in AI Development, Technical Report #2016-1, Future of Humanity Institute, Oxford University: pp. 1-26, 2016.
  8. Chen, Xiangli, et al. Robust Covariate Shift Regression. International Conference on Artificial Intelligence and Statistics (AISTATS), 2016.
  9. Conitzer, Vincent, et al. Moral Decision Making Frameworks for Artificial Intelligence. (Preliminary version.) To appear in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17) Senior Member / Blue Sky Track, San Francisco, CA, USA, 2017.
  10. Critch, Andrew. Parametric Bounded Löb’s Theorem and Robust Cooperation of Bounded Agents. 2016.
  11. Evans, Owain, et al. Learning the Preferences of Bounded Agents. NIPS Workshop on Bounded Optimality, 2015.
  12. Evans, Owain, et al. Learning the Preferences of Ignorant, Inconsistent Agents. 2015.
  13. Fathony, Rizal, et al. Multiclass Classification: A Risk Minimization Perspective. Neural Information Processing Systems (NIPS), 2016.
  14. Fulton, Nathan and Platzer, André. A logic of proofs for differential dynamic logic: Toward independently checkable proof certificates for dynamic logics. Jeremy Avigad and Adam Chlipala, editors, Proceedings of the 2016 Conference on Certified Programs and Proofs, CPP 2016, St. Petersburg, FL, USA, January 18-19, 2016, pp. 110-121. ACM, 2016.  
  15. Garrabrant, Scott, et al. Asymptotically Coherent, Well Calibrated, Self-trusting Logical Induction. Working Paper (Berkeley, CA: Machine Intelligence Research Institute). 2016.
  16. Garrabrant, Scott, et al. Inductive Coherence. arXiv:1604.05288 [cs.AI]. 2016.
  17. Garrabrant, Scott, et al. Asymptotic Convergence in Online Learning with Unbounded Delays. arXiv:1604.05280 [cs.LG]. 2016.
  18. Greene, J. D. Our driverless dilemma. Science, 352(6293), 1514-1515. 2016.
  19. Greene, J. et al. Embedding Ethical Principles in Collective Decision Support Systems. Thirtieth AAAI Conference on Artificial Intelligence. March 2016.
  20. Hadfield-Menell, Dylan, et al. Cooperative Inverse Reinforcement Learning. Neural Information Processing Systems (NIPS), 2016.
  21. Hsu, L.K., et al. Tight variational bounds via random projections and i-projections. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, pages 1087–1095, 2016.
  22. Khani, F., et al. Unanimous prediction for 100% precision with application to learning semantic mappings. Association for Computational Linguistics (ACL), 2016.
  23. Kim, C., et al. Exact sampling with integer linear programs and random perturbations. Proc. 30th AAAI Conference on Artificial Intelligence, 2016.
  24. Leike, Jan, et al. A Formal Solution to the Grain of Truth Problem. Uncertainty in Artificial Intelligence: 32nd Conference (UAI 2016), edited by Alexander Ihler and Dominik Janzing, 427–436. Jersey City, New Jersey, USA. 2016.
  25. Liu, C., et al. Goal inference improves objective and perceived performance in human robot collaboration. In Proc. AAMAS-16, Singapore, 2016.
  26. Nivel, E., et al. Bounded Recursive Self-Improvement. Technical Report RUTR-SCS13006, Reykjavik University, 2013.
  27. Perera, Vittorio, et al. Dynamic Generation and Refinement of Robot Verbalization. Proceedings of RO-MAN’16, the IEEE International Symposium on Robot and Human Interactive Communication, Columbia University, NY, August, 2016. mmv/papers/16roman-verbalization.pdf
  28. Pistono, F and Yampolskiy, RV. Unethical research: How to create a malevolent artificial intelligence. 25th International Joint Conference on Artificial Intelligence (IJCAI-16), Ethics for Artificial Intelligence Workshop (AI-Ethics-2016).
  29. Rosenthal, Stephanie, et al. Verbalization: Narration of Autonomous Mobile Robot Experience. In Proceedings of IJCAI’16, the 25th International Joint Conference on Artificial Intelligence, New York City, NY, July, 2016. mmv/papers/16ijcai-verbalization.pdf
  30. Rossi, F. Ethical Preference-Based Decision Support System. Proc. CONCUR 2016, Springer. 2016.
  31. Rossi, F. Moral preferences, Proc. IJCAI 2016 workshop on AI and ethics, and Proc. IJCAI 2016 workshop on multidisciplinary approaches to preferences. 2016.
  32. Siddiqui, A., et al. Finite Sample Complexity of Rare Pattern Anomaly Detection. Proceedings of UAI-2016 (pp. 10). 2016.
  33. Steinhardt, J. and Liang, P. Unsupervised Risk Estimation with only Conditional Independence Structure. Neural Information Processing Systems (NIPS), 2016.
  34. Steunebrink, B.R., et al. Growing Recursive Self-Improvers. Proceedings of the 9th Conference on Artificial General Intelligence (AGI 2016), LNAI 9782, pages 129-139. Springer, Heidelberg. 2016.
  35. Taylor, Erin. The Threat-Response Model: Ethical Decision in the Real World.
  36. Taylor, Jessica. Quantilizers: A Safer Alternative to Maximizers for Limited Optimization. 2nd International Workshop on AI, Ethics and Society at AAAI-2016. Phoenix, AZ. 2016.
  37. Thorisson, K.R., et al. Why Artificial Intelligence Needs a Task Theory (And What It Might Look Like). Proceedings of the 9th Conference on Artificial General Intelligence (AGI 2016), LNAI 9782, pages 118-128. Springer, Heidelberg. 2016.
  38. Thorisson, K.R., et al. About Understanding. Proceedings of the 9th Conference on Artificial General Intelligence (AGI 2016), LNAI 9782, pages 106-117. Springer, Heidelberg. 2016.
  39. Tossou, A.C.Y. and Dimitrakakis, C. Algorithms for Differentially Private Multi-Armed Bandits. Proc. 30th AAAI Conf. on Artificial Intelligence (AAAI 2016), 2016.
  40. Wellman, MP and Rajan, U. Ethical issues for autonomous trading agents. IJCAI-16 Workshop on Ethics for Artificial Intelligence, July 2016.
  41. Yampolskiy, RV. Taxonomy of pathways to dangerous AI. 30th AAAI Conference on Artificial Intelligence (AAAI-2016), 2nd International Workshop on AI, Ethics and Society (AI Ethics Society 2016).
  42. Zhang, et al. On the Differential Privacy of Bayesian Inference. Proc. 30th AAAI Conf. on Artificial Intelligence (AAAI 2016), 2016.
  43. Zhao, et al. Closing the gap between short and long xors for model counting. Thirtieth AAAI Conference on Artificial Intelligence, 2016.


Articles

Conitzer, Vincent. Artificial intelligence: where’s the philosophical scrutiny? Published in the magazine Prospect.

Creighton, Jolene. The Evolution of AI: Can Morality be Programmed? Futurism, based on an interview about our project with Conitzer.

Ermon, Stefano. What Are Some Recent Advances in Non-Convex Optimization Research? The Huffington Post.

Russell, Stuart. Moral Philosophy Will Become Part of the Tech Industry. Time, September 15, 2015.

Russell, Stuart. Should we fear super smart robots? Scientific American, 314, 58-59, June 2016.

Taylor, Jessica. A first look at the hard problem of corrigibility. Intelligent Agent Foundations Forum, 2015.

Taylor, Jessica. A sketch of a value-learning sovereign. Intelligent Agent Foundations Forum, 2015.

Taylor, Jessica. Three preference frameworks for goal-directed agents. Intelligent Agent Foundations Forum, 2015.

Taylor, Jessica. What do we need value learning for? Intelligent Agent Foundations Forum, 2015.

Weld, D.S. “The real threat of artificial intelligence,” GeekWire, May 23, 2016.

Software Releases

André Platzer: Major contributions to the KeYmaera X Theorem Prover for Hybrid Systems. Source code is available online.

Course Materials  

Kristen Brent Venable, IHMC: Taught a new ad-hoc independent study course entitled “Ethics for Artificial Intelligence” during the spring 2016 semester, with the goal of carrying out an in-depth state-of-the-art review of models for ethical issues and ethical values in AI.

Owain Evans: An interactive online textbook to communicate the idea of inverse reinforcement learning (IRL) to a broader audience and to give a detailed explanation of our approach to IRL to the existing AI Safety and AI/ML communities.

Joshua Greene, Harvard: Spring 2016, graduate seminar “Evolving Morality: From Primordial Soup to Superintelligent Machines.”

André Platzer: Foundations of Cyber-Physical Systems (Spring 2016)

Stuart Russell, Tom Griffiths, Anca Dragan, UC Berkeley: Spring 2016, graduate course on “Human-Compatible AI”

Workshops Funded

The Control Problem in AI: by the Strategic AI Research Centre

This was an intensive workshop at Oxford, with a large number of participants, and covered, among many other things, goals and principles of AI policy and strategy, value alignment for advanced machine learning, the relative importance of AI v. other x-risk, geopolitical strategy, government involvement, analysis of the strategic landscape, theory and methods of communication and engagement, the prospects of international-space-station-like coordinated AGI development, and an enormous array of technical AI control topics.

Policies for Responsible AI Development: by the Strategic AI Research Centre

This workshop focused on a selection of key areas, such as: classifying risks, international governance, and surveillance. The workshop also engaged in a series of brainstorming and analysis exercises. The brainstorming sessions included “rapid problem attacks” on especially difficult issues, a session drafting various “positive visions” for different AI development scenarios, and a session (done in partnership with Open Philanthropy) which involved brainstorming ideas for major funders interested in x-risk reduction. This workshop even engaged in two separate “red team” exercises in which we sought out vulnerabilities, first in our own approach and research agenda, and then on global security.

Intersections between Moral Psychology and Artificial Intelligence: by Molly Crockett and Walter Sinnott-Armstrong.

This workshop included two panels. The first asked whether artificial intelligence systems could ever provide reliable moral advice on a wide range of issues. Two speakers were skeptical about traditional top-down approaches, but the other two speakers argued that new alternatives are more promising. The second panel focussed on particular applications of artificial intelligence in war. The panelists again vigorously disagreed but left with a much better understanding of each other’s positions. Both panels were very diverse in their disciplinary backgrounds. The audience consisted of approximately 50 professors and students as well as members of the public.

Moral AI Projects: by Vincent Conitzer, Walter Sinnott-Armstrong, Erin Taylor, and others.

Each part of this workshop included an in-depth discussion of an innovative model for moral artificial intelligence. The first group of speakers explained and defended the bottom-up approach that they are developing with support from FLI. The second session was led by a guest speaker who presented a dialogic theory of moral reasoning that has the potential to be programmed into artificial intelligence systems. In the end, both groups found that their perspectives were complementary rather than competing. The audience consisted of around 20 students and faculty from a wide variety of fields.

Embedded Machine Learning: by Dragos Margineantu (Boeing), Rich Caruana (Microsoft Research), Thomas Dietterich (Oregon State University)

This workshop took place at the AAAI Fall Symposium, Arlington, VA, November 12-14, 2015 and included issues of Unknown Unknowns in machine learning and more generally touched on issues at the intersection of software engineering and machine learning, including verification and validation.

The Future of Artificial Intelligence: by Jacob Steinhardt, Stanford; Tom Dietterich, OSU; Percy Liang, Stanford; Andrew Critch, MIRI; Jessica Taylor, MIRI; Adrian Weller, Cambridge

The Future of Artificial Intelligence workshop was held at NYU. The first day consisted of two public sessions on the subject of “How AI is Used in Industry, Present and Future”. The first session included talks by Eric Schmidt (Alphabet), Mike Schroepfer (Facebook), Eric Horvitz (MSR), and me. This was followed by a panel including all of us plus Demis Hassabis (DeepMind) and Bart Selman (Cornell). I talked about AI applications in science (bird migration, automated scientist), law enforcement (fraud detection, insider threat detection), and sustainability (managing invasive species). This session was generally very upbeat about the potential of AI to do great things. The second session had talks by Jen-Hsun Huang (NVIDIA), Amnon Shashua (Mobileye), John Kelly (IBM), and Martial Hebert (CMU). The final session turned toward the present and future of AI with presentations by Bernhard Schölkopf (Max Planck Institute), Demis Hassabis (Google DeepMind), and Yann LeCun (Facebook AI Research & NYU). Bernhard spoke about discovering causal relationships, and Demis spoke about artificial general intelligence and his vision of how to achieve it. Yann discussed “differentiable programs” and raised the issue of whether we can differentiate traditional symbolic AI methods or need to adopt continuous representations for them.

The second and third days of the workshop were held under the Chatham House Rule. Many topics were discussed, including (a) the impact of AI on the future of employment and economic growth, (b) social intelligence and human-robot interaction, (c) the time scales of AI risks: short term, medium term, and very long term, (d) the extent to which mapping the brain will help us understand how the brain works, (e) the future of US Federal funding for AI research and especially for young faculty, (f) the challenges of creating AI systems that understand and exhibit ethical behavior, (g) the extent to which AI should be regulated either by government or by community institutions and standards, and (h) how to develop appropriate “motivational systems” for AI agents.

Reliable Machine Learning in the Wild: by Jacob Steinhardt, Stanford; Tom Dietterich, OSU; Percy Liang, Stanford; Andrew Critch, MIRI; Jessica Taylor, MIRI; Adrian Weller, Cambridge.

This was an ICML Workshop, NY, June 23, 2016. This workshop discussed a wide range of issues related to engineering reliable AI systems. Among the questions discussed were (a) how to estimate causal effects under various kinds of situations (A/B tests, domain adaptation, observational medical data), (b) how to train classifiers to be robust in the face of adversarial attacks (on both training and test data), (c) how to train reinforcement learning systems with risk-sensitive objectives, especially when the model class may be misspecified and the observations are incomplete, and (d) how to guarantee that a learned policy for an MDP satisfies specified temporal logic properties. Several important engineering practices were also discussed, especially engaging a Red Team to perturb/poison data and making sure we are measuring the right data. My assessment is that a research community is coalescing nicely around these questions, and the quality of the work is excellent.

More details of the workshop can be found at our website:

MIRI hosted one stand-alone workshop and also co-hosted a 22-day June colloquium series with the Future of Humanity Institute, which included four additional workshops.

Over 50 people attended the colloquium series from 25 different institutions, including Stuart Russell (UC Berkeley), Bart Selman (Cornell), Francesca Rossi (IBM Research), and Tom Dietterich (Oregon State). MIRI also ran four research retreats, internal workshops exclusive to MIRI researchers.

    • Workshop #1: Self-Reference, Type Theory, and Formal Verification. April 1-3.

Participants worked on questions of self-reference in type theory and automated theorem provers, with the goal of studying systems that model themselves.
Participants: Benya Fallenstein (MIRI), Daniel Selsam (Stanford), Jack Gallagher (Gallabytes), Jason Gross (MIT), Miëtek Bak (Least Fixed), Nathaniel Thomas (Stanford), Patrick LaVictoire (MIRI), Ramana Kumar (Cambridge)

    • Workshop #2: Transparency. May 28-29.

In many cases, it can be prohibitively difficult for humans to understand AI systems’ internal states and reasoning. This makes it more difficult to anticipate such systems’ behavior and correct errors. On the other hand, there have been striking advances in communicating the internals of some machine learning systems, and in formally verifying certain features of algorithms. We would like to see how far we can push the transparency of AI systems while maintaining their capabilities.
Slides are up for Tom Dietterich’s overview talk at this workshop, “Issues Concerning AI Transparency”.
Participants: Nate Soares (MIRI), Andrew Critch (MIRI), Patrick LaVictoire (MIRI), Jessica Taylor (MIRI), Scott Garrabrant (MIRI), Alan Fern (Oregon State University), Daniel Filan (Australian National University), Devi Borg (Future of Humanity Institute), Francesca Rossi (IBM Research), Jack Gallagher (Gallabytes), János Kramár (Montreal Institute for Learning Algorithms), Jim Babcock (unaffiliated), Marcello Herreshoff (Google), Moshe Looks (Google), Nathaniel Thomas (Stanford), Nisan Stiennon (Google), Sune Jakobsen (University College London), Tom Dietterich (Oregon State University), Tsvi Benson-Tilsen (UC Berkeley), Victoria Krakovna (Future of Life Institute)

    • Workshop #3: Robustness and Error-Tolerance. June 4-5.

How can we ensure that when AI systems fail, they fail gracefully and detectably? This is difficult for systems that must adapt to new or changing environments; standard PAC guarantees for machine learning systems fail to hold when the distribution of test data does not match the distribution of training data. Moreover, systems capable of means-end reasoning may have incentives to conceal failures that would result in their being shut down. We would much prefer to have methods of developing and validating AI systems such that any mistakes can be quickly noticed and corrected.
Participants: Andrew Critch (MIRI), Patrick LaVictoire (MIRI), Jessica Taylor (MIRI), Scott Garrabrant (MIRI), Abram Demski (USC Institute for Creative Technologies), Bart Selman (Cornell), Bas Steunebrink (IDSIA), Daniel Filan (Australian National University), Devi Borg (Future of Humanity Institute), Jack Gallagher (Gallabytes), Jim Babcock, Nisan Stiennon (Google), Ryan Carey (Centre for the Study of Existential Risk), Sune Jakobsen (University College London)

    • Workshop #4: Preference Specification. June 11-12.

The perennial problem of wanting code to “do what I mean, not what I said” becomes increasingly challenging when systems may find unexpected ways to pursue a given goal. Highly capable AI systems thereby increase the difficulty of specifying safe and useful goals, or specifying safe and useful methods for learning human preferences.
Participants: Patrick LaVictoire (MIRI), Jessica Taylor (MIRI), Abram Demski (USC Institute for Creative Technologies), Bas Steunebrink (IDSIA), Daniel Filan (Australian National University), David Abel (Brown University), David Krueger (Montreal Institute for Learning Algorithms), Devi Borg (Future of Humanity Institute), Jan Leike (Future of Humanity Institute), Jim Babcock (unaffiliated), Lucas Hansen (unaffiliated), Owain Evans (Future of Humanity Institute), Rafael Cosman (unaffiliated), Ryan Carey (Centre for the Study of Existential Risk), Stuart Armstrong (Future of Humanity Institute), Sune Jakobsen (University College London), Tom Everitt (Australian National University), Tsvi Benson-Tilsen (UC Berkeley), Vadim Kosoy (Epicycle)

    • Workshop #5: Agent Models and Multi-Agent Dilemmas. June 17.

When designing an agent to behave well in its environment, it is risky to ignore the effects of the agent’s own actions on the environment or on other agents within the environment. For example, a spam classifier in wide use may cause changes in the distribution of data it receives, as adversarial spammers attempt to bypass the classifier. Considerations from game theory, decision theory, and economics become increasingly useful in such cases.
Participants: Andrew Critch (MIRI), Patrick LaVictoire (MIRI), Abram Demski (USC Institute for Creative Technologies), Andrew MacFie (Carleton University), Daniel Filan (Australian National University), Devi Borg (Future of Humanity Institute), Jaan Altosaar (Google Brain), Jan Leike (Future of Humanity Institute), Jim Babcock (unaffiliated), Matthew Johnson (Harvard), Rafael Cosman (unaffiliated), Stefano Albrecht (UT Austin), Stuart Armstrong (Future of Humanity Institute), Sune Jakobsen (University College London), Tom Everitt (Australian National University), Tsvi Benson-Tilsen (UC Berkeley), Vadim Kosoy (Epicycle)

    • Workshop #6: Logic, Probability, and Reflection. August 12-14.

Participants at this workshop, consisting of MIRI staff and regular collaborators, worked on a variety of problems related to MIRI’s Agent Foundations technical agenda, with a focus on decision theory and the formal construction of logical counterfactuals.
Participants: Andrew Critch (MIRI), Benya Fallenstein (MIRI), Eliezer Yudkowsky (MIRI), Jessica Taylor (MIRI), Nate Soares (MIRI), Patrick LaVictoire (MIRI), Sam Eisenstat (UC Berkeley), Scott Garrabrant (MIRI), Tsvi Benson-Tilsen (UC Berkeley)

Control and Responsible Innovation in the Development of Autonomous Systems Workshop: by The Hastings Center

The four co-chairs (Gary Marchant, Stuart Russell, Bart Selman, and Wendell Wallach) and The Hastings Center staff (particularly Mildred Solomon and Greg Kaebnick) designed this first workshop. This workshop was focused on exposing participants to relevant research progressing in an array of fields, stimulating extended reflection upon key issues and beginning a process of dismantling intellectual silos and loosely knitting the represented disciplines into a transdisciplinary community. Twenty-five participants gathered at The Hastings Center in Garrison, NY from April 24-26, 2016. The workshop included representatives from key institutions that have entered this space, including IEEE, the Office of Naval Research, the World Economic Forum, and of course AAAI. They are planning a second workshop, scheduled for October 30-November 1, 2016. The invitees for the second workshop are primarily scientists, but also include social theorists, legal scholars, philosophers, and ethicists. The expertise of the social scientists will be drawn upon in clarifying the application of research in cognitive science and legal and ethical theory to the development of autonomous systems. Not all of the invitees to the second workshop have considered the challenge of developing beneficial trustworthy artificial agents. However, we believe we are bringing together brilliant and creative minds to collectively address this challenge. We hope that scientific and intellectual leaders, new to the challenge and participating in the second workshop, will take on the development of beneficial, robust, safe, and controllable AI as a serious research agenda.

A Day of Ethical AI at Oxford: by Michael Wooldridge, Peter Millican, and Paula Boddington

This workshop was held at the Oxford Martin School on June 8th, 2016. The goal of the workshop was collaborative discussion between those working in AI and ethics and related areas, between geographically close and linked centres. Participants were invited from the Oxford Martin School, The Future of Humanity Institute, the Cambridge Centre for the Study of Existential Risk, and the Leverhulme Centre for the Future of Intelligence, plus others. Participants included FLI grantholders. This workshop included participants from diverse disciplines, including computing, philosophy and psychology, to facilitate cross-disciplinary conversation and understanding.

Ethics for Artificial Intelligence: by Brian Ziebart

This workshop took place at IJCAI-16, July 9th, 2016, in New York. This workshop focussed on selecting papers which speak to the themes of law and autonomous vehicles, ethics of autonomous systems, and superintelligence.

Workshop Participation and Presentation

Asaro, P. (2016) “Ethics for Artificial Intelligence,” International Joint Conference on Artificial Intelligence (IJCAI), New York, NY, July 9, 2016.

Asaro, P. (2016) “AI Now: The Social and Economic Implications of Artificial Intelligence,” White House Workshop on AI, New York University, New York, NY, July 7, 2016.

Asaro, P. (2016). “Autonomous Weapons,” Computers Gone Wild Workshop, Berkman Center for Internet and Society, Harvard University, Cambridge, MA, February 19, 2016.

Asaro, P. (2015). “The Internet of (Smart) Things,” and “Ethics Panel,” Blockchain Workshop, Harvard Berkman Center, Sydney, Australia, December 10-11, 2015. collaborative product:

Asaro, P. (2015). “Internet of Things” and “Philosophical Panel,” Blockchain Workshop, Harvard Berkman Center, Hong Kong, China, October 11-13, 2015.

Asaro, P. (2015). “The Human Brain in the Age of Robots: Social & Ethical Issues,” Webinar on Future Computing and Robotics in the Human Brain Project, Danish Board of Technology, October 9, 2015.

Asaro, P. (2016). “Regulating Autonomous Agents: The Scope and Limits of Liability,” 4th Annual Conference on Governance of Emerging Technologies: Law, Policy & Ethics, Arizona State University, Tempe, AZ, May 24-26, 2016.

Asaro, P. (2016). “The Liability Problem for Autonomous Artificial Agents,”AAAI Symposium on Ethical and Moral Considerations in Non-Human Agents, Stanford University, Stanford, CA, March 21-23, 2016.

Asaro, P.(2015). “Concepts of Agency & Autonomy: Towards the Governance of Autonomous Weapons,” Meeting of the Society for the Social Studies of Science, Denver, Co, November 11-15, 2015.

Walter Sinnott-Armstrong: co-organized and spoke at a workshop on “Moral Issues in Artificial Intelligence” at the Oxford Martin School of Oxford University.

Seth Baum, Anthony Barrett, and Roman Yampolskiy presented their research at the 2015 Society for Risk Analysis Annual Meeting.

Seth Baum organized several informal meetings on AI safety with attendees from (among other places) CSER, FHI, MIRI, Yale, and the United Nations at the International Joint Conference on Artificial Intelligence.

Vincent Conitzer: participated in the ethics workshop at AAAI, describing our work on this project in a session and also serving on a panel on research directions for keeping AI beneficial.

Owen Cotton-Barratt: presented on new ideas at a one-day workshop on “Ethical AI” in Oxford on June 8, 2016. He has further developed informal models of likely crucial parameters to include in the models, and he now believes that the model should additionally include a division between scenarios where a single AI-enabled actor gains a decisive strategic advantage, and ones where this does not occur.

Dietterich, T. G. (2015). Toward Beneficial Artificial Intelligence. Blouin Creative Leadership Summit, NY, NY, September 21, 2015.

Dietterich, T. G. (2015). Artificial Intelligence: Progress and Challenges. Technical and Business Perspectives on the Current and Future Impact of Machine Learning. Valencia, Spain, October 20, 2015. Press coverage in El Mundo.

Dietterich, T. G. (2015). Algorithms Among Us: The Societal Impacts of Machine Learning (opening remarks). NIPS Symposium. Montreal, Canada, December 10, 2015.

Dietterich, T. G. (2016). AI in Science, Law Enforcement, and Sustainability. The Future of Artificial Intelligence. NYU, January 11, 2016. I also participated in a side meeting with Henry Kissinger on January 13 along with Max Tegmark and several other key people.

Dietterich, T. G. (2016). Steps Toward Robust Artificial Intelligence (AAAI President’s Address). AAAI Conference on Artificial Intelligence, Phoenix, AZ. February 14, 2016.

Dietterich, T. G. (2016). Testing, Verification & Validation, Monitoring. Control and Responsible Innovation in the Development of Autonomous Machines. Hastings Center, Garrison, NY, April 25, 2016.

Dietterich, T. G. (2016). Steps Toward Robust Artificial Intelligence (short version). Huawei STW Workshop, Shenzhen, China, May 17, 2016.

Dietterich, T. G. (2016). Steps Toward Robust Artificial Intelligence. Distinguished Seminar, National Key Laboratory for Novel Software Technology, University of Nanjing, Nanjing, China, May 19, 2016.

Dietterich, T. G. (2016). Understanding and Managing Ecosystems through Artificial Intelligence. AI For Social Good. White House OSTP Workshop. Washington, DC, June 6-7, 2016.

Dietterich, T. G., Fern, A., Wong, W-K., Emmott, A., Das, S., Siddiqui, M. A., Zemicheal, T. (2016). Anomaly Detection: Principles, Benchmarking, Explanation, and Theory. ICML Workshop on Anomaly Detection Keynote Speech. NY. June 24, 2016.

Dietterich, T. G. (2016). Making artificial intelligence systems robust. Safe Artificial Intelligence. White House OSTP Workshop, Pittsburgh, PA, June 28, 2016.

Fern, A., Dietterich, T. G. (2016). Toward Explainable Uncertainty. MIRI Colloquium Series on Robust and Beneficial Artificial Intelligence. Alan and I also participated in the two-day workshop on Transparency. MIRI, Berkeley, CA. May 27-29, 2016.

Nathan Fulton:

  • Presented A Logic of Proofs for Differential Dynamic Logic: Toward Independently Checkable Proof Certificates for Dynamic Logics at the 5th ACM SIGPLAN Conference on Certified Programs and Proofs.
  • Nathan Fulton, Stefan Mitsch, and André Platzer presented a tutorial on KeYmaera X and hybrid systems verification at CPSWeek 2016, and a similar tutorial has been accepted at FM 2016.
  • Nathan Fulton presented a talk on work supported by this grant at a workshop on Safe AI for CPS held at Carnegie Mellon in April 2016.

Percy Liang: Workshop on Human Interpretability in Machine Learning at ICML 2016. Presented two papers:

Francesca Rossi:

  • German conference on AI (KI 2015) in September 2015, titled “Safety constraints and ethical principles in collective decision making systems”
  • “Moral Preferences”– ACS 2016 (Conference on Advances in Cognitive Systems, see, June 2016 — Colloquium Series on Robust and Beneficial AI (CSRBAI) of MIRI (see
  • “Ethical Preference-Based Decision Support Systems”– CONCUR 2016 (Int’l conference on concurrency theory, see , August 2016
  • Ethics of AI: three TEDx talks (TEDx Lake Como in November 2015, TEDx Ghent in June 2015, TEDx Osnabruck in April 2015)

Stuart Russell:

  • “The long-term future of (artificial) intelligence”, invited lecture, Software Alliance Annual Meeting, Napa, Nov 13, 2015
  • “The Future of AI and the Human Race”, TEDx talk, Berkeley, Nov 8, 2015
  • “Value Alignment”, invited lecture, Workshop on Algorithms for Human-Robot Interaction, Nov 18, 2015
  • “Killer Robots, the End of Humanity, and All That”, Award Lecture, World Technology Awards, New York, Nov 2015
  • “Should we Fear or Welcome the Singularity?”, panel presentation, Nobel Week Dialogue, December 2015
  • “The Future of Human-Computer Interaction”, panel presentation (chair), Nobel Week Dialogue, December 2015
  • “The Future Development of AI”, panel presentation, Nobel Week Dialogue, December 2015
  • “Some thoughts on the future”, invited lecture, NYU AI Symposium, January 2016
  • “The State of AI”, televised panel presentation, World Economic Forum, Davos, January 2016
  • “AI: Friend or Foe?”, panel presentation, World Economic Forum, Davos, January 2016
  • “The long-term future of (artificial) intelligence”, CERN Colloquium, Geneva, Jan 16, 2016
  • “Some thoughts on the future”, invited presentation, National Intelligence Council, Berkeley, Jan 28, 2016
  • “The long-term future of (artificial) intelligence”, Herbst Lecture, University of Colorado, Boulder, March 11, 2016
  • “The Future of AI”, Keynote Lecture, Annual Ethics Forum, California State University Monterey Bay, March 16, 2016
  • “The long-term future of (artificial) intelligence”, IARPA Colloquium, Washington DC, March 21, 2016
  • “AI: Friend or Foe?”, panel presentation, Milken Global Institute, Los Angeles, May 2, 2016
  • “Will Superintelligent Robots Make Us Better People?”, Keynote Lecture (televised), Seoul Digital Forum, South Korea, May 19, 2016
  • “The long-term future of (artificial) intelligence”, Keynote Lecture, Strata Big Data Conference, London, June 2, 2016
  • “Moral Economy of Technology”, panel presentation, Annual Meeting of the Society for the Advancement of Socio-Economics, Berkeley, June 2016

Michael Wooldridge and Paula Boddington:

  • EPSRC Systems-Net Grand Challenge Workshop, “Ethics in Autonomous Systems”, Sheffield University, November 25, 2015.
  • AISB workshop on Principles of Robotics, Sheffield University, 4 Apr 2016
    • Workshop examined the EPSRC (Engineering and Physical Sciences Research Council) Principles of Robotics. Boddington presented a paper, “Commentary on responsibility, product design and notions of safety”, and contributed to discussion.
    • Outcome of workshop: Paper for Special Issue of Connection Science on Ethical Principles of Robotics, “EPSRC principles of robotics: Commentary on Safety, Robots as Products, and Responsibility” (Paula Boddington)

Bas Steunebrink:

  • AAAI-16 conference in Phoenix.
  • Colloquium Series on Robust and Beneficial AI (CSRBAI), hosted by the Machine Intelligence Research Institute in Berkeley, in collaboration with the Future of Humanity Institute at Oxford.
  • AGI-16 conference in New York.
  • IEEE Symposium on Ethics of Autonomous Systems (SEAS Europe).
  • ECAI-16 conference in The Hague.

Manuela Veloso:

  • OSTP/NYU Workshop on The Social and Economic Implications of Artificial Intelligence Technologies in the Near-Term, NYC, July 2016.
  • Intelligent Autonomous Vehicles Conference, Leipzig, July 2016.
  • OSTP/CMU Workshop on Safety and Control for Artificial Intelligence, Pittsburgh, June 2016. (video at
  • Founders Forum, London, June 2016.
  • MIT Technology Review EmTech Digital, San Francisco, May 2016.

Understanding and Mitigating AI Threats to the Financial System (MP Wellman and Uday Rajan). Center for Finance, Law, and Policy, University of Michigan, 4 Jun 2015.

Do Trading Algorithms Threaten Financial Market Stability? (MP Wellman). Conference on Interdisciplinary Approaches to Financial Stability, University of Michigan Law School, 22 Oct 2015.

Autonomous Agents: Threat or Menace? (MP Wellman). Collegiate Professorship Lecture, University of Michigan, 5 May 2016. (Link:

Autonomous Agents in Financial Markets: Implications and Risks (MP Wellman). Machine Intelligence Research Institute Colloquium on Robust and Beneficial AI, Berkeley, CA, 15 Jun 2016.