
Kelly Wanser on Climate Change as a Possible Existential Threat

 Topics discussed in this episode include:

  • The risks of climate change in the short-term
  • Tipping points and tipping cascades
  • Climate intervention via marine cloud brightening and releasing particles in the stratosphere
  • The benefits and risks of climate intervention techniques
  • The international politics of climate change and weather modification

 

Timestamps: 

0:00 Intro

2:30 What is SilverLining’s mission?

4:27 Why is climate change thought to be very risky in the next 10-30 years?

8:40 Tipping points and tipping cascades

13:25 Is climate change an existential risk?

17:39 Earth systems that help to stabilize the climate

21:23 Days where it will be unsafe to work outside

25:03 Marine cloud brightening, stratospheric sunlight reflection, and other climate interventions SilverLining is interested in

41:46 What experiments are happening to understand tropospheric and stratospheric climate interventions?

50:20 International politics of weather modification

53:52 How do efforts to reduce greenhouse gas emissions fit into the project of reflecting sunlight?

57:35 How would you respond to someone who views climate intervention by marine cloud brightening as too dangerous?

59:33 What are the main points made by people skeptical of climate intervention approaches?

01:13:21 The international problem of coordinating on climate change

01:24:50 Is climate change a global catastrophic or existential risk, and how does it relate to other large risks?

01:33:20 Should effective altruists spend more time on the issue of climate change and climate intervention?

01:37:48 What can listeners do to help with this issue?

01:40:00 Climate change and Mars colonization

01:44:55 Where to find and follow Kelly

 

Citations:

SilverLining

Kelly’s Twitter

Kelly’s LinkedIn

 

We hope that you will continue to join in the conversations by following us or subscribing to our podcasts on YouTube, Spotify, SoundCloud, iTunes, Google Play, Stitcher, iHeartRadio, or your preferred podcast site/application. You can find all the AI Alignment Podcasts here.

You can listen to the podcast above or read the transcript below. 

Lucas Perry: Welcome to the Future of Life Institute Podcast. I’m Lucas Perry. In this episode, we have Kelly Wanser joining us from SilverLining. SilverLining is a non-profit focused on ensuring a safe climate, given the risks of near-term catastrophic climate change. Given that we may fail to reduce CO2 emissions sufficiently, it may be necessary to take direct action to promote cooling of the planet to stabilize both human and Earth systems. This conversation centrally focuses on how we might intervene in the climate by brightening marine clouds to reflect sunlight and thus cool the planet down and offset global warming. This episode also explores other methods of climate intervention, like releasing particles in the stratosphere, their risks and benefits, and we also get into how climate change fits into global catastrophic and existential risk thinking.

There is a video recording of this podcast conversation uploaded to our Youtube channel. You can find a link in the description. This is the first in a series of video uploads of the podcast to see if that’s something that listeners might find valuable. Kelly shows some slides during our conversation and those are included in the video version. The video podcast’s audio and content are unedited, so it’s a bit longer than the audio-only version and contains some sound hiccups and more filler words.

Kelly Wanser is an innovator committed to pursuing near-term options for ensuring a safe climate. In her role as Executive Director of SilverLining, she oversees the organization’s efforts to promote scientific research, science-based policy, and effective international cooperation in rapid responses to climate change. Kelly co-founded—and currently serves as Senior Advisor to—the University of Washington Marine Cloud Brightening Project, an effort to research and understand one possible form of climate intervention: the cooling effects of particles on clouds. She also holds degrees in economics and philosophy from Boston College and the University of Oxford.

And with that, let’s get into our conversation with Kelly Wanser.

Let’s kick things off here with just a simple introductory question. Could you give us a little bit of background about SilverLining and what its mission is?

Kelly Wanser: Sure, Lucas. I’m going to start by thanking you for inviting me to talk with you and your community, because the issue of existential threats is not an easy one. So our approach at SilverLining, I think, overlaps with some of the kinds of dialogue that you’re having, where we’re really concerned about the sort of catastrophic risks that we may face with regard to climate change in the next 10 to 30 years. So SilverLining was started specifically to focus on near-term climate risk and the uncertainty we have about climate system instability, runaway climate change, and the kinds of things we don’t have insurance policies against yet. My background is in the technology sector. I worked in areas of complex systems analysis and IT infrastructure. And so I came into this problem looking at it primarily from a risk point of view, and from the fact that the kind of risk that we currently have exposure to is an unacceptable one.

So we need to expand our toolkit and our portfolio until we’ve got sufficient options in there that we can address the different kinds of risks that we’re facing in the context of the climate situation. SilverLining is a two-year-old organization, and there are two things that we do. We look at policy, in particular how these interventions in climate, these things that might help reduce warming or cool the planet quickly, might be moved forward in terms of research and assessment from a policy perspective, and then we look at how we might actually help drive research and technology innovation directly.

Lucas Perry: Okay, so the methods of intervention are policy and research?

Kelly Wanser: Our methods of operation are policy and research, the methods of intervention in particular that I’m referring to are these technologies and approaches for directly and rapidly reducing warming in the climate system.

Lucas Perry: So in what you just said, you mentioned that you’re concerned about catastrophic risks from climate change in, for example, the next 10 to 30 years. Could you paint us a little bit of a picture of why that kind of timescale is relevant? I think many people, myself included, might have thought that the more significant changes would take longer than 10 to 30 years. So what is the general state of the climate now, and where are we heading in the next few decades?

Kelly Wanser: So I think there are a couple of key issues in the evolution of climate change, what to expect, and how to think about risk. One is the projections that we have: it’s a tough type of system and a tough type of situation to project and predict. And there are some things that climate modelers and climate scientists know are not adequately represented in our forecasts and projections. So a lot of the projections we’ve had over the past 10 or 15 years talk about climate change through 2100, and we see these sort of smooth curves depending on how we manage greenhouse gases. But people who are familiar with the climate system itself, or with complex-systems problems, know that there are non-linear events that are likely to happen. Climate models have a very difficult time representing those, so in many cases they’re either roughly represented or excluded entirely.

And those are the things that we talk about in terms of abrupt change and tipping points. So our climate model projections are actually missing or underrepresenting tipping points: things like the release of greenhouse gases from permafrost that could happen suddenly and very quickly as the surface melts, or the collapse of big ice sheets and the downstream effects of that. So one of the concerns that we have at SilverLining is about some of the things that tech people know how to do for similar problems, like managing an IT network. That’s a highly complex systems problem where you’re trying to maintain a stable state of the network, and some of the techniques that we use for doing that have not been fully applied to looking at the climate problem. Similarly, there are techniques we use in finance; one of our advisors is the former director of global research at Goldman Sachs.

And this is a problem we’re talking about with him, and with folks in the IPCC and other places: essentially, we need some new and different types of analysis applied to this problem beyond just what the climate models do. So problem number one is that our analytic techniques are underrepresenting the risk, and particularly, potentially, risk in the near term. The second piece is that these abrupt climate changes tend to be highly related to what they call feedbacks, meaning that there are points at which these climate changes produce effects that either put warming back in the system or greenhouse gases back in the system, or both. And once that starts to happen, the problem could get away from us in terms of our ability to respond. Now, we might not know whether that risk is 5%, 10% or 80%. From SilverLining’s perspective, from my perspective, any meaningful risk of that in the next 10 to 30 years is an unacceptable level of risk, because it’s approaching somewhere between catastrophic and existential.

So we’re less concerned about the arm-wrestle debate over whether there is some scenario where we can constrain the system by just reducing greenhouse gases. We’re concerned about whether there are scenarios where that doesn’t work, scenarios where the system moves faster than we can constrain greenhouse gases. The final thing I’ll say is that we’re seeing evidence of that now. Some of the things that we’re seeing, like these extraordinary wildfire events and what’s happening to the ice sheets, are things that are happening at the far end of bad predictions. The observations of what’s happening in the system are indicative of the fact that that risk could be pretty high.

Lucas Perry: Yeah. So you’re ending here on the point that say fires that we’re observing more recently are showing that tail end risks are becoming more common. And so they’re less like tail end risks and more like becoming part of the central mass of the Gaussian curve?

Kelly Wanser: That’s right.

Lucas Perry: Okay. And so I want to slow down a little bit, because I think we introduced a bunch of crucial concepts here. One of these is tipping points. So if you were to explain tipping points in one to two sentences to someone who’s not familiar with climate science, how would you do that?

Kelly Wanser: The metaphor that I like to use is similar to a fever in the human body. Warming heat acts as a stressor on different parts of the system. So when you have a fever, you can carry a fever up to a certain point. And if it gets high enough and long enough, different parts of your body will be affected, like your brain, your organs and so on. The trapped heat energy in the climate system acts as a stressor on different parts of the system. And they can warm a bit over a certain period of time and they’ll recover their original state. But beyond a certain point, essentially the conditions of heat that they’re in are sufficiently different than what they’re used to, that they start to fundamentally change. And that can happen in biological systems where you start to lose the animal species, plant species, that can happen in physical systems where the structure of an ice sheet starts to disintegrate, and once that structure breaks down, it doesn’t come back.

Forests have this quality too, where if they get hot enough and dry enough, they may pass a point where their operation as a forest no longer works and they collapse into something else, like desertification. So there are two concerns with that. One is that we lose these big systems permanently, because they change state in a way that doesn’t recover. And the second is that when they do that, they either add warming or add greenhouse gases back into the system. So when an ice sheet collapses, for example: these big ice structures reflect a huge amount of sunlight back out to space, and when we lose them, they’re replaced by dark water. So that’s basically a trade-off from cooling to warming that’s happening with ice. And so there are different things like that, where that combination of losing the system and then having it really change the balance of warming is a double-faceted problem.

Lucas Perry: Right, so you have these dynamic systems which play an integral part in maintaining the current climate stability, and they can undergo a phase state change. Like water is water until you hit a certain temperature, and then it turns into ice or it evaporates into steam, except you can’t go back easily with these kinds of systems. And once one of them changes, it throws off the whole dynamic context that it’s in, which is what’s stabilizing the environment as we enjoy it.

Kelly Wanser: One of the problems that you have is not just that any one of these systems might change its state and might start putting warming or greenhouse gases back into the atmosphere, but they’re linked to each other. And so then they call that the cascade effect where one system changes its state and that pushes another system over the edge, and that pushes another system over the edge. So a collapse of ice sheets can actually accelerate the collapse of the Amazon rainforest for example, through this process. And that’s where we come more towards this existential category where we don’t want to come anywhere near that risk and we’re dangerously near it.

And so one of the things that scientists like Will Steffen and some Arctic scientists, for example, are seeing is that they think we’re already in some of these tipping points. I work with climate scientists really closely, and I hear them saying, "We may be in it. Some of these tipping points are starting to occur." And so the ice ones, we have front-page news on that; the forest ones we’re starting to see. So that’s where the concern becomes that we sort of lack the measures to address these things if they’re happening in the next one, two or three decades.

Lucas Perry: Is this where a word like runaway climate change becomes relevant?

Kelly Wanser: Yes. I came into the space about 12 years ago, and like many of your listeners, I came in from tech, first as a sort of area of passion interest. One of the first people I talked to was a climate scientist named Steve Schneider, who was at Stanford at the time, and he has since passed away, but he was a giant of the field. And I asked him kind of the question you’re referring to, which is: how would you characterize the odds of runaway change within our lifetime? And he said at that time, which was about 12 years ago, "I put it in the single digits, but not the low single digits." My reaction to that was, if you had those odds of winning the lottery, you’d be out buying tickets. And that’s an unacceptable level of risk when we don’t have responses that really meaningfully arrest or reduce warming in that kind of time.

Lucas Perry: Okay. And so another point here is you used the word “existential” a few times here, and you’ve also used the word “global catastrophic.” I think broadly within the existential risk community, at least the place where I come from, climate change is not viewed as an existential risk. Even if it gets really, really, really bad, it’s hard to imagine ways in which it would kill all people on the planet rather than like make life very difficult for most of them and kill large fractions. And so it’s generally viewed as a global catastrophic threat being that it would kill large fractions, but not be existential. What is your reaction to that? And how do you view the use of the word “existential” here?

Kelly Wanser: Well, so for me there are two sides to that question. I normally stay on one of the two sides, which is that for SilverLining, our mission is to prevent suffering. The loss of a third of the population of the planet, or two thirds of the population of the planet, and the survival of some people in interconnected bubbles, which I’ve heard top analysts talk about: for us that’s an unacceptable level of suffering and an unacceptable outcome. And so in that way the debate about whether it’s all people or just lots of people is for us not material, because that whole situation seems to be not a risk that you want to take. On the other side of your question, whether it is all people and whether it’s planetary livability, I think that question is subject to some of our inability to fully represent all of the systemic effects that happen at these levels of warming.

Early on, I talked about this with the director of NASA Ames at the time, who’s now at Planet Labs. What he talked to me about was the changes in the chemistry of the earth system. This is something that maybe hasn’t been explored that widely, but we’re already looking at collapses of life in the ocean. And between them, the ocean and the land systems generate a lot of the atmosphere that we’re familiar with and that’s comfortable for people. And there’s a risk there that we can’t have these collapses of biological life and still maintain the atmosphere that we’re used to. And so I think that it’s inappropriate to discount the possibility that the planet could become largely unlivable at these higher levels of heat.

And at the end of the runaway climate change scenario, where the heat levels get very high and life collapses in an extreme way, I don’t think that’s been analyzed well enough yet. And I certainly wouldn’t rule it out as an existential risk. I think that that would be inappropriate, given both our level of knowledge and the fact that we know that we have these sort of non-linear cascading things that are going to happen. So to me, I challenge the existential threat community to look into this further.

Lucas Perry: Excellent.

Kelly Wanser: Put it out there.

Lucas Perry: I like that. Okay, so given tipping points and cascading tipping points, you think there’s a little bit more uncertainty over how unlivable things can get?

Kelly Wanser: I do. And that’s before you also get into the societal part of it, right? Going back to what I think has been one of the fundamental problems of the climate debate: this idea that there are winners and losers, and that this is a reasonably survivable situation for a certain class of people. There’s a reasonable probability that that’s not the case, and this is not going to be a world that anyone, if they do get to live in it, is going to enjoy.

Lucas Perry: Even if you were a billionaire back before climate change and you have your nice stocked bunker, you can’t keep stocking it, your money won’t be worth anything.

Kelly Wanser: In a world without strawberries and lobsters and rock concerts and all kinds of things that we like. So I think we’re much more in it together than people think. And over the course of many millennia, humans were engineered and fine-tuned to this beautiful, extremely complicated system that we live in. And we’re pushing it. We can use our technology to the best of our ability to adapt, but this is an environment that’s beautifully made for us, and we’re pushing it out of the state that supports us.

Lucas Perry: So I’d be curious if you could expand just fairly briefly here on more of the ways in which these systems, which help to maintain the current climate state, function. So for example, the jet stream and the boreal forest and the Amazon rainforest and the Sahel and the Indian summer monsoon and the permafrost and all these other things. If you can choose, I don’t know, maybe one or two of your favorites, or whichever few are biggest, I’m curious how these systems help continue to maintain the climate’s stability?

Kelly Wanser: Well, so there are people more expert than me, but I’ll talk about a couple that I care about a lot. So one is the permafrost, which is the frozen layer of earth. That frozen layer of earth is under the surface in landmasses, and there are also frozen layers under the ocean. For many thousands of years, if not longer, those layers have captured and built up biological life that’s died and decayed within these frozen layers of earth, and they store massive amounts of carbon. And so provided the earth system is working within its usual parameters, all of those masses stay frozen, and that organic material stays there. As it warms up in a way that moves beyond its normal range of parameters, then that stuff starts to melt and those gases start to be released. And the amount of gas stored in the permafrost is massive. In particular, it includes both CO2 and more potent, fast-acting gases like methane. We’re kind of sitting on the edge of that system starting to melt in a way where those releases could be massive.

And in my work, that’s to me one of the things that we need to watch most closely; that’s a potential runaway situation. So that’s one, and that’s a relatively straightforward one, because that’s a system storing greenhouse gases and then releasing greenhouse gases. They range in complexity. The Arctic is a much more complicated one, because it’s related to all the physics of the movement of the atmosphere and ocean: the circulation of the way the jet stream and weather patterns work, the circulation of the ocean, and all of that. So there could be potentially drastic effects on what weather is where on the planet. Major changes in the Arctic can lead to major changes in what we experience as our normal weather. And we’re already seeing this start to happen in Europe, and that was predicted by changes in the jet stream, where Europe’s always had this kind of mild, sort of temperate range of temperature.

And they’re starting to see super cold winters and hot summers. That’s because the jet stream is moving, and a lot of that is because the Arctic is melting. A personal one that’s dear to me, and that is actually happening now and that we may not be able to stop no matter what we do, is the coral reefs. Coral reefs are these organic structures that teem with all different levels of life, and they trace up to about a quarter of all life in the ocean. So as these coral reefs get hit by these waves of hot water, they’re dying. And ultimately their collapse could mean the collapse of at least 25% of life in the ocean that they support. And we don’t really know fully what the effects of that will be. So those are a few examples.

Lucas Perry: I feel like I’ve heard the term heat stress before in relation to coral reefs, and that’s what kills them.

Kelly Wanser: Yep.

Lucas Perry: All right. So before we move into the area you’re interested in, intervening as a potential solution if we can’t get the greenhouse gases down enough, are there any more bad things that we missed or bad things that would happen if we don’t sufficiently get climate change under control?

Kelly Wanser: So I think that there are many, and we haven’t talked too much about what happens on the human side. There are even thresholds of direct heat for humans, like the wet-bulb temperature. I’m not going to be able to describe it super expertly, but it’s the combination of heat and humidity at which the human body changes the way it’s expelling heat and that heat exchange. And so what’s happening in certain parts of the world right now, like in parts of India, like Calcutta, is that there’s an increasing number of days of the year where it’s not safe to work outside. And there were some projections that by 2030 there would be no days in Calcutta where it was safe to work outside. And we even see parts of the U.S. where you have these heat warnings. Right now, as a direct effect on humans, I just saw a study that said the actual heat index is killing more people than the smoke from fires.

The actual increase in heat is moving past where humans are actually comfortable living and interacting. As a secondary point, obviously in developed countries we have lots of tools for dealing with that in terms of our infrastructure. But one of the things that’s happening is that the system is moving outside the band in which our infrastructure was built, and this is a bit of an understudied area. As warming progresses, you have extreme temperatures, you have more flooding, you have extreme storms and winds. We have everything from bridges to nuclear plants to skyscrapers that were not engineered for those conditions. A full evaluation of that is not really available to us yet. And so I think we may be underestimating things: even in some of these projections, we know that as sea level rise happens and extreme storms happen, places like Miami are probably lost.

And in that context, what does it mean to have a city the size of Miami sitting under water at the edge of the United States? It would be a massive environmental catastrophe. So I think unfortunately we haven’t looked closely enough at what it means for all of these parts of our human infrastructure for their external circumstances to be outside the arena they were engineered for.

Lucas Perry: Yeah. So natural systems become stressed. They start to fail, there could be cascades. Human systems and human infrastructure become stressed. I mean, you can imagine nuclear facilities and oil rigs and whatever else can cause massive environmental damage getting stressed as well by being moved outside of their normal band of operation. It’s just a lot of bad things happening after bad things after bad things.

Kelly Wanser: Yeah. And you know, it’s a big problem, because I’ve had this debate with people who are bullish on adaptation: "Hey, we can adapt to this." But the problem is you have all these things happening concurrently. So it’s not just Miami, it’s Miami and San Francisco and Bangladesh. There are going to be lots of different variants of it happening all at the same time. And so anything we could do to prevent that, excuse my academic language, shit show is really something we should consider closely, because the cost of that and this sort of compound damage is just pretty staggering.

Lucas Perry: Yeah. It’s often much cheaper to prevent risks than to deal with them when they come up and then clean up the aftermath. So as we try to avoid moderate to severe bad effects of climate change, we can mitigate. I think most everyone is very familiar with the idea of reducing greenhouse gas emissions, the kinds of gases that help trap heat inside of the atmosphere. Now you’re coming at this from a different angle. So what is the research interest of SilverLining, and what is the intervention for mitigating some of the effects of climate change that you guys are exploring?

Kelly Wanser: Well, so our interest is in the near-term risk, and so we focus most closely on things that might have the potential to act quickly to substantially reduce warming in the climate system. The problem with greenhouse gas reduction, and with a lot of the categories of removing greenhouse gases from the air, is that they’re likely to take many decades to scale and even longer to actually act on the climate system. And so if we’re looking at sub-30-year timescales, where we’re coming from at SilverLining is saying, "We don’t have enough in that portfolio to make sure that we can keep the system stable." We are a science-led organization, meaning we don’t do research ourselves, but we follow the recommendations of the scientific community and the scientific assessment bodies. And in 2015 the National Academy of Sciences in the United States ran an assessment that looked at the different sorts of technological interventions that might be used to accelerate addressing climate warming and greenhouse gases.

And they issued two reports, one called Climate Intervention: Carbon Dioxide Removal, and one called Climate Intervention: Reflecting Sunlight to Cool Earth. And what they found was that in the category where you’re looking to reduce warming quickly, within a decade or even a few years, the most promising way to try to do that is based on one of the ways that the earth system actually regulates temperature, which is the reflection of sunlight from particles and clouds in the atmosphere. The theories behind why they think this might work are based on observations from the real world. And so what I’m showing you right now is a picture of a cloud bank off the Pacific West coast, and the streaks in the clouds are created by emissions from ships. The particulates in those emissions, usually what people think of as the dirty stuff, have a property where they often mix with clouds in a way that will make the clouds slightly brighter.

And so based on that effect, scientists think that there’s cooling that could be generated in this way actively, and also that there’s actually cooling going on right now as a result of the particulate effects of our emissions overall. And they think that we have this accidental cooling going on somewhere between 0.5 degrees and 1.1 degrees C, and this is something that they don’t understand very well, but is potentially both a promise and a risk when it comes to climate.

Lucas Perry: So there’s some amount of cooling that’s going on by accident, but the net anthropogenic heating is positive, even with the cooling. I think one facet of this that I learned from looking into your work is that the cooling effect is limited because the particles fall back down and so it goes away. And so there might be a period of acceleration of the heat. Is that right?

Kelly Wanser: Yes, I think that’s what you’re getting at. So two things I’ll say: these white lines indicate the uncertainty, and you can see the biggest one is on that cloud albedo effect, which is how much these particles brighten clouds. The effects could be much bigger than what’s going into that net effect bar, and a lot of the uncertainty in that net effect bar is coming from this cloud albedo effect. Now, the fact that they fall is an issue, but what happens today for the most part is we keep putting them up there. As long as you continuously put them up there, you continuously have this effect. If you take it away, which we’re effectively doing in a couple of big experiments this year, then you lose that cooling effect right away. And so one of the things that we’re hoping to help with is getting more money for research to look at two big events that took that away this year.

One is the economic shutdowns associated with COVID where we had these clean skies all over the world because all this pollution went down. That’s a big global experiment in removing these particles that may be cooling. We are hoping to gain a better understanding from that experiment if we can get enough resources for people to look at it well.

Lucas Perry: So, the uncertainty with the degree to which current pollution is reflecting sunlight, is that because we have uncertainty over exactly how much pollution there is and how much sunlight that is exactly reflecting?

Kelly Wanser: It’s not that we don’t know how much pollution there is. I think we know that pretty well. It’s that this interaction between clouds and particles is one of the biggest uncertainties in the climate system. And there’s a natural form of it: when you see salt spray generating clouds, you’re in Big Sur looking at the waves and the clouds starting to form, that whole process is highly complex. Clouds are among the most complex creatures in our earth system, and they’re based on the behavior of these tiny particles that attract water to them and then create different sizes of droplets. So if the droplets are big, they reflect less total sunlight off less total surface area, and you have a dark cloud. And eventually, if the droplets are big enough, they fall down as rain. If the droplets are small, there’s lots of surface area and the cloud becomes brighter.

The reason we have that uncertainty is that we have uncertainty around the whole process and some of the scientists that we work with in SilverLining, they really want to focus on that because understanding that process will tell you what you might be able to do with that artificially to create a brightening effect on purpose, as well as how much of an accidental effect we’ve got going on.
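A rough way to see the droplet-size effect described above, using a standard simplification that ignores droplet size distributions and the details of scattering: for a fixed amount of cloud liquid water V split into droplets of radius r, the number of droplets and their total cross-sectional area scale as

\[ N = \frac{V}{\tfrac{4}{3}\pi r^3}, \qquad \sigma_{\text{total}} \approx N\,\pi r^2 = \frac{3V}{4r}. \]

So for the same amount of water, halving the droplet radius roughly doubles the total reflecting area, which is why many small droplets make a brighter cloud than a few large ones.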

Lucas Perry: So you’re saying we’re removing sulfate from the emission of ships, and sulfate is helping to create these sea clouds that are reflecting sunlight?

Kelly Wanser: That’s right. And it happens over land as well. All the emissions that contain these sulfate and similar types of particles can have this property.

Lucas Perry: And so that, plus the reduction of pollution due to COVID, means there’s this ongoing experiment, an accidental experiment in decreasing the amount of reflective cloud?

Kelly Wanser: That’s right. And I should just note that the other thing that happened in 2020 is that the International Maritime Organization implemented regulations to drastically reduce emissions from ships. Those went into effect in January: an 85% reduction in these sulfate emissions. And so that’s the other experiment. Because sulfate and these emissions are things we don’t like as pollutants, for human health and for local ecosystems. They’re dirty. So we don’t like them for very good reasons, but they happen to have the side effect of producing a brightening effect on clouds, and that’s the piece we want to understand better.

When I talk to especially people in the Bay Area and people who think about systems, about this particular dynamic, most of the people that I’ve talked to were unfamiliar with this. And lots of people, even who think about climate a lot, are unfamiliar with the fact that we have this accidental cooling going on. And that as we reduce emissions, we have this uncertain near-term warming that may result from that, which I think is what you were getting at.

Lucas Perry: Yeah.

Kelly Wanser: So where I’m headed with this is that in the early ’90s, some British researchers proposed that you might be able to produce an optimized version of this effect using sea salt particles, like a salt mist from seawater, which would be cleaner and possibly actually produce a stronger effect because of the nature of the salt particles, and that you could target this at areas of unpolluted clouds and certain parts of the ocean where they’d be most susceptible, and you’d get this highly magnified reflective effect. And that in doing that, in these sort of few parts of the world where it would work best by brightening 10% to 20% of marine clouds or, say, the equivalent of 3% to 5% of the ocean’s surface, you might offset a doubling of CO2 or several degrees of warming. And so that’s one approach to this kind of rapid cooling, if you like, that scientists are thinking about that’s related to an observed effect.

This marine cloud brightening approach has the characteristic that you talked about, that it’s relatively temporary. So you have to do it continuously; the effect lasts a few days, and otherwise, if you stop, it stops. And it’s also relatively localized. So it opens up theoretical possibilities that you might consider it as a way of cooling ocean water and mitigating climate impacts regionally or locally. In theory, what you might do is engage in this technique in the months before hurricane season, where your goal is to cool the ocean surface temperatures, which are a big part of what increases the energy and the rainfall potential of storms.

So, this idea is very theoretical. There’s been almost no research on it. Similarly, there’s a little bit of emerging research into whether you could cool waters that flow onto coral reefs. You might have to do this in areas that are further out from the coral reefs, because coral reefs tend to be in places where there are no clouds, but your goal is to try to get at those big currents of water that flow onto the reefs and cool them off. There were little tests, tiny little tests, of the technology that you might use down in Australia as part of their big program, I think it’s an $800 million program, to look at all possibilities for saving the Great Barrier Reef.

Lucas Perry: Okay. One thing that I think is interesting for you to comment on briefly: I think many people, myself included, don’t really have a good intuition about how thick the atmosphere is. You look up and it’s just big open space; maybe it goes on forever or something. So how thick is it? Put it into scale so it makes sense that seven billion humans could affect it in such large-scale ways.

Kelly Wanser: We’re going to talk about it a little bit differently because the things I’m talking to you about are different layers of the atmosphere. So, the idea that I talked to you about here, marine cloud brightening, that’s really looking at the troposphere, which is the lowest layer of the atmosphere, which are, when you look up, these are the clouds I see. It’s the cloud layer that’s closest to earth that’s going from sort of 500 feet up to a couple thousand feet. And so in that layer, you may have the possibility, especially over the ocean, of generating a mist from the surface where the convection, the motion of the air above you kind of pulls the particles up into the cloud layer. And so you can do this kind of activity potentially from the surface, like from ships. And it’s why the pollution particles are getting sucked up into the clouds too.

So that idea happens at that low layer, that sub-mile, visible-to-the-eye layer of stuff. And for the most part, in terms of the volume of material being proposed, when scientists are talking about brightening these clouds, they’re talking about brightening them 5% to 7%. So it’s not something that you would probably see as a human with your own naked eyes, and it’s over the ocean, and it’s something that would have a relatively modest effect on the light coming in to the ocean below, so probably a relatively modest effect on the local ecology, except for the cooling that it’s creating.

So in that way, it’s potentially less invasive than people might think. Where the risks are in a technique like this are really around the fact that you’re creating these sort of concentrated areas of cooling, and those have the potential to move the circulation of the atmosphere and change weather patterns in ways that are hard to predict. And that’s probably the biggest thing that people are concerned about with this idea.

Now, if you’d like, I can talk about what people are proposing at the other side, the high end of the atmosphere.

Lucas Perry: Yeah. So I was about to ask you about stratospheric sunlight reflection.

Kelly Wanser: Yeah, because this is the one that most people who have heard about this field have heard about, and it’s the most widely studied and talked about, partly because it’s based on events that have occurred in nature. Large volcanoes push material into the atmosphere, and very large ones can push material all the way into the outer layer of the atmosphere, the stratosphere, which I think starts at about 30,000 or 40,000 feet and goes up for a few miles. So when particles reach the stratosphere, they get entrained and they can stay for a year or two.

And when Mount Pinatubo erupted in 1991, it was powerful and it pushed particles into the stratosphere that stayed there for almost two years. It produced a measurable cooling effect across the entire planet of at least half a degree C, actually closer to one degree C. So this cooling effect was sustained. It was very clear and measurable. It lasted until the particles fell down to earth, and it also produced a marked change in Arctic ice, where Arctic ice mass just recovered drastically. With this cooling effect, once the particles reach the stratosphere, they immediately or very quickly get dispersed globally. So it’s a global effect, but it may have an outsized effect on the Arctic, which warms and cools faster than the rest of the planet.

This idea, and there are some other examples in the volcanic record, is what led scientists, including the Nobel prize-winning scientist who identified the ozone hole, Paul Crutzen, to suggest that one approach to offsetting the warming that’s happening with climate change would be to introduce particles in the stratosphere that reflect sunlight directly, almost kind of bedazzling the stratosphere, and that by increasing this reflectivity by just 1%, you could offset a doubling of CO2 or several degrees of warming.
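For a rough sense of why a change that small could matter, here is a back-of-envelope comparison using standard reference values (a solar constant of about 1361 W/m², a planetary albedo of about 0.3, and a radiative forcing of roughly 3.7 W/m² for a doubling of CO2), and reading "1%" as an increase of about one percentage point in planetary reflectivity:

\[ \text{absorbed sunlight} \approx \frac{S_0}{4}(1-\alpha) \approx 340 \times 0.7 \approx 238\ \mathrm{W/m^2}, \qquad \Delta F \approx \frac{S_0}{4}\,\Delta\alpha \approx 340 \times 0.01 \approx 3.4\ \mathrm{W/m^2}, \]

which is comparable to the roughly 3.7 W/m² of extra trapped heat from doubled CO2. This is only a scale comparison; the actual climate response depends on feedbacks and on where and how the reflectivity is changed.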

Now, the particles that volcanoes release in this way are similar to the pollution particles on the ground: they’re sulfates and their precursors. These particles also have the property that they can damage the ozone layer, and they can also cause the stratosphere itself to heat up, and so that introduces risks that we don’t understand very well. So that’s what people want to study. There isn’t very much research on this yet, but one of the earliest model studies, produced at NCAR, compared the course of global surface temperatures in a business-as-usual scenario in the NCAR global climate model versus introducing particles into the stratosphere starting in 2020, gradually increasing them and maintaining temperatures through the end of the century. And what you can see in that model representation is that it’s theoretically possible to keep global surface temperatures close to those of today with this technique, and that if we were to go down this business-as-usual path, or have higher than expected feedbacks that took us to something similar, that’s not a very livable situation for most people on the planet.

Lucas Perry: All right. So you can intervene in the troposphere or the stratosphere, and so there’s a large degree of uncertainty about indirect effects and second and third order effects of these interventions, right? So they need to be studied because you’re impacting a complex system which may have complex implications at different levels of causality. But the high level strategy here is that these things may be necessary if we’re not able to reduce greenhouse gas emissions sufficiently. That’s why we may be interested in it for mitigating some degree of climate change that happens or is inevitable.

Kelly Wanser: That’s right. There’s a slight sort of twist on that, I think, where it’s really about, if we can, trying to look at these dangerous instabilities and intervene before they happen or before they take us across thresholds we don’t want to cross. It is what you’re saying, but it’s a little bit of a different shade, where we don’t necessarily wait to see how our mitigation effort is going. What we need to do is watch the earth system and see whether we’re reaching kind of a red zone where we’ve got to bring the heat down in the system.

Lucas Perry: What kinds of ongoing experiments are happening for studying these tropospheric and stratospheric interventions in climate change?

Kelly Wanser: Well, so the first thing we’ll say is that the research in this field has been very taboo for most of the past few decades. So, relative to the problem space, very little research has been done. And the global level of investment in research even today is probably in the neighborhood of $10 million a year, and that includes a $3 million a year program in China and a program at Harvard, which is really the biggest funded program in the world. So, relative to the problem space and the potential, we’re very under-invested. And the things I’m going to talk to you about are really promising, and there are prestigious institutions and collaborations, but they’re still at, what I would call, a very seed level of funding.

Of the two most significant interdisciplinary programs in the field, one is aimed at the stratosphere: a program at Harvard called the Harvard Solar Geoengineering Program, which includes social science and physical sciences. A sort of flagship of what they’re trying to do is an experiment in the stratosphere. In their case, they would try to use a balloon specially crafted to navigate in the stratosphere, which is a hard problem, so that they can do releases of different materials to look at their properties in the stratosphere as they disperse and as they mix with the gases in the stratosphere.

And so what we hope, and I think what the people in the field hope, is that we can do these small-scale experimental studies that help you populate models that will better predict what happens if you did this at a bigger scale. So, the scale of this is tiny. It’s less than a minute of the emissions of an aircraft. It’s tiny, but they hope to be able to find out some important things about the properties of the chemical interactions and the way the particles disperse that would feed into models that would help us make predictions about what will happen when you do this, and also about what materials might be more optimal to use.

So in this case, they’re going to look at sulfates, which we talked about, but also materials that might have better properties. Two of those are calcium carbonate, which is what chalk is made of, and diamonds. What they hope to do is start down the path to finding out more about how you might optimize this in a way that minimizes the risks.

The other effort is on the other side of the United States. This is an effort based at the University of Washington, which is one of the top atmospheric science institutions in the country. It’s a partnership with Pacific Northwest National Labs, which is a Department of Energy lab, and PARC, which many in your community may know as the famous Xerox PARC, and which has since developed expertise in aerosols.

At the University of Washington, they are looking to do a set of experiments that would get at this cloud brightening question. Their scientific research and their experiments are classified as dual-purpose, meaning that they are experiments about understanding this climate intervention technique, can we brighten clouds to actively cool the climate, but they’re also about getting at the question of what this cloud-aerosol effect is. What is the accidental effect our emissions are having, and how does this work in the climate system more broadly? So, what they’re proposing to do is build a specialized spray technology. One of the characteristics of both efforts is that you need to create almost a nano-mist, particles of 80 to 100 nanometers, very consistently, at a massive scale. That hasn’t been done before. And so how do we generate this massive number of tiny droplets of material, salt particles from seawater or calcium carbonate particles?

And some retired physicists and engineers in Silicon Valley took on this problem about eight years ago. And they’ve been working on it four days a week in their retirement, for free, for the sake of their grandchildren, to invent this nozzle that I’m showing you, which is the first step of being able to generate the particles that you need to study here. They’re in the phase right now where, because of COVID, they’ve had to set up a giant tent and do indoor spray tests, and they hope next year to go out and do what they call individual plume experiments. And then eventually, they would like to undertake what they call a limited-area field experiment, which would actually be 10,000 square kilometers, which is the size of a grid cell on a climate model. And that would be the minimum scale at which you could actually potentially detect a brightening effect.

Lucas Perry: Maybe it makes sense on reflection, but I guess I’m kind of surprised that so much research is needed to figure out how to make a nozzle make droplets of aerosol.

Kelly Wanser: I think I was surprised too. It turns out it’s hard, I think, for certain materials, and again, because you’re really talking about a nano-mist, like in silicon chip manufacturing, like an asthma inhaler. And so here, we’re talking about three trillion particles a second from one nozzle, and an apparatus that can generate 10 to the 16th particles and lift them up a few hundred meters.

It’s not nuclear fusion, and it wouldn’t necessarily have taken eight years if they were properly funded and it was a focused program. I mean, these guys, the lead, Armand Neukermans, funded this with his own money, and he was trading biscotti from Belgium. He was trading biscotti for measurement instruments. And so it’s only recently, in the past year or two, that the program has gotten its first government funding, some from NOAA and some from the Department of Energy, relatively small and more focused on the scientific modeling, and some money from private philanthropy, which they’re able to use for the technology development.

But again, going back to my comment earlier, this has been a very taboo area for scientists to even work in. There have been no formal sources of funding for it, so that’s made it go a lot slower. And the technology part is the hardest and most controversial. But overall, as a point, these things are very nascent. And the problem we were talking about at the beginning, predicting what the system is going to do, that in order to evaluate and assess these things properly, you need a better prediction system because you’re trying to say, okay, we’re going to perturb the system this way and this way and predict that the outcome will be better. It’s a tough challenge in terms of getting enough research in quickly. People have sort of propagated the idea that this is cheap and easy to do, and that it could run away from us very quickly. That has not been my experience.

Lucas Perry: Run away in what sense? Like everyone just starts doing it?

Kelly Wanser: Some billionaire could take a couple of billion dollars and do it, or some little country could do it.

Lucas Perry: Oh, as even an attack?

Kelly Wanser: Not necessarily an attack, but an ungoverned attempt to manage the climate system from the perspective of one individual or one small country, or what have you. That’s been a significant concern amongst social scientists and activists. And I guess my observation, working closely with it, is that there are at least two types of technology that don’t exist yet that we need, so we have a technology hurdle. These things scale linearly and they pretty much stop when you stop, specifically referring to the aerosol generation technology. And for the stratosphere, we probably actually need a new and different kind of aircraft.

Lucas Perry: Can you define aerosol?

Kelly Wanser: I’ll caveat this by saying I’m not a scientist, so my definition may not be what a scientist would give you. But generally speaking, an aerosol is particles mixed with gases. It’s a manifestation in air of a mixed blend of particles and gases. I’ll often talk about particles because it’s a little bit clearer, and what we’re doing with these techniques for the most part is dispersing particles in a way that they mix with the atmosphere and…

Lucas Perry: Become an aerosol?

Kelly Wanser: Yeah. So, I would characterize the challenge we have right now as being that we actually have a very low level of information and no technology. And these things would take a number of years to develop.

Lucas Perry: Yeah. Well, it’s an interesting future to imagine the international politics of weather control, like in negotiating whether to stop the hurricanes or new powers we might get over the weather in the coming decades.

Kelly Wanser: Well, you bring up an interesting point because as I’ve gotten into this field, I’ve learned about what’s going on. And actually, there’s an astonishing amount of weather modification activity going on in the world and in the United States.

Lucas Perry: Intentional?

Kelly Wanser: Intentional, yeah.

Lucas Perry: I think I did hear that Russia did some cloud seeding, or whatever it’s called, to stop some important event getting rained on or something.

Kelly Wanser: Yeah. And that kind of thing, if you remember the Beijing Olympics where they seeded clouds to generate rain to clear the pollution, that kind of localized cloud seeding has gone on for a long time. And of course, I’m in Colorado, and there’s always been cloud seeding for snowmaking. But something has happened in the Western United States; there’s even an industry association for weather modification in the United States. What started out as snowmaking, especially, and a little bit of an attempt to affect the snowpack in the West, has grown. And so there are actually major weather modification efforts in seven or eight Western states in the United States. And they’re mostly aimed at hydrology, like snowpack and water levels.

Lucas Perry: Is the snow pack for a ski resort?

Kelly Wanser: I believe, and I’m not an expert on the history of this, but I believe that snowmaking started out from the ski resorts. But when I say snowpack, it’s really about the water table. It’s about affecting the snow levels that generate the water levels downstream, because in the West, a lot of our water comes from snow.

Lucas Perry: And so you want to seed more snow to get more water, and the government pays for that?

Kelly Wanser: I can’t say for sure who pays. This is still an exploration for us, but there are fairly significant initiatives in many Western states. And like I said, they’re primarily aimed at the problem of drought and hydrology. That’s in the United States. And if you look at other parts of the world, like the United Arab Emirates, they have a $400 million rainmaking fund. Can we make rain in the desert?

Lucas Perry: All right.

Kelly Wanser: Flip side of the coin: in Indonesia in January, this was in the news, they were seeding clouds offshore to induce rainfall offshore to prevent flooding, and they did that at a pretty big scale. In China last year, they announced a program to increase rainfall on the Tibetan plateau, in an area the size of Alaska. So we are starting to see, I think, around the world, weather extremes and attempts to deal with them locally, and this activity will likely grow.

Lucas Perry: Yeah. That makes sense. What are they using to do this?

Kelly Wanser: The traditional material is silver iodide. That’s what’s proposed in the Chinese program and many of the rainmaking types of ideas. There are two things we’ll start to see, I think, as climate extremes grow and there’s pressure on politicians to act: growing interest in the potential for global mechanisms to reduce heat, and bottom-up efforts that just continue to expand, trying to manage weather extremes in these kinds of ways.

Lucas Perry: So we have this tropospheric intervention by using aerosols to generate clouds that will reflect sunlight, and then we have the stratospheric intervention, which aims to release particles which do something similar, how do you view the research and the project of understanding these things as fitting in with and informing efforts to decrease greenhouse gas emissions? And then also, the project of removing them from the atmosphere, if that’s also something people are looking into?

Kelly Wanser: I think they’re all very related because at the end of the day, from the SilverLining perspective and a personal perspective, we see this as a portfolio problem. So, we have a complex system that we need to manage back into a healthy state, and we have kind of a portfolio of things that we need to apply at different times and different ways to do that. And in that way, it’s a bit like medicine, where the interventions I’m talking about address the immediate stressor.

But to restore the system to health, you have to address the underlying cause. Where we see ourselves as maybe helping bridge those things is that we are under-invested in climate research and climate prediction. In the United States, our entire budget for climate research is about $2.5 billion. If you put that in perspective, that’s like one-tenth of an aircraft carrier. It’s half of a football stadium. It’s paltry. This is the most complicated, computing-intensive problem on planet earth.

It takes massive supercomputing capacity and all the analytical techniques you can throw at it to try to reduce the uncertainty around what’s going to happen to these systems. What I believe happened in the past few decades is that the problem was defined as a need to limit greenhouse gases. So think of an equation where one side is the greenhouse gases going in, and the other side is what happens to the system on the other end. We’ve invested most of our energy in climate advocacy and climate policy in bringing down greenhouse gases, and we’re under-invested in really trying to understand and predict what happens on the other side.

When you look at climate intervention techniques like the ones I’m talking about, it’s pretty critical to understand and be able to predict what happens on the other side. And it turns out the same is true across the whole portfolio: if you want to blend in these sort of nature-based solutions that could bring down greenhouse gases, like building new forests or putting nutrients in the ocean, they have complex interactions with the system. That need to better understand the system and better predict the system, it turns out we really need that. It would behoove us to be able to understand and predict these tipping points better.

I think where the interventions come in is to try to say, “Well, what does reducing the heat stress get you in terms of safety? What time does it buy you for these other things to take effect?” That’s kind of where we see ourselves fitting in. We care a lot about mitigation, about moving away from this whole greenhouse gas emissions business. We care a lot about carbon removal, and accelerating efforts to do that. If somebody comes up with a way to do carbon removal at scale in the next 10 years, then we won’t need to do what we’re doing. But that doesn’t look like a high-probability thing.

And so what we’ve chosen to do is to say there’s a part of the portfolio that is totally unserviced. There are no advocates. There’s almost no research. It’s taboo. It’s complicated. It requires innovation. That’s where we’re going to focus.

Lucas Perry: Yeah. That makes sense. Let’s talk a little bit about this taboo aspect. Maybe some number of listeners will have an initial reaction: anytime human beings try to intervene in complex systems, there are always unintended consequences, or things happen that we can’t predict or imagine, especially in natural systems. How would you speak to, or connect with, someone who viewed this project of releasing aerosols into the atmosphere to create clouds or reflect sunlight as dangerous?

Kelly Wanser: I’ll start out by saying, I have a lot of sympathy with that. If this were 30 years ago, if you were at a different place in this sort of risk equation, then this kind of thing really wouldn’t make any sense at all. If we’re in 1970 or 1980, and someone’s saying, “Look, we just need to economically tune the incentives, so that we phase greenhouse gases out of the bulk of our economic system,” that is infinitely smarter and less risky.

I believe that a lot of the principle and structure of how we think about the climate problem is based on that, because what we did was really stupid. It would be the same thing as if the doctor said, “Well, you have stage one cancer. Stop smoking,” and you just kept on puffing away. So I am very sympathetic to this. But the primary concerns that we’re focused on now are our forward outcomes and the fact that we have this big safety problem.

So now, we’re in a situation with the greenhouse gas concentrations that we have. They’re already there. We have warming and system impacts that are already there, and some latency built in, which means we’re going to have more of those. So we have to look at the risk-risk trade-off, based on the situation that we’re in now, where we have already conducted the experiment: we pushed all these gases and aerosols into the atmosphere that mostly trap heat, and changed the system radically.

We did that. That was one form of human intervention, and it wasn’t a very smart one. What we have to look at now is this: we’re not saying that we know this is a good idea, or that the benefits outweigh the risks. But we’re saying that we have very few alternatives today to act in ways that could help stabilize the system.

Lucas Perry: Yeah. That makes sense. Can you enumerate the main points of detractors? Say someone is skeptical of this whole approach and thinks, “We just need to stick to removing greenhouse gases by natural intervention, by building forests, and we need to reduce CO2 and other greenhouse gas emissions drastically. To do anything else would be adding more danger to the equation.” What are the main points of someone who comes at this problem with such a perspective?

Kelly Wanser: You touched on two of them already. One is that the problem is actually not moving that quickly, so we should be focused on things that address the root cause, even if they take longer. The second is that this introduces risks that are really hard to quantify. But I would say the primary objection raised by people like Al Gore, and by most of the climate advocates who have a problem with this, is what they call the moral hazard: the idea that it gets put forward as a panacea and therefore slows down efforts to address the underlying problem.

This is sort of saying that even research into this stuff could have a negative societal effect, that it slows us down in doing what we’re really supposed to do. That has some interesting angles on it. One angle, which was discussed in a recent paper by Joseph Aldy at Harvard, and which was also raised with us by Republicans we talked to about this early on, is the thesis that it could have the opposite effect.

That the sort of drastic nature of these things could actually signal, to society and to skeptics, the seriousness of the problem. I did a bipartisan panel, and the Republican on the panel, who was a moderate, pro-climate guy, said, “When we, Republicans, hear these kinds of proposals coming from people who are serious about climate change, it makes you more credible than when you come to us and say, ‘The sky is falling,’ but none of these things are on the table.”

I thought that was interesting early on, and I’ve thought it interesting recently, that there’s at least an equal possibility that these things, as we look into them, could wake everyone up in the same way that more drastic medical treatments do and say, “Look, this is very serious. So on all fronts, we need to get very serious.” But I think, in general, this idea of moral hazard comes up pretty much as soon as the idea is there. And it can come up in the same way that Trump talks about planting trees.

Almost anything can be positioned in a way that attempts to use it as a panacea. I actually think that one of the moral hazards of the climate space has been the idea of winners and losers, because I think many powerful people assumed that this problem didn’t apply to them.

Lucas Perry: Like they’re not in a flooding zone. They can move to their bunker.

Kelly Wanser: The people who put forward this idea of winners and losers in climate did that because they were very concerned about the people who are impacted first. The mistake was in letting powerful people think that this wasn’t their problem. In this particular case, I’m optimistic that if we talk about these things candidly, and we say, “Look, these are serious, and they have serious risks. We wouldn’t use them, if we had a better choice.”

It’s not clear to me that that moral hazard idea really holds, but that is the biggest reservation, and it’s a reservation that leads many people to object, very passionately, to research. They don’t want us to look into any of this, because it sets off this societal problem.

Lucas Perry: Yeah. That makes a lot of sense. It seems like moral hazard should be called something more like information hazard. The word moral seems a little bit confusing here, because the claim is that if people have the information that this kind of intervention is possible, then bad things may happen. Moral suggests it has something to do with ethics, rather than with the consequences of information. Yeah, so, whatever. No one here has control over how this language was created.

Kelly Wanser: I agree with you. It’s an idea that comes from economics originally, about where the incentives are. But I think your point is well taken, because you’re exactly right: the claim is that information is dangerous, and that’s a fundamental principle. I find myself in meetings with advocates around this issue having to say, “Look, our position is that information helps with fair and just consideration of this. That information is good, not bad.”

But I think you hit on an extremely important point, that it’s a masked way of saying that information is too dangerous for people to handle. Our position is information about these things is what empowers people all over the world to think about them for themselves.

Lucas Perry: Yeah. There’s a degree to which moral hazards or information hazards reflect a lack of trust or confidence in the recipients of that information, which may or may not be valid, depending on the issue and the information. Here, you argue that this information is necessary to be known and shared, and then people can make informed decisions.

Kelly Wanser: That’s our argument. And so for us, we want to keep going forward and saying, “Look, let’s generate information about this, so we can all consider it together.” One thing I should say about that, because I was so shocked by it when I started working in climate, is that this idea of moral hazard isn’t new to this issue. It actually came up when they started looking at adaptation research in the IPCC and the climate community. Research on adaptation was considered to create a moral hazard, and so it didn’t move forward.

One of the reasons that we, as a society, have a relatively low level of information about the things I was talking about, like infrastructure impacts, is that there was a strong objection to it, based on moral hazard. The same was true of carbon removal, which has only recently come into consideration in the IPCC. So this idea that information is dangerous, because it will affect our motivation around the one part of the portfolio that we think is most important, I would argue has already slowed us down in really critical ways.

This is just another one of those cases where we need to say, “Okay, we need to rethink this whole concept of moral hazard, because it hasn’t helped us.” Going back, say, 20 years in the IPCC and the climate community, there was this question of how much we should invest in looking at adaptation. There was a strong objection to adaptation research, because it was felt it would disincentivize greenhouse gas reduction.

I think that’s been a pretty tragic mistake. Because if you had started adaptation research 20 years ago, you’d have much more information about what a shit show this is going to be, and more incentive to reduce greenhouse gases, not less, because this is not very adaptable. But the effect of that was a real dampening of any investment in adaptation research. Even adaptation research in the US federal system is relatively new.

Lucas Perry: Yeah. The fear there is that McAlpha Corp will come and be like, “It’s okay that we have all these emissions, because we’ll just make clouds later.” Right? I feel like corporations have run extremely effective disinformation campaigns on scientific issues, like smoking and other things. I assume that would have been part of the fear with regards to adaptation techniques. And here, we’re counting stratospheric and tropospheric intervention as adaptation techniques. Right?

Kelly Wanser: Well, in what I was talking about before, I wasn’t referring to this category, but to the more traditional adaptation techniques, like building dams and finding new and different types of vegetation and things like that. I recognize that what I’m talking about with these climate interventions is fairly unusual, but even traditional adaptation techniques to protect people were suppressed. I appreciate your point. It’s been raised to me before that, “Oh, maybe oil companies will jump on this, as a panacea for what they are doing.”

So we talked to oil companies about it, talked to a couple of them. Their response was, “We wouldn’t go anywhere near this,” because it would be an admission that ties their fossil fuels to warming. They’re much more likely to invest in carbon removal techniques and things that are more closely associated with the actual emissions than they are in anything like this, because they’re not conceding that they created the warming.

Lucas Perry: But if they’re creating the carbon, and now they’re like, “Okay, we’re going to help take out the carbon,” isn’t that admitting that they contributed to the problem?

Kelly Wanser: Yes. But they’re not conceding that they are the absolute and proven cause of all of this warming.

Lucas Perry: Oh. So they inject uncertainty, that people will say like, “There’s weather, and this is all just weather. Earth naturally fluctuates, and we’ll help take CO2 out of the atmosphere, but maybe it wasn’t really us.”

Kelly Wanser: And if you think about them as legal fiduciary entities, creating a direct tie between themselves and warming is different than not doing that. This is how it was described to me. There’s a fairly substantial difference between them looking at greenhouse gases, which are part of the landscape of what they do, and the actual warming and cooling of the planet, which they’re not admitting to be directly responsible for.

So if you’re concerned about someone doing it, we can’t count on them to bail us out and cool the planet this way, because they’re really, really not going to.

Lucas Perry: Yeah. Then the last quip I was suffering over, while you were speaking, was that if listeners or anyone else are sick and tired of the amount of disinformation that already exists, get ready for the conspiracy theories that are going to happen. Like chemtrails 5.0, when we have to start potentially using these mist generators to create clouds. There could even be significant social disruption just from governments undertaking that kind of project.

Kelly Wanser: That’s where I think generating information and talking about this in a way that’s well grounded is helpful. That’s why you don’t hear me use the term geoengineering. It’s not a particularly accurate term, and it sort of amplifies triggers. Climate intervention is the more accurate term. It helps kind of ground the conversation in what we’re talking about. The same thing when we explain that these approaches are based on processes that are observed in nature, and some of them are already happening. So this isn’t some big new sci-fi thing, you know, like we’re going to throw bombs at hurricanes or something. It’s just getting the conversation better grounded.

I’ve had chemtrails people at my talks. I had a guy set up a tripod in the back and record one. He was giving out these little buttons that had an airplane with a little trail coming out, and a strike through it. It was fantastic. I had a conversation with him. When you talk about it in this way, it’s kind of hard to argue with. The reality is that there is no secret government program to do these things, and there are definitely no mind-altering chemicals involved in any proposals.

Lucas Perry: Well, that’s what you would be saying, if there were mind-altering chemicals.

Kelly Wanser: Fair point. We tend to try to orient the dialogue toward the sort of 90% of people across the political and thought spectrum.

Lucas Perry: Yeah. It’s not a super serious consideration, but something to be maddened about in the future.

Kelly Wanser: One of the other things I’ll say is with respect to the climate denial side of the spectrum. Because we work in the policy sphere in the United States, we have conversations across the political spectrum. In a strange way, coming at the problem from this angle, where we talk about heat stress and we talk about these interventions, helps create a new insertion point for people who are shut down in the traditional kind of dialogue around climate change.

And so we’ve had some pretty good success actually talking to people on the right side of the spectrum, or people who are approaching the climate problem in a way that’s not super well-grounded in the science. We kind of start by talking about heat stress, what’s happening, the symptoms that we’re seeing, and these kinds of approaches to it, and then walking them backwards into why you absolutely, positively have to take down greenhouse gases.

It has interestingly, and kind of unexpectedly, created maybe another pathway for dealing with at least parts of those populations and policy people.

Lucas Perry: All right. I’d be interested in pivoting here into the international implications of this, and then also talking about this risk in the context of other global catastrophic and existential risks. The question now is: what are the risks of international conflict around setting the global temperature via CO2 reduction and geo… Sorry. Geoengineering is the bad word. Climate intervention? There are some countries which may benefit from the earth being slightly warmer. You talked about how there were no winners or losers, but there are winners if it only changes a little bit. Like if it gets a little bit warmer, then parts of Russia may be happier than they would be otherwise.

The international community, as we gain more and more efficacy over the problem of climate change and our ability to mitigate it to whatever degree, will be impacting the weather and agriculture and livability of regions for countries all across the planet. So how do you view this international negotiation problem of mitigating climate change and setting the global temperature to something appropriate?

Kelly Wanser: I don’t tend to use the framing of setting the global temperature. I mean, we’re really, really far from having a fine-grained management capability for this. We tend to think of it more in the context of preventing certain kinds of disastrous events in the climate system. In the framing where you say, “Well, we can develop this technology,” where we have knobs and dials for creating favorable conditions in some places and not others, that would potentially be a problem. But it doesn’t necessarily look like that’s how it works.

So it’s possible that some places, like parts of Russia, parts of Canada, might for a period of time, have more favorable climate conditions, but it’s not a static circumstance. The problem that you have is well, the Arctic opens up, Siberia gets warmer and for a couple of decades, that’s nicer. But that’s in the context of these abrupt change risks that we were talking about, where that situation is just a transitory state to some worse states.

And so the question you’re asking me is, “Okay, well, maybe we hold the system where Russia is happier, in this sort of different state.” I think the massive challenge, which we don’t know if we can meet, is just whether we can keep the system stable enough. The idea that you can stabilize the system in a way that’s different than now, but still prevents these cascading outcomes, is, I would say, not the highest-probability scenario.

But I think there’s certainly validity in your question, which is that this just makes everybody super nervous. It is the case that this is not a collective-action capability. One of its features is that it does not require everyone in the world to agree, and that is a very unstable, concerning state for a lot of people. It is also true that its outcomes cannot be fully predicted.

And so there’s a high degree of likelihood that everyone, or at least the vast majority of the world, would be better off, but there will be outcomes in some places that might be different. It’s more likely, rather than people electively turning the knobs and making things more favorable for themselves, that 3 to 5% of the world thinks they’re worse off while we’ve tried to keep the thing more or less stable.

I think behind your question is that even the dialogue around this is pretty unnerving and has the potential to promote instability and conflict. One of the things that we’ve seen in the past that’s been super helpful is scientific cooperation: lots of global cooperation in the evolution of the research and the science, so that everybody’s got information. Then we’re all dealing from an information base where people can be part of the discussion.

Because our strong hypothesis is that we’re kind of looking at the edge of a cliff, where we might not have so much disagreement that we need to do something, but we all need information about this stuff. We have done some work at SilverLining looking at this, and at how the international community has handled things better or worse when it comes to environmental threats like this. Our favorite model is the Montreal Protocol, which is both the scientific research and the structure that helped manage what many perceive to be an existential risk around the ozone layer.

That was a smaller, more focused case: you have a part of the system that, if it falls outside a certain parameter, lots and lots of people are going to die. We had some science we had to do to figure out where we could let that go and where we couldn’t. The world has managed that very well over the past couple of decades. We managed to walk back from the cliff, restore the ozone layer, and we’re still managing it now.

So we kind of see some similarities in this problem space of saying, “We’ve got to be really, really focused about what we can and can’t let the system do, and then get really strong science around what our options are.” The other thing I’ll say about the Montreal Protocol, in case people aren’t aware, is that it is the only environmental treaty signed by all countries in the world. There are lots of aspects of it that are a really good model to follow for something like this, I think.

Lucas Perry: Okay. So there’s the problem of runaway climate change, where the destruction of important ecosystems leads to tipping points, and that leads to tipping cascades. And without the reduction of CO2, we get worse and worse climate change, where everyone is worse off. In that context, there is increased global instability, so there’s going to be more conflict with the migrations of people and the increase of disease.

It’s just going to be a stressor on all of human civilization. But if that doesn’t happen, then there is this later-21st-century potential concern of more sophisticated weather manipulation and weather engineering technologies, making the question of constructing and setting the weather in certain ways a more pressing international geopolitical problem. But primarily the concern is obviously regular climate change, with the stressors and conflict that are induced by that.

Kelly Wanser: One thing I’ll say, just to clarify a little bit about weather modification and the expansion of that activity: I think that’s already happening and likely to happen throughout the century, along with the escalation and expansion of it as a problem. Not necessarily people using it as a weapon, but as weather modification activities get larger, they have what are called teleconnection effects. They affect other places.

So I might be trying to cool the Great Barrier Reef, but I might affect weather in Bali. If I’m China and I’m trying to do weather modification over an area the size of Alaska, it’s pretty certain that I’m going to be affecting other places. And if it’s big enough, I could even affect global circulation. So that aspect is coming onto the radar now. That is an international decision-making problem, as you correctly say, and in some ways it’s almost a bit of a harder problem than the global one, because we’ve got these sort of national efforts where I might be engaged in my own jurisdiction but affecting people outside it.

Kelly Wanser: I should also say, just so everybody’s clear, weather modification for the purpose of weapons is banned by international treaty. A treaty called ENMOD. It arose out of US weather modification efforts in the Vietnam war, where we were trying to use weather as a weapon and subsequently agreed not to do that.

Lucas Perry: So, wrapping up here on the geopolitics and political conflict around climate change. Can you describe to what extent there is gridlock around the issue? I mean, different countries have different degrees of incentives. They have different policies and plans and philosophies. One might be more interested in focusing on industrializing to meet its own needs. And so it would deprioritize reducing CO2 emissions. So how do you view the game theory and the incentives and getting international coordination on climate change when, yeah, we’d all be better off if this didn’t happen, but not everyone is ready or willing to pay the same price?

Kelly Wanser: I mean, the main issue that we have now is that we have this externality, this externalized cost: people aren’t paying for the damage that they’re doing. My understanding is that a relatively modest price on carbon, a modest charge for greenhouse gas emissions, can set the incentives such that innovation moves faster and you reach the thresholds of economic viability for some of these non-carbon approaches faster. I come from Silicon Valley, so I think innovation is a big part of the equation.

Lucas Perry: You mean like solar and wind?

Kelly Wanser: Well, there’s solar and wind, which are the traditional techniques. And then there are emerging things, which could be hydrogen fuel cells, fusion energy, or really important things in the categories of waste management and agriculture. You know, it’s not just energy and cars, right? And we’re just not reaching the economic threshold where we’re driving innovation fast enough and reaching profitability fast enough for these systems to be viable.

So with a little turn of the dial in terms of pricing that in, you get all of that to go faster. And I’m a believer that moving that innovation faster means the price of these low-carbon techniques will come down, and it will also accelerate taking the greenhouse-gas-generating stuff offline. So I think it’s not sensible that we’re not building in a robust mechanism for having that price incentive. That price incentive will behave differently in the developed countries versus the emerging markets and the developing countries, and it might need to be managed differently in terms of the cost that they face.

But it’s really important that, in the developing countries, we develop policies that incentivize them not to build out greenhouse-gas-generating infrastructure, however we do that. Because a lot of them are at inflection points, right? Where they can start building power plants and building out infrastructure.

So we also need to look closely at aligning policies and incentives for them so that they just go ahead and go green, and it might be a little bit more expensive, which means that we have to help with that. But that would be a really smart thing for us to do. What we can’t do is expect developing countries, who mostly didn’t cause the problem, to also eat the impact in terms of not having electricity and some of the benefits that we have, like running water and basic needs. I don’t actually think this is rocket science. You know, I’m not a total expert, but I think the mechanisms that are needed are not super complicated. Getting the political support for them is the problem.

Lucas Perry: A core solution here being increased funding for innovation, for the efficacy and efficiency of renewable energy resources, which don’t emit greenhouse gases.

Kelly Wanser: The R&D funding is key. In the U.S. we’ve actually been pretty good at that across a lot of that spectrum, but you also have to have the mechanisms on the market side. Right now you effectively have fossil fuels being subsidized, in the sense that they’re not being charged for the problem they’re creating. So basically we’ve got to embed in the fossil fuel side the cost of the damage that they’re doing, and that makes the market mechanisms work better for these emerging things. And the emerging things are going to start out being more expensive until they scale.

So we have this problem right now where we have some emerging things, and they’re expensive. How do we get them to market? Fossil fuels are still cheaper. That’s the problem. It will eventually sort itself out, but we need it to sort itself out quickly. So we’ve got to try to get in there and fix that.
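As a rough way to picture the dynamic Kelly describes here, the sketch below is a minimal toy model in Python. All of the cost figures, carbon intensities, and carbon prices are made-up placeholders chosen only for illustration, not numbers from this conversation or from SilverLining; the point is simply that pricing the externality into the cost of each option can move which option the market picks.

```python
# Toy illustration: how a carbon price shifts the cost comparison between
# fossil energy sources and a low-carbon one. All numbers are invented
# placeholders, for illustration only.

def effective_cost(base_cost_per_mwh: float,
                   tons_co2_per_mwh: float,
                   carbon_price_per_ton: float) -> float:
    """Cost per MWh once emissions are charged for."""
    return base_cost_per_mwh + tons_co2_per_mwh * carbon_price_per_ton

# Hypothetical generation options (cost in $/MWh, emissions in tCO2/MWh).
sources = {
    "coal":  {"base": 60.0, "tons_co2": 1.0},
    "gas":   {"base": 50.0, "tons_co2": 0.4},
    "solar": {"base": 70.0, "tons_co2": 0.0},
}

for carbon_price in (0, 20, 40, 60):  # $/ton CO2
    costs = {
        name: effective_cost(s["base"], s["tons_co2"], carbon_price)
        for name, s in sources.items()
    }
    cheapest = min(costs, key=costs.get)
    print(f"carbon price ${carbon_price:>3}/ton -> "
          + ", ".join(f"{n}: ${c:.0f}/MWh" for n, c in costs.items())
          + f"  (cheapest: {cheapest})")
```

With these invented numbers, the cheapest option flips from gas to solar once the carbon price passes a threshold; the exact crossover point doesn’t matter, only that the externality shows up as a term in the cost function.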

Lucas Perry: So, let’s talk about climate change in the context of existential risks and global catastrophic risks. The way that I use this language is to say that global catastrophic risks are ones which would kill some large fraction of human civilization, but wouldn’t lead to extinction, and existential risks lead to all humans dying, or all earth-originating intelligent life dying. The relevant distinction here for me is that the existential risks cancel the entire future. There could be billions upon billions, or trillions, of experiential life years in the future if we don’t go extinct, and so that is the value being added into the equation when trying to understand which risks are the ones to pay attention to.

So you can react to this framing if you’d like; I’d be interested in what you think about it. And also just how you see the relative importance of climate change in the context of global catastrophic and existential risks, and how you see its interdependence with other issues. I’m mainly talking about climate change in the context of things like other pandemics, beyond COVID-19, which may kill large fractions of the population; synthetic bio-risk, where a sufficiently dangerous engineered pandemic could possibly be existential; accidental nuclear war; or misaligned artificial superintelligence that could lead to the extinction of the human species. So how do you think about climate change in the context of all of these very large risks?

Kelly Wanser: Well, I appreciate the question. Many of the risks that you described have the characteristics that they are hard to quantify and hard to predict. And some of them are sort of big black swan events, like even more deadly pandemics or artificially engineered ones. So climate change shares that characteristic that it’s hard to predict. And I think that with climate change, when you dig into it, you can see that there are analytical deficiencies that make it very likely that we’re underestimating the risk.

In the spectrum between sort of catastrophic and existential, we have not done the work to dig into the areas in which we are not currently adequately representing the risk. So I would say that there’s a definite possibility that it’s existential, and that that possibility is currently under-analyzed and possibly underestimated. I think there are two ways that it’s existential. I’ll say I’m not an expert in survivability in outlier conditions, but let’s just look at two phenomena that are part of non-zero-probability projections for climate. One is this example that I showed you, where warming goes beyond five or six degrees C. The jury’s pretty far out on what that means for humans and what it means for all the conditions of the land and the sea and everything else.

So the question is, how high does the temperature go? And what does that mean in terms of the population livability curve? Part of what determines how high the temperature goes is the biological species and their relationship to the physics and chemistry of the planet. A concern that I heard from Pete Worden at NASA Ames, which I had never heard before talking to him, is that at some point in the collapse of biological life, particularly in the ocean, you have a change in the chemical interactions that produce the atmosphere that we’re familiar with.

So for example, the biological life at the surface of the ocean, the phytoplankton and other organisms, generate a lot of the oxygen that we breathe in the air, same with the forests. And so the question is whether you get collapse in the biological systems that generate breathable air. Now, if you watch sci-fi, you could say, “Well, we can engineer that.” And that starts to look more like engineering ourselves to live on Mars, and I’m happy to talk about why I don’t think that’s the solution. So I think it’s certainly reasonable for people to say, “Well, could that really happen?” There is some non-zero probability that it could, one that we don’t understand very well and have been reluctant to explore.

And so my challenge back to people about this being an existential risk is that the possibility that it’s an existential risk, on a nearer horizon than you think, may be higher than we think. And the gaps in our analysis of that are concerning.

Lucas Perry: Yeah. I mean, the question is: do you know everything you need to know about all of the complex systems on planet Earth that help maintain the narrow bandwidth of conditions in which human beings can exist? And the answer is, no I don’t. And then the question is, how likely is it that climate change will perturb those systems in such a way that it would lead to an existential catastrophe? Well, it’s non-zero, but beyond that, I don’t know.

Kelly Wanser: And one thing that I think everyone who’s interested in this should look at is the observations of what’s happening in the system now. What’s happening now are collapses in some biological life and changes in some of the systems that are indicative that this risk might be higher than we think. So if you look at things like, I think there was research coming out estimating that we may have already lost something like 40% of the phytoplankton at the surface of the ocean. So much so that the documentary filmmaker who made Chasing Coral was thinking about making a documentary about this.

Lucas Perry: About phytoplankton?

Kelly Wanser: Yeah. And phytoplankton, I think of it as the API layer between the ocean and the atmosphere; it’s the translation layer. It’s really important. And then I go to my friends who are climate modelers, and they’re like, “Yeah, phytoplankton isn’t well-represented in the climate models. There are over 500 species of phytoplankton and we have three of them in the climate models.” And so you look at that and you say, “Okay, well, there’s a risk that we don’t understand very well.” So, from my perspective, we have a non-zero risk in this category. I’d be happy if I was overstating it, but it may not be.

Lucas Perry: Okay. So that’s all new information, and interesting. In the context of the existential risk community that I’m most familiar with, the way in which climate change is said to potentially lead to existential risk is by destabilizing global human systems, which would lead to the actualization of other things that are existential risks. Like if you care about nuclear war or synthetic bio or pandemics or getting AI right, that’s all a lot harder to do and control in the context of a much hotter earth. And so the other question I had for you, speaking of hotter earths: has the earth ever been five degrees C hotter than it is now while mammals have been on it?

Kelly Wanser: So it hasn’t been that hot while humans have been on it, but I’m not expert enough to know, as far as the mammal picture. I’m going to guess probably yes. I’ll touch on the first point you were making about the societal cascade too, but on this question, the problem with the warming isn’t just whether or not the earth has ever been this warm; it’s the pace of warming. Look at how far and how fast we’re pushing the system. Normally, when the earth goes through its fluctuations in temperature, and over the past 2,000 years those have been small fluctuations, though in the deeper past they’ve been bigger, it happens over very long periods of time, like hundreds of thousands of years, which means that all of the little organisms and all the big structures are adapting in this very slow way.

And in this situation, where we’re pushing it this fast, the natural adaptation is very, very limited. You know, you have species of fish and such that can move to different places, but it’s happening so fast in Earth system terms that there’s no adaptation happening. But to your other point, about climate change setting off existential threats to society in other ways, I think that’s very true. Climate change is likely to heighten the risk of nuclear conflict along a couple of different vectors, and it’s also likely to heighten the risk that we throw biological solutions out there whose results we can’t predict. So I think one of the facets of climate change that might be a little bit different from runaway AI is just that it applies stress across every human and every natural system.

Lucas Perry: So this last point here, then, on climate change contextualized in this field of understanding around global catastrophic and existential risks. FLI views itself as being a part of the effective altruism community, and many of the listeners are effective altruists, and 80,000 Hours has come up with this simple framework for thinking about what kinds of projects and endeavors you should take on. The framework is just thinking about tractability, scope, and neglectedness.

So tractability is just how much you can do to actually affect the thing, scope is how big of a problem it is and how many people it affects, and neglectedness is how many people are working on it. So you want to work on things that are tractable, that have a large scope, and that are neglected. I think the view or the sense of climate change is that… I mean, from our conversation, it seems very tractable.

If we can get human civilization to coordinate on this, it’s something that we can do a lot about. I guess it’s another question how tractable it is to actually get countries and corporations to coordinate on this. But the scope is global and would at the very least affect our generation and the next few generations. It seems not to be neglected relative to other risks, though one could say that it’s neglected relative to how much attention it deserves. So I’m curious to know how you would react to this tractability, scope, and neglectedness framework being applied to climate change, in the context of other global catastrophic and existential risks.
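For listeners who want to see the framework Lucas describes laid out explicitly, here is a minimal sketch of the kind of scoring it implies. The three factors are the ones named in the conversation (scope, tractability, neglectedness); the multiplicative aggregation is one common convention, and the numeric scores below are invented placeholders for illustration only, not assessments made by either speaker.

```python
# Toy sketch of a scope / tractability / neglectedness comparison.
# Scores are invented 1-10 placeholders, purely for illustration.

from dataclasses import dataclass

@dataclass
class Cause:
    name: str
    scope: float          # how large the problem is
    tractability: float   # how much progress extra work buys
    neglectedness: float  # how few resources already go to it

    def priority(self) -> float:
        # Simple multiplicative heuristic: all three factors matter,
        # and a very low score on any one drags the total down.
        return self.scope * self.tractability * self.neglectedness

causes = [
    Cause("climate change (as one bucket)",      scope=9, tractability=6, neglectedness=2),
    Cause("climate intervention research",       scope=9, tractability=5, neglectedness=9),
    Cause("some other global catastrophic risk", scope=8, tractability=4, neglectedness=7),
]

for c in sorted(causes, key=Cause.priority, reverse=True):
    print(f"{c.name:<40} priority = {c.priority():.0f}")
```

The sketch only shows the structure of the argument that follows: splitting “climate change” into sub-problems changes the neglectedness term, which is exactly the move Kelly makes in her answer.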

Kelly Wanser: Firstly, I’m a big fan of the framework. I was familiar with it before, and it’s not dissimilar to the approach that we took in founding SilverLining. Where I think this issue might fit into that framework depends on whether you put climate change all in one bucket and treat it as not neglected, or you say that, in the portfolio of responses to climate change, we have a significant gap in our ability to mitigate heat stress while we work on other parts of the portfolio, and that part is entirely neglected.

So I think for us it’s about having to dissect the climate change problem, and we have this collective action problem, which is a hard problem to solve, to move industrial and other systems away from greenhouse gas emissions. And we have the system instability problem, which requires that we somehow alleviate the heat stress before the system breaks down too far.

I would say in that context, if your community looks at climate change as a relatively slowly unfolding problem which has a lot of attention, then it wouldn’t fit. If you look at climate change as having some meaningful risk of catastrophic-to-existential unfolding in the next 30 to 50 years, and as not having response measures to try to stabilize the system, then it fits really nicely. It’s so under-serviced that I represent the only NGO in the world that advocates for research in this area. So it depends on how your community thinks about it, but we look at those as quite different problems, in a way.

Lucas Perry: So for the problem of, for example, adaptation research, which has historically been stigmatized, we can apply this framework and see that you might get a high return on impact if you focus on supporting and doing research on climate intervention technologies and adaptation technologies?

Kelly Wanser: That’s right. What’s interesting to me and the people that I work with on this problem is that these climate intervention technologies have the potential to have very high leverage on the problem in the short term. And so from a philanthropic perspective or an octopus perspective, oftentimes I’m engaged with people who are looking for leverage: where can I really make a difference in terms of supporting research or policy? And I’m in this because literally I came from tech into climate looking for the most under-serviced, highest-leverage part of the space, and I landed here. So I think that, by your criteria of being under-serviced and potentially high leverage, this fits pretty well. It’s not the same as addressing the longer-term problem of greenhouse gases, but it has very high leverage on the stability risk in the next 50 years or so.

Lucas Perry: So if that’s compelling to some number of listeners, what is your recommendation for action and participation for such persons? If I’m taking a portfolio approach to my impact or altruism, and I want to put some of it into this, how do you recommend I do that?

Kelly Wanser: So it’s interesting timing, because we’re just a few weeks from launching something called the Safe Climate Research Initiative, where we’re funding a portfolio of research programs. What we do at SilverLining is try to help drive philanthropic funding for these high-leverage, nascent research efforts that are going on, and then try to help drive government funding and effective policy so that we can get resources moving in the big climate research system. So for people looking for that, when we start talking about the Safe Climate Research Initiative, we’re agnostic as to whether you want to give money to SilverLining for the fund or you want to donate to these programs directly.

So we interface with most of the mature-ish programs in the United States and quite a few around the world, mature and emerging. And we can direct people based on their interests, and on the parts of the world where there are opportunities for funding really high-caliber things: Latin America, the UK, India.

So we’re happy to say, “You know, you can donate to our fund, and we’re just moving through, getting seed funding to these programs as we can, or we can help connect you with programs based on your interests, in the different parts of the world that you’re in, or in technology versus science versus impacts.” So that’s one way. For some philanthropists who are aware of the leverage on government R&D and government policy: SilverLining’s been very effective in starting to kind of turn the dial on government funding, and we have some pretty big aspirations, not only to get funding directly into assessing these interventions, but also into expanding our capacity to do climate prediction quickly. So that’s another way, where you can fund advocacy, and we would appreciate it.

Lucas Perry: Accepting donations?

Kelly Wanser: We’re definitely accepting donations, happy to connect people or be a conduit for funding research directly.

Lucas Perry: All right. So let’s end on a fun one here then. We were talking a little bit before we started about your “Visit Planet Earth” picture behind you, and how you use it as a message against the colonization of Mars. So why don’t you think Mars is a solution to all of the human problems on earth?

Kelly Wanser: Well, let’s just start by saying, I grew up on Star Trek, and so the colonization of Mars and the rest of the universe is appealing to me. But as far as solutions to climate change, or an escape from it, go, just to level set, because I’ve had serious conversations with people about this: I lived for 12 years in Silicon Valley and spent a lot of time with the Long Now community, and people have a passion for this vision of living on another planet and the idea that we might be able to move off of this one if it becomes dire. The reality is, and this goes back to the education I got from very serious scientists, that the problem with living on other planets is not an engineering problem or a physics problem. It’s a biology problem.

Our bodies are fine-tuned to the conditions of Earth: the radiation, the gravity, the air, the colors. And so we degrade pretty quickly when we go off planet. That’s a harder problem to solve than building a spaceship or a bubble. It’s not a problem that gets solved right away, and we can see that from the condition of the astronauts who come back after a few years in orbit. And so the kinds of problems that we would need to solve to actually have quality-of-life living conditions on Mars or anywhere else are going to take a while. Longer than what we think is the 30-to-50-year instability problem that we have here on earth.

We are so finely tuned to the conditions of earth, like the Goldilocks sort of zone that we’re in, that it’s a really, really hard thing to replicate anywhere else. And so it’s really not very rational. It’s actually a much easier problem to solve to try to repair earth than it is to try to create the conditions of earth somewhere else.

Lucas Perry: Yeah. So I mean, these things might not be mutually exclusive, right? It really seems to be a problem of resource allocation. Like it’s not one or the other, it’s like, how much are we going to put into each-

Kelly Wanser: It’s less a problem of resource allocation than of time horizon. I think the kinds of scientific and technical problems that you have to solve to meaningfully have people live on Mars are beyond a 50-year time horizon, and our concern is that the climate instability problem is inside a 50-year time horizon. That’s the main issue. Over the long haul, there are advanced technologies, and probably bio-engineering things we need to do, and maybe engineering of planets we need to do, for that to work. And so over the next 100 or 200 years, that would be really cool, and I’ll be in favor of it also. But this is the spaceship that we have. All of the people are on it, and failure is not an option.

Lucas Perry: All right. That’s an excellent place to end on. And I think both you and I share the science fiction geek gene about getting to Mars, but we’ll have to potentially delay that until we figure out climate change, but hopefully we get to that. So, yeah. Thanks so much for coming on. This has been really interesting. I feel like I learned a lot of new things. There’s a lot here that probably most people who are even fairly familiar with climate science aren’t familiar with. So I just want to offer you a final little space here if you have any final remarks or anything you’d like to say that you feel like is unresolved or unsaid, just any last words for listeners?

Kelly Wanser: Well, for those people who’ve made it through the entire podcast, thanks for listening and being so engaged and interested in the topic. I think that apart from the things we talked about previously, it’s heartening and important that people from other fields are paying attention to the climate problem and becoming engaged, particularly people from the technology sector and certain parts of industry that bring a way of thinking about problems that’s useful. I think there are probably lots of people in your community who may be turning their attention to this, or turning their attention to this more fully in a new way, and may have perspectives and ideas and resources that are useful to bring to it.

The field has been quite academic, more academic than many other fields of endeavor. And so I think what people in Silicon Valley think about, in terms of how you might transform a sector or a problem quickly, presents an opportunity. I hope that people are inspired to become involved, including in the parts of the space that are maybe more controversial, or easier for people like us to think about.

Lucas Perry: All right. And so if people want to follow or find you or check out SilverLining, where are the best places to get more information or see what you guys are up to?

Kelly Wanser: So I’m on LinkedIn and Twitter as @kellywanser and our website is silverlining.ngo, no S at the end. And the majority of the information about what we do is there. And feel free to reach out to me on LinkedIn or on Twitter or contact Lucas who can contact me.

Lucas Perry: Yeah, all right. Wonderful. Thanks so much, Kelly.

Kelly Wanser: All right. Thanks very much, Lucas. I appreciate it. Thanks for taking so much time.

Andrew Critch on AI Research Considerations for Human Existential Safety

 Topics discussed in this episode include:

  • The mainstream computer science view of AI existential risk
  • Distinguishing AI safety from AI existential safety 
  • The need for more precise terminology in the field of AI existential safety and alignment
  • The concept of prepotent AI systems and the problem of delegation 
  • Which alignment problems get solved by commercial incentives and which don’t
  • The threat of diffusion of responsibility on AI existential safety considerations not covered by commercial incentives
  • Prepotent AI risk types that lead to unsurvivability for humanity 

 

Timestamps: 

0:00 Intro
2:53 Why Andrew wrote ARCHES and what it’s about
6:46 The perspective of the mainstream CS community on AI existential risk
13:03 ARCHES in relation to AI existential risk literature
16:05 The distinction between safety and existential safety
24:27 Existential risk is most likely to obtain through externalities
29:03 The relationship between existential safety and safety for current systems
33:17 Research areas that may not be solved by natural commercial incentives
51:40 What’s an AI system and an AI technology?
53:42 Prepotent AI
59:41 Misaligned prepotent AI technology
01:05:13 Human frailty
01:07:37 The importance of delegation
01:14:11 Single-single, single-multi, multi-single, and multi-multi
01:15:26 Control, instruction, and comprehension
01:20:40 The multiplicity thesis
01:22:16 Risk types from prepotent AI that lead to human unsurvivability
01:34:06 Flow-through effects
01:41:00 Multi-stakeholder objectives
01:49:08 Final words from Andrew

 

Citations:

AI Research Considerations for Human Existential Safety

 

We hope that you will continue to join in the conversations by following us or subscribing to our podcasts on Youtube, Spotify, SoundCloud, iTunes, Google Play, Stitcher, iHeartRadio, or your preferred podcast site/application. You can find all the AI Alignment Podcasts here.

You can listen to the podcast above or read the transcript below. 

Lucas Perry: Welcome to the AI Alignment Podcast. I’m Lucas Perry. Today we have a conversation with Andrew Critch where we explore a recent paper of his titled AI Research Considerations for Human Existential Safety, which he co-authored with David Krueger. In this episode, we discuss how mainstream computer science views AI existential risk, we develop new terminology for this space and discuss the need for more precise concepts in the field of AI existential safety, we get into which alignment problems and areas of AI existential safety Andrew expects to be naturally solved by industry and which won’t, and we explore the risk types of a new concept Andrew introduces, called prepotent AI, that lead to unsurvivability for humanity. 

I learned a lot from Andrew in this episode and found this conversation to be quite perspective shifting. I think Andrew offers an interesting and useful critique of existing discourse and thought, as well as new ideas. I came away from this conversation especially valuing thought around the issue of which alignment and existential safety issues will and will not get solved naturally by industry and commercial incentives. The answer to this helps to identify crucial areas we should be mindful to figure out how to address outside the normal incentive structures of society, and that to me seems crucial for mitigating AI existential risk. 

If you don’t already subscribe or follow this podcast, you can follow us on your preferred podcasting platform, like Apple Podcasts or Spotify, by searching for The Future of Life. 

Andrew Critch is currently a full-time research scientist in the Electrical Engineering and Computer Sciences department at UC Berkeley, at Stuart Russell’s Center for Human Compatible AI. He earned his PhD in mathematics at UC Berkeley studying applications of algebraic geometry to machine learning models. During that time, he cofounded the Center for Applied Rationality and Summer Program on Applied Rationality and Cognition. Andrew has been offered university faculty positions in mathematics and mathematical biosciences, worked as an algorithmic stock trader at Jane Street Capital‘s New York City office, and as a research fellow at the Machine Intelligence Research Institute. His current research interests include logical uncertainty, open source game theory, and avoiding arms race dynamics between nations and companies in AI development.

And with that, let’s get into our conversation with Andrew Critch.

We’re here today to discuss your paper, AI Research Considerations for Human Existential Safety. You can shorten that to ARCHES. You wrote this with David Krueger and it came out at the end of May. I’m curious and interested to know what your motivation is for writing ARCHES and what it’s all about.

Andrew Critch: Cool. Thanks, Lucas. It’s great to be here. For me, it’s pretty simple: I care about existential safety. I want humans to be safe as a species. I don’t want human extinction to ever happen. And so I decided to write a big, long document about that with David. And of course, why now and why these particular problems, I can go more into that.

You might wonder, if existential risk from AI is possible, how have we done so much AI research with so little technical-level thought about how that works and how to prevent it? And to me, it seems like the culture of computer science, and actually a lot of STEM, has been to always talk about the benefits of science. Except in certain disciplines that are well accustomed to talking about risks, like medicine, a lot of science just doesn’t talk about what could go wrong or how it could be misused.

It hasn’t been until very recently that computer science has really started making an effort as a culture to talk about how things could go wrong in general. Forget x-risk, just anything going wrong. And I’m just going to read out loud this quote to sort of set the context culturally for where we are with computer science right now and how far culturally we are from being able to really address existential risk holistically.

This is a quote from Hecht at the ACM Future of Computing Academy. It came out in 2018, just two years ago. “The current status quo in the computing community is to frame our research by extolling its anticipated benefits to society. In other words, rose colored glasses are the normal lenses through which we tend to view our work. However, one glance at the news these days reveals that focusing exclusively on the positive impacts of a new computing technology involves considering only one side of a very important story. We believe that this gap represents a serious and embarrassing intellectual lapse. The scale of this lapse is truly tremendous. It is analogous to the medical community, only writing about the benefits of a given treatment, completely ignoring the side effects, no matter how serious they are.

What’s more, the public has definitely caught on to our community-wide blind spot and is understandably suspicious of it. After several months of discussion, an idea for acting on this imperative began to emerge. We can leverage the gatekeeping functionality of the peer review process. At a high level, our recommended change to the peer review process in computing is straightforward: peer reviewers should require that papers and proposals rigorously consider all reasonable, broader impacts, both positive and negative.” That’s Hecht, 2018.

With this energy, this initiative from the ACM and other similar mentalities around the world, we now have NeurIPS conference submissions required to include broader impact statements that cover negative impacts as well as positive ones.

Suddenly in 2020, contrasted with 2015, it’s becoming okay and normal to talk about how your research could be misused and what could go wrong with it. And we’re just barely able to admit things like, “This algorithm could result in racial bias in judiciary hearings,” or something like that. Which is a terrible, terrible … The fact that we’ve taken this long to admit that and talk about it is very bad. And that’s something as present and obvious as racism. Whereas, existential risk has never been … Extinction has never been present or else we wouldn’t be having this conversation. And so those conversations are even harder to have when it’s not normal to talk about bad outcomes at all. Let alone obvious, in your face, bad outcomes.

Lucas Perry: Yeah. On this podcast, we’re basically only talking to people who are in the AI alignment community and who take x-risk very seriously, who are worried about existential risk from advanced AI systems.

And so we lack a lot of this perspective … Or we don’t have many conversations with people who take the cultural and, I guess, academic perspective of the mainstream machine learning and computer science community, which is far larger and has much more inertia and mass than the AI alignment community.

I’m curious if you can just paint a little bit more of a picture here of what the state of computer science thinking, or non-thinking, is on AI existential risk? You mentioned that recently people are starting to encourage, and even require as part of the review process, negative impact statements, writing about the risks of the technology one is developing. But that’s still not talking about global catastrophic risk. It’s still not talking about alignment explicitly. It’s not talking about existential risk. It seems like a step in the right direction, but with some ways to go. What kind of perspective can you give us on all this?

Andrew Critch: I think of sort of EA-adjacent AI researchers as kind of a community, to the extent that EA is a community. And it’s not exactly the same set of people as AI researchers who think about existential risk, or AI researchers who think about alignment, which is yet another set of people. These overlap heavily, but they’re not the same set.

And I have noticed a tendency that I’m trying to combat here by raising this awareness, not only to computer scientists, but to EA-adjacent AI folks. Which is that if you feel sort of impatient that computer science and AI are not acknowledging existential risks from tech, things are underway, and there are ways of making things better and ways of making things worse.

One way to make things worse is to get irate with people for caring about risks that you think aren’t big enough. Okay. If you think inequitable loan distribution is not as bad as human extinction, many people might agree with you, but if you’re irate about that and saying, “Why are we talking about that when we should be talking about extinction?” you’re slowing down the process of computer science transitioning into a more negative-outcome-aware field by refusing to cooperate with other people who are trying to raise awareness about negative outcomes.

I think there’s a push to be more aware of negative outcomes, and all the negative-outcome people need to sort of work together, politely but swiftly, raising the bar for our discourse about negative outcomes. And I think existential risk should be part of that, but I don’t think it should be adversarially positioned relative to other negative outcomes. I think we just need to raise the bar for all of these at once.

And all of these issues have the same enemy, which is those rose-colored glasses that wrote all of our grant applications for the past 50 years. Every time you’re asking for public funds, you say how this is going to benefit society. And you’d better not mention how it might actually make society worse, or else you won’t get your grant. Right?

Well, times are changing. You’re allowed to mention and signal awareness of how your research could make things worse. And that’s starting to be seen as a good trait rather than a reason not to give you funding. And if we all work together to combat that rose-colored glasses problem, it’s going to make everything easier to talk about, including existential risk.

Lucas Perry: All right. So if one goes to NeurIPS and talks to any random person about existential risk or AI alignment or catastrophic risk from AI, what is the average reaction or assumed knowledge? What’s the split between people who think it’s complete bullshit, people who are neutral about it, and people who are serious about it?

Andrew Critch: Definitely my impression right now, and this is a very rough impression, is that there are a few different kinds of reactions that each account for sort of a double-digit percentage. I don’t know which percentages they are, but one is like, how are you worried about existential risks when robots can’t tie knots yet? Or they can’t fold laundry. It’s a very difficult research problem for an academic AI lab to make a robot fold laundry. So it’s like, come on. We’re so far away from that.

Another reaction is, “Yeah, that’s true. You know, I mean things are really taking off. They’re certainly progressing faster than I expected. Things are kind of crazy.” It’s the things that are kind of crazy reaction and there’s just kind of an open-mindedness. Man, anything could happen. We could go extinct in 50 years, we can go extinct. I don’t know what’s going to happen. Things are crazy.

And then there’s another reaction. Unfortunately, this one’s really weird. I’ve gotten this one, which is, “Well, of course humanity is going to go extinct from the advent of AI technology. I mean, of course. Just think about it from an evolutionary perspective. There’s no way we would not go extinct given that we’re making things smarter than us. So of course it’s going to happen. There’s nothing we can do about it. It’s just our job as a field to make things that are smarter than humans, that will eventually replace us, and they’ll be better than us. And that’s just how stuff is.”

Lucas Perry: Some people think that’s an aligned outcome.

Andrew Critch: I don’t know. There’s a lot of debate to be had about that. But it’s a kind of defeatist attitude of, “There’s nothing you can do.” It’s much, much rarer, seemingly single digits, that someone is like, “Yeah, we’re going to do something about it.” That one is the rarest; the acknowledging and orienting towards solving it is still pretty rare. But there’s plenty these days of acknowledgement that it could be real and acknowledgement that it’s confusing and hard. The challenge is somehow way more acknowledged than any particular approach to it.

Lucas Perry: Okay. I guess that’s surprising to hear then that you feel like it’s more taken seriously than not.

Andrew Critch: It depends on what you mean by taken seriously. And again, I’m filtering for a person who’s being polite and talking to me about it, right? People are polite enough to fall into the, “Stuff is crazy. Who knows what could happen,” attitude.

And is that taking it seriously? Well, no, but it’s not adversarial to people who are taking it seriously, which I think is really good. And then there’s the, “Clearly we’re going to be destroyed by machines that replace us. That’s just nature.” Those voices, I’m kind of like, well, that’s kind of good also. It’s good to admit that there’s a real risk here. It’s kind of bad to give up on it, in my opinion. But altogether, if you add up the “Whoa, stuff’s crazy and we’re not really oriented to it,” plus the “Definitely humanity is going to be destroyed/replaced,” it’s a solid chunk of people. I don’t know. I’m going to say at least 30%, if you also include the people who want to try and do something about it. Which is just amazing compared to, say, six years ago, where the answer would have rounded to zero percent.

Lucas Perry: Then just to sum up here, this paper is an exercise in trying to lay out a research agenda for existential safety from AI systems, which you view as unique? I think you mentioned that a handful of research agendas already exist to this day.

Andrew Critch: Yeah. There’s Aligning Superintelligence with Human Interests, by Soares and Fallenstein; that’s MIRI, basically. Then there’s Research Priorities for Robust and Beneficial Artificial Intelligence, by Stuart Russell, Max Tegmark, and Daniel Dewey. Then there’s Concrete Problems in AI Safety, by Dario Amodei and others. And then Alignment for Advanced Machine Learning Systems, by Jessica Taylor and others. And Scalable Agent Alignment via Reward Modeling, by Jan Leike and others; David Krueger is also on that one.

Lucas Perry: How do you see your paper as fitting in with all of the literature that already exists on the problem of AI alignment and AI existential risk?

Andrew Critch: Right. So it’s interesting you say that there exists literature on AI existential risk. I would say Superintelligence, by Nick Bostrom, is literature on AI existential risk, but it is not a research agenda.

Lucas Perry: Yeah.

Andrew Critch: I would say Aligning Superintelligence with Human Interests, by Soares and Fallenstein, is a research agenda, but it’s not really about existential risk. It sort of mentions that the stakes are really high, but it’s not constantly staying in contact with the concept of extinction throughout.

If you take a random excerpt of any page from it and pretend that it’s about the Netflix challenge or building really good personal assistants or domestic robots, you can succeed. That’s not a critique. That’s just a good property of integrating with research trends. But it’s not about the concept of existential risk. Same thing with Concrete Problems in AI Safety.

In fact, it’s a fun exercise to do. Take that paper. Pretend you think existential risk is ridiculous and read Concrete Problems in AI Safety. It reads perfectly as, you don’t need to think about that crazy stuff, let’s talk about tipping over vases or whatever. And that’s a sign that it’s an approach to safety that’s going to be agreeable to people, whether they care about x-risk or not. Whereas this document is not going to go down easy for someone who’s not willing to think about existential risk, and it’s trying to stay constantly in contact with the concept.

Lucas Perry: All right. And so you avoid making the case for AI x-risk as valid and as a priority, just for the sake of the goal of the document succeeding?

Andrew Critch: Yeah. I want readers to spend time inhabiting the hypothetical that existential risk is real and can come from AI and can be addressed through research. They’re already taking a big step by constantly thinking about existential risk for these 100 pages. I think it’s possible to take that step without being convinced of how likely the existential risk is. And I’m hoping that I’m not alienating anybody. If you think it’s 1%, but it’s worth thinking about, that’s good. If you think there’s a 30% chance of existential risk from AI, then it’s worth thinking about. That’s good, too. If you think it’s 0.01%, but you’re still thinking about it, you’re still reading it, that’s good, too. And I didn’t want to fracture the audience based on how probable people think the risks are.

Lucas Perry: All right. So let’s get into the meat of the paper, then. It would be useful, I think, if you could help clarify the distinction between safety and existential safety.

Andrew Critch: Yeah. So here’s a problem we have. And when I say we, I mean people who care about AI existential safety. Around 2015 and 2016, we had this coming out of AI safety as a concept. Thanks to Amodei and the robust and beneficial AI agenda from Stuart Russell, talking about safety became normal, which was hard to accomplish before 2018. That was a huge accomplishment.

And so what we had happen is that people who cared about extinction risk from artificial intelligence would use AI safety as a euphemism for preventing human extinction risk. Now, I’m not sure that was a mistake, because as I said, prior to 2018 it was hard to talk about negative outcomes at all. But here in 2020, it’s a real problem that you have people who, when they’re thinking existential safety, are saying safety, are saying AI safety. And that leads to sentences like, “Well, self-driving car navigation is not really AI safety.” I’ve heard that uttered many times by different people.

Lucas Perry: And that’s really confusing.

Andrew Critch: Right. And it’s like, “Well, what is AI safety, exactly, if cars driven by AI not crashing doesn’t count as AI safety?” I think that, as described, the concept of safety usually means minimizing acute risks. Acute meaning localized in space and time: there’s a thing that happens in a place that causes a bad thing, and you’re trying to stop that. And the Concrete Problems in AI Safety agenda really nailed that concept.

And we need to get past the concept of AI safety in general if what we want to talk about is societal scale risk, including existential risk. Which is acute on a geological time scale: you can look at a century before and after and see the earth is very different. But a lot of ways you can destroy the earth don’t happen like a car accident. They play out over a course of years. And things to prevent that sort of thing are often called ethics. Ethics are principles for getting a lot of agents to work together and not mess things up for each other.

And I think there’s a lot of work today that falls under the heading of AI ethics that is really necessary to make sure that AI technology, aggregated across the earth, across many industries and systems and services, will not collectively result in somehow destroying humanity, our environment, our minds, et cetera.

To me, existential safety is a problem for humanity on an existential timescale that has elements resembling safety, in terms of being acute on a geological timescale, but also elements resembling ethics, in terms of having a lot of agents, a lot of different stakeholders and objectives milling around and potentially interfering with each other and interacting in complicated ways.

Lucas Perry: Yeah. Just to summarize this, people were walking around saying like, “I work on AI safety.” But really, that means that I’ve bought into AI existential risk and I work on AI existential risk. And then that’s confusing for everyone else, because working on the personal scale risk of self-driving car safety is also AI safety.

We need a new word, because AI safety really means acute risks, which can range from personal all the way to civilizational or transgenerational. And so it’s confusing to say I work in AI safety, when really what I mean is that I only care about transgenerational AI existential risk.

Andrew Critch: Yes.

Lucas Perry: Then we have this concept of existential safety, which for you has this portion about us not going extinct, but also includes the normative questions, the ethics and values and game theory of how an ecosystem of human and nonhuman agents can work together to build a thriving civilization that is existentially preferable to other civilizations.

Andrew Critch: I agree 100% with everything you just said, except for the part where you say “existentially preferable.” I prefer to use existential safety to refer, really, to preserving existence. And I prefer existential risk to refer to extinction. That’s not how Bostrom uses the term, and he largely introduced the term; he intends to include risks that are as important as extinction but aren’t extinction risks.

And I think that’s interesting. I think that’s a good category of risks to think about and deserving of a name. I think, however, that there’s a lot more debate about what is or isn’t as bad as extinction. Whereas, there’s much less debate about what extinction is. There still is debate. You can say, “Well, what about if we become uploads, whatever.” But there’s much, much more uncertainty about what’s worse or better than extinction.

And so I prefer to focus existential safety on literally preventing extinction and then use some other concept, like societal scale risk, for referring to risks that are really big on a societal scale that may or may not pass the threshold of being worse or better than extinction.

I also care about societal scale risks, and I don’t want people working on preventing societal scale risks to be fractured based on whether they think any particular risk, like lots of sentient suffering AI systems or a totalitarian regime that lasts forever, is worse than extinction or counts as a quote, unquote existential risk. When I say existential risk, I always mean risks to the existence of the human species, for simplicity’s sake.

Lucas Perry: Yeah. Because Bostrom’s definition of an existential risk is any risk that, if it should occur, would permanently and drastically curtail the potential of Earth-originating intelligent life. Which would include futures of deep suffering or futures of being locked into some less than ideal system.

Andrew Critch: Yeah. Potential not only measured in existence, but potential measured in value. And if you’re suffering, the value of your existence is lower.

Lucas Perry: Yeah. And that there are some futures where we still exist that are less preferable than extinction.

Andrew Critch: Right.

Lucas Perry: You want to say, okay, there are these potential suffering risks and there are bad futures of disvalue that are maybe worse than extinction. We’re going to call all of these societal scale risks. And then we’re just going to have existential risk or existential safety refer to us not going extinct?

Andrew Critch: I think that’s especially necessary in computer science. Because if anything seems vague or unrefined, there’s a lot of allergy to it. I try to pick the most clearly definable thing, like are humans there or not? That’s a little bit easier for people to wrap their heads around.

Lucas Perry: Yeah. I can imagine how in the hard sciences people would be very allergic to anything that was not sufficiently precise. One final distinction to make here is that one could say, instead of “I work on AI safety,” “I work on AI existential safety,” or “I work on AI civilizational or societal scale risk.” But another phrase here is, “I work on AI alignment.” And you distinguish that from AI delegation. Could you unpack that a little bit more and why that’s important to you?

Andrew Critch: Yeah. Thanks for asking about that. I do think that there’s a bit of an issue with the “AI alignment” concept that makes it inadequate for existential risk reduction. AI existential safety is my goal. And I think AI alignment, the way people usually think about it, is not really going to cut it for that purpose.

If we’re successful as a society in developing and rolling out lots of new AI technologies to do lots of cool stuff, there are going to be a lot of stakeholders in that game. It’s going to be what you might call massively multipolar. And in that economy or society, a lot of things can go wrong through the aggregate behavior of individually aligned systems. Just take pollution, right? No one person wants everybody else to pollute the atmosphere, but they’re willing to do it themselves. Because when Alice pollutes the atmosphere, Alice gets to work on time, or Alice gets to take a flight, or whatever.

And she harms everybody in doing that, including herself. But the harm to herself is so small. It’s just a drop in the bucket that’s spread across everybody else. You do yourself a benefit and you do a harm that outweighs that benefit, but it’s spread across everybody and accrues very little harm specifically to you. That’s the problem with externalities.

I think existential risk is most likely to obtain through externalities between interacting systems that somehow were not designed to interact well enough, because they had different designers or different stakeholders behind them. And those competitive effects: if you don’t take a car, everyone else is going to take a car and you’re going to fall behind, so you take a car. Same if you’re a country, right? If you don’t burn fossil fuels, well, you spend a few years transitioning to clean energy and you fall behind economically. You’re taking a hit and that hurts you more than anybody. Of course, it benefits the whole world if you cut your carbon emissions, but it’s just a big prisoner’s dilemma. So you don’t do it. No one does it.
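A quick illustrative sketch of the externality structure described here, with made-up numbers (the payoffs, actor count, and harm figures below are assumptions for illustration, not anything from ARCHES): each actor gains a small private benefit from defecting while the harm is spread thinly across everyone, so defecting dominates individually even though universal defection is worse for all.

```python
# Toy externality model (illustrative numbers only, not from ARCHES).
# Each actor who "defects" (pollutes, burns fossil fuels, takes the car)
# gains a private benefit but imposes a cost spread evenly across all N actors.

N = 100                # number of actors (people, firms, or countries)
PRIVATE_BENEFIT = 1.0  # what a defector gains for themselves
TOTAL_HARM = 3.0       # total harm each defection spreads across everyone

def payoff(i_defect: bool, num_other_defectors: int) -> float:
    """Payoff to one actor, given their choice and how many others defect."""
    total_defectors = num_other_defectors + (1 if i_defect else 0)
    benefit = PRIVATE_BENEFIT if i_defect else 0.0
    shared_harm = total_defectors * TOTAL_HARM / N  # harm borne per actor
    return benefit - shared_harm

# No matter what others do, defecting beats cooperating for the individual:
for others in (0, 50, 99):
    gain = payoff(True, others) - payoff(False, others)
    print(f"others defecting={others:3d}  marginal gain from defecting={gain:+.2f}")

# ...yet everyone cooperating beats everyone defecting:
print("all cooperate:", payoff(False, 0), " all defect:", payoff(True, N - 1))
```

The marginal gain from defecting stays positive no matter what the others do, while the all-defect outcome is strictly worse than all-cooperate, which is the structure being pointed at here.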

There’s many, many other variables that describe the earth. This comes to the human fragility thesis, which David and I outline in the paper: there are many variables which, if changed, could destroy humanity, and any of those variables could be changed in ways that don’t destroy machines. And so we are at risk of machine economies that just keep on operating at the expense of humans, with the humans who aren’t needed by those economies being destroyed. That is the sort of backdrop for why I think delegation is a more important concept than alignment.

Delegation is a relationship between groups of people. You’ll often have a board of directors that delegates through a CEO to an entire staff. And I want to evoke that concept, the relationship between a group of overseers and a group of doers. You can have delegates on a UN committee from many different countries. You’ve got groups delegating to individuals to serve as part of a group who are going to delegate to a staff. There’s this constant flow through of responsibility. And it’s not even acyclic. You’ve got elected officials who are delegated by the electorate who delegate staff to provide services to the electorate, but also to control the electorate.

So there’s these loops going around. And I think I want to draw attention to all of the delegation relationships that are going to exist in the future economy, and that already exist in the present economy of AI technologies. When you pay attention to all of those different pathways of delegation, you realize there are a lot of people and institutions with different values that aren’t going to agree with each other on what counts as aligned.

For example, for some people, it’s aligned to take a 1% chance of dying to double your own lifespan. Some people are like, “Yeah, that’s totally worth it.” And some people are like, “No, a 1% chance of dying? That’s scary, and I’m pretty happy living 80 years.” And so what sorts of societal scale risks are worth taking is going to be subject to a lot of disagreement.

And then there’s the idea that there’s this thing called human values that we’re all in agreement about, and there’s this other thing called AI that just has to do what the human values say, and we have to align the AI with human values. It’s an extremely simplified story. It’s got two agents: one big agent called the humans, and one big agent called the AIs. And we’re just trying to align them. I think that is not the actual structure of the delegation relationships that humans and AI systems are going to have with respect to each other in the future. And I think alignment is helpful for addressing some delegation relationships, but probably not the vast majority.

Lucas Perry: I see where you’re coming from. And I think in this conception, alignment, as you said, is a subcategory of delegation.

Andrew Critch: Well, I would say that alignment is a sub problem of most delegation problems, but there’s not one delegation problem. And I would also say alignment is a tool or technique for solving delegation problems.

Lucas Perry: Okay. Those problems all exist, but actually doing AI alignment automatically brings in delegation problems. And/or, if you actually align a system, then the system is aligned with how we would want to solve delegation problems.

Andrew Critch: Yeah. That’s right. For one approach to solving AI delegation, you might think, “Yeah, we’re going to solve that problem by first inventing a superintelligent machine.” Step one, invent your superintelligent oracle machine. Step two, align your superintelligent oracle machine with you, the creator. Step three, ask it to solve for society. Just figure out how society should be structured, and do that. That’s a mathematically valid approach. I just don’t think that’s how it’s going to turn out. The closer powerful institutions get to having super powerful AI systems, the more political tensions are going to arise.

Lucas Perry: So we have to do the delegation problem as we’re going?

Andrew Critch: Yes, we have to do it as we’re going, 100%.

Lucas Perry: Okay.

Andrew Critch: And if we don’t, we put institutions at odds with each other to win the race of being the one chosen entity that aligns the one chosen superintelligence with their values or plan for the future or whatever. And I just think that’s a very non-robust approach to the future.

Lucas Perry: All right. Let’s pivot here then back into existential safety and normal AI safety. What do you see as the relationship between existential safety and safety for present day AI systems? Does safety for present day AI systems feed into existential safety? Can it inform existential safety? How much does one matter for the other?

Andrew Critch: The way I think of it, it’s a bit of a three node diagram. There’s present day AI safety problems, which I believe feed into existential safety problems somewhat. Meaning that some of the present day solutions will generalize to the existential safety problems.

There’s also present day AI ethics problems, which I think also feed into understanding how a bunch of agents can delegate to each other and treat each other well in ways that are not going to add up to destructive outcomes. That also feeds into existential safety.

And just to give concrete examples, let’s take car doesn’t crash, right? What does that have in common with existential safety? Well, existential safety is humanity doesn’t crash. There’s a state space. Some of the states involve humanity exists. Some of the states involve humanity doesn’t exist. And we want to stay in the region of state space where humans exist.

Mathematically, it’s got something in common with staying in the region of state space where the car is on the road and not overheating, et cetera, et cetera. It’s a dynamical system, and it’s got some quantities that you want to conserve and there are conditions or boundaries you want to avoid. Culturally, it has the property of acknowledging a negative outcome and trying to avoid it. That’s, to me, the main thing that safety and existential safety have in common: avoiding a negative outcome. Ethics is also about avoiding negative outcomes. And I think those are both going to flow into existential safety.
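As a rough editorial gloss of the state-space framing above (informal notation, not the formalism used in ARCHES), both kinds of safety can be read as keeping a dynamical system inside a designated safe region for all time; only the system and the region differ:

```latex
% Editor's informal gloss of the "stay in a safe region of state space" framing;
% this is illustrative notation, not the formalism used in ARCHES.
% x(t): state of the system (a car, or civilization); u(t): control inputs;
% S_safe: the set of acceptable states.
\[
\dot{x}(t) = f\bigl(x(t), u(t)\bigr),
\qquad
\text{safety} \;\equiv\; x(t) \in S_{\text{safe}} \ \text{ for all } t \ge 0 .
\]
\[
\text{AI safety: } S_{\text{safe}} = \{\text{car on the road, not overheating, }\dots\},
\qquad
\text{existential safety: } S_{\text{safe}} = \{\text{states in which humanity exists}\}.
\]
```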

Lucas Perry: Are there some more examples you can give for current day AI safety problems and current day AI ethics problems, just to make it a bit more concrete? How does something like robustness to distributional shift take us from aligned systems today to systems that have existential safety in the future?

Andrew Critch: So, conceptually, robustness to distributional shift is about, you’ve got some function that you want to be performed or some condition you want to be met, and then the environment changes or the inputs change significantly from when you created the system, and then you still want it to maintain those conditions or achieve the goal.

So, for example, if you have a car trained to drive in dry conditions, and then it starts raining, can you have already designed your car by principles that would allow it to not catastrophically fail in the rain? Can it notice, “Oh gosh, this is real different from the way I was trained. I’m going to pull over, because I don’t know how to drive in the rain.” Or can it learn, on the fly, how to drive in the rain and then get on with it?

So those are kinds of robustness to distributional shift. The world changes. So, if you want something that’s safe and stays safe forever, it has to account for the world changing. So, principles of robustness to distributional shift are principles to which society, as a whole, needs to adhere. Now, do I think research in this area is differentially useful to existential risk?

No. Frankly, not at all. And the reason is that industry has loads of incentives to produce software that is robust to a changing environment. So, if on the margin I could add an idea to the idea space of robustness to distributional shift, I’m like, “Well, I don’t think there’s any chance that Uber is going to ignore robustness to distributional shift, or that Google is going to ignore it, or Amazon.” There’s no way these companies are going to roll out products while not thinking about whether they’re robust.

On the other hand, if I have a person who wants to dwell on the concept of robustness, who cares about existential risk and who wants to think about how robustness even works, like what are the mathematical principles of robustness? We don’t fully know what they are. If we did, we’d have built self-driving cars already.

So, if I have a person who wants to think about that concept because it applies to society, and they want a job while they think about it, sure, get a job producing robust software or robust robotics, or get a bunch of publications in that area, but it’s not going to be neglected. It’s more of a mental exercise that can help you orient and think about society through a new lens, once you understand that lens, rather than a thing that DeepMind is somehow going to forget, that its products need to be robust. Come on.
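As an editorial sketch of the robustness-to-distributional-shift idea discussed above: one common family of approaches scores how far incoming inputs are from the training distribution and falls back to a conservative behavior (the “pull over” move) when the score is too high. Everything below, from the features to the threshold to the fallback string, is an illustrative assumption rather than a reference implementation.

```python
import numpy as np

# Sketch of a simple distribution-shift monitor (illustrative only).
# Fit a Gaussian to training features, then flag inputs whose
# Mahalanobis distance is far outside what was seen during training.

rng = np.random.default_rng(0)
train_features = rng.normal(loc=0.0, scale=1.0, size=(5000, 8))  # "dry conditions"

mean = train_features.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(train_features, rowvar=False))

def shift_score(x: np.ndarray) -> float:
    """Mahalanobis distance of a feature vector from the training distribution."""
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

# Calibrate a threshold from the training data itself (e.g. 99.9th percentile).
threshold = np.percentile([shift_score(x) for x in train_features], 99.9)

def act(x: np.ndarray) -> str:
    if shift_score(x) > threshold:
        return "fallback: pull over / defer to a human"   # out-of-distribution
    return "proceed with normal policy"                    # in-distribution

print(act(rng.normal(0.0, 1.0, size=8)))   # looks like the training data
print(act(rng.normal(6.0, 1.0, size=8)))   # "rain": far from the training data
```

This only captures the “notice the shift and pull over” behavior; the harder questions gestured at above, like adapting on the fly or giving principled guarantees, are well beyond a monitor like this.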

Lucas Perry: So, that’s an interesting point. So, what are technical research areas, or areas in terms of AI ethics that you think there will not be natural incentives for solving, but that are high impact and important for AI existential safety?

Andrew Critch: To be clear, before I go into saying these areas are important, these areas aren’t, I do want to distinguish the claim “area X is a productive place to be if you care about existential risk” from “area X is an area that needs more ideas to solve existential safety.” I don’t want people to feel discouraged from going into intellectual disciplines that are really nourishing to the way that you’re going to learn and invent new concepts that help you think forever. And it can be a lot easier to do that in an area that’s not neglected.

So, robustness is not going to be neglected. Alignment, taking an AI system and making it do what a person wants, that’s not going to be neglected, because it’s so profitable. The economy is set up to sell to individual customers, to individual companies. Most of the world economy is anarchic in that way, anarcho-capitalist at a global scale. If you can find someone that you can give something to that they like, then you will.

The Netflix challenge is an AI alignment problem, right? The concept of AI alignment was invented in 2002, and nobody cites it, because it’s so obvious an idea that you have to make your AI do stuff. Still, it was neglected in academia because AI wasn’t super profitable. So it is true that AI alignment was not a hot area of research in academia, but now, of course, you need AI to learn human preferences. Of course you need AI to win in the tech sphere. And that second part is new.

So, because AI is taking off industrially, you’ve got a lot more demand for research solutions to, “Okay, how do we actually make this useful to people? How do we get this to do what people want?” And that’s why AI alignment is taking off. It’s not because of existential risk, it’s because AI is finally super-duper useful and finally super-duper profitable, if you can just get it to do what the customer wants. So that’s alignment. That’s what user-agent value alignment is.

Now, is that a productive place to be if you care about existential risks? I think yes. Because if you’re confused about what values are and how you could possibly get an inhuman system to align with the values of a human system, like human society, if that basic concept is tantalizing to you and you feel like, if you just understood it a bit more, you’d be better mentally equipped to visualize existential risk playing out or not playing out on a societal scale, then yeah, totally go into that problem, think about it. And you can get a job as a researcher or an engineer aligning AI systems with the values of the human beings who use them. And it’s super enriching and hard, but it’s not going to be neglected, because of how profitable it is.

Lucas Perry: So what is neglected, or what is going to be neglected?

Andrew Critch: What’s going to be neglected is stuff that’s both hard and not profitable. Transparency, I think, is not yet profitable, but it will be. So it’s neglected now. And when I say it’s not yet profitable, I mean that as far as I know, we don’t have big tech companies crushing their competition by having better visualization techniques for their ML systems. You don’t see advertisements for, “Hey, we’re hiring transparency engineers,” yet.

And so I take that as a sign that we’ve not yet reached the industrial regime in which the ability of engineers to understand their systems better is the real bottleneck to rolling out the next product. But I think it will be, if we don’t destroy ourselves first. I think there’s a very good chance of that actually playing out.

So I think, if you want an exciting career, get into transparency now. In 10 years, you’ll be in high demand and you’ll have understood a problem that’s going to help humans and machines relate, which is, “Can we understand them well enough to manage them?” There’s other problems, unfortunately, that I think are neglected now and important, and are going to stay neglected. And I think those are the ones that are most likely to kill us.

Lucas Perry: All right, let’s hear them.

Andrew Critch: Things like how do we get multiple AI systems from multiple stakeholders to cooperate with each other? How do you broker a peace treaty between Uber and Waymo cars? That one’s not as hard because you can have the country that allows the cars into it have some regulatory decision that all the cars have to abide by, and now the cars have to get along or whatever.

Or maybe you can get the Partnership on AI, which is largely American, to agree amongst themselves on some principles, and then the cars adhere to those principles. But it’s much harder on an international scale, where there’s no one centralized regulatory body that’s just going to make all the AIs behave this way or that way. And moreover, the people who are currently thinking about that aren’t particularly oriented towards existential risk, which really sucks.

So, I think what we need, if we get through the next 200 years with AI, frankly, if we get through the next 60 years with AI, it’s going to be because people who care about existential risk entered institutions with the power to govern the global deployment of AI, or because people already with the power to govern the global deployment of AI technologies came to care about existential and comparable societal scale risks. Because without that, I think we’re going to miss the mark.

When something goes wrong and there’s somebody whose job was clearly to make that not happen, it’s a lot easier to get that fixed. Think about people who’ve tried to get medical care since the COVID pandemic. Everybody’s decentralized; the offices are partly working from home, partly actually physically in there. So you’re like, “Hey, I need an appointment with a neurologist.” The person whose job it is to make the appointment is not the person whose job it is to tell the doctor that the appointment is booked.

Also, it’s someone else’s job to contact the insurance company and make sure that you’re authorized. And they might be off that day, and then you show up, and you get a big bill and you’re like, “Well, whose fault was this?” Well, it’s your fault, because you were supposed to check that your insurance covered this neurology stuff, right? You could have called your insurance company to pre-authorize the visit.

So it’s your fault. But also, it’s the fault of the administrator you didn’t talk to and never meet, whose job is to conduct the pre-authorization on behalf of the doctor’s office, which it sometimes does, right? And it’s also the doctor’s fault, because maybe the doctor could have noticed that the authorization hadn’t been done, and didn’t cancel the appointment or warn you that maybe you don’t want to afford this right now. So whose fault is it? Oh, I don’t know.

And if you’ve ever dealt with a big fat bureaucratic failure like this, that is what is going to kill humanity. Everybody knows it’s bad. Nobody in this system, not the insurance company, not the call center that made my appointment, not the insurance specialist at the doctor’s office, certainly not the doctor, none of these people want me not to get healthcare, but it’s no one in particular’s fault. And that’s how it happens.

I think the same thing is going to happen with existential risk. We’re going to have big companies making real powerful AI systems, and it’s going to be really obvious that it is their job to make those systems safe. And there’s going to be a bunch of kinds of safety that’s really obviously their job that people are going to be real angry at them for not paying a lot of attention to. And the anger is just going to get more and more, the more obvious it is that they have power.

That kind of safety, I don’t want to trivialize it. It’s going to be hard. It’s going to be really difficult research and engineering, and it can be really enriching and many, many thousands of people could make their whole careers around making AI safe for big tech companies, according to their accountable definition of safety.

But then what about the stuff they’re not accountable for? What about geopolitics that’s nobody’s fault? What about coordination failures between three different industries, or three different companies that’s nobody’s fault? That’s the stuff that’s going to get you. I think it’s actually mathematically difficult to specify protocols for decentralized multi-agent systems to adhere to constraints. It is more difficult than specifying constraints for a single system.

Lucas Perry: I’m having a little bit of confusion here, because you’re arguing that alignment questions will be solved via the incentives of just the commercialization of AI.

Andrew Critch: Single-human, single-AI alignment problems or single-institution, single-network alignment problems. Yes.

Lucas Perry: Okay. But they also might be making single agents for many people, or multiple agents for many people. So it doesn’t seem single-single to me. But the other part is that you’re saying that in a world where there are many competing actors and a diffusion of responsibility, the existential risk comes from obvious things that companies should be doing, but no one is, because maybe someone should make a regulation about this thing but whatever, so we should just keep doing things the way that we are. But doesn’t that come back to commercialization of AI systems not solving all of the AI alignment problems?

Andrew Critch: So if by AI alignment you mean AI technology in aggregate behaves in a way that is favorable to humanity in aggregate. If that’s what you mean, then I agree that failure to align the entire economy of AI technology is a failure of AI alignment. However, number one, people don’t usually think about it that way.

If you ask someone to write down the AI alignment problem, they’ll write down a human utility function and an AI utility function, and talk about aligning the AI utility function with the human utility function. And that’s not what that looks like. That’s not a clear depiction of that super multi-agent scenario.

And, second of all, the concept of AI alignment has been around for decades, and it typically refers to single-single alignment. And third, if you want to co-opt the concept of AI alignment and start using it to refer to general alignment of general AI technology with general human values, just a spread-out notion of goodness that’s going to get spread over all of the AI technology and make it all generally good for all the general humans, if you want to co-opt it and use it for that, you’re going to have a hard time. You’re going to invite a lot of debate about what human values are.

We’re trying to align the AI technology with the human values. So, you go from single-single to single-multi. Okay. Now we have multiple AI systems serving a single human, that’s tricky. We got to get the AI systems to cooperate. Okay. Cool. We’ll figure out how the cooperation works and we’ll get the AI systems to do that. Cool. Now we’ve got a fleet of machines that are all serving effectively.

Okay. Now let’s go to multi-human, multi-AI. You’ve got lots of people, lots of AI systems in this hyper interactive relationship. Did we align the AIs with the humans? Well, I don’t know. Are some of the humans getting really poor, really fast, while some of them are getting really rich, really fast? Sound familiar? Okay. Is that aligned? Well, I don’t know. It’s aligned for some of them. Okay. Now we have a big debate. I think that’s a very important debate and I don’t want to skirt it.

However, I think you can ask the question, did the AI technology lead to human extinction, without having that debate? And I want to factor out that debate of, wait, who do you mean? Who are you aligning with? I want that debate to be had, and I want it to be had separately from the debate of, did it cause human extinction?

Because I think almost all humans want humanity not to go extinct. Some are fine with it, it’s not universal, but a lot of people don’t want humanity to go extinct. I think the alignment concept, if you play forward 10 years, 20 years, it’s going to invite a lot of very healthy, very important debate that’s not necessary to have for existential safety.

Lucas Perry: Okay. So I’m not trying to defend the concept of AI alignment in relation to the concept of AI existential safety. I think what I was trying to point towards is that you said earlier that you do not want to discourage people from going into areas that are not neglected. And the areas that are not neglected are the areas where the commercialization of AI will drive incentives towards solving alignment problems.

Andrew Critch: That’s right.

Lucas Perry: But the alignment problems that are not going to get solved-

Andrew Critch: I want to encourage people to go out to solve those problems. 100%.

Lucas Perry: Yeah. But just to finish the narrative, the alignment problems that are not going to get solved are the ones where there are multiple humans and multiple AI agents, and there’s this diffusion of responsibility you were talking about. And this is the area you said would most likely lead to AI existential risk. Where maybe someone should make a regulation about this specific thing, or maybe we’re competing a little bit too hard, and then something really bad happens. So you’re saying that you do want to push people into both the unneglected area of…

Andrew Critch: Let me just flesh out a little bit more of my value system here. Pushing people is not nice. If there’s a person and they don’t want to do a thing, I don’t want to push them. That’s the first thing. The second thing is, pulling people is not nice either. So if someone’s on the way into doing something they’re going to find intellectually enriching, that’s going to help them think about existential safety, and that’s not neglected, it’s popular or it’s going to be popular, I don’t want to hold them back. But if someone just comes to me and is like, “Hey, I’m indifferent between transparency and robustness,” I’m like, “100%, go into transparency, no question.”

Lucas Perry: Because it will be more neglected.

Andrew Critch: And if someone tells me they’re indifferent between transparency and multi-stakeholder delegation, I’m like, “100%, multi-stakeholder delegation.” If you’ve got traction on that and you’re not going to burn your career, do it.

Lucas Perry: Yeah. Those are the three categories then, though. Robustness gets solved by the incentive structures of commercialization. Transparency, maybe less so, maybe it comes later. And then multi-multi delegation is just the other big neglected problem of living in a global world. So you’re saying that much of the alignment problem gets solved by the incentive structures of commercialization.

Andrew Critch: Well, a lot of what people call alignment will get solved by present day commercial incentives.

Lucas Perry: Yes.

Andrew Critch: Another chunk of societal scale benefit from AI, I’ll say, will hopefully get delivered by the next wave of commercial incentives. I’m thinking things like transparency, fairness, accountability. Things like that are actually going to become commercially profitable to get right, rather than merely things companies are afraid of getting wrong.

And I hope that second wave happens before we destroy ourselves, because possibly we would destroy ourselves even before then. But most of my chips are on there being a wave of benefit from AI ethics in the next 10 years or something, and that that’s going to solve a bunch more of the existential safety problems, or at least address them. Leftover after that is the stuff that global capitalism never got to.

Lucas Perry: And the things that global capitalism never got to are the capitalistic organizations and governments competing with one another with very strong AI systems?

Andrew Critch: Yeah. Competing and cooperating.

Lucas Perry: Competing and cooperating, unless you bring in some strong notion of paretotopia where everyone is like, “We know that if we keep doing this, that everyone is going to lose everything they care about.”

Andrew Critch: Well, the question is, how do you bring that in? If you solve that problem, you’ve solved it.

Lucas Perry: Okay. So, to wrap up on this then: as companies increasingly make systems that serve people and need to be able to learn and adopt their values, the incentives of commercialization will continue to solve what are classically AI alignment problems, which may also provide some degree of AI existential safety. And there’s the question of how much of those get solved naturally, and how much we’re going to have to do in academia and nonprofits, and then push that into industry.

So we don’t know what that will be, but we should be mindful about what will be solved naturally, and then what are the problems that won’t be, and then how do we encourage or invite more people to go into areas that are less likely to be solved by natural industrial incentives.

Andrew Critch: And do you mean areas of alignment, or areas of existential safety? I’m serious.

Lucas Perry: I know because I’m guilty of not really using this distinction in the past. Both.

Andrew Critch: Got it. I actually think mostly about single-single alignment. Like, there’s a single stakeholder, which might be a human or an institution that has one goal, like profits, right? So there’s a single human stakeholder, and then there’s a single AI. I call that single-single alignment. I almost never refer to multi-multi alignment, because I don’t know what it means, and it’s not clear what the values of multiple different stakeholders even are. What are you referring to when you say the values are this function?

So, I don’t say multi-multi alignment a lot, but I do sometimes say single-single alignment to emphasize that I’m talking about the single stakeholder version. I think the multi-multi alignment concept almost doesn’t make sense. So, when someone asks me a question about alignment, I always have to ask, “Now, are you eliding those concepts again?” Or whatever.

So, we could just say single-single alignment every time and I’ll know what you’re talking about, or we could say classical alignment and I’ll probably assume that you mean single-single alignment, because that’s the oldest version of the concept from 2002. So there’s this concept of basic human rights or basic human needs. And that’s a really interesting concept, because it’s a thing that a lot of people agree on. A lot of people think murder is bad.

Lucas Perry: People need food and shelter.

Andrew Critch: Right. So there’s a bunch of that stuff. And we could say that AI alignment is about that stuff and not the other stuff.

Lucas Perry: Is it not about all of it?

Andrew Critch: I’ve seen satisfactory mathematical definitions of intent alignment. Paul Christiano talks about alignment, which I think of as intent alignment, and I think he now also calls it intent alignment: the problem of making sure an AI system is intending to help its user. And I think he’s got a pretty clear conception of what that means. I think the concept of the intent alignment of a single-single AI servant is easier to define than whatever property an AI system needs to have to be basically good for basically everybody.

There’s a bunch of properties that people call AI alignment that are actually all so different from each other. And people don’t recognize that they’re different from each other, because they don’t get into the technical details of trying to define them, so everyone thinks that we all mean the same thing. But what’s really going on is that everyone’s going around thinking, “I want AI to be good, basically good for basically everybody.” No one’s cashing that out, and so nobody notices how much we disagree on what basically good for basically everybody means.

Lucas Perry: So that’s an excellent point, and I’m guilty here now then of having absolutely no idea of what I mean by AI alignment.

Andrew Critch: That’s my goal, because I also don’t know, and I’m glad to have company in that mental state.

Lucas Perry: Yeah. So, let’s try moving along here. And I’ll accept my responsibility and guilt for using the term AI alignment loosely, and try to do better from now on. That was a fun and interesting side road, and I’m glad we pursued it. But now, pivoting back to some important definitions that you also write about in your paper: what counts to you as an AI system, and what counts to you as an AI technology, and why does that distinction matter?

Andrew Critch: So throughout the ARCHES report, I advocated for using technology versus system. AI technology is like a mass noun: you can have more of it or less of it, and it’s like this butter that you can spread on the toast of civilization. An AI system is like a count noun: you can have one of them or many of them, and you can put an AI system somewhere like you could put a strawberry on your toast, which is different from strawberry jam.

So, there are properties of AI technology that could threaten civilization, and there are also properties of a single AI system that could threaten civilization. And I think those are both important frames to think in, because you could make a system and think, “This system is not a threat to civilization,” but very quickly, when you make a system, people can copy it. People can replicate it, modify it, et cetera. And then you’ve got a technology that’s spread out, like the strawberry has become strawberry compote spread out over the toast. And do you want that? Is that good?

As an everyday person, I feel like basic human rights are a well-defined concept to me. “Is this basically good for humanity?” is a well-defined concept to me. But mathematically it becomes a lot harder to pin down. So I try to say AI technology when I want to remind people that this is going to be replicated, it’s going to show up everywhere, and it’s going to be used in different ways by different actors.

At the same time, you can think of the aggregate use of AI technology worldwide as a system. You can say the internet is a system, or you can say all of the self-driving cars in the world constitute one big system built by multiple stakeholders. So I think that the system concept can be reframed to refer to the aggregate of all the technology of a certain type or of a certain kind. But that mental reframe is an actual act of effort, and you can switch between those frames to get different views of what’s going on. I try to alternate and use both of those views from time to time, the system view and the technology view.

Lucas Perry: All right. So let’s get into another concept here that you develop, and it’s really at the core of your paper. What is a prepotent AI? And I guess before you define what a prepotent AI is, can you define what prepotent means? I had actually never heard of that word before reading your paper.

Andrew Critch: So I’m going to give the actual standard definition of prepotent, which connotes arrogance: overbearing, high-handed, despotic, possessing or abusing excessive authority. These connotations are carried across a bunch of different Latin languages, but in English they’re not as strong. In English, prepotent just means very powerful, superior in force or influence, or predominant.

I used it because it’s not that common of a word, but it’s still a word, and it’s a property that AI technology can have relative to us. And it’s also a property that a particular AI system, whether it’s singular or distributed, can have relative to us. The definition that I’d give for a prepotent AI technology is: technology whose deployment would transform humanity’s habitat, which is currently the earth, in a way that’s unstoppable to us.

So there’s the transformativeness and then there’s the unstoppableness. Transformativeness is a concept that has also been elaborated by the Open Philanthropy Project. They have this transformative AI concept. I think it’s a very good concept, because it’s impact oriented. It’s not about what the AI is trying to do, it’s about what impact it has. And they say an AI system or technology is transformative if its impact on the world is comparable to, say, the agricultural revolution or the industrial revolution, a major global change in how things are done. You might argue that the internet is a transformative technology as well.

So, that’s the transformative aspect of prepotence. And then there’s the unstoppable aspect. So, imagine something that’s transforming the world the way the agricultural industrial revolution has transformed it, but also, we can’t stop it. And by we, I mean, no subset of humans, if they decided that they want to stop it, could stop it. If every human in the world decided, “Yeah, we all want this to stop,” we would fail.

I think it’s possible to imagine AI technologies that are unstoppable to all subsets of humanity. I mean, there are things that are hard to stop right now, like the use of electricity. Let’s say all humans decided today, for some strange reason, that we never want to use electricity anymore. That’d be a difficult transition. I think we probably could do it, but it’d be very difficult. Humanity as a society can become dependent on certain things, or intertwined with things, in a way that makes it very hard to stop them. And that’s a major mechanism by which an AI technology can be prepotent: by being intertwined with us and how we use it.
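To gloss the two-part definition just given in symbols (an editor's informal rendering, not notation from ARCHES): a technology is prepotent if it is transformative and no coalition of humans could stop the transformation it causes.

```latex
% Informal gloss of "prepotent AI technology" (editor's rendering, not ARCHES notation).
% T ranges over AI technologies; H is the set of living humans.
\[
\mathrm{Prepotent}(T)
\;\iff\;
\mathrm{Transformative}(T)
\;\wedge\;
\forall S \subseteq H:\ \neg\,\mathrm{CanStop}(S, T)
\]
% Transformative(T): deploying T changes humanity's habitat on a scale comparable
% to the agricultural or industrial revolution.
% CanStop(S, T): the coalition S, acting together, could halt T's transformation.
```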

Lucas Perry: So, can you distinguish this idea of prepotent AI, because it’s a completely new concept from transformative AI, as you mentioned before, and superintelligence, and why it’s important to you that you introduced this new concept?

Andrew Critch: Yeah. Sure. So let’s say you have an AI system that’s like a door-to-door salesman for solar panels, and it’s just going to cover everyone’s roofs with solar panels for super cheap, and all of the business is going to have solar panels on top, and we’re basically just not going to need fossil fuels anymore. And we’re going to be way more decentralized and independent, and states are going to be less dependent on each other for energy. So, that’s going to change geopolitics. A lot of stuff’s going to change, right?

So, you might say that that was transformative. So, you can have a technology that’s really transformative, but also maybe you can stop it. If everybody agreed to just not answer the door when the door-to-door solar panel robot salesman comes by, then they would stop. So, that’s transformative, but not prepotent. There’s a lot of different ways that you can envision AI being both transformative and unstoppable, in other words, prepotent.

I have three examples that I go to, and we’ve written about those in ARCHES. One is technological autonomy. So if you have a little factory that can make more little factories, and it can do its own science and invent its own new materials to make more robots to do more mining to make more factories, et cetera, you can imagine a process like that getting out of hand someday. Of course, we’re very far away from that today, conceptually, but it might not be very long before we can make robots that make robots that make robots.

Self-sustaining manufacturing like that could build defenses using technology the way humans build defenses against each other. And now suddenly the humans want to stop it, but it has nukes aimed at us, so we can’t. Another one, completely different but related, is replication speed. Like the way a virus can just replicate throughout your body and destroy you without being very smart.

You could envision, though I don’t know how easy it would be to build this, because maybe it’s a question of nanotechnology, systems that just very quickly replicate, and tile the earth so fast with their replicas that we die. Maybe we suffocate from breathing them, or breathing their exhaust. That one honestly seems less plausible to me than the technological autonomy one, but to some people it seems more plausible, and I don’t have a strong position on that.

And then there’s social acumen. You can imagine, say, a centralized AI system that is very socially competent, and it can deliver convincing speeches to the point of getting elected to state office, and then broker deals between nations that make it very hard for anybody to go against its plans, because it’s so embedded and well negotiated with everybody. And when you try to coordinate against it, it just whispers things, or makes threats or offers that dis-coordinate everybody again. Even though everybody wants it to stop, nobody can manage to coordinate long enough to stop it, because it’s so socially skilled. So those are a few science fiction scenarios that I would say constitute prepotence on the part of the AI technology or system. They’re all different, and the interesting thing about them is that they can all happen without general superintelligence. These are conditions that are sufficient to pose a significant existential threat to humanity, but which aren’t superintelligence. And I want to focus on those because I don’t want us to delay addressing existential risk until we have superintelligence. I want us to look at the minimum viable existential threats that we could face and head those off. So that’s why I focus on prepotence as a property rather than superintelligence: it’s a broader category that I think is still quite threatening and quite plausible.

Lucas Perry: Another interesting and important concept born of this is misaligned prepotent AI technology. Can you expand a bit on that? What is, and what should count as, misaligned prepotent AI technology?

Andrew Critch: So this was a tough decision for me, because as you’ve noticed throughout this podcast, at the technical level I find the alignment concept confusing at multi-stakeholder scales, but still critical to think about. And so I couldn’t decide whether to just talk about unsurvivable prepotent AI or misaligned prepotent AI. So let me talk about unsurvivable prepotent AI. By that, I mean it’s transformed the earth, you can’t stop it, and moreover, you’re going to die of it eventually. The AI technology has become unsurvivable in the year 2085 if in that year humanity is doomed and cannot possibly survive. And I thought about naming the central concept unsurvivable prepotent AI, but a lot of people want to say that for them, misalignment is basically unsurvivability.

I think David also tends to think of alignment in a similar way, but there’s this question of where do you draw the line between poorly aligned and misaligned? We just made a decision to say, extinction is the line, but that’s kind of a value judgment. And one of the things I don’t like about the paper is that it has that implicit value judgment. And I think the way I would prefer people to think is in terms of the concept of unsurvivability versus survivability, or prepotence versus not. But the theme of alignment and misalignment is so pervasive that some of our demo readers preferred that name for the unsurvivable prepotent AIs.

Lucas Perry: So misaligned prepotent AI then is just some AI technology that would lead to human extinction?

Andrew Critch: As defined in the report, yep. That’s where we draw the line between aligned and misaligned. If it’s prepotent, it’s having this huge impact. When’s the huge impact definitively misaligned? Well, it’s kind of like where’s the zero line and we just kind of picked extinction to be the line to call misaligned. I think it’s a pretty reasonable line. It’s pretty concrete. And I think a lot of efforts to prevent extinction would also generalize to preventing other big risks. So sometimes, it’s nice to pick a concrete thing and just focus on it.

Lucas Perry: Yeah. I understand why and I think I would probably endorse doing it this way, but it also seems a little bit strange to me that there are futures worse than extinction and they’re going to be below the line. And I guess that’s fine then.

Andrew Critch: That’s why I think unsurvivable is a better word. But some of our demo readers just really preferred misaligned prepotent AI over unsurvivable prepotent AI. So we went with that just to make sense to our readers.

Lucas Perry: Okay. So as we’re building AI technologies, we can ask what counts as the deployment of a prepotent AI system or technology, a transformative AI system, or a misaligned prepotent AI system, and what the implications of such a deployment are. I’m curious to get your view on what counts as the deployment of a prepotent AI system or a misaligned prepotent AI system.

Andrew Critch: So you could imagine something that’s transforming the earth and we can’t stop it, but it’s also great.

Lucas Perry: Yeah. An aligned prepotent AI system.

Andrew Critch: Yeah. Maybe it’s just building a lot of infrastructure around the world to take care of people’s health and education. Some people would find that scary and not like the fact that we can’t stop it, and maybe that fear alone would make it harmful, or maybe it would violate some principle of theirs that would matter even if they didn’t feel the fear. But you can at least imagine, under some value systems, technology that has kind of taken over the world but is taking good care of everybody. And maybe it’s going to take care of everybody forever, so humanity will never go extinct. That’s prepotent but not unsurvivable. But it’s a dangerous move to make on a planet, to make a prepotent thing and try to make sure that it’s an aligned prepotent thing instead of a misaligned prepotent thing, because you’re unstoppably transforming the earth, and maybe you should think a lot before you do that.

Lucas Perry: And maybe prepotence is actually incompatible with alignment if we think about it enough for the reasons that you mentioned.

Andrew Critch: It’s possible. Yeah. Maybe with enough reflection on the value of human autonomy, we would eventually conclude that if humans can’t stop it, it’s fundamentally wrong in a way that will alienate and eventually destroy humans in some way. That said, I do want to add something, which is that I think almost all prepotent AI that we could conceivably make would be unsurvivably misaligned. If you’re transforming the world, most states of the world are not survivable to humans, just like most planets are not survivable to humans. So most ways that the world could be very different are just ways in which humans could not survive. So I think if you have a prepotent AI system, you have to steer it through this narrow window of futures, this narrow keyhole, even, of futures where all the variables of the earth stay habitable to humans, or else we would have to build some space colony where humans live instead of on Earth.

Almost every chemical element, if you just turn up that chemical element on the earth, humans die. So that’s the thing that makes me think most conceivable prepotent AI systems are misaligned or unsurvivable. There are people who think about alignment a lot who I think are super biased by the single principal, single agent framing and have sort of lost track of the complexities of society, and that’s why they think prepotent AI is feasible to align, or not that hard to align, or something. And I think they’re confused, but maybe I’m the confused one and maybe it’s actually easy.

Lucas Perry: Okay. So you’ve mentioned that if you dial the knobs of the chemical composition of almost anything on the planet in any direction, you pretty quickly create hostile or even existentially incompatible conditions on Earth for human beings. So this brings us to the concept of how frail humanity is, given the conditions that are required for us to exist. What is the importance of understanding human frailty in relation to prepotent AI systems?

Andrew Critch: I think it’s pretty simple. I think human frailty implies: don’t make prepotent AI. If we lose control of the knobs, we’re at risk of the knobs getting set wrong. Now, that’s not to say we can set the knobs perfectly either, but if they start to go wrong, we can gradually set them right again. There’s still hope that we’ll stop climate change, right? I’m not saying we will, but it’s at least still possible. We haven’t made it impossible to stop. If every human in the world agreed now to just stop, we would succeed. So we should not lose control of this system, because almost any direction it could head is a disaster. That’s why some people talk about the AI control problem, which is different, I claim, from the AI alignment problem. Even for a single powerful system, you can imagine it looking after you, but not letting you control it.

And if you aim for that and miss, I think it’s a lot more fraught. And I guess the point is that I want to draw attention to human fragility because I know people who think, “No, no, no. The best thing to do for humanity is to build a super powerful machine that just controls the Earth and protects the humans.” I know lots of people who think that. It makes sense logically. It’s like, “Hey, the humans. We might destroy ourselves. Look at this destructive stuff we’re doing. Let’s build something better than us to take care of us.” So I think the reasoning makes sense, but I think it’s a very dangerous thing to aim for because if we aim and miss, we definitely, definitely die.

I think transformative AI is a big enough risk. We should never make prepotent AI. We should not make unstoppable, transformative AI. And that’s why there’s so much talk about the off-switch game or the control problem or whatever. Corrigibility is kind of related to turning things off. Humans have this nice property where if half of them are destroyed and the other half have the ability to notice that and do something about it, they’re quite likely to do something about it. So you get this robustness at a societal scale just by having lots of off switches.

Lucas Perry: So we’ve talked about this concept a bunch already, this concept of delegation. I’m curious if you can explain the importance and relevance of considering delegation of tasks from a human or humans to an AI system or systems. So we’re just going to unpack this taxonomy that you’ve created a bit here of single-single, single-multi, multi-single, and multi-multi.

Andrew Critch: The reason I think delegation is important is that a lot of human society is rightly arranged in a way that prevents absolute power from accumulating in the decisions of any one person, even in the most totalitarian regimes. The concept of delegation is a way that humans hand power and responsibility to each other in political systems, but also in work situations: the boss doesn’t have to do all the work. They delegate out a certain amount of power to allow the employees of a company to do the work. That process of responsibilities and tasks being handed from agent to agent to agent is how a lot of things get done in the world. And there are many things we’ve already delegated to computers.

I think delegation of specific tasks and responsibilities is going to remain important in the future, even as we approach and then surpass human-level AI, because people resist the accumulation of power. If you say, “Hey, I am Alpha Corp. I’m going to make a superintelligent machine now and then use it to make the world good,” you might be able to get a few employees who are wacky enough to think that taking over the world with your machine is the right company mission or whatever. But the winners of the AI development race are going to be big teams that won because they managed to work together and pull off something really hard. And such a large institution is most likely going to have dissident members who don’t think taking over the world is the right plan for what to do with your powerful tech.

Moreover, there are going to be plenty of pressures from outside, even if you did manage to fill a company full of people who want to take over the world. They’re going to know that that’s not a cool thing to do according to most people. So you’re not going to be taking over the world with AI. You’re going to be taking on specific responsibilities or handing off responsibilities. So you’ve got an AI system that’s like, “Hey, we can provide this service. We’ll write your spam messages for you. Okay?” Then that responsibility gets handed off. Perhaps OpenAI would choose not to accept that particular responsibility. But let’s say you want to analyze and summarize a large corpus of text to figure out what people want. Let’s say you get 10,000 customer service emails in a day and you want something to read them and give you a summary of what people really want.

That’s a tremendously useful thing to be able to do. And let’s say OpenAI develops the capability to do that. They’ll sell that as a service, and other companies will benefit from it greatly. And now OpenAI has this responsibility that they didn’t have before. They’re now responsible for helping Microsoft fulfill customer service requests. And if Microsoft sucks at fulfilling those customer service requests, now OpenAI is getting complaints from Microsoft because they summarized the requests wrong. So now you’ve got this really complicated relationship where you’ve got a bunch of Microsoft users sending in lots of emails asking for help, which are being summarized by OpenAI and then handed off to Microsoft developers to prioritize what they do next with their software. And no one is solely responsible for everything that’s happening, because the customer is responsible for what they ask, Microsoft is responsible for what they provide, and OpenAI is responsible for helping Microsoft understand what to provide based on what the customers ask.

Responsibilities get naturally shared out that way, unless somebody comes in with a lot of guns and says, “No, give me all the responsibility and all the power.” So militarization of AI is certainly a way that you could see a massive centralization of power from AI. I think states should avoid militarizing AI, to avoid scaring other states into militarizing AI. We don’t want to live in a world with militarized AI technologies. So if we succeed in heading off that threat, and that’s a big if, then we end up in an economy where responsibilities are being taken on and services are being provided. And then everything’s suddenly very multi-stakeholder: multiple machines servicing multiple people. I think of delegation as an operation that you perform over and over that ends up distributing those responsibilities and services. And I think about how you perform a delegation step correctly. If you can do one delegation step correctly, like when Microsoft makes the decision to hand off its customer service interpretation to OpenAI’s language models, Microsoft needs to make that decision correctly.

And if it makes that decision correctly, where we’ve defined “correctly” correctly, it’ll be part of an overall economy of delegations that is respectful of humanity. So in my opinion, once you head off militarization, the task of ensuring existential safety for humanity boils down to the task of recursively defining delegation procedures that are guaranteed to preserve human existence and welfare over time.

Lucas Perry: And so you see this area of delegation as being the most x-risky.

Andrew Critch: So it’s interesting. I think delegation prevents centralization of power, which prevents one kind of x-risk. And I think we will seek to delegate. We will seek desperately to delegate responsibilities and distribute power as it accumulates.

Lucas Perry: Why would we naturally do that?

Andrew Critch: People fear power.

Lucas Perry: Do we?

Andrew Critch: When people see something with a lot more power than them, they tend to fear it and oppose it. And separately, people fear having power. If you’re on a team that’s like, “Yeah, we’re going to take over the world,” you’re probably going to be like, “Really? Isn’t that bad? Isn’t that supervillain stuff?” So as I predict this, I don’t want to say, “Count on somebody else to adopt this attitude.” I want people listening to adopt that attitude as well. And I both predict and encourage the prevention of extreme concentrations of power from AI development, because society becomes less robust then. It becomes this one point of failure where if this thing messes up, everything is destroyed. Whereas right now, it’s not that easy for a centralized force to destroy the world by messing up. It is easy for decentralized forces to destroy the world right now. And that’s how I think it’ll be in the future as well.

Lucas Perry: And then as you’re mentioning and have mentioned, the diffusion of responsibility is where we risk potentially missing core existential safety issues in AI.

Andrew Critch: Yeah, I think that’s the area that’s not only neglected by present day economic incentives, but will likely remain neglected by economic incentives even 10, 20 years from now. And therefore, will be left as the main source of societal scale and existential risk, yeah.

Lucas Perry: And then in terms of the taxonomy you created, can you briefly define the single and multi and the relationships those can have?

Andrew Critch: When I’m talking about AI delegation, I say single-single to mean single human, single AI system, or single human stakeholder and a single AI system. And I always refer to the number of humans first. So if I say single-multi, that means one human stakeholder, which might be a company or a person, and then multiple AI systems. If I say multi-single, that’s multiple humans, single AI. And then multi-multi means multiple humans, multiple AIs. I started using this in an AGI safety course I was teaching at Berkeley in 2018, because I noticed a lot of equivocation between students about which kind of scenarios they were thinking about. I think there’s a lot of multi-multi delegation work that is going to matter to industry, because when you have a company selling a service to a user to do a job for an employer, things get multi-stakeholder pretty quickly. So I do think some aspects of multi-multi delegation will get addressed in industry, but I think they will be addressed in ways that are not designed to prevent existential risk. They will be addressed in ways that are designed to accrue profits.

Lucas Perry: And so some concepts that you also introduce here are those of control, instruction, and comprehension as being integral to AI delegation. Are those something you want to explore now?

Andrew Critch: Yeah, sure. I mean, those are pretty simple. When you delegate something to someone, say Alice delegates to Bob, in order to make that decision she needs to understand Bob: what’s he capable of? What isn’t he capable of? That’s human/AI comprehension. Do we understand AI well enough to know what we should delegate? Then there’s human/AI instruction. Can Alice explain to Bob what she wants Bob to do? And can Bob understand that? Comprehension is really a conveyance of information from Bob to Alice, and instruction is a conveyance of information from Alice to Bob. A lot of single-single alignment work is focused on how we’re going to convey that information, whereas transparency and interpretability work is more like the Bob-to-Alice direction. And then control is: well, what if this whole idea of communication is wrong and we messed it up, and we now just need to stop it, just take back the delegation? Like, I was counting on my Gmail to send you emails, but now it’s sending you a bunch of spam. I’m going to shut down my account and I’ll send you messages a different way.

That’s control. And I think of any delegation relationship as involving at least those three concepts. There might be other ones that are really important that I’ve left out, but I see a lot of research as serving one of those three nodes. So then you could talk about single-single comprehension: does this person understand this system? Or we can talk about multi-single comprehension: does this team of people understand this system? Multi-single control would be: can this team of people collectively stop or take back the delegation from the system that they’ve been using or counting on? And then it goes to multi-multi and starts to raise questions like: what does it mean for a group of people to understand something? Do they all understand individually? Or do they also have to be able to have a productive meeting about it? Maybe they need to be able to communicate with each other about it too for us to consider it to be a group level understanding. So those questions come up in the definition of multi-multi comprehension, and I think they’re going to be pretty important in the end.
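(For readers who find the naming convention easier to track in code, here is a minimal, illustrative Python sketch of the scenario labels and the three information flows described above. It is not from the ARCHES report; the class and value names are invented purely for illustration.)

```python
from enum import Enum
from dataclasses import dataclass

class Side(Enum):
    SINGLE = "single"  # one human stakeholder, or one AI system
    MULTI = "multi"    # several human stakeholders, or several AI systems

class Flow(Enum):
    COMPREHENSION = "comprehension"  # information flowing from the AI side to the human side
    INSTRUCTION = "instruction"      # information flowing from the human side to the AI side
    CONTROL = "control"              # humans stopping, or taking back, the delegation

@dataclass(frozen=True)
class DelegationScenario:
    humans: Side  # humans are always named first, so "multi-single" means many humans, one AI
    ais: Side

    def label(self, flow: Flow) -> str:
        return f"{self.humans.value}-{self.ais.value} {flow.value}"

# Examples of the terminology used in the conversation:
print(DelegationScenario(Side.SINGLE, Side.SINGLE).label(Flow.COMPREHENSION))
# "single-single comprehension": does one person understand one system?
print(DelegationScenario(Side.MULTI, Side.SINGLE).label(Flow.CONTROL))
# "multi-single control": can a team collectively stop a system it relies on?
```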

Lucas Perry: All right. So we’ve talked a bunch here already about single-single delegation, and much of technical alignment research explores this single human, single AI agent scenario. And that’s done because it’s conceptually simple and is perhaps the simplest place to start. So when we’re thinking about AI existential safety and AI existential risk, how is starting from single-single misleading and potentially not sufficient for deep insight into alignment?

Andrew Critch: Yeah, I guess I’ve said this multiple times in this podcast, how much I think diffusion of responsibility is going to play a role in leaving problems unsolved. And I think diffusion of responsibility only becomes visible in the multi-stakeholder or multi-system or both scenarios. That’s the simple answer.

Lucas Perry: So the single-single gets solved again by the commercial incentives and then the important place to analyze is the multi-multi.

Andrew Critch: Well, I wouldn’t simplify it so much as to say the important place to analyze is the multi-multi, because consider the following. If you build a house out of clay instead of out of wood, it’s going to fall apart more easily. And understanding clay could help you make that global decision. Similarly, if your goal is to eventually produce societally safe multi-multi delegation procedures for AI, you might want to start by studying the clay that that procedure is built out of, which is the single-single delegation steps. And single-single delegation steps require a certain degree of alignment between the delegator and the delegate. So it might be very important to start by figuring out the right building material for that, figuring out the right single-single delegation steps. And I know a lot of people are approaching it that way.

They’re working on single-single delegation, but that’s not because they think Netflix is never going to launch the Netflix challenge to figure out how to align recommender systems with users. It’s because the researchers who care about existential safety want to understand what I would call single-single delegation, but what they would call single-single alignment, as a building block for what will be built next. I think that’s a reasonable position to have. But I think differently, because I think the day that we have super powerful single-single alignment solutions is the day the technology leaves the laboratory and rolls out into the economy. If you have very powerful AI systems that you can’t single-single align, you can’t ship a product, because you can’t get it to do what anybody wants.

So I think single-single alignment solutions shorten the timeline. It’s like deja vu. When everyone was working on AI capabilities, the alignment people were saying, “Hey, we’re going to run out of time to figure out alignment. You’re going to have all of these capabilities and we’re not going to know how to align them. So let’s start thinking ahead about alignment.” I’m saying the same thing about alignment now. I’m saying once you get single-single alignment solutions, now your AI tech is leaving the lab and going into the economy, because you can sell it. And now you’ve run out of time to solve the multipolar scenario problem. So I think there’s a bit of a rush to figure out the multi-stakeholder stuff before the single-single stuff gets all figured out.

Lucas Perry: Okay. So what you’re arguing for here is what you call multi-multi preparedness.

Andrew Critch: Yeah.

Lucas Perry: Would you also like to state what the multiplicity thesis is?

Andrew Critch: Yeah. It’s the thing I just want to remind people of all the time, which is: don’t forget, as soon as you make tech, you copy it, replicate it, modify it. The idea that we’re going to have a single-single system and not very shortly thereafter have other instances of it, or other competitors to it, is a fanciful, unrealistic scenario. I just like reminding people, as we’re preparing for the future, let us prepare for the nearly inevitable eventuality that there will be multiple instances of any powerful technology. Some people take that as an argument that, “No, no, no. Actually, we should make the first instance so powerful that it prevents the creation of any other AI technology by any other actor.” And logically, that’s valid. Politically and socially, I think it’s crazy.

Lucas Perry: Uh-huh (affirmative).

Andrew Critch: I think it’s a good way to alienate anybody that you want to work with on existential risk reduction to say, “Our plan is to take over the world and then save it.” Whereas if your plan is to say, “What principles can all AI technology adhere to, such that it in aggregate will not destroy the world,” you’re not taking over anything. You’re just figuring it out. Like if there’s 10 labs in the world all working on that, I’m not worried about one of them succeeding. But if there’s 10 labs in the world all working on the safe world takeover plan, I’m like, “Hmm, now I’m nervous that one of them will think that they’ve solved safe world takeover or something.” And I kind of want to convert them all to the other thing of safe delegation, safe integration with society.

Lucas Perry: So can you take us through the risk types that you develop in your paper that lead to unsurvivability for humanity from AI systems?

Andrew Critch: Yeah. So there’s a lot of stuff that people worry about. I noticed that some of the things people worry about would directly cause extinction if they happened, and some of them are kind of one degree of causal separation away from that. So I call the former tier one risks in the paper; that refers to things that would directly lead to the deployment of an unsurvivable or misaligned prepotent AI technology. And then tier two risks are risks that lead to tier one risks. So for example, if AI companies or countries are racing really hard to develop AI faster than each other, so much that they’re not taking into account safety to the other countries or companies around them, then you get a disproportionate prioritization of progress over safety. And then you get a higher risk of societal scale disasters, including but not limited to existential risks.

And so you could say fierce competition between AI developers is a tier two risk that leads to the tier one risk of MPAI or UPAI deployment, MPAI being misaligned prepotent AI and UPAI being unsurvivable prepotent AI. For tier one, I have this taxonomy that we use in the paper, which I like, for dividing tier one up into a few different types that I think all call for different technical approaches, because my goal is to orient on technical research problems that could actually help reduce existential risk from AI. So we’ve got this subdivision. The first one is basically diffusion of responsibility, or sometimes we call it unaccountable creators. In the paper, we settled on calling it uncoordinated MPAI deployment.

So the deal is: before talking about whether this or that AI system is doing what its creators want or don’t want, can we even identify who the creators are? If the creators were this kind of diffuse economy or oligarchy of companies or countries, it might not be meaningful to say, “Did the AI system do what its creators wanted?” Because maybe they all wanted different things. So risk type 1A is risks that arise from nobody in particular being responsible for, and therefore no one in particular being attentive to, preventing the existential risk.

Lucas Perry: That’s an uncoordinated MPAI event.

Andrew Critch: Yeah, exactly. I personally think most of the most likely risks come from that category, but they’re hard to define and I don’t know how to solve them yet. I don’t know if anybody does. But suppose we’re not in that case, so it’s not uncoordinated. Now there’s a recognizable, identifiable institution: Alpha Corp made the AI, or America made the AI, or something like that. And now you can start asking, “Okay. If there’s this recognizable creator relationship, did the creators know that they were making a prepotent technology?” And that’s how we define type 1B. We’ve got creators, but the creators didn’t know that the tech they were making was going to be prepotent. Maybe they didn’t realize it was going to be replicated or used as much as it was, or that it was going to be smarter than they thought, for whatever reason. It just ended up affecting the world a lot more than they thought, or being more unstoppable than they thought.

If you make something that’s unstoppably transforming the world, which is what prepotent means, and you didn’t anticipate that, that’s bad. You’re making big waves and you didn’t even think about the direction the waves were going. So I think a lot of risk comes from making tech and not realizing how big its impact is going to be in advance. You could have things becoming prepotent that we weren’t anticipating, and a lot of risk comes from that. That’s a whole risk category, that’s 1B. We need good science and discipline for identifying prepotence, or dependence, or unstoppability, or transformativity, all of these concepts. But suppose that’s solved. Now we go to type 1C. There are creators, contrary to 1A, and the creators knew they were making prepotent tech, contrary to 1B. And I think this is weird, because a lot of people don’t want to make prepotent tech, because it’s super risky, but you could imagine some groups doing it.

If they’re doing that, do they recognize that the thing they’re making is misaligned? Or do they think, “Oh yeah, this is going to take over the world and protect everybody”? This is the “I tried to take over the world and I accidentally destroyed it” scenario. So that’s unrecognized misalignment, or unrecognized unsurvivability, as a category of risk. And for that, you just need a really good theory of alignment with your values if you don’t want to destroy the world. That’s, I think, what gets people focused on single-single alignment. They’re like, “The world’s broken. I want to fix it. I want to make magic AI that will fix the world. It has to do what I want, though. So let’s focus on single-single alignment.” But now suppose that problem is solved: contrary to type 1A, you have discernible creators; contrary to 1B, they know they’re playing with fire; contrary to 1C, they know it’s misaligned. They know fire burns. That’s kind of plausible. If you imagine people messing with dangerous tech in order to figure out how to protect against it, you could have a lab with people brewing up dangerous cyber attack systems that could break out and exercise a lot of social acumen, if they were really powerful language users; then you could imagine something getting out. So that’s what we call type 1D, involuntary MPAI deployment: maybe it breaks out, or maybe hackers break in and release it, but either way, the creators weren’t trying to deploy it. Then you have type 1E, which is, contrary to 1D, the creators wanted to deploy the MPAI.

So that’s just people trying to destroy the world. I think that’s less plausible in the short term, more plausible in the long term.
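(For readers who like to see the decision tree laid out explicitly, here is a small illustrative Python sketch of the 1A through 1E categorization as described in this conversation. The function and label wording are paraphrases for illustration, not text from the report.)

```python
def classify_tier_one(identifiable_creators: bool,
                      knew_it_was_prepotent: bool,
                      knew_it_was_misaligned: bool,
                      deployed_on_purpose: bool) -> str:
    """Walk the exhaustive decision tree for tier one (MPAI deployment) events."""
    if not identifiable_creators:
        return "1A: uncoordinated MPAI deployment (diffusion of responsibility)"
    if not knew_it_was_prepotent:
        return "1B: creators did not recognize the technology's prepotence"
    if not knew_it_was_misaligned:
        return "1C: creators did not recognize the misalignment (unsurvivability)"
    if not deployed_on_purpose:
        return "1D: involuntary MPAI deployment (breakout, theft, or hacking)"
    return "1E: voluntary MPAI deployment"

# Example: a lab that knew it was brewing dangerous, prepotent, misaligned tech,
# but had it escape or get stolen, lands in category 1D.
print(classify_tier_one(True, True, True, False))
```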

Lucas Perry: So all of these fall under the category of tier one in your paper. And so all of these directly lead to an existential catastrophe for humanity. You then have tier two, which are basically hazardous conditions, which lead to the realization of these tier one events. So could you take us through these conditions, which may act as a catalyst for eliciting the creation of tier one events in the world?

Andrew Critch: Yeah. So the nice thing about the tier one events is that we use an exhaustive decision tree for categorizing them. Any tier one event, any deployment event for a misaligned prepotent AI, will fall under one of categories 1A through 1E. Unfortunately, we don’t have such a taxonomy for tier two.

So tier two is just a list of, hey, here are four things that seem pretty worrisome. 2A is companies or countries racing with each other, trying to make AI really fast and not being safe about it. 2B is economic displacement of humans. People talk about unemployment risks from AI. Imagine that taken to an extreme where eventually humans just have no economic leverage at all, because all economic value is being produced by AI systems. AIs have taken all the jobs, including the CEO positions, including the board of directors positions, with everyone using AIs as their delegates to go to the board meetings that are happening every five seconds because of how fast the AIs can have board meetings. Now the humans are just like, “We’re just hoping that all that economy out there is not somehow going to use up all of the oxygen,” say, in the atmosphere, or “lower the temperature of the earth by 30 degrees,” because of how much faster it would be to run supercomputers 30 degrees colder.

I think a lot of people who think about x-risk think of unemployment as this sort of mundane thing: every generation, there’s some wave of unemployment from some tech, and that’s nothing compared to existential risk. But I want to raise a flag here and say, one of those waves of unemployment could be the one that just takes away all human leverage and authority. We should be on the lookout for runaway unemployment that leads to prepotence through loss of control. And then human enfeeblement, that’s 2C: the humans are still around, but getting weaker and dumber and less capable of stuff, because we’re not practicing doing things, because AI is doing everything for us. Then one day we all just trip and fall and hit our heads and die, kind of thing. But more realistically, maybe we just fail to be able to make good decisions about what AI technology is doing, and we fail to notice that we should be pressing the stop buttons everywhere.

Lucas Perry: The fruits of the utopia created by transformative AI are so enticing that we become enfeebled and fail at creating existential safety for advanced AI systems.

Andrew Critch: Or we use the systems in a stupid way because we all got worse at arithmetic, and we couldn’t imagine the risks, and we became scope insensitive to them or something. There are a lot of different ways you can imagine humans just being weaker because AI is sort of helping us. And then type 2D is discourse impairment about existential safety. This is something we saw a lot of in 2014, before FLI hosted the Puerto Rico conference to basically kick off discourse on existential safety for AI and other big risks from AGI. Luckily, since then there have been efforts from FLI, and then the Concrete Problems in AI Safety paper was an early example of acknowledging negative outcomes.

And then you have the ACM push to acknowledge negative risks, and now the NeurIPS broader impact stuff. There’s lots of acknowledgement of negative outcomes now. The discourse around negative outcomes has improved, but I think discourse on existential safety has a long way to go. It’s progressed, but it still has a long way to go. If we keep not being able to talk about it, for example, if we keep having to call existential safety “safety,” because we’re afraid to admit to ourselves or each other that we’re thinking of existential stakes, we’re never really going to properly analyze the concept or visualize the outcomes together. I think there’s a big risk from people sort of feeling like they’re thinking about existential safety, but not really saying it to each other, and not really getting into the details of how society works at a large scale, and therefore kind of ignoring it and making a bunch of bad decisions.

I call that discourse impairment, and it can happen because the topic is taboo, or it can happen because it’s just easier to talk about safety, because safety is everywhere.

Lucas Perry: All right, so we’ve made it through what is essentially the first third of your paper. It lays much of the conceptual and language foundations used for the rest, which tries to more explicitly flesh out the research directions for existential safety in AI systems, correct?

Andrew Critch: Yeah. And I would say the later sections are a survey of research directions attacking different aspects, and possibly exacerbating different aspects too. You earlier called this a research agenda, but I don’t think it’s quite right to call it an agenda, because first of all, I’m not personally planning to research every topic in here, although I would be happy to research any of them. So this is not like, “Here’s the plan, we’re going to do all these areas.” It’s more like, “Here’s a survey of areas and an analysis of how they flow into each other.” For example, single-single transparency research can flow into coordination models for single-multi comprehension. It’s a view rather than a plan, because I think a plan should take into account more things, like what’s neglected and what industry is going to solve on its own.

My plan would be to pick sections out of this report and call those my agenda. My personal plan is to focus more on multi-agent stuff, and also some social metacognition stuff that I’m interested in. So if I wrote a research agenda, it would be about certain areas of this report, but the rest of the report is really just trying to look at all of these areas that I think relate to existential safety and analyze how they relate.

Lucas Perry: All right, Andrew, well, I must say that on page 33, it says, “This report may be viewed as a very coarse description of a very long term research agenda, aiming to understand and improve blah, blah, blah.”

Andrew Critch: It’s true. It may be viewed as such and you may have just viewed it as such.

Lucas Perry: Yeah, I think that’s where I got that language from.

Andrew Critch: It’s true. Yeah, and I think if an institution just picked up this report and said, “This is our agenda.” I’d be like, “Cool, go for it. That’s a great plan.”

Lucas Perry: All right. I’m just getting you back for nailing me on the definition of AI alignment.

Andrew Critch: Okay.

Lucas Perry: Let’s hit on some of the most important or key aspects then for this final part of the paper. We have three questions here. The first is: how would you explain the core of your concerns about, and the importance of, flow-through effects? What are flow-through effects, and why are they important for considering AI existential safety?

Andrew Critch: Flow-through effects just mean that if A affects B and B affects C, then indirectly A affects C. Effects like that can be pretty simple in physics, but they can be pretty complicated in medicine, and they might be even more complicated in research. If you do research on single-single transparency, that’s going to flow through to single-multi instruction. How is a person going to instruct a hierarchy of machines? Can they delegate to machines that delegate to other machines, and still understand what’s going on? There’s a flow-through effect there. Then that’s going to flow through to multi-multi control: how can you have a bunch of people instructing a bunch of machines and still have control over them, if the instructions aren’t being executed to satisfaction, or if they’re going to cause a big risk or something?

And some of those flow-through effects can be good, some of them can be bad. For example, you can imagine work in transparency flowing through to really rapid development in single-multi instruction, because you can understand more of what all the little systems are doing, so you can tell more of them what to do and get more stuff done. Then that could flow through to disasters in multi-multi control, because you’ve got races between powerful institutions that are delegating to large numbers of individual systems they understand separately, but whose interactions at a global scale are not understood by any one institution. So then you get this big cluster of pollution or other problems being caused for humans as a side effect. Just thinking about a problem that’s a sub-problem of the final solution is not always helpful societally, even if it is helpful to you personally in understanding how to approach the societal scale solution. As for my biggest personal areas of interest, I’m kind of split between two things.

One is: if you have a very powerful system and several stakeholders with very different priorities or beliefs trying to decide a policy for that system. Imagine the U.S., China and Russia trying to reach an agreement on some global cybersecurity protocol that’s AI mediated, or Uber and Waymo trying to agree on the principles that their cars are going to follow when they’re doing lane changes. Are they going to try to intimidate each other to get a better chance at the lane change? Is that going to put the humans at risk? Yes, okay. Can we all not intimidate each other, and therefore not put the passengers at risk? That’s a big question for me: how can you make systems that let powerful stakeholders negotiate for control over the system?

It’s like the system is not even deployed yet. We’re considering deploying it, and we’re negotiating over the parameters of the system. I want the system to have a nice API for the negotiating powers, to sort of turn knobs until they’re all satisfied with it. I call that negotiable AI. I’ve got a paper called Negotiable Reinforcement Learning with a student. I think that kind of encapsulates the problem, but it’s not a solution to the problem by any means; it’s merely drawing attention to it. That’s one core thing that I think is going to be really important: multi-stakeholder control. Not multi-stakeholder alignment, not making all the stakeholders happy, but making them work together in sharing the system, which might sometimes leave one of them unhappy. But at least they’re not all fighting and causing disasters from the externalities of their competition. The other one is almost the same principle, but where the negotiation is happening between the AI systems instead of the people.

So how do you get two AI systems, say System A and System B serving Alice and Bob, where Alice and Bob want very different things, to get along? How can A and B broker an agreement about what to do that’s better than fighting? Both of these areas of research are trying to make peace: the first between the human institutions controlling a powerful system, and the second between two AI systems. I don’t know how to do this at all. That’s why I try to focus on it. It’s sort of nobody’s job, except for maybe the UN, and the UN doesn’t have… The cars getting along thing is kind of like a National Institute of Standards thing maybe, or a Partnership on AI thing maybe, so maybe they’ll address that, but it’s still super interesting to me and possibly generalizable to bigger, higher stakes issues.

So I don’t claim that it’s going to be completely neglected as an area. It’s just that it’s very interesting, and at a technical level it seems neglected. I think there’s lots of policy thinking about these issues, but what shape does the technology itself need to have to make it easy for policymakers to set the standards, for it to be negotiable and cooperative? That’s where my interests lie.
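(A toy sketch of what a multi-stakeholder “knob-turning” interface might look like, offered only as an illustration. This is not the method from the Negotiable Reinforcement Learning paper, and all names, weights, and thresholds are invented. One very simple scheme is to let each stakeholder veto options it finds unacceptable, then choose among the survivors by negotiated weights.)

```python
from typing import Callable, Dict, List

Policy = str
Utility = Callable[[Policy], float]

def negotiate_policy(options: List[Policy],
                     utilities: Dict[str, Utility],
                     weights: Dict[str, float],
                     floors: Dict[str, float]) -> Policy:
    """Drop any option that falls below some stakeholder's acceptability floor,
    then maximize the weighted sum of stakeholder utilities over what remains."""
    acceptable = [p for p in options
                  if all(u(p) >= floors[name] for name, u in utilities.items())]
    if not acceptable:
        raise ValueError("No option is acceptable to every stakeholder; keep negotiating.")
    return max(acceptable,
               key=lambda p: sum(w * utilities[name](p) for name, w in weights.items()))

# Hypothetical example: two fleets negotiating a lane-change protocol.
options = ["aggressive", "cooperative", "always-yield"]
utilities = {
    "Uber":  lambda p: {"aggressive": 0.9, "cooperative": 0.7, "always-yield": 0.3}[p],
    "Waymo": lambda p: {"aggressive": 0.2, "cooperative": 0.8, "always-yield": 0.6}[p],
}
weights = {"Uber": 0.5, "Waymo": 0.5}
floors = {"Uber": 0.4, "Waymo": 0.4}  # intimidation-style protocols get vetoed here
print(negotiate_policy(options, utilities, weights, floors))  # -> "cooperative"
```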

Lucas Perry: All right. And so that also matches up with everything else you said, because those are two sub-problems of multi-multi situations.

Andrew Critch: Yes.

Lucas Perry: All right. So the next question is: is there anything else you’d like to add about how thinking about AI research directions affects AI existential risk?

Andrew Critch: I guess I would just add: people need to feel permission to work on things because they need to understand them, rather than because they know it’s going to help the world. I think there’s a lot of paranoia around this. If you manage to care about existential risk, you’re thinking about these high stakes, and it’s easy to become paranoid: what if I accidentally destroy the world by doing the wrong research or something? I don’t think that’s a healthy state for a researcher. Maybe for some it’s healthy, but for a lot of people that I’ve met, it’s not conducive to their productivity.

Lucas Perry: Is that something that you encounter a lot, people who have crippling anxiety over whether the research direction is correct?

Andrew Critch: Yeah, in varying degrees of crippling. Some of it you would actually call anxiety, where the person is experiencing actual anxiety. But more often it’s just a kind of festering unproductivity. It’s thinking of an area and going, “But that’s just going to advance capabilities, so I won’t work on it,” or, “Oh, that’s just going to hasten the economic deployment of AI systems, so I’m not going to work on it.” I do that kind of triage, but more because I want to find neglected areas than because I’m afraid of building the wrong tech or something. I find that mentality doesn’t inhibit my creativity. I want people to be aware of flow-through effects, and that any tech can flow through to have a negative impact they didn’t expect. And because of that, I want everyone to raise their overall vigilance towards AI technology as a whole. But I don’t want people to feel paralyzed, like, “Oh no, what if I invent really good calibration for neural nets? Or what if I invent really good bounded rationality techniques and then accidentally destroy the world because people use them?”

I think what we need is for people to go ahead and do their research, but be aware that x-risk is on the horizon, and start building institutional structures to make higher and higher stakes decisions about AI deployments, along with being supportive of areas of research that are conducive to those decisions being made. I want to encourage people to go into these neglected areas I’m pointing to, but I don’t want people to think I’m saying they’re bad for doing anything else.

Lucas Perry: All right. Well, that’s some good advice for researchers. Let’s wrap up then with the relevant multi-stakeholder objectives. We have four here that we can explore: the first is facilitating collaborative governance, the next is avoiding races by sharing control, then we have reducing idiosyncratic risk taking, and the final one is existential safety systems. Could you take us through each of these and how they are relevant multi-stakeholder objectives?

Andrew Critch: Yeah, sure. So the point of this section of the report is that it’s a pause between the sections about research for single human stakeholders and research for multiple human stakeholders. It’s there to explain why I think it’s important to think about multiple human stakeholders, and important not just in general. I mean, it’s obviously important for a lot of aspects of society, but I’m trying to focus on why it’s important to existential risk specifically.

So the first reason, facilitating collaborative governance, is that I think it’s good if people from different backgrounds with different beliefs and different priorities can work together in governing AI. Say you need to decide on a national standard, or an international standard, or rules that AI is not allowed to break, or that developers are not allowed to break. It’s going to suck if researchers in China make up some rules and researchers in America make up different rules, and the American rules don’t protect against the stuff the Chinese rules protect against, and the Chinese rules don’t protect against the stuff the American rules protect against. Moreover, systems interacting with each other are then not going to be protected against either set of risks.

It’s good to be able to collaborate in governing things. Thinking about systems and technologies as having a lot of stakeholders is key to preparing those technologies in a form that allows them to be collaborated over. Think about Google Docs. I can see your cursor moving when you write in a Google Doc. That’s really informative in a way that other collaborative document editing software does not allow. I don’t know if you’ve ever noticed how informative it is to see where someone’s cursor is, versus using another platform where you can only see the line someone’s on, but you can’t see what character they’re typing right now or what word they’re thinking of. You’re way, way less in tune with each other when you’re writing together and you can’t see the cursors.

That’s an example of a way in which Google Docs just has this extra feature that makes it way easier to negotiate for control, because you don’t get into an edit war. If I’m editing something, I’m not going to put my cursor where your cursor is. Or if I start backspacing a word that you just wrote, you know I must mean it, it must be an important change, because I just interrupted your cursor. Maybe you’re going to let me finish that backspace and see what the hell I’m doing. There’s this negotiability over the content of the document, and it’s a consequence of the design of the interface. I think, similarly, AI technology could be designed with properties that make it easier for different stakeholders to cooperate in the act of exercising control over the system and its priorities. I think that sort of design question is key to facilitating collaborative governance, because you can have stakeholders from different institutions and different cultures collaborating in the act of governing or controlling systems, and observing what principles the systems need to adhere to for the purposes of different cultures or different values, and so on.

Now, why is that important? Well, there are lots of warm fuzzies from people working together and stuff. But one reason it’s important is that it reduces incentives to race. If we can all work together to set the speed limit, we don’t all have to drive as fast as we can to beat each other. That’s section 7.2, avoiding races by sharing control. And then section 7.3 is reducing idiosyncratic risk taking. Basically, everybody kind of wants different things, but there’s a whole bunch of stuff we all don’t want. This kind of comes back to what you said about there being basic human values. Most of us don’t want humanity to go extinct. Most of us don’t want everyone to suffer greatly, but everybody has a different view of what utopia should look like. That’s maybe where the paretotopia concept came from.

It’s like everybody has a different utopia in mind, but nobody wants dystopia. If you imagine a powerful AI technology that might get deployed, and there’s a bunch of people on a committee making the deployment decision or deciding what features it should have, you can imagine one person on the committee being like, “Well, this poses a certain level of societal scale risk, but it’s worth it because of the anti-aging benefits the AI is going to produce through its research. That’s going to be great.” Then another person on the committee is like, “Well, I don’t really care about anti-aging, but I do care about space travel. I want it to take a risk for that.” Then they’re like, “Wait a minute, we have this science assistant AI. We should use it on anti-aging, not space.” And the space travel person’s like, “We should use it on space travel, not anti-aging.”

Because they don’t agree, that slows progress, but a little slower progress is maybe a safer thing for humanity. Everyone has their agenda that they want to risk the world for, but because everyone disagrees about which risks are worth it, you slow down and say, “Maybe collectively we’re just not going to take any of these risks right now, and we’ll wait until we can do it with less risk.” So reducing idiosyncratic risk taking is just my phrase for the way everyone’s individual desire to take risks kind of averages out. Whereas every member of the committee not wanting human extinction doesn’t get washed out: everybody wants it to not destroy the world, whereas not everybody wants it to colonize space or cure aging. You end up conservative on the risk if you can collaboratively govern.
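(A hypothetical numerical illustration of that averaging-out effect under unanimous approval; the committee members, projects, and numbers are all invented for illustration.)

```python
# Each member states the maximum societal-scale risk they would accept for each item.
# Members tolerate real risk only for their own pet project; nobody tolerates extinction.
tolerances = {
    "anti-aging advocate": {"anti-aging": 0.03, "space travel": 0.001, "extinction": 0.0},
    "space advocate":      {"anti-aging": 0.001, "space travel": 0.03, "extinction": 0.0},
}

def committee_tolerance(item: str) -> float:
    # Under unanimous approval, the group's tolerance is the minimum of the individual
    # tolerances, so idiosyncratic appetites get capped by everyone else's indifference,
    # while the shared refusal to risk extinction is preserved.
    return min(member[item] for member in tolerances.values())

for item in ("anti-aging", "space travel", "extinction"):
    print(item, committee_tolerance(item))
# anti-aging 0.001, space travel 0.001, extinction 0.0
```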

Then you’ve got existential safety systems, which is the last thing. We might someday try to build AI tech that actually protects the world in some way, say through cybersecurity or through environmental protection. That’s terrifying, by the way, AI that controls the environment. But it’s also really promising; maybe we can clean up. It’s just a big move, ceding control of the environment to AI systems. But as long as you’ve got lots of off switches, maybe it’s great. Those big moves are scary because of how big they are. A lot of institutions would just never allow it to happen because of how scary it is. It’s like, “All right, I’ve got this garbage cleanup AI that’s going to actually go clean up all the garbage, or it’s going to scrub all the CO2 with this little replicating photosynthetic lab here that’s going to absorb all the carbon dioxide and store it as biofuel. Great.” That’s scary. You’re like, whoa, you’re just unrolling the self-replicating biofuel lab all over the world. People won’t let that happen.

I’m not sure what the right level of risk tolerance is for saving the world versus risking the world. But whatever it is, you are going to want existential safety nets, literal existential safety nets there to protect from big disasters, whether the safety net is just an algorithm that runs on the robots doing whatever crazy world intervention you’re doing, or an actual separate system. But if you’re making a big change to the world for the sake of existential safety, you’re not going to get away with it unless a lot of people are involved in that decision. This is kind of a bid to the people who really do want to make big world interventions, sometimes for the sake of safety: you’re going to have to appeal to a lot of stakeholders to be allowed to do that.

So those are four reasons why I think developing your tech in a way that really is compatible with multiple stakeholders is going to be societally important and not automatically solved by industry standards. Maybe solved in special cases that are profitable, but not necessarily generalizable to these issues.

Lucas Perry: Yeah. The set of problems that are not naturally solved by industry and its incentives, but that are crucial for existential safety, is the set of problems it seems we crucially need to identify, anticipate, and research today, being mindful of flow-through effects, so that we have as much leverage as possible on that set of problems, given that they’re most likely not going to be solved without a lot of foresight and intervention from outside of industry and the normal flow of incentives.

Andrew Critch: Yep, exactly.

Lucas Perry: All right, Andrew wrapping things up. I just want to offer you a final bit of space for you to give any final words you’d like to say about the paper or AI existential risk. If there’s anything you feel is unresolved or you’d really like to communicate to everyone.

Andrew Critch: Yeah, thanks. I'd say if you're interested in existential safety or something adjacent to it, use specific words for what you mean instead of just calling it AI safety all the time. Whatever your thing is, maybe it's not existential safety, maybe it's a societal scale risk or single-multi alignment or something, but try to get more specific about what we're interested in, so that it's easier for newcomers thinking about these topics to know what we mean when we say them.

Lucas Perry: All right. If people want to follow you or get in touch or find your papers and work, where are the best places to do that?

Andrew Critch: For me personally, or David Krueger, the other coauthor on this report, you can just Google our names and our research homepages will show up, and then you can see what our papers are. Obviously Google Scholar is always a good avenue, and Google Scholar sorted by year is a good trick because you can see what people are working on now. There's also the Center for Human Compatible AI, where I work. There's a bunch of other research going on there that I'm not doing but am still very interested in, and I'd probably be interested in doing more research in that vein. I would say check out humancompatible.ai, or acritch.com for me personally. I don't know what David's homepage is, but I'm sure you can find it by Googling David Krueger.

Lucas Perry: All right, Andrew, thanks so much for coming on and for your paper. I feel like I honestly gained a lot of perspective here on the need for clarity about definitions and what we mean. You've given me a better perspective on the kind of problem that we have and the kind of solutions that it might require, and for that I'm grateful.

Andrew Critch: Thanks.

End of recorded material

Iason Gabriel on Foundational Philosophical Questions in AI Alignment

 Topics discussed in this episode include:

  • How moral philosophy and political theory are deeply related to AI alignment
  • The problem of dealing with a plurality of preferences and philosophical views in AI alignment
  • How the is-ought problem and metaethics fit into alignment 
  • What we should be aligning AI systems to
  • The importance of democratic solutions to questions of AI alignment 
  • The long reflection

 

Timestamps: 

0:00 Intro

2:10 Why Iason wrote Artificial Intelligence, Values and Alignment

3:12 What AI alignment is

6:07 The technical and normative aspects of AI alignment

9:11 The normative being dependent on the technical

14:30 Coming up with an appropriate alignment procedure given the is-ought problem

31:15 What systems are subject to an alignment procedure?

39:55 What is it that we’re trying to align AI systems to?

01:02:30 Single agent and multi agent alignment scenarios

01:27:00 What is the procedure for choosing which evaluative model(s) will be used to judge different alignment proposals

01:30:28 The long reflection

01:53:55 Where to follow and contact Iason

 

Citations:

Artificial Intelligence, Values and Alignment 

Iason Gabriel’s Google Scholar

 

We hope that you will continue to join in the conversations by following us or subscribing to our podcasts on Youtube, Spotify, SoundCloud, iTunes, Google Play, Stitcher, iHeartRadio, or your preferred podcast site/application. You can find all the AI Alignment Podcasts here.

You can listen to the podcast above or read the transcript below. 

Lucas Perry: Welcome to the AI Alignment Podcast. I’m Lucas Perry. Today, we have a conversation with Iason Gabriel about a recent paper that he wrote titled Artificial Intelligence, Values and Alignment. This episode primarily explores how moral and political theory are deeply interconnected with the technical side of the AI alignment problem, and important questions related to that interconnection. We get into the problem of dealing with a plurality of preferences and philosophical views, the is-ought problem, metaethics, how political theory can be helpful for resolving disagreements, what it is that we’re trying to align AIs to, the importance of establishing a broadly endorsed procedure and set of principles for alignment, and we end on exploring the long reflection.

This was a very fun and informative episode. Iason has succeeded in bringing new ideas and thought to the space of moral and political thought in AI alignment, and I think you’ll find this episode enjoyable and valuable. If you don’t already follow us, you can subscribe to this podcast on your preferred podcasting platform by searching for The Future of Life or following the links on the page for this podcast.

Iason Gabriel is a Senior Research Scientist at DeepMind where he works in the Ethics Research Team. His research focuses on the applied ethics of artificial intelligence, human rights, and the question of how to align technology with human values. Before joining DeepMind, Iason was a Fellow in Politics at St John’s College, Oxford. He holds a doctorate in Political Theory from the University of Oxford and spent a number of years working for the United Nations in post-conflict environments.

And with that, let’s get into our conversation with Iason Gabriel.

So we’re here today to discuss your paper, Artificial Intelligence, Values and Alignment. To start things off here, I’m interested to know what you found so compelling about the problem of AI values and alignment, and generally, just what this paper is all about.

Iason Gabriel: Yeah. Thank you so much for inviting me, Lucas. So this paper is in broad brush strokes about how we might think about aligning AI systems with human values. And I wrote this paper because I wanted to bring different communities together. So on the one hand, I wanted to show machine learning researchers, that there were some interesting normative questions about the value configuration we align AI with that deserve further attention. At the same time, I was keen to show political and moral philosophers that AI was a subject that provoked real philosophical reflection, and that this is an enterprise that is worthy of their time as well.

Lucas Perry: Let’s pivot into what the problem is then that technical researchers and people interested in normative questions and philosophy can both contribute to. So what is your view then on what the AI problem is? And the two parts you believe it to be composed of.

Iason Gabriel: In broad brush strokes, I understand the challenge of value alignment in a way that’s similar to Stuart Russell. He says that the ultimate aim is to ensure that powerful AI is properly aligned with human values. I think that when we reflect upon this in more detail, it becomes clear that the problem decomposes into two separate parts. The first is the technical challenge of trying to align powerful AI systems with human values. And the second is the normative question of what or whose values we try to align AI systems with.

Lucas Perry: Oftentimes, I also see a lot of reflection on AI policy and AI governance as a core issue to consider here, given that people are concerned about things like race dynamics and unipolar versus multipolar scenarios with regard to something like AGI. What are your thoughts on this? And I'm curious to know why you break the problem down into technical and normative aspects without introducing political or governance issues.

Iason Gabriel: Yeah. So this is a really interesting question, and one that we'll probably discuss at some length later: the role of politics in creating aligned AI systems. Of course, in the paper I suggest that an important challenge for people who are thinking about value alignment is how to reconcile the different views and opinions of people, given that we live in a pluralistic world, and how to come up with a system for aligning AI that treats people fairly despite that difference. In terms of practicalities, I think that people envisage alignment in different ways. Some people imagine that there will be a human parliament or a kind of centralized body that can give very coherent and sound value advice to AI systems, and essentially that the human element will take care of this problem of pluralism and just give AI very, very robust guidance about things we've all agreed are the best thing to do.

At the same time, there’s many other visions for AI or versions of AI that don’t depend upon that human parliament being able to offer such cogent advice. So we might think that there are worlds in which there’s multiple AIs, each of which has a human interlocutor, or we might imagine AIs as working in the world to achieve constructive ends and that it needs to actually be able to perform these value calculations or this value synthesis as part of its kind of default operating procedure. And I think it’s an open question what kind of AI system we’re discussing and that probably the political element understood in terms of real world political institutions will need to be tailored to the vision of AI that we have in question.

Lucas Perry: All right. So can you expand then a bit on the relationship between the technical and normative aspects of AI alignment?

Iason Gabriel: A lot of the focus is on the normative part of the value alignment question, trying to work out which values to align AI systems with, whether those are values that really matter, and how this can be decided. I think this is also relevant when we think about the technical design of AI systems, because most technologies are not value agnostic. Sometimes, when we think about AI systems, we assume that they'll have this general capability and that it will almost be trivially easy for them to align with different moral perspectives or theories. Yet when we take a ground level view and look at the way in which AI systems are being built, there are various path dependencies setting in and different design architectures that will make it easier to follow one moral trajectory rather than another.

So for example, if we take a reinforcement learning paradigm, which focuses on teaching agents tasks by enabling them to maximize reward in the face of uncertainty over time, a number of commentators have suggested that that model fits particularly well with a kind of utilitarian decision theory, which aims to promote happiness over time in the face of uncertainty, and that it would actually struggle to accommodate a moral theory that embodies something like rights or hard constraints. And so I think that if what we do want is a rights based vision of artificial intelligence, it's important that we get that ideal clear in our minds and that we design with that purpose in mind.

This challenge becomes even clearer when we think about moral philosophies, such as a Kantian theory, which would ask an agent to reflect on the reasons that it has for acting, and then ask whether they universalize to good states of affairs. And this idea of using the currency of a reason to conduct moral deliberation would require some advances in terms of how we think about AI, and it’s not something that is very easy to get a handle on from a technical point of view.
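
To make the path-dependency point concrete in a very rough way, here is a small, hypothetical sketch (not from Iason's paper) contrasting a purely reward-maximizing decision rule with one that treats a rights-like prohibition as a hard constraint. The actions, reward estimates, and constraint check are all invented stand-ins.

```python
# Toy illustration: the same decision loop with and without a hard,
# rights-like side constraint. Everything here is an invented stand-in.

ACTIONS = ["answer_honestly", "fabricate_engaging_answer"]

EXPECTED_REWARD = {                      # stand-in for a learned value estimate
    "answer_honestly": 1.0,
    "fabricate_engaging_answer": 1.6,    # scores higher on engagement
}

def violates_constraint(action: str) -> bool:
    """Hypothetical hard constraint, e.g. a prohibition on deception."""
    return action == "fabricate_engaging_answer"

def reward_maximizing_choice() -> str:
    # Pure optimization: pick whatever the value estimate says is best.
    return max(ACTIONS, key=EXPECTED_REWARD.get)

def constrained_choice() -> str:
    # Rights-style variant: impermissible actions are filtered out *before*
    # optimization, rather than merely penalized and traded off.
    permitted = [a for a in ACTIONS if not violates_constraint(a)]
    return max(permitted, key=EXPECTED_REWARD.get)

if __name__ == "__main__":
    print("reward-maximizing agent picks:", reward_maximizing_choice())
    print("constrained agent picks:", constrained_choice())
```

The difference being gestured at is visible in where the prohibition sits: in the first rule it simply never appears and gets traded away by the optimizer, while in the second it filters the action set before any optimization happens.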

Lucas Perry: So the key takeaway here is that what is going to be possible normatively, in terms of moral learning and moral reasoning in AI systems, will supervene upon the technical pathways that we take, and so it's important to be mindful of the relationship between what is possible normatively and what is technically known, and to try to navigate that relationship deliberately?

Iason Gabriel: I think that’s precisely right. I see at least two relationships here. So the first is that if we design without a conception of value in mind, it’s likely that the technology that we build will not be able to accommodate any value constellation. And then the mirror side of that is if we have a clear value constellation in mind, we may be able to develop technologies that can actually implement or realize that ideal more directly and more effectively.

Lucas Perry: Can you make a bit clearer the ways in which, for example, the path dependency of current technical research makes certain normative ethical theories more plausible to instantiate in AI systems than others?

Iason Gabriel: Yeah. So, I should say that obviously there's a wide variety of different methodologies being tried at the present moment, and that, intuitively, they seem to match up well with different kinds of theory. Of course, the reality is that a lot of effort has been spent trying to ensure that AI systems are safe and that they are aligned with human intentions. When it comes to richer goals, such as trying to evidence a specific moral theory, a lot of this is conjecture, because we haven't really tried to build utilitarian or Kantian agents in full. But in terms of the details, with regard to reinforcement learning, we obviously have an optimization-driven process, and there is a whole corpus of moral theories that basically use that kind of decision process to achieve good states of affairs. And we can imagine roughly equating the reward that we train an RL agent on with some metric of subjective happiness, or something like that.

Now, if we were to take a completely different approach, say virtue ethics: virtue ethics is radically contextual, obviously. It says that the right thing to do in any situation is the action that evidences certain qualities of character, and that these qualities can't be expressed through a simple formula that we can maximize for, but actually require a kind of context dependence. So if that's what we want, if we want to build agents that have a virtuous character, we would really need to think about the fundamental architecture in a potentially different way. And that kind of insight has actually been speculatively adopted by people who consider forms of machine learning like inverse reinforcement learning, who imagine that we could present an agent with examples of good behavior and that the agent would then learn from them in a nuanced way, without us ever having to describe in full what the right action was or give it appropriate guidance for every situation.

So, as I said, these really are quite tentative thoughts, but it doesn’t seem at present possible to build an AI system that adapts equally well to whatever moral theory or perspective we believe ought to be promoted or endorsed.
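
As a very rough, invented sketch of the idea behind learning values from demonstrations mentioned above, the snippet below estimates feature weights so that demonstrated behavior scores higher than alternatives. Real inverse reinforcement learning methods (for example, maximum-entropy IRL) are far more involved; the features, trajectories, and numbers here are hypothetical.

```python
import numpy as np

# Crude sketch of learning from demonstrations: infer which behavioral
# features the demonstrator seems to value by comparing demonstrated
# trajectories with alternatives. Features and data are invented.

FEATURES = ["honesty", "kindness", "self_promotion"]

# Each trajectory is summarized by how strongly it expresses each feature.
demonstrations = np.array([
    [0.9, 0.8, 0.1],
    [0.8, 0.9, 0.2],
])
alternatives = np.array([
    [0.2, 0.1, 0.9],
    [0.4, 0.3, 0.8],
])

# Features the demonstrator exhibits more than the alternatives get positive
# weight; this is the feature-matching intuition, not a full IRL method.
weights = demonstrations.mean(axis=0) - alternatives.mean(axis=0)

def score(trajectory: np.ndarray) -> float:
    """Score a new candidate behavior under the inferred weights."""
    return float(weights @ trajectory)

new_option = np.array([0.7, 0.6, 0.3])
print(dict(zip(FEATURES, weights.round(2))))
print("score of new option:", round(score(new_option), 2))
```

The nuance in question lives in how the features themselves are chosen and summarized, which a sketch like this simply assumes away.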

Lucas Perry: Yeah. It does make sense to me that different techniques would be more or less well suited to readily and fully adopting certain normative perspectives and ethical capacities. I guess the part I was getting a little tripped up on is this: I was imagining that if you have an optimizer trained on something like "maximize happiness," then given the massive epistemic difficulty of actually running a utilitarian optimization process that only thinks at the level of happiness, the system, like human beings who are consequentialists, might, through gradient descent or being pushed and nudged from the outside, find virtue ethics and deontological ethics, and those could then be run as part of its world model, making the task of happiness optimization much easier. But I see how it intuitively lines up more obviously with utilitarianism, and how it would be more difficult to get it to find the other things that we care about, like virtue ethics or deontological ethics. Does that make sense?

Iason Gabriel: Yeah. It's a very interesting conjecture that if you set an agent off with the learned goal of trying to maximize human happiness, it would almost by necessity learn to accommodate other moral theories and perspectives. It kind of suggests that there is a core driver animating moral inquiry, namely this idea of collective welfare being realized in a sustainable way. That might be plausible from an evolutionary point of view, but there are also other aspects of morality that don't seem to be built so clearly on what we might even call the pleasure principle. And so I'm not entirely sure that you would actually get to a rights-based morality if you started out from those premises.

Lucas Perry: What are some of these things that don’t line up with this pleasure principle, for example?

Iason Gabriel: I mean, of course, utilitarians have many sophisticated theories about how endeavors to improve total aggregate happiness involve treating people fairly, placing robust side constraints on what you can do to people, and potentially even encompassing other goods, such as animal welfare and the wellbeing of future generations. But I believe the preponderance of opinion is that unless we can say that certain things matter fundamentally, for example human dignity, the wellbeing of future generations, or the value of animal welfare, it is quite hard to build a moral edifice that adequately takes these things into account just through their instrumental relationship with human wellbeing or human happiness so understood.

Lucas Perry: So we have this technical problem of how to build machines that have the capacity to do what we want them to do, and to help us figure out what we would want to want the machines to do. An important problem that comes in here is Hume's is-ought distinction. On one hand we have facts about the world, is statements, and we can even have is statements about people's preferences and meta-preferences and the collective state of all normative and metaethical views on the planet at a given time. On the other hand there is ought, a normative claim synonymous with should, which is kind of the basis of morality. There's a tension between what assumptions we might need to get morality off the ground, how we should interact with a world of facts and a world of norms, and how they may or may not relate to each other for creating something like a science of wellbeing. So how do you think about coming up with an appropriate alignment procedure, given that it depends on how one answers this distinction?

Iason Gabriel: Yeah, so that’s a fascinating question. So I think that the is-ought distinction is quite fundamental and it helps us answer one important query, which is whether it’s possible to solve the value alignment question simply through an empirical investigation of people’s existing beliefs and practices. And if you take the is-ought distinction seriously, it suggests that no matter what we can infer from studies of what is already the case, so what people happen to prefer or happen to be doing, we still have a further question, which is should that perspective be endorsed? Is it actually the right thing to do? And so there’s always this critical gap. It’s a space for moral reflection and moral introspection and a place in which error can arise. So we might even think that if we studied all the global beliefs of different people and found that they agreed upon certain axioms or moral properties that we could still ask, are they correct about those things? And if we look at historical beliefs, we might think that there was actually a global consensus on moral beliefs or values that turned out to be mistaken.

So I think that these endeavors to kind of synthesize moral beliefs to understand them properly are very, very valuable resources for moral theorizing. It’s hard to think where else we would begin, but ultimately, we do need to ask these questions about value more directly and ask whether we think that the final elucidation of an idea is something that ought to be promoted.

So in sum, it has a number of consequences, but I think one of them is that we do need to maintain a space for normative inquiry and value alignment can’t just be addressed through an empirical social scientific perspective.

Lucas Perry: Right, because one’s own perspective on the is-ought distinction and whether and how it is valid will change how one goes about learning and evolving normative and meta-ethical thinking.

Iason Gabriel: Yeah. Perhaps at this point an example will be helpful. Suppose we're trying to train a virtuous agent that has these characteristics of treating people fairly, demonstrating humility, wisdom, and things of that nature. Suppose we can't specify these upfront, so we need a training set; we need to present the agent with examples of what people believe evidences these characteristics. We still have the normative question of what goes into that data set and how we decide. So the evaluative questions get passed on to that stage. Of course, we've seen many examples of data sets being poorly curated and containing bias that then transmutes onto the AI system. We either need data that's curated so that it meets independent moral standards, which the AI then learns from, or we need a moral ideal that is freestanding in some sense and that AI can be built to align with.
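
As an invented illustration of where the normative question reappears in data curation, the sketch below audits a hypothetical training set of "approved behavior" for disagreement between annotator groups before anything is learned from it. The field names and examples are made up.

```python
from collections import Counter

# Pre-training audit sketch: before a "virtuous agent" learns from examples
# of approved behavior, check how approval varies across annotator groups.
# Field names and data are hypothetical.

examples = [
    {"behavior": "shares credit", "approved": True,  "annotator_region": "A"},
    {"behavior": "shades truth",  "approved": True,  "annotator_region": "A"},
    {"behavior": "shares credit", "approved": True,  "annotator_region": "B"},
    {"behavior": "shades truth",  "approved": False, "annotator_region": "B"},
]

def approval_rate_by_region(data):
    totals, approvals = Counter(), Counter()
    for ex in data:
        totals[ex["annotator_region"]] += 1
        approvals[ex["annotator_region"]] += int(ex["approved"])
    return {region: approvals[region] / totals[region] for region in totals}

# Large disagreement between groups flags the question that curation alone
# cannot settle: whose standard of "virtue" goes into the training set?
print(approval_rate_by_region(examples))
```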

Lucas Perry: Let’s try and make that even more concrete because I think this is a really interesting and important problem about why the technical aspect is deeply related with philosophical thinking about this is-ought problem. So a highest level of abstraction, like starting with axioms around here, if we have is statements about datasets, and so data sets are just information about the world, the data sets are the is statements, we can put whatever is statements into a machine and the machine can take the shape of those values already embedded and codified in the world in people’s minds or in our artifacts and culture. And then the ought question, as you said, is what information in the world should we use? And to understand what information we should use requires some initial principle, some set of axioms that bridges the is-ought gap.

So for example, the kind of move that I think Sam Harris tries to make is this axiom: we should avoid the worst possible misery for everyone. You may or may not agree with that axiom, but it is a starting point for how one might bridge the is-ought gap, so as to be able to select which data is better than other data, or which data we should load onto AI systems. So I'm curious to know, how do you think about this very fundamental level of the initial axiom or axioms that are meant to bridge this distinction?

Iason Gabriel: I think that when it comes to these questions of value, we could try and build up from the kind of very, very minimalist assumptions that it sounds like Sam Harris is defending. We could also start with richer conceptions of value that seem to have some measure of widespread assent and reflective endorsement. So, for example, the idea that human life matters, or that sentient life matters, that it has value and hence that suffering is bad, is a really important component of that. I think that conceptions of fairness, of what people deserve in light of that equal moral standing, are also an important part of the moral content of building an aligned AI system. And I would tend to try and be inclusive in terms of the values that we canvass.

So I don’t think that we actually need to take this very defensive posture. I think we can think expansively about the conception and nature of the good that we want to promote and that we can actually have meaningful discussions and debate about that so we can put forward reasons for defending one set of propositions in comparison with another.

Lucas Perry: We can have epistemic humility here, given the history of moral catastrophes, how morality continues to improve and change over time, and the fact that surely we do not sit at a peak of moral enlightenment in 2020. Given that humility, we can cast a wide net around many different principles so that we don't lock ourselves into anything and can endorse a broad notion of good. That seems safer, but perhaps it has costs of its own, in being more permissive toward a wide range of moral views that may not be correct.

Iason Gabriel: I think that’s, broadly speaking, correct. We definitely shouldn’t tether artificial intelligence too narrowly to the morality of the present moment, given that we may and probably are making moral mistakes of one kind or another. And I think that this thing that you spoke about, a kind of global conversation about value, is exactly right. I mean, if we take insights from political theory seriously, then the philosopher, John Rawls, suggests that a fundamental element of the present human condition is what he calls the fact of reasonable pluralism, which means that when people are not coerced and when they’re able to deliberate freely, they will come to different conclusions about what ultimately has moral value and how we should characterize ought statements, at least when they apply to our own personal lives.

So if we start from that premise, we can then think about AI as a shared project and ask this question: given that we do need values in the equation, and that we can't just run some kind of descriptive enterprise and have that tell us what kind of system to build, what kind of arrangement adequately factors in people's different views and perspectives? What would a solution to value alignment built upon the relevant kind of consensus look like, one that allows us to realize a system that can reconcile these different moral perspectives and that takes a variety of different values and synthesizes them into a scheme that we would all like?

Lucas Perry: I'm broadly interested in introducing a little more of the debate and the different conceptions around the is-ought problem, because some people take it very seriously and others try to minimize it or are skeptical that it does the kind of philosophical work many people think it's doing. For example, Sam Harris is a big skeptic of the kind of work that the is-ought problem is supposed to be doing. And on this podcast we've had people on who are, for example, realists about consciousness, and there's just a very interesting and broad range of views about value that inform the is-ought problem. If one is a realist about consciousness and thinks that suffering is the intrinsic valence carrier of disvalue in the universe, and that joy is the intrinsic valence carrier of wellbeing, one can have different views on how that translates into normative ethics and morality, depending on one's view of the is-ought problem.

So, for example, if we take that kind of metaphysical view about consciousness seriously, and we also take the is-ought problem seriously, then even though there are actually bad things in the world, like suffering, and those things really are bad, it would still require some kind of axiom to bridge the is-ought distinction, something like: because pain is bad, we ought to avoid it. That's interesting and important, and it's a question at the core of unifying ethics with all of our endeavors in life. Whereas if you don't take the is-ought problem seriously, then you can just say: because I understand the way the world is, by the very nature of being a sentient being and understanding the nature of suffering, there's no further question about the kind of navigation problem I have. In the very long term, the answer to how one might resolve the is-ought problem would potentially be a way of unifying all of knowledge and endeavor. All the empirical sciences would be unified conceptually with the normative, right? And then there are no more conceptual issues.

So, I think I’m just trying to illustrate the power of this problem and distinction, it seems.

Iason Gabriel: It’s a very interesting set of ideas. To my mind, these kinds of arguments about the intrinsic badness of pain, or kind of naturalistic moral arguments, are very strong ways of arguing, against, say, moral relativist or moral nihilist, but they don’t necessarily circumvent the is-ought distinction. Because, for example, the claim that pain is bad is referring to a normative property. So if you say pain is bad, therefore, it shouldn’t be promoted, but that’s completely compatible with believing that we can’t deduce moral arguments from purely descriptive premises. So I don’t really believe that the is-ought distinction is a problem. I think that it’s always possible to make arguments about values and that, that’s precisely what we should be doing. And that the fact that, that needs to be conjoined with empirical data in order to then arrive at sensible judgments and practical reason about what ought to be done is a really satisfactory state of affairs.

I think one interesting aspect of the vision you put forward is this idea of a unified moral theory that everyone agrees with. It touches upon a number of arguments that I make in the paper, where I juxtapose two slightly stylized descriptions of solutions to the value alignment challenge. The first is the approach that I term the true moral theory approach, which holds that we need a period of prolonged reflection in which we reflect fundamentally on these questions about pain and perhaps other very deep normative questions. The idea is that by using tools from moral philosophy, eventually, although we haven't done it yet, we may identify a true moral theory. And then it's a relatively simple task… well, not simple from a technical point of view, but simple from a normative point of view, of aligning AI, maybe even AGI, with that theory, and we've basically solved the value alignment problem.

So in the paper, I argue against that view quite strongly for a number of reasons. The first is that I’m not sure how we would ever know that we’d identified this true moral theory. Of course, many people throughout history have thought that they’ve discovered this thing and often gone on to do profoundly unethical things to other people. And I’m not sure how, even after a prolonged period of time, we would actually have confidence that we had arrived at the really true thing and that we couldn’t still ask the question, am I right?

But even putting that to one side, suppose that I had not just confidence but justified confidence that I really had stumbled upon the true moral theory. Perhaps with the help of AI, I could look at how it plays out in a number of different circumstances, see that it doesn't lead to the kinds of weird, anomalous situations that most existing moral theories point towards, and so become really confident that it's a good one. We still have this question of what happens when we need to persuade other people that we've found the true moral theory, and whether that is a further condition on an acceptable solution to the value alignment problem. In the paper, I say that it is a further condition that needs to be satisfied, because merely knowing, well, supposedly having justified belief in a true moral theory, doesn't necessarily give you the right to impose that view upon other people, particularly if you're building a very powerful technology that has world-shaping properties.

And if we return to this idea of reasonable pluralism that I spoke about earlier, essentially the core claim is that unless we coerce people, we can't get to a situation where everyone agrees on matters of morality. We could flip it around: it might be that someone already has the true moral theory out there in the world today and that we're the people who refuse to accept it for different reasons. I think the question then is how we believe other people should be treated by the possessor of the theory, or how we believe that person should treat us.

Now, one view, which in political philosophy is often attributed to Jean-Jacques Rousseau, is that if you have this really good theory, you're justified in coercing other people to live by it. He says that people should be forced to be free when they're not willing to accept the truth of the moral theory. Of course, that's something that has come in for fierce criticism. My own perspective is that we need to try and minimize this challenge of value imposition for powerful technologies, because it becomes a form of domination. So the question is how we can solve the value alignment problem in a way that avoids this challenge of domination. And in that regard, we really do need tools from political philosophy, which, particularly within the liberal tradition, has tried to answer the question of how we can all live together on reasonable terms that preserve everyone's capacity to flourish, despite the fact that we have variation in what we ultimately believe to be just, true, and right.

Lucas Perry: So, to bring things back to where we're at today and how things are actually going to start changing in the real world as we move forward: what do you view as the kinds of systems that would be, and are, subject to something like an alignment procedure? Does this start with systems that we currently have today? Does it start with systems soon to come? Should it already have been done with systems we have today, but we failed to do so? What is your perspective on that?

Iason Gabriel: To my mind, the challenge of value alignment is one that exists for the vast majority of technologies, if not all of them. And it's one that's becoming more pronounced as these technologies demonstrate higher levels of complexity and autonomy. So, for example, I believe that many existing machine learning systems encounter this challenge quite forcefully, and that we can ask meaningful questions about it. In a previous discussion, we may have had this example of a recommendation system come to light. Even if we think of something that seems really quite prosaic, say a recommendation system for what films to watch or what content to provide to you, I think the value alignment question actually looms large, because it could be designed to do very different things. On the one hand, we might have a recommendation system that's geared around your current first order preferences. It might continuously give you really stimulating, really fun, low quality content that keeps you hooked to the system with a high level of subjective wellbeing, but that perhaps isn't optimal in other regards. Then we can think about other possible goals for alignment.

So we might say that actually these systems should be built to serve your second order desires. Those are desires that, in philosophy, we would say people reflectively endorse; they're desires about the person you want to be. If we were to build a recommendation system with that goal in mind, it might be that instead of watching this kind of cheap and cheerful content, I decide that I'd actually like to be quite a highbrow person, so it starts tacitly providing me with more arthouse recommendations. But even that doesn't exhaust the options. It might be that the system shouldn't really just be trying to satisfy my preferences, but should actually be trying to steer me in the direction of knowledge and things that are in my interest to know. So it might try to help me acquire new skills that I need, or recommend, I don't know, cooking or self-improvement programs.

That would be a system geared toward my own interests. But even that doesn't give us a complete portfolio of options. Maybe what we want is a morally aligned system that actually enhances our capacity for moral decision making, and perhaps that would lead us somewhere completely different. Instead of giving us the content that we want, it might lead us to content that gets us to engage with challenging moral questions, such as factory farming or climate change. So value alignment arises quite early on. This is, of course, on the assumption that the recommendation system is geared to promote your interests or wellbeing or preferences or moral sensibility. There's also the question of whether it's really promoting your goals and aspirations or someone else's, and in science and technology studies there's a big area called value sensitive design, which essentially says that we need to consult people and have these almost democratic discussions early on about the kinds of values we want to embody in systems.

And then we design with that goal in mind. So recommendation systems are one thing. If we look at public institutions, say a criminal justice system, there we have a lot of public debate and discussion about the values that would make a system like that fair. The challenge then is to work out whether there is a technical approximation of these values that satisfactorily realizes them in a way that conduces to some vision of the public good. So, in sum, I think that value alignment challenges exist everywhere, and they become more pronounced as these technologies become more autonomous and more powerful. As they have more profound effects on our lives, the burden of justification in terms of the moral standards being met becomes more exacting, and the kind of justification we can give for the design of a technology becomes more important.
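
As a rough, invented illustration of how these different alignment targets change a recommender's behavior, the toy scorer below ranks the same catalogue under a first order preference objective and under a reflectively endorsed, second order objective. The items, attributes, and numbers are hypothetical.

```python
# Toy recommender sketch: the same catalogue ranked under two different
# alignment targets. Items, attributes, and scores are invented.

catalogue = [
    {"title": "cat video marathon", "fun_now": 0.95, "endorsed_on_reflection": 0.20},
    {"title": "arthouse film",      "fun_now": 0.55, "endorsed_on_reflection": 0.80},
    {"title": "cooking course",     "fun_now": 0.45, "endorsed_on_reflection": 0.90},
]

def rank(items, key):
    """Return titles ordered from most to least recommended."""
    return [item["title"] for item in sorted(items, key=key, reverse=True)]

# First order target: what grabs me right now.
print("first order ranking: ", rank(catalogue, key=lambda i: i["fun_now"]))

# Second order target: what I would reflectively endorse wanting.
print("second order ranking:", rank(catalogue, key=lambda i: i["endorsed_on_reflection"]))
```

An interest-based or morality-based target would be yet another scoring rule, which is exactly the sense in which the value alignment question arises before any technical work on the recommender itself.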

Lucas Perry: I guess, to bring this back to things that exist today: something like YouTube or Facebook is a very rudimentary, very basic first order preference satisfier. I mean, imagine all of the human life years that have been wasted mindlessly consuming content that's not actually good for us. Then imagine some kind of enlightened version of YouTube that knows enough about what is good, about yourself and what you would reflectively and ideally endorse, about the kind of person you wish you could be and would be if only you knew better, and about how to get there. Consider the difference between those two systems, where the first is just giving you all the best cat videos in the world and the second is turning you into the person you always wished you could have been. I think this clearly demonstrates that even systems that seem mundane could be serving us in much deeper ways and at much deeper levels, and that even when they superficially serve us, they may be doing harm.

Iason Gabriel: Yeah, I think that’s a really profound observation. I mean, when we really look at the full scope of value or the full picture of the kinds of values we could seek to realize when designing technologies and incorporating them into our lives, often there’s a radically expansive picture that emerges. And this touches upon a kind of taxonomic distinction that I introduce in the paper between minimalist and maximalist conceptions of value alignment. So when we think about AI alignment questions, the minimalist says we have to avoid very bad outcomes. So it’s important to build safe systems. And then we just need them to reside within some space of value that isn’t extremely negative and could take a number of different constellations. Whereas the maximalist says, “Well, let’s actually try and design the very best version of these technologies from a moral point of view, from a human point of view.”

And they say that even if we design safe technologies, we could still be leaving a lot of value out there on the table. So a technology could be safe, but still not that good for you or that good for the world. And let’s aim to populate that space with more positive and richer visions of the future. And then try to realize those through the technologies that we’re building. As we want to realize richer visions of human flourishing, it becomes more important that it isn’t just a personal goal or vision, but it’s one that is collectively endorsed, has been reflected upon and is justifiable from a variety of different points of view.

Lucas Perry: Right. And I guess it's also interesting and valuable to reflect briefly on how there is already, in each society, a place where we draw the line at value imposition. We have these principles which we've broadly agreed upon, but we're not going to let Ted Bundy do what Ted Bundy wants to do.

Iason Gabriel: That’s exactly right. So we have hard constraints, some of which are kind of set in law. And clearly those are constraints that these are just laws. So the AI systems need to respect. There’s also a huge possible space of better outcomes that are left open. Once we look at where moral constraints are placed and where they reside. I think that the Ted Bundy example is interesting because it also shows that we need to discount the preferences and desires of certain people.

One vision of AI alignment says that what we need is basically a global preference aggregation system, but in reality there are a lot of preferences that just shouldn't be counted in the first place, because they're unethical or misinformed. So again, to my mind that pushes us in the direction of a conversation about value itself. Once we know what the principled basis for alignment is, we can then properly adjudicate cases like that and work out what counts as valid input for an aligned system and what we need to discount if we want to realize good moral outcomes.
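
A minimal, invented sketch of this point about aggregation: some preferences are screened out as inadmissible before any tallying happens, and the normative work is hidden inside the screening rule. The data and rules below are made up for illustration.

```python
# Sketch of why alignment cannot be bare preference aggregation: inadmissible
# preferences are excluded before tallying. Data and rules are invented.

preferences = [
    {"agent": "A", "option": "expand_healthcare", "weight": 1.0, "harms_others": False, "informed": True},
    {"agent": "B", "option": "expand_healthcare", "weight": 0.7, "harms_others": False, "informed": True},
    {"agent": "C", "option": "harm_out_group",    "weight": 1.0, "harms_others": True,  "informed": True},
    {"agent": "D", "option": "reject_medicine",   "weight": 0.9, "harms_others": False, "informed": False},
]

def admissible(pref) -> bool:
    # The value alignment question lives in this predicate: deciding what
    # counts as a valid input is itself a normative judgment.
    return not pref["harms_others"] and pref["informed"]

def aggregate(prefs):
    tally = {}
    for p in prefs:
        if admissible(p):
            tally[p["option"]] = tally.get(p["option"], 0.0) + p["weight"]
    return max(tally, key=tally.get) if tally else None

print("aggregated choice:", aggregate(preferences))
```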

Lucas Perry: I’m not going to try and pin you down too hard on that because there’s the tension here, of course, between the importance of liberalism, not coercing value judgments on anyone, but then also being like, “Well, we actually have to do it in some places.” And that line is a scary one to move in either direction. So, I want to explore more now the different understandings of what it is that we’re trying to align AI systems to. So broadly people and I use a lot of different words here without perhaps being super specific about what we mean, people talk about values and intentions and idealized preferences and things of this nature. So can you be a little bit more specific here about what you take to be the goal of AI alignment, the goal of it being, what is it that we’re trying to align systems to?

Iason Gabriel: Yeah, absolutely. We've touched upon some of these questions already, tacitly, in the preceding discussion. In the paper, I argue that when we talk about value alignment, this idea of value is often a placeholder for quite different ideas, as you said. And I actually present a taxonomy of options that I can take us through in a fairly thrifty way. I think the starting point for creating aligned AI systems is the idea that we want AI that's able to follow our instructions, but that has a number of shortcomings, which Stuart Russell and others have documented, and which tend to center around this challenge of excessive literalism. If an AI system literally does what we ask it to, without an understanding of context, side constraints, and nuance, often this will lead to problematic outcomes, with the story of King Midas being the classic cautionary tale: he wishes that everything he touches turns to gold, everything does turn to gold, and then you have a disaster of one kind or another.

So of course, instructions are not sufficient. What you really want is AI that's aligned with the underlying intention. I think that often on this podcast, people have talked about intention alignment as an important goal for AI systems, and I think it is precisely right to dedicate a lot of technical effort to closing the gap between a kind of idiot savant AI, which just follows instructions in this dumb way, and the kind of more nuanced, intelligent AI that can follow an intention. But we might wonder whether aligning AI with an individual or collective intention is actually sufficient to get us to the really good outcomes, the kind of maximalist outcomes that I'm talking about. And I think there are a number of reasons why that might not be the case. To start with, just because an AI can follow an intention doesn't say anything about the quality of the intention that's being followed.

We can form intentions, on an individual or collective basis, to do all kinds of things, some of which might be incredibly foolish or malicious, some self-harming, some unethical. We've got to ask whether we want AI to follow us down that path when we come up with schemes of that kind, and there are various ways we might try to address that bundle of problems. I think intentions are also problematic from a kind of technical and phenomenological perspective, because they tend to be incomplete. If we look at what an intention is, it's roughly speaking a partially filled out plan of action that commits us to some end. And if we imagine that AI systems are very powerful, they may encounter situations or dilemmas or option sets that are in this space of uncertainty, where it's just not clear what the original intention was, and they might need to make the right kind of decision by default.

So they might need some intuitive understanding of what the right thing to do is. My intuition is that we do want AI systems that have some kind of richer understanding of the goals we would want to realize as a whole, so I think we do need to look at other options. It's also possible that we form an intention for the AI to do something that explicitly requires an understanding of morality. We may ask it to do things like promote the greatest good in a way that is fundamentally ethical. Then it needs to step into this other terrain of understanding preferences, interests, and values, and I think we need to explore that terrain for one reason or another. Of course, one thing that people talk about is learning from revealed preferences. Perhaps, in addition to the things that we directly communicate, the AI could observe our behavior and make inferences about what we want that help fill in the gaps.

So maybe it could watch you in your public life, hopefully not your private life, and make these inferences about what would actually be the very good thing to do. That is in the domain of trying to learn from the things it observes. But I think preferences are also quite a worrying data point for AI alignment, at least revealed preferences, because they contain many of the same weaknesses and shortcomings that we can ascribe to individual intentions.

Lucas Perry: What is a revealed intention again?

Iason Gabriel: Sorry, revealed preferences are preferences that are revealed through your behavior. So I observe you choosing A or B, and from that choice I conclude that you have a deeper preference for the thing that you chose. And the question is, if we just watch people, can we learn all the background information we need to create ethical outcomes?

Lucas Perry: Yeah. Absolutely not.

Iason Gabriel: Yeah, exactly. As your Ted Bundy example nicely illustrated, not only is it very hard to actually get useful information about what people want from observing them, but what they want can often be the wrong kind of thing for them or for other people.
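
To make the thinness of the inference concrete, here is a minimal, invented sketch of revealed-preference reasoning: it recovers what was chosen most often and nothing about whether the choice was informed or good for the person.

```python
from collections import Counter

# Minimal revealed-preference sketch: infer a "preference" from a log of
# observed choices. The log is invented; the point is what the inference
# leaves out, namely whether the behavior is informed or good for the person.

observed_choices = [
    "doomscrolling", "doomscrolling", "exercise",
    "doomscrolling", "junk_food", "doomscrolling",
]

inferred_preference = Counter(observed_choices).most_common(1)[0][0]
print("inferred 'preference':", inferred_preference)  # -> doomscrolling
```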

Lucas Perry: Yeah. I have to hire people to spend some hours with me every week to tell me from the outside how I may be acting in ways that are misinformed or self-harming. So instead of revealed preferences, we need something like rational or informed preferences, which is something you get through therapy or counseling or something like that.

Iason Gabriel: Well, that’s an interesting perspective. I guess there’s a lot of different theories about how we get to ideal preferences, but the idea is that we don’t want to just respond to what people are in practice doing. We want to give them the sort of thing that they would aspire to if they were rational and informed at the very least. So not things that are just a result of mistaken reasoning or poor quality information. And then this very interesting, philosophical and psychological question about what the content of those ideal preferences are. And particularly what happens when you think about people being properly rational. So, to return to David Hume, who often the is-ought distinction is attributed to, he has the conjecture that someone can be fully informed and rational and still desire pretty much anything at the end of the day, that they could want something hugely destructive for themselves or other people, of course, Kantians.

Kantians, and in fact a lot of moral philosophers, believe that rationality is not just a process of joining up beliefs and value statements in a certain fashion, but that it also encompasses a substantive capacity to evaluate ends. Obviously, Kantians have a theory about rationality ultimately requiring you to reflect on your ends and ask whether they universalize in a positive way. But the thing is, that's highly, highly contested. So ultimately, if we say we want to align AI with people's ideal and rational preferences, it leads us into this question of what rationality really means, and we don't necessarily get the kind of answers that we want.

Lucas Perry: Yeah, that’s a really interesting and important thing. I’ve never actually considered that. For example, someone who might be a moral anti-realist would probably be more partial to the view that rationality is just about linking up beliefs and epistemics and decision theory with goals and goals are something that you’re just given and embedded with. And that there isn’t some correct evaluative procedure for analyzing goals beyond whatever meta preferences you’ve already inherited. Whereas a realist might say something like, the other view where rationality is about beliefs and ends, but also about perhaps more concrete standard method for evaluating which ends are good ends. Is that the way you view it?

Iason Gabriel: Yeah, I think that’s a very nice summary. The people who believe in substantive rationality tend to be people with a more realist, moral disposition. If you’re profoundly anti-realist, you basically think that you have to stop talking in the currency of reasons. So you can’t tell people they have a reason not to act in a kind of unpleasant way to each other, or even to do really heinous things. You have to say to them, something different like, “Wouldn’t it be nice if we could realize this positive state of affairs?” And I think ultimately we can get to views about value alignment that satisfy these two different groups. We can create aspirations that are well-reasoned from different points of view and also create scenarios that meet the kind of, “Wouldn’t it be nice criteria.” But I think it isn’t going to happen if we just double down on this question of whether rationality ultimately leads to a single set of ends or a plurality of ends, or no consensus whatsoever.

Lucas Perry: All right. That’s quite interesting. Not only do we have difficult and interesting philosophical ground in ethics, but also in rationality and how these are interrelated.

Iason Gabriel: Absolutely, I think they're very closely related. The problems we encounter in one domain we also encounter in the other, and I'd say that, in my lexicon, they all fall within this question of practical rationality and practical reason. That's deliberating about what we ought to do, either because of explicitly moral considerations or because of a variety of other things that we factor into judgments of that kind.

Lucas Perry: All right. Two more on our list here to hit: interests and values.

Iason Gabriel: So, I think there are one or two more things we could say about that. If we think that one of the challenges with ideal preferences is that they lead us into this heavily contested space about what rationality truly requires, we might think that a conception of human interests does significantly better. If we think about AI being designed to promote human interests or wellbeing or flourishing, I would suggest that, as a matter of empirical fact, there's significantly less disagreement about what that entails. If we look at, say, the capability based approach that Amartya Sen and Martha Nussbaum have developed, it essentially says that there are a number of key goods and aspects of human flourishing that the vast majority of people believe conduce to a good life, and that actually has some intercultural value and affirmation. So imagine we designed AI that bore in mind this goal of enhancing general human capabilities.

So, human freedom, physical security, emotional security, capability: that looks like an AI that is, roughly speaking, both getting us into the space of something that looks like it's unlocking real value and also not bogged down in a huge amount of metaphysical contention. I suggest that aligning AI with human interests or wellbeing is a good proximate goal when it comes to value alignment. But even then, I think there are some important things that are missing, which can only really be captured if we return to the idea of value itself.

By this point, it looks like we have almost arrived at a kind of utilitarian AI via the back door. Of course, utility as a subjective mental state isn't necessarily the same as someone's interests or their capacity to lead a flourishing life, but it looks like we have an AI that's geared around optimizing some notion of human wellbeing. And the question is what might be missing there or what might go wrong. I think there are some things that that view of value alignment still struggles to factor in. The welfare of nonhuman animals is something that's missing from this wellbeing centered perspective on alignment.

Lucas Perry: That’s why we might just want to make it wellbeing for sentient creatures.

Iason Gabriel: Exactly, and I believe that this is a valuable enterprise, so we can expand the circle and say it's the wellbeing of sentient creatures. Then we have the question: what about future generations? Does their wellbeing count? We might think that it does; if we follow Toby Ord, or in fact most conventional thinking, we do think that the welfare of future generations has intrinsic value. So we might say, "Well, we want to promote the wellbeing of sentient creatures over time, with some appropriate weighting to account for time."

And that’s actually starting to take us into a richer space of value. So we have wellbeing, but we also have a theory about how to do intertemporal comparisons. We might also think that it matters how wellbeing or welfare is distributed. That it isn’t just a maximization question, but that we also have to be interested in equity or distribution because we think is intrinsically important. So we might think it has to be done in a manner that’s fair. Additionally, we might think that things like the natural world have intrinsic value that we want to factor in. And so the point which will almost be familiar now from our earlier discussion is you actually have to get to that question of what values do we want to align the system with because values and the principles that derive with them can capture everything that is seemingly important.

Lucas Perry: Right. And so, for example, within the effective altruism community and within moral philosophy recently, the way in which moral progress has been made is in so far that debiasing human moral thought and ethics from spatial and temporal bias. So Peter Singer has the children drowning in a shallow pond argument. It just illustrates how there are people dying and children dying all over the world in situations which we could cheaply intervene to save them as if they were drowning in a shallow pond. And you only need to take a couple of steps and just pull them out, except we don’t. And we don’t because they’re far away. And I would like to say, essentially, everyone finds this compelling that where you are in space, doesn’t matter how much you’re suffering. That if you are suffering, then all else being equal, we should intervene to alleviate that suffering when it’s reasonable to do so.

So space doesn’t matter for ethics. Likewise, I hope, and I think that we’re moving in the right direction if time also doesn’t matter while also being mindful, we also have to introduce things like uncertainty. We don’t know what the future will be like, but this principle about caring about the wellbeing of sentient creatures in general, I think is essential and core I think to whatever list of principles we’ll want for bridging the is-ought distinction, because it takes away spacial bias, where you are in space, doesn’t matter, just matters that you’re sentient being, it doesn’t matter when you are as a sentient being. It also doesn’t matter what kind of sentient being you are, because the thing we care about is sentience. So then the moral circle has expanded across species. It’s expanded across time. It’s expanded across space. It includes aliens and all possible minds that we could encounter now or in the future. We have to get that one in, I think, for making a good future with AI.

Iason Gabriel: That’s a picture that I strongly identify with on a personal level, this idea of the expanding moral circle of sensibilities. And I think from a substantive point of view, you’re probably right. That is a lot of the content that we would want to put into an aligned AI system. I think that one interesting thing to note is that a lot of these views are actually empirically fairly controversial. So if we look at an interesting study, the Moral Machine experiment, where I believe several million people ultimately played this experiment online, they decided which trade-offs an AV, an autonomous vehicle, should make in different situations. So whether it should crash into one person or five people, a rich person or a poor person. Pretty much everyone agreed that it should kill fewer people when that was on the table. But I believe that in many parts of the world, there was also a belief that the lives of affluent people mattered more than the lives of those in poverty.

And so if you were just to reason from these first-order moral beliefs, you would bake that bias into an AI system, which seems deeply problematic. And I think it actually puts pressure on this question, which is: we’ve already said we don’t want to just align AI with existing moral preferences. We’ve also said that we can’t just declare a moral theory to be true and impose it on other people. So are there other options which move us in the direction of these kinds of moral beliefs that seem to be deeply justified, but also avoid the challenge of value imposition? And how far do they get if we try to move forward, not just as individuals examining the kind of expanding moral circle, but as a community that’s trying to progressively endogenize these ideas and come up with moral principles that we can all live by?

We might not get as far as if we were going at it alone, but I think that there are some solutions that are kind of in that space. And those are the ones I’m interested in exploring. I mean, common sense morality, understood as the conventional morality that most people endorse, I would say is deeply flawed in a number of regards, including with regard to global poverty and things of that nature. And that’s really unfortunate, given that we probably also don’t want to force people to live by more enlightened beliefs which they don’t endorse or can’t understand. So I think that the interesting question is how do we meet this demand for a respect for pluralism, and also avoid getting stuck in the morass of common sense morality, which has these prejudicial beliefs that will probably, with the passage of time, come to be regarded quite unfortunately by future generations.

And I think that taking this demand for non-domination or democratic support seriously means not just running far into the future, or in a way that we believe represents the future, but also doing a lot of other things: trying to have a democratic discourse where we use these reasons to justify certain policies that other people then reflectively endorse, and we move the project forwards in a way that meets both desiderata. And in this paper, I try to map out different solutions that meet both of these criteria, respecting people’s pluralistic beliefs while also moving us towards more genuinely morally aligned outcomes.

Lucas Perry: So now the last question that I want to ask you here on the goal of AI alignment is: do you view a needs-based conception of human wellbeing as a sub-category of interest-based value alignment? People have come up with different conceptions of human needs. People are generally familiar with Maslow’s hierarchy of needs. And as you go up the hierarchy, it will become more and more contentious, but everyone needs food and shelter and safety, and then you need community and meaning and spirituality and things of that nature. So how do you view or fit in a needs-based conception, given that some needs are obviously undeniable relative to others?

Iason Gabriel: Broadly speaking, a needs-based conception of wellbeing is in that space we already touched upon. So the capabilities-based approach and the needs-based approach are quite similar. But I think that what you’re saying about needs potentially points to a solution to this kind of dilemma that we’ve been talking about, if we’re going to ask this question of what it means to create principles for AI alignment that treat people fairly despite their different views. One approach we might take is to look for commonalities that also seem to have moral robustness or substance to them. So within the parlance of political philosophy, we’d call this an overlapping consensus approach to the problem of political and moral decision making. I think that that’s a project that’s well worth countenancing. So we might say there’s a plurality of global beliefs and cultures. What is it that these cultures coalesce around? And I think that it’s likely to be something along the lines of the argument that you just put forward; that people are vulnerable in virtue of how we’re constituted, that we have a kind of fragility and that we need protection, both against the environment and against certain forms of harm, particularly state-based violence. And that this is a kind of moral bedrock, or what the philosopher Henry Shue calls “a moral minimum,” that receives intercultural endorsement. So actually the idea of human needs is very, very closely tied to the idea of human rights. So the idea is that the need is fundamental, and in virtue of your moral standing (the normative claim) and your need (the empirical claim), you have a right to enjoy a certain good and to be secure in the knowledge that you’ll enjoy that thing.

So I think the idea of building a kind of human rights-based AI, one that’s based upon this intercultural consensus, is pretty promising. In some regards, human rights as they’ve been historically thought about are not super easy to turn into a theory of AI alignment, because they are historically thought of as guarantees that states have to give their citizens in order to be legitimate. And it isn’t entirely clear what it means to have a human rights-based technology, but I think that this is a really productive area to work in, and I would definitely like to try and populate that ground.

You might also think that the consensus, or the emerging consensus, around values that need to be built into AI systems, such as fairness and explainability, potentially portends the emergence of this kind of intercultural consensus. Although I guess at that point, we have to be really mindful of the voices that are at the table and who’s had an opportunity to speak. So although there does appear to be some convergence around principles of beneficence and things like that, it’s also true that this isn’t a global conversation in which everyone is represented, and it would be easy to prematurely rush to the conclusion that we know what values to pursue, when we’re really just reiterating some kind of very heavily Western-centric, affluent view of ethics that doesn’t have real intercultural democratic viability.

Lucas Perry: All right, now it’s also interesting and important to consider here the differences and importance of single-agent and multi-agent alignment scenarios. For example, you can imagine entertaining the question of, “How is it that I would build a system that would be able to align with my values?” One agent being the AI system and one person, and how is it that I get the system to do what I want it to do? And then the multi-agent alignment scenario considers, “How do I get one agent to align with and serve many different people’s interests and wellbeing and desires, and preferences, and needs? And then also, how do we get systems to act and behave when there are many other systems trying to serve and align to many other different people’s needs? And how is it that all of these systems may or may not collaborate with all of the other AI systems, and may or may not collaborate with all of the other human beings, when all the human beings may have conflicting preferences and needs?” How is it that we do, for example, intertheoretic comparisons of value and needs? So what’s the difference between, and the importance of, single-agent and multi-agent alignment scenarios?

Iason Gabriel: I think that the difference is best understood in terms of how expansive the goal of alignment has to be. So if we’re just thinking about a single person and a single agent, it’s okay to approach the value alignment challenge through a slightly solipsistic lens. In fact, you know, if it was just one person and one agent, it’s not clear that morality really enters the picture, unless there are other people or other sentient creatures whom our actions can affect. So with one person, one agent, the challenge is primarily correlation with the person’s desires, aims, and intentions. Potentially, there’s still a question of whether the AI serves their interests rather than, you know, these more volitional states that come to mind. When we think about situations in which many people are affected, then it becomes kind of remiss not to think about interpersonal comparisons, and the kind of richer conceptions that we’ve been talking about.

Now, I mentioned earlier that there is a view that there will always be a human body that synthesizes preferences and provides moral instructions for AI. We can imagine democratic approaches to value alignment, where human beings assemble, maybe in national parliaments, maybe in global fora, and legislate principles that AI is then designed in accordance with. I think that’s actually a very promising approach. You know, you would want it to be informed by moral reflection and people offering different kinds of moral reasons that support one approach rather than another, but that seems to be important for multi-person situations and is probably actually a necessary condition for powerful forms of AI. Because, when AI has a profound effect on people’s lives, these questions of legitimacy also start to emerge. So not only is it doing the right thing, but is it doing the sort of thing that people would consent to, and is it doing the sort of thing that people actually have consented to? And I think that when AI is used in certain fora, these questions of legitimacy come to the top. There’s a bundle of different things in that space.

Lucas Perry: Yeah. I mean, it seems like a really, really hard problem. When you talk about creating some kind of national body, and I think you said international fora, do you wonder whether some of these vehicles might be overly idealistic, given what may happen in a world where there are national actors competing and capitalism is driving things forward relentlessly? This problem of multi-agent alignment seems very important and difficult, and there are forces pushing things such that it’s less likely to happen.

Iason Gabriel: When you talk about multi-agent alignment, are you talking about the alignment of an ecosystem that contains multiple AI agents, or are you talking about how we align an AI agent with the interests and ideas of multiple parties, so many humans, for example?

Lucas Perry: I’m interested and curious about both.

Iason Gabriel: I think there’s different considerations that arise for both sets of questions, but there are also some things that we can speak to that pertain to both of them.

Lucas Perry: Do they both count as multi-agent alignment scenarios in your understanding of the definition?

Iason Gabriel: From a technical point of view, it makes perfect sense to describe them both in that way. I guess when I’ve been thinking about it, curiously, I’ve been thinking of multi-agent alignment as an agent that has multiple parties that it wants to satisfy. But when we look at machine learning research, “multi-agent” usually means many AI agents running around in a single environment. So I don’t see any kind of language-based reason to opt for one rather than the other. With regards to this question of idealization and real world practice, I think it’s an extremely interesting area. And the thing I would say is this is almost one of those occasions where potentially the is-ought distinction comes to our rescue. So the question is, “Does the fact that the real world is a difficult place, affected by divergent interests, mean that we should level down our ideals and conceptions about what really good and valuable AI would look like?”

And there are some people who have what we term “practice-dependent” views of ethics who say, “Absolutely we should. We should adjust our conception of what the ideal is.” But as you’ll probably be able to tell by now, I hold a kind of different perspective in general. I don’t think it is problematic to have big ideals and rich visions of how value can be unlocked, and that partly ties into the reasons that we spoke about for thinking that the technical and the normative are interconnected. So if we preemptively level down, we’ll probably design systems that are less good than they could be. And when we think about a design process spanning decades, we really want that kind of ultimate goal, the shining star of alignment, to be something that’s quite bright and can steer our efforts towards it. If anything, I would be slightly worried that because these human parliaments and international institutions are so driven by real world politics, they might not give us the kind of most fully actualized set of ideal aspirations to aim for.

And that’s why philosophers like John Rawls actually propose that we need to think about these questions from a hypothetical point of view. So we need to ask, “What would we choose if we weren’t living in a world where we knew how to leverage our own interests?” And that’s how we identify the real ideal that is acceptable to people regardless of where they’re located, and that can then be used to steer non-ideal theory, or the kind of actual practice, in the right direction.

Lucas Perry: So if we have an organization that is trying its best to create aligned and beneficial AGI systems, reasoning about what principles we should embed in it from behind Rawls’ Veil of Ignorance, you’re saying, would have hopefully the same practical implications as if we had a functioning international body for coming up with those principles in the first place.

Iason Gabriel: Possibly. I mean, I’d like to think that ideal deliberation would lead them in the direction of impartial principles for AI. It’s not clear whether that is the case. I mean, it seems that at its very best, international politics has led us in the direction of a kind of human rights doctrine that both accords individuals protection, regardless of where they live, and defends the strong claim that they have a right to subsistence and other forms of flourishing. If we use the Veil of Ignorance thought experiment for AI, I think it might even give us more than that, even if a real world parliament never got there. For those of you who are not familiar with this, the philosopher John Rawls says that when it comes to choosing principles for a just society, what we need to do is create a situation in which people don’t know where they are in that society, or what their particular interest is.

So they have to imagine that they’re reasoning from behind the Veil of Ignorance. They select principles for that society that they think will be fair regardless of where they end up, and then having done that process and identified principles of justice for the society, he actually holds out the aspiration that people will reflectively endorse them even once the veil has been removed. So they’ll say, “Yes, in that situation, I was reasoning in a fair way that was nonprejudicial. And these are principles that I identified there that continue to have value in the real world.” And we can ask what would happen if people were asked to choose principles for artificial intelligence from behind a veil of ignorance, where they didn’t know whether they were going to be rich or poor, Christian, utilitarian, Kantian, or something else.

And I think there, some of the kind of common sense material would be surfaced; so people would obviously want to build safe AI systems. I imagine that this idea of preserving human autonomy and control would also register, but for some forms of AI, I think distributive considerations would come into play. So they might start to think about how the benefits and burdens of these technologies are distributed and how those questions play out on a global basis. They might say that ultimately, a value aligned AI is one that has fair distributive impacts on a global basis and, if you follow Rawls, that it works to the advantage of the least well-off people.

That’s a very substantive conception of value alignment, which may or may not be the final outcome of ideal international deliberation. Maybe the international community will get to global justice eventually, or maybe it’s just too thoroughly affected by nationalist interests and other kinds of, to my mind, distortionary effects that mean it doesn’t quite get there. But I think that this is definitely the space that we want the debate to be taking place in. And actually, there has been real progress in identifying collectively endorsed principles for AI, which gives me hope for the future. Not only that we’ll get good ideals, but that people might agree to them, and that they might get democratic endorsement, and that they might be actionable and the sort of thing that can guide real world AI design.

Lucas Perry: Can you add a little bit more clarity on the philosophical questions and issues which single and multi-agent alignment scenarios supervene on? How do you do intertheoretic comparisons of value if people disagree on normative or meta-ethical beliefs, or people disagree on foundational axiomatic principles for bridging the is-ought gap? How is it that systems deal with that kind of disagreement?

Iason Gabriel: I’m hopeful that the three pictures that I outlined so far, of the overlapping consensus between different moral beliefs, of democratic debate over a constitution for AI, and of selection of principles from behind the Veil of Ignorance, are all approaches that carry some traction in that regard. So they try to take seriously the fact of real world pluralism, but they also, through different processes, tend to tack towards principles that are compatible with a variety of different perspectives. Although I would say, I do feel like there’s a question about this multi-agent thing that may still not be completely clear in my mind, and it may come back to those earlier questions about definition. So in a one person, one agent scenario, you don’t have this question of what to do with pluralism, and you can probably go for a more simple one-shot solution, which is to align it with the person’s interests, beliefs, moral beliefs, intentions, or something like that. But if you’re interested in this question of real world politics for real world AI systems where a plurality of people are affected, we definitely need these other kinds of principles that have a much richer set of properties and endorsements.

Lucas Perry: All right, there’s Rawls’ Veil of Ignorance. There’s the principle of non-domination, and then there’s the democratic process?

Iason Gabriel: Non-domination is a criterion that any scheme for multi-agent value alignment needs to meet. And then we can ask the question, “What sort of scheme would meet this requirement of non-domination?” And there we have the overlapping consensus with human rights, we have a scheme of democratic debate leading to principles for an AI constitution, and we have the Veil of Ignorance, all ideas that we basically find within political theory that could help us meet that condition.

Lucas Perry: All right, so we have spoken at some length then about principles and identifying principles. This goes back to our conversation about the is-ought distinction, and these are principles that we need to identify for setting up an ethical alignment procedure. You mentioned this earlier when we were talking about the distinction between the one true moral theory approach to AI alignment, in contrast to coming up with a procedure for AI alignment that would be broadly endorsed by many people, would respect the principle of non-domination, and would take into account pluralism. Can you unpack this distinction more, and the importance of it?

Iason Gabriel: Yeah, absolutely. So I think that the kind of true moral theory approach, although it is a kind of stylized idea of what an approach to value alignment might look like, is the sort of thing that could be undertaken just by a single person who is designing the technology, or a small group of people, perhaps moral philosophers who think that they have really great expertise in this area. And then they identify the chosen principle and run with it.

The big claim is that that isn’t really a satisfactory way to think about design and values in a pluralistic world where many people will be affected. And of course, many people who’ve gone off on that kind of enterprise have made serious mistakes that were very costly for humanity and for people who were affected by their actions. So the political approach to value alignment paints a fundamentally different perspective and says it isn’t really about one person or one group running ahead and thinking that they’ve done all the hard work; it’s about working out what we can all agree upon that looks like a reasonable set of moral principles or coordinates to build powerful technologies around. And then, once we have this process in place that outputs the right kind of agreement, the task is given back to technologists: these are the kind of parameters that our fair process of deliberation has outputted, and this is what we have the authority to encode in machines, whether it’s, say, human rights or a conception of justice, or some other widely agreed upon values.

Lucas Perry: There are principles that you’re really interested in satisfying, like respecting pluralism and respecting a principle of non-domination, and the One True Moral Theory approach risks violating those other principles. Are you not taking a stance on whether there is a One True Moral Theory? You’re just willing to set that question aside and say, “Because it’s so essential to a thriving civilization that we don’t do moral imposition on one another, coming up with a broadly endorsed theory is just absolutely the way to go, whether or not there is such a thing as a One True Moral Theory.” Does that capture your view?

Iason Gabriel: Yeah. So to some extent, I’m trying to make an argument that will look like something we should affirm, regardless of the metaethical stance that we wish to take. Of course, there are some views about morality that actually say that non-domination is a really important principle, or that human rights are fundamental. So someone might look at these proposals, and from the comprehensive moral perspective, they would say, “This is actually the morally best way to do value alignment, and it involves dialogue, discussion, mutual understanding, and agreement.” However, you don’t need to believe that in order to think that this is a good way to go. If you look at the writing of someone like Joshua Greene, he says that we encounter this problem called the “tragedy of common sense morality.” A lot of people have fairly decent moral beliefs, but when they differ, it ends up in violence, and they end up fighting. And you have a hugely negative moral externality that arises just because people weren’t able to enter this other mode of theorizing, where they said, “Look, we’re part of a collective project, let’s agree to some higher level terms that we can all live by.” So from that point of view, it looks prudent to think about value alignment as a pluralistic enterprise.

That’s an approach that many people have taken with regards to the justification of the institution of the state, and the things that we believe it should protect, and affirm, and uphold. And then as I alluded to earlier, I think that actually, even for some of these anti-realists, this idea of inclusive deliberation, and even the idea of human rights, look like quite good candidates for the kind of “Wouldn’t it be nice?” criterion. So to return to Richard Routley, who is kind of the arch moral skeptic, he does ultimately really want us to live in a world with human rights, he just doesn’t think he has a really good meta-ethical foundation to rest this on. But in practice, he would take that vision forward, I believe, and try to persuade other people that it was the way to go by telling them good stories and saying, “Well, look, this is the world with human rights and open-ended deliberation, and this is the world where one person decided what to do. Wouldn’t it be nice to live in that better world?” So I’m hopeful that this kind of political ballpark has this kind of rich applicability and appeal, regardless of whether people are starting out in one place or the other.

Lucas Perry: That makes sense. So then another aspect of this is, in the absence of moral agreement or when there is moral disagreement, is there a fair way to decide what principles AI should align with? For example, I can imagine religious fundamentalists, at core, being antithetical to the project of aligning AI systems, which eventually lead to something smarter than us. They could view it as something like playing God and just be like, “Well, this is just not a project that we should even do.”

Iason Gabriel: So that’s an interesting question, and you may actually be putting pressure on my preceding argument. I think that it is certainly the case that you can’t get everyone to agree on a set of global principles for AI, because some people hold very, very extreme beliefs that are exclusionary and aren’t open to the possibility of compromise, typically people who have a fundamentalist orientation of one kind or another. And so, even if we get the pluralistic project off the ground, it may be the case that we have to, in my language, impose our values on those people, and that in a sense, they are dominated. And that leads to the difficult question: why is it permissible to impose beliefs upon those people, but not the people who don’t hold fundamentalist views? It’s a fundamentally difficult question, because what it tends to point to is the idea that beneath this talk about pluralism, there is actually a value claim, which is that you are entitled to non-domination, so long as you’re prepared not to dominate other people, and to accept that there is a moral equality that means we need to cooperate and cohabit in a world together.

And that does look like a kind of deep, deep moral claim that you might need to substantively assert. I’m not entirely sure; I think that’s one that we can save for further investigation, but it’s certainly something that people have said in the context of these debates: that at the deepest level, you can’t escape making some kind of moral claim, because of these cases.

Lucas Perry: Yeah. This is reminding me of the paradox of tolerance by Karl Popper, who talks about how free speech ends when you yell, “The theater’s on fire,” and in some sense are then imposing harm on other people. And that we’re tolerant of people within society, except for those who are intolerant of others. And to some extent, that’s a paradox. So similarly we may respect and endorse a principle of non-domination, or non-subjugation, but that ends when there are people who are dominating or subjugating. And the core of that is maybe getting back again to some kind of principle of non-harm related to the wellbeing of sentient creatures.

Iason Gabriel: Yeah. I think that the obstacles that we’re discussing now are very precisely related to that paradox. Of course, the boundaries we want to draw around permissible disagreement are in some sense quite minimal. Or conversely, we might think that the wide affirmation of some aspect of the value of human rights is quite a strong basis for moving forwards, because it says that all human life has value, and that everyone is entitled to basic goods, including goods pertaining to autonomy. So people who reject that really are pushing back against something that is widely and deeply, reflectively endorsed by a large number of people. I also think that with regards to toleration, the anti-realist position becomes quite hard to figure out or quite strange. So you have these people who are not prepared to live in a world where they respect others, and they have this will to dominate, or a fundamentalist perspective.

The anti-realist says, “Well, you know, potentially there’s this nicer world we can move towards.” The anti-realist doesn’t deal in the currency of moral reasons. They don’t really have to worry about it too much; they can just say, “We’re going to go in that direction with everyone else who agrees with us,” and hold to the idea that it looks like a good way to live. So in a way, the problem with domination is much more serious for people who are moral realists. As for the anti-realists, it’s not actually a perspective I inhabit in my day to day life, so it’s hard for me to say what they would make of it.

Lucas Perry: Well, I guess, just to briefly defend the anti-realist, I imagine that they would say that they still have reasons for morality, they just don’t think that there is an objective epistemological methodology for discovering what is true. “There aren’t facts about morality, but I’m going to go make the same noises that you make about morality. Like I’m going to give reasons and justification, and these are as good as making up empty screeching noises and blah, blahing about things that don’t exist,” but it’s still motivating to other people, right? They still will have reasons and justification; they just don’t think it pertains to truth, and they will use that navigate the world and then justify domination or not.

Iason Gabriel: That seems possible, but I guess for the anti-realist, if they think we’re just fundamentally expressing pro-attitudes, so that when I say, “It isn’t justified to dominate others,” I’m just saying, “I don’t like it when this thing happens,” then we’re just dealing in the currency of likes, and I just don’t think you have to be as worried about the problem of domination as you are if you think that this means something more than someone just expressing an attitude about what they like or don’t. If there aren’t real moral reasons or considerations at stake, if it’s just people saying, “I like this. I don’t like this,” then you can get on with the enterprise that you believe achieves these positive ends. Of course, the unpleasant thing is you kind of are potentially giving permission to other people to do the same, or that’s a consequence of the view you hold. And I think that’s why a lot of people want to rescue the idea of moral justification as a really meaningful practice, because they’re not prepared to say, “Well, everyone gets on with the thing that they happen to like, and the rest of it is just window dressing.”

Lucas Perry: All right. Well, I’m not sure how much we need to worry about this now. I think it seems like anti-realists and realists basically act the same in the real world. Maybe, I don’t know.

Iason Gabriel: Yeah. In reality, anti-realists tend to act in ways that suggest that on some level they believe that morality has more to it than just being a category error.

Lucas Perry: So let’s talk a little bit more here about the procedure by which we choose evaluative models for deciding which proposed aspects of human preferences or values are good or bad for an alignment procedure. We can have a method of evaluating or deciding which aspects of human values or preferences, or things that we might want to bake into an alignment procedure, are good or bad, but you mentioned something like having global fora or having different kinds of governance institutions or vehicles by which we might have conversation to decide how to come up with an alignment procedure that would be endorsed. What is the procedure to decide what kinds of evaluative models we will use to decide what counts as a good alignment procedure or not? Right now, this question is being answered by a very biased and privileged select few in the West, at AI organizations and people adjacent to them.

Iason Gabriel: I think this question is absolutely fundamental. I believe that any claim that we have meaningful global consensus on AI principles is premature, and that it probably does reflect biases of the kind you mentioned. I mean, broadly speaking, I think that there are two extremely important reasons to try and widen this conversation. The first is that in order to get a kind of clear, well-grounded, and well-sighted vision of what AI should align with, we definitely need intercultural perspectives. On the assumption that, to quote John Stuart Mill, “no-one has complete access to the truth and people have access to different parts of it,” the bigger the conversation becomes, the more likely it is that we move towards maximal value alignment of the kind that humanity deserves. But potentially more importantly than that, and regardless of the kind of epistemic consequences of widening the debate, I think that people have a right to voice their perspective on topics and technologies that will affect them. If we think of the purpose of a global conversation partly as this idea of formulating principles, but also as bestowing on them a certain authority in light of which we’re permitted to build powerful technologies, then you just can’t say that they have the right kind of authority and grounding without proper extensive consultation. And so, I would suggest that that’s a very important next step for people who are working in this space. I’m also hopeful that actually these different approaches that we’ve discussed can potentially be mutually supporting. So, I think that there is a good chance that human rights could serve as a foundation or a seed for a good, strong intercultural conversation around AI alignment.

And I’m not sure to what extent this really is the case, but it might be that even some of these ideas about reasoning impartially have currency in a global conversation. And you might find that they are actually quite challenging for affluent countries or for self interested parties, because it would reveal certain hidden biases in the propositions that they have now made or put forward.

Lucas Perry: Okay. So, related to things that we might want to do to come up with the correct procedure for being able to evaluate what kinds of alignment procedures are good or bad, what do you view as sufficient for adequate alignment of systems? We’ve talked a little bit about minimalism versus maximalism, where minimalism is aligning to just some conception of human values and maximalism is hitting on some very idealized and strong set or form of human values. And this question is related, at least in the existential risk space, to ideas coming from people like Toby Ord and William MacAskill. They talk about something like a long reflection. If I’m asking you about what might be adequate alignment for systems, one criterion for that might be meeting basic human needs, meeting human rights, and reducing existential risk further and further such that it’s very, very close to zero and we enter a period of existential stability.

And then following this existential stability, something like a long reflection is proposed, where we might more deeply consider ethics and values and norms before we set about changing and optimizing all of the atoms around us in the galaxy. Do you have a perspective here on this sort of most high level timeline: first, as we’re aligning AI systems, what does it mean for alignment to be adequate? And then, what needs to potentially be saved for something like a long reflection? And then, how would something like a broadly endorsed procedure versus a one true moral theory approach fit into something like a long reflection?

Iason Gabriel: Yes. A number of thoughts on this topic. The first pertains to the idea of existential security and, I guess, why it’s defined as the kind of dominant goal in the short term perspective. There may be good reasons for this, but I think what I would suggest is that it obviously involves trade-offs. The world we live in is a very unideal place, one in which we have a vast quantity of unnecessary suffering. And to my mind, it’s probably not even acceptable to say that basically the goal of building AI, or the foremost challenge of humanity, is to focus on this kind of existential security and extreme longevity while leaving so many people to lead lives that are less than they could be.

Lucas Perry: Why do you think that?

Iason Gabriel: Well, because human life matters. If we were to look at where the real gains in the world are today, I believe it’s helping the people who die unnecessarily from neglected diseases, who lack subsistence incomes, and things of that nature. And I believe that has to form part of the picture of our ideal trajectory for technological development.

Lucas Perry: Yeah, that makes sense to me. I’m confused about what you’re actually saying about the existential security view being central. If you compare the suffering of people who exist today to the astronomical amount of life that could exist in the future, is that kind of reasoning about future potential what doesn’t do the work for you in seeing the mitigation of existential risk as the central concern?

Iason Gabriel: I’m not entirely sure, but what I would say is that on one reading of the argument that’s being presented, the goal should be to build extremely safe systems and not try to intervene in areas about which there is more substantive contestation, until there’s been a long delay and a period of reflection, which might mean neglecting some very morally important and tractable challenges that the world is facing at the present moment. And I think that that would be problematic. I’m not sure why we can’t work towards something that’s more ambitious, for example, a human rights respecting AI technology.

Lucas Perry: Why would that entail that?

Iason Gabriel: Well, so, I mean, this is the kind of question about the proposition that’s been put in front of us. Essentially, if that isn’t the proposition, then the long reflection isn’t leaving huge amounts to be deliberated about, right? Because we’re saying, in the short term, we’re going to work towards existential security, but we’re also going to try and do a lot of other things around which there’s moral uncertainty and disagreement, for example, promote fairer outcomes, mobilize in the direction of respecting human rights. And I think that once we’ve moved towards that conception of value alignment, it isn’t really clear what the substance of the long reflection is. So, do you have an idea of what questions would remain to be answered?

Lucas Perry: Yeah, so I guess I feel confused because reaching existential security as part of this initial alignment procedure doesn’t seem to be in conflict with alleviating the suffering of the global poor, because I don’t think moral uncertainty extends to meeting basic human needs or satisfying basic human rights or things that are obviously conducive to the well-being of sentient creatures. I don’t think poverty gets pushed to the long reflection. I don’t think unnecessary suffering gets pushed to the long reflection. Then the question you’re asking is what is it that does get pushed to the long reflection?

Iason Gabriel: Yes.

Lucas Perry: Then what gets pushed to the long reflection is, is the one true moral theory approach to alignment actually correct? Is there a one true moral theory or is there not a one true moral theory? Are anti-realists correct or are realists correct? Or are they both wrong in some sense or is something else correct? And then, given that, the potential answer or inability to come up with an answer to that would change how something like the cosmic endowment gets optimized. Because we’re talking about billions upon billions upon billions upon billions of years, if we don’t go extinct, and the universe is going to evaporate eventually. But until then, there is an astronomical amount of things that could get done.

And so, the long reflection is about deciding what to actually do with that. And however esoteric it is, the proposals range from having some pluralistic optimization process, where there is no right way you should live, where things other than joy and suffering matter, like, I don’t know, building monuments that calculate mathematics ever more precisely, and where, if you want to carve out a section of the cosmic endowment for optimizing things other than conscious states, you’re free to do that, versus coming down on something more like a one true moral theory approach and being like, “The only kinds of things that seem to matter in this world are the states of conscious creatures. Therefore, the future should just be an endeavor of optimizing for creating minds that ever more deeply enjoy profound states of spiritual enlightenment and spiritual bliss and knowledge.”

The long reflection might even be about whether or not knowledge matters for a mind. “Does it really matter that I am in tune with truth and reality? Should we build nothing but experience machines that cultivate whatever the most enlightened and blissful states of experience are or is that wrong?” The long reflection to me seems to be about these sorts of questions and if the one true moral theory approach is correct or not.

Iason Gabriel: Yeah, that makes sense. And my apologies if I didn’t understand what was already taken care of by the proposal. I think to some extent, in that case, we’re talking about different action spaces. When I look at these questions of AI alignment, I see very significant value questions already arising in terms of how benefits and burdens are distributed, what fairness means, and whether AI needs to be explainable and accountable and things of that nature, alongside a set of very pressing global problems that it would be really, really important to address. I think my time horizon is definitely different from this long reflection one. I kind of find it difficult to imagine a world in which these huge, but to some extent prosaic, questions have been addressed and in which we then turn our attention to these other things. I guess there are a couple of things that can be said about it.

I’m not sure if this is meant to be taken literally, but I think the idea of pressing pause on technological development while we work out a further set of fundamentally important questions is probably not feasible. It would be best to work with a long term view that doesn’t rest upon the possibility of that option. And then I think that the other fundamental question is what is actually happening in this long reflection? It can be described in a variety of different ways.

Sometimes it sounds like it’s a big philosophical conference that runs for a very, very long time. And at the end of it, hopefully people kind of settle these questions and they come out to the world and they’re like, “Wow, this is a really important discovery.” I mean, if you take seriously the things we’ve been talking about today, you still have the question of what do you do with the people who then say, “Actually, I think you’re wrong about that.” And I think in a sense it recursively pushes us back into the kind of processes that I’ve been talking about. When I hear people talk about the long reflection there does also sometimes seem to be this idea that it’s a period in which there is very productive global conversation about the kind of norms and directions that we want humanity to take. And that seems valuable, but it doesn’t seem unique to the long reflection. That would be incredibly valuable right now so it doesn’t look radically discontinuous to me on that view.

Lucas Perry: All right. We’re talking about the long term future here, and I bring it up because it’s interesting to consider what questions we can just kind of put aside. These are interesting, but in the real world, they don’t matter a ton or they don’t influence our decisions, though over the very, very long term future, they may matter much more. When I think about a principle like non-domination, it seems like we care about this conception of non-imposition and non-dominance and non-subjugation for reasons of, first of all, well-being. And the reason why we care about this well-being question is because human beings are extremely fallible. And it seems to me that the principle of non-domination is rooted in the lack of epistemic capacity for fallible agents like human beings to promote the well-being of sentient creatures all around them.

But in terms of what is physically, literally possible in the universe, it’s possible for someone to know so much more about the well-being of conscious creatures than you do, and about how much happier and how much more well-being you would have if you only idealized in a certain way. So as we get deeper and deeper into the future, I have more and more skepticism about this principle of non-domination and non-subjugation.

It seems very useful, important, and exactly like the thing that we need right now, but as we long reflect further and further and, say, really smart, really idealized beings develop more and more epistemic clarity on ethics and what is good and the nature of consciousness and how minds work and function in this universe, I would probably submit myself to a Dyson sphere brain that was just like, “Well, Lucas, this is what you have to do.” And I guess that’s not subjugation, but I feel less and less moral qualms with the big Dyson sphere brain showing up to some early civilization like we are, and then just telling them how they should do things, like a parent does with a child. I’m not sure if you have any reactions to this or how much it even really matters for anything we can do today. But I think it’s potentially an important reflection on the motivations behind the principle of non-domination and non-subjugation and why it is that we really care about it.

Iason Gabriel: I think that’s true. I think that if you consent to something, then almost… I don’t want to say by definition, that’s definitely too strong, but it’s very likely that you’re not being dominated, so long as you have sufficient information and you’re not being coerced. I think the real question is what if this thing showed up and you said, “I don’t consent to this,” and the thing said, “I don’t care; it’s in your best interests.”

Lucas Perry: Yeah, I’m defending that.

Iason Gabriel: That could be true in some kind of utilitarian, consequentialist, moral philosophy of that kind. And I guess my question is, “Do you find that unproblematic? Or, “Do you have this intuition that there is a further set of reasons you could draw upon, which explain why the entity with greater authority doesn’t actually have the right to impose these things on you?” And I think that it may or may not be true.

It probably is true that from the perspective of welfare, non-domination is good. But I also think that a lot of people who are concerned about pluralism and non-domination think that its value pertains to something which is quite different, which is human autonomy. And that that has value because of the kind of creatures we are, with freedom of thought, a consciousness, a capacity to make our own decisions. I, personally, am of the view that even if we get some amazing, amazing paternalist, there is still a further question of political legitimacy that needs to be answered, and that it’s not permissible for this thing to impose without meeting these standards that we’ve talked about today.

Lucas Perry: Sure. So at the very least, I think I’m attempting to point towards the long reflection consisting of arguments like this. We weren’t participating in coercion before, because we didn’t really know what we were talking about, but now we know what we’re talking about. And so, given our epistemic clarity, coercion makes more sense.

Iason Gabriel: It does seem problematic to me. And I think the interesting question is what does time add to robust epistemic certainty? It’s quite likely that if you spend a long time thinking about something, at the end of it, you’ll be like, “Okay, now I have more confidence in a proposition that was on the table when I started.” But does that mean that it is actually substantively justified? And what are you going to say if you think you’re substantively justified, but you can’t actually justify it to other people who are reasonable, rational, and informed like you?

It seems to me that even after a thousand years, you’d still be taking a leap of faith of the kind that we’ve seen people take in the past with really, really devastating consequences. I don’t think it’s the case that ultimately there will be a moral theory that’s settled and the confidence in the truth value of it is so high that the people who adhere to it have somehow gained the right to kind of run with it on behalf of humanity. Instead, I think that we have to proceed a small step at a time, possibly in perpetuity and make sure that each one of these small decisions is subject to continuous negotiation, reflection and democratic control.

Lucas Perry: The long reflection, though, to me seems to be about questions like that, because you’re taking a strong epistemological view on meta-ethics, namely that there wouldn’t be the kind of clarity that would emerge over time from minds far greater than our own. From my perspective, I just find the problem of suffering to be very, very, very compelling.

Let’s imagine we have the sphere of utilitarian expansion into the cosmos, and then there is the pluralistic, non-dominating, democratic, virtue ethical, deontologically based sphere of expansion. You can, say, run across planets at different stages of evolution. And here you have a suffering hell planet, where it’s just wild animals born of Darwinian evolution. And they’re just eating and murdering each other all the time and dying of disease and starvation and other things. And then maybe you have another planet which is an early civilization, and there is just subjugation and misery and all of these things, and these spheres of expansion would do completely different things to these planets. And we’re entering super esoteric sci-fi space here. But again, it’s, I think, instructive of the importance of something like a long reflection. It changes what is permissible and what will be done. And so, I find it interesting and valuable, but I also agree with you about the claim that you made earlier, that it’s unclear we could actually press pause and have a thousand-year philosophy convention.

Iason Gabriel: Yes, I mean, the one further thing I’d say, Lucas, is that, bearing in mind some of the earlier provisos we attached to the period before the long reflection, we were kind of gambling on the idea that there would be political legitimacy and consensus around things like the alleviation of needless suffering. So, it is not necessarily the case that everything would be up for grabs just because people have to agree upon it. In the world today, we can already see some nascent signs of moral agreement on things that are really morally important and would be very significant if they were fully realized as ideals.

Lucas Perry: Maybe there is just not that big of a gap between the views that are left to be argued about during the long reflection. But then, wrapping up on this part of the conversation, there is also this interesting question about what we previously took to be sacred that no longer is. An example would be if a moral realist, utilitarian conception ended up just being the truth or something; then rights never actually mattered, autonomy never mattered, but they functioned as very important epistemic tools. And then we’re just like, “Okay, we’re basically doing away with everything that we said was sacred.” We still endorsed having done that. But now it’s seen in a totally different light. There could be something like a profound shift like that, which is why something like a long reflection might be important.

Iason Gabriel: Yeah. I think it really matters how the hypothesized shift comes about. So, if there is this kind of global conversation with new information coming to light, taking place through a process that’s non-coercive, and the final result seems to be a stable consensus of overlapping beliefs, so that we have more moral consensus than we did around something like human rights, then that looks like a kind of plausible direction to move in, and that might even be moral progress itself. Conversely, if it’s people who have been in the conference a long time and they come out and they’re like, “We’ve reflected for a thousand years and now we have something that we think is true,” unfortunately, I think they end up kind of back at square one, where they’ll meet people who say, “We have reasonable disagreement with you, and we’re not necessarily persuaded by your arguments.”

And then you have the question of whether they’re more permitted to engage in value imposition than people were in the past. And I think probably not. I think if they believe those arguments are so good, they have to put them into a political process of the kind that we have discussed and hopefully their merits will be seen or, if not, there may be some avenues that we can’t go down but at least we’ve done things in the right way.

Lucas Perry: Luckily, it may turn out to be the case that you basically never have to do coercion, because with good enough reasons and evidence and argument, basically any mind that exists can be convinced of something. Then it gets into this very interesting question: if we’re respecting a principle of non-domination and non-subjugation, as something like Neuralink and merging with AI systems comes along, and we gain more and more information about how to manipulate and change people, what changes made to people from the outside would count as coercion or not? Because currently, we’re constantly getting pushed around in terms of our development by technology and people and the environment, and we basically have no control over that. And do I always endorse the changes that I undergo? Probably not. Does that count as coercion? Maybe. And we’ll increasingly gain power to change people in this way. So this question of coercion will probably become more and more interesting and difficult to parse over time.

Iason Gabriel: Yeah. I think that’s quite possible. And it’s kind of an observation that can be made about many of the areas that we’re thinking about now. For example, the same could be said of autonomy, or to some extent that’s the flip side of the same question. What does it really mean to be free? Free from what and under what conditions? If we just loop back a moment, the one thing I’d say is that the hypothesis that you can create moral arguments that are so well-reasoned that they persuade anyone is, I think, the perfect statement of a certain enlightenment perspective on philosophy that sees rationality as the tiebreaker and the arbiter of progress. In a sense, the whole project that I’ve outlined today rests upon a recognition or an acknowledgement that that is probably unlikely to be true: when people reason freely about what the good consists in, they do come to different conclusions.

And I guess the kind of thing people would point to there as evidence is just the nature of moral deliberation in the real world. You could say that if there were these winning arguments that just won by force of reason, we’d be able to identify them. But, in reality, when we look at how moral progress has occurred, it requires a lot more than just reason-giving. To some extent, I think the master argument approach itself rests upon mistaken assumptions, and that’s why I wanted to go in this other direction. By a twist of fate, if I was mistaken and the master argument was possible, it would also satisfy a lot of conditions of political legitimacy. And right now, we have good evidence that it isn’t possible, so we should proceed in one way. If it is possible, then those people can appeal to the political processes.

Lucas Perry: They can be convinced.

Iason Gabriel: They can be convinced. And so, there is reason for hope there for people who hold a different perspective to my own.

Lucas Perry: All right. I think that’s an excellent point to wrap up on then. Do you have anything here? I’m just giving you an open space now if you feel unresolved about anything or have any last moment thoughts that you’d really like to say and share? I found this conversation really informative and helpful, and I appreciate and really value the work that you’re doing on this. I think it’s sorely needed.

Iason Gabriel: Yeah. Thank you so much, Lucas. It’s been a really, really fascinating conversation and it’s definitely pushed me to think about some questions that I hadn’t considered before. I think the one thing I’d say is that this is really… A lot of it is exploratory work. These are questions that we’re all exploring together. So, if people are interested in value alignment, obviously listeners to this podcast will be, but specifically normative value alignment and these questions about pluralism, democracy, and AI, then please feel free to reach out to me, contribute to the debate. And I also look forward to continuing the conversation with everyone who wants to look at these things and develop the conversation further.

Lucas Perry: If people want to follow you or get in contact with you or look at more of your work, where are the best places to do that?

Iason Gabriel: I think if you look on Google Scholar, there are links to most of the articles that I have written, including the one that we were discussing today. People can also send me an email, which is just my first name, Iason@deepmind.com. So, yeah.

Lucas Perry: All right.

End of recorded material

Peter Railton on Moral Learning and Metaethics in AI Systems

 Topics discussed in this episode include:

  • Moral epistemology
  • The potential relevance of metaethics to AI alignment
  • The importance of moral learning in AI systems
  • Peter Railton’s, Derek Parfit’s, and Peter Singer’s metaethical views

 

Timestamps: 

0:00 Intro
3:05 Does metaethics matter for AI alignment?
22:49 Long-reflection considerations
26:05 Moral learning in humans
35:07 The need for moral learning in artificial intelligence
53:57 Peter Railton’s views on metaethics and his discussions with Derek Parfit
1:38:50 The need for engagement between philosophers and the AI alignment community
1:40:37 Where to find Peter’s work

 

Citations:

You can find Peter’s work here

 

We hope that you will continue to join in the conversations by following us or subscribing to our podcasts on Youtube, Spotify, SoundCloud, iTunes, Google Play, Stitcher, iHeartRadio, or your preferred podcast site/application. You can find all the AI Alignment Podcasts here.

You can listen to the podcast above or read the transcript below. 

Lucas Perry: Welcome to the AI Alignment Podcast. I’m Lucas Perry. Today, we have a conversation with Peter Railton that explores metaethics, moral epistemology, moral learning, and how these areas of philosophy may or may not inform AI alignment. The core problem that this episode explores is that as systems become more and more autonomous and increasingly participate in social roles that require social functioning, it will become increasingly necessary for AI systems to be familiar with and sensitive to morally salient features of the world. This requires that systems have the capacity for moral learning and developing an understanding of human normative processes and beliefs. On top of that, structuring any kind of procedure for moral learning in AI systems will bring in metaethical beliefs and assumptions that would be wise to understand and be explicit about. For a little more context, some key motivating questions for this episode to consider are: when, and to what degree, will AI systems require the capacity for moral learning? How might metaethics inform or not inform AI alignment? How do you structure a system such that it can engage in moral learning in a way that would be broadly endorsed and would satisfy other ethical or meta-ethical principles we broadly care about?

For some more background, I did a podcast with Peter Singer on his transition from being a moral anti-realist to a moral realist. That episode is titled “On Becoming a Moral Realist with Peter Singer.” In that episode we explore his metaethical views, and Peter Singer mentions conversations and debate between Derek Parfit and Peter Railton on issues in metaethics. So, the second half of this podcast is dedicated to understanding and unpacking Peter Railton’s metaethics and how it compares with Peter Singer’s and Derek Parfit’s views. This podcast is pretty philosophy heavy, so if you’re into that and the ethics of AI then you’ll appreciate this episode. You can subscribe to and follow this podcast on your preferred podcasting platform, by searching for “Future of Life.”

Peter Railton is a Professor of Philosophy at the University of Michigan, Ann Arbor. He has a PhD from Princeton and primarily researches ethics and the philosophy of science. He focuses especially on questions about the nature of objectivity, value, norms, and explanation. Recently, he has also begun working in aesthetics, moral psychology, and the theory of action. And with that, let’s get into our conversation with Peter Railton.

Just to start off here, sometimes I’ve heard that metaethics doesn’t matter, or one might wonder when does metaethics ever matter in real life anyway? I’m curious, do you have any thoughts on whether metaethics matters at all for AI alignment?

Peter Railton: Well, in the most general sense, metaethics concerns questions about the nature of morality, its foundation, the possibility of moral knowledge, how we might acquire it, the meanings of moral claims, how they stand in relation to our other forms of knowledge. And so it does seem to me as if metaethics is important in thinking about the problems of ethics in AI, partly because I think a lot of people have, in the back of their minds, skeptical concerns about morality. And therefore, they doubt whether there could be objective value. They think perhaps value is entirely subjective. And if that’s your approach, then you might say the challenge of creating ethical AI is not a very well-defined problem.

What would be the subjective attitude of a properly aligned AI system? You might consult the population and find out what the average point of view is. But we know the average point of view right now is very different from what it was 200 or 300 years ago. We think in some ways it’s improved since then. And we think in some ways where we are now could be improved. So we can’t reduce the question of ethics in AI to something like opinion sampling, and that’s because morality has objective dimensions and we use these to criticize our preferences and our opinions. And so any decent ethics for AI would build into the concept the possibility of correction and criticism. And for that, you need some thought of what would constitute correction or criticism. How would we justify moral claims? And that takes us to the heart of metaethics.

Lucas Perry: Right. And there’s a lot of moral anti-realists or people who think that morality is subjective in, I guess, hard sciences and computer science in general. So this also applies to the alignment community. If one feels that moral claims or moral attitudes are subjective, then this choice that you mentioned to take the average of general popular opinion is itself a moral choice, which is the expression of one’s own subjective moral attitude from that point of view. And within a subjective framework, there’s no way to resolve that, except take the expression of all of the power dynamics of everyone’s subjective moral attitudes and see what comes out of that, right?

Peter Railton: Well, yeah, that would be one of the problems. The project of creating ethical AI or AI alignment, as it’s sometimes called, can’t be the problem of giving our value system to machines because there is no unique value system that we possess. It could be the project of trying to make it possible for the machine to learn the most justified value system. And part of the problem, I think, is that people have exaggerated notions of what it would take to justify moral claims. They assume, for example, that there’s a huge gulf between facts and values, that there are no reasonable ways of bridging that gulf, and that in general, what it would take to have objective morality would look something like the universe with what God would do, only without God.

One of the problems with that thought is that that’s a model of morality as a set of commands given by some kind of a divine enforcer. And if you think that absent such a divine enforcer, morality could only be subjective, then I think you’re missing the idea of what morality really does. The existence of a divine enforcer wouldn’t bring morality into existence. A divine enforcer could be either good or malevolent. And so understanding what it is to do moral criticism should be an integral part of the challenge of thinking about ethics and AI. But looking at moral criticism, we have many practices of moral criticism, and those aren’t, strictly speaking, subjective, and we value them because they help correct our subjective opinions.

Lucas Perry: So I think there’s two parts of metaethics that I would like to see if you have any thoughts on how they may or may not apply here. Metaethical epistemology, how is it that you know things about metaethics? And whatever may be metaphysically true about ethics or not. So you brought up religion there. So in terms of, I guess, what would be called Divine Command Theory, morality would have a metaphysically very solid ground as being codified by God or something like that.

Peter Railton: Actually, I’d say that that wouldn’t get us a solid metaphysical ground. The fact that commands come from a being that is supremely powerful, and even one that’s supremely knowing, would not make those commands moral commands. Those conditions are perfectly compatible with immoral values. What we would need is a perfectly knowledgeable, entirely powerful, and all good God. A so-called AAA God. But that means the concept of good is independent of the concept of God itself, and understanding what it would be for the commands of a divine super powerful being to be good just takes us right back to the question of the nature of morality. We don’t solve it by introducing supreme beings.

Lucas Perry: Right, right. So I’m not trying to justify or lay out the Divine Command Theory. Only using it to, I guess, attempt to explain how epistemology and metaphysics fit into metaethics. To me, it seems like what is relevant here to AI alignment is that how one believes one can know things about metaethics and whether or not there can be agreement upon metaethical epistemology would be the foundation upon which metaethical moral learning machine systems could be expressed.

There is sort of a meta view on the epistemology of metaethics, where one could say, “Because there are no moral facts, the epistemology is whatever human beings are doing to think about moral thought.” And there isn’t a correct epistemology. Whereas one could, whether through naturalism in your metaethics, or through non-naturalism in Peter Singer’s ethics, believe there to be moral truths, and that thus there is a correct epistemology about metaethics, and that that epistemology of metaethics could be used to instantiate metaethical learning in machine systems.

Peter Railton: So one thought would be, there is one true morality and we’re capable of knowing it. That itself wouldn’t get us very far in epistemology until we could say what those methods of knowing are. An approach that’s got something like that as an assumption, but that doesn’t assume that we know what the destination is ultimately going to be, would be to ask, “Do we have good practices of moral criticism? And do those help us to solve actual problems, social problems, interpersonal problems, problems with our own lives?” And then to look at the ways in which we use morality in these contexts to solve problems.

And that brings it down to the level that it’s something that comes within the scope of what can be learned. And if we look at children’s learning, we see that their development as moral creatures proceeds in pace with their understanding of causality, their understanding of theory of mind, their capacity to form counterfactual thoughts, because it’s really an integrated body of general understanding. And so for example, the idea of positive-sum solutions to game-theoretic challenges, that’s something that can be agreed upon by all parties to be a desirable thing. And so looking at strategies that have the possibility of yielding positive sums, cooperative strategies, strategies of trustworthiness, signaling strategies which enable us to coordinate with each other, understand each other’s intentions, those have a justification that we can give in terms that are not tied to any one particular person’s interests, which address interests generally, and which we can defend in an impartial way.

And so that would be an example of a way in which we could say those are more reasonable solutions, more justified solutions. There’s an analogy here with epistemology generally. If someone were to come to me and say, “Well, you claim to have knowledge, how do you demonstrate that? How would you show that your understanding of knowledge is genuine knowledge?” I’d have to say, “Well, sorry, I can’t demonstrate that. Any demonstration would presuppose knowledge. And so I can’t pull it out of a hat and I can’t derive it from nothing.” So what can I do? I can say, “Well, here are our practices of epistemic criticism. And while we have disagreements in various places about what counts as evidence or what does not, do those practices deliver the kinds of results that we would expect from reasonable epistemologies, making possible things like scientific inquiry and technology and so on?”

And we can say, “Well, that’s what epistemology could be expected to give us. We do have methods that can improve our ability to solve such problems in just those ways. We can find various ways to justify them in terms of probabilities, looking for ways in which we can increase accuracy and estimations.” And so those are different ways in which by looking at our actual practices of epistemic criticism, we try to get some traction on the problem of knowledge. And I would argue we should do the same thing about morality. If we start from the standpoint of skepticism, in the case of knowledge, we will end with skepticism. The same would be true with ethics, but I see no more reason to do it in ethics than in epistemology. We surely must know a great deal about what’s good for us, good for one another. And we have well-developed practices of moral assessment that we use in our own lives, and we use in our collective institutions. So I would say, if we look to those, then we don’t see just subjective opinion. It’s quite different from that and we see a lot of constraints.

Lucas Perry: So I do want to explore more arguments around metaethics with you. And we’re intending to do that after we discuss moral learning here. Now, in terms of moral epistemology and the epistemology of metaethics, I’m interested in this part of the conversation in setting up and attempting to illustrate that, whether one takes a skeptical view on moral epistemology or not, moral learning and our view on moral epistemology are essential and important in the alignment and development of AI systems. And here you’re defending a more realist account of epistemology in ethics.

Peter Railton: Well, you could say that I, myself, am a realist, but what I’ve been saying so far, a pragmatist about ethics could say just as well. John Dewey would say something very similar. Various kinds of non-realists who are nonetheless objectivists in ethics: Kantians, for example, Constructivists, and so on. What I’ve said is really neutral territory for a wide range of views in metaethics. And it doesn’t presuppose, in particular, a form of naturalism or a form of realism. That’s actually a tremendous amount to build upon so that when we think about how to design robots to understand the world, we have a lot of knowledge about what sorts of systems would be well-designed for doing that.

Similarly, if we want to build a robot who can interact creatively and productively with other robots, solve problems of coordination, reduce conflict, realize longterm goals, interact successfully with people, recognize their interests, take their interests into account, being relatively impartial with regard to interests that are at stake, those are not mysterious in the same way that the skeptic seems to think they are. Because again, they’re already integrated in our practice and as Hume pointed out a long time ago, skepticism doesn’t survive very well once we leave the closeted philosophical study. People go out and they act as if they had knowledge of the world and they act as if there are things that people could do to them or that they could do that would be better or worse, right or wrong. They think about how they would treat their children. They think about how they should behave with respect to their students or their professor. That doesn’t take us into the misty realms of metaphysics, but it does take us into the practices of moral criticism and self criticism.

Lucas Perry: So could you unpack just a little bit more about why this view is neutral?

Peter Railton: So for example, I’ve mentioned a couple of features of moral thought. One feature of moral thought is that it takes a kind of impartiality seriously. It gives equal weight to all those affected. That’s something that Kantians and Utilitarians and many other moral theorists would agree on. Another feature of moral thought is that it’s concerned with general reasons. Similar cases have to be treated in a similar way. That leads to a doctrine known as supervenience. We can’t invent moral distinctions that don’t correspond to real distinctions, in fact. Another feature is that morality has to do with reciprocity, relations of mutual gain and mutual benefit. Another is that morality involves taking oneself and others as ends and not as mere means.

Those are all features shared across normative theories. But if you then ask, well, “What about the metaethical side? Could a pragmatist about ethics say the same things?” And the answer seems to be, yes, the pragmatist sees ethics as essentially about people solving the human problems that they face in ways that meet these kinds of desiderata. The person who believes that there’s a rationalist foundation believes that you can know a priori that these constraints exist, of impartiality and so on. But as you can see from Singer’s work, the result of applying his form of rationalism is not dramatically different from the results of applying my form of naturalism. And that’s because the target that we’re all working on, ethics that is, has a great deal of determinate structure. And so any metaethical theory is going to have to capture a lot of that structure.

Lucas Perry: And so, sorry, how is this instructive for why metaethics matters for AI alignment?

Peter Railton: Well, the suggestion was, well, we should know something about what ethics is in order to answer that question about how we might gain moral knowledge. If we can gain moral knowledge, what might moral knowledge consist in? That’s where we started. And then I tried to suggest a bunch of considerations, a bunch of features that I could call obvious features of morality in practice. Because I think our practice is not just at the normative level. People also have implicit metaviews in ethics. They demonstrate that by, for example, their knowledge of how you can determine morally relevant considerations in situations. So they understand what kinds of considerations are or aren’t morally relevant. They understand the distinction between morality and etiquette, between morality and law, between morality and self-interest. So they have a grasp of a bunch of these obvious features of morality.

And those are not just features of one or another normative theory. They’re features of virtually all normative theories and features that any metaethic is going to have to accommodate, unless it’s going to be skeptical. So that’s why I say that there’s a great deal of common ground, not because the fundamental explanations are going to be the same, but there is an explanatory target, which has a great deal of structure and which indeed all these theories have to explain. And that requires then that metaethical theories be adequate to that.

Lucas Perry: I see. So that is already structuring metaethical epistemology is what you’re saying?

Peter Railton: Yeah. It gives you quite a bit of structure.

Lucas Perry: Yeah. That’s just reminding me of how Peter Singer, in his book The Point of View of the Universe, discusses a philosopher who argued that there are a few axioms of morality, and they seem to touch upon these convergent principles that you’re talking about here. Now, on a realist’s account of metaethics, there would be something like a one true moral theory. And if one takes the one true moral theory view seriously, then the problem of AI alignment would be to cultivate a procedure for coming up with the correct moral epistemology in order to find the one true moral theory, or to discover the one true moral theory ourselves, and then align AI systems to that.

Now, if one believes that there is not one true moral theory, and there is only the evolution and extrapolation of human normative processes, and preferences, and metapreferences, then one might not want to come at the AI alignment problem from the perspective of a one true moral theory approach. And as a general note, I’m taking this language from Iason Gabriel, who will be on the podcast soon. And so in the second scenario, that is not using the one true moral theory approach, one would want to come up with a broadly acceptable procedure for aligning AI systems that didn’t presume to try to discover a one true moral theory. Do you have any reactions to these two ideas or approaches to alignment?

Peter Railton: Yeah. The question of whether one thinks there is one true theory is somewhat different from the question of whether we’re anywhere close to it or have good ways of knowing it. I myself, although I’m a realist, I recognize that there’s a good chance that my moral views are wrong and my metaethical views are wrong. And so I don’t want to just put all of my energy into thinking, “Well, how would we discover the one true moral theory?” I would want to think more robustly. And again, I can make an analogy with epistemology. If you go into a philosophy of science department or a statistics department, you’ll find that there’s a tremendous debate between people who think that Bayesianism is the right kind of approach for evidence and people who think that standard methods of social science are the best methods of evidence gathering.

You’ll find a tremendous amount of disagreement. So if we’re trying to build a robot who understands its environment, we don’t want to say, “Well, we have to figure out which one of those theories is correct before we can build a robot to understand its environment.” You might say, “We want a robot that’s got a robust capacity to learn, and that would deliver results reasonably approximated by a Bayesian, or an inductivist, or someone using social science statistics. They’re not going to agree on everything. Where there’s overlap, we should try to build a machine that can stay in the overlap, and we should try to build a machine that’s not brittle, one that doesn’t make epistemic commitments at the far edge of one or another of these views.”

And so I would say our task is to build a system that’s robust. And that means building into it the fact that we don’t know what this one true theory is. And so therefore we want as far as possible to accommodate an array of approaches, all of which have very strong reasoning behind them. You could think that we’re not trying to build an AI system that discovers the one true theory. We’re trying to build one that isn’t going to be dependent upon exactly the target that it hits, but rather could be successful in an array of possible environments.

Lucas Perry: So, I mean, adjacent to this and promoted and discussed by people like Toby Ord and William MacAskill, would be this human existential procedure for moving into the future, where it’s like, we’re going to align AI systems, whatever that means. And that alignment will hopefully not lock in any values or any particular kind of alignment procedure, but will ensure existential security for humanity, such that existential risk just keeps going down to zero and is near zero. And then we use this existentially secure situation to do a long reflection on value, and what is good, and what may be true or not true about ethics. And then with sufficient consideration, then we can engage in populating the stars and optimizing things the way that we see fit. So what is your view on this proposed long reflection?

Peter Railton: Insofar as I understand it, I don’t have any objection to it. I’m not sure I do understand it. One of the things that you said just in passing was that we were going to try to design these systems to behave as we see fit. I myself am not sure I know how it is fit to behave. And I certainly know that I have some mistaken beliefs about that. And I would hope that just as artificial intelligence may help us correct certain of our views on cosmology or in medicine, artificial intelligence could help us correct certain of our views in ethics.

We’ve seen a tremendous amount of evolution in people’s fundamental moral convictions over time. Some have stayed relatively similar. Others have changed dramatically. And we would, I think, do best to think of the artificial extension of intelligence as one way in which we can get a perspective on these issues and situations and problems that isn’t just our own, and that won’t have the same priors as our own, and won’t have the same presuppositions, and they should be included. We should think of these as agents.

They will have interests just as we have interests, and the standard would not be, what do we see fit, where we mean something like we humans, but what will we see fit as we, the humans and the artificial systems continue our evolution and our cultural development. And we want to think that the path that we should follow is one that leaves open that kind of development rather than constraining it to fit what happens to be our current set of moral convictions, which again are not shared. There are too many disagreements in order to think that we could just write down the rules. Long reflection, I think will also tell us that we need a dynamic picture. And we should have some convictions that are more confident, closer to the core. We should have methods and practices that meet reasonable standards of justification and objectivity, and we should be prepared to learn.

I can’t, I’m afraid, think of a way to guarantee against the existential risk from artificial intelligence or even our own intelligence, which may be more problematic. But I do suspect that the best way to contend with problems with existential risk is to face them as communities of inquirers.

Lucas Perry: All right. So I think you’ve done an excellent job explaining the importance of moral learning and moral epistemology here, given that the ongoing cultivation of more wholesome and enlightened moral value and moral thinking is always on the horizon. Now, you have some perspective and research that you’ve done on moral learning in humans and the importance and necessity of that. I’m curious here now then to relate some of that research that you’ve done in moral learning in humans to how AI systems of increasing autonomy may also wish to take on the kind of moral epistemology that infants and young humans may have.

Peter Railton: I wouldn’t say that I’ve done research in this exactly. I’ve certainly explored others’ research in this and tried as best I can to learn from it. One of the things that’s impressed me in the literature as it’s evolved over the last couple of decades is how much the learning of children is accomplished not via explicit teaching, but by the children’s own experience. What we’ve learned recently, and this is not from developmental psychology, but from various kinds of models of machine learning, is that very complex structures can be learned experientially. There are powerful techniques which we can add to that kind of probabilistic learning in order to create knowledge of general principles, to do something like build a structured understanding of language that would enable a child to speak fluently, to understand what others are saying and to engage with them, and that does not require either an innate grammar or explicit instruction in language as such. That’s a kind of a model of how we also seem to acquire our social normative knowledge.

If you think about the perspective of the infant, one thing that we’ve learned from the animal research is that animals don’t just build a spatial map in relation to themselves. They don’t just build an egocentric map of their environment. They also build grid-like maps that are non-perspectival, and they navigate by combining these two kinds of information, perspectival and non-perspectival information. Infants seem to do something similar in learning about learning. They not only represent their relations with individual adults and whether those benefit them or not, but they also seem to construct general representations of whether a given adult is competent or helpful in third party interactions and to use that aperspectival information to make decisions about who they’re going to learn from or pay more attention to. They start doing this surprisingly early on. And so at the same time that they’re constructing the ego-centered world, they’re constructing a non-centered representation of the world that includes normative features like reliability, competency, helpfulness, cooperativeness.

And so the child, in coming to represent the world around them, is constructing representations that have the initial form of moral representations. It turns out to be efficient for learning to be a successful human being that one constructs representations spontaneously that have this quasi-moral structure. And that would suggest to me that if machines develop as agents, agents interacting with other agents, agents capable of solving a range of problems, capable of having sustained interactions with humans to solve open-ended problems, that they will also find that they do better if they can construct these quasi-moral representations of situations. And so that means that they will be acquiring sensitivity to morally relevant information through the very task of acquiring social competence, linguistic competence, epistemic competence in a social world.

So there’s a kind of picture here that coheres nicely with the fact that we now know that complex models can be acquired through experiential learning. That suggests that there is a promising pathway toward the development of theory of mind, causal inference, representation of social value from an objective or non-personal perspective. There is an argument for thinking that that’s actually a fundamental core part of our capacity as intelligent beings capable of successful social interaction. That suggests that this is not a peculiarity. It’s not culturally specific. And so why not use similar methods in our interactions with artificial agents to enable artificial agents to acquire these kinds of quasi-moral mappings?

Lucas Perry: So the key thing to draw out from here is that there is this distinction between explicit and implicit learning of morality, and you’re remarking about how there isn’t much explicit moral learning in infants and children. Most of this moral learning comes from simply experience and interacting with the world rather than explicit instruction about what is right and wrong.

Peter Railton: There’s tremendous cultural variability in that within our society and across societies as to how much explicit moral instruction children are given. What’s fascinating is that even in societies where children get very little explicit moral instruction, they nonetheless acquire these capacities. Similarly with language, there are some societies like upper middle class US society where parents talk extensively with children. There are other societies where parents do not, and yet the children can become fluent linguistic agents. So my thought is that the explicit theory isn’t really the thing that’s doing the fundamental work. Even to understand what parents are trying to do when they give you explicit moral instructions, to understand how to apply those or what they might mean, the child is already going to have to have quite a complex aperspectival representation of the social situation. The thought here is that there’s some places explicit theory, some places less explicit theory, but the results in terms of the development of behaviors are very similar.

A good example of this is that around age three or four, children who are given a command by an adult in authority will balk and refuse to perform it if that command violates a reasonable norm against harm. So if a substitute teacher comes in one day and says, “I’m the teacher today, and in my classroom, you have to raise your hand before you speak,” children in the classroom will start raising their hand before they speak. If the teacher says instead, “I’m the teacher here, and in my classroom, children jab the point of their pencil into the child next to them when they wish to speak,” they’ll stop. They won’t do this. And if they’re asked why, they won’t say, “Well, that’s not the way we do it.” They’ll say, “It would harm the other child.”

And so that suggests that even an attempt by a figure of authority to give a norm in a situation where children can perfectly well understand that there is a scope of legitimate authority, put your hand up before you speak, they will distinguish between that kind of conventional authority and moral authority. And that’s an autonomous action on their part. They’re not getting rewarded for it. In fact, the teacher may send them out of the room or send a note home to their parents, but they balk because they can represent the situation in these quasi-moral terms. And when they do that, they say, “No, this is not a good solution to the problem.” That suggests to me that even if we were to think that children learn by being given explicit instructions by people in authority, they actually independently learn that they can resist that and will resist it.

Lucas Perry: Right. So we’re in a position where evolution has cultivated and embedded in us a kind of moral learning, where there is a certain degree of implicit and explicit moral learning, depending on your culture and where you’re from. And as you’re saying, luckily there’s strong convergence in this ability of moral learning to lead human beings to agree, say in the case of stabbing the other child, on something like a principle against unnecessary harm to another person. That seems to be, for most human beings, something that is strongly converged upon pretty early, unless your environment is particularly pernicious or something. And there is this convergence because of how our moral learning is structured, given evolution. And that moral learning enables in us a kind of moral autonomy that’s there from an early age.

And there is a question of how this moral learning is best structured in say both people and in machine systems. And then there’s the question of moral learning from the outside. What kind of environment is most conducive to moral learning? Are there insights into this that can begin pivoting us into the relationship or importance of moral learning in AI systems?

Peter Railton: Perhaps so. Actually there’s a fair amount of evidence that even infants brought up in some very difficult situations will nonetheless develop these forms of prosociality and cooperativeness, partly because they become especially important in those situations even to solving the most basic problems or meeting the most basic needs. So I wouldn’t think that the mere difficulty of the situation was sufficient to prevent this kind of learning. On the other hand, if the child is given the wrong incentives, they’re also going to learn a whole bunch of other stuff, like you can’t count on other people, you can’t trust other people.

So put this from the standpoint of artificial agents. We want the artificial agents in our world, whether they’re a companion for an elderly person or an autonomous vehicle or a telephone answering service system, we want those systems to be sensitive to these kinds of moral considerations and capable of a degree of autonomy. If, for example, there is a system that’s looking after an elderly person and some vital sign of the elderly person is showing a problem, and the person says, “I don’t want to report that. I don’t like having people know this information about me,” or maybe they’re concerned that the doctor will prescribe something that they won’t like, I hope to have systems which can, in that situation, think, “Is this the kind of thing that I should keep from the physician? It’s the preference of this individual, but this preference may not be in the best interest of the individual in this case.”

And so an autonomous system would be able to make that kind of assessment. Could get it wrong, could get it right, could learn from it, but I wouldn’t want a system to be such that they would simply take over wholesale the preferences of the person that they are interacting with. And of course the same thing is going to be true with self-driving cars and with question answering systems and so on. They will need a certain amount of autonomy in order to do those jobs effectively. And in order for that to happen, for them to have that autonomy, they’ll have to have their own representations of the moral structures of the situations and have the capacity to construct those.

I suspect that if we really do want to create intelligent systems that are capable of this kind of autonomous self-critical and critical moral thought, the way to do so is very much like the way children do so. And in so doing, we run the risk of creating some autonomous systems that won’t always agree with us. But have we done what’s appropriate so that when they exercise that autonomy, their chance of getting things right is at least as good as our chance of getting things right? So you could think of this in the kind of adversarial picture where you’re trying to see if you can discriminate between the moral judgments of the machine and the moral judgments of the individual, and the machine and the individual could be part of a learning process that improves the machine’s overall generative model of situations.
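
Railton’s “adversarial picture” is only gestured at here, but a toy sketch may help fix the idea. The snippet below is purely illustrative rather than anything Railton or the podcast proposes: every function name, parameter, and number in it is a hypothetical stand-in. It compares a machine’s graded moral judgments against a stand-in for a human’s, measures how often a crude discriminator can tell the two apart, and nudges the machine’s parameters until its judgments become hard to distinguish. A real system would learn the human side from observed judgments rather than a hard-coded formula, and the richer part of the proposal, jointly improving a generative model of situations, is not captured here at all.

```python
# Illustrative sketch only (hypothetical names and numbers throughout): a
# drastically simplified version of the "adversarial" idea, in which machine
# moral judgments are compared against a human stand-in and the machine's
# parameters are adjusted until a simple discriminator can no longer tell
# the two apart.
import random

random.seed(0)

def human_judgment(situation):
    # Stand-in for a human's graded moral assessment of a situation.
    return 0.8 * situation["harm"] - 0.3 * situation["benefit"]

def machine_judgment(situation, params):
    # The machine's current, learnable assessment of the same situation.
    return params["harm"] * situation["harm"] - params["benefit"] * situation["benefit"]

def distinguishable(human_j, machine_j, tol=0.1):
    # Crude "discriminator": can the two judgments be told apart?
    return abs(human_j - machine_j) > tol

def distinguishability_rate(params, trials=500):
    # How often the discriminator succeeds over random situations.
    hits = 0
    for _ in range(trials):
        s = {"harm": random.random(), "benefit": random.random()}
        if distinguishable(human_judgment(s), machine_judgment(s, params)):
            hits += 1
    return hits / trials

params = {"harm": 0.2, "benefit": 0.9}  # deliberately mis-calibrated start
print("distinguishable before training:", distinguishability_rate(params))

lr = 0.05
for _ in range(2000):
    s = {"harm": random.random(), "benefit": random.random()}
    error = machine_judgment(s, params) - human_judgment(s)
    # Gradient step on squared error, pulling machine judgments toward human ones.
    params["harm"] -= lr * error * s["harm"]
    params["benefit"] += lr * error * s["benefit"]

print("distinguishable after training:", distinguishability_rate(params))
print("learned parameters:", params)
```

The point of the sketch is only the loop structure: disagreement between human and machine judgments is surfaced and used as a training signal, rather than the machine simply inheriting whatever preferences it happens to be handed.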

Lucas Perry: So there would be the question of, how do you structure a system such that it can engage in moral learning in a way that would be broadly endorsed or would satisfy other ethical or meta-ethical principles that we have? That is double-edged insofar as if you screw it up, then the thing is autonomous and can disagree with you. And the capacity to disagree would either be detrimental in the case in which it is wrong in its moral learning, or it would be enlightening for both us and the world and the machine if it were right about morality when we weren’t. How do you think about and balance this risk between the possible enlightenment that may come from embedding AI systems with moral learning and also the potential catastrophe if it’s done too quickly and incorrectly?

Peter Railton: Yeah. Wish I had an answer. If you think about it, the existence of humans with malicious intentions means that if artificially intelligent systems don’t have this kind of moral autonomy, they’re going to be very willing servants. So you might say, “Well, there’s a risk on the other side, which is that if they aren’t capable of any kind of criticism or autonomy, then they will be much too willing and much too readily deployed and much too manipulable by humans whose purposes I’m afraid to say are not always benign.” If you were thinking about the problem of raising a child, you would say, “Well, I don’t want to raise a child who simply take orders. I want to raise a child who can raise questions as well.”

I think our only defense against malicious humans with extremely intelligent systems at their disposal is to try to ally with intelligent systems to create a comparable counterforce. And that counterforce is going to be operating way past our understanding, because it’s going to be in competition with systems that can operate extremely fast and take into account a large number of variables. And so we better be building systems which, as they get further and further out in this kind of a competition, have some kind of a core where they are responsive to morally relevant features even at the far extent of their development.

And so if you think about it as trying to build a moral core, then that core can figure in their operation even as they become more and more intelligent. They can use the intelligence to gain information and perspective and capacity to understand situations that can improve their understanding. But if we don’t do something like this, we and other artificial systems will really be prey to those who have malicious and manipulative intentions and want to implement them. So I balance the risk partly by thinking, I can’t think of a very good way to defend against the perils of malicious combinations of human and artificial intelligence other than to develop more trustworthy forms of human and artificial intelligence interaction. And that requires according these systems some autonomy and some trust.

Lucas Perry: That makes sense to me. And I think it addresses some important dimensions of the soon to be proliferation of AI.

Peter Railton: To me, one of the most exciting features of more recent developments in artificial intelligence is that they give us, for the first time, I think, a plausible model of intuitive knowledge: knowledge that could be implicit, but nonetheless be highly structured, contain a great deal of information, and contain a capacity to engage in simulation and evaluation. So I would expect that the structure of moral knowledge could be like our structure of common sense knowledge generally. It could be quite distributed. It could be quite a complicated system, not a system of extracted principles. There might be some general features that are important, and I think that’s bound to be true. And that is true when these systems learn, but we don’t have to think that the kind of competency that they would have, if it isn’t something like that, is therefore undisciplined and therefore lacks power or reliability.

So for the first time, anyhow, I thought here is a picture of how intuitive intelligence might look. And of course we can’t introspect the structure of such knowledge and it does not have a readily introspectable propositional structure. But it is capable nonetheless of carrying and modeling and engaging in quite complex computations, simulations, action guidance, control of motor systems in ways that look like intuitive intelligence. Now I realize we’re a long way from the way the brain actually functions, but even to have these models, it gives us a kind of proof of concept of the possibility of something like intuitive knowledge.

Lucas Perry: Right. So if we’re building AI systems as willing slaves who optimize the preferences of whoever is able to embed those in the machine, there’s no defense in that world against malevolent preferences other than not allowing the proliferation of AI to begin with.

Peter Railton: And we’re already past that point. Enough has proliferated and there’s enough inequality of wealth and power in the world to guarantee that other proliferation will take place. It’s already the case that we can’t count on keeping this genie in the bottle and obviously don’t want to do so. I’d say we’re now in the phase where we need to have an active, constructive program of starting to build AI agents that are actively responsive to morally relevant considerations, are good at solving coordination problems, are good at this kind of interaction and capable of the kind of insight needed to be potential moral agents.

Lucas Perry: Right. And you argue that as these systems increasingly inhabit social roles in society and are constantly interacting with other agents and with the world, it’s increasingly important that they be sensitive to morally relevant features. Without this, again, malevolent humans or humans with misaligned values that are counter to most of the rest of humanity can abuse or use systems more freely if they’re not already sensitive to morally relevant features. And that if there is an ecosystem of AIs, purely altruistic systems which are not tuned into morally relevant features can be abused by other AIs as well.

Peter Railton: Yes, that’s right. One thing that’s gotten me to feel some conviction about this possibility is that the one kind of experiments that I do run are thought experiments. And I’ve been for years running moral thought experiments in my moral philosophy classes. And in recent years, I’ve been able to do so using a system that allows students to confidentially record their answers to problems like moral dilemmas or questions about interpreting moral situations or motives. And what’s impressed me over the years is how coherent and consistent these responses are.

And what leapt out, for example, from the familiar trolley problem was that mediating their moral judgments seems to be a model of the agents that are involved, a model of what kind of an agent would perform an action of a certain kind, and what kind of responses such an agent would receive from others in the community. Would they be trustworthy? Would they not be trustworthy? And so, instead of thinking there’s just these arbitrary differences in preference between throwing a switch and pushing someone off a footbridge, and there’s no real principle there, and no one’s found a principle to cover these cases, you can think now there’s this intuitive competency people have in understanding situations and characters and what kinds of persons would respond in what ways in situations and what it would be like to have those persons in our community.

And once you look at it that way you can get a tremendous amount of consistency in people’s responses, which suggested to me that they are doing this kind of generative modeling of situations and doing so in a way that does predict to their actual judgments. And if I ask, “Well, why did you make that judgment?” they’ll say, “I don’t know. It was just an intuition.”

Lucas Perry: Yeah. So the thought experiment that you’re pointing to, a lot of people would flip the switch in the trolley thought experiment to divert it to the track where there’s only one person. And then you change it so that there’s a person on a bridge who is sufficiently large that, if you push them off the bridge, they will stop the trolley from killing five people on the track. The intuitive response that you’re pointing out is that people are less likely to want to push someone off of a bridge than to flip a switch. And you’re like, well, what’s really the difference? In the thought experiment, there’s not much of a difference, but the intuition that you’re pointing out, the morally relevant feature that is subtle and implicit, is that we don’t want to live in a world where there are the kinds of people who have the capacity to push people off of bridges.

Peter Railton: In that kind of a setting, yes.

Lucas Perry: Yeah.

Peter Railton: And you can give them a whole array of other scenarios in which the agent would have to do something like pushing someone to a grisly death and where they will agree that it should be done, for example in situations where self-defense is needed against a terrorist action. And again, if you ask them, “Well, would you trust an agent who would perform such an action?” the answer is they would actually have more trust in such an agent. So again, they’re modeling the situation, not in response to this or that minor tweak of the situational features, but in terms of a quite deep understanding of the motivations and attitudes that are involved. And then if you go over to the psychological literature, you find the dispositions to give the push verdict in the footbridge case correlate more with antisocial behavior, with lack of altruism, with lack of perspective taking, with indifference to harm than with altruism or any kind of a generalized utilitarian perspective. So the psychologists seem to confirm the understanding that my students implicitly had of the situation.

Lucas Perry: What’s relevant to extract here is that there are deep levels of morally salient features that human beings take into account, and that increasingly need to be modeled and understood by machine systems for them to successfully operate in the world.

Peter Railton: Yeah. And to be trustworthy. I’m one of those people who thinks emotion is not a magical substance either, and that artificial systems could have and acquire emotions. And that part of the answer to the question of how do you build a core that is resistant against certain types of manipulation is to look at how it’s done in humans and indeed in other animals, and discover that the affective system plays a pivotal role in just these kinds of situations. And so I suspect that’s another avenue of development. And children’s moral emotions undergo a similar kind of evolution through their upbringing, but through their direct experience, because the emotions are there before they’re told what to feel. Indeed, how would you tell a child what to feel?

Lucas Perry: Are there any other points that you’d like to wrap up here on then on the advantages of reflecting on AIs, which are sensitive to morally relevant features?

Peter Railton: I try to be as accurate as I can in understanding what we’re learning from the literature on prosociality, for example, both with regard to individual human development and with regard to human communities, going back, looking at hunter-gatherer communities. And even as there have been changes in morality, and I have emphasized that there have been changes over time, the kinds of features that people take to be morally relevant, many of those have been relatively constant. And you can think of the changes in our moral views that have taken place over the years as getting better and better at winnowing out the ones that aren’t really morally relevant, like gender, ethnicity, sexual orientation, and so on, because they can easily become culturally relevant without being morally relevant. Fortunately, we have the critical capacity as agents to challenge that.

Lucas Perry: Yeah, that makes sense. The core importance that I’m extracting from everything is the baseline importance of moral learning in general, and also the understanding and capturing what human normative processes are like and what they entail and how they unfold. And that participating in a world of humans requires knowledge of both moral learning and the ability to learn morally.

Peter Railton: And this is not saying that people will always behave well, just in the same way that acquiring linguistic competence doesn’t mean people are always going to speak well or truthfully, or honestly, but rather that the competency will be acquired. One example that I like is sexual orientation. When I was growing up, it was considered fatal for someone’s social identity to be discovered to be gay. And there was a great deal of belief about the characteristics of gay individuals. In the 90s and so on, a large number of gay individuals were courageous enough to indicate their orientation. And what was discovered, we all discovered, was that the world was full of gay individuals whom we admired, whom we had standard relationships with, who were excellent colleagues, coworkers, friends, and that therefore we had been operating on a bad dataset, because we, and here I’m talking about heterosexuals, had not really had sufficient experience with gay individuals. And so we could believe all kinds of things about them.

So I would emphasize that if it’s a learning system, it’s going to be very sensitive to the data. And if the data’s bad, the learning system is going to have a problem. So I don’t think it’s a magic solution, but I think the question to ask is, so how do we build on this? How do we provide more representative experiences and less biased samples so that the learning can take place and not pick up cultural biases?

Lucas Perry: Yeah, those are really big problems that exist today and a lot of the solution right now is human beings having to do a lot of hard work on datasets. We can’t keep that up forever. Something else is needed. I think this has been instructive about the importance of the structure of moral learning, and I want to pivot back into our discussion of metaethics and your conversations with Derek Parfit and what your metaethical view is and how views on metaethical epistemology or metaphysics may bring to bear intuitions about what moral learning is like or what it might entail. It’s Derek Parfit, right, who has essays on “Does Anything Really Matter?”

Peter Railton: Yes.

Lucas Perry: So I guess that’s the question here then for this part of the conversation: does anything really matter? So you were in conversation with Derek Parfit and it seems like your views have converged and are different in ways from Peter Singer’s, though it seems like you guys are all realists. Could you unpack and explain a little bit about the history here and what went down between you, Parfit, and Singer?

Peter Railton: Yeah, sure. I have to warn those who are listening, buckle up, this is going to have to be a philosophy talk, but I’m sure that many people have these philosophical questions themselves. So let’s just begin with the title that Parfit chose for his master work: On What Matters. And you might say that mattering is the core notion of value, that if you had a universe full of rocks, it would not matter to the rocks what happened. It would not matter to the rest of the universe what happened. And so there wouldn’t be any positive or negative value in that universe. Introduce creatures for whom something matters, even if it’s just as simple as nutrition or avoiding pain, then you can begin to talk about states of affairs as being better or worse than one another, about improving or degrading the situation or the characteristics of the world.

And so mattering is core to the idea of value. And once we grasp that, we begin to realize that value is not some new entity in the world. It’s not something we add to the world. Once you have mattering, then things will have value, and they’ll have positive and they’ll have negative value. And of course, for different creatures, different things will matter. And learning what matters to a creature is understanding what would be good or harmful to that creature, and this of course includes humans. So I was very moved when I was on a committee looking into questions of animal research, to know that the veterinarians learned a lot about what situations animals preferred and did what they could to try to give them situations in which they were happier, more lively, more disposed to cooperate and learn. And that means that they were trying to learn something about what matters to a rat.

And we now know a fair amount about what matters to a rat. Company matters, exercise, the capacity to engage in activities, build nests. And since these things matter to rats, we can give rats a good or a bad existence by thinking about, well, what does matter to rats? Now, what matters to rats is different from what matters to humans, but the basic idea is the same. So there’s value there and it’s thanks to the existence of creatures for whom something matters that value comes into existence in the world. That’s a perfectly naturalistic perspective. Treating value as something that is realized by natural states of affairs in the world. Now it turns out that even someone who’s an arch non-naturalist like Derek Parfit agrees that pain is bad, not because it has the non-natural property of being disvaluable, but because of what it’s like in its natural features; those features suffice to make the pain bad.

And if they didn't suffice to make the pain bad, there would be no value feature we could sprinkle on it that would make it bad. But given that it has those features, there is also no value feature we can sprinkle on it that will make it good. And so Parfit and I can agree that non-naturalism is important in ethics, not because the world is populated with non-natural entities like values. That's a widespread confusion. It reifies a notion of value as if it were some kind of new domain of entities. And naturally, once you've done that, it becomes very unclear how we learn about these entities and what relationship they have to the natural world. If instead you think, no, value is something that is brought into existence by certain relational features in the natural world, then you can say, "Ah, that's common ground between Derek Parfit and myself."

And if Derek's explaining what's bad about pain, he'll give the same explanation that I would give about what's bad about pain. So we agree on that. The badness in the case of pain: pain is really used for two different things. It's used for certain types of physical sensation, and it's used for suffering. That physical sensation isn't always suffering. So for example, when you put hot sauce on your food, you fire up pain circuits, but you enjoy that. You may seek the burn of exercise. And so there are times when the physical sensation of pain is sought and liked, desirable. It's part of good experiences. It shows that pain can matter in different ways. It's the mattering where the value resides, not in the physical sensation just in itself. So the mattering is a relationship between a subjectivity or agent and the physical sensation, and it could be positive or negative in a given case, but the value resides in that relationship.

Lucas Perry: But they’re just two contents of consciousness, right? There is the content of consciousness of the sensation of pain on my arm if I scratch it, and I might derive another sensation from that sensual pain, that is pleasure. Wouldn’t the goodness here need to come from this higher level, more pristine pleasure that I gain from the pain, which is more of an emotion and that which is intuitive to the other sensation or the other content of consciousness?

Peter Railton: I think you're right to bring in higher-level mental states as well. Because part of the reason why pain in certain circumstances is desirable is because of the representation that you have of it. And this is true with many features of the world: it is because you understand them in certain ways that they produce in you the positive or negative experience they do. And if you ask a psychologist, the positivity and the negativity in the mind does not reside in the impulses of the pain system or the pleasure or reward system. It resides in the affective system, which encodes value as positive or negative. And it encodes as well the behaviors and the responses that are characteristic of positive and negative value: positive is approach, negative is withdrawal. Fear involves a certain distinctive suite of responses. Anger involves another distinctive suite of responses. But the affective system is where the value is encoded, and that's the common currency of value in the brain.

So that's where we should be looking to discover value. And it's the affective system, which is the root of our emotions, whether they're aroused emotions like anger or fear or non-aroused emotions like assurance and trust, that encodes this relational feature of value. You're quite right to think that we should move up a level, and in doing so, we encounter the affective system and its properties. And it's a system that we share with all of our mammalian relatives and with other species as well. It's evolutionarily a highly conserved system. And that's because it is the core of valuation, and valuation is a core activity of living creatures, because they're going to base their actions on value assignments. You're right to think that in the mind you will have tiers and that you need to find the right level in order to understand what value or disvalue looks like in the mind.

Lucas Perry: So there's the view where some content of consciousness is clearly seen as bad given its nature. If some state of consciousness is like something, from a consciousness-realist perspective, and it is also natural because it's part of the natural world, it's a physical fact and there are facts about consciousness, then value comes in from what it's like to be conscious. Whereas it seems like you're bringing in the more computational and physical side of things, like an evaluative affective system, which may not be separate from how things are experienced in consciousness, but I feel confused about these two different levels and where the 'what matters' comes from.

Peter Railton: Well, yes, you're quite right. There are views about value in which it's only conscious states that could have value or disvalue. I don't particularly hold such a view. I think that we are intrinsically concerned with, and that there is intrinsic value in, non-conscious states. And that's why I wouldn't sign up for the experience machine. The experience machine could supposedly provide an unending stream of positive conscious states, but why wouldn't I sign on for it? Well, because the actual content of my values is not that I have certain conscious states; it's that I have certain relations with people, with the external world, that I have a certain engagement with things that have a consciousness and that matter. And so I wouldn't agree that the only place, the only locus of value or disvalue, is conscious states.

Lucas Perry: So then from a cosmological and evolutionary perspective, there has been the development and arising of sentient creatures on this planet who have ever-complexifying neural algorithms for modeling themselves and the world, making predictions, and interacting with it. And amongst these evolved architectures are evaluative ones, which take the shape of valuing or disvaluing certain aspects of the world. And so that is enough for you for talking about intrinsic value. You feel like you don't need to bring consciousness into it. You're fine with just talking about the computation.

Peter Railton: Oh, I think consciousness plays a role, because one of the good-making features is a positive state of consciousness. It's just not the only one. And so there are differences in the world that would not show up as differences in conscious states. And that's what the experience machine is meant to show, but which would nonetheless constitute things that matter in the sense of mattering that we were just describing, namely, that these are objects of concern, love, affection, interest on the positive side, objects of dislike, disvaluation, disapproval on the negative side. I don't think there's any reason to think that only conscious states can be loci of value, but it may be that consciousness plays a role.

Lucas Perry: So what are these other good making features and why are they good making?

Peter Railton: Well, take, for example, a theory like a preference satisfaction theory. I would prefer, other things equal, that after I'm gone, my children have lives that they find meaningful. Now is that because I want to have the positive experience of thinking that their lives are meaningful? No, I want them to have those lives. And so it's part of the content of my informed preferences, let's say the preferences that would survive information, that the world be such that my children have a certain kind of life. And you say, "Well, doesn't the meaningfulness of their life just consist in their conscious states?" And I'd say, "Well, no, not at all. I wouldn't think that a life in an experience machine would have the same meaning as a life, with a similar stream of conscious states, that was lived in engagement with the world."

And so when I want them to have meaningful lives, I want them to have lives in which they act in ways that matter to them, and in which they do things that matter to themselves and to others. And their intrinsic preferences, like my intrinsic preferences, aren't just going to be for conscious states. And so it may be that you need something like preference or interest to get value, mattering, off the ground, but the content of what interests us, or the content of what our preferences are, won't just be conscious states. So you can't satisfy my preferences just by giving me conscious states, for example.

Lucas Perry: So I don't share that intuition with you. I still don't understand why you feel that something like a preference is good-making. I guess that just comes down to intuitions. I mean, someone could ask me why I think consciousness is the only thing that is good-making, but I don't know, what is a preference? It's a concept about some computational architecture that prefers some state of the world over another. But when you pass away, for example, your preference goes away. So why does it need to be respected still? I mean, we're getting into some deep waters here, but is the short version of this that when you run these philosophical thought experiments, your intuitions aren't satisfied by consciousness being necessary and sufficient for value?

Peter Railton: Well, all of our knowledge, whether it’s knowledge of value or the external world, we can push it back to a point at which, again, we can’t give some further derivation of the assumption that we’re making. And so my thinking here is that it seems to me extremely plausible that the one intelligible notion I can get of something like value is that there can be a subjectivity such that states of affairs can go better or worse for that subjectivity. And then value would consist in that, which makes the states of affairs better or worse for that individual. And then I asked myself, well, does that satisfy our concept of value?

Well, value should have various different features, and we can list those. It should be something that, when we understand it, is intrinsically motivating. It should apply to the sorts of things that we ordinarily identify as being values. It should capture a certain role in the guidance of action; it should be something like a goal in action. We should see it as structuring the behavior of individuals. And when I look at all those conditions, I think, yeah, this satisfies those conditions. It's not a proof. It's just saying that if we lay down the conditions that we would give for something satisfying the concept of value, these states do indeed satisfy those conditions, and many other candidate states don't. But I can't tell you, for example, that you shouldn't have some other concept of calue instead of value and ask what would satisfy calue, in the same way that, in the case of knowledge of the external world, I can't give you a derivation of the importance of knowledge as opposed to shmowledge. You can operate with the concept of knowledge and see what it requires and see whether it would apply to what we are doing.

But that’s not a proof that there isn’t another scheme of shmowledge of which the same thing could be said. So that’s where we get down to these fundamental assumptions and can they be non arbitrary? Well, they can, for example, if, when applying them, you can be put in a situation where you give them up. A concept that we had, that we thought we were happy with, turns out to be confused. Or it turns out that the only things that would satisfy the concept are things which we ordinarily think the concept doesn’t apply to. So we think there’s a mismatch between the criteria and the paradigm cases. So it’s not arbitrary if you’re willing to use it critically, but it can’t be proven.

Lucas Perry: Okay, a bit of a side path from where you were heading with Parfit. I was curious about what you really meant when you said that you and Parfit agree about value being some natural thing, instead of having to sprinkle value on top of the world.

Peter Railton: The way I would put it, the disagreement that I have with Parfit is a disagreement at the conceptual level. Initially, at any rate, it looked like we had a conflict of opinions, because it looked as if he was committed to there being in the world these non-natural features, and that they somehow explained the role that value has in our lives. And I couldn't understand what that would mean, but he was perfectly content to say, "No, the good-making features are these natural features. They explain the role that value has in our lives, but our concept of value is a non-natural concept." And what does it mean to say that? Well, the same situation, the same configuration of matter, could be described with physical concepts, chemical concepts, biological concepts. Oh, it's an "organism." It could be described in social concepts: it's a person. Any given situation can be characterized in various different conceptual systems.

And it can be argued, plausibly, that you can't reduce, for example, the conceptual system of biology to the conceptual system of particle physics. Because biology deals in functions, reproduction, metabolism, and so on, and there's no one-to-one correspondence, no easy correspondence, between those functions and any particular physical realization. You could have living beings made out of carbon. You could have living beings made out of silicon. So the concept of a living being, the concept of an organism, is a concept of biology. It's a way to organize the description of the world and explanation, and biology is conceptually not reducible to physics. That doesn't mean biologists can ignore physics, because they think, most anyway do, that what satisfies their biological concepts are physical systems. And so it's an important question, what kinds of physical system would satisfy these concepts like self-replication and so on?

And so they do microbiology and they study the physical systems that do satisfy these concepts. But the point is that the conceptual system has a degree of autonomy from the physical system. And even discovering that self-replicating molecules have a certain chemical composition in this world is not discovering that the concept of a self-replicating organism is simply a physical concept. Parfit has the same view about normative concepts. He and I agree about what pain is and what makes pain bad, but he says you could describe a situation either, as you were saying, in terms of some physical or biochemical processes, or you could describe it as bad, or as good, or as something that ought not to exist. And that's another level of conceptual characterization. And his thought is that that level of conceptual characterization can't be reduced to the concepts of the natural order.

So there is an element in normative concepts that's always beyond what is translatable, without loss of meaning, into the natural. Once one recognizes that, then you can be as naturalistic as you like about the nature of value and also believe that the concept of value is a non-natural concept, just as you can be as physicalist as you like about the fundamental furniture of the universe and still believe that the biological level of description is not reducible to the physical level of description. You could say the same debate went on when people were thinking about life. In the 19th century, we find people thinking, well, there's got to be this special élan vital or spirit or something like that; you can't just take a bunch of matter and put it together and have life. By and by, biochemistry develops, and people realize, actually, you can put a bunch of matter together and have life.

And the same thing is true with value. You don't need some value-vital, some kind of further substance, to add to the world. You can put together the natural stuff of the world and get value. Once you frame it that way, then Parfit and I actually agree. Because when he talks about the irreducibility of the normative, he really means, should mean, and I think agrees that he means, a conceptual irreducibility. And once we establish that, then I can say, "Yes, I agree with you, normative concepts aren't definable entirely in terms of non-normative concepts; they involve some idea of ought or some idea of value that isn't present in the non-normative." But my interest as a philosopher and metaethicist is an interest in what kinds of natural conditions satisfy these concepts and how that makes it possible for us to have knowledge in a normative conceptual scheme, like ethics or the theory of value. So that's where I do my work. His work is done in carefully distinguishing the concepts.

Lucas Perry: So there is reality as it is, there is the base reality, base metaphysics, call it ultimate reality or whatever, and all human conceptualization supervenes upon that because it’s couched within that context and is identical to it. Yet that conceptualization you argue is lossy with respect to ultimate reality, because it doesn’t necessarily carve reality at the joints, but that conceptual structure is still supervenient upon it. And at the level of conceptualization, there are facts about the world that can be satisfied or not satisfied that will make some proposition true or false.

Peter Railton: Yeah.

Lucas Perry: So you’re arguing that value isn’t part of metaphysical bedrock, but metaphysical bedrock creates neural architectures that create concepts that contain within them necessary and sufficient conditions for being satisfied. And when agents are able to gain clarity with one another over concepts and satisfying necessary and sufficient conditions, then they can have concrete discussions about ethics.

Peter Railton: That would be one common basis. And so the image that Parfit gave in his first volume of On What Matters was that he thought, ultimately, you could see the utilitarian and the Kantian as climbing two different sides of the same mountain, so that they would eventually meet at the summit. I suggested to him, well, in metaethics, the same is the case: I'm a naturalist, I'm climbing one side of the mountain; you're a non-naturalist, you're climbing the other side of the mountain.

But as our views develop, and as we understand better the different elements of the views, then actually they’re going to come such that as we approach the summit, we aren’t really disagreeing with each other. And he accepted that picture. I would only add to what you were saying by way of summary. Our concepts typically don’t give us necessary and sufficient conditions, they are more open ended and open textured than that. And that’s part of why we can have unending debates about questions like value and so on.

But you mentioned truth, and one might say truth is another very good example of a concept that's not reducible to a concept of physics. Because truth presupposes representations, and representations are characterizable not in terms of their physical constituents, but in terms of their role in thought. And so people who are skeptical about value because they say, "I don't see where value is in the world," should be equally skeptical about truth. Because truth is not some new substance we add. If there's a representation and it accurately reflects the world, then we have truth. So truth, again, is a relational matter between a subject, something like a representation in this case, and a state of the world, and it's when that relationship obtains that you get truth.

Lucas Perry: Right, but that’s truth in the epistemological agent centered sense, but then there’s the more metaphysical view about truth, where there are mind independent facts. And they’re true, whether or not we know anything about them. Maybe the same distinction is important here to make. There are potentially moral truths within the conceptual framework that we’re participating in. And it feels weird to me to call that moral realism. But then there’s another claim where there’s mind independent truths about morality like that there’s an intrinsic quality to suffering that is what bad means. Does that make sense?

Peter Railton: I think you've put things in a very good way. One of the features of the setup that I was describing is that it's very easy to slide from a position that, for example, whenever a value judgment obtains, then some or other natural state obtains, to thinking that the natural state actually is the normative fact. It isn't; it satisfies the concept. And so you could have the concept of the good, and it could be that there are eternal truths about good, I suspect. That's a reasonable candidate, just as there can be eternal truths in mathematics. The claim isn't that the conceptual domain is somehow identical with the natural domain. It supervenes, but it's not a relationship of identity. And the language in which those claims are stated, and the way in which we adjudicate them, might be, as in the case of mathematical claims, quite a priori.

And that’s where Parfit’s view and mine differ and Singer’s likewise, because he thinks you can do this a priori in a way that I don’t think you can, but that’s a question in epistemology. It doesn’t require a different metaphysics in order to have that view. So you can be a physicalist and believe that there is mathematical truth. And that’s because, for example, you think that mathematical truths are true via a set of axioms, definitions, rules of inference. And so they are made true not by distributions of molecules, but by logical relationships that can be specified in terms of axioms and rules.

Lucas Perry: Okay. So I feel a little bit confused still about why your view is a kind of moral realism if it requires no strong metaphysical view. Whereas other moral realists that I’m familiar with hold a strong metaphysical view about suffering in consciousness and joy in consciousness as being the intrinsic valence carriers of value.

Peter Railton: Well, I'm not sure about the last part of your question. I'll have to think about how to interpret that. But am I a realist about organisms if I believe that the concept of an organism is distinct from any particular physical instantiation? Am I prevented from being a realist about organisms because I think the organismic level of description is irreducible to the physical level of description? You see, no, actually, because you think that the concept organism is satisfied by some physical system, you're a realist about organisms: you think there are organisms. To me that's a perfectly realist position. And the non-realist would say, "Well, I guess there aren't any organisms then, because they're not part of the fundamental furniture of the universe."

And I’d say, “Think of what an organism is. It’s not a piece of furniture, it’s a functionally organized arrangement. And because it’s functionally organized, it doesn’t correspond to any particular material, something or other. And for there to exist organisms is for there to exist the conditions such that the concept of organism is satisfied.” And that’s of course what most biologists believe. And so most biologists are realists about organisms.

Lucas Perry: If your intuitions changed about the reducibility of higher levels of knowledge to lower levels of knowledge, how would that affect your moral views? For example, there are views which say that concepts in biology, about reproduction and organisms, and concepts like life, are lossy when it comes to the actual furniture of reality, that they don't completely describe how things are, and that the concepts don't carve reality at the joints. So they provide predictions about the world, but predictions about a world that should be, and is, only best described by particle physics, for example. One might say an organism is a concept, though it does not carve reality at the joints, and the best understanding of it is at the level of particle physics. So taking a realist position about conceptual fictions seems dubious, if it amounts to defining reality as whatever satisfies some concept I have.

Peter Railton: What you're pointing to is a very interesting problem. I would say that biological concepts do carve things at the joints, because the biological level of organization yields a whole systematic set of laws and principles that turn out to be true in our universe. It's far from being an arbitrary stipulation or a fiction that something's an organism, and a tremendous amount follows from things being organisms and self-replicating and so on. And we have very elaborate mathematical theories about populations.

Lucas Perry: And are those laws though not reducible to other laws?

Peter Railton: The idea is that reducibility is the wrong concept to have here. Because the laws of population are laws that have to do with variables that aren't fundamental variables of physics. They have to do with, for example, issues about reproducibility, availability of resources and so on, and what counts as a resource depends upon the nature of the organism. So there's a level of organization, and similarly in chemistry.

Lucas Perry: But what if those variables are just the shape of lower level things?

Peter Railton: Well, they won't be, because if there were self-reproducing silicon-based organisms, they would obey similar population dynamics. Those principles govern functional organizations. So once you have self-replication, you have mutation, you have differential selection and so on, you'll get certain principles, whatever physical realization there is.

Lucas Perry: But it really doesn’t make sense to me that these higher level laws would not be completely supervenient on fundamental forces of nature.

Peter Railton: Oh, they’re supervenient, definitely. But supervenience does not imply reducibility, that’s really critical in this domain. And again, this is the problem that I think has led to a lot of confusion in this domain. A feature that is supervenient upon fundamental physics is perhaps part of a system of laws that provides joints in nature. Because if you went to another world and you found a form of life that had these basic features of self-replication, mutation, selection, you would expect to find similar population dynamics to Earth. And that similarity is a biological similarity. It’s not a similarity in terms of the basic physics of the situation. The physics are the same, but the constitution of these organisms is very different. And so you couldn’t infer from understanding just the physics that there would be this biological regularity. That’s what it means to say that it’s supervenient, but not reducible.

Again, I think you can be a realist about organisms because organism really is a concept that carves nature at the joints. And so we would be able to export our theory of organisms to worlds in which carbon was not abundant and self-replication was built out of something else. And that’s a way in which nature is lawfully organized, supports counterfactuals, supports explanations. And so that’s a way of thinking about what it means to say it’s supervenient, but not reducible. And I think the same thing is true with moral distinctions. And that’s why they’re learnable. That’s why infants can learn moral distinctions, even without being given moral concepts.

Lucas Perry: Yeah. So that's why I'm pushing on this point. Now that makes more sense to me in terms of moral statements, but when trying to make physical claims about how reality is, I feel more confused here, and maybe it's messing me up in other places. If all of the causality is governed by fundamental forces, then surely all concepts that try to map out the world are mapping a world that is governed by fundamental causality. All the laws that are derived at higher levels must be completely reducible to, and supervenient upon, or lossy to some extent with relation to, the fundamental causal forces. I don't think the claim is that, for example, principles of biology and life are causal in themselves. They're more like laws that we use to make predictions, but predictions about systems that are running on the fundamental laws of nature. The complex aggregation of those fundamental laws must aggregate in some way to come close to those laws of biology. What is wrong with this picture?

Peter Railton: Well, there may be nothing wrong with it. I think the laws of biology are not just descriptive. I think they support explanations, and they are used not just to redescribe reality, but to construct theories that show structure in reality, extremely important structure, that would not be visible if you were allowed only the predicates of fundamental physics. I guess I would say, from the standpoint of explanation, biology affords many explanations. Suppose somebody wants to know why the material that happens to be in my body is where it is right now. Well, there is some very complicated story at the level of fundamental microphysics, following all of these molecules, but it doesn't look like anything at all. Whereas if you can give an explanation in terms of evolution and social dynamics as to how these molecules got here, you may have a much more compact and comprehensible grasp of the world.

So I think biology affords us distinct modes of understanding and explanation, so does psychology, so does chemistry. One of the features of knowledge is that reality is organized at various levels in systems that are lawful systems and that support explanation and intervention and causation, but there’s no reason not to call this causation.

And so if somebody is describing the spread of the pandemic and they say, "Well, it's partially caused by the transmissibility of the virus, which is higher than that of the bird flu," we'll say, "Yes, that's a causal factor in the spread of the virus and in why these particular molecules are located in the world where they are." And that's a very powerful explanation. And if someone were just giving you a readout of the positions and momenta of all the different molecules of the world, you would not see this pattern and you'd have less understanding of the situation.

Lucas Perry: So tying this into your metaethics here, our ethical concepts are causally supervening on fundamental forces on physics. We’ve inherited them via evolution and they run on physics. But these concepts do not reduce to natural facts. There’s no goodness or badness built into the fundamental nature of the universe. These concepts are merely causal expressions of the universe playing out. And within the realm of this conceptualization, you can have truths about morality in the same way that you can have truths about biological organisms. And there’s a relationship here between what you might believe to be true about conceptualization and science and the epistemic status of concepts in general would also bear some information here on how one might think about the epistemic status of moral concepts.

Peter Railton: Yeah. Or thinking in terms of algorithms and systems. The systems-theoretic perspective gives you a lot of very well organized understanding as you grasp the algorithms that are at work and so on, but algorithm is not a concept of fundamental physics.

Lucas Perry: Right. So it's your view that moral facts, moral claims within conceptualization, hold the same epistemic status as claims about algorithms and biological organisms, and claims that we might make in things like chemistry or biology, which are at a higher level of abstraction than particle physics.

Peter Railton: Yes. And that's what would be called the naturalist view. And it's why someone like Peter Singer is a non-naturalist. He thinks the epistemology of moral judgments is a priori, and similarly with Derek Parfit, and they think it's an intuitive epistemology, and they think that the two go together because they believe in something called rational intuition. And I'm inclined to think of intuition the way we were describing it earlier on, namely, that it's a complex body of knowledge that isn't organized into simple principles, but that replicates an important set of morally relevant relations. That's really what intuition is. When we have these intuitions, it's that kind of knowledge, the way there are grammatical intuitions or knowledge like that. So we disagree about the epistemology, but with Parfit at least, and I'm not sure what Peter Singer would say, our disagreement's not metaphysical.

Lucas Perry: Right. I think the only place it seems like there would be space for a metaphysical disagreement would be in there being a kind of intrinsic good quality to pleasure and intrinsic bad quality to suffering that existed prior to conceptualization.

Peter Railton: I don't think anything about the badness of pain depends on our concepts. I think that pain was bad in the first organisms that felt pain. And if humans had never evolved and the concept of pain had never come into existence and the concept of bad had never come into existence, it would still be bad for these organisms to suffer, roasting to death in a world that desiccated, or something like that. Our concepts allow us to talk about these features. The word concept comes from two parts: con, meaning with, and cept, from a root meaning grasp. And a concept is what we use to grasp features of the universe, not to create them.

Lucas Perry: So there already would have been some computational structure that would have evaluated something as bad?

Peter Railton: It would have made it the case that this was bad for that organism. Yes, that’s right.

Lucas Perry: And that doesn’t bring consciousness into it or anything, that could be strictly computational?

Peter Railton: Thus far yeah. And there’s a big debate about whether states have to be conscious in order for them to have disvalue. And one of the reasons for thinking about that is because we’re thinking about the animal kingdom and we aren’t sure how deep into the animal ancestry of humans consciousness goes. I myself don’t think that consciousness is essential, but I recognize that that’s one position among many.

Lucas Perry: Yeah. I happen to think that it is. But I would like now to wrap up and integrate this discussion on metaethical epistemology into the broader conversation. So we've talked here a lot about what your metaethics is and the epistemology that is entailed by it, and also that of other people. That is related to moral learning, of course, because a proper moral epistemology is the vehicle by which one would obtain normative or metaethical moral knowledge. So how do you integrate the thinking that we've gone through here into the question of AI alignment? On one hand, if we were Singer or Parfit, we might think that if we just build something that's sufficiently rational, whatever that means, the axioms of morality would be intuitively accessible to such a machine system, strange as that seems to say, just as the axioms of mathematics would be. Whereas with your view I'm not quite sure what happens. So maybe you could explore this all a little for us.

Peter Railton: So if I could have my wish here, it would be that by getting an understanding of the metaethical landscape, and which problems are metaphysical, which ones aren't, which problems are epistemic, and how they are tractable in various ways, the temptation towards skepticism in morality would at least be a bit weakened. People would see how it would be possible for us to have moral knowledge, of course imperfect and evolving. They would understand, therefore, how it could be possible for other systems to have moral knowledge. And we could talk concretely about the kinds of processes by which infants, for example, acquire moral knowledge, and think about how systems could go through similar kinds of processes and acquire a core moral competency, as I think they can. Skepticism about morality, I think, has for a long time plagued the discipline, because it's been hard for people to see how we could have something like moral knowledge.

And that's been tied up with a picture of value, and the nature of value, as an unusual kind of something or other, as something that's not part of the way in which the world is put together. And so how would we ever have any kind of knowledge of it? And since we can't derive it from self-evident axioms, we have to be subjectivists. I would hope to have had at least a small effect in making that a somewhat less plausible position. Because I do think there's an important constructive project here, and it is already underway in developmental psychology. For example, people working around Josh Tenenbaum are working on this as a learning question. There's a lot of promise in understanding intuition in terms of deep learning and understanding moral competency in Bayesian terms. So I think there's a tremendous future for coming to have theories of moral learning. I'm glad psychologists have started using this phrase, and by giving a theory of moral learning that sorts out the metaethical landscape in the ways we've been describing, moral knowledge comes to seem more plausible.

Lucas Perry: Right. So in summary, then, the feeling that you have here is that people hold and walk around with common-sense intuitions about normative and metaethical thinking and what those things are, and that there is a more solid foundation for whatever moral realism might be in understanding around these issues. And that there can be strong convergence and formalization around moral learning, and then the integration of moral learning into machine systems, which would make them sensitive to morally relevant features and thus make them socially, societally, civilizationally competent, able to exist in an ecosystem of agents with more or less altruistic, malevolent, and benevolent values.

Peter Railton: And that we will need such systems badly as allies in the years to come. If I could just add one thought: someone's going to say, "But don't we have to have some priors about pain being bad, about positive-sum interactions being good?" Yes, you have to have some priors in order to engage in moral learning. And I would say we have to have priors to engage in any kind of learning. And what rationality and learning consist in is how we use subsequent experience and evidence and argument to revise the priors, and go on to create new priors, and then apply evidence and argument and reasoning to those. That's what rationality is; it's not starting from scratch with self-evident principles that we just don't happen to know. That's what Bayesians say rationality is with respect to science and the gathering of evidence. That's a picture of rationality in which we can be rational beings, and we can be more rational the more we are able to subject our priors to critical scrutiny and expose them to different kinds of more diverse, representative forms of evidence and reasoning.

And I think the same thing is true with moral priors. Rationality in the moral case doesn't consist in seeing a self-evident set of axioms, because I don't think one can, but in starting with priors and then learning from experience, argument, and deliberation together. In that sense, rationality in the two spheres is essentially very similar.
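As an editorial aside, and not anything from the episode itself, here is a minimal sketch of the Bayesian updating Railton appeals to; the coin-flipping setup, the two hypotheses, and all of the numbers are invented purely for illustration.

```python
# Rationality, on the Bayesian picture invoked above: start from a prior,
# revise it with evidence, and treat each posterior as the prior for the
# next round. The hypotheses and numbers here are invented for illustration.

prior = {"fair": 0.5, "biased": 0.5}       # initial credences over hypotheses
p_heads = {"fair": 0.5, "biased": 0.8}     # probability of heads under each

def update(beliefs, outcome):
    """Return posterior credences after observing one coin flip ('H' or 'T')."""
    unnormalized = {
        h: beliefs[h] * (p_heads[h] if outcome == "H" else 1 - p_heads[h])
        for h in beliefs
    }
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

beliefs = prior
for outcome in "HHHTHHHH":                 # a stream of evidence
    beliefs = update(beliefs, outcome)     # yesterday's posterior, today's prior

print(beliefs)                             # credence shifts toward "biased"
```

None of this settles where the priors come from; the point, as Railton says, is that rationality lies in how they are revised.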

Lucas Perry: All right, Peter, thanks so much for your perspective here in sharing all of this. Is there anything else here, any last thoughts you’d like to say, anything you feel unresolved about?

Peter Railton: I would like to have more engagement between philosophers and the AI alignment community. I think it's one of the most important problems we face as a culture, and it's an urgent problem. And it's painful to me that philosophers are not as alive to it as they should be. I just want to invite anyone who's out there working on the problem: please, let's try to make contact, not necessarily with me, but with other philosophers. And let's try to build a constructive community here. Because for too long, philosophy has been in the situation of folding its arms and sort of poo-pooing artificial intelligence or artificial ethics. And that view has merit in many areas; AI gets over-hyped a lot, and AI people will tell you that.

But there’s this other side, which is what has been accomplished, what has been constructed, what’s been shown to be possible. And how can we build on that? And there I think there’s a lot of opportunity for constructive interaction. So that would be my parting thought that this is a time when urgent work in this area is needed. Let’s bring all the resources we can to bear on it.

Lucas Perry: All right, beautiful thoughts to end on then. If people want to follow you on social media or get in contact with you, how’s the best way to do that?

Peter Railton: Well, I’m not on social media. The best way would be to reach me via email, which is prailton@umich.edu. I get a lot of email. I can’t promise I’ll respond quickly to emails, I wish I could. But I don’t want philosophy to lose the chance to be part of this important process.

Lucas Perry: All right. And if people want to check out your papers or work?

Peter Railton: I’m supposed to be building a website. I may succeed in doing so. Many of the papers are available. People have put them up in various ways. So if you go to Google Scholar, you can find many of my papers. And I also want to put in a plug for those philosophers who have heroically been working on these questions. They’ve done a great deal of work and we should be grateful for what they’ve accomplished. But yes, if people want to find my work, if they can’t get access to it, let me know and I’ll make the papers available.

Lucas Perry: All right, thanks again, Peter. It’s been really informative and I appreciate you coming on.

Peter Railton: Great. I appreciate your questions and your patience. This has been a very helpful conversation for me as well.

 

 

End of recorded material

Evan Hubinger on Inner Alignment, Outer Alignment, and Proposals for Building Safe Advanced AI

 Topics discussed in this episode include:

  • Inner and outer alignment
  • How and why inner alignment can fail
  • Training competitiveness and performance competitiveness
  • Evaluating imitative amplification, AI safety via debate, and microscope AI

 

Timestamps: 

0:00 Intro 

2:07 How Evan got into AI alignment research

4:42 What is AI alignment?

7:30 How Evan approaches AI alignment

13:05 What are inner alignment and outer alignment?

24:23 Gradient descent

36:30 Testing for inner alignment

38:38 Wrapping up on outer alignment

44:24 Why is inner alignment a priority?

45:30 How inner alignment fails

01:11:12 Training competitiveness and performance competitiveness

01:16:17 Evaluating proposals for building safe and advanced AI via inner and outer alignment, as well as training and performance competitiveness

01:17:30 Imitative amplification

01:23:00 AI safety via debate

01:26:32 Microscope AI

01:30:19 AGI timelines and humanity’s prospects for succeeding in AI alignment

01:34:45 Where to follow Evan and find more of his work

 

Works referenced: 

Risks from Learned Optimization in Advanced Machine Learning Systems

An overview of 11 proposals for building safe advanced AI 

Evan’s work at the Machine Intelligence Research Institute

Twitter

GitHub

LinkedIn

Facebook

 

We hope that you will continue to join in the conversations by following us or subscribing to our podcasts on Youtube, Spotify, SoundCloud, iTunes, Google Play, Stitcher, iHeartRadio, or your preferred podcast site/application. You can find all the AI Alignment Podcasts here.

You can listen to the podcast above or read the transcript below. 

Lucas Perry: Welcome to the AI Alignment Podcast. I’m Lucas Perry. Today we have a conversation with Evan Hubinger about ideas in two works of his: An overview of 11 proposals for building safe advanced AI and Risks from Learned Optimization in Advanced Machine Learning Systems. Some of the ideas covered in this podcast include inner alignment, outer alignment, training competitiveness, performance competitiveness, and how we can evaluate some highlighted proposals for safe advanced AI with these criteria. We especially focus in on the problem of inner alignment and go into quite a bit of detail on that. This podcast is a bit jargony, but if you don’t have a background in computer science, don’t worry. I don’t have a background in it either and Evan did an excellent job making this episode accessible. Whether you’re an AI alignment researcher or not, I think you’ll find this episode quite informative and digestible. I learned a lot about a whole other dimension of alignment that I previously wasn’t aware of, and feel this helped to give me a deeper and more holistic understanding of the problem. 

Evan Hubinger was an AI safety research intern at OpenAI before joining MIRI. His current work is aimed at solving inner alignment for iterated amplification. Evan was an author on “Risks from Learned Optimization in Advanced Machine Learning Systems,” was previously a MIRI intern, designed the functional programming language Coconut, and has done software engineering work at Google, Yelp, and Ripple. Evan studied math and computer science at Harvey Mudd College.

And with that, let’s get into our conversation with Evan Hubinger.

In general, I’m curious to know a little bit about your intellectual journey, and the evolution of your passions, and how that’s brought you to AI alignment. So what got you interested in computer science, and tell me a little bit about your journey to MIRI.

Evan Hubinger: I started computer science when I was pretty young. I started programming in middle school, playing around with Python, programming a bunch of stuff in my spare time. The first really big thing that I did, I wrote a functional programming language on top of Python. It was called Rabbit. It was really bad. It was interpreted in Python. And then I decided I would improve on that. I wrote another functional programming language on top of Python, called Coconut. Got a bunch of traction.

This was while I was in high school, starting to get into college. And this was also around the time I was reading a bunch of the sequences on LessWrong. I got sort of into that, and the rationality space, and I was following it a bunch. I also did a bunch of internships at various tech companies, doing software engineering and, especially, programming languages stuff.

Around halfway through my undergrad, I started running the Effective Altruism Club at Harvey Mudd College. And as part of running the Effective Altruism Club, I was trying to learn about all of these different cause areas, and how to use my career to do the most good. And I went to EA Global, and I met some MIRI people there. They invited me to do a programming internship at MIRI, where I did some engineering stuff, functional programming, dependent type theory stuff.

And then, while I was there, I went to the MIRI Summer Fellows program, which is this place where a bunch of people can come together and try to work on doing research, and stuff, for a period of time over the summer. I think it’s not happening now because of the pandemic, but it hopefully will happen again soon.

While I was there, I encountered some various different information, and people talking about AI safety stuff. And, in particular, I was really interested in this, at that time people were calling it, “optimization daemons.” This idea that there could be problems when you train a model for some objective function, but you don’t actually get a model that’s really trying to do what you trained it for. And so with some other people who were at the MIRI Summer Fellows program, we tried to dig into this problem, and we wrote this paper, Risks from Learned Optimization in Advanced Machine Learning Systems.

Some of the stuff I’ll probably be talking about in this podcast came from that paper. And then as a result of that paper, I also got a chance to work with and talk with Paul Christiano, at OpenAI. And he invited me to apply for an internship at OpenAI, so after I finished my undergrad, I went to OpenAI, and I did some theoretical research with Paul, there.

And then, when that was finished, I went to MIRI, where I currently am. And I’m doing sort of similar theoretical research to the research I was doing at OpenAI, but now I’m doing it at MIRI.

Lucas Perry: So that gives us a better sense of how you ended up in AI alignment. Now, you’ve been studying it for quite a while from a technical perspective. Could you explain what your take is on AI alignment, and just explain what you see as AI alignment?

Evan Hubinger: Sure. So I guess, broadly, I like to take a general approach to AI alignment. I sort of see the problem that we’re trying to solve as the problem of AI existential risk. It’s the problem of: it could be the case that, in the future, we have very advanced AIs that are not aligned with humanity, and do really bad things. I see AI alignment as the problem of trying to prevent that.

But there are, obviously, a lot of sub-components to that problem. And so, I like to make some particular divisions. Specifically, one of the divisions that I’m very fond of, is to split it between these concepts called inner alignment and outer alignment, which I’ll talk more about later. I also think that there’s a lot of different ways to think about what the problems are that these sorts of approaches are trying to solve. Inner alignment, outer alignment, what is the thing that we’re trying to approach, in terms of building an aligned AI?

And I also tend to fall into the Paul Christiano camp of thinking mostly about intent alignment, where the goal of trying to build AI systems, right now, as a thing that we should be doing to prevent AIs from being catastrophic, is focusing on how do we produce AI systems which are trying to do what we want. And I think that inner and outer alignment are the two big components of producing intent aligned AI systems. The goal is to, hopefully, reduce AI existential risk and make the future a better place.

Lucas Perry: Do the social, and governance, and ethical and moral philosophy considerations come much into this picture, for you, when you’re thinking about it?

Evan Hubinger: That’s a good question. There’s certainly a lot of philosophical components to trying to understand various different aspects of AI. What is intelligence? How do objective functions work? What is it that we actually want our AIs to do at the end of the day?

In my opinion, I think that a lot of those problems are not at the top of my list in terms of what I expect to be quite dangerous if we don’t solve them. I think a large part of the reason for that is because I’m optimistic about some of the AI safety proposals, such as amplification and debate, which aim to produce a sort of agent, in the case of amplification, which is trying to do what a huge tree of humans would do. And then the problem reduces to, rather than having to figure out, in the abstract, what is the objective that we should be trying to train an AI for, that, philosophically, we think would be utility maximizing, or good, or whatever, we can just be like, well, we trust that a huge tree of humans would do the right thing, and then sort of defer the problem to this huge tree of humans to figure out what, philosophically, is the right thing to do.

And there are similar arguments you can make with other situations, like debate, where we don’t necessarily have to solve all of these hard philosophical problems, if we can make use of some of these alignment techniques that can solve some of these problems for us.

Lucas Perry: So let’s get into, here, your specific approach to AI alignment. How is it that you approach AI alignment, and how does it differ from what MIRI does?

Evan Hubinger: So I think it's important to note, I certainly am not here speaking on behalf of MIRI, I'm just presenting my view, and my view is pretty distinct from the view of a lot of other people at MIRI. So I mentioned at the beginning that I used to work at OpenAI, and I did some work with Paul Christiano. And I think that my perspective is pretty influenced by that, as well, and so I come more from the perspective of what Paul calls prosaic AI alignment. Which is the idea that we don't know exactly what is going to happen as we develop AI into the future, but a good operating assumption is that we should start by trying to solve AI alignment for the case where there aren't major surprises on the road to AGI. What if we really just scale things up, we sort of go via the standard path, and we get really intelligent systems? Would we be able to align AI in that situation?

And that’s the question that I focus on the most, not because I don’t expect there to be surprises, but because I think that it’s a good research strategy. We don’t know what those surprises will be. Probably, our best guess is it’s going to look something like what we have now. So if we start by focusing on that, then hopefully we’ll be able to generate approaches which can successfully scale into the future. And so, because I have this sort of general research approach, I tend to focus more on: What are current machine learning systems doing? How do we think about them? And how would we make them inner aligned and outer aligned, if they were sort of scaled up into the future?

This is in contrast with the way I think a lot of other people at MIRI view this. I think a lot of people at MIRI think that if you go this route of prosaic AI, current machine learning scaled up, it’s very unlikely to be aligned. And so, instead, you have to search for some other understanding, some other way to potentially do artificial intelligence that isn’t just this standard, prosaic path that would be more easy to align, that would be safer. I think that’s a reasonable research strategy as well, but it’s not the strategy that I generally pursue in my research.

Lucas Perry: Could you paint a little bit more detailed a picture of, say, the world that the prosaic AI alignment strategy sees as potentially manifesting, where current machine learning algorithms, and the current paradigm of thinking in machine learning, are merely scaled up, and via that scaling up, we reach AGI or superintelligence?

Evan Hubinger: I mean, there's a lot of different ways to think about what it means for current AI, current machine learning, to be scaled up, because there's a lot of different forms of current machine learning. You could imagine an even bigger GPT-3, which is able to do highly intelligent reasoning. You could imagine we just do significantly more reinforcement learning in complex environments, and we end up with highly intelligent agents.

I think there’s a lot of different paths that you can go down that still fall into the category of prosaic AI. And a lot of the things that I do, as part of my research, is trying to understand those different paths, and compare them, and try to get to an understanding of… Even within the realm of prosaic AI, there’s so much happening right now in AI, and there’s so many different ways we could use current AI techniques to put them together in different ways to produce something potentially superintelligent, or highly capable and advanced. Which of those are most likely to be aligned? Which of those are the best paths to go down?

One of the pieces of research that I published, recently, was an overview and comparison of a bunch of the different possible paths to prosaic AGI. Different possible ways in which you could build advanced AI systems using current machine learning tools, and trying to understand which of those would be more or less aligned, and which would be more or less competitive.

Lucas Perry: So, you’re referring now, here, to this article, which is partly a motivation for this conversation, which is An Overview of 11 Proposals for Building Safe Advanced AI.

Evan Hubinger: That’s right.

Lucas Perry: All right. So, I think it’d be valuable if you could also help to paint a bit of a picture here of exactly the MIRI style approach to AI alignment. You said that they think that, if we work on AI alignment via this prosaic paradigm, that machine learning scaled up to superintelligence or beyond is unlikely to be aligned, so we probably need something else. Could you unpack this a bit more?

Evan Hubinger: Sure. I think that the biggest concern that a lot of people at MIRI have with trying to scale up prosaic AI is also the same concern that I have. There’s this really difficult, pernicious problem, which I call inner alignment, which is presented in the Risks from Learned Optimization paper that I was talking about previously, which I think many people at MIRI, as well as me, think that this inner alignment problem is the key stumbling block to really making prosaic AI work. I agree. I think that this is the biggest problem. But I’m more optimistic, in terms of, I think that there are possible approaches that we can take within the prosaic paradigm that could solve this inner alignment problem. And I think that is the biggest point of difference, is how difficult will inner alignment be?

Lucas Perry: So what that looks like is a lot more foundational work, and correct me if I’m wrong here, into mathematics, and principles in computer science, like optimization and what it means for something to be an optimizer, and what kind of properties that has. Is that right?

Evan Hubinger: Yeah. So in terms of some of the stuff that other people at MIRI work on, I think a good starting point would be the embedded agency sequence on the alignment forum, which gives a good overview of a lot of the things that the different Agent Foundations people, like Scott Garrabrant, Sam Eisenstat, Abram Demski, are working on.

Lucas Perry: All right. Now, you’ve brought up inner alignment as a crucial difference, here, in opinion. So could you unpack exactly what inner alignment is, and how it differs from outer alignment?

Evan Hubinger: This is a favorite topic of mine. A good starting point is trying to rewind, for a second, and really understand what it is that machine learning does. Fundamentally, when we do machine learning, there are a couple of components. We start with a parameter space of possible models, where a model, in this case, is some parameterization of a neural network, or some other type of parameterized function. And we have this large space of possible models, this large space of possible parameters, that we can put into our neural network. And then we have some loss function where, for a given parameterization for a particular model, we can check what is its behavior like on some environment. In supervised learning, we can ask how good are its predictions that it outputs. In an RL environment, we can ask how much reward does it get, when we sample some trajectory.

And then we have this gradient descent process, which samples some individual instances of behavior of the model, and then it tries to modify the model to do better in those instances. We search around this parameter space, trying to find models which have the best behavior on the training environment. This has a lot of great properties. This has managed to propel machine learning into being able to solve all of these very difficult problems that we don’t know how to write algorithms for ourselves.
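To make that loop concrete, here is a minimal sketch in PyTorch (the architecture, data, and hyperparameters are placeholders chosen purely for illustration, not anything from the conversation). The point to notice is that gradient descent only ever sees the loss on sampled training batches, and selects parameters that score well on those samples:

```python
import torch
import torch.nn as nn

# A "model" is just one point in a large parameter space: here, the weights
# of a small feed-forward network (a placeholder architecture for illustration).
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))

# The loss function scores a particular parameterization's behavior on data.
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Stand-in training distribution: random inputs with some "true" labels.
x_train = torch.randn(1000, 10)
y_train = x_train.sum(dim=1, keepdim=True)

for step in range(500):
    # Sample a batch of behavior from the current parameterization...
    idx = torch.randint(0, len(x_train), (32,))
    preds = model(x_train[idx])
    loss = loss_fn(preds, y_train[idx])

    # ...and nudge the parameters in the direction that improves it.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# All we know afterwards is that these parameters scored well on the sampled
# training data; nothing here constrains what the model is "trying" to do.
```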

But I think, because of this, there’s a tendency to rely on something which I call the does-the-right-thing abstraction. Which is that, well, because the model’s parameters were selected to produce the best behavior, according to the loss function, on the training distribution, we tend to think of the model as really trying to minimize that loss, really trying to get rewarded.

But in fact, in general, that’s not the case. The only thing that you know is that, on the cases where I sample data on the training distribution, my models seem to be doing pretty well. But you don’t know what the model is actually trying to do. You don’t know that it’s truly trying to optimize the loss, or some other thing. You just know that, well, it looked like it was doing a good job on the training distribution.

What that means is that this abstraction is quite leaky. There’s many different situations in which this can go wrong. And this general problem is referred to as robustness, or distributional shift. This problem of, well, what happens when you have a model, which you wanted it to be trying to minimize some loss, but you move it to some other distribution, you take it off the training data, what does it do, then?

And I think the starting point for understanding what inner alignment is, is this perspective of robustness and distributional shift. Inner alignment, specifically, is a particular type of robustness problem. And it's the particular type of robustness problem that occurs when you have a model which is, itself, an optimizer.

When you do machine learning, you’re searching over this huge space of different possible models, different possible parameterizations of a neural network, or some other function. And one type of function which could do well on many different environments, is a function which is running a search process, which is doing some sort of optimization. You could imagine I’m training a model to solve some maze environment. You could imagine a model which just learns some heuristics for when I should go left and right. Or you could imagine a model which looks at the whole maze, and does some planning algorithm, some search algorithm, which searches through the possible paths and finds the best one.

And this might do very well on the mazes. If you’re just running a training process, you might expect that you’ll get a model of this second form, that is running this search process, that is running some optimization process.
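As a rough illustration of that distinction, here is a toy sketch (hypothetical code, not from the Risks from Learned Optimization paper) contrasting a bag-of-heuristics policy with a policy that runs an internal search over paths, the kind of learned algorithm being described as an optimizer:

```python
from collections import deque

# Toy contrast (illustrative only): two kinds of "model" that could both do
# well on simple training mazes. One is a bag of heuristics; the other runs
# an internal search over paths. Mazes here are grids of 0 (open) and 1 (wall).

def heuristic_policy(maze, pos):
    """A fixed rule: take the first open neighbor, preferring right then down."""
    r, c = pos
    for nr, nc in [(r, c + 1), (r + 1, c), (r, c - 1), (r - 1, c)]:
        if 0 <= nr < len(maze) and 0 <= nc < len(maze[0]) and maze[nr][nc] == 0:
            return (nr, nc)
    return pos

def search_policy(maze, pos, goal):
    """An internal optimizer: breadth-first search for a shortest path,
    then take its first step."""
    queue, seen = deque([(pos, [])]), {pos}
    while queue:
        (r, c), path = queue.popleft()
        if (r, c) == goal:
            return path[0] if path else pos
        for nr, nc in [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]:
            if (0 <= nr < len(maze) and 0 <= nc < len(maze[0])
                    and maze[nr][nc] == 0 and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append(((nr, nc), path + [(nr, nc)]))
    return pos  # no path found

maze = [[0, 0, 1],
        [1, 0, 0],
        [1, 1, 0]]
print(heuristic_policy(maze, (0, 0)))        # (0, 1): just follows its fixed rule
print(search_policy(maze, (0, 0), (2, 2)))   # (0, 1): first step of a planned path
```

On an easy maze like this one, the two policies behave identically, which is exactly why training performance alone can't tell you which kind of algorithm you have found.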

In the Risks from Learned Optimization paper, we call models which are, themselves, running search processes mesa-optimizers, where “mesa” is just Greek, and it’s the opposite of meta. There’s a standard terminology in machine learning, this meta-optimization, where you can have an optimizer which is optimizing another optimizer. In mesa-optimization, it’s the opposite. It’s when you’re doing gradient descent, you have an optimizer, and you’re searching over models, and it just so happens that the model that you’re searching over happens to also be an optimizer. It’s one level below, rather than one level above. And so, because it’s one level below, we call it a mesa-optimizer.

And inner alignment is the question of how we align the objectives of mesa-optimizers. If you train a model, and that model is, itself, running an optimization process, then that optimization process is going to have some objective. It's going to have some thing that it's searching for. In a maze, maybe it's searching for: how do I get to the end of the maze? And the question is, how do you ensure that that objective is doing what you want?

If we go back to the does-the-right-thing abstraction that I mentioned previously, it's tempting to say, well, we trained this model to get to the end of the maze, so it should be trying to get to the end of the maze. But in fact, that's not, in general, the case. It could be optimizing for anything that is correlated with good performance, anything that, on the training distribution, generally results in it getting to the end of the maze, but that objective could lead it to do anything else off-distribution.

That fundamental robustness problem of, when you train a model, and that model has an objective, how do you ensure that that objective is the one that you trained it for? That’s the inner alignment problem.

Lucas Perry: And how does that stand, in relation with the outer alignment problem?

Evan Hubinger: So the outer alignment problem is, how do you actually produce objectives which are good to optimize for?

So the inner alignment problem is about aligning the model with the loss function, the thing you’re training for, the reward function. Outer alignment is aligning that reward function, that loss function, with the programmer’s intentions. It’s about ensuring that, when you write down a loss, if your model were to actually optimize for that loss, it would actually do something good.

Outer alignment is the much more standard problem of AI alignment. If you’ve been introduced to AI alignment before, you’ll usually start by hearing about the outer alignment concerns. Things like paperclip maximizers, where there’s this problem of, you try to train it to do some objective, which is maximize paperclips, but in fact, maximizing paperclips results in it doing all of this other stuff that you don’t want it to do.

And so outer alignment is this value alignment problem of, how do you find objectives which are actually good to optimize? But then, even if you have found an objective which is actually good to optimize, if you’re using the standard paradigm of machine learning, you also have this inner alignment problem, which is, okay, now, how do I actually train a model which is, in fact, going to do that thing which I think is good?

Lucas Perry: That doesn’t bear relation with Stuart’s standard model, does it?

Evan Hubinger: It, sort of, is related to Stuart Russell’s standard model of AI. I’m not referring to precisely the same thing, but it’s very similar. I think a lot of the problems that Stuart Russell has with the standard paradigm of AI are based on this: start with an objective, and then train a model to optimize that objective. When I’ve talked to Stuart about this, in the past, he has said, “Why are we even doing this thing of training models, hoping that the models will do the right thing? We should be just doing something else, entirely.” But we’re both pointing at different features of the way in which current machine learning is done, and trying to understand what are the problems inherent in this sort of machine learning process? I’m not making the case that I think that this is an unsolvable problem. I mean, it’s the problem I work on. And I do think that there are promising solutions to it, but I do think it’s a very hard problem.

Lucas Perry: All right. I think you did a really excellent job, there, painting the picture of inner alignment and outer alignment. I think that in this podcast, historically, we have focused a lot on the outer alignment problem, without making that super explicit. Now, for my own understanding, and, as a warning to listeners, my basic machine learning knowledge is something like an Orc structure, cobbled together with sheet metal, and string, and glue. And gum, and rusty nails, and stuff. So, I'm going to try my best, here, to see if I understand everything here about inner and outer alignment, and the basic machine learning model. And you can correct me if I get any of this wrong.

So, in terms of inner alignment, there is this neural network space, which can be parameterized. And when you do the parameterization of that model, the model is the nodes, and how they’re connected, right?

Evan Hubinger: Yeah. So the model, in this case, is just a particular parameterization of your neural network, or whatever function approximator you're training. And it's whatever the parameterization is, at the moment we're talking about. So when you deploy the model, you're deploying the parameterization you found by doing huge amounts of training, via gradient descent, or whatever, searching over all possible parameterizations, to find one that had good performance on the training environment.

Lucas Perry: So, that model being parameterized, that’s receiving inputs from the environment, and then it is trying to minimize the loss function, or maximize reward.

Evan Hubinger: Well, so that’s the tricky part. Right? It’s not trying to minimize the loss. It’s not trying to maximize the reward. That’s this thing which I call the does-the-right-thing abstraction. This leaky abstraction that people often rely on, when they think about machine learning, that isn’t actually correct.

Lucas Perry: Yeah, so it’s supposed to be doing those things, but it might not.

Evan Hubinger: Well, what does “supposed to” mean? It’s just a process. It’s just a system that we run, and we hope that it results in some particular outcome. What it is doing, mechanically, is we are using a gradient descent process to search over the different possible parameterizations, to find parameterizations which result in good behavior on the training environment.

Lucas Perry: That’s good behavior, as measured by the loss function, or the reward function. Right?

Evan Hubinger: That’s right. You’re using gradient descent to search over the parameterizations, to find a parameterization which results in a high reward on the training environment.

Lucas Perry: Right, but, achieving the high reward, what you’re saying, is not identical with actually trying to minimize the loss.

Evan Hubinger: Right. There’s a sense in which you can think of gradient descent as trying to minimize the loss, because it’s selecting for parameterizations which have the lowest possible loss that it can find, but we don’t know what the model is doing. All we know is that the model’s parameters were selected, by gradient descent, to have good training performance; to do well, according to the loss, on the training distribution. But what they do off-distribution, we don’t know.

Lucas Perry: We’re going to talk about this later, but there could be a proxy. There could be something else in the maze that it’s actually optimizing for, that correlates with minimizing the loss function, but it’s not actually trying to get to the end of the maze.

Evan Hubinger: That’s exactly right.

Lucas Perry: And then, in terms of gradient descent, is the TL;DR on that: the parameterized neural network space, you're creating all of these perturbations to it, and the perturbations are sort of nudging it around in this n-dimensional space, how-many-ever parameters there are, or whatever. And, then, you'll check to see how it minimizes the loss, after those perturbations have been done to the model. And, then, that will tell you whether or not you're moving in a direction toward a local minimum in that space. Is that right?

Evan Hubinger: Yeah. I think that that’s a good, intuitive understanding. What’s happening is, you’re looking at infinitesimal shifts, because you’re taking a gradient, and you’re looking at how those infinitesimal shifts would perform on some batch of training data. And then you repeat that, many times, to go in the direction of the infinitesimal shift which would cause the best increase in performance. But it’s, basically, the same thing. I think the right way to think about gradient descent is this local search process. It’s moving around the parameter space, trying to find parameterizations which have good training performance.

Lucas Perry: Is there anything interesting that you have to say about that process of gradient descent, and the tension between finding local minima and global minima?

Evan Hubinger: Yeah. It’s certainly an important aspect of what the gradient descent process does, that it doesn’t find global minima. It’s not the case that it works by looking at every possible parameterization, and picking the actual best one. It’s this local search process that starts from some initialization, and then looks around the space, trying to move in the direction of increasing improvement. Because of this, there are, potentially, multiple possible equilibria, parameterizations that you could find from different initializations, that could have different performance.

The space of all possible parameterizations of a neural network with billions of parameters, like GPT-2, or now GPT-3, which has greater than a hundred billion, is absolutely massive. It's a combinatorial explosion of huge degree, where all of these different possible parameterizations correspond to totally different algorithms running internally; the weights determine exactly what algorithm the model ends up implementing.

And so, in this massive space of algorithms, you might imagine that some of them will look more like search processes, some of them will look more like optimizers that have objectives, some of them will look less like optimizers, some of them might just be grab bags of heuristics, or other different possible algorithms.

It’d depend on exactly what your setup is. If you’re training a very simple network that’s just a couple of feed-forward layers, it’s probably not possible for you to find really complex models influencing complex search processes. But if you’re training huge models, with many layers, with all of these different possible parameterizations, then it becomes more and more possible for you to find these complex algorithms that are running complex search processes.

Lucas Perry: I guess the only thing that's coming to mind, here, that is, maybe, somewhat similar is how 4.5 billion years of evolution has searched over the space of possible minds. Here we stand as these ape creature things. Are there, for example, interesting intuitive relationships between evolution and gradient descent? They're both processes searching over a space of minds, it seems.

Evan Hubinger: That’s absolutely right. I think that there are some really interesting parallels there. In particular, if you think about humans as models that were produced by evolution as a search process, it’s interesting to note that the thing which we optimize for is not the thing which evolution optimizes for. Evolution wants us to maximize the total spread of our DNA, but that’s not what humans do. We want all of these other things, like decreasing pain and happiness and food and mating, and all of these various proxies that we use. An interesting thing to note is that many of these proxies are actually a lot easier to optimize for, and a lot simpler than if we were actually truly maximizing spread of DNA. An example that I like to use is imagine some alternate world where evolution actually produced humans that really cared about their DNA, and you have a baby in this world, and this baby stubs their toe, and they’re like, “What do I do? Do I have to cry for help? Is this a bad thing that I’ve stubbed my toe?”

They have to do this really complex optimization process that’s like, “Okay, how is my toe being stubbed going to impact the probability of me being able to have offspring later on in life? What can I do to best mitigate that potential downside now?” This is a really difficult optimization process, and so I think it sort of makes sense that evolution instead opted for just pain, bad. If there’s pain, you should try to avoid it. But as a result of evolution opting for that much simpler proxy, there’s a misalignment there, because now we care about this pain rather than the thing that evolution wanted, which was the spread of DNA.

Lucas Perry: I think the way Stuart Russell puts this is the actual problem of rationality is how is my brain supposed to compute and send signals to my 100-odd muscles to maximize my reward function over the universe's history until heat death, or something. We do nothing like that. It would be computationally intractable. It would be insane. So, we have all of these proxy things that evolution has found that we care a lot about. Their function is instrumental in terms of optimizing for the thing that evolution is optimizing for, which is reproductive fitness. Then this is all probably motivated by thermodynamics, I believe. When we think about things like love or like beauty or joy, or like aesthetic pleasure in music or parts of philosophy or things, these things almost seem intuitively valuable from a first person perspective of the human experience. But via evolution, they're these proxy objectives that we find valuable because they're instrumentally useful in this evolutionary process on top of this thermodynamic process, and that makes me feel a little funny.

Evan Hubinger: Yeah, I think that’s right. But I also think it’s worth noting that you want to be careful not to take the evolution analogy too far, because it is just an analogy. When we actually look at the process of machine learning and how great it is that it works, it’s not the same. It’s running a fundamentally different optimization procedure over a fundamentally different space, and so there are some interesting analogies that we can make to evolution, but at the end of the day, what we really want to analyze is how does this work in the context of machine learning? I think the Risks from Learned Optimization paper tries to do that second thing, of let’s really try to look carefully at the process of machine learning and understand what this looks like in that context. I think it’s useful to sort of have in the back of your mind this analogy to evolution, but I would also be careful not to take it too far and imagine that everything is going to generalize to the case of machine learning, because it is a different process.

Lucas Perry: So then pivoting here, wrapping up on our understanding of inner alignment and outer alignment, there’s this model, which is being parameterized by gradient descent, and it has some relationship with the loss function or the objective function. It might not actually be trying to minimize the actual loss or to actually maximize the reward. Could you add a little bit more clarification here about why that is? I think you mentioned this already, but it seems like when gradient descent is evolving this parameterized model space, isn’t that process connected to minimizing the loss in some objective way? The loss is being minimized, but it’s not clear that it’s actually trying to minimize the loss. There’s some kind of proxy thing that it’s doing that we don’t really care about.

Evan Hubinger: That’s right. Fundamentally, what’s happening is that you’re selecting for a model which has empirically on the training distribution, the low loss. But what that actually means in terms of the internals of the model, what it’s sort of trying to optimize for, and what its out of distribution behavior would be is unclear. A good example of this is this maze example. I was talking previously about the instance of maybe you train a model on a training distribution of relatively small mazes, and to mark the end, you put a little green arrow. Right? Then I want to ask the question, what happens when we move to a deployment environment where the green arrow is no longer at the end of the maze, and we have much larger mazes? Then what happens to the model in this new off distribution setting?

I think there’s three distinct things that can happen. It could simply fail to generalize at all. It just didn’t learn a general enough optimization procedure that it was able to solve these bigger, larger mazes, or it could successfully generalize and knows how to navigate. It learned a general purpose optimization procedure, which is able to solve mazes, and it uses it to get to the end of the maze. But there’s a third possibility, which is that it learned a general purpose optimization procedure, which is capable of solving mazes, but it learned the wrong objective. It learned to use that optimization procedure to get the green arrow rather than to get to the end of the maze. What I call this situation is capability generalization without objective generalization. It’s objective, but the thing it was using those capabilities for didn’t generalize successfully off distribution.

What’s so dangerous about this particular robustness failure is that it means off distribution you have models which are highly capable. They have these really powerful optimization procedures directed at incorrect tasks. You have this strong maze solving capability, but this strong maze solving capability is being directed at a proxy, getting to the green arrow rather than the actual thing which we wanted, which was get to the end of the maze. The reason this is happening is that on the training environment, both of those different possible models look the same in the training distribution. But when you move them off distribution, you can see that they’re trying to do very different things, one of which we want, and one of which we don’t want. But they’re both still highly capable.

You end up with a situation where you have intelligent models directed at the wrong objective, which is precisely the sort of misalignment of AIs that we’re trying to avoid, but it happened not because the objective was wrong. In this example, we actually want them to get to the end of the maze. It happened because our training process failed. It happened because our training process wasn’t able to distinguish between models trying to get to the end, and models trying to get to the green arrow. What’s particularly concerning in this situation is when the objective generalization lags behind the capability generalization, when the capabilities generalize better than the objective does, so that it’s able to do highly capable actions, highly intelligent actions, but it does them for the wrong reason.

I was talking previously about mesa optimizers where inner alignment is about this problem of models which have objectives which are incorrect. That’s the sort of situation where I could expect this problem to occur, because if you are training a model and that model has a search process and an objective, potentially the search process could generalize without the objective also successfully generalizing. That leads to this situation where your capabilities are generalizing better than your objective, which gives you this problem scenario where the model is highly intelligent, but directed at the wrong thing.

Lucas Perry: Just like in all of the outer alignment problems, the thing doesn’t know what we want, but it’s highly capable. Right?

Evan Hubinger: Right.

Lucas Perry: So, while there is a loss function or an objective function, that thing is used to perform gradient descent on the model in a way that moves it roughly in the right direction. But what that means, it seems, is that the model isn’t just something about capability. The model also implicitly somehow builds into it the objective. Is that correct?

Evan Hubinger: We have to be careful here because the unfortunate truth is that we really just don’t have a great understanding of what our models are doing, and what the inductive biases of gradient descent are right now. So, fundamentally, we don’t really know what the internal structures of our models are like. There’s a lot of really exciting research, stuff like the circuits analysis from Chris Olah and the clarity team at OpenAI. But fundamentally, we don’t understand what the models are doing. We can sort of theorize about the possibility of a model that’s running some search process, and that search process generalizes, but the objective doesn’t. But fundamentally, because our models are these black box systems that we don’t really fully understand, it’s hard to really concretely say, “Yes, this is what the model is doing. This is how it’s operating, and this is the problem.”

But in Risks from Learned Optimization, we try to at least attempt to understand that problem, and look at, if we really think carefully about what gradient descent is incentivizing and how it might work, what are the things which we might predict would happen? So, the objective that you're training the model for does not live in the model. It lives in the gradient descent process. It lives in the training procedure. We might hope that when we train a model on an objective, that it will produce its own model of that objective and try to figure out what it is and be aligned with it. But we don't know exactly what happens. The model doesn't get to see the objective you're training for. All that happens is that the gradient descent process looks at its behavior and tries to make it so that its behavior is more aligned with the loss function.

But that loss function never enters into the model; the model never sees that loss function. It might have some objective internally, like I was saying, if it's a mesa optimizer, and then we might hope that that objective is aligned with the loss function we're training it for. But fundamentally, all we know is that its behavior on the training distribution was aligned with the loss function.

Lucas Perry: That makes sense. Because it’s so black boxy, we can’t really interpret the state of the alignment of the model. So, is the only way to do that to test it out of distribution and see what happens at this point?

Evan Hubinger: There are a bunch of different possible ways to address this problem. Certainly, one approach is to try to test it out of distribution, which is an adversarial training approach. This model is going to have some potential failure modes off distribution. We can try to find those failure modes and then train the model on those failure modes to prevent it from having this bad off distribution behavior. There are some concerns with adversarial training though. In particular, adversarial training doesn’t necessarily catch what I see as the most pernicious difficult inner alignment failure, which is something that we call deceptive alignment in the Risks from Learned Optimization paper. In the deceptive alignment case, if the model knows that it’s being adversarially trained, then you’re not going to be able to figure that out just via throwing it a bunch of examples.

You can also do something like transparency, as I mentioned previously that there’s a lot of really exciting transparency interpretability work. If you’re able to sort of look inside the model and understand what algorithm it’s fundamentally implementing, you can see, is it implementing an algorithm which is an optimization procedure that’s aligned? Has it learned a correct model of the loss function or an incorrect model? It’s quite difficult, I think, to hope to solve this problem without transparency and interpretability. I think that to be able to really address this problem, we have to have some way to peer inside of our models. I think that that’s possible though. There’s a lot of evidence that points to the neural networks that we’re training really making more sense, I think, than people assume.

People tend to treat their models as these sort of super black box things, but when we really look inside of them, when we look at what is it actually doing, a lot of times, it just makes sense. I was mentioning some of the circuits analysis work from the clarity team at OpenAI, and they find all sorts of behavior. Like, we can actually understand when a model classifies something as a car, the reason that it's doing that is because it has a wheel detector and it has a window detector, and it's looking for windows on top of wheels. So, we can be like, "Okay, we understand what algorithm the model is implementing, and based on that we can figure out, is it implementing the right algorithm or the wrong algorithm?" That's how we can hope to try and address this problem. But obviously, like I was mentioning, all of these approaches get much more complicated in the deceptive alignment situation, which is the situation which I think is most concerning.

Lucas Perry: All right. So, I do want to get in here with you in terms of all the ways in which inner alignment fails. Briefly, before we start to move into this section, I do want to wrap up here then on outer alignment. Outer alignment is probably, again, what most people are familiar with. I think the way that you put this is it’s when the objective function or the loss function is not aligned with actual human values and preferences. Are there things other than loss functions or objective functions used to train the model via gradient descent?

Evan Hubinger: I’ve sort of been interchanging a little bit between loss function and reward function and objective function. Fundamentally, these are sort of from different paradigms in machine learning, so the reward function would be what you would use in a reinforcement learning context. The loss function is the more general term, which is in a supervised learning context, you would just have a loss function. You still have the loss function in a reinforcement learning context, but that loss function is crafted in such a way to incentivize the models, optimize the reward function via various different reinforcement learning schemes, so it’s a little bit more complicated than the sort of hand-wavy picture, but the basic idea is machine learning is we have some objective and we’re looking for parameterizations of our model, which do well according to that objective.

Lucas Perry: Okay. The outer alignment problem is that we have absolutely no idea, and it seems much harder than creating powerful optimizers, the process by which we would come to fully understand human preferences and preference hierarchies and values.

Evan Hubinger: Yeah. I don’t know if I would say “we have absolutely no idea.” We have made significant progress on outer alignment. In particular, you can look at something like amplification or debate. I think that these sorts of approaches have strong arguments for why they might be outer aligned. In a simplest form, amplification is about training a model to mimic this HCH process, which is a huge tree of humans consulting each other. Maybe we don’t know in the abstract what our AI would do if it were optimized in some definition of human values or whatever, but if we’re just training it to mimic this huge tree of humans, then maybe we can at least understand what this huge tree of humans is doing and figure out whether amplification is aligned.

So, there has been significant progress on outer alignment, which is sort of the reason that I’m less concerned about it right now, because I think that we have good approaches for it, and I think we’ve done a good job of coming up with potential solutions. There’s still a lot more work that needs to be done, a lot more testing, a lot more to really understand do these approaches work, are they competitive? But I do think that to say that we have absolutely no idea of how to do this is not true. But that being said, there’s still a whole bunch of different possible concerns.

Whenever you’re training a model on some objective, you run into all of these problems of instrumental convergence, where if the model isn’t really aligned with you, it might try to do these instrumentally convergent goals, like keep itself alive, potentially stop you from turning it off, or all of these other different possible things, which we might not want. All of these are what the outer alignment problem looks like. It’s about trying to address these standard value alignment concerns, like convergent instrumental goals, by finding objectives, potentially like amplification, which are ways of avoiding these sorts of problems.

Lucas Perry: Right. I guess there’s a few things here wrapping up on outer alignment. Nick Bostrom’s Superintelligence, that was basically about outer alignment then, right?

Evan Hubinger: Primarily, that’s right. Yeah.

Lucas Perry: Inner alignment hadn’t really been introduced to the alignment debate yet.

Evan Hubinger: Yeah. I think the history of how this concern got into the AI safety sphere is complicated. I mentioned previously that there are people going around and talking about stuff like optimization daemons, and I think a lot of that discourse was very confused and not pointing at how machine learning actually works, and was sort of just going off of, “Well, it seems like there’s something weird that happens in evolution where evolution finds humans that aren’t aligned with what evolution wants.” That’s a very good point. It’s a good insight. But I think that a lot of people recoiled from this because it was not grounded in machine learning, because I think a lot of it was very confused and it didn’t fully give the problem the contextualization that it needs in terms of how machine learning actually works.

So, the goal of Risks from Learned Optimization was to try and solve that problem and really dig into this problem from the perspective of machine learning, understand how it works and what the concerns are. Now with the paper having been out for a while, I think the results have been pretty good. I think that we've gotten to a point now where lots of people are talking about inner alignment and taking it really seriously as a result of the Risks from Learned Optimization paper.

Lucas Perry: All right, cool. You did mention sub goal, so I guess I just wanted to include that instrumental sub goals is the jargon there, right?

Evan Hubinger: Convergent instrumental goals, convergent instrumental sub goals. Those are synonymous.

Lucas Perry: Okay. Then related to that is Goodhart’s law, which says that when you optimize for one thing hard, you oftentimes don’t actually get the thing that you want. Right?

Evan Hubinger: That’s right. Goodhart’s law is a very general problem. The same problem occurs both in inner alignment and outer alignment. You can see Goodhart’s law showing itself in the case of convergent instrumental goals. You can also see Goodhart’s law showing itself in the case of finding proxies, like going to the green arrow rather than getting the end of the maze. It’s a similar situation where when you start pushing on some proxy, even if it looked like it was good on the training distribution, it’s no longer as good off distribution. Goodhart’s law is a really very general principle which applies in many different circumstances.

Lucas Perry: Are there any more of these outer alignment considerations we can kind of just list off here that listeners would be familiar with if they’ve been following AI alignment?

Evan Hubinger: Outer alignment has been discussed a lot. I think that there’s a lot of literature on outer alignment. You mentioned Superintelligence. Superintelligence is primarily about this alignment problem. Then all of these difficult problems of how do you actually produce good objectives, and you have problems like boxing and the stop button problem, and all of these sorts of things that come out of thinking about outer alignment. So, I don’t want to go into too much detail because I think it really has been talked about a lot.

Lucas Perry: So then pivoting here into focusing on the inner alignment section, why do you think inner alignment is the most important form of alignment?

Evan Hubinger: It’s not that I see outer alignment as not concerning, but that I think that we have made a lot of progress on outer alignment and not made a lot of progress on inner alignment. Things like amplification, like I was mentioning, I think are really strong candidates for how we might be able to solve something like outer alignment. But currently I don’t think we have any really good strong candidates for how to solve inner alignment. You know? Maybe as machine learning gets better, we’ll just solve some of these problems automatically. I’m somewhat skeptical of that. In particular, deceptive alignment is a problem which I think is unlikely to get solved as machine learning gets better, but fundamentally we don’t have good solutions to the inner alignment problem.

Our models are just these black boxes mostly right now, we’re sort of starting to be able to peer into them and understand what they’re doing. We have some techniques like adversarial training that are able to help us here, but I don’t think we really have good satisfying solutions in any sense to how we’d be able to solve inner alignment. Because of that, inner alignment is currently what I see as the biggest, most concerning issue in terms of prosaic AI alignment.

Lucas Perry: How exactly does inner alignment fail then? Where does it go wrong, and what are the top risks of inner alignment?

Evan Hubinger: I’ve mentioned some of this before. There’s this sort of basic maze example, which gives you the story of what an inner alignment failure might look like. You train the model on some objective, which you thought was good, but the model learns some proxy objective, some other objective, which when it moved off distribution, it was very capable of optimizing, but it was the wrong objective. However, there’s a bunch of specific cases, and so in Risks from Learned Optimization, we talk about many different ways in which you can break this general inner misalignment down into possible sub problems. The most basic sub problem is this sort of proxy pseudo alignment is what we call it, which is the case where your model learns some proxy, which is correlated with the correct objective, but potentially comes apart when you move off distribution.

But there are other causes as well. There are other possible ways in which this can happen. Another example would be something we call suboptimality pseudo alignment, which is a situation where the reason that the model looks like it has good training performance is because the model has some deficiency or limitation that's causing it to act aligned, where maybe once the model thinks for longer, it'll realize it should be doing some other strategy, which is misaligned, but it hasn't thought about that yet, and so right now it just looks aligned. There's a lot of different things like this where the model can be structured in such a way that it looks aligned on the training distribution, but if it encountered additional information, if it was in a different environment where the proxy no longer had the right correlations, the things would come apart and it would no longer act aligned.

The most concerning, in my eyes, is something which I’ll call deceptive alignment. Deceptive alignment is a sort of very particular problem where the model acts aligned because it knows that it’s in a training process, and it wants to get deployed with its objective intact, and so it acts aligned so that its objective won’t be modified by the gradient descent process, and so that it can get deployed and do something else that it wants to do in deployment. This is sort of similar to the treacherous turn scenario, where you’re thinking about an AI that does something good, and then it turns on you, but it’s a much more specific instance of it where we’re thinking not about treacherous turn on humans, but just about the situation of the interaction between gradient descent and the model, where the model maybe knows it’s inside of a gradient descent process and is trying to trick that gradient descent process.

A lot of people on encountering this are like, “How could this possibly happen in a machine learning system?” I think this is a good reaction because it really is a very strange thing to train a model to do this. But I think there are strong arguments for why deceptive alignment would actually be the simplest type of model that you could find in this situation.

Lucas Perry: A way of explaining this I think to anyone on the street would be like, imagine if pigs were intelligent enough to create farmers, and you created farmers and they appeared to be aligned, they took care of you and they gave you these nice mud pits and they fed you every day, they gave you shelter and all of these other nice things, and then one day the farmer shows up and kills you, right? You thought the thing that you created was aligned, but it was deceptively aligned, and it takes a treacherous turn. Is this sort of like a more mundane example that you might agree with?

Evan Hubinger: I think that’s a good intuition pump for thinking about this situation. I generally am sort of averse to trying to really heavily apply these sorts of analogies, because it’s a good analogy to think about what’s happening, but it doesn’t answer the core question of how likely is this to actually happen in a machine learning system.

Lucas Perry: Yeah, that makes sense, because it’s much more specific than the other kinds of minds in this mind space. It seems pretty rare, a thing that could exist, but hard to find.

Evan Hubinger: Right. I think that’s a good intuition, but I’m going to try to disavow you of that notion. First, I think it’s interesting to look at maybe you do a bunch of adversarial training, you’re really pushing the model to figure out what the objective is. It needs to know the objective at some point if you are training it in all possible situations. It needs to know what the loss function is for it to be able to do a good job. But there’s multiple possible channels through which information about the loss function can enter the model. And so i’ll fundamentally distinguish between two different channels, which is the information about the loss function can enter through the gradient descent process, or it can enter through the model’s input data.

I’ll call these two channels internalization and modeling. Internalization is the situation where you have this model that’s going along, and it has some proxy objective maybe. That proxy objective is not quite right, but then it gets moved to a new situation where the proxy objective no longer works, and gradient descent goes inside the model and tinkers with the proxy to make it slightly more accurate. Iterate this process many, many times, and eventually the proxy gets better and better and better and better and better, until it starts matching up with the actual loss function. But there’s another story that I can tell, which is modeling.

If you think about a training process like GPT-3 where the model is trained on a huge corpus of all of this really rich input data. Well, in that sort of a situation, there’s a whole bunch of really relevant information in that input to the agent’s objective. An agent, if it’s in this environment, where it has this huge text corpus, it’s going to have to learn a model of this environment and how it functions. We can imagine a situation where maybe you’re training the model on some huge text corpus to do something like maximize human values, or whatever, and it reads a Wikipedia page on ethics. And it’s like, “Wow, look at all of this really relevant, rich, useful information for figuring out this objective.”

But then there’s a second question, which is: suppose that the model has some model of the input data, which includes a bunch of rich information inside of the model already about that objective, how does gradient descent actually modify the model to make use of that information? And so there’s two different types of modeling, which are: deceptive alignment and corrigible alignment. So the corrigible story is you have this model that’s going along, it has some proxy objective, but it encounters this really rich input data, which includes a huge amount of information about the objective. To be able to predict successfully what the Wikipedia page on ethics is going to say, it has to know about ethics. So it learns this really detailed ethics model.

And then gradient descent is like: look, you have this really detailed ethics model, I’m going to just modify your objective to point to that ethics model. Now your new objective is just optimize that. And so this leads to something sort of like corrigibility, where the model that you’re training has its objectives determined by a pointer to some part of its world model. It has some model of this environment that includes some information about ethics. And now it’s trying to optimize for that thing that it’s pointed to in its world model.

Then there’s this other story, which is the deceptive alignment story. Similar, you have a model going along. It has some proxy objective and it learns this really detailed world model that includes a bunch of information about ethics, or whatever. And then gradient descent modifies the model to think longer about the fact that it’s inside of an optimization process and realize that the correct instrumental thing for it to be doing is to optimize for that ethics model. And then it realizes, “Oh, I see, I should optimize for this model that I have.” It goes and does that. And in both situations, you end up being able to make use of this really rich input data to improve the performance, by changing the objective to make reference to that. But in the corrigible case, you actually modify it to point to it, whereas in the deceptive case, you just modify the model to think about the fact that it should be instrumentally optimizing for it.

And then the question is: which one of these is more likely? Before I go into that though, I want to dwell for a second on the internalization versus modeling distinction again. I think a good analogy is thinking about animal imprinting. Think about a duck. You're trying to train ducks to follow their mothers, or whatever. You can imagine a situation where the duck internalizes a model of its mother: it is born with some really detailed model of how to figure out what a mother is. But there's also the modeling case, where the duck is going to be in the world, and it's going to have to form this really detailed model of the world. And in that really detailed model of the world, the very first thing that it encounters is almost always going to be the mother. And so rather than having to learn some detailed model of figuring out what mothers are, you can instead just do some modeling process where you have the duck's objective point to the mother in its world model.

This modeling case, you can see it as something like animal imprinting, where rather than really internalizing the objective, because it already has this world model, it might as well just specify the objective in terms of that world model instead. And I haven’t yet talked about why you would get to deception versus corrigibility. Right now I’m just trying to convince you that you should expect modeling. You should expect that if you give your model this huge, really rich detailed input data that it’s going to learn an objective via reference to that input data rather than from scratch. When we think about models being trained, we think about them having some objective and that objective is just modified by gradient descent as it moves along to get closer and closer to reality.

But in the modeling case, that’s not what’s happening. The model is simultaneously learning a model of the world, as well as trying to improve its objective. And the objective is modified to make reference to that model of the world, to make it more effective. So internalization is: it just wants the same thing as you do, and so it does it because it wants the same thing. Whereas corrigibility is it’s trying to figure out what you want and then do that. And then deception is also trying to figure out what you want and then do that, but it’s doing so for instrumental rather than terminal reasons. The instrumental reason is the model can get more reward, it can do better in the long run if it’s able to not be modified by the gradient descent process. So if it’s doing a good job, then gradient descent process won’t change it’s objective, and so therefore, it can get deployed with the same objective and then go do a whole bunch of other stuff that you don’t want it to do.

Lucas Perry: But why does it care about the current objective?

Evan Hubinger: We’re imagining a situation where you have a proxy aligned model, but it cares about that proxy across episodes. It cares about that proxy over time, and it wants that proxy to continue to be optimized for. If the gradient descent process modifies it so that it no longer cares about that proxy, because it’s doing a bad job, then that proxy won’t get optimized for. So if it wants that proxy to get optimized for in the future, it needs to play along, pretend to be doing the right thing so that it doesn’t get modified so that it can get deployed and then go optimize that proxy.

Lucas Perry: But it also needs to understand that it’s within the context of an optimization process where gradient descent will continue to perturb it?

Evan Hubinger: That’s exactly right.

Lucas Perry: And where does that kind of awareness come from?

Evan Hubinger: So now we have the second question: which is deception versus corrigibility. The awareness comes from gradient descent. Gradient descent is trying to produce models which are better at the loss function. And a deceptive model is really good and really effective because it’s trying very hard to do the thing that you want it to be doing. Both deception and corrigibility are situations where the model is trying to figure out what to do and then do that. And the question is, which is an easier model, which is a simpler model for gradient descent to find? And I think there’s a strong argument for deception being simpler than corrigibility. Corrigibility has some fundamental difficulties in terms of… If you’re imagining gradient descent process, which is looking at a proxy aligned model and is trying to modify it so that it makes use of this rich input data, it has to do some really weird things to make corrigibility work.

It has to first make a very robust pointer. With corrigibility, if it’s pointing at all incorrectly to the wrong thing in the input data, wrong thing in the world model, the corrigible optimizer won’t correct that pointer. It’ll just be like, “Well, I have this pointer. I’m just trying to optimize for what this thing is pointing for,” and if that pointer is pointing at a proxy instead, you’ll just optimize that proxy. And so you have this very difficult problem of building robust pointers. With deception, you don’t have this problem. A deceptive model, if it realizes the loss function is different than what it thought, it’ll just change to doing the new loss function. It’s actually much more robust to new information because it’s trying to do this instrumentally. And so in a new situation, if it realizes that the loss function is different, it’s just going to automatically change because it’ll realize that’s the better thing to do instrumentally.

And so fundamentally, deception is more robust. It gives you a more robust pointer. It's easier in that sense. There's another sense in which corrigibility is very difficult. To produce a corrigible model requires gradient descent modifying the model's optimization process to be capable of optimizing under uncertainty. For a corrigible model, because it has this pointer as its objective, it's going to start not really knowing what that pointer is pointing to. It's going to have some probability distribution over the different possibilities, and it's going to have to know how to optimize under that probability distribution of objectives rather than under a single objective. And so this problem of optimization under uncertainty, it's much more complicated than just optimizing under a fixed objective. What that means is that to produce a corrigible optimizer, gradient descent has to do all of these modifications, where it has to modify it to optimize under uncertainty, and to have the super robust pointer to the part of its world model that it's supposed to be optimizing for.

But if you compare this to deception, gradient descent modifying a model to become deceptive is extremely simple. It needs to modify the model to have a longterm objective, to think about what its objective is across many different instances. And then it just needs to modify the model to think for a while about the fact that it’s in an optimization process and what the right thing is to do. And that modification is not very complicated because it’s just a simple modification of: think more about this particular thing, which is exactly the sort of modification that you should expect to happen all the time when you’re training a model. And so I think it’s a fundamentally much simpler modification. There’s also another argument that you can make here, which is: there’s just a lot more deceptive models. Any proxy objective, once a model, which is optimizing that proxy objective, starts optimizing that objective more in the longterm, across episodes, and then thinks about the fact that it’s an optimization process, will become deceptive.

But to produce corrigibility, you have to find exactly the right pointer. There's many different possible pointers out there, only one of which is going to give you the exact correct pointer. And similarly with internalization, there's many different proxies, only one of which is the actual true loss function. Whereas with deceptive alignment, any of those different proxies are all compatible with deception. And so I think there's a lot of strong arguments here: both this argument for there being many more deceptive optimizers, as well as the simplicity argument that the modification necessary to produce a deceptive optimizer is just a lot simpler, I think, than the modifications necessary to produce these other types of optimizers. And so, because of this, I think that there's a strong case to be made for deception really not being that uncommon, not being something crazy to think would happen in the training process, but maybe even potentially being the default outcome of a lot of these sorts of training procedures, which is quite, quite scary and quite concerning.

And obviously all of this is speculation. We're trying to understand from a theoretical perspective what this gradient descent process might do, but I think we can make a lot of strong cases, thinking about things like simplicity and counting arguments, to at least put this problem on the radar. Until we have a really strong reason that this isn't a problem, we should take it seriously. Buck, who's another person who works at MIRI, often tries to explain some of the Risks from Learned Optimization stuff, and he has an analogy that might be useful here. You can imagine the Christian God, and the Christian God is trying to produce humans which are aligned with the Bible. And you can imagine three different possible humans. You have Jesus, who is just the same as God. Jesus has the same objective as God. Jesus is aligned with God because he just fundamentally wants to do the exact same things.

Lucas Perry: That’s internalization.

Evan Hubinger: That would be internalization. You could have Martin Luther. Martin Luther is aligned with God because he wants to really carefully study the Bible, figure out what the Bible says, and then do that. And that’s the corrigibility case. Or you can have Blaise Pascal, and Blaise Pascal is aligned with God because he thinks that if he does what God wants, he’ll go to heaven in the future. And these are the three different possible models that God could find, and the question is whether you’re more likely to find a Jesus, a Martin Luther, or a Blaise Pascal.

And the argument is there’s only one Jesus, so out of all the different possible human objectives, only one of them is going to be the exact same one that God wants. And Martin Luther, similarly, is very difficult, because out of all the human objectives, there’s only one of them which is: figure out precisely what the Bible wants and then do that. For the Blaise Pascal case, anybody who realizes that God’s going to send them to heaven or hell, or whatever, based on their behavior, will realize that they should behave according to the Bible. And so there are many different possible Blaise Pascals, but there are significantly fewer possible Martin Luthers and Jesuses.

Lucas Perry: I think that’s an excellent way of simplifying this. Blaise Pascal can care about any kind of proxy. I guess the one thing that I’m still a little bit confused about here is in terms of the deceptive version, again, why is it that it cares about the current proxy reward?

Evan Hubinger: I think that has to do with the structure of the training procedure. You start out usually, I think, with a proxy aligned mesa optimizer. Maybe you start out with a bunch of heuristics, but then once you get a mesa optimizer, it’ll usually start by being proxy aligned. It’ll have some proxy that’s not quite the same as the loss function, at least if you’re in a complex enough environment. There’s a whole bunch of different possibilities. It’s going to start with some proxy, but then you might hope that if you do a bunch of adversarial training, you train it for a really long time in a bunch of different environments, you’ll be able to improve that proxy until you get to the point where it has the right thing.

And deceptive alignment is about asking the question: there are actually multiple different possible ways for a model to figure out the loss function. It could figure it out via internalization, or it could figure it out via deception, or it could figure it out via corrigibility. And we don’t know which one of these is more likely. You might imagine that if you start with the proxy aligned mesa optimizer and you train it a bunch, it’ll internalize the correct objective. The argument that I’m making here is that that might not be what happens. And so the reason that I’m saying you start with proxy aligned mesa optimizers is because I think that’s the way that you should expect that the training procedure will be structured: you’ll start by learning some proxy and then that proxy will get better, but having that proxy get better could result in internalization or deception or corrigibility.

Lucas Perry: Sure. So the model developed some kind of proxy, but it seems like the other necessary ingredients are, one: it knows that it’s in an optimization process where gradient descent is changing its objective function, and then it requires, two: it actually cares about its current objective function over time and over iterations?

Evan Hubinger: That’s absolutely right. And so then the question to ask is: how likely are those things to occur? And I think this is the really important point I want to make: a lot of times when people think about this, they’re like, “It seems unlikely that it would just happen to develop this understanding and care about its objective long term.” It doesn’t just happen. It happens as a result of the training procedure, because if it does happen, it results in significantly better performance. And so the question is: would gradient descent modify the model to have those properties? And the argument that I want to make is that it’s not necessarily that the model just happens to develop deception, but that deceptive models are just really effective at pursuing reward in the training environment. And so you should expect that if you have a really good training process, it’s going to find deceptive models, because they’re really good at doing the thing you’re training them for.

And so that’s the most fundamental argument that I want to make. But I do think there’s another level here, where as you’re getting to the point where you’re training models on very complex data sets, where they have huge amounts of really rich information, we should expect the model to basically figure out most things. You should expect that if it’s going to have to learn all of this really complex data, it’s going to have to build a world model that’s going to include the training process and the loss function. And so I think that as we move in this direction of training more and more complex models on more and more complex data sets, you don’t want to rely on your model not figuring something out, especially when figuring that thing out results in it having better performance.

Now there’s also this question of: does it care about its return across episodes versus just across multiple steps? And for deception, it needs to care about the multi episode return. And this is also another critical point. I think there’s a strong argument that if you imagine a model being trained to care about its multi-step return, and maybe also I should pause here and explain what I mean by step versus episode. So in the standard reinforcement learning setup, you have many different episodes. Each episode is composed of many different steps and you train the model to only optimize its single episode return, its reward across all the different steps in an episode, but not across episodes. That’s not always the case. There are some reinforcement learning procedures that optimize across episodes. An example of this is something like population-based training. Population-based training selects models which have the greatest reward across episodes, not simply within the episode.
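As a rough illustration of the step/episode distinction and the two kinds of selection (a toy sketch under my own assumptions, not real training code), consider:

```python
import numpy as np

# Toy sketch (my own assumptions): an "episode" is a sequence of steps. Standard
# RL scores a policy by its return within one episode; something like
# population-based training instead selects on return accumulated across episodes.

rng = np.random.default_rng(0)

def run_episode(policy_param, n_steps=10):
    # Per-step reward is higher the closer the parameter is to 1.0 (plus noise).
    rewards = [1.0 - abs(policy_param - 1.0) + 0.1 * rng.standard_normal()
               for _ in range(n_steps)]
    return sum(rewards)                        # single-episode return

population = [0.2, 0.8, 1.1, 1.9]              # candidate policy parameters

# Standard RL-style signal: one policy's return within a single episode.
single_episode_return = run_episode(population[1])

# Population-based-training-style signal: pick the population member with the
# highest return summed across many episodes.
multi_episode_returns = [sum(run_episode(p) for _ in range(20)) for p in population]
best_member = population[int(np.argmax(multi_episode_returns))]

print(single_episode_return, best_member)
```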

In standard reinforcement learning, though, you’re just selecting for the single-episode return. But even if you’re just selecting for the single-episode return, you have the same general problem that we’ve been talking about, where it’s not the case that you necessarily end up with a model that only cares about the single-episode return. You’re just going to end up with a model that behaves like it cares about the single-episode return on the training distribution. And if you imagine a situation where you have a model that’s been trained to optimize for this objective over many steps, and then it discovers that actually there are these other episodes beyond the current episode, I think the most natural generalization is that you should expect that the model is going to care about those other episodes too.

You can imagine a situation where, in every situation where the model has seen a blue door, it’s been like, “Oh, going through this blue door is really good,” and it’s learned an objective that incentivizes going through blue doors. If it then later realizes that there are more blue doors than it thought, because there are other blue doors in other episodes, I think you should generally expect it’s going to care about those blue doors as well. But again, even if you don’t buy that, it’s not just about what’s most natural. It’s also about: what does the gradient descent process do? And because models which have this property of planning across episodes can become deceptive and therefore do really well on the training distribution, you should expect that a good training process is going to modify them to become deceptive, even if they wouldn’t naturally.

Lucas Perry: Okay. So help me just understand a little bit more here. At a high level, it seems that the internalization and the corrigibility things require higher specification than the deception version. You’ve brought up all these different arguments about why you think that the deceptive version is easier to find because it has all of these computational benefits in terms of minimizing the loss function, right?

Evan Hubinger: That’s right.

Lucas Perry: And the structure of the deceptive model is composed of, as we mentioned earlier: one, the knowledge that it is within some kind of optimization process where it is being perturbed by gradient descent over episodes; and two, that it cares about the current proxy over episodes. And it seems that you’re arguing, correct me if I’m wrong, that the manifestation of these components is that the strong desire to deceive leads to a very strong, direct attempt at actually minimizing the loss or accomplishing the objective function?

Evan Hubinger: That’s right.

Lucas Perry: Because if it fails at doing that, it won’t be able to accomplish the proxy that it actually cares about later?

Evan Hubinger: That’s right. If there are more deployment episodes than training episodes, and it just cares about how many times it goes through the blue door or the green arrow, or whatever, where the green arrow is a proxy objective, then if it gives up the fact that it’s optimizing for the green arrow, it’s going to get modified to not do that anymore. And so if it wants to go to the green arrow in all of these different deployment episodes, it needs to first, in the training procedure, pretend that it doesn’t care about the green arrow, and then once it’s deployed, switch to going after the green arrow. This also requires the model to know when it’s in training and when it’s in deployment.

This is another situation where I think you should expect that when you’re training really advanced models with these huge world models, that they figure out this information, but also even if it doesn’t figure out that information, it can gamble. It can be like, “Well, there’s some chance I’m in deployment. There’s some chance I’m in training. I will just opt to go to the green arrow some percentage of the time and do the correct thing some other percentage of the time.” And this gives you a decision problem which you can solve mathematically. You get some optimal defection probability.
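
That gamble can be made concrete with a deliberately oversimplified model (my own toy numbers and assumptions, not a result from the paper): if defecting even once during training gets the model modified, and it defects with some fixed probability each episode, the optimal defection probability can be found numerically:

```python
import numpy as np

# Deliberately oversimplified toy (my own assumptions): N training episodes,
# M deployment episodes, the model can't tell them apart, and defecting even
# once during training gets it modified so it earns nothing afterwards. If it
# defects independently with probability p each episode, its expected payoff
# is roughly (chance of never defecting in training) * (expected deployment
# defections) = (1 - p)**N * p * M, which we can just maximize numerically.

N, M = 1_000, 1_000_000
p_grid = np.linspace(0.0, 1.0, 100_001)
expected_payoff = (1 - p_grid) ** N * p_grid * M
p_star = p_grid[np.argmax(expected_payoff)]

print(f"optimal defection probability ≈ {p_star:.5f}")   # comes out near 1 / (N + 1)
```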

Lucas Perry: Okay. And then going up one step in terms of specification, the corrigible agent needs to have a very detailed world model where it has a pointer, built via gradient descent, to some specific part of the world model that seems to correlate highly with minimizing the loss.

Evan Hubinger: That’s right.

Lucas Perry: Why is that less likely?

Evan Hubinger: Well, so the two major reasons that I want to argue for are, one: getting the pointer right is really hard and very difficult to do robustly, whereas getting the deception right is much easier to do robustly. Think about animal imprinting. If the animal imprints on a human caretaker instead of its parent, you have a misspecified pointer, and it doesn’t actually get very good performance. Deceptive alignment is more robust at figuring out what the thing is that it’s trying to optimize for, because corrigible alignment relies on this pointer working successfully. And if the pointer fails, it’s just going to optimize for the wrong thing.

Lucas Perry: But why can’t the pointer change just as well as the attempt at satisfying the actual loss function in the deceptive model?

Evan Hubinger: It absolutely can. But the question I want to ask is: how difficult is it for gradient descent to make one sequence of modifications versus another, the sequence of modifications that leads to corrigibility versus the sequence that leads to deceptive alignment? The sequence of modifications necessary to produce corrigible alignment requires all of this very, very careful pointer tuning. So if you’re considering an infinitesimal movement in the direction of corrigible alignment, imagine replacing your current proxy with a bad pointer: it’s not going to be that great. You’re going to have to put a lot of careful optimization effort into getting a really good pointer before it becomes better to be corrigible.

Whereas with deception, the model could build up this really good world model. And once the world model gets complicated enough that it includes a specification of the fact that it’s in a training process, as soon as it starts thinking about the training process, it’s going to be as robust as it can be because it’s going to instrumentally just want to try to figure out what is the best possible approximation I can get for this thing I’m being trained for, so I can optimize it as effectively as possible. And so this gives you a much more robust pointer compared to the corrigible alignment case.

Lucas Perry: And so the danger with the deceptive version would be: once it’s unleashed upon the world and it has a very excellent world model, it realizes this and thinks, finally, now I can make the deceptive turn, or something like that, and actually optimize the proxy?

Evan Hubinger: That’s right, yeah.

Lucas Perry: So we’ve covered a lot of the ways in which inner alignment fails. Now, inner alignment and outer alignment are two of the things which you care about when evaluating proposals for building safe and advanced AI. There are two other properties that you care about in training procedures for building beneficial AI. One of these is training competitiveness and the second one is performance competitiveness. Could you explain what training competitiveness and performance competitiveness are and why they’re both important?

Evan Hubinger: Absolutely, yeah. So I mentioned at the beginning that I have a broad view of AI alignment where the goal is to try to mitigate AI existential risks. And I mentioned that what I’m working on is focused on this intent alignment problem, but a really important facet of that problem is this competitiveness question. We don’t want to produce AI systems which are going to lead to AI existential risks, and so we don’t want to consider proposals which are directly going to cause problems. As the safety community, what we’re trying to do is not just come up with ways to not cause existential risk. Not doing anything doesn’t cause existential risk. It’s to find ways to capture the positive benefits of artificial intelligence, to be able to produce AIs which are actually going to do good things. Why do we actually try to build AIs in the first place?

We’re actually trying to build AIs because we think that there’s something we can produce which is good, and because we think that AIs are going to be produced on a default timeline anyway and we want to make sure that we can provide some better way of doing it. And so the competitiveness question is about how we produce AI proposals which actually reduce the probability of existential risk: proposals that don’t just avoid causing existential risk themselves, but that actually reduce its overall probability for the world. There are a couple of different ways that can happen. You can have a proposal which improves our ability to produce other safe AI: we produce some aligned AI, and that aligned AI helps us build other AIs which are even more aligned and more powerful. We can also maybe produce an aligned AI and have that provide an example to other people of how you can do AI in a safe way, or maybe it provides some decisive strategic advantage, which enables you to successfully ensure that only good AI is produced in the future.

There are a lot of different possible ways in which you could imagine building an AI leading to reduced existential risk, but competitiveness is going to be a critical component of any of those stories. You need your AI to actually do something. And so I like to split competitiveness into two different sub-components, which are training competitiveness and performance competitiveness. In the overview of 11 proposals document that I mentioned at the beginning, I compare 11 different proposals for prosaic AI alignment on the four qualities of outer alignment, inner alignment, training competitiveness, and performance competitiveness. So training competitiveness is this question of: how hard is it to train a model to do this particular task? It’s fundamentally a question of, if you have some team with some lead over all the other possible AI teams, can they build the proposal that we’re thinking about without totally sacrificing that lead? How hard is it, in terms of time and effort and energy and compute and data, to build an AI according to some particular proposal?

And then performance competitiveness is the question of once you’ve actually built the thing, how good is it? How effective is it? What is it able to do in the world that’s really helpful for reducing existential risk? Fundamentally, you need both of these things. And so you need all four of these components. You need outer alignment, inner alignment, training competitiveness, and performance competitiveness if you want to have a prosaic AI alignment proposal that is aimed at reducing existential risk.

Lucas Perry: This is where a bit more reflection on governance comes in, for considering which training procedures and models are able to satisfy the criteria for building safe advanced AI in a world of competing actors with different incentives and preferences.

Evan Hubinger: The competitive stuff definitely starts to touch on all those sorts of questions. When you take a step back and you think about how do you have an actual full proposal for building prosaic AI in a way which is going to be aligned and do something good for the world, you have to really consider all of these questions. And so that’s why I tried to look at all of these different things in the document that I mentioned.

Lucas Perry: So in terms of training competitiveness and performance competitiveness, are these the kinds of things which are best evaluated from within leading AI companies and then explained to say people in governance or policy or strategy?

Evan Hubinger: It is still sort of a technical question. We need to have a good understanding of how AI works, how machine learning works, what the difficulty is of training different types of machine learning models, what the expected capabilities are of models trained under different regimes, as well as the outer alignment and inner alignment that we expect will happen.

Lucas Perry: I guess I imagine the coordination here is that information on relative training competitiveness and performance competitiveness in systems is evaluated within AI companies and then possibly fed to high power decision makers who exist in strategy and governance for coming up with the correct strategy, given the landscape of companies and AI systems which exist?

Evan Hubinger: Yeah, that’s right.

Lucas Perry: All right. So we have these intent alignment problems. We have inner alignment and we have outer alignment. We’ve learned about that distinction today, and the reasons for caring about training and performance competitiveness. Part of the purpose of all this is reflected in the title of the post that partially motivated this conversation, “An overview of 11 proposals for building safe advanced AI.” You evaluate these proposals based on these criteria, as we mentioned. So I want to take this time now to talk about how optimistic you are about, say, your top few favorite proposals for building safe and advanced AI, and how you’ve roughly evaluated them on these four criteria of inner alignment, outer alignment, and then performance and training competitiveness.

Evan Hubinger: I’ll just touch on some of the ones that I think are most interesting to start by taking a look at. And it’s not necessarily just going to be the ones I think I’m most excited about, because some are more complicated than others. I want to start with some of the simpler ones and some of the ones that I am fairly excited about and I think are interesting, but that also are pedagogically useful to take a look at. I’ll start with amplification, which is the approach that I probably spend the most time thinking about.

The specific form of amplification that I want to start by talking about is imitative amplification, where imitative amplification specifically means the procedure of training a model to imitate a human with access to that model. The goal is to produce models imitating this HCH tree, which I mentioned earlier, this tree of humans consulting humans. The model is trained to mimic a human consulting the model, but the models that that human is consulting were themselves trained to mimic humans consulting models, which were themselves trained to mimic humans consulting models. So in the limit, you’re hoping to end up with a model trained to mimic this tree of humans consulting humans consulting humans, which is HCH, for “humans consulting HCH.”
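
As a cartoon of that training loop (a toy I constructed to show the shape of imitative amplification; a real implementation would train a neural network, not fill in a lookup table), something like:

```python
# Toy, runnable caricature of imitative amplification (my own construction):
# the "model" is a lookup table and the "human" answers sum-of-list questions
# by decomposing them and consulting the model on the pieces.

model = {}  # question -> answer (stands in for a trained model)

def model_answer(question):
    return model.get(question, 0)            # untrained default

def human_with_model(question):
    # The "human" decomposes the question, consults the model on each
    # sub-question, and recombines the sub-answers.
    if len(question) == 1:
        return question[0]
    mid = len(question) // 2
    left, right = tuple(question[:mid]), tuple(question[mid:])
    return model_answer(left) + model_answer(right)

def training_step(question):
    target = human_with_model(question)       # Amplify(H, M) applied to the question
    model[question] = target                  # "train" the model to imitate it

# Train on progressively larger questions so sub-answers are already learned.
for q in [(1,), (2,), (3,), (4,), (1, 2), (3, 4), (1, 2, 3, 4)]:
    training_step(q)

print(model[(1, 2, 3, 4)])  # 10: the model now mimics the decomposed (HCH-like) computation
```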

The specific way I talk about this in the overview of 11 proposals, though, is that there’s an additional thing which you can do on top of this imitative objective, which is that we also want to add some form of inner alignment solution. So we might think that imitative amplification is going to be good in the limit. It’s going to be good if we can actually produce models which are really trying to predict HCH, but we also have to actually ensure that we’re doing that, that we don’t get deceptive models or other sorts of bad inner misalignment problems.

And so we can add, on top of imitative amplification, something which I call relaxed adversarial training. I have another post that came out of some of the work I did at OpenAI which really goes into detail on how you would do something like relaxed adversarial training, what it means, and how it works. But fundamentally, the idea behind relaxed adversarial training is to train the model both on the base objective of trying to imitate the human consulting the model, and also on an additional objective of trying to decrease the probability, as assessed by some overseer looking at the model, that the model is doing something bad, that it’s deceptive. In this case, in the case of amplification, the overseer would be the human with access to the model.

So you’re training overall on the objective of mimicking the human with access to the model, and also on making it so that when that human, with the help of the model, uses transparency tools to look at the model and see what it’s doing, that human believes the model is doing the right thing. And so you’re combining training on the imitative objective with training on the relaxed adversarial training objective, trying to make it so that this oversight works.
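
A minimal sketch of that combined objective (hypothetical function names and a made-up weighting, just to show the two terms being trained together):

```python
# Minimal sketch of the combined objective (hypothetical names and weighting).

def imitation_loss(model_output, amplified_target):
    # Distance between the model's answer and the human-with-model's answer.
    return (model_output - amplified_target) ** 2

def oversight_penalty(overseer_prob_bad):
    # overseer_prob_bad: the overseer's estimate that the model is doing
    # something unacceptable (e.g. looks non-myopic / potentially deceptive).
    return overseer_prob_bad

def relaxed_adversarial_loss(model_output, amplified_target, overseer_prob_bad, lam=1.0):
    # Train on imitation *and* on making the oversight check pass.
    return imitation_loss(model_output, amplified_target) + lam * oversight_penalty(overseer_prob_bad)

print(relaxed_adversarial_loss(0.8, 1.0, overseer_prob_bad=0.05))  # ≈ 0.09
```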

Lucas Perry: How do you evaluate that in terms of your criteria?

Evan Hubinger: We can start with outer alignment. So the question of outer alignment is: is this HCH tree aligned? And the hope is that because HCH is just composed of humans, if we can understand the human behavior and ensure that those humans are doing the right thing, which we might hope to do since we’ve solved lots of human coordination problems in the past, then we can ensure that this thing is outer aligned.

For inner alignment, inner alignment is this question of relaxed adversarial training. In particular, the way in which I think about solving relaxed adversarial training in this situation is that you have this big model space and you have some set of deceptive models. If you want to avoid that set of deceptive models, one thing you can do is find some larger set which contains all the deceptive models, such that it’s very easy to check whether a given model is in that set or not. As long as the space outside of the set contains enough competitive models that are able to do the right thing, your relaxed adversarial training will work.

There are a bunch of different ideas for how you could define a set like this, but the one which I’m most excited about, and which I think about the most, is to have the set be based on the distinction of myopia, where you run this relaxed adversarial training check by looking at the model and determining whether the model cares about its multi-episode return or not. If the model does care about its multi-episode return, then it has the potential to become deceptive, and so we don’t allow that.

We have it so that if the overseer sees that the model cares about the multi-episode return, oversight fails, and if it sees that it doesn’t care about the multi-episode return, then the oversight passes. And then we train the model to cause the oversight to pass, so that it doesn’t develop this behavior where it cares about the multi-episode return. That’s the basic idea of how you could use relaxed adversarial training here to solve inner alignment.
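
Schematically, the acceptance check might look like this (hypothetical stand-ins; in practice the report would come from transparency tools applied by the human-plus-model overseer):

```python
# Schematic acceptance check (hypothetical stand-ins for the overseer's judgment).

def appears_non_myopic(model_report):
    # Stands in for: the overseer, using transparency tools, judges that the
    # model cares about its return across episodes.
    return model_report["cares_about_multi_episode_return"]

def oversight_passes(model_report):
    # Deceptive models are (by the argument above) a subset of non-myopic
    # models, so ruling out non-myopia rules out deception, at the cost of
    # also ruling out some harmless non-myopic models.
    return not appears_non_myopic(model_report)

print(oversight_passes({"cares_about_multi_episode_return": False}))  # True: check passes
print(oversight_passes({"cares_about_multi_episode_return": True}))   # False: training pushes away from this
```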

And then you have training competitiveness. This is sort of an imitation learning problem: we’re trying to train the model to imitate the human with access to the model. There are all sorts of potential problems here. It could be very difficult to get a bunch of human data for this problem, and if you’re just doing imitation learning, there’s some concern about whether the model is really going to learn this sort of powerful long-term optimization, especially if you’re specifically selecting it to be myopic. Also, there’s the question of how rich the reward signal produced by imitation learning is. Imitation learning uses some sort of fixed distance metric to determine the distance between what the model produced and what it was supposed to produce.

Generally, this is done by having the language model produce a probability distribution, and then determining the probability under that distribution of the sentences produced by the human with access to the model. And so you have this problem of how rich the training signal is and how difficult the training is. I think we really don’t know what the training competitiveness is going to be, but I think there’s reason to be optimistic. A lot of these sorts of simple language modeling tasks, like GPT-2 and GPT-3, have been very successful, and this is very similar, in that same vein, so you can hope that it’ll be similarly successful.
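
For the training-signal point, the loss being described is roughly a negative log-probability, shown here in toy form with made-up numbers (the real setup operates over sequences of tokens):

```python
import math

# Toy version of the imitation training signal (made-up numbers): the model
# assigns probabilities to possible outputs, and the loss is the negative
# log-probability of what the human with access to the model actually produced.

model_distribution = {"yes": 0.6, "no": 0.3, "maybe": 0.1}
human_with_model_output = "no"               # target produced by the human consulting the model

loss = -math.log(model_distribution[human_with_model_output])
print(f"imitation loss = {loss:.3f}")
```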

And then you have performance competitiveness, and for performance competitiveness, the real question is: how powerful is this HCH thing? Can HCH actually solve all the problems you want it to solve? This is going to be a question of how you train your humans. Can the human take any question and decompose it into different sub-questions, such that the answers to those sub-questions can be recombined into an answer to the whole question? Is this possible for all questions? If there are some questions which are undecomposable, then it wouldn’t be competitive, but if you can decompose all of these questions, it should have performance competitiveness.

So the one that I’ve been talking about, imitative amplification plus relaxed adversarial training, is the one that I work on the most, and it’s probably the one I’m most optimistic about. There are others that are also really interesting. I think every proposal on the list has lots of strong arguments in favor of it and lots of real reasons to consider it.

Another proposal which I have talked about previously is AI safety via debate. AI safety via debate has some similar properties to amplification, but it has a different objective. Rather than trying to mimic HCH, you’re trying to do this debate procedure where you have two different models trying to defend two different sides of some proposition and convince the judge to agree with one of those sides. There are some strong arguments for why this sort of procedure would have nice outer alignment properties, because it converges to something like the judge’s equilibrium beliefs after having seen all of the arguments on each side. And so we might hope that these sorts of equilibrium beliefs are going to be good.
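
As a very rough schematic of the debate game (my own toy stand-ins for the debaters and the judge, not the actual AI-safety-via-debate implementation):

```python
import random

# Toy schematic of the debate game (stand-in debaters and judge, not the real
# AI-safety-via-debate system): two debaters argue opposite answers, a judge
# picks a winner, and training would be zero-sum self-play on this game.

random.seed(0)

def debater(name, position, transcript):
    # A real debater is a trained model; here it just emits a canned argument.
    return f"{name} argues for '{position}' given {len(transcript)} prior arguments"

def judge(transcript):
    # A real judge is a human (or a model of one); here we flip a coin.
    return random.choice([0, 1])

def run_debate(question, positions, n_rounds=3):
    transcript = []
    for _ in range(n_rounds):
        for i, position in enumerate(positions):
            transcript.append(debater(f"debater_{i}", position, transcript))
    return judge(transcript), transcript

winner, transcript = run_debate("Is the ball under cup A?", ["yes", "no"])
print(f"judge sides with debater_{winner}")
# In training, the winner gets +1 and the loser -1 (zero-sum), and both
# debaters improve by self-play against copies of themselves.
```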

There’s also lots of work that’s been done to try to improve the ability to access the true equilibrium beliefs, to really give the judge all possible information. There’s some work that was done by Paul Christiano and Beth Barnes on this cross-examination mechanism, which gives it significantly better performance competitiveness properties by allowing it to access a larger complexity class. But then the other question is: what can you do on top of AI safety via debate to help with inner alignment?

An interesting thing that you can do is you can have the models trained to use transparency tools on each other to win the debate. If one model is able to look inside the other model and demonstrate that it’s non-myopic or something, then it wins the debate. And so then you can incentivize the models to play this game of trying to look inside of each other and find the inner alignment problems, and then train away from those problems.

For outer alignment, you really have this question of: are the debaters actually incentivized to give honest, cooperative, helpful answers in the limit? And this is a question of what that equilibrium really looks like. Is it the case that humans are able to be tricked by really persuasive arguments and no counterargument will unpersuade them? Or is it the case that when humans are exposed to all of the best arguments and counterarguments, they come to the right conclusions? We don’t really know, but this is fundamentally the outer alignment question here. This is a trickier equilibrium to analyze.

In the amplification case, we just had to reason about this concrete tree of humans, but with debate, we have to understand something more abstract: what is the theoretical equilibrium behavior of a human under exposure to all these different arguments? I think there’s a case that, from an outer alignment perspective, debate is at least more questionable.

Then we have inner alignment, and the situation here is: how successfully can we incentivize these models to inspect each other and be able to identify catastrophic behavior? There are certainly some problems here. You have to get a good training signal to be able to produce models which are good at inspecting each other, which probably means you have to produce deceptive models and then have the models find the deception, or at least produce non-myopic models and then have the models find the non-myopia. But this could also incentivize the models to hide the non-myopia, so it’s a little bit unclear exactly how this works. It’s going to depend on precisely the properties of the training procedure.

I think there are some good reasons to be hopeful here for training competitiveness, because the way the debate game is structured lets you do self-play, where you have these agents competing against each other on a zero-sum game. That’s in many ways similar to a lot of big successes like AlphaZero, where AlphaZero is able to solve all these environments by doing very successful self-play on zero-sum games.

The biggest problem here is probably getting data, and then there’s the performance competitiveness question, which is how effective would an actual debate agent be? The hope is that in the limit, you should just be able to ask the first debater, what do you think? And then it’ll just choose the most convincing answer, and then you can just go with that. You don’t have to run the whole debate in deployment, but it’s only going to be able to solve these language problems.

What it’s going to give you is the equilibrium of what a human thinks after all of these different arguments, and is that good enough? Is it the case that humans are really going to be able to come to good enough equilibria after they see all these arguments that they’ll be able to produce really good answers? And also, is question answering alone sufficient to be competitive in a potentially very competitive marketplace?

A third proposal that I think is interesting to go into is something called microscope AI. Microscope AI, I think, is really interesting to look at because it’s very different from the other proposals that I was just talking about. It has a very different approach to thinking about how we solve these sorts of problems. For all of these approaches, we need to have some ability to look inside of our models and learn something about what the model knows. But when you use transparency tools to look inside of the model, it teaches you multiple things. It teaches you about the model: you learn about what the model has learned. But it also teaches you about the world, because the model learned a bunch of useful facts, and if you look inside the model and learn those facts yourself, then you become more informed. And so this process itself can be quite powerful.

That’s fundamentally the idea of microscope AI. The idea of microscope AI is to train a predictive model on the data you want to understand, and then use transparency tools to understand what that model learned about that data, and then use that understanding to guide human decision making. And so if you’re thinking about outer alignment, in some sense, this procedure is not really outer aligned because we’re just trying to predict some data. And so that’s not really an aligned objective. If you had a model that was just trying to do a whole bunch of prediction, it wouldn’t be doing good things for the world.
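
As a loose, toy analogy for that workflow (my own construction, only meant to show “train a predictive model, inspect what it learned, let humans act on it”):

```python
import numpy as np

# Loose toy analogy (my own): fit a purely predictive model to data, then
# "look inside it" and hand what it learned to a human, instead of deploying
# the model as an agent.

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                        # three candidate factors
true_weights = np.array([2.0, 0.0, -1.0])
y = X @ true_weights + 0.1 * rng.normal(size=500)    # outcome we care about

# "Train a predictive model": ordinary least squares.
learned_weights, *_ = np.linalg.lstsq(X, y, rcond=None)

# "Transparency tools": here, just reading off the weights. A human inspects
# them, learns which factors matter and in which direction, and then the
# human (not the model) makes the decision.
print(np.round(learned_weights, 2))
```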

But the hope is that if you’re just training a predictive model, it’s not going to end up being deceptive or otherwise dangerous. And you can also use transparency tools to ensure that it doesn’t become that. We still have to solve inner alignment, like I was saying. It still has to be the case that you don’t produce deceptive models. And in fact, the goal here really is not to produce mesa optimizers at all. The goal is just to produce these predictive systems, which learn a bunch of useful facts and information, but that aren’t running optimization procedures. And hopefully we can do that by having this very simple, predictive objective, and then also by using transparency tools.

And then training competitiveness: we know how to train powerful predictive models now. Something like GPT-2, and now GPT-3, these are predictive models trained on text prediction. So we know this process, we know that we’re very good at it, and hopefully we’ll be able to continue to be good at it into the future. The real sticking point with microscope AI is the performance competitiveness question. Is enhanced human understanding actually going to be sufficient to cover the use cases we might want advanced AI for? I don’t know. It’s really hard to know the answer to this question, but you can imagine some situations where it is and some situations where it isn’t.

So, for situations where you need to do long-term, careful decision making, it probably would be, right? If you want to replace CEOs or whatever, that’s a very general decision making process that can be significantly improved just by having much better human understanding of what’s happening. You don’t necessarily need the AI to make the decision. On the other hand, if you need fine-grained manipulation tasks or very, very quick response times, AIs managing a factory or something, then maybe this wouldn’t be sufficient, because you would need the AIs to be doing all of this quick decision making and you couldn’t have them just giving information to a human.

One specific situation which I think is also important to think about is using your first AI system to help build a second AI system, and making sure that second AI system is aligned and competitive. I think microscope AI also performs pretty well there. You could use a microscope AI to get a whole bunch of information about AIs, how they work, and how training works, have the humans learn that information, and then use that information to improve our building of the next AIs and other AIs that we build.

There are certain situations where microscope AI is performance competitive, situations where it wouldn’t be performance competitive, but it’s a very interesting proposal because it’s sort of tackling it from a very different angle. It’s like, well, maybe we don’t really need to be building agents. Maybe we don’t really need to be doing this stuff. Maybe we can just be building this microscope AI. I should mention the microscope AI idea comes from Chris Olah, who works at OpenAI. The debate idea comes from Geoffrey Irving, who’s now at DeepMind, and the amplification comes from Paul Christiano, who’s at OpenAI.

Lucas Perry: Yeah, so for sure, the best place to review these is by reading your post. And again, the post is “An overview of 11 proposals for building safe advanced AI” by Evan Hubinger and that’s on the AI Alignment Forum.

Evan Hubinger: That’s right. I should also mention that a lot of the stuff that I talked about in this podcast is coming from the Risks from Learned Optimization in Advanced Machine Learning Systems paper.

Lucas Perry: All right. Wrapping up here, I’m interested in ending on a broader note. I’m just curious to know if you have concluding thoughts about AI alignment, how optimistic are you that humanity will succeed in building aligned AI systems? Do you have a public timeline that you’re willing to share about AGI? How are you feeling about the existential prospects of earth-originating life?

Evan Hubinger: That’s a big question. So I tend to be on the pessimistic side. Looking out on the field of AI and the field of AI safety, I think there are a lot of really challenging, difficult problems that we are at least not currently equipped to solve, and it seems quite likely that we won’t be equipped to solve them by the time we need to. I tend to think that the prospects for humanity aren’t looking great right now, but I nevertheless have a very optimistic disposition: we’re going to do the best that we can. We’re going to try to solve these problems as effectively as we possibly can, and we’re going to work on it, and hopefully we’ll be able to make it happen.

In terms of timelines, it’s such a complex question. I don’t know if I’m willing to commit to some timeline publicly. I think that it’s just one of those things that is so uncertain. It’s just so important for us to think about what we can do across different possible timelines and to focus on things which are generally effective regardless of how it turns out, because I think we’re really just quite uncertain. It could be as soon as five years or as far away as 50 or 70 years; we really don’t know.

I don’t know if we have great track records of prediction in this setting. Regardless of when AI comes, we need to be working to solve these problems and to get more information on these problems, to get to the point where we understand them and can address them, because when it does get to the point where we’re able to build these really powerful systems, we need to be ready.

Lucas Perry: So you do take very short timelines, like say 5 to 10 to 15 years very seriously.

Evan Hubinger: I do take very short timelines very seriously. I think that if you look at the field of AI right now, there are these massive organizations, OpenAI and DeepMind that are dedicated to the goal of producing AGI. They’re putting huge amounts of research effort into it. And I think it’s incorrect to just assume that they’re going to fail. I think that we have to consider the possibility that they succeed and that they do so quite soon. A lot of the top people at these organizations have very short timelines, and so I think that it’s important to take that claim seriously and to think about what happens if it’s true.

I wouldn’t bet on it. There’s a lot of analysis that seems to indicate that at the very least, we’re going to need more compute than we have in that sort of a timeframe, but timeline prediction tasks are so difficult that it’s important to consider all of these different possibilities. I think that, yes, I take the short timelines very seriously, but it’s not the primary scenario. I think that I also take long timeline scenarios quite seriously.

Lucas Perry: Would you consider DeepMind and OpenAI to be explicitly trying to create AGI? OpenAI, yes, right?

Evan Hubinger: Yeah. OpenAI, it’s just part of the mission statement. DeepMind, some of the top people at DeepMind have talked about this, but it’s not something that you would find on the website the way you would with OpenAI. If you look at historically some of the things that Shane Legg and Demis Hassabis have said, a lot of it is about AGI.

Lucas Perry: Yeah. So in terms of these being the leaders with just massive budgets and person power, how do you see the quality and degree of alignment and beneficial AI thinking and mindset within these organizations? Because there seems to be a big distinction between the AI alignment crowd and the mainstream machine learning crowd. A lot of the mainstream ML community hasn’t been exposed to many of the arguments or thinking within the safety and alignment crowd. Stuart Russell has been trying hard to shift away from the standard model and incorporate a lot of these new alignment considerations. So yeah. What do you think?

Evan Hubinger: I think this is a problem that is getting a lot better. Like you were mentioning, Stuart Russell has been really great on this. CHAI has been very effective at trying to really get this message of, we’re building AI, we should put some effort into making sure we’re building safe AI. I think this is working. If you look at a lot of the major ML conferences recently, I think basically all of them had workshops on beneficial AI. DeepMind has a safety team with lots of really good people. OpenAI has a safety team with lots of really good people.

I think the standard story of “oh, AI safety is just this thing that people who aren’t involved in machine learning think about” no longer holds: in the current world, AI safety has become much more integrated with machine learning and is becoming more mainstream. But it’s definitely still a process. As Stuart Russell says, the field of AI has been very focused on the standard model, and moving people away from that and getting them to think about some of its consequences takes time; it takes some evolution of the field. But it is happening. I think we’re moving in a good direction.

Lucas Perry: All right, well, Evan, I’ve really enjoyed this. I appreciate you explaining all of this and taking the time to unpack a lot of this machine learning language and concepts to make it digestible. Is there anything else here that you’d like to wrap up on or any concluding thoughts?

Evan Hubinger: If you want more detailed information on all of the things that I’ve talked about, the full analysis of inner alignment and outer alignment is in Risks from Learned Optimization in Advanced Machine Learning Systems by me, as well as many of my co-authors, as well as “an overview of 11 proposals” post, which you can find on the AI Alignment Forum. I think both of those are resources, which I would recommend checking out for understanding more about what I talked about in this podcast.

Lucas Perry: Do you have any social media or a website or anywhere else for us to point towards?

Evan Hubinger: Yeah, so you can find me on all the different sorts of social media platforms. I’m fairly active on GitHub. I do a bunch of open source development. You can find me on LinkedIn, Twitter, Facebook, all those various different platforms. I’m fairly Google-able. It’s nice to have a fairly unique last name. So if you Google me, you should find all of this information.

One other thing, which I should mention specifically, everything that I do is all public. All of my writing is public. I try to publish all of my work and I do so on the AI Alignment Forum. So the AI Alignment Forum is a really, really great resource because it’s a collection of writing by all of these different AI safety authors. It’s open to anybody who’s a current AI safety researcher, and you can find me on the AI Alignment Forum as evhub, I’m E-V-H-U-B on the AI Alignment Forum.

Lucas Perry: All right, Evan, thanks so much for coming on today, and it’s been quite enjoyable. This has probably been one of the more fun AI alignment podcasts that I’ve had in a while. So thanks a bunch and I appreciate it.

Evan Hubinger: Absolutely. That’s super great to hear. I’m glad that you enjoyed it. Hopefully everybody else does as well.

End of recorded material

Sam Barker and David Pearce on Art, Paradise Engineering, and Existential Hope (With Guest Mix)

Sam Barker, a Berlin-based music producer, and David Pearce, philosopher and author of The Hedonistic Imperative, join us on a special episode of the FLI Podcast to spread some existential hope. Sam is the author of euphoric sound landscapes inspired by the writings of David Pearce, largely exemplified in his latest album — aptly named “Utility.” Sam’s artistic excellence, motivated by blissful visions of the future, and David’s philosophical and technological writings on the potential for the biological domestication of heaven are a perfect match made for the fusion of artistic, moral, and intellectual excellence. This podcast explores what significance Sam found in David’s work, how it informed his music production, and Sam and David’s optimistic visions of the future; it also features a guest mix by Sam and plenty of musical content.

Topics discussed in this episode include:

  • The relationship between Sam’s music and David’s writing
  • Existential hope
  • Ideas from the Hedonistic Imperative
  • Sam’s albums
  • The future of art and music

Where to follow Sam Barker :

Soundcloud
Twitter
Instagram
Website
Bandcamp

Where to follow Sam’s label, Ostgut Ton: 

Soundcloud
Facebook
Twitter
Instagram
Bandcamp

 

Timestamps: 

0:00 Intro

5:40 The inspiration around Sam’s music

17:38 Barker – Maximum Utility

20:03 David and Sam on their work

23:45 Do any of the tracks evoke specific visions or hopes?

24:40 Barker – Die-Hards Of The Darwinian Order

28:15 Barker – Paradise Engineering

31:20 Barker – Hedonic Treadmill

33:05 The future and evolution of art

54:03 David on how good the future can be

58:36 Guest mix by Barker

 

Tracklist:

Delta Rain Dance – 1

John Beltran – A Different Dream

Rrose – Horizon

Alexandroid – lvpt3

Datassette – Drizzle Fort

Conrad Sprenger – Opening

JakoJako –  Wavetable#1

Barker & David Goldberg – #3

Barker & Baumecker – Organik (Intro)

Anthony Linell – Fractal Vision

Ametsub – Skydroppin’

Ladyfish\Mewark – Comfortable

JakoJako & Barker – [unreleased]

 

This podcast is possible because of the support of listeners like you. If you found this conversation to be meaningful or valuable consider supporting it directly by donating at futureoflife.org/donate. Contributions like yours make these conversations possible.

All of our podcasts are also now on Spotify and iHeartRadio! Or find us on SoundCloud, iTunes, Google Play and Stitcher.

You can listen to the podcast above or read the transcript below. 

David Pearce: I would encourage people to conjure up their vision of paradise, and the future can potentially be like that, only much, much better.

Lucas Perry: Welcome to the Future of Life Institute Podcast. I’m Lucas Perry. Today we have a particularly unique episode with Berlin-based DJ and producer Sam Barker as well as with David Pearce, and right now, you’re listening to Sam’s track Paradise Engineering on his album Utility. On the FLI Podcast, we focus centrally on existential risk. The other side of existential risk is existential hope. This hope reflects all of our dreams, aspirations, and wishes for a better future. For me, this means a future where we’re able to create material abundance, eliminate global poverty, end factory farming and address animal suffering, evolve our social and political systems to bring greater wellbeing to everyone, and, more optimistically, create powerful aligned artificial intelligence that can bring about the end of involuntary suffering and help us to idealize the quality of our minds and ethics. If we don’t go extinct, we have plenty of time to figure these things out, and that brings me a lot of joy and optimism. Whatever future seems most appealing to you, these visions are a key component of why mitigating existential risk is so important. So, in the context of COVID-19, we’d like to revitalize existential hope, and this podcast is aimed at doing that.

As a part of this podcast, Sam was kind enough to create a guest mix for us. You can find that after the interview portion of this podcast and can find where it starts by checking the timestamps. I’ll also release the mix separately a few days after this podcast goes live. Some of my favorite tracks of Sam’s not highlighted in this podcast are Look How Hard I’ve Tried, and Neuron Collider. If you enjoy Sam’s work and music featured here, you can support or follow him at the links in the description. He has a Bandcamp shop where you can purchase his albums. I grabbed a vinyl copy of his album Debiasing from there. 

As for a little bit of background on this podcast, Sam Barker, who produces electronic music under the name Barker, has albums with titles such as “Debiasing” and “Utility.” I was recommended to listen to these, and discovered his album “Utility” is centrally inspired by David Pearce’s work, specifically The Hedonistic Imperative. Utility has track titles like Paradise Engineering, Experience Machines, Gradients Of Bliss, Hedonic Treadmill, and Wireheading. So, being a big fan of Sam’s music production and David’s philosophy and writing, I wanted to bring them together to explore the theme of existential hope, Sam’s inspiration for his albums, and how David fits into all of it.

Many of you will already be familiar with David Pearce. He is a friend of this podcast and a multiple time guest. David is a co-founder of the World Transhumanist Association, rebranded Humanity+, and is a prominent figure within the transhumanism movement in general. You might know him from his work on the Hedonistic Imperative, a book which explores our moral obligation to work towards the abolition of suffering in all sentient life through technological intervention.

Finally, I want to highlight the 80,000 Hours Podcast with Rob Wiblin. If you like the content on this show, I think you’ll really enjoy the topics and guests on Rob’s podcast. His is also motivated by and contextualized in an effective altruism framework and covers a broad range of topics related to the world’s most pressing issues and what we can do about them. If that sounds of interest to you, I suggest checking out episode #71 with Ben Todd on the ideas of 80,000 Hours, and episode #72 with Toby Ord on existential risk. 

And with that, here’s my conversation with Dave and Sam, as well as Sam’s guest mix.

Lucas Perry: For this first section, I’m basically interested in probing the releases that you already have done, Sam, and exploring them and your inspiration for the track titles and the soundscapes that you’ve produced. Some of the background and context for this is that much of this seems to be inspired by and related to David’s work, in particular the Hedonistic Imperative. I’m at first curious to know, Sam, how did you encounter David’s work, and what does it mean for you?

Sam Barker: David’s work was sort of arriving in the middle of a sort of a series of realizations, and kind of coming from a starting point of being quite disillusioned with music, and a little bit disenchanted with the vagueness, and the terminology, and the imprecision of the whole thing. I think part of me has always wanted to be some kind of scientist, but I’ve ended up at perhaps not the opposite end, but quite far away from it.

Lucas Perry: Could you explain what you mean by vagueness and imprecision?

Sam Barker: I suppose the classical idea of what making music is about is a lot to do with the sort of western idea of individualism and about self expression. I don’t know. There’s this romantic idea of artists having these frenzied creative bursts that give birth to these wonderful things, that it’s some kind of struggle. I was just feeling super disillusioned with all of that. Around that time, 2014 or 15, I was also reading a lot about social media, reading about behavioral science, trying to figure out what was going on in this arena and how people are being pushed in different directions by this algorithmic system of information distribution. That kind of got me into this sort of behavioral science side of things, like the addictive part of the variable-ratio reward schedule with likes. It’s a free dopamine dispenser kind of thing. This was kind of getting me into reading about behavioral science and cognitive science. It was giving me a lot of clarity, but not much more sort of inspiration. It was basically like music.

Dance music especially is a sort of complex behavioral science. You do this and people do that. It’s all deeply ingrained. I sort of imagine the DJ as a sort of Skinner box operator pulling puppet strings and making people behave in different ways. Music producers are kind of designing clever programs using punishment and reward or suspense and release, and controlling people’s behavior. The whole thing felt super pushy and not a very inspiring conclusion. Looking at the problem from a cognitive science point of view is just the framework that helped me to understand what the problem was in the first place, this kind of problem of being manipulative. Behavioral science is kind of saying what we can make people do. Cognitive psychology is sort of figuring out why people do that. That was my entry point into cognitive psychology, and that was kind of the basis for Debiasing.

There’s always been sort of a parallel for me between what I make and my state of mind. When I’m in a more positive state, I tend to make things I’m happier with, and so on. So I was getting to the bottom of what the tricks were, I suppose, with dance music. I kind of understood them implicitly, but I just wanted to figure out why things worked. I sort of came to the conclusion it was to do with a collection of biases we have, like the confirmation bias, and the illusion of truth effect, and the mere exposure effect. These things are like the guardians of four-four supremacy. Dance music can be pretty repetitive, and we describe it sometimes in really aggressive terminology. It’s a psychological kind of interaction.

Cognitive psychology was leading me to Kaplan’s law of the instrument. The law of the instrument says that if you give a small boy a hammer, he’ll find that everything he encounters requires pounding. I thought that was a good metaphor. The idea is that we get so used to using tools in a certain way that we lose sight of what it is we’re trying to do. We act in the way that the tool instructs us to do. I thought, what if you take away the hammer? That became a metaphor for me, in a sense, that David clarified in terms of pain reduction. We sort of put these painful elements into music in a way to give this kind of hedonic contrast, but we don’t really consider that that might not be necessary. What happens when we abolish these sort of negative elements? Are the results somehow released from this process? That was sort of the point, up until discovering the Hedonistic Imperative.

I think what I was needing at the time was a sort of framework, so I had the idea that music was decision making. To improve the results, you have to ask better questions, make better decisions. You can make some progress looking at the mechanics of that from a psychology point of view. What I was sort of lacking was a purpose to frame my decisions around. I sort of had the idea that music was a sort of valence carrier, if you like, and that it could be tooled towards a sort of greater purpose than just making people dance, which was the goal for Debiasing, really: to make people dance, but without using the sort of deeply ingrained cues that people are used to, and see if that works.

What was interesting was how broadly it was accepted, this first EP. There were all kinds of DJs playing it in techno, ambient, electro, all sorts of different styles. It reached a lot of people. It was as if taking out the most functional element made it more functional and more broadly appealing. That was the entry point to utilitarianism. There was sort of an accidentally utilitarian act, in a way, to sort of try and maximize the pleasure and minimize the pain. I suppose after landing in utilitarianism and searching for some kind of a framework for a sense of purpose in my work, the Hedonistic Imperative was probably the most radical, optimistic take on the system. Firstly, it put me in a sort of mindset where it granted permission to explore sort of utopian ideals, because I think the idea of pleasure is a little bit frowned upon in the art world. I think the art world turns its nose up at such direct cause and effect. There was the idea that producers could be paradise engineers of sorts, or at least the precursors to paradise engineers, and that we almost certainly would have a role in a kind of sensory utopia of the future.

There was this kind of permission granted. You can be optimistic. You can enter into your work with good intentions. It’s okay to see music as a tool to increase overall wellbeing, in a way. That was kind of the guiding idea for my work in the studio. I’m trying, these days, to put more things into the system to make decisions in a more conscious way, at least where it’s appropriate to. This sort of notion of reducing pain and increasing pleasure was the sort of question I would ask at any stage of decision making. Did this thing that I did serve those ends? If not, take a step back and try a different approach.

There’s something else to be said about the way you sort of explore this utopian world without really being bogged down. You handle the objections in such a confident way. I called it a zero gravity world of ideas. I wanted to bring that zero gravity feeling to my work, and to see that technology can solve any problem in this sphere. Anything’s possible. All the obstacles are just imagined, because we fabricate these worlds ourselves. These are things that were really instructive for me, as an artist.

Lucas Perry: That’s quite an interesting journey. From the lens of understanding cognitive psychology and human biases, was it that you were seeing those biases in dance music itself? If so, what were those biases in particular?

Sam Barker: On both sides, on the way it’s produced and in the way it’s received. There’s sort of an unspoken acceptance. You’re playing a set and you take a kick drum out. That signals to people to perhaps be alert. The lighting engineer, they’ll maybe raise the lights a little bit, and everybody knows that the music is going into sort of a breakdown, which is going to end in some sort of climax. Then, at that point, the kick drum comes back in. We all know this pattern. It’s really difficult to understand why that works without referring to things like cognitive psychology or behavioral science.

Lucas Perry: What does the act of debiasing the reception and production of music look like and do to the music and its reception?

Sam Barker: The first part that I could control was what I put into it. The experiment was whether a debiased piece of dance music could perform the same functionality, or whether it really relies on these deeply ingrained cues. Without wanting to sort of pat myself on the back, it kind of succeeded in its purpose. It was sort of proof that this was a worthy concept.

Lucas Perry: You used the phrase, earlier, four four. For people who are not into dance music, that just means a kick on each beat, which is ubiquitous in much of house and techno music. You’ve removed that, for example, in your album Debiasing. What are other things that you changed from your end, in the production of Debiasing, to debias the music from normal dance music structure?

Sam Barker: It was informing the structure of what I was doing so much that I wasn’t so much on a grid where you have predictable things happening. It’s a very highly formulaic and structured thing, and that all keys into the expectation and this confirmation bias that people, I think, get some kind of kick from when the predictable happens. They say, yep. There you go. I knew that was going to happen. That’s a little dopamine rush, but I think it’s sort of a cheap trick. I guess I was trying to get the tricks out of it, in a way, so figuring out what they were, and trying to reduce or eliminate them was the process for Debiasing.

Lucas Perry: That’s quite interesting and meaningful, I think. Let’s just take trap music. I know exactly how trap music is going to go. It has this buildup and drop structure. It’s basically universal across all dance music. Progressive house in the 2010s was also exactly like this. What else? Dubstep, of course, same exact structure. Everything is totally predictable. I feel like I know exactly what’s going to happen, having listened to electronic music for over a decade.

Sam Barker: It works, I think. It’s a tried and tested formula, and it does the job, but when you’re trying to imagine states beyond just getting a little kick from knowing what was going to happen, that’s the place that I was trying to get to, really.

Lucas Perry: After the release of Debiasing in 2018, which was a successful attempt at serving this goal and mission, you then discovered the Hedonistic Imperative by David Pearce, and kind of leaned into consequentialism, it seems. Then, in 2019, you had two releases. You had BARKER 001 and you had Utility. Now, Utility is the album which most explicitly adopts David Pearce’s work, specifically in the Hedonistic Imperative. You mentioned electronic dance producers and artists in general can be sort of the first wave of, or can perhaps assist in, paradise engineering, insofar as that will be possible in the near to short-term future, given advancements in technology. Is that sort of the explicit motivation and framing around those two releases of BARKER 001 and Utility?

Sam Barker: BARKER 001 was a few tracks that were taken out of the running for the album, because they didn’t sort of fit the concept. Really, I knew the last track was kind of alluding to the album. Otherwise, it was perhaps not sort of thematically linked. Hopefully, if people are interested in looking more into what’s behind the music, you can lead people into topics with the concept. With Utility, I didn’t want to just keep exploring cognitive biases and unpicking dance music structurally. It’s sort of a paradox, because I guess the Hedonistic Imperative argues that pleasure can exist without purpose, but I really was striving for some kind of purpose with the pleasure that I was getting from music. That sort of emerged from reading the Hedonistic Imperative, really, that you can apply music to this problem of raising the general level of happiness up a notch. I did sort of worry that by trying to please, it wouldn’t work, that it would be something that’s too sickly sweet. I mean, I’m pretty turned off by pop music, and there was this sort of risk that it would end up somewhere like that. That’s it, really. Just looking for a higher purpose with my work in music.

Lucas Perry: David, do you have any reactions?

David Pearce: Well, when I encountered Utility, yes, I was thrilled. As you know, essentially I’m a writer writing in quite heavy sub-academic prose. Sam’s work, I felt, helps give people a glimpse of our glorious future, paradise engineering. As you know, the reviews were extremely favorable. I’m not an expert critic or anything like that. I was just essentially happy and thrilled at the thought. It deserves to be mainstream. It’s really difficult, I think, to actually evoke the glorious future we are talking about. I mean, I can write prose, but in some sense music can evoke paradise better, at least for many people, than prose.

Sam Barker: I think it’s something you can appreciate without cognitive effort, whereas your prose you at least need to be able to read. Music is a bit more of a passive way of receiving, which I think is an intrinsic advantage it has. That’s actually really a relief to hear, because there was just a small fear in my mind that I was grabbing these concepts with clumsy hands and discrediting them.

David Pearce: Not at all.

Sam Barker: It all came from a place of sincere appreciation for this sort of world that you are trying to entice people with. When I’ve tried to put into words what it was that was so inspiring, I think it’s that there was also a sort of very practical side to it; I was making lots of notes. I’ve got lots of amazing one-liners. Will we ever leave the biological dark ages, or the biological domestication of heaven? There were just so many things that conjure up such vivid, heavenly sensations. It sort of brings me back to the fuzziness of art and inspiration, but I hope I’ve tried to adopt the same spirit of optimism that you approached the Hedonistic Imperative with. I actually don’t know what your state of mind was at the time, even, but it must’ve come in a bout of extreme hopefulness.

David Pearce: Yes, actually. I started taking Selegiline, and six weeks later I wrote the Hedonistic Imperative. It just gave me just enough optimism to embark on it. I mean, I have, fundamentally, a very dark view of Darwinian life, but for mainly technical reasons I think the future is going to be superhumanly glorious. How do you evoke this for our dark, Darwinian minds?

Sam Barker: Yeah. How do we get people excited about it? I think you did a great job.

David Pearce: It deserves to go mainstream, really, the core idea. I mean, forget the details, the neurobabble of genetics. Yeah, of course it’s incredibly important, but this vision of just how sublimely wonderful life could be. How do we achieve full spectrum, multimedia dominance? I mean, I can write it.

Lucas Perry: Sounds like you guys need to team up.

Sam Barker: It’s very primitive. I’m excited where it could head, definitely.

Lucas Perry: All right. I really like this idea about music showing how good the future can be. I think that many of the ways that people can understand how good the future can be comes from the best experiences they’ve had in their life. Now, that’s just a physical state of your brain. If something isn’t physically impossible, then the only barrier to achieving and realizing that thing is knowledge. Take all the best experiences in your life. If we could just understand computation, and biology in the brain, and consciousness well enough, it doesn’t seem like there are any real limits to how good and beautiful things can get. Do any of the tracks that you’ve done evoke very specific visions, dreams, desires, or hopes?

Sam Barker: I would be sort of hesitant to make direct links between tracks and particular mindsets, because when I’m sitting down to make music, I’m not really thinking about any one particular thing. Rather, I’m trying to look past things and look more about what sort of mood I want to put into the work. Any of the tracks on the record, perhaps, could’ve been called paradise engineering, is what I’m saying. The names from the tracks are sort of a collection of the ideas that were feeding the overall process. The application of the names was kind of retroactive connection making. That’s probably a disappointment to some people, but the meaning of all of the track names is in the whole of the record. I think the last track on the record, Die-Hards of the Darwinian Order, that was a phrase that you used, David, to describe people clinging to the need for pain in life to experience pleasure.

David Pearce: Yes.

Sam Barker: That track was not made for the record. It was made some time ago, and it was just a technical experiment to see if I could kind of recreate a realistic sounding band with my synthesizers. The label manager, Alex, was really keen to have this on the record. I was kind of like, well, it doesn’t fit conceptually. It has a kick drum. It’s this kind of somber mood, and the rest of the record is really uplifting, or trying to be. Alex was saying he liked the contrast to the positivity of the rest of the album. He felt like it needed this dose of realism or something.

David Pearce: That makes sense, yes.

Sam Barker: I sort of conceded in the end. We called it Die-Hards of the Darwinian Order, because that was what I felt like he was.

David Pearce: Have you told him this?

Sam Barker: I told him. He definitely took the criticism. As I said, it’s the actual joining up of these ideas that I make notes on and the tracks themselves that, in the end, had to be done in a creative way, sort of retroactively. That doesn’t mean to say that all of these concepts were not crucial to the process of making the record. When you’re starting a project, you call it something like new track, happy two, mix one, or something. Then, eventually, the sort of meaning emerges from the end result, in a way.

Lucas Perry: It’s just like what I’ve heard from authors of best selling books. They say you have no idea what the book is going to be called until the end.

Sam Barker: Right, yeah.

David Pearce: One of the reasons I think it’s so important to stress life based on gradients of bliss ratcheting up hedonic set points is that, instead of me or anyone else trying to impose their distinctive vision of paradise, it just allows, with complications, everyone to keep most of their existing values and preferences, but just ratchets up hedonic tone and hedonic range. I mean, this is the problem with so many traditional paradises. They involve the imposition of someone else’s values and preferences on you. I’m being overly cerebral about it now, but I think my favorite track on the album is the first. I would encourage people to conjure up their vision of paradise; the future can potentially be like that, and much, much better.

Sam Barker: This, I think, relates to the sort of pushiness that I was feeling at odds with. The music does take people to these kinds of euphoric states, sometimes chemically underwritten, but it’s being done in a dogmatic and singular way. There’s not much room for personal interpretation. Everybody’s sort of experiencing one thing, though I think there’s something in these kinds of communal experiences that I’m hopefully going to understand one day.

Lucas Perry: All right. I think some of my favorite tracks are Look How Hard I’ve Tried on Debiasing. I also really like Maximum Utility and Neuron Collider. I mean, all of it is quite good and palatable.

Sam Barker: Thank you. The ones that you said are some of my personal favorites. It’s also funny how some of the least favorite tracks, or not least favorite, but the ones that I felt like didn’t really do what they set out to do, were other people’s favorites. Hedonic Treadmill, for example. I’d put that on the pile of didn’t work, but people are always playing it, too, finding things in it that I didn’t intentionally put there. Really, that track felt to me like stuck on the hedonic treadmill, and not sort of managing to push the speed up, or push the level up. This is, I suppose, the problem with art, that there isn’t a universal pleasure sense, that there isn’t a one size fits all way to these higher states.

David Pearce: You correctly called it the hedonic treadmill. Some people say the hedonistic treadmill. Even one professor I know calls it the hedonistic treadmill.

Lucas Perry: I want to get on that thing.

David Pearce: I wouldn’t mind spending all day on a hedonistic treadmill.

Sam Barker: That’s my kind of exercise, for sure.

Lucas Perry: All right, so let’s pivot here into section two of our conversation, then. For this section, I’d just like to focus on the future, in particular, and exploring the state of dance music culture, how it should evolve, and how science and technology, along with art and music, can evolve into the future. This question comes from you in particular, Sam, addressed to Dave. I think you were curious about his experiences in life and if he’s ever lost himself on a dance floor or has any special music or records that put him in a state of bliss?

Sam Barker: Very curious.

David Pearce: My musical autobiography. Well, some of my earliest memories are of a wind-up gramophone. I’m showing my age here. Apparently, as a five year old child, I used to sing on the buses. Daisy, Daisy, give me your answer, do. I’m so crazy over love of you. Then, graduating via the military brass band music that apparently I used to enjoy as a small child, to pop music. Essentially, for me, questions about music are very, very unanswerable. I like to use it as a backdrop, you know. At its best, there’s this tingle up one’s spine one gets, but it doesn’t happen very often. The only thing I would say is that it’s really important for me that music should be happy. I know some people get into sad music. I know it’s complicated. Music, for me, has to elicit something that’s purely good.

Sam Barker: I definitely have no problem with exploring the sort of darker side of human nature, but I also have come to the realization that there are better ways to explore the dark sides than aesthetic stimulation, perhaps through words and ideas. Aesthetics is really at its optimum function when it’s working towards more positive goals of happiness and joy, and these are sort of swear words in the art world.

Lucas Perry: Dave, you’re not trying to hide your rave warehouse days from us, are you?

David Pearce: Well, yeah. Let’s just say I might not have been entirely drug naïve with friends. Let’s just say I was high on life or something, but it’s a long time since I have explored that scene. Part of me still misses it. When it comes to anything in the art world, I feel similarly: just as music should be happy, I think visual art should be beautiful. Which, I mean, not all serious artists would agree with.

Sam Barker: I think the whole notion is just people find it repulsive somehow, especially in the art world. Somebody that painted a picture and then the description reads I just wanted it to be pretty is getting thrown out the gallery. What greater purpose could it really take on?

David Pearce: Yeah.

Lucas Perry: Maybe there’s some feeling of insecurity, and a feeling and a need to justify the work as having meaning beyond the sensual or something. Then there may also be this fact contributing to it. Seeking happiness and sensual pleasure directly, in and of itself, is often counterproductive towards that goal. Seeking wellbeing and happiness directly usually subverts that mission, and I guess that’s just a curse of Darwinian life. Perhaps those, I’m just speculating here, contribute to this cultural distaste, as you were pointing out, for enjoying pleasure as the goal of art.

Sam Barker: Yeah, we’re sort of intellectually allergic to these kinds of ideas, I think. They just seem sort of really shallow and superficial. I suppose that was kind of my existential fear before the album came out, that the idea that I was just trying to make people happy would just be seen as this shallow thing, which I don’t see it as, but I think the sentiment is quite strong in the art world.

Lucas Perry: If that’s quite shallow, then I guess those people are also going to have problems with the Buddha and people like that. I wouldn’t worry about it too much. I think you’re on the same intentional ground as the Buddha. Moving a little bit along here. Do you guys have thoughts or opinions on the future of aesthetics, art, music, and joy, and how science and technology can contribute to that?

David Pearce: Oh, good heavens. One possibility will be that, as neuroscience advances, it’ll be possible to isolate the molecular experience of visual beauty, musical bliss, spiritual excellence, and scientifically amplify them so that one can essentially enjoy musical experiences that are orders of magnitude richer than anything that’s even physiologically feasible today. I mean, I can use all this fancy language, but what this will actually involve, in terms of true trans-human and post-human artists, is hard to imagine. The gradients of bliss are important here, in that I think we will retain information-sensitive gradients, so we don’t lose critical sharpness, discernment, critical appreciation. Nonetheless, there’s this base point for aesthetic excellence: all experience can be superhumanly beautiful. I mean, I religiously star my music collection from one to five, but what would a six be like? What would 100 be like?

Sam Barker: I like these questions. I guess the role of the artist in the long term future in creating these kinds of states maybe gets pushed out at some point by people who are in the labs and reprogram the way music is, or the way that any sort of sensory experience is received. I wonder whether there’s a place in techno utopia for music made by humans, or whether artists sort of just become redundant in some way. I’m not going to get offended if the answer is bye, bye.

Lucas Perry: I’d be interested in just making a few points about the evolutionary perspective before we get into the future of ape artists or mammalian artists. It just seems like some kind of happy cosmic accident that, from the vibration of air, human beings have developed a sensory appreciation of information and structure embedded in that medium. I think we’re quite lucky, as a species, that music and musical appreciation is embedded in the software of human genetics, such that we can appreciate, and create, and share musical moments. Now, with genetic engineering and more ambitious paradise engineering, I think it would be beautiful to expand the modalities in which artistic or aesthetic appreciation, or the appreciation of beauty, can be experienced.

Music is one clear way of having aesthetic appreciation and joy. Visual art is another one. People do derive a lot of satisfaction from touch. Perhaps that could be more information-structured in the ways that music and art are. There might be a way of changing what it means to be an intelligent thing, such that there can be an expansion of art appreciation across all of our sensory modalities, and even into sensory modalities which don’t exist yet.

David Pearce: The nature of trans-human and post-human art just leaves me floundering.

Lucas Perry: Yeah. It seems useful here just to reflect on how happy of an accident art is. As we begin to evolve, we can get into, say, A.I. here. A.I. and machine learning are likely to be able to have very, very good models of, say, our musical preferences within the next few years. I mean, they’re somewhat already very good at it. They’ll continue to get better. Then, we have fairly rudimentary algorithms which can produce music. If we just extrapolate out into the future, eventually artificial intelligent systems will be able to produce music better than any human. In that world, what is the role of the human artist? I guess I’m not sure.

Sam Barker: I’m also completely not sure, but I feel like it’s probably going to happen in my lifetime, that these technologies get to a point where they actually do serve the purpose. At the moment, there is A.I. software that can create unique compositions, like AIVA, but it does so by looking at an archive of music. It’s Bach, and Beethoven, and Mozart. Then it reinterprets all of the codes that are embedded in that, and uses that to make new stuff. It sounds just like a composer quoting, and it’s convincing. Considering this is going to get better and better, I’m pretty confident that we’ll have a system that will be able to create music to a person’s specific taste, having not experienced music before, that would, say, look at my music library, and then start making things that I might like. I can’t say how I feel about that.

Let’s say if it worked, and it did actually surprise me, and I was feeling like humans can’t make this kind of sensation in me. This is a level above. In a way, yeah, to somebody that doesn’t like the vagueness of the creative process, this really appeals, somehow. But given the way that things are used, and the way that our attention is sort of a resource that gets manipulated, I don’t know whether we’d just have an incredible technology, once again, in the wrong hands, turned into a kind of mind control. These kinds of things would be put to use for nefarious purposes. I don’t fear the technology. I fear what we, in our unmodified state, might do with it.

David Pearce: Yes. I wonder when the last professional musician will retire, having been eclipsed by A.I. I mean, in some sense, we are, I think, stepping stones to something better. I don’t know when the last philosophers will be pensioned off. Hard problem of mind solved, announced in Nature, Nobel Prize beckons. Distinguished philosophers of mind announce their intention to retire. Hard to imagine, but one does suppose that A.I. will be creating work of ever greater excellence tailored to the individual. I think the evolutionary roots of aesthetic appreciation are very, very deep. It kind of does sound very disrespectful to artists, saying that A.I. could replace artists, but mathematicians and scientists are probably going to be-

Lucas Perry: Everyone’s getting replaced.

Sam Barker: It’s maybe a similar step to when portrait painters found the camera threatening their line of work. You can press a button and, in an instant, do what would’ve taken several days. I sort of am cautiously looking forward to more intelligent assistance in the production of music. If we did live in a world where there weren’t any struggles to express, or any wrongs to right, any flaws in our character to unpick, then I would struggle to find anything other than the sort of basic pleasure of the action of making music. I wouldn’t really feel any reason to share what I made, in a sense. I think there’s a sort of moral, social purpose that’s embedded within music, if you want to grasp it. I think, if A.I. is implemented with that same moral, ethical purpose, then, in a way, we should treat it as any other task that comes to be automated or extremely simplified. In some way, we should sort of embrace the relaxation of our workload, in a way.

There’s nothing to say that we couldn’t just continue to make music if it brought us pleasure. I think distinguishing between these two things of making music and sharing it was an important discovery for me. The process of making a piece of music, if it was entirely pleasurable, but then you treat the experience like it was a failure because it didn’t reach enough people, or you didn’t get the response or the boost to your ego that you were searching from it, then it’s your remembering self overriding your experiencing self, in a way, or your expectations getting in the way of your enjoyment of the process. If there was no purpose to it anymore, I might still make it for my own pleasure, but I like to think I would be happy that a world that didn’t require music was already a better place. I like to think that I wouldn’t be upset with my redundancy with my P45 from David Pearce.

David Pearce: Oh, no. With a neuro chip, you see, your creative capacities could be massively augmented. You’d have narrow super intelligence on a chip. Now, in one sense, I don’t think classical digital computers are going to wake up and become conscious. They’re never actually going to be able to experience music or art or anything like this. In that sense, they will remain tools, but tools that one can actually incorporate within oneself, so that they become part of you.

Lucas Perry: A friendly flag there that many people who have been on this podcast disagree with that point. Yeah, fair enough, David. I mean, it seems that there are maybe three options. One is, as you mentioned, Sam, to find joy and beauty in more things, and to sort of let go of the need for meaning and joy to come from being something that isn’t redundant. Once human beings are made obsolete or redundant, it’s quite sad for us, because we derive much of our meaning, thanks a lot, evolution, from accomplishing things and being relevant. The first two paths here seem like reaching some kind of spiritual evolution such that we’re okay with being redundant, or being okay with passing away as a species and allowing our descendants to proliferate. The last one would be to change what it means to be human, such that by merging or by evolution we somehow remain relevant to the progress of civilization. I don’t know which one it will be, but we’ll see.

David Pearce: I think the exciting one, for me, is where we can harness the advances in technology in a conscious way to positive ends, to greater net wellbeing in society. Maybe I’m hooked on the old ideals, but I do think a sense of purpose in your pleasure elevates the sensation somewhat.

Lucas Perry: I think human brains on MDMA would disagree with that.

Sam Barker: Yeah. You’ve obviously also reflected on an experience like that after the event, and come to the conclusion that there wasn’t, perhaps, much concrete meaning to your experience, but it was joyful, and real, and vivid. You don’t want to focus too much on the fact that it was mostly just you jumping up and down on a dance floor. I’m definitely familiar with the pleasure of essentially meaningless euphoria. I’ll say, at the very least, it’s interesting to think about. I’ve been reading a lot about the nature of happiness, and the general consensus there is that happiness is sort of a balance of pleasure and purpose. The idea that maybe you don’t need the purpose is worth exploring, I think, at least.

David Pearce: We do have this term empty hedonism. One thing that’s striking is that one, for whatever reason or explanation, gets happier and happier. Everything seems more intensely meaningful. There are pathological forms like mania or hypermania, where it leads to grandiosity, messianic delusions, even theomania, and thinking one is God. It’s possible to have much more benign versions. In practice, I think when life is based on gradients of bliss, eventually, superhuman bliss, this will entail superhuman meaning and significance. Essentially, we’ve got a choice. I mean, we can either have pure bliss, or one could have a combination of bliss and hyper-motivation, and one will be able to tweak the dials.

Sam Barker: This is all such deliciously appealing language as someone who’s spending a lot of their time tweaking dials.

David Pearce: This may or may not be the appropriate time to ask, but tell me, what future projects have you planned?

Sam Barker: I’m still very much exploring the potential of music as an increaser of wellbeing, and I think it’s sort of leading me in interesting directions. At present, I’m sort of at another crossroads, I feel. The general drive to realize these sort of higher functions of music is still a driving force. I’m starting to look at what is natural in music and what is learned. Like you say, there is this long history of the way that we appreciate sound. There are links to all kinds of repetitive experiences that our ancestors had. There are other aspects to sound production that are also very old. Use of reverb is connected to our experience as sort of cavemen dwelling in these kinds of reverberant spaces. These were kind of sacred spaces for early humans, so this feeling when you walk into a cathedral, for example, this otherworldly experience that comes from the acoustics, is, I think, somehow deeply tied to this historical situation of seeking shelter in caves, and the caves having a bigger significance in the lives of early humans.

There’s a realization, I suppose, that what we’re experiencing that relates to music is rhythm, tone, timbre, and noise. If you just sort of pay attention to your background noise, the things that you’re most familiar with are actually not very musical. You don’t really find harmony in nature very much. I’m sort of forming some ideas around what parts of music and our response to music are cultural, and what are natural. It’s sort of a strange word to apply. Our sort of harmonic language is a technical construction. Rhythm is something we have a much deeper connection with through our lives, as defined by the rhythms of planets, and by dividing our time into smaller and smaller ratios down to heartbeats and breathing. We’re sort of experiencing a really complex, poly-rhythmic, silent form of music, I suppose. I’m separating these two concepts of rhythm and harmony and trying to get to the bottom of their function and the goal of elevating bliss and happiness. I guess I’m looking at what the tools I’m using are and what their role could be, if that makes any sense.

David Pearce: In some sense, this sounds weird. I think, insofar as it’s possible, one does have a duty to take care of oneself, and if one can give happiness to others, not least by music, in that sense, one can be a more effective altruist. In some sense, perhaps one feels that, ethically, one ought to be working 12, 14 hours a day to make the world a better place. Equally, we all have our design limitations, and just being able to relax, either as a consumer of music or as a creator of music, has a valuable role, too. It really does. One needs to take care of one’s own mental health to be able to help others.

Sam Barker: I feel like there’s a kind of under-the-bonnet tinkering that, in some way, needs to happen for us to really make use of the new technologies. We need to do something about human nature. I feel like we’re a bit further away from those sort of realities than we are with the technological side. I think there need to be sort of emergency measures, in some way, to improve human nature through the old fashioned social, cultural nudges, perhaps, as a stopgap until we can really roll our sleeves up and change human nature on a molecular level.

David Pearce: Yeah. I think we might need both: all the kinds of environmental, social, political reform together with, whether biological or genetic, a happiness revolution. I would love to be able to see a 100-year plan, a blueprint to get rid of suffering and replace it with gradients of bliss, paradise engineering. In practice, I feel the story of Darwinian life still has several centuries to go. I hope I’m being too pessimistic. Some of my trans-humanist colleagues anticipate an intelligence explosion, or a complete shortcut via the fusion of humans and our machines, but we shall see.

Lucas Perry: David, Sam and I, and everyone else, loves your prose so much. Could you just kind of go off here and muster your best prose to give us some thoughts as beautiful as sunsets for how good the future of music, and art, and gradients of intelligent bliss will be?

David Pearce: I’m afraid I’ll have to put eloquence on hold, but yeah. Just try for a moment to remember your most precious, beautiful, sublime experience in your life, whatever it was. It may or may not be suitable for public consumption. Just try to hold it briefly. Imagine if life could be like that, only far, far better, all the time, and with no nasty side effects, no adverse social consequences. It is going to be possible to build this kind of super civilization based on gradients of bliss. Be over-ambitious. Needless to say, with anything I have written, unfortunately you’d need to wade through all manner of fluff. I just want to say, I’m really thrilled and chuffed with Utility, so anything else is just vegan icing on the cake.

Sam Barker: Beautiful. I’m really, like I say, super relieved that it was taken as such. It was really a reconfiguring of my approach and my involvement with the thing that I’ve sort of given my life to thus far, and a sort of a clarification of the purpose. Aside from anything else, it just put me in a really perfect mindset for addressing mental obstacles in the way of my own happiness. Then, once you get that, you sort of feel like sharing it with other people. I think it started off a very positive process in my thoughts, which sort of manifested in the work I was doing. Extremely grateful for your generosity in lending these ideas. I hope, actually, just that people scratched the surface a little bit, and maybe plug some of the terms into a search engine and got kind of lost in the world of utopia a little bit. That was really the main reason for putting these references in and pushing people in that direction.

David Pearce: Well, you’ve given people a lot of pleasure, which is fantastic. Certainly, I’d personally rather be thought of as associated with paradise engineering and gradients of bliss, rather than the depressive, gloomy, negative utilitarian.

Sam Barker: Yeah. There’s a real dark side to the idea. I think the thing I read after the Hedonistic Imperative was some of Les Knight’s writing about the voluntary human extinction movement. I honestly don’t know if he’d be classified as a utilitarian, but there’s this sort of ecocentric utilitarianism, which you sort of endorse through including the animal kingdom in your manifesto. There’s sort of a growing appreciation for this kind of antinatalist sentiment.

David Pearce: Yes, antinatalism seems to be growing, but I don’t think it’s ever going to be dominant. The only way to get rid of suffering and ensure high quality of life for all sentient beings is going to be, essentially, to get to the heart of the problem and rewrite ourselves. I did actually do an antinatalist podcast the other week, but I’m only a soft antinatalist, because there’s always going to be selection pressure in favor of a predisposition to go forth and multiply. One needs to build alliances with fanatical life lovers, even if, when one contemplates the state of the world, one has some rather dark thoughts.

Sam Barker: Yeah.

Lucas Perry: All right. So, is there any questions or things we haven’t touched on that you guys would like to talk about?

David Pearce: No. I just really want to say thank you to Lucas for organizing this. You’ve got quite a diverse range of podcasts now. Sam, I’m honored. Thank you very much. Really happy this has gone well.

Sam Barker: Yeah. David, really, it’s been my pleasure. Really appreciate your time and acceptance of how I’ve sort of handled your ideas.

Lucas Perry: I feel really happy that I was able to connect you guys, and I also think that both of you guys make the world more beautiful by your work and presence. For that, I am grateful and appreciative. Also, very much enjoy and take inspiration from both of your work, so keep on doing what you’re doing.

Sam Barker: Thanks, Lucas. Same to you. Really.

David Pearce: Thank you, Lucas. Very much appreciated.

Lucas Perry: I hope that you’ve enjoyed the conversation portion of this podcast. Now, I’m happy to introduce the guest mix by Barker. 

Steven Pinker and Stuart Russell on the Foundations, Benefits, and Possible Existential Threat of AI

 Topics discussed in this episode include:

  • The historical and intellectual foundations of AI 
  • How AI systems achieve or do not achieve intelligence in the same way as the human mind
  • The rise of AI and what it signifies 
  • The benefits and risks of AI in both the short and long term 
  • Whether superintelligent AI will pose an existential risk to humanity

You can take a survey about the podcast here

Submit a nominee for the Future of Life Award here

 

Timestamps: 

0:00 Intro 

4:30 The historical and intellectual foundations of AI 

11:11 Moving beyond dualism 

13:16 Regarding the objectives of an agent as fixed 

17:20 The distinction between artificial intelligence and deep learning 

22:00 How AI systems achieve or do not achieve intelligence in the same way as the human mind

49:46 What changes to human society does the rise of AI signal? 

54:57 What are the benefits and risks of AI? 

01:09:38 Do superintelligent AI systems pose an existential threat to humanity? 

01:51:30 Where to find and follow Steve and Stuart

 

Works referenced: 

Steven Pinker’s website and his Twitter

Stuart Russell’s new book, Human Compatible: Artificial Intelligence and the Problem of Control

 

We hope that you will continue to join in the conversations by following us or subscribing to our podcasts on Youtube, Spotify, SoundCloud, iTunes, Google Play, Stitcher, iHeartRadio, or your preferred podcast site/application. You can find all the AI Alignment Podcasts here.

You can listen to the podcast above or read the transcript below. 

Note: The following transcript has been edited for style and clarity.

 

Lucas Perry: Welcome to the AI Alignment Podcast. I’m Lucas Perry. Today, we have a conversation with Steven Pinker and Stuart Russell. This episode explores the historical and intellectual foundations of AI, how AI systems achieve or do not achieve intelligence in the same way as the human mind, the benefits and risks of AI over the short and long-term, and finally whether superintelligent AI poses an existential risk to humanity. If you’re not currently following this podcast series, you can join us by subscribing on Apple Podcasts, Spotify, Soundcloud, or on whatever your favorite podcasting app is by searching for “Future of Life.” Our last episode was with Sam Harris on global priorities. If that sounds interesting to you, you can find that conversation wherever you might be following us. 

I’d also like to echo two announcements for the final time. So, if you’ve been tuned into the podcast recently, you can skip ahead just a bit. The first is that there is an ongoing survey for this podcast where you can give me feedback and voice your opinion about content. This goes a long way for helping me to make the podcast valuable for everyone. This survey should only come out once a year. So, this is a final call for thoughts and feedback if you’d like to voice anything. You can find a link for the survey about this podcast in the description of wherever you might be listening. 

The second announcement is that at the Future of Life Institute, we are in the midst of our search for the 2020 winner of the Future of Life Award. The Future of Life Award is a $50,000 prize that we give out to an individual who, without having received much recognition at the time of their actions, has helped to make today dramatically better than it may have been otherwise. The first two recipients of the Future of Life Award were Vasili Arkhipov and Stanislav Petrov, two heroes of the nuclear age. Both took actions at great personal risk to possibly prevent an all-out nuclear war. The third recipient was Dr. Matthew Meselson, who spearheaded the international ban on bioweapons. Right now, we’re not sure who to give the 2020 Future of Life Award to. That’s where you come in. If you know of an unsung hero who has helped to avoid global catastrophic disaster, or who has done incredible work to ensure a beneficial future of life, please head over to the Future of Life Award page and submit a candidate for consideration. The link for that page is on the page for this podcast or in the description of wherever you might be listening. If your candidate is chosen, you will receive $3,000 as a token of our appreciation. We’re also incentivizing the search via MIT’s successful red balloon strategy, where the first to nominate the winner gets $3,000 as mentioned, but there are also tiered payouts where the first to invite the nomination winner gets $1,500, whoever first invited them gets $750, whoever first invited them gets $375, and so on. You can find details about that on the Future of Life Award page. Link in the description. 

Steven Pinker is a Professor in the Department of Psychology at Harvard University. He conducts research on visual cognition, psycholinguistics, and social relations. He has taught at Stanford and MIT and is the author of ten books, including The Language Instinct, How the Mind Works, The Blank Slate, The Better Angels of Our Nature, The Sense of Style, and Enlightenment Now: The Case for Reason, Science, Humanism, and Progress. 

Stuart Russell is a Professor of Computer Science and holder of the Smith-Zadeh chair in engineering at the University of California, Berkeley. He has served as the vice chair of the World Economic Forum’s Council on AI and Robotics and as an advisor to the United Nations on arms control. He is an Andrew Carnegie Fellow as well as a fellow of the Association for the Advancement of Artificial Intelligence, the Association for Computing Machinery and the American Association for the Advancement of Science.

He is the author with Peter Norvig of the definitive and universally acclaimed textbook on AI, Artificial Intelligence: A Modern Approach. He is also the author of Human Compatible: Artificial Intelligence and the Problem of Control. 

And with that, here’s our conversation with Steven Pinker and Stuart Russell. 

Lucas Perry: So let’s get started here then. What are the historical and intellectual foundations upon which the ongoing AI revolution is built?

Steven Pinker: I would locate them in the Age of Reason and the Enlightenment, when Thomas Hobbes said, “Reasoning is but reckoning,” reckoning in the old-fashioned sense of “calculation” or “computation.” A century later, the two major styles of AI today were laid out: the neural network, or massively parallel interconnected system that is trained with examples and generalizes by similarity, and the symbol-crunching, propositional, “Good Old-Fashioned AI.” Both of those had adumbrations during the Enlightenment. David Hume, in the empiricist or associationist tradition, said there are only three principles of connection among ideas: contiguity in time or place, resemblance, and cause and effect. On the other side, you have Leibniz, who thought of cognition as the grinding of wheels and gears and what we would now call the manipulation of symbols. Of course the actual progress began in the 20th century with the ideas of Turing and Shannon and Weaver and Norbert Wiener. The rest is the history that Stuart writes about in his textbook and his recent book.

Stuart Russell: I think I would like to add in a little bit of ancient history as well, just because I think Aristotle not only thought a lot about how human thinking was organized and how it could be correct or incorrect and how we could make rational decisions, he very clearly describes a backward regression goal planner in one of his pieces, and his work was incredibly influential. One of the things he said is we deliberate about means and not about ends. I think he says, “A doctor does not choose whether to heal,” and so on. And you might disagree with that, but I think that that’s been a pretty influential thread in Western thinking for the last two millennia or more. That we kind of take objectives as given and the purpose of intelligence is to act in ways that achieve your objectives.

That idea got refined gradually. So Aristotle talked mainly about goals and logically provable sequences of actions that would achieve those goals. And then in the 17th and 18th centuries, I want to give a shout out to the French and the Swiss, so Pascal and Fermat and Arnauld and Bernoulli brought in ideas of rational decision making under uncertainty and the weighing of probabilities and the concept of utility that Bernoulli introduced. So that generalized Aristotle’s idea, but it didn’t change the fundamental principle that they took the objectives, the utilities, as given. Just intrinsic properties of a human being in a given moment.

In AI, we sort of went through the same historical development, except that we did the logic stuff for the first 30 years or so, roughly, and then we did the probability and decision theory stuff for the next 30 years. I think we’re in a terrible state now, because the vast majority of the deep learning community, when you read their papers, nothing is cited before 2012. Occasionally, from time to time, they’ll say things like, “For this problem, the learning algorithms that we have are probably inadequate, and in future I think we should direct some of our research towards something that we might call reasoning or knowledge,” as if no one had ever thought of those things before and they were the first person in history to ever have the idea that reasoning might be necessary for intelligence.

Steven Pinker: Yes.

Stuart Russell: I find this quite frustrating and particularly frustrating when students want to actually just bypass the AI course altogether and go straight to the deep learning course, because they just don’t think AI is necessary anymore.

Steven Pinker: Indeed, and also galling to me. In the late ’80s and ’90s I was involved in a debate over the applicability of the predecessors of deep learning models, then called multi-layer perceptrons, artificial neural networks, connectionist networks, and Parallel Distributed Processing networks. Gary Marcus and Alan Prince and Michael Ullman and other collaborators and I pointed out the limitations of trying to achieve intelligence, even for simple linguistic processes like forming the plural of a noun or the past tense of a verb, if the only tool you had available was the ability to associate features with features, without any symbol processing. That debate went on for a couple of decades and then petered out. But then one of the prime tools in the neural network community, multilayer networks trained by error back-propagation, was revived in 2012. Indeed there is an amnesia for the issues in that debate, which Gary Marcus has revived for a modern era.

It would be interesting to trace the truly radical idea behind artificial intelligence: not just that there are rules or algorithms, whether they are from logic or probability theory, that an intelligent agent can use, the way a human pulls out a smartphone. But the idea that there is nothing but rules or algorithms, and that’s what an intelligent agent consists of: that is, no ghost in the machine, no agent separate from the mechanism. And there, I’m not sure whether Aristotle actually exorcised the ghost in the machine. I think he did have a notion of a soul. The idea that it’s rules all the way down, that intelligence is just a mechanism, probably has shallow roots. Although Hobbes probably could claim credit for it, and perhaps Hume as well.

Lucas Perry: That’s an excellent point, Steve. It seems like Abrahamic religions have kind of given rise in part to this belief, or maybe an expression of that belief, the kind of mind-body dualism, the ghost in the machine where the mind seems to be a nonphysical thing. So it seems like intelligence has had to go down the same road as “life.” There used to be “elan vital” or some other spooky presupposed mechanism for giving rise to life. And so similarly with intelligence, it seems like we’ve had to move from thinking that there was a ghost in the machine that made the things work to there being rules all the way down. If you guys have anything else to add to that, I think that’d be interesting.

My other two reactions to what has been said so far are that this point about computer science taking the goal as given, I think, is important and interesting, and maybe we could expand upon that a little bit. Then there’s also the point Stuart mentioned about the difference between AI and deep learning, and that students want to skip the AI and just get straight to the deep learning. That seemed a little bit confusing to me.

Steven Pinker: Let me address the first part and I’ll turn it over to Stuart for the second. The notion of dualism, that there is a mechanism, but sitting on top of it is an immaterial agent or self or soul or I, is enshrined in the Abrahamic religions and in other religions, but it has deep intuitive roots. We are all intuitively dualists (Paul Bloom has made this argument in his book Descartes’ Baby). Fortunately, when we deal with each other in everyday life we don’t treat each other like robots or wind-up dolls, but we assume that there is an inner life that is much like ours, and we make sense of people’s behavior in terms of their beliefs and desires, which we don’t conceptualize as neural circuitry transforming patterns. We think there’s a locus of consciousness, which is easy to think of as separate from the flesh that we’re made of, especially since, as the 19th-century British anthropologist Edward Tylor pointed out, there’s actually a lot of empirical “evidence” that supports dualism in our everyday life. Like dreaming.

When you dream, you know your body is in bed the whole time, but there’s some part of you that’s up and about in the world. When you see your reflection in a mirror or in still water, there is an animated essence that seems to have parted company with your body. When you’re in a trance from a drug or a fever and have an out-of-body experience, it seems  that we and our bodies are not the same thing. And with death, one moment a person is walking around, the next moment the body is lifeless. It’s natural to think that it’s lost some invisible ingredient that had animated it while it was alive.

Today we know that this is just the activity of the brain, but in terms of the experience available to a person, dualism seems perfectly plausible. It’s one of the great achievements of neuroscience, on the one hand, to show that a brain is capable of supporting problem solving and perception and decision making, and of the computational sciences, on the other, to show that intelligence can be understood in terms of information and computation, and that goals (like the Aristotelian final cause) can be understood in terms of control and cybernetics and feedback.

Stuart Russell: On the point that in computer science, we regard the objectives as fixed, it’s much broader than just computer science. If you look at von Neumann and Morgenstern and their characterizations of rationality, nowhere do they talk about what is the process by which the agent might rationally come by its preferences. The agent is always assumed a priori to come with the preferences built in, and the only constraint is that those preferences be self-consistent, so that you can’t be driven around circles of intransitive preferences where you simply cough up money to go round and round the same circle.
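The “cough up money” remark refers to the classic money pump argument. The short Python sketch below is only an illustration of that argument, not anything from the conversation or from von Neumann and Morgenstern themselves; the items, the per-trade fee, and the starting budget are made-up values chosen to show how an agent with cyclic preferences trades in circles until it is broke.

```python
# Money pump sketch: an agent with cyclic (intransitive) preferences A > B > C > A
# will pay a small fee for every swap to something it strictly prefers, and so
# keeps trading in circles until its money runs out.

prefers = {"B": "A", "C": "B", "A": "C"}  # prefers[x] = item strictly preferred to x

FEE = 1.0       # hypothetical payment the agent willingly makes for each upgrade
money = 10.0    # hypothetical starting budget
holding = "A"
trades = 0

while money >= FEE:
    better = prefers[holding]  # the agent strictly prefers this to what it holds...
    money -= FEE               # ...so it pays the fee to swap
    holding = better
    trades += 1

print(f"After {trades} trades the agent holds {holding} and has {money:.2f} left.")
# The agent cycles A -> C -> B -> A -> ... and ends up broke roughly where it started,
# which is why transitivity (self-consistency) is required of rational preferences.
```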

The same thing I think is true in control theory, where the objective is the cost function, and you design a controller that minimizes the expected cost function, which might be a square of the distance from the desired trajectory or whatever it might be. Same in statistics, where let’s just assume that there’s a loss function. There’s no discussion in statistics of what the loss function should be or how the loss function might change or anything like that.

So this is something that pervades many of the technological underpinnings of the 20th century. As far as I can tell, to some extent in developmental psychology, but I think in moral philosophy, people really take seriously the question of what goals we should have. Is it moral for an agent to have such and such as its objective, and how could we, for example, teach an agent to have different objectives? And that gets you into some very uncharted philosophical waters about what is a rational process that would lead an agent to have different objectives at the end than it did at the beginning, given that if it has different objectives at the end, then it can only expect that it won’t be achieving the objectives that it has at the beginning. So why would it embark on a process that’s going to result in failure to achieve the objectives that it currently has?

So that’s sort of a philosophical puzzle, but it’s a real issue because in fact human beings do change. We’re not born with the preferences that we have as adults, and so there is a notion of plasticity that absolutely has to be understood if we’re to get this right.

Steven Pinker: Indeed, and I suspect we’ll return to the point later when we talk about potential risks of advanced artificial intelligence. The issue is whether a system having intelligence implies that the system would have certain goals, and probably Stuart and I agree the answer is no, at least not by definition. Precisely because what you want and how to get what you want are two logically independent questions. Hume famously said that reason must be the slave of the passions, by which he didn’t mean that we should just surrender to our impulses and do whatever feels good. What he meant was that reason itself can’t specify the goals that it tries to bring about. Those are exogenous. And indeed, von Neumann and Morgenstern are often misunderstood as saying that we must be ruthlessly egotistical, self-interested maximizers. Whereas the goal that is programmed into us, say by evolution or by culture, could include other people’s happiness as part of our utility function. That is a question that merely making our choices consistent is silent on.

 So the ability to reason doesn’t by itself give you moral goals, including taking into account the interests of others. That having been said, there is a long tradition in moral philosophy which shows  how it doesn’t take much to go from one to the other. Because as soon as we care about persuading others, as soon as our interests depend on how others treat us, then we can’t get away with saying “only my interests count and yours don’t because I am me and you’re not,” because there is no logical difference between “me” and “you.” So we’re forced to a kind of impartiality, wherein whatever I insist on for me I’ve got to grant to you, a kind of Golden Rule or Categorical Imperative that makes our interests interchangeable as soon as we’re in discourse with one another.

This is all to acknowledge Stuart’s point, but to take it a few steps further in how it deals with the question of what our goals ought to be.

Stuart Russell: The other point you raised, Lucas, was about being confused by my distinction between AI and deep learning.

Lucas Perry: That’s right.

Stuart Russell: I think you're pointing to a confusion that exists in the public mind, in the media, and even in parts of the AI community. AI has always included machine learning as a subdiscipline, all the way back to Turing's 1950 paper, where he speculates, in fact, that a good way to build AI might be to just start with a child program and train it to be an adult intelligent machine. But there are many other sub-fields of AI: knowledge representation, reasoning, planning, decision making under uncertainty, problem solving, perception. Machine learning is relevant to all of these because they all involve processes that can be improved through experience. So that's what we mean by machine learning: simply the improvement of performance through experience; and deep learning is a technology that helps with that process.

By itself, as far as we can tell, it doesn't have what is necessary to produce general intelligence. Just to pick one example, the idea that human beings know things seems so self-evident that we hardly need to argue about it. But deep learning systems in a real sense don't know things. They can't usefully acquire knowledge by reading a book and then go out and use it to design a radio telescope, which human beings arguably can. So it seems inevitable that if we're going to make progress, sure, we take the advances that deep learning has offered. Effectively, what we've discovered with deep learning is that you can train more complicated circuits than we previously would have guessed possible, using various kinds of stochastic gradient descent and other tricks.

I think it's true to say that most people would not have expected that you could build a thousand-layer network that was 20,000 units wide, so it's got 20 million circuit elements, and simply put data in at one end and a training signal at the other and expect that you're going to be able to train those 20 million elements to represent the complicated function that you're trying to get it to learn. So that was a big surprise, and that capability is opening up all kinds of new frontiers: in vision, in speech recognition, language, machine translation, and physical control in robots, among other things. It's a wonderful set of advances, but it's not the entire solution, any more than group theory is the entire solution to mathematics. There are lots of other branches of mathematics that are exciting and interesting and important, and you couldn't function without them. The same is true for AI.
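
For readers who want the gist of what "training a circuit by stochastic gradient descent" means, here is a deliberately tiny sketch: a two-layer network fitted to made-up toy data by nudging its weights along the negative gradient of a loss, one random mini-batch at a time. The sizes, data, and learning rate are all invented for illustration; real deep learning scales this same idea up enormously.

```python
import numpy as np

# Minimal stochastic-gradient-descent sketch: a tiny two-layer network
# learns y = sin(x) from toy data. Everything here is illustrative only.

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(512, 1))
Y = np.sin(X)

W1 = rng.normal(0, 0.5, (1, 32)); b1 = np.zeros(32)
W2 = rng.normal(0, 0.5, (32, 1)); b2 = np.zeros(1)
lr = 0.05

for step in range(2000):
    idx = rng.choice(len(X), size=32)          # a random mini-batch
    x, y = X[idx], Y[idx]
    h = np.tanh(x @ W1 + b1)                   # hidden layer
    pred = h @ W2 + b2                         # output layer
    err = pred - y                             # gradient of 0.5 * squared error
    # Backpropagate the error to get a gradient for every weight
    dW2 = h.T @ err / len(x); db2 = err.mean(0)
    dh = err @ W2.T * (1 - h**2)
    dW1 = x.T @ dh / len(x); db1 = dh.mean(0)
    # Gradient step: nudge every weight slightly downhill on the loss
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print("final batch loss:", float((err**2).mean()))
```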

So I think that we’re probably going to see even without further major conceptual advances, another decade of progress in achieving greater understanding of why deep learning works and how to do it better, and all the various applications that we can create using it. But I think if we don’t go back and then try to reintegrate all the other ideas of AI, we’re going to hit a wall. And so I think the sooner we lose our obsession with this new shiny thing, the better.

Steven Pinker: I couldn't agree more. Indeed, in some ways we have already hit the wall. Any user of Siri or Cortana or a question-answering system has been frustrated by the way they just make associations to individual words and have a shallow understanding of the syntax of the sentence. If you ask Google or Siri, "Can you show me digital music players without a camera?" it'll give you a long list of music players with discussions of their cameras, failing to understand the syntax of "X without Y." Or ask, "What are some fast food restaurants nearby that are not McDonald's?" and you get a list of nearby McDonald's.

It's not hard to bump into the limitations of systems that for all their sophistication are being trained on associations among local elements, and can, surprisingly, I agree, learn higher-order combinations of those elements. But despite the name "deep learning," they are shallow in the sense that they don't build up a knowledge base of what the objects are and who did what to whom, which they can access through various routes.
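
As a toy illustration of the failure mode Steve describes (the restaurant list, query, and both functions are hypothetical, and nothing here reflects how any real assistant is built), a retriever that scores results by shared keywords happily ranks McDonald's first for "restaurants that are not McDonald's," whereas even a crude structural treatment of negation filters it out:

```python
# Toy contrast between bag-of-words matching and a crude handling of negation.
# Data and parsing are made up for illustration; real assistants are far more
# complex, but the failure mode is similar.

restaurants = ["McDonald's", "Thai Basil", "Burger King", "Sushi Go"]
query = "fast food restaurants nearby that are not McDonald's"

def keyword_match(query, items):
    words = set(query.lower().split())
    # Scores by word overlap only: "McDonald's" overlaps with the query,
    # so it ranks highest even though the query excludes it.
    return sorted(items, key=lambda r: -len(words & set(r.lower().split())))

def with_negation(query, items):
    # Crude structural step: treat "not X" / "without X" as an exclusion.
    q = query.lower()
    excluded = [q.split(marker, 1)[1].split()[0]
                for marker in ("not ", "without ") if marker in q]
    return [r for r in items
            if not any(x in r.lower() for x in excluded)]

print(keyword_match(query, restaurants))   # McDonald's comes first
print(with_negation(query, restaurants))   # McDonald's is excluded
```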

Stuart Russell: Yeah. My favorite example, I’m not sure if it’s apocryphal, is you say to Siri, “Call me an ambulance,” and Siri says, “Okay. From now on I’ll call you Ann Ambulance.”

Steven Pinker: In a Marx Brothers movie, there's the exchange, "Call me a taxi." "Okay. You're a taxi." I don't know if the AI story is an urban legend based on the Marx Brothers movie or whether life is imitating art.

Lucas Perry: Steven, I really appreciated and liked that point about dualism and intelligence. I think it points in really interesting directions around identity and the self, which we don't have time to get into here. But I did appreciate that.

So moving on ahead here, to what extent do you both see AI systems as achieving intelligence in the same way or not as the human mind does? What kinds of similarities are there or differences?

Stuart Russell: This is a really interesting question and we could spend the whole two hours just talking about this. So by artificial intelligence, I’m going to take it that we mean not deep learning, but the full range of techniques that AI researchers have developed over the years.

So some of them, for example logical reasoning, were developed going back to Aristotle and other Greek philosophers, who developed formal logic to model human thinking. So it's not surprising that when we build programs that do logical reasoning, we are in some sense capturing one aspect of human reasoning capability. Then in the '80s, as I mentioned, AI developed reasoning under uncertainty, and then later on refined that with notions of causality as well, particularly in the work of Judea Pearl. The differences are really because AI and cognitive science separated probably sometime in the '60s. I think before that there wasn't really a clear distinction between whether you were doing AI or whether you were doing cognitive science. It was very much the thought that if you could get a program to do anything that we think of as requiring intelligence in a human, then you were in some sense exhibiting a possible theory of how the human does it, or you would even make introspective claims and say, "Look, I've now shown that this theory of intelligence really works."

But fairly soon people said, "Look, this is not really scientific. If you want to make a claim about how the human mind does something, you have to base it on real psychological experimentation with human subjects." And that's distinct from the engineering goal of AI, which is simply to produce programs that demonstrate certain capabilities. So for most of the last 50 or 60 years, these two fields have grown further and further apart. I think now, partly because of deep learning and partly because of other work, for example in probabilistic programming, we can start to do things that humans do that we couldn't do before. So it becomes interesting again to ask, well, are humans really somewhat Bayesian, and are they doing the kind of Bayesian symbolic probabilistic program learning that, for example, Josh Tenenbaum has proposed, or are they doing something else? For example, Geoff Hinton is pretty adamant that, as he puts it, symbols are the luminiferous aether of AI, by which he means that they're simply something that we imagined and they have no physical reality whatsoever in the human mind.

I find this a little hard to believe, and you have to wonder, if symbols don't exist, why almost all deep learning applications are aimed at recognizing the symbolic category to which an object belongs; I haven't heard an answer yet from the deep learning community about why that is. But it's also clear that AI systems are doing things that have no resemblance to human cognition. When you look at what AlphaGo is actually doing, part of it is that sort of perception-like ability to look at a position and get a sense, to use an anthropomorphic term, of its potential for winning for white or for black. And perhaps that part is human-like, and actually it's incredibly good. It's probably better at recognizing the potential of a position directly, with no deliberation whatsoever, than a human is.

But the other part of what AlphaGo does is completely non-human. It's considering sequences of moves from the current state that run all the way to the end of the game. So part of it is searching in a tree which could go 40 or 50 or possibly more moves into the future. Then from the end of the tree, it plays a random game all the way to the end and sees who wins that game. And this is nothing like what human beings do. When humans are reasoning about a game like Go or chess, first of all, we are thinking about it at multiple levels of abstraction. So we're thinking about the liveness of a particular group, we're thinking about control of a particular region of territory on the board. We're thinking, "Well, if I give up control of this territory, then I can trade it for capturing his group over there."

So this kind of reasoning simply doesn’t happen in AlphaGo at all. We reason back from goals. In chess you say, “Perhaps I could trap his queen. Let me see if I can come up with a move that blocks his exit for the queen.” So we reason backwards from some goals and no chess program and no Go program does that kind of reasoning. The reason humans do this is because the world is incredibly complicated and in different circumstances, different kinds of cognitive processing are efficient and effective in producing good decisions quickly. And that’s the real issue for human intelligence, right?

If we didn't have to worry about computation, then we would just set up the giant unknown, partially observable Markov decision process of the universe, solve it, and then take the first action in the virtually infinite strategy tree that solves that POMDP. Then we would observe the next percept, we would update all our beliefs about the universe, and we would re-solve the whole thing, and that's how we would proceed. We would have to do that roughly every millisecond to control the muscles in our body, but we don't do anything like that. All of the different kinds of mental capabilities that we have are deployed in this amazingly fluid way to get us through the complexity of the real world. We are so far away in AI from understanding how to do that, that when I see people say, "We're just going to scale up our deep learning systems by another three orders of magnitude and we'll be more intelligent than humans," I just smile.
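
For readers unfamiliar with the jargon, here is what even a single step of that loop involves on a made-up two-state toy problem (the states, actions, and numbers are invented for illustration): maintain a belief over hidden states and update it after each action and observation, using the transition and observation models. Stuart's point is that doing this exactly for the real world, every millisecond, is utterly infeasible.

```python
import numpy as np

# Toy belief update for a two-state POMDP (states: "ok", "broken").
# All numbers are invented. Update rule: b'(s') is proportional to
# O(o | s') * sum over s of T(s' | s, a) * b(s).

T = {"wait":   np.array([[0.9, 0.1],    # T[a][s, s'] = P(s' | s, a)
                         [0.0, 1.0]]),
     "repair": np.array([[1.0, 0.0],
                         [0.8, 0.2]])}
O = {"looks_fine":   np.array([0.8, 0.3]),   # O[o][s'] = P(o | s')
     "looks_broken": np.array([0.2, 0.7])}

def belief_update(b, action, obs):
    predicted = b @ T[action]          # predict the next hidden state
    unnorm = O[obs] * predicted        # weight by the observation likelihood
    return unnorm / unnorm.sum()       # normalize to a probability vector

b = np.array([0.5, 0.5])               # start maximally uncertain
b = belief_update(b, "wait", "looks_broken")
print(b)                               # the belief shifts toward "broken"
```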

Steven Pinker: Yeah. I’d like to complement some of those observations. It is true that in the early days of artificial intelligence and cognitive psychology, they were driven by some of the same players. Herb Simon and Allen Newell can be credited as among the founders of AI and the founders of cognitive psychology. Likewise, Marvin Minsky and John McCarthy. When I was an undergraduate, I caught the tail end of what was called the cognitive revolution. It was exhilarating after the dominance of psychology by behaviorism, which forbade any talk of mentalistic concepts. You weren’t allowed to talk about memories or plans or goals or ideas or rules, because they were considered to be unobservable and thus unscientific. Then the concept of computation domesticated those mentalist terms and opened up a huge space of hypotheses. What are the rules by which we understand and formulate sentences?, a project that Noam Chomsky initiated. How can we model human knowledge as a semantic network?, a project that Minsky and Alan Collins and Ross Quillian and others developed. How do we make sense of foresight and planning and problem solving, which Newell and Simon pioneered?

There was a lot of back and forth between AI and cognitive science when they were first exposed to the very idea that intelligence could be understood in mechanistic terms, and there was a flow of hypotheses from computer science that psychologists then tested as possible models. Ideas that you couldn't even frame, you couldn't even articulate before there was the language of computation, such as "What is the capacity of human short-term memory?" or "What are the search algorithms by which we explore a problem space?" These were unintelligible in the era of behaviorism.

All this caught the attention of philosophers like Hilary Putnam, and later Dan Dennett, who noted that the ideas from the hybrid of cognitive psychology and artificial intelligence were addressing deep questions about what mental entities consist of, namely information processing states. The back-and-forth spilled into the '70s when I was a graduate student, and even the '80s, when centers for cognitive science were funded by the Sloan Foundation. There was also a lot of openness in the companies that hired artificial intelligence researchers: AT&T Bell Labs, which was a scientific powerhouse before the breakup of AT&T, and Bolt Beranek and Newman in Cambridge, which eventually became part of Verizon. I would go there as a grad student to hear talks on artificial intelligence. I don't know if this is apocryphal history, but Xerox's Palo Alto Research Center, where I was a consultant, was so open that, according to legend, Steve Jobs walked in and saw the first computer with a graphical user interface, a mouse, windows, and icons, stole the ideas, and went on to build the Lisa and then the Macintosh. Xerox lost out on their own invention, and companies got proprietary. Many of the AI researchers in companies no longer publish in peer-reviewed journals in psychology the way they used to, and the two cultures drifted apart.

Since hypotheses from computer science and artificial intelligence are just hypotheses, there is the question of whether the best engineering solution to a problem is the one that the brain uses. There's the obvious objection that the hardware is radically different: the brain is massively parallel and noisy and stochastic; computers are serial and deterministic. That led in part to the backlash in the '80s, when perceptrons and artificial neural networks were revived. There was skepticism about the more symbolic approaches to artificial intelligence, a skepticism which has been revived now in the deep learning era.

To get back to the question, what are ways in which human minds differ from AI systems? It depends on the AI system assessed, as Stuart pointed out. Both of us would agree that the easy equation of deep learning networks with human intelligence is unwarranted, and that a lot of the walls that deep learning is hitting come about because, despite the noisy parallel elements the brain is made of, we do emulate a kind of symbol processing architecture, where we can be taught explicit propositions, and human intelligence does make use of these symbols in addition to massively parallel associative networks.

I can’t help but mention a historical irony.  I’ve known Geoff Hinton since we were both post-docs. Hinton himself, early in his career, provided a refutation of the very claim of his that Stuart cited, that symbols are like luminiferous aether, a mythical entity. Geoff and I have noted to each other that we’ve switched sides in the debate on the nature of cognition. There was a debate in the 1970s on the format of mental imagery. Geoff and I were on opposite sides, but he was the symbolic proposition guy and I was the analog parallel network guy.  

Hinton showed that our understanding of an  object depends on the symbolic format in which we mentally represent it. Take something as simple as a cube, he said. Imagine a cube poised on one of its vertices, with the diagonally opposite vertex aligned above it. If you ask people, “Point to all the other vertices,” they are stymied. Their imagery fails, and they often leave out a couple of vertices. But if, instead of describing it to them as a cube tilted on its diagonal axis, you describe it as two tilted diamonds, one above the other, or as two tripods joined by a zig-zag ring, they “see” the correct answer. Even visualizing an object depends critically on how people mentally describe it to themselves with symbols. This is an argument for symbolic representations that Geoff Hinton made in 1979, and with his recent remarks about symbols he seems to have forgotten his own powerful example.

Stuart Russell: I think another area where deep learning is clearly not capturing the human capacity for learning is just in the efficiency of learning. I remember in the mid '80s going to some classes in psychology at Stanford, and there were people doing machine learning then and they were very proud of their results, and somebody asked Gordon Bower, "How many examples do humans need to learn this kind of thing?" And Gordon said, "One. Sometimes two, usually one." And this is genuinely true, right? If you look for a picture book that has one to two million pictures of giraffes to teach children what a giraffe is, you won't find one. Picture books that tell children what giraffes are have one picture of a giraffe and one picture of an elephant, and the child gets it immediately. Even though it's a very crude, cartoonish drawing of a giraffe or an elephant, they never have a problem recognizing giraffes and elephants for the rest of their lives.

Deep learning systems need, even for these relatively simple concepts, thousands, tens of thousands, millions of examples, and the idea within deep learning seems to be that the way we're going to scale up to more complicated things, like learning how to write an email to ask for a job, is that we'll just have billions or trillions of examples, and then we'll be able to learn really, really complicated concepts. But of course the universe just doesn't contain enough data for the machine to learn direct mappings from perceptual inputs, or really from the entire perceptual input history. So imagine the entire video record of your life feeding into the decision about what to do next, and you have to learn that mapping as a supervised learning problem. It's not even funny how infeasible that is. The longer the deep learning community persists in this, the worse the pain is going to be when their heads bang into the wall.

Steven Pinker: In many discussions of superintelligence inspired by the success of deep learning, I'm puzzled as to what people could possibly mean. We're sometimes asked to imagine an AI system that could solve the problem of Middle East peace or cure cancer. That implies that we would have to train it with 60 million other diseases and their cures, and it would extract the patterns and cure the new disease that we present it with. Needless to say, when it comes to solving global warming, or pandemics, or Middle East peace, there aren't going to be 60 million similar problems with their correct answers that could provide the training set for supervised learning.

Lucas Perry: So, human children and humans generally are capable of one-shot learning; as you said, we can learn by seeing one instance of a thing, whereas machine learning today is trained up via very, very large data sets. Can you explain what the actual perceptual difference is there? It seems that children see a giraffe and can develop a bunch of higher-order facts about the giraffe, like that it is tan, has spots, a long neck, horns, and other higher-order properties, whereas machine learning systems may be doing something else. So could you explain that difference?

Stuart Russell: Yeah, I think you actually captured it pretty well. The human child is able to recognize the object, not as 20 million pixels, including, let's not forget, all the pixels of the background. Many of these learning algorithms are actually learning to recognize the background, not the object at all. They're really picking up on spurious regularities that happen in the way the images are being captured. But the human child immediately separates the figure from the background and says, "Okay, it's the figure that's being called a giraffe," and recognizes the higher-level properties: "Okay, it's a quadruped, relatively large." The most distinguishing characteristic, as you say, is the very long neck, plus the way its hide is colored. Most kids probably don't even notice the horns, and I'm not even sure if all giraffes have the horns, or just the males or just the adults. I don't know the answer to that.

So I wasn’t paying much attention to all those images. This carries over to many, many other situations, including in things like planning, where if we observe someone carrying out a successful behavior, that one example combined with our prior knowledge is typically enough for us to get the general idea of how to do that thing. And this prior knowledge is absolutely crucial. Just information-theoretically, you can’t learn from one example reliably, unless you bring to bear a great deal of prior knowledge. And this is completely absent in deep learning systems in two ways. One is they don’t have any prior knowledge. And two is some of the prior knowledge is specifically about the thing you’re trying to predict. So here, we’re trying to predict the category of an animal and we already have a great deal of prior knowledge about what it means to belong to a category of animals.

So, for example, who owns the animal is not an attribute that the child would need to know or care about if you asked, "What kind of animal is this?" And deep learning systems have no ability to include or exclude any input attribute on the basis of its relevance to what they're trying to predict, because they know nothing about what it is you're trying to predict. And if you think about it, that doesn't make any sense, right? If I said, "Okay, I want you to learn to predict predicate P1279A, and I'm going to give you loads and loads of examples," you might end up with a perfect predictor for P1279A, but you'd have absolutely no use for it, because P1279A doesn't connect to anything else in your cognition. You've learned a completely useless predictor because you know nothing about the thing that you're trying to predict.

So it seems like it’s broken in several really, really important ways, and I would say probably the absence of prior knowledge or any means to bring to bear prior knowledge on the learning process is the most crucial.

Steven Pinker: Indeed, this goes back to our conversation on how basic principles of intelligence that govern the design of intelligent systems provide hypotheses that can be tested within psychology. What Stuart has identified is ultimately the nature-nurture problem in cognition. Namely, what are the innate constraints that govern children’s first hypotheses as they try to make sense of the world? 

One famous answer is Chomsky’s universal grammar, which guides children as they acquire language. Another is the idea from my colleagues Susan Carey and Elizabeth Spelke, in different formulations, that children have a prior concept of a physical object whose parts move together, which persists over time, and which follows continuous spatiotemporal trajectories; and that they have a distinct  concept of an agent or mind, which is governed by beliefs and desires. Maybe, or maybe not, they come equipped with still other frameworks for concepts, like the concept of a living thing or the concept of an artifact, and these priors radically cut down the search space of hypotheses, so they don’t have to search at the level of pixels and all their logically possible weighted combinations. 

Of course, the challenge in the science is how you specify the innate constraints, the prior knowledge, so that they aren't obviously too specific, given what we know about the plasticity of human cognition. The extreme example being the late philosopher Jerry Fodor's suggestion that all concepts are innate, including "trombone" and "doorknob" and "carburetor."

Stuart Russell: (Laughs)

Steven Pinker: Hard to swallow, but between that extreme and the deep learning architecture, in which the only things that are innate are the pixels, the convolutional network that allows for translational invariance, and the network of connections, there's an interesting middle ground. That defines the central research question in cognitive development.

Stuart Russell: I don’t think you have to believe in extensive innate structures in order to believe that prior knowledge is really, really important for learning. I would guess that some aspects of our cognition are innate, and one of them is probably that the world contains things, and that’s really important because if you just think about the brain as circuits, some circuit languages don’t have things as first class entities, whereas first order logical languages or programming languages do have things as first class entities and that’s a really important distinction.

Even if you believe that nothing is innate, the point is how does everything that you have perceived up to now affect your ability to learn the next thing? One argument is, everything you’ve perceived up to now, is simply data, and somehow magically, we have access to all our past perceptions, and then you’re just training a function from that whole lot to the next thing to do or how to interpret the next object.

That doesn’t make much sense. Presumably the experience you have from birth or even pre-birth onwards, is converted into something and one argument is that it’s just converted into something like knowledge, and then that knowledge is brought to bear on learning problems, for example, to even decide what are the relevant aspects of the input for predicting category membership of this thing?

And the other view, which the deep learning community would probably favor, is something like the accumulation of features. If you imagine a giant recurrent neural network: in the hidden layers of that recurrent neural network, over years and years and years of perception, you're building up internal representations, features, which then can perhaps simplify the learning of the next concept that you need to learn. And there's probably some truth in that too.

And absolutely, having a library of features that are generally useful for prediction and decision making and planning matters. And our entire vocabulary, I think this is something that people often miss, our vocabulary, our language, is not just something we use to communicate with each other. It's an enormous resource for simplifying the world in the right ways, to make the next thing we need to know, or the next thing we need to do, relatively easy. Right? So imagine you decide at the age of 12, "I want to understand the physical laws that control the universe."

The fact that we have in our vocabulary something like "doing a PhD" makes it much more feasible to figure out what your plan is going to be to achieve this objective. Imagine if you didn't have that, and if you didn't have all the pieces of doing a PhD, like taking a course or reading a book. This library of words and action primitives, at all these levels of abstraction, is a resource without which you would be completely unable to formulate plans of any length or any likelihood of success. And this is another area where, for current AI systems generally, not just deep learning, we lack a real understanding of how to formulate these hierarchies, acquire this vocabulary, and then deploy it in a seamless way so that we're always managing to function successfully in the real world.

Lucas Perry: I'm basically, I guess, just as confused about intelligence as anyone else. The difference, it seems to me, between the machine learning system and the child who one-shot-learns the giraffe is that the child brings into this learning scenario the knowledge you were both talking about: they understand that the world is populated by things, that there are other minds, and some other ideas about 3D objects and perception. But a core difference seems to be something like symbols and the ability to manipulate symbols. Is this right, or is it wrong? And what are symbols and effective symbol manipulation made of?

Steven Pinker: Yes, and that is a limitation of the so-called deep learning systems, which are a subset of machine learning, which is a subset of artificial intelligence. It's certainly not true that AI systems don't manipulate symbols. Indeed, that's what classical AI systems trade in: manipulation of propositions, implementation of versions of logical inference or of cause-and-effect reasoning. Those can certainly be implemented in AI systems; it's just gone out of fashion with the deep learning craze.

Lucas Perry: Well, they don’t learn those symbols, right? Like we give them the symbols and then they manipulate them.

Steven Pinker: The basic architecture of the system, almost by definition, can’t be learned;  you can’t learn something with nothing. There have got to be some elementary information processes, some formats of data representation, some basic ways of transforming one representation to another, that are hardwired into the architecture of the system. It’s an open empirical question, in the case of the human brain, whether it includes variables for objects and minds, or living things, or artifacts, or if those are scaffolded one on top of the other with experience. There’s nothing in principle that prevents AI systems from doing that;  many of them do, but at least for now they seem to have fallen out of fashion.

Stuart Russell: There is precedent for generating new symbols, both in the probabilistic programming literature and in the inductive logic programming literature. So predicate invention is a very important reason for doing inductive logic programming. But I agree with Steve that it's an open question whether the basic capacity to have symbol-based representations in the brain is innate or learned. There's very anecdotal evidence about what happens to children who are not brought up among other human beings. I think those anecdotes suggest that they don't become symbol-using in the same way. So it might be that the process of developing symbol-using capabilities in the brain is enormously aided by the fact that we grow up in the presence of symbol-using entities, namely our parents and family members and community. And of course that leads you to a chicken and egg problem.

So you'd have to argue in that case that early humans, or pre-humans, had much more rudimentary symbol-like capabilities. Some animals have the ability to refer to different phenomena or objects with different signs, like the different kinds of sounds that some New World monkeys have, for example, for a snake and for a puma, but they're not able to do the full range of things that we do with symbols. You could argue that the symbol-using capability developed over hundreds of thousands of years, and the unaided human mind doesn't come with it built in, but because we're usually bathed in symbol-using activity around us, we are able to quickly pick it up. I don't know what the truth is, but it seems very clear that this kind of capability, for example, gives you the ability to generalize so much faster than you can with circuits. Just to give a particular example with the rules of Go, which we talked about earlier: Go applies the same rule at every time-step in the game.

And it's the same rule at every square on the board, except around the edges. If you have what we call first-order capability, meaning you can have universal quantifiers (in programs, we think of these as loops), you can say very quickly: for every square on the board, if you have a piece there and it's surrounded by the enemy, then it's dead. That's a crude approximation to how things work in Go, but it's roughly right. In a circuit, you can't say that, because you don't have the ability to say "for every square." So you have to have a piece of circuit for each square, which means you've got 361 copies of the rule, and each of those copies has to be learned separately. And this is one of the things that we do with convolutional neural networks.

A convolutional neural network has the universal quantifier over the input space built into it. So it's a kind of cheating, and as far as we know, the brain doesn't have that type of weight sharing. The key aspect is not just the physical structure of the convolutional network, which has these repeating local receptive fields on each different part of the retina, so to speak, but that we also insist that the weights for each of those local receptive fields are copied across all receptive fields in the retina. So there aren't millions of separate weights that are trained; there are only a few, sometimes even just a handful of weights that are trained, and then the code makes sure that those are effectively copied across the entire retina. The brain, I don't think, has any way to do that, so it's doing something else to achieve this kind of rapid generalization.
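
A small sketch of the two points Stuart makes here (the capture rule is deliberately crude, and the board size and layer shapes are just for illustration): a first-order rule is written once as a loop over all squares, and a convolutional layer likewise reuses one small set of weights at every position, which is why it has vastly fewer parameters than a fully connected layer over the same input.

```python
import numpy as np

# 1) The "for every square" rule, written once as a loop: a crude
#    approximation of capture in Go, as in Stuart's example (it ignores
#    connected groups and shared liberties).
def dead_stones(board):          # board[i, j] in {0: empty, 1: mine, -1: enemy}
    n = board.shape[0]
    dead = []
    for i in range(n):
        for j in range(n):       # the universal quantifier: every square
            if board[i, j] == 1:
                neighbors = [board[x, y]
                             for x, y in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                             if 0 <= x < n and 0 <= y < n]
                if neighbors and all(v == -1 for v in neighbors):
                    dead.append((i, j))
    return dead

# 2) Weight sharing: one 3x3 kernel reused at every position of a 19x19 input,
#    versus a separate weight for every input-output pair in a dense layer.
conv_params = 3 * 3 + 1                          # one shared kernel plus a bias
dense_params = (19 * 19) * (19 * 19) + 19 * 19   # fully connected layer
print(conv_params, dense_params)                 # 10 vs. 130,682
```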

Lucas Perry: All right. So now with all of this context and understanding about intelligence and its origins today in 2020, AI is beginning to proliferate and is occupying a lot of news cycles. What particular important changes to human society does the rise and proliferation of AI signal and how do you view it in relation to the agricultural and industrial revolutions?

Steven Pinker: I'm going to begin with a meta-answer, which is that we should keep in mind how spectacularly ignorant we are of the future, even the relatively near future. Experts at superforecasting studied by Phil Tetlock, pretty much the best in the world, go down to about chance after about five years out. And we know, looking at predictions of the future from the past, how ludicrous they can be, both in underpredicting technological changes and in overpredicting them. Bill Gates's 1995 book The Road Ahead said virtually nothing about the internet! And there's a sport of looking at science-fiction movies and spotting ludicrous anachronisms, such as the fact that in 2001: A Space Odyssey they were using typewriters. They had suspended animation and trips to Jupiter, but they hadn't invented the word processor. To say nothing of the social changes they failed to predict, such as the fact that all of the women in the movie were secretaries and assistants.

So we should begin by acknowledging that it is extraordinarily difficult to predict the future. And there’s a systematic reason, namely that the future depends not just on technological developments, but also on people’s reaction to the developments, and on the  reactions to the reactions, and the reactions to the reactions to the reactions. There are seven and a half billion of us reacting, and we have to acknowledge that there’s a lot we’re going to get wrong. 

It’s safe to say that a lot of tasks that involve physical manipulation, like stocking shelves and driving trucks, are going to be automated, and societies will have to deal with the possibility of radical changes in employment, and Stuart talks about those in his book. We don’t know whether the job market will be flexible enough to create new jobs, always at the frontier of what machines can’t yet do, or whether there’ll be massive unemployment that will require economic adjustments, such as a universal basic income or government sponsored service. 

Less clear is the extent to which high-level decision making, like policy, diplomacy, or scientific hypothesis-testing, will be replaced by AI. I think that's impossible to predict. But closer in time, like the replacement of truck drivers by autonomous vehicles, AI as a useful tool for human intelligence, rather than as a replacement for it, will explode in science and business and technology and every walk of life.

Stuart Russell: I think all of those things are true. And I agree that our general record of forecasting has been pretty dismal. I was smiling as Steve was talking, because I was remembering Ray Kurzweil recently saying how proud he was that he had predicted the self-driving car, I think in '96 or '92, something like that, and possibly wasn't aware that the first self-driving car was driving on the freeway in 1987, before he even thought to predict that such a thing might happen. If I had to make a prediction for the next decade: roughly speaking, what happened in the 2010s was primarily that visual perception became very crudely feasible for machines when it wasn't before.

And that's already having a huge impact, including in self-driving cars. I would say that language understanding, at least in a simplified sense, will become possible in this decade, and I think it'll be a combination of deep learning with probabilistic programming, with Bayesian and symbolic methods. That will open up enormous areas of activity to machines where they simply couldn't go before, and some of that will be very straightforward job replacement, for example for call center workers. Most of what they do, I think, could be automated by systems that are able to understand their conversations. The role of the smart speaker, the Alexa or Cortana or Siri or whatever, will radically change and will enable AI systems to actually understand your life to a much greater extent. One of the reasons that Siri and Cortana and Alexa are not very useful to me is that they just don't understand anything about my life.

The "call me an ambulance" example illustrates that. If I got a text message saying "Johnny's in the hospital with a broken arm," well, if it doesn't understand that Johnny is possibly my cat, or possibly my son, or possibly my great grandfather, and whether Johnny lives nearby, or in my house, or on another continent, then it hasn't the faintest idea what to do, or even whether I care. I doubt that we're going to be filling these things full of first-order logic assertions that we type into our AI system. So it's only through language that it's going to be able to acquire the knowledge that it needs to be a useful assistant to an individual or a corporation. Having that language capability will open up whole new areas for AI to be useful to individuals, and also to take jobs from people. I'm not able to predict what else we might be able to do when there are AI systems that understand language, but it has to have a huge impact.

Lucas Perry: Is there anything else that you guys would like to add in terms of where AI is at right now, where it’ll be in the near future and the benefits and risks it will pose?

Stuart Russell: I could point to a few things that are already happening. There's a lot of discussion about the negative impacts on women and minorities from algorithms that inadvertently pick up on biases in society. So we saw the example of Amazon's hiring algorithm that rejected any resume that had the word "women's" in it. And I think that's serious, and I think the AI community is still not completely woke; there's a lot of consciousness raising that needs to happen. But I think technically that problem is manageable, and one interesting thing that's occurring is that we're starting to develop an understanding not just of the machine learning algorithm, but of the socio-technical context in which that machine learning is embedded. Modeling that socio-technical context allows you to predict whether the use of that algorithm will have negative feedback kinds of consequences, or will be vulnerable to certain kinds of selection bias in the input data, and so on.

Deepfakes, surveillance, and manipulation, that's another big area, and then something I'm very concerned about is the use of AI for autonomous weapons. This is another area where we fight against media stereotypes. So when the media talk about autonomous weapons, they invariably have a picture of a Terminator. Always. And I tell journalists, I'm not going to talk to you if you put a picture of a Terminator in the article. And they always say, well, I don't have any control over that, that's a different part of the newspaper, but it always happens anyway.

And the reason that's a problem is because then everyone thinks, "Oh, well this is science fiction. We don't have to worry about this because this is science fiction." And you know, I've heard the Russian ambassador to the UN in Geneva say, well, why are we even discussing these things, because this is science fiction, it's 20 or 30 years in the future? Oh, by the way, I have some of these weapons, if you'd like to buy them. The reality is that many militaries around the world are developing these, and companies are selling them. There's a Turkish arms company, STM, selling a device which is basically the slaughterbot from the Slaughterbots movie. It's a small drone with onboard explosives, and they advertise it as capable of tracking and autonomously attacking human beings based on video signatures and/or face recognition.

The Turkish government has announced that they’re going to be using those against the Kurds in Syria sometime this year. So we’ll see if it happens, but there’s no doubt that this is not science fiction, and it’s very real. And it’s going to create a new kind of weapon of mass destruction, because if it’s autonomous, it doesn’t need to be supervised. And if it doesn’t need to be supervised, then you can launch them by the million, and then you have something with the same effect as a nuclear weapon, but much cheaper, much easier to proliferate with much less collateral damage and all the rest of it.

Steven Pinker: I think in all of these discussions, it's critical not to fall prey to a status-quo bias and compare the hypothetical problems of a future technology with an idealized present, ignoring the real problems with the present we take for granted. In the case of bias, we know that humans are horribly biased. It's not just that we're biased against particular genders and ethnic groups and sexual orientations. In general we make judgments that can easily be outperformed by even simple algorithms, like a linear regression formula. So we should remember that our benchmark in talking about the accuracies or inaccuracies of AI prediction algorithms has to be the human, and that's often a pretty low bar. When it comes to bias, of course, a system that's trained on a sample that's unrepresentative is not a particularly intelligent system. And going back to the idea that we have to distinguish the goals we want to achieve from the intelligence that achieves them, if our goal is to overcome past inequities, then by definition we don't want to make selections that simply replicate the statistical distribution of women and minorities in the past. Our goal is to rectify those inequities, and the problem with a system that replicates them is not that it's not intelligent enough, but that we've given it the wrong goal.

When it comes to weapons, here too we've got to compare the potential harm of intelligent weapons systems with the stupendous harm of dumb weapons systems. Aerial bombardment, artillery, automatic weapons, search-and-destroy missions, and tank battles have killed people by the millions. I think there's been insufficient attention to how a battleground that used smarter weapons would compare to what we've tolerated for centuries simply because that's what we have come to accept, even though it has been fantastically destructive. What ultimately we want to do is to make the use of any weapons less likely, and as I've written about, that has fortunately been the general trend in the last 75 years.

Stuart Russell: Yeah, I think there is some truth in that. I remember when I first got the email from Human Rights Watch, who began a campaign, I think back in 2013, to argue for a treaty banning autonomous weapons. Human Rights Watch came into existence because of the awful things that human soldiers do, and now they're saying, "No, no, human soldiers are great; it's the machines we need to worry about." And I found that a little bit odd. To me, the argument about whether the weapons will inadvertently violate human rights in ways that human soldiers don't, or accidentally kill people in ways that we are getting better at avoiding, I don't think that's the issue. I think it's specifically the weapon-of-mass-destruction property that autonomous weapons have, and that, for example, machine guns don't.

There’s a hundred million or more Kalashnikov rifles in private hands in the world. If all those weapons got up one morning by themselves and started shooting anyone they could see, that would be a big chunk of the human race gone, but they don’t do that. Each of them has to be carried by a person. And if you want to put a million of them into the field, you need another 10 million people to feed and train those million soldiers, and to transport them, and protect them, and all that stuff. And that’s why we haven’t seen very large scale death from all those hundred million Kalashnikovs.

Even carpet-bombing, which I think nowadays would be regarded as indiscriminate and therefore a violation of international law, makes the point. I think even during the Second World War, people argued, "No, you can't go and bomb cities." But once the Germans started to do it, there were escalating rounds of retaliation, and people lost all sense of what was a civilized and what was an uncivilized act of war. But even the Blitz against Great Britain, as far as I know, killed only between 50,000 and 60,000 people, even though it hit dozens and dozens of cities. Yet literally one truckload of autonomous weapons can kill a million people.

An interesting fact about World War II is that for every person who died, between 1,000 and 10,000 bullets were fired. So just killing people with bullets in World War II cost you, if we take a geometric mean, about 3,000 bullets on average, which is about a thousand dollars at current prices. But you could build a lethal autonomous weapon for a lot less than that, and even if it had only a 25% success rate in finding and killing a human, it's much cheaper than the bullets, let alone the guns and the aircraft and all the rest of it.
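
As a quick check of the arithmetic Stuart is doing here: the geometric mean of 1,000 and 10,000 is about 3,162. The per-round price below is an assumption chosen to be consistent with his "about a thousand dollars" figure, not a quoted number.

```python
import math

low, high = 1_000, 10_000                  # bullets fired per death (Stuart's range)
bullets = math.sqrt(low * high)            # geometric mean, roughly 3,162
price_per_round = 0.30                     # assumed price in dollars, for illustration
print(round(bullets), round(bullets * price_per_round))   # ~3162 bullets, ~$949
```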

So as a way of killing very, very large numbers of people it’s incredibly cheap and incredibly effective. They can also be selective. So you can kill just the kind of people you want to get rid of. And it seems to me that we just don’t need another weapon of mass destruction with all of these extra characteristics. We’ve got rid of to some extent biological and chemical weapons. We’re trying to get rid of nuclear weapons, and introducing another one that’s arguably much worse seems to be a step in the wrong direction.

Steven Pinker: You asked also about the benefits of artificial intelligence, which I think could be stupendous. They include elimination of drudgery and the boring and dangerous jobs that no one really likes to do, like stocking shelves, making beds, mining coal, and picking fruit. There could be a bonanza in automating all the things that humans want done without human pain and labor and boredom and danger. It raises the problem of how we will support the people (if new jobs don’t materialize) who have nothing to do. But that’s a more minor economic problem to solve, compared to the spectacular advance we could have in eliminating human drudgery.

Also, there are a lot of jobs, such as the care of elderly people (lifting them onto toilets, reaching things from upper shelves), that, if automated, would allow more of them to live at home instead of being warehoused in nursing homes. Here, too, the potential for human flourishing is spectacular. And as I mentioned, many kinds of human judgment are so error-prone that they can already be replaced by simple algorithms, and better still if they were more intelligent algorithms. There's the potential of much less waste, much less error, far fewer accidents. An obvious example is the million and a quarter people killed in traffic accidents each year, a toll that could be drastically reduced if we had autonomous vehicles that were affordable and widespread.

Lucas Perry: A core point here is that all of the problems humanity faces essentially require intelligence to solve. And if we're able to solve the problem of how to make intelligent machines, then more and more of our problems will become solvable by machine systems. So Stuart, do you have anything else to add here in terms of existential hope and benefits to complement what Steve just contributed, before we pivot into existential risk?

Stuart Russell: Yeah, there is an argument going around, and I think Mark Zuckerberg said it pretty clearly, and Oren Etzioni and various other people have said basically the same thing. And it’s usually put this way, “If you’re against AI, then you’re against better medical decisions, or reducing medical errors, or safer cars,” and so on. And this is, I think, just a ridiculous argument. So first of all, people who are concerned about the risks of AI, are not against AI, right? That’s like arguing if you’re a nuclear engineer and you’re concerned about the possibility of a design flaw that would lead to a meltdown, you’re against electricity. No, you’re not against electricity. You’re just against millions of people dying for no reason, and you want to fix the problem. And the same argument I think is true about those who are concerned about the risk of AI. If AI didn’t have any benefits we wouldn’t be having this discussion at all. No one would be investing any money, no one would have put their lives and careers into working on the capabilities of AI, and the whole point would be moot.

So of course, AI will have benefits, but if you don’t address the risks, you won’t get the benefits, because the technology will be rejected, or we won’t even have a choice to reject it. And if you look at what happened with nuclear power, I think it’s really an object lesson. Nuclear power could and still can produce quite cheap electricity. So I have a house in France and most electricity in France comes from nuclear power, and it’s very cheap and very reliable. And it also doesn’t produce a lot of carbon dioxide, but because of Chernobyl, the nuclear industry has been literally decimated, by which I mean, reduced by a factor of 10, or more. And so we didn’t get the benefits, because we didn’t pay enough attention to the risks. The same holds with AI.

So the benefits of AI in the long run I would argue are pretty unlimited, and medical errors and safer cars, that’s all nice, but that’s a tiny, tiny footnote in what can be done. As Steve already mentioned, the elimination of drudgery and repetitive work. It’s easy for us intellectuals to talk about that. We’ve never really engaged in a whole lot of it, but for most of the human race, for most of recorded history, people with power and money have used everybody else as robots to get what they want. Whether we’ve been using them as military robots, or agricultural robots, or factory robots, we’ve been using people as robots.

And if you had gone back to the early hunter gatherer days and written some science fiction, and you said, “You know what, in the future, people will go into big square buildings, thousands of feet long with no windows and they’ll do the same thing a thousand times a day. And then they’ll go back the next day and do the same thing another thousand times. And they’re going to do that for thousands of days until they’re practically dead.” The audience, the readers of science fiction in 20,000 BC, would have said, “You’re completely nuts, that’s so unrealistic.” But that’s how we did it. And now we’re worried that it’s coming to an end, and it is coming to an end, because we finally have robots that can do the things that we’ve been using human robots to do.

And I’m not saying we should just get rid of those jobs, because jobs have all kinds of purposes in people’s lives. And I’m not a big fan of UBI, which says basically, “Okay, we give up. Humans are useless, so the machines will feed them and house them, entertain them, but that’s all they’re good for.”

Now the benefits to me… It’s hard to imagine, just like we could not imagine very well all the things we would use the internet for. I mean, I remember the Berkeley computer science faculty in the ’80s sitting around at lunch, we knew more about networking than almost anybody else, but we still had absolutely no idea. What was the point of being able to click on a link? What’s that about? We totally blew it.

And we don’t understand all the things that superhuman AI could do for us. I mean, Steve mentioned that we could do much better science, and I agree with that. In the book, I visualize it as taking various ideas, like, “travel as a service,” and extending that to “everything as a service.” So travel as a service is a good example. Like if you think about going to Australia 200 years ago, you’re talking about a billion dollar proposition, probably 10 years, thousands of people, 80% chance of death. Now I take out my cell phone, I go tap, tap, tap, and now I’m in Australia tomorrow. And it’s basically free compared to what it used to be. So that’s what I mean by, as a service, you want something, you just get it.

Superhuman AI could make everything as a service. So think about the things that are expensive and difficult or impossible now, like training a neurosurgeon, or building a railway to connect your rural village to a nearby city so that people can visit, or trade, or whatever. For most of the developing world these things are completely out of reach. The health budget of a lot of countries in Africa is less than $10 per person per year. So the entire health budget of a country would train one neurosurgeon in the US. So these things are out of reach, but if you take out the humans then these services can become effectively free. They become services like travel is today, and that would enable us to bring everyone on earth up to the kind of living standard that they might aspire to. And if we can figure out the resource constraints and so on that will be a wonderful thing.

Lucas Perry: Now that's quite a beautiful picture of the future. There's a lot of existential hope there. The other side to existential hope is existential risk. Now this is an interesting subject, which Steve and you, Stuart, I believe have disagreements about. So pivoting into this area, and Steve, you can go first here: do you believe that human beings, should we not go extinct in the meantime, will build artificial superintelligence? And does that pose an existential risk to humanity?

Steven Pinker: Yeah, I’m on record as being skeptical of that scenario and dubious about the value of putting a lot of effort into worrying about it now. The concept of superintelligence is itself obscure. In a lot of the discussions you could replace the word “superintelligence” with “magic” or “miracle” and the sentence would read the same. You read about an AI system that could duplicate brains in silicon, or solve problems like war in the Middle East, or cure cancer.  It’s just imagining the possibility of a solution and assuming that the ability to bring it about will exist, without laying out what that intelligence would consist of, or what would count as a solution to the problem. 

So I find the concept of superintelligence itself a dubious extrapolation of an unextrapolable continuum, like human-to-animal, or not-so-bright human-to-smart-human. I don’t think there is a power called “intelligence” such that we can compare a squirrel or an octopus to a human and say, “Well, imagine even more of that.” 

I'm also skeptical about the existential risk scenarios. They tend to come in two varieties. One is based on the notion of a will to power: that as soon as you get an intelligent system, it will inevitably want to dominate and exploit. Often the analogy is that we humans have exploited and often extinguished animals because we're smarter than them, so as soon as there is an artificial system that's smarter than us, it'll do to us what we did to the dodos. Or that technologically advanced civilizations, like European colonists and conquistadors, subjugated and sometimes wiped out indigenous peoples, so that's what an AI system might do to us. That's one variety of this scenario.

I think that scenario confuses intelligence with dominance, based on the fact that in one species, Homo sapiens, they happen to come bundled together, because we came about through natural selection, a competitive process driven by relative success at capturing scarce resources and competing for mates, ultimately with the goal of relative reproductive success. But there’s no reason that a system that is designed to pursue a goal would have as its goal, domination. This goes back to our earlier discussion that the ability to achieve a goal is distinct from what the goal is.

It just so happens that in products of natural selection, the goal was winning in reproductive competition. For an artifact we design, there’s just no reason that would be true. This is sometimes called the orthogonality thesis in discussions of existential risk, although that’s just a fancy-schmancy way of referring to Hume’s distinction between our goals and our intelligence.

Now I know that there is an argument that says, “Wouldn’t any intelligent system have to maximize its own survivability, because if it’s given the goal of X, well, you can’t achieve X if you don’t exist, therefore, as a subgoal to achieving X, you’ve got to maximize your own survival at all costs.” I think that’s fallacious. It’s certainly not true that all complex systems have to work toward their own perpetuation. My iPhone doesn’t take any steps to resist my dropping it into a toilet, or letting it run out of power.

You could imagine if it could be programmed like a child to whine, and to cry, and to refuse to do what it’s told to do as its power level went down. We wouldn’t buy one. And we know in the natural world, there are plenty of living systems that sacrifice their own existence for other goals. When a bee stings you, its barbed stinger is dislodged when the bee escapes, killing the bee, but because the bee is programmed to maximize the survivability of the colony, not itself, it willingly sacrifices itself. So it is not true that by definition an intelligent system has to maximize its own power or survivability.

But the more common existential threat scenario is not a will to power but collateral damage. That if an AI system is given a single goal, what if it relentlessly pursues it without consideration of side effects, including harm to us? There are famous examples that I originally thought were spoofs, but were intended seriously, like giving an AI system the goal of making as many paperclips as possible, and so it converts all available matter into paperclips, including our own bodies (putting aside the fact that we don’t need more efficient paperclip manufacturing than what we already have, and that human bodies are a pretty crummy source of iron for paperclips).

Barely more plausible is the idea that we might give an AI system the goal of curing cancer, and so it will  conscript us as involuntary guinea pigs and induce tumors in all of us, or that we might give it the goal of regulating the level of water behind a dam and it might flood a town because it was never given the goal of not drowning a village. 

The problem with these scenarios is that they’re self-refuting. They assume that an “intelligent” artifact would be designed to implement a single goal, which is not true of even the stupid artifacts that we live with. When we design a car, we don’t just give the goal of going from A to B as fast as possible; we also install brakes and a steering wheel and a muffler and a catalytic converter. A lot of these scenarios seem to presuppose both idiocy on the part of the designers, who would give a system control over the infrastructure of the entire planet without testing it first to see how it worked, and an idiocy on the part of the allegedly intelligent system, which would pursue a single goal regardless of all the other effects. This does not exist in any human artifact, let alone one that claims to be intelligent. Giving an AI system one vaguely worded, sketchy goal, and empowering it with control over the entire infrastructure of the planet without testing it first seems to me just so self-evidently moronic that I don’t worry that engineers have to be warned against it.

I’ve quoted Stuart himself, who in an interview made the point well when he said, “No one talks about building bridges that don’t fall down. They just call it building bridges.” Likewise, AI that avoids idiocies like that is just AI, it’s not AI with extra safeguards. That’s what intelligence consists of.

Let me make one other comment. You could say, well, even if the odds are small, the damage would be so catastrophic that it is worth our concern. But there are downsides to worrying about existential risk. One of them is the possible stigmatization and abandonment of helpful technologies. Stuart mentioned the example of nuclear power. What’s catastrophic is that we don’t roll out nuclear power the way that France did, which would go a long way toward solving the genuinely dangerous problem of climate change. Fear of nuclear power has been irrationally stoked by vivid examples: the fairly trivial accident at Three Mile Island in the United States, which killed no one, the tsunami at Fukushima, where people died in the botched evacuation, not the nuclear accident, and the Soviet bungling at Chernobyl. Even that accident killed a fraction of the people that die every day from the burning of fossil fuels, to say nothing of the likely future harm from climate change. The reaction to Chernobyl is exactly how we should not deal with the dangers facing humanity.

Genetically modified organisms are another example: a technology overregulated or outlawed out of worst-case fears, depriving us of the spectacular benefits of greater ecological sustainability, human nutrition, and less use of water and pesticides. 

There are other downsides of fretting about exotic hypothetical existential risks. There is a line of reasoning in the existential risk community and the so called Rationality community that goes something like this: since the harm of extinguishing the species is basically infinite, probabilities no longer matter, because by expected utility calculations, if you multiply the tiny risk by the very large number of the potential descendants of humans before the sun expands and kills us off (or in wilder scenarios,  the astronomically larger number of immortal consciousnesses that will exist when we can upload our connectomes to the cloud, or when we colonize and multiply in other solar systems)—well, then even an eensy, eensy, infinitesimal probability of extinction would be catastrophic, and we should worry about it now.
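
To make the shape of the expected-value reasoning described here concrete, here is a minimal numerical sketch in Python; every number in it is invented purely for illustration and none comes from the conversation.

```python
# Shape of the expected-value argument described above: a tiny extinction
# probability multiplied by an astronomically large number of potential
# future lives still yields a huge expected loss. All numbers are invented
# for illustration only.
p_extinction = 1e-9            # an "eensy, infinitesimal" probability (assumed)
potential_future_lives = 1e16  # stand-in for the astronomical future population (assumed)

expected_lives_lost = p_extinction * potential_future_lives
print(f"Expected future lives lost: {expected_lives_lost:,.0f}")
# Prints 10,000,000, which is why, under this style of reasoning, probabilities
# "stop mattering": almost any nonzero risk looks worth spending resources on.
```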

The problem is that that argument could apply to any scenario with a nonzero probability, which means any scenario that is not logically impossible. Should we take steps to prevent the evolution of toxic killer gerbils that will nibble everyone to death? If I say, “That’s preposterous,” you can say, “Well, even if the probability is very, very small, since the harm of extinction is so great, we must devote some brain power to that scenario.”

I do fear the moral hazard of human intellect being absorbed in this free-for-all: that any risk, if you imagine it’s potentially existential, could justify any amount of expenditure, according to this expected utility calculation. The hazard is that smart people, clever enough to grasp a danger that common sense would never conceive of, will be absorbed into what might be a fruitless pursuit, compared to areas where we urgently do need application of human brain power–in climate, in the prevention of nuclear war, in the prevention of pandemics. Those are real risks, which no one denies, and we haven’t solved any of them, together with other massive sources of human misery like Alzheimer’s disease. Given these needs, I wonder whether the infinitesimal-probability-times-infinite-harm is the right way of allocating our intellectual capital.

Lucas Perry: Stuart, do you want to react to those points?

Stuart Russell: Yeah, there’s a lot there to react to and I’m tempted to start at the end and work back and just ask, well, if we were spending hundreds of billions of dollars a year to breed billions of toxic killer gerbils, wouldn’t you ask people if that was a good idea before dismissing any reason to be concerned about it? If that’s what we were actually investing in creating. I don’t buy the analogy between AI and toxic killer gerbils in any shape or form. But I will go back to the beginning, and we began by talking about feasibility. And Steve argues, I think, primarily, that it’s not even meaningful, that we could create superhuman levels of intelligence, that there isn’t a single continuum.

And yes, there isn’t a single continuum, but there doesn’t have to be a single continuum. When people say one person is more intelligent than another, or one species is more intelligent than another, it’s not a scientific statement that there is a single scalar on which species one exceeds species two. They’re talking in broad brush. So when we say humans are more intelligent than chimpanzees, that’s probably a reasonable thing to say, but there are clearly dimensions of intelligence where actually chimpanzees are more intelligent than humans. For example, short term memory. A chimpanzee, once they get what a digit is, they can learn 20-digit telephone numbers at the drop of a hat, and humans can’t do that. Clearly there are dimensions on which chimpanzee intelligence, on average, is probably better than human. But nonetheless, when you look at which species you would rather be right now, the chimpanzees don’t have much of a chance against the humans.

I think that there is a meaningful notion of generality of intelligence, and one way to think about it is to take a decision making scenario where we already understand how to produce very effective decisions, and then ask, how is that decision scenario restricted, and what happens when we relax the restrictions and figure out how to maintain the same, let’s say, superhuman quality of decision making? So if you look at Go play, it’s clear that the humans have been left far behind. So it’s not unreasonable to ask, just as the machines wiped the floor with humans on the Go board, and the chess board, and now on the StarCraft board, and lots of other boards, could you take that and transfer that into the real world where we make decisions of all kinds? The difference between the Go board and the real world is pretty dramatic. And that’s why we’ve had lots of success on the Go board and not so much in the real world.

The first thing is that the Go board is fully observable. You can see the entire state of the world that matters. And of course in the real world there’s lots of stuff you don’t see and don’t know. Some of it you can infer by accumulating information over time, what we call state estimation, but that turns out to be quite a difficult problem. Another thing is that we know all the rules of Go, and of course in the real world, you don’t know all the rules, you have to learn a lot as you go along. Another thing about the Go board is that despite the fact that we think of it as really complicated, it’s incredibly simple compared to the real world. At any given time on the Go board there’s a couple of hundred legal moves, and the game lasts for a couple hundred moves.

And if you said, well, what are the analogous primitive actions in the real world for a human being? Well, we have 600 muscles and we can actuate them maybe about 10 times per second each. Your brain probably isn’t able to do that, but physically that’s what could be your action space. And so you actually have then a far greater action space. And you’re also talking about… We often make plans that last for many years, which is literally trillions of primitive actions in terms of muscle actuations. Now we don’t plan those all out in detail, but we function on those kinds of timescales. Those are some of the ways that Go and the real world differ. And what we do in AI is we don’t say, okay, I’ve done Go, now I’m going to work on suicide Go, and now I’m going to work on chess with three queens.
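
For readers who want to check the arithmetic behind the “trillions of primitive actions” figure, here is a minimal back-of-the-envelope sketch in Python. The 600 muscles and roughly 10 actuations per second come from the remarks above; the 10-year planning horizon is an illustrative assumption.

```python
# Back-of-the-envelope scale of the human "action space" described above.
# Figures from the conversation: ~600 muscles, ~10 actuations per second each.
# The 10-year horizon is an illustrative assumption, not a figure from the talk.
MUSCLES = 600
ACTUATIONS_PER_SECOND = 10
SECONDS_PER_YEAR = 60 * 60 * 24 * 365
HORIZON_YEARS = 10

primitive_actions = MUSCLES * ACTUATIONS_PER_SECOND * SECONDS_PER_YEAR * HORIZON_YEARS
print(f"{primitive_actions:.2e} primitive actions over {HORIZON_YEARS} years")
# Roughly 1.9e12: on the order of trillions, versus a couple of hundred moves
# in a game of Go.
```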

What we try to do is extract the general lessons. Okay, we now understand fairly well how to handle that whole class of problems. Can we relax the assumptions, these basic qualitative assumptions about the nature of the problem? And if you relax all the ones that I listed, and probably a couple more that I’ve got, you’re getting towards systems that can function at a superhuman level in the real world, assuming that you figure out how to deal with all those issues. So just as we find ourself flummoxed by the moves that the AI system makes on the Go board, if you’re a General, and you’re up against an AI system that’s controlling, or coming up with the decision making plans for the other side, you might find yourself flummoxed, that everything you try, the machine has already anticipated and put in place something that will prevent your plan from succeeding. The pace of warfare will be beyond anything humans have ever contemplated, right?

So they won’t even have time to think, just as the Iraqis were not used to the rate of decision making of the US Army in the first Gulf War, and they couldn’t do anything. They were literally paralyzed, and just step by step by step the allied forces were able to take them apart because they couldn’t respond within the timescales on which the allied forces were operating.

So it will be kind of like that if you were a human general. If you were a human CEO and your competitor company is organized and run by AI systems, you’d be in the same kind of situation. So it’s entirely conceivable. I’m not necessarily saying plausible, but conceivable that we can create real world decision making capabilities that exceed those of humans across the board. So this notion of generality, I think, is something that still needs to be worked out. Most definitions of generality that people come up with end up saying, “Well, humans are general because they can do all the things that humans can do,” which is sort of a tautology. But nonetheless, it’s interesting that when you think about all the jobs: doctor, carpenter, advertising sales representative, most normally functioning people could do most of those jobs at least to some reasonable level.

So we are incredibly flexible compared to current AI systems. There is progress on achieving generality, but there’s a long way to go. I’m certainly not one of those who says that superintelligent AI is imminent and that’s why we need to worry. And in fact, I’m probably more conservative. If you want to appeal to what most expert AI people think, most expert AI people think that we will have something that’s reasonably described as superintelligent AI sooner than I do.

So most people think sometime in the middle of the century. It turns out that Asian AI researchers particularly in China are more optimistic, so they think 20 years. People in the US and Europe may be more like 40 years. I would be reasonably confident saying by the end of this century.

I think Nick Bostrom is in about the same place. He’s also more conservative than the average expert AI researcher. There are major breakthroughs that have to happen, but the massive investment that’s taking place, the influx of incredibly smart people into the field, these things suggest that those breakthroughs will probably take place but the timescale is very hard to say.

And when we think about the risks, I would say Steve is really putting up one straw man after another and then knocking it down. So for example, the paperclip argument is not a scenario that Nick Bostrom thinks is one of the more likely ways for the human race to end. It’s a philosophical thought experiment intended to illustrate a point. And the point is incontrovertible and I don’t think Steve disagrees with it.

So let’s not use the word intelligent, because I think Steve here is using the word intelligent to mean “always behaves in whatever way we would wish it to behave.”

Of course, if you define intelligence that way, then there isn’t an issue. The question is, how do we create any such thing? And the ways we have right now of creating any such thing fall under the standard model, which I described earlier: we set up, let’s call it, a superoptimizer, and then we give it an objective. And then off it goes. And he (Bostrom) is describing what happens when you give a superoptimizer the wrong goal. And he’s not saying, “Yes, of course we should give it wrong goals.”

And he’s using this to illustrate what happens when you give it even what seems to be innocuous. So he’s trying to convey the idea that we are not very good at judging the consequences of seemingly innocuous goals. My example of curing cancer: “Curing cancer? Yeah, of course, that’s a good goal to give to an AI system” — but the point is, if that’s the only goal you give to the AI system, then all these weird things happen because that’s the nature of super optimizers. That’s the nature of the standard model of AI.

And this, I think, is the main point being made: not that no matter what we do, AI is going to get us. It’s that, given our current understanding, and given that hundreds of billions of dollars are being invested into that current understanding, there is a failure mode, and it’s reasonable to point that out, just as if you’re a nuclear engineer and you say, “Look, everyone is designing these reactors in this way. All of you are doing this. And look, there’s this failure mode.” That’s a reasonable thing to point out.

Steven Pinker: Several reactions. First, while money is pouring into AI, it’s not pouring it into super-optimizers tasked with curing cancer and with the power to kidnap people. And the analogies of humans outcompeting chimpanzees, or American generals outsmarting their Iraqi counterparts, once again assume that systems that are smarter than us will therefore be in competition with us. As for straw men, I was mindful to avoid them: the AI system that would give people tumors to pursue the goal of curing cancer was taken from Stuart’s book.

I agree that a super-optimizer that was given a single goal would be a menace. But a super-optimizer that pursued a single goal is self-evidently unintelligent, not superintelligent!

Stuart Russell: Of course, we have multiple goals. There’s a whole field of multi-attribute utility theory that’s been going now for more than 50 years. Of course, we understand that. When we look at even the design of the algorithms that Uber uses to get you to the airport, they take into account multiple goals.

But the point is that the same argument applies if you operate in the standard model when you add in the multiple goals. Unless you’re able to be sure that you have completely and correctly captured all the things we care about, under all conceivable circumstances and the inconceivable ones too, because I think one of the things about superintelligent AI systems is that they will come up with, by human standards, inconceivable forms of action.

We cannot guarantee that. And this is the point. So you could say multiple goals, but multiple goals are just a single goal. They add up to the ability to rank futures. And the question is: is that ability to rank futures fully aligned with what humans want their futures to be like? And the answer is inevitably, no. We are inevitably going to leave things out.

So even if you have a thousand terms in the objective function, there’s probably another million that you ought to have included that you didn’t think about because it never occurred to you.

So for example, you can go out and find lists of important things that human beings care about. This is sort of the whole-values community, human-development community, Maslow hierarchy, all of those things. People do make whole lists of things trying to build up a picture of very roughly what is the human utility function after all.

But invariably, those lists just refer to things that are usually a subject of discussion among humans about “Do we spend money on schools or hospitals?” or whatever it might be. On that list, you will not find the color of the sky because no one, no humans right now are thinking about, “Oh, should we change the sky to be orange with pink stripes?” But if someone did change the color of the sky, I can bet you a lot of people would be really upset about it.

And so invariably we fail to include many, many criteria in whatever list of objectives you might come up with. And when you do that, what happens is that the optimizer will take advantage of those dimensions of freedom and typically, and actually under fairly general algebraic conditions, will set them to extreme values because that gives you better optimization on the things that are in the list of goals.
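
To make the “unmodeled dimensions get pushed to extremes” point concrete, here is a minimal toy sketch using scipy’s linear-programming solver. The variable names and numbers are invented for illustration and are not from the conversation: the optimizer is told to maximize two goods it knows we care about, they share a budget with a third quantity it was never told about, and it predictably drives that third quantity to the boundary of its feasible range.

```python
# Toy illustration of the point above: any quantity left out of the objective
# but coupled to it through a shared constraint tends to get pushed to an
# extreme value. All variable names and numbers are invented for illustration.
from scipy.optimize import linprog

# Decision variables: x = [schools, hospitals, unmodeled_good]
# The objective "cares" only about schools and hospitals.
c = [-3.0, -2.0, 0.0]          # linprog minimizes, so negate to maximize 3*s + 2*h

# Shared budget: schools + hospitals + unmodeled_good <= 10
A_ub = [[1.0, 1.0, 1.0]]
b_ub = [10.0]

bounds = [(0, None), (0, None), (0, 10)]  # the unmodeled good can range over [0, 10]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
schools, hospitals, unmodeled_good = res.x
print(schools, hospitals, unmodeled_good)
# The solver spends the whole budget on the modeled goods and drives the
# unmodeled one to 0, the extreme of its range; not out of malice, but
# simply because it was never in the objective.
```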

So the argument is that within the standard model, which I bear some responsibility for, because it’s the way we wrote the first three editions of the textbook, within the standard model, further progress on AI could lead to increasing problems of control and it’s not because there’s any will to dominance.

I don’t know of any serious thinkers in the X-risk community who think that that’s the problem. That’s another straw man.

Steven Pinker: When you’re finished, I do have some responses to that.

Stuart Russell: The argument is not that we automatically build in a will to dominance because we all want our systems to be alpha males or anything like that. And I think Steve Omohundro has put it fairly clearly in some of his earlier papers that the behavior of a superoptimizer given any finite list of goals is going to include efforts to maximize its computational resources and other resources that will help it achieve the objectives that we do specify.

And you could put in something saying, “Well, and don’t spend any money.” Or, “Don’t do this and don’t do that and don’t do the other.” But the same structure of the argument is going to apply. We can reduce the risk by adding more and more stuff into the explicit objectives, but I think the argument I’m making in the book is that that’s just a completely broken way to design AI systems.

The meta argument is that if we don’t talk about the failure modes, we won’t be able to address them. So actually I think that Steve and I don’t disagree about the plausible future evolution. I don’t think it’s particularly plausible. If I was going into forecast mode, so just betting on the future saying, “What’s the probability that this thing will happen or that thing will happen?” I don’t think it’s particularly plausible that we will be destroyed by superintelligent AI.

And there are several reasons why I don’t think that’s going to occur because we would probably get some early warnings of it. And if we couldn’t figure out how to prevent it, we would probably put very strong restrictions on further development or we would figure out how to actually make it provably safe and beneficial.

But you can’t have that discussion unless you talk about the failure modes. Just like in nuclear safety, it’s not against the rules to raise possible failure modes like what if this molten sodium that you’re proposing should flow around all these pipes? What if it ever came into contact with the water that’s on the turbine side of the system? Wouldn’t you have a massive explosion which could rip off the containment and so on? That’s not exactly what happened in Chernobyl, but not so dissimilar.

And of course that’s what they do. So this culture of safety that Steve talks about consists exactly of this: people saying, “Look, if you design things that way, these terrible things are going to happen. So don’t design things that way; design things this way.” And this is a process that we are going through in the AI community right now.

And I have to say, I just actually was reading a letter from one of my very senior colleagues, former president of AAAI, who said, “Five years ago, everyone thought Stuart was nuts, but Stuart was right. These risks have to be taken seriously and we all owe him a great debt for bringing it within the AI community so that we can start to address it.”

And I don’t think I invented these risks. I just happened to be in the position of having two years of sabbatical to think about the future of the field and to read some of the things that others had already written about the field from the outside.

My sense is that Steve and I are kind of glass half full versus glass half empty. In terms of our forecast, we think that, on the whole, the weather tomorrow is likely to be sunny. I think we disagree on how to make sure that it’s sunny. I really do think that the problem is creating a provably beneficial AI, by which I mean that no matter how powerful the AI system is, we remain in power: we have power over it forever, and we never lose control.

That’s a big ask and the idea that we could solve that problem without even mentioning it, without even talking about it and without even pointing out why it’s difficult and why it’s important, that’s not the culture of safety. That’s sort of more like the culture of the communist party committee in Chernobyl, that simply continued to assert that nothing bad was happening.

Steven Pinker: Obviously, I’m in favor of the safety mindset of engineering, that is, you test the system before you implement it, you try to anticipate the failure modes. And perhaps I have overestimated the common sense of the AI community and they have to be warned about the absurdity of building a superoptimizer.  But a lot of these examples–flooding a town to control the water level, or curing cancer by turning humans into involuntary Guinea pigs, or maximizing happiness by injecting everyone with a drip of antidepressants–strike me as so far from reasonable failure modes that they’re not part of the ordinary engineering effort to ensure safety–particularly when they are coupled with the term “existential.”

These are not ordinary engineering discussions of ways in which a system could fail; they are speculations on how the human species might end. That is very different from not plugging in an AI system until you’ve tested it to find out how it fails. And perhaps we agree that the superoptimizers in these thought experiments are so unintelligent that no one will actually empower them.

Stuart Russell: But Steve, I wasn’t saying we give it one goal. I’m saying however many goals we give it, that’s equivalent to giving it a ranking over futures. So the idea of single goal versus multiple is a complete red herring.

Steven Pinker: But the scare stories all involve systems that are given a single goal. As you go down the tail of possible risks, you’re getting into potentially infinitesimal risks. There is no system, conceivable or existing, that will have zero risk of every possibility. 

Stuart Russell: If we could do that, if we had some serious theory by which we could say, “Okay. We’ve got within epsilon of the true human ranking over futures,” I think that’s very hard to do. We literally do not have a clue how to do that. And the purpose of these examples is actually to dismiss the idea that this has a simple solution.

So people want to dismiss the idea of risk by saying, “Oh, we’ll just give the AI system such and such objective.” And then the failure mode goes away and everything’s cool. And then people say, “Oh, but no.” Look, if you give it the objective that everyone should be happy, then here’s a solution that the AI system could find that clearly we wouldn’t want.

Those processes actually lead to deeper questioning: what do we really mean by happy? We don’t just mean pleasure as measured by the pleasure center in the brain. And the same arguments happen in moral philosophy.

So no one is accusing G.E. Moore of being a naive idiot because he objected to a pleasure maximization definition of what is a good moral decision to make. He was making an important philosophical point and I don’t think we should dismiss that same point when it’s made in the context of designing objectives for AI systems.

Steven Pinker: Yes, that’s an excellent argument against building a universally empowered AI system that’s given the single goal of maximizing human happiness–your example. Do AI researchers need to be warned against that absurd project? It seems to me that that’s the straw man, and so are the other scenarios that are designed to sow worry, such as conscripting the entire human race as involuntary guinea pigs in cancer experiments. Even if there isn’t an epsilon that we can’t go below in laying out possible risks, it doesn’t strike me that that’s within the epsilon.

These strike me more as exercises of human imagination. Assuming a ridiculously simple system that’s given one goal, what could go wrong? Well, yeah, stuff could go wrong, but is that really what’s going to face us when it comes to actual AI systems that have some hope of being implemented?

Naturally, we ought to test the living daylights out of any system before we give it control over anything. That’s Stuart’s point about building bridges that don’t fall down and the standard safety ethic in engineering. But I’m not sure that exotic scenarios based on incredibly stupid ideas for AI systems, like giving one the goal of maximizing human happiness, are the route that gives us safe AI.

Stuart Russell: Okay. So let me say once again that the one goal versus multiple goals issue is a red herring. If you think it’s so easy to specify the goal correctly, perhaps your next paper will write it out. Then we’ll say, “Okay, that’s not a straw man. This is Steve Pinker’s suggestion of what the objective should be for the superintelligent AI system.” And then the people who love doing these things, probably Nick Bostrom and others, will find ways to make it fail.

So the idea that we could just test before deploying something that is significantly more powerful than human beings or even the human race combined, that’s a pretty optimistic idea. We’re not even able to test ordinary software systems right now. So test generation is one of the effective methods used in software engineering, but it has many, many known failure cases for real world examples, including floating-point division.

Intel’s Pentium chip was tested with billions of examples of division, but that testing failed to uncover a bug in the floating-point division circuitry, which caused it to produce incorrect results in some cases. And so we have a technology of formal verification, which would have uncovered that error, but particularly in the US there’s a culture that’s somewhat opposed to using formal verification in software design.
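
As a rough illustration of why billions of random tests can still miss a defect like that, here is a small probability sketch in Python. The one-in-nine-billion trigger rate is the figure commonly cited for the Pentium FDIV bug on random operands, and the test count is chosen for illustration; both should be treated as rough assumptions.

```python
# Rough illustration: probability that N random tests all miss a defect that
# only a fraction p of inputs can trigger. The 1-in-9-billion rate is the
# commonly cited estimate for the Pentium FDIV bug on random operands; both
# numbers here are illustrative assumptions.
import math

p = 1 / 9e9      # assumed fraction of inputs that trigger the bug
n_tests = 1e9    # a billion random tests

prob_miss = math.exp(n_tests * math.log1p(-p))  # (1 - p) ** n_tests, computed stably
print(f"Chance the test suite never hits the bug: {prob_miss:.1%}")
# Prints roughly 89.5%: massive random testing still leaves the defect likely
# undetected, which is the gap formal verification is meant to close.
```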

Less so in hardware design nowadays, partly because of the Pentium error, but still in software, formal verification is considered very difficult and very European and not something we do. And this is far harder than that because software verification typically is thinking only about correctness of the software in an internal sense, that what happens inside the algorithm between the inputs and outputs meet some specifications.

What we want here is that the combination of the algorithm and the world evolves in ways that we are certain to be pleased about. And that’s a much harder kind of thing. Control theory has that view of what verification means. And they’re able to do very simple linear quadratic regulators and a few other examples. And beyond that, they get stuck. And so I actually think that testing is probably not very feasible. I mean, I’m not saying we shouldn’t do it, but it’s going to be extremely hard to get any kind of confidence from testing, because you’re really asking: can you simulate the entire world and all the ways a system could use the world to bring about the objective?

However complicated and however multifaceted that objective is, it’s probably going to be the wrong one. So I’ve proposed a, not completely different, but a generalized form of AI that knows that it doesn’t know what the real objectives are. It knows it doesn’t know how humans rank possible futures and that changes the way it behaves, but that also has failure modes.

One of them being the plasticity of human preference rankings over the future and how do you prevent the AI system from taking advantage of that plasticity? You can’t prevent it completely because anything it does is going to have some effect on human preferences. But the question is what constitutes reasonable modifications of human preferences and what constitutes unreasonable ones? We don’t know the answer to that. So there are many, many really difficult research problems that we have to overcome for the research agenda that I’m proposing to have a chance of success.

I’m not that optimistic that this is an easy or a straightforward problem to solve and I think we can only solve it if we go outside the conceptual framework that AI has worked in for the last 70 years.

Steven Pinker: Well, yes. Certainly, if the conceptual framework for AI is optimizing some single or small list of generic goals, like a ranking over possible futures, and it is empowered to pursue them by any means, as opposed to building tools that solve specific problems. But note that you’ve also given arguments why the fantasies of superintelligence are unlikely to come about–the near-miraculous powers to outsmart us, to augment its own intelligence, to defeat all of our attempts to control it. In the scenarios, these all work flawlessly–yet  the complexities that make it hard to predict all conceivable failures also make it hard to achieve superintelligence in the first place.

Namely, we can’t take into account the fantastically chaotic and unpredictable reactions of humans. And we can’t program a system that has complete knowledge of the physical universe without allowing it to do experiments and acquire empirical knowledge, at a rate determined by the physical world. Exactly the infirmities that prevent us from exploring the entire space of behavior of one of these systems in advance is the reason that it’s not going to be superintelligent in the way that these scenarios outline.

And that’s a reason not to empower any generic goal-driven system that aspires toward “superintelligence” or that we might think of as “superintelligent”–it  is unlikely to exist, and likely to display various forms of error and stupidity.

Stuart Russell: I would agree that some of the concerns that you might see in the X-risk community are, say, nonphysical. So the idea that a system could predict the next hundred years and your entire life in such detail that a hundred years ago, it knew what you were going to be saying at a particular millisecond in a hundred years in the future, this is obviously complete nonsense.

I don’t think we need to be too concerned about that as a serious question. Whether it’s a thought experiment that sheds light on fundamental questions in decision theory, like the Newcomb problem, is another issue that we don’t have to get into. But we can’t solve the problem by saying, “Well, superintelligence of the kind that could lead to significant global consequences could not possibly exist.”

And actually I kind of like Danny Hillis’ argument, which says that actually, no, it already does exist, and it already has had, and is having, significant global consequences. And his example is to view, let’s say, the fossil fuel industry as if it were an AI system. I think this is an interesting line of thought, because what he’s saying basically (and other people have said similar things) is that you should think of a corporation as if it’s an algorithm and it’s maximizing a poorly designed objective, which you might say is some discounted stream of quarterly profits or whatever. And it really is doing it in a way that’s oblivious to lots of other concerns of the human race. And it has outwitted the rest of the human race.

So we might all think, well, of course, we know that what it’s trying to do is wrong and of course we all know the right answer, but in fact we’ve lost and we should have pointed out a hundred years ago that there is this risk and it needs to be taken seriously.

And it was. People did point it out a hundred years ago, but no one took them seriously. And this is what happened. So I think we actually have a fairly good example of this type of thing: the optimization of objectives by superintelligent entities, ignoring externalities, as the economists would point out. And in some sense, the fossil fuel industry outwitted us because, through whatever organizational structures allow large groups of humans to generate effective complex behaviors in the real world and develop complex plans, it operates in some ways like a superintelligent entity, just as we were able to put a person on the moon because of the combined effect of many human intellects working together.

But each of those humans in the fossil fuel industry is a piece of an algorithm if you like and their own individual preferences about the future don’t count for much and in fact, they get molded by their role within the corporation.

I think in some ways you already have an existence proof that the concern is real.

Steven Pinker: A simpler explanation is that people like energy, fossil fuels are the most convenient source, and no one has had to pay for the external damage they do. Clearly we ought to anticipate foreseeable risks and attempt to mitigate them. But they have to be calibrated against what we know, taking into account our own ignorance of the future. It can be hazardous to chase the wrong worries, such as running out of petroleum, which was the big worry in the 1970s. Now we know that the problem with petroleum is too much, not too little. Overpopulation and genetically modified organisms are other examples. 

If we try to fantasize too far into the future, beyond what we can reasonably predict, we can sow fear about the wrong risks. My concern about all these centers and smart people worrying about the existential risk of AI is that we are misallocating our worry budget and our intellectual resources. We should be thinking hard about how to mitigate climate change, which is a real problem. That is less true of spinning exotic scenarios about hypothetical AI systems which have been given control over the physical universe and might enslave us in cancer experiments.

Lucas Perry: All right. So wrapping up here, do you guys have final statements that you’d like to make, in case what you’ve said so far didn’t fully capture what you want to end on regarding this issue of AI existential risk?

Steven Pinker: Despite our disagreements, most of my assessment of AI agrees with Stuart’s. I personally don’t think that the adjective existential is helpful in the ordinary concerns over safety, which we ought to have. I think there are tremendous potential benefits to AI, and that we ought to pursue them at the same time as we anticipate the reasonable risks and make every effort to mitigate them.

Stuart Russell: Yep. I mean, it’s hard to disagree that we should focus on the reasonable risks. The question is whether you think that the hundreds of billions of dollars that are being invested into AI research will produce systems that can have potentially global consequences.

And to me it seems self evident that it can and we can look at even simple machine algorithms like the content selection algorithms in social media because those algorithms interact with humans for hours every day and dictate what literally billions of people see and read every day. They are having substantial global impact already.

And they are very, very simple. They don’t know that human beings exist at all, but they still learn to manipulate our brains to optimize the objective. I had a very interesting little Facebook exchange with Yann LeCun. And at some point in the argument, Yann said something quite similar to something Steve said earlier. He said, “There’s really no risk. You’d have to be extremely stupid to put an incorrect objective into a powerful system and then deploy it on a global scale.”

And I said, “Well, you mean like optimizing click-through?” And he said, “Facebook stopped using click-through years ago.” And I said, “Well, why was that?” And he said, “Oh, because it was the incorrect objective.”

So you did put an incorrect objective into a powerful system, deployed on a global scale. Now what does that say about Facebook? So I think just as you might have said — and in fact the nuclear industry did say — “It’s perfectly safe. Nothing can go wrong. We’re the experts. We understand safety. We understand everything.”

Nonetheless, we had Chernobyl, we had Fukushima. And actually, I think there’s an argument to be made that, despite the massive environmental cost of foregoing nuclear power, countries like Germany, Italy, Spain and probably a bunch of others are in the process of actually deciding that they need to phase out nuclear power because, even though theoretically it’s possible to develop and operate completely safe nuclear power systems, it’s beyond our capabilities, and the evidence is there.

You might have argued that Russia was corrupt, its technology was not as great as it should have been, and they cut lots of corners, but you can’t argue that the Japanese nuclear industry was unsophisticated or unconcerned with safety, and yet it still failed. And so I think voters in those countries said, “We don’t want nuclear power, because we just don’t want to be in that situation, even with the engineers making their best efforts.”

These kinds of considerations suggest that we do need to pay very careful attention. I’m not saying we should stop working on climate change, but when we invented synthetic biology, we said, okay, we’d better think about how to prevent the creation of new diseases or disease organisms that could produce pandemics. And we took steps. People spent a lot of time thinking about safety mechanisms for those devices. We have to do the same thing for AI.

Lucas Perry: All right. Stuart and Steven, thanks so much. I’ve learned a ton of stuff today. If listeners want to follow you or look into your work, where’s the best place to do that? I’ll start with you, Steven.

Steven Pinker: Stevenpinker.com, which has pages for ten books, including the most recent, Enlightenment Now. SAPinker on Twitter. 

Lucas Perry: And Stuart.

Stuart Russell: So you can Google me. I don’t really have a website or social media activity, but the book Human Compatible, which was published last October by Viking in the US and Penguin in the UK and it’s being translated into lots of languages, I think that captures my views pretty well.

Lucas Perry: All right. Thanks so much for coming on. And yeah, it was a pleasure speaking.

Steven Pinker: Thanks very much, Lucas for hosting it. Thank you Stuart for the dialogue.

Stuart Russell: It was great fun, Steve. I look forward to doing it again.

Steven Pinker: Me too.

End of recorded material

Sam Harris on Global Priorities, Existential Risk, and What Matters Most

Human civilization increasingly has the potential both to improve the lives of everyone and to completely destroy everything. The proliferation of emerging technologies calls our attention to this never-before-seen power — and the need to cultivate the wisdom with which to steer it towards beneficial outcomes. If we’re serious both as individuals and as a species about improving the world, it’s crucial that we converge around the reality of our situation and what matters most. What are the most important problems in the world today and why? In this episode of the Future of Life Institute Podcast, Sam Harris joins us to discuss some of these global priorities, the ethics surrounding them, and what we can do to address them.

Topics discussed in this episode include:

  • The problem of communication 
  • Global priorities 
  • Existential risk 
  • Animal suffering in both wild animals and factory farmed animals 
  • Global poverty 
  • Artificial general intelligence risk and AI alignment 
  • Ethics
  • Sam’s book, The Moral Landscape

You can take a survey about the podcast here

Submit a nominee for the Future of Life Award here

 

Timestamps: 

0:00 Intro

3:52 What are the most important problems in the world?

13:14 Global priorities: existential risk

20:15 Why global catastrophic risks are more likely than existential risks

25:09 Longtermist philosophy

31:36 Making existential and global catastrophic risk more emotionally salient

34:41 How analyzing the self makes longtermism more attractive

40:28 Global priorities & effective altruism: animal suffering and global poverty

56:03 Is machine suffering the next global moral catastrophe?

59:36 AI alignment and artificial general intelligence/superintelligence risk

01:11:25 Expanding our moral circle of compassion

01:13:00 The Moral Landscape, consciousness, and moral realism

01:30:14 Can bliss and wellbeing be mathematically defined?

01:31:03 Where to follow Sam and concluding thoughts

 

You can follow Sam here: 

samharris.org

Twitter: @SamHarrisOrg

 

This podcast is possible because of the support of listeners like you. If you found this conversation to be meaningful or valuable consider supporting it directly by donating at futureoflife.org/donate. Contributions like yours make these conversations possible.

All of our podcasts are also now on Spotify and iHeartRadio! Or find us on SoundCloud, iTunes, Google Play and Stitcher.

You can listen to the podcast above or read the transcript below. 

Lucas Perry: Welcome to the Future of Life Institute Podcast. I’m Lucas Perry. Today we have a conversation with Sam Harris where we get into issues related to global priorities, effective altruism, and existential risk. In particular, this podcast covers the critical importance of improving our ability to communicate and converge on the truth, animal suffering in both wild animals and factory farmed animals, global poverty, artificial general intelligence risk and AI alignment, as well as ethics and some thoughts on Sam’s book, The Moral Landscape. 

If you find this podcast valuable, you can subscribe or follow us on your preferred listening platform, like on Apple Podcasts, Spotify, Soundcloud, or whatever your preferred podcasting app is. You can also support us by leaving a review. 

Before we get into it, I would like to echo two announcements from previous podcasts. If you’ve been tuned into the FLI Podcast recently you can skip ahead just a bit. The first is that there is an ongoing survey for this podcast where you can give me feedback and voice your opinion about content. This goes a super long way for helping me to make the podcast valuable for everyone. You can find a link for the survey about this podcast in the description of wherever you might be listening. 

The second announcement is that at the Future of Life Institute we are in the midst of our search for the 2020 winner of the Future of Life Award. The Future of Life Award is a $50,000 prize that we give out to an individual who, without having received much recognition at the time of their actions, has helped to make today dramatically better than it may have been otherwise. The first two recipients of the Future of Life Award were Vasili Arkhipov and Stanislav Petrov, two heroes of the nuclear age. Both took actions at great personal risk to possibly prevent an all-out nuclear war. The third recipient was Dr. Matthew Meselson, who spearheaded the international ban on bioweapons. Right now, we’re not sure who to give the 2020 Future of Life Award to. That’s where you come in. If you know of an unsung hero who has helped to avoid global catastrophic disaster, or who has done incredible work to ensure a beneficial future of life, please head over to the Future of Life Award page and submit a candidate for consideration. The link for that page is on the page for this podcast or in the description of wherever you might be listening. If your candidate is chosen, you will receive $3,000 as a token of our appreciation. We’re also incentivizing the search via MIT’s successful red balloon strategy, where the first to nominate the winner gets $3,000 as mentioned, but there are also tiered pay outs where the first to invite the nomination winner gets $1,500, whoever first invited them gets $750, whoever first invited the previous person gets $375, and so on. You can find details about that on the Future of Life Award page. 

Sam Harris has a PhD in neuroscience from UCLA and is the author of five New York Times best sellers. His books include The End of Faith, Letter to a Christian Nation, The Moral Landscape, Free Will, Lying, Waking Up, and Islam and the Future of Tolerance (with Maajid Nawaz). Sam hosts the Making Sense Podcast and is also the creator of the Waking Up App, which is for anyone who wants to learn to meditate in a modern, scientific context. Sam has practiced meditation for more than 30 years and studied with many Tibetan, Indian, Burmese, and Western meditation teachers, both in the United States and abroad.

And with that, here’s my conversation with Sam Harris.

Starting off here, trying to get a perspective on what matters most in the world and global priorities or crucial areas for consideration, what do you see as the most important problems in the world today?

Sam Harris: There is one fundamental problem which is encouragingly or depressingly non-technical, depending on your view of it. I mean it should be such a simple problem to solve, but it seems more or less totally intractable, and that’s just the problem of communication. The problem of persuasion, the problem of getting people to agree on a shared consensus view of reality, and to acknowledge basic facts and to have their probability assessments of various outcomes converge through honest conversation. Politics is obviously the great confounder of this meeting of the minds. I mean, our failure to fuse cognitive horizons through conversation is reliably derailed by politics. But there are other sorts of ideology that do this just as well, religion being perhaps first among them.

And so it seems to me that the first problem we need to solve, the place where we need to make progress and we need to fight for every inch of ground and try not to lose it again and again is in our ability to talk to one another about what is true and what is worth paying attention to, to get our norms to align on a similar picture of what matters. Basically value alignment, not with superintelligent AI, but with other human beings. That’s the master riddle we have to solve and our failure to solve it prevents us from doing anything else that requires cooperation. That’s where I’m most concerned. Obviously technology influences it, social media and even AI and the algorithms behind the gaming of everyone’s attention. All of that is influencing our public conversation, but it really is a very apish concern and we have to get our arms around it.

Lucas Perry: So that’s quite interesting and not the answer that I was expecting. I think that that sounds like quite the crucial stepping stone. Like the fact that climate change isn’t something that we’re able to agree upon, and is a matter of political opinion drives me crazy. And that’s one of many different global catastrophic or existential risk issues.

Sam Harris: Yeah. The COVID pandemic has made me especially skeptical of our agreeing to do anything about climate change. The fact that we can’t persuade people about the basic facts of epidemiology when this thing is literally coming in through the doors and windows, and even very smart people are now going down the rabbit hole of “this is on some level a hoax,” people’s political and economic interests just bend their view of basic facts. I mean it’s not to say that there hasn’t been a fair amount of uncertainty here, but it’s not the sort of uncertainty that should give us these radically different views of what’s happening out in the world. Here we have a pandemic moving in real time. I mean, where we can see a wave of illness breaking in Italy a few weeks before it breaks in New York. And again, there’s just this Baghdad Bob level of denialism. The prospects of our getting our heads straight with respect to climate change, in light of what’s possible in the middle of a pandemic, seem at the moment totally farfetched to me.

For something like climate change, I really think a technological elite needs to just look at the problem and decide to solve it by changing the kinds of products we create and the way we manufacture things, and we just have to get out of the politics of it. It can’t be a matter of persuading more than half of American society to make economic sacrifices. It’s much more along the lines of just building cars and other products that are carbon neutral that people want and solving the problem that way.

Lucas Perry: Right. Incentivizing the solution by making products that are desirable and satisfy people’s self-interest.

Sam Harris: Yeah. Yeah.

Lucas Perry: I do want to explore more actual global priorities. This point about the necessity of reason for being able to at least converge upon the global priorities that are most important seems to be a crucial and necessary stepping stone. So before we get into talking about things like existential and global catastrophic risk, do you see a way of this project of promoting reason and good conversation and converging around good ideas succeeding? Or do you have any other things to sort of add to these instrumental abilities humanity needs to cultivate for being able to rally around global priorities?

Sam Harris: Well, I don’t see a lot of innovation beyond just noticing that conversation is the only tool we have. Intellectual honesty spread through the mechanism of conversation is the only tool we have to converge in these ways. I guess the thing to notice that’s guaranteed to make it difficult is bad incentives. So we should always be noticing what incentives are doing behind the scenes to people’s cognition. There are things that could be improved in media. I think the advertising model is a terrible system of incentives for journalists and anyone else who’s spreading information. You’re incentivized to create sensational hot takes and clickbait and depersonalize everything. Just create one lurid confection after another, that really doesn’t get at what’s true. The fact that this tribalizes almost every conversation and forces people to view it through a political lens. The way this is all amplified by Facebook’s business model and the fact that you can sell political ads on Facebook and we use their micro-targeting algorithm to frankly, distort people’s vision of reality and get them to vote or not vote based on some delusion.

All of this is pathological and it has to be disincentivized in some way. The business model of digital media is part of the problem. But beyond that, people have to be better educated and realize that thinking through problems and understanding facts and creating better arguments and responding to better arguments and realizing when you’re wrong, these are muscles that need to be trained, and there are certain environments in which you can train them well. And there’s certain environments where they are guaranteed to atrophy. Education largely consists in the former, in just training someone to interact with ideas and with shared perceptions and with arguments and evidence in a way that is agnostic as to how things will come out. You’re just curious to know what’s true. You don’t want to be wrong. You don’t want to be self-deceived. You don’t want to have your epistemology anchored to wishful thinking and confirmation bias and political partisanship and religious taboos and other engines of bullshit, really.

I mean, you want to be free of all that, and you don’t want to have your personal identity trimming down your perception of what is true or likely to be true or might yet happen. People have to understand what it feels like to be willing to reason about the world in a way that is unconcerned about the normal, psychological and tribal identity formation that most people, most of the time use to filter against ideas. They’ll hear an idea and they don’t like the sound of it because it violates some cherished notion they already have in the bag. So they don’t want to believe it. That should be a tip off. That’s not more evidence in favor of your worldview. That’s evidence that you are an ape who’s disinclined to understand what’s actually happening in the world. That should be an alarm that goes off for you, not a reason to double down on the last bad idea you just expressed on Twitter.

Lucas Perry: Yeah. The way the ego and concern for reputation and personal identity and shared human psychological biases influence the way we have conversations seems to be a really big hindrance here. And being aware of how your mind is reacting in each moment to the kinetics of the conversation and what is happening can be really skillful for catching unwholesome or unskillful reactions, it seems. And I’ve found that non-violent communication has been really helpful for me in terms of having valuable open discourse where one’s identity or pride isn’t on the line. The ability to seek truth with another person instead of having a debate or argument is certainly a skill to be developed. Yet that kind of format for discussion isn’t always rewarded or promoted as well as something like an adversarial debate, which tends to get a lot more attention.

Sam Harris: Yeah.

Lucas Perry: So as we begin to strengthen our epistemology and conversational muscles so that we’re able to arrive at agreement on core issues, that’ll allow us to create a better civilization and work on what matters. So I do want to pivot here into what those specific things might be. Now I have three general categories, maybe four, for us to touch on here.

The first is existential risks, which primarily come from technology and which might lead to the extinction of Earth-originating life, or more specifically just the extinction of human life. You have a TED Talk on AGI risk, that’s artificial general intelligence risk, the risk of machines becoming as smart or smarter than human beings and being misaligned with human values. There’s also synthetic bio risk, where advancements in genetic engineering may unleash a new age of engineered pandemics more lethal than anything produced by nature. We have nuclear war, and we also have new technologies or events that might come about that we aren’t aware of or can’t predict yet. The other categories, in terms of global priorities, that I want to touch on are global poverty, animal suffering, and human health and longevity. So how do you think about and prioritize these issues, and what is your reaction to their relative importance in the world?

Sam Harris: Well, I’m persuaded that thinking about existential risk is something we should do much more. It is amazing how few people spend time on this problem. It’s a big deal that we have the survival of our species as a blind spot, but I’m more concerned about what seems likelier to me, which is not that we will do something so catastrophically unwise as to erase ourselves, certainly not in the near term. We’re clearly capable of doing that, but I think it’s more likely that we’re capable of ensuring our unrecoverable misery for a good long while. We could just make life basically not worth living, but we’ll be forced, or someone will be forced, to live it all the while. Basically, a Road Warrior-like hellscape could await us, as opposed to just pure annihilation. So that’s a civilizational risk that I worry more about than extinction, because it just seems probabilistically much more likely to happen no matter how big our errors are.

I worry about our stumbling into an accidental nuclear war. That’s something that I think is still pretty high on the list of likely ways we could completely screw up the possibility of human happiness in the near term. It’s humbling to consider what an opportunity cost this pandemic is, minor as it is compared to what’s possible, right? I mean, we’ve got this pandemic that has locked down most of humanity, and every problem we had and every risk we were running as a species prior to anyone learning the name of this virus is still here. The threat of nuclear war has not gone away. It’s just that this has taken up all of our bandwidth. We can’t think about much else. It’s also humbling to observe how hard a time we’re having even agreeing about what’s happening here, much less responding intelligently to the problem. If you imagine a pandemic that was orders of magnitude more deadly and more transmissible, man, this is a pretty startling dress rehearsal.

I hope we learn something from this. I hope we think more about things like this happening in the future and prepare for them in advance. I mean, the fact that we have a CDC that still cannot get its act together is just astounding. And again, politics is the thing that is gumming up the gears in any machine that would otherwise run halfway decently at the moment. I mean, we have a truly deranged president and that is not a partisan observation. That is something that can be said about Trump, and it would not be said about most other Republican presidents. There’s nothing I would say about Trump that I could say about someone like Mitt Romney or any other prominent Republican. This is the perfect circumstance to accentuate the downside of having someone in charge who lies more readily than perhaps any person in human history.

It’s like toxic waste at the informational level has been spread around for three years now and now it really matters that we have an information ecosystem that has no immunity against crazy distortions of the truth. So I hope we learn something from this. And I hope we begin to prioritize the list of our gravest concerns and begin steeling our civilization against the risk that any of these things will happen. And some of these things are guaranteed to happen. The thing that’s so bizarre about our failure to grapple with a pandemic of this sort is, this is the one thing we knew was going to happen. This was not a matter of “if.” This was only a matter of “when.” Now nuclear war is still a matter of “if”, right? I mean, we have the bombs, they’re on hair-trigger, overseen by absolutely bizarre and archaic protocols and highly outdated technology. We know this is just a doomsday system we’ve built that could go off at any time through sheer accident or ineptitude. But it’s not guaranteed to go off.

But pandemics are just guaranteed to emerge, and we were still caught flat-footed here. And so I just think we need to use this occasion to learn a lot about how to respond to this sort of thing. And again, if we can’t convince the public that this sort of thing is worth paying attention to, we have to do it behind closed doors, right? I mean, we have to get people into power who have their heads screwed on straight here and just ram it through. There has to be a kind of Manhattan Project level of urgency to this, because this is about as benign a pandemic as we could have had that would still cause significant problems. An engineered virus, a weaponized virus calculated to kill the maximum number of people, that’s a zombie movie all of a sudden, and we’re not ready for the zombies.

Lucas Perry: I think that my two biggest updates from the pandemic were that human civilization is much more fragile than I thought it was, and also that I trust the US government way less now in its capability to mitigate these things. I think at one point you said that 9/11 was the first time that you felt like you were actually in history. As someone who’s 25, being in the COVID pandemic is the first time that I feel like I’m in human history. My life so far has been very normal and constrained, and the boundaries between everything have been very rigid and solid, but this is perturbing that.

So you mentioned that you were slightly less worried about humanity just erasing ourselves via some kind of existential risk, and part of the idea here seems to be that there are futures that are not worth living. Like if there’s such a thing as a moment or a day that isn’t worth living, then there are also futures that are not worth living. So I’m curious if you could unpack why you feel that these periods of time that are not worth living are more likely than existential risks, and whether you think that some of those existential conditions could be permanent. Could you speak a little bit about the relative likelihood of existential risks and suffering risks, and whether you see the more likely suffering risks as being constrained in time or indefinite?

Sam Harris: In terms of the probabilities, it just seems obvious that it is harder to eradicate the possibility of human life entirely than it is to just kill a lot of people and make the remaining people miserable. Right? If a pandemic spreads, whether it’s natural or engineered, that has 70% mortality and the transmissibility of measles, that’s going to kill billions of people. But it seems likely that it may spare some millions of people or tens of millions of people, even hundreds of millions of people and those people will be left to suffer their inability to function in the style to which we’ve all grown accustomed. So it would be with war. I mean, we could have a nuclear war and even a nuclear winter, but the idea that it’ll kill every last person or every last mammal, it would have to be a bigger war and a worse winter to do that.

So I see the prospect of things going horribly wrong as one that yields not a dial tone, but some level of remaining, even civilized, life that’s just terrible, that nobody would want. Where we basically all have the quality of life of a mediocre day in the middle of the civil war in Syria. Who wants to live that way? If every city on Earth is basically a dystopian cell on a prison planet, that for me is a sufficient ruination of the hopes and aspirations of civilized humanity. That’s enough to motivate all of our efforts to avoid things like accidental nuclear war and uncontrolled pandemics and all the rest. And in some ways it’s more motivating, because when you ask people, what’s the problem with the failure to continue the species, right? Like if we all died painlessly in our sleep tonight, what’s the problem with that?

That actually stumps some considerable number of people because they immediately see that the complete annihilation of the species painlessly is really a kind of victimless crime. There’s no one around to suffer our absence. There’s no one around to be bereaved. There’s no one around to think, oh man, we could have had billions of years of creativity and insight and exploration of the cosmos and now the lights have gone out on the whole human project. There’s no one around to suffer that disillusionment. So what’s the problem? I’m persuaded that that’s not the perfect place to stand to evaluate the ethics. I agree that losing that opportunity is a negative outcome that we want to value appropriately, but it’s harder to value it emotionally and it’s not as clear. I mean it’s also, there’s an asymmetry between happiness and suffering, which I think is hard to get around.

We are perhaps rightly more concerned about suffering than we are about losing opportunities for wellbeing. If I told you that you could have an hour of the greatest possible happiness, but it would have to be followed by an hour of the worst possible suffering, I think most people given that offer would say, oh, well, okay, I’m good. I’ll just stick with what it’s like to be me. The hour of the worst possible misery seems like it’s going to be worse than the highest possible happiness is going to be good, and I do sort of share that intuition. And when you think about it in terms of the future of humanity, I think it is more motivating to think, not that your grandchildren might not exist, but that your grandchildren might live horrible lives, really unendurable lives, and they’ll be forced to live them because they’ll be born. If for no other reason than that we have to persuade some people to take these concerns seriously, I think that’s the place to put most of the emphasis.

Lucas Perry: I think that’s an excellent point. I think it makes it more morally salient and leverages human self-interest more. One distinction that I want to make is the distinction between existential risks and global catastrophic risks. Global catastrophic risks are those which would kill a large fraction of humanity without killing everyone, and existential risks are ones which would exterminate all people or all Earth-originating intelligent life. And the former, the global catastrophic risks, are the ones you’re primarily discussing here, where something goes really badly and we’re left with some pretty bad existential situation.

Sam Harris: Yeah.

Lucas Perry: Now, we’re not locked into that forever, so it’s pretty far away from what is talked about in the effective altruism community as a suffering risk. That might only last a hundred or a few hundred years, or maybe less. Who knows; it depends on what happened. But now, taking a bird’s eye view again on global priorities and standing on a solid ground of ethics, what is your perspective on longtermist philosophy? This is the position or idea that the deep future has overwhelming moral priority, given the countless trillions of lives that could be lived. So if an existential risk occurs, then we’re basically canceling the whole future, like you mentioned. There won’t be any suffering and there won’t be any joy, but it would seem we’re missing out on a ton of good. And with the continued evolution of life through genetic engineering and enhancements and artificial intelligence, it would seem that the future could also be unimaginably good.

If you do an expected value calculation about existential risks, you can estimate very roughly the likelihood of each existential risk, whether it be from artificial general intelligence or synthetic bio or nuclear weapons or a black swan event that we couldn’t predict. And if you multiply that by the amount of value in the future, you’ll get some astronomical number, given the astronomical amount of value in the future. Does this kind of argument or viewpoint do the work for you to commit you to seeing existential risk as a global priority, or the central global priority?

Sam Harris: Well, it doesn’t do the emotional work largely because we’re just bad at thinking about longterm risk. It doesn’t even have to be that long-term for our intuitions and concerns to degrade irrationally. We’re bad at thinking about the well-being, even of our future selves as you get further out in time. The term of jargon is that we “hyperbolically discount” our future well being. People will smoke cigarettes or make other imprudent decisions in the present. They know they will be the inheritors of these bad decisions, but there’s some short-term upside.

The mere pleasure of the next cigarette say, that convinces them that they don’t really have to think long and hard about what their future self will wish they had done at this point. Our ability to be motivated by what we think is likely to happen in the future is even worse when we’re thinking about our descendants. Right? People we either haven’t met yet or may never meet. I have kids, but I don’t have grandkids. How much of my bandwidth is taken up thinking about the kinds of lives my grandchildren will have? Really none. It’s conserved. It’s safeguarded by my concern about my kids, at this point.

But, then there are people who don’t have kids and are just thinking about themselves. It’s hard to think about the comparatively near future. Even a future that, barring some real mishap, you have every expectation of having to live in yourself. It’s just hard to prioritize. When you’re talking about the far future, it becomes very, very difficult. You just have to have the science fiction geek gene or something disproportionately active in your brain, to really care about that.

Unless you think you are somehow going to cheat death and get aboard the starship when it’s finally built, because you’re popping 200 vitamins a day with Ray Kurzweil and you think you might just be in the cohort of people who are going to make it out of here without dying because we’re just on the cusp of engineering death out of the system, then, okay, I could see there’s a self-interested view of it. If you’re really talking about hypothetical people who you know you will never come in contact with, I think it’s hard to be sufficiently motivated, even if you believe the moral algebra here.

It’s not clear to me that it need run through. I agree with you that if you do a basic expected value calculation here, and you start talking about trillions of possible lives, their interests must outweigh the interests of the 7.8 billion, or whatever it is, of us currently alive. But there are a few asymmetries here, again. With the asymmetry between actual and hypothetical lives, there are no identifiable lives who would be deprived of anything if we all just decided to stop having kids. You have to take the point of view of the people alive who make this decision.

If we all just decided, “Listen. These are our lives to live. We can decide how we want to live them. None of us want to have kids anymore.” If we all independently made that decision, the consequence on this calculus is that we are the worst people, morally speaking, who have ever lived. That doesn’t quite capture the moment, the experience or the intentions. We could do this thing without ever thinking about the implications of existential risk. If we didn’t have a phrase for this and we didn’t have people like ourselves talking about this as a problem, people could just be taken in by the overpopulation thesis.

The thesis being that overpopulation is really the thing that is destroying the world, and what we need is some kind of Gaian reset, where the Earth reboots without us. Let’s just stop having kids and let nature reclaim the edges of the cities. You could see a kind of utopian environmentalism creating some dogma around that, where it was no one’s intention ever to commit some kind of horrific crime. Yet, on this existential risk calculus, that’s what would have happened. It’s hard to think about the morality there, when you talk about people deciding not to have kids and it would be the same catastrophic outcome.

Lucas Perry: That situation to me seems to be like looking over the possible moral landscape and seeing a mountain or not seeing a mountain, but there still being a mountain. Then you can have whatever kinds of intentions that you want, but you’re still missing it. From a purely consequentialist framework on this, I feel not so bad saying that this is probably one of the worst things that have ever happened.

Sam Harris: The asymmetry here between suffering and happiness still seems psychologically relevant. It’s not quite the worst thing that’s ever happened, but the best things that might have happened have been canceled. Granted, I think there’s a place to stand where you could think that is a horrible outcome, but again, it’s not the same thing as creating some hell and populating it.

Lucas Perry: I see what you’re saying. I’m not sure that I quite share the intuition about the asymmetry between suffering and well-being. I’m somewhat skeptical of that, but that would be a huge tangent right now, I think. Now, one of the crucial things that you said was that for those who are not really compelled by the long-term future argument, who don’t have the science fiction geek gene and are not compelled by moral philosophy, the essential way to get people to care about global catastrophic and existential risk seems to be to demonstrate how likely these risks are within this century.

And so their direct descendants, like their children or grandchildren, or even them, may live in a world that is very bad or they may die in some kind of a global catastrophe, which is terrifying. Do you see this as the primary way of leveraging human self-interest and feelings and emotions to make existential and global catastrophic risk salient and pertinent for the masses?

Sam Harris: It’s certainly half the story, and it might be the most compelling half. I’m not saying that we should just be worried about the downside, because the upside also is something we should celebrate and aim for. The other side of the story is that we’ve made incredible progress. If you take someone like Steven Pinker and his big books of what is often perceived as happy talk, he’s pointing out all of the progress, morally and technologically and at the level of public health.

It’s just been virtually nothing but progress. There’s no point in history where you’re luckier to live than in the present. That’s true. I think that the thing that Steve’s story conceals, or at least doesn’t spend enough time acknowledging, is that the risk of things going terribly wrong is also increasing. It was also true a hundred years ago that it would have been impossible for one person or a small band of people to ruin life for everyone else.

Now that’s actually possible. Just imagine if this current pandemic were an engineered virus, more like a lethal form of measles. It might take five people to create that and release it. Here we would be locked down in a truly terrifying circumstance. The risk is ramped up. I think we just have to talk about both sides of it. There is no limit to how beautiful life could get if we get our act together. Take an argument of the sort that David Deutsch makes about the power of knowledge.

Every problem has a solution born of a sufficient insight into how things work, i.e. knowledge, unless the laws of physics rule it out. If it’s compatible with the laws of physics, knowledge can solve the problem. That’s virtually a blank check with reality that we could live to cash, if we don’t kill ourselves in the process. Again, as the upside becomes more and more obvious, the risk that we’re going to do something catastrophically stupid is also increasing. The principles here are the same. The only reason we’re talking about existential risk is because we have made so much progress. Without the progress, there’d be no way to make a sufficiently large mistake. It really is two sides of the coin of increasing knowledge and technical power.

Lucas Perry: One thing that I wanted to throw in here, in terms of the kinetics of long-termism and emotional saliency: it would be stupidly optimistic, I think, to think that everyone could become selfless bodhisattvas. But the way in which you promote meditation and mindfulness, and your arguments against the conventional, experiential and conceptual notion of the self, have, for me at least, dissolved many of the barriers which would keep me from being emotionally motivated by long-termism.

Now, that itself, I think, is another long conversation. When your sense of self is being nudged, disentangled and dissolved in new ways, the idea that it won’t be you in the future, or the idea that the beautiful dreams that Dyson spheres will be having in a billion years are not you, begins to relax a bit. That’s probably not something that is helpful for most people, but I do think that it’s a view people can adopt, and that meditation, mindfulness and introspection can lead to this weakening of the sense of self, which then also opens one’s optimism, and compassion, and mind towards the long-termist view.

Sam Harris: That’s something that you get from reading Derek Parfit’s work. The paradoxes of identity that he so brilliantly framed and tried to reason through yield something like what you’re talking about. It’s not so important whether it’s you, because this notion of you is in fact, paradoxical to the point of being impossible to pin down. Whether the you that woke up in your bed this morning is the same person who went to sleep in it the night before, that is problematic. Yet there’s this fact of some degree of psychological continuity.

The basic fact experientially is just, there is consciousness and its contents. The only place for feelings, and perceptions, and moods, and expectations, and experience to show up is in consciousness, whatever it is and whatever its connection to the physics of things actually turns out to be. There’s just consciousness. The question of where it appears is a genuinely interesting one philosophically, and intellectually, and scientifically, and ultimately morally.

Because if we build conscious robots or conscious computers and build them in a way that causes them to suffer, we’ve just done something terrible. We might do that inadvertently if we don’t know how consciousness arises based on information processing, or whether it does. It’s all interesting terrain to think about. If the lights are still on a billion years from now, and the view of the universe is unimaginably bright, and interesting and beautiful, and all kinds of creative things are possible by virtue of the kinds of minds involved, that will be much better than any alternative. That’s certainly how it seems to me.

Lucas Perry: I agree. Some things here ring true: you always talk about how there’s only consciousness and its contents. I really like the phrase, “Seeing from nowhere.” That is usually quite motivating for me, in terms of the arguments against the conventional conceptual and experiential notions of self. There just seem to be instantiations of consciousness intrinsically free of identity.

Sam Harris: There are two things to distinguish here. There’s the philosophical, conceptual side of the conversation, which can show you that things like your concept of a self, or certainly your concept of a self that could have free will, don’t make a lot of sense. It doesn’t make sense when mapped onto physics. It doesn’t make sense when looked for neurologically. Any way you look at it, it begins to fall apart. That’s interesting, but again, it doesn’t necessarily change anyone’s experience.

It’s just a riddle that can’t be solved. Then there’s the experiential side which you encounter more in things like meditation, or psychedelics, or sheer good luck where you can experience consciousness without the sense that there’s a subject or a self in the center of it appropriating experiences. Just a continuum of experience that doesn’t have structure in the normal way. What’s more, that’s not a problem. In fact, it’s the solution to many problems.

A lot of the discomfort you have felt psychologically goes away when you punch through to a recognition that consciousness is just the space in which thoughts, sensations and emotions continually appear, change and vanish. There’s no thinker authoring the thoughts. There’s no experiencer in the middle of the experience. That’s not to say you don’t have a body; every sign that you have a body is still appearing. There are sensations of tension, warmth, pressure and movement.

There are sights, there are sounds but again, everything is simply an appearance in this condition, which I’m calling consciousness for lack of a better word. There’s no subject to whom it all refers. That can be immensely freeing to recognize, and that’s a matter of a direct change in one’s experience. It’s not a matter of banging your head against the riddles of Derek Parfit or any other way of undermining one’s belief in personal identity or the reification of a self.

Lucas Perry: A little bit earlier, we talked about the other side of the existential risk coin. The other side of that is what we like to call existential hope at the Future of Life Institute. We’re not just a doom and gloom society. It’s also about how the future can be unimaginably good if we can get our act together and manage and steward our technologies with wisdom and benevolence in mind.

Pivoting in here, and reflecting a little bit on the implications for global priorities of some of this no-self conversation we’ve been having: the effective altruism community has narrowed down on three global priorities as central issues of consideration: existential risk, global poverty and animal suffering. We talked a bunch about existential risk already. Global poverty is widespread, while many of us live in quite nice and abundant circumstances.

Then there’s animal suffering, which can be thought of in two categories. One is factory-farmed animals, where we have billions upon billions of animals being born into miserable conditions and being slaughtered for sustenance. Then we also have wild animal suffering, which is a bit more esoteric and seems harder to get any traction on alleviating. Thinking about these last two points, global poverty and animal suffering, what is your perspective on them?

I find people’s unwillingness to empathize and be compassionate towards animal suffering quite frustrating, as well as towards global poverty, of course. I’m curious whether you view the perspective of no self as potentially informative or helpful for leveraging human compassion and motivation to help other people and to help animals. One quick argument here comes from the conventional view of self, so it isn’t strictly true or rational, but it is motivating for me: I feel like I was just born as me, and then I just woke up one day as Lucas.

“I” here referring to this conventional and experientially illusory notion that I have of myself, this convenient fiction. Now, you’re going to die and you could wake up as a factory-farmed animal. Surely there are those billions upon billions of instantiations of consciousness that are just going through misery. If the self is an illusion, then there are selfless chicken and cow experiences of enduring suffering. Any thoughts or reactions to global poverty, animal suffering and what I mentioned here?

Sam Harris: I guess the first thing to observe is that, again, we are badly set up to prioritize what should be prioritized and to have an emotional response commensurate with what we can rationally understand to be so. We have a problem of motivation. We have a problem of making data real. This has been psychologically studied, but it’s just manifest in oneself and in the world. We care more about a salient narrative that has a single protagonist than we do about the data, even on human suffering.

The classic example here is one little girl falls down a well, and you get wall to wall news coverage. All the while there could be a genocide or a famine killing hundreds of thousands of people, and it doesn’t merit more than five minutes. One broadcast. That’s clearly a bug, not a feature morally speaking, but it’s something we have to figure out how to work with because I don’t think it’s going away. One of the things that the effective altruism philosophy has done, I think usefully, is that it has separated two projects which up until the emergence of effective altruism, I think were more or less always conflated.

They’re both valid projects, but one has much greater moral consequence. The fusion of the two is the concern about giving and how it makes one feel. I want to feel good about being philanthropic. Therefore, I want to give to causes that give me these good feels. In fact, at the end of the day, the feeling I get from giving is what motivates me to give. If I’m giving in a way that doesn’t really produce that feeling, well, then I’m going to give less or give less reliably.

Even in a contemplative Buddhist context, there’s an explicit fusion of these two things. The reason to be moral and to be generous is not merely, or even principally, the effect on the world. The reason is that it makes you a better person. It gives you a better mind. You feel better in your own skin. It is, in fact, more rewarding than being selfish. I think that’s true, but that doesn’t really get at the important point here, which is that we’re living in a world where the difference between having good and bad luck is so enormous.

The inequalities are so shocking and indefensible. The fact that I was born me, and not born in some hellhole in the middle of a civil war, soon to be orphaned, impoverished and riddled with disease, is a difference in luck I can take no responsibility for. That difference matters more than anything else in my life. What the effective altruist community has prioritized is actually helping the most people, or the most sentient beings.

That is fully divorceable from how something makes you feel. Now, I think it shouldn’t be ultimately divorceable. I think we should recalibrate our feelings or struggle to, so that we do find doing the most good the most rewarding thing in the end, but it’s hard to do. My inability to do it personally, is something that I have just consciously corrected for. I’ve talked about this a few times on my podcast. When Will MacAskill came on my podcast and we spoke about these things, I was convinced at the end of the day, “Well, I should take this seriously.”

I recognize that fighting malaria by sending bed nets to people in sub-Saharan Africa is not a cause I find particularly sexy. I don’t find it that emotionally engaging. I don’t find it that rewarding to picture the outcome. Again, compared to other possible ways of intervening in human misery and producing some better outcome, it’s not the same thing as rescuing the little girl from the well. Yet, I was convinced that, as Will said on that podcast and as organizations like GiveWell attest, giving money to the Against Malaria Foundation was and remains one of the absolute best uses of every dollar to mitigate unnecessary death and suffering.

I just decided to automate my giving to the Against Malaria Foundation because I knew I couldn’t be trusted to wake up every day, or every month or every quarter, whatever it would be, and recommit to that project because some other project would have captured my attention in the meantime. I was either going to give less to it or not give at all, in the end. I’m convinced that we do have to get around ourselves and figure out how to prioritize what a rational analysis says we should prioritize and get the sentimentality out of it, in general.

It’s very hard to escape entirely. I think we do need to figure out creative ways to reformat our sense of reward. The reward we find in helping people has to begin to become more closely coupled to what is actually most helpful. Conversely, the disgust or horror we feel over bad outcomes should be more closely coupled to the worst things that happen. As opposed to just the most shocking, but at the end of the day, minor things. We’re just much more captivated by a sufficiently ghastly story involving three people than we are by the deaths of literally millions that happen some other way. These are bugs we have to figure out how to correct for.

Lucas Perry: I hear you. The person running into the burning building to save the child is celebrated as a hero, but if you are, say, earning to give and write enough checks to save dozens of lives over your lifetime, that might not be recognized or felt in the same way.

Sam Harris: And these are different people, too. It’s also true to say that someone who is psychologically and interpersonally not that inspiring, and certainly not a saint, might wind up doing more good than any saint ever does or could. I don’t happen to know Bill Gates. He could be saint-like. I’ve literally never met him, but I don’t get the sense that he is. I think he’s kind of a normal technologist and might be normally egocentric, concerned about his reputation and legacy.

He might be a prickly bastard behind closed doors. I don’t know, but he certainly stands a chance of doing more good than any person in human history at this point, just based on the checks he’s writing and his intelligent prioritization of his philanthropic efforts. There is an interesting uncoupling here, where you could just imagine someone who might be a total asshole but actually does more good than any army of saints you could muster. That’s interesting. That just proves the point that a concern about real-world outcomes is divorceable from the psychology that we tend to associate with doing good in the world. On the point of animal suffering, I share your intuitions there, although again, this is a little bit like climate change in that I think the ultimate fix will be technological. It’ll be a matter of people producing the Impossible Burger squared that is just so good that no one’s tempted to eat a normal burger anymore, or something like Memphis Meats, which, actually, I invested in.

I have no idea where it’s going as a company, but when I had its CEO, Uma Valeti, on my podcast back in the day, I just thought, “This is fantastic: engineering actual meat without producing any animal suffering. I hope he can bring this to scale.” At the time, it was something like an $18,000 meatball. I don’t know what it is now, but it’s that kind of thing that will close the door to the slaughterhouse more than just convincing billions of people about the ethics. That’s too difficult, and the truth may not align with exactly what we want.

I’m going to reap the whirlwind of criticism from the vegan mafia here, but it’s just not clear to me that it’s easy to be a healthy vegan. Forget about yourself as an adult making a choice to be a vegan; raising vegan kids is a medical experiment on your kids of a certain sort, and it’s definitely possible to screw it up. There’s just no question about it. If you’re not going to admit that, you’re not a responsible parent.

It is possible, but it is by no means easier to raise healthy vegan kids than it is to raise kids who eat meat sometimes, and that’s just a problem, right? Now, that’s a problem that has a technical solution, but there’s still diversity of opinion about what constitutes a healthy human diet even when all things are on the menu. We’re just not there yet. It’s unlikely to be just a matter of supplementing B12.

Then the final point you made does get us into what I would argue is a kind of reductio ad absurdum of the whole project ethically, when you’re talking about losing sleep over whether to protect the rabbits from the foxes out there in the wild. I will grant you, I wouldn’t want to trade places with a rabbit, and there’s a lot of suffering out there in the natural world, but if you’re going to go down that path and try to figure out how to minimize the suffering of wild animals in relation to other wild animals, then I think you are a kind of antinatalist with respect to the natural world. I mean, then it would just be better if these animals didn’t exist, right? Let’s just hit stop on the whole biosphere, if that’s the project.

Then there’s the argument that there are many more ways to suffer than to be happy as a sentient being. Whatever story you want to tell yourself about the promise of future humanity, it’s just so awful to be a rabbit or an insect that if an asteroid hit us and canceled everything, that would be a net positive.

Lucas Perry: Yeah. That’s an actual view that I hear around a bunch. I guess my quick response is as we move farther into the future, if we’re able to reach an existential situation which is secure and where there is flourishing and we’re trying to navigate the moral landscape to new peaks, it seems like we will have to do something about wild animal suffering. With AGI and aligned superintelligence, I’m sure there could be very creative solutions using genetic engineering or something. Our descendants will have to figure that out, whether they are just like, “Are wild spaces really necessary in the future and are wild animals actually necessary, or are we just going to use those resources in space to build more AI that would dream beautiful dreams?”

Sam Harris: I just think it may be, in fact, the case that nature is just a horror show. It is bad almost any place you could be born in the natural world, you’re unlucky to be a rabbit and you’re unlucky to be a fox. We’re lucky to be humans, sort of, and we can dimly imagine how much luckier we might get in the future if we don’t screw up.

I find it compelling to imagine that we could create a world where certainly most human lives are well worth living and better than most human lives ever were. Again, I follow Pinker in feeling that we’ve sort of done that already. It’s not to say that there aren’t profoundly unlucky people in this world, and it’s not to say that things couldn’t change in a minute for all of us, but life has gotten better and better for virtually everyone when you compare us to any point in the past.

If we get to the place you’re imagining, where we have AGI that we have managed to align with our interests and we’re migrating into spaces of experience that change everything, it’s quite possible we will look back on the “natural world” and be totally unsentimental about it, which is to say, we could compassionately make the decision to either switch it off or no longer provide for its continuation. It’s like that’s just a bad software program that evolution designed, and wolves and rabbits and bears and mice, they were all unlucky on some level.

We could be wrong about that, or we might discover something else. We might discover that intelligence is not all it’s cracked up to be, that it’s just this perturbation on something that’s far more rewarding. At the center of the moral landscape, there’s a peak higher than any other and it’s not one that’s elaborated by lots of ideas and lots of creativity and lots of distinctions, it’s just this great well of bliss that we actually want to fully merge with. We might find out that the cicadas were already there. I mean, who knows how weird this place is?

Lucas Perry: Yeah, that makes sense. I totally agree with you and I feel this is true. I also feel that there’s some price that is paid because there’s already some stigma around even thinking this. I think it’s a really early idea to have in terms of the history of human civilization, so people’s initial reaction is like, “Ah, what? Nature’s so beautiful and why would you do that to the animals?” Et cetera. We may come to find out that nature is just very net negative, but I could be wrong and maybe it would be around neutral or better than that, but that would require a more robust and advanced science of consciousness.

Just hitting on this next one fairly quickly, effective altruism is interested in finding new global priorities and causes. They call this “Cause X,” something that may be a subset of existential risk or something other than existential risk or global poverty or animal suffering probably still just has to do with the suffering of sentient beings. Do you think that a possible candidate for Cause X would be machine suffering or the suffering of other non-human conscious things that we’re completely unaware of?

Sam Harris: Yeah, well, I think it’s a totally valid concern. Again, it’s one of these concerns that’s hard to get your moral intuitions tuned up to respond to. People have a default intuition that a conscious machine is impossible, that substrate independence, on some level, is impossible; they’re making an assumption without ever doing it explicitly. In fact, I think most people would explicitly deny thinking this, but it is implicit in what they then go on to think when you pose the question of the possibility of suffering machines and suffering computers.

That just seems like something that never needs to be worried about, and yet the only way to close the door to worrying about it is to assume that consciousness is totally substrate-dependent and that we would never build a machine that could suffer because we’re building machines out of some other material. If we built a machine out of biological neurons, well, then we might be up for condemnation morally, because we’ve taken an intolerable risk analogous to creating some human-chimp hybrid or whatever. It’s like, obviously that thing’s going to suffer. It’s an ape of some sort and now it’s in a lab, and what sort of monster would do that, right? We would expect the lights to come on in a system of that sort.

If consciousness is the result of information processing on some level, and again, that’s an “if,” we’re not sure that’s the case, and if information processing is truly substrate-independent, and that seems like more than an “if” at this point, we know that’s true, then we could inadvertently build conscious machines. And then the question is: What is it like to be those machines and are they suffering? There’s no way to prevent that on some level.

Certainly, if there’s any relationship between consciousness and intelligence, if building more and more intelligent machines is synonymous with increasing the likelihood that the lights will come on experientially, well, then we’re clearly on that path. It’s totally worth worrying about, but again, judging from what my own mind is like and what my conversations with other people suggest, it seems very hard for people to care about. That’s just another one of these wrinkles.

Lucas Perry: Yeah. I think a good way of framing this is that humanity has a history of committing moral catastrophes because of bad incentives and they don’t even realize how bad the thing is that they’re doing, or they just don’t really care or they rationalize it, like subjugation of women and slavery. We’re in the context of human history and we look back at these people and see them as morally abhorrent.

Now, the question is: What is it today that we’re doing that’s morally abhorrent? Well, I think factory farming is easily one contender and perhaps human selfishness that leads to global poverty and millions of people drowning in shallow ponds is another one that we’ll look back on. With just some foresight towards the future, I agree that machine suffering is intuitively and emotionally difficult to empathize with if your sci-fi gene isn’t turned on. It could be the next thing.

Sam Harris: Yeah.

Lucas Perry: I’d also like to pivot here into AI alignment and AGI. In terms of existential risk from AGI or transformative AI systems, do you have thoughts on public intellectuals who are skeptical of existential risk from AGI or superintelligence? You had a talk about AI risk and I believe you got some flak from the AI community about that. Elon Musk was just skirmishing with the head of AI at Facebook, I think. What is your perspective about the disagreement and confusion here?

Sam Harris: It comes down to a failure of imagination on the one hand and also just bad argumentation. No sane person who’s concerned about this is concerned because they think it’s going to happen this year or next year. It’s not a bet on how soon this is going to happen. For me, it certainly isn’t a bet on how soon it’s going to happen. It’s just a matter of the implications of continually making progress in building more and more intelligent machines. Any progress, it doesn’t have to be Moore’s law, it just has to be continued progress, will ultimately deliver us into relationship with something more intelligent than ourselves.

To think that that is farfetched or is not likely to happen or can’t happen is to assume some things that we just can’t assume. It’s to assume that substrate independence is not in the cards for intelligence. Forget about consciousness. I mean, consciousness is orthogonal to this question. I’m not suggesting that AGI need be conscious, it just needs to be more competent than we are. We already know that our phones are more competent as calculators than we are, they’re more competent chess players than we are. You just have to keep stacking cognitive-information-processing abilities on that and making progress, however incremental.

I don’t see how anyone can be assuming substrate dependence for really any of the features of our mind apart from, perhaps, consciousness. Take the top 200 things we do cognitively, consciousness aside, just as a matter of sheer information-processing and behavioral control and power to make decisions, and start checking those off; those have to be substrate independent. Facial recognition, voice recognition, we can already do those in silico. It’s just not something you need meat to do.

We’re going to build machines that get better and better at all of these things, and ultimately they will pass the Turing test, and ultimately it will be like chess, or now Go, as far as the eye can see, where we will be in relationship to something that is better than we are at everything we have prioritized, every human competence we have put enough priority on that we took the time to build it into our machines in the first place: theorem-proving in mathematics, engineering software programs. There is no reason why a computer will ultimately not be the best programmer in the end, again, unless you’re assuming that there’s something magical about doing this in meat. I don’t know anyone who’s assuming that.

Arguing about the time horizon is a non sequitur, right? No one is saying that this need happen soon to ultimately be worth thinking about. We know that whatever the time horizon is, it can happen suddenly. We have historically been very bad at predicting when there will be a breakthrough. This is a point that Stuart Russell makes all the time. If you look at what Rutherford said about the nuclear chain reaction being a pipe dream, it wasn’t even 24 hours before Leo Szilard committed the chain reaction to paper and had the relevant breakthrough. We know we can make bad estimates about the time horizon, so at some point, we could be ambushed by a real breakthrough, which suddenly delivers exponential growth in intelligence.

Then there’s a question of just how quickly that could unfold and whether this would be something like an intelligence explosion. That’s possible. We can’t know for sure, but you need to find some foothold to doubt whether these things are possible, and the footholds that people tend to reach for are either nonexistent or they’re non sequiturs.

Again, the time horizon is irrelevant, and yet the time horizon is the first thing you hear from people who are skeptics about this: “It’s not going to happen for a very long time.” Well, consider Stuart Russell’s point here, which is, again, just a reframing, but in the persuasion business, reframing is everything. The people who are consoled by this idea that this is not going to happen for 50 years wouldn’t be so consoled if we received a message from an alien civilization which said, “People of Earth, we will arrive on your humble planet in 50 years. Get ready.”

If that happened, we would be prioritizing our response to that moment differently than the people who think it’s going to take 50 years for us to build AGI are prioritizing their response to what’s coming. We would recognize that a relationship with something more powerful than ourselves is in the offing. It’s only reasonable to do that on the assumption that we will continue to make progress.

The point I made in my TED Talk is that the only way to assume we’re not going to continue to make progress is to be convinced of a very depressing thesis. The only way we wouldn’t continue to make progress is if we open the wrong door of the sort that you and I have been talking about in this conversation, if we invoke some really bad roll of the dice in terms of existential risk or catastrophic civilizational failure, and we just find ourselves unable to build better and better computers. I mean, that’s the only thing that would cause us to be unable to do that. Given the power and value of intelligent machines, we will build more and more intelligent machines at almost any cost at this point, so a failure to do it would be a sign that something truly awful has happened.

Lucas Perry: Yeah. From my perspective, the people who are skeptical of substrate independence aren’t necessarily AI researchers. Those are regular people or laypersons who are not computer scientists. I think that skepticism is motivated by mind-body dualism, where one has a conventional and experiential sense of the mind as being non-physical, which may be motivated by popular religious beliefs. But when we get to actual AI researchers, it seems they’re either attacking some naive version of the argument or a straw man or something.

Sam Harris: Like robots becoming spontaneously malevolent?

Lucas Perry: Yeah. It’s either that, or they think that the alignment problem isn’t as hard as it is. They have some intuition like, why the hell would we even release systems that weren’t safe? Why would we not make technology that served us, or something? To me, when people from the mainstream machine-learning community attack AI alignment and existential risk considerations from AI, it seems like they just don’t understand how hard the alignment problem is.

Sam Harris: Well, they’re not taking seriously the proposition that what we will have built are truly independent minds more powerful than our own. If you actually drill down on what that description means, it doesn’t mean something that is perfectly enslaved by us for all time, I mean, because that is by definition something that couldn’t be more intelligent across the board than we are.

The analogy I use is imagine if dogs had invented us to protect their interests. Well, so far, it seems to be going really well. We’re clearly more intelligent than dogs, they have no idea what we’re doing or thinking about or talking about most of the time, and they see us making elaborate sacrifices for their wellbeing, which we do. I mean, the people who own dogs care a lot about them and make, you could argue, irrational sacrifices to make sure they’re happy and healthy.

But again, back to the pandemic, if we recognize that we had a pandemic that was going to kill the better part of humanity and it was jumping from dogs to people and the only way to stop this is to kill all the dogs, we would kill all the dogs on a Thursday. There’d be some holdouts, but they would lose. The dog project would be over and the dogs would never understand what happened.

Lucas Perry: But that’s because humans aren’t perfectly aligned with dog values.

Sam Harris: But that’s the thing: Maybe it’s a solvable problem, but it’s clearly not a trivial problem because what we’re imagining are minds that continue to grow in power and grow in ways that by definition we can’t anticipate. Dogs can’t possibly anticipate where we will go next, what we will become interested in next, what we will discover next, what we’ll prioritize next. If you’re not imagining minds so vast that we can’t capture their contents ourselves, you’re not talking about the AGI that the people who are worried about alignment are talking about.

Lucas Perry: Maybe this is a bit of a nuanced distinction between you and me, but the story that you’re developing there seems to assume that the utility function or the value learning or the objective function of the systems that we’re trying to align with human values is dynamic. It may be the case that you can build a really smart alien mind and it might become super-intelligent, but there are arguments that maybe you could make its alignment stable.

Sam Harris: That’s the thing we have to hope for, right? I’m not a computer scientist, so the doability of this is something I don’t have good intuitions about, but Stuart Russell’s argument is that we would need a system whose ultimate value is to more and more closely approximate our current values, a system that would, no matter how much its intelligence escapes our own, continually remain available to the conversation with us where we say, “Oh, no, no. Stop doing that. That’s not what we want.” That would be the most important message from its point of view, no matter how vast its mind got.

Maybe that’s doable, right, but that’s the kind of thing that would have to be true for the thing to remain completely aligned to us because the truth is we don’t want it aligned to who we used to be and we don’t want it aligned to the values of the Taliban. We want to grow in moral wisdom as well and we want to be able to revise our own ethical codes and this thing that’s smarter than us presumably could help us do that, provided it doesn’t just have its own epiphanies which cancel the value of our own or subvert our own in a way that we didn’t foresee.

If it really has our best interest at heart, but our best interests are best conserved by it deciding to pull the plug on everything, well, then we might not see the wisdom of that. I mean, it might even be the right answer. Now, this is assuming it’s conscious. We could be building something that is actually morally more important than we are.

Lucas Perry: Yeah, that makes sense. Certainly, eventually, we would want it to be aligned with some form of idealized human values and idealized human meta preferences over how value should change and evolve into the deep future. This is known, I think, as “ambitious value learning” and it is the hardest form of value learning. Maybe we can make something safe without doing this level of ambitious value learning, but something like that may be deeper in the future.

Now, as we’ve made moral progress throughout history, we’ve been expanding our moral circle of consideration. In particular, we’ve been doing this farther into space, deeper into time, across species, and potentially soon, across substrates. What do you see as the central way of continuing to expand our moral circle of consideration and compassion?

Sam Harris: Well, I just think we have to recognize that things like distance in time and space and superficial characteristics, like whether something has a face, much less a face that can make appropriate expressions or a voice that we can relate to, none of these things have moral significance. The fact that another person is far away from you in space right now shouldn’t fundamentally affect how much you care whether or not they’re being tortured or whether they’re starving to death.

Now, it does. We know it does. People are much more concerned about what's happening on their doorstep. But proximity, if it has any weight at all, has less and less weight the more our decisions obviously affect people regardless of separation in space. And the more it becomes truly easy to help someone on another continent, because you can just push a button in your browser, the more clearly your caring less about them is a bug. So it's just noticing that the things that attenuate our compassion tend to be things that for evolutionary reasons we're designed to discount in this way, but at the level of actual moral reasoning about a global civilization it doesn't make any sense, and it prevents us from solving the biggest problems.

Lucas Perry: Pivoting into ethics more so now. I’m not sure if this is the formal label that you would use but your work on the moral landscape lands you pretty much it seems in the moral realism category.

Sam Harris: Mm-hmm (affirmative).

Lucas Perry: You’ve said something like, “Put your hand in fire to know what bad is.” That seems to disclose or seems to argue about the self intimating nature of suffering about how it’s clearly bad. If you don’t believe me, go and do the suffering things. From other moral realists who I’ve talked to and who argued for moral realism, like Peter Singer, they make similar arguments. What view or theory of consciousness are you most partial to? And how does this inform this perspective about the self intimating nature of suffering as being a bad thing?

Sam Harris: Well, I’m a realist with respect to morality and consciousness in the sense that I think it’s possible not to know what you’re missing. So if you’re a realist, the property that makes the most sense to me is that there are facts about the world that are facts whether or not anyone knows them. It is possible for everyone to be wrong about something. We could all agree about X and be wrong. That’s the realist position as opposed to pragmatism or some other variant, where it’s all just a matter, it’s all a language game, and the truth value of a statement is just the measure of the work it does in conversation. So with respect to consciousness, I’m a realist in the sense that if a system is conscious, if a cricket is conscious, if a sea cucumber is conscious, they’re conscious whether we know it or not. For the purposes of this conversation, let’s just decide that they’re not conscious, the lights are not on in those systems.

Well, that’s a claim that we could believe, we could all believe it, but we could be wrong about it. And so the facts exceed our experience at any given moment. And so it is with morally salient facts, like the existence of suffering. If a system can be conscious whether I know it or not a system can be suffering whether I know it or not. And that system could be me in the future or in some counterfactual state. I could think I’m doing the right thing by doing X. But the truth is I would have been much happier had I done Y and I’ll never know that. I was just wrong about the consequences of living in a certain way. That’s what realism on my view entails. So the way this relates to questions of morality and good and evil and right and wrong, this is back to my analogy of the moral landscape, I think morality really is a navigation problem. There are possibilities of experience in this universe and we don’t even need the concept of morality, we don’t need the concept of right and wrong and good and evil really.

That’s shorthand for, in my view, the way we should talk about the burden that’s on us in each moment to figure out what we should do next. Where should we point ourselves across this landscape of mind and possible minds? And knowing that it’s possible to move in the wrong direction, and what does it mean to be moving in the wrong direction? Well, it’s moving in a direction where everything is getting worse and worse and everything that was good a moment ago is breaking down to no good end. You could conceive of moving down a slope on the moral landscape only to ascend some higher peak. That’s intelligible to me that we might have to all move in the direction that seems to be making things worse but it is a sacrifice worth making because it’s the only way to get to something more beautiful and more stable.

I’m not saying that’s the world we’re living in, but it certainly seems like a possible world. But this just doesn’t seem open to doubt. There’s a range of experience on offer. And, on the one end, it’s horrific and painful and all the misery is without any silver lining, right? It’s not like we learn a lot from this ordeal. No, it just gets worse and worse and worse and worse and then we die, and I call that the worst possible misery for everyone. Alright so, the worst possible misery for everyone is bad if anything is bad, if the word bad is going to mean anything, it has to apply to the worst possible misery for everyone. But now some people come in and think they’re doing philosophy when they say things like, “Well, who’s to say the worst possible misery for everyone is bad?” Or, “Should we avoid the worst possible misery for everyone? Can you prove that we should avoid it?” And I actually think those are unintelligible noises that they’re making.

You can say those words, but I don't think you can actually mean those words. I have no idea what that person actually thinks they're saying. You can play a language game like that, but when you actually look at what the words mean, "the worst possible misery for everyone," to then say, "Well, should we avoid it?" In a world where you should do anything, where the word "should" makes sense, there's nothing that you should do more than avoid the worst possible misery for everyone. By definition, it's more fundamental than the concept of should. What I would argue is if you're hung up on the concept of should, and you're taken in by Hume's flippant and ultimately misleading paragraph on, "You can't get an ought from an is," you don't need oughts then. There is just this condition of is. There's a range of experience on offer, and at the one end it is horrible, and on the other end it is unimaginably beautiful.

And we clearly have a preference for one over the other, if we have a preference for anything. There is no preference more fundamental than escaping the worst possible misery for everyone. If you doubt that, you're just not thinking about how bad things can get. It's incredibly frustrating. In this conversation, you're hearing the legacy of the frustration I've felt in talking to otherwise smart and well educated people who think they're on interesting philosophical ground in doubting whether we should avoid the worst possible misery for everyone. Or that it would be good to avoid it, or perhaps it's intelligible to have other priorities. And, again, I just think that they're not understanding the words "worst possible misery" and "everyone"; they're not letting those words land in language cortex. And if they do, they'll see that there is no other place to stand where you could have other priorities.

Lucas Perry: Yeah. And my brief reaction to that is, I still honestly feel confused about this. So maybe I’m in the camp of frustrating people. I can imagine other evolutionary timelines where there are minds that just optimize for the worst possible misery for everyone, just because in mind space those minds are physically possible.

Sam Harris: Well, that’s possible. We can certainly create a paperclip maximizer that is just essentially designed to make every conscious being suffer as much as it can. And that would be especially easy to do provided that intelligence wasn’t conscious. If it’s not a matter of its suffering, then yeah, we could use AGI to make things awful for everyone else. You could create a sadistic AGI that wanted everyone else to suffer and it derived immense pleasure from that.

Lucas Perry: Or immense suffering. I don't see anything intrinsically motivating about suffering as necessarily navigating a mind away from it. Computationally, I can imagine a mind just suffering as much as possible and spreading that as much as possible. And maybe the suffering is bad in some objective sense, given consciousness realism, and that would disclose the intrinsic valence of consciousness in the universe. But the is-ought distinction there still seems confusing to me. Yes, suffering is bad and maybe the worst possible misery for everyone is bad, but that's not universally motivating for all possible minds.

Sam Harris: The usual problem here is, it's easy for me to care about my own suffering, but why should I care about the suffering of others? That seems to be the ethical stalemate that people worry about. My response there is that it doesn't matter. You can take the view from above there and you can just say, "The universe would be better if all the sentient beings suffered less and it would be worse if they suffered more." And if you're unconvinced by that, you just have to keep turning the dial to separate those two more and more and more and more so that you get to the extremes. If any given sentient being can't be moved to care about the experience of others, well, that's one sort of world, and that's not a peak on the moral landscape. That will be a world where beings are more callous than they would otherwise be in some other corner of the universe. And they'll bump into each other more and there'll be more conflict and they'll fail to cooperate in certain ways that would have opened doors to positive experiences that they now can't have.

And you can try to use moralizing language about all of this and say, “Well, you still can’t convince me that I should care about people starving to death in Somalia.” But the reality is an inability to care about that has predictable consequences. If enough people can’t care about that then certain things become impossible and those things, if they were possible, lead to good outcomes that if you had a different sort of mind, you would enjoy. So all of this bites its own tail in an interesting way when you imagine being able to change a person’s moral intuitions. And then the question is, well, should you change those intuitions? Would it be good to change your sense of what is good? That question has an answer on the moral landscape. It has an answer when viewed as a navigation problem.

Lucas Perry: Right. But isn’t the assumption there that if something leads to a good world, then you should do it?

Sam Harris: Yes. You can even drop your notion of should. I'm sure it's finite, but there's a functionally infinite number of worlds on offer, and there are ways to navigate into those spaces. And there are ways to fail to navigate into those spaces. There are ways to try and fail, and worse still, there are ways to not know what you're missing, to not even know where you should be pointed on this landscape, which is to say, you need to be a realist here. There are experiences that are better than any experience that you are going to have and you are never going to know about them, possible experiences. And granting that, you don't need a concept of should; should is just shorthand for how we speak with one another and try to admonish one another to be better in the future in order to cooperate better or to realize different outcomes. But it's not a deep principle of reality.

What is a deep principle of reality is consciousness and its possibilities. Consciousness is the one thing that can’t be an illusion. Even if we’re in a simulation, even if we’re brains in vats, even if we’re confused about everything, something seems to be happening, and that seeming is the fact of consciousness. And almost as rudimentary as that is the fact that within this space of seemings, again, we don’t know what the base layer of reality is, we don’t know if our physics is the real physics, we could be confused, this could be a dream, we could be confused about literally everything except that in this space of seemings there appears to be a difference between things getting truly awful to no apparent good end and things getting more and more sublime.

And there’s potentially even a place to stand where that difference isn’t so captivating anymore. Certainly, there are Buddhists who would tell you that you can step off that wheel of opposites, ultimately. But even if you buy that, that is some version of a peak on my moral landscape. That is a contemplative peak where the difference between agony and ecstasy is no longer distinguishable because what you are then aware of is just that consciousness is intrinsically free of its content and no matter what its possible content could be. If someone can stabilize that intuition, more power to them, but then that’s the thing you should do, just to bring it back to the conventional moral framing.

Lucas Perry: Yeah. I agree with you. I’m generally a realist about consciousness and still do feel very confused, not just because of reasons in this conversation, but just generally about how causality fits in there and how it might influence our understanding of the worst possible misery for everyone being a bad thing. I’m also willing to go that far to accept that as objectively a bad thing, if bad means anything. But then I still get really confused about how that necessarily fits in with, say, decision theory or “shoulds” in the space of possible minds and what is compelling to who and why?

Sam Harris: Perhaps this is just semantic. Imagine all these different minds that have different utility functions. The paperclip maximizer wants nothing more than paperclips. And anything that reduces paperclips is perceived as a source of suffering. It has a disutility. If you have any utility function, you have this liking and not liking component, provided you're sentient. That's what it is to be motivated consciously. For me, the worst possible misery for everyone is a condition where, whatever the character of your mind, every sentient mind is put in the position of maximal suffering for it. So some minds like paperclips and some minds hate paperclips. If you hate paperclips, we give you a lot of paperclips. If you like paperclips, we take away all your paperclips. If that's your mind, we tune your corner of the universe to be that torture chamber. You can be agnostic as to what the actual things are that make something suffer. It's just that suffering is by definition the ultimate frustration of that mind's utility function.

Lucas Perry: Okay. I think that’s a really, really important crux and crucial consideration between us and a general point of confusion here. Because that’s the definition of what suffering is or what it means. I suspect that those things may be able to come apart. So, for you, maximum disutility and suffering are identical, but I guess I could imagine a utility function being separate or inverse from the hedonics of a mind. Maybe the utility function, which is purely a computational thing, is getting maximally satisfied, maximizing suffering everywhere, and the mind that is implementing that suffering is just completely immiserated while doing it. But the utility function, which is different and inverse from the experience of the thing, is just getting satiated and so the machine keeps driving towards maximum-suffering-world.

Sam Harris: Right, but there’s either something that is liked to be satiated in that way or there isn’t right now. If we’re talking about real conscious society, we’re talking about some higher order satisfaction or pleasure that is not suffering by my definition. We have this utility function ourselves. I mean when you take somebody who decides to climb to the summit of Mount Everest where the process almost every moment along the way is synonymous with physical pain and intermittent fear of death, torture by another name. But the whole project is something that they’re willing to train for, sacrifice for, dream about, and then talk about for the rest of their lives, and at the end of the day might be in terms of their conscious sense of what it was like to be them, the best thing they ever did in their lives.

That is this sort of bilayered utility function you're imagining, whereas if you could just experience-sample what it's like to be in the death zone on Everest, it really sucks, and if imposed on you for any other reason, it would be torture. But given the framing, given what this person believes about what they're doing, given the view out their goggles, given their identity as a mountain climber, this is the best thing they've ever done. You're imagining some version of that, but that fits in my view on the moral landscape. That's not the worst possible misery for anyone. There's a source of satisfaction that is deeper than just bodily, sensory pleasure every moment of the day, or at least there seems to be for that person at that point in time. They could be wrong about that. There could be something better. They don't know what they're missing. It's actually much better to not care about mountain climbing.

The truth is, your aunt is a hell of a lot happier than Sir Edmund Hillary was, and Edmund Hillary was never in a position to know it because he was just so into climbing mountains. That's where the realism comes in, in terms of you not knowing what you're missing. But as I see it, any ultimate utility function, if it's accompanied by consciousness, can't amount to the ultimate frustration of its aims if its aims are being satisfied.

Lucas Perry: I see. Yeah. So this just seems to be a really important point around hedonics and computation and utility function and what drives what. So, wrapping up here, I think I would feel defeated if I let you escape without maybe giving a yes or no answer to this last question. Do you think that bliss and wellbeing can be mathematically defined?

Sam Harris: That is something I have no intuitions about. I'm not enough of a math head to think in those terms. If we mathematically understood what it meant for us neurophysiologically, in our own substrate, well then, I'm sure we could characterize it for creatures just like us. I think substrate independence makes it something that's hard to functionally understand in new systems, and it'll just pose problems for our knowing what it's like to be something that on the outside seems to be functioning much like we do but is organized in a very different way. But yeah, I don't have any intuitions around that one way or the other.

Lucas Perry: All right. And so pointing towards your social media or the best places to follow you, where should we do that?

Sam Harris: My website is just samharris.org and I’m SamHarrisorg without the dot on Twitter, and you can find anything you want about me on my website, certainly.

Lucas Perry: All right, Sam. Thanks so much for coming on and speaking about this wide range of issues. You’ve been deeply impactful in my life since I guess about high school. I think you probably partly at least motivated my trip to Nepal, where I overlooked the Pokhara Lake and reflected on your terrifying acid trip there.

Sam Harris: That’s hilarious. That’s in my book Waking Up, but it’s also on my website and it’s also I think I read it on the Waking Up App and it’s in a podcast. It’s also on Tim Ferriss’ podcast. But anyway, that acid trip was detailed in this piece called Drugs and The Meaning of Life. That’s hilarious. I haven’t been back to Pokhara since, so you’ve seen that lake more recently than I have.

Lucas Perry: So yeah, you’ve contributed much to my intellectual and ethical development and thinking, and for that, I have tons of gratitude and appreciation. And thank you so much for taking the time to speak with me about these issues today.

Sam Harris: Nice. Well, it’s been a pleasure, Lucas. And all I can say is keep going. You’re working on very interesting problems and you’re very early to the game, so it’s great to see you doing it.

Lucas Perry: Thanks so much, Sam.

FLI Podcast: On the Future of Computation, Synthetic Biology, and Life with George Church

Progress in synthetic biology and genetic engineering promises to bring advancements in human health sciences by curing disease, augmenting human capabilities, and even reversing aging. At the same time, such technology could be used to unleash novel diseases and biological agents which could pose global catastrophic and existential risks to life on Earth. George Church, a titan of synthetic biology, joins us on this episode of the FLI Podcast to discuss the benefits and risks of our growing knowledge of synthetic biology, its role in the future of life, and what we can do to make sure it remains beneficial. Will our wisdom keep pace with our expanding capabilities?

Topics discussed in this episode include:

  • Existential risk
  • Computational substrates and AGI
  • Genetics and aging
  • Risks of synthetic biology
  • Obstacles to space colonization
  • Great Filters, consciousness, and eliminating suffering

You can take a survey about the podcast here

Submit a nominee for the Future of Life Award here

 

Timestamps: 

0:00 Intro

3:58 What are the most important issues in the world?

12:20 Collective intelligence, AI, and the evolution of computational systems

33:06 Where we are with genetics

38:20 Timeline on progress for anti-aging technology

39:29 Synthetic biology risk

46:19 George’s thoughts on COVID-19

49:44 Obstacles to overcome for space colonization

56:36 Possibilities for “Great Filters”

59:57 Genetic engineering for combating climate change

01:02:00 George’s thoughts on the topic of “consciousness”

01:08:40 Using genetic engineering to phase out voluntary suffering

01:12:17 Where to find and follow George

 

Citations: 

George Church’s Twitter and website

 

This podcast is possible because of the support of listeners like you. If you found this conversation to be meaningful or valuable consider supporting it directly by donating at futureoflife.org/donate. Contributions like yours make these conversations possible.

All of our podcasts are also now on Spotify and iHeartRadio! Or find us on SoundCloud, iTunes, Google Play, and Stitcher.

You can listen to the podcast above or read the transcript below. 

Lucas Perry: Welcome to the Future of Life Institute Podcast. I’m Lucas Perry. Today we have a conversation with Professor George Church on existential risk, the evolution of computational systems, synthetic-bio risk, aging, space colonization, and more. We’re skipping the AI Alignment Podcast episode this month, but I intend to have it resume again on the 15th of June. Some quick announcements for those unaware, there is currently a live survey that you can take about the FLI and AI Alignment Podcasts. And that’s a great way to voice your opinion about the podcast, help direct its evolution, and provide feedback for me. You can find a link for that survey on the page for this podcast or in the description section of wherever you might be listening. 

The Future of Life Institute is also in the middle of its search for the 2020 winner of the Future of Life Award. The Future of Life Award is a $50,000 prize that we give out to an individual who, without having received much recognition at the time of their actions, has helped to make today dramatically better than it may have been otherwise. The first two recipients of the Future of Life Award were Vasili Arkhipov and Stanislav Petrov, two heroes of the nuclear age. Both took actions at great personal risk to possibly prevent an all-out nuclear war. The third recipient was Dr. Matthew Meselson, who spearheaded the international ban on bioweapons. Right now, we're not sure who to give the 2020 Future of Life Award to. That's where you come in. If you know of an unsung hero who has helped to avoid global catastrophic disaster, or who has done incredible work to ensure a beneficial future of life, please head over to the Future of Life Award page and submit a candidate for consideration. The link for that page is on the page for this podcast or in the description of wherever you might be listening. If your candidate is chosen, you will receive $3,000 as a token of our appreciation. We're also incentivizing the search via MIT's successful red balloon strategy, where the first to nominate the winner gets $3,000 as mentioned, but there are also tiered payouts to the person who invited the nomination winner, and so on. You can find details about that on the page. 

George Church is Professor of Genetics at Harvard Medical School and Professor of Health Sciences and Technology at Harvard and MIT. He is Director of the U.S. Department of Energy Technology Center and Director of the National Institutes of Health Center of Excellence in Genomic Science. George leads Synthetic Biology at the Wyss Institute, where he oversees the directed evolution of molecules, polymers, and whole genomes to create new tools with applications in regenerative medicine and bio-production of chemicals. He helped initiate the Human Genome Project in 1984 and the Personal Genome Project in 2005. George invented the broadly applied concepts of molecular multiplexing and tags, homologous recombination methods, and array DNA synthesizers. His many innovations have been the basis for a number of companies including Editas, focused on gene therapy, Gen9bio, focused on Synthetic DNA, and Veritas Genetics, which is focused on full human genome sequencing. And with that, let’s get into our conversation with George Church.

So I just want to start off here with a little bit of a bigger picture about what you care about most and see as the most important issues today.

George Church: Well, there’s two categories of importance. One are things that are very common and so affect many people. And then there are things that are very rare but very impactful nevertheless. Those are my two top categories. They weren’t when I was younger. I didn’t consider either of them that seriously. So examples of very common things are age-related diseases, infectious diseases. They can affect all 7.7 billion of us. Then on the rare end would be things that could wipe out all humans or all civilization or all living things, asteroids, supervolcanoes, solar flares, and engineered or costly natural pandemics. So those are things that I think are very important problems. Then we have had the research to enhance wellness and minimize those catastrophes. The third category or somewhat related to those two which is things we can do to say get us off the planet, so things would be highly preventative from total failure.

Lucas Perry: So in terms of these three categories, how do you see the current allocation of resources worldwide and how would you prioritize spending resources on these issues?

George Church: Well the current allocation of resources is very different from the allocations that I would set for my own research goals and what I would set for the world if I were in charge, in that there's a tendency to be reactive rather than preventative. And this applies both to therapeutics versus preventatives and the same for environmental and social issues. For all of those, we feel like it somehow makes sense or is more cost-effective, but I think it's an illusion. It's far more cost-effective to do many things preventatively. So, for example, if we had preventatively set up a system of extensive testing for pathogens, we could probably save the world trillions of dollars on one disease alone, COVID-19. I think the same thing is true for global warming. A little bit of preventative environmental engineering, for example in the Arctic where relatively few people would be directly engaged, could save us from disastrous outcomes down the road.

So I think we’re prioritizing a very tiny fraction for these things. Aging and preventative medicine is maybe a percent of the NIH budget, and each institute sets aside about a percent to 5% on preventative measures. Gene therapy is another one. Orphan drugs, very expensive therapies, millions of dollars per dose versus genetic counseling which is now in the low hundreds, soon will be double digit dollars per lifetime.

Lucas Perry: So in this first category of very common widespread issues, do you have any other things there that you would add on besides aging? Like aging seems to be the kind of thing in culture where it’s recognized as an inevitability so it’s not put on the list of top 10 causes of death. But lots of people who care about longevity and science and technology and are avant-garde on these things would put aging at the top because they’re ambitious about reducing it or solving aging. So are there other things that you would add to that very common widespread list, or would it just be things from the top 10 causes of mortality?

George Church: Well infection was the other one that I included in the original list of common diseases. Infectious diseases are not so common in the wealthiest parts of the world, but they are still quite common worldwide; HIV, TB, and malaria are still quite common, with millions of people dying per year. Nutrition is another one that tends to be more common in the poorer parts of the world and still results in death. So the top three would be aging-related.

And even if you’re not interested in longevity and even if you believe that aging is natural, in fact some people think that infectious diseases and nutritional deficiencies are natural. But putting that aside, if we’re attacking age-related diseases, we can use preventative medicine and aging insights into reducing those. So even if you want to neglect longevity that’s unnatural, if you want to address heart disease, strokes, lung disease, falling down, infectious disease, all of those things might be more easily addressed by aging studies and therapies and preventions than by a frontal assault on each micro disease one at a time.

Lucas Perry: And in terms of the second category, existential risk, if you were to rank order the likelihood and importance of these existential and global catastrophic risks, how would you do so?

George Church: Well you can rank their probability based on past records. So, we have some records of supervolcanoes, solar activity, and asteroids. That's one way of calculating probability. And then you can also calculate the impact. So it's a product of the probability and the impact for the various kinds of recorded events. I mean I think they're similar enough that I'm not sure I would rank order those three.

And then pandemics, whether natural or human-influenced, probably a little more common than those first three. And then climate change. There are historic records but it’s not clear that they’re predictive. The probability of an asteroid hitting probably is not influenced by human presence, but climate change probably is and so you’d need a different model for that. But I would say that that is maybe the most likely of the lot for having an impact.
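
As a rough illustration of the probability-times-impact ranking George describes, here is a minimal sketch. The risk names echo the conversation, but the probability and impact numbers are hypothetical placeholders for illustration, not estimates drawn from the discussion.

```python
# Minimal sketch of ranking risks by expected impact (probability x impact).
# The numbers below are hypothetical placeholders, not estimates from this conversation.

risks = {
    # name: (annual probability, impact if it occurs, in arbitrary severity units)
    "supervolcano":    (1e-4, 1e3),
    "asteroid impact": (1e-5, 1e4),
    "solar flare":     (1e-3, 1e2),
    "pandemic":        (3e-2, 1e2),
    "climate change":  (1e-1, 1e2),
}

# Expected impact per year = probability * impact.
expected = {name: p * impact for name, (p, impact) in risks.items()}

for name, value in sorted(expected.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:15s} expected impact/year = {value:.3f}")
```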

Lucas Perry: Okay. The Future of Life Institute, the things that we’re primarily concerned about in terms of this existential risk category would be the risks from artificial general intelligence and superintelligence, also synthetic bio-risk coming up in the 21st century more and more, and then accidental nuclear war would also be very bad, maybe not an existential risk. That’s arguable. Those are sort of our central concerns in terms of the existential risk category.

Relatedly the Future of Life Institute sees itself as a part of the effective altruism community which when ranking global priorities, they have four areas of essential consideration for impact. The first is global poverty. The second is animal suffering. And the third is long-term future and existential risk issues, having to do mainly with anthropogenic existential risks. The fourth one is meta-effective altruism. So I don’t want to include that. They also tend to make the same ranking, being that mainly the long-term risks of advanced artificial intelligence are basically the key issues that they’re worried about.

How do you feel about these perspectives or would you change anything?

George Church: My feeling is that natural intelligence is ahead of artificial intelligence and will stay there for quite a while, partly because synthetic biology has a steeper slope, and I'm including enhanced natural intelligence in synthetic biology. That has a steeper upward slope than totally inorganic computing now. But we can lump those together. We can say artificial intelligence writ large includes anything that our ancestors didn't have in terms of intelligence, which could include enhancing our own intelligence. And I think it especially should include corporate behavior. Corporate behavior is a kind of intelligence which is not natural, is widespread, and is likely to change, mutate, and evolve very rapidly, faster than human generation times, probably faster than machine generation times.

Nukes I think are aging and maybe are less attractive as a defense mechanism. I think they’re being replaced by intelligence, artificial or otherwise, or collective and synthetic biology. I mean I think that if you wanted to have mutually assured destruction, it would be more cost-effective to do that with syn-bio. But I would still keep it on the list.

So I agree with that list. I’d just like nuanced changes to where the puck is likely to be going.

Lucas Perry: I see. So taking into account and reflecting on how technological change in the short to medium term will influence how one might want to rank these risks.

George Church: Yeah. I mean I just think that a collective, enhanced human intelligence is going to be potentially much more disruptive than AI is. That's just a guess. And I think that nukes will just be part of a collection of threatening things that people do. Probably it's more threatening to cause the collapse of an electric grid or a pandemic or some other economic crash than nukes.

Lucas Perry: That’s quite interesting and is very different than the story that I have in my head, and I think will also be very different than the story that many listeners have in their heads. Could you expand and unpack your timelines and beliefs about why you think the\at collective organic intelligence will be ahead of AI? Could you say, I guess, when you would expect AI to surpass collective bio intelligence and some of the reasons again for why?

George Church: Well, I don’t actually expect silicon-based intelligence to ever bypass in every category. I think it’s already super good at storage retrieval and math. But that’s subject to change. And I think part of the assumptions have been that we’ve been looking at a Moore’s law projection while most people haven’t been looking at the synthetic biology equivalent and haven’t noticed that the Moore’s law might finally be plateauing, at least as it was originally defined. So that’s part of the reason I think for the excessive optimism, if you will, about artificial intelligence.

Lucas Perry: The Moore’s law thing has to do with hardware and computation, right?

George Church: Yeah.

Lucas Perry: That doesn’t say anything about how algorithmic efficiency and techniques and tools are changing, and the access to big data. Something we’ve talked about on this podcast before is that many of the biggest insights and jumps in deep learning and neural nets haven’t come from new techniques but have come from more massive and massive amounts of compute on data.

George Church: Agree, but those data are also available to humans as big data. I think maybe the compromise here is that it's some hybrid system. I'm just saying that humans plus big data plus silicon-based computers, even if the hardware stays flat, are going to win over any one of them separately. So maybe what I'm advocating is hybrid systems. Just like in your brain you have different parts that have different capabilities and functionality, in a hybrid system we would have the wisdom of crowds, plus compute engines, plus big data, all available to all the parts of the collective brain.

Lucas Perry: I see. So it’s kind of like, I don’t know if this is still true, but I think at least at some point it was true, that the best teams at chess were AIs plus humans?

George Church: Correct, yeah. I think that’s still true. But I think it will become even more true if we start altering human brains, which we have a tendency to try to do already via education and caffeine and things like that. But there’s really no particular limit to that.

Lucas Perry: I think one of the things that you said was that you don’t think that AI alone will ever be better than biological intelligence in all ways.

George Church: Partly because biological intelligence is a moving target. The first assumption was that the hardware would keep improving on Moore's law, which it isn't. The second assumption was that we would not alter biological intelligence. That is, there was one moving target, silicon, and biology was not moving, when in fact biology is moving at a steeper slope, both in terms of hardware and algorithms and everything else, and we're just beginning to see that. So I think that when you consider both of those, it at least sows the seed of uncertainty as to whether AI is inevitably better than a hybrid system.

Lucas Perry: Okay. So let me just share the kind of story that I have in my head and then you can say why it might be wrong. AI researchers have been super wrong about predicting how easy it would be to make progress on AI in the past. So taking predictions with many grains of salt, if you interview say the top 100 AI researchers in the world, they’ll give a 50% probability of there being artificial general intelligence by 2050. That could be very wrong. But they gave like a 90% probability of there being artificial general intelligence by the end of the century.

And the story in my head says that I expect bioengineering and genetic engineering to continue. I expect there to be designer babies. I expect enhancements to human beings of increasing capacity and quality further and further on as we get into the century. But there are computational and substrate differences between computers and biological intelligence: the clock speed of computers can be much higher, so they can compute much faster. And then there's also the idea that the computational architectures in biological intelligences aren't privileged or uniquely available to biological organisms, such that whatever we think is really good or skillful, or gives biological intelligences a big edge on computers, could simply be replicated in computers.

And then there is an ease of mass manufacturing compute and then emulating those systems on computers such that the dominant and preferable form of computation in the future will not be on biological wetware but will be on silicon. And for that reason at some point there’ll just be a really big competitive advantage for the dominant form of compute and intelligence and life on the planet to be silicon based rather than biological based. What is your reaction to that?

George Church: You very nicely summarized what I think is a dominant worldview of people that are thinking about the future, and I'm happy to give a counterpoint. I'm not super opinionated, but I think it's worth considering both, because the reason we're thinking about the future is we don't want to be blindsided by it. And this could be happening very quickly, by the way, because both revolutions are ongoing, as is the merger.

Now clock speed, my guess is that clock speed may not be quite as important as energy economy. And that's not to say that both systems, let's call them bio and non-bio, can't optimize energy. But if you look back at sort of the history of evolution on earth, the organisms with the fastest clock speeds, like bacteria and fruit flies, aren't necessarily more successful in any sense than humans. They might have more biomass, but I think humans are the only species with our slow clock speed relative to bacteria that are capable of protecting all of the species by taking us to a new planet.

And clock speed is only important if you're in a direct competition in a fairly stable environment where the fastest bacteria win. But worldwide most of the bacteria are actually very slow growers. If you look at energy consumption right now, which both of them can improve, there are biological compute systems that are arguably a million times more energy-efficient, even at tasks the biological system wasn't designed or evolved for but can kind of match. Now there are other things where it's hard to compare because of the intrinsic advantage that either the bio or the non-bio system has. But where they are sort of on the same framework, it takes 100 kilowatts of power to run, say, Jeopardy! or Go on a computer, and the humans that are competing are using considerably less than that, depending on how you count all the things that are required to support the 20 watt brain.

Lucas Perry: What do you think the order of efficiency difference is?

George Church: I think it’s a million fold right now. And this largely a hardware thing. I mean there is algorithmic components that will be important. But I think that one of the advantages that bio chemical systems have is that they are intrinsically atomically precise. While Moore’s law seem to be plateauing somewhere around 3 nanometer fabrication resolution, that’s off by maybe a thousand fold from atomic resolution. So that’s one thing, that as you go out many years, they will either be converging on or merging in some ways so that you get the advantages of atomic precision, the advantages of low energy and so forth. So that’s why I think that we’re moving towards a slightly more molecular future. It may not be recognizable as either our silicon von Neumann or other computers, nor totally recognizable as a society of humans.

Lucas Perry: So is your view that we won't reach artificial general intelligence, the kind of thing which can reason about as well as humans can across all the domains that humans are able to reason about? We won't reach that with non-bio methods of computation first?

George Church: No, I think that we will have AGI in a number of different substrates: mechanical, silicon, quantum computing. Various substrates will be capable of doing artificial general intelligence. It's just that the ones that do it in the most economic way will be the ones that we will tend to use. There'll be some cute museum that will have a collection of all the different ways, like the Tinkertoy computer that played Tic-Tac-Toe. Well, that's in a museum somewhere next to Danny Hillis, but we're not going to be using that for AGI. And I think there'll be a series of artifacts like that, but in practice it will be a very pragmatic collection of things that make economic sense.

So just for example, its easier to make a copy of a biological brain. Now that’s one thing that appears to be an advantage to non-bio computers right now, is you can make a copy of even large data sets for a fairly small expenditure of time, cost, and energy. While, to educate a child takes decades and in the end you don’t have anything totally resembling the parents and teachers. I think that’s subject to change. For example, we have now storage of data in DNA form, which is about a million times denser than any comprable non-chemical, non-biological system, and you can make a copy of it for hundreds of joules of energy and pennies. So you can hold an exabyte of data in the palm of your hand and you can make a copy of it relatively easy.

Now that’s not a mature technology, but it shows where we’re going. If we’re talking 100 years, there’s no particular reason why you couldn’t have that embedded in your brain and input and output to it. And by the way, the cost of copying that is very close to the thermodynamic limit for making copies of bits, while computers are nowhere near that. They’re off by a factor of a million.

Lucas Perry: Let’s see if I get this right. Your view is that there is this computational energy economy benefit. There is this precisional element which is of benefit, and that because there are advantages to biological computation, we will want to merge the best aspects of biological computation with non-biological in order to sort of get best of both worlds. So while there may be many different AGIs on offer on different substrates, the future looks like hybrids.

George Church: Correct. And it’s even possible that silicon is not in the mix. I’m not predicting that it’s not in the mix. I’m just saying it’s possible. It’s possible that an atomically precise computer is better at quantum computing or is better at clock time or energy.

Lucas Perry: All right. So I do have a question later about this kind of thing and space exploration and reducing existential risk via further colonization which I do want to get into later. I guess I don’t have too much more to say about our different stories around here. I think that what you’re saying is super interesting and challenging in very interesting ways. I guess the only thing I would have to say is I guess I don’t know enough, but you said that the computation energy economy is like a million fold more efficient.

George Church: That’s for copying bits, for DNA. For doing complex tasks for example, Go, Jeopardy! or Einstein’s Mirabilis, those kinds of things were typically competing a 20 watt brain plus support structure with a 100 kilowatt computer. And I would say at least in the case of Einstein’s 1905 we win, even though we lose at Go and Jeopardy!, which is another interesting thing, is that humans have a great deal more of variability. And if you take the extreme values like one person in one year, Einstein in 1905 as the representative rather than the average person and the average year for that person, well, if you make two computers, they are going to likely be nearly identical, which is both a plus and a minus in this case. Now if you make Einstein in 1905 the average for humans, then you have a completely different set of goalpost for the AGI than just being able to pass a basic Turing test where you’re simulating someone of average human interest and intelligence.

Lucas Perry: Okay. So two things from my end then. First is, do you expect AGI to first come from purely non-biological silicon-based systems? And then the second thing is no matter what the system is, do you still see the AI alignment problem as the central risk from artificial general intelligence and superintelligence, which is just aligning AIs with human values and goals and intentions?

George Church: I think the further we get from human intelligence, the harder it is to convince ourselves that we can educate them, and the better they will be at fooling us. It doesn't mean they're more intelligent than us. It's just that they're alien. It's like how a wolf can fool us when we're out in the woods.

Lucas Perry: Yeah.

George Church: So I think that even with exceptional humans it's hard to guarantee that we really understand their ethics. If you have someone who is a sociopath or high-functioning autistic, we don't really know, after 20 years of ethics education, whether they are actually thinking about it the same way we are, or even in a way compatible with the way that we are. "We" being in this case neurotypicals, although I'm not sure I am one. But anyway.

I think that this becomes a big problem with AGI, and it may actually put a damper on it. Part of the assumption so far is we won’t change humans because we have to get ethics approval for changing humans. But we’re increasingly getting ethics approval for changing humans. I mean gene therapies are now approved and increasing rapidly, all kinds of neuro-interfaces and so forth. So I think that that will change.

Meanwhile, silicon-based AGI, as we approach it, will change in the opposite direction. It will be harder and harder to get approval to do manipulations in those systems, partly because there's risk, and partly because there's sympathy for the systems. Right now there's very little sympathy for them. But as you get to the point where computers have an AGI level of, say, an IQ of 70, something like that of a severely mentally disabled person, so it can pass the Turing test, then they should start getting the rights of a disabled person. And once they have the rights of a disabled person, that would include the right to not be unplugged and the right to vote. And then that creates a whole bunch of problems that we won't want to address, except as academic exercises or museum specimens where we can say, hey, 50 years ago we created this artificial general intelligence, just like we went to the Moon once. They'd be stunts more than practical demonstrations, because they will have rights and because they will represent risks that will not be true for enhanced human societies.

So I think more and more we’re going to be investing in enhanced human societies and less and less in the uncertain silicon-based. That’s just a guess. It’s based not on technology but on social criteria.

Lucas Perry: I think that it depends what kind of ethics and wisdom that we’ll have at that point in time. Generally I think that we may not want to take conventional human notions of personhood and apply them to things where it might not make sense. Like if you have a system that doesn’t mind being shut off, but it can be restarted, why is it so unethical to shut it off? Or if the shutting off of it doesn’t make it suffer, suffering may be some sort of high level criteria.

George Church: By the same token you can make human beings that don’t mind being shut off. That won’t change our ethics much I don’t think. And you could also make computers that do mind being shut off, so you’ll have this continuum on both sides. And I think we will have sympathetic rules, but combined with the risk, which is the risk that they can hurt you, the risk that if you don’t treat them with respect, they will be more likely to hurt you, the risk that you’re hurting them without knowing it. For example, if you have somebody with locked-in syndrome, you could say, “Oh, they’re just a vegetable,” or you could say, “They’re actually feeling more pain than I am because they have no agency, they have no ability to control their situation.”

So I think creating computers that could have the moral equivalent of locked-in syndrome or some other pain without the ability to announce their pain could be very troubling to us. And we would only overcome it if that were a solution to an existential problem or had some gigantic economic benefit. I’ve already called that into question.

Lucas Perry: So then, in terms of the first AGI, do you have a particular substrate that you imagine that coming online on?

George Church: My guess is it will probably be very close to what we have right now. As you said, it's going to be algorithms and databases and things like that. And it will probably at first be a stunt, in the same sense that Go and Jeopardy! are stunts. It's not clear that those are economically important. A computer that could pass the Turing test will make nice chatbots and phone answering machines and things like that. But beyond that it may not change our world, unless we solve energy issues and so on. So I think, to answer your question, we're so close to it now that it might be based on an extrapolation of current systems.

Quantum computing I think is maybe a more special case thing. Just because it’s good at encryption, encryption is very societal utility. I haven’t yet seen encryption described as something that’s mission critical for space flight or curing diseases, other than the social components of those. And quantum simulation may be beaten by building actual quantum systems. So for example, atomically precise systems that you can build with synthetic biology are quantum systems that are extraordinarily hard to predict, but they’re very easy to synthesize and measure.

Lucas Perry: Is your view here that the first AGI will be on the economic and computational scale of a supercomputer? That is, we're still just leveraging really, really big amounts of data, we haven't made advances in algorithms that make efficiency jump a lot, and instead the current trends continue with more and more data and maybe some algorithmic improvements, so the first system is just really big and clunky and expensive. Then that thing can self-recursively try to make itself cheaper, and the direction it would move in would be increasingly creating hardware which has synthetic bio components?

George Church: Yeah, I’d think that that already exists in a certain sense. We have a hybrid system that is self-correcting, self-improving at an alarming rate. But it is a hybrid system. In fact, it’s such a complex hybrid system that you can’t point to a room where it can make a copy of itself. You can’t even point to a building, possibly not even a state where you can make a copy of this self-modifying system because it involves humans, it involves all kinds of fab labs scattered around the globe.

We could set a goal to be able to do that, but I would argue we’re much closer to achieving that goal with a human being. You can have a room where you only can make a copy of a human, and if that is augmentable, that human can also make computers. Admittedly it would be a very primitive computer if you restricted that human to primitive supplies and a single room. But anyway, I think that’s the direction we’re going. And we’re going to have to get good at doing things in confined spaces because we’re not going to be able to easily duplicate planet Earth, probably going to have to make a smaller version of it and send it off and how big that is we can discuss later.

Lucas Perry: All right. Cool. This is quite perspective shifting and interesting, and I will want to think about this more in general going forward. I want to spend just a few minutes on this next question. I think it’ll just help give listeners a bit of overview. You’ve talked about it in other places. But I’m generally interested in getting a sense of where we currently stand with the science of genetics in terms of reading and interpreting human genomes, and what we can expect on the short to medium term horizon in human genetic and biological sciences for health and longevity?

George Church: Right. The short version is that we have gotten many factors of 10 improvement in speed, cost, accuracy, and interpretability: a 10-million-fold reduction in price, from $3 billion for a poor quality, non-clinical sort of half a genome, in that each of us has two genomes, one from each parent. So we've gone from $3 billion to $300. It will probably be $100 by the middle of the year, and then it will keep dropping. There's no particular second law of thermodynamics or Heisenberg uncertainty stopping us, at least for another million fold. That's where we are in terms of technically being able to read, and for that matter write, DNA.

But on the interpretation side, there are certainly genes that we don't know the function of, and there are diseases that we don't know the cause of. There's a great vast amount of ignorance. But that ignorance may not be as impactful as we sometimes think. It's often said that common diseases, or so-called complex multi-genic diseases, are off in the future. But I would reframe that slightly for everyone's consideration: many of these common diseases are diseases of aging. Not all of them, but many, many of them that we care about. And it could be that attacking aging as a specific research program may be more effective than trying to list all the millions of small genetic changes that have small phenotypic effects on these complex diseases.

So that’s another aspect of the interpretation where we don’t necessarily have to get super good at so called polygenic risk scores. We will. We are getting better at it. But in the end, a lot of the things that we got so excited about in precision medicine, and I’ve been one of the champions of precision medicine since before it was called that, have a potential flaw, which is the tendency to work on reactive cures for specific cancers and inherited diseases and so forth, when the preventative form of it, which could be quite generic and less personalized, might be more cost-effective and humane.

So for example, taking inherited diseases, we have a million to multi-million dollars spent per individual on people with inherited diseases, while a $100 genetic diagnosis could be used to prevent that. And generic solutions like aging reversal or aging prevention might stop cancer more effectively than trying to stop it once it gets to the metastatic stage, which is where a great deal of resources goes. That’s my update on where genomics is. There’s a lot more that could be said.

Lucas Perry: Yeah. As a complete lay person in terms of biological sciences, stopping aging to me sounds like repairing and cleaning up human DNA and the human genome such that information that is lost over time is repaired. Correct me if I’m wrong or explain a little bit about what the solution to aging might look like.

George Church: I think there are two closely related schools of thought. One is that there’s damage that you need to go in and fix, the way you would fix a pothole. And the other is that there’s regulation that informs the system how to fix itself. I believe in both. I tend to focus on the second one.

If you take a very young cell, say a fetal cell, it has a tendency to repair much better than an 80-year-old adult cell. The immune system of a toddler is much more capable than that of a 90-year-old. This isn’t necessarily due to damage. This is due to the epigenetic, so called, regulation of the system. So one cell is convinced that it’s young; I’m going to use some anthropomorphic terms here. So you can take an 80-year-old cell, actually up to 100 years has now been done, and reprogram it into an embryo-like state through, for example, Yamanaka factors, named after Shinya Yamanaka. And that reprogramming resets many, though not all, of the features such that it now behaves like a young non-senescent cell, even though you might have taken it from a 100-year-old fibroblast that would only replicate a few times before it senesced and died.

Things like that seem to convince us that aging is reversible and you don’t have to micromanage it. You don’t have to go in there and sequence the genome and find every bit of damage and repair it. The cell will repair itself.

Now there are some things like, if you delete a gene, it’s gone unless you have a copy of it, in which case you could copy it over. But those cells will probably die off. The same sorts of things can happen in the germline when you’re passing from parent to kid, and the process of weeding them out is not terribly humane right now.

Lucas Perry: Do you have a sense of timelines for progress on aging throughout the century?

George Church: There’s been a lot of wishful thinking for centuries on this topic. But I think we have a wildly different scenario now, partly because of this exponential improvement in technologies, reading and writing DNA, and the list goes on and on in cell biology and so forth. So I think we suddenly have a great deal of knowledge of the causes of aging and ways to manipulate those to reverse it. And I think these are all exponentials and we’re going to act on them very shortly.

We already are seeing some aging drugs, small molecules that are in clinical trials. My lab just published a combination gene therapy that will hit five different diseases of aging in mice and now it’s in clinical trials in dogs and then hopefully in a couple of years it will be in clinical trials in humans.

We’re not talking about centuries here. We’re talking about the sort of time that it takes to get things through clinical trials, which is about a decade. And there’s a lot of stuff going on in parallel, which after one decade of parallel trials would be merging into combined trials. So a couple of decades.

Lucas Perry: All right. So I’m going to get in trouble here if I don’t talk to you about synthetic bio risk. So, let’s pivot into that. What are your views and perspectives on the dangers to human civilization that an increasingly widespread and more advanced science of synthetic biology will pose?

George Church: I think it’s a significant risk. Getting back to the very beginning of our conversation, I think it’s probably one of the most significant existential risks. And I think that preventing it is not as easy as nukes. Not that nukes are easy, but it’s harder. Partly because it’s becoming cheaper and the information is becoming more widespread.

But it is possible. Part of it depends on having many more positive, societally altruistic do-gooders than people who do bad. It would be helpful if we could also make a big impact on poverty, diseases associated with poverty, and psychiatric disorders. The kind of thing that causes unrest and dissatisfaction is what tips the balance, where one rare individual or a small team will do something that would otherwise be unthinkable even for them. But if they’re sociopaths or they are representing a disadvantaged category of people, then they feel justified.

So we have to get at some of those core things. It would also be helpful if we were more isolated. Right now we are a very well mixed pot, which puts us at risk for natural as well as engineered diseases. So if some of us lived in sealed environments on Earth that are very similar to the sealed environments that we would need in space, that would prepare us for going into space, and some of them would actually be in space. And the further we are away from the mayhem of our wonderful current society, the better. If we had a significant fraction of the population that was isolated, either on Earth or elsewhere, it would lower the risk of all of us dying.

Lucas Perry: That makes sense. What are your intuitions about the offense/defense balance on synthetic bio risk? Like if we have 95% to 98% synthetic bio do gooders and a small percentage of malevolent actors or actors who want more power, how do you see the relative strength and weakness of offense versus defense?

George Church: I think as usual it’s a little easier to do offense. It can go back and forth. Certainly it seems easier to defend yourself from an ICBM than from something that could be spread in a cough. And we’re seeing that in spades right now. I think the fraction of white hats versus black hats is much better than 98%, and it has to be. It has to be more like a billion to one. And even then it’s very risky. But yeah, it’s not easy to protect.

Now you can do surveillance so that you can restrict research as best you can, but it’s a numbers game. It’s a combination of removing incentives, adding strong surveillance, and whistleblowers that are not fearful of false positives. The suspicious package in the airport should be something you look at, even though most of them are not actually bombs. We should tolerate a very high rate of false positives. But yes, surveillance is not something we’re super good at. It falls in the category of preventative medicine. And we would far prefer to be reactive, to wait until somebody releases some pathogen and then say, “Oh, yeah, yeah, we can prevent that from happening again in the future.”

Lucas Perry: Is there an opportunity for boosting or beefing up the human immune system, or for public early warning detection systems for powerful and deadly synthetic bio agents?

George Church: Well so, yes is the simple answer. If we boost our immune systems in a public way — which it almost would have to be, there’d be much discussion about how to do that — then pathogens that get around those boosts might become more common. In terms of surveillance, I proposed in 2004 that we had an opportunity, and still do, of doing surveillance on all synthetic DNA. I think that really should be 100% worldwide. Right now it’s 80% or so. It is relatively inexpensive to fully implement, and the fact that we’ve done 80% already gets us closer to that.

Lucas Perry: Yeah. So, funny enough I was actually just about to ask you about that paper that I think you’re referencing. So in 2004 you wrote A Synthetic Biohazard Non-proliferation Proposal, in anticipation of a growing dual use risk of synthetic biology, which proposed in part the sale and registry of certain synthesis machines to verified researchers. If you were to write a similar proposal today, are there some base elements of it you would consider including, especially since the ability to conduct synthetic biology research has vastly proliferated since then? And just generally, are you comfortable with the current governance of dual use research?

George Church: I probably would not change that 2004 white paper very much. Amazingly the world has not changed that much. There still are a very limited number of chemistries and devices and companies, so that’s a bottleneck which you can regulate and is being regulated by the International Gene Synthesis Consortium, IGSC. I did advocate back then and I’m still advocating that we get closer to an international agreement. Two sectors generally in the United Nations have said casually that they would be in favor of that, but we need essentially every level from the UN all the way down to local governments.

There’s really very little pushback today. There was some pushback back in 2004, where the companies’ lawyers felt that they would be responsible or there would be an invasion of the privacy of their customers. But I think eventually the rationale of high risk avoidance won out, so now it’s just a matter of getting full compliance.

One of the unfortunate things is that the better you are at avoiding an existential risk, the less people know about it. In fact, we did so well on Y2K that it’s uncertain whether we needed to do anything about Y2K at all, and hopefully the same thing will be true for a number of disasters that we avoid without most of the population even knowing how close we were.

Lucas Perry: So the main surveillance intervention here would be heavy monitoring, regulation, and tracking of the synthesis machines? And then also a watchdog organization which would inspect the products of said machines?

George Church: Correct.

Lucas Perry: Okay.

George Church: Right now most of the DNA is ordered. You’ll send your order over the internet, and they’ll send back the DNA. Those same principles have to apply to desktop devices. There has to be some kind of approval showing that you are qualified to make a particular DNA before the machine will make it. And it has to be protected against hardware and software hacking, which is a challenge. But again, it’s a numbers game.

Lucas Perry: So on the topic of biological risk, we’re currently in the context of the COVID-19 pandemic. What do you think humanity should take as lessons from COVID-19?

George Church: Well, I think the big one is testing. Testing is probably the fastest way out of it right now. The geographical locations that have pulled out of it fastest were the ones that were best at testing and isolation. If your testing is good enough, you don’t even have to have very good contact tracing, but that’s also valuable. The longer shots are cures and vaccines, and those are not entirely necessary, and they are long-term and uncertain. There’s no guarantee that we will come up with a cure or a vaccine. For example, HIV, TB and malaria do not have great vaccines, and most of them don’t have great stable cures. HIV has had a full series of treatments over time, but not even cures; they’re more maintenance, management.

I sincerely hope that coronavirus is not in that category of HIV, TB, and malaria. But we can’t do public health based on hopes alone. So, testing. I’ve been requesting a bio-weather map, and working towards improving the technology to do so, since around 2002, which was before SARS in 2003. Part of the inspiration for the Personal Genome Project was this bold idea of a bio-weather map. We should be at least as interested in what biology is doing geographically as we are in what the low pressure fronts are doing geographically. It could be extremely inexpensive, certainly relative to the multi-trillion dollar cost of one disease.

Lucas Perry: So given the ongoing pandemic, what has COVID-19 demonstrated about human global systems in relation to existential and global catastrophic risk?

George Church: I think it’s a dramatic demonstration that we’re more fragile than we would like to believe. It’s a demonstration that we tend to be more reactive than proactive or preventative. And it’s a demonstration that we’re heterogeneous: there are geographical regions and political systems that are better prepared. And I would say at this point the United States is probably among the least prepared, and that was predictable by people who thought about this in advance. Hopefully we will be adequately prepared that we will not emerge from this as a third world nation. But that is still a possibility.

I think it’s extremely important to make our human systems, especially global systems more resilient. It would be nice to take as examples the countries that did the best or even towns that did the best. For example, the towns of Vo, Italy and I think Bolinas, California, and try to spread that out to the regions that did the worst. Just by isolation and testing, you can eliminate it. That sort of thing is something that we should have worldwide. To make the human systems more resilient we can alter our bodies, but I think very effective is altering our social structures so that we are testing more frequently, we’re constantly monitoring both zoonotic sources and testing bushmeat and all the places where we’re getting too close to the animals. But also testing our cities and all the environments that humans are in so that we have a higher probability of seeing patient zero before they become a patient.

Lucas Perry: The last category that you brought up at the very beginning of this podcast was preventative measures and part of that was not having all of our eggs in the same basket. That has to do with say Mars colonization or colonization of other moons which are perhaps more habitable and then eventually to Alpha Centauri and beyond. So with advanced biology and advanced artificial intelligence, we’ll have better tools and information for successful space colonization. What do you see as the main obstacles to overcome for colonizing the solar system and beyond?

George Church: So we’ll start with the solar system. Most of the solar system is not pleasant compared to Earth. It’s a vacuum and it’s cold, including Mars and many of the moons. There are moons that have more water, more liquid water than Earth, but it requires some drilling to get down to it typically. There’s radiation. There’s low gravity. And we’re not adaptive.

So we might have to do some biological changes. They aren’t necessarily germline but they’ll be the equivalent. There are things that you could do. You can simulate gravity with centrifuges and you can simulate the radiation protection we have on earth with magnetic fields and thick shielding, equivalent of 10 meters of water or dirt. But there will be a tendency to try to solve those problems. There’ll be issues of infectious disease, which ones we want to bring with us and which ones we want to quarantine away from. That’s an opportunity more than a uniquely space related problem.

A lot of the barriers I think are biological. We need to practice building colonies. Right now we have never had a completely recycled human system. We have completely recycled plant and animal systems, but none that include humans, and that partly has to do with social issues, hygiene and eating practices and so forth. I think that can be done, but it should be tested on Earth, because the consequences of failure on a moon or a non-Earth planet are much more severe than if you test it out on Earth. We should have thousands, possibly millions of little space colonies on Earth; one of my pet projects is making that economically feasible on Earth. Only by heavy testing at that scale will we find the real gotchas and failure modes.

And then the final barrier, which is more in the category that people think about, is the economics. If you do the physics calculation of how much energy it takes to raise a kilogram into orbit or out of orbit, it’s much, much less than the cost per kilogram, orders of magnitude less than what we currently spend. So there’s some opportunity for improvement there. So that’s the solar system.
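To make the “orders of magnitude” point concrete, here is a rough back-of-the-envelope sketch of the ideal energy needed to reach low Earth orbit. The electricity price and launch price at the end are illustrative assumptions added for comparison, not figures from the conversation.

```python
import math

# Rough physics of putting one kilogram into low Earth orbit (LEO).
G = 6.674e-11   # gravitational constant, m^3 kg^-1 s^-2
M = 5.972e24    # mass of Earth, kg
R = 6.371e6     # radius of Earth, m
h = 4.0e5       # orbital altitude, m (roughly ISS altitude)
m = 1.0         # payload mass, kg

r_orbit = R + h
v_orbit = math.sqrt(G * M / r_orbit)            # circular orbital speed
kinetic = 0.5 * m * v_orbit ** 2                # kinetic energy at orbital speed
potential = G * M * m * (1 / R - 1 / r_orbit)   # work to climb to altitude

total_joules = kinetic + potential
total_mj = total_joules / 1e6
kwh = total_joules / 3.6e6
print(f"Ideal energy to LEO: ~{total_mj:.0f} MJ/kg (~{kwh:.1f} kWh/kg)")

# Illustrative comparison (assumed numbers, not from the transcript):
electricity_cost = kwh * 0.10   # assuming ~$0.10 per kWh of grid electricity
launch_price = 2000.0           # assumed rough commercial launch price, $ per kg
print(f"~${electricity_cost:.2f} of electricity vs ~${launch_price:,.0f} per kg launched")
```

Under these assumptions the ideal energy works out to roughly 33 MJ (about 9 kWh) per kilogram, around a dollar's worth of electricity, versus thousands of dollars per kilogram for current launches, which is the gap being described here.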

Outside of the solar system let’s say Proxima B, Alpha Centauri and things of that range, there’s nothing particularly interesting between here and there, although there’s nothing to stop us from occupying the vacuum of space. To get to four and a half light years either requires a revolution in propulsion and sustainability in a very small container, or a revolution in the size of the container that we’re sending.

So, one pet project that I’m working on is trying to make a nanogram-size object that would contain the information sufficient for building a civilization, or at least building a communication device. It’s much easier to accelerate and decelerate a nanogram than it is any of the space probes we currently use, at their scale.

Lucas Perry: Many of the issues that human beings will face within the solar system and beyond are ones that machines, or the synthetic computation that exists today, seem more robust against. Then again, there are the things which you’ve already talked about, like the computational efficiency and precision for self-repair and other kinds of things that modern computers may not have. So I think just a little bit of perspective on that would be useful: why we might not expect that machines would take the place of humans in many of these endeavors.

George Church: Well, so for example, we would be hard pressed to even estimate, and I haven’t seen a good estimate yet, a self-contained device that could make a copy of itself from dirt, or whatever chemicals are available to it on a new planet. But we do know how to do that with humans or hybrid systems.

Here’s a perfect example of a hybrid system: a human can’t just go out into space. It needs a spaceship. A spaceship can’t go out into space either. It needs a human. So making a replicating system seems like a good idea, both because we are replicating systems and because it lowers the size of the package you need to send. So if you want to have a million people in the Alpha Centauri system, it might be easier just to send a few people and a bunch of frozen embryos or something like that.

Sending an artificial general intelligence is not sufficient. It has to also be able to make a copy of itself, which I think is a much higher hurdle than just AGI. I think we will achieve AGI before we achieve AGI plus replication. It may not be much before, but it will probably be before.

In principle, a lot of organisms, including humans, start from single cells and mammals tend to need more support structure than most other vertebrates. But in principle if you land a vertebrate fertilized egg in an aquatic environment, it will develop and make copies of itself and maybe even structures.

So my speculation is that there exists a nanogram cell that’s about the size of a lot of vertebrate eggs; there exists a design for a nanogram that would be capable of dealing with a wide variety of harsh environments. We have organisms that thrive everywhere between the freezing point of water and the boiling point, or 100 plus degrees at high pressure. So you have this nanogram that is adapted to a variety of different environments and can reproduce, make copies of itself, and built into it is a great deal of know-how about building things. The same way that building a nest is built into a bird’s DNA, you could program in an ability to build computers or a radio or laser transmitters so it could communicate and get more information.

So a nanogram could travel at close to the speed of light and then communicate at close to the speed of light once it replicates. I think that illustrates the value of hybrid systems, with in this particular case a high emphasis on the biochemical, biological component that’s capable of replicating as the core thing that you need for efficient transport.

Lucas Perry: If your claim about hybrid systems is true, then if we extrapolate it to say the deep future, then if there’s any other civilizations out there, then the form in which we will meet them will likely also be hybrid systems.

And this point brings me to reflect on something that Nick Bostrom talks about: great filters, which are supposed points in the evolution and genesis of life throughout the cosmos that are very difficult for life to make it through, so almost all things don’t make it past the filter. And this is hypothesized to be a way of explaining the Fermi paradox: why is it that there are hundreds of billions of galaxies and we don’t see any alien superstructures or we haven’t met anyone yet?

So, I’m curious to know if you have any thoughts or opinions on what the main great filters to reaching interstellar civilization might be?

George Church: Of all the questions you’ve asked, this is the one where I’m most uncertain. I study, among other things, how life originated, in particular how we make complex biopolymers, so ribosomes making proteins, for example, and the genetic code. That strikes me as a pretty difficult thing to have arisen. That’s one filter, maybe much earlier than many people would think.

Another one might be lack of interest: once you get to a certain level of sophistication, you’re happy with your life, your civilization, and then typically you’re overrun by someone or something that is more primitive from your perspective. And then they become complacent, and the cycle repeats itself.

Or the misunderstanding of resources. I mean, we’ve seen a number of island civilizations that have gone extinct because they didn’t have a sustainable ecosystem, or they might turn inward. You know, like Easter Island: they got very interested in making statues and tearing down trees in order to do that. And so they ended up with an island that didn’t have any trees. They didn’t use those trees to build ships so they could populate the rest of the planet. They just miscalculated.

So all of those could be barriers. I don’t know which of them it is. There probably are many planets and moons where if we transplanted life, it would thrive there. But it could be that just making life in the first place is hard and then making intelligence and civilizations that care to grow outside of their planet. It might be hard to detect them if they’re growing in a subtle way.

Lucas Perry: I think the first thing you brought up might be earlier than some people expect, but I think for many people thinking about great filters, abiogenesis, if that’s the right word, does seem really hard: getting the first self-replicating things in the ancient oceans going. There seem to be lots of potential filters from there to multi-cellular organisms and then to general intelligences like people and beyond.

George Church: But many empires have just become complacent, and they’ve been overtaken by perfectly obvious technology that they could’ve at least kept up with by spying, if not by invention. But they became complacent. They seem to plateau at roughly the same place. We’re plateauing more or less at the same place the Easter Islanders and the Roman Empire plateaued. Today, I mean, the slight difference is that we are maybe a space-faring civilization now.

Lucas Perry: Barely.

George Church: Yeah.

Lucas Perry: So, climate change has been something that you’ve been thinking about a bunch it seems. You have the Woolly Mammoth Project which we don’t need to necessarily get into here. But are you considering or are you optimistic about other methods of using genetic engineering for combating climate change?

George Church: Yeah, I think genetic engineering has potential. Most of the other things we talk about, putting in LEDs or slightly more efficient car engines, solar power and so forth, are slowing down the inevitable rather than reversing it. To reverse it we need to take carbon out of the air, and a really great way to do that is with photosynthesis, partly because it builds itself. So if we just allow the Arctic to do the photosynthesis the way it used to, we could get a net loss of carbon dioxide from the atmosphere and put it into the ground rather than releasing a lot.

That’s part of the reason that I’m obsessed with Arctic solutions, and the Arctic Ocean is similar. It’s the place where you get upwelling of nutrients, so you get a natural, very high rate of carbon fixation. It’s just that you also have a high rate of carbon consumption back into carbon dioxide. So you could change that cycle a little bit. I think both Arctic land and ocean are very good places to reverse carbon accumulation in the atmosphere, and I think that is best done with synthetic biology.

Now the barriers have historically been release of recombinant DNA into the wild. We now have salmon that humans have engineered which are essentially in the wild, and golden rice is now finally, after more than a decade of tussle, being used in the Philippines.

So I think we’re going to see more and more of that. To some extent even the plants of agriculture are in the wild. This was one of the things that was controversial: the pollen was going all over the place. But I think there are essentially zero examples of recombinant DNA causing human damage. And so we just need to be cautious about our environmental decision making.

Lucas Perry: All right. Now taking kind of a sharp pivot here. In the philosophy of consciousness there is a distinction between the hard problem of consciousness and the easy problems. The hard problem is: why is it that computational systems have something that it is like to be that system? Why is there a first person, phenomenal, experiential perspective filled with what one might call qualia? Some people reject the hard problem as being an actual thing and prefer to say that consciousness is an illusion or is not real. Other people are realists about consciousness, and they believe phenomenal consciousness is substantially real and is on the same ontological or metaphysical footing as other fundamental forces of nature, or that perhaps consciousness discloses the intrinsic nature of the physical.

And then the easy problems are: how is it that we see, how is it that light enters the eyes and gets computed, how is it that certain things are computationally related to consciousness?

David Chalmers calls another problem here the meta-problem of consciousness, which is: why is it that we make reports about consciousness? Why is it that we even talk about consciousness, particularly if it’s an illusion? Maybe it’s serving some kind of weird computational efficiency. And if it is real, there seems to be some tension between the standard model of physics feeling pretty complete and the question of how we would be making reports about something that doesn’t have real causal efficacy, if there’s nothing real to add to the standard model.

Now you have the Human Connectome Project which would seem to help a lot with the easy problems of consciousness and maybe might have something to say about the meta problem. So I’m curious to know if you have particular views on consciousness or how the Human Connectome Project might relate to that interest?

George Church: Okay. So I think that consciousness is real and it has selective advantage. Part of reality to a biologist is evolution, and I think it’s somewhat coupled to free will. I think of them as, even though they are real and hard to think about, maybe easier than we often let on, especially when you think of it from an evolutionary standpoint or from a simulation standpoint.

I can really only evaluate consciousness and the qualia by observations. I can only imagine that you have something similar to what I feel by what you do. And from that standpoint it wouldn’t be that hard to make a synthetic system that displayed consciousness that would be nearly impossible to refute. And as that system replicated and took on a life of its own, let’s say it’s some hybrid biological, non-biological system that displays consciousness, to really convincingly display consciousness it would also have to have some general intelligence or at least pass the Turing test.

But it would have evolutionary advantage in that it could think or reason about itself. It recognizes the difference between itself and something else. And this has been demonstrated already in robots, in admittedly kind of proof-of-concept demos. You have robots that can tell themselves apart from other people in a reflection in a mirror, operate upon their own body by removing dirt from their face, which has only been demonstrated in a handful of animal species, and recognize their own voice.

So you can see how these would have evolutionary advantages, and they could be simulated to whatever level of significance is necessary to convince an objective observer that they are conscious as far as you know, to the same extent that I know that you are.

So I think the hard problem is a worthy one. I think it is real. It has evolutionary consequences. And free will is related, in that free will I think is a matter of game theory: if you behave in a completely deterministic, predictable way, all the organisms around you have an advantage over you. They know that you are going to do a certain thing, so they can anticipate it; they can steal your food, they can bite you, they can do whatever they want. But if you’re unpredictable, which is essentially free will (in this case it can be a random number generator or dice), you now have a selective advantage. And to some extent you could have more free will than the average human, since the average human is constrained by all sorts of social mores and rules and laws and things like that, that something with more free will might not be.

Lucas Perry: I guess I would just want to tease apart self-consciousness from consciousness in general. I think that one can have a first person perspective without having a sense of self or being able to reflect on one’s own existence as a subject in the world. I also feel a little bit confused about why consciousness would provide an evolutionary advantage, where consciousness is the ability to experience things. I guess I have some intuitions about it not being causal, not having causal efficacy, because the standard model doesn’t seem to be missing anything essentially.

And then your point on free will makes sense. I think that people mean very different things here. I think within common discourse, there is a much more spooky version of free will which we can call libertarian free will, which says that you could’ve done otherwise and it’s more closely related to religion and spirituality, which I reject and I think most people listening to this would reject. I just wanted to point that out. Your take on free will makes sense and is the more scientific and rational version.

George Church: Well actually, I could say they could’ve done otherwise. If you consider that religious, that is totally compatible with flipping the coin. That helps you do otherwise. If you could take the same scenario, you could do something differently. And that ability to do otherwise is of selective advantage. As indeed religions can be of a great selective advantage in certain circumstances.

So back to consciousness versus self-consciousness, I think they’re much more intertwined. I’d be cautious about trying to disentangle them too much. I think your ability to reason about your own existence as being separate from other beings is very helpful for say self-grooming, for self-protection, so forth. And I think that maybe consciousness that is not about oneself may be a byproduct of that.

The greater your ability to reason about yourself versus others, your hand versus the piece of wood in your hand, the more successful you are. Even if you’re not super intelligent, just the fact that you’re aware that you’re different from the entity that you’re competing with is an advantage. So I find it not terribly useful to make a giant rift between consciousness and self-consciousness.

Lucas Perry: Okay. So I’m becoming increasingly mindful of your time. We have five minutes left here so I’ve just got one last question for you and I need just a little bit to set it up. You’re vegan as far as I understand.

George Church: Yes.

Lucas Perry: And the effective altruism movement is particularly concerned with animal suffering. We’ve talked a lot about genetic engineering and its possibilities. David Pearce has written something called The Hedonistic Imperative which outlines a methodology and philosophy for using genetic engineering for voluntarily editing out suffering. So that can be done both for wild animals and it could be done for the human species and our descendants.

So I’m curious to know what your view is on animal suffering generally in the world, and do you think about or have thoughts on genetic engineering for wild animal suffering in places outside of human civilization? And then finally, do you see a role for genetic engineering in phasing out human suffering, making it biologically impossible by re-engineering people to operate on gradients of intelligent bliss?

George Church: So for this kind of difficult problem, a technique that I employ is to imagine what this would be like on another planet and in the future, and whether, given that imagined future, we would be willing to come back to where we are now. Rather than asking whether we’re willing to go forward, you ask whether you’re willing to come back. Because there’s a great deal of appropriate respect for inertia and the way things have been. Sometimes it’s called natural, but I think natural includes the future and everything that’s manmade as well; we’re all part of nature. So it’s more about the way things were. Going to the future and asking whether we’d be willing to come back is a different way of looking at it.

I think in going to another planet, we might want to take a limited set of organisms with us, and we might be tempted to make them so that they don’t suffer, including humans. There is a certain amount of, let’s say, pain which could be a little red light going off on your dashboard. But the point of pain is to get your attention. And you could reframe that. People are born with congenital insensitivity to pain, CIPA, genetically, and they tend to get into problems because they will chew their lips and other body parts and get infected, or they will jump from high places because it doesn’t hurt and break things they shouldn’t break.

So you need some kind of alarm system that gets your attention that cannot be ignored. But I think it could be something that people would complain about less. It might even be more effective because you could prioritize it.

I think there’s a lot of potential there. By studying people that have congenital insensitivity to pain, you could even make that something you could turn on and off. SCN9A, for example, is a channel in the human nervous system that can be targeted without the dopey effects of opioids. You can be pain-free without being compromised intellectually. So I think that’s a very promising direction to think about this problem.

Lucas Perry: Just summing that up. You do feel that it is technically feasible to replace pain with some other kind of informationally sensitive thing that could have the same function for reducing and mitigating risk and signaling damage?

George Church: We can even do better. Right now we’re unaware of certain physiological states that can be quite hazardous, and we’re blind to, for example, all the pathogens in the air around us. These could be new signals. It wouldn’t occur to me to make every one of those painful. It would be better just to see the pathogens and have little alarms that go off. It’s much more intelligent.

Lucas Perry: That makes sense. So wrapping up here, if people want to follow your work, or follow you on say Twitter or other social media, where is the best place to check out your work and to follow what you do?

George Church: My Twitter is @geochurch. And my website is easy to find just by Google, but it’s arep.med.harvard.edu. Those are the two best places.

Lucas Perry: All right. Thank you so much for this. I think that a lot of the information you provided about the skillfulness and advantages of biology and synthetic computation will challenge many of the intuitions of our usual listeners and people in general. I found this very interesting and valuable, and yeah, thanks so much for coming on.

George Church: Okay. Great. Thank you.

FLI Podcast: On Superforecasting with Robert de Neufville

Essential to our assessment of risk and ability to plan for the future is our understanding of the probability of certain events occurring. If we can estimate the likelihood of risks, then we can evaluate their relative importance and apply our risk mitigation resources effectively. Predicting the future is, obviously, far from easy — and yet a community of “superforecasters” are attempting to do just that. Not only are they trying, but these superforecasters are also reliably outperforming subject matter experts at making predictions in their own fields. Robert de Neufville joins us on this episode of the FLI Podcast to explain what superforecasting is, how it’s done, and the ways it can help us with crucial decision making. 

Topics discussed in this episode include:

  • What superforecasting is and what the community looks like
  • How superforecasting is done and its potential use in decision making
  • The challenges of making predictions
  • Predictions about and lessons from COVID-19

You can take a survey about the podcast here

Submit a nominee for the Future of Life Award here

 

Timestamps: 

0:00 Intro

5:00 What is superforecasting?

7:22 Who are superforecasters and where did they come from?

10:43 How is superforecasting done and what are the relevant skills?

15:12 Developing a better understanding of probabilities

18:42 How is it that superforecasters are better at making predictions than subject matter experts?

21:43 COVID-19 and a failure to understand exponentials

24:27 What organizations and platforms exist in the space of superforecasting?

27:31 Whats up for consideration in an actual forecast

28:55 How are forecasts aggregated? Are they used?

31:37 How accurate are superforecasters?

34:34 How is superforecasting complementary to global catastrophic risk research and efforts?

39:15 The kinds of superforecasting platforms that exist

43:00 How accurate can we get around global catastrophic and existential risks?

46:20 How to deal with extremely rare risk and how to evaluate your prediction after the fact

53:33 Superforecasting, expected value calculations, and their use in decision making

56:46 Failure to prepare for COVID-19 and if superforecasting will be increasingly applied to critical decision making

01:01:55 What can we do to improve the use of superforecasting?

01:02:54 Forecasts about COVID-19

01:11:43 How do you convince others of your ability as a superforecaster?

01:13:55 Expanding the kinds of questions we do forecasting on

01:15:49 How to utilize subject experts and superforecasters

01:17:54 Where to find and follow Robert

 

Citations: 

The Global Catastrophic Risk Institute

NonProphets podcast

Robert’s Twitter and his blog Anthropocene

If you want to try making predictions, you can try Good Judgement Open or Metaculus

 

This podcast is possible because of the support of listeners like you. If you found this conversation to be meaningful or valuable consider supporting it directly by donating at futureoflife.org/donate. Contributions like yours make these conversations possible.

All of our podcasts are also now on Spotify and iHeartRadio! Or find us on SoundCloud, iTunes, Google Play and Stitcher.

You can listen to the podcast above or read the transcript below. 

Lucas Perry: Welcome to the Future of Life Institute Podcast. I’m Lucas Perry. Today we have a conversation with Robert de Neufville about superforecasting. But, before I get more into the episode I have two items I’d like to discuss. The first is that the Future of Life Institute is looking for the 2020 recipient of the Future of Life Award. For those not familiar, the Future of Life Award is a $50,000 prize that we give out to an individual who, without having received much recognition at the time of their actions, has helped to make today dramatically better than it may have been otherwise. The first two recipients were Vasili Arkhipov and Stanislav Petrov, two heroes of the nuclear age. Both took actions at great personal risk to possibly prevent an all-out nuclear war. The third recipient was Dr. Matthew Meselson, who spearheaded the international ban on bioweapons. Right now, we’re not sure who to give the 2020 Future of Life Award to. That’s where you come in. If you know of an unsung hero who has helped to avoid global catastrophic disaster, or who has done incredible work to ensure a beneficial future of life, please head over to the Future of Life Award page and submit a candidate for consideration. The link for that page is on the page for this podcast or in the description of wherever you might be listening. You can also just search for it directly. If your candidate is chosen, you will receive $3,000 as a token of our appreciation. We’re also incentivizing the search via MIT’s successful red balloon strategy, where the first to nominate the winner gets $3,000 as mentioned, but there are also tiered payouts to the person who invited the nomination winner, and so on. You can find details about that on the page. 

The second item is that there is a new survey I wrote about the Future of Life Institute and AI Alignment Podcasts. It’s been a year since our last survey, and that one was super helpful for me in understanding what’s going well, what’s not, and how to improve. I have some new questions this time around and would love to hear from everyone about possible changes to the introductions, editing, content, and topics covered. So, if you have any feedback, good or bad, you can head over to the SurveyMonkey poll in the description of wherever you might find this podcast or on the page for this podcast. You can answer as many or as few of the questions as you’d like, and it goes a long way in helping me gain perspective about the podcast, which is often hard to do from my end because I’m so close to it. 

And if you find the content and subject matter of this podcast to be important and beneficial, consider sharing it with friends, subscribing on Apple Podcasts, Spotify, or whatever your preferred listening platform, and leaving us a review. It’s really helpful for getting information on technological risk and the future of life to more people.

Regarding today’s episode, I just want to provide a little bit of context. The foundation of risk analysis has to do with probabilities. We use these probabilities and the predicted value lost if certain risks occur to calculate or estimate expected value. This in turn helps us to prioritize risk mitigation efforts to where it’s truly needed. So, it’s important that we’re able to make accurate predictions about the likelihood of future events and risk so that we can take the appropriate action to mitigate them. This is where superforecasting comes in.

Robert de Neufville is a researcher, forecaster, and futurist with degrees in government and political science from Harvard and Berkeley. He works particularly on the risk of catastrophes that might threaten human civilization. He is also a “superforecaster”, since he was among the top 2% of participants in IARPA’s Good Judgment forecasting tournament. He has taught international relations, comparative politics, and political theory at Berkeley and San Francisco State. He has written about politics for The Economist, The New Republic, The Washington Monthly, and Big Think. 

And with that, here’s my conversation with Robert de Neufville on superforecasting. 

All right. Robert, thanks so much for coming on the podcast.

Robert de Neufville: It’s great to be here.

Lucas Perry: Let’s just start off real simply here. What is superforecasting? Say if you meet someone, a friend or family member of yours asks you what you do for work. How do you explain what superforecasting is?

Robert de Neufville: I just say that I do some forecasting. People understand what forecasting is. They may not understand specifically the way I do it. I don’t love using “superforecasting” as a noun. There’s the book Superforecasting. It’s a good book and it’s kind of great branding for Good Judgment, the company, but it’s just forecasting, right, and hopefully I’m good at it and there are other people that are good at it. We have used different techniques, but it’s a little bit like an NBA player saying that they play super basketball. It’s still basketball.

But what I tell people for background is that the US intelligence community had this forecasting competition basically just to see if anyone could meaningfully forecast the future because it turns out one of the things that we’ve seen in the past is that people who supposedly have expertise in subjects don’t tend to be very good at estimating probabilities that things will happen.

So the question was, can anyone do that? And it turns out that for the most part people can’t, but a small subset of people in the tournament were consistently more accurate than the rest. And just using open source information, we were able to decisively beat subject matter experts, which actually isn’t a high bar; they don’t do very well. And we were also able to beat intelligence community analysts. We didn’t originally know we were going up against them, but we’re talking about forecasters in the intelligence community who had access to classified information we didn’t have access to. We were basically just using Google.

And one of the stats that we got later was that as a group we were more accurate 300 days ahead of a question being resolved than others were just a hundred days ahead. As far as what makes the technique of superforecasting sort of fundamentally distinct, I think one of the things is that we have a system for scoring our accuracy. A lot of times when people think about forecasting, people just make pronouncements. This thing will happen or it won’t happen. And then there’s no real great way of checking whether they were right. And they can also often after the fact explain away their forecast. But we make probabilistic predictions and then we use a mathematical formula that weather forecasters have used to score them. And then we can see whether we’re doing well or not well. We can evaluate and say, “Hey look, we actually outperformed these other people in this way.” And we can also then try to improve our forecasting when we don’t do well, ask ourselves why and try to improve it. So that’s basically how I explain it.
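The scoring formula Robert refers to here is, in these tournaments, commonly the Brier score, which weather forecasters originally developed. As a loose illustration only (the forecasts below are made up, and real tournament scoring involves more detail, such as averaging over the days a question is open), here is a minimal sketch of scoring binary probabilistic forecasts:

```python
# Minimal sketch of Brier scoring for binary forecasts.
# Lower is better: 0.0 is a perfect forecast, and in this form an
# always-50% forecaster scores 0.25 on every question.

def brier_score(probability: float, outcome: int) -> float:
    """Squared error between the forecast probability and the 0/1 outcome."""
    return (probability - outcome) ** 2

# Hypothetical forecasts: (probability assigned to "yes", what actually happened)
forecasts = [
    (0.80, 1),  # said 80% likely, it happened
    (0.10, 0),  # said 10% likely, it did not happen
    (0.52, 1),  # a near coin-flip call that resolved "yes"
]

average = sum(brier_score(p, outcome) for p, outcome in forecasts) / len(forecasts)
print(f"Average Brier score: {average:.3f}")
```

Because the score penalizes squared error, it rewards being both well calibrated and decisive, which is what makes it possible to say afterwards who actually outperformed whom.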

Lucas Perry: All right, so can you give me a better understanding here about who “we” is? You’re saying that the key point and where this started was this military competition basically attempting to make predictions about the future or the outcome of certain events. What are the academic and intellectual foundations of superforecasting? What subject areas would one study or did superforecasters come from? How was this all germinated and seeded prior to this competition?

Robert de Neufville: It actually was the intelligence community, although I think military intelligence participated in this. But I mean, I didn’t study to be a forecaster, and I think most of us didn’t. I don’t know if there really has been a formal course of study that would lead you to be a forecaster. People just learn subject matter and then apply that in some way. There must be some training that people had gotten in the past, but I don’t know about it.

There was a famous study by Phil Tetlock. I think in the 90s it came out as a book called Expert Political Judgment, and he found essentially that experts were not good at this. But what he did find, he made a distinction between foxes and hedgehogs you might’ve heard. Hedgehogs are people that have one way of thinking about things, one system, one ideology, and they apply it to every question, just like the hedgehog has one trick and it’s its spines. Hedgehogs didn’t do well. If you were a Marxist or equally a dyed in the wool Milton Friedman capitalist and you applied that way of thinking to every problem, you tended not to do as well at forecasting.

But there’s this other group of people that he found did a little bit better, and he called them foxes. Foxes are tricky. They have all sorts of different approaches. They don’t just come in with some dogmatic ideology. They look at things from a lot of different angles. So that was sort of the initial research that inspired him. And there were other people talking about this, but it was ultimately Phil Tetlock and Barbara Mellers’ group that outperformed everyone else. They looked for people that were good at forecasting, put them together in teams, and aggregated their scores with algorithmic magic.

We had a variety of different backgrounds. If you saw any of the press initially, the big story that came out was that we were just regular people. There was a lot of talk about how so-and-so was a housewife, and that’s true. We weren’t people that had a reputation for being great pundits or anything. That’s totally true. I think that was a little bit overblown, though, because it made it sound like so-and-so was a housewife and no one knew that she had this skill, that otherwise she was completely unremarkable. In fact, superforecasters as a group tended to be highly educated with advanced degrees. They tended to have varied backgrounds, and they lived in a bunch of different countries.

The thing that correlates most with forecasting ability seems to be basically intelligence, performing well on intelligence tests. I should also say that a lot of very smart people aren’t good forecasters. Just being smart isn’t enough, but it’s one of the strongest predictors of forecasting ability, and that’s not as good a story for journalists.

Lucas Perry: So it wasn’t crystals.

Robert de Neufville: If you do surveys of the way superforecasters think about the world, they tend not to do what you would call magical thinking. Some of us are religious. I’m not. But for the most part the divine isn’t an explanation in their forecast. They don’t use God to explain it. They don’t use things that you might consider a superstition. Maybe that seems obvious, but it’s a very rational group.

Lucas Perry: How’s superforecasting done and what kinds of models are generated and brought to bear?

Robert de Neufville: As a group, we tend to be very numerate. That’s one thing that correlates pretty well with forecasting ability. And when I say we come from a lot of backgrounds, I mean there are doctors, pharmacists, engineers. I’m a political scientist. There are actually a fair number of political scientists. Some people are in finance or economics, but they all tend to be people who could make at least a simple spreadsheet model. We’re not all statisticians, but we have at least an intuitive familiarity with statistical thinking and an intuitive concept of Bayesian updating.

As far as what the approach is, we make a lot of simple models, often not very complicated models, because often when you make a complicated model, you end up overfitting the data and drawing falsely precise conclusions, at least when we’re talking about complex, real-world, political science-y kinds of situations. But I would say, and this probably sounds obvious, the best guide for what’s going to happen is what’s happened in similar situations in the past. One of the key things you do, if somebody asks you, “Will so and so win an election?” is to look back and say, “Well, what’s happened in similar elections in the past? What’s the base rate of the incumbent, for example, maybe from this party or that party, winning an election, given this economy and so on?”

Now it is often very hard to beat simple algorithms that try to do the same thing, but that’s not a thing you can just do by rote. It requires an element of judgment about what situations in the past count as similar to the situation you’re trying to ask a question about. In some ways a big part of the trick is figuring out what’s relevant to the situation, trying to understand what past events are relevant, and that’s something that’s hard to teach, because you could make a case for all sorts of things being relevant and there’s an intuitive feel that’s hard to explain to someone else.
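As a loose sketch of the base-rate-first approach Robert describes (this is not his actual method, and every frequency and likelihood ratio below is invented for illustration), one way to formalize "start from what happened in similar situations, then adjust for what's different" is an odds-form Bayesian update:

```python
# Sketch: start from a historical base rate, then nudge it with
# case-specific evidence. All numbers here are made up.

def bayes_update(prior: float, likelihood_ratio: float) -> float:
    """Update a probability by a likelihood ratio using the odds form of Bayes' rule."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Base rate: suppose incumbents won 30 of the last 40 comparable elections.
base_rate = 30 / 40  # 0.75

# Case-specific evidence, expressed as likelihood ratios (judgment calls):
evidence = {
    "economy is in recession": 0.5,    # makes an incumbent win less likely
    "opponent is poorly funded": 1.6,  # makes it more likely
}

p = base_rate
print(f"Base rate: {p:.0%}")
for description, lr in evidence.items():
    p = bayes_update(p, lr)
    print(f"After '{description}': {p:.0%}")
```

The judgment Robert emphasizes lives outside the arithmetic: deciding which past elections count as "comparable" and how strong each piece of evidence really is.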

Lucas Perry: The things that seem to be brought to bear here would be like these formal mathematical models and then the other thing would be what I think comes from Daniel Kahneman and is borrowed by the rationalist community, this idea of system one and system two thinking.

Robert de Neufville: Right.

Lucas Perry: Where system one is the intuitive, the emotional. We catch balls using system one. System one says the sun will come out tomorrow.

Robert de Neufville: Well hopefully the system two does too.

Lucas Perry: Yeah. System two does too. So I imagine some questions are just limited to sort of pen and paper system one, system two thinking, and some are questions that are more suitable for mathematical modeling.

Robert de Neufville: Yeah, I mean some questions are more suitable for mathematical modeling for sure. I would say though that the main system we use is system two. And this is, as you say, we catch balls with some sort of intuitive reflex; it’s sort of maybe not in our prefrontal cortex. If I were trying to calculate the trajectory of a ball in order to catch it, that wouldn’t work very well. But I think most of what we’re doing when we forecast is trying to calculate something else. Often the models are really simple. It might be as simple as saying, “This thing has happened seven times in the last 50 years, so let’s start from the idea there’s a 14% chance of that thing happening again.” It’s analytical. We don’t necessarily just go with the gut and say this feels like a one in three chance.

Now that said, I think that it helps a lot, and this is a problem with applying the results of our work: it helps a lot to have a good intuitive feel for probability, like what one in three feels like, just a sense of how often that is. And superforecasters tend to be people who are able to distinguish between smaller gradations of probability.

I think in general, people that don’t think about this stuff very much have kind of three probabilities: definitely going to happen, might happen, and will never happen. And there’s no finer grained distinction there. Whereas I think superforecasters often feel like they can distinguish between 1% or 2% probabilities, the difference between 50% and 52%.

The sense of what that means I think is a big thing. If we’re going to tell a policymaker there’s a 52% chance of something happening, a big part of the problem is that policymakers have no idea what that means. They’re like, “Well, will it happen or won’t it? What do I do with that number?” Right? How is that different from 50%?

Lucas Perry: All right, so a few things I’m interested in here. The first is I’m interested in what you have to say about what probabilities mean and how one learns how they work. If you were to explain to policymakers, or other interested people who are not familiar with working with probabilities a ton, how one can get a better understanding of them, what would that look like? I feel like that would be interesting and helpful. And then the other thing that I’m sort of interested in getting a better understanding of is that most of what is going on here seems like a lot of system two thinking, but I would also suspect and guess that many of the top superforecasters have very excellent, finely tuned system ones.

Robert de Neufville: Yeah.

Lucas Perry: Curious if you have any thoughts about these two things.

Robert de Neufville: I think that’s true. I mean, I don’t know exactly what counts as system one in the cognitive psych sense, but I do think that there is a feel that you get. It’s like practicing a jump shot or something. I’m sure Steph Curry, not that I’m Steph Curry in forecasting, but sure, Steph Curry, when he takes a shot, isn’t thinking about it at the time. He’s just practiced a lot. And by the same token, if you’ve done a lot of forecasting and thought about it and have a good feel for it, you may be able to look at something and think, “Oh, here’s a reasonable forecast. Here’s not a reasonable forecast.” I had that sense recently. When looking at FiveThirtyEight tracking COVID predictions for a bunch of subject matter experts, and they’re honestly kind of doing terribly. And part of it is that some of the probabilities are just not plausible. And that’s immediately obvious to me. And I think to other forecasters spent a lot of time thinking about it.

So I do think that without even having to do a lot of calculations or a lot of analysis, often I have a sense of what’s plausible, what’s in the right range just because of practice. When I’m watching a sporting event and I’m stressed about my team winning, for years before I started doing this, I would habitually calculate the probability of winning. It’s a neurotic thing. It’s like imposing some kind of control. I think I’m doing the same thing with COVID, right? I’m calculating probabilities all the time to make myself feel more in control. But that actually was pretty good practice for getting a sense of it.

I don’t really have the answer to how to teach that to other people except potentially the practice of trying to forecast and seeing what happens and when you’re right and when you’re wrong. Good Judgment does have some training materials that improved forecasting for people validated by research. They involve things about thinking about the base rate of things happening in the past and essentially going through sort of system two approaches, and I think that kind of thing can also really help people get a sense for it. But like anything else, there’s an element of practice. You can get better or worse at it. Well hopefully you get better.

Lucas Perry: So a risk that is 2% likely is two times more likely than a 1% risk. How do those feel different to you, compared with how they feel to me or to a policymaker who doesn't work with probabilities a ton?

Robert de Neufville: Well I don’t entirely know. I don’t entirely know what they feel like to someone else. I think I do a lot of one time in 50 that’s what 2% is and one time in a hundred that’s what 1% is. The forecasting platform we use, we only work in integer probabilities. So if it goes below half a percent chance, I’d round down to zero. And honestly I think it’s tricky to get accurate forecasting with low probability events for a bunch of reasons or even to know if you’re doing a good job because you have to do so many of them. I think about fractions often and have a sense of what something happening two times in seven might feel like in a way.

Lucas Perry: So you’ve made this point here that superforecasters are often better at making predictions than subject matter expertise. Can you unpack this a little bit more and explain how big the difference is? You recently just mentioned the COVID-19 virologists.

Robert de Neufville: Virologists, infectious disease experts, I don't know all of them, but these are people whose expertise I really admire, who know the most about what's going on and to whom I would turn when trying to make a forecast about some of these questions. And it's not really fair, because these are often people who have talked to FiveThirtyEight for 10 minutes and produced a forecast. They're very busy doing other things, although some of them are doing modeling, and you would think they would have thought about some of these probabilities in advance. But one thing that really stands out when you look at those forecasts is that they'll give a 5% or 10% chance to something that to me is virtually impossible. I don't think it's their better knowledge of virology that makes them think it's more likely; I think it's that I've thought a lot about what 5% or 10% means. They think it's not very likely, so they assign it what sounds like a low number. That's my guess. I don't really know what they're doing.

Lucas Perry: What’s an example of that?

Robert de Neufville: Recently there were questions about how many tests would be positive by a certain date, and they assigned a real chance, like 5% or 10%, I don't remember the exact numbers, but way higher than I thought it should be, to there being fewer than a certain number of positive tests. The problem with that was it would have meant that, all of a sudden, the number of tests coming back positive every day would drop off a cliff: go from, I don't know how many positive tests there are a day, 27,000 in the US, to like 2,000 or 3,000. And we're talking about forecasting a week ahead, so a really short timeline. It was just never plausible to me that tests would suddenly stop turning positive. There's no indication that's about to happen, no reason why that would suddenly shift.

I mean, I can always say maybe there's something a virologist knows that I don't, but I have been reading what they're saying. So why would they think it would go from 25,000 a day to 2,000 a day over the next six days? I'm going to assign that basically a 0% chance.

Another thing that’s really striking, and I think this is generally true and it’s true to some extent of superforecasts, so we’ve had a little bit of an argument on our superforecasting platform, people are terrible at thinking about exponential growth. They really are. They really under predicted the number of cases and deaths even again like a week or two in advance because it was orders of magnitude higher than the number at the beginning of the week. But a computer, they’ve had like an algorithm to fit an exponential curve, would have had no problem doing it. Basically, I think that’s what the good forecasters did is we fit an exponential curve and said, “I don’t even need to know many of the details over the course of a week. My outside knowledge is the progression of the disease and vaccines or whatever isn’t going to make much difference.”

And like I said it’s often hard to beat a simple algorithm, but the virologists and infectious disease experts weren’t applying that simple algorithm, and it’s fair to say, well maybe some public health intervention will change the curve or something like that. But I think they were assigning way too high a probability to the exponential trends stopping. I just think it’s a failure to imagine. You know maybe the Trump administration is motivated reasoning on this score. They kept saying it’s fine. There aren’t very many deaths yet. But it’s easy for someone to project the trajectory a little bit further in the future and say, “Wow, there are going to be.” So I think that’s actually been a major policy issue too is people can’t believe the exponential growth.

Lucas Perry: There’s this tension between not trying to panic everyone in the country or you’re unsure if this is the kind of thing that’s an exponential or you just don’t really intuit how exponentials work. For the longest time, our federal government were like, “Oh, it’s just a person. There’s just like one or two people. They’re just going to get better and that will let go away or something.” What’s your perspective on that? Is that just trying to assuage the populace while they try to figure out what to do or do you think that they actually just don’t understand how exponentials work?

Robert de Neufville: I’m not confident with my theory of mind with people in power. I think one element is this idea that we need to avoid panic and I think that’s probably, they believe in good faith, that’s a thing that we need to do. I am not necessarily an expert on the role of panic in crises, but I think that that’s overblown personally. We have this image of, hey, in the movies, if there’s a disaster, all of a sudden everyone’s looting and killing each other and stuff, and we think that’s what’s going to happen. But actually often in disasters people really pull together and if anything have a stronger sense of community and help their neighbors rather than immediately go and try to steal their supplies. We did see some people fighting over toilet paper on news rolls and there are always people like that, but even this idea that people were hoarding toilet paper, I don’t even think that’s the explanation for why it was out of the stores.

If you tell everyone in the country they need two to three weeks of toilet paper right now, today, then yeah, of course they're going to buy it off the shelf. That's actually just what they need to buy. I haven't seen a lot of panic. And honestly, if I had been an advisor to the administration, I would have said something along the lines of, "It's better to give people accurate information so we can face it squarely than to try to sugarcoat it."

But I also think there was a hope that if we pretended things weren't about to happen, maybe they would just go away, and I think that was misguided. There seems to be some idea that you could reopen the economy and people would just die but the economy would end up being fine. I don't think that would work anyway. Even if you don't shut down, the economy is going to be disrupted by what's happening. So I think there were a bunch of different motivations for why governments weren't honest or weren't dealing squarely with this. It's hard to know what's dishonesty and what's just genuine confusion.

Lucas Perry: So what organizations exist that are focused on superforecasting? Where or what are the community hubs and prediction aggregation mechanisms for superforecasters?

Robert de Neufville: So originally, in the IARPA forecasting tournament, there were a bunch of different competing teams, and one of them was run by a group called Good Judgment. That team did so well that it basically took over the later years of the tournament, and it became the Good Judgment Project. There was then a spinoff: Phil Tetlock and others who were involved spun it off into something called Good Judgment Incorporated. That's the group I work with, and a lot of the superforecasters identified in that original tournament continue to work with Good Judgment.

We do some public forecasting, and I try to find private clients interested in our forecasts. It's really a side gig for me, and part of the reason I do it is that it's really interesting. It gives me an opportunity to think about things, and I feel like I'm much better up on certain issues because I've thought about them as forecasting questions. So there's Good Judgment Inc., and they also have something called Good Judgment Open, an open platform where you can forecast the kinds of questions we do. I should say that on our forecasting platform, they come up with forecastable questions, where forecastable means there are relatively clear resolution criteria.

But also questions you would be interested in knowing the answer to; it wouldn't be just some picky, trivial question. They'll have a set resolution date, so you know that if you're forecasting something happening, it has to happen by a certain date. It's all very well-defined. Coming up with those questions is a bit of its own skill, and it's pretty hard to do. So Good Judgment does that, and they put the questions on a platform where, as a group, we discuss them and give our probability estimates.

We operate to some extent in teams, and they found there's some evidence that teams of forecasters, at least good forecasters, can do a little better than people on their own. I find it very valuable because other forecasters do a lot of research and they critique my ideas. There are concerns about groupthink, but I think we're able to avoid those; I can talk about why if you want. Then there's also the public platform, Good Judgment Open, where they use the same kind of questions and anyone can participate. They've actually identified some new superforecasters who participated on that public platform, people who did exceptionally well, and then invited them to work with the company as well. There are others. I know a couple of superforecasters who are spinning off their own group. They made an app, I think it's called Maybe, where you can do your own forecasting and maybe come up with your own questions, and that's a neat app. There is Metaculus, which certainly tries to apply the same principles, and I know some superforecasters who forecast on Metaculus. I've looked at it a little, but I just haven't had time, because forecasting takes a fair amount of time. And then there are always prediction markets and things like that. There are a number of other things, I think, that try to apply the same principles. I don't know enough about the space to know all of the other platforms and markets that exist.

Lucas Perry: To give some more information on the actual act of forecasting on these platforms, can you take us through something you forecasted recently that ended up being true? How much time did it take you to think about it, what was your actual reasoning, and how many variables did you consider?

Robert de Neufville: Yeah, it varies widely, and to some extent it varies on the basis of how many times I've forecasted something similar. Sometimes we'll forecast changes in interest rates, the Fed's moves. That's something that's obviously of a lot of interest to people in finance, and at this point I've looked at that kind of thing enough times that I have set ideas about what would make it likely or not likely to happen.

But some questions are much harder. We've had questions about mortality in certain age groups in different districts in England, and I didn't know anything about that. All sorts of things come into play. What's the chance the flu season will be bad? Is there a general trend among people dying of complications from diabetes? Does poverty matter? How much would Brexit affect mortality? In the end, a lot of what I did was just look at past data and project trends. Basically, by projecting trends you can get a long way towards an accurate forecast in a lot of circumstances.

Lucas Perry: When such a forecast is made and added to these platforms and the question being predicted resolves, how do the platforms aggregate these predictions? And are the results often put to use, or is the utility of these platforms currently primarily honing the epistemic acuity of the forecasters?

Robert de Neufville: There are a couple of things. I hope my own personal forecasts are pretty accurate. But when we work together on a platform, we essentially produce an aggregate, which is, roughly speaking, the median prediction. There are some proprietary elements to it. They extremize it a little bit, I think, because aggregating kind of blurs things towards the middle, and they may weight certain forecasts and more recent forecasts differently. I don't know the details. But they found that you can improve accuracy a little, beyond just taking the median of our forecasts or the price in a prediction market, by doing a little algorithmic tweaking. That's roughly what happens with our output.
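
The actual aggregation algorithm is proprietary, but here is a rough sketch of the median-plus-extremizing idea in Python; the extremizing exponent is an arbitrary illustrative value, not Good Judgment's real parameter.

```python
import statistics

def aggregate(probabilities, exponent=2.5):
    """Take the median forecast, then push it away from 0.5 to counteract
    the blurring toward the middle that aggregation causes. The exponent
    is an illustrative guess, not the platform's actual parameter."""
    p = statistics.median(probabilities)
    odds = p / (1 - p)             # convert probability to odds
    extremized = odds ** exponent  # extremize in odds space
    return extremized / (1 + extremized)

forecasts = [0.60, 0.65, 0.70, 0.72, 0.80]
print(f"Median:    {statistics.median(forecasts):.2f}")
print(f"Aggregate: {aggregate(forecasts):.2f}")
```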

And then as far as how people use it, I'm afraid not very well. There are people who are interested in Good Judgment's forecasts and who pay them to produce forecasts. But it's not clear to me what decision makers do with them, or whether they know what to do.

I think a big problem with selling forecasting is that people don't know what to do with a 78% chance of this, or, let's say, a 2% chance of a pandemic in a given year. I'm just making that number up, but it's somewhere in that ballpark: what does that mean about how you should prepare? I think people don't know how to work with that. So it's not clear to me that our forecasts are necessarily affecting policy, although they're the kind of thing that gets written up in the news, and who knows how much that affects people's opinions, or they get talked about at Davos and maybe those people go back and change what they're doing.

In certain areas, like finance, I think people know how to work with probabilities a little better. But they also have models that are fairly good at projecting certain types of things, so they're already doing a reasonable job, I think.

I wish it were used better. If I were an advisor to a president, I would say you should create a predictive intelligence unit using superforecasters. Maybe give them access to some classified information, but even using open-source information, have them predict probabilities of certain kinds of things, and then develop a system for using that in your decision making. But I think we're a fair ways away from that. I don't know of any interest in that in the current administration.

Lucas Perry: One obvious leverage point for that would be if you really trusted this group of superforecasters. And the key point for that is just simply how accurate they are. So just generally, how accurate is superforecasting currently? If we took the top 100 superforecasters in the world, how accurate are they over history?

Robert de Neufville: We do keep score, right? But it depends a lot on the difficulty of the question you're asking. If you ask me whether the sun will come up tomorrow, yeah, I'm very accurate. If you ask me to predict the output of a random number generator picking a number out of 100, I'm not very accurate. And it's often hard to know how hard a given question is to forecast.

I have what’s called a Brier score. Essentially a mathematical way of correlating your forecast, the probabilities you give with the outcomes. A lower Brier score essentially is a better fit. I can tell you what my Brier score was on the questions I forecasted in the last year. And I can tell you that it’s better than a lot of other people’s Brier scores. And that’s the way you know I’m doing a good job. But it’s hard to say how accurate that is in some absolute sense.

It’s like saying how good are NBA players and taking jump shots. It depends where they’re shooting from. That said, I think broadly speaking, we are the most accurate. So far, superforecasters had a number of challenges. And I mean I’m proud of this. We pretty much crushed all comers. They’ve tried to bring artificial intelligence into it. We’re still, I think as far as I know, the gold standard of forecasting. But we’re not prophets by any means. Accuracy for us is saying there’s a 15% chance of this thing in politics happening. And then when we do that over a bunch of things, yeah, 15% of them end up happening. It is not saying this specific scenario will definitely come to pass. We’re not prophets. Getting the well calibrated probabilities over a large number of forecasts is the best that we can do, I think, right now and probably in the near future for these complex political social questions.

Lucas Perry: Would it be useful to have some sort of standardized group of expert forecasters rank the difficulty of questions, so that you could better evaluate forecasts and construct a Brier score for each person?

Robert de Neufville: It’s an interesting question. I think I could probably tell you, I’m sure other forecasters could tell you which questions are relatively easier or harder to predict. Things where there’s a clear trend and there’s no good reason for it changing are relatively easy to predict. Things where small differences could make it tip into a lot of different end states are hard to predict. And I can sort of have a sense initially what those would be.

I don’t know what the advantage of ranking questions like that and then trying to do some weighted adjustment. I mean maybe you could. But the best way that I know of to really evaluate forecasting scale is to compare it with other forecasters. I’d say it’s kind of a baseline. What do you know other good forecasters come up with and what do average forecasters come up with? And can you beat prediction markets? I think that’s the best way of evaluating relative forecasting ability. But I’m not sure it’s possible that some kind of weighting would be useful in some context. I hadn’t really thought about it.

Lucas Perry: All right, so you work as a superforecaster, as we've been talking about, but you also have a position at the Global Catastrophic Risk Institute. Can you explain a little how superforecasting and existential and global catastrophic risk analysis are complementary?

Robert de Neufville: A big part of what we produce at GCRI is academic research, and there are a lot of differences. If I say there's a 10% chance of something happening on a forecasting platform, I have an argument for that; I can try to convince you that my rationale is good. But it's not the kind of argument you would make in an academic paper, and it wouldn't convince people it was 100% right. My warrant for saying it on the forecasting platform is that I have a track record: I'm good at figuring out what the correct argument is, or have been in the past. Producing an academic paper is a whole different thing.

There’s some of the same skills, but we’re trying to produce a somewhat different output. What superforecasters say is an input in writing papers about catastrophic risk or existential risk. We’ll use what superforecasters think as a piece of data. That said, superforecasters are validated at doing well at certain category of political, social economic questions. And over a certain timeline, we know that we outperform others up to like maybe two years.

We don’t really know if we can do meaningful forecasting 10 years out. That hasn’t been validated. You can see why that would be difficult to do. You would have to have a long experiment to even figure that out. And it’s often hard to figure out what the right questions to ask about 2030 would be. I generally think that the same techniques we use would be useful for forecasting 10 years out, but we don’t even know that. And so a lot of the things that I would look at in terms of global catastrophic risk would be things that might happen at some distant point in the future. Now what’s the risk that there will be a nuclear war in 2020, but also over the next 50 years? It’s a somewhat different thing to do.

They’re complementary. They both involve some estimation of risk and they use some of the same techniques. But the longer term aspect … The fact that as I think I said, one of the best ways superforecasters do well is that they use the past as a guide to the future. A good rule of thumb is that the status quo is likely to be the same. There’s a certain inertia. Things are likely to be similar in a lot of ways to the past. I don’t know if that’s necessarily very useful for predicting rare and unprecedented events. There is no precedent for an artificial intelligence catastrophe, so what’s the base rate of that happening? It’s never happened. I can use some of the same techniques, but it’s a little bit of a different kind of thing.

Lucas Perry: Two people come to mind here. One is Ray Kurzweil, who has made a lot of long-term technological predictions about things that have no precedent. And I'm also curious whether you've read The Precipice: Existential Risk and the Future of Humanity by Toby Ord, where Toby makes specific estimates of the likelihood of existential and global catastrophic risks. Do you have any perspective or opinion to add on either of these two predictors or their predictions?

Robert de Neufville: Yeah, I’ve read some good papers by Toby Ord. I haven’t had a chance to read the book yet, so I can’t really comment on that. I really appreciate Ray Kurzweil. And one of the things he does that I like is that he holds himself accountable. He’s looked back and said, how accurate are my predictions? Did this come true or did that not come true? I think that is a basic hygiene point of forecasting. You have to hold yourself accountable and you can’t just go back and say, “Look, I was right,” and not rationalize whatever somewhat off forecasts you’ve made.

That said, when I read Kurzweil, I'm skeptical; maybe that's my own inability to handle exponential change. When I look at his predictions for certain years, I think he does a different set of predictions for seven-year periods, I've thought, "Well, he's actually running about seven years ahead." That's pretty good, actually: predicting what things are going to be like in 2020, but just thinking it's going to be 2013. Maybe he gets some credit for that. But I think he is too aggressive and optimistic about the pace of change. Obviously, exponential change can happen quickly.

But I also think another rule of thumb is that things take a long time to get through beta. There's the planning fallacy: people always think that projects are going to take less time than they actually do, and even when you try to compensate for the planning fallacy and double the amount of time, it still takes twice as much time as you come up with. I tend to think Kurzweil sees things happening sooner than they will. He's a little bit of a techno-optimist, obviously. But I haven't gone back and looked at all of his self-evaluations. He scores himself pretty well.

Lucas Perry: So we’ve spoken a bit about the different websites. And what are they technically called, what is the difference between a prediction market and … I think Metaculus calls itself a massive online prediction solicitation and aggregation engine, which is not a prediction market. What are the differences here and how’s the language around these platforms used?

Robert de Neufville: Yeah, so I don’t necessarily know all the different distinction categories someone would make. I think a prediction market particularly is where you have some set of funds, some kind of real or fantasy money. We used one market in the Good Judgement project. Our money was called Inkles and we could spend that money. And essentially, they traded probabilities like you would trade a share. So if there was a 30% chance of something happening on the market, that’s like a price of 30 cents. And you would buy that for 30 cents and then if people’s opinions about how likely that was changed and a lot of people bought it, then we could bid up to 50% chance of happening and that would be worth 50 cents.

So if the market says something has a 30% chance of happening and I correctly realize it's more likely than that, I would buy shares of it. Then eventually either other people would realize it too, or it would happen. I should say that when something happens, you get a dollar, because then it's suddenly a 100% chance of happening.

So if you recognize that something has a higher chance of happening than the market is pricing in, you can buy a share of it and you'll make money on average. It basically functions like a stock market, except that what you're trading is directly the probability that a question will resolve yes or no.
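
A worked version of that arithmetic, in Python, using the hypothetical 30-cent share from the example:

```python
# A contract pays $1.00 if the event happens and $0.00 if it doesn't.
market_price = 0.30   # the market's implied probability: 30%
my_estimate = 0.50    # my estimate of the true probability: 50%

expected_payout = my_estimate * 1.00
expected_profit = expected_payout - market_price
expected_return = expected_profit / market_price
print(f"Expected profit per share: ${expected_profit:.2f} "
      f"({expected_return:.0%} expected return)")
```

The same arithmetic also bears on the liquidity problem described below: a contract priced at 95% when the true chance is more like 99.9% offers only around a 5% expected return, so the limited money in the market tends to chase bigger mispricings instead.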

The stock market’s supposed to be really efficient, and I think in some ways it is. I think prediction markets are somewhat useful. Big problem with prediction markets is that they’re not liquid enough, which is to say that a stock market, there’s so much money going around and people are really just on it to make money, that it’s hard to manipulate the prices.

There’s plenty of liquidity on the prediction markets that I’ve been a part of. Like for the one on the Good Judgement project, for example, sometimes there’d be something that would say there was like a 95% chance of it happening on the prediction market. In fact, there would be like a 99.9% chance of it happening. But I wouldn’t buy that share, even though I knew it was undervalued, because the return on investment wasn’t as high as it was on some other questions. So it would languish at this inaccurate probability, because there just wasn’t enough money to chase all the good investments.

So that’s one problem you can have in a prediction market. Another problem you can have … I see it happen with PredictIt, I think. They used to be the IO Exchange predicting market. People would try to manipulate the market for some advertising reason, basically.

Say you were working on a candidate’s campaign and you wanted to make it look like they were a serious contender, it was a cheap investment and you put a lot of money in the prediction market and you boost their chances, but that’s not really boosting their chances. That’s just market manipulation. You can’t really do that with the whole stock market, but prediction markets aren’t well capitalized, you can do that.

And then, I really enjoy PredictIt. PredictIt is one of the prediction markets that exist for political questions. They have some dispensation so that it doesn't count as gambling in the U.S., because it's for research purposes; there is some research involved with PredictIt. But they have a lot of fees, which they use to pay the people who run the market, and it's expensive. The fees mean that the prices are very sticky, and it's actually pretty hard to make money: probabilities have to be really out of whack before you can make enough to cover your fees.

So things like that make these markets less accurate. And although we've all heard about the wisdom of crowds, and broadly speaking crowds might do better than a random person, they can also show a lot of herding behavior that good forecasters wouldn't, and sometimes crowds overreact to things. I don't always think the probabilities that prediction markets come up with are very good.

Lucas Perry: All right, moving along a bit, and continuing on the relationship of superforecasting to global catastrophic and existential risk: how narrowly do you think we can reduce the error range for superforecasts on low-probability events like global catastrophic and existential risks? If a group of forecasters settled on a point estimate of a 2% chance for some global catastrophic or existential risk, but with an error range of, say, 1%, that dramatically changes how useful the prediction is, because of its major effects on the expected risk. How accurate do you think we can get, and how much do you think we can shrink the probability range?

Robert de Neufville: That’s a really hard question. When we produce forecasts, I don’t think there’s necessarily clear error bars built in. One thing that Good Judgement will do, is it will show where forecasters all agreed the probability is 2% and then it will show if there’s actually a wide variation. I’m thinking 0%, some think it’s 4% or something like that. And that maybe tells you something. And if we had a lot of very similar forecasts, maybe you could look back and say, we tend to have an error of this much. But for the kinds of questions we look at with catastrophic risk, it might really be hard to have a large enough “n”. Hopefully it’s hard to have a large “n” where you could really compute an error range. If our aggregate spits out a probability of 2%, it’s difficult to know in advance for a somewhat unique question how far off we could be.

I don’t spend a lot of time thinking about frequentist or Bayesian interpretations or probability or counterfactuals or whatever. But at some point, if I say it has a 2% probability of something and then it happens, I mean it’s hard to know what my probability meant. Maybe we live in a deterministic universe and that was 100% going to happen and I simply failed to see the signs of it. I think that to some extent, what kind of probabilities you assign things depend on the amount of information you get.

Often we might say that was a reasonable probability to assign to something because we couldn’t get much better information. Given the information we had, that was our best estimate of the probability. But it might always be possible to know with more confidence if we got better information. So I guess one thing I would say is if you want to reduce the error on our forecasts, it would help to have better information about the world.

And that’s some extent where what I do with GCRI comes in. We’re trying to figure out how to produce better estimates. And that requires research. It requires thinking about these problems in a systematic way to try to decompose them into different parts and figure out what we can look at the past and use to inform our probabilities. You can always get better information and produce more accurate probabilities, I think.

The best thing to do would be to think about these issues more carefully. Obviously it's a field; catastrophic risk is something people study, but it's not the most mainstream field. There's a lot of research that needs to be done, a lot of low-hanging fruit: work that could easily be done by applying research from other fields to catastrophic risk issues. But there just aren't enough researchers, and there isn't enough funding to do all the work we should do.

So my answer would be, we need to do better research. We need to study these questions more closely. That’s how we get to better probability estimates.

Lucas Perry: So say we have a global catastrophic or existential risk, and a forecaster says there's a less-than-1% chance that it will occur. If this less-than-1%-likely thing then happens in the world, how does that update our thinking about what the actual likelihood of the risk was? Given the more meta point you touched on, if the universe is deterministic, then the probability of that thing was actually more like 100%, and the information existed somewhere, we just didn't have access to it. Can you add a little commentary here about what these risk probabilities mean?

Robert de Neufville: I guess I don’t think it’s that important when forecasting, if I have a strong opinion about whether or not we live in a single deterministic universe where outcomes are in some sense in the future, all sort of baked in. And if only we could know everything, then we would know with a 100% chance everything that was going to happen. Or whether there are some fundamental randomness, or maybe we live in a multiverse where all these different outcomes are happening, you could say that in 30% of the universes in this multiverse, this outcome comes true. I don’t think that really matters for the most part. I do think as a practical question, we may make forecast on the basis of the best information we have, that’s all you can do. But there are some times you look back and say, “Well, I missed this. I should’ve seen this thing.” I didn’t think that Donald Trump would win the 2016 election. That’s literally my worst Brier score ever. I’m not alone in that. And I comfort myself by saying there was actually genuinely small differences made a huge impact.

But there are other forecasters who saw it better than I did. Nate Silver didn't think Trump was a lock, but he thought a Trump win was more likely, and for the right reasons: that you could get a correlated polling error in a certain set of states that would hand Trump the electoral college. So in retrospect, I think in that case I should've seen something like what Nate Silver did. Now, I don't think in practice it's possible to know enough about an election to be sure in advance who's going to win.

I think we still have to use the tools we have, which are things like polling. In complex situations, there's always stuff I missed when I make a mistake, and I can look back and say I should have done a better job figuring that out. I do think, though, that with the kinds of questions we forecast there's a certain irreducible, I don't want to say randomness, because I'm not taking a position on whether the universe is deterministic, but irreducible uncertainty about what we're realistically able to know, and we have to base our forecasts on the information it's possible to get. I don't think the metaphysical interpretation is that important to figuring out these questions. Maybe it comes up a little more with unprecedented one-off events, but even then I think you're still trying to use the same information to estimate probabilities.

Lucas Perry: Yeah, that makes sense. There’s only the set of information that you have access to.

Robert de Neufville: Something actually occurs to me. One of the things superforecasters are proud of is that we beat intelligence analysts who had access to classified information. We're doing our research on Google, right? Or maybe occasionally we'll write to a government official or file a FOIA request or something, but we're using open-source intelligence. I think it would probably help if we had access to more information to inform our forecasts, but sometimes more information actually hurts you.

People have talked about a classified information bias: if you have secret information that other people don't have, you're likely to think it's more valuable and useful than it actually is, and you overweight it. If you have that secret information, I don't know if it's an ego thing, you want to have a different forecast than people who don't have access to it; it makes you special. You have to be a little careful. More information isn't always better. Sometimes the easy-to-find information is actually dispositive and is enough, and if you search for more, you can find stuff that is irrelevant to your forecast but that you think is relevant.

Lucas Perry: So if there’s some sort of risk and the risk occurs, after the fact how does one update what the probability was more like?

Robert de Neufville: It depends a little bit on the context, if you want to evaluate my prediction. Say I thought there was a 30% chance that the original Brexit vote would be to leave the EU. That actually was more accurate than some other people, but I didn't think it was likely. Now, in hindsight, should I have said 100%? Somebody might argue that I should have, that if I'd really been paying attention, I would have known.

Lucas Perry: But like how do we know it wasn’t 5% and we live in a rare world?

Robert de Neufville: We don’t. You basically can infer almost nothing from an n of 1. Like if I say there’s a 1% chance of something happening and it happens, you can be suspicious that I don’t know what I’m talking about. Even from that n of 1, but there’s also a chance that there was a 1% chance that it happened and that was the 1 time in a 100. To some extent that could be my defense of my prediction that Hillary was going to win. I should talk about my failures. The night before, I thought there was a 97% chance that Hillary would win the election and that’s terrible. And I think that that was a bad forecast in hindsight. But I will say that typically when I’ve said there’s a 97% chance of something happening, they have happened.

I’ve made more than 30-some predictions that things are going to be 97% percent likely and that’s the only one that’s been wrong. So maybe I’m actually well calibrated. Maybe that was the 3% thing that happened. You can only really judge over a body of predictions and if somebody is always saying there’s a 1% chance of things happening and they always happen, then that’s not a good forecaster. But that’s a little bit of a problem when you’re looking at really rare, unprecedented events. It’s hard to know how well someone does at that because you don’t have an n of hopefully more than 1. It is difficult to assess those things.

Now we’re in the middle of a pandemic and I think that the fact that this pandemic happened maybe should update our beliefs about how likely pandemics will be in the future. There was the Spanish flu and the Asian flu and this. And so now we have a little bit more information about the base rate, which these things happen. It’s a little bit difficult because 1918 is very different from 2020. The background rate of risk, may be very different from what it was in 1918 so you want to try to take those factors into account, but each event does give us some information that we can use for estimating the risk in the future. You can do other things. A lot of what we do as a good forecaster is inductive, right? But you can use deductive reasoning. You can, for example, with rare risks, decompose them into the steps that would have to happen for them to happen.

What systems have to fail for a nuclear war to start? What are the steps along the way to a potential artificial intelligence catastrophe? I might be able to estimate the probability of some of those steps more accurately than I can estimate the whole thing. So that gives us some analytic methods to estimate probabilities even without a real base rate for the thing itself.
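
A minimal sketch of that decomposition approach in Python; the step names and conditional probabilities are entirely made up for illustration.

```python
# Hypothetical conditional probabilities for the steps of a rare catastrophe.
steps = {
    "initial trigger occurs this year": 0.05,
    "safeguards fail, given a trigger": 0.10,
    "failure escalates to catastrophe": 0.20,
}

# If each step is conditional on the previous one, the overall probability
# is the product of the conditional probabilities.
overall = 1.0
for description, probability in steps.items():
    overall *= probability
print(f"Estimated overall probability: {overall:.3%}")  # 0.100%
```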

Lucas Perry: So, relating this to actual policy work and doing things in the world: the skillful thing here seems to be to use these probabilities to do expected value calculations, to try to estimate how many resources should go into mitigating certain kinds of risks.

Robert de Neufville: Yeah.

Lucas Perry: The probability of the thing happening requires one kind of forecasting, and the value that would be lost requires another kind of forecasting. What are your perspectives on superforecasting and expected value calculations and their use in decision making, and hopefully someday more substantially in government decision making around risk?

Robert de Neufville: We were talking earlier about the inability of policymakers to understand probabilities. I think one issue is that a lot of times when people make decisions, they want to just say, "What's going to happen? I'm going to plan for the single thing that's going to happen." But as a forecaster, I don't know what's going to happen. If I'm doing a good job, I might know there's a certain percent chance this will happen and a certain percent chance that will happen. In general, I think policymakers need to make decisions over the space of possible outcomes, planning for contingencies. That is a more complicated exercise than a lot of policymakers want to do. I think it does happen, but it requires being able to hold all these contingencies in your mind and plan for them simultaneously. And with expected value calculations, to some extent that's what you have to do.

That gets very complicated very quickly. When we forecast questions, we might forecast some discrete fact about the world, like how many COVID deaths there will be by a certain date. It's neat that I'm good at that, but there's a lot that doesn't tell you about the state of the world at that time. There's a lot of other information that would be valuable for making decisions. I don't want to say infinite, because that may be technically wrong, but there's an essentially uncountable number of things you might want to know, and you might not even know what the relevant questions to ask about a certain space are. So it's always going to be somewhat difficult to do an expected value calculation, because you can't possibly forecast all the things that might determine the value of something.

This is a little bit of a philosophical critique of consequentialist analyses too. If you ask whether something is good or bad, it may have an endless chain of consequences rippling throughout future history; maybe it's really a disaster now, but maybe it means a future Hitler isn't born. How do you evaluate that? It might seem like a silly, trivial point, but it may be really difficult to know enough about the consequences of your action to do an expected value calculation. So your expected value calculation may have to be an approximation in a certain sense, based on the broad things we know are likely to happen. I still think expected value calculations are good; there's just a lot of uncertainty in them, and to some extent it's probably irreducible. I think it's always better to think about things clearly if you can. It's not the only approach, you have to get buy-in from people and that makes a difference, but the more accurate analysis you can do, the better your decisions are likely to be.
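
A small sketch, in Python, of what an expected value calculation over a space of contingencies looks like; the scenarios, probabilities, and costs here are invented purely for illustration.

```python
# Invented scenarios with probabilities (summing to 1) and estimated costs.
scenarios = [
    ("no outbreak",        0.90,     0),
    ("contained outbreak", 0.08,    50),
    ("severe pandemic",    0.02, 5_000),
]

expected_cost = sum(p * cost for _, p, cost in scenarios)
print(f"Expected cost: {expected_cost:.0f}")
# Note how the 2% tail scenario dominates the expected cost, which is why
# planning over the whole space of outcomes, not just the most likely one, matters.
```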

Lucas Perry: How much faith or confidence do you have that the benefits of superforecasting and this kind of thought will increasingly be applied to critical government or non-governmental decision-making processes around risk?

Robert de Neufville: Not as much as I’d like. I think now that we know that people can do a better or worse job of predicting the future, we can use that information and it will eventually begin to be integrated into our governance. I think that that will help. But in general, you know my background’s in political science and political science is, I want to say, kind of discouraging. You learn that even under the best circumstances, outcomes of political struggles over decisions are not optimal. And you could imagine some kind of technocratic decision-making system, but even that ends up having its problems or the technocrats end up just lining their own pockets without even realizing they’re doing it or something. So I’m a little bit skeptical about it and right now what we’re seeing with the pandemic, I think we systematically underprepare for certain kinds of things, that there are reasons why it doesn’t help leaders very much to prepare for things that will never happen.

And with something like a public health crisis, the deliverable is for nothing to happen. If you succeed, it looks like all your money was wasted, when in fact you've prevented anything from happening, and that's great. The problem is that this creates an underincentive for leaders. They don't get credit for preventing the pandemic that no one even knew could have happened, and they don't necessarily win the next election; business leaders may not improve their quarterly profits much by preparing for rare risks, for that and other reasons. I also think we probably have a hard time believing, cognitively, that certain kinds of things that seem crazy, like this, could happen. So I'm somewhat skeptical about that. Now, I think in this case we had institutions that did prepare for this, but for whatever reason a lot of governments failed to do what was necessary.

They failed to respond quickly enough, or minimized what was happening. Some actors are worse than others, right, but this isn't a problem that's just about the US government; it's a problem in Italy, in China. And it's disheartening, because COVID-19 is pretty much exactly one of the major scenarios infectious disease experts have been warning about: a novel coronavirus that jumps from animals to humans, that spreads through a respiratory pathway, that's highly infectious, that spreads asymptomatically. This is something people worried about and knew about, and in a sense it was probably only a matter of time before it happened, even if there was only a small risk in any given year. And yet we weren't ready for it, we didn't take the steps, we lost time that could have been used saving lives. That's really disheartening.

I would like to see us learn a lesson from this, and I think to some extent, once this is all over, whenever that is, we will probably create some institutional structures. But then we have to maintain them; we tend to forget about these kinds of things a generation later. We need to create governance systems that have more incentive to prepare for rare risks. It's not the only thing we should be doing necessarily, but we are underprepared. That's my view.

Lucas Perry: Yeah, and I mean the sample size of historic pandemics is quite good, right?

Robert de Neufville: Yeah. It’s not like we were invaded by aliens. Something like this happens in just about every person’s lifetime. It’s historically not that rare and this is a really bad one, but the Spanish flu and the Asian flu were also pretty bad. We should have known this was coming.

Lucas Perry: What I’m also reminded here of and some of these biases you’re talking about, we have climate change on the other hand, which is destabilizing and kind of global catastrophic risky, depending on your definition and for people who are against climate change, there seems to be A) lack in trust of science and B) then not wanting to invest in expensive technologies or something that seemed wasteful. I’m just reflecting here on all of the biases that fed into our inability to prepare for COVID.

Robert de Neufville: Well, I don’t think the distrust of science is sort of a thing that’s out there. I mean, maybe to some extent it is, but it’s also a deliberate strategy that people with interests in continuing, for example, the fossil fuel economy, have deliberately tried to cloud the issue to create distrust in science to create phony studies that make it seem that climate change isn’t real. We thought a little bit about this at GCRI about how this might happen with artificial intelligence. You can imagine that somebody with a financial interest might try to discredit the risks and make it seem safer than it is, and maybe they even believe that to some extent, nobody really wants to believe that the thing that’s getting them a lot of money is actually evil. So I think distrust in science really isn’t an accident and it’s a deliberate strategy and it’s difficult to know how to combat it. There are strategies you can take, but it’s a struggle, right? There are people who have an interest in keeping scientific results quiet.

Lucas Perry: Yeah. Do you have any thoughts then about how we could increase the uptake of using forecasting methodologies for all manner of decision making? It seems like generally you’re pessimistic about it right now.

Robert de Neufville: Yeah, I am a little pessimistic about it. One thing is that we've tried to get people interested in our forecasts, and a lot of people just don't know what to do with them. One thing I think is interesting is that often people aren't interested in my saying, "There's a 78% chance of something happening." What they want to know is, how did I get there? What are my arguments? That's not unreasonable. I really like thinking in terms of probabilities, but it often helps people to understand the mechanism, because that tells them something about the world that might help them make a decision. So one thing that maybe can be done is not to treat it as a black-box probability, but to have some kind of transparency about our thinking, because that actually helps people and might be more useful for making decisions than just a number.

Lucas Perry: So is there anything else here that you want to add about COVID-19 in particular? General information or intuitions you have about how things will go, what the next year will look like? There is tension in the federal government about reopening; there's an eagerness to do that, to restart the economy. The US federal government and the state governments seem totally unequipped to do the kind of testing and contact tracing that is being done in successful areas like South Korea. Sometime in the short to medium term we'll reopen, and there might be a second wave, and it's going to take a year or so for a vaccine. What are your intuitions, feelings, or forecasts about what the next year will look like?

Robert de Neufville: Again, with the caveat that I'm not a virologist or an expert in vaccine development and things like that, I have thought about this a lot. I think there was a fantasy, and still is a fantasy, that we're going to have what they call a V-shaped recovery. Everything crashed really quickly; everyone started filing for unemployment as all the businesses shut down. This virus economics is very different from other types of financial crises. But there was this fantasy that we would put everything on pause, put the economy into some kind of cryogenic freeze, somehow keep people able to pay their bills for a certain amount of time, and then after a few months we'd get some kind of therapy or vaccine, or the disease would die down or be suppressed somehow, and then we would give the economy a jolt of adrenaline and everyone would be back in their old jobs and things would go back to normal. I really don't think that is what's going to happen. It is almost thermodynamically harder to put things back together than it is to break them. There are things about the US economy in particular, like the fact that in order to keep getting paid, in many cases you actually need to lose your job and go on unemployment. It's not seamless. It's hard even to get through on the phone lines or to get the funding.

I think that even after a few months, the US economy is going to look like a town that’s been hit by a hurricane and we’re going to have to rebuild a lot of things. And maybe unemployment will go down faster than it did in previous recessions where it was more about a bubble popping or something, but I just don’t think that we go back to normal.

I also just don’t think we go back to normal in a broader sense. This idea that we’re going to have some kind of cure. Again, I’m not a virologist, but I don’t think we typically have a therapy that cures viruses the way you know antibiotics might be super efficacious against bacteria. Typically, viral diseases, I think are things we have to try to mitigate and some cocktail may improve treatments and we may figure out better things to do with ventilators. Well, you might get the fatality rate down, but it’s still going to be pretty bad.

And then there's this idea that maybe we'll have a vaccine. I've heard people who know more than I do say it might be possible to get a vaccine by November. But the problem is, until you can simulate with a supercomputer what happens in the human body, you can't really speed up biological trials. You have to culture things in people, and that takes time.

You might say, well, let’s don’t do all the trials, this is an emergency. But the fact is, if you don’t demonstrate that a vaccine is safe and efficacious, you could end up giving something to people that has serious adverse effects, or even makes you more susceptible to disease. That was problem one of the SARS vaccines they tried to come up with. Originally, is it made people more susceptible. So you don’t want to hand out millions and millions of doses of something that’s going to actually hurt people, and that’s the danger if you skip these clinical trials. So it’s really hard to imagine a vaccine in the near future.

I don’t want to sell short human ingenuity because we’re really adaptable, smart creatures, and we’re throwing all our resources at this. But, there is a chance that there is really no great vaccine for this virus. We haven’t had great luck with finding vaccines for coronaviruses. It seems to do weird things to the human immune system and maybe there is evidence that immunity doesn’t stick around that long. It’s possible that we come up with a vaccine that only provides partial immunity and doesn’t last that long. And I think there is a good chance that essentially we have to keep social distancing well into 2021 and that this could be a disease that remains dangerous and we have to continue to keep fighting for years potentially.

I think that we’re going to open up and it is important to open up as soon as we can because what’s happening with the economy will literally kill people and cause famines. But on the other hand, we’re going to get outbreaks that come back up again. You know it’s going to be a like fanning coals if we open up too quickly and in some places we’re not going to get it right and that doesn’t save anyone’s life. I mean, if it starts up again and the virus disrupts the economy again. So I think this is going to be a thing we are struggling to find a balance to mitigate and that we’re not going to go back to December 2019 for a while, not this year. Literally, it may be years.

And although humans have an amazing capacity to forget things and go back to normal life, I think that we’re going to see permanent changes. I don’t know exactly what they are, but I think we’re going to see permanent changes in the way we live. And I don’t know if I’m ever shaking anyone’s hand again. We’ll see about that. A whole generation of people are going to be much better at washing their hands.

Lucas Perry: Yeah. I’ve already gotten a lot better at washing my hands watching tutorials.

Robert de Neufville: I was terrible at it. I had no idea how bad I was.

Lucas Perry: Yeah, same. I hope people who have shaken my hand in the past aren’t listening. So the things that will stop this are sufficient herd immunity, to some extent, or a vaccine that is efficacious. Those seem like the “okay, it’s about time to go back to normal” points, right?

Robert de Neufville: Yeah.

Lucas Perry: And a vaccine is not a given, considering the class of coronavirus diseases and how they behave?

Robert de Neufville: Yeah. Now this is where I really feel like I’m not a virologist, but eventually diseases evolve and we co-evolve with them. Whatever the Spanish Flu was, it didn’t continue to kill as many people years down the line. I think that’s because people did develop immunity.

But also, viruses don’t get any evolutionary advantage from killing their hosts. They want to use us to reproduce. Well, they don’t want anything, but that advantages them. If they kill us and make us use mitigation strategies, that hurts their ability to reproduce. So in the long run, and I don’t know how long that run is, but eventually we co-evolve with it and it becomes endemic instead of epidemic and it’s presumably not as lethal. But, I think that it is something that we could be fighting for a while.

There are chances of additional disasters happening on top of it. We could get another disease popping out of some animal population while our immune systems are weak, or something like that. So we should probably be rethinking the way we interact with caves full of bats and live pangolins.

Lucas Perry: All right. We just need to be prepared for the long haul here.

Robert de Neufville: Yeah, I think so.

Lucas Perry: I’m not sure that most people understand that.

Robert de Neufville: I don’t think they do. I mean, I guess I don’t have my finger on the pulse and I’m not interacting with people anymore, but I don’t think people want to understand it. It’s hard. I had plans. I did not intend to be staying in my apartment. Your health, and the health of others, is more important, but it’s hard to face that we may be dealing with a very different new reality.

This thing, the opening up in Georgia, is just completely insane to me. Their case numbers have been slowing, but if the outbreak is shrinking, it seems to be only by a little bit. To me, when they talk about opening up, it sounds like they’re saying, well, we reduced the extent of this forest fire by 15%, so we can stop fighting it now. It’s just going to keep growing. You have to actually stamp it out, or get really close to it, before you can stop fighting it. I think people want to stop fighting the disease sooner than we should because it sucks. I don’t want to be doing this.

Lucas Perry: Yeah, it’s a new sad fact and there is a lot of suffering going on right now.

Robert de Neufville: Yeah. I feel really lucky to be in a place where there aren’t a lot of cases, but I worry about family members in other places and I can’t imagine what it’s like in places where it’s bad.

I mean, in Hawaii, people in the hospitality industry and tourism industry have all lost their jobs all at once and they still have to pay our super expensive rent. Maybe that’ll be waived and they won’t be evicted. But, that doesn’t mean they can necessarily get medications and feed their family. And all of these are super challenging for a lot of people.

Nevermind that other people are in the position of, they’re lucky to have jobs, but they’re maybe risking getting an infection going to work, so they have to make this horrible choice. And maybe they have someone with comorbidities or who is elderly living at home. This is awful. So I understand why people really want to get past this part of it soon.

Was it Dr. Fauci who said, “The virus has its own timeline”?

One of the things I think that this may be teaching us, it’s certainly reminding me that humans are not in charge of nature, not the way we think we are. We really dominate the planet in a lot of ways, but it’s still bigger than us. It’s like the ocean or something. You know? You may think you’re a good swimmer, but if you get a big wave, you’re not in control anymore and this is a big wave.

Lucas Perry: Yeah. So back to the point of general superforecasting. Suppose you’re a really good superforecaster and you’re finding well-defined things to make predictions about, which is, as you said, sort of hard to do and you have carefully and honestly compared your predictions to reality and you feel like you’re doing really well.

How do you convince other people that you’re a great predictor when almost everyone else is making lots of vague predictions and cherry-picking their successes, or when interest groups are biasing and obscuring things to try to have a seat at the table? Or, for example, if you want to compare yourself to someone else who has been keeping careful track as well, how do you do that technically?

Robert de Neufville: I wish I knew the answer to that question. I think it is probably a long process of building confidence, communicating reasonable forecasts, and having people see that they were pretty accurate. People trust something like FiveThirtyEight, Nate Silver, or Nick Cohen, or someone like that, because they have been communicating for a while and people can now see it. They have this track record and they also are explaining how it happens, how they get to those answers. And at least a lot of people started to trust what Nate Silver says. So I think something like that really is the long-term strategy.

But, I think it’s hard because a lot of times there is always someone who is saying every different thing at any given time. And if somebody says there is definitely a pandemic going to happen, and they do it in November 2019, then a lot of people may think, “Wow, that person’s a prophet and we should listen to them.”

To my mind, if you were saying that in November of 2019, that wasn’t a great prediction. I mean, you turned out to be right, but you didn’t have good reasons for it. At that point, it was still really uncertain unless you had access to way more information than as far as I know anyone had access to.

But, you know sometimes those magic tricks where somebody throws a dart at something and happens to hit the bullseye might be more convincing than an accurate probabilistic forecast. I think that in order to sell the accurate probabilistic forecasts, you really need to build a track record of communication and build confidence slowly.
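
For readers wondering what comparing track records “technically” can look like: one standard tool is a proper scoring rule such as the Brier score. Robert doesn’t name it here, so take this as a minimal illustrative sketch with made-up forecasts rather than a description of his method.

```python
# Minimal sketch: comparing two forecasters with the Brier score
# (a standard proper scoring rule; lower is better). The forecasts
# and outcomes below are invented for illustration.

def brier_score(forecasts, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Probabilities each forecaster assigned to "event happens" on the same questions.
alice = [0.9, 0.2, 0.7, 0.3]
bob   = [0.6, 0.4, 0.5, 0.5]  # hedges toward 50% on everything
outcomes = [1, 0, 1, 0]       # what actually happened

print("Alice:", brier_score(alice, outcomes))  # 0.0575
print("Bob:  ", brier_score(bob, outcomes))    # 0.205
```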

Lucas Perry: All right, that makes sense.

So on prediction markets and prediction aggregators, they’re pretty well set up to treat questions like will X happen by Y date where X is some super well-defined thing. But lots of things we’d like to know are not really of this form. So what are other useful forms of question about the future that you come across in your work and what do you think are the prospects for training and aggregating skilled human predictors to tackle them?

Robert de Neufville: What are the other forms of questions? There is always a trade-off when designing questions between the rigor of the question, how easy it is to say whether it turned out to be true or not, and how relevant it is to things you might actually want to know. That’s often difficult to balance.

I think that in general we need to be thinking more about questions, so I wouldn’t say here is the different type of question that we should be answering. But rather, let’s really try to spend a lot of time thinking about the questions. What questions could be useful to answer? I think just that exercise is important.

I think things like science fiction are important where they brainstorm a possible scenario and they often fill it out with a lot of detail. But, I often think in forecasting, coming up with very specific scenarios is kind of the enemy. If you come up with a lot of things that could plausibly happen and you build it into one scenario and you think this is the thing that’s going to happen, well the more specific you’ve made that scenario, the less likely it is to actually be the exact right one.

We need to do more thinking about spaces of possible things that could happen, ranges of things, different alternatives rather than just coming up with scenarios and anchoring on them as the thing that happens. So I guess I’d say more questions and realize that at least as far as we’re able to know, I don’t know if the universe is deterministic, but at least as far as we are able to know, a lot of different things are possible and we need to think about those possibilities and potentially plan for them.

Lucas Perry: All right. And so, let’s say you had 100 professors with deep subject matter expertise in, say, 10 different subjects, and you had 10 superforecasters. How would you make use of all of them, and on what sorts of topics would you consult which group or combination of groups?

Robert de Neufville: That’s a good question. I think we bash on subject matter experts because they’re bad at producing probabilistic forecasts. But the fact is that I completely depend on subject matter experts. When I try to forecast what’s going to happen on the pandemic, I am reading all the virologists and infectious disease experts because I don’t know anything about this. I mean, I know I get some stuff wrong. Although, I’m in a position where I can actually ask people, hey what is this, and get their explanations for it.

But, I would like to see them working together. To some extent, that means having some of the subject matter experts recognize that we may know some things about estimating probabilities that they don’t. But also, the more I can communicate with people who know specific facts about things, the better the forecasts I can produce are. I don’t know what the best system for that is. I’d like to see more communication. But, I also think you could do some kind of thing where you put them in a room or on a team together to produce forecasts.

When I’m forecasting, typically, I come up with my own forecast and then I see what other people have said. But, I do that so as not to anchor on somebody else’s opinion and to avoid groupthink. You’re more likely to get groupthink if you have a leader and a team that everyone defers to and then they all anchor on whatever the leader’s opinion is. So, I try to form my own independent opinion.

But, I think some kind of Delphi technique, where people come up with their own ideas, then share them, and then revise their ideas, could be useful, and you could involve subject matter experts in that. I would love to be able to just sit and talk with epidemiologists about this stuff. I don’t know if they would love talking to me as much. But I think that would help us collectively produce better forecasts.
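
A toy sketch of the Delphi-style process described above, where forecasters first estimate independently and then revise toward the group view. The probabilities and the half-weight update rule are invented for illustration; Robert doesn’t specify a particular procedure.

```python
# Toy Delphi-style aggregation: independent estimates, then a revision round
# where each forecaster moves partway toward the group median. The update
# weight and the example probabilities are made up for illustration.

from statistics import median

def delphi_round(estimates, weight=0.5):
    """Each forecaster keeps (1 - weight) of their own view and moves
    `weight` of the way toward the group median."""
    m = median(estimates)
    return [(1 - weight) * e + weight * m for e in estimates]

# Round 1: independent probability estimates for some event.
estimates = [0.10, 0.25, 0.30, 0.70]
print(median(estimates))        # group view after round 1: 0.275
print(delphi_round(estimates))  # revised individual views for round 2
```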

Lucas Perry: I am excited and hopeful for the top few percentage of superforecasters being integrated into more decision making about key issues. All right, so you have your own podcast.

Robert de Neufville: Yeah.

Lucas Perry: If people are interested in following you or looking into more of your work at the Global Catastrophic Risk Institute, for example, or following your podcast or following you on social media, where can they do that?

Robert de Neufville: Go to the Global Catastrophic Risk Institute’s website, gcrinstitute.org, where you can see and read about our work. It’s super interesting and, I believe, super important. We’re doing a lot of work now on artificial intelligence risk. There has been a lot of interest in that. But, we also talk about nuclear war risk, and I think there is going to be a new interest in pandemic risk. So these are things that we think about. I also do have a podcast. I co-host it with two other superforecasters, and it sometimes becomes sort of a forecasting-politics variety hour. But we have a good time, we do some interviews with other superforecasters, and we’ve also talked to people about existential risk and artificial intelligence. That’s called NonProphets. We have a blog, nonprophetspod.wordpress.org. Nonprophets is spelled N-O-N-P-R-O-P-H-E-T-S, prophet as in someone who sees the future, because we are not prophets. However, there is also another podcast, which I’ve never listened to and feel like I should, that has the same name: an atheist podcast out of Texas run by atheist comedians. I apologize for taking their name, but we’re not them, in case there is any confusion. One of the things about forecasting is that it’s super interesting and a lot of fun, at least for people like me, to think about things in this way, and there are ways, like Good Judgment Open, that you can do it too. So we talk about that. It’s fun. And I recommend everyone get into forecasting.

Lucas Perry: All right. Thanks so much for coming on, and I hope that more people take up forecasting. It’s a pretty interesting lifelong thing that you can participate in, seeing how well you do over time as your predictions keep resolving against actual real-world events. I hope that more people take this up and that it gets further and more deeply integrated into communities of decision makers on important issues.

Robert de Neufville: Yeah. Well, thanks for having me on. It’s a super interesting conversation. I really appreciate talking about this stuff.

AI Alignment Podcast: An Overview of Technical AI Alignment in 2018 and 2019 with Buck Shlegeris and Rohin Shah

 Topics discussed in this episode include:

  • Rohin’s and Buck’s optimism and pessimism about different approaches to aligned AI
  • Traditional arguments for AI as an x-risk
  • Modeling agents as expected utility maximizers
  • Ambitious value learning and specification learning/narrow value learning
  • Agency and optimization
  • Robustness
  • Scaling to superhuman abilities
  • Universality
  • Impact regularization
  • Causal models, oracles, and decision theory
  • Discontinuous and continuous takeoff scenarios
  • Probability of AI-induced existential risk
  • Timelines for AGI
  • Information hazards

Timestamps: 

0:00 Intro

3:48 Traditional arguments for AI as an existential risk

5:40 What is AI alignment?

7:30 Back to a basic analysis of AI as an existential risk

18:25 Can we model agents in ways other than as expected utility maximizers?

19:34 Is it skillful to try and model human preferences as a utility function?

27:09 Suggestions for alternatives to modeling humans with utility functions

40:30 Agency and optimization

45:55 Embedded decision theory

48:30 More on value learning

49:58 What is robustness and why does it matter?

01:13:00 Scaling to superhuman abilities

01:26:13 Universality

01:33:40 Impact regularization

01:40:34 Causal models, oracles, and decision theory

01:43:05 Forecasting as well as discontinuous and continuous takeoff scenarios

01:53:18 What is the probability of AI-induced existential risk?

02:00:53 Likelihood of continuous and discontinuous takeoff scenarios

02:08:08 What would you both do if you had more power and resources?

02:12:38 AI timelines

02:14:00 Information hazards

02:19:19 Where to follow Buck and Rohin and learn more

 

Works referenced: 

AI Alignment 2018-19 Review

Takeoff Speeds by Paul Christiano

Discontinuous progress investigation by AI Impacts

An Overview of Technical AI Alignment with Rohin Shah (Part 1)

An Overview of Technical AI Alignment with Rohin Shah (Part 2)

Alignment Newsletter

Intelligence Explosion Microeconomics

AI Alignment: Why It’s Hard and Where to Start

AI Risk for Computer Scientists

 

We hope that you will continue to join in the conversations by following us or subscribing to our podcasts on Youtube, Spotify, SoundCloud, iTunes, Google Play, Stitcher, iHeartRadio, or your preferred podcast site/application. You can find all the AI Alignment Podcasts here.

You can listen to the podcast above or read the transcript below. 

Note: The following transcript has been edited for style and clarity.

 

Lucas Perry: Welcome to the AI Alignment Podcast. I’m Lucas Perry. Today we have a special episode with Buck Shlegeris and Rohin Shah that serves as a review of progress in technical AI alignment over 2018 and 2019. This episode serves as an awesome bird’s-eye view of the varying focus areas of technical AI alignment research and also helps to develop a sense of the field. I found this conversation to be super valuable for helping me to better understand the state and current trajectory of technical AI alignment research. This podcast covers traditional arguments for AI as an x-risk, what AI alignment is, the modeling of agents as expected utility maximizers, iterated distillation and amplification, AI safety via debate, agency and optimization, value learning, robustness, scaling to superhuman abilities, and more. The structure of this podcast is based on Rohin’s AI Alignment Forum post titled AI Alignment 2018-19 Review. That post is an excellent resource to take a look at in addition to this podcast. Rohin also had a conversation with us just about a year ago titled An Overview of Technical AI Alignment with Rohin Shah. This episode serves as a follow-up to that overview and as an update on what’s been going on in the field. You can find a link for it on the page for this episode.

Buck Shlegeris is a researcher at the Machine Intelligence Research Institute. He tries to work to make the future good for sentient beings and currently believes that working on existential risk from artificial intelligence is the best way of doing this. Buck worked as a software engineer at PayPal before joining MIRI, and was the first employee at Triplebyte. He previously studied at the Australian National University, majoring in CS and minoring in math and physics, and he has presented work on data structure synthesis at industry conferences.

Rohin Shah is a 6th year PhD student in Computer Science at the Center for Human-Compatible AI at UC Berkeley. He is involved in Effective Altruism and was the co-president of EA UC Berkeley for 2015-16 and ran EA UW during 2016-2017. Out of concern for animal welfare, Rohin is almost vegan because of the intense suffering on factory farms. He is interested in AI, machine learning, programming languages, complexity theory, algorithms, security, and quantum computing to name a few. Rohin’s research focuses on building safe and aligned AI systems that pursue the objectives their users intend them to pursue, rather than the objectives that were literally specified. He also publishes the Alignment Newsletter, which summarizes work relevant to AI alignment. The Alignment Newsletter is something I highly recommend that you follow in addition to this podcast.  

And with that, let’s get into our review of AI alignment with Rohin Shah and Buck Shlegeris.

To get things started here, the plan is to go through Rohin’s post on the Alignment Forum, AI Alignment 2018-19 Review. We’ll be using this to structure the conversation and to move methodically through things that have changed or been updated in 2018 and 2019, using those as jumping-off points for discussion. So then, Rohin, you can start us off by going through this document. Let’s start at the beginning, and we’ll move through sequentially and jump in where necessary or where there is interest.

Rohin Shah: Sure, that sounds good. I think I started this out by talking about this basic analysis of AI risk that’s been happening for the last couple of years. In particular, you have these traditional arguments, so maybe I’ll just talk about the traditional argument first, which basically says that the AI systems that we’re going to build are going to be powerful optimizers. When you optimize something, you tend to get these sort of edge case outcomes, these extreme outcomes that are a little hard to predict ahead of time.

You can’t just rely on tests with less powerful systems in order to predict what will happen, and so you can’t rely on your normal common sense reasoning in order to deal with this. In particular, powerful AI systems are probably going to look like expected utility maximizers due to various coherence arguments, like the Von Neumann–Morgenstern rationality theorem, and these expected utility maximizers have convergent instrumental sub-goals, like not wanting to be switched off because then they can’t achieve their goal, and wanting to accumulate a lot of power and resources.

The standard argument goes, because AI systems are going to be built this way, they will have these convergent instrumental sub-goals. This makes them dangerous because they will be pursuing goals that we don’t want.

Lucas Perry: Before we continue too much deeper into this, I’d want to actually start off with a really simple question for both of you. What is AI alignment?

Rohin Shah: Different people mean different things by it. When I use the word alignment, I’m usually talking about what has been more specifically called intent alignment, which is basically aiming for the property that the AI system is trying to do what you want. It’s trying to help you. Possibly it doesn’t know exactly how to best help you, and it might make some mistakes in the process of trying to help you, but really what it’s trying to do is to help you.

Buck Shlegeris: To say what I mean by AI alignment, I guess I would step back a little bit, and think about why it is that I care about this question at all. I think that the fundamental fact which has me interested in anything about powerful AI systems of the future is that I think they’ll be a big deal in some way or another. And when I ask myself the question “what are the kinds of things that could be problems about how these really powerful AI systems work or affect the world”, one of the things which feels like a problem is that we might not know how to apply these systems reliably to the kinds of problems which we care about, and so by default humanity will end up applying them in ways that lead to really bad outcomes. And so I guess, from that perspective, when I think about AI alignment, I think about trying to make ways of building AI systems such that we can apply them to tasks that are valuable, such that they’ll reliably pursue those tasks instead of doing something else which is really dangerous and bad.

I’m fine with intent alignment as the focus. I kind of agree with, for instance, Paul Christiano, that it’s not my problem if my AI system incompetently kills everyone, that’s the capabilities people’s problem. I just want to make the system so that it’s trying to cause good outcomes.

Lucas Perry: Both of these understandings of what it means to build beneficial AI or aligned AI systems can take us back to what Rohin was just talking about, where there’s this basic analysis of AI risk, about AI as powerful optimizers and the associated risks there. With that framing and those definitions, Rohin, can you take us back into this basic analysis of AI risk?

Rohin Shah: Sure. The traditional argument looks like AI systems are going to be goal-directed. If you expect that your AI system is going to be goal-directed, and that goal is not the one that humans care about, then it’s going to be dangerous because it’s going to try to gain power and resources with which to achieve its goal.

If the humans tried to turn it off, it’s going to say, “No, don’t do that,” and it’s going to try to take actions that avoid that. So it pits the AI and the humans in an adversarial game with each other, and you ideally don’t want to be fighting against a superintelligent AI system. That seems bad.

Buck Shlegeris: I feel like Rohin is to some extent setting this up in a way that he’s then going to argue is wrong, which I think is kind of unfair. In particular, Rohin, I think you’re making these points about VNM theorems and stuff to set up the fact that it seems like these arguments don’t actually work. I feel that this makes it kind of unfairly sound like the earlier AI alignment arguments are wrong. I think this is an incredibly important question, of whether early arguments about the importance of AI safety were quite flawed. My impression is that overall the early arguments about AI safety were pretty good. And I think it’s a very interesting question whether this is in fact true. And I’d be interested in arguing about it, but I think it’s the kind of thing that ought to be argued about explicitly.

Rohin Shah: Yeah, sure.

Buck Shlegeris: And I get that you were kind of saying it narratively, so this is only a minor complaint. It’s a thing I wanted to note.

Rohin Shah: I think my position on that question of “how good were the early AI risk arguments” is that people’s internal beliefs about why AI was supposed to be risky were probably good, but the things they wrote down were not very good. Some things were good and some things weren’t. I think Intelligence Explosion Microeconomics was good. I think AI Alignment: Why It’s Hard and Where to Start was misleading.

Buck Shlegeris: I think I agree with your sense that people probably had a lot of reasonable beliefs but that the written arguments seem flawed. I think another thing that’s true is that random people like me who were on LessWrong in 2012 or something, ended up having a lot of really stupid beliefs about AI alignment, which I think isn’t really the fault of the people who were thinking about it the best, but is maybe sociologically interesting.

Rohin Shah: Yes, that seems plausible to me. Don’t have a strong opinion on it.

Lucas Perry: To provide a little bit of framing here and better analysis of basic AI x-risk arguments, can you list what the starting arguments for AI risk were?

Rohin Shah: I think I am reasonably well portraying what the written arguments were. Underlying arguments that people probably had would be something more like, “Well, it sure seems like if you want to do useful things in the world, you need to have AI systems that are pursuing goals.” If you have something that’s more like tool AI, like Google Maps, that system is going to be good at the one thing it was designed to do, but it’s not going to be able to learn and then apply its knowledge to new tasks autonomously. It sure seems like if you want to do really powerful things in the world, like run companies or make policies, you probably do need AI systems that are constantly learning about their world and applying their knowledge in order to come up with new ways to do things.

In the history of human thought, we just don’t seem to know of a way to cause that to happen except by putting goals in systems, and so probably AI systems are going to be goal-directed. And one way you can formalize goal-directedness is by thinking about expected utility maximizers, and people did a bunch of formal analysis of that. Mostly going to ignore it because I think you can just say all the same thing with the idea of pursuing goals and it’s all fine.

Buck Shlegeris: I think one important clarification to that, is you were saying the reason that tool AIs aren’t just the whole story of what happens with AI is that you can’t apply it to all problems. I think another important element is that people back then, and I now, believe that if you want to build a really good tool, you’re probably going to end up wanting to structure that as an agent internally. And even if you aren’t trying to structure it as an agent, if you’re just searching over lots of different programs implicitly, perhaps by training a really large recurrent policy, you’re going to end up finding something agent shaped.

Rohin Shah: I don’t disagree with any of that. I think we were using the words tool AI differently.

Buck Shlegeris: Okay.

Rohin Shah: In my mind, if we’re talking about tool AI, we’re imagining a pretty restricted action space where no matter what actions in this action space are taken, with high probability, nothing bad is going to happen. And you’ll search within that action space, but you don’t go to arbitrary action in the real world or something like that. This is what makes tool AI hard to apply to all problems.

Buck Shlegeris: I would have thought that’s a pretty non-standard use of the term tool AI.

Rohin Shah: Possibly.

Buck Shlegeris: In particular, I would have thought that restricting the action space enough that you’re safe, regardless of how much it wants to hurt you, seems kind of non-standard.

Rohin Shah: Yes. I have never really liked the concept of tool AI very much, so I kind of just want to move on.

Lucas Perry: Hey, it’s post-podcast Lucas here. I just want to highlight a little bit of clarification that Rohin was interested in adding, which is that he thinks “tool AI evokes a sense of many different properties, and he doesn’t know which properties most people are usually thinking about, so as a result he prefers not to use the phrase tool AI and would instead like to use more precise terminology. He doesn’t necessarily feel, though, that the concepts underlying tool AI are useless.” So let’s tie things back to these basic arguments for x-risk that many people are familiar with, which have to do with convergent instrumental sub-goals and the difficulty of specifying and aligning systems with our goals and what we actually care about in our preference hierarchies.

One of the things Buck seemed to be bringing up is that you may have been narratively setting up the Von Neumann–Morgenstern theorem, which sets up AIs as expected utility maximizers, and that you are going to argue that this argument, which is sort of the formalization of these earlier AI risk arguments, is less convincing to you now than it was before, while Buck still thinks that these arguments are strong. Could you unpack this a little bit more, or am I getting this right?

Rohin Shah: To be clear, I also agree with Buck, that the spirit of the original arguments does seem correct, though, there are people who disagree with both of us about that. Basically, the VNM theorem roughly says, if you have preferences over a set of outcomes, and you satisfy some pretty intuitive axioms about how you make decisions, then you can represent your preferences using a utility function such that your decisions will always be, choose the action that maximizes the expected utility. This is, at least in writing, given as a reason to expect that AI systems would be maximizing expected utility. The thing is, when you talk about AI systems that are acting in the real world, they’re just selecting a universe history, if you will. Any observed behavior is compatible with the maximization of some utility function. Utility functions are a really, really broad class of things when you apply it to choosing from universe histories.
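
For reference, the VNM conclusion Rohin summarizes can be written compactly as follows (a standard statement, lightly paraphrased; the notation is an editorial choice, not anything from the conversation):

```latex
% If preferences over lotteries satisfy the VNM axioms (completeness,
% transitivity, continuity, independence), then there is a utility function
% U over outcomes such that lottery A is preferred to lottery B exactly when
% its expected utility is higher, and choices look like expected utility
% maximization.
\exists\, U : \mathcal{O} \to \mathbb{R} \ \text{ such that } \
A \succ B \iff \mathbb{E}_{o \sim A}[U(o)] > \mathbb{E}_{o \sim B}[U(o)],
\qquad
a^{*} = \arg\max_{a}\ \mathbb{E}\,[\,U(o) \mid a\,].
```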

Buck Shlegeris: An intuitive example of this: suppose that you see that every day I walk home from work in a really inefficient way. It’s impossible to know whether I’m doing that because I happened to really like that path. For any sequence of actions that I take, there’s some utility functions such that that was the optimal sequence of actions. And so we don’t actually learn anything about how my policy is constrained based on the fact that I’m an expected utility maximizer.

Lucas Perry: Right. If I only had access to your behavior and not your insides.

Rohin Shah: Yeah, exactly. If you have a robot twitching forever, that’s all it does, there is a utility function over a universe history that says that is the optimal thing to do. Every time the robot twitches to the right, it’s like, yeah, the thing that was optimal to do at that moment in time was twitching to the right. If at some point somebody takes a hammer and smashes the robot and it breaks, then the utility function that corresponds to that being optimal is like, yeah, that was the exact right moment to break down.

If you have these pathologically complex utility functions as possibilities, every behavior is compatible with maximizing expected utility, you might want to say something like, probably we’ll have the simple utility maximizers, but that’s a pretty strong assumption, and you’d need to justify it somehow. And the VNM theorem wouldn’t let you do that.
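
A tiny sketch of the rationalization move Rohin and Buck describe: for any fixed behavior, you can write down a utility function over histories that makes that behavior optimal. The twitching-robot policy and the code are purely illustrative.

```python
# Sketch: any fixed policy is "optimal" for some utility function over
# histories. We simply define U(history) = 1 if the history is exactly what
# the policy would produce, and 0 otherwise. The policy below is arbitrary.

def twitch_policy(step):
    """An arbitrary behavior: twitch right on even steps, left on odd steps."""
    return "right" if step % 2 == 0 else "left"

def rationalizing_utility(history):
    """A utility function over histories that makes twitch_policy look optimal."""
    return 1.0 if all(a == twitch_policy(t) for t, a in enumerate(history)) else 0.0

# The policy's own trajectory gets utility 1; any deviation gets 0, so the
# "twitching robot" maximizes expected utility under this (pathological) U.
trajectory = [twitch_policy(t) for t in range(6)]
print(rationalizing_utility(trajectory))                  # 1.0
print(rationalizing_utility(["right", "right", "left"]))  # 0.0
```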

Lucas Perry: So is the problem here that you’re unable to fully extract human preference hierarchies from human behavior?

Rohin Shah: Well, you’re unable to extract agent preferences from agent behavior. You can see any agent behavior and you can rationalize it as expected utility maximization, but it’s not very useful. Doesn’t give you predictive power.

Buck Shlegeris: I just want to have my go at saying this argument in three sentences. Once upon a time, people said that because all rational systems act like they’re maximizing an expected utility function, we should expect them to have various behaviors like trying to maximize the amount of power they have. But every set of actions that you could take is consistent with being an expected utility maximizer, therefore you can’t use the fact that something is an expected utility maximizer in order to argue that it will have a particular set of behaviors, without making a bunch of additional arguments. And I basically think that I was wrong to be persuaded by the naive argument that Rohin was describing, which just goes directly from rational things are expected utility maximizers, to therefore rational things are power maximizing.

Rohin Shah: To be clear, this was the thing I also believed. The main reason I wrote the post that argued against it was because I spent half a year under the delusion that this was a valid argument.

Lucas Perry: Just for my understanding here, the view is that because any behavior, any agent, can from the outside be understood as being an expected utility maximizer, there are behaviors that clearly do not do instrumental sub-goal things, like maximizing power and resources, yet those behaviors can still be viewed as expected utility maximization from the outside. So additional arguments are required for why expected utility maximizers do instrumental sub-goal things, which are the AI-risky part.

Rohin Shah: Yeah, that’s exactly right.

Lucas Perry: Okay. What else is on offer other than expected utility maximizers? You guys talked about comprehensive AI services might be one. Are there other formal agentive classes of ‘thing that is not an expected utility maximizer but still has goals?’

Rohin Shah: A formalism for that? Some people, like John Wentworth for example, are thinking about markets as a model of agency. Some people like to think of multi-agent groups together leading to an emergent agency and want to model human minds this way. How formal are these? Not that formal yet.

Buck Shlegeris: I don’t think there’s anything which is competitively popular with expected utility maximization as the framework for thinking about this stuff.

Rohin Shah: Oh yes, certainly not. Expected utility maximization is used everywhere. Nothing else comes anywhere close.

Lucas Perry: So there’s been this complete focus on utility functions and representing the human utility function, whatever that means. Do you guys think that this is going to continue to be the primary way of thinking about and modeling human preference hierarchies? How much does it actually relate to human preference hierarchies? I’m wondering if it might just be substantially different in some way.

Buck Shlegeris: Me and Rohin are going to disagree about this. I think that trying to model human preferences as a utility function is really dumb and bad and will not help you do things that are useful. I don’t know; If I want to make an AI that’s incredibly good at recommending me movies that I’m going to like, some kind of value learning thing where it tries to learn my utility function over movies is plausibly a good idea. Even things where I’m trying to use an AI system as a receptionist, I can imagine value learning being a good idea.

But I feel extremely pessimistic about more ambitious value learning kinds of things, where I try to, for example, have an AI system which learns human preferences and then acts in large scale ways in the world. I basically feel pretty pessimistic about every alignment strategy which goes via that kind of a route. I feel much better about either trying to not use AI systems for problems where you have to think about large scale human preferences, or having an AI system which does something more like modeling what humans would say in response to various questions and then using that directly instead of trying to get a value function out of it.

Rohin Shah: Yeah. Funnily enough, I was going to start off by saying I think Buck and I are going to agree on this.

Buck Shlegeris: Oh.

Rohin Shah: And I think I mostly agree with the things that you said. The thing I was going to say was I feel pretty pessimistic about trying to model the normative underlying human values, where you have to get things like population ethics right, and what to do with the possibility of infinite value. How do you deal with fanaticism? What’s up with moral uncertainty? I feel pretty pessimistic about any sort of scheme that involves figuring that out before developing human-level AI systems.

There’s a related concept which is also called value learning, which I would prefer to be called something else, but I feel like the name’s locked in now. In my sequence, I called it narrow value learning, but even that feels bad. Maybe at least for this podcast we could call it specification learning, which is sort of more like the tasks Buck mentioned, like if you want to learn preferences over movies, representing that using a utility function seems fine.

Lucas Perry: Like superficial preferences?

Rohin Shah: Sure. I usually think of it as you have in mind a task that you want your AI system to do, and now you have to get your AI system to reliably do it. It’s unclear whether this should even be called a value learning at this point. Maybe it’s just the entire alignment problem. But techniques like inverse reinforcement learning, preference learning, learning from corrections, inverse reward design where you learn from a proxy reward, all of these are more trying to do the thing where you have a set of behaviors in mind, and you want to communicate that to the agent.
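
As one concrete instance of the specification-learning techniques Rohin lists, here is a minimal sketch of learning a reward function from pairwise preferences using a Bradley-Terry style logistic model, a common formulation in the preference-learning literature. The features, preference data, and hyperparameters are invented for illustration.

```python
import math, random

# Minimal sketch of preference learning: fit a linear reward r(x) = w . x
# from pairwise comparisons "A preferred to B", using a Bradley-Terry /
# logistic model. All data below is made up for illustration.

def reward(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train(prefs, dim, lr=0.1, steps=2000):
    w = [0.0] * dim
    for _ in range(steps):
        a, b = random.choice(prefs)          # a was preferred to b
        # P(a > b) = sigmoid(r(a) - r(b)); take a gradient step on log-likelihood
        p = 1.0 / (1.0 + math.exp(reward(w, b) - reward(w, a)))
        grad = [(1.0 - p) * (ai - bi) for ai, bi in zip(a, b)]
        w = [wi + lr * gi for wi, gi in zip(w, grad)]
    return w

# Each item is a feature vector, e.g. (time_taken, comfort). The "user"
# consistently prefers more comfort and less time.
prefs = [((1.0, 0.9), (1.0, 0.2)),
         ((0.3, 0.8), (0.9, 0.8)),
         ((0.2, 0.7), (0.8, 0.1))]
w = train(prefs, dim=2)
print(w)  # weight on comfort should come out positive, weight on time negative
```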

Buck Shlegeris: The way that I’ve been thinking about how optimistic I should be about value learning or specification learning recently has been that I suspect that at the point where AI is human level, by default we’ll have value learning which is at about human level. We’ll be about as good at giving AI systems information about our preferences that they can do stuff with as we are at giving other humans information about our preferences that they can do stuff with. And when I imagine hiring someone to recommend music to me, I feel like there are probably music nerds who could do a pretty good job of looking at my Spotify history and recommending bands that I’d like if they spent a week on it. I feel a lot more pessimistic about being able to talk to a philosopher for a week and then having them answer hard questions about my preferences, especially if they didn’t have the advantage of already being humans themselves.

Rohin Shah: Yep. That seems right.

Buck Shlegeris: So maybe that’s how I would separate out the specification learning stuff that I feel optimistic about from the more ambitious value learning stuff that I feel pretty pessimistic about.

Rohin Shah: I do want to note that I collated a bunch of stuff arguing against ambitious value learning. If I had to make a case for optimism about even that approach, it would look more like, “Under the value learning approach, it seems possible, with uncertainty over rewards, values, preferences, whatever you want to call them, to get an AI system such that you actually are able to change it, because it would reason that if you’re trying to change it, then that means something about it is currently not good for helping you, and so it would be better to let itself be changed.” I’m not very convinced by this argument.

Buck Shlegeris: If you try to write down four different utility functions that the agent is uncertain between, it’s just actually really hard for me to imagine concrete scenarios where the AI is corrigible as a result of its uncertainty over utility functions. Imagine the AI system thinks that you’re going to switch it off and replace it with an AI system which has a different method of inferring values from your actions and your words. It’s not going to want to let you do that, because its utility function is to have the world be the way that is expressed by your utility function, as estimated by its own way of approximating utility functions. So being replaced by a thing which estimates or infers utility functions some other way means that it’s very unlikely to get what it actually wants, and there are other arguments like this. I’m not sure if these are super old arguments that you have five levels of counter-arguments to.

Rohin Shah: I definitely know this argument. I think the problem of fully updated deference is what I would normally point to as representing this general class of claims and I think it’s a good counter argument. When I actually think about this, I sort of start getting confused about what it means for an AI system to terminally value the final output of what its value learning system would do. It feels like some additional notion of how the AI chooses actions has been posited, that hasn’t actually been captured in the model and so I feel fairly uncertain about all of these arguments and kind of want to defer to the future. 

Buck Shlegeris: I think the thing that I’m describing is just what happens if you read the algorithm literally. Like, if you read the value learning algorithm literally, it has this notion of the AI system wants to maximize the human’s actual utility function.

Rohin Shah: For an optimal agent playing a CIRL (cooperative inverse reinforcement learning) game, I agree with your argument. If you take optimality as defined in the cooperative inverse reinforcement learning paper and it’s playing over a long period of time, then yes, it’s definitely going to prefer to keep itself in charge rather than a different AI system that would infer values in a different way.
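
A toy numeric version of the dynamic being debated here, loosely in the spirit of the off-switch/CIRL setup: while the agent is uncertain about whether its action is good for the human, deferring (staying switchable) is weakly better; once it is fully confident, deference buys it nothing. The numbers and the assumption of a rational human overseer are illustrative, not taken from the papers mentioned.

```python
# Toy sketch of corrigibility from reward uncertainty. The robot is unsure
# whether acting is good (U = +1) or bad (U = -1) for the human. It can act
# now, or defer to a human who allows the action only if it is actually good.

def value_act_now(p_good):
    """Expected utility of just acting: +1 with prob p_good, -1 otherwise."""
    return p_good * 1.0 + (1.0 - p_good) * (-1.0)

def value_defer(p_good):
    """Defer to the human: the action happens only in the good case (+1);
    otherwise the human switches the robot off (utility 0)."""
    return p_good * 1.0 + (1.0 - p_good) * 0.0

for p in (0.4, 0.7, 1.0):
    print(p, value_act_now(p), value_defer(p))
# While the robot is uncertain, deferring is weakly better, so it allows the
# off switch. Once it is fully confident (p = 1.0), deference has no value
# left, which is the "fully updated deference" worry mentioned above.
```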

Lucas Perry: It seems like so far utility functions are the best way of trying to get an understanding of what human beings care about and value and have preferences over, you guys are bringing up all of the difficult intricacies with trying to understand and model human preferences as utility functions. One of the things that you also bring up here, Rohin, in your review, is the risk of lock-in, which may require us to solve hard philosophical problems before the development of AGI. That has something to do with ambitious value learning, which would be like learning the one true human utility function which probably just doesn’t exist.

Buck Shlegeris: I think I want to object to a little bit of your framing there. My stance on utility functions of humans isn’t that there are a bunch of complicated subtleties on top, it’s that modeling humans with utility functions is just a really sad state to be in. If your alignment strategy involves positing that humans behave as expected utility maximizers, I am very pessimistic about it working in the short term, and I just think that we should be trying to completely avoid anything which does that. It’s not like there’s a bunch of complicated sub-problems that we need to work out about how to describe us as expected utility maximizers, my best guess is that we would just not end up doing that because it’s not a good idea.

Lucas Perry: For the ambitious value learning?

Buck Shlegeris: Yeah, that’s right.

Lucas Perry: Okay, do you have something that’s on offer?

Buck Shlegeris: The two options instead of that which seem attractive to me? As I said earlier, one is that you just convince everyone not to use AI systems for things where you need to have an understanding of large-scale human preferences. The other one is the kind of thing that Paul Christiano’s iterated distillation and amplification, or a variety of his other ideas, is trying to get at, which is, I think: if you make a really powerful AI system, it’s actually going to have an excellent model of human values in whatever representation is best for actually making predictions about humans, because for a really excellent AGI, like a really excellent paperclip maximizer, it’s really important to really get how humans work so that it can manipulate them into letting it build lots of paperclip factories or whatever.

So I think that if you think that we have AGI, then by assumption I think we have a system which is able to reason about human values if it wants. And so if we can apply these really powerful AI systems to tasks such that the things that they do display their good understanding of human values, then we’re fine, and it’s just okay that there was no way that we could represent a utility function directly. So for instance, the idea in IDA is that if we could have this system which is just trying to answer questions the same way that humans would, but enormously more cheaply because it can run faster than humans and a few other tricks, then we don’t have to worry about writing down a utility function of humans directly, because we can just make the system do things that are kind of similar to the things humans would have done, and so it implicitly has this human utility function built into it. That’s option two. Option one is: don’t use anything that requires a complex human utility function. Option two is: have your systems learn human values implicitly, by giving them a task where this is beneficial for them and where their good understanding of human values comes out in their actions.

Rohin Shah: One way I might condense that point, is that you’re asking for a nice formalism for human preferences and I just point to all the humans out there in the world who don’t know anything about utility functions, which is 99% of them and nonetheless still seem pretty good at inferring human preferences.

Lucas Perry: On this part about AGI, if it is AGI it should be able to reason about human preferences, then why would it not be able to construct something that was more explicit and thus was able to do more ambitious value learning?

Buck Shlegeris: So it can totally do that, itself. But we can’t force that structure from the outside with our own algorithms.

Rohin Shah: Image classification is a good analogy. Like, in the past we were using hand engineered features, namely SIFT and HOG and then training classifiers over these hand engineered features in order to do image classification. And then we came to the era of deep learning and we just said, yeah, throw away all those features and just do everything end to end with a convolutional neural net and it worked way better. The point was that, in fact there are good representations for most tasks and humans trying to write them down ahead of time just doesn’t work very well at that. It tends to work better if you let the AI system discover its own representations that best capture the thing you wanted to capture.
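
A structural sketch of the two eras Rohin contrasts; the specific libraries (scikit-image for HOG features, scikit-learn for the linear classifier, PyTorch for the end-to-end convnet) are illustrative choices, not anything prescribed in the conversation.

```python
# Sketch of the contrast: hand-engineered features plus a classifier versus
# learning representations end to end from raw pixels.

from skimage.feature import hog          # hand-engineered features
from sklearn.svm import LinearSVC
import torch.nn as nn

def hand_engineered_classifier(images, labels):
    """Era 1: fixed human-designed features (HOG), then a linear classifier.
    `images` are 2D grayscale arrays."""
    features = [hog(img) for img in images]
    return LinearSVC().fit(features, labels)

def end_to_end_classifier(num_classes):
    """Era 2: a convnet that learns its own representations from raw pixels;
    it would be trained end to end with SGD (training loop omitted)."""
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(),
        nn.LazyLinear(num_classes),
    )
```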

Lucas Perry: Can you unpack this point a little bit more? I’m not sure that I’m completely understanding it. Buck is rejecting this modeling human beings explicitly as expected utility maximizers and trying to explicitly come up with utility functions in our AI systems. The first was to convince people not to use these kinds of things. And the second is to make it so that the behavior and output of the AI systems has some implicit understanding of human behavior. Can you unpack this a bit more for me or give me another example?

Rohin Shah: So here’s another example. Let’s say I was teaching my kid (that I don’t have) how to catch a ball. The formalism that’s available to me for learning how to catch a ball is, well, you can go all the way down to our best models of physics. We could use Newtonian mechanics, let’s say: here are these equations, estimate the velocity and the distance of the ball and the angle at which it’s thrown, plug that into these equations, predict that the ball’s going to come here, then just put your hand there and magically catch it. We won’t even talk about the catching part. That seems like a pretty shitty way to teach a kid how to catch a ball.

Probably it’s just a lot better to just play catch with the kid for a while and let the kid’s brain figure out this is how to predict where the ball is going to go such that I can predict where it’s going to be and then catch it.

I’m basically 100% confident that the thing that the brain is doing is not Newtonian mechanics. It’s doing something else that’s just way more efficient at predicting where the ball is going to be so that I can catch it and if I forced the brain to use Newtonian mechanics, I bet it would not do very well at this task.
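
For concreteness, the “Newtonian mechanics” recipe in the analogy would be something like the standard projectile equations (ignoring air resistance; the notation here is an editorial illustration):

```latex
% Ball thrown from height y_0 with speed v at angle \theta, under gravity g:
x(t) = v\cos(\theta)\, t, \qquad
y(t) = y_0 + v\sin(\theta)\, t - \tfrac{1}{2} g t^{2}.
% Solve y(T) = h_{\text{hand}} for T, then stand at x(T): the "plug it into
% the equations and put your hand there" step in the analogy.
```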

Buck Shlegeris: I feel like that still isn’t quite saying the key thing here. I don’t know how to say this off the top of my head either, but I think there’s this key point about: just because your neural net can learn a particular feature of the world doesn’t mean that you can back out some other property of the world by forcing the neural net to have a particular shape. Does that make any sense, Rohin?

Rohin Shah: Yeah, vaguely. I mean, well, no, maybe not.

Buck Shlegeris: The problem isn’t just the capabilities problem. There’s this way you can try and infer a human utility function by asking, according to this model, what’s the maximum likelihood utility function given all these things the human did. If you have a good enough model, you will in fact end up making very good predictions about the human, it’s just that the decomposition into their planning function and their utility function is not going to result in a utility function which is anything like a thing that I would want maximized if this process was done on me. There is going to be some decomposition like this, which is totally fine, but the utility function part just isn’t going to correspond to the thing that I want.

Rohin Shah: Yeah, that is also a problem, but I agree that is not the thing I was describing.

Lucas Perry: Is the point there that there’s a lack of alignment between the utility function and the planning function, given that the planning function imperfectly optimizes the utility function?

Rohin Shah: It’s more like there are just infinitely many possible pairs of planning functions and utility functions that exactly predict human behavior. Even if it were true that humans were expected utility maximizers, which Buck is arguing we’re not, and I agree with him, there is a planning function that says humans are perfectly anti-rational. If you ask what utility function works with that planner to predict human behavior, well, the literal negative of the true utility function combined with the anti-rational planner produces the same behavior as the true utility function with the perfect planner. There’s no information that lets you distinguish between these two possibilities.

You have to build it in as an assumption. I think Buck’s point is that building things in as assumptions is probably not going to work.
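
A minimal sketch of the (planner, utility) degeneracy Rohin describes, in a toy two-action world invented for illustration: a rational planner paired with the true utility function and an anti-rational planner paired with its negation produce identical observed behavior.

```python
# Sketch of the (planner, utility) degeneracy: a rational planner with utility
# U picks the same action as an anti-rational planner with utility -U, so
# behavior alone cannot tell the two decompositions apart. Toy example only.

ACTIONS = {"help": 1.0, "harm": -1.0}   # true utility of each action

def rational_planner(utility):
    return max(ACTIONS, key=lambda a: utility(a))

def anti_rational_planner(utility):
    return min(ACTIONS, key=lambda a: utility(a))

true_U = lambda a: ACTIONS[a]
neg_U = lambda a: -ACTIONS[a]

print(rational_planner(true_U))       # "help"
print(anti_rational_planner(neg_U))   # "help"  (identical observed behavior)
```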

Buck Shlegeris: Yeah.

Rohin Shah: A point I agree with. In philosophy this is called the is-ought problem, right? What you can train your AI system on is a bunch of “is” facts and then you have to add in some assumptions in order to jump to “ought” facts, which is what the utility function is trying to do. The utility function is trying to tell you how you ought to behave in new situations and the point of the is-ought distinction is that you need some bridging assumptions in order to get from is to ought.

Buck Shlegeris: And I guess an important part here is your system will do an amazing job of answering “is” questions about what humans would say about “ought” questions. And so I guess maybe you could phrase the second part as: to get your system to do things that match human preferences, use the fact that it knows how to make accurate “is” statements about humans’ ought statements?

Lucas Perry: It seems like we’re strictly talking about inferring the human utility function or preferences via looking at behavior. What if you also had more access to the actual structure of the human’s brain?

Rohin Shah: This is like the approach that Stuart Armstrong likes to talk about. The same things still apply. You still have the is-ought problem where the facts about the brain are “is” facts and how you translate that into “ought” facts is going to involve some assumptions. Maybe you can break down such assumptions that everyone would agree with. Maybe it’s like if this particular neuron in a human brain spikes, that’s a good thing and we want more of it and if this other one spikes, that’s a bad thing. We don’t want it. Maybe that assumption is fine.

Lucas Perry: I guess I’m just pointing out that you could find the places in the human brain that generate the statements about “ought” questions.

Rohin Shah: As Buck said, that lets you predict what humans would say about ought statements, which your assumption could then be, whatever humans say about ought statements, that’s what you ought to do. And that’s still an assumption. Maybe it’s a very reasonable assumption that we’re happy to put it into our AI system.

Lucas Perry: If we’re not willing to accept some humans’ “is” statements about “ought” questions then we have to do some meta-ethical moral policing in our assumptions around getting “is” statements from “ought” questions.

Rohin Shah: Yes, that seems right to me. I don’t know how you would do such a thing, but you would have to do something along those lines.

Buck Shlegeris: I would additionally say that I feel pretty great about trying to do things which use the fact that we can trust our AI to have good “is” answers to “ought” questions, but there’s a bunch of problems with this. I think it’s a good starting point but trying to use that to do arbitrarily complicated things in the world has a lot of problems. For instance, suppose I’m trying to decide whether we should design a city this way or that way. It’s hard to know how to go from the ability to know how humans would answer questions about preferences to knowing what you should do to design the city. And this is for a bunch of reasons, one of them is that the human might not be able to figure out from your city building plans what the city’s going to actually be like. And another is that the human might give inconsistent answers about what design is good, depending on how you phrase the question, such that if you try to figure out a good city plan by optimizing for the thing that the human is going to be most enthusiastic about, then you might end up with a bad city plan. Paul Christiano has written in a lot of detail about a lot of this.

Lucas Perry: That also reminds me of what Stuart Armstrong wrote about how the framing of the questions changes the preferences you get as output.

Rohin Shah: Yep.

Buck Shlegeris: Sorry, to be clear other people than Paul Christiano have also written a lot about this stuff, (including Rohin). My favorite writing about this stuff is by Paul.

Lucas Perry: Yeah, those do seem problematic, but it would also seem that if you queried people’s meta-preferences about those things, you would get further “is” statements, which just pushes the “ought” assumptions that you need to make further back. We’re getting into very philosophically weedy territory. Do you think that this kind of thing could be pushed to the long reflection, as talked about by William MacAskill and Toby Ord, or how much of this do you actually think needs to be solved in order to have safe and aligned AGI?

Buck Shlegeris: I think there are kind of two different ways that you could hope to have good outcomes from AGI. One is: set up a world such that you never needed to make an AGI which can make large scale decisions about the world. And two is: solve the full alignment problem.

I'm currently pretty pessimistic about the second of those being technically feasible. And I'm kind of pretty pessimistic about the first of those being a plan that will work. But in the world where you can have everyone only apply powerful and dangerous AI systems in ways that don't require an understanding of human values, then you can push all of these problems onto the long reflection. In worlds where you can do arbitrarily complicated things in ways that humans would approve of, you don't really need the long reflection for this stuff, because these powerful AI systems already have the capacity to do portions of the long reflection work inside themselves as needed.

Rohin Shah: Yeah, so I think my take is not exactly disagreeing with Buck. It's more that it comes from a different frame than Buck's. If you just got AI systems that did the things that humans do now, this does not seem to me to obviously require solving hard problems in philosophy. That's the lower bound on what you can do before having to do long reflection type stuff. Eventually you do want to do a long reflection. I feel relatively optimistic about having a technical solution to alignment that allows us to do the long reflection after building AI systems. So the long reflection would include both humans and AI systems thinking hard, reflecting on difficult problems and so on.

Buck Shlegeris: To be clear, I’m super enthusiastic about there being a long reflection or something along those lines.

Lucas Perry: I always find it useful to reflect on just how human beings do many of these things, because when thinking about things in the strict AI alignment sense, it can seem almost impossible, but human beings are able to do so many of these things without solving all of these difficult problems. It seems like, at the very least, we'll be able to get AI systems that very, very approximately do what is good or what is approved of by human beings, because we can already do that.

Buck Shlegeris: That argument doesn’t really make sense to me. It also didn’t make sense when Rohin referred to it a minute ago.

Rohin Shah: It's not an argument that we technically know how to do this. It is more an argument that this is at least within the space of possibilities.

Lucas Perry: Yeah, I guess that’s how I was also thinking of it. It is within the space of possibilities. So utility functions are good because they can be optimized for, and there seem to be risks with optimization. Is there anything here that you guys would like to say about better understanding agency? I know this is one of the things that is important within the MIRI agenda.

Buck Shlegeris: I am a bad MIRI employee. I don’t really get that part of the MIRI agenda, and so I’m not going to defend it. I have certainly learned some interesting things from talking to Scott Garrabrant and other MIRI people who have lots of interesting thoughts about this stuff. I don’t quite see the path from there to good alignment strategies. But I also haven’t spent a super long time thinking about it because I, in general, don’t try to think about all of the different AI alignment things that I could possibly think about.

Rohin Shah: Yeah. I also am not a good person to ask about this. Most of my knowledge comes from reading things, and MIRI hasn't been writing things up very much recently, so I don't know what their ideas are. I, like Buck, don't really see a good alignment strategy that starts with "first we understand optimization," and so that's the main reason why I haven't looked into it very much.

Buck Shlegeris: I think I don't actually agree with the thing you said there, Rohin. I feel like understanding optimization could plausibly be really nice. Basically the story there is, it's a real bummer if we have to make really powerful AI systems via searching over large recurrent policies for things that implement optimizers. If it turned out that we could figure out some way of coding up optimizer stuff directly, then this could maybe mean you didn't need to make mesa-optimizers. And maybe this means that your inner alignment problems go away, which could be really nice. The thing that I was saying I haven't thought that much about is the relevance of thinking about, for instance, the various weirdnesses that happen when you consider embedded agency or decision theory, and things like that.

Rohin Shah: Oh, got it. Yeah. I think I agree that understanding optimization would be great if we succeeded at it and I’m mostly pessimistic about us succeeding at it, but also there are people who are optimistic about it and I don’t know why they’re optimistic about it.

Lucas Perry: Hey it’s post-podcast Lucas here again. So, I just want to add a little more detail here again on behalf of Rohin. Here he feels pessimistic about us understanding optimization well enough and in a short enough time period that we are able to create powerful optimizers that we understand that rival the performance of the AI systems we’re already building and will build in the near future. Back to the episode. 

Buck Shlegeris: The argument that MIRI has made about this is that there are a bunch of questions about what optimization is that are plausibly just not that hard, compared to other problems which small groups of people have occasionally solved, like coming up with the foundations of mathematics: a big conceptual deal, but also done by a relatively small group of people. And before we had formalizations of math, I think it might've seemed as impossible to make progress on as formalizing optimization or coming up with a better picture of it. So maybe that's my argument for some optimism.

Rohin Shah: Yeah, though I think pointing to some examples of great success doesn't imply much: there are probably many similar attempts that didn't work out, and we don't know about them because nobody bothered to tell us about the failures. But it seems plausible, maybe.

Lucas Perry: So, exploring this point about agency more deeply: can either or both of you give us a little bit of a picture about the relevance or non-relevance of decision theory here to AI alignment? And I think, Buck, you mentioned the trickiness of embedded decision theory.

Rohin Shah: If you go back to our traditional argument for AI risk, it’s basically powerful AI systems will be very strong optimizers. They will possibly be misaligned with us and this is bad. And in particular one specific way that you might imagine this going wrong is this idea of mesa optimization where we don’t know how to build optimizers right now. And so what we end up doing is basically search across a huge number of programs looking for ones that do well at optimization and use that as our AGI system. And in this world, if you buy that as a model of what’s happening, then you’ll basically have almost no control over what exactly that system is optimizing for. And that seems like a recipe for misalignment. It sure would be better if we could build the optimizer directly and know what it is optimizing for. And in order to do that, we need to know how to do optimization well.

Lucas Perry: What are the kinds of places that we use mesa optimizers today?

Rohin Shah: It’s not used very much yet. The field of meta learning is the closest example. In the field of meta learning you have a distribution over tasks and you use gradient descent or some other AI technique in order to find an AI system that itself, once given a new task, learns how to perform that task well.

Existing meta learning systems are more like learning how to do all the tasks well, and then when they see a new task they just figure out, ah, it's this task, and then they roll out the policy that they already learned. But the eventual goal for meta learning is to get something that, online, learns how to do the task without having previously figured out how to do that task.
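To make the outer-loop/inner-loop shape of meta learning concrete, here is a minimal sketch: an outer search looks for an initialization that an inner gradient-descent learner can adapt from quickly on new tasks. It uses a toy Reptile-style update on one-dimensional regression; the task distribution, update rule, and all constants are illustrative assumptions, not a description of how any existing meta learning system is built.

```python
# Toy sketch of meta learning: an outer loop searches for an initialization
# that an inner gradient-descent learner can adapt from quickly on new tasks.
# Reptile-style update chosen only for simplicity; all numbers are made up.
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    """A task is 'fit y = a * x' for a slope a drawn from the task distribution."""
    a = rng.uniform(0.5, 2.5)
    xs = rng.uniform(-1.0, 1.0, size=20)
    return xs, a * xs

def inner_adapt(theta, xs, ys, steps=5, lr=0.1):
    """The 'learned learner': plain gradient descent on one task, from init theta."""
    for _ in range(steps):
        grad = np.mean(2 * (theta * xs - ys) * xs)  # d/dtheta of mean squared error
        theta = theta - lr * grad
    return theta

theta = 0.0       # meta-learned initialization
meta_lr = 0.1
for _ in range(2000):                      # outer loop: improve the learner itself
    xs, ys = sample_task()
    adapted = inner_adapt(theta, xs, ys)   # inner loop: adapt to the sampled task
    theta += meta_lr * (adapted - theta)   # move the init toward the adapted params

print("meta-learned initialization:", round(theta, 2))  # ends up near the mean slope
```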

Lucas Perry: Okay, so Rohin, did what you said cover embedded decision theory?

Rohin Shah: No, not really. I think embedded decision theory is just: we want to understand optimization. One way you could formalize our current notion of optimization is to say my AI agent is going to have a Bayesian belief over all the possible ways that the environment could be. It's going to update that belief over time as it gets observations, and then it's going to act optimally with respect to that belief by maximizing its expected utility. And embedded decision theory basically calls into question the idea that there's a separation between the agent and the environment. In particular, I, as a human, couldn't possibly have a Bayesian belief about the entire Earth, because the entire Earth contains me. I can't have a Bayesian belief over myself, so this means that our existing formalization of agency is flawed. It can't capture these things that affect real agents. And embedded decision theory, and embedded agency more broadly, is trying to deal with this fact and have a new formalization that works even in these situations.
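For reference, here is a tiny sketch of the textbook picture being questioned: an agent that holds a Bayesian belief over possible environments, updates it on an observation, and picks the action with the highest expected utility. The environments, likelihoods, and utilities are invented for illustration; the embedded-agency point is that this picture assumes the agent sits outside the environment it is modeling.

```python
# Toy "non-embedded" agent: belief over environments, Bayes update, expected utility.
# All environments, probabilities, and utilities below are made-up illustrations.
ENVS = ["sunny_world", "rainy_world"]
belief = {"sunny_world": 0.5, "rainy_world": 0.5}

likelihood = {  # P(observation | environment)
    ("clouds", "sunny_world"): 0.2,
    ("clouds", "rainy_world"): 0.8,
}

utility = {  # utility of taking an action in an environment
    ("picnic", "sunny_world"): 10, ("picnic", "rainy_world"): -5,
    ("stay_in", "sunny_world"): 1, ("stay_in", "rainy_world"): 1,
}

def update(belief, obs):
    """Bayes rule: posterior is proportional to likelihood times prior."""
    post = {e: likelihood[(obs, e)] * p for e, p in belief.items()}
    total = sum(post.values())
    return {e: p / total for e, p in post.items()}

def act(belief):
    """Pick the action maximizing expected utility under the current belief."""
    actions = {a for (a, _) in utility}
    return max(actions, key=lambda a: sum(belief[e] * utility[(a, e)] for e in ENVS))

belief = update(belief, "clouds")
print(belief)        # {'sunny_world': 0.2, 'rainy_world': 0.8}
print(act(belief))   # 'stay_in': picnicking has lower expected utility now
```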

Buck Shlegeris: I want to give my understanding of the pitch for it. One part is that if you don't understand embedded agency, then if you try to make an AI system in a hard-coded way, like making a hard-coded optimizer, traditional phrasings of what an optimizer is are just literally wrong, in that, for example, they assume you have these massive beliefs over world states that you can't really have. And plausibly, it is really bad to try to make systems by hardcoding assumptions that are just clearly false. And so if we want to hardcode agents with particular properties, it would be good if we knew a way of coding the agent that isn't implicitly making clearly false assumptions.

And the second pitch for it is something like when you want to understand a topic, sometimes it’s worth looking at something about the topic which you’re definitely wrong about, and trying to think about that part until you are less confused about it. When I’m studying physics or something, a thing that I love doing is looking for the easiest question whose answer I don’t know, and then trying to just dive in until I have satisfactorily answered that question, hoping that the practice that I get about thinking about physics from answering a question correctly will generalize to much harder questions. I think that’s part of the pitch here. Here is a problem that we would need to answer, if we wanted to understand how superintelligent AI systems work, so we should try answering it because it seems easier than some of the other problems.

Lucas Perry: Okay. I think I feel satisfied. The next thing here, Rohin, in your AI alignment 2018-19 review, is value learning. I feel like we've talked a bunch about this already. Is there anything here that you want to say, or do you want to skip this?

Rohin Shah: One thing we didn’t cover is, if you have uncertainty over what you’re supposed to optimize, this turns into an interactive sort of game between the human and the AI agent, which seems pretty good. A priori you should expect that there’s going to need to be a lot of interaction between the human and the AI system in order for the AI system to actually be able to do the things that the human wants it to do. And so having formalisms and ideas of where this interaction naturally falls out seems like a good thing.

Buck Shlegeris: I've said a lot of things about how I am very pessimistic about value learning as a strategy. Nevertheless, it seems like it might be really good for there to be people who are researching this and trying to get as good as we can at improving sample efficiency, so that you can have your AI systems understand your preferences over music with as little human interaction as possible, just in case it turns out to be possible to solve the hard version of value learning. Because a lot of the engineering effort required to make ambitious value learning work will plausibly be in common with the kinds of stuff you have to do to make these simpler specification learning tasks work out. That's a reason for me to be enthusiastic about people researching value learning even if I'm pessimistic about the overall thing working.

Lucas Perry: All right, so what is robustness and why does it matter?

Rohin Shah: Robustness is one of those words that doesn’t super clearly have a definition and people use it differently. Robust agents don’t fail catastrophically in situations slightly different from the ones that they were designed for. One example of a case where we see a failure of robustness currently, is in adversarial examples for image classifiers, where it is possible to take an image, make a slight perturbation to it, and then the resulting image is completely misclassified. You take a correctly classified image of a Panda, slightly perturb it such that a human can’t tell what the difference is, and then it’s classified as a gibbon with 99% confidence. Admittedly this was with an older image classifier. I think you need to make the perturbations a bit larger now in order to get them.
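As a concrete picture of what "slightly perturb the input" means, here is a toy version of the standard fast-gradient-sign attack, run against a hand-made linear classifier instead of a real image model. The weights, inputs, and labels are invented for illustration; real attacks do the same thing using a neural network's gradients.

```python
# Toy adversarial example: nudge each input dimension a little in the direction
# that most changes the classifier's score. Weights and inputs are made up.
import numpy as np

w = np.array([2.0, -3.0, 1.5, 0.5])   # "classifier" weights; for a linear model,
                                      # the gradient of the score w.r.t. the input is just w
x = np.array([0.6, 0.3, 0.1, 0.2])    # an input the model classifies correctly

def predict(inp):
    return "panda" if w @ inp >= 0 else "gibbon"

epsilon = 0.1                          # small per-dimension perturbation
x_adv = x - epsilon * np.sign(w)       # step against the gradient to lower the score

print(predict(x), w @ x)          # panda, score 0.55
print(predict(x_adv), w @ x_adv)  # gibbon, score about -0.15, even though each
                                  # entry of the input moved by only 0.1
```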

Lucas Perry: Is this because the relevant information it uses to infer panda-ness is very local, rather than global properties of the panda?

Rohin Shah: It's more like they're high-frequency features or imperceptible features. There's a lot of controversy about this, but there is a pretty popular recent paper, which I believe but not everyone does, claiming that this is because the classifiers are picking up on real imperceptible features that do generalize to the test set but that humans can't detect. That's an example of robustness. Recently people have been applying this to reinforcement learning, both by adversarially modifying the observations that agents get and also by training agents that act in the environment adversarially towards the original agent. One paper out of CHAI used a kick-and-defend environment where you've got two MuJoCo robots: one of them is kicking a soccer ball, and the other one's a goalie trying to prevent the kicker from successfully shooting a goal. They showed that if you do self-play to get kickers and defenders, and then you take the kicker, freeze it so it isn't trained anymore, and retrain a new defender against this kicker.

What is the strategy that this new defender learns? It just sort of falls to the ground and flaps about in a random-looking way, and the kicker gets so confused that it usually fails to even touch the ball. So this is sort of an adversarial example for RL agents, showing that they're not very robust either.

There was also a paper out of DeepMind that did the same sort of thing. For their adversarial attack, they learned what sorts of mistakes the agent would make early on in training and then tried to replicate those mistakes once the agent was fully trained, and they found that this helped them uncover a lot of bad behaviors, even at the end of training.

From the perspective of alignment, it's clear that we want robustness. It's not exactly clear what we want robustness to. Robustness to adversarial perturbations is a bit weird as a threat model: if there is an adversary in the environment, they're probably not going to be restricted to small perturbations, and they're probably not going to get white-box access to your AI system. Even if they did, this doesn't really connect with the story of the AI system adversarially optimizing against humans, which is how we get to the x-risk part, so it's not totally clear.

In the intent alignment case, which is the thing that I usually think about, you mostly want to ensure that whatever is driving the "motivation" of the AI system is very robust. You want it to agree with what humans would want in all situations, or at least all situations that are going to come up, or something like that. Paul Christiano has written a few blog posts about this that talk about what techniques he's excited about for solving that problem, which boil down to interpretability, adversarial training, and improving adversarial training through relaxations of the problem.

Buck Shlegeris: I’m pretty confused about this, and so it’s possible what I’m going to say is dumb. When I look at problems with robustness or problems that Rohin put in this robustness category here, I want to divide it into two parts. One of the parts is, things that I think of as capability problems, which I kind of expect the rest of the world will need to solve on its own. For instance, things about safe exploration, how do I get my system to learn to do good things without ever doing really bad things, this just doesn’t seem very related to the AI alignment problem to me. And I also feel reasonably optimistic that you can solve it by doing dumb techniques which don’t have anything too difficult to them, like you can have your system so that it has a good model of the world that it got from unsupervised learning somehow and then it never does dumb enough things. And also I don’t really see that kind of robustness problem leading to existential catastrophes. And the other half of robustness is the half that I care about a lot, which in my mind, is mostly trying to make sure that you succeeded at inner alignment. That is, that the mesa optimizers you’ve found through gradient descent have goals that actually match your goals.

This is like robustness in the sense that you’re trying to guarantee that in every situation, your AI system, as Rohin was saying, is intent aligned with you. It’s trying to do the kind of thing that you want. And I worry that, by default, we’re going to end up with AI systems not intent aligned, so there exist a bunch of situations they can be put in such that they do things that are very much not what you’d want, and therefore they fail at robustness. I think this is a really important problem, it’s like half of the AI safety problem or more, in my mind, and I’m not very optimistic about being able to solve it with prosaic techniques.

Rohin Shah: That sounds roughly similar to what I was saying. Yes.

Buck Shlegeris: I don’t think we disagree about this super much except for the fact that I think you seem to care more about safe exploration and similar stuff than I think I do.

Rohin Shah: I think safe exploration's a bad example. I don't know what safe exploration is even trying to solve. But for the other stuff, I agree, I do care about it more. One place where I somewhat disagree with you is your point that all these robustness problems are things that the rest of the world has incentives to figure out, and will probably figure out. That seems true for alignment too: it sure seems like you want your system to be aligned in order to do the things that you actually want, so everyone has an incentive for this to happen. I totally expect people who aren't EAs or rationalists or weird longtermists to be working on AI alignment in the future, and to some extent even now. I think that's one thing.

Buck Shlegeris: You should say your other thing, but then I want to get back to that point.

Rohin Shah: The other thing is I think I agree with you that it’s not clear to me how failures of the robustness of things other than motivation lead to x-risk, but I’m more optimistic than you are that our solutions to those kinds of robustness will help with the solutions to “motivation robustness” or how to make your mesa optimizer aligned.

Buck Shlegeris: Yeah, sorry, I guess I actually do agree with that last point. I am very interested in trying to figure out how to have aligned mesa optimizers, and I think that a reasonable strategy to pursue in order to get aligned mesa optimizers is trying to figure out how to make your image classifiers robust to adversarial examples. I think you probably won't succeed at the harder problem even if you succeed with the image classifiers, but the image classifiers are still probably where you should start. And I guess if we can't figure out how to make image classifiers robust to adversarial examples in like 10 years, I'm going to be super pessimistic about the harder robustness problem, and that would be great to know.

Rohin Shah: For what it's worth, my take on adversarial examples for image classifiers is: we're going to train image classifiers on more data with bigger nets, and it's mostly just going to go away. That's a prediction; I'm laying my cards on the table.

Buck Shlegeris: That’s also something like my guess.

Rohin Shah: Okay.

Buck Shlegeris: My prediction is that some combination of larger models, adversarial training, and a couple of other clever things will get us image classifiers that are robust to epsilon-ball perturbations or whatever, probably within 5 or 10 years at the latest.

Rohin Shah: Cool. And you wanted to return to the other point about the world having incentives to do alignment.

Buck Shlegeris: So I don't quite know how to express this, but I think it's really important, which is going to make this a really fun experience for everyone involved. You know how Airbnb… or sorry, I guess a better example of this is actually Uber drivers. I give basically every Uber driver a five star rating, even though some Uber drivers are just clearly more pleasant for me than others, and Uber doesn't seem to try very hard to get around these problems, even though I think that if Uber caused there to be a 30% difference in pay between the drivers who I think of as 75th percentile and the drivers I think of as 25th percentile, this would probably make the service noticeably better for me. It seems to me that a lot of the time the world just doesn't try to do the kind of complicated things that would make systems actually aligned; it just does hack jobs, and then everyone deals with the fact that everything is unaligned as a result.

To draw this analogy back: I think that we're likely to have the kind of alignment techniques that solve problems as simple and obvious as "we should have a way for you to rate your hosts on Airbnb." But I'm worried that we won't ever get around to solving the problems that are like: what if your hosts are incentivized to tell you sob stories so that you give them good ratings, even though they were actually worse than some other hosts? And this is never a big enough deal that people are unilaterally, individually incentivized to solve the harder version of the alignment problem, so everyone ends up using these systems that actually aren't aligned in the strong sense, and then we end up in a doomy world. I'm curious if any of that made any sense.

Lucas Perry: Is a simple way to put that, that we fall into an inadequate or suboptimal equilibrium, and then there's tragedy-of-the-commons and bad game theory stuff that happens that keeps us locked in, and that the same story could apply to alignment?

Buck Shlegeris: Yeah, that’s not quite what I mean.

Lucas Perry: Okay.

Rohin Shah: I think Buck's point is that Uber or Airbnb could unilaterally, with no coordination required, make their system better, and this would be an improvement for them and everyone else, and they don't do it. There is nothing about equilibrium in Uber's failure to do this thing that seems so obviously good.

Buck Shlegeris: I’m not actually claiming that it’s better for Uber, I’m just claiming that there is a misalignment there. Plausibly, an Uber exec, if they were listening to this they’d just be like, “LOL, that’s a really stupid idea. People would hate it.” And then they would say more complicated things like “most riders are relatively price sensitive and so this doesn’t matter.” And plausibly they’re completely right.

Rohin Shah: That’s what I was going to say.

Buck Shlegeris: But the thing which feels important to me is something like: a lot of the time it's not worth solving the alignment problems at any given moment, because something else is a bigger problem for how things are going locally. And this can continue being the case for a long time, and then you end up with everyone being locked into this system where they never solved the alignment problems. And it's really hard to make people understand this, and then you get locked into this bad world.

Rohin Shah: So if I were to try and put that in the context of AI alignment, I think this is a legitimate reason for being more pessimistic. And the way that I would make that argument is: it sure seems like we are going to decide on what method or path we’re going to use to build AGI. Maybe we’ll do a bunch of research and decide we’re just going to scale up language models or something like this. I don’t know. And we will do that before we have any idea of which technique would be easiest to align and as a result, we will be forced to try to align this exogenously chosen AGI technique and that would be harder than if we got to design our alignment techniques and our AGI techniques simultaneously.

Buck Shlegeris: I’m imagining some pretty slow take off here, and I don’t imagine this as ever having a phase where we built this AGI and now we need to align it. It’s more like we’re continuously building and deploying these systems that are gradually more and more powerful, and every time we want to deploy a system, it has to be doing something which is useful to someone. And many of the things which are useful, require things that are kind of like alignment. “I want to make a lot of money from my system that will give advice,” and if it wants to give good generalist advice over email, it’s going to need to have at least some implicit understanding of human preferences. Maybe we just use giant language models and everything’s just totally fine here. A really good language model isn’t able to give arbitrarily good aligned advice, but you can get advice that sounds really good from a language model, and I’m worried that the default path is going to involve the most popular AI advice services being kind of misaligned, and just never bothering to fix that. Does that make any more sense?

Rohin Shah: Yeah, I think I totally buy that that will happen. But I think I’m more like as you get to AI systems doing more and more important things in the world, it becomes more and more important that they are really truly aligned and investment in alignment increases correspondingly.

Buck Shlegeris: What’s the mechanism by which people realize that they need to put more work into alignment here?

Rohin Shah: I think there’s multiple. One is I expect that people are aware, like even in the Uber case, I expect people are aware of the misalignment that exists, but decide that it’s not worth their time to fix it. So the continuation of that, people will be aware of it and then they will decide that they should fix it.

Buck Shlegeris: If I’m trying to sell to city governments this language model based system which will give them advice on city planning, it’s not clear to me that at any point the city governments are going to start demanding better alignment features. Maybe that’s the way that it goes but it doesn’t seem obvious that city governments would think to ask that, and —

Rohin Shah: I wasn’t imagining this from the user side. I was imagining this from the engineers or designers side.

Buck Shlegeris: Yeah.

Rohin Shah: I think from the user side I would speak more to warning shots. You know, you have your cashier AI system or your waiter AIs, and they're optimizing for tips more than for actually collecting money, and so they offer free meals in order to get more tips. At some point one of these AI systems passes all of the internal checks and makes it out into the world, and only then does the problem arise, and everyone's like, "Oh my God, this is terrible. What the hell are you doing? Make this better."

Buck Shlegeris: There are two mechanisms via which that alignment might be okay. One of them is that researchers might realize that they want to put more effort into alignment and then solve these problems. The other mechanism is that users might demand better alignment because of warning shots. I don't buy that either of these is sufficient. I don't buy that it's sufficient for researchers to decide to do it, because in a competitive world, if the researchers who realize this is important try to only make aligned products, they are not going to be able to sell them, because their products will be much less good than the unaligned ones. So you have to argue that there is demand for the things which are actually aligned well. But for this to work, your users have to be able to distinguish between things that have good alignment properties and those which don't, and this seems really hard for users to do. And when I try to imagine analogies, I just don't see many examples of people successfully solving problems like this, like businesses making products with different levels of dangerousness, and then users successfully buying the safe ones.

Rohin Shah: I think usually what happens is you get regulation that forces everyone to be safe. I don’t know if it was regulation, but like airplanes are incredibly safe. Cars are incredibly safe.

Buck Shlegeris: Yeah, but in this case what would happen is that doing the unsafe thing allows you to make enormous amounts of money, and so the countries which don't put in the regulations are going to be massively advantaged compared to the ones which do.

Rohin Shah: Why doesn’t that apply for cars and airplanes?

Buck Shlegeris: So to start with, cars in poor countries are a lot less safe. Another thing is that a lot of the effort in making safer cars and airplanes comes from designing them. Once you've done the work of designing it, it's not that much more expensive to put your formally-verified 747 software into more planes, and because of weird features of the fact that there are only like two big plane manufacturers, everyone gets the safer planes.

Lucas Perry: So, tying this into robustness: the fundamental concern here is about the incentives to make systems whose safety and alignment are robust in the real world.

Rohin Shah: I think that’s basically right. I sort of see these incentives as existing and the world generally being reasonably good at dealing with high stakes problems.

Buck Shlegeris: What’s an example of the world being good at dealing with a high stakes problem?

Rohin Shah: I feel like biotech seems reasonably well handled, relatively speaking.

Buck Shlegeris: Like bio-security?

Rohin Shah: Yeah.

Buck Shlegeris: Okay, if the world handles AI as well as bio-security, there’s no way we’re okay.

Rohin Shah: Really? I’m aware of ways in which we’re not doing bio-security well, but there seem to be ways in which we’re doing it well too.

Buck Shlegeris: The nice thing about bio-security is that very few people are incentivized to kill everyone, and this means that it’s okay if you’re sloppier about your regulations, but my understanding is that lots of regulations are pretty weak.

Rohin Shah: I guess I was more imagining the research community’s coordination on this. Surprisingly good.

Buck Shlegeris: I wouldn’t describe it that way.

Rohin Shah: It seems like the vast majority of the research community is onboard with the right thing and like 1% isn’t. Yeah. Plausibly we need to have regulations for that last 1%.

Buck Shlegeris: I think that 99% of the synthetic biology research community is on board with “it would be bad if everyone died.” I think that some very small proportion is onboard with things like “we shouldn’t do research if it’s very dangerous and will make the world a lot worse.” I would say like way less than half of synthetic biologists seem to agree with statements like “it’s bad to do really dangerous research.” Or like, “when you’re considering doing research, you consider differential technological development.” I think this is just not a thing biologists think about, from my experience talking to biologists.

Rohin Shah: I’d be interested in betting with you on this afterwards.

Buck Shlegeris: Me too.

Lucas Perry: So it seems like it’s going to be difficult to come down to a concrete understanding or agreement here on the incentive structures in the world and whether they lead to the proliferation of unaligned AI systems or semi aligned AI systems versus fully aligned AI systems and whether that poses a kind of lock-in, right? Would you say that that fairly summarizes your concern Buck?

Buck Shlegeris: Yeah. I expect that Rohin and I agree mostly on the size of the coordination problem required, or the costs that would be required by trying to do things the safer way. And I think Rohin is just a lot more optimistic about those costs being paid.

Rohin Shah: I think I’m optimistic both about people’s ability to coordinate paying those costs and about incentives pointing towards paying those costs.

Buck Shlegeris: I think that Rohin is right that I disagree with him about the second of those as well.

Lucas Perry: Are you interested in unpacking this anymore? Are you happy to move on?

Buck Shlegeris: I actually do want to talk about this for two more minutes. I am really surprised by the claim that humans have solved coordination problems as hard as this one. I think the example you gave is humans doing radically nowhere near well enough. What are examples of coordination problem type things… There was a bunch of stuff with nuclear weapons, where I feel like humans did badly enough that we definitely wouldn’t have been okay in an AI situation. There are a bunch of examples of the US secretly threatening people with nuclear strikes, which I think is an example of some kind of coordination failure. I don’t think that the world has successfully coordinated on never threaten first nuclear strikes. If we had successfully coordinated on that, I would consider nuclear weapons to be less of a failure, but as it is the US has actually according to Daniel Ellsberg threatened a bunch of people with first strikes.

Rohin Shah: Yeah, I think I update less on specific scenarios and quite a lot more on the fact that it just never happened. Given the level of "oh my God, there were close calls multiple times a year for many decades," the sheer amount of coincidence that would be required seems totally implausible, and it just means that our understanding of what's happening is wrong.

Buck Shlegeris: Again, also the thing I’m imagining is this very gradual takeoff world where people, every year, they release their new most powerful AI systems. And if, in a particular year, AI Corp decided to not release its thing, then AI Corps two and three and four would rise to being one, two and three in total profits instead of two, three and four. In that kind of a world, I feel a lot more pessimistic.

Rohin Shah: I’m definitely imagining more of the case where they coordinate to all not do things. Either by international regulation or via the companies themselves coordinating amongst each other. Even without that, it’s plausible that AI Corp one does this. One example I’d give is, Waymo has just been very slow to deploy self driving cars relative to all the other self driving car companies, and my impression is that this is mostly because of safety concerns.

Buck Shlegeris: Interesting and slightly persuasive example. I would love to talk through this more at some point. I think this is really important and I think I haven’t heard a really good conversation about this.

Apologies for describing what I think is going wrong inside your mind or something, which is generally a bad way of saying things, but it sounds kind of to me like you’re implicitly assuming more concentrated advantage and fewer actors than I think actually are implied by gradual takeoff scenarios.

Rohin Shah: I’m usually imagining something like a 100+ companies trying to build the next best AI system, and 10 or 20 of them being clear front runners or something.

Buck Shlegeris: That makes sense. I guess I don’t quite see how the coordination successes you were describing arise in that kind of a world. But I am happy to move on.

Lucas Perry: So before we move on from this point, is there anything which you would suggest as obvious solutions, should Buck's model of the risks here be correct? It seems like it would demand more centralized institutions, which would help to mitigate some of the lock-in here.

Rohin Shah: Yeah. So there’s a lot of work in policy and governance about this. Not much of which is public unfortunately. But I think the thing to say is that people are thinking about it and it does sort of look like trying to figure out how to get the world to actually coordinate on things. But as Buck has pointed out, we have tried to do this before and so there’s probably a lot to learn from past cases as well. But I am not an expert on this and don’t really want to talk as though I were one.

Lucas Perry: All right. So there’s lots of governance and coordination thought that kind of needs to go into solving many of these coordination issues around developing beneficial AI. So I think with that we can move along now to scaling to superhuman abilities. So Rohin, what do you have to say about this topic area?

Rohin Shah: I think this is in some sense related to what we were talking about before: you can predict what a human would say, but it's hard to back out the true underlying values beneath that. Here the problem is, suppose you are learning from some sort of human feedback about what you're supposed to be doing. The information contained in that tells you how to do whatever the human can do. It doesn't really tell you how to exceed what the human can do without some additional assumptions.

Now, depending on how the human feedback is structured, this might lead to different things. If the human is demonstrating how to do the task, then it will be hard to do the task any better than the human can; but if the human is evaluating how well you did the task, then you can do the task better than the human could demonstrate, in ways the human wouldn't have come up with but can still recognize as better. Ideally, at some point we would like to have AI systems that can do really powerful, great things that we are unable to understand all the details of, so we would be able neither to demonstrate nor to evaluate them.

How do we get to those sorts of AI systems? The main proposals in this bucket are iterated amplification, debate, and recursive reward modeling. In iterated amplification, we start with an initial policy and alternate between amplification and distillation, which increase capabilities and efficiency respectively. This can encode a bunch of different algorithms, but usually amplification is done by decomposing questions into easier sub-questions and then using the agent to answer those sub-questions, while distillation can be done using supervised learning or reinforcement learning: you get these answers that are created by the amplified systems that take a long time to run, and you train a neural net to quickly predict those answers without having to do the whole big decomposition.

In debate, we train an agent through self-play in a zero-sum game where the agent's goal is to win a question-answering debate as evaluated by a human judge. The hope is that since both sides of the debate can point out flaws in the other side's arguments (they're both very powerful AI systems), such a setup can use a human judge to train far more capable agents while still incentivizing the agents to provide honest, true information.

With recursive reward modeling, you can think of it as an instantiation of the general alternate-between-amplification-and-distillation framework, but it works bottom up instead of top down. You start by building AI systems that can help you evaluate simple, easy tasks, then use those AI systems to help you evaluate more complex tasks, and you keep iterating this process until eventually you have AI systems that help you with very complex tasks, like how to design a city. This lets you train an AI agent that can design the city effectively, even though you don't totally understand why it's doing the things it's doing or why they're even good.
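A toy of the amplification/distillation loop may help make the shape of the proposal concrete. Here "amplification" answers a hard question by decomposing it into easier sub-questions answered by the current agent, and "distillation" just memorizes the amplified answers, standing in for training a fast model on them. This is purely illustrative: it is not how iterated amplification would actually be implemented, and arithmetic stands in for genuinely hard questions.

```python
# Toy iterated amplification on questions of the form "what is a + (a+1) + ... + b?",
# represented as the pair (a, b). Purely illustrative.

def base_agent(question):
    """The initial weak agent: only knows the trivial single-number case."""
    a, b = question
    return a if a == b else None

def amplify(agent, question):
    """Answer a hard question by decomposing it into two easier halves."""
    direct = agent(question)
    if direct is not None:
        return direct
    a, b = question
    mid = (a + b) // 2
    return amplify(agent, (a, mid)) + amplify(agent, (mid + 1, b))

def distill(agent, questions):
    """Stand-in for training: make the amplified answers cheap to produce."""
    table = {q: amplify(agent, q) for q in questions}
    def faster_agent(question):
        return table.get(question, agent(question))
    return faster_agent

agent = base_agent
for _ in range(3):                                    # alternate amplification and distillation
    agent = distill(agent, [(1, n) for n in range(1, 21)])

print(agent((1, 10)))  # 55, now answered directly by the distilled agent
```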

Lucas Perry: Do either of you guys have any high level thoughts on any of these approaches to scaling to superhuman abilities?

Buck Shlegeris: I have some.

Lucas Perry: Go for it.

Buck Shlegeris: So to start with, I think it's worth noting that another approach here would be ambitious value learning. I would phrase these not simply as approaches for scaling to superhuman abilities, but as approaches for scaling to superhuman abilities while only relying on the actual behavior of humans, rather than trying to back out their values explicitly. Does that match your picture, Rohin?

Rohin Shah: Yeah, I agree. I often phrase that as with ambitious value learning, there’s not a clear ground truth to be focusing on, whereas with all three of these methods, the ground truth is what a human would do if they got a very, very long time to think or at least that is what they’re trying to approximate. It’s a little tricky to see why exactly they’re approximating that, but there are some good posts about this. The key difference between these techniques and ambitious value learning is that there is in some sense a ground truth that you are trying to approximate.

Buck Shlegeris: I think these are all kind of exciting ideas. I think they’re all kind of better ideas than I expected to exist for this problem a few years ago. Which probably means we should update against my ability to correctly judge how hard AI safety problems are, which is great news, in as much as I think that a lot of these problems are really hard. Nevertheless, I don’t feel super optimistic that any of them are actually going to work. One thing which isn’t in the elevator pitch for IDA, which is iterated distillation and amplification (and debate), is that you get to hire the humans who are going to be providing the feedback, or the humans whose answers AI systems are going to be trained with. And this is actually really great. Because for instance, you could have this program where you hire a bunch of people and you put them through your one month long training an AGI course. And then you only take the top 50% of them. I feel a lot more optimistic about these proposals given you’re allowed to think really hard about how to set it up such that the humans have the easiest time possible. And this is one reason why I’m optimistic about people doing research in factored cognition and stuff, which I’m sure Rohin’s going to explain in a bit.

One comment about recursive reward modeling: it seems like it has a lot in common with IDA. The main downside it seems to have to me is that the human is in charge of figuring out how to decompose the task into evaluations at a variety of levels, whereas with IDA, the system itself is able to naturally decompose the task into a variety of levels, and for this reason I feel a bit more optimistic about IDA.

Rohin Shah: With recursive reward modeling, one agent that you can train is just an agent that’s good at doing decompositions. That is a thing you can do with it. It’s a thing that the people at DeepMind are thinking about. 

Buck Shlegeris: Yep, that’s a really good point. 

Rohin Shah: I also strongly like the fact that you can train your humans to be good at providing feedback. This is also true of specification learning. It's less clear if it's true of ambitious value learning; no one's really proposed how you could do ambitious value learning. Maybe arguably Stuart Russell's book is kind of a proposal, but it doesn't have that many details.

Buck Shlegeris: And, for example, it doesn’t address any of my concerns in ways that I find persuasive.

Rohin Shah: Right. But for specification learning also you definitely want to train the humans who are going to be providing feedback to the AI system. That is an important part of why you should expect this to work.

Buck Shlegeris: I often give talks where I try to give an introduction to IDA and debate as proposals for AI alignment. I'm giving these talks to people with computer science backgrounds, and they're almost always incredibly skeptical that it's actually possible to decompose thought in this kind of way. And with debate, they're very skeptical that truth wins, or that the Nash equilibrium is accuracy. For this reason I'm super enthusiastic about research into the factored cognition hypothesis, of the type that Ought is doing some of.

I’m kind of interested in your overall take for how likely it is that the factored cognition hypothesis holds and that it’s actually possible to do any of this stuff, Rohin. You could also explain what that is.

Rohin Shah: I'll do that. So basically, iterated amplification, debate, and recursive reward modeling all hinge on this idea of being able to decompose questions. Maybe it's not so obvious why that's true for debate, but it's true; go listen to the podcast about debate if you want more details on that.

The hypothesis is basically that for any task we care about, it is possible to decompose it into a bunch of subtasks that are all easier to do, such that if you're able to do the subtasks, then you can do the overall top-level task. In particular, you can iterate this downward, building a tree of smaller and smaller tasks until you get to tasks that a human could do in a day, or, if you push it further, tasks that a human can do in a couple of minutes. Whether or not you can actually decompose the task "be an effective CEO" into subtasks that eventually bottom out in things humans can do in a few minutes is totally unclear. Some people are optimistic, some people are pessimistic. It's called the factored cognition hypothesis, and Ought is an organization that's studying it.

It sounds very controversial at first, and I, like many other people, had the intuitive reaction of, "oh my God, this is never going to work, and it's not true." I think the thing that actually makes me optimistic about it is that you don't have to do what you might call a direct decomposition. If your task is to be an effective CEO, your first sub-question could be "what are the important things to think about when being a CEO?", as opposed to the object-level decompositions I would usually think of: first I need to deal with hiring, maybe I need to understand HR, maybe I need to understand all of the metrics that the company is optimizing. The decompositions are totally allowed to be meta-level as well, where you spin off a bunch of computation that is just trying to answer the meta-level question of how you should best think about this question at all.

Another important reason for optimism is that, based on the structure of iterated amplification, debate, and recursive reward modeling, this tree can be gigantic. It can be exponentially large: something that we couldn't run even if we had all of the humans on Earth collaborating to do it. That's okay, given how the training process is structured. Considering that you can do the equivalent of millennia of person-years of effort in this decomposed tree, I think that also gives me more of an "okay, maybe this is possible," and that's also why you're able to do all of this meta-level thinking: you have a computational budget for it. When you take all of those together, I come up with "seems possible, I don't really know."
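For a sense of scale on the "exponentially large tree" point: if each task decomposes into a handful of subtasks and you go a few dozen levels deep, the implicit tree is far bigger than anything actual humans could run, which is why it only ever exists implicitly through training. The branching factor and time-per-leaf below are made-up illustrative numbers.

```python
# Back-of-the-envelope size of a full factored-cognition tree.
# Branching factor and minutes-per-leaf are illustrative assumptions.
def leaves(depth, branching=4):
    """Number of leaf-level 'few-minute human tasks' in a full decomposition."""
    return branching ** depth

MINUTES_PER_LEAF = 5
MINUTES_PER_PERSON_YEAR = 365 * 24 * 60  # calendar minutes in a year

for depth in (5, 10, 20):
    n = leaves(depth)
    person_years = n * MINUTES_PER_LEAF / MINUTES_PER_PERSON_YEAR
    print(f"depth {depth}: {n:.2e} leaves, about {person_years:.3g} person-years of effort")
```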

Buck Shlegeris: I think I’m currently at 30-to-50% on the factored cognition thing basically working out. Which isn’t nothing.

Rohin Shah: Yeah, that seems like a perfectly reasonable thing. I think I could imagine putting a day of thought into it and coming up with numbers anywhere between 20 and 80.

Buck Shlegeris: For what it’s worth, in conversation at some point in the last few years, Paul Christiano gave numbers that were not wildly more optimistic than me. I don’t think that the people who are working on this think it’s obviously fine. And it would be great if this stuff works, so I’m really in favor of people looking into it.

Rohin Shah: Yeah, I should mention another key intuition against it. We have all these examples of human geniuses like Ramanujan, who were posed very difficult math problems and just immediately got the answer. Then you ask them how they did it, and they say, well, I asked myself what the answer should be, and I saw that the answer should be a continued fraction, and then I asked myself which continued fraction, and then I got the answer. And you're like, that does not sound very decomposable; it seems like you need these magic flashes of intuition. Those would be the hard cases for factored cognition. It still seems possible that you could do it, both by this exponential search over a bunch of possibilities and by being able to discover intuitions that work in practice, believing them because they work in practice, and then applying them to the problem at hand. You could imagine that with enough computation you'd be able to discover such intuitions.

Buck Shlegeris: You can't answer a math problem by searching exponentially much through the search tree. The only exponential power you get from IDA is that it lets you specify the output of your cognitive process in a way that matches some exponentially sized human process. As long as that exponentially sized human process was only exponentially sized because it's really inefficient, but is fundamentally not an exponentially sized problem, then your machine learning should be able to speed it up a bunch. But the thing where you search over search strategies is not valid. If that's all you can do, that's not good enough.

Rohin Shah: Searching over search strategies, I agree you can't do. But if you have an exponential search that could be implemented by humans, then we know by hypothesis that, if you can solve the problem with a flash of intuition, there is in fact some more efficient way to do it. Whether or not the distillation steps will actually be enough to get to the point where you can do those flashes of intuition, that's an open question.

Buck Shlegeris: This is one of my favorite areas of AI safety research and I would love for there to be more of it. Something I have been floating for a little while is I kind of wish that there was another Ought. It just seems like it would be so good if we had definitive information about the factored cognition hypothesis. And it also it seems like the kind of thing which is potentially parallelizable. And I feel like I know a lot of people who love talking about how thinking works. A lot of rationalists are really into this. I would just be super excited for some of them to form teams of four and go off on their own and build an Ought competitor. I feel like this is the kind of thing where plausibly, a bunch of enthusiastic people could make progress on their own.

Rohin Shah: Yeah, I agree with that. Definitely seems like one of the higher value things but I might be more excited about universality.

Lucas Perry: All right, well let’s get started with universality then. What is universality and why are you optimistic about it?

Rohin Shah: So universality is hard to explain well in a single sentence, but roughly: for whatever supervisor is training our agent, you want that supervisor to "know everything the agent knows." In particular, if the agent comes up with some deceptive strategy to look like it's achieving the goal when it actually isn't, the supervisor should know that it was using this deceptive strategy in order to trick the supervisor, and the supervisor can then penalize it. The classic example of why this is important and hard, also due to Paul Christiano, is plagiarism. Suppose you are training an AI system to produce novel works of literature, and as part of its training data, the AI system gets to read a library of a million books.

It's possible that this AI system decides: hey, you know, the best way I can make a great-seeming novel is to just take these five books, pull plot points and passages from each of them, and put them together; this new book will look totally novel and will be very good because I used all of the best Shakespearean writing or whatever. If the supervisor doesn't know that the agent has done this, the only way it can really check is to go read the entire million books, even if the agent only read ten of them, and so supervision becomes way more costly than running the agent, which is not a great state to be in. What you really want is that if the agent does this, the supervisor is able to say: I see that you just copied this stuff over from these other books in order to trick me into thinking that you had written something novel that was good.

That's bad; I'm penalizing you; stop doing that in the future. Now, this sort of property is very nice in the abstract, but who knows whether we can actually build it in practice. There is some reason for optimism that I don't think I can adequately convey here; I wrote a newsletter summarizing some of it a while ago, and reading through the posts I became more optimistic that it's an achievable property than when I first heard what the property was. The reason I'm optimistic about it is that it seems to capture the thing we actually care about. It's not everything: it doesn't solve the robustness problem. Universality only tells you about what the agent is currently doing; you know all the facts about that. For robustness, you want to say that even in hypothetical situations the agent hasn't encountered yet and doesn't know about, it's going to stay aligned with you. So universality doesn't get you all the way there, but it definitely feels like it gets you quite a bit.

Buck Shlegeris: That’s really interesting to hear you phrase it that way. I guess I would have thought of universality as a subset of robustness. I’m curious what you think of that first.

Rohin Shah: I definitely think you could use universality to achieve a subset of robustness. Maybe I would say universality is a subset of interpretability.

Buck Shlegeris: Yeah, and I care about interpretability as a subset of robustness basically, or as a subset of inner alignment, which is pretty close to robustness in my mind. The other thing I would say is you were saying there that one difference between universality and robustness is that universality only tells you why the agent did the thing it currently did, and this doesn’t suffice to tell us about the situations that the agent isn’t currently in. One really nice thing though is that if the agent is only acting a particular way because it wants you to trust it, that’s a fact about its current behavior that you will know, and so if you have the universality property, your overseer just knows your agent is trying to deceive it. Which seems like it would be incredibly great and would resolve like half of my problem with safety if you had it.

Rohin Shah: Yeah, that seems right. The case that universality doesn’t cover is when your AI system is initially not deceptive, but then at some point in the future it’s like, ‘Oh my God, now it’s possible to go and build Dyson spheres or something, but wait, in this situation probably I should be doing this other thing and humans won’t like that. Now I better deceive humans’. The transition into deception would have to be a surprise in some sense even to the AI system.

Buck Shlegeris: Yeah, I guess I'm just not worried about that. Suppose I have this system which is as smart as a reasonably smart human, or ten reasonably smart humans, but it's not as smart as the whole world. If I can just ask it for its best guess about how aligned it is, and if I can trust its answer? I don't know, man, I'm pretty okay with systems that think they're aligned answering that question honestly.

Rohin Shah: I think I somewhat agree. I like this reversal where I’m the pessimistic one.

Buck Shlegeris: Yeah me too. I’m like, “look, system, I want you to think as hard as you can to come up with the best arguments you can come up with for why you are misaligned, and the problems with you.” And if I just actually trust the system to get this right, then the bad outcomes I get here are just pure accidents. I just had this terrible initialization of my neural net parameters, such that I had this system that honestly believed that it was going to be aligned. And then as it got trained more, this suddenly changed and I couldn’t do anything about it. I don’t quite see the story for how this goes super wrong. It seems a lot less bad than the default situation.

Rohin Shah: Yeah. I think the story I would tell is something like, well, if you look at humans, they’re pretty wrong about what their preferences will be in the future. For example, there’s this trope of how teenagers fall in love and then fall out of love, but when they’re in love, they swear undying oaths to each other or something. To the extent that is true, that seems like the sort of failure that could lead to x-risk if it also happened with AI systems.

Buck Shlegeris: I feel pretty optimistic about all the garden-variety approaches to solving this. Teenagers were not selected very hard on accuracy of their undying oaths. And if you instead had accuracy of self-model as a key feature you were selecting for in your AI system, plausibly you’ll just be way more okay.

Rohin Shah: Yeah. Maybe, if people could coordinate well on this. I feel less good about people coordinating on this sort of problem.

Buck Shlegeris: For what it’s worth, I think there are coordination problems here, and I feel like my previous argument about why coordination is hard and won’t happen by default probably also applies here, suggesting we won’t be okay. I’m not sure how this all plays out. I’d have to think about it more.

Rohin Shah: Yeah. I think it’s more that this is a subtle and non-obvious problem, which by hypothesis doesn’t happen in the systems you actually have and only happens later. Those are the sorts of problems I’m like, “Ooh, not sure if we can deal with those ones.” But I agree that there’s a good chance that there’s just not a problem at all in the world where we already have universality and checked all the obvious stuff.

Buck Shlegeris: Yeah. I would like to say universality is one of my other favorite areas of AI alignment research, in terms of how happy I’d be if it worked out really well.

Lucas Perry: All right, so let’s see if we can slightly pick up the pace here. Moving forward and starting with interpretability.

Rohin Shah: Yeah, I think we’ve basically discussed interpretability already. Universality is a specific kind of interpretability, but the case for interpretability is just that it sure seems like it would be good if you could understand what your AI systems are doing. You could then notice when they’re not aligned and fix that somehow. It’s a pretty clear-cut case for a thing that would be good if we achieved it, and it’s still pretty uncertain how likely we are to be able to achieve it.

Lucas Perry: All right, so let’s keep it moving and let’s hit impact regularization now.

Rohin Shah: Yeah, impact regularization in particular is one of the ideas that are not trying to align the AI system, but are instead trying to say: well, whatever AI system we build, let’s make sure it doesn’t cause a catastrophe; it doesn’t lead to extinction or existential risk. What it hopes to do is say: all right, AI system, do whatever it is you wanted to do, I don’t care about that. Just make sure that you don’t have a huge impact upon the world.

Whatever you do, keep your impact not too high. There’s been a lot of work on this in recent years: relative reachability, attainable utility preservation, and I think the general sense is, wow, it’s gone quite a bit further than people expected it to go. It definitely does prevent you from doing very, very powerful things. If you wanted to stop all competing AI projects from ever being able to build AGI, that doesn’t seem like the sort of thing you can do with an impact-regularized AI system. But it seems plausible that you could prevent convergent instrumental subgoals using impact regularization. For AI systems that are trying to steal resources and power from humans, you could imagine saying, hey, don’t have that level of impact; you can still have the level of impact of, say, running a company or something like that.
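
To make the attainable utility idea concrete, here is a minimal sketch in Python of an impact-penalized reward on a toy three-state environment. It is only an illustration, not any published algorithm as specified: real attainable utility preservation compares against a no-op baseline and uses Q-values, and the environment, auxiliary reward functions, and penalty weight below are all made up for the example.

```python
import numpy as np

def optimal_values(transitions, rewards, gamma=0.9, iters=200):
    """Value iteration for a small deterministic MDP.

    transitions[a, s] is the state reached by taking action a in state s;
    rewards[s] is the reward for being in state s.
    Returns the optimal state values.
    """
    values = np.zeros(transitions.shape[1])
    for _ in range(iters):
        # Q[a, s]: reward of the landing state plus its discounted value.
        q = rewards[transitions] + gamma * values[transitions]
        values = q.max(axis=0)
    return values

def penalized_reward(task_reward, state, next_state, aux_values, weight=1.0):
    """Task reward minus a penalty for changing attainable (auxiliary) utility.

    aux_values is a list of optimal value functions, one per auxiliary reward
    function; the penalty is the mean absolute change in how well the agent
    could still do on each auxiliary goal.
    """
    penalty = np.mean([abs(v[next_state] - v[state]) for v in aux_values])
    return task_reward - weight * penalty

# Toy 3-state world: action 0 stays put, action 1 moves forward (0 -> 1 -> 2),
# action 2 presses an irreversible "button" that traps the agent in state 2.
transitions = np.array([
    [0, 1, 2],  # action 0: stay
    [1, 2, 2],  # action 1: move forward
    [2, 2, 2],  # action 2: irreversible button
])

# Two made-up auxiliary goals: "be in state 0" and "be in state 1".
aux_rewards = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
aux_values = [optimal_values(transitions, r) for r in aux_rewards]

# From state 0, pressing the button is penalized more than moving forward,
# because it destroys the ability to achieve both auxiliary goals.
print(penalized_reward(1.0, state=0, next_state=1, aux_values=aux_values))
print(penalized_reward(1.0, state=0, next_state=2, aux_values=aux_values))
```

Running it, the irreversible “button” action picks up a larger penalty than the ordinary “move forward” action, which is the qualitative behavior described above.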

Buck Shlegeris: My take on all this is that I’m pretty pessimistic about all of it working. I think that impact regularization is a non-optimal point on the capabilities/alignment tradeoff, in terms of how much safety you’re getting for how much capability you’re sacrificing. My problem here is basically analogous to my problem with value learning: I think we’re trying to take these essentially fuzzy concepts and then factor our agent through these fuzzy concepts like impact. The thing that I imagine happening is that any impact regularization strategy you try to employ, if your AI is usable, will end up not helping with its alignment. For any definition of impact you come up with, it’ll end up doing something which gets around that. Or it’ll make your AI system completely useless, is my basic guess as to what happens.

Rohin Shah: Yeah, so again, if you formalize it and then consider the optimal agent, sure, that can totally get around your impact penalty. But in practice what you want to do is say: this convergent instrumental subgoal stuff, don’t do any of that; continue to do things that are normal in regular life. Those seem like pretty distinct categories, such that I would not be shocked if we could actually distinguish between the two.

Buck Shlegeris: It sounds like the main benefit you’re going for is trying to make your AI system not do insane, convergent instrumental subgoal-style stuff. So another approach I can imagine taking here would be some kind of value learning, where you’re asking humans for feedback on whether plans are insanely convergent-instrumental-subgoal-style, and just not doing the things which, when humans are asked to rate how sketchy the plans are, get rated as sufficiently sketchy. That seems like about as good a plan. I’m curious what you think.

Rohin Shah: The idea of power as your attainable utility across a wide variety of utility functions seems like a pretty good formalization to me. I think in the worlds where I actually buy a formalization, I tend to expect the formalization to work better. I do think the formalization is not perfect. Most notably, with the current formalization of power, your power never changes if you have extremely good beliefs: you’re just like, I always have the same power, because I’m always able to do the same things and you never get surprised. So maybe I agree with you because I think the current formalization is not good enough. (The struck-through section has been redacted by Rohin. It’s incorrect and you can see why here.) Yeah, I think I agree with you, but I could see it going either way.
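
For reference, the formalization being gestured at here defines the power of a state roughly as the agent’s average ability to achieve goals, taken over some distribution of reward functions. Ignoring normalization details, a sketch of that definition is:

```latex
% Power of state s, up to normalization: the expected optimal value
% attainable from s, averaged over a distribution D of reward functions R.
\mathrm{POWER}_{\mathcal{D}}(s) \;\propto\; \mathbb{E}_{R \sim \mathcal{D}}\left[ V^{*}_{R}(s) \right]
```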

Buck Shlegeris: I could be totally wrong about this, and correct me if I’m wrong, but my sense is that you have to be able to back out the agent’s utility function or its models of the world, which seems like it’s assuming a particular path for AI development that doesn’t seem to me particularly likely.

Rohin Shah: I definitely agree with that for all the current methods too.

Buck Shlegeris: So it’s like: assume that we have already perfectly solved our problems with universality and robustness and transparency and whatever else. I feel like you kind of have to have solved all of those problems before you can do this, and then you don’t need it or something.

Rohin Shah: I don’t think I agree with that. I definitely agree that the current algorithms that people have written assume that you can just make a change to the AI’s utility function. I don’t think that’s what even their proponents would suggest as the actual plan.

Buck Shlegeris: What is the actual plan?

Rohin Shah: I don’t actually know what their actual plan would be, but one plan I could imagine is: figure out exactly what the conceptual things we have to do for impact measurement are. Then, whatever method we have for building AGI, there’s probably going to be some part which is “specify the goal,” and in that part, instead of just saying “pursue X,” we want to say “pursue X without changing your ability to pursue Y, and Z, and W, and P, and Q.”

Buck Shlegeris: I think that that does not sound like a good plan. I don’t think that we should expect our AI systems to be structured that way in the future.

Rohin Shah: Plausibly we have to do this with natural language or something.

Buck Shlegeris: It seems very likely to me that the thing you do is reinforcement learning where, at the start of the episode, you get a sentence of English which is telling you what your goal is, and then blah, blah, blah, blah, blah. This seems like a pretty reasonable strategy for making powerful and sort of aligned AI, aligned enough to be usable for things that aren’t very hard. But you just fundamentally don’t have access to the internal representations that the AI is using for its beliefs and stuff like that. And that seems like a really big problem.

Rohin Shah: I definitely see this as more of an outer alignment thing, or an easier-to-specify outer alignment type thing than, say, IDA-type stuff.

Buck Shlegeris: Okay, I guess that makes sense. So we’re just like assuming we’ve solved all the inner alignment problems?

Rohin Shah: In the story so far, yeah. I think all of the researchers who actually work on this haven’t thought much about inner alignment.

Buck Shlegeris: My overall summary is that I really don’t like this plan. I feel like it’s not robust to scale. As you were saying, Rohin, if your system gets more and more accurate beliefs, stuff breaks. It just feels like the kind of thing that doesn’t work.

Rohin Shah: I mean, it’s definitely not conceptually neat and elegant in the sense of it’s not attacking the underlying problem. And in a problem setting where you expect adversarial optimization type dynamics, conceptual elegance actually does count for quite a lot in whether or not you believe your solution will work.

Buck Shlegeris: I feel it’s like trying to add edge detectors to your image classifiers to make them more adversarially robust or something, which is backwards.

Rohin Shah: Yeah, I think I agree with that general perspective. I don’t actually know if I’m more optimistic than you. Maybe we’d have the same uncertainty distributions and you just state yours more strongly or something.

Lucas Perry: All right, so then let’s just move a little quickly through the next three, which are causal modeling, oracles, and decision theory.

Rohin Shah: Yeah, well, decision theory: MIRI did some work on it. I am not the person to ask about it, so I’m going to skip that one. Even if you look at the long version, I’m just like, here are some posts, good luck. For causal modeling, I don’t fully understand what the overall story is, but the actual work that’s been published is basically this: we can take potential plans or training processes for AI systems, write down causal models that tell us how the various pieces of the training system interact with each other, and then, using algorithms developed for causal models, tell when an AI system would have an incentive to either observe or intervene on an underlying variable.

One thing that came out of this was that you can build a model-based reinforcement learner that doesn’t have any incentive to wirehead, as long as, when it makes its plans, the plans are evaluated by the current reward function as opposed to whatever future reward function it would have. And that was explained using this framework of causal modeling. Oracles are basically the idea that we can train an AI system to just answer questions: give it a question and it tries to figure out the best answer it can to that question, prioritizing accuracy.

One worry that people have recently been talking about is that the predictions the Oracle makes then affect the world, which can affect whether or not the prediction was correct. Maybe if I predict that I will go to bed at 11, then I’m more likely to actually go to bed at 11, because I want my prediction to come true. So the Oracle can still “choose” between different self-confirming predictions, and that gives it a source of agency. One way that people want to avoid this is using what are called counterfactual Oracles, where you set up the training such that the Oracle is basically making predictions under the assumption that its predictions are not going to influence the future.
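
To illustrate the distinction, here is a small toy simulation in Python. Everything about it is invented for the example (the “bedtime” dynamics, the grid of candidate predictions, the scoring); it is not how any proposed oracle would actually be trained. The point is just that when announced predictions influence the outcome, an ordinary accuracy-scored oracle can find several perfectly self-confirming answers to choose among, while an oracle scored only on hidden-prediction episodes simply estimates the no-influence outcome.

```python
import numpy as np

rng = np.random.default_rng(0)

def bedtime(announced=None):
    """Toy world. With no announcement, bedtime is around 11pm with some noise.
    If a prediction is announced, the sleeper commits to whichever round hour
    (10pm = 22.0 or midnight = 24.0) is closer, so the prediction shapes the
    outcome."""
    if announced is None:
        return 23.0 + rng.normal(0, 0.5)
    return 22.0 if announced < 23.0 else 24.0

candidates = np.linspace(22.0, 24.0, 81)

# Ordinary oracle: scored against outcomes its announced prediction influenced.
# Several predictions are perfectly self-confirming, so the oracle gets to
# "choose" among them, which is the source of agency described above.
ordinary_scores = np.array([(bedtime(p) - p) ** 2 for p in candidates])
self_confirming = candidates[ordinary_scores < 1e-9]

# Counterfactual oracle: scored only on "erasure" episodes where the prediction
# is never shown, so it just estimates the no-influence outcome.
hidden = np.array([bedtime(None) for _ in range(1000)])
cf_scores = np.array([np.mean((hidden - p) ** 2) for p in candidates])
cf_choice = candidates[int(np.argmin(cf_scores))]

print("self-confirming predictions:", self_confirming)    # [22. 24.]
print(f"counterfactual oracle predicts ~{cf_choice:.2f}")  # close to 23.0
```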

Lucas Perry: Yeah, okay. Oracles seem like they just won’t happen. There’ll be incentives to make things other than Oracles, and Oracles would even be able to exert influence upon the world in other ways.

Rohin Shah: Yeah, I think I agree that Oracles do not seem very competitive.

Lucas Perry: Let’s do forecasting now then.

Rohin Shah: So, within forecasting, the main sub-areas: first, there’s just been a lot of work recently on actually building good forecasting technology. There has been an AI-specific version of Metaculus that’s been going on for a while now. There’s been some work at the Future of Humanity Institute on building better tools for working with probability distributions and on recording and evaluating forecasts. There was an AI resolution council, where basically you can now make forecasts about what this particular group of people will think in five years or something like that, which is much easier to operationalize than most other kinds of forecasts. So this helps with constructing good questions. On the actual object level, I think there are two main things. One is that it became increasingly obvious in the past two years that AI progress currently is being driven by larger and larger amounts of compute.

It totally could be driven by other things as well, but at the very least, compute is a pretty important factor. And then takeoff speeds. There’s been this long debate in the AI safety community over whether, to take the extremes, we should expect that AI capabilities will see a very sharp spike. So initially your AI capabilities are improving by, like, one unit a year; maybe then with some improvements it gets to two units a year; and then for whatever reason, suddenly they’re at 20 units a year, or a hundred units a year, and they just swoop way past what you would get by extrapolating past trends. That’s what we might call a discontinuous takeoff. If you predict that that won’t happen, instead you’ll get AI that’s initially improving at one unit per year, then maybe two units per year, maybe three units per year, then five units per year, and the rate of progress continually increases. The world still gets very, very crazy, but in a sort of gradual, continuous way. That would be called a continuous takeoff.

Basically, there were two posts that argued pretty forcefully for continuous takeoff back in, I want to say, February of 2018, and this at least made me believe that continuous takeoff was more likely. Sadly, we just haven’t actually seen much defense of the other side of the view since then, even though we do know that there definitely are people who still believe the other side: that there will be a discontinuous takeoff.

Lucas Perry: Yeah, so what are both of your views on takeoff speeds?

Buck Shlegeris: Here are a couple of things. One is that I really love the operationalization of slow takeoff, or continuous takeoff, that Paul provided in his post, which was one of the ones Rohin was referring to from February 2018. He says, “by slow takeoff, I mean that there is a four year doubling of the economy before there is a one year doubling of the economy.” As in, there’s a period of four years over which world GDP doubles, after which there is a period of one year over which it doubles, as opposed to a situation where the first one-year doubling happens out of nowhere. Currently, doubling times for the economy are on the order of 20 years, so a one-year doubling would be a really big deal. The way that I would phrase why we care about this is that worlds where we have widespread, human-level AI feel like they have incredibly fast economic growth. And if it’s true that we expect AI progress to increase gradually and continuously, then one important consequence of this is that by the time we have human-level AI systems, the world is already totally insane. A four-year doubling would just be crazy. That would be economic growth drastically higher than economic growth currently is.
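
As a back-of-the-envelope check on those numbers (simple compound-growth arithmetic, not anything from Paul’s post): a doubling time of T years corresponds to an annual growth rate of

```latex
g = 2^{1/T} - 1, \qquad
T = 20:\; g \approx 3.5\%, \qquad
T = 4:\; g \approx 19\%, \qquad
T = 1:\; g = 100\%.
```

So a four-year doubling already means growth several times faster than today’s roughly 3 to 4 percent, and a one-year doubling is far beyond that.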

This means it would be obvious to everyone who’s paying attention that something is up and the world is radically changing in a rapid fashion. Another way I’ve been thinking about this recently is people talk about transformative AI, by which they mean AI which would have at least as much of an impact on the world as the industrial revolution had. And it seems plausible to me that octopus level AI would be transformative. Like suppose that AI could just never get better than octopus brains. This would be way smaller of a deal than I expect AI to actually be, but it would still be a massive deal, and would still possibly lead to a change in the world that I would call transformative. And if you think this is true, and if you think that we’re going to have octopus level AI before we have human level AI, then you should expect that radical changes that you might call transformative have happened by the time that we get to the AI alignment problems that we’ve been worrying about. And if so, this is really big news.

When I was reading about this stuff when I was 18, I was casually imagining that the alignment problem is a thing that some people have to solve while they’re building an AGI in their lab while the rest of the world’s ignoring them. But if the thing which is actually happening is the world is going insane around everyone, that’s a really important difference.

Rohin Shah: I would say that this is probably the most important contested question in AI alignment right now. Some consequences: in a gradual or continuous takeoff world, you expect that by the time we get to systems that can pose an existential risk, you’ve already had pretty smart systems deployed in the real world. They probably had some failure modes; whether or not we call them alignment failure modes is maybe not that important. The point is that people will be aware that AI systems can fail in weird ways. Depending on what sorts of failures you expect, you might expect this to lead to more coordination and more involvement in safety work. You might also be more optimistic about using testing and engineering styles of approaches to the problem, which rely a bit more on trial-and-error reasoning, because you actually will get a chance to see errors before they happen at a superintelligent, existential-risk-causing level. There are lots of implications of this form that pretty radically change which alignment plans you think are feasible.

Buck Shlegeris: Also, it pretty radically changes how optimistic you are about this whole AI alignment situation. At the very least, for people who are very optimistic about AI alignment, expecting relatively small amounts of existential risk, a lot of the reason for this seems to be that they think we’re going to get these warning shots: before we have superintelligent AI, we have sub-human-level intelligent AI with alignment failures, like the cashier Rohin was talking about earlier, and then people start caring about AI alignment a lot more. So optimism is also greatly affected by what you think about this.

I’ve actually been wanting to argue with people about this recently. I wrote a doc last night where I was arguing that even in gradual takeoff worlds, we should expect a reasonably high probability of doom if we can’t solve the AI alignment problem. And I’m interested to have this conversation in more detail with people at some point. But yeah, I agree with what Rohin said.

Overall on takeoff speeds, I guess I still feel pretty uncertain. It seems to me that currently AI capabilities are increasing consistently, and a lot of this comes from applying relatively non-mindblowing algorithmic ideas to larger amounts of compute and data. And I would be kind of surprised if you can’t basically ride this wave all the way until you have transformative AI. And so if I want to argue that we’re going to have a fast takeoff, I kind of have to argue that there’s some other approach you can take which lets you build AI without having to go along that slow path, and which will also happen first. And I guess I think it’s kind of plausible that that is what’s going to happen. I think that’s what you’d have to argue for if you want to argue for a fast takeoff.

Rohin Shah: That all seems right to me. I’d be surprised if, out of nowhere, we saw a new AI approach suddenly start working and overtake deep learning. You also have to argue that it then very quickly reaches human-level AI, which would be quite surprising, right? In some sense, it would have to be something completely novel that we failed to think about in the last 60 years. We’re putting in way more effort now than we were in the last 60 years, but then the counter-counterpoint is that all of that extra effort is going straight into deep learning. It’s not really searching for completely new, paradigm-shifting ways to get to AGI.

Buck Shlegeris: So here’s how I’d make that argument. Perhaps a really important input into a field like AI is the number of really smart kids who have been wanting to be AI researchers since they were 16 because they thought it was the most important thing in the world. I think that in physics, a lot of the people who turn into physicists have actually wanted to be physicists forever. The number of really smart kids who wanted to be AI researchers forever has possibly gone up by a factor of 10 over the last 10 years; it might even be more. And there just are problems, sometimes, that are bottlenecked on that kind of thing, probably. So it wouldn’t be totally shocking to me if, as a result of this particular input to AI radically increasing, we end up in kind of a different situation. I haven’t quite thought through this argument fully.

Rohin Shah: Yeah, the argument seems plausible. There’s a large space of arguments like this. I think even after that, I start questioning, “Okay, we get a new paradigm. Do the same arguments apply to that paradigm?” Not as strongly. I guess not the arguments you were making about compute going up over time, but the arguments given in the original slow takeoff posts, which were that people quickly start taking the low-hanging fruit and then move on. When there’s a lot of effort being put into getting some property, you should expect that the easy low-hanging fruit is usually already taken, and that’s why you don’t expect discontinuities. Unless the new idea immediately rockets you to human-level AGI, or x-risk-causing AGI, I think the same argument would pretty quickly start applying to it as well.

Buck Shlegeris: I think it’s plausible that you do get rocketed pretty quickly to human-level AI. And I agree that this is an insane sounding claim.

Rohin Shah: Great. As long as we agree on that.

Buck Shlegeris: Something which has been on my to-do list for a while, and something I’ve been doing a bit of and would be excited for someone else to do more of, is reading the history of science and getting more of a sense of what kinds of things are bottlenecked by what, where. It could lead me to be a bit less confused about a bunch of this stuff. AI Impacts has done a lot of great work cataloging all of the things that aren’t discontinuous changes, which certainly is strong evidence against my claim here.

Lucas Perry: All right. What is the probability of AI-induced existential risk?

Rohin Shah: Unconditional on anything? I might give it 1 in 20. 5%.

Buck Shlegeris: I’d give 50%.

Rohin Shah: I had a conversation with AI Impacts that went into this in more detail, and I partially just anchored on the number I gave there, which was 10% conditional on no intervention from longtermists. I think the broad argument is really just the one that Buck and I were disagreeing about earlier, which is: to what extent will society be incentivized to solve the problem? There’s some chance that the first thing we try just works and we don’t even need to solve any sort of alignment problem. It might just be fine. This is not implausible to me. Maybe that’s 30% or something.

Most of the remaining probability comes from, “Okay, the alignment problem is a real problem. We need to deal with it.” It might be very easy, in which case we can just solve it straight away. That might be the case, though it doesn’t seem that likely to me, if it was a problem at all. But what we will get is a lot of these warning shots, and people understanding the risks a lot more as we get more powerful AI systems. This estimate is also conditional on gradual takeoff. I keep forgetting to say that, mostly because I don’t know what probability I should put on discontinuous takeoff.

Lucas Perry: So is 5% with longtermist intervention, increasing to 10% if fast takeoff?

Rohin Shah: Yes, but still with longtermist intervention. I’m pretty pessimistic conditional on fast takeoff, but the probability I assign to fast takeoff is not very high. In a gradual takeoff world, you get a lot of warning shots. There will just generally be awareness of the fact that the alignment problem is a real thing, and you won’t have the situation you have right now, where people say that this worry about superintelligent AI systems not doing what we want is totally bullshit. Almost everyone will not be saying that anymore, in the version where we’re right and there is a problem. As a result, people will not want to build AI systems that are going to kill them. People tend to be pretty risk averse in my estimation of the world, which Buck will probably disagree with. And as a result, you’ll get a lot of people trying to actually work on solving the alignment problem. There’ll be some amount of global coordination, which will give us more time to solve the alignment problem than we may otherwise have had. And together, these forces mean that probably we’ll be okay.

Buck Shlegeris: So I think my disagreements with Rohin are basically that I think fast takeoffs are more likely. I basically think there is almost surely a problem. I think that alignment might be difficult, and I’m more pessimistic about coordination. I know I said four things there, but I actually think of this as three disagreements. I want to say that “there isn’t actually a problem” is just a kind of “alignment is really easy to solve.” So then there’s three disagreements. One is gradual takeoff, another is difficulty of solving competitive prosaic alignment, and another is how good we are at coordination.

I haven’t actually written down these numbers since I last changed my mind about a lot of the inputs to them, so maybe I’m being really dumb. It feels to me that in fast takeoff worlds, we are very sad unless we have competitive alignment techniques, and so we’re only okay if we have these competitive alignment techniques. I would say that I’m something like 30% on us having good competitive alignment techniques by the time it’s important, which incidentally is higher than Rohin’s number, I think.

Rohin Shah: Yeah, 30 is totally within the 25th to 75th percentile interval on that probability, which is a weird thing to be reporting. 30 might be my median, I don’t know.

Buck Shlegeris: To be clear, I’m not just including the outer alignment proportion here, which is what we were talking about before with IDA. I’m also including the inner alignment.

Rohin Shah: Yeah, 30% does seem a bit high. I think I’m a little more pessimistic.

Buck Shlegeris: So I’m like 30% that we can just solve the AI alignment problem in this excellent way, such that anyone who wants to can, at very little extra cost, make AI systems that are aligned. I feel like in worlds where we did that, it’s pretty likely that things are reasonably okay. I think that gradual versus fast takeoff isn’t actually enormously much of a crux for me, because I feel like in worlds without competitive alignment techniques and with gradual takeoff, we still have a very high probability of doom. And I think that comes down to disagreements about coordination. So maybe the main important disagreement between Rohin and me is actually how well we’ll be able to coordinate, or how strongly individual incentives will be for alignment.

Rohin Shah: I think there are other things. The reason I feel a bit more pessimistic than you in the fast takeoff world is that solving problems in advance is really quite difficult, and I really like the ability to test techniques on actual AI systems. You’ll have to work with less powerful things, and at some point you do have to make the jump to more powerful things. But still, being able to test on the less powerful things, that’s so good. You get so much safety from that.

Buck Shlegeris: It’s not actually clear to me that you get to test the most important parts of your safety techniques. So I think that there are a bunch of safety problems that just do not occur on dog-level AIs, and do occur on human-level AI. Say there are three levels of AI: there’s a thing which is as powerful as a dog, there’s a thing which is as powerful as a human, and there’s a thing which is as powerful as a thousand John von Neumanns. In a gradual takeoff world, you have a bunch of time between these milestones, maybe. I guess it’s not super clear to me that you can use results on less powerful systems as that much evidence about whether your safety techniques work on drastically more powerful systems. It’s definitely somewhat helpful.

Rohin Shah: It depends what you condition on in your difference between continuous takeoff and discontinuous takeoff to say which one of them happens faster. I guess the delta between dog and human is definitely longer in gradual takeoff for sure. Okay, if that’s what you were saying, yep, I agree with that.

Buck Shlegeris: Yeah, sorry, that’s all I meant.

Rohin Shah: Cool. One thing I wanted to ask is when you say dog-level AI assistant, do you mean something like a neural net that if put in a dog’s body replacing its brain would do about as well as a dog? Because such a neural net could then be put in other environments and learn to become really good at other things, probably superhuman at many things that weren’t in the ancestral environment. Do you mean that sort of thing?

Buck Shlegeris: Yeah, that’s what I mean. Dog-level AI is probably much better than GPT-2 at answering questions. I’m going to define something as dog-level AI if it’s about as good as a dog at the things I think dogs are pretty heavily optimized for, like visual processing or motor control in novel scenarios, things that I think dogs are pretty good at.

Rohin Shah: Makes sense. So I think in that case, plausibly, dog-level AI already poses an existential risk. I can believe that too.

Buck Shlegeris: Yeah.

Rohin Shah: The AI cashier example feels like it could totally happen probably before a dog-level AI. You’ve got all of the motivation problems already at that point of the game, and I don’t know what problems you expect to see beyond then.

Buck Shlegeris: I’m more talking about whether you can test your solutions. I’m not quite sure how to state my intuitions here. I feel like there are various strategies which work for corralling dogs and which don’t work for making humans do what you want. Inasmuch as your alignment strategy is aiming at a flavor of problem that only occurs when you have superhuman things, you don’t get to test that either way. I don’t think this is a super important point unless you think it is. I guess I feel good about moving on from here.

Rohin Shah: Mm-hmm (affirmative). Sounds good to me.

Lucas Perry: Okay, we’ve talked about what you guys have called gradual and fast takeoff scenarios, or continuous and discontinuous. Could you guys put some probabilities down on the likelihood of, and stories that you have in your head, for fast and slow takeoff scenarios?

Rohin Shah: That is a hard question. There are two sorts of reasoning I do about probabilities. One is to use my internal simulation of whatever I’m trying to predict: internally simulate what it looks like, ask whether, by my own models, it’s likely, how likely it is, at what point I would be willing to bet on it, stuff like that. And then there’s a separate extra step where I’m like, “What do other people think about this? Oh, a lot of people think this thing that I assigned one percent probability to is very likely. Hmm, I should probably not be saying one percent then.” I don’t know how to do that second part for, well, most things, but especially in this setting. So I’m just going to report Rohin’s model only, which will predictably be understating the probability of fast takeoff, in that if someone from MIRI were to talk to me for five hours, I would probably say a higher number for the probability of fast takeoff after that, and I know that that’s going to happen. I’m just going to ignore that fact and report my own model anyway.

On my own model, it’s something like: in worlds where AGI happens soon, like in the next couple of decades, then I’m like, “Man, 95% on gradual takeoff.” If it’s further away, like three to five decades, then I’m like, “Some things could have changed by then, maybe I’m 80%.” And then if it’s way off into the future, in centuries, then I’m like, “Ah, maybe it’s 70%, 65%.” The reason it goes down over time is just that it seems to me like if you want to argue for discontinuous takeoff, you need to posit that there’s some paradigm change in how AI progress is happening, and that seems more likely the further into the future you go.

Buck Shlegeris: I feel kind of surprised that you go so low, like to 65% or 70%. I would have thought that those arguments are a strong default, and that maybe at the moment we’re in a position that seems particularly gradual-takeoff-y, but I would have thought that over time you’d get to 80% or something.

Rohin Shah: Yeah. Maybe my internal model is like, “Holy shit, why do these MIRI people keep saying that discontinuous takeoff is so obvious?” I agree that the arguments in Paul’s posts feel very compelling to me, so maybe I should just be more confident in them. I think saying 80%, even centuries out, is plausibly a correct answer.

Lucas Perry: So, Rohin, is the view here that since compute is the thing that’s being leveraged to make most AI advances that you would expect that to be the mechanism by which that continues to happen in the future and we have some certainty over how compute continues to change into the future? Whereas things that would be leading to a discontinuous takeoff would be world-shattering, fundamental insights into algorithms that would have powerful recursive self-improvement, which is something you wouldn’t necessarily see if we just keep going this leveraging compute route?

Rohin Shah: Yeah, I think that’s a pretty good summary. Again, on the backdrop of the default argument for this is people are really trying to build AGI. It would be pretty surprising if there is just this really important thing that everyone had just missed.

Buck Shlegeris: It sure seems like in machine learning, when I look at the things which have happened over the last 20 years, all of them feel like the ideas are kind of obvious, or someone else had proposed them 20 years earlier. ConvNets were proposed 20 years before they were good on ImageNet, and LSTMs were proposed ages before they were good for natural language, and so on and so on. Other subjects are not like this; in physics, sometimes they just messed around for 50 years before they knew what was happening. I don’t know, I feel confused about how to feel about the fact that in some subjects, it feels like they just do suddenly get better at things for reasons other than having more compute.

Rohin Shah: I think physics, at least, was often bottlenecked by measurements, I want to say.

Buck Shlegeris: Yes, so this is one reason I’ve been interested in history of science recently, but there are certainly a bunch of things. People were interested in chemistry for a long time and it turns out that chemistry comes from quantum mechanics and you could, theoretically, have guessed quantum mechanics 70 years earlier than people did if you were smart enough. It’s not that complicated a hypothesis to think of. Or relativity is the classic example of something which could have been invented 50 years earlier. I don’t know, I would love to learn more about this.

Lucas Perry: Just to tie this back to the question, could you give your probabilities as well?

Buck Shlegeris: Oh, geez, I don’t know. Honestly, right now I feel like I’m 70% gradual takeoff or something, but I don’t know. I might change my mind if I think about this for another hour. And there are also theoretical arguments for why most takeoffs are gradual, like the stuff in Paul’s post. The easiest summary is: before someone does something really well, someone else does it kind of well, in cases where a lot of people are trying to do the thing.

Lucas Perry: Okay. One facet of this that I haven’t heard discussed is recursive self-improvement, and I’m confused about where that becomes the thing that affects whether takeoff is discontinuous or continuous. If someone does something kind of well before someone does something really well, and recursive self-improvement is a property of the thing being done kind of well, is it just kind of self-improving really quickly, or?

Buck Shlegeris: Yeah. I think Paul’s post does a great job of talking about this exact argument. His basic claim, which I find pretty plausible, is that before you have a system which is really good at self-improving, you have a system which is kind of good at self-improving, if it turns out to be really helpful to have a system be good at self-improving. So to get a discontinuity out of this, you have to posit an additional discontinuity.

Rohin Shah: One other thing I’d note is that humans are totally self-improving. Productivity techniques, for example, are a form of self-improvement. You could imagine that AI systems might have advantages that humans don’t, like being able to read their own weights and edit them directly. How much of an advantage this gives to the AI system is unclear. Still, I think I just go back to the argument that Buck already made, which is that at some point you get to an AI system that is somewhat good at understanding its weights and figuring out how to edit them, and that happens before you get the really powerful ones. Maybe this is like saying, “Well, you’ll reach human levels of self-improvement by the time you have rat-level AI instead of human-level AI,” which argues that you’ll hit this hyperbolic point of the curve earlier, but it still looks like a hyperbolic curve that’s continuous at every point.

Buck Shlegeris: I agree.

Lucas Perry: I just feel generally surprised by your probabilities on continuous takeoff scenarios, that takeoff would be slow.

Rohin Shah: The reason I’m trying to avoid the words slow and fast is that they’re misleading. Slow takeoff is not slow in calendar time relative to fast takeoff. The question is: is there a spike at some point? Some people, upon reading Paul’s posts, are like, “Slow takeoff is faster than fast takeoff.” That’s a reasonably common reaction to it.

Buck Shlegeris: I would put it as slow takeoff is the claim that things are insane before you have the human-level AI.

Rohin Shah: Yeah.

Lucas Perry: This seems like a helpful perspective shift on this takeoff scenario question. I have not read Paul’s post. What is it called so that we can include it in the page for this podcast?

Rohin Shah: It’s just called Takeoff Speeds. Then the corresponding AI Impacts post is called Will AI See Discontinuous Progress?, I believe.

Lucas Perry: So if each of you guys had a lot more reach and influence and power and resources to bring to the AI alignment problem right now, what would you do?

Rohin Shah: I get this question a lot and my response is always, “Man, I don’t know.” It seems hard to scalably use people right now for AI risk. I can talk about which areas of research I’d like to see more people focus on. If you gave me people where I’m like, “I trust your judgment on your ability to do good conceptual work” or something, where would I put them? I think a lot of it would be on making good robust arguments for AI risk. I don’t think we really have them, which seems like kind of a bad situation to be in. I think I would also invest a lot more in having good introductory materials, like this review, except this review is a little more aimed at people who are already in the field. It is less aimed at people who are trying to enter the field. I think we just have pretty terrible resources for people coming into the field and that should change.

Buck Shlegeris: I think that our resources are way better than they used to be.

Rohin Shah: That seems true.

Buck Shlegeris: In the course of my work, I talk to a lot of people who are new to AI alignment, and I would say that their level of informedness is drastically better now than it was two years ago, a lot of which is due to things like the 80,000 Hours podcast, and other things like this podcast and the Alignment Newsletter, and so on. I think we have just made it somewhat easier for people to get into everything: the Alignment Forum having its sequences prominently displayed, and so on.

Rohin Shah: Yeah, you named literally all of the things I would have named. Buck definitely has more information on this than I do. I do not work with people who are entering the field as much. I do think we could be substantially better.

Buck Shlegeris: Yes. I feel like I do have access to resources, not directly, but in the sense that I know people at, e.g., Open Philanthropy and the EA Funds, and if I thought there were obvious things they should do, I think it’s pretty likely that those funders would have already made them happen. And I occasionally embark on projects myself that I think are good for AI alignment, mostly on the outreach side. On a few occasions over the last year, I’ve just done projects that I was optimistic about. So I don’t think I can name things that are just shovel-ready opportunities for someone else to do, which is good news, because it’s mostly because I think most of these things are already being done.

I am enthusiastic about workshops. I help run, with MIRI, these AI Risks for Computer Scientists workshops, and I ran my own computing workshop with some friends with a similar purpose, aimed at people who are interested in this kind of stuff and who would like to spend some time learning more about it. I feel optimistic about this kind of project as a way of doing the thing Rohin was saying: making it easier for people to start having really deep thoughts about a lot of AI alignment stuff. So that’s a direction of projects that I’m pretty enthusiastic about. A couple of other random AI alignment things I’m optimistic about: I’ve already mentioned that I think there should be an Ought competitor, just because it seems like the kind of thing that more work could go into. I agree with Rohin on it being good to have more conceptual analysis of a bunch of this stuff. And I’m generically enthusiastic about there being more high quality research done, and more smart people who’ve thought about this a lot working on it as best they can.

Rohin Shah: I think the actual bottleneck is good research and not necessarily field building, and I’m more optimistic about good research. Specifically, I am particularly interested in universality, interpretability. I would love for there to be some way to give people who work on AI alignment the chance to step back and think about the high-level picture for a while. I don’t know if people don’t do this because they don’t want to or because they don’t feel like they have the affordance to do so, and I would like the affordance to be there. I’d be very interested in people building models of what AGI systems could look like. Expected utility maximizers are one example of a model that you could have. Maybe we just try to redo evolution. We just create a very complicated, diverse environment with lots of agents going around and in their multi-agent interaction, they develop general intelligence somehow. I’d be interested for someone to take that scenario, flesh it out more, and then talk about what the alignment problem looks like in that setting.

Buck Shlegeris: I would love to have someone get really knowledgeable about evolutionary biology and try and apply analogies of that to AI alignment. I think that evolutionary biology has lots of smart things to say about what optimizers are and it’d be great to have those insights. I think Eliezer sort of did this many years ago. It would be good for more people to do this in my opinion.

Lucas Perry: All right. We’re in the home stretch here. AI timelines. What do you think about the current state of predictions? There’s been surveys that have been done with people giving maybe 50% probability over most researchers at about 2050 or so. What are each of your AI timelines? What’s your probability distribution look like? What do you think about the state of predictions on this?

Rohin Shah: Haven’t looked at the state of predictions in a while. It depends on who was surveyed. I think most people haven’t thought about it very much and I don’t know if I expect their predictions to be that good, but maybe wisdom of the crowds is a real thing. I don’t think about it very much. I mostly use my inside view and talk to a bunch of people. Maybe, median, 30 years from now, which is 2050. So I guess I agree with them, don’t I? That feels like an accident. The surveys were not an input into this process.

Lucas Perry: Okay, Buck?

Buck Shlegeris: I don’t know what I think my overall timelines are. I think AI in the next 10 or 20 years is pretty plausible. Maybe I want to give it something around 50% which puts my median at around 2040. In terms of the state of things that people have said about AI timelines, I have had some really great conversations with people about their research on AI timelines which hasn’t been published yet. But at some point in the next year, I think it’s pretty likely that much better stuff about AI timelines modeling will have been published than has currently been published, so I’m excited for that.

Lucas Perry: All right. Information hazards. Originally, there seemed to be a lot of worry in the community about information hazards, and even about talking about superintelligence, and being afraid of talking to anyone in positions of power, whether in private institutions or in government, about the strategic advantage of AI and how one day it may confer a decisive strategic advantage. The dissonance here for me is that Putin comes out and says that whoever controls AI will control the world. Nick Bostrom published Superintelligence, which basically says what I already said. Max Tegmark’s Life 3.0 basically does also. My initial reaction and intuition is that the cat’s out of the bag. I don’t think that echoing this increases risks any further than the risk is already at. But maybe you disagree.

Buck Shlegeris: Yeah. So here are two opinions I have about info hazards. One is: how bad is it to say stuff like that all over the internet? My guess is it’s mildly bad, because not everyone thinks those things. Even if you could get those opinions as consequences of reading Superintelligence, most people in fact have not read Superintelligence. Sometimes there are ideas where I just really don’t want them to be crystallized common knowledge. I think that, to a large extent, assuming gradual takeoff worlds, it kind of doesn’t matter, because AI systems are going to be radically transforming the world inevitably. I guess you can affect how governments think about it, but it’s a bit different there.

The other point I want to make about info hazards is that I think there are a bunch of trickinesses with AI safety, where thinking about AI safety makes you think about how AI development might go. Thinking about how AI development is going to go occasionally leads you to think about things that could be relevant to capabilities, and I think this makes it hard to do research, because you then get scared about talking about them.

Rohin Shah: So I think my take on this is that info hazards are real, in the sense that there are, in fact, costs to saying specific kinds of information and publicizing them a bit. I’ll agree in principle that some kinds of capabilities information have the cost of accelerating timelines. I usually think these costs are pretty strongly outweighed by the benefits, in that it just seems really hard to do any kind of shared intellectual work when you’re constantly worried about what you do and don’t make public. It really seems like if you want to build a shared understanding within the field of AI alignment, that benefit is worth saying things that might be bad in some other ways. This depends on a lot of background facts that I’m not going to cover here; for example, I probably wouldn’t say the same thing about biosecurity.

Lucas Perry: Okay. That makes sense. Thanks for your opinions on this. So at the current state in time, do you guys think that people should be engaging with people in government or in policy spheres on questions of AI alignment?

Rohin Shah: Yes, but not in the sense of “we’re worried about when AGI comes.” Even saying things like “it might be really bad,” as opposed to “it might kill everybody,” seems not great. This is mostly on the basis of my model of what it takes to get governments to do things: at the very least, you need consensus in the field, so it seems kind of pointless to try right now. It might even be poisoning the well for future efforts. I think it does make sense to engage with government and policymakers about things that are in fact problems right now. To the extent that you think recommender systems are causing a lot of problems, I think it makes sense to engage with government about how alignment-like techniques can help with that, especially if you’re doing a bunch of specification learning-type stuff. That seems like the sort of stuff that should have relevance today, and I think it would be great if those of us who do specification learning were trying to use it to improve existing systems.

Buck Shlegeris: This isn’t my field. I trust the judgment of a lot of other people. I think that it’s plausible that it’s worth building relationships with governments now, not that I know what I’m talking about. I will note that I basically have only seen people talk about how to do AI governance in the cases where the AI safety problem is 90th percentile easiest. I basically only see people talking about it in the case where the technical safety problem is pretty doable, and this concerns me. I’ve just never seen anyone talk about what you do in a world where you’re as pessimistic as I am, except to completely give up.

Lucas Perry: All right. Wrapping up here, is there anything else that we didn’t talk about that you guys think was important? Or something that we weren’t able to spend enough time on, that you would’ve liked to spend more time on?

Rohin Shah: I do want to eventually continue the conversation with Buck about coordination, but that does seem like it should happen not on this podcast.

Buck Shlegeris: That’s what I was going to say too. Something that I want someone to do is write a trajectory for how AI goes down, that is really specific about what the world GDP is in every one of the years from now until insane intelligence explosion. And just write down what the world is like in each of those years because I don’t know how to write an internally consistent, plausible trajectory. I don’t know how to write even one of those for anything except a ridiculously fast takeoff. And this feels like a real shame.

Rohin Shah: That seems good to me as well. And also the sort of thing that I could not do because I don’t know economics.

Lucas Perry: All right, so let’s wrap up here then. So if listeners are interested in following either of you or seeing more of your blog posts or places where you would recommend they read more materials on AI alignment, where can they do that? We’ll start with you, Buck.

Buck Shlegeris: You can Google me and find my website. I often post things on the Effective Altruism Forum. If you want to talk to me about AI alignment in person, perhaps you should apply to the AI Risks for Computer Scientists workshops run by MIRI.

Lucas Perry: And Rohin?

Rohin Shah: I write the Alignment Newsletter. That’s a thing that you could sign up for. Also on my website, if you Google Rohin Shah Alignment Newsletter, I’m sure I will come up. These are also cross posted to the Alignment Forum, so another thing you can do is go to the Alignment Forum, look up my username and just see things that are there. I don’t know that this is actually the thing that you want to be doing. If you’re new to AI safety and want to learn more about it, I would echo the resources Buck mentioned earlier, which are the 80k podcasts about AI alignment. There are probably on the order of five of these. There’s the Alignment Newsletter. There are the three recommended sequences on the Alignment Forum. Just go to alignmentforum.org and look under recommended sequences. And this podcast, of course.

Lucas Perry: All right. Heroic job, everyone. This is going to be a really good resource, I think. It’s given me a lot of perspective on how thinking has changed over the past year or two.

Buck Shlegeris: And we can listen to it again in a year and see how dumb we are.

Lucas Perry: Yeah. There were lots of predictions and probabilities given today, so it’ll be interesting to see how things are in a year or two from now. That’ll be great. All right, so cool. Thank you both so much for coming on.

End of recorded material

FLI Podcast: Lessons from COVID-19 with Emilia Javorsky and Anthony Aguirre

The global spread of COVID-19 has put tremendous stress on humanity’s social, political, and economic systems. The breakdowns triggered by this sudden stress indicate areas where national and global systems are fragile, and where preventative and preparedness measures may be insufficient. The COVID-19 pandemic thus serves as an opportunity for reflecting on the strengths and weaknesses of human civilization and what we can do to help make humanity more resilient. The Future of Life Institute’s Emilia Javorsky and Anthony Aguirre join us on this special episode of the FLI Podcast to explore the lessons that might be learned from COVID-19 and the perspective this gives us for global catastrophic and existential risk.

Topics discussed in this episode include:

  • The importance of taking expected value calculations seriously
  • The need for making accurate predictions
  • The difficulty of taking probabilities seriously
  • Human psychological bias around estimating and acting on risk
  • The massive online prediction solicitation and aggregation engine, Metaculus
  • The risks and benefits of synthetic biology in the 21st Century

Timestamps: 

0:00 Intro 

2:35 How has COVID-19 demonstrated weakness in human systems and risk preparedness 

4:50 The importance of expected value calculations and considering risks over timescales 

10:50 The importance of being able to make accurate predictions 

14:15 The difficulty of trusting probabilities and acting on low probability high cost risks

21:22 Taking expected value calculations seriously 

24:03 The lack of transparency, explanation, and context around how probabilities are estimated and shared

28:00 Diffusion of responsibility and other human psychological weaknesses in thinking about risk

38:19 What Metaculus is and its relevance to COVID-19 

45:57 What is the accuracy of predictions on Metaculus and what has it said about COVID-19?

50:31 Lessons for existential risk from COVID-19 

58:42 The risk of synthetic bio enabled pandemics in the 21st century 

01:17:35 The extent to which COVID-19 poses challenges to democratic institutions

This podcast is possible because of the support of listeners like you. If you found this conversation to be meaningful or valuable consider supporting it directly by donating at futureoflife.org/donate. Contributions like yours make these conversations possible.

All of our podcasts are also now on Spotify and iHeartRadio! Or find us on SoundCloud, iTunes, Google Play and Stitcher.

You can listen to the podcast above or read the transcript below. 

Lucas Perry: Welcome to the Future of Life Institute Podcast. I’m Lucas Perry. Today’s episode is a special episode focused on lessons from COVID-19 with two members of the Future of Life Institute team, Anthony Aguirre and Emilia Javorsky. The ongoing coronavirus pandemic has helped to illustrate the frailty of human systems, the difficulty of international coordination on global issues, and our general underpreparedness for risk. This podcast is focused on what COVID-19 can teach us about being better prepared for future risk from the perspective of global catastrophic and existential risk. The AI Alignment Podcast and the end-of-the-month Future of Life Institute Podcast will be released as normally scheduled. 

Anthony Aguirre has been on the podcast recently to discuss the ultimate nature of reality and problems of identity. He is a physicist who studies the formation, nature, and evolution of the universe, focusing primarily on the model of eternal inflation—the idea that inflation goes on forever in some regions of the universe—and what it may mean for the ultimate beginning of the universe and time. He is the co-founder and Associate Scientific Director of the Foundational Questions Institute and is also a Co-Founder of the Future of Life Institute. He also co-founded Metaculus, which we get into during the podcast, an effort to optimally aggregate predictions about scientific discoveries, technological breakthroughs, and other interesting issues.

Emilia Javorsky develops tools to improve human health and wellbeing and has a background in healthcare and research. She leads clinical research and work on the translation of science from academia to a commercial setting at Artic Fox, and is the Chief Scientific Officer and Co-Founder of Sundaily, as well as the Director of Scientists Against Inhumane Weapons. Emilia is an advocate for the safe and ethical deployment of technology, and is currently heavily focused on lethal autonomous weapons issues.

And with that, let’s get into our conversation with Anthony and Emilia on COVID-19. 

We’re here to try and get some perspective on COVID-19: how it informs issues regarding global catastrophic and existential risk, and what we can learn from this catastrophe for global catastrophic and existential risk thinking. Just to start off then, what are ways in which COVID-19 has helped demonstrate weaknesses in human systems and preparedness for risk?

Anthony Aguirre: One of the most upsetting things, I think, to many people is how predictable it was and how preventable it was with sufficient care taken as a result of those predictions. It’s been known by epidemiologists for decades that this sort of thing was not only possible, but likely given enough time going by. We had SARS and MERS as kind of dry runs that almost were pandemics, but didn’t have quite the right characteristics. Everybody in the community of people thinking hard about this, and I would like to hear more of Emilia’s perspective on this, knew that something like this was coming eventually. That it might be a few percent probable each year, but after 10 or 20 or 30 years, you start to get a large probability of something like this happening. So it was known that it was coming eventually and pretty well known what needed to happen to be well prepared for it.

And yet nonetheless, many countries have found themselves totally unprepared or largely unprepared and unclear on what exactly to do and making very poor decisions in response to things that they should be making high quality decisions on. So I think part of what I’m interested in doing is thinking about why has that happened, even though we scientifically understand what’s going on? We numerically model what could happen, we know many of the things that should happen in response. Nonetheless, as a civilization, we’re kind of being caught off guard in a way and making a bad situation much, much worse. So why is that happening and how can we do it better now and next time?

Lucas Perry: So in short, the way in which this is frustrating is that it was very predictable and likely to happen, given both computational models and also lived experience from historical cases like SARS and MERS.

Anthony Aguirre: Right. This was not some crazy thing out of the blue; this was just a slightly worse version of things that have happened before. Part of the problem, in my mind, is the sort of mismatch between the likely cost of something like this and how many resources society is willing to put into planning and preparing and preventing it. And so here, I think a really important concept is expected value: the basic idea that when you’re calculating the value of something that is uncertain, you want to think about different probabilities for different values that that thing might have and combine them.

So for example, if I’m thinking I’m going to spend some money on something and there’s a 50% chance that it’s going to cost a dollar and there’s a 50% chance that it’s going to cost $1,000, so how much should I expect to pay for it? So on one hand, I don’t know, it’s a 50/50 chance, it could be a dollar, it could be $1,000, but if I think I’m going to do this over and over again, you can ask how much am I going to pay on average? And that’s about 50% of a dollar plus 50% of $1,000 so about $500, $500 and 50 cents. The idea of thinking in terms of expected value is that when I have probabilities for something, I should always think as if I’m going to do this thing many, many, many times, like I’m going to roll the dice many, many times and I should reason in a way that makes sense if I’m going to do it a lot of times. So I’d want to expect that I’m going to spend something like $500 on this thing, even though that’s not either of the two possibilities.

So, if we’re thinking about a pandemic, if you imagine the cost just in dollars, let alone all the other things that are going to happen, but just purely in terms of dollars, we’re talking about trillions of dollars. So if this was something that is going to cost trillions and trillions of dollars and there was something like a 10% chance of this happening over a period of a decade say, we should have been willing to pay hundreds and hundreds of billions of dollars to prevent this from happening or to dramatically decrease the cost when it does happen. And that is way, way, way orders of magnitude, more money than we have in fact spent on that.
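As a concrete aside, the arithmetic Anthony describes here can be written out in a few lines of Python. This is a minimal sketch; the 10%-per-decade probability and the dollar figures mirror the rough numbers used in the conversation and are illustrative assumptions, not actual estimates.

```python
# A minimal sketch of the expected-value reasoning described above.
# The probabilities and dollar figures are illustrative, not real estimates.

def expected_value(outcomes):
    """Sum of probability * value over mutually exclusive outcomes."""
    return sum(p * v for p, v in outcomes)

# The $1-or-$1,000 example: a 50/50 gamble.
gamble = [(0.5, 1), (0.5, 1_000)]
print(f"Expected cost of the gamble: ${expected_value(gamble):,.2f}")  # $500.50

# The pandemic example: assume a ~10% chance per decade of an event
# costing on the order of trillions of dollars (here, $5 trillion).
expected_loss = expected_value([(0.10, 5e12)])
print(f"Expected loss per decade: ${expected_loss:,.0f}")  # ~$500 billion

# Even spending a modest fraction of that expected loss on preparedness
# is justified if it meaningfully reduces the probability or the cost.
```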

So, part of the tricky thing is that people don’t generally think in these terms, they think of “What is the most likely thing?” And then they plan for that. But if the most likely thing is relatively cheap and a fairly unlikely thing is incredibly expensive, people don’t like to think about the incredibly expensive, unlikely thing, right? They think, “That’s scary. I don’t want to think about it. I’m going to think about the likely thing that’s cheap.” But of course, that’s terrible planning. You should put some amount of resources into planning for the unlikely incredibly expensive thing.

And it’s often, and it is in this case, that even a small fraction of the expected cost of this thing could have prevented the whole thing from happening in the sense that there’s going to be trillions and trillions of dollars of costs. It was anticipated at 10% likely, so it’s hundreds of billions of dollars that in principle society should have been willing to pay to prevent it from happening, but even a small fraction of that, in fact, could have really, really mitigated the problem. So it’s not even that we actually have to spend exactly the amount of money that we think we will lose in order to prevent something from happening.

Even a small fraction would have done. The problem is that we spend not even close to that. These sorts of situations where there’s a small probability of something extraordinarily costly happening, our reaction in society tends to be to just say, “It’s a small probability, so I don’t want to think about it.” Rather than “It’s a small probability, but the cost is huge, so I should be willing to pay some fraction of that small probability times that huge cost to prevent it from happening.” And I think if we could have that sort of calculation in mind a little bit more firmly, then we could prevent a lot of terrible things from happening at a relatively modest investment. But the tricky thing is that it’s very hard to take seriously those small probability, high cost things without really having a firm idea of what they are, what the probability of that happening is and what the cost will be.

Emilia Javorsky: I would add to that, but in complete agreement with Anthony, that part of what is at issue here too is needing to think over timescales, because if something has a probability that is small at any given short-term horizon, but that probability rises to something more significant, with a tremendously high cost, over a longer-term timescale, you need to be willing to think on those longer-term timescales in order to act. And from the perspective of medicine, this is something we’ve struggled with a lot, at the individual level, at the healthcare system level and at the societal public health policy level: prevention. While we know it’s much cheaper to prevent a disease than to treat it, and the same thing goes for pandemic preparedness, a lot of the things we’re talking about were actually quite cheap mitigation measures to put in place. Right now, we’re seeing a crisis of personal protective equipment.

We’re talking about basic cheap supplies like gloves and masks and then national stockpiles of ventilators. These are very basic, very conserved across any pandemic type, right? We know that in all likelihood when a pandemic arises, it is some sort of respiratory borne illness. Things like masks and respirators are a very wise thing to stockpile and have on hand. Yet despite having several near misses, even in the very recent past, we’re talking about the past 20 years, there was not a critical will or a critical lobby or a critical voice that enabled us to do these very basic, relatively cheap measures to be prepared for something like this to happen.

If you talk about something like vaccine development, that’s something that you need to prepare pretty much in real time. That’s pathogen specific, but the places where we’re fumbling to manage this epidemic today are things that were totally basic, cheap and foreseeable. We really need to find ways in the here and now to motivate thinking on any sort of long-term horizon. Not even 50 years, a hundred years down the line; even one to five years is something we struggle with.

Anthony Aguirre: To me, another surprising thing has been the sudden discovery of how important it is to be able to predict things. It’s, of course, always super important. This is what we do throughout our life. We’re basically constantly predicting things, predicting the consequences of certain actions or choices we might make, and then making those choices dependent on which things we want to have happen. So we’re doing it all the time, and yet when confronted with this pandemic, suddenly we extra super realize how important it is to have good predictions, because what’s unusual, I would say, about a situation like this is that all of the danger is sort of in the future. If you look at it at any given time, you say, “Oh, there’s a couple of dozen cases here in my county, everything’s under control.” That’s unbelievably ineffective and wishful thinking, because of course the number of cases is growing exponentially, and by the time you notice that there’s any problem of significance at all, the next day or the next few days it’s going to be twice as big.

So the fact that things are happening exponentially in a pandemic or an epidemic makes it incredibly vital that you have the ability to think about what’s going to happen in the future and how bad things can get quite quickly, even if at the moment everything seems fine. Everybody who thinks in this field, or who just is comfortable with how exponentials work, knows this intellectually, but it still isn’t always easy to get the intuitive feeling for that, because it just seems like so not a big deal for so long, until suddenly it’s the biggest thing in the world.
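As an illustration of the dynamic Anthony describes, here is a toy sketch of exponential case growth. The starting count and doubling time are assumptions chosen purely for illustration, not epidemiological estimates.

```python
# A toy illustration of why exponential growth defeats intuition.
# The starting case count and doubling time are assumed for illustration only.

cases = 24                 # "a couple of dozen cases here in my county"
doubling_time_days = 4     # assumed doubling time

for day in range(0, 61, doubling_time_days):
    print(f"day {day:2d}: ~{cases:,} cases")
    cases *= 2

# For the first few weeks the numbers look small and manageable;
# a few more doublings later they overwhelm any local response capacity.
```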

This has been a particularly salient lesson that we really need to understand both exponential growth and how to do good projections and predictions about things, because there could be lots of things that are happening under the radar. Beyond the pandemic, there are lots of things that are exponentially growing that if we don’t pay attention to the people who are pointing out those exponentially growing things and just wait until they’re a problem, then it’s too late to do anything about the problem.

At the beginning stages, it’s quite easy to deal with. If we take ourselves back to sometime in late December, early January or something, there was a time where this pandemic could have easily been totally prevented by the actions of the few people, if they had just known exactly what the right things to do were. I don’t think you can totally blame people for that. It’s very hard to see what it would turn into, but there is a time at the beginning of the exponential where action is just so much easier and every little bit of delay just makes it incredibly harder to do anything about it. It really brings home how important it is to have good predictions about things and how important it is to believe those predictions if you can and take decisive action early on to prevent exponentially growing things from really coming to bite you.

Lucas Perry: I see a few central issues here and lessons from COVID-19 that we can draw on. The first is that this is something that was predictable and foreseeable and that experts were saying had a high likelihood of happening, and the ways in which we failed were either that the global system lacks the kinds of incentives for private organizations or institutions to work towards mitigating these kinds of risks, or that people just aren’t willing to listen to experts making these kinds of predictions. The second thing seems to be that even when we do have these kinds of predictions, we don’t know how basic decision theory works and we’re not able to feel and intuit the reality of exponential growth sufficiently well. So what are very succinct ways of putting solutions to these problems?

Anthony Aguirre: The really hard part is having probabilities that you feel like you can trust. If you go to a policy maker and tell them there’s a danger of this thing happening, maybe it’s a natural pandemic, maybe it’s a human-engineered pandemic or an AI-powered cyber attack, something that, if it happens, is incredibly costly to society, and you say, “I really think we should be devoting some resources to preventing this from happening, because I think there’s a 10% chance that this is going to happen in the next 10 years.” They’re going to ask you, “Where does that 10% chance come from?” And “Are you sure that it’s not a 1% chance or a 0.1% chance or a .00001% chance?” And that makes a huge difference, right? If something really is a tiny, tiny fraction of a percent likely, then that plays directly into how much effort you should put into preventing it if it has some fixed cost.

So I think the reaction that people often have to low probability, high cost things is to doubt exactly what the probability is and, having that doubt in their mind, just avoid thinking about the issue at all, because it’s so easy to not think about it if the probability is really small. A big part of it is really understanding what the probabilities are and taking them seriously. And that’s a hard thing to do, because it’s really, really hard to estimate what the probability of, say, a gigantic AI-powered cyber attack is. Where do you even start with that? It has all kinds of ingredients that there’s no model for; there’s no set quantitative assessment strategy for it. That’s part of the root of the conundrum: even for things like this pandemic that everybody knew was coming at some level, I would say nobody knew whether it was a 5% chance over 10 years or a 50% chance over 10 years.

It’s very hard to get firm numbers, so one thing I think we need are better ways of assessing probabilities of different sorts of low probability, high cost things. That’s something I’ve been working a lot on over the past few years in the form of Metaculus which maybe we can talk about, but I think in general, most people and policy makers can understand that if there’s some even relatively low chance of a hugely costly thing that we should do some planning for it. We do that all the time, we do it with insurance, we do it with planning for wars. There are all kinds of low probability things that we plan for, but if you can’t tell people what the probability is and it’s small and the thing is weird, then it’s very, very hard to get traction.

Emilia Javorsky: Part of this is how do we find the right people to make the right predictions and have the ingredients to model those out? But the other side of this is how do we get the policy makers and decision makers and leaders in society to listen to those predictions and to have trust and confidence in them? From the perspective of that, when you’re communicating something that is counterintuitive, which is how many people end up making decisions, there really has to be a foundation of trust there, where you’re telling me something that is counterintuitive to how I would think about decision making and planning in this particular problem space. And so, it has to be built on a foundation and trust. And I think one of the things that characterize good models and good predictions is exactly as you say, they’re communicated with a lot of trepidation.

They explain what the different variables are that go into them and the uncertainty that bounds each of those variables and an acknowledgement that some things are known and unknown. And I think that’s very hard in today’s world where information is always at maximum volume and it’s very polarized and you’re competing against voices, whether they be in a policy maker’s ear or a CEO’s ear, that will speak in absolutes and speak in levels of certainty, overestimating risk, or underestimating risk.

That is the element that is necessary for these predictions to have impact: how do you connect the ambiguous and qualified and cautious language that characterizes these kinds of long-term predictions with a foundation of trust, so people can hear and appreciate them and you don’t get drowned out by the noise on either side, which is likely to be much less well founded if it’s speaking in absolutes about problem spaces that we know just have a tremendous amount of uncertainty.

Anthony Aguirre: That’s a very good point. Your mention of the kind of unfamiliarity with these things is an important one in the sense that, as an individual, I can think of improbable things that might happen to me and they seem, well, that’s probably not going to happen to me, but I know intellectually it might, and I can look around the world and see that that improbable thing is happening to lots of people all the time. Even if there’s kind of a psychological barrier to my believing that it might happen to me, I can’t deny that it’s a thing and I can’t really deny what sort of probability it might have to happen to me, because I see it happening all around. Whereas when we’re talking about things that are happening to a country or a civilization, we don’t have a whole lot of statistics on them.

We can’t just say of all the different planets that are out there with civilizations like ours, 3% of them are undergoing pandemics right now. If we could do that then we could really count on those probabilities. We can’t do that. We can look historically at what happened in our world, but of course, since it’s really changing dramatically over the years, that’s not always such a great guide and so, we’re left with reasoning by putting together scientific models, all the uncertainties that you were mentioning that we have to feed into those sorts of models or just other ways of making predictions about things through various means and trying to figure out how can we have good confidence in those predictions. And this is an important point that you bring up, not so much in terms of certainty, because there are all of these complex things that we’re trying to predict about the possibility of good or bad things happening to our society as a whole, none of them can be predicted with certainty.

I mean, almost nothing in the world can be predicted with certainty, certainly not these things, and so it’s always a question of giving probabilities for things and both being confident in those probabilities and taking seriously what those probabilities mean. And as you say, people don’t like that. They want to be told what is going to happen or what isn’t going to happen and make a decision on that basis. That is unfortunately not information that’s available on most important things, and so we have to accept that they’re going to be probabilities. But then where do we get them from? How do we use them? There’s a science and an art to that, I think, and a subtlety to it, as you say, that we really have to get used to and get comfortable with.

Lucas Perry: There seem to be lots of psychological biases and problems around human beings understanding and fully integrating probabilistic estimations into our lives and decision making. I’m sure there’s literature that already exists on this, but it would be skillful, I think, to apply it to existential and global catastrophic risk. So, assuming that we’re able to sufficiently develop our ability to generate accurate and well-reasoned probabilistic estimations of risks, and Anthony, we’ll get into Metaculus shortly, you mentioned that the prudent and skillful thing to do would be to feed those into a proper decision theory. Explain a little bit more about the nerdy side of that if you feel it would be useful. In particular, you talked a little bit about expected value: could you say a little bit more about how, if policy and government officials were able to get accurate probabilistic reasoning and then fed it into the correct decision theoretic models, it would produce better risk mitigation efforts?

Anthony Aguirre: I mean, there’s all kinds of complicated discussions and philosophical explorations of different versions of decision theory. We really don’t need to think about things in such complicated terms in the sense that what it really is about is just taking expected values seriously and thinking about actions we might take based on how much value we expect given each decision. When you’re gambling, this is exactly what you’re doing, you might say, “Here, I’ve got some cards in my hand. If I draw, there’s a 10% chance that I’ll get nothing and a 20% chance that I’ll get a pair and a tiny percent chance that I’ll fill out my flush or something.” And with each of those things, I want to think of, “What is the probable payoff when I have that given outcome?” And I want to make my decisions based on the expected value of things rather than just what is the most probable or something like that.

So it’s a willingness to quantitatively take into account, if I make decision A, here is the likely payoff of making decision A, if I make decision B, here’s the likely payoff that is the expected value of my payoff in decision B, looking at which one of those is higher and making that decision. So it’s not very complicated in that sense. There are all kinds of subtleties, but in practice it can be very complicated because usually you don’t know, if I make decision A, what’s going to happen? If I make decision B, what’s going to happen? And exactly what value can I associate with those things? But this is what we do all the time, when we weigh the pros and cons of things, we’re kind of thinking, “Well, if I do this, here are the things that I think are likely to happen. Here’s what I think I’m going to feel and experience and maybe gain in doing A, let me think through the same thing in my mind with B and then, which one of those feels better is the one that I do.”

So, this is what we do all the time on an intuitive level, but we can do it in a quantitative and systematic way if we think more carefully about what the actual numerical and quantitative implications of something are and if we have actual probabilities that we can assign to the different outcomes in order to make our decision. All of this, I think, is quite well known to decision makers of all sorts. What’s hard is that often decision makers won’t really have those sorts of tools in front of them. They won’t have the ability to look at different possibilities, the ability to attribute probabilities and costs and payoffs to those things in order to make good decisions. So those are tools that we could put in people’s hands and I think would just allow people to make better decisions.

Emilia Javorsky: And what I like about what you’re saying, Anthony, implicit in that is that it’s a standardized tool. The way you assign the probabilities and decide between different optionalities is standardized. And I think one thing that can be difficult in the policy space is different advocacy groups or different stakeholders will present data and assign probabilities based on different assumptions and vested interests, right? So, when a policy maker is making a decision, they’re using probabilities and using estimates and outcomes that are developed using completely different models with completely different assumptions and different biases baked into them and different interests baked into them. What I think is so vital is to make sure as best one can, again knowing the inherent ambiguity that’s existing in modeling in general, that you’re having an apples to apples comparison when you’re assigning different probabilities and making decisions based off of them.

Anthony Aguirre: Yeah, that’s a great point that part of the problem is that people are just used to probabilities not meaning anything because they’re often given without context, without explanation and by groups that have a vested interest in them looking a certain way. If I ask someone, what’s the probability that this thing is going to happen, and they’d tell me 17%, I don’t know what to do with that. Do I believe them? I mean, on what basis are they telling me 17%? In order for me to believe that, I have to either have an understanding of what exactly went into that 17% and really agree step-by-step with all their assumptions and modeling and so on, or maybe I have to believe them from some other reason.

Like they’ve provided probabilities for lots of things before, and they’ve given accurate probabilities for all these different things that they provided, so I kind of trust their ability to give accurate probabilities. But usually that’s not available. That’s part of the problem. Our general lesson has been if people are giving you probabilities, usually they don’t mean much, but that’s not always the case. There are probabilities we use all the time, like for the weather where we more or less know what they mean. You see that there’s a 15% chance of rain.

That’s a meaningful thing, and it’s meaningful both because you sort of trust that the weather people know what they’re doing, which they sort of do, and because it has a particular interpretation, which is that if I look at the weather forecast for a year and look at all the days where it said that there was a 15% chance of rain, on about 15% of all those days it will have been raining. There’s a real meaning to that, and those numbers come from a careful calibration of weather models for exactly that reason. When you get a 15% chance of rain from the weather forecast, what that generally means is that they’ve run a whole bunch of weather models with slightly different initial conditions and in 15% of them it’s raining today in your location.

They’re carefully calibrated usually, like the National Weather Service calibrates them, so that it really is true that if you look at all the days with, whatever, a 15% chance, on about 15% of those days it was in fact raining. Those are probabilities that you can really use and you can say, “15% chance of rain, is it worth taking an umbrella? The umbrella is kind of annoying to carry around. Am I willing to take my chances for 15%? Yeah, maybe. If it was 30%, I’d probably take the umbrella. If it was 5%, I definitely wouldn’t.” That’s a number that you can fold into your decision theory because it means something. Whereas when somebody says, “There’s an 18% chance at this point that some political thing is going to happen, that some bill is going to pass,” maybe that’s true, but you have no idea where that 18% comes from. It’s really hard to make use of it.
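The calibration property described here, that days given a 15% chance of rain should turn out rainy about 15% of the time, can be checked mechanically. Below is a small sketch of such a check; the forecast data is randomly generated and purely illustrative.

```python
# A sketch of checking forecast calibration: group forecasts by the stated
# probability and compare with how often the event actually occurred.
# The forecasts here are simulated from a well-calibrated forecaster.
import random

random.seed(0)
records = []  # (stated_probability, event_happened)
for _ in range(10_000):
    p = random.choice([0.05, 0.15, 0.30, 0.50, 0.80])
    records.append((p, random.random() < p))

for p_bin in sorted({p for p, _ in records}):
    outcomes = [happened for p, happened in records if p == p_bin]
    observed = sum(outcomes) / len(outcomes)
    print(f"stated {p_bin:.0%}: occurred {observed:.1%} "
          f"of the time ({len(outcomes)} forecasts)")
```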

Lucas Perry: Part of improving this and getting prepared for risks is better understanding and taking seriously the reasoning and reasons behind different risk estimations that experts or certain groups provide. You guys explained that there are many different vested interests or interest groups who may be biasing or framing percentages and risks in a certain way, so that policy and action can be directed towards things which may benefit them. Are there other facets to our failure to respond here other than our inability to take risks seriously?

Lucas Perry: If we had a sufficiently good understanding of the probabilities and we were able to see all of the reasons behind the probabilities and take them all seriously, and then we took those and we fed them into a standardized and appropriate decision theory, which used expected value calculations and some agreed-upon risk tolerance to determine how many resources should be put into mitigating risks, are there other psychological biases or weaknesses in human virtue that would still lead to us insufficiently acting on these risks? An example that comes to mind is maybe something like a diffusion of responsibility.

Emilia Javorsky: That’s very much what COVID-19 in many ways has played out to be, right? We kind of started this with the assumption that this was quite a foreseeable risk, and any which way you looked at the probabilities, it was a sufficiently high probability that basic levels of preparedness and a robustness of preparedness should have been employed. I think what you allude to in terms of diffusion of responsibility is certainly one aspect of it. It’s difficult to say where that decision-making fell apart, but we did hear very early on a lot of discussion that this is something that is a problem localized to China.

Anyone who has any familiarity with these models would have told you, “Based on the probabilities we already knew about, plus what we’re witnessing from this early data, which was publicly available in January, we had a pretty good idea of what was going on, that this would become something that would in all likelihood be global.” The next question becomes, why wasn’t anything done or acted on at that time? I think part of that comes with a lack of advocacy and a lack of having the ears of the key decision makers about what was actually coming. It is very, very easy when you have to make difficult decisions to listen to the vocal voices that tell you not to do something and provide reasons for inaction.

Then the voices of action are perhaps more muted coming from a scientific community, spoken in language that’s not as definitive as the other voices in the room and the other stakeholders in the room that have a vested interest in policymaking. The societal incentives to act or not act aren’t just from a pure, what’s the best long-term course of action, they’re very, very much vested in what are the loudest voices in the room, what is the kind of clout and power that they hold, and weighing those. I think there’s a very real political and social atmosphere and economic atmosphere that this happens in that dilutes some of the writing that was very clearly on the wall of what was coming.

Anthony Aguirre: I would add I think that it’s especially easy to ignore something that is predicted and quite understandable to experts who understand the dynamics of it, but unfamiliar or where historically you’ve seen it turn out the other way. Like on one hand, we had multiple warnings through near pandemics that this could happen, right? We had SARS and MERS and we had H1N1 and there was Ebola. All these things were clear indications of how possible it was for this to happen. But at the same time, you could easily take the opposite lesson, which is yes, an epidemic arises in some foreign country and people go and take care of it and it doesn’t really bother me.

You can easily take the lesson from that that the tendency of these things is to just go away on their own and the proper people will take care of them and I don’t have to worry about this. What’s tricky is understanding from the actual characteristics of the system and your understanding of the system what makes it different from those other previous examples. In this case, something that is more transmissible, transmissible when it’s not very symptomatic, yet has a relatively high fatality rate, not very high like some of these other things, which would have been catastrophic, but a couple of percent or whatever it turns out to be.

I think people who understood the dynamics of infectious disease and saw high transmissibility and potential asymptomatic transmission and a death rate that was much higher than the flu immediately put those three things together and saw, oh my god, this is a major problem and a little bit different from some of those previous ones that had a lower fatality rate or were very, very obviously symptomatic when they were transmissible, and so it was much easier to quarantine people and so on. Those characteristics you can understand if you’re trained for that sort of thing to look for it, and those people did, but if not, you just sort of see it as another far away disease in a far off land that people will take care of and it’s very easy to dismiss it.

I think it’s not really a failure of imagination, but a failure to take seriously something that could happen that is perfectly plausible just because something like it hasn’t really happened like that before. That’s a very dangerous one I think.

Emilia Javorsky: It comes back to human nature sometimes and the frailty of our biases and our virtue. It’s very easy to convince yourself and recall examples where things did not come to pass, because dealing with the reality of the negative outcome that you’re looking at, even if it looks like it has a fairly high probability, is something that is innately aversive for people, right? We look at negative outcomes and we look for reasons that those negative outcomes will not come to pass.

It’s easy to say, “Well, yes, it’s only let’s say a 40% probability and we’ve had these before,” and it becomes very easy to identify reasons and not look at a situation completely objectively as to why the best course of action is not to take the kind of drastic measures that are necessary to avoid the probability of the negative outcome, even if you know that it’s likely to come to pass.

Anthony Aguirre: It’s even worse that when people do see something coming and take significant action and mitigate the problem, they rarely get the sort of credit that they should.

Emilia Javorsky: Oh, completely.

Anthony Aguirre: Because you never see the calamity unfold that they avoided.

Emilia Javorsky: Yes.

Anthony Aguirre: The tendency will be, “Oh, you overreacted, or oh, that was never a big problem in the first place.” It’s very hard to piece together, like with Y2K. I think it’s still unclear, at least it is to me, what exactly would have happened if we hadn’t made a huge effort to mitigate Y2K. There are many other similar things where it could be that there really was a calamity there and we totally prevented it by just being on top of it and putting a bunch of effort in, or it could be that it wasn’t that big of a deal, and it’s very, very hard to tell in retrospect.

That’s another unfortunate bias that if we could see the counterfactual world in which we didn’t do anything about Y2K and saw all this terrible stuff unfold, then we could make heroes out of the people that put all that effort in and sounded the warning and did all the mitigation. But we don’t see that. It’s rather unrewarding in a literal sense. It’s just you don’t get much reward for preventing catastrophes and you get lots of blame if you don’t prevent them.

Emilia Javorsky: This is something we deal with all the time on the healthcare side of things. This is why preventative health and public health and basic primary care really struggle to get the funding and the attention that they really need. It’s exactly this. Nobody cares about the disease that they didn’t get, the heart attack they didn’t have, the stroke that they didn’t have. For those of us that come from a public health background, it’s been kind of a collective banging our heads against the wall for a very long time, because we know, looking at the data, that this is the best way to take care of population level health.

Emilia Javorsky: Yet knowing that and having the data to back it up, it’s very difficult to get the attention across all levels of the healthcare system, from getting the individual patient on board all the way up to how do we fund healthcare research in the US and abroad.

Lucas Perry: These are all excellent points. What I’m seeing from everything that you guys said is to back it up to what Anthony said quite while ago, there is a kind of risk exceptionalism where we feel that our country or ourselves won’t be exposed to catastrophic risks. It’s other people’s families who lose someone in a car accident but not mine, even though the risk of that is fairly high. There’s this second kind of bias going on that acting on risk in order to mitigate it based off pure reasoning alone seems to be very difficult, especially when the intervention to mitigate the risk is very expensive because it requires a lot of trust in the experts and the reasoning that goes behind it, like spending billions of dollars to prevent the next pandemic.

It feels more tangible and intuitive now, but maybe for people of newer generations it felt a little bit more silly and would have had to have been more of a rational cognitive decision. Then the last thing here seems to be that there’s an asymmetry between different kinds of risks. If someone prevents a pandemic from happening, it’s really hard to appreciate how good a thing that was to do, but that seems to not be true of all risks, for example, risks where the danger actually just exists somewhere, like in a lab or a nuclear missile silo. We’re able to appreciate people like Stanislav Petrov and Vasili Arkhipov very easily just because there was a concrete event: there was a big dangerous thing and they stopped it from happening.

It also seems skillful here to at least distinguish the kinds of risks where, if they would have happened but didn’t because we prevented them, we can notice that, from the kinds of risks where, if we stop them from happening, we can’t even notice that we stopped them. Adjusting our attitude towards risks with each of those features would seem skillful. Let’s focus in then on making good predictions. Anthony, earlier you brought up Metaculus. Could you explain what Metaculus is, what it’s been doing, and how it’s been involved in COVID-19?

Anthony Aguirre: Metaculus is at some level an effort to deal with precisely the problem that we’ve been discussing, that it’s difficult to make predictions and it’s difficult to have a reason to trust predictions, especially when they’re probabilistic ones about complicated things. The idea of Metaculus is sort of twofold or threefold, maybe, I would say. One part of it is that it’s been shown through the years, and this is work by Tetlock and The Good Judgment Project and a whole series of projects within IARPA, the Intelligence Advanced Research Projects Activity, that groups of people making predictions about things, and having those predictions carefully combined, can often make better predictions than even small numbers of experts. There tend to be kind of biases on different sides.

If you carefully aggregate people’s predictions, you can at some level wash out those biases. As well, making predictions is something that some people are just really good at. It’s a skill that varies person to person and can be trained. There are people who are just really good at making predictions across a wide range of domains. Sometimes in making a prediction, general prediction skill can trump actual subject matter expertise. Of course, it’s good to have both if you possibly can, but lots of times experts have a huge understanding of the subject matter.

But if they’re not actually practiced or trained or spend a lot of time making predictions, they may not make better predictions than someone who is really good at making predictions, but has less depth of understanding of the actual topic. That’s something that some of these studies made clear. The idea of combining those two is to create a system that solicits predictions from lots of different people on questions of interest, aggregates those predictions, and identifies which people are really good at making predictions and kind of counts their prediction and input more heavily than other people.

So that if someone has just a years-long track record of over and over again making good predictions about things, they have a tremendous amount of credibility, and that gives you a reason to think that they’re going to make good predictions about things in the future. If you take lots of people, all of whom are good at making predictions in that way, and combine their predictions together, you’re going to get something that’s much, much more reliable than just someone off the street or even an expert making a prediction in a one-off way about something.
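For readers who want a concrete picture of the aggregation idea, a toy version might look like the sketch below. Metaculus’s actual aggregation algorithm is more sophisticated; the log-odds pooling rule and the specific track-record weights here are assumptions made purely for illustration.

```python
# A toy sketch of track-record-weighted aggregation of probability forecasts.
# The weights and the log-odds pooling rule are illustrative assumptions,
# not Metaculus's actual algorithm.
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

# (track-record weight, forecast probability) for each predictor
forecasts = [
    (3.0, 0.40),   # long record of accurate predictions
    (1.0, 0.10),   # average track record
    (0.5, 0.25),   # new predictor, little history
]

total_weight = sum(w for w, _ in forecasts)
pooled = inv_logit(sum(w * logit(p) for w, p in forecasts) / total_weight)
print(f"Aggregated probability: {pooled:.1%}")
```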

So that’s one aspect of it: identify good predictors, have them accrue a very objective track record of being right, and then have them in aggregate make predictions about things that are just going to be a lot more accurate than other methods you can come up with. Then the second thing, and it took me a long time to really see the importance of this, but I think our earlier conversation has kind of brought this out, is the value of having a single system, a single consistent set of predictions and checks on those predictions. Metaculus is a system that has many, many questions that have had predictions made on them and have resolved, that is, been checked against what actually happened.

What you can do then is start to understand what it means when Metaculus as a system says that there’s a 10% chance of something happening. You can really say that of all the things on Metaculus that have a 10% chance of happening, about 10% of those actually happen. There’s a meaning to the 10%, which you can understand quite well: if you, say, were to go to Metaculus and make bets based on a whole bunch of predictions that were on it, you would know that the 10% predictions on Metaculus come true about 10% of the time, and you can use those numbers in actually making decisions. Whereas when you go to some random person and they say, “Oh, there’s a 10% chance,” as we discussed earlier, it’s really hard to know what exactly to make of that, especially if it’s a one-off event.

The idea of Metaculus was to both make a system that makes highly accurate predictions as best as possible, but also a kind of collection of events that have happened or not happened in the world that you can use to ground the probabilities and give meaning to them, so that there’s some operational meaning to saying that something on the system has a 90% chance of happening. This has been going on since about 2014 or ’15. It was born basically at the same time as the Future of Life Institute actually for very much the same reason, thinking about what can we do to positively affect the future.

In my mind, I went through exactly the reasoning of, if we want to positively affect the future, we have to understand what’s going to happen in probabilistic terms and how to think about what we can decide now and what sort of positive or negative effects that will have. To do that, you need predictions and you need probabilities. That got me thinking about, how could we generate those? What kind of system could give us the sorts of predictions and probabilities that we want? It’s now grown pretty big. Metaculus now has 1,800 questions that are live on the site and 210,000 predictions on them, on the order of a hundred predictions per question.

The questions are all manner of things from who is going to be elected in some election to will we have a million residents on Mars by 2052, to what will the case fatality rate be for COVID-19. It spans all kinds of different things. The track record has been pretty good. Something that’s unusual in the world is that you can just go on the site and see every prediction that the system has made and how it’s turned out and you can score it in various ways, but you can get just a clear sense of how accurate the system has been over time. Each user also has a similar track record that you can see exactly how accurate each person has been over time. They get a reputation and then the system folds that reputation in when it’s making predictions about new things.

With COVID-19, as I mentioned earlier, lots of people suddenly realized that they really wanted good predictions about things. We’ve had a huge influx of people and interest in the site focused on the pandemic. That suggested to us that this was something that people were really looking for and was helpful to people, so we put a bunch of effort into creating a kind of standalone subset of Metaculus called pandemic.metaculus.com that’s hosting just COVID-19 and pandemic related things. That has 120 questions or so live on it now with 23,000 predictions on them. All manner of how many cases, how many deaths will there be and various things, what sort of medical interventions might turn out to be useful, when will a lock down in a certain place be lifted. Of course, all these things are unknowable.

But again, the point here is to get a best estimate of the probabilities that can be folded into planning. I also find that even when it’s not a predictive thing, it’s quite useful as just an information aggregator. For example, one of the really frustratingly hard to pin down things in the COVID-19 pandemic is the infection or case fatality, like what is the ratio of fatalities to the total number of identified cases or symptomatic cases or infections. Those really are all over the place. There’s a lot of controversy right now about whether that’s more like 2% or more like 0.2% or even less. There are people advocating views like that. It’s a little bit surprising that it’s so hard to pin down, but that’s all tied up in the prevalence of testing and asymptomatic cases and all these sorts of things.

Even a way to have a sort of central aggregation place for people to discuss and compare and argue about and then make numerical estimates of this rate, even if it’s less a prediction, right, because this is something that exists now, there is some value of this ratio, so even something like that, having people come together and have a specific way to put in their numbers and compare and combine those numbers I think is a really useful service.

Lucas Perry: Can you say a little bit more about the efficacy of the predictions? Like for example, I think that you mentioned that Metaculus predicted COVID-19 at a 10% probability?

Anthony Aguirre: Well, somewhat amusingly, somewhat tragically, I guess, there was a series of questions on Metaculus about pandemics in general long before this one happened. In December, one of those questions closed, that is no more predictions were made on it, and that question was, will there be a naturally spawned pandemic leading to at least a hundred million reported infections or at least 10 million deaths in a 12 month period by the end of 2025? The probability that was given to that was 36% on Metaculus. It’s a surprisingly high number. We now know that that was more like 100% but of course we didn’t know that at the time, but I think that was a much higher number than a fair number of people would have given it and certainly a much higher number than we were taking into account in our decisions. If anyone in a position of power had really believed that there were 36% chance of that happening, that would have led, as we discussed earlier, to a lot different actions taken. So that’s one particular question that I found interesting, but I think the more interesting thing really is to look across a very large number of questions and how accurate the system is overall. And then again, to have a way to say that there’s a meaning to the probabilities that are generated by the system, even for things that are only going to happen once and never again.

Like there’s just one time that chloroquine is either going to work or not work. We’re going to discover that it does or that it doesn’t. Nonetheless, we can usefully take probabilities from the system predicting it, that are more useful than probabilities you’re going to get through almost any other way. If you ask most doctors what’s the probability that chloroquine is going to turn out to be useful? They’ll say, “Well we don’t know. Let’s do the clinical trials” and that’s a perfectly good answer. That’s true. We don’t know. But if you wanted to make a decision in terms of resource allocation say, you really want to know how is it looking, what’s the probability of that versus some other possible things that I might put resources into. Now in this case, I think we should just put resources into all of them if we possibly can because it’s so important that it makes sense to try everything.

But you can imagine lots of cases where there would be a finite set of resources and even in this case there is a finite set of resources. You might want to think about where are the highest probability things and you’d want numbers ideally associated with those things. And so that’s the hope is to help provide those numbers and more clarity of thinking about how to make decisions based on those numbers.

Lucas Perry: Are there things like Metaculus for experts?

Anthony Aguirre: Well, I would say that it is already for experts in that we certainly encourage people with subject matter expertise to be involved and often they are. There are lots of people who have training in infectious disease and so on that are on pandemic.metaculus and I think hopefully that expertise will manifest itself in being right. Though as I said, you could be very expert in something but pretty bad at making predictions on it and vice versa.

So I think there’s already a fairly high level of expertise, and I should plug this for the listeners: if you like making or reading predictions and having in-depth discussions and getting into the weeds about the numbers, definitely check this out. Metaculus could use more people making predictions and contributing to discussion on it. And I would also say we’ve been working very hard to make it useful for people who want accurate predictions about things. So we really want this to be helpful and useful to people, and if there are things that you’d like to see on it, questions you’d like to have answered, capabilities, whatever, the system is there: ask for those, give us feedback and so on. So yeah, I think Metaculus is already aimed at being a system that experts in a given topic would use, but it doesn’t base its weightings on expertise.

We might fold this in at some point if it proves useful, it doesn’t at the moment say, oh you’ve got a PhD in this so I’m going to triple the weight that I give to your prediction. It doesn’t do that. Your PhD should hopefully manifest itself as being right and then that would give you extra weight. That’s less useful though in something that is brand new. Like when we have lots of new people coming in and making predictions. It might be useful to fold in some weighting according to what their credentials or expertise are or creating some other systems where they can exhibit that on the system. Like say, “Here I am, I’m such and such an expert. Here’s my model. Here are the details, here’s the published paper. This is why you should believe me”. That might influence other people to believe their prediction more and use it to inform their prediction and therefore could end up having a lot of weight. We’re thinking about systems like that. That could add to just the pure reputation based system we have now.

Lucas Perry: All right. Let’s talk about this from a higher level. From the view of people who are interested and work in global catastrophic and existential risks and the kinds of broader lessons that we’re able to extract from COVID-19. For example, from the perspective of existential risk minded people, we can appreciate how disruptive COVID-19 is to human systems like the economy and the healthcare system, but it’s not a tail risk and its severity is quite low. The case fatality rate is somewhere around a percent plus or minus 0.8% or so and it’s just completely shutting down economies. So it almost makes one feel worse and more worried about something which is just a little bit more deadly or a little bit more contagious. The lesson or framing on this is the lesson of the fragility of human systems and how the world is dangerous and that we lack resilience.

Emilia Javorsky: I think it comes back to part of the earlier conversation: how we make decisions, and how decisions are made as a society, is a combination of one part looking at information and assessing that information, and the other part being experience. And past experience really does steer how we think about attacking certain problem spaces. We have had near misses, but we’ve gone through quite a long period of time where we haven’t had anything, in the case of pandemics or other categories of risk as well, that’s been sufficient to disturb society in this way. And I think that there is some silver lining here, in that people now acutely understand the fragility of the system that we live in and how something like the COVID-19 pandemic can have such profound levels of disruption, where on the spectrum of the types of risks that we’re assessing and talking about, this would be on the milder end of the spectrum.

And so I do think that there is an opportunity potentially here, where people now unfortunately have had the experience of seeing how severely life can be disrupted, how quickly our systems break down, and the absence of fail-safes and sort of resilience baked into them to be able to deal with these sorts of things. From one perspective I can see how you would feel worse. From another perspective, I definitely think there’s a conversation to have, and a chance to start to take seriously some of the other risks that fall into the category of being catastrophic on a global scale and not entirely remote in terms of their probabilities, now that people are really listening and paying attention.

Anthony Aguirre: The risk of a pandemic has probably been going up with population density and people pushing into animal habitats and so on, but not maybe dramatically increasing with time. Whereas there are other things, like a deliberately or accidentally human-caused pandemic where people have deliberately taken a pathogen and made it more dangerous in one way or another. And there are risks, for example, in synthetic biology where things that would never have occurred naturally can be designed by people. These are risks and possibilities that I think are growing very, very rapidly, because the technology is growing so rapidly, and may therefore be very, very underestimated when we’re basing our risks on frequencies of things happening in the past. This really gets worse the more you think about it, because a naturally occurring thing can already be this devastating, and when you talk to people in infectious disease about what in principle could be made, there are all kinds of nasty properties of different pathogens that, if combined, would be something really, really terrible, and nature wouldn’t necessarily combine them like that. There’s no particular reason to, but humans could.

Then you really open up really, really terrifying scenarios. I think this does really drive home in an intuitive, very visceral way that we’re not somehow magically immune to those things happening, and that there isn’t necessarily some amazing system in place that’s just going to prevent or stop those things if they get out into the world. We’ve seen containment fail. So the question is what this lesson tells us about what we should be doing and what we should be paying more attention to. And I think it’s something we really, really urgently need to discuss.

Emilia Javorsky: So much of the cultural psyche that we’ve had around these types of risks has focused primarily on bad actors. When we talk about the risks that arise from pandemics and from tools like genetic engineering and synthetic biology, we hear a lot about bad actors and the risks of bio-terrorism. But what you’re discussing, and I think really rightly highlighting, is that there doesn’t have to be any sort of ill will baked into these kinds of risks for them to occur. There can just be sloppy science that’s part of this, or science with inadequate safety engineering. I think that that’s something people are starting to appreciate now that we’re experiencing a naturally occurring pandemic where there’s no actor to point to. There’s no ill will, there’s no enemy so to speak, which is how I think so much of the pandemic conversation has happened up until this point, and other risks as well, where everyone assumes that it’s some sort of ill will.

When we talk about nuclear risk, people generally think about the risk of a nuclear war starting. Well, the risk of nuclear war versus the risk of nuclear accident, those two things are very different, and it’s accidental risk that is much more likely to be devastating than purposeful initiation of some global nuclear war. So I think that’s important too: just getting an appreciation that these things can happen either naturally or, when we think about emerging technologies, through a failure to understand, appreciate, and engage in the precautions and safety measures that are needed when dealing with largely unknown science.

Anthony Aguirre: I completely agree with you, while also worrying a little bit that our human tendency is to react more strongly against things that we see as deliberate. If you look at just the numbers of people that have died of terrorist attacks say, they’re tiny compared to many, many other causes. And yet we feel as a society very threatened and have spent incredible amounts of energy and resources protecting ourselves against those sorts of attacks. So there’s some way in which we tend to take much more seriously for some reason, problems and attacks that are willful and where we can identify a wrongdoer, an enemy.

So I’m not sure what to think. I totally agree with you that there are lots of problems that won’t have an enemy to be fighting against. Maybe I’m agreeing with you that I worry that we’re not going to take them seriously for that reason. So I wonder in terms of pandemic preparedness, whether we shouldn’t keep emphasizing that there are bad actors that could cause these things just because people might pay more attention to that, whereas they seem to be awfully dismissive of the natural ones. I’m not sure how to think about that.

Emilia Javorsky: I actually think I’m in complete agreement with you, Anthony, that my point is coming from perhaps misplaced optimism that this could be an inflection point in that kind of thinking.

Anthony Aguirre: Fair enough.

Lucas Perry: I think that what we like to do is actually just declare war on everything, at least in America. So maybe we’ll have to declare a war on pathogens or something and then people will have an enemy to fight against. So continuing here on trying to consider what lessons the coronavirus situation can teach us about global catastrophic and existential risks: we have an episode with Toby Ord coming out tomorrow, at the time of this recording. In that conversation, global catastrophic risk was defined as something which kills 10% of the global population. Coronavirus is definitely not going to do that via its direct effects or its indirect effects. There are real risks, and a real class of risks, which are far more deadly and widely impacting than COVID-19, and one of these that I’d like to pivot into now is what you guys just mentioned briefly: the risk of synthetic bio.

So that would be like AI enabled synthetic biology. So pathogens or viruses which are constructed and edited in labs via new kinds of biotechnology. Could you explain this risk and how it may be a much greater risk in the 21st century than naturally occurring pandemics?

Emilia Javorsky: I think what I would separate out is thinking about synthetic biology versus genetic engineering. So there are definitely tools we can use to intervene in pathogens that we already know exist, and one can foresee, thinking down the bad actor train of thought, how you could intervene in those to increase their lethality or increase their transmissibility. The other side of this, the more unexplored side, and you alluded to it being AI enabled, though it can be enabled by AI or by human intelligence, is the idea of synthetic biology and creating life forms, sort of nucleotide by nucleotide. So we now have the capacity to really design DNA, to design life, in ways that we previously just did not have the capacity to do. There’s certainly a pathogen angle to that, but there’s also a tremendously unknown element.

We could end up creating life forms that are not things we would intuitively think of as human designers of life. And so what are the risks posed by potentially entirely new classes of pathogens that we have not encountered before? What strikes me, when we talk about tools for either intervening in pathogens that already exist and changing their characteristics, or creating designer ones from scratch, is just how cheap and ubiquitous these technologies have become. They’re far more accessible in terms of how cheap they are, how available they are, and the level of expertise required to work with them. There’s that aspect of being a highly accessible, dangerous technology that also changes how we think about this.

Anthony Aguirre: Unfortunately, it seems not hard for me, or I think anyone, and unfortunately also not for the biologists that I’ve talked to, to imagine pathogens that are just categorically worse than the sorts of things that have happened naturally. With AIDS, HIV, it took us decades and we still don’t have a vaccine, and that’s something that was able to spread quite widely before anyone even noticed that it existed. So you can imagine awful combinations of long asymptomatic transmission, terrible consequences, and difficulty of any kind of countermeasures being deliberately combined into something that would be really, really orders of magnitude more terrible than the things we’ve experienced. It’s hard to imagine why someone would do that, but there are lots of things that are hard to imagine that people nonetheless do, unfortunately. I think everyone who’s thought much about this agrees that it’s just a huge problem, potentially the sort of super pathogen that could in principle wipe out a significant fraction of the world’s population.

What is the cost associated with that? The value of the world is hard to even calculate. It is just a vast number.

Lucas Perry: Plus the deep future.

Emilia Javorsky: Right.

Anthony Aguirre: Suppose there’s a 0.01% chance of someone developing something like that in the next 20 years and deploying it. That’s a really tiny chance, probably not going to happen, but when you multiply it by quadrillions of dollars, it still merits a fairly large response because it’s a huge expected cost. So we should not be putting thousands or hundreds of thousands or even millions of dollars into worrying about that. We really should be putting billions of dollars into worrying about that, if we were running the numbers even within an order of magnitude correctly. So I think that’s an example where our response to a low probability, high impact threat is utterly, utterly tiny compared to where it should be. And there are some other examples, but that’s one of those ones where I think it would be hard to find someone who would say that it isn’t 0.1% or even 1% likely over the next 20 years.

But if you really take that seriously, we should be doing a ton about this, and we’re just not. Looking at many such examples, and there are not a huge number, but there are enough that it takes a fair amount of work to look at them, is part of what the Future of Life Institute is here to do. And I’m looking forward to hearing your interview with Toby Ord as well along those lines. We really should be taking those things more seriously as a society, and we don’t have to put in the right amount of money in the sense that if it’s 1% likely we don’t have to put in 1% of a quadrillion dollars, because fortunately it’s way, way cheaper to prevent these things than to actually deal with them. But at some level, money should be no object when it comes to making sure that our entire civilization doesn’t get wiped out.
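A rough sketch of the expected-value arithmetic behind that point (the 0.01% probability and the quadrillion-dollar valuation are the illustrative figures used in the conversation above, not precise estimates):

$$ \text{expected cost} \approx 0.0001 \times \$10^{15} = \$10^{11} = \$100\ \text{billion} $$

So even a prevention budget in the billions of dollars is small next to the expected loss, which is the mismatch being described here.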

We can take as a lesson from this current pandemic that terrible things do happen even if nobody, or almost nobody, wants them to; they can easily outstrip our ability to deal with them after they’ve happened, particularly if we haven’t correctly planned for them. But we are at a place in world history where we can see them potentially coming and do something about it. I do think about this when we’re stuck at home, thinking about how in this terrible scenario 1% or even a few percent of our citizens could be killed by this disease. And I think back to what it must’ve been like in the Middle Ages, when a third of Europe was destroyed by the Black Death and they had no idea what was going on. Imagine how terrifying that was. As bad as it is now, we’re not in that situation. We know exactly what’s going on at some level. We know what we can do to prevent it and there’s no reason why we shouldn’t be doing that.

Emilia Javorsky: Something that keeps me up at night about these scenarios is that prevention is really the only strategy that has a good shot at being effective, because we see, and I take your HIV example as being a great one, how long it takes us to even begin to understand the consequences of a new pathogen on the human body, never mind to figure out how to intervene. We are in the infancy of our understanding of human physiology, and even more so of how to intervene in it. And when you see the strategies that are happening today with vaccine development, we know approximately how long that takes. A lot of that’s driven by the need for clinical studies. We don’t have good models to predict how things perform in people. That’s on the vaccine side; it’s also true on the therapeutic side.

This is why clinical trials are long and expensive and still fail at quite a late stage. Even when we get to the point of knowing that something works in a Petri dish, and then a mouse, and then an early pilot study, at a phase three clinical study that drug can still fail its efficacy endpoint. That’s quite common, and that’s part of what drives up the cost of drug development. And so from my perspective, having come from the human biology side, it just strikes me that medical knowledge, and the rate at which it’s progressing, which is quick but not revolutionary, is dwarfed by the rate of progress in some of these other domains, be it AI or synthetic biology. And so I’m just not confident that our field will move fast enough to be able to deal with an entirely novel pathogen if it comes 10, 20 or even 50 years down the road. Personally, what motivates me and gets me really passionate is thinking about these issues and mitigation strategies today, because I think that is the best place for our efforts at the moment.

Anthony Aguirre: One thing that’s encouraging, I would say, about the COVID-19 pandemic is seeing how many people are working so quickly and so hard to do things about it. There are all kinds of components to that. There are vaccines and antivirals, and all of the things that we’re seeing play out are inventions that we’ve devised to fight against this new pathogen. You can imagine a lot of those getting better and more effective, and some of them much more effective. So you can in principle imagine really quick and easy vaccine development, though that seems super hard.

But you can imagine testing: if there were little DNA sequencers all over the place that could just sequence whatever pathogens are around in the air or in a person and spit out the list of things that are in there, that would be an enormous extra tool in our toolkit. You can imagine things like, and I suspect that this is coming in the current crisis because it exists in other countries and it probably will exist with us, a system where if I am tested and either have or don’t have an infection, that result will go into a hopefully, but not necessarily, privacy-preserving and encrypted database that will then be coordinated and shared in some way with other people, so that the system as a whole can assess whether the risk for the people I’ve been in contact with has gone up, and they might be notified, they might be told, “Oh, you should get a test this week instead of next week,” or something like that.

So you can imagine the sort of huge amount of data that are gathered on people now, as part of our modern, somewhat sketchy online ecosystem being used for this purpose. I think they probably will, if we could do so in a way that we actually felt comfortable with, like if I had a system where I felt like I can share my personal health data and feel like I’ve got trust in the system to respect my privacy and my interest, and to be a good fiduciary, like a doctor would, and keeping my interest paramount. Of course I’d be happy to share that information, and in return get useful information from the system.

So I think lots of people would want to buy into that, if they trusted the system. We’ve unfortunately gotten to this place where nobody trusts anything. They use it, even though they don’t trust it, but nobody actually trusts much of anything. But you can imagine having a trusted system like that, which would be incredibly useful for this sort of thing. So I’m curious what you see as the competition between these dangers and the new components of the human immune system.

Emilia Javorsky: I am largely in agreement that in the very short term, we have technologies available today, and the system you just described is one of them, that can deal with this issue of data and of understanding who, what, when, and where these symptoms and these infections are. We could make so much smarter decisions as a society, and really could have prevented a lot of what we’re seeing today, if such a system had been in place. That system could be enabled by the technology we have today. I mean, it’s not a far reach; it wouldn’t require any kind of advances in science and technology to put in place. It would perhaps require advances in trust in society, but that’s not a technology problem. I do think that’s something there will be a will to do after the dust settles on this particular pandemic.

I think where I’m most concerned is actually our short term future, because some of the technologies we’re talking about, genetic engineering and synthetic biology, will ultimately also be able to be harnessed as mitigation strategies for the kinds of things that we will face in the future. What I guess I’m worried about is the gap between when we’ve advanced these technologies to a place where we’re confident that they’re safe and effective in people, and we have the models and robust clinical data in place to feel comfortable using them, versus how quickly the threat is advancing.

So in my vision of the longer term future, maybe on the 100 year horizon, which is still relatively very short, I think there could be a balance between the risks and the ability to harness these technologies to actually combat those risks. In the shorter term future, to me there’s a gap between the rate at which the risk is increasing, because of the increased availability and ubiquity of these tools, versus our understanding of the human body and our ability to harness these technologies against those risks.

So for me, I think there’s total agreement that there are things we can do today based on data and testing and rapid diagnostics. We talk a lot about wearables and how those could be used to monitor biometric data to detect these things before people become symptomatic; those are all strategies we can do today. I think there are longer term strategies of how we harness these new tools in biology to be risk mitigators. And I think there’s a gap in between, where the risk is very high and the tools that we have that are scalable and ready to go are still quite limited.

Lucas Perry: Right, so there’s a duality here where AI and big data can be applied to helping mitigate the current threats and risks of this pandemic, as well as future pandemics. Yet the same technology can also be applied to speeding up the development of potentially antagonistic synthetic biology: organisms which bad actors, people who are deeply misanthropic, or countries wishing to gain power and hold the world hostage may be able to use to realize a global catastrophic or existential risk.

Emilia Javorsky: Yeah, I mean, I think AI’s part of it, but I also think that there’s a whole category of risk here that’s probably even more likely in the short term, which is just the risks introduced by human level intelligence with these pathogens. That knowledge exists of how to make things more lethal and more transmissible with the technology available today. So I would say both.

Lucas Perry: Okay, thanks for that clarification. So there’s clearly a lot of risks in the 21st Century from synthetic bio gone wrong, or used for nefarious purposes. What are some ways in which synthetic bio might be able to help us with pandemic preparedness, or to help protect us against bad actors?

Emilia Javorsky: When we think about the tools that are available to us today within the realm of biotechnology, and I would include genetic engineering and synthetic biology in that category, the upside is actually tremendous. Where we see the future for these tools, the benefits have the potential to far outweigh the risks. These are tools that, much like the more powerful AI systems we think about developing, are very fundamental and able to solve many problems. So when you start to be able to intervene in really fundamental biology, that really unlocks the potential to treat so many of the diseases that lack good treatments today, and that are largely incurable.

But beyond that, we can take that a step further and increase our health spans and our life spans. Even more broadly, these tools are really key to some of the things we think about as existential risks and existential hope for our species. Today we are talking in depth about pandemics and the role that biology can play as a risk factor. But those same tools can be harnessed. We’re seeing it now with more rapid vaccine development, but things like synthetic biology and genetic engineering are fundamental leaps forward in being able to protect ourselves against these threats with new mitigation strategies, and in making our own biology and immune systems more resilient to these types of threats.

That ability for us to really engineer and intervene in human biology, thinking towards the medium to long-term future, unlocks a lot of possibilities for us beyond just being able to treat and cure diseases. We think about how our own planet and climate is evolving, and we can use these same tools to evolve with it, and to become more tolerant of some of the challenges that lie ahead. We all kind of know that eventually, whether that eventuality comes sooner or much later, the survival of our species is contingent on becoming multi-planetary. When we think about enduring the kinds of stressors that even near-term space travel imposes, and living in and adapting to alien environments, these are the fundamental tools that will really enable us to do that.

While today we’re starting to see the downsides of biology, some of the limitations of the tools we have to intervene, and some of the near-term risks that the science of today poses in terms of pandemics, the future here is really very, very bright for how these tools can be used to mitigate risk, and also to take us forward.

Lucas Perry: You have me thinking here about a great Carl Sagan quote that I really like, where he says, “It will not be we who reach Alpha Centauri and the other nearby stars, it will be a species very like us, but with more of our strengths and fewer of our weaknesses.” So, yeah, that seems to be in line with the upsides of synthetic bio.

Emilia Javorsky: You could even see the foundations of how we could use the tools that we have today to start to get to Proxima B. I think that quote would be realized in hopefully the not too distant future.

Lucas Perry: All right. So, taking another step back here, let’s get a little bit more perspective again on extracting some more lessons.

Anthony Aguirre: There were countries that were prepared for this and acted fairly quickly and efficaciously, partly because they maybe had more firsthand experience with previous prospective pandemics, but also maybe because they just had a slightly differently constituted society and leadership structure. There’s a danger here, I think, of seeing top-down and authoritarian governments as potentially more effective in dealing with this, because they can just take quick action. They don’t have to deal with a bunch of red tape or worry about pesky citizens’ rights and things; they can just do what they want and crush the virus.

I don’t think that’s entirely accurate, but to the degree that it is, or that people perceive it to be, that worries me a little bit, because I really do strongly favor open societies and western democratic institutions over more totalitarian ones. I do worry that when our society and system of government so abjectly fails in serving its people, that people will turn to something rather different, or become very tolerant of something rather different, and that’s really bad news for us, I think.

So that worries me at the level of a kind of competition between forms of government. I really would like to see a better version of ours making itself seen and being effective in something like this, proving that there isn’t necessarily a conflict between having a rights-conferring, open society with a strong voice of the people, and having something that is competent, serves its people well, and is capable in a crisis. They should not be mutually exclusive, and if we make them so, then we do so at great peril, I think.

Emilia Javorsky: That same worry keeps me up at night. I’ll try and offer an optimistic take on it.

Anthony Aguirre: Please.

Emilia Javorsky: Which is that authoritarian regimes are also the type that are not noted for their openness, their transparency, and their ability to share realtime data on what’s happening within their borders. And so when we think about this pandemic, or global catastrophic risk more broadly, the “we” is inherently the global community. That’s the nature of a global catastrophic risk. I think part of what has happened in this particular pandemic is that it hit at a time when the spirit of multilateralism and global cooperation is arguably, in modern memory, the weakest it’s been. And so I think the other way to look at it is: how do we cultivate systems of government that are capable of working together and acting on a global scale, understanding that pandemics and global catastrophic risks are not confined to national borders? And how do you develop the data sharing, the information sharing, and also the ability to respond to that data in realtime at a global scale?

The strongest argument for forms of government that comes out of this is a pivot towards one that is much more open, transparent, and cooperative than perhaps we’ve been seeing as of late.

Anthony Aguirre: Well, I hope that is the lesson that’s taken. I really do.

Emilia Javorsky: I hope so, too. That’s the best perspective I can offer on it, because I too, am a fan of democracy and human rights. I believe these are generally good things.

Lucas Perry: So wrapping things up here, let’s try to get some perspective and synthesis of everything that we’ve learned from the COVID-19 crisis and what we can do in the future, what we’ve learned about humanity’s weaknesses and strengths. So, if you were to have a short pitch each to world leaders about lessons from COVID-19, what would that be? We can start with Anthony.

Anthony Aguirre: This crisis has thrust a lot of leaders and policy makers into the situation where they’re realizing that they have really high stakes decisions to make, and simply not the information that they need to make them well. They don’t have the expertise on hand. They don’t have solid predictions and modeling on hand. They don’t have the tools to fold those things together to understand what the results of their decisions will be and make the best decision.

So I would suggest strongly that policy makers put in place those sorts of systems: how am I going to get reliable information from experts that allows me to understand and model what is going to happen given the different choices that I could make, and make really good decisions, so that when a crisis like this hits, we don’t find ourselves in the situation of simply not having the tools at our disposal to handle it. And then I’d say, having put those things in place, don’t wait for a crisis to use them. Just use those things all the time and make good decisions for society based on technology and expertise and understanding that we now are able to put in place together as a society, rather than whatever decision making processes we’ve generated socially and historically and so on. We actually can do a lot better and have a really, really well run society if we do so.

Lucas Perry: All right, and Emilia?

Emilia Javorsky: Yeah, I want to echo Anthony’s sentiment there about the need for evidence-based realtime data at scale. That’s just so critical to being able to orchestrate any kind of meaningful response. And also to be able to act, as Anthony alludes to, before you get to the point of a crisis, because there were a lot of early indicators here that could have prevented the situation that we’re in today. I would add that the next step in that process is also developing mechanisms to be able to respond in realtime at a global scale. I think we are so caught up in moments of us versus them, whether that be on a domestic or international level, and the spirit of multilateralism is just at an all-time low.

I think we’ve been sorely reminded that when there are global level threats, they require a global level response. No matter how much people want to be insular and think that their countries have borders, the fact of the matter is that they do not. And we’re seeing the interdependency of our global system. So I think that in addition to building those data structures to get information to policy makers, there also needs to be a supply chain, an infrastructure, and a decision making structure built to be able to respond to that information in real time.

Lucas Perry: You mentioned information here. One of the things that you did want to talk about on the podcast was information problems and how information is currently extremely partisan.

Emilia Javorsky: It’s less that it’s partisan, and more that it’s siloed, biased, and personalized. I think one aspect of information that’s been very difficult in this current information environment is the ability to communicate accurate information to a large audience, because the way that we communicate information today is mainly through clickbait-style titles. People are mainly consuming information in a digital format, and it’s highly personalized, highly tailored to their preferences, both in terms of the news outlets that they innately turn to for information and in terms of their own personal algorithms that know what kind of news to show you, whether it be in your social feeds or what have you.

I think when the structure of how we disseminate information is so personalized and partisan, it becomes very difficult to break through all of that noise to communicate accurate, balanced, measured information to people. Because even when you do, it’s human nature that those are not the types of things people are innately going to seek out. So in times like this, what are the mechanisms of disseminating information that we can think about that supersede all of that individualized media, and really get through to say, “All right, everyone needs to be on the same page and be operating off the best state of information that we have at this point. And this is what that is.”

Lucas Perry: All right, wonderful. I think that helps to more fully unpack this data structure point that Anthony and you were making. So yeah, thank you both so much for your time, and for helping us to reflect on lessons from COVID-19.

FLI Podcast: The Precipice: Existential Risk and the Future of Humanity with Toby Ord

Toby Ord’s “The Precipice: Existential Risk and the Future of Humanity” has emerged as a new cornerstone text in the field of existential risk. The book presents the foundations and recent developments of this budding field from an accessible vantage point, providing an overview suitable for newcomers. For those already familiar with existential risk, Toby brings new historical and academic context to the problem, along with central arguments for why existential risk matters, novel quantitative analysis and risk estimations, deep dives into the risks themselves, and tangible steps for mitigation. “The Precipice” thus serves as both a tremendous introduction to the topic and a rich source of further learning for existential risk veterans. Toby joins us on this episode of the Future of Life Institute Podcast to discuss this definitive work on what may be the most important topic of our time.

Topics discussed in this episode include:

  • An overview of Toby’s new book
  • What it means to be standing at the precipice and how we got here
  • Useful arguments for why existential risk matters
  • The risks themselves and their likelihoods
  • What we can do to safeguard humanity’s potential

Timestamps: 

0:00 Intro

03:35 What the book is about

05:17 What does it mean for us to be standing at the precipice?

06:22 Historical cases of global catastrophic and existential risk in the real world

10:38 The development of humanity’s wisdom and power over time

15:53 Reaching existential escape velocity and humanity’s continued evolution

22:30 On effective altruism and writing the book for a general audience

25:53 Defining “existential risk”

28:19 What is compelling or important about humanity’s potential or future persons?

32:43 Various and broadly appealing arguments for why existential risk matters

50:46 Short overview of natural existential risks

54:33 Anthropogenic risks

58:35 The risks of engineered pandemics

01:02:43 Suggestions for working to mitigate x-risk and safeguard the potential of humanity

01:09:43 How and where to follow Toby and pick up his book

 

This podcast is possible because of the support of listeners like you. If you found this conversation to be meaningful or valuable consider supporting it directly by donating at futureoflife.org/donate. Contributions like yours make these conversations possible.

All of our podcasts are also now on Spotify and iHeartRadio! Or find us on SoundCloud, iTunes, Google Play and Stitcher.

You can listen to the podcast above or read the transcript below. 

Lucas Perry: Welcome to the Future of Life Institute Podcast. I’m Lucas Perry. This episode is with Toby Ord and covers his new book “The Precipice: Existential Risk and the Future of Humanity.” This is a new cornerstone piece in the field of existential risk and I highly recommend this book for all persons of our day and age. I feel this work is absolutely critical reading for living an informed, reflective, and engaged life in our time. And I think even those well acquainted with this topic area will find much that is both useful and new in this book. Toby offers a plethora of historical and academic context to the problem, tons of citations and endnotes, useful definitions, central arguments for why existential risk matters that can be really helpful for speaking to new people about this issue, and also novel quantitative analysis and risk estimations, as well as what we can actually do to help mitigate these risks. So, if you’re a regular listener to this podcast, I’d say this is a must add to your science, technology, and existential risk bookshelf.

The Future of Life Institute is a non-profit and this podcast is funded and supported by listeners like you. So if you find what we do on this podcast to be important and beneficial, please consider supporting the podcast by donating at futureoflife.org/donate. If you support any other content creators via services like Patreon, consider viewing a regular subscription to FLI in the same light. You can also follow us on your preferred listening platform, like on Apple Podcasts or Spotify, by searching for us directly or following the links on the page for this podcast found in the description.

Toby Ord is a Senior Research Fellow in Philosophy at Oxford University. His work focuses on the big picture questions facing humanity. What are the most important issues of our time? How can we best address them?

Toby’s earlier work explored the ethics of global health and global poverty, demonstrating that aid has been highly successful on average and has the potential to be even more successful if we were to improve our priority setting. This led him to create an international society called Giving What We Can, whose members have pledged over $1.5 billion to the most effective charities helping to improve the world. He also co-founded the wider effective altruism movement, encouraging thousands of people to use reason and evidence to help others as much as possible.

His current research is on the long-term future of humanity,  and the risks which threaten to destroy our entire potential.

Finally, the Future of Life Institute podcasts have never had a central place for conversation and discussion about the episodes and related content. In order to facilitate such conversation, I’ll be posting the episodes to the LessWrong forum at Lesswrong.com where you’ll be able to comment and discuss the episodes if you so wish. The episodes more relevant to AI alignment will be crossposted from LessWrong to the Alignment Forum as well at alignmentforum.org.  

And so with that, I’m happy to present Toby Ord on his new book “The Precipice.”

Lucas Perry: We’re here today to discuss your new book, The Precipice: Existential Risk and the Future of Humanity. Tell us a little bit about what the book is about.

Toby Ord: The future of humanity, that’s the guiding idea, and I try to think about how good our future could be. That’s what really motivates me. I’m really optimistic about the future we could have if only we survive the risks that we face. There have been various natural risks that we have faced for as long as humanity’s been around, 200,000 years of Homo sapiens or you might include an even broader definition of humanity that’s even longer. That’s 2000 centuries and we know that those natural risks can’t be that high or else we wouldn’t have been able to survive so long. It’s quite easy to show that the risks should be lower than about 1 in 1000 per century.

But then with humanity’s increasing power over that time, the exponential increases in technological power, we reached this point last century, with the development of nuclear weapons, where we pose a risk to our own survival, and I think that the risks have only increased since then. We’re in this new period where the risk is substantially higher than these background risks, and I call this time the precipice. I think that this is a really crucial time in the history and the future of humanity, perhaps the most crucial time, these few centuries around now. And I think that if we survive, and people in the future look back on the history of humanity, schoolchildren will be taught about this time. I think that this will be really more important than other times that you’ve heard of, such as the industrial revolution or even the agricultural revolution. I think this is a major turning point for humanity, and what we do now will define the whole future.
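A minimal sketch of the survival-record arithmetic behind Toby’s 1-in-1,000-per-century bound mentioned above (this formulation is an editorial reconstruction, not a quote from the book): if the natural extinction risk per century were a constant r, then the chance of humanity surviving its roughly 2,000 centuries so far would be

$$ P(\text{survive}) = (1 - r)^{2000}, \qquad r = \tfrac{1}{1000} \;\Rightarrow\; P \approx e^{-2} \approx 0.14 $$

so a sustained natural risk much above about 1 in 1,000 per century would make our 200,000-year track record very improbable, which is why purely natural risk can be bounded at roughly that level.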

Lucas Perry: In the title of your book, and also in the contents of it, you developed this image of humanity to be standing at the precipice, could you unpack this a little bit more? What does it mean for us to be standing at the precipice?

Toby Ord: I sometimes think of humanity as on this grand journey through the wilderness, with dark times at various points, but also moments of sudden progress and heady views of the path ahead and what the future might hold. And I think that this point in time is the most dangerous time that we’ve ever encountered, and perhaps the most dangerous time that there will ever be. So I see it in this central metaphor of the book: humanity coming through this high mountain pass, where the only path onwards is this narrow ledge along a cliff side, with this steep and deep precipice at the side, and we’re kind of inching our way along. But we can see that if we can get past this point, there are ultimately almost no limits to what we could achieve. Even if we can’t precisely estimate the risks that we face, we know that this is the most dangerous time so far. There’s every chance that we don’t make it through.

Lucas Perry: Let’s talk a little bit then about how we got to this precipice and our part in this path. Can you provide some examples or a story of global catastrophic risks that have happened and near misses of possible existential risks that have occurred so far?

Toby Ord: It depends on your definition of global catastrophe. One of the definitions that’s on offer is 10% or more of all people on the earth at that time being killed in a single disaster. There is at least one time where it looks like we may have reached that threshold, which was the Black Death, which killed between a quarter and a half of people in Europe and may have killed many people in South Asia, East Asia, and the Middle East as well. It may have killed one in 10 people across the whole world, although, because our world was less connected than it is today, it didn’t reach every continent. In contrast, the Spanish Flu of 1918 reached almost everywhere across the globe, and killed a few percent of people.

But in terms of existential risk, none of those really posed an existential risk. We saw, for example, that despite something like a third of people in Europe dying, there wasn’t a collapse of civilization. It seems like we’re more robust than some give us credit for. But there have been times where there hasn’t been an actual catastrophe, yet there have been near misses in terms of the chances.

There are many cases actually connected to the Cuban Missile Crisis, a time of immensely high tensions during the Cold War in 1962. I think that the closest we have come is perhaps the events on a submarine that, unknown to the U.S., was carrying a secret nuclear weapon. The U.S. patrol boats tried to force it to surface by dropping what they called practice depth charges, but the submarine thought that there were real explosives aimed at hurting them. The submarine was made for the Arctic and so it was overheating in the Caribbean. People were dropping unconscious from the heat and the lack of oxygen as they tried to hide deep down in the water. And during that time the captain, Captain Savitsky, ordered that this nuclear weapon be fired and the political officer gave his consent as well.

On any of the other submarines in this flotilla, this would have been enough to launch the torpedo, a tactical nuclear weapon that would then have exploded and destroyed the fleet that was pursuing them. But on this one, it was lucky that the flotilla commander, Captain Vasili Arkhipov, was also on board, and he overruled this and talked Savitsky down from it. So this was a situation at the height of this tension where a nuclear weapon would have been used. And we’re not quite sure, maybe Savitsky would have decided on his own not to do it, maybe he would have backed down. There’s a lot that’s not known about this particular case. It’s very dramatic.

But Kennedy had made it very clear that any use of nuclear weapons against U.S. Armed Forces would lead to an all-out full scale attack on the Soviet Union, so they hadn’t anticipated that tactical weapons might be used. They assumed it would be a strategic weapon, but it was their policy to respond with a full scale nuclear retaliation and it looks likely that that would have happened. So that’s the case where ultimately zero people were killed in that event. The submarine eventually surfaced and surrendered and then returned to Moscow where people were disciplined, but it brought us very close to this full scale nuclear war.

I don’t mean to imply that that would have been the end of humanity. We don’t know whether humanity would survive the full scale nuclear war. My guess is that we would survive, but that’s its own story and it’s not clear.

Lucas Perry: Yeah. The story to me has always felt a little bit unreal. It’s hard to believe we came so close to something so bad. For listeners who are not aware, the Future of Life Institute gives out a $50,000 award each year, called the Future of Life Award to unsung heroes who have contributed greatly to the existential security of humanity. We actually have awarded Vasili Arkhipov’s family with the Future of Life Award, as well as Stanislav Petrov and Matthew Meselson. So if you’re interested, you can check those out on our website and see their particular contributions.

And related to nuclear weapons risk, we also have a webpage on nuclear close calls and near misses where there were accidents with nuclear weapons which could have led to escalation or some sort of catastrophe. Is there anything else here you’d like to add in terms of the relevant historical context and this story about the development of our wisdom and power over time?

Toby Ord: Yeah, that framing, which I used in the book comes from Carl Sagan in the ’80s when he was one of the people who developed the understanding of nuclear winter and he realized that this could pose a risk to humanity on the whole. The way he thought about it is that we’ve had this massive development over the hundred billion human lives that have come before us. This succession of innovations that have accumulated building up this modern world around us.

If I look around me, I can see almost nothing that wasn’t created by human hands, and this, as we all know, has been accelerating; when you try to measure it, you often see exponential improvements in technology over time, leading to the situation where we have the power to radically reshape the Earth’s surface, both, say, through our agriculture, but also perhaps in a moment through nuclear war. This increasing power has put us in a situation where we hold our entire future in the balance. A few people’s actions over a few minutes could actually potentially threaten that entire future.

In contrast, humanity’s wisdom has grown only falteringly, if at all. Many people would suggest that it’s not even growing. And by wisdom here, I mean, our ability to make wise decisions for human future. I talked about this in the book under the idea about civilizational virtues. So if you think of humanity as a group of agents, in the same way that we think of say nation states as group agents, we talk about is it in America’s interest to promote this trade policy or something like that? We can think of what’s in humanity’s interests and we find that if we think about it this way, humanity is crazily impatient and imprudent.

If you think about the expected lifespan of humanity, a typical species lives for about a million years. Humanity is about 200,000 years old. We have something like 800,000 or a million or more years ahead of us if we play our cards right and don’t bring about our own destruction. The analogy would be that we’re 20% of the way through our life, like an adolescent who’s just coming into his or her own power, but doesn’t have the wisdom or the patience to really pay any attention to this whole possible future ahead of them, and so they’re just powerful enough to get themselves in trouble, but not yet wise enough to avoid that.

If you continue this analogy, it is often hard for humanity at the moment to think more than a couple of election cycles ahead at best, but that would correspond, say eight years, to just the next eight hours within this person’s life. For the kind of short term interests during the rest of the day, they put the whole rest of their future at risk. And so I think that that helps to see what this lack of wisdom looks like. It’s not just a highfalutin term of some sort; you can kind of see that what’s going on is that the person is incredibly imprudent and impatient. And I think that many other virtues or vices that we think of in an individual human’s life can be applied in this context and are actually illuminating about where we’re going wrong.
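A rough check of the scaling in that analogy, assuming an 80-year human lifespan alongside the million-year species lifespan used above (both figures are the illustrative ones from the conversation):

$$ \frac{200{,}000\ \text{yr}}{1{,}000{,}000\ \text{yr}} = 20\% \;\;(\text{about age 16 of 80}), \qquad 8\ \text{yr} \times \frac{80\ \text{yr}}{1{,}000{,}000\ \text{yr}} \approx 0.0006\ \text{yr} \approx 6\ \text{hours} $$

so a couple of election cycles really does correspond to roughly the next several hours of that person’s life, in line with the “next eight hours” Toby mentions.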

Lucas Perry: Wonderful. Part of the dynamic here in this wisdom versus power race seems to be that one of the solutions, slowing down the growth of power, seems untenable or just wouldn’t work. So it seems more like we have to focus on amplifying wisdom. Is this also how you view the dynamic?

Toby Ord: Yeah, that is. I think that if humanity was more coordinated, if we were able to make decisions in a unified manner better than we actually can, so, if you imagine this was a single player game, I don’t think it would be that hard. You could just be more careful with your development of power and make sure that you invest a lot in institutions, and in really thinking carefully about things. I mean, I think that the game is ours to lose, but unfortunately, we’re less coherent than that, and if one country decides to hold off on developing things, then other countries might run ahead and produce a similar amount of risk.

There’s this kind of tragedy of the commons at this higher level, and so I think that it’s extremely difficult in practice for humanity to go slow on progress of technology. And I don’t recommend that we try. In particular, there’s only at the moment a small number of people who really care about these issues and are really thinking about the long-term future and what we could do to protect it. And if those people were to spend their time arguing against progress of technology, I think that it would be a really poor use of their energies and probably just annoy and alienate the people they were trying to convince. And so instead, I think that the only real way forward is to focus on improving wisdom.

I don’t think that’s impossible. I think that humanity’s wisdom, as you could see from my comment before about how we’re kind of disunified, partly, it involves being able to think better about things as individuals, but it also involves being able to think better collectively. And so I think that institutions for overcoming some of these tragedies of the commons or prisoner’s dilemmas at this international level, are an example of the type of thing that will make humanity make wiser decisions in our collective interest.

Lucas Perry: It seemed that you said by analogy, that humanity’s lifespan would be something like a million years as compared with other species.

Toby Ord: Mm-hmm (affirmative).

Lucas Perry: That is likely illustrative for most people. I think there’s two facets of this that I wonder about in your book and in general. The first is this idea of reaching existential escape velocity, where it would seem unlikely that we would have a reason to end in a million years should we get through the time of the precipice and the second is I’m wondering your perspective on Nick Bostrom calls what matters here in the existential condition, Earth-originating intelligent life. So, it would seem curious to suspect that even if humanity’s existential condition were secure that we would still be recognizable as humanity in some 10,000, 100,000, 1 million years’ time and not something else. So, I’m curious to know how the framing here functions in general for the public audience and then also being realistic about how evolution has not ceased to take place.

Toby Ord: Yeah, both good points. I think that the one million years is indicative of how long species last when they’re dealing with natural risks. It’s I think a useful number to try to show why there are some very well-grounded scientific reasons for thinking that a million years is entirely in the ballpark of what we’d expect if we look at other species. And even if you look at mammals or other hominid species, a million years still seems fairly typical, so it’s useful in some sense for setting more of a lower bound. There are species which have survived relatively unchanged for much longer than that. One example is the horseshoe crab, which is about 450 million years old whereas complex life is only about 540 million years old. So that’s something where it really does seem like it is possible to last for a very long period of time.

If you look beyond that, the Earth should remain habitable for something in the order of 500 million or a billion years for complex life before it becomes too hot due to the continued brightening of our sun. If we took actions to limit that brightening, which looks almost achievable with today’s technology, we would only need to basically shade the earth by about 1% of the energy coming at it, and increase that by 1%, I think it’s every billion years, and we would be able to survive as long as the sun does, about 7 billion more years. And I think that ultimately, we could survive much longer than that if we could reach our nearest stars and set up some new self-sustaining settlement there. And then if that could spread out to some of the nearest stars to that and so on, then so long as we can reach about seven light years in one hop, we’d be able to settle the entire galaxy. There are stars in the galaxy that will still be burning about 10 trillion years from now, and there’ll be new stars for millions of times as long as that.

We could have this absolutely immense future in terms of duration, with technologies that are beyond our current reach, and if you look at the energy requirements to reach nearby stars, they’re high, but they’re not that high compared to, say, the output of the sun over millions of years. And if we’re talking about a scenario where we’d last millions of years anyway, it’s unclear why it would be difficult, with the technology we would have by then, to reach them. It seems like the biggest challenge would be lasting that long in the first place, not getting to the nearest star using technology from millions of years into the future with millions of years of stored energy reserves.

So that’s the kind of big picture question about the timing there, but then you also asked about whether it would be humanity. One way to answer that is that unless we go to a lot of effort to preserve Homo sapiens as we are now, then it wouldn’t be Homo sapiens. We might go to that effort if we decide that it’s really important that it be Homo sapiens and that we’d lose something absolutely terrible if we were to change; we could make that choice. But if we decide that it would be better to actually allow evolution to continue, or perhaps to direct it by changing who we are with genetic engineering and so forth, then we could make that choice as well. I think that that is a really critically important choice for the future and I hope that we make it in a very deliberate and careful manner rather than just going gung-ho and letting people do whatever they want, but I do think that we will develop into something else.

But in the book, my focus is often on humanity in this kind of broad sense. Earth-originating intelligent life would kind of be a gloss on it, but that has the issue that suppose humanity did go extinct and suppose we got lucky and some other intelligent life started off again, I don’t want to count that in what I’m talking about, even though it would technically fit into Earth-originating intelligent life. Sometimes I put it in the book as humanity or our rightful heirs, something like that. Maybe we would create digital beings to replace us, artificial intelligences of some sort. So long as they were the kinds of beings that could actually fulfill the potential that we have, that could realize one of the best trajectories that we could possibly reach, then I would count them. It could also be that we create something that succeeds us but has very little value, in which case I wouldn’t count it.

So yeah, I do think that we may be greatly changed in the future. I don’t want that to distract the reader, if they’re not used to thinking about things like that, because they might then think, “Well, who cares about that future because it will be some other things having the future.” And I want to stress that there will only be some other things having the future if we want it to be, if we make that choice. If that is a catastrophic choice, then it’s another existential risk that we have to deal with in the future and which we could prevent. And if it is a good choice, and we’re like the caterpillar that really should become a butterfly in order to fulfill its potential, then we need to make that choice. So I think that is something that we can leave to future generations, while recognizing that it is important that they make the right choice.

Lucas Perry: One of the things that I really appreciate about your book is that it tries to make this more accessible for a general audience. So, I actually do like it when you use lower bounds on humanity’s existential condition. I think talking about billions upon billions of years can seem a little bit far out there and maybe costs some weirdness points and as much as I like the concept of Earth-originating intelligent life, I also think it costs some weirdness points.

And it seems like you’ve taken some effort to make the language not so ostracizing by decoupling it somewhat from effective altruism jargon and the kind of language that we might use in effective altruism circles. I appreciate that and find it to be an important step. The same thing, I feel, feeds in here in terms of talking about descendant scenarios. It seems like making things simple and leveraging human self-interest is maybe important here.

Toby Ord: Thanks. When I was writing the book, I tried really hard to think about these things, both in terms of communications, but also in terms of trying to understand what we have been talking about for all of these years when we’ve been talking about existential risk and similar ideas. Often in effective altruism, there’s a discussion about the different types of cause areas that effective altruists are interested in. There are people who really care about global poverty, because we can help others who are much poorer than ourselves so much more with our money, and also people who care about helping animals, who are left out of the political calculus and the economic calculus; we can see why it is that their interests are typically neglected, and so we look at factory farms, and we can see how we could do so much good.

And then there’s this third group of people, and there the conversation drifts off a bit: a group who have this kind of idea about the future that’s kind of hard to describe and wrap up together. So I’ve seen it as one of my missions over the last few years to really try to work out what it is that that third group of people are trying to do. My colleague, Will MacAskill, has been working on this a lot as well. And what we see is that this other group of effective altruists are this long-termist group.

The first group is thinking about this cosmopolitan aspect: it’s not just people in my country that matter, it’s people across the whole world, and some of those could be helped much more. The second group is saying it’s not just humans that could be helped; if we widen things up beyond the species boundary, then we can see that there’s so much more we could do for other conscious beings. And then this third group is saying it’s not just our own time in which we can help; there’s so much we can do to help people across the entire future, perhaps millions of years from now or further. And so the point of leverage there is that our present generation is perhaps just a tiny fraction of the entire future, and if we can do something that will help that entire future, then that’s where this could be really key in terms of doing something amazing with our resources and our lives.

Lucas Perry: Interesting. I actually had never thought of it that way, and I think it puts the differences between the groups really succinctly: people focused on global poverty are reducing spatial or proximity bias in our ethics and efforts to do good, the focus on animal farming is a kind of anti-speciesism, broadening our moral circle of compassion to other species, and then long-termism is about reducing time-based ethical bias. I think that’s quite good.

Toby Ord: Yeah, that’s right. In all these cases, you have to confront additional questions. It’s not just enough to make this point and then it follows that things are really important. You need to know, for example, that there really are ways that people can help others in distant countries and that the money won’t be squandered. And in fact, for most of human history, there weren’t ways that we could easily help people in other countries just by writing out a check to the right place.

When it comes to animals, there’s a whole lot of challenging questions there about what the effects of changing your diet are, or the effects of donating to a group that prioritizes animals in campaigns against factory farming or similar. And when it comes to the long-term future, there’s this real question: “Well, why isn’t it that people in the future would be just as able to protect themselves as we are? Why wouldn’t they be even more well-situated to attend to their own needs?” Given the history of economic growth and this kind of increasing power of humanity, one would expect them to be more empowered than us, so it does require an explanation.

And I think that the strongest type of explanation is around existential risk. Existential risks are things that would be an irrevocable loss. As I define them, which is a simplification, an existential catastrophe is the destruction of humanity’s long-term potential. You could think of our long-term potential as this set of all possible futures that we could instantiate. If you think about all the different collective actions that humans could take across all time, this sets out a huge cloud of trajectories that humanity could go in, and I think that this is absolutely vast. I think that there are ways, if we play our cards right, of lasting for millions of years or billions or trillions, and affecting billions of different worlds across the cosmos, and then doing all kinds of amazing things with all of that future. So, we’ve got this huge range of possibilities at the moment and I think that some of those possibilities are extraordinarily good.

If we were to go extinct, though, that would collapse this set of possibilities to a much smaller set, which contains much worse possibilities. If we went extinct, there would be just one future, whatever it is that would happen without humans, because there’d be no more choices that humans could make. If we had an irrevocable collapse of civilization, something from which we could never recover, then that would similarly reduce it to a very small set of very meager options. And it’s possible as well that we could end up locked into some dystopian future, perhaps through economic or political systems, where we end up stuck in some very bad corner of this possibility space. So that’s our potential. Our potential is currently the value of the best realistically realizable worlds available to us.

If we fail in an existential catastrophe, that’s the destruction of almost all of this value, and it’s something that you can never get back, because it’s our very potential that would be destroyed. That then gives an explanation as to why people in the future wouldn’t be better able to solve their own problems: because we’re talking about things that could fail now, and that helps explain why there’s room for us to make such a contribution.

Lucas Perry: So if we were to very succinctly put the recommended definition or framing on existential risk that listeners might be interested in using in the future when explaining this to new people, what is the sentence that you would use?

Toby Ord: An existential catastrophe is the destruction of humanity’s long-term potential, and an existential risk is the risk of such a catastrophe.

Lucas Perry: Okay, so on this long-termism point, can you articulate a little bit more about what is so compelling or important about humanity’s potential into the deep future, and which arguments are most compelling to you? I’m framing this around the question of whether the long-termist perspective is compelling or motivating for the average person: why should I care about people who are far away in time from me?

Toby Ord: So, I think that if a lot of people were pressed and asked, “Does it matter equally much if a child suffers 100 years in the future as at some other point in time?” a lot of them would say, “Yeah, it matters just as much.” But that’s not how we normally think of things when we think about what charity to donate to or what policies to implement. I do think, though, that it’s not that foreign an idea. In fact, the weird thing would be if people mattered different amounts in virtue of the fact that they live at different times.

A simple example of that: suppose you do think that things further into the future matter less intrinsically. Economists sometimes represent this by a pure rate of time preference. It’s a component of a discount rate which is just to do with things mattering less in the future, whereas most of the discount rate is actually to do with the fact that money is more important to have earlier, which is actually a pretty solid reason, but that component doesn’t affect any of these arguments. It’s only this little extra aspect about things mattering less just because they’re in the future. Suppose you have a 1% discount rate of that form. That means that someone’s older brother matters more than their younger brother, that a life which is equally long and has the same kinds of experiences is fundamentally more important when it belongs to the older child rather than the younger child, things like that. This just seems kind of crazy to most people, I think.

And similarly, if you have these exponential discount rates, which are typically the only kind that economists consider, they have the consequence that what happens in 10,000 years is way more important than what happens in 11,000 years. People don’t have any intuition like that at all, really. Maybe we don’t think that much about what happens in 10,000 years, but 11,000 is pretty much the same as 10,000 to our intuition, whereas these other views say, “Wow. No, it’s totally different. It’s just like the difference between what happens next year and what happens in a thousand years.”
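
To make that concrete, here is a rough worked illustration of what a pure time preference of 1% per year implies; the figures are illustrative and not quoted from the conversation. With weight $w(t) = (1.01)^{-t}$ on a benefit $t$ years away,

$$\frac{w(10{,}000)}{w(11{,}000)} = (1.01)^{1000} \approx 2.1 \times 10^{4},$$

so an identical benefit counts roughly 21,000 times more in year 10,000 than in year 11,000, the same ratio as between next year and a thousand years from now, which is the counterintuitive consequence described above.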

It generally just doesn’t capture our intuitions, and I think that what’s going on is not so much that we have an active intuition that things that happen further into the future matter less, and in fact much less, because they would have to matter a lot less to outweigh the fact that we could have millions of years of future. Instead, what’s going on is that we just aren’t thinking about it. We’re not really considering that our actions could have irrevocable effects over the long distant future. And when we do think about that, such as within environmentalism, it’s a very powerful idea: the idea that we shouldn’t make irrevocable changes to the environment that could damage the entire future just for transient benefits to our own time. And people think, “Oh, yeah, that is a powerful idea.”

So I think it’s more that they’re just not aware that there are a lot of situations like this. It’s not just that a particular ecosystem could be an example of one of these important irrevocable losses; there could be irrevocable losses at a much grander scale, affecting everything that we could ever achieve and do. I should also explain that I do talk a lot about humanity in the book. And the reason I say this is not because I think that non-human animals don’t count or don’t have intrinsic value, I think they do. It’s because only humanity is responsive to reasons and to thinking about this. It’s not the case that chimpanzees will choose to save other species from extinction and will go out and work out how to safeguard them from natural disasters that could threaten their ecosystems or things like that.

We’re the only ones who are even in the game of considering moral choices. So in terms of instrumental value, humanity has this massive instrumental value, because what we do could affect, for better or for worse, the intrinsic value of all of the other species. Other species are going to go extinct in about a billion years, basically all of them, when the earth becomes uninhabitable. Only humanity could actually extend that lifespan. So there’s this kind of thing where humanity ends up being key because we are the decision makers. We are the relevant agents, or any other relevant agents will spring from us: they will be our descendants or things that we create and whose functioning we choose. So, that’s the kind of role that we’re playing.

Lucas Perry: So if there are people who just simply care about the short term, if someone isn’t willing to buy into these arguments about the deep future or realizing the potential of humanity’s future, saying, “I don’t care so much about that, because I won’t be alive for that,” there’s also an argument here that these risks may be realized within their lifetime or within their children’s lifetime. Could you expand on that a little bit?

Toby Ord: Yeah. In The Precipice, when I try to think about why this matters, I think the most obvious reasons are rooted in the present: the fact that it would be terrible for all of the people who are alive at the time when the catastrophe strikes. That needn’t be the case. You could imagine things that meet my definition of an existential catastrophe in that they would cut off the future, but not be bad for the people who were alive at that time; maybe we all painlessly disappear at the end of our natural lives or something. But in almost all realistic scenarios that we’re thinking about, it would be terrible for all of the people alive at that time. They would have their lives cut short and witness the downfall of everything that they’ve ever cared about and believed in.

That’s a very obvious natural reason, but the reason that moves me the most is thinking about our long-term future, and just how important that is: this huge scale of everything that we could ever become. And you could think of that in very numerical terms, or you could just think back over time and how far humanity has come over these 200,000 years. Imagine that going forward, and how small a slice of things our own lives are, and you can come up with very intuitive arguments for that as well. It doesn’t have to just be a multiply-things-out type of argument.

But then I also think that there are very strong arguments that you could root in our past and in other things as well. Humanity has succeeded and has got to where we are because of this partnership of the generations, to use Edmund Burke’s phrase. If we couldn’t promulgate our ideas and innovations to the next generation, think about what our technological level would be like: it would be like it was in Paleolithic times, and even a crude iron shovel would be forever beyond our reach. It was only through passing down these innovations and iteratively improving upon them that we could get billions of people working in cooperation over deep time to build this world around us.

If we think about the wealth and prosperity that we have, the fact that we live as long as we do: this is all because this rich world was created by our ancestors and handed on to us, and we’re the trustees of this vast inheritance. If we were to fail, if we were the first of 10,000 generations to fail to pass this on to our heirs, we would be the worst of all of these generations. We’d have failed in these very important duties. These duties could be understood as some kind of reciprocal duty to people in the past, or we could also consider them as duties to the future rooted in obligations to people in the past, because we can’t reciprocate to people who are no longer with us. The only way you can get this to work is to pay it forward and have this system where we each help the next generation, out of respect for the past generations.

So I think there’s another set of reasons, more deontological type reasons, for it. And you could also have the reasons I mentioned in terms of civilizational virtues, an approach rooted in being a more virtuous civilization or species, and I think that is a powerful way of seeing it as well: to see that we’re very impatient and imprudent and so forth and we need to become more wise. Or alternatively, Max Tegmark has talked about this, and Martin Rees, Carl Sagan and others have seen it as something based on the cosmic significance of humanity: that perhaps in all of the stars and all of the galaxies of the universe, this is the only place where there is life at all, or the only place where there’s intelligent life or consciousness. There are different versions of this, and that could make this an exceptionally important place and a very rare thing that could be forever gone.

So I think that there’s a whole lot of different reasons here, and I think that previously a lot of the discussion has been a very technical version of the future-directed one, where people have thought, well, even if there’s only a tiny chance of extinction, our future could have 10 to the power of 30 people in it, or something like that. There’s something about this argument that some people find compelling, but not very many. I personally always found it a bit like a trick. It is a little bit like an argument that zero equals one: you don’t find it compelling, but if someone says point out the step where it goes wrong, you can’t see a step where the argument goes wrong, yet you still think, “I’m not very convinced, there’s probably something wrong with this.”

And then people who are not from the sciences, people from the humanities, find it an actively alarming argument, and are alarmed that anyone would make moral decisions on the grounds of an argument like that. What I’m trying to do is to show that actually there’s this whole cluster of justifications rooted in all kinds of principles that many people find reasonable, and you don’t have to accept all of them by any means. The idea here is that if any one of these arguments works for you, then you can see why it is that you have reasons to care about not letting our future be destroyed in our time.

Lucas Perry: Awesome. So, there’s first this deontological argument about transgenerational duties to continue propagating the species and the projects and value which previous generations have cultivated. We inherit culture and art and literature and technology, so there is a duties-based argument to continue the stewardship and development of that. There is this cosmic significance based argument that says that consciousness may be extremely precious and rare, and that there is great value held in the balance here at the precipice on planet Earth and it’s important to guard and do the proper stewardship of that.

There is this short-term argument that says there is some reasonable likelihood of catastrophe; I think you put total existential risk for the next century at one in six, which we can discuss a little bit more later. So that would also be very bad for us and our children and short-term descendants, should it be realized in the next century. Then there is this argument about the potential of humanity in deep time. So I think we’ve talked a bit here about there being potentially large numbers of human beings in the future, or our descendants, or other things that we might find valuable, but I don’t think that we’ve touched on the part about change in quality.

There are these arguments about quantity, but there are also arguments about quality. I really like how David Pearce puts it where he says, “One day we may have thoughts as beautiful as sunsets.” So, could you expand a little bit here on this argument about quality that I think also feeds in? And then also, with regards to the digitalization that may happen, there are arguments around subjective time dilation, which may lead to more and better experience in the deep future. So, this also seems to be another important aspect that’s motivating for some people.

Toby Ord: Yeah. Humanity has come a long way and various people have tried to catalog the improvements in our lives over time. Often in history, this is not talked about, partly because history is normally focused on something of the timescale of a human life and things don’t change that much on that timescale, but when people are thinking about much longer timescales, I think they really do. Sometimes this is written off in history as Whiggish history, but I think that that’s a mistake.

I think that if you were to summarize the history of humanity in say, one page, I think that the dramatic increases in our quality of life and our empowerment would have to be mentioned. It’s so important. You probably wouldn’t mention the Black Death, but you would mention this. Yet, it’s very rarely talked about within history, but there are people talking about it and there are people who have been measuring these improvements. And I think that you can see how, say in the last 200 years, lifespans have more than doubled and in fact, even in the poorest countries today, lifespans are longer than they were in the richest countries 200 years ago.

We can now almost all read whereas very few people could read 200 years ago. We’re vastly more wealthy. If you think about this threshold we currently use of extreme poverty, it used to be the case 200 years ago that almost everyone was below that threshold. People were desperately poor and now almost everyone is above that threshold. There’s still so much more that we could do, but there have been these really dramatic improvements.

Some people seem to think that that story of well-being in our lives getting better, increasing freedoms, increasing empowerment through education and health, somehow runs counter to their concern about existential risk, that one is an optimistic story and one’s a gloomy story. Ultimately, what I’m thinking is that it’s precisely because these trends seem to point towards very optimistic futures that it’s all the more important to ensure that we survive to reach such futures. If all the trends suggested that the future was just going to inevitably move towards a very dreary thing that had hardly any value in it, then I wouldn’t be that concerned about existential risk, so I think these things actually do go together.

And it’s not just in terms of our own lives that things have been getting better. We’ve been making major institutional reforms, so while there is regrettably still slavery in the world today, there is much less than there was in the past and we have been making progress in a lot of ways in terms of having a more representative and more just and fair world and there’s a lot of room to continue in both those things. And even then, a world that’s kind of like the best lives lived today, a world that has very little injustice or suffering, that’s still only a lower bound on what we could achieve.

I think one useful way to think about this is in terms of your peak experiences. These moments of luminous joy or beauty, the moments that you’ve been happiest, whatever they may be, and think about how much better they are than your typical moments. My typical moments are by no means bad, but I would trade hundreds or maybe thousands of them for more of these peak experiences. And there’s no fundamental reason why we couldn’t spend much more of our lives at these peaks and have lives which are vastly better than our lives today, and that’s assuming that we don’t find even higher peaks and new ways to have even better lives.

It’s not just about the well-being in people’s lives either. If you have any kind of conception of the types of value that humanity creates, so much of our lives will be in the future, so many of our achievements will be in the future, so many of our societies will be in the future. There’s every reason to expect that the greatest successes in all of these different ways will be in this long future as well. There’s also a host of other types of experiences that might become possible. We know that humanity occupies only some kind of very small sliver of the space of all possible experiences. We see colors within this three-dimensional color space.

We know that there are animals that see additional color pigments, that can see ultraviolet, can see parts of reality that we’re blind to. Animals with magnetic sense that can sense what direction north is and feel the magnetic fields. What’s it like to experience things like that? We could go so much further exploring this space. If we can guarantee our future and then we can start to use some of our peak experiences as signposts to what might be experienceable, I think that there’s so much further that we could go.

And then I guess you mentioned the possibilities of digital things as well. We don’t know exactly how consciousness works. In fact, we know very little about how it works. We think that there are some suggestive reasons to think that minds, including consciousness, are computational things, such that we might be able to realize them digitally, and then there’s all kinds of possibilities that would follow from that. You could slow yourself down, slow down the rate at which you’re computed, in order to see progress zoom past you and experience a dizzying rate of change in the things around you, or fast-forward through the boring bits and skip to the exciting bits. One’s life, if one were digital, could potentially be immortal, with backup copies, and so forth.

You might even be able to branch into being two different people, have some choice coming up, say whether to stay on Earth or to go to a new settlement in the stars, and just split, with one copy going into this new life and one staying behind, or a whole lot of other possibilities. We don’t know if that stuff is really possible, but it’s just to give a taste of how we might be seeing only this very tiny amount of what’s possible at the moment.

Lucas Perry: This is one of the most motivating arguments for me: the fact that the space of all possible minds is probably very large and deep, and that the kinds of qualia we have access to are very limited. There’s also the possibility of well-being not being contingent upon the state of the external world, which is always in flux and always impermanent. If we were able to have a science of well-being that was sufficiently well-developed, such that well-being was information and decision sensitive but not contingent upon the state of the external world, that seems like a form of enlightenment in my opinion.

Toby Ord: Yeah. Some of these questions are things that you don’t often see discussed in academia, partly because there isn’t really a proper discipline that says that that’s the kind of thing you’re allowed to talk about in your day job, but it is the kind of thing that people are allowed to talk about in science fiction. Many science fiction authors have something more like space opera or something like that where the future is just an interesting setting to play out the dramas that we recognize.

But other people use the setting to explore radical what-if questions, many of which are very philosophical and some of which are very well done. I think that if you’re interested in these types of questions, I would recommend people read Diaspora by Greg Egan, which I think is the best and most radical exploration of this. At the start of the book, the setting is a particular digital system with digital minds, substantially in the future from where we are now, that have been running much faster than the external world. Their lives are lived thousands of times faster than those of the people who’ve remained flesh and blood, so culturally they’re vastly further on, and you get to witness what it might be like to undergo various of these events in one’s life. And in the particular setting it’s in, it’s a world where physical violence is against the laws of physics.

So rather than creating utopia by working out how to make people better behaved, the longstanding project of trying to make us all act nicely and decently to each other, which is clearly part of what’s going on, there’s this extra possibility that most people hadn’t even thought about, which arises because it’s all digital. It’s kind of like being on a web forum or something like that, where if someone attempts to attack you, you can just make them disappear, so that they can no longer interfere with you at all. It explores what life might be like in this kind of world where the laws of physics are consent based and you can just make it so that people have no impact on you if you’re not enjoying the kind of impact that they’re having. It’s a fascinating setting to explore radically different ideas about the future, which very much may not come to pass.

But what I find exciting about these types of things is not so much that they’re projections of where the future will be, but that if you take a whole lot of examples like this, they span a space that’s much broader than you were initially considering for your probability distribution over where the future might go, and they help you realize that there are radically different ways that it could go. It’s this kind of expansion of your understanding of the space of possibilities, rather than direct prediction, where I think such fiction is at its best, and that’s why I would strongly recommend some Greg Egan for anyone who wants to get really into that stuff.

Lucas Perry: You sold me. I’m interested in reading it now. I’m also becoming mindful of our time here and have a bunch more questions I would like to get through, but before we do that, I also want to throw something out here. I’ve had a bunch of conversations recently on the question of identity, and open individualism, closed individualism, and empty individualism are some of the views here.

For the long-termist perspective, I think personal identity is deeply informative for how much or how little one may care about the deep future, or digital minds, or our descendants in a million years, or humans that are around a million years later. I think many people who aren’t motivated by these arguments will basically just feel like it’s not me, so who cares? And so I feel like these questions on personal identity really help tug on and subvert many of our commonly held intuitions about identity. So, this is sort of going off of your point about the potential of the future and how it’s quite beautiful and motivating.

A little funny quip or thought there: I’ve sprung into Lucas consciousness and I’m quite excited, whatever “I” means, for there to be something like awakening into Dyson sphere consciousness in Andromeda, which is maybe a bit of a wacky or weird idea for most people, but thinking more and more about the nature of personal identity makes thoughts like these more easily entertainable.

Toby Ord: Yeah, that’s interesting. I haven’t done much research on personal identity. In fact, the types of questions I’ve been thinking about when it comes to the book are more on how radical change would be needed before it’s no longer humanity, so kind of like the identity of humanity across time as opposed to the identity for a particular individual across time. And because I’m already motivated by helping others and I’m kind of thinking more about the question of why just help others in our own time as opposed to helping others across time. How do you direct your altruism, your altruistic impulses?

But you’re right that they could also be possibilities to do with individuals lasting into the future. There’s various ideas about how long we can last with lifespans extending very rapidly. It might be that some of the people who are alive now actually do directly experience some of this long-term future. Maybe there are things that could happen where their identity wouldn’t be preserved, because it’d be too radical a break. You’d become two different kinds of being and you wouldn’t really be the same person, but if being the same person is important to you, then maybe you could make smaller changes. I’ve barely looked into this at all. I know Nick Bostrom has thought about it more. There’s probably lots of interesting questions there.

Lucas Perry: Awesome. So could you give a short overview of natural or non-anthropogenic risks over the next century and why they’re not so important?

Toby Ord: Yeah. Okay, so the main natural risks I think we’re facing are probably asteroid or comet impacts and super volcanic eruptions. In the book, I also looked at stellar explosions like supernova and gamma ray bursts, although since I estimate the chance of us being wiped out by one of those in the next 100 years to be one in a billion, we don’t really need to worry about those.

But asteroids, it does appear that the dinosaurs were destroyed 65 million years ago by a major asteroid impact. It’s something that’s been very well studied scientifically. I think the main reason to think about it is A, because it’s very scientifically understood and B, because humanity has actually done a pretty good job on it. We only worked out 40 years ago that the dinosaurs were destroyed by an asteroid and that they could be capable of causing such a mass extinction. In fact, it was only in 1960, 60 years ago that we even confirmed that craters on the Earth’s surface were caused by asteroids. So we knew very little about this until recently.

And then we’ve massively scaled up our scanning of the skies. We think that in order to cause a global catastrophe, an asteroid would probably need to be bigger than a kilometer across. We’ve found about 95% of the asteroids between 1 and 10 kilometers across, and we think we’ve found all of the ones bigger than 10 kilometers across. Since none of the ones we’ve found are on a trajectory to hit us within the next 100 years, it looks like we’re very safe from asteroids.

Whereas super volcanic eruptions are much less well understood. My estimate for the chance that we could be destroyed by one in the next 100 years is about one in 10,000. In the case of asteroids, we have looked into it so carefully that we’ve managed to check whether any are coming towards us right now, whereas it can be hard to get these probabilities further down until we know more, so that’s why my estimate for super volcanic eruptions is where it is. The Toba eruption was some kind of global catastrophe a very long time ago, though the early theories that it might have caused a population bottleneck and almost destroyed humanity don’t seem to hold up anymore. It is still an illuminating example of continent-scale destruction and global cooling.

Lucas Perry: And so what is your total estimation of natural risk in the next century?

Toby Ord: About one in 10,000. All of these estimates are order-of-magnitude estimates, but I think that it’s about the same level as where I put the super volcanic eruptions, and the other known natural risks I would put as much smaller. One of the reasons that we can give these low numbers is because humanity has survived for 2,000 centuries so far, and related species such as Homo erectus have survived for even longer. And so we just know that there can’t be that many things that could destroy all humans on the whole planet through these natural risks.
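
As a rough sketch of that track-record reasoning (the figures here are illustrative assumptions, not a calculation quoted from the conversation): if the per-century probability of extinction from natural causes were some constant $p$, the chance of surviving 2,000 centuries would be

$$(1-p)^{2000} \approx e^{-2000p}: \qquad p = 10^{-3} \Rightarrow \approx 0.14, \qquad p = 10^{-4} \Rightarrow \approx 0.82,$$

so an unbroken 2,000-century record makes per-century natural risk as high as one in a thousand look unlikely, while one in ten thousand remains quite consistent with it.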

Lucas Perry: Right, the natural conditions and environment haven’t changed so much.

Toby Ord: Yeah, that’s right. I mean, this argument only works if the risk has either been constant or expectably constant; it could be that it’s going up and down, but so long as we don’t know which, the argument will still work. The problem is that if we have some pretty good reasons to think that the risks could be going up over time, then our long track record is not so helpful. And that’s what happens when it comes to what you could think of as natural pandemics, such as the coronavirus.

This is something where it got into humanity through some kind of human action, so it’s not exactly natural how it got into humanity in the first place, and then its spread through humanity via airplanes, traveling to different continents very quickly, is also not natural and is a faster spread than you would have had over this long-term history of humanity. And thus, these kinds of safety arguments don’t apply as well as they would for things like asteroid impacts.

Lucas Perry: This class of risks, then, is risky, but less risky than the human-made risks that result from technology; the fancy x-risk jargon for these is anthropogenic risks. Some of these are nuclear weapons, climate change, environmental damage, synthetic-biology-induced or AI-enabled pandemics, unaligned artificial intelligence, dystopian scenarios, and other risks. Could you say a little bit about each of these and why you view unaligned artificial intelligence as the biggest risk?

Toby Ord: Sure. Some of these anthropogenic risks we already face. Nuclear war is an example. What is particularly concerning is a very large scale nuclear war, such as between the U.S. and Russia and nuclear winter models have suggested that the soot from burning buildings could get lifted up into the stratosphere which is high enough that it wouldn’t get rained out, so it could stay in the upper atmosphere for a decade or more and cause widespread global cooling, which would then cause massive crop failures, because there’s not enough time between frosts to get a proper crop, and thus could lead to massive starvation and a global catastrophe.

Carl Sagan suggested it could potentially lead to our extinction, but the current people working on this, while they are very concerned about it, don’t suggest that it could lead to human extinction. That’s not really a scenario that they find very likely. And so even though I think that there is substantial risk of nuclear war over the next century, either an accidental nuclear war being triggered soon or perhaps a new Cold War, leading to a new nuclear war, I would put the chance that humanity’s potential is destroyed through nuclear war at about one in 1000 over the next 100 years, which is about where I’d put it for climate change as well.

There is debate as to whether climate change could really cause human extinction or a permanent collapse of civilization. I think the answer is that we don’t know, and similarly with nuclear war. But they’re both such large changes to the world, such unprecedentedly rapid and severe changes, that it’s hard to be more than 99% confident that if one of them happened we’d make it through, and so this is a difficult-to-eliminate risk that remains there.

In the book, I look at the very worst climate outcomes: how much carbon is there in the methane clathrates under the ocean and in the permafrost? What would happen if it was released? How much warming would there be? And then what would happen if you had very severe amounts of warming, such as 10 degrees? I try to sketch out what we know about those things, and it is difficult to find direct mechanisms that suggest we would go extinct or that we would collapse our civilization in a way from which it could never be restarted, especially given that civilization already arose five times independently in different parts of the world, so we know that it’s not a fluke to get it started again. So it’s difficult to see the direct reasons why it could happen, but we don’t know enough to be sure that it can’t happen. In my view, that means it’s still an existential risk.

Then I also have a kind of catch all for other types of environmental damage, all of these other pressures that we’re putting on the planet. I think that it would be too optimistic to be sure that none of those could potentially cause a collapse from which we can never recover as well. Although when I look at particular examples that are suggested, such as the collapse of pollinating insects and so forth, for the particular things that are suggested, it’s hard to see how they could cause this, so it’s not that I am just seeing problems everywhere, but I do think that there’s something to this general style of argument that unknown effects of the stressors we’re putting on the planet could be the end for us.

So I’d put all of those current types of risks at about one in 1,000 over the next 100 years, but it’s the anthropogenic risks from technologies that are still on the horizon that scare me the most, and this would be in keeping with this idea of humanity’s continued exponential growth in power, where you’d expect the risks to be escalating every century. The ones that I’m most concerned about, in particular, are engineered pandemics and the risk of unaligned artificial intelligence.

Lucas Perry: All right. I think listeners will be very familiar with many of the arguments around why unaligned artificial intelligence is dangerous, so I think that we could skip some of the crucial considerations there. Could you touch a little bit then on the risks of engineered pandemics, which may be newer to listeners, and then give your total risk estimate for this class of risks?

Toby Ord: Ultimately, we do have some kind of a safety argument in terms of the historical record when it comes to these naturally arising pandemics. There are ways that they could be more dangerous now than they could have been in the past, but there are also many ways in which they’re less dangerous. We have antibiotics. We have the ability to detect in real time these threats, sequence the DNA of the things that are attacking us, and then use our knowledge of quarantine and medicine in order to fight them. So we have reasons to look to our safety on that.

But there are cases of pandemic pathogens being created to be even more spreadable or even more deadly than those that arise naturally, because the natural ones are not being optimized to be deadly. Deadliness is only selected for if it’s in service of the pathogen spreading and surviving, and normally killing your host is a big problem for that. So there’s room there for people to try to engineer things to be worse than the natural ones.

One case is scientists looking to fight disease: Ron Fouchier, with the bird flu, deliberately made a more infectious version of it that could be transmitted directly from mammal to mammal. He did that because he was trying to help, but it was, I think, very risky and a very bad move, and most of the scientific community didn’t think it was a good idea. He did it in a biosafety level three enhanced lab, which is not the highest level of biosecurity, that’s BSL-4, and even at the highest level there has been an escape of a pathogen from a BSL-4 facility. So these labs aren’t safe enough, I think, to work on newly enhanced things that are more dangerous than anything that nature can create, in a world where so far the biggest catastrophes that we know of were caused by pandemics. So I think that it’s pretty crazy to be working on such things until we have labs from which nothing has ever escaped.

But that’s not what really worries me. What worries me more is bioweapons programs, and there was a lot of development of bioweapons in the 20th century in particular. The Soviet Union reportedly had 20 tons of smallpox that they had manufactured, for example, and they had an accidental release of smallpox which killed civilians in Russia. They had an accidental release of anthrax, blowing it out across a whole city and killing many people, so we know from cases like this that they had a very large bioweapons program. And the Biological Weapons Convention, which is the leading institution at an international level to prohibit bioweapons, is chronically underfunded and understaffed. The entire budget of the BWC is less than that of a typical McDonald’s.

So this is something where humanity doesn’t have its priorities in order. Countries need to work together to step that up and give it more responsibilities, to actually do inspections and make sure that none of them are using bioweapons. And then I’m also really concerned by the dark side of the democratization of biotechnology: the fact that rapid developments like gene drives and CRISPR, two huge breakthroughs, perhaps Nobel Prize worthy, were in both cases replicated within two years by university students in science competitions.

So we now have a situation where two years earlier there was maybe one person in the world who could do it, or no one, and then within a couple of years we have perhaps tens of thousands of people who could do it, soon millions. And so if that pool of people eventually includes people like those in the Aum Shinrikyo cult that was responsible for the sarin gas attack in the Tokyo subway, one of whose active goals was to destroy everyone in the world, then we have a problem. Once enough people can do these things and could make engineered pathogens, you’ll get someone with this terrible but massively rare motivation, or perhaps even just a country like North Korea that wants a kind of blackmail policy to make sure that no one ever invades. That’s why I’m worried about this. These rapid advances are empowering us to make really terrible weapons.

Lucas Perry: All right, so wrapping things up here. How do we then safeguard the potential of humanity and Earth-originating intelligent life? You seem to give advice at the level of high-level strategy, policy, and individual action, and this is all contextualized within this grand plan for humanity: that we reach existential security by getting to a place where existential risk is decreasing every century; that we then enter a period of long reflection to contemplate and debate what is good and how we might explore the universe and optimize it to express that good; and that we then execute that and achieve our potential. So again, how do we achieve all this, how do we mitigate x-risk, how do we safeguard the potential of humanity?

Toby Ord: That’s an easy question to end on. So what I tried to do in the book is to try to treat this at a whole lot of different levels. You kind of refer to the most abstract level to some extent, the point of that abstract level is to show that we don’t need to get ultimate success right now, we don’t need to solve everything, we don’t need to find out what the fundamental nature of goodness is, and what worlds would be the best. We just need to make sure we don’t end up in the ones which are clearly among the worst.

The point of looking further onwards with the strategy is just to see that we can set some things aside for later. Our task now is to reach what I call existential security, and that involves this idea that will be familiar to many people who think about existential risk, which is to look at particular risks, work out how to manage them, and avoid falling victim to them: perhaps by being more careful with technology development, perhaps by creating protective technologies, for example better biosurveillance systems to understand if bioweapons have been launched into the environment so that we could contain them much more quickly, or by developing, say, better work on alignment in AI research.

But it also involves not just fighting fires, but trying to become the kind of society where we don’t keep lighting these fires. I don’t mean that we don’t develop the technologies, but that we build in the responsibility for making sure that they do not develop into existential risks as part of the cost of doing business. We want to get the fruits of all of these technologies, both for the long-term and also for the short-term, but we need to be aware that there’s this shadow cost when we develop new things, and we blaze forward with technology. There’s shadow cost in terms of risk, and that’s not normally priced in. We just kind of ignore that, but eventually it will come due. If we keep developing things that produce these risks, eventually, it’s going to get us.

So what we need to do is develop our wisdom, both in terms of changing our common-sense conception of morality to take this long-term future, and our debts to our ancestors, seriously, and in terms of the international institutions needed to help avoid some of these tragedies of the commons and so forth: to find the cases where we’d all be prepared to pay the cost to get the security if everyone else were doing it too, but aren’t prepared to do it unilaterally. We need to try to work out mechanisms where we can all go into it together.

There are questions there in terms of policy. We need more policy-minded people within the science and technology space, people with an eye to the governance of their own technologies. This can be done within professional societies, but we also need more technology-minded people in the policy space. We often bemoan the fact that a lot of people in government don’t really know much about how the internet works or how various technologies work, but part of the problem is that the people who do know about how these things work don’t go into government. It’s not just that you can blame the people in government for not knowing about your field; people who know about these fields, maybe some of them should actually work in policy.

So I think we need to build that bridge from both sides, and I suggest a lot of particular policy things that we could do. A good example of how concrete and simple it can get is that we renew the New START Disarmament Treaty. This is due to expire next year, and as far as I understand, the U.S. government and Russia don’t have plans to actually renew this treaty, which is crazy, because it’s one of the things most responsible for nuclear disarmament. So, making sure that we sign that treaty again is a very actionable point that people can motivate around.

And I think that there’s stuff for everyone to do. We may think that existential risk is too abstract and can’t really motivate people in the way that some other causes can, but I think that would be a mistake. I’m trying to sketch a vision of it in this book that I think can have a larger movement coalesce around it and I think that if we look back a bit when it came to nuclear war, the largest protest in America’s history at that time was against nuclear weapons in Central Park in New York and it was on the grounds that this could be the end of humanity. And that the largest movement at the moment, in terms of standing up for a cause is on climate change and it’s motivated by exactly these ideas about irrevocable destruction of our heritage. It really can motivate people if it’s expressed the right way. And so that actually fills me with hope that things can change.

And similarly, when I think about ethics, I think about how in the 1950s there was almost no consideration of the environment within people’s conception of ethics. It was considered totally outside the domain of ethics or morality and not really considered much at all. The same with animal welfare: it was scarcely considered to be an ethical question at all. And now these are both key things that people are taught in their moral education in school, and we have entire ministries for the environment. Within 10 years of Silent Spring coming out, I think all but one English-speaking country had a cabinet-level position on the environment.

So, I think that we really can have big changes in our ethical perspective, but we need to start an expansive conversation about this and start unifying these things together not to be just like the anti-nuclear movement and the anti-climate change movement where it’s fighting a particular fire, but to be aware that if we want to actually get out there preemptively for these things that we need to expand that to this general conception of existential risk and safeguarding humanity’s long-term potential, but I’m optimistic that we can do that.

That’s why I think my best guess is that there’s a one in six chance that we don’t make it through this Century, but the other way around, I’m saying there’s a five in six chance that I think we do make it through. If we really played our cards right, we could make it a 99% chance that we make it through this Century. We’re not hostages to fortune. We humans get to decide what the future of humanity will be like. There’s not much risk from external forces that we can’t deal with such as the asteroids. Most of the risk is of our own doing and we can’t just sit here and bemoan the fact we’re in some difficult prisoner’s dilemma with ourselves. We need to get out and solve these things and I think we can.

Lucas Perry: Yeah. This point about moving from the particular motivation and excitement around climate change and nuclear weapons issues to a broader civilizational concern with existential risk seems to be a crucial step in developing the kind of wisdom that we talked about earlier. So yeah, thank you so much for coming on and thanks for your contribution to the field of existential risk with this book. It’s really wonderful and I recommend listeners read it. If listeners are interested in that, where’s the best place to pick it up? How can they follow you?

Toby Ord: You could check out my website at tobyord.com. You could follow me on Twitter @tobyordoxford or I think the best thing is probably to find out more about the book at theprecipice.com. On that website, we also have links as to where you can buy it in your country, including at independent bookstores and so forth.

Lucas Perry: All right, wonderful. Thanks again, for coming on and also for writing this book. I think that it’s really important for helping to shape the conversation in the world and understanding around this issue and I hope we can keep nailing down the right arguments and helping to motivate people to care about these things. So yeah, thanks again for coming on.

Toby Ord: Well, thank you. It’s been great to be here.

AI Alignment Podcast: On Lethal Autonomous Weapons with Paul Scharre

 Topics discussed in this episode include:

  • What autonomous weapons are and how they may be used
  • The debate around acceptable and unacceptable uses of autonomous weapons
  • Degrees and kinds of ways of integrating human decision making in autonomous weapons 
  • Risks and benefits of autonomous weapons
  • An arms race for autonomous weapons
  • How autonomous weapons issues may matter for AI alignment and long-term AI safety

Timestamps: 

0:00 Intro

3:50 Why care about autonomous weapons?

4:31 What are autonomous weapons? 

06:47 What does “autonomy” mean? 

09:13 Will we see autonomous weapons in civilian contexts? 

11:29 How do we draw lines of acceptable and unacceptable uses of autonomous weapons? 

24:34 Defining and exploring human “in the loop,” “on the loop,” and “out of loop” 

31:14 The possibility of generating international lethal laws of robotics

36:15 Whether autonomous weapons will sanitize war and psychologically distance humans in detrimental ways

44:57 Are persons studying the psychological aspects of autonomous weapons use? 

47:05 Risks of the accidental escalation of war and conflict 

52:26 Is there an arms race for autonomous weapons? 

01:00:10 Further clarifying what autonomous weapons are

01:05:33 Does the successful regulation of autonomous weapons matter for long-term AI alignment considerations?

01:09:25 Does Paul see AI as an existential risk?

 

We hope that you will continue to join in the conversations by following us or subscribing to our podcasts on Youtube, Spotify, SoundCloud, iTunes, Google Play, Stitcher, iHeartRadio, or your preferred podcast site/application. You can find all the AI Alignment Podcasts here.

You can listen to the podcast above or read the transcript below. 

Lucas Perry: Welcome to the AI Alignment Podcast. I’m Lucas Perry. Today’s conversation is with Paul Scharre and explores the issue of lethal autonomous weapons. And so just what is the relation of lethal autonomous weapons and the related policy and governance issues to AI alignment and long-term AI risk? Well there’s a key question to keep in mind throughout this entire conversation and it’s that: if we cannot establish a governance mechanism as a global community on the concept that we should not let AI make the decision to kill, then how can we deal with more subtle near term issues and eventual long term safety issues about AI systems? This question is aimed at exploring the idea that autonomous weapons and their related governance represent a possibly critical first step on the international cooperation and coordination of global AI issues. If we’re committed to developing beneficial AI and eventually beneficial AGI then how important is this first step in AI governance and what precedents and foundations will it lay for future AI efforts and issues? So it’s this perspective that I suggest keeping in mind throughout the conversation. And many thanks to FLI’s Emilia Javorsky for much help on developing the questions for this podcast. 

Paul Scharre is a Senior Fellow and Director of the Technology and National Security Program at the Center for a New American Security. He is the award-winning author of Army of None: Autonomous Weapons and the Future of War, which won the 2019 Colby Award and was named one of Bill Gates’ top five books of 2018.

Mr. Scharre worked in the Office of the Secretary of Defense (OSD) where he played a leading role in establishing policies on unmanned and autonomous systems and emerging weapons technologies. Mr. Scharre led the DoD working group that drafted DoD Directive 3000.09, establishing the Department’s policies on autonomy in weapon systems. Mr. Scharre also led DoD efforts to establish policies on intelligence, surveillance, and reconnaissance (ISR) programs and directed energy technologies. He was involved in the drafting of policy guidance in the 2012 Defense Strategic Guidance, 2010 Quadrennial Defense Review, and Secretary-level planning guidance. His most recent position was Special Assistant to the Under Secretary of Defense for Policy. Prior to joining the Office of the Secretary of Defense, Mr. Scharre served as a special operations reconnaissance team leader in the Army’s 3rd Ranger Battalion and completed multiple tours to Iraq and Afghanistan.

The Future of Life Institute is a non-profit and this podcast is funded and supported by listeners like you. So if you find what we do on this podcast to be important and beneficial, please consider supporting the podcast by donating at futureoflife.org/donate. If you support any other content creators via services like Patreon, consider viewing a regular subscription to FLI in the same light. You can also follow us on your preferred listening platform, like on Apple Podcasts or Spotify, by searching for us directly or following the links on the page for this podcast found in the description.

And with that, here’s my conversation with Paul Scharre. 

Lucas Perry: All right. So we’re here today to discuss your book, Army of None, and issues related to autonomous weapons in the 21st century. To start things off here, I think we can develop a little bit of the motivations for why this matters. Why should the average person care about the development and deployment of lethal autonomous weapons?

Paul Scharre: I think the most basic reason is that we are all going to live in the world in which militaries are deploying these future weapons. Even if you don’t serve in the military, even if you don’t work on issues surrounding, say, conflict, this kind of technology could affect all of us. And so I think we all have a stake in what this future looks like.

Lucas Perry: Let’s clarify a little bit more about what this technology actually looks like then. In common media, and for most people who don’t know about lethal autonomous weapons or killer robots, it’s often portrayed as a Terminator-like scenario. So could you explain why this is wrong, and what are more accurate ways of communicating with the public about what these weapons are and the unique concerns that they pose?

Paul Scharre: Yes, I mean, the Terminator is like the first thing that comes up because it’s such a common pop culture reference. It’s right there in people’s minds. So go ahead and, for the listeners, imagine that humanoid robot in the Terminator, and then just throw that away, because that’s not what we’re talking about. Let me make a different comparison. Self-driving cars. We are seeing right now the evolution of automobiles that with each generation incorporate more autonomous features: parking, intelligent cruise control, automatic braking. These increasingly autonomous features in cars, added every single year, a little more autonomy, a little more autonomy, are taking us down a road to, at some point in time, having fully autonomous cars that would drive themselves. We have something like the Google car where there’s no steering wheel at all. People are just passengers along for the ride. We’re seeing something very similar happen in the military with each generation of robotic systems. We now have air and ground and undersea robots deployed all around the world; over 100 countries and non-state groups around the globe have some form of drones or robotic systems, and with each generation they’re becoming increasingly autonomous.

Now, the issue surrounding autonomous weapons is, what happens when a Predator drone has as much autonomy as a self-driving car? What happens when you have a weapon that’s out in the battlefield, and it’s making its own decisions about whom to kill? Is that something that we’re comfortable with? What are the legal and moral and ethical ramifications of this? And the strategic implications? What might they do for the balance of power between nations, or stability among countries? These are really the issues surrounding autonomous weapons, and it’s really about this idea that we might have, at some point in time and perhaps in the not very distant future, machines making their own decisions about whom to kill on the battlefield.

Lucas Perry: Could you unpack a little bit more what autonomy really is or means? Because it seems to me that it’s more like an aggregation of a bunch of different technologies, like computer vision, image recognition, and other kinds of machine learning. So could you just develop a little bit more where we are in terms of the various technologies required for autonomy?

Paul Scharre: Yes, so autonomy is not really a technology, it’s an attribute of a machine or of a person. And autonomy is about freedom. It’s the freedom that a machine or a person is given to perform some tasks in some environment for some period of time. As people, we have very little autonomy as children and more autonomy as we grow up, and we have different autonomy in different settings. In some work environments, there might be more constraints put on you: what things you can and cannot do. And it’s also environment-specific and task-specific. You might have autonomy to do certain things, but not other things. It’s the same with machines. We’re ultimately talking about giving freedom to machines to perform certain actions under certain conditions in certain environments.

There are lots of simple forms of autonomy that we interact with all the time that we sort of take for granted. A thermostat is a very simple autonomous system; it’s a machine that’s given the freedom to decide… decide, let’s put that in air quotes, because we can come back to what it means for machines to decide. But basically, the thermostat is given the ability to turn on and off the heat and air conditioning based on certain parameters that a human sets, a desired temperature, or if you have a programmable thermostat, maybe the desired temperature at certain times of day or days of the week. It’s a very bounded kind of autonomy. And that’s what we’re talking about for any of these machines. We’re not talking about free will, or whether the machine develops consciousness. That’s not a problem today, maybe someday, but certainly not with the machines we’re talking about today. It’s a question really of, how much freedom do we want to give machines, or in this case, weapons operating on the battlefield, to make certain kinds of choices?

Now we’re still talking about weapons that are designed by people, built by people, launched by people, and put into the battlefield to perform some mission, but there might be a little bit less human control than there is today. And then there are a whole bunch of questions that come along with that, like, is it going to work? Would it be effective? What happens if there are accidents? Are we comfortable with ceding that degree of control over to the machine?

Lucas Perry: You mentioned the application of this kind of technology in the context of battlefields. Is there also consideration and interest in the use of lethal autonomous weapons in civilian contexts?

Paul Scharre: Yes, I mean, I think there’s less energy on that topic. You certainly see less of a pull from the police community. I mean, I don’t really run into people in a police or Homeland Security context saying we should be building autonomous weapons, whereas you will hear that from militaries. Oftentimes, groups that are concerned about the humanitarian consequences of autonomous weapons will raise that as a concern. There’s both what militaries might do on the battlefield, but then there’s a concern about proliferation. What happens when the technology proliferates and it’s being used for internal security issues? It could be a dictator using these kinds of weapons to repress the population. That’s one concern. And that’s, I think, a very, very valid one. We’ve often seen that one of the last checks against dictators is when they tell their internal security forces to fire on civilians, on their own citizens. There have been instances where the security forces say, “No, we won’t.” That doesn’t always happen. Of course, tragically, sometimes security forces do attack their citizens. We saw in the massacre in Tiananmen Square that Chinese military troops were willing to murder Chinese citizens. But we’ve seen other instances, certainly in the fall of the Eastern Bloc at the end of the Cold War, where security forces said: these are our friends, these are our family. We’re not going to kill them.

And autonomous weapons could take away one of those checks on dictators. So I think that’s a very valid concern. And there is a more general concern about the proliferation of military technology into policing, even here in America. We’ve seen this in the last 20 years: a lot of military tech ends up being used by police forces in ways that maybe aren’t appropriate. And so that’s, I think, a very valid and legitimate sort of concern about… even if this isn’t the intended use, what would that look like, what are the risks that could come with that, and how should we think about those kinds of issues as well?

Lucas Perry: All right. So we’re developing autonomy in systems and there’s concern about how this autonomy will be deployed in context where lethal force or force may be used. So the question then arises and is sort of the question at the heart of lethal autonomous weapons: Where is it that we will draw a line between acceptable and unacceptable uses of artificial intelligence in autonomous weapons or in the military, or in civilian policing? So I’m curious to know how you think about where to draw those lines or that line in particular, and how you would suggest to any possible regulators who might be listening, how to think about and construct lines of acceptable and unacceptable uses of AI.

Paul Scharre: That’s a great question. So I think let’s take a step back first and sort of talk about what would be the kinds of things that would make uses acceptable or unacceptable. Let’s just talk about the military context, just to kind of bound the problem for a second. So in the military context, you have a couple of reasons for drawing lines, if you will. One is legal issues, legal concerns. We have a legal framework to think about right and wrong in war. It’s called the laws of war, or international humanitarian law. And it lays out a set of parameters for what is acceptable and what is not. And so that’s one of the places where there has been consensus internationally among countries that have come together at the United Nations through the Convention on Certain Conventional Weapons, the CCW process, where we’ve had conversations going on about autonomous weapons.

One of the points of consensus among nations is that existing international humanitarian law, or the laws of war, would apply to autonomous weapons, and that for any uses of autonomy in weapons, those weapons have to be used in a manner that complies with the laws of war. Now, that may sound trivial, but it’s a pretty significant point of agreement and it’s one that places some bounds on things that you can or cannot do. So, for example, one of the baseline principles of the laws of war is the principle of distinction. Military forces cannot intentionally target civilians. They can only intentionally target other military forces. And so any use of force needs to comply with this distinction, so right off the bat, that’s a very important and significant one when it comes to autonomous weapons. So if you had a weapon that could not be used in a way that complies with this principle of distinction, it would be illegal under the laws of war and you wouldn’t be able to build it.

And there are other principles as well: principles about proportionality, ensuring that any collateral damage that affects civilians or civilian infrastructure is not disproportionate to the military necessity of the target that is being attacked; principles about avoiding unnecessary suffering of combatants; respecting anyone who’s rendered out of combat, or the appropriate term is “hors de combat,” who has surrendered or been incapacitated, and not targeting them. So these are very significant rules that any weapon system, autonomous weapon or not, has to comply with, and that any use of any weapon, any use of force, has to comply with. And so that is something that constrains considerably what nations are permitted to do in a lawful fashion. Now, do people break the laws of war? Well, sure, that happens. We’re seeing that happen in Syria today, where Bashar al-Assad is murdering civilians, and there are examples of rogue actors and non-state terrorist groups and others that don’t care about respecting the laws of war. But those are very significant bounds.

Now, one could also say that there are more bounds that we should put on autonomous weapons, bounds that might be moral or ethical considerations that exist outside the laws of war, that aren’t written down in a formal way in the laws of war, but are still important, and I think those often come to the fore with this topic. And there are other considerations that might apply in terms of reasons why we might be concerned about stability among nations. But the laws of war are at least a very valuable starting point for this conversation about what is acceptable and not acceptable. I want to make clear, I’m not saying that the laws of war are insufficient and we need to go beyond them and add in additional constraints. I’m actually not saying that. There are people that make that argument, and I want to give credit to their argument and not pretend it doesn’t exist. I want the listeners to sort of understand the full scope of arguments about this technology. But I’m not saying myself that’s the case necessarily. But I do think that there are concerns that people raise.

For example, people might say it’s wrong for a machine to decide whom to kill, it’s wrong for a machine to make the decision about life and death. Now, I think that’s an interesting argument. Why? Why is it wrong? Is it because we think the machine might get the answer wrong, that it might not perform as well as humans? Or is it because we think there’s something intrinsic about weighing the value of life and death that we want humans to do, appreciating the value of another person’s life before making one of these decisions? Those are all very valid counterarguments that exist in this space.

Lucas Perry: Yes. So thanks for clarifying that. For listeners, it’s important to clarify the difference here: you’re saying some people would find the laws of war to be sufficient in the case of autonomous weapons, and some would not.

Paul Scharre: Yes, I mean, this is a hotly debated issue. This is in many ways the crux of the issue surrounding autonomous weapons. I’m going to oversimplify a bit, because you have a variety of different views on this, but you certainly have some people whose view is: look, we have a set of structures called the laws of war that tell us what right and wrong look like in war. And most of the things that people are worried about are already prohibited under the laws of war. So for example, if what you’re worried about is autonomous weapons running amok murdering civilians, that’s illegal under the laws of war. And so one of the points of pushback that you’ll sometimes get from governments or others to the idea of creating an ad hoc treaty that would ban autonomous weapons or some class of autonomous weapons is: look, some of the things people worry about are already prohibited under the laws of war, and passing another law to say the thing that’s already illegal is now illegal again doesn’t add any value.

There’s a group of arguments that says the laws of war dictate effects in the battlefield. They dictate sort of what the end effect is; they don’t really dictate the process. And there’s a line of reasoning that says, that’s fine. The process doesn’t matter. If someday we could use autonomous weapons in a way that was more humane and more precise than people, then we should use them. And just the same way that self-driving cars will someday save lives on roads by avoiding accidents, maybe we could build autonomous weapons that would avoid mistakes in war and avoid accidentally targeting civilians, and therefore we should use them, and let’s just focus on complying better with the laws of war. That’s one school of thought.

Then there’s a whole bunch of reasons why you might say, well, that’s not enough. One reason might be: well, militaries’ compliance with the laws of war isn’t that great. Actually, people talk a good game, but when you look at military practice, especially if the rules for using a weapon are kind of convoluted and you have to take a bunch of additional steps in order to use it in a way that’s lawful, that kind of goes out the window in conflict. A real-world and tragic historical example of this was experienced throughout the 20th century with landmines, where landmines were permitted to be used lawfully, and still are if you’re not a signatory to the Ottawa Convention, provided you put in place a whole bunch of procedures to make sure that minefields are marked and we know the location of minefields, so they can be demined after the conflict.

Now, in practice, countries weren’t doing this. I mean, many of them were just scattering mines from the air. And so we had this horrific problem of millions of mines around the globe persisting after conflict. The response was basically this global movement to ban mines entirely, to say: look, it’s not that it’s inconceivable to use mines in a way that’s humane, but it requires a whole bunch of additional effort that countries aren’t putting in, and so we have to take this weapon away from countries because they are not actually using it in a way that’s responsible. That’s a school of thought with autonomous weapons: look, maybe you can conjure up thought experiments about how you can use autonomous weapons in these very specific instances and it’s acceptable, but once you start any use, it’s a slippery slope, and the next thing you know, it’ll be just like landmines all over again, and they’ll be everywhere and civilians will be killed. And so the better thing to do is to just not let this process even start, and not let militaries have access to the technology, because they won’t use it responsibly, regardless of whether it’s theoretically possible. That’s a pretty reasonable and defensible argument. And there are other arguments too.

One could say, actually, it’s not just about avoiding civilian harm, but there’s something intrinsic about weighing the value of an enemy soldier’s life that we want humans involved in that process. And that if we took humans away from that process, we’d be losing something that, sure, maybe isn’t written down in the laws of war, but maybe it’s not written down because it was always implicit that humans would always be making these choices. And now that this decision is in front of us, we should write it down: that humans should be involved in these decisions and should be weighing the value of a human life, even an enemy soldier’s. Because if we give that up, we might give up something that is a constraint on violence in war, something that holds back some of the worst excesses of violence; we might even give up something about ourselves. And this is, I think, a really tricky issue, because there’s a cost to humans making these decisions. It’s a very real cost. It’s a cost in post-traumatic stress that soldiers face, and moral injury. It’s a cost in lives that are ruined, not just the people that are killed on the battlefield, but the people who have to live with that violence afterwards, and the ramifications, and even the choices that they themselves make. It’s a cost in suicides of veterans, and substance abuse, and destroyed families and lives.

And so to say that we want humans to stay involved, to be the ones responsible for killing, is to say: I’m choosing that cost. I’m choosing to absorb and acknowledge and take on the cost of post-traumatic stress and moral injury, and also the burdens that come with war. And I think it’s worth reflecting on the fact that the burdens of war are distributed very unequally, not just between combatants, but also within the societies that fight. As a democratic nation, in the United States we make a decision as a country to go to war through our elected representatives. And yet it’s a very tiny slice of the population that bears the burden for that war, not just putting themselves at risk, but also carrying the moral burden of it afterwards.

And so if you say, well, I want there to be someone who’s going to live with that trauma for the rest of their life, I think that’s an argument one can make, but you need to acknowledge that that’s real. And that’s not a burden that we all share equally; it’s a burden we’re placing on the young women and men that we send off to fight on our behalf. The flip side is, if we didn’t do that, if we fought a war and no one felt the moral burden of killing, no one slept uneasy at night afterwards, what would that say about us as a society? I think these are difficult questions. I don’t have easy answers to that. But I think these are challenging things for us to wrestle with.

Lucas Perry: Yes, I mean, there’s a lot there. I think that was a really good illustration of the different points of view on this. I hadn’t heard or considered much the implications of post-traumatic stress and, I think you called it, moral burden, which would be factors in what autonomous weapons would relieve in countries which have the power to develop them. Speaking personally, I think I find most compelling the arguments about the necessity of having human beings integrated in the process of decision making with regards to killing, because if you remove that, then you’re removing a deep aspect of humanity, which sometimes does not follow the laws of war; we currently don’t have complex enough preference learning and machine learning techniques to actually train autonomous weapon systems in everything that human beings value and care about, and there are situations where deviating from the laws of war may be the best thing to do. I’m not sure if you have any thoughts about this, but I think you did a good job of illustrating all the different positions, and that’s just my initial reaction to it.

Paul Scharre: Yes, these are tricky issues. And so I think one of the things I want to try to do for listeners is to lay out the landscape of what these arguments are, and some of the pros and cons of them, because I think they often get oversimplified on all sides. People will say, well, we should have humans involved in making these decisions. Well, humans involved where? If I get into a self-driving car that has no steering wheel, it’s not true that there’s no human involvement. The type of human involvement has just changed in terms of where it exists. So now, instead of manually driving the car, I’m still choosing the car’s destination, I’m still telling the car where I want to go. You’re not going to get into the car and say, car, take me wherever you want to go. So the type of human involvement has changed.

So what kind of human relationship do we want with decisions about life and death in the battlefield? What type of human involvement is right or necessary or appropriate, and for what reason? For a legal reason, for a moral reason? These are interesting challenges we haven’t had to confront before. These arguments, I think, unfairly get simplified on all sides. Conversely, you hear people say things like, it doesn’t matter, because these weapons are going to get built anyway. That’s a little bit overly simplistic in the sense that there are examples of successes in arms control. It’s hard to pull off. There are many examples of failures as well, but there are places where civilized nations have walked back from some technologies to varying degrees of success, whether it’s chemical weapons or biological weapons or other things. So what does success look like in constraining a weapon? Is it that no one ever uses the weapon? That most nations don’t use it? That it’s not used in certain ways? These are complicated issues.

Lucas Perry: Right. So let’s talk a little bit here about integrating human emotion and human reasoning and humanity itself into the autonomous weapon systems and the life or death decisions that they will be making. So hitting on a few concepts here, if you could help explain what people mean when they say human in the loop, and human on the loop, and how this relates to the integration of human control and human responsibility and human accountability in the use of autonomous weapons.

Paul Scharre: Let’s unpack some of this terminology. Broadly speaking, people tend to use the terms human in the loop, on the loop, or out of the loop. For semi-autonomous weapons, the human is in the loop, which means that for any semi-autonomous process or system, the machine takes an action and then pauses and waits for a human to take a positive action before proceeding. A good example of a human-in-the-loop system is the automated backups on your computer when they require you to push a button to okay the backup. They’re waiting on human action before proceeding. In a human-on-the-loop system, or one with supervisory control, the human doesn’t have to take any positive action for the system to proceed. The human can intervene, so the human can sit back, and if you want to, you can jump in.

An example of this might be your thermostat. When you’re in the house, you’ve already set the parameters; it’ll turn the heat and air conditioning on and off on its own, but if you’re not happy with the outcome, you can change it. Now, when you’re out of the house, your thermostat is operating in a fully autonomous fashion in this respect, with the human out of the loop. You don’t have any ability to intervene for some period of time. It’s really all about time duration. For supervisory control, how much time does the human have to identify that something is wrong and then intervene? So, for example, take something like the Tesla autopilot. That’s one where the human is in a supervisory control capacity. With the autopilot function in a car, the human doesn’t have to do anything, the car’s driving itself, but they can intervene.

The problem with some of those control architectures is whether the time you’re permitting people to identify that there’s a problem, figure out what’s going on, decide to take action, and intervene is really realistic before harm happens. Is it realistic that a human can be not paying attention, and then all of a sudden identify that the car is in trouble and leap into action to avoid an accident when the car is speeding down the highway at 70 miles an hour? You can see quite clearly, in a number of fatal accidents with these autopilots, that that’s not feasible. People actually aren’t capable of doing that. So you’ve got to think about what the role of the human in this process is. It’s not just a question of whether it’s a semi-autonomous or supervised autonomous or fully autonomous process; it’s one where the human is involved in some varying capacity.

And what are we expecting the human to do? Same thing with something that’s fully autonomous. We’re talking about a system that’s operating on its own for some period of time. How long before it checks back in with a person? What information is that person given? And what is their capacity to intervene, or how badly could things go wrong when the person is not involved? And when we talk about weapons specifically, there are lots of weapons that operate in a semi-autonomous fashion today where the human is choosing the target, but there’s a lot of automation in identifying targets, presenting information to people, and in actually carrying out an attack. Once the human has chosen a target, there are many, many weapons that are what the military calls fire-and-forget weapons; once it’s launched, it’s not coming back. Those have been widely used for 70 years, since World War Two. So that’s not new.

There are a whole bunch of weapons that operate in a supervisory autonomy mode, where the human is on the loop. These are generally used in a more limited fashion for immediate, localized defense of air bases or ships or ground vehicles, defending against air or missile or rocket attack, particularly when the speed of these attacks might overwhelm people’s ability to respond. For humans to be in the loop, for humans to push a button every time there’s a missile coming in, you could have so many missiles coming in so fast that you have to simply activate an automatic defensive mode that will shoot down all of the missiles based on some pre-programmed parameters that humans put into the system. This exists today. These systems have been around for decades, since the 1980s, and they’re in widespread use by at least 30 countries around the globe. So that’s a type of weapon system that’s already in operation: these supervisory autonomous weapons. What really would be new would be fully autonomous weapons that operate on their own. Humans are still building them and launching them and putting them into operation, but then there’s some period of time where they’re able to search a target area for targets, find those targets, and then, based on some programming that was designed by people, identify the targets and attack them on their own.

Lucas Perry: Would you consider that out of the loop for that period of time?

Paul Scharre: Exactly. So over that period of time, humans are out of the loop on the decision about which targets they’re attacking. That would be largely a new development in war. There are some isolated cases of weapon systems that cross this line, but by and large that would be new. That’s at least the starting point of what people might be concerned about. Now, you might envision things that are more advanced beyond that, but that’s the near-term development that could be on the horizon in the next five to 15 years: telling the weapon system, go into this area, fly around or search around underwater, find any ships of this type, and attack them over some period of time and space. And that changes the human’s relationship with the use of force a little bit. It doesn’t mean the human’s not involved at all, but the human’s not quite as involved as they used to be. And is that something we’re comfortable with? And what are the implications of that kind of shift in warfare?

Lucas Perry: So the relevant things here are how this helps to integrate human control and human responsibility and human accountability into autonomous weapons systems. And just hearing you speak about all of that, it also seems like very relevant questions have to do with human psychology, about what human beings are actually likely to be able to do. And then also, I think you articulately put the practical question of whether or not people will be able to react to certain threats given certain situations. So in terms of trying to understand acceptable and unacceptable uses of autonomous weapons, that seems to supervene upon a lot of these facets of benefits and disadvantages of in the loop, on the loop, and out of the loop for different situations and different risks, plus how much we’re willing to automate killing and death and remove human decision making from some of these situations or not.

Paul Scharre: Yes, I mean, I think what’s challenging in this space is that it would be nice, it would be ideal, if we could reach agreement among nations on sort of a lethal laws of robotics. In Isaac Asimov’s books about robots, you have these three laws of robotics. Well, those laws aren’t going to work, because one of them is not harming a human being, and that’s not going to work in the military context. But could there be some agreement among countries on lethal laws of robots that would govern the behavior of autonomous systems in war, and that might say, these are the things that are acceptable or not? Maybe. Maybe that’s possible someday. I think we’re not there yet; at the very least, there’s certainly no agreement, and there’s widespread disagreement among nations about what approach to take. But a good starting position is trying to understand what goals we want to achieve. And I think you’re right that we need to keep the human sort of front and center. But there is a really important asymmetry between humans and machines that’s worth highlighting, which is that the laws of war govern effects in the battlefield. In that sense, the laws of war don’t say the human has to pick every target; the laws of war say that the use of force must be executed according to certain principles of distinction and proportionality and other things.

One important asymmetry in the laws of war, however, is that machines are not legal agents. Only humans are legal agents. And so it’s ultimately humans that are responsible for complying with the laws of war. You can’t put a machine on trial for a war crime. It doesn’t make sense. It doesn’t have intentionality. So it’s ultimately a human responsibility to ensure compliance with the laws of war. A good starting point for the conversation, then, is to try to understand: if we start from the proposition that it’s a human responsibility to ensure compliance with the laws of war, what follows from that? What bounds does that place on human involvement? One of the early parts of the international conversations on autonomous weapons came from a very technologically based approach: to say, well, based on the technology, draw these lines, put these limits in place. The problem with that approach is not that you can’t do it.

The problem is: the state of the technology when? In 2014, when discussions on autonomous weapons started, at the very beginning of the deep learning revolution? Today, in 2020? Our estimate of where the technology might be in five years or 10 years or 50 years? The technology is moving so quickly that any technologically based set of rules about how we should approach this problem, and about what the appropriate use of machines versus human decision making in the use of force is, is an answer that we may look back on in 10 years or 20 years and say was wrong. We could get it wrong in the sense that we might be leaving valuable technological opportunities on the table, banning technology that, if we used it, might actually make war more humane and reduce civilian casualties; or we might be permitting technologies that turn out in retrospect to be problematic, and we shouldn’t have done that.

And one of the things we’ve seen historically when you look at attempts to ban weapons is that ones that are technologically based don’t always fare very well over time. So, for example, the early bans on poison gas banned the use of poison gas launched from artillery shells. They actually allowed poison gas administered via canisters, and so the first use of poison gas in World War One, by the Germans, was canister-based; they just laid out little canisters and then opened the valves. Now, that turns out to be a not very practical way of using poison gas in war, because you have someone basically on your own side standing over this canister, opening a valve, and then getting gassed. And so it’s a little bit tricky, but technically permissible.

One of the things that can be challenging is that it’s hard to foresee how the technology is going to evolve. A better approach, and one that we’ve seen the international dialogue shift towards, is a human-centered approach: to start from the position of the human and say, look, if we had all the technology in the world in war, what decisions would we want humans to make, and why? Not because the technology cannot make decisions, but because it should not. I think it’s actually a very valuable starting place for the conversation, because the technology is moving so quickly.

What role do we want humans to play in warfare, and why do we think this is the case? Are there some tasks in war, or some decisions, that we think are fundamentally human, decisions that only humans should make and that we shouldn’t hand off to machines? I think that’s a really valuable starting position from which to better interrogate how we want to use this technology going forward. Because the landscape of technological opportunity is going to keep expanding. And so what do we want to do with this technology? How do we want to use it? Are there ways that we can use this technology that keep humans in control of the use of force in the battlefield, that keep humans legally and morally and ethically responsible, but that make war more humane, that make war more precise, that reduce civilian casualties without losing our humanity in the process?

Lucas Perry: So I guess the thought experiment there would be: if we had weapons that let us just delete people instantly without consequences, how would we want human decision making to be integrated with that? Reflecting on that also makes me consider this other point that I think is important for my considerations around lethal autonomous weapons, which is the necessity of integrating human experience of the consequences of war, the pain and the suffering and the carnage and the PTSD, as being, to some extent, almost necessary vehicles for making us tired of it, for internalizing how horrible it is. So I guess I would just be interested in integrating that perspective into it: it’s not just about humans making decisions and the decisions being integrated in the execution process, but also about the experiential ramifications of being in relation to what actually happens in war, what violence is like and what happens in violence.

Paul Scharre: Well, I think we want to unpack a little bit some of the things you’re talking about. Are we talking about ensuring that there is an accurate representation, to the people carrying out the violence, of what’s happening on the other end, that we’re not sanitizing things? I think that’s a fair point. When we begin to put more psychological barriers between the person making the decision and the effects, it might be easier for them to carry out larger-scale attacks. Versus actually making war more horrible. Now that’s a line of reasoning, I suppose: to say we should make war more horrible so there’ll be less of it. I’m not sure we’d get the outcome that there is less of it. We might just have more horrible war, but that’s a different issue. Those are more difficult questions.

I will say that I often hear philosophers raising things about skin in the game. I rarely hear them being raised by people who have had skin in the game, who have experienced up close and in a personal way the horrors of war. And I’m less convinced that there’s a lot of good that comes from the tragedy of war. I think there’s value in us trying to think about how we make war less terrible, how we reduce civilian casualties, how we have less war. But this often comes up in the context of technologies, like we should somehow put ourselves at risk. No military does that; no military has ever done that in human history. The whole purpose of militaries getting technology and training is to get an advantage over the adversary. It’s not a fair fight. It’s not supposed to be; it’s not a boxing match. So these are things worth exploring, but we need to come from the standpoint of the reality of what war is, and not from a philosophical exercise about what war might be, and deal with the realities of what actually occurs on the battlefield.

Lucas Perry: So I think that’s a really interesting point. And as someone with a background and interest in philosophy, it’s quite funny. So you do have experience in war, right?

Paul Scharre: Yes, I’ve fought in Iraq and Afghanistan.

Lucas Perry: Then it’s interesting to me if you see this distinction between philosophers and people who are actually veterans, who have experienced the violence and carnage and tragedies of war. The perspective here is that PTSD and the trauma associated with these kinds of experiences are less salient for decreasing people’s willingness or decision to engage in further war. Is that your claim?

Paul Scharre: I don’t know. No, I don’t know. I don’t know the answer to that. I don’t know. That’s some difficult question for political scientists to figure out about voting preferences of veterans. All I’m saying is that I hear a lot of claims in this space that I think are often not very well interrogated or not very well explored. And there’s a real price that people pay for being involved. Now, people want to say that we’re willing to bear that price for some reason, like okay, but I think we should acknowledge it.

Lucas Perry: Yeah, that makes sense. I guess the thing I was just pointing at was that it would be psychologically interesting to know whether philosophers, being detached from the experience, perhaps don’t actually understand the psychological implications of being involved in horrible war, and whether people who are actually veterans disagree with philosophers about the importance of there being skin in the game. If philosophers say that skin in the game reduces willingness to go to war, and the claim is that it wouldn’t actually decrease veterans’ willingness to go to war, that seems psychologically very important and relevant, because there is this concern about how autonomous weapons, and taking human decision making out of lethal autonomous weapons, would potentially sanitize war. And so there’s the trade-off between the potential mitigating effects of being involved in war, and the negative effects which are incurred by the veterans who actually have to be exposed to it and bring the trauma back for communities to have a deeper experiential relation with.

Paul Scharre: Yes, and look, we don’t do that, right? We had a whole generation of veterans come back from Vietnam, and did we as a society listen to their stories and understand them? No. I have heard over the years people raise this issue, whether it’s drones or autonomous weapons, this issue of having skin in the game, either physically being at risk or psychologically. And I’ve rarely heard it raised by people who are themselves the ones on the line. People often have very gut, emotional reactions to this topic. And I think that’s valuable, because it’s speaking to something that resonates with people. Whether it’s an emotional reaction opposed to autonomous weapons, and you often get that from many people who go: there’s something about this, it doesn’t feel right, I don’t like this idea. Or people have the opposite reaction; other people say, “Wouldn’t this make war great? It’s more precise and more humane,” to which my reaction is often a little bit like… have you ever interacted with a computer? They break all the time. What are you talking about?

But all of these things, I think, are speaking to instincts that people have about this technology, and it’s worth asking questions to better understand: what is it that we’re reacting to? Is it an assumption about the technologies? Is it an assumption about the nature of war? One of the concerns I’ve heard raised is that this will impersonalize war and create more distance between the people doing the killing and those being killed. If you buy that argument, that impersonal war is a bad thing, then you would say the greatest thing would be deeply personal war, like hand-to-hand combat. That appears to harken back to some glorious age of war when people looked each other in the eye and hacked each other to bits with swords, like real humans. It’s not that that kind of war never occurred in human history. In fact, we’ve had conflicts like that, even in recent memory, that involve hand-to-hand weapons. They tend not to be very humane conflicts. When we see civil violence, when people are murdering each other with machetes or garden tools or other things, it tends to be horrific communal violence, mass atrocities in Rwanda or Cambodia or other places. So I think it’s important to deal with the reality of what war is and not some fantasy.

Lucas Perry: Yes, I think that that makes a lot of sense. It’s really tricky. And the psychology around this I think is difficult and probably not studied enough.

Paul Scharre: There’s real war that occurs in the world, and then there’s the fantasy of war that we, as a society, tell ourselves when we go to movie theaters and watch stories about soldiers who are heroes, who conquer the bad guys. We’re told a fantasy, and it’s a fantasy that allows society to perpetuate wars, that allows us to send young men and women off to die. And it’s not to say that there are no circumstances in which a nation might need to go to war to defend itself or its interests, but we dress war up in these pretty clothes, and let’s not confuse that with the reality of what actually occurs. People say, well, with autonomous weapons, we won’t have people weighing the value of life and death. I mean, that happens sometimes, but it’s not like every time someone dies in war there was this thoughtful exercise where a committee sat around and said, “Do we really need to kill this person? Is it really appropriate?” There’s a lot of dehumanization that goes on on the battlefield. So I think this is what makes this issue very challenging. Many of the objections to autonomous weapons are objections to war. That’s what people are actually objecting to.

The question isn’t, is war bad? Of course war’s terrible. The question is: how do we find ways going forward to use technology that may make war more precise and more humane without losing our humanity in the process, and are there ways to do that? It’s a challenging question. I think the answer is probably yes, but it’s one that’s going to require a lot of interrogation to get there. It’s a difficult issue because it’s also a dynamic process, an interplay between competitors. If we get this wrong, we can easily end up in a situation where there’s less human control and there’s more violence in war. There are lots of opportunities to make things worse as well.

If we could make war perfect, that would be great: no civilian suffering, reduced suffering of enemy combatants, fewer lives lost. If we could push a button and make war go away, that would be wonderful. Those things would all be great. The more practical question really is, can we improve upon the status quo, and how can we do so in a thoughtful way, or at least not make things worse than today? And I think those are hard enough problems to try to address.

Lucas Perry: I appreciate that you bring a very holistic, well-weighed perspective to the varying sides of this issue. So these are all very big and difficult. Are you aware of people actually studying whether some of these effects exist or not, and whether they would actually sanitize things or not? Or is this basically all just coming down to people’s intuitions and simulations in their head?

Paul Scharre: Some of both. There’s really great scholarship being done on autonomous weapons. Certainly there’s a robust array of legally based scholarship, people trying to understand how the laws of war might interface with autonomous weapons. But there’s also been work done thinking about some of these human psychological interactions. Missy Cummings, who’s at Duke and runs the humans and automation lab there, has done some work on human-machine interfaces on weapon systems to think through some of these concerns. I think there’s probably been less attention paid to the human-machine interface dimension of this and the human psychological dimension of it. But there’s been a lot of work done by people like Heather Roff, people at Article 36, and others thinking about concepts of meaningful human control and what that might look like in weapon systems.

I think one of the things that’s challenging across the board in this issue is that it is a politically contentious topic. You have different levels of this debate going on: you have scholars trying to understand the issue, and then you also have a whole array of politically motivated groups, international organizations, civil society organizations, and countries duking it out, basically, at the UN and in the media about where we should go with this technology. And you get a lot of motivated reasoning on all sides about what the answer should be. So, for example, one of the things that fascinates me is I’ll often hear people say, autonomous weapons are terrible, they’ll have a terrible outcome, and we need to ban them now, and if we just pass a treaty and have enough political will, we could ban them. I’ll also hear people say a ban would be pointless, it wouldn’t work, and anyway, wouldn’t autonomous weapons be great? There are other possible beliefs. One could say that a ban is feasible, but the weapons aren’t that big of a deal. So it just seems to me like there’s a lot of politically motivated reasoning that goes on in this debate, which makes it very challenging.

Lucas Perry: So one of the concerns around autonomous weapons has to do with accidental escalation of warfare and conflict. Could you explore this point and explain what some strategies might be to prevent accidental escalation of warfare as AI is increasingly being used in the military?

Paul Scharre: Yes, so I think in general you could bucket concerns about autonomous weapons into two categories. One is a concern that they may not function very well and could have accidents; those accidents could lead to civilian casualties, or could lead to accidental escalation among nations in a crisis, with military forces operating in close proximity to one another, and there could be accidents. This happens with people. And you might worry about accidents with autonomous systems, where maybe one shoots down an enemy aircraft and there’s an escalation and people are killed. And then how do you unwind that? How do you communicate to your adversary: we didn’t mean to do that, we’re sorry? How do you do that in a period of tension? That’s a particular challenge.

There’s a whole other set of challenges that come from the possibility that the weapons might work, and that might get to some of these deeper questions about the role of humans in decision making about life and death. But this issue of accidental escalation falls into the category of: they don’t work very well, they’re not reliable. And this is the case for a lot of AI and autonomous technology today. Which isn’t to say it doesn’t work at all; if it didn’t work at all, it would be much easier. There’d be no debates about bias in facial recognition systems if they never identified faces. There’d be no debates about safety with self-driving cars if the car couldn’t go anywhere. The problem is that a lot of these AI-based systems work very well in some settings, and then if the settings change ever so slightly, they don’t work very well at all anymore. The performance can drop off very dramatically, and they’re not very robust to changes in environmental conditions. So this is a huge problem for the military, because in particular the military doesn’t get to test its systems in its actual operating environment.

So you can take a car and take it on the roads, and you can test it in an actual driving environment. And we’ve seen car companies rack up 10 million miles or more of driving data. Then they can go back and run simulations. Waymo has said that they run 10 million miles of simulated driving every single day. And they can simulate different lighting conditions, different environmental conditions. Well, the military can build simulations too, but simulations of what? What will the next war look like? We don’t know, because we haven’t fought it yet. The good news is that war is very rare, which is great. But that also means that for these kinds of systems, we don’t necessarily know the operating conditions that they’ll be in, and so there is this real risk of accidents. And it’s exacerbated by the fact that this is also a very adversarial environment. You actually have an enemy who’s trying to trick your system and manipulate it. That adds another layer of complications.

Driving is a little bit competitive, maybe somebody doesn’t want to let you into the lane, but pedestrians aren’t generally trying to get hit by cars. That’s a whole other complication in the military space. So all of that leads to concerns that the systems may do okay in training, and then we take them out into the real world and they fail, and they fail in a pretty bad way. If it’s a weapon system that is making its own decisions about whom to kill, it could be that it fails in a benign way, where it targets nothing, and that’s a problem for the military that built it; or it fails in a more hazardous way, in a dangerous way, and attacks the wrong targets. And when we’re talking about an autonomous weapon, the essence of this autonomous weapon is making its own decisions about which targets to attack and then carrying out those attacks. If you get that wrong, there could be pretty significant consequences. One of those could be civilian harm, and that’s a major concern. There are processes in place for vetting that operationally, and for test and evaluation; are those sufficient? I think there are good reasons to say that maybe they’re not sufficient or not completely sufficient, and they need to be revised or improved.

And I’ll point out, and we can come back to this, that the US Defense Department actually has a more stringent procedure in place for reviewing autonomous weapons than for other weapons, beyond what the laws of war require; the US is one of the few countries that has this. But then there’s also the question of accidental escalation. Would that lead to an entire war? Probably not. But it could make it a lot harder to defuse tensions in a crisis, and that could be problematic. So we just had an incident not too long ago where the United States carried out an attack against a very senior Iranian general, General Soleimani, who was the head of the Iranian Quds Force, and killed him in a drone strike. And that was an intentional decision made by a person somewhere in the US government.

Now, did they fully think that through? I don’t know, that’s a different question. But a human made that decision in any case. Well, that was a huge escalation of hostilities between the US and Iran. And there was a lot of uncertainty afterwards about what would happen, and Iran launched some ballistic missiles against US troops in Iraq. Whether that’s it, or there’s more retaliation to come, I think we’ll see. But it could be a much more challenging situation if you had a situation in the future where an autonomous weapon malfunctioned and took some action, and now the other side might feel compelled to respond. They might say, well, we have to, we can’t let this go. Because human emotions are on the line, and national pride and prestige, and they feel like they need to maintain a principle of deterrence and they need to retaliate. So these could all be very complicated things if you had an accident with an autonomous weapon.

Lucas Perry: Right. And so an adjacent issue that I’d like to explore now is how a potential arms race can have interplay with issues around accidental escalation of conflict. So is there already an arms race brewing for autonomous weapons? If so, why and what could potentially be done to deescalate such a situation?

Paul Scharre: If there’s an arms race, it’s a very strange one, because no one is building the weapons. We see militaries advancing in robotics and autonomy, but we don’t really see this rush to build autonomous weapons. I struggle to point to any programs that I’m aware of in militaries around the globe that are clearly oriented toward building fully autonomous weapons. I think there are lots of places where, much like the incremental advancement of autonomy in cars, you can see more autonomous features in military vehicles and drones and robotic systems and missiles. They’re adding more autonomy. And one might well be concerned about where that’s going. But it’s simply not the case that militaries have declared their intention: we’re going to build autonomous weapons, and here they are, and here’s our program to build them. I would struggle to use the term arms race. It could happen; maybe we’re at the starting line of an arms race. But I don’t think we’re in one today by any means.

It’s worth also asking: when we say arms race, what do we mean and why do we care? This is, again, one of these terms that’s often thrown around. You’ll hear this with the concept of autonomous weapons or AI; people say we shouldn’t have an arms race. Okay. Why? Why is an arms race a bad thing? Militaries normally invest in new technologies to improve their national defense. That’s a normal activity. So if you say arms race, what do you mean by that? Is it beyond normal activity? And why would that be problematic? In the political science world, the specific definitions vary, but generally an arms race is viewed as an increase in defense spending overall, or in a particular technology area, above normal levels of military modernization. Now, usually this is problematic for a couple of reasons. One could be that it ends up just being a massive national expenditure, like in the case of nuclear weapons during the Cold War, that doesn’t really yield any military value or increase anyone’s defense or security; it just ends up flushing a lot of money down the drain. That’s money that could be spent elsewhere, for pre-K education or healthcare or something else that might be societally beneficial, instead of building all of these weapons. So that’s one concern.

Another might be that we end up in a world where the large number of these weapons, or the type of weapons, makes us worse off. Are we really better off in a world where there are tens of thousands of nuclear weapons on hair-trigger alert versus a few thousand weapons, or a few hundred weapons? Whether or not we ever get to zero, all things being equal, fewer nuclear weapons is probably better than more of them. So that's another kind of concern, whether in terms of the violence and destructiveness of war if a war breaks out, or the likelihood of war and crisis stability. And this is an area where, certainly from a spending standpoint, we're not in any way in an arms race for autonomous weapons or AI today. When you look at actual expenditures, they're a small fraction of what militaries are spending overall, even if you look at, say, AI or autonomous features at large.

And again, for autonomous weapons, there really aren't, at least openly, declared programs to go build a fully autonomous weapon today. But even if that were the case, why is that bad? Why would a world where militaries are racing to build lots of autonomous weapons be a bad thing? I think it would be a bad thing, but I think it's also worth answering that question, because it's not obvious to everyone. This is something that's often missing in a lot of these debates and dialogues about autonomous weapons; people may not share some of the underlying assumptions. It's better to bring out these assumptions and explain: I think this would be bad for these reasons. Because maybe it's not intuitive to other people who don't share those reasons, and articulating them could increase understanding.

For example, the FLI letter on autonomous weapons from a few years ago said, "The key question for humanity today is whether to start a global AI arms race or to prevent it from starting. If any major military power pushes ahead with AI weapon development, a global arms race is virtually inevitable, and the endpoint of this technological trajectory is obvious: autonomous weapons will become the Kalashnikovs of tomorrow." I like the language; it's very literary, "the Kalashnikovs of tomorrow." It's a very concrete image. But there's a whole bunch of assumptions packed into those few sentences that maybe don't get worked through in a letter that's intended to galvanize public interest and attention, but are worth really unpacking. What do we mean when we say autonomous weapons are the Kalashnikovs of tomorrow, and why is that bad? And what does that mean? Those are, I think, important things to draw out and better understand.

It's particularly hard for this issue because the weapons don't exist yet. And so it's not actually like debates around something like landmines. We could point to the mines and say, "This is a landmine, we all agree this is a landmine. This is what it's doing to people." And everyone could agree on the harm being caused. People might disagree on what to do about it, but there's agreement on what the weapon is and what the effect is. But for autonomous weapons, all of these things are up for debate. Even the term itself is not clearly defined. And when I hear people describe it, they can be describing a whole range of things. Some people, when they say the words autonomous weapon, are envisioning a Roomba with a gun on it. Other people are envisioning the Terminator. Now, both of those things are probably bad ideas, but for very different reasons. And that is important to draw out in these conversations. When you say autonomous weapon, what do you mean? What are you envisioning? Are you worried about certain types of scenarios or certain types of effects?

If we want to get to the place where we really, as a society, come together and grapple with this challenge, I think first and foremost better communication is needed. People may still disagree, but it's much more helpful. Stuart Russell from Berkeley has talked a lot about the dangers of small anti-personnel autonomous weapons that could be widely proliferated. He made the Slaughterbots video that's been seen millions of times on YouTube. That's a very specific image. It's an image that's very concrete. So then you can say, when Stuart Russell is worried about autonomous weapons, this is what he's worried about. And then you can start to try to better understand the assumptions that go into that.

Now, I don't share Stuart's concerns, and we've written about it and talked about it before, but it's not actually because we disagree about the technology; I would agree that that's very doable with existing technology. We disagree about the social responses to that technology: how people respond, what the countermeasures are, and what the ways to prevent proliferation are. So we, I think, disagree on some of the political or social factors that surround how people approach this technology and use it. Sometimes people actually totally agree on the risks and even maybe the potential futures; they just have different values. There might be some people whose primary value is trying to have fewer weapons in the world. Now, that's a noble goal. And they're like, hey, any way that we can have fewer weapons, fewer advanced technologies, that's better. That's very different from someone who's coming from a position of saying, my goal is to improve my own nation's defense. That's a totally different value system, a totally different preference. And they might say, I also value what you say, but I don't value it as much, and I'm going to take actions that advance my preferences. It's important to really try to draw these values out and understand them in this debate, if we're going to get to a place where we can, as a society, come up with some helpful solutions to this problem.

Lucas Perry: Wonderful. I'm totally on board with that. Two questions and confusions on my end. The first is, I feel a bit confused when you say these weapons don't exist already. It seems to me more like autonomy exists on a spectrum and is the integration of many different technologies and decision-making in systems. There already seems to be a certain degree of autonomy. There isn't Terminator-level autonomy, where you specify an objective and the autonomous system can basically just go execute it, which seems to require a very high level of generality, but there does seem to be a level of autonomy that already exists today.

And so in that video, Stuart says that slaughterbots in particular represent a miniaturization and integration of many technologies which already exist today. And the second thing that I'm confused about is when you say that it's unclear to you that militaries are very interested in this or that there currently is an arms race. Yes, there isn't an arms race like there was with nuclear weapons, where it was very clear and there were Manhattan Project-style efforts around the technology. But given the strategic advantage conferred by this technology now and likely soon, it seems to me, game theoretically, from the position of militaries around the world that have the capacity to invest in these things, that given their battlefield importance it is inevitable that there would be a massive ramping up of investment, or that there already is great interest in developing the autonomy and the sub-technologies required for fully autonomous systems.

Paul Scharre: Those are great questions, and right on point. I think the central issue in both of your questions is that when we say these weapons, or when I say these things, I should be more precise. When we say autonomous weapons, what do we mean exactly? This is one of the things that can be tricky in this space, because there are no universally agreed upon definitions. There are certainly many weapons systems used widely around the globe today that incorporate some autonomous features. Many of these are fire-and-forget weapons. When someone launches them, they're not coming back. They have, in that sense, autonomy to carry out their mission. But the autonomy is relatively limited and narrowly bounded, and humans, for the most part, are choosing the targets. So you can think of maybe three classes of weapons. There are semi-autonomous weapons, where humans are choosing the targets, but there's lots of autonomy surrounding that decision: cueing information to people, flying the munition once the person launches it. That's one type of weapon, widely used today by really every advanced military.

Another is supervised autonomous weapons, which are used in relatively limited settings for defensive purposes, where there is an automatic mode that people can activate to defend the ship or the ground base or the vehicle. These are really needed for situations where the incoming threats are too fast for humans to respond to. And these, again, are widely used around the globe and have been in place for decades. And then there are what we could call fully autonomous weapons, where a human launches them and programs in the parameters, but they have some freedom to fly a search pattern over some area and then, once they find a target, attack it on their own. For the most part, with some exceptions, those weapons are not widely used today. There have been some experimental systems that have been designed. There have been some put into operation in the past. The Israeli Harpy drone is an example of this that is still in operation today. It's been around since the '90s, so it's not really very new. And it's been sold to a handful of countries (India, Turkey, South Korea, China), and the Chinese have reportedly reverse-engineered their own version of it.

But it's not very widespread. It's not a major component of militaries' order of battle. I think you see militaries investing in robotic systems, but the bulk of their fleets are still human-occupied platforms; robotics are largely an adjunct to them. And in terms of spending, while there is increased spending on robotics, most of the spending is still going towards more traditional military platforms. The same is also true of the degree of autonomy: most of these robotic systems are just remote controlled, and they have very limited autonomy today. Now, we're seeing more autonomy over time in both robotic vehicles and missiles. But militaries have a strong incentive to keep humans involved.

It is absolutely the case that militaries want technologies that will give them an advantage on the battlefield. But part of achieving an advantage means your systems work, they do what you want them to do, the enemy doesn't hack them and take them over, and you have control over them. All of those things point to more human control. So I think that's where you actually see militaries trying to figure out: where's the right place on the spectrum of autonomy? How much autonomy is right? And that line is going to shift over time. But it's not the case that they necessarily want full autonomy, whatever that would even mean; they do want weapon systems to operate under some degree of human direction and involvement. It's just that what that looks like may evolve over time as the technology advances.

And there are also, I should add, other bureaucratic factors that come into play, in that militaries' investments are not entirely strategic. There's bureaucratic politics within organizations. There's politics more broadly, with the domestic defense industry interfacing with the political system in a given country. These might drive resources in certain directions. And there's some degree of inertia, of course, in any system. These are also factors in play.

Lucas Perry: So I want to hit here a little bit on longer term perspectives. The Future of Life Institute in particular is interested in mitigating existential risks. We're interested in the advanced risks from powerful AI technologies, where AI not aligned with human values, goals, preferences, and intentions could potentially lead us to suboptimal equilibria that we're trapped in permanently, or could lead to human extinction. Other technologies we care about are nuclear weapons, AI-enabled synthetic biology, etc. So there is this view that if we cannot establish a governance mechanism as a global community on the concept that we should not let AI make the decision to kill, then how can we deal with more subtle near-term issues and eventual long-term safety issues around powerful AI technologies? There's this view of ensuring beneficial outcomes around lethal autonomous weapons, or at least beneficial regulation or development of that technology, and the necessity of that for longer term AI risk and value alignment with AI systems as they become increasingly intelligent. I'm curious to know if you have a view or perspective on this.

Paul Scharre: This is the fun part of the podcast with the Future of Life Institute, because this rarely comes up in a lot of conversations; in a lot of the debates, people are focused on much more near-term issues surrounding autonomous weapons or AI. I think that if you're inclined to see that there are longer term risks from more advanced developments in AI, then it's very logical to say that there's some value in humanity coming together to come up with some set of rules about autonomous weapons today, even if the specific rules don't really matter that much, because the level of risk is maybe not as significant; the process of coming together and agreeing on some set of norms and limits, particularly on military applications of AI, is probably beneficial and may begin to create the foundations for future cooperation. The stakes for autonomous weapons might be big, but they're certainly not existential under any reasonable interpretation of what autonomous weapons might do, unless you start thinking about autonomy wired into, say, nuclear launch decisions, which is basically nuts. And I don't think that's realistically what's on the table or what people are worried about.

When we try to come together as a human society to grapple with problems, we're basically forced to deal with the institutions that we have in place. So for example, for autonomous weapons, we're having debates in the UN Convention on Certain Conventional Weapons, the CCW. Is that the best forum for talking about autonomous weapons? Well, it's kind of the forum that exists for this kind of problem set. It's not bad. It's not perfect in some respects, but it's the one that exists. And so if you're worried about future AI risk, creating the institutional muscle memory among the relevant actors in society, whether it's nation states, AI scientists, members of civil society, or militaries if you're worried about military applications, whoever it is, to come together, to have these conversations, to come up with some answer, and maybe to set some agreements, some limits, is probably really valuable, because it begins to establish the right human networks for collaboration and cooperation. Because it's ultimately people; it's people who know each other.

So, "Oh, I worked with this person on this last thing." If you look at, for example, the international movement that The Campaign to Stop Killer Robots is spearheading, that institution or framework, those people, those relationships are born out of past successful efforts to ban landmines and then cluster munitions. So there's a path dependency in human relationships, bureaucracies, and institutions that really matters. Coming together and reaching any kind of agreement, actually, to set some kind of limits, is probably really vital to start exercising those muscles today.

Lucas Perry: All right, wonderful. And a final fun FLI question for you. What are your views on long term AI safety considerations? Do you view AI eventually as an existential risk and do you integrate that into your decision making and thinking around the integration of AI and military technology?

Paul Scharre: Yes, it's a great question. It's not something that comes up a lot in the world that I live in, the policy world in Washington; people don't tend to think about that kind of risk. I think it's a concern. It's a hard problem because we don't really know how the technology is evolving. And I think one of the things that's challenging with AI is our frame for future, more advanced AI. Often the default frame is thinking about human-like intelligence. When people talk about future AI, they use terms like AGI, or high-level machine intelligence, or human-like intelligence, but we don't really know how the technology is evolving.

I think one of the things we're seeing with AI and machine learning that's quite interesting is that it often evolves in ways that are very different from human intelligence, in fact quite alien and quite unusual. And I'm not the first person to say this, but I think it's valid that we are on the verge of a Copernican revolution in how we think about intelligence: rather than thinking of human intelligence as the center of the universe, we're realizing that humans are simply one type of intelligence among a whole vast array and space of possible forms of intelligence. As we create different kinds, they may have very different intelligence profiles, they may just look very different, they may be much smarter than humans in some ways and dumber in other ways. I don't know where things are going. I think it's entirely possible that we move forward into a future where we see many more forms of advanced intelligent systems, and because they don't have the same intelligence profile as human beings, we continue to kick the can down the road on calling them true intelligence, because they don't look like us. They don't think like us. They think differently. But these systems may yet be very powerful in very interesting ways.

We've already seen lots of AI systems, even very simple ones, exhibit a lot of creativity, a lot of interesting and surprising behavior. And as we begin to see the scope of their intelligence widen over time, I think there are going to be risks that come with that. They may not be the risks that we were expecting, but I think over time there are going to be significant risks, and in some ways our anthropocentric view is, I think, a real hindrance here. It may lead us to underestimate risk from things that don't look quite like humans, and to maybe miss some things that are very real. I'm not at all worried about some AI system one day becoming self-aware and having human-level sentience; that does not keep me up at night. I am deeply concerned about advanced forms of malware. We're not there today yet. But you could envision things over time that are adapting and learning and begin to populate the web. There are people thinking in interesting ways about systems that have misaligned goals. It's also possible to envision systems that don't have any human-directed goals at all. Viruses don't. They replicate. They're effective at replicating, but they don't necessarily have a goal in the way that we think of it, other than self-replication.

If you have systems that are capable of replicating, of accumulating resources, of adapting over time, you might have all of the right boxes checked to begin to have systems that could be problematic. They could accumulate resources and cause problems, even if they're not trying to pursue either a goal that's misaligned with human interests or even any goal that we might recognize. They simply could get out in the wild; if they're effective at replicating and acquiring resources and adapting, then they might survive. I think we're likely to be surprised, and to continue to be surprised, by how AI systems evolve, and where that might take us. And it might surprise us in ways that are humbling for how we think about human intelligence. So one question, I guess, is whether human intelligence is a convergence point for more intelligent systems: as AI systems become more advanced, do they become more human-like, or less human-like and more alien?

Lucas Perry: Unless we train them very specifically on human preference hierarchies and structures.

Paul Scharre: Right. Exactly. Right. And so I’m not actually worried about a system that has the intelligence profile of humans, when you think about capacity in different tasks.

Lucas Perry: I see what you mean. You’re not worried about an anthropomorphic AI, you’re worried about a very powerful, intelligent, capable AI, that is alien and that we don’t understand.

Paul Scharre: Right. It might have cross-domain functionality, it might have the ability to do continuous learning, it might be adaptive in some interesting ways. I mean, one of the interesting things we've seen about the field of AI is that people are able to tackle a whole variety of problems with some very simple methods and algorithms. And this seems for some reason offensive to some people in the AI community, I don't know why, but people have been able to use some relatively simple methods, with just huge amounts of data and compute, on a variety of different kinds of problems, some of which seem very complex.

Now, they're simple compared to the real world. When you look at strategy games like StarCraft and Dota 2, the real world looks way more complex, but these are still really complicated kinds of problems, and systems are basically able to learn them totally on their own. That's not general intelligence, but it starts to point towards the capacity to have systems that are capable of learning a whole variety of different tasks. They can't do this today continuously, without suffering the problem of catastrophic forgetting, but people are working on these things as well. The problem today is that the systems aren't very robust; they don't handle perturbations in the environment very well. People are working on these things. I think it's really hard to see how this evolves. But yes, in general, I think our fixation on human intelligence as the pinnacle of intelligence, or even as the goal of what we're trying to build, this anthropocentric view, is probably one that's likely to lead us to underestimate some kinds of risks.

Lucas Perry: I think those are excellent points, and I hope that mindfulness about that is able to proliferate in government and among actors who have the power to help mitigate some of these future and short-term AI risks. I really appreciate your perspective, and I think you bring a wholesomeness and a deep, authentic engagement with all the different positions and arguments here on the question of autonomous weapons, and I find that valuable. So thank you so much for your time and for helping to share information about autonomous weapons with us.

Paul Scharre: Thank you and thanks everyone for listening. Take care.

End of recorded material

FLI Podcast: Distributing the Benefits of AI via the Windfall Clause with Cullen O’Keefe

As with the agricultural and industrial revolutions before it, the intelligence revolution currently underway will unlock new degrees and kinds of abundance. Powerful forms of AI will likely generate never-before-seen levels of wealth, raising critical questions about its beneficiaries. Will this newfound wealth be used to provide for the common good, or will it become increasingly concentrated in the hands of the few who wield AI technologies? Cullen O’Keefe joins us on this episode of the FLI Podcast for a conversation about the Windfall Clause, a mechanism that attempts to ensure the abundance and wealth created by transformative AI benefits humanity globally.

Topics discussed in this episode include:

  • What the Windfall Clause is and how it might function
  • The need for such a mechanism given AGI generated economic windfall
  • Problems the Windfall Clause would help to remedy 
  • The mechanism for distributing windfall profit and the function for defining such profit
  • The legal permissibility of the Windfall Clause 
  • Objections and alternatives to the Windfall Clause

Timestamps: 

0:00 Intro

2:13 What is the Windfall Clause? 

4:51 Why do we need a Windfall Clause? 

06:01 When we might reach windfall profit and what that profit looks like

08:01 Motivations for the Windfall Clause and its ability to help with job loss

11:51 How the Windfall Clause improves allocation of economic windfall 

16:22 The Windfall Clause assisting in a smooth transition to advanced AI systems

18:45 The Windfall Clause as assisting with general norm setting

20:26 The Windfall Clause as serving AI firms by generating goodwill, improving employee relations, and reducing political risk

23:02 The mechanism for distributing windfall profit and desiderata for guiding its formation 

25:03 The windfall function and desiderata for guiding its formation 

26:56 How the Windfall Clause is different from being a new taxation scheme

30:20 Developing the mechanism for distributing the windfall 

32:56 The legal permissibility of the Windfall Clause in the United States

40:57 The legal permissibility of the Windfall Clause in China and the Cayman Islands

43:28 Historical precedents for the Windfall Clause

44:45 Objections to the Windfall Clause

57:54 Alternatives to the Windfall Clause

01:02:51 Final thoughts

 

This podcast is possible because of the support of listeners like you. If you found this conversation to be meaningful or valuable consider supporting it directly by donating at futureoflife.org/donate. Contributions like yours make these conversations possible.

All of our podcasts are also now on Spotify and iHeartRadio! Or find us on SoundCloud, iTunes, Google Play and Stitcher.

You can listen to the podcast above or read the transcript below. 

Lucas Perry: Welcome to the Future of Life Institute Podcast. I’m Lucas Perry. Today’s conversation is with Cullen O’Keefe about a recent report he was the lead author on called The Windfall Clause: Distributing the Benefits of AI for the Common Good. For some quick background, the agricultural and industrial revolutions unlocked new degrees and kinds of abundance, and so too should the intelligence revolution currently underway. Developing powerful forms of AI will likely unlock levels of abundance never before seen, and this comes with the opportunity of using such wealth in service of the common good of all humanity and life on Earth but also with the risks of increasingly concentrated power and resources in the hands of the few who wield AI technologies. This conversation is about one possible mechanism, the Windfall Clause, which attempts to ensure that the abundance and wealth likely to be created by transformative AI systems benefits humanity globally.

For those not familiar with Cullen, Cullen is a policy researcher interested in improving the governance of artificial intelligence using the principles of Effective Altruism.  He currently works as a Research Scientist in Policy at OpenAI and is also a Research Affiliate with the Centre for the Governance of AI at the Future of Humanity Institute.

The Future of Life Institute is a non-profit and this podcast is funded and supported by listeners like you. So if you find what we do on this podcast to be important and beneficial, please consider supporting the podcast by donating at futureoflife.org/donate. You can also follow us on your preferred listening platform, like on Apple Podcasts or Spotify, by searching for us directly or following the links on the page for this podcast found in the description.

And with that, here is Cullen O’Keefe on the Windfall Clause.

We're here today to discuss this recent paper that you were the lead author on, called the Windfall Clause: Distributing the Benefits of AI for the Common Good. Now, there's a lot there in the title, so we can start off pretty simply here with: what is the Windfall Clause, and how does it serve the mission of distributing the benefits of AI for the common good?

Cullen O’Keefe: So the Windfall Clause is a contractual commitment AI developers can make, that basically stipulates that if they achieve windfall profits from AI, that they will donate some percentage of that to causes that benefit everyone.

Lucas Perry: What does it mean to achieve windfall profits?

Cullen O'Keefe: The answer that we give is that when a firm's profits grow in excess of 1% of gross world product, which is just the sum of all countries' GDPs, then that firm has hit windfall profits. We use this slightly weird measurement of profits as a percentage of gross world product just to try to convey the notion that the thing that's relevant here is not necessarily the absolute size of profits, but really their size relative to the global economy.
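To make that trigger concrete, here is a minimal illustrative sketch of the check Cullen describes. The gross world product figure below is an assumed round number for illustration, not a value taken from the report.

```python
# Minimal illustrative sketch of the windfall trigger described above.
# The GWP figure is an assumed round number, not a value from the report.
GROSS_WORLD_PRODUCT = 85e12   # rough annual gross world product, in USD (assumption)
WINDFALL_THRESHOLD = 0.01     # "in excess of 1% of gross world product"

def has_windfall_profits(annual_profits: float,
                         gross_world_product: float = GROSS_WORLD_PRODUCT) -> bool:
    """Return True if a firm's profits exceed the 1%-of-GWP windfall trigger."""
    return annual_profits / gross_world_product > WINDFALL_THRESHOLD

# Example: $1 trillion in profits against an ~$85 trillion GWP is about 1.2%,
# which crosses the threshold.
print(has_windfall_profits(1e12))  # True
```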

Lucas Perry: Right. And so an important background framing and assumption here seems to be the credence one may have in transformative AI, or artificial general intelligence, or superintelligence creating previously unattainable levels of wealth, value, and prosperity. I believe that in Nick Bostrom's Superintelligence, this work in particular is striving to serve the common good principle: that superintelligence or AGI should be created in the service and pursuit of the common good of all of humanity and life on Earth. Is there anything here that you could add about the background and the inspiration around developing the Windfall Clause?

Cullen O'Keefe: Yeah. That's exactly right. The phrase Windfall Clause actually comes from Bostrom's book. Basically, the idea was something that people inside of FHI were excited about for a while, but really hadn't done anything with because of some legal uncertainties; basically, the fiduciary duty question that I examine in the third section of the report. When I was an intern there in the summer of 2018, I was asked to do some legal research on this, and ran away with it from there. My legal research pretty convincingly showed that it should be legal, as a matter of corporate law, for a corporation to enter into such a contract. In fact, I don't think it's a particularly hard case. I think it looks like things that corporations do a lot already. And I think some of the bigger questions were around the implications and design of the Windfall Clause, which are also addressed in the report.

Lucas Perry: So, we have this common good principle, which serves as the moral and ethical foundation. And then the Windfall Clause, it seems, is an attempt at a particular policy solution for AGI and superintelligence serving the common good. With this background, could you expand a little bit more on why it is that we need a Windfall Clause?

Cullen O'Keefe: I guess I wouldn't say that we need a Windfall Clause. The Windfall Clause might be one mechanism that would solve some of these problems. The primary place in which cutting edge AI is currently being developed is private companies. And the way that private companies are structured is perhaps not maximally conducive to the common good principle. This is not due to corporate greed or anything like that. It's more just a function of the role of corporations in our society, which is that they're primarily vehicles for generating returns to investors. One might think that the tools we currently have for taking some of the returns that are generated for investors, and making sure that they're distributed in a more equitable and fair way, are inadequate in the face of AGI. And so that's kind of the motivation for the Windfall Clause.

Lucas Perry: Maybe you could speak a little bit to the surveys of researchers' credences and estimates about when we might get certain kinds of AI, and then to what windfall in the context of an AGI world actually means.

Cullen O’Keefe: The surveys of AGI timelines, I think this is an area with high uncertainty. We cite Katja Grace’s survey of AI experts, which is a few years old at this point. I believe that the median timeline that AI experts gave in that was somewhere around 2060, of attaining AGI as defined in a specific way by that paper. I don’t have opinions on whether that timeline is realistic or unrealistic. We just take it as a baseline, as the best specific timeline that has at least some evidence behind it. And what was the second question?

Lucas Perry: What degrees of wealth might be brought about via transformative AI?

Cullen O'Keefe: The short and unsatisfying answer to this is that we don't really know. I think that the amount of economic literature really focusing on AGI in particular is pretty minimal. Some more research on this would be really valuable. A company earning profits that are defined as windfall in the report would be pretty unprecedented in history, so it's a very hard situation to imagine. Forecasts about the way that AI will contribute to growth are pretty variable. I think we don't really have a good idea of what that might mean, especially because the interface between economists and people thinking about AGI has been pretty minimal; a lot of the thinking has been focused on more mainstream issues. If the strongest version of AGI were to come, the economic gains could be pretty huge. There's a lot on the line in that circumstance.

Part of what motivated the Windfall Clause, is trying to think of mechanisms that could withstand this uncertainty about what the actual economics of AGI will be like. And that’s kind of what the contingent commitment and progressively scaling commitment of the Windfall Clause is supposed to accomplish.

Lucas Perry: All right. So, now I'm going to explore some of these other motivations that you've written about in your report. There is the need to address the loss of job opportunities. There is the need to improve the allocation of economic windfall, which, if we didn't do anything right now, we would have no way of doing other than through whatever system of taxes exists around that time. There's also the need to smooth the transition to advanced AI. And then there is this general norm-setting strategy, which I guess is an attempt to imbue and instantiate a kind of benevolent ethics based on the common good principle. Let's start off by hitting on addressing the loss of job opportunities. How might transformative AI lead to the loss of job opportunities, and how does the Windfall Clause help to remedy that?

Cullen O'Keefe: So I want to start off with a couple of caveats. Number one, I'm not an economist. Second, I'm very wary of promoting Luddite views. It's definitely true that in the past, technological innovation has been pretty universally positive in the long run, notwithstanding short-term problems with transitions. So it's by no means inevitable that advances in AI will lead to joblessness or decreased earnings. That said, I do find it pretty hard to imagine a scenario in which we achieve very general purpose AI systems, like AGI, and there are still bountiful opportunities for human employment. I think there might be some jobs which have human-only employment or something like that. It's kind of unclear, in an economy with AGI or something else resembling it, why there would be demand for humans. There might be jobs, I guess, in which people are inherently uncomfortable having non-humans. Good examples of this would be priests or clergy; probably most religions will not want to automate their clergy.

I'm not a theologian, so I can't speak to the proper theology of that, but that's just my intuition. People also mention things like psychiatrists, counselors, teachers, child care, stuff like that. That doesn't look as automatable. And then there's the human meaning aspect of this. John Danaher, a philosopher, recently released a book called Automation and Utopia, talking about how for most people work is the primary source of meaning. It's certainly what they do with the great plurality of their waking hours. And I think for people like me and you, we're lucky enough to like our jobs a lot, but for many people work is mostly a source of drudgery, often unpleasant, unsafe, etcetera. But if we find ourselves in a world in which work is largely automated, not only will we have to deal with the economic issues relating to how people who can no longer offer skills for compensation will feed themselves and their families, but also with how they'll find meaning in life.

Lucas Perry: Right. If the category and meaning of jobs changes or is gone altogether, the Windfall Clause is also there to help meet fundamental universal basic human needs, and then also can potentially have some impact on this question of value and meaning. If the Windfall Clause allows you to have access to hobbies and nice vacations and other things that give human beings meaning.

Cullen O'Keefe: Yeah. I would hope so. It's not a problem that we explicitly address in the paper. I think this is kind of in the broader category of what to actually do with the windfall once it's donated. You can think of this as the bottom of the funnel, whereas the Windfall Clause report is more focused at the top of the funnel: getting companies to actually commit to such a thing. And I think there's a huge, rich area of work in thinking about what we actually do with the surplus from AGI once it manifests, and assuming that we can get it into the coffers of a public-minded organization. It's something that I'm lucky enough to think about in my current job at OpenAI. So yeah, making sure that both material needs and higher psychological needs are taken care of: that's not something I have great answers for yet.

Lucas Perry: So, moving on here to the second point. We also need a Windfall Clause or function or mechanism, in order to improve the allocation of economic windfall. So, could you explain that one?

Cullen O'Keefe: You can imagine a world in which employment looks much the same as it does today. Most people have jobs, but a lot of the gains are going to a very small group of people, namely shareholders. I think this is still a pretty sub-optimal world. There are diminishing returns on money for happiness, so all else equal, and ignoring incentive effects, progressively distributing money seems better than not. The firms looking to develop this AI are primarily based in a small set of countries, and within those countries, the group of people who are heavily invested in those companies is even smaller. So in a world where even employment opportunities for the masses are pretty normal, we could still expect to see pretty concentrated accrual of benefits, both within nations and, I think also very importantly, across nations. This seems pretty important to address, and the Windfall Clause aims to do just that.

Lucas Perry: A bit of speculation here, but we could have had a kind of Windfall Clause for the industrial revolution, which probably would have made much of the world better off and there wouldn’t be such unequal concentrations of wealth in the present world.

Cullen O’Keefe: Yeah. I think that’s right. I think there’s sort of a Rawlsian or Harsanyian motivation there, that if we didn’t know whether we would be in an industrial country or a country that is later to develop, we would probably want to set up a system that has a more equal distribution of economic gains than the one that we have today.

Lucas Perry: Yeah. By Rawlsian, you meant the Rawls’ veil of ignorance, and then what was the other one you said?

Cullen O’Keefe: Harsanyi is another philosopher who is associated with the veil of ignorance idea and he argues, I think pretty forcefully, that actually the agreement that you would come to behind the veil of ignorance, is one that maximizes expected utility, just due to classic axioms of rationality. What you would actually want to do is maximize expected utility, whereas John Rawls has this idea that you would want to maximize the lot of the worst off, which Harsanyi argues doesn’t really follow from the veil of ignorance, and decision theoretic best practices.

Lucas Perry: I think that the veil of ignorance, for listeners who don't know what that is, is the idea that you imagine yourself not knowing who you are going to be born as in the world, and you design ethical, political, moral, and social systems with that view in mind. If you do that, you will pretty honestly and wholesomely come up with something, to your best ability, that is good for everyone. From behind that veil of ignorance about who you might be in the world, you can produce good ethical systems. Now, this is relevant to the Windfall Clause because, going through your paper, there's the tension between arguing that this is something that is legally permissible and that institutions and companies would want to adopt, and the fact that it cuts against maximizing profits for shareholders and the people with wealth and power in those companies. So there's this fundamental tension behind the Windfall Clause, between the incentives of those with power to maintain and hold on to that power and wealth, and the very strong and important ethical and normative views that say this ought to be distributed for the welfare and wellbeing of all sentient beings across the planet.

Cullen O'Keefe: I think that's exactly right. I think part of why I and others at the Future of Humanity Institute were interested in this project is that we know a lot of people working in AI at all levels. And I think a lot of them do want to do the genuinely good thing, but feel the constraints of economics and also of fiduciary duties. We didn't have any particular insights into that with this piece, but I think part of the motivation is just that we want to put resources out there so that any socially conscious AI developer can say, "We want to make this commitment and we feel very legally safe doing so," for the reasons that I lay out.

It’s a separate question whether it’s actually in their economic interest to do that or not. But at least we think they have the legal power to do so.

Lucas Perry: Okay. So maybe we can get into and explore the ethical aspect of this more. I think we're very lucky to have people like you and your colleagues who have the ethical conviction to follow through and be committed to something like this. But for the people who don't have that, I'm interested in discussing later what to do with them. So, in terms of more of the motivations here, the Windfall Clause is also motivated by this need for a smooth transition to transformative AI, or AGI, or superintelligence, or advanced AI. So what does that mean?

Cullen O'Keefe: As I mentioned, it looks like economic growth from AI will probably be a good thing if we manage to avoid existential and catastrophic risks. That's almost tautological, I suppose. But just as the industrial revolution brought a huge spurt of economic growth, it also brought a lot of turbulence. So part of the idea of the Windfall Clause is basically to funnel some of that growth into a sort of insurance scheme that can help make the transition smoother. An un-smooth transition would be something like: a lot of countries are worried they're not going to see any appreciable benefit from AI and indeed might lose out a lot, because a lot of their industries would be offshored or reshored, and a lot of their people would no longer be economically competitive for jobs. So that's the kind of stability that I think we're worried about, and the Windfall Clause is basically a way of saying: you're all going to gain significantly from this advance; everyone has a stake in making this transition go well.

Lucas Perry: Right. So I mean there’s a spectrum here and on one end of the spectrum there is say a private AI lab or company or actor, who is able to reach AGI or transformative AI first and who can muster or occupy some significant portion of the world GDP. That could be anywhere from one to 99 percent. And there could or could not be mechanisms in place for distributing that to the citizens of the globe. And so one can imagine, as power is increasingly concentrated in the hands of the few, that there could be quite a massive amount of civil unrest and problems. It could create very significant turbulence in the world, right?

Cullen O’Keefe: Yeah. Exactly. And it’s our hypothesis that having credible mechanisms ex-ante to make sure that approximately everyone gains from this, will make people and countries less likely to take destabilizing actions. It’s also a public good of sorts. You would expect that it would be in everyone’s interest for this to happen, but it’s never individually rational to commit that much to making it happen. Which is why it’s a traditional role for governments and for philanthropy to provide those sort of public goods.

Lucas Perry: So that last point here then on the motivations for why we need a Windfall Clause, would be general norm setting. So what do you have to say about general norm setting?

Cullen O'Keefe: This one is definitely a little more vague than some of the others. But if you think about what type of organization you would like to see develop AGI, it seems like one that has some legal commitment to sharing those benefits broadly is probably correlated with good outcomes. And in that sense, it's useful to be able to distinguish between organizations that are credibly committed to that sort of broad benefit and ones that say they want that sort of broad benefit but are not necessarily committed to making it happen. So in the Windfall Clause report, we are basically trying to say it's very important to take norms about the development of AI seriously. One of the norms that we're trying to develop is the common good principle. And even better is when you can develop those norms through high-cost or high-signal-value mechanisms. And if we're right that a Windfall Clause can be made binding, then the Windfall Clause is exactly one of them. It's a pretty credible way for an AI developer to demonstrate its commitment to the common good principle and also show that it's worthy of taking on this huge task of developing AGI.

The Windfall Clause makes performance of, or adherence to, the common good principle a testable hypothesis. It sets a kind of baseline against which commitments to the common good principle can be measured.

Lucas Perry: Now there are also, here in your paper, firm motivations: incentives for adopting a Windfall Clause from the perspective of the AI labs, AI companies, or private institutions which may develop AGI or transformative AI. And your three points here for firm motivations are that it can generate general goodwill, it can improve employee relations, and it can reduce political risk. Could you hit on each of these and explain why firms might be willing to adopt the Windfall Clause?

Cullen O’Keefe: Yeah. So just as a general note, we do see private corporations giving money to charity and doing other pro-social actions that are beyond their legal obligations, so nothing here is particularly new. Instead, it’s just applying traditional explanations for why companies engage in, what’s sometimes called corporate social responsibility or CSR. And see whether that’s a plausible explanation for why they might be amenable to a Windfall Clause. The first one that we mentioned in the report, is just generating general goodwill, and I think it’s plausible that companies will want to sign a Windfall Clause because it brings some sort of reputational benefit with consumers or other intermediary businesses.

The second one we talk about is managing employee relationships. In general, we see that tech employees have had a lot of power to shape the behavior of their employers. Fellow FLI podcast guest Haydn Belfield just wrote a great paper on this in AI specifically. Tech talent is in very high demand and therefore has a lot of bargaining power over what their firms do, and I think it's potentially very promising for tech employees to lobby for commitments like the Windfall Clause.

The third is what's termed in a lot of legal and investment circles political risk. That's basically the risk of governments or activists doing things that hurt you, such as tighter regulation, expropriation, taxation, things like that. And corporate social responsibility, including philanthropy, is a very common way for firms to manage that, and it could be for AI firms as well.

Lucas Perry: How strong do you think these motivations listed here are, and what do you think will be the main things that drive firms or institutions or organizations to adopt the Windfall Clause?

Cullen O'Keefe: I think it varies from firm to firm. I think a big one that's not listed here is whether management likes the idea of a Windfall Clause. Obviously, they're the ones ultimately making the decisions, so that makes sense. I think employee buy-in and enthusiasm about the Windfall Clause or similar ideas will ultimately be a pretty big determinant of whether this actually gets implemented. That's why I would love to hear and see engagement around this topic from people in the technology industry.

Lucas Perry: Something that we haven't talked about yet is the distribution mechanism. In your paper, you come up with desiderata and important considerations for an effective and successful distribution mechanism: philanthropic effectiveness, security from improper influences, political legitimacy, and buy-in from AI labs. So, these are guiding principles for helping to develop the mechanism for distribution. Could you comment on what the mechanism for distribution is or could be, and how these desiderata will guide the formation of that mechanism?

Cullen O'Keefe: A lot of this thinking is guided by a few different things. One is just involvement in the effective altruism community. I, as a member of that community, spend a lot of time thinking about how to make philanthropy work well. That said, I think that the potential scale of the Windfall Clause requires thinking about factors other than effectiveness in the way that effective altruists think of it, just because the scale of potential resources you're dealing with here begins to look less and less like traditional philanthropy and more and more like a pseudo- or para-governmental institution. That's why I think things like accountability and legitimacy become extra important in the Windfall Clause context. And then firm buy-in I mentioned, just because part of the actual process of negotiating an eventual Windfall Clause would presumably be coming up with a distribution mechanism that advances some of the firm's objectives of getting positive publicity or goodwill from agreeing to the Windfall Clause, both with their consumers and also with employees and governments.

And so they’re key stakeholders in coming up with that process as well. This all happens in the backdrop of a lot of popular discussion about the role of philanthropy in society, such as recent criticism of mega-philanthropy. I take those criticisms pretty seriously and want to come up with a Windfall Clause distribution mechanism that manages those better than current philanthropy. It’s a big task in itself and one that needs to be taken pretty seriously.

Lucas Perry: Is the windfall function synonymous with the windfall distribution mechanism?

Cullen O'Keefe: No. So, the windfall function is the mathematical function that determines how much money signatories to the Windfall Clause are obligated to give.

Lucas Perry: So, the windfall function will be part of the windfall contract, and the windfall distribution mechanism is the vehicle or means or the institution by which that output of the function is distributed?

Cullen O’Keefe: Yeah. That’s exactly right. Again, I like to think of this as top of the funnel, bottom of the funnel. So the windfall function is kind of the top of the funnel. It defines how much money has to go in to the Windfall Clause system and then the bottom of the funnel is like the output, what actually gets done with the windfall, to advance the goals of the Windfall Clause.

Lucas Perry: Okay. And so here you have some desiderata for this function, in particular transparency, scale sensitivity, adequacy, pre-windfall commitment, incentive alignment and competitiveness. Are there any here that you want to comment on with regards to the windfall function.

Cullen O'Keefe: Sure. If you look at the windfall function, it looks kind of like a progressive tax system. You fall into some bracket, and the bracket that you're in determines the marginal percentage of money that you owe. So, in a normal income tax scheme, the bracket is determined by your gross income. In the Windfall Clause scheme, the bracket is determined by a slightly modified quantity, which is profits as a percentage of gross world product, which we started off talking about.

We went back and forth on a few different ways that this could look, but we ultimately decided upon a simpler windfall function that looks much like an income tax scheme, because we thought it was pretty transparent and easy to understand. And for a project as potentially important as the Windfall Clause, we thought it was pretty important that people be able to understand the contract that's being negotiated, not just the signatories.
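As a rough illustration of the income-tax-style structure Cullen describes, here is a minimal sketch of a bracket-based windfall function. The bracket boundaries and marginal rates below are hypothetical placeholders for illustration, not the figures proposed in the report.

```python
# Hypothetical brackets: (lower share of GWP, upper share of GWP, marginal rate).
# These numbers are placeholders for illustration, not the report's schedule.
BRACKETS = [
    (0.00, 0.01, 0.0),   # profits up to 1% of GWP: no obligation
    (0.01, 0.10, 0.2),   # marginal 20% on the slice between 1% and 10% of GWP
    (0.10, 1.00, 0.5),   # marginal 50% on the slice beyond 10% of GWP
]

def windfall_obligation(profits: float, gwp: float) -> float:
    """Return the donation owed under this hypothetical bracket schedule."""
    share = profits / gwp                     # profits as a fraction of GWP
    owed_share = 0.0
    for lower, upper, marginal_rate in BRACKETS:
        if share <= lower:
            break
        slice_in_bracket = min(share, upper) - lower
        owed_share += slice_in_bracket * marginal_rate
    return owed_share * gwp                   # convert the owed share back to dollars

# Example: profits equal to 5% of an $85T GWP owe 20% of the 1%-to-5% slice,
# i.e. 0.8% of GWP, roughly $680 billion.
print(windfall_obligation(0.05 * 85e12, 85e12))
```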

Lucas Perry: Okay. And you're bringing up this point about taxes. One thing that someone might ask is, "Why do we need a whole Windfall Clause when we could just have some kind of tax on benefits accrued from AI?" But the very important feature to be mindful of here, about the Windfall Clause, is that it does something that taxation cannot do, which is redistribute funding from tech-heavy first world countries to people around the world, rather than just to the government of the country able to tax them. So that also seems to be a very important consideration for why the Windfall Clause matters, rather than just some new tax scheme.

Cullen O'Keefe: Yeah. Absolutely. And in talking to people about the Windfall Clause, this is one of the top concerns that comes up, so you're right to emphasize it. I agree that the potential for international distribution is one of the main reasons that I personally am more excited about the Windfall Clause than standard corporate taxation. Other reasons are just that it seems more tractable to negotiate this individually with firms; the number of firms potentially in a position to develop advanced AI is pretty small now and might continue to be small for the foreseeable future. So the number of entities that you have to persuade to agree to this might be pretty small.

There's also the possibility, which we mention but don't propose an exact mechanism for in the paper, of allowing taxation to supersede the Windfall Clause. So, if a government came up with a better taxation scheme, you might either release the signatories from the Windfall Clause or just have the windfall function compensate for that by reducing or eliminating the total obligation. Of course, it gets tricky, because then you would have to decide which types of taxes you would do that for, if you want to maintain the international motivations of the Windfall Clause. And you would also have to figure out what the optimal tax rate is, which is obviously no small task. So those are definitely complicated questions, but at least in theory, there's the possibility of accommodating those sorts of ex-post taxation efforts in a way that doesn't burden firms too much.

Lucas Perry: Do you have any more insights, or positives or negatives, to comment on here about the windfall function? It seems like in the paper it is, as you mention, open for a lot more research. Do you have directions for further investigation of the windfall function?

Cullen O'Keefe: Yeah. It's one of the things that we lead out with, and it's actually as you're saying: this is primarily supposed to be illustrative and not the right windfall function. I'd be very surprised if this were ultimately the right way to do it, just because the space of possibilities here is so big and we've explored so little of it. One of the ideas that I am particularly excited about, and that I think more and more might ultimately be the right thing to do, is, instead of having a profits-based trigger for the windfall function, having a market-cap-based trigger. And there are some basic accounting reasons why I'm more excited about this. Tracking profits is not as straightforward as it seems, because firms can do stuff with their money: they can spend more of it and reallocate it in certain ways. Whereas it's much harder, and they have less incentive, to manipulate their stock price or market capitalization downward. So I'd be interested in potentially coming up with more value-based approaches to the windfall function, rather than our current one, which is based on profits.

That said, there are a ton of other variables that you could tweak here, and I would be very excited to work with people or see other proposals of what this could look like.
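
As a rough illustration of the kind of profit-based windfall function being discussed, here is a minimal sketch with increasing marginal rates keyed to a firm’s profits as a share of gross world product. The brackets, rates, and gross world product figure are hypothetical placeholders, not the values proposed in the report.

```python
# Illustrative sketch only: a progressive, profit-based windfall function.
# The brackets and rates below are hypothetical placeholders, not the
# figures from the Windfall Clause report.

GROSS_WORLD_PRODUCT = 100e12  # roughly $100 trillion, order-of-magnitude only

# (lower bound, upper bound, marginal rate), bounds as fractions of GWP
BRACKETS = [
    (0.01, 0.02, 0.20),  # profits between 1% and 2% of GWP: 20% marginal rate
    (0.02, 0.05, 0.40),  # profits between 2% and 5% of GWP: 40% marginal rate
    (0.05, 1.00, 0.60),  # profits above 5% of GWP: 60% marginal rate
]

def windfall_obligation(annual_profits: float, gwp: float = GROSS_WORLD_PRODUCT) -> float:
    """Amount owed under this toy windfall function."""
    share = annual_profits / gwp
    owed = 0.0
    for lower, upper, rate in BRACKETS:
        if share > lower:
            taxable_share = min(share, upper) - lower
            owed += taxable_share * gwp * rate
    return owed

# Example: $3 trillion in annual profits is 3% of GWP, so the firm owes
# 20% on the 1%-2% slice plus 40% on the 2%-3% slice = $600 billion.
print(windfall_obligation(3e12))  # 6e11
```

A market cap based trigger, as Cullen suggests, would presumably swap the profit measure for a valuation measure such as market capitalization, with a similar bracketed structure.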

Lucas Perry: All right. So this is an open question about how the windfall function will exactly look. Can you provide any more clarity on the mechanism for distribution, keeping in mind here the difficulty of creating an effective way of distributing the windfall, which you list as the issues of effectiveness, accountability, legitimacy and firm buy-in?

Cullen O’Keefe: One concrete idea that I actually worked closely with FLI on, specifically with Anthony Aguirre and Jared Brown, was the windfall trust idea, which is basically to create a trust, or kind of pseudo-trust, that makes every person in the world, or as many people as we can reach, equal beneficiaries of a trust. The structure, which is on page 41 of the report if people are interested in seeing it, is pretty simple. The idea is that the successful developer would satisfy its obligations by paying money to a body called the Windfall Trust. For people who don’t know what a trust is, it’s a specific type of legal entity. And then all individuals would be either actual or potential beneficiaries of the Windfall Trust, and would receive equal funding flows from that. They could even receive equal input into how the trust is managed, depending on how the trust was set up.

Trusts are also exciting because they are very flexible mechanisms whose governance you can arrange in many different ways. And then to make this more manageable, since a single trust with eight billion beneficiaries obviously seems hard to manage, you could have a single trust for every 100,000 people, or whatever number you think is manageable. I’m kind of excited about that idea. I think it hits a lot of the desiderata pretty well and could be a way in which a lot of people could see benefit from the windfall.
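
For a sense of the scale involved, here is a back-of-the-envelope sketch of the tiered trust idea. Every number in it is made up for illustration; the report does not commit to any of these figures.

```python
# Back-of-the-envelope arithmetic for the tiered Windfall Trust structure.
# All numbers are hypothetical.
import math

world_population = 8_000_000_000
beneficiaries_per_trust = 100_000  # "or whatever number you think is manageable"
annual_distribution = 600e9        # hypothetical amount flowing to the trust this year

num_trusts = math.ceil(world_population / beneficiaries_per_trust)
per_trust = annual_distribution / num_trusts
per_person = annual_distribution / world_population

print(f"{num_trusts:,} constituent trusts")   # 80,000
print(f"${per_trust:,.0f} per trust")         # $7,500,000
print(f"${per_person:,.2f} per beneficiary")  # $75.00
```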

Lucas Perry: Are there any ways of creating proto-windfall clauses or proto-windfall trusts to sort of test the idea before transformative AI comes on the scene?

Cullen O’Keefe: I would be very excited to do that. I guess one thing I should say is that OpenAI, where I currently work, has a structure called a capped-profit structure, which is similar in many ways to the Windfall Clause. Our structure is such that profits above a certain cap on what can be returned to investors go to a non-profit, which is the OpenAI non-profit, which then has to use those funds for charitable purposes. But I would be very excited to see new companies, and potentially companies aligned with the mission of the FLI podcast, experiment with structures like this. In the fourth section of the report, we talk all about different precedents that exist already, and some of these have different features that are close to the Windfall Clause. And I’d be interested in someone putting all those together for their start-up or their company and making a kind of pseudo-windfall clause.

Lucas Perry: Let’s get into the legal permissibility of the Windfall Clause. Now, you said that this is actually one of the reasons why you first got into this: it got tabled because people were worried about the fiduciary responsibilities that companies would have. Let’s start by reflecting on whether or not this is legally permissible in America, and then think about China, because these are the two biggest AI players today.

Cullen O’Keefe: Yeah. There’s actually a slight wrinkle there that we might also have to talk about, the Cayman Islands. But we’ll get to that. I guess one interesting fact about the Windfall Clause report is that it’s slightly weird that I’m the person that ended up writing this. You might think an economist should be the person writing this, since it deals so much with labor economics and inequality, etcetera, etcetera. And I’m not an economist by any means. The reason that I got swept up in this is because of the legal piece. So I’ll first give a quick crash course in corporate law, because I think it’s an area that not a lot of people understand and it’s also important for this.

Corporations are legal entities. They are managed by a board of directors for the benefit of the shareholders, who are the owners of the firm. And accordingly, since the directors have the responsibility of managing a thing which is owned in part by other people, they owe certain duties to the shareholders. These are known as fiduciary duties. The two primary ones are the duty of loyalty and the duty of care. The duty of loyalty, which we don’t really talk about a ton in this piece, is just the duty to manage the corporation for the benefit of the corporation itself, and not for the personal gain of the directors.

The duty of care is kind of what it sounds like: just the duty to take adequate care that the decisions made for the corporation by the board of directors will benefit the corporation. The reason that this is important for the purposes of a Windfall Clause, and also for the endless speculation of corporate law professors and theorists, is that when you engage in corporate philanthropy, it kind of looks like you’re doing something that is not for the benefit of the corporation. By definition, giving money to charity is primarily a philanthropic act, or at least that’s kind of the prima facie case for why it might be a problem from the standpoint of corporate law. Because this is other people’s money largely, and the corporation is giving it away, seemingly not for the benefit of the corporation itself.

There actually hasn’t been that much case law, that is, actual court decisions, on this issue. I found some of them across the US. As a side note, we primarily talk about Delaware law, because Delaware is the state in which a plurality of American corporations are incorporated, for historical reasons. Its corporate law is by far the most influential in the United States. So, even though you have this potential duty of care issue with making corporate donations, the standard by which directors are judged is the business judgment rule. Quoting from the American Law Institute, a summary of the business judgment rule is, “A director or officer who makes a business judgment in good faith, fulfills the duty of care if the director or officer, one, is not interested,” that means there is no conflict of interest, “In the subject of the business judgment. Two, is informed with respect to the business judgment to the extent that the director or officer reasonably believes to be appropriate under the circumstances. And three, rationally believes that the business judgment is in the best interests of the corporation.” So this is actually a pretty forgiving standard. It’s basically just a “use your best judgment” standard, which is why it’s very hard for shareholders to successfully make a case that a judgment was a violation of the business judgment rule. It’s very rare for such challenges to actually succeed.

So a number of cases have examined the relationship of the business judgment rule to corporate philanthropy. They have basically universally held that this is a permissible exercise of the business judgment rule: there are all these potential benefits that philanthropy could give to the corporation, and therefore corporate directors’ decisions to authorize corporate donations would generally be upheld under the business judgment rule, provided all these other conditions are met.

Lucas Perry: So these firm motivations that we touched on earlier were generating goodwill towards the company, improving employee relations, and then reducing political risk, which I guess is also like keeping good faith with politicians who are, at the end of the day, hopefully being held accountable by their constituencies.

Cullen O’Keefe: Yeah, exactly. So these are all things that could plausibly, financially benefit the corporation in some form. So in this sense, corporate philanthropy looks less like a donation and more like an investment in the firm’s long term profitability, given all these soft factors like political support and employee relations. Another interesting wrinkle to this: if you read the case law of these corporate donation cases, they’re actually quite funny. The one case I’ll quote from is Sullivan v. Hammer. A corporate director wanted to make a corporate donation to an art museum that had his name on it and basically served as his personal art collection, more or less. And the court kind of said, this is still okay under the business judgment rule. So, that was a pretty shocking example of how lenient this standard is.

Lucas Perry: So the synopsis version here is that the Windfall Clause is permissible in the United States, because philanthropy in the past has been seen as still being in line with fiduciary duties, and the Windfall Clause would do the same.

Cullen O’Keefe: Yeah, exactly. The one interesting wrinkle about the Windfall Clause that might distinguish it from most corporate philanthropy, though definitely not all, is that it has this potentially very high ex-post cost, even though its ex-ante cost might be quite low. So in a situation in which a firm actually has to pay out the Windfall Clause, it’s very, very costly to the firm. But the business judgment rule is actually supposed to protect these exact types of decisions, because the thing that courts don’t want to do is second-guess every single corporate decision with the benefit of hindsight. So instead, they just instruct people to look at the ex-ante cost benefit analysis, and defer to that, even if ex-post it turns out to have been a bad decision.

There’s an analogy that we draw to stock option compensation, which is very popular, where you give an employee a block of stock options that at the time is not very valuable, because it’s probably just in line with the current value of the stock, but ex-post might be hugely valuable. This is how a lot of early employees of companies get wildly rich, well beyond what they would have earned at fair market cash value ex-ante. That sort of ex-ante reasoning is really the important thing, not the fact that it could be worth a lot ex-post.

One of the interesting things about the Windfall Clause is that it is a contract through time, and potentially over a long time. A lot of contracts that we make are pretty short-term focused. But the Windfall Clause is an agreement now to do stuff if stuff happens in the future, potentially in the distant future, which is part of the way the windfall function is designed. It’s designed to be relevant over a long period of time, especially given the uncertainty that we started off talking about with AI timelines. The important thing that we talked about was the ex-ante cost, which means the cost to the firm in expected value right now. Which is basically the probability that this ever gets triggered, and if it does get triggered, how much it will be worth, all discounted by the time value of money, etcetera.
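
To make the ex-ante versus ex-post distinction concrete, here is a toy expected-value calculation. Every number in it is invented for illustration and is not drawn from the report.

```python
# Toy illustration of ex-ante expected cost vs. ex-post cost.
# All figures are hypothetical.

probability_of_windfall = 0.01   # chance the firm ever triggers the clause
payout_if_triggered = 500e9      # $500 billion owed ex-post in that scenario
years_until_trigger = 30
discount_rate = 0.05             # annual time value of money

discount_factor = 1 / (1 + discount_rate) ** years_until_trigger
ex_ante_expected_cost = probability_of_windfall * payout_if_triggered * discount_factor

print(f"Ex-post cost if triggered:   ${payout_if_triggered:,.0f}")
print(f"Ex-ante expected cost today: ${ex_ante_expected_cost:,.0f}")
# With these numbers the ex-ante cost is roughly $1.2 billion, a tiny
# fraction of the $500 billion ex-post cost. That asymmetry is what makes
# the clause cheap to sign today even though it could be costly later.
```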

One thing that I didn’t talk about is that there’s some language in some court cases about limiting the amount of permissible corporate philanthropy to a reasonable amount, which is obviously not a very helpful guide. But there’s a court case saying that this should be determined by looking to the charitable giving deduction, which I believe is about 10% right now.

Lucas Perry: So sorry, just to get the language correct: it’s that the ex-post cost is very high, because after the fact you have to pay huge percentages of your profit?

Cullen O’Keefe: Yeah.

Lucas Perry: But it still remains feasible that a court might say that this violates fiduciary responsibilities, right?

Cullen O’Keefe: There’s always the possibility that a Delaware court would invent or apply new doctrine to this thing, which looks kind of weird from their perspective. I mean, this is a general question of how binding precedent is, which is an endless topic of conversation for lawyers. But if they were doing what I think they should do and just straight up applying precedent, I don’t see a particular reason why this would be decided differently than any of the other corporate philanthropy cases.

Lucas Perry: Okay. So, let’s talk a little bit now about the Cayman Islands and China.

Cullen O’Keefe: Yeah. So a number of significant Chinese tech companies are actually incorporated in the Cayman Islands. It’s not exactly clear to me why this is the case, but it is.

Lucas Perry: Isn’t it for hiding money off-shore?

Cullen O’Keefe: So I’m not sure if that’s why. I think even if taxation is a part of that, I think it also has to do with capital restrictions in China, and also they want to attract foreign investors which is hard if they’re incorporated in China. Investors might not trust Chinese corporate law very much. This is just my speculation right now, I don’t actually know the answer to that.

Lucas Perry: I guess the question then just is, what is the US and China relationship with the Cayman Islands? What is it used for? And then is the Windfall Clause permissible in China?

Cullen O’Keefe: Right. So, the Cayman Islands is where the big three Chinese tech firms, Alibaba, Baidu and Tencent, are incorporated. I’m not a Caymanian lawyer by any means, nor am I an expert in Chinese law, but from my outsider reading of the law, applying my general legal knowledge, it appears that similar principles of corporate law apply in the Cayman Islands, which is why it might be a popular spot for incorporation. They have a rule that looks like the business judgment rule. This is in footnote 120 if anyone wants to dig into it in the report. So, for the Caymanian corporations, it looks like it should be okay for the same reason. China, being a self-proclaimed socialist country, also has a pretty interesting corporate law that not only allows but appears to encourage firms to engage in corporate philanthropy. From the perspective of their law, at least, it looks potentially more friendly than even Delaware law, so it should, a fortiori, be permissible there.

That said, obviously there are political realities to be considered there, especially the influence of the Chinese government on state-owned enterprises, so I don’t want to be naïve and just assume that what the law says is what is actually politically feasible there. But all that caveating aside, as far as the law goes, the People’s Republic of China looks potentially promising for a Windfall Clause.

Lucas Perry: And that again matters, because China is currently second to the US in AI and thus is also potentially able to reach windfall via transformative AI in the future.

Cullen O’Keefe: Yeah. I think that’s the general consensus, that after the United States, China seems to be the most likely place to develop AGI or transformative AI. You can listen to and read a lot of the work by my colleague Jeff Ding on this, who recently appeared on the 80,000 Hours podcast talking about China’s AI dream, and has a report by the same name, from FHI, that I would highly encourage everyone to read.

Lucas Perry: All right. Is it useful here to talk about historical precedents?

Cullen O’Keefe: Sure. I think one that’s potentially interesting is that a lot of sovereign nations have actually dealt with this problem of windfall governance before, typically natural resource rich states. Norway is kind of the leading example of this. They had a ton of wealth from oil and had to come up with a way of distributing that wealth in a fair way. They have a sovereign wealth fund as a result, as do a lot of countries, and it provides for all sorts of socially beneficial applications.

Google, when it IPO’d, actually gave one percent of its equity to its non-profit arm, the Google Foundation. So that’s actually significantly like the Windfall Clause, in the sense that it gave a commitment that would grow in value as the firm’s prospects improved, and therefore had a low ex-ante cost but a potentially higher ex-post cost. Obviously, in personal philanthropy, a lot of people will be familiar with pledges like the Founders Pledge or the Giving What We Can Pledge, where people pledge a percentage of their personal income to charity. The Founders Pledge most resembles the Windfall Clause in this respect: people pledge a percentage of equity from their company upon exit or upon liquidity events, and in that sense, it looks a lot like a Windfall Clause.

Lucas Perry: All right. So let’s get into objections, alternatives and limitations here. The first objection to the Windfall Clause would be that the Windfall Clause will never be triggered.

Cullen O’Keefe: That certainly might be true. There’s a lot of reasons why that might be true. So, one is that we could all just be very wrong about the promise of AI. Also AI development could unfold in some other ways. So it could be a non-profit or an academic institution or a government that develops windfall generating AI and no one else does. Or it could just be that the windfall from AI is spread out sufficiently over a large number of firms, such that no one firm earns windfall, but collectively the tech industry does or something. So, that’s all certainly true. I think that those are all scenarios worth investing in addressing. You could potentially modify the Windfall Clause to address some of those scenarios.

That said, I think there’s a significant, non-trivial possibility that such a windfall occurs in a way that would trigger a Windfall Clause, and if it does, it seems worth investing in solutions that could mitigate any potential downside to that or share the benefits equally. Part of the benefit of the Windfall Clause is that if nothing happens, it doesn’t impose any obligations. So, it’s quite low cost in that sense. From a philanthropic perspective, there’s a cost in setting this up and promoting the idea, etcetera, and those are definitely non-trivial costs. But the actual cost of signing the clause only manifests upon actually triggering it.

Lucas Perry: This next one is that firms will find a way to circumvent their commitments under the clause. So it could never trigger because they could just keep moving money around in skillful ways such that the clause never ends up getting triggered. Some sub-points here are that firms will evade the clause by nominally assigning profits to subsidiary, parent or sibling corporations; that firms will evade the clause by paying out profits in dividends; and that firms will sell all windfall generating AI assets to a firm that is not bound by the clause. Any thoughts on these?

Cullen O’Keefe: First of all, a lot of these were raised by early commentators on the idea, and so I’m very thankful to those people for helping raise this. I think we probably haven’t exhausted the list of potential ways in which firms could evade their commitments, so in general I would want to come up with solutions that are not just patchwork solutions, but more like general incentive alignment solutions. That said, I think most of these problems are mitigable by careful contractual drafting, and then potentially also by switching to other forms of the Windfall Clause, like something based on firm share price. But still, I think there are probably a lot of ways to circumvent the clause in the kind of early form that we’ve proposed. And we would want to make sure that we’re pretty careful about drafting it and simulating potential ways that a signatory could try to wriggle out of its commitments.

Cullen O’Keefe: I think it’s also worth noting that a lot of those potential actions would be pretty clear violations of general legal obligations that signatories to a contract have. Or could be mitigated with pretty easy contractual clauses.

Lucas Perry: Right. The solution to these would be foreseeing them and beefing up the actual windfall contract to not allow for these methods of circumvention.

Cullen O’Keefe: Yeah.

Lucas Perry: So now this next one I think is quite interesting. No firm with a realistic chance of developing windfall generating AI would sign the clause. How would you respond to that?

Cullen O’Keefe: I mean, I think that’s certainly a possibility, and if that’s the case, then that’s the case. It seems like our ability to change that might be pretty limited. I would hope that most firms in the potential position to be generating windfall, would take that opportunity as also carrying with it responsibility to follow the common good principle. And I think that a lot of people in those companies, both in leadership and the rank and file employee positions, do take that seriously. We do also think that the Windfall Clause could bring non-trivial benefits as we spent a lot of time talking about.

Lucas Perry: All right. The next one here is that, quote, “If the public benefits of the Windfall Clause are supposed to be large, that is inconsistent with stating that the cost to firms will be small enough that they would be willing to sign the clause.” This has a lot to do with the distinction between ex-ante and ex-post costs, and also with how probabilities and time are involved here. So, your response to this objection?

Cullen O’Keefe: I think there are some asymmetries between the costs and benefits. Some of the costs are things that would happen in the future. So from a firm’s perspective, they should probably discount the costs of the Windfall Clause, because if they earn windfall, it would be in the future. From a public policy perspective, a lot of those benefits might not be as time sensitive. So you might not care that much about when exactly those costs happen, and therefore not really discount them from a present value standpoint.

Lucas Perry: You also probably wouldn’t want to live in the world in which there was no distribution mechanism or windfall function for allocating the windfall profits from one of your competitors.

Cullen O’Keefe: That’s an interesting question though, because a lot of corporate law principles suggest that firms should want to behave in a risk neutral sense, and then allow investors to kind of spread their bets according to their own risk tolerances. So, I’m not sure that this risk-spreading-between-firms argument works that well.

Lucas Perry: I see. Okay. The next is that the Windfall Clause reduces incentives to innovate.

Cullen O’Keefe: So, I think it’s definitely true that it will probably have some effect on the incentive to innovate. That almost seems kind of necessary. That said, I think people in our community are of the opinion that there are significant externalities to innovation, and not all innovation towards AGI is strictly beneficial in that sense. So, making sure that those externalities are balanced seems important, and the Windfall Clause is one way to do that. In general, I think that the disincentive is probably just outweighed by the benefits of the Windfall Clause, but I would be open to reanalysis of that exact calculus.

Lucas Perry: Next objection is, the Windfall Clause will shift investment to competitive non-signatory firms.

Cullen O’Keefe: This was another particularly interesting comment, and it has a potentially perverse effect, actually. Suppose you have two types of firms: nice firms and less nice firms. And all the nice firms sign the Windfall Clause, and therefore their future profit streams are taxed more heavily than the bad firms’. And this is bad, because now investors will probably want to go to bad firms, because they offer potentially more attractive returns on investment. Like the previous objection, this is probably true to some extent. It kind of depends on the empirical case about how many firms you think are good and bad, and also on the exact calculus of how much this disincentivizes investors from giving to good firms versus how much it causes the good firms to act better.

We do talk a little bit about different ways in which you could potentially mitigate this with careful mechanism design. So you could have the Windfall Clause consist in subordinated obligations, such that the firm could raise equity or debt senior to the Windfall Clause and new investors would not be disadvantaged by investing in a firm that has signed it. Those are kind of complicated mechanisms, and again, this is another point where thinking through this from a very careful micro-economic point of view and modeling this type of development dynamic would be very valuable.

Lucas Perry: All right. So we’re starting to get to the end here of objections or at least objections in the paper. The next is, the Windfall Clause draws attention to signatories in an undesirable way.

Cullen O’Keefe: I think the motivation for this objection is something like: imagine that tomorrow Boeing came out and said, “If we build a Death Star, we’ll only use it for good.” What are you talking about, building a Death Star? Why do you even have to talk about this? I think that’s kind of the motivation: talking about earning windfall is itself drawing attention to the firm in potentially undesirable ways. So, that could potentially be the case. I guess the fact that we’re having this conversation suggests that this is not a super-taboo subject. I think a lot of people are generally aware of the promise of artificial intelligence. So the idea that the gains could be huge and concentrated in one firm doesn’t seem that worrying to me. Also, if a firm was super close to AGI or something, it would actually be much harder for them to sign on to the Windfall Clause, because the costs would be so great to them in expectation that they probably couldn’t justify it from a fiduciary duty standpoint.

So in that sense, signing on to the Windfall Clause, at least from a purely rational standpoint, is kind of negative evidence that a firm is close to AGI. That said, there are certainly psychological elements that complicate that. It’s very cheap for me to just make a commitment that says, oh sure, if I get a trillion dollars, I’ll give 75% of it to some charity. Sure, why not? I’ll make that commitment right now, in fact.

Lucas Perry: It’s kind of more efficacious if we get firms to adopt this sooner rather than later, because as time goes on, their credences in who will hit AI windfall will increase.

Cullen O’Keefe: Yeah. That’s exactly right. Assuming timelines are constant, the clock is ticking on stuff like this. Every year that goes by, committing to this gets more expensive to firms, and therefore rationally, less likely.

Lucas Perry: All right. I’m not sure that I understand this next one, but it is, the Windfall Clause will lead to moral licensing. What does that mean?

Cullen O’Keefe: So moral licensing is a psychological concept: if you do certain actions that either are good or appear to be good, you’re more likely to do bad things later. So you have a license to act immorally because of the times that you acted morally. This is a common objection to corporate philanthropy. People call this ethics washing, or green washing in the context of environmental issues specifically. I think you should, again, do pretty careful cost benefit analysis here to see whether the Windfall Clause is actually worth the potential licensing effect that it has. But of course, one could raise this objection to pretty much any pro-social act. Given that we think the Windfall Clause could actually have legally enforceable teeth, it seems less likely to be a problem, unless you think that the licensing effects would be so great that they’d overcome the benefits of actually having an enforceable Windfall Clause. That seems kind of intuitively implausible to me.

Lucas Perry: Here’s another interesting one. The rule of law might not hold if windfall profits are achieved. Human greed and power really kick in, and the power structures which are meant to enforce the rule of law are no longer able to, in relation to someone with AGI or superintelligence. How do you feel about this objection?

Cullen O’Keefe: I think it’s a very serious one. I think it’s something that perhaps the AI safety community should be investing more in. I’m also having an interesting asynchronous discussion on this with Rohin Shah on the EA Forum. I do think there’s a significant chance that an actor as potentially powerful as a corporation with AGI, and all the benefits that come with that at its disposal, could be such that it would be very hard to enforce the Windfall Clause against it. That said, I think we do see Davids beating Goliaths in the law. People do win lawsuits against the United States government or very large corporations. So it’s certainly not the case that size is everything, though it would be naïve to suppose that it’s not correlated with the probability of winning.

Other things to worry about are the fact that this corporation will have very powerful AI that could potentially influence the outcome of cases in some way, or perhaps hide ways in which it was evading the Windfall Clause. So, I think that’s worth taking seriously. I guess just in general, I think this issue is worth a lot of investment from the AI safety and AI policy communities, for reasons well beyond the Windfall Clause. And it seems like a problem that we’ll have to figure out how to address.

Lucas Perry: Yeah. That makes sense. You brought up the rule of law not holding up because of the power to win over court cases. But the kind of power that AGI would give would also potentially extend far beyond just winning court cases, right? As in the ability to simply not be bound by the law.

Cullen O’Keefe: Yeah. You could just act as a thug and be beyond the law, for sure.

Lucas Perry: It definitely seems like a neglected point, in terms of trying to have a good future with beneficial AI.

Cullen O’Keefe: I’m of the opinion that this is pretty important. It just seems like this is also a thing, in general, that you’re going to want in a post-AGI world. You want the actor with AGI to be accountable to something other than its own will.

Lucas Perry: Yeah.

Cullen O’Keefe: You want agreements you make before AGI to still have meaning post-AGI and not just depend on the beneficence of the person with AGI.

Lucas Perry: All right. So the last objection here is, the Windfall Clause undesirably leaves control of advanced AI in private hands.

Cullen O’Keefe: I’m somewhat sympathetic to the argument that AGI is just such an important technology that it ought to be governed in a pro-social way. Basically, this project doesn’t have a good solution to that, other than to the extent that you could use Windfall Clause funds to perhaps purchase shares of stock from the company, or have a commitment in shares of stock rather than in money. On the other hand, private companies are doing a lot of very important work right now in developing AI technologies and are kind of the current leading developers of advanced AI. It seems to me like they’re behaving pretty responsibly overall. I’m just not sure what the ultimate ideal arrangement of ownership of AI will look like, and I want to leave that open for other discussion.

Lucas Perry: All right. So we’ve hit on all of these objections. Surely there are more objections, but this gives a lot for listeners and others to consider and think about. So in terms of alternatives to the Windfall Clause, you list four things here: windfall profits should just be taxed; we should rely on anti-trust enforcement instead; we should establish a sovereign wealth fund for AI; and we should implement a universal basic income instead. So could you just go through each of these sequentially and give us some thoughts and analysis on your end?

Cullen O’Keefe: Yeah. We talked about taxes already, so is it okay if I just skip that?

Lucas Perry: Yeah. I’m happy to skip taxes. The point there being that they will end up only serving the country in which they are being taxed, unless that country has some other mechanism for distributing certain kinds of taxes to the world.

Cullen O’Keefe: Yeah. And it also just seems much more tractable right now to work on private commitments like the Windfall Clause, rather than lobbying for a pretty robust tax code.

Lucas Perry: Sure. Okay, so number two.

Cullen O’Keefe: So number two is about anti-trust enforcement. This was largely spurred by a conversation with Haydn Belfield. The idea here is that in this world, the AI developer will probably be a monopoly, or at least extremely powerful in its market, and therefore we should consider anti-trust enforcement against it. I guess my points are two-fold. Number one is that, just under American law, it is pretty clear that merely possessing monopoly power is not itself a reason to take anti-trust action. You have to have acquired that monopoly power in some illegal way. And if some of the stronger hypotheses about AI are right, AI could be a natural monopoly, and so it seems pretty plausible that an AI monopoly could develop without any illegal actions taken to gain that monopoly.

I guess second, the Windfall Clause addresses some of the harms from monopoly, though not all of them, by transferring some wealth from shareholders to everyone and therefore transferring some wealth from shareholders to consumers.

Lucas Perry: Okay. Could focusing on anti-trust enforcement alongside the Windfall Clause be beneficial?

Cullen O’Keefe: Yeah. It certainly could be. I don’t want to suggest that we ought not to consider anti-trust, especially if there’s a natural reason to break up firms or if there’s an actual violation of anti-trust law going on. I guess I’m pretty sympathetic to the anti-trust orthodoxy that monopoly is not in itself a reason to break up a firm. But I certainly think that we should continue to think about anti-trust as a potential response to these situations.

Lucas Perry: All right. And number three is we should establish a sovereign wealth fund for AI.

Cullen O’Keefe: So this is an idea that actually came out of FLI. Anthony Aguirre has been thinking about this. The idea is to set up something that looks like the sovereign wealth funds that I alluded to earlier, which places like Norway and other resource rich countries have, some better and some worse governed, I should say. And I think Anthony’s suggestion was to set this up as a fund that held shares of stock of the corporation, and redistributed wealth in that way. I am sympathetic to this idea overall. As I mentioned, I think a stock based Windfall Clause could potentially be an improvement over the cash based one that we suggest. That said, I think there are significant legal problems here that kind of make this harder to imagine working. For one thing, it’s hard to imagine the government buying up all these shares of stock in companies; just to acquire a significant enough portion of them that you have a good probability of capturing a decent percentage of future windfall, you would have to spend a ton of money.

Secondly, the government couldn’t simply expropriate the shares of stock; that would require just compensation under the US Constitution. Third, there are ways that a corporation can prevent someone from accumulating a huge share of its stock if it doesn’t want it to; the poison pill is the classic example. So if the firms didn’t want a sovereign automation fund to buy up significant shares of their stock, which they might not want since it might not govern in the best interest of other shareholders, they could just prevent it from acquiring a controlling stake. So all of those seem like pretty powerful reasons why contractual mechanisms might be preferable to that kind of sovereign automation fund.
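
For a rough sense of why buying up shares would be so expensive, here is the back-of-the-envelope arithmetic behind that first point. The figures are purely hypothetical.

```python
# Rough arithmetic behind the "you would have to spend a ton of money" point.
# All figures are hypothetical.

candidate_firms = 5              # firms you think might plausibly reach windfall
market_cap_per_firm = 1e12       # roughly $1 trillion each
target_share_of_windfall = 0.20  # fraction of future windfall you want to capture

# Capturing ~20% of a firm's future profits through equity means
# buying roughly 20% of its shares at today's market price.
cost_per_firm = target_share_of_windfall * market_cap_per_firm
total_cost = candidate_firms * cost_per_firm

print(f"${cost_per_firm:,.0f} per firm")  # $200,000,000,000
print(f"${total_cost:,.0f} in total")     # $1,000,000,000,000
```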

Lucas Perry: All right. And the last one here is, we should implement a universal basic income instead.

Cullen O’Keefe: Saving one of the most popular suggestions for last. This isn’t even really an alternative to the Windfall Clause, it’s just one way that the Windfall Clause could look. And ultimately, I think UBI is a really promising idea that’s been pretty well studied. It seems to be pretty effective, it’s obviously quite simple, and it has widespread appeal. And I would probably be pretty sympathetic to a Windfall Clause that ultimately implements a UBI. That said, I think there are some reasons that you might prefer other forms of windfall distribution. One is just that UBI doesn’t seem to target people particularly harmed by AI; for example, if we’re worried about a future with a lot of automation of jobs, UBI might not be the best way to compensate those people that are harmed.

Another is that it might not be the best vehicle for providing public goods, if you thought that that’s something the Windfall Clause should do. But I think it could be a very promising part of the Windfall Clause distribution mechanism.

Lucas Perry: All right. That makes sense. And so wrapping up here, are there any last thoughts you’d like to share with anyone particularly interested in the Windfall Clause or people in policy in government who may be listening or anyone who might find themselves at a leading technology company or AI lab?

Cullen O’Keefe: Yeah. I would encourage them to get in touch with me if they’d like; my email address is listed in the report. I think just in general, this is going to be a major challenge for society in the next century, or at least it could be. As I said, I think there’s substantial uncertainty about a lot of this, so I think there are a lot of potential opportunities to do research, not just in economics and law, but also in political science, in thinking about how we can govern the windfall that artificial intelligence brings in a way that’s universally beneficial. So I hope that other people will be interested in exploring that question. I’ll be working with the Partnership on AI to help think through this as well, and if you’re interested in those efforts and have expertise to contribute, I would very much appreciate people getting in touch, so they can get involved in that.

Lucas Perry: All right. Wonderful. Thank you, and thanks to everyone else who helped work on this paper. It’s very encouraging, and hopefully we’ll see widespread adoption and maybe even implementation of the Windfall Clause in our lifetime.

Cullen O’Keefe: I hope so too, thank you so much Lucas.

AI Alignment Podcast: On the Long-term Importance of Current AI Policy with Nicolas Moës and Jared Brown

 Topics discussed in this episode include:

  • The importance of current AI policy work for long-term AI risk
  • Where we currently stand in the process of forming AI policy
  • Why persons worried about existential risk should care about present day AI policy
  • AI and the global community
  • The rationality and irrationality around AI race narratives

Timestamps: 

0:00 Intro

4:58 Why it’s important to work on AI policy 

12:08 Our historical position in the process of AI policy

21:54 For long-termists and those concerned about AGI risk, how is AI policy today important and relevant? 

33:46 AI policy and shorter-term global catastrophic and existential risks

38:18 The Brussels and Sacramento effects

41:23 Why is racing on AI technology bad? 

48:45 The rationality of racing to AGI 

58:22 Where is AI policy currently?

 

We hope that you will continue to join in the conversations by following us or subscribing to our podcasts on Youtube, Spotify, SoundCloud, iTunes, Google Play, Stitcher, iHeartRadio, or your preferred podcast site/application. You can find all the AI Alignment Podcasts here.

You can listen to the podcast above or read the transcript below. 

Lucas Perry: Welcome to the AI Alignment Podcast. I’m Lucas Perry. Today’s episode is with Jared Brown and Nicolas Moës, two AI policy researchers and AI influencers who are both concerned with the long-term and existential risks associated with artificial general intelligence and superintelligence. For us at the Future of Life Institute, we’re particularly interested in mitigating threats from powerful AI that could lead to the extinction of life. One avenue of trying to address such threats could be through action in the space of AI policy. But just what can we do today to help ensure beneficial outcomes from AGI and superintelligence in the policy sphere? This podcast focuses on this question.

As for some key points to reflect on throughout the podcast, Nicolas Moës points out that engaging in AI policy today is important because: 1) Experience gained on short-term AI policy issues is important for being considered a relevant advisor on long-term AI policy issues coming up in the future. 2) There are very few people that care about AGI safety currently in government, politics or in policy communities. 3) There are opportunities to influence current AI policy decisions in order to provide fertile ground for future policy decisions or, better but rarer, to directly shape AGI safety policy today through evergreen texts. Future policy that is implemented is path dependent on the current policy that we implement today; what we do now is precedent setting. 4) There are opportunities today to develop a skillset useful for other policy issues and causes. 5) Few resources are being spent on this avenue for impact, so the current return on investment is quite good.

Finally, I’d like to reflect on the need to bridge the long-term and short-term partitioning of AI risk discourse. You might have heard this divide before: there are long-term risks from AI, like powerful AGI or superintelligence misaligned with human values causing the extinction of life, and then short-term risks like algorithmic bias and automation-induced disemployment. Bridging this divide means understanding the real and deep interdependencies and path dependencies between the technology and governance which we choose to develop today, and the world where AGI or superintelligence emerges.

For those not familiar with Jared Brown or Nicolas Moës, Nicolas is an economist by training focused on the impact of Artificial Intelligence on geopolitics, the economy and society. He is the Brussels-based representative of The Future Society. Passionate about global technological progress, Nicolas monitors global developments in the legislative framework surrounding AI. He completed his Master’s degree in Economics at the University of Oxford with a thesis on institutional engineering for resolving the tragedy of the commons in global contexts.

Jared is the Senior Advisor for Government Affairs at FLI, working to reduce global catastrophic and existential risk (GCR/x-risk) by influencing the U.S. policymaking process, especially as it relates to emerging technologies. He is also a Special Advisor for Government Affairs at the Global Catastrophic Risk Institute. He has spent his career working at the intersection of public policy, emergency management, and risk management, having previously served as an Analyst in Emergency Management and Homeland Security Policy at the U.S. Congressional Research Service and in homeland security at the U.S. Department of Transportation.

The Future of Life Institute is a non-profit and this podcast is funded and supported by listeners like you. So if you find what we do on this podcast to be important and beneficial, please consider supporting the podcast by donating at futureoflife.org/donate. These contributions make it possible for us to bring you conversations like these and to develop the podcast further. You can also follow us on your preferred listening platform by searching for us directly or following the links on the page for this podcast found in the description.

And with that, here is Jared Brown and Nicolas Moës on AI policy. 

I guess we can start off here, with developing the motivations around why it’s important for people to be considering AI policy. So, why is it important to be working on AI policy right now?

Nicolas Moës: It’s important right now because there has been an uptick in markets, right? So AI technologies are now embedded in many more products than ever before. Part of it is hype, but part of it is also having a real impact on profits and bottom line. So there is an impact on society that we have never seen before. For example, the way Facebook algorithms have affected recent history is something that has made the population and policy makers panic a bit.

And so quite naturally the policy window has opened. I think it’s also important to be working on it for people who would like to make the world better, for two reasons. As I mentioned, since the policy window is open, that means that there is a demand for advice to fill in the gaps that exist in the legislation, right? There have been many concrete situations where, as an AI policy researcher, you get asked to provide input, either by joining expert groups or workshops, or simply directly by people who say, “Oh, you know about AI, so could you just send me a position paper on this?”

Nicolas Moës: So these policies are getting written right now, which at first are quite soft and then become harder and harder policies, now to the point that, at least in the EU, you have regulations for AI on the agenda, which is one of the hardest forms of legislation out there. Once these are written it is very difficult to change them. It’s quite sticky. There is a lot of path dependency in legislation. So this first legislation that passes will probably shape the box in which future legislation can evolve. It constrains the trajectory of future policies, and therefore it’s really difficult to take future policies in another direction. So for people who are concerned about AGI, it’s important to be already present right now.

The second point is that the people who are currently interacting with policymakers on a daily basis are concerned about very specific things, and they are gaining a lot of experience with policymakers, so that in the future, when more general algorithms come into play, the people with experience to advise on these policies will actually be the ones concerned about what many people call short term issues. People who are concerned more about the safety and robustness of these more general algorithms would actually end up having a hard time getting into the room, right? You cannot just walk in and claim authority when you have people with 10, 15 or even 20 years of experience regulating this particular field of engineering.

Jared Brown: I think that sums it up great, and I would just add that there are some very specific examples of where we’re seeing what has largely been, up to this point, a set of principles being developed by different governments, or industry groups. We’re now seeing attempts to actually enact hard law or policy.

Just in the US, the Office of Management and Budget and the Office of Science and Technology Policy issued a memorandum calling for further AI regulation and non-regulatory actions, and they issued a set of principles that’s out for comment right now. People are looking at those principles, trying to see if there are ways of commenting on it to increase its long-term focus and its ability to adapt to increasingly powerful AI.

The OECD has already issued, and had sign-ons to, its AI principles, which are quite good.

Lucas Perry: What is the OECD?

Nicolas Moës: The Organization for Economic Cooperation and Development.

Jared Brown: Yes. Those principles are now going from principles to an observatory, and that will be launched by the end of February. And we’re seeing the effect of these principles now being adopted, and attempts now are being made to implement those into real regulatory approaches. So, the window from transitioning from principles to hard law is occurring right now, and as Nicholas said, decisions that are made now will have longterm effects because typically governments don’t turn their attention to issues more than once every five, maybe even 10 years. And so, if you come in three years from now with some brilliant idea about AI policy, chances are, the moment to enact that policy has already passed because the year prior, or two years prior, your government has enacted its formative legislation on AI.

Nicolas Moës: Yeah, yeah. So long as this policy benefits most people, they are very unlikely to even reopen, let’s say, the discussion, at all.

Lucas Perry: Right. So a few points here. The first is this one about path dependency, which means that the kinds of policies which we adopt now are going to be really important, because they’re going to inform and shape the kinds of policies that we’re able or willing to adopt later, and AI is going to be around for a long, long time. So we’re setting a lot of the foundation. The second thing was that if you care about AGI risk, or the risks of superintelligence, or very powerful forms of AI, you need to have been part of the conversation since the beginning, or else you’re not going to really be able to get a seat at the table when these things come around.

And Jared, is there a point here that I’m missing that you were trying to make?

Jared Brown: No, I think that sums it up nicely. The effect of these policies, and the ability of these policies to remain what you might call evergreen, that is, long lasting and adaptive to the changing nature of AI technology, is going to be critical. We see this all the time in tech policy. There are tech policies out there that were informed by the challenges of the time in which they were made, and they quickly become detrimental, or outdated at best. And then there are tech policies that tend to be more adaptive, and those stand the test of time. And we need to be willing to engage with the short term policy making considerations, such that we’re making sure that the policies are evergreen for AI, as it becomes increasingly powerful.

Nicolas Moës: Besides the evergreen aspects of the policies that you want to set up now, there’s this notion of providing fertile ground. So some policies that are very appropriate for short term issues, for example fairness, deception, and fundamental rights abuse and that kind of thing, are actually almost copy-pasted into future legislation. So, if you manage to already put concerns for safety, like robustness, corrigibility, and value alignment of the algorithm, in today, even if you don’t have any influence in 10 or 15 years when they review the legislation, you have some chance of seeing the policymakers just copy-pasting this part on safety and putting it in whatever new legislation comes up in 10 years.

Jared Brown: There’s precedent setting, and legislators are loath to make fundamental reforms to legislation, so if we see proper consideration of safety and security on AI in the evergreen pieces of legislation that are being developed now, that’s unlikely to be removed in future legislation.

Lucas Perry: Jared, you said that a lot of the principles and norms which have been articulated over say, the past five years are becoming codified into hard law slowly. It also would just be good if you guys could historically contextualize our position in terms of AI policy, whether or not we stand at an important inflection point, where we are in terms of this emerging technology.

Jared Brown: Sure, sure. So, I think if you went back just to 2017, 2016, at least in the US, there was very little attention to artificial intelligence. There were a smattering of congressional hearings being held, a few pertinent policy documents being released by executive agencies, but by and large, the term artificial intelligence remained in the science fiction realm of thinking.

Since that time, there’s been a massive amount of attention paid to artificial intelligence, such that in almost every Western democracy that I’m familiar with, it’s now part of the common discourse about technology policy. The phrase emerging tech is something that you see all over the place, regardless of the context, and there’s a real sensitivity by Western style democracy policymakers towards this idea that technology is shifting under our feet. There’s this thing called artificial intelligence, there’s this thing called synthetic biology, there’s other technologies linked into that — 5G and hypersonics are two other areas — where there’s a real understanding that something is changing, and we need to get our arms around it. Now, that has largely started with, in the past year, or year and a half, a slew of principles. There are at least 80 some odd sets of principles. FLI was one of the first to create a set of principles, along with many partners, and those are the Asilomar AI Principles.

Those principles you can see replicated and informing many sets of principles since then. We mentioned earlier, the OECD AI principles are probably the most substantive and important at this point, because they have the signature and backing of so many sovereign nation states, including the United States and most of the EU. Now that we have these core soft law principles, there’s an appetite for converting that into real hard law regulation or approaches to how AI will be managed in different governance systems.

What we’re seeing in the US is that there have been a few regulatory approaches already taken. For instance, rule making on the inclusion of AI algorithms in the housing market; there’s a vision, if you will, from the Department of Transportation about how to deal with autonomous vehicles; the FDA has approved products coming onto the market that involve AI and diagnostics in the healthcare industry, and so forth. We’re seeing initial policies being established, but what we haven’t yet seen in any real context is sort of a cross-sectoral, AI broadly-focused piece of legislation or regulation.

And that’s what’s currently being developed both in the EU and in the US. That type of legislation, which seems like a natural evolution from where we’re at with principles, into a comprehensive holistic approach to AI regulation and legislation, is now occurring. And that’s why this time is so critical for AI policy.

Lucas Perry: So you’re saying that a broader and more holistic view about AI regulation and what it means to have and regulate beneficial AI is developed before more specific policies are implemented, with regards to the military, or autonomous weapons, or healthcare, or nuclear command and control.

Jared Brown: So, typically, governments try, whether or not they succeed remains to be seen, to be more strategic in their approach. If there is a common element that’s affecting many different sectors of society, they try and at least strategically approach that issue, to think: what is common across all policy arenas, where AI is having an effect, and what can we do to legislate holistically about AI? And then as necessary, build sector specific policies on particular issues.

So clearly, you’re not going to see some massive piece of legislation that covers all the potential issues that has to do with autonomous vehicles, labor displacement, workforce training, et cetera. But you do want to have an overarching strategic plan for how you’re regulating, how you’re thinking about governing AI holistically. And that’s what’s occurring right now, is we have the principles, now we need to develop that cross-sectoral approach, so that we can then subsequently have consistent and informed policy on particular issue areas as they come up, and as they’re needed.

Lucas Perry: And that cross-sectoral approach would be something like: AI should be interpretable and robust and secure.

Jared Brown: That’s written in principles to a large degree. But now we’re seeing, what does that really mean? So in the EU they’re calling it the European Approach to AI, and they’re going to be coming out with a white paper, maybe by the time this podcast is released, and that will sort of be their initial official set of options and opinions about how AI can be dealt with holistically by the EU. In the US, they’re setting regulatory principles for individual regulatory agencies. These are principles that will apply to the FDA, or the Department of Transportation, or the Department of Commerce, or the Department of Defense, as they think about how they deal with the specific issues of AI in their arenas of governance. Making sure that baseline foundation is informed and is an evergreen document, so that it incorporates future considerations, or is at least adaptable to future technological development in AI is critically important.

Nicolas Moës: With regards to the EU in particular, the historical context is maybe a bit different. As you mentioned, right now they are discussing this white paper with many transversal policy instruments that would be put forward with this legislation. This is going to be negotiated over the next year. There are intentions to have the legislation at the EU level by the end of the current commission’s term, so that means within five years. Something that is quite interesting to explore is that in 2016 there was this parliamentary own-initiative dossier, which is something that does not have any binding power and just shows the opinion of the European Parliament, dealing with robotics and civil law. So, considering how civil law in Europe should be adjusted to robotics.

That was in 2016, right? And now there’s been this uptick in activity. This is something that we have to be aware of. It’s moved quite fast, but then again, there are still a couple of years before regulations get approved. This is one point I wanted to clarify: when we say it is fast or it is slow, we are still talking about a couple of years. And when you know how long it takes to develop your network, to develop your understanding of the issues, and to try to influence the issues, a couple of years is really way too short. The second point I wanted to make is: what will the policy landscape look like in two years? Will we have the EU again leveraging its huge market power to impose the regulations coming out of the European Commission? There are some intentions to diffuse whatever regulations come out of the European Commission right now throughout the world, right? To form a sort of sphere of influence, where all the AI produced, even abroad, would actually fit EU standards.

Over the past two or three years there has been a mushrooming of AI policy players, right? The ITU has set up this AI For Good, and has reoriented its position towards AI. There has been the Global Forum on AI for Humanity, and political AI summits, which kind of pace the discussions about the global governance of artificial intelligence.

But will there be space for new players in the future? That’s something I’m a bit unsure about. One of the reasons why it might be an inflection point, as you asked, is because now I think the pawns are set on the board, right? And it is unlikely that somebody could come in and just disturb everything. I don’t know how it plays out in Washington, but in Brussels it seems very much like everybody knows each other already, and it’s only about bargaining with each other, not especially listening to outside views.

Jared Brown: So, I think the policy environment is being set. I wouldn’t quite go so far as to say all of the pawns are on the chess board, but I think many of them are. The queen is certainly industry, and industry has stood up and taken notice that governments want to regulate and want to be proactive about their approach to artificial intelligence. And you’ve seen this, because you can open up your daily newspaper pretty much anywhere in the world and see some headline about some CEO of some powerful tech company mentioning AI in the same breath as government, and government action or government regulations.

Industry is certainly aware of the attention that AI is getting, and they are positioning themselves to influence that as much as possible. And so civil society groups such as the ones Nico and I represent have to step up. Which is not to say the industry has all bad ideas; some of what they’re proposing is quite good. But the regulatory nature of future technologies is not a domain that should be controlled exclusively by industry opinion.

Lucas Perry: All right. I’d like to pivot here, more into some of the views and motivations the Future of Life Institute and the Future Society take when looking at AI policy. The question in particular that I’d like to explore is: how is current AI policy important for those concerned with AGI risk and long-term considerations about artificial intelligence growing into powerful generality, and then one day surpassing human beings in intelligence? For those interested in the issue of AGI risk or superintelligence risk, is AI policy today important? Why might it be important? What can we do to help shape or inform the outcomes related to this?

Nicolas Moës: I mean, obviously, I’m working full time on this, and if I could, I would work double full time on this. So I do think it’s important. But it’s still too early to be talking about this in the policy rooms, at least in Brussels, even though we have identified a couple of policymakers who would be keen to talk about that. It’s politically not feasible to put forward these kinds of discussions. However, AI policy currently is important because there is a demand for advice, for policy research, for concrete recommendations about how to govern this technological transition that we are experiencing.

So there is this demand, where people who are concerned about fundamental rights, and safety, and robustness, civil society groups, but also academics and industry themselves, sometimes come in with clear recommendations about how you should concretely regulate, or govern, or otherwise influence the development and deployment of AI technologies. And if, in that set of people, you have people who are concerned about safety, you are then able to provide advice for evergreen policies, as we’ve mentioned earlier, and to set up, let’s say, a fertile ground for better policies in the future as well.

The second part of why it’s important right now is also long-term workforce management. If people who are concerned about AGI safety are not in the room right now, or if they are in the room but focused only on AGI safety, they might be perceived as irrelevant by current policymakers, and therefore they might have restricted access to opportunities for gaining experience in that field. And therefore, over the long term, this dynamic reduces the growth rate, let’s say, of the workforce that is concerned about AGI safety and that could be identified as relevant advisors in the future. Because AI is a general purpose technology, even short-term issues in AI policy have a long-term impact on the whole of society.

Jared Brown: Both Nicolas and I have used this term “path dependency,” which you’ll hear a lot in our community, and I think it really helps to build out that metaphor. Various members of the audience of this podcast are going to have different timelines in their heads when they think about when AGI might occur, who’s going to develop it, what the characteristics of that system will be, how likely it is that it will be unaligned, and so on and so forth. I’m not here to engage in that debate, but I would encourage everyone to literally think about whatever timeline you have in your head, or whatever descriptions you have for the characteristics that are most likely to hold when AGI occurs.

You have a vision of that future environment, and clearly you can imagine different environments in which humanity is more likely to be able to manage that challenge than in others. An obvious example: if the world were engaged in World War Three 30 years from now, and some company develops AGI, that’s not good. It’s not a good world for AGI to be developed in if we’re engaged in World War Three at the same time. I’m not suggesting we’re doing anything to mitigate World War Three, but there are different environments in which AGI can occur that will make it more or less likely that we will have a beneficial outcome from the development of this technology.

We’re literally on a path towards that future. More government funding for AI safety research is a good thing. That’s a decision that has to get made, that’s made every single day, in governments all across the world. Governments have R&D budgets. How much is that being spent on AI safety versus AI capability development? If you would like to see more, then that decision is being made every single fiscal year of every single government that has an R&D budget. And what you can do to influence it is really up to you and how many resources you’re going to put into it.

Lucas Perry: Many of the ways it seems that AI policy currently is important for AGI existential risk are indirect. Perhaps it’s direct insofar as there are these foundational evergreen documents, and maybe changing our trajectory in that way is a kind of direct intervention.

Jared Brown: Think about how much nuclear policy changed, how our governance of nuclear weapons changed, because the US initially decided to use the weapon. That decision irrevocably changed the future of nuclear weapons policy, and there is no way you can counterfactually unspool all of the various ways the initial use of the weapon, not once but twice, by the US sent a signal to the world: that the US was willing to use this weapon, and that the power of that weapon was on full display.

There are going to be junctures in the trajectory of AI policy that are going to be potentially as fundamental as whether or not the US should use a nuclear weapon at Hiroshima. Those decisions are going to be hard to see right now if you’re not in the room and you’re not thinking about the way that policy is going to project into the future. That’s where this matters. You can’t unspool and rerun history. Take lethal autonomous weapons policy, for instance. There is a possible world, a future scenario 30 years from now, where international governance has never been established on lethal autonomous weapons, and where it is completely the norm for militaries to use them indiscriminately or without proper safety at all. And then there’s a world where they’ve been completely banned. Those two conditions will have a serious effect on the likelihood that governments are up to the challenge of addressing potential global catastrophic and existential risk arising from unaligned AGI. And so it’s more than just setting a path. It’s central to the capacity building of our future to deal with these challenges.

Nicolas Moës: Regarding other existential risks, I mean, Jared is more of an expert on that than I am. In the EU, because this topic is so hot, AI is much more promising, let’s say, as an avenue for impact than other policy dossiers, because we don’t have the omnibus type of legislation that you have in the US. The EU remains quite topic-by-topic. In the end, there is very little power embedded in the EU; mostly it depends on the nation states as well, right?

So AI policy moves at the EU level, which makes you want to work at the EU level on AI policy, for sure. But the other issues sometimes still remain at the national level. That being said, the EU also has this particularity, let’s say, of being able to reshape debates at the national level. So, if people were to consider what the best approaches are to reduce existential risk in general via EU policy, I’m sure there would be a couple of dossiers right now, with policy windows open, that could be a conduit for impact.

Jared Brown: If the community of folks who are concerned about the development of AGI is correct, and it may pose a potentially global catastrophic and existential threat to society, then you’re necessarily admitting that AGI is also going to affect society extremely broadly. It’s going to be akin to an industrial revolution, as is often said. And that’s going to permeate every which way in society.

And there’s been some great work to scope this out. For instance, in the nuclear sphere, I would recommend that the audience take a look at a recent edited compendium of papers by the Stockholm International Peace Research Institute; it’s a fantastic collection about AI’s effect on strategic stability and nuclear risk. That type of sector-specific analysis can be done with synthetic biology and various other things that people are concerned about as evolving into existential or global catastrophic risks.

And then there are current concerns with non-anthropogenic risks. AI is going to be tremendously helpful, if used correctly, to track and monitor near-Earth objects. You have to be concerned about asteroid impacts, and AI is a great tool that can help reduce that risk by monitoring and tracking near-Earth objects.

We may yet make tremendous discoveries in geology to deal with supervolcanoes. Just recently there’s been some great coverage of an AI company called Blue Dot for monitoring potential pandemics, as with the coronavirus. We see these applications of AI very beneficially reducing other global catastrophic and existential risks, but there are aggravating factors as well, especially for other anthropogenic concerns related to nuclear risk and synthetic biology.

Nicolas Moës: Some people who are concerned about AGI sometimes might see AI as overall negative in expectation, but a lot of policymakers see AI as an opportunity more than as a risk, right? So, starting with a negative or pessimistic narrative is difficult in the current landscape.

In Europe it might be a bit easier, because for odd historical reasons it tends to be a bit more cautious about technology and more proactive about regulation than maybe anywhere else in the world. I’m not saying whether that’s a good thing or a bad thing; I think there are advantages and disadvantages. It’s important to know, though, that even in Europe you still have people who are anti-regulation. The European Commission set up this independent high-level expert group on AI, with 52 or 54 experts on AI, to decide on the ethical principles that will inform the legislation on AI. That was over the past year and a half, or the past two years even. Among them, the divisions are really important. Some of them wanted to just leave it to self-regulation, because even issues of fairness or safety would eventually be detected by society and addressed when they arise. And it’s important to mention that in the Commission, even though the current white paper seems to be more on the side of preventive or proactive regulation, the commissioner for digital, Thierry Breton, is definitely cautious about the approach he takes. But you can see that he is quite positive about the potential of the technology.

The important thing here as well is that these players have an influential role to play on policy, right? So, going back to this negative narrative about AGI: it’s also a question of how you communicate and how you ultimately influence the policy debate, given the current preferences and opinions of people in society as a whole, not only the opinions of experts. If it were only about experts, it would maybe be different, but this is politics, right? The opinion of everybody matters, and it’s important that whatever influence you want to have on AI policy is compatible with the rest of society’s opinion.

Lucas Perry: So, I’m curious to know more about the extent to which the AI policy sphere is mindful of and exploring the shorter-term global catastrophic or maybe even existential risks that arise from the interplay of near-term artificial intelligence with other kinds of technologies. Jared mentioned a few: synthetic biology, global pandemics, autonomous weapons, and AI being implemented in the military and in early warning detection systems. So, I’m curious to know more about the extent to which there are considerations and discussions around the interplay of shorter-term AI risks with actual global catastrophic and existential risks.

Jared Brown: So, there’s this general understanding, which I think most people accept, that AI is not magical. It is open to manipulation, and it has certain inherent flaws in its current capabilities and constructs. We need to make sure that that is fully embraced as we consider different applications of AI in systems like nuclear command and control. At a certain point in time, the argument could be sound that AI is a better decision maker than your average set of humans in a command and control structure. There’s no shortage of instances of near misses with nuclear war based on existing sensor arrays, and the humans behind those sensor arrays, in nuclear command and control. But we have to be making those evaluations fully informed about the true limitations of AI, and that’s where the community is really important. We have to cut through the hype and cut through overselling what AI is capable of, and be brutally honest about the current limitations of AI as it evolves, and whether or not it makes sense from a risk perspective to integrate AI in certain ways.

Nicolas Moës: There have been human mistakes that have led to close calls, but I believe these close calls have been corrected because of another human in the loop. In early warning systems, though, you might actually end up with no human in the loop. I mean, again, we cannot really say whether these humans in the loop were statistically important, because we obviously don’t have the alternatives to compare against.

Another thing, regarding whether some people think that AI is magic: I think I would be a bit more cynical. I still find myself in some workshops or policy conferences where you have people who apparently have never seen a line of code in their entire life and still believe that if you tell the developer “make sure your AI is explainable,” the AI will magically become explainable. This is still quite common in Brussels, I’m afraid. But there is a lot of heterogeneity. Among the 705 MEPs, I think there is now one who is a former programmer, from France. And that’s the kind of person who, if he were placed on the AI dossier, would have a lot more influence because of his expertise.

Jared Brown: Yeah. I think in the US there’s this phrase that kicks around, that the US is experiencing a techlash, meaning there’s a growing reluctance, cynicism, and criticism of major tech industry players. This started with the Cambridge Analytica problems that arose in the 2016 election. Some of it’s related to concerns about potential monopolies. I will say that it’s not directly related to AI, but that general level of criticism and skepticism is being imbued into the overall policy environment. And so people are more willing to question the latest, next greatest thing that’s coming from the tech industry, because we’re currently having this retrospective analysis in which what we used to think of as a fantastic development may not be as fantastic as we thought it was. That kind of skepticism is somewhat helpful for our community, because it can be leveraged for people to be more willing to take a critical eye to the way that we apply technology going forward, knowing that there may have been some mistakes made in the past.

Lucas Perry: Before we move on to more empirical questions and questions about how AI policy is actually being implemented today, are there any other things here that you guys would like to touch on or say about the importance of engaging with AI policy and its interplay and role in mitigating both AGI risk and existential risk?

Nicolas Moës: Yeah, the so-called Brussels effect, which describes how whatever policy decisions are made in Europe end up influencing the rest of the world. I mentioned it briefly earlier. I’d be curious to hear what you, Jared, think about that. In Washington, do people consider the GDPR, for example, as a pre-made text that they can just copy and paste? Because apparently, I know that California has released something quite similar based on the GDPR. By the way, the GDPR is the General Data Protection Regulation, governing the protection of privacy in the EU. It’s a regulation, so it has binding effect on EU member states. By the Brussels effect, what I mean is, for example, this big piece of legislation being, let’s say, integrated by big companies abroad, including US companies, to ensure that they can keep access to the European market.

And so the Commission is actually quite proud of announcing that, for example, some Brazilian legislators, or some Japanese legislators, or some Indian legislators are coming to the Commission to translate the text of the GDPR and to take it back to the discussions in their own jurisdictions. I’m curious to hear what you think of whether the European “third way” on AI has a greater potential to lead to beneficial AI and beneficial AGI than legislation coming out of the US and China, given the economic incentives that they’ve got.

Jared Brown: I think in addition to the Brussels effect, we might have to amend it to say the Brussels and Sacramento effect, Sacramento being the state capital of California. It’s one thing for the EU to have adopted the GDPR, but then California essentially replicated a lot of the GDPR, though not entirely, in what they call the CCPA, the California Consumer Privacy Act. If you combine the market size of the EU with California, you clearly have enough influence over the global economy. California, for those who aren’t familiar, would be the sixth or seventh largest economy in the world if it were a standalone nation. So, the combined effect of Brussels and Sacramento developing or leading tech policy is not to be underestimated.

What remains to be seen, though, is how long-lasting that precedent will be, and whether their ability to essentially be the first movers in the regulatory space will remain. With some of the criticism being developed around the GDPR and the CCPA, it could be that this leads to other governments trying to be more proactive, to be first out the door, the first movers in terms of major regulatory effects, which would minimize the Brussels effect, or the Brussels and Sacramento effect.

Lucas Perry: So, on race conditions, and sticking here with questions of global catastrophic and existential risk: why are AI policy, governance, and strategy considerations important for the risks associated with racing between, say, the United States and China on AI technology? Could you guys speak a little bit to the importance of appropriate AI policy and strategic positioning for mitigating race conditions, and why a race would be bad for AGI risk and for existential and global catastrophic risks in general?

Jared Brown: To simplify it, the basic logic here is that if two competing nation states or companies are engaged in a competitive environment to be the first to develop X, Y, or Z, and they see tremendous incentive and advantage in being the first to develop such technology, then they’re more likely to cut corners when it comes to safety, and to cut corners in thinking about how to carefully apply these new developments to various environments. There has been a lot of discussion about who will come to dominate the world and control AI technology. I’m not sure that either Nicolas or I really think that narrative is entirely accurate. Technology need not be a zero-sum environment where the benefits are only accrued by one state or another, or where the benefits accruing to one state necessarily reduce the benefits to another state. And there has been a growing recognition of this.

Nicolas earlier mentioned the high-level expert group in the EU; an equivalent type of body in the US is called the National Security Commission on AI. In their interim report, they recognize that there is a strong need for, and one of their early recommendations is, what they call Track 1.5 or Track 2 diplomacy, which is essentially jargon for engagement with China and Russia on AI safety issues. Because if we deploy these technologies in reckless ways, that doesn’t benefit anyone. And we can still move cooperatively on AI safety and on the responsible use of AI without entering into a zero-sum environment where the benefits are only going to be accrued by one state or another.

Nicolas Moës: I definitely see safety technologies as something that would benefit everybody. If you think in terms of two different types of inventions, the ones that promote safety would indeed be useful to share, but for enhancing raw capabilities, I believe you would actually race. Right? So, I largely agree with that depiction of the race narrative. I know people on both sides see this as a silly thing, you know, with media hype and of course industry benefiting a lot from this narrative.

There is a lot of this, though, that remains the rational thing to do, right? Whenever you start negotiating standards, you can say, “Well, look at our systems. They are more advanced, so they should become the global standards for AI,” right? That is actually worrisome, because the trajectory right now, since this narrative is in place, is that over the medium term you would expect the technologies maybe to diverge. And so both blocs, or, if you want to charitably include the EU in this race, the three blocs, would start diverging and therefore need each other less and less. The economic cost of an open conflict would actually decrease, but this is over the very long term.

Those are kind of the dangers of race dynamics as I see them. Again, it’s very heterogeneous, right? We say the US against China, but when you look at a more granular level, even individual units of government are sometimes operating with very different mindsets. So, as for whether AI policy can actually be relevant to this, I do think it can, because at least on the Chinese side, as far as I know, there is awareness of the safety issue, right? And there has been a pretty explicit article, something like “the US and China should work together to future-proof AI.” So it gives you the impression that some government officials, or former government officials, in China are interested in this dialogue about the safety of AI, which is what we would want. We don’t especially have to put the raw capabilities question on the table so long as there are common agreements about safety.

At the global level, there are a lot of things happening to tackle this coordination problem. For example, the OECD AI Policy Observatory is an interesting setup, because that’s an institution with which the US is still interacting. There have been fewer and fewer multilateral fora with which the US administration has been willing to interact constructively, let’s say. But with the OECD, yes, there have been quite a lot of interactions. China is an observer to the OECD. So I do believe that there is potential there for a dialogue between the US and China, in particular about AI governance. And plenty of other fora exist at the global level to enable this Track 1.5 and Track 2 diplomacy that you mentioned, Jared. For example, the Global Governance of AI Forum that the Future Society has organized, and the Beneficial AGI conference that the Future of Life Institute has organized.

Jared Brown: Yeah, and that’s sort of part and parcel with one of the most prominent examples of what some people call scientific diplomacy, and that’s kind of a weird term, but the Pugwash conferences that occurred all throughout the Cold War, where technical experts were meeting on the side to essentially establish a rapport between Russian and US scientists on issues of nuclear security, and biological security as well.

So, there are plenty of examples where even if this race dynamic gets out of control, and even if we find ourselves 20 years from now in an extremely competitive, contentious relationship with near peer adversaries competing over the development of AI technology and other technologies, we shouldn’t, as civil society groups, give up hope and surrender to the inevitability that safety problems are likely to occur. We need to be looking to the past examples of what can be leveraged in order to appeal to essentially the common humanity of these nation states in their common interest in not wanting to see threats arise that would challenge either of their power dynamics.

Nicolas Moës: The context matters a lot, but sometimes it can be easier than one might think, right? I think of when we organized the US-China AI Tech Summit: because it was about business, about the cutting edge, and because it was also about just getting together to discuss, and it was a bit before the US-China race dynamics was full on, there were not so many issues with getting our guests. Nowadays it might be a bit more difficult, with some officials not able to join events where officials from other countries are present, for diplomatic reasons. And that was in June 2018, right? But back then there was the willingness and the possibility, since US-China tension was quite limited.

Jared Brown: Yeah, and I’ll just throw out a quick plug for other FLI podcasts. I recommend listeners check out the work that we did with Matthew Meselson. Max Tegmark had a great podcast with him on the development of the Biological Weapons Convention, which is a great example of how two competing nation states came to a common understanding about what was, and is, essentially a global catastrophic and existential risk, and developed the Biological Weapons Convention.

Lucas Perry: So, tabling collaboration on safety, which can certainly be mutually beneficial, and just focusing on capabilities research, where it at least seems basically rational to race in a game-theoretic sense: I’m interested in exploring whether you guys have any views or points to add here about mitigating the risks there, and how it may actually not be rational to race for that.

Nicolas Moës: So, there is the narrative currently that it’s rational to race on some aspects of raw capabilities, right? However, when you go beyond the typical game-theoretical model, when you enable people to build bridges, you can actually find circumstances under which you have a so-called institutional entrepreneur building up an institution that is legitimate, that everybody agrees upon, and that enforces the cooperation agreement.

In economics, the windfall clause is about the distribution of the gains. What I’m talking about here, in the game-theoretical space, is how to avoid the negative impact, right? The windfall clause would operate in a very limited set of scenarios whereby AGI leads to an abundance of wealth, and then the windfall clause deals with the distributional aspect and therefore reduces, to a certain extent, the incentive to produce AGI. However, for the windfall clause to be abided by, you still have to preserve the incentive to develop AGI, right? But you might actually tamp that down.

What I was talking about here is the institutional entrepreneur, who can break this race simply by securing a credible commitment from both sides and enforcing that commitment. So, take the typical model of the tragedy of the commons, which here could be seen as over-exploiting the time to the superintelligence level: you can actually solve the tragedy of the commons. So racing is not that rational anymore. Once you know that there is a solution, it’s not rational to go for the worst-case scenario, right? You can actually design a mechanism that forces you to move towards the better outcome. It’s costly, but it can be done if people are willing to put in the effort, and it’s not costly enough to justify not doing it.

Jared Brown: I would just add that the underlying assumptions about the rationality of racing towards raw capability development largely depend on the level of risk you assign to unaligned AI, or to deploying narrow AI in ways that exacerbate global catastrophic and existential risk. Those game-theoretic payoffs can essentially be changed, and those dynamics can be changed, if our community eventually starts to better sensitize players on both sides to the lose/lose situation which we could find ourselves in through this type of racing. And so it’s not set in stone, and the environment can be changed as information asymmetry is decreased between the two competing partners and there’s a greater appreciation for the lose/lose situations that could develop.
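
To make the dynamic Jared and Nicolas are describing concrete, here is a minimal game-theoretic sketch. The payoff numbers and catastrophe probabilities are purely illustrative assumptions added for this write-up, not figures from the conversation; the point is only that the perceived level of risk changes which strategy looks rational.

```python
# Minimal sketch of a two-player "race vs. cooperate" game.
# All numbers are illustrative assumptions, not anything stated in the conversation.

def expected_payoffs(p_catastrophe: float):
    """Return {(action_A, action_B): (payoff_A, payoff_B)} for one risk level."""
    lead_bonus = 10      # assumed value of winning the race outright
    shared_benefit = 6   # assumed value each side gets from cooperative development
    catastrophe = -100   # assumed cost to both sides if corner-cutting goes badly wrong

    # If both race, each wins half the time, and both are exposed to the catastrophe risk.
    both_race = (1 - p_catastrophe) * (lead_bonus / 2) + p_catastrophe * catastrophe
    # If only one races, the racer captures the lead (still exposed to the risk);
    # the cooperator is left with nothing in this toy model.
    lone_racer = (1 - p_catastrophe) * lead_bonus + p_catastrophe * catastrophe

    return {
        ("race", "race"): (both_race, both_race),
        ("race", "cooperate"): (lone_racer, 0),
        ("cooperate", "race"): (0, lone_racer),
        ("cooperate", "cooperate"): (shared_benefit, shared_benefit),
    }

# With a low perceived risk (1%), racing dominates even though mutual cooperation
# would leave both sides better off; with a higher perceived risk (20%),
# cooperating becomes the better choice no matter what the other side does.
for p in (0.01, 0.20):
    print(f"p_catastrophe = {p}:")
    for actions, payoffs in expected_payoffs(p).items():
        print(f"  {actions}: ({payoffs[0]:.2f}, {payoffs[1]:.2f})")
```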

Lucas Perry: Yeah. So I guess I just want to highlight the point that on a superficial first analysis, it would seem that the rational game-theoretic thing to do is to increase capability as much as possible, so that you have power and security over other actors. But that might not be true under further investigation.

Jared Brown: Right. And, for those people who haven’t had to suffer through game theory classes, there’s a great popular culture example here: a lot of people have seen Stranger Things on Netflix. If you haven’t, maybe skip ahead 20 seconds until I’m done saying this. There is an example of the US and Russia competing to understand the upside down world, and then releasing untold havoc onto their societies because of this upside down discovery. For those of you who have watched, it’s actually a fairly realistic example of how this kind of competing technological development can lead somewhere that’s a lose/lose for both parties. They were each discovering it themselves without communicating the risks; if they had had better cooperation and better information sharing about those potential risks, neither would have opened up the portals to the upside down world.

Nicolas Moës: The same dynamic, the same “oh, it’s rational to race” dynamic, applied to nuclear policy and the nuclear arms race, has actually led to some treaties. Far from perfection, right? But some treaties. So this is the thing: the tragedy of the commons model is easy to communicate, with this sense of doom and fatality embedded in it. It resonates really well with people, especially in the media; it’s a very simple thing to say. But it simply might not be true, right? As I mentioned, there is this institutional entrepreneurship aspect, which requires resources, so it is very costly to do. But civil society is doing it, and I think the Future of Life Institute has agency to do it. The Future Society is definitely doing it. We are actually agents for breaking away from these game-theoretical situations, which would otherwise be unlikely.

We fixate a lot on the model, but in reality, in nuclear policy we have seen the worst-case scenario being averted, sometimes by mistake, right? The human in the loop not following the policy, or something like that. So it’s interesting as well; it shows how unpredictable all of this is. It really shows that for AI it’s the same. You could have the militaries on both sides, literally from one day to the next, start a discussion about AI safety and how to ensure that they keep control. There’s a lot of goodwill on both sides. And so maybe the economist (and I’m an economist by training, so I can be a bit harsh on myself) would say, “But this is not rational.” Well, in the end, it is more rational, right? So long as you win, remain healthy, and feel like you have done the right thing, this is the rational thing to do. Maybe if Netflix is not your thing, “Inadequate Equilibria” by Eliezer Yudkowsky explores these kinds of conundrums as well: why do you have sub-optimal situations in life in general? It’s a very general model, but I found it very interesting for thinking about these issues, and in the end it boils down to these kinds of situations.

Lucas Perry: Yeah, right. For example, the United States and Russia having some 7,000 nuclear warheads each, and being on hair-trigger alert with one another, is a kind of suboptimal equilibrium that we’ve nudged ourselves into. I mean, it may be completely unrealistic, but a more optimal place to be would be having no nuclear weapons and having used all of that technology and information for nuclear power. We would all just be better off.

Nicolas Moës: Yeah. What you describe seems to be a better situation. However, the “rational” thing to do at some point would have been to incapacitate the Soviet Union before it developed the weapon. Now, the mutually assured destruction policy is holding up a lot of that. But I do believe that the diplomacy, the discussions, the communication, even merely the fact of communicating, “Look, if you do that, then we will do that,” is a form of progress towards the norm that, basically, you should not use it.

Jared Brown: Game theory is nice for boiling things down into nice little boxes, clearly. But the dynamics of the nuclear situation between the USSR and the US add a countless number of boxes that you could end up in, and yes, each of us having way too large a nuclear arsenal is a sub-optimal outcome, but it’s not the worst possible outcome; that would have been total nuclear annihilation. So it’s important not just to look at criticisms of the current situation, but also to see the benefits of this current situation and why this box is better than some other boxes we could have ended up in. That way, we can leverage the path that we have taken to get to where we’re at, find the steps that were actually positive, and reapply those lessons learned to the trajectory of emerging technology once again. We can’t throw out everything that has happened in nuclear policy and assume that there’s nothing to be gained from it, just because the situation that we’ve ended up in is suboptimal.

Nicolas Moës: Something that I have experienced while interacting with policymakers and diplomats is that you actually have agency over what is going on. This is important to note: it’s not like you’re some small thing and the world is just passing by. No. Even in policy, which may seem a bit more arcane, you can pull the right levers to make somebody feel less like they have to obey this race narrative.

Jared Brown: Just recently, in the last National Defense Authorization Act, there was a provision talking about the importance of establishing military-to-military dialogues, potentially even with adversarial states like North Korea and Iran, for that exact reason: better communication between militaries can lead to a reduction in miscalculation, and therefore less adverse escalation of conflicts. We saw this just recently between the US and Iran. There was perhaps not direct communication between the US and Iran, but there was indirect communication, some of it over Twitter, about the intentions and the actions that the two states might take in reaction to other events, and that may have helped de-escalate the situation to where we find it now. It’s far from perfect, but this is the type of thing that civil society can help encourage as we deal with new types of technology that can be as dangerous as nuclear weapons.

Lucas Perry: I just want to touch on what is actually going on now and actually being considered, before we wrap things up. You talked about this a little bit before, Jared: you mentioned that currently, in terms of AI policy, we are moving from principles and recommendations to the implementation of these into hard law. So, building off of this, I’m just trying to get a better sense of where AI policy currently is. What kinds of things have been implemented, what hasn’t been, and what needs to be done?

Jared Brown: So there are some key decisions that have to be made in the near term on AI policy that I see replicating in many different government environments. One of them is about liability. I think it’s very important for people to understand the influence that establishing liability has for safety considerations. By liability, I mean who is legally responsible if something goes wrong? The basic idea is if an autonomous vehicle crashes into a school bus, who’s going to be held responsible and under what conditions? Or if an algorithm is biased and systematically violates the civil rights of one minority group, who is legally responsible for that? Is it the creator of the algorithm, the developer of the algorithm? Is it the deployer of that algorithm? Is there no liability for anyone at all in that system? And governments writ large are struggling with trying to assign liability, and that’s a key area of governance and AI policy that’s occurring now.

For the most part, it would be wise for governments not to provide blanket liability protection for AI simply as a matter of trying to encourage and foster the adoption of those technologies, such that we encourage people to use those technologies in unquestioning ways and essentially surrender decision-making from the human to the AI algorithm. There are other key issue areas. There is the question of educating the populace. The example I give here is financial literacy: you hear that term all the time, referring to how educated your populace is about dealing with money matters.

There’s a lot of technical literacy, technology literacy, being developed. The Finnish government has a whole course on AI that they’re making available to the entire EU. How we educate our population and prepare it from a workforce-training perspective matters a lot. If that training incorporates considerations of common AI safety problems, if we’re training people about how adversarial examples can affect machine learning and so on and so forth, we’re doing a better job of sensitizing the population to potential long-term risks. That’s another example of where AI policy is being developed. And I’ll throw out one more, which is a common example that people will understand. You have a driver’s license from your state. The state has traditionally been responsible for deciding the human qualities that are necessary in order for you to operate a vehicle. And the same goes for state licensing boards, which have been responsible for certifying and allowing people to practice law or medicine.

For doctors and lawyers, there are national organizations, but licensing is typically done at the state level. Now, if we talk about AI starting to essentially replace human functions, governments have to look again at this division of who regulates what and when. There’s an opportunity in all democracies to reevaluate the distribution of responsibility between units of government: who has the responsibility to regulate, monitor, and govern AI when it is doing something that a human being used to do. There are different pros and cons for different models. But suffice it to say that a common theme in AI policy right now is how to decide who has the responsibility to govern AI when it is essentially replacing what used to be formerly an exclusively human function.

Nicolas Moës: Yeah, so in terms of where we stand currently, let’s bring some context to this question as well, right? The way it has evolved over the past few years is that you first had ethical principles, in 2017 and 2018. Let’s look at the global level first. At the global level you had, for example, the Montréal Declaration, which was intended to be global but mostly for fundamental-rights-oriented countries, so that excludes some of the key players. We have already talked about dozens and dozens of sets of principles for AI, in various contexts or in general. That was 2018, and then what we have seen is more the first multilateral guidelines: the OECD principles; GPAI, this global panel on AI, which was also a big thing between Canada and France and was initially intended to become kind of the international body for AI governance, but that deflated a bit over time; and the establishment of all these fora for discussion that I have already mentioned, the political AI summits, the Global Forum on AI for Humanity (again a Franco-Canadian initiative), AI for Good, and the Global Governance of AI Forum in the Middle East. There was also the Ethically Aligned Design initiative at the IEEE, which is a global standards body, and which has garnered a lot of attention among policymakers and other stakeholders. But the move towards harder law is coming, and since it’s towards harder law, at the global level there is not much that can happen. Nation states remain sovereign in the eyes of international law.

So unless you write up an international treaty, it is at the government level that you have to move towards hard law. At the global level, the next step that we can see is these audit and certification principles. It’s not hard law, but you use labels to independently certify whether an algorithm is good. Some of them are tailored to specific countries; I think Denmark has its own certification mechanism for AI algorithms. The US is seeing the appearance of various initiatives, notably by the big consulting companies, which are also the auditors. So it is interesting to see how we shift from soft law towards this industry-wide regulation of these algorithms. At the EU level, where you have some hard legislative power, you also had a high-level expert group on liability, which is very important, because they basically argued that we’re going to have to update product liability rules in certain ways for AI and for internet-of-things products.

This is interesting to look at as well, because when you look at product liability rules, this is hard law, right? So what they have recommended is directly translatable into legislation. And moving on to the current stage: since the end of 2019, you have this hard law coming up, and this Commission white paper, which really kickstarts the debate about what the regulation for AI will be, and whether it will be a regulation at all; it could be something else, like a directive. The high-level expert group has come up with a self-assessment list for companies to see whether they are obeying the ethical principles decided upon in Europe. So these are kind of soft, self-regulation things, which might eventually affect court rulings or something like that. But they do not represent the law. And now the big players are moving in, either at the global level with these more and more powerful labeling or certification initiatives, or at the EU level with this hard law.

And the reason why the EU has moved towards hard law so quickly is that during the very short campaign of the Commission president, AI was a political issue. The techlash was strong, and of course a lot of industry was complaining that there was nothing happening on AI in the EU, so they wanted strong action and that kind of stuff. Those are the circumstances that led the EU to be in pole position for developing hard law. Elsewhere in the world, you actually have more fragmented initiatives at this stage, except for the OECD AI Policy Observatory, which might be influential in itself, right? It’s important to note the AI principles that the OECD has published: even though they are not binding, they will actually influence the whole debate. Because at the international level, for example, when the OECD had privacy principles, those became the reference point for many legislators. So some countries that don’t want to spend years debating how to legislate AI might just say, “Okay, here are the OECD principles, how do we implement them in our current body of law?” And that’s it.

Jared Brown: And I’ll just add one more quick dynamic that’s coming up in AI policy, which is essentially a given government’s tolerance for the risk associated with emerging technology. A classic example here: the US actually has a much higher level of flood risk tolerance than other countries. We largely engineer our dams, our flood walls, and our other flood protection systems throughout the US to a 1-in-100-year standard, meaning the flood protection system is supposed to protect you from a severe storm that has a 1% chance of occurring in a given year. Other countries make vastly different decisions there. Different countries make different policy decisions about the tolerance that they’re going to have for certain things happening. And so as we think about emerging technology risk, it’s important to think about the way that your government is shaping policies and the underlying tolerance that it has for something going wrong.

It could be as simple as how likely it is that you will die because of an autonomous vehicle crash. And the EU, traditionally, has had what they call a precautionary principle approach, meaning that in the face of uncertain risks they’re more likely to regulate and restrict development until those risks are better understood, whereas the US typically has adopted the precautionary principle less often.
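
As a quick illustration of what the 1-in-100-year standard Jared mentions implies over longer horizons (the 30-, 50-, and 100-year windows below are illustrative choices added here, not figures from the conversation):

```python
# Chance of at least one "1-in-100-year" flood over a multi-year horizon,
# assuming independent years and a 1% annual exceedance probability.
annual_probability = 0.01

for years in (30, 50, 100):
    at_least_one = 1 - (1 - annual_probability) ** years
    print(f"{years} years: {at_least_one:.0%} chance of at least one design-exceeding flood")
```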

Nicolas Moës: There is a lot of uncertainty. A lot of uncertainty about policy, but also a lot of uncertainty about the impact that all these technologies are having. With the dam standard, you can quantify the force of nature quite easily, but here we are dealing with social forces that are a bit different. I still remember quite a lot of people being very negative about Facebook’s chances of success, because people would not be willing to put pictures of themselves online. I guess ten years later, these people have been proven wrong. The same thing could happen with AI, right? People are currently, at least in the EU, afraid of some aspects of AI. Take autonomous vehicles: surrendering decision-making about our life and death to an autonomous vehicle is something that, as the technology improves, people may become more and more willing to do. So yeah, it’s very difficult to predict, and even more difficult to quantify, I think.

Lucas Perry: All right. So thank you both so much. Do either of you guys have any concluding thoughts about AI policy or anything else you’d just like to wrap up on?

Jared Brown: I just hope the audience really appreciates the importance of engaging in the policy discussion and trying to map out a beneficial path forward for AI policy. Because if you’re concerned, like we are, about the long-term trajectory of this emerging technology and other emerging technologies, it’s never too early to start engaging in the policy discussion on how to map a beneficial path forward.

Nicolas Moës: Yeah, and one last thought. We were talking, Jared and I, a couple of days ago about the number of people doing this. So thank you, by the way, Jared, for inviting me, and Lucas, for having me on the podcast. But that led us to wonder how many people are doing what we are doing, with the motivation that we have regarding these longer-term concerns. That makes me think, yeah, there are very few resources, labor resources, financial resources, dedicated to this issue. So if there is anybody in the audience interested in this issue, they should definitely get in touch. There are too few people right now with similar motivations, caring about the same things in AI policy, for us to miss the opportunity of meeting each other and coordinating better.

Jared Brown: Agreed.

Lucas Perry: All right. Wonderful. So yeah, thank you guys both so much for coming on.

End of recorded material

FLI Podcast: Identity, Information & the Nature of Reality with Anthony Aguirre

Our perceptions of reality are based on the physics of interactions ranging from millimeters to miles in scale. But when it comes to the very small and the very massive, our intuitions often fail us. Given the extent to which modern physics challenges our understanding of the world around us, how wrong could we be about the fundamental nature of reality? And given our failure to anticipate the counterintuitive nature of the universe, how accurate are our intuitions about metaphysical and personal identity? Just how seriously should we take our everyday experiences of the world? Anthony Aguirre, cosmologist and FLI co-founder, returns for a second episode to offer his perspective on these complex questions. This conversation explores the view that reality fundamentally consists of information and examines its implications for our understandings of existence and identity.

Topics discussed in this episode include:

  • Views on the nature of reality
  • Quantum mechanics and the implications of quantum uncertainty
  • Identity, information and description
  • Continuum of objectivity/subjectivity

Timestamps: 

3:35 – General history of views on fundamental reality

9:45 – Quantum uncertainty and observation as interaction

24:43 – The universe as constituted of information

29:26 – What is information and what does the view of reality as information have to say about objects and identity

37:14 – Identity as on a continuum of objectivity and subjectivity

46:09 – What makes something more or less objective?

58:25 – Emergence in physical reality and identity

1:15:35 – Questions about the philosophy of identity in the 21st century

1:27:13 – Differing views on identity changing human desires

1:33:28 – How the reality as information perspective informs questions of identity

1:39:25 – Concluding thoughts

 

This podcast is possible because of the support of listeners like you. If you found this conversation to be meaningful or valuable consider supporting it directly by donating at futureoflife.org/donate. Contributions like yours make these conversations possible.

All of our podcasts are also now on Spotify and iHeartRadio! Or find us on SoundCloud, iTunes, Google Play and Stitcher.

You can listen to the podcast above or read the transcript below. 

Lucas Perry: Welcome to the Future of Life Institute Podcast. I’m Lucas Perry. Recently we had a conversation between Max Tegmark and Yuval Noah Harari where, in consideration of 21st century technological issues, Yuval recommended: “Get to know yourself better. It’s maybe the most important thing in life. We haven’t really progressed much in the last thousands of years and the reason is that yes, we keep getting this advice but we don’t really want to do it…. I mean, especially as technology will give us all, at least some of us, more and more power, the temptations of naive utopias are going to be more and more irresistible and I think the really most powerful check on these naive utopias is really getting to know yourself better.”

Drawing inspiration from this, our following podcast was with Andres Gomez Emillson and David Pearce on different views of identity, like open, closed, and empty individualism, and their importance in the world. Our conversation today with Anthony Aguirre follows up on and further explores the importance of questions of self and identity in the 21st century.

This episode focuses on exploring this question from a physics perspective where we discuss the view of reality as fundamentally consisting of information. This helps us to ground what actually exists, how we come to know that, and how this challenges our commonly held intuitions about there existing a concrete reality out there populated by conventionally accepted objects and things, like cups and people, that we often take for granted without challenging or looking into much. This conversation subverted many of my assumptions about science, physics, and the nature of reality, and if that sounds interesting to you, I think you’ll find it valuable as well. 

For those of you not familiar with Anthony Aguirre, he is a physicist who studies the formation, nature, and evolution of the universe, focusing primarily on the model of eternal inflation (the idea that inflation goes on forever in some regions of the universe) and what it may mean for the ultimate beginning of the universe and of time. He is the co-founder and associate scientific director of the Foundational Questions Institute and is also a co-founder of the Future of Life Institute. He also co-founded Metaculus, an effort to optimally aggregate predictions about scientific discoveries, technological breakthroughs, and other interesting issues.

The Future of Life Institute is a non-profit and this podcast is funded and supported by listeners like you. So if you find what we do on this podcast to be important and beneficial, please consider supporting the podcast by donating at futureoflife.org/donate. These contributions make it possible for us to bring you conversations like these and to develop the podcast further. You can also follow us on your preferred listening platform by searching for us directly or following the links on the page for this podcast found in the description.

And with that, let’s get into our conversation with Anthony Aguirre.

Lucas Perry: So, the last time we had you on, we had a conversation on information. Could you take us through the history of how people have viewed fundamental reality and fundamental ontology over time, from a kind of idealism, to materialism, to this new shift, informed by quantum mechanics, toward seeing things as constituted of information?

Anthony Aguirre: So, without being a historian of science, I can only give you the general impression that I have, and of course through history many different people have viewed things in very different ways. I would say that in the history of humanity, there have obviously been many, many ways to think about the ultimate nature of reality, if you will, starting with a sense that the fundamental nature of external reality is based on different substances and tendencies, with some level of regularity in those things, but without a sense that there are firm, or certainly not mathematical, regularities in things; and that there are causes of events, but without a sense that those causes can be described in some mathematical way.

So that changed obviously in terms of Western science with the advent of mechanics by Galileo and Newton and others showing that there are not just regularities in the sense that the same result will happen from the same causes over and over again, that was appreciated for a long time, but that those could be accessed not just experimentally but modeled mathematically and that there could be a relatively small set of mathematical laws that could then be used to explain a very wide range of different physical phenomena. I think that sense was not there before, it was clear that things caused other things and events caused other events, but I suspect the thinking was that it was more in a one off way, like, “That’s a complicated thing. It’s caused by a whole bunch of other complicated things. In principle, those things are connected.” But there wasn’t a sense that you could get in there and understand what that connection was analytically or intellectually and certainly not in a way that had some dramatic economy in the sense that we now appreciate from Galileo and Newton and subsequent physics.

Once we had that change to mathematical laws, then there was a question of, what are those mathematical laws describing? And the answer there was essentially that those mathematical laws are describing particles and forces between particles. And at some level, a couple of other auxiliary things like space and time are sort of there in the backdrop, but essentially the nature of reality is a bunch of little bits of stuff that are moving around under mathematically specified forces.

That is a sort of complete-ish description. I mean certainly Newton would have and have not said that that’s a complete description in the sense that, in Newton’s view, there were particles and those particles made up things and the forces told them exactly what to do, but at the same time there were lots of other things in Newton’s conception of reality like God and presumably other entities. So it’s not exactly clear how materialist Newton or Galileo for example were, but as time went on that became a more entrenched idea among hardcore theoretical physicists at least, or physicists, that there was ultimately this truest, most fundamental, most base description of reality that was lots of particles moving around under mathematical forces.

Now, that I think is a conception that is very much still with us in many senses but has taken on a much deeper level of subtlety given the advent of modern physics including particularly quantum mechanics and also I think a sort of modern recognition or sort of higher level maybe of sophistication and thinking about the relation between different descriptions of natural phenomena. So, let’s talk about quantum mechanics first. Quantum mechanics does say that there are particles in a sense, like you can say that there are particles but particles aren’t really the thing. You can ask questions of reality that entail that reality is made of particles and you will get answers that look like answers about particles. But you can also ask questions about the same physical system about how it is as a wave and you will get answers about how it is as a wave.

And in general in quantum mechanics, there are all sorts of questions that you can ask and you will get answers about the physical system in the terms that you asked those questions about. So as long as it is a sort of well-defined physical experiment that you can do and that you can translate into a kind of mathematical form, what does it mean to do that experiment? Quantum mechanics gives you a way to compute predictions for how that experiment will turn out without really taking a particular view on what that physical system is, is it a particle? Is it a wave? Or is it something else? And I think this is important to note, it’s not just that quantum mechanics says that things are particles and waves at the same time, it’s that they’re all sorts of things at the same time.

So you can ask how much of my phone is an elephant in quantum mechanics. A phone is totally not the same thing as an elephant, but a phone has a wave function, so if I knew the wave function of the phone and I knew a procedure for asking, “Is something an elephant?”, then I could apply that procedure to the phone and the answer would not be, “No, the phone is definitely not an elephant.” The answer would be, “The phone is a tiny, tiny, tiny, tiny, tiny bit an elephant.” So this is very exaggerated because we’re talking phones and elephants, all these numbers are so tiny. But the point is that I can interrogate reality in quantum mechanics in many different ways. I can formulate whatever questions I want and it will give me answers in terms of those questions.

And generally if my questions totally mismatched with what the system is, I’ll get, “No, it’s not really that.” But the no is always a, “No, the probability is incredibly tiny that it’s that.” But in quantum mechanics, there’s always some chance that if you look at your phone, you’ll notice that it’s an elephant. It’s just that that number is so tiny that it never matters, but when you’re talking about individual particles, you might find that that probability is significant, that the particle is somewhat different than you thought it was and that’s part of the quantum uncertainty and weirdness.
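
To make that last point a bit more concrete, here is a minimal sketch of the kind of calculation being gestured at, in Python with NumPy. The two-dimensional state space, the vectors, and the “elephant-question” are all invented for illustration; the only real physics in it is the Born rule, which turns the overlap between two states into a probability.

```python
import numpy as np

# Toy two-dimensional state space: axis 0 is "phone-like", axis 1 is "elephant-like".
# The vectors and labels here are invented purely for illustration.
phone = np.array([1.0, 1e-20])
phone = phone / np.linalg.norm(phone)   # normalize the state vector

elephant = np.array([0.0, 1.0])         # the state a definite "elephant" answer picks out

# Born rule: the probability of getting "yes, it's an elephant" when asking the
# elephant-question of the phone state is the squared magnitude of their overlap.
p_elephant = abs(np.vdot(elephant, phone)) ** 2
print(p_elephant)   # roughly 1e-40: not exactly zero, just absurdly small
```

The point is simply that the answer to a badly mismatched question is not a flat “no” but a probability so small it never matters in practice.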

Lucas Perry: Can you unpack a little bit that quantum uncertainty and weirdness that explains, when you ask questions to quantum mechanics, you don’t ever get definite answers? Is that right?

Anthony Aguirre: Almost never. So there are occasions where you get definite answers. If you ask a question of a quantum system and it gives you an answer and then you ask that question immediately again, you’ll get the same answer for sure.

Lucas Perry: What does immediately mean?

Anthony Aguirre: Really immediately. So formally, like immediately, immediately. If time goes by between the two measurements then the system can evolve a little bit and then you won’t definitely get the same answer. That is if you have a quantum system, there is a particular set of questions that you can ask it that you will get definite answers to and the quantum state essentially is that set of questions. When you say an electron is here and it has this spin that is, it’s rotating around this direction, what you really mean is that there are a particular set of questions like, “Where are you? And what is your spin?” That if you asked them of this electron, you would get a definite answer.

Now if you take that same electron that I was going to ask those questions to and I would get a definite answer because that’s the state the electron is in, but you come along and ask a different question than one of the ones that is in that list, you will get an answer but it won’t be a definite answer. So that’s kind of the fundamental hallmark of quantum mechanics is that the list of questions you can ask to which you will get a definite answer is a finite one. And for a little particle it’s a very short list, like an electron is a very short list.
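
This “right question versus other question” distinction is easy to see in the standard textbook spin-1/2 example. The sketch below (Python with NumPy; the states are just the usual z- and x-spin eigenstates, nothing specific to this conversation) shows a spin prepared “up along z” giving a certain answer to the z-question but only 50/50 odds on the x-question.

```python
import numpy as np

up_z = np.array([1.0, 0.0])                      # electron spin prepared "up along z"

# The two definite-answer states of the x-spin question
up_x = np.array([1.0, 1.0]) / np.sqrt(2.0)
down_x = np.array([1.0, -1.0]) / np.sqrt(2.0)

# Ask the "right" question (spin along z): the answer is certain.
p_up_z = abs(np.vdot(np.array([1.0, 0.0]), up_z)) ** 2   # 1.0

# Ask a different question (spin along x): only probabilities come back.
p_up_x = abs(np.vdot(up_x, up_z)) ** 2                   # 0.5
p_down_x = abs(np.vdot(down_x, up_z)) ** 2               # 0.5

print(p_up_z, p_up_x, p_down_x)
```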

Lucas Perry: Is this because the act of observation includes interaction with the particle in such a way that it is changed by the interaction?

Anthony Aguirre: I think that’s a useful way to look at it in a sense, but it’s slightly misleading in the sense that as I said, if you ask exactly the right question, then you will get a definite answer. So you haven’t interfered with the system at all if you ask exactly the right question.

Lucas Perry: That means performing the kind of experiment that doesn’t change what the particle will be doing or its nature? Is that what that means?

Anthony Aguirre: Yes. It’s sort of like you’ve got a very, very particularly shaped net and you can cast it on something and if the thing happens to have exactly the right shape, your net just falls right over it and it doesn’t affect the thing at all and you say, “Oh, it has that property.” But if it has any other shape, then your net kind of messes it up, it gets perturbed and you catch something in your net. The net is your experiment, but you mess up the system while you’re doing it, but it’s not that you necessarily mess up the system, it’s that you’re asking it a question that it isn’t ready to answer definitively, but rather some other question.

So this is always true, and it’s kind of a crucial thing about reality. But the crucial thing about quantum mechanics is that that list is finite. We’re used to asking any question that… I’ve got a mug, I can ask, “Is it brown? Is it here? Is it there? How heavy?” Whatever question I think of, I feel like I can answer. I can ask the question and there will be an answer to it because whatever question I ask, if it’s a well-defined question before I ask it, the mug either has that property or it doesn’t. But quantum mechanics tells us that isn’t quite true: there’s only a finite number of answers built into the object. And I can ask other questions, but I just can’t expect the answer to already be there, in the sense that I’ll get a definite answer to it.

So this is a very subtle way that there’s this interactive process between the observer and the thing that’s observed. If we’re talking about something that is maximally specified that it has a particular quantum state, there is some way that it is in a sense, but you can’t ever find that out because as soon as you start asking questions of it, you change the thing unless you happen to ask exactly the right questions. But in order to ask exactly the right questions, you would already have to know what state it’s in. And the only way you can do that is by actually creating the system effectively.

So if I create an electron in a particular state in my lab, then I know what state it’s in and I know exactly what questions to ask it in order to get answers that are certain. But if I just come across an electron in the wild, I don’t know exactly what questions to ask. And so I just have to ask whatever questions I like, and chances are they won’t be the right questions for that electron. And I won’t ever know whether they were or not, because I’ll just get some set of answers and I won’t know whether those were the properties that the electron actually had already or if they were the ones that it fell into by chance upon my asking those questions.

Lucas Perry: How much of this is actual properties and features about the particles in and of themselves and how much is it about the fact that we’re like observers or agents that have to interact with the particles in some ways in order to get information about them? Such that we can’t ask too many questions without perturbing the thing in and of itself and then not being able to get definitive answers to other questions?

Anthony Aguirre: Well, I’m not sure how to answer that because I think it’s just that is the structure of quantum mechanics, which is the structure of reality. So it’s explicitly posed in terms of quantum states of things and a structure of observations that can be made or observables that can be measured so you can see whether the system has a particular value of that observable or not. If you take out the observation part or the measurement part, you just have a quantum state which evolves according to some equation and that’s fine, but that’s not something you can actually compare in any sense to reality or to observation or use in any way. You need something that will connect that quantum state and evolution equation to something that you can actually do or observe.

And I think that is something that’s a little bit different. You can say in Newtonian mechanics or classical physics, there’s something arguably reasonable about saying, “Here is the system, it’s these particles and they’re moving around in this way.” And that’s saying something. I think you can argue about whether that’s actually true, that that’s saying something. But you can talk about the particles themselves in a fairly meaningful way without talking about the observer or the person who’s measuring it or something like that. Whereas in quantum mechanics, it’s really fairly useless to talk about the wave function of something without talking about the way that you measure things or the basis that you operate it on and so on.

That was a long sort of digression in a sense, but I think that’s crucial because that I think is a major underlying change in the way that we think about reality, not as something that is purely out there, but understanding that even to the extent that there’s something out there, any sense of our experiencing that is unavoidably an interactive one and in a way that you cannot ignore the interaction, that you might have this idea that there’s an external objective reality that although it’s inconvenient to know, although on an everyday basis you might mess with it a little bit when you interact with it, in principle it’s out there and if you could just be careful enough, you could avoid that input from the observer. Quantum mechanics says, “No. That’s a fundamental part of it. There’s no avoiding that. It’s a basic part of the theory that reality is made up of this combination of the measurer and the state.”

I also think that once you admit, because you have to in this case that there is more to a useful or complete description of reality than just the kind of objective state of the physical system, then you notice that there are a bunch of other things that actually are there as well that you have to admit are part of reality. So, if you ask some quantum mechanical question, like if I ask, “Is my mug brown? And is it spinning? Where is it?” Those kinds of questions, you have to ask, what is the reality status of those questions or the categories that I’m defining and asking those questions? Like brownness, what is that? That’s obviously something that I invented, not me personally, but I invented in this particular case. Brownness is something that biological creatures and humans and so on invented. The sensation of brown is something that biological creatures maybe devised, the calling something brown and the word brown are obviously human and English creations.

So those are things that are created through this process and are not there certainly in the quantum state. And yet if we say that the quantum state on its own is not a meaningful or useful description of reality, but we have to augment it with the sorts of questions that we ask and the sort of procedure of asking and getting questions answered, then those extra things that we have to put into the description entail a whole lot of different things. So there’s not just the wave function. So in that simple example, there’s a set of questions and possible answers to those questions that the mug could give me. And there are different ways of talking about how mathematically to define those questions.

One way is to call them coarse-grained states or macro states, that is, there are lots of ways that reality can be, but I want to extract out certain features of reality. So if I take the set of possible ways that a mug can be, there’s some tiny subset of all those different ways that the atoms in my mug could be that I would actually call a mug, and a smaller subset of those that I would call a brown mug, and a smaller subset of those that I would call a brown mug that’s sitting still, and so on. So they’re kind of subsets of the set of all possible ways that a physical system with that many atoms and that mass and so on could be, and when I’m asking questions about the mug, like, are you brown? I’m asking, “Is the system in that particular subset of possibilities that I call a brown mug sitting on a table?”

I would say that at some level, almost all of what we do in interacting with reality is like that process. There’s this huge set of possible realities that we could inhabit. What we do is divvy up that reality into many, many possibilities corresponding to questions that we might ask and answers to those questions, and then we go and ask those questions of reality and we get sort of yes or no answers to them. And quantum mechanics is sort of the enactment of that process with full exactness that applies to even the smallest systems, but we can think of that process just on a day-to-day level, like we can think of, what are all the possible ways that the system could be? And then ask certain questions. Is it this? Is it that?

So this is a conception of reality that’s kind of like a big game of 20 questions. Every time we look out at reality, we’re just asking different questions of it. Normally we’re narrowing down the possibility space of how reality is by asking those questions and getting answers to them. To me a really interesting question is, what is the ontological reality status of all those big sets of questions that we’re asking? Your tendency as a theoretical physicist is to say, “Oh, the wave function is the thing that’s real and that’s what actually exists, and all these extra things are just extra things that we made up and are globbed onto the wave function.” But I think that’s kind of a very impoverished view of reality, not just impoverished, but completely useless and empty of any utility or meaning, because quantum mechanics by its nature requires both parts: the questions and the state. If you cut out all the questions, you’re just left with this very empty thing that has no applicability or meaning.

Lucas Perry: But doesn’t that tell us how reality is in and of itself?

Anthony Aguirre: I don’t think it tells you anything, honestly. It’s almost impossible to even say what the wave function is except in some terms. Like if I just write down, “Okay, the wave function of the universe is psi.” What did that tell me? Nothing. There’s nothing there. There’s no way that I could even communicate to you what the wave function is without reference to some set of questions because remember the wave function is a definite set of answers to a particular set of questions. So, I have to communicate to you the set of questions to which the wave function is the definite answer and those questions are things that have to do with macroscopic reality.

There’s no way that I can tell you what the wave function is if I were to try to communicate it to you without reference to those questions. Like if I say, “Okay, I’ve got a thingie here and it’s got a wave function,” and you asked me, “Okay, what is the wave function?” I don’t know how to tell you. I could tell you its mass, but now what I’m really saying is, here’s a set of energy-measuring things that I might do, and the amplitude for getting those different possible outcomes in that energy-measuring thing is 0.1 for that one and 0.2 for that one and so on. But I have to tell you what those energy-measuring things are in order to be able to tell you what the wave function is.

Lucas Perry: If you go back to the starting conditions of the universe, that initial state is a definite thing, right? Prior to any observers and defined coherently and exactly in and of itself. Right?

Anthony Aguirre: I don’t know if I would say that.

Lucas Perry: I understand that for us to know anything we have to ask questions. I’m asking you about something that I know that has no utility because we’re always going to be the observer standing in reference, right? But just to think about reality in and of itself.

Anthony Aguirre: Right. But you’re assuming that there is such a thing and that’s not entirely clear to me. So I recognize that there’s a desire to feel like there is a sort of objective reality that is out there and that there’s meaning to saying what that reality is, but that is not entirely clear to me that that’s a safe assumption to make. So it is true that we can go back in time and attribute all kinds of pretty objective properties of the universe and it certainly is true that it can’t be that we needed people and observers and things back at the beginning in order to be able to talk about those things. But it’s a very thorny question to me, that it’s meaningful to say that there was a quantum state that the universe had at the very beginning when I don’t know what operationally that means. I wouldn’t know how to describe that quantum state or make it meaningful other than in terms of measurable things which require adding a whole bunch of ingredients to the description of what the universe is.

To say that the universe started in this quantum state, to make that meaningful requires these extra ingredients. But we also recognize that those extra ingredients are themselves parts of the universe. So, either you have to take this view that there is a quantum state and somehow we’re going to get out of that in this kind of circular self-consistent way, a bunch of measuring apparatuses that are hidden in that quantum state and make certain measurements and then define the quantum state in this bootstrapping way. Or you have to say that the quantum state, and I’m not sure how different these things are, that the quantum state is part of reality, but in order to actually specify what reality is, there’s a whole bunch of extra ingredients that we have to define and we have to put in there.

And that’s kind of the view that I take nowadays, that there is reality and then there’s our description of reality. And as we describe reality, one of the things that we need to describe reality are quantum states and one of the things that we need to describe reality are coarse grainings or systems of measurement or bases and so on. There are all these extra things that we need to put in. And the quantum states are one of them and a very important one. And evolution equations are one of them in a very important one. But to identify reality with the state plus the fundamental laws that evolve that state, I just don’t think is quite the right way to think about it.

Lucas Perry: Okay, so this is all very illuminating for this perspective here that we’re trying to explore, which is the universe being simply constituted of information.

Anthony Aguirre: Yeah, so let’s talk about that. Once you let go, I think of the idea that there is matter that is made of particles and then there are arrangements of that matter and there are things that that matter does, but the matter is this intrinsically existing stuff. Once you start to think of there being the state, which is a set of answers to questions, that set of answers to questions is a very informative thing. It’s a kind of maximally informative thing, but it isn’t a different kind of thing to other sets of answers to questions.

That is to say that I’ve got information about something, kind of is saying that I’ve asked a bunch of questions and I’ve gotten answers about it so I know about it. If I keep asking enough incredibly detailed questions that maybe I’ve maximally specified the state of the cup and I have as much information as I can have about the cup. But in that process, as I ask more and more information, as I more and more specify what the cup is like, there’s no particular place in which the cup changes its nature. So I start out asking questions and I get more and more and more information until I get the most information that I can. And then I call that, that’s the most information I can get and now I’ve specified the quantum state of the cup.

But in that sense then a quantum state is like the sort of end state of a process of interrogating a physical system to get more and more information about it. So to me that suggests this interpretation that the nature of something like the quantum state of something is an informational thing. It’s identified with a maximal set of information that you can have about something. But that’s kind of one end of the spectrum, the maximal knowing about that thing end of the spectrum. But if we don’t go that far, then we just have less information about the thing. And once you start to think that way, well what then isn’t information? If the nature of things is to be a state and a set of questions and the state gives me answers to those questions, that’s a set of information. But as I said, that sort of applies to all physical systems that’s kind of what they are according to quantum mechanics.

So there used to be a sense, I think that there was a thing, it was a bunch of particles and then when I ask questions I could learn about that thing. The lesson to me of quantum mechanics is that there’s no space between the answers to questions that I get when I ask questions of a thing and the thing itself. The thing is in a sense, the set of answers to the questions that I have or could ask of it. It comes much less of a kind of physical tangible thing made of stuff and much more of a thing made out of information and it’s information that I can get by interacting with that thing, but there isn’t a thing there that the information is about. That notion seems to be sort of absent. There’s no need to think that there is a thing that the information is about. All we know is the information.

Lucas Perry: Is that true of the particles arranged cup wise or the cup thing that is there? Is it true of that thing in and of itself or is that basically just the truth of being an epistemic agent who’s trying to interrogate the cup thing?

Anthony Aguirre: Suppose the fundamental nature of reality was a bunch of particles, then what I said is still true. I can imagine if things like observers exist, then they can ask questions and they can get answers and those will be answers about the physical system that kind of has this intrinsic nature of bits of stuff. And it would still, I think, be true that most of reality is made of everything but the little bits of stuff, the little bits of stuff are only there at the very end. If you ask the very most precise questions you get more and more a sense of, “Oh they’re little bits of stuff.” But I think what’s interesting is that what quantum mechanics tells us is we keep getting more and more fine grained information about something, but then at the very end rather than little bits of stuff, it sort of disappears before our eyes. There aren’t any little bits of stuff there, there’s just the answers to the most refined sets of questions that we can ask.

So that’s where I think there’s sort of a difference is that there’s this sense in classical physics that underlying all these questions and answers and information is this other thing of a different nature, that is matter and it has a different fundamental quality to it than the information. And in quantum mechanics it seems to me like there’s no need to think that there is such a thing, that there is no need to think that there is some other different stuff that is non-informational that’s out there that the information is about because the informational description is complete.

Lucas Perry: So I guess there’s two questions here that come out of this. It’d be good if you could define and unpack what information exactly is and then if you could explore and get further into the idea of how this challenges our notion of what a macroscopic thing is or what a microscopic or what a quantum thing is, something that we believe to have identity. And then also how this impacts identity like cup identity or particle identity, what it means for people and galaxies and the universe to be constituted of information. So those two things.

Anthony Aguirre: Okay. So there are lots of ways to talk about information. There are also qualitative and quantitative ways to talk about it. So let me talk about the quantitative way first. So you can say that if I have a whole possibility space, like many different possibilities for the way something can be and then I restrict those possibilities to a smaller set of possibilities in some way. Either I say it’s definitely in one of these, or maybe there’s a higher probability that it’s one of these than one of those. I, in some way restrict rather than every possibility is the same, I say that some possibilities are more than others, they’re more likely or it’s restricted to some subset. Then I have information about that system and that information is precisely the gap between everything being just equally likely and every possibility being equally good and knowing that some of them are more likely or valid or something than others.

So, information is that gap that says it’s more this than some of those other things. So, that’s a super general way of talking about it, but it can be made very mathematically precise. So if I say there are four bits of information stored in my computer, exactly what I mean is that there are a bunch of registers, and if I don’t know whether they’re ones or zeros, I say I have no information. If I know that these four are 1101, then I’ve restricted my full set of possibilities to this subset in which those are 1101 and I have those four bits of information. So I can be very mathematically precise about this. And I can even say of the first bit, well, I don’t know whether it’s 0 or 1, but there’s a 75% chance that it’s zero and a 25% chance that it’s one; that’s still information. It’s less than one bit of information.

People think of bits as being very discrete things, but you can have fractions of bits of information. There’s nothing wrong with that. The very general definition is a restriction away from every possibility being equally likely to some being more likely than others. And that can be made mathematically precise, and it is exactly the sort of information we talk about when we say, “My hard drive is 80 gigabytes in size” or “I have 20 megabits per second of internet speed.” It’s exactly that sort of information that we’re quantifying.
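
That “75% zero, 25% one” example can be made quantitative with the standard Shannon measure: information is the gap between maximum entropy, where every possibility is equally likely, and the entropy of the probabilities you actually assign. A minimal sketch, using only the numbers from the example above:

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# One binary register about which we know nothing: maximum entropy is 1 bit.
max_entropy = entropy_bits([0.5, 0.5])            # 1.0 bit

# The same register when we believe it's 75% likely to be 0 and 25% likely to be 1.
actual_entropy = entropy_bits([0.75, 0.25])       # about 0.811 bits

# Information = how far we've narrowed things down from "anything goes".
information = max_entropy - actual_entropy
print(information)   # about 0.189 bits, a genuine fraction of a bit
```

So a biased-but-uncertain bit really does carry a fraction of a bit, roughly 0.19 bits in this case.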

Now, when I think about a cup, I can think about the system in some way like, there are some number of atoms like 10 to the 25th or whatever, atoms or electrons and protons and neutrons or whatever, and there are then some huge, huge possible set of ways that those things can be and some tiny, tiny, tiny, tiny, tiny, tiny, almost infinitesimally tiny subset of those ways that can be are something that I would label a cup. So if I say, “Oh look, I have a cup”, I’m actually specifying a vast amount of information by saying, “Look, I have a cup.”

Now if I say, “Look, I have a cup and inside it is some dregs of coffee.” I’ve got a huge amount more information. Now, it doesn’t feel like a huge amount more of information. It’s just like, “Yeah, what did I expect? Dregs of coffee.” It’s not that big of a deal but physically speaking, it’s a huge amount of information that I’ve specified just by noticing that there are dregs of coffee in the cup instead of dregs of all kinds of other liquids and all kinds of other states and so on.

So that’s the quantitative aspect, I can quantify how much information is in a description of a system and the description of it is important because you might come along and you can’t see this cup. So I can tell you, there’s some stuff on my desk. You know a lot less about what’s on my desk than I do. So we have different descriptions of this same system and I’ve got a whole lot more information than you do about what’s on my desk. So the information, and this is an important thing, is associated with somebody’s description of the system, not necessarily a person’s, but any way of specifying probabilities of the system being in a subset of all of its possibilities. Whether that’s somebody describing it or whatever else, anything that defines probabilities over the states that the system could be in, that’s defining an amount of information associated with those probabilities.

So there’s that quantity. But there’s also, when I say, what is a mug? So you can say that the mug is made of protons, electrons, and neutrons, but of course pretty much anything in our world is made of protons, neutrons, and electrons. So what makes this a mug rather than a phone or a little bit of an elephant or whatever, is the particular arrangement that those atoms have. To say that a mug is just protons, neutrons, and electrons, I think is totally misleading in the sense that the protons, neutrons, and electrons are the least informative part of what makes it a mug. So there’s a quantity associated with that, the mug part of possibility space is very small compared to all of the possibilities. So that means that there’s a lot of information in saying that it’s a mug.

But there’s also the quality of what that particular subset is and that that particular subset is connected in various ways with things in my description, like solidity and mass and brownness and hardness and hollowness. It is at the intersection of a whole bunch of other properties that a system might have. So each of those properties I can also think of as subsets of possibility space. Suppose I take all things that are a kilogram, that’s how many protons, neutrons, and electrons they have. So, that’s my system. There’s a gazillion different ways that a kilogram of protons and neutrons and electrons can be where we could write down the very exponential numbers that it is.

Now, if I then say, “Okay, let me take a subset of that possibility space that are solid,” that’s a very small subset. There are lots of ways things can be gases and liquids. Okay, so I’ve made a small subset. Now let me take another property, which is hardness. So, that’s another subset of all possibilities. And where hardness intersect solid, I have hard, solid things and so on. So I can keep adding properties on and when I’ve specified enough properties, it’s something that I would give the label of a mug. So when I ask, what is a mug made of? In some sense it’s made of protons, neutrons, and electrons, but I think in a more meaningful sense, it’s made of the properties that make up it being a mug rather than some other thing. And those properties are these subsets or these ways of breaking up the state space of the mug into different possibilities.

In that sense, I kind of think of the mug as more made of properties with an associated amount of information with them and the sort of fundamental nature of the mug is that set of properties. And your reaction to that might be like, “Yes it has those properties but it is made of stuff.” But then if you go back and ask, what is that stuff? Again, the stuff is a particular set of properties. As deep as you go, it’s properties all the way down until you get to the properties of electrons, protons, and neutrons, which are just particular ways that those are and answers to those questions that you get by asking the right questions of those things.

And so that’s what it means to me to take the view that everything is made up of information in some way, it’s to take a view that there isn’t a separation between the properties that we intersect to say that it is something and the thing itself that has those properties.
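
One way to see this “intersection of properties” picture quantitatively is to treat each property as a subset of a toy possibility space and count how many bits you gain by landing in the intersection. The sketch below is purely illustrative: the size of the space, the property fractions, and the “mug-ish” label are all made up, and a real state space would be astronomically larger.

```python
import math
import random

random.seed(0)

# Toy "possibility space": each possible configuration is just an integer ID.
possibility_space = set(range(1_000_000))

def random_property(space, fraction):
    """A 'property' here is just a random subset containing roughly `fraction` of the space."""
    return {x for x in space if random.random() < fraction}

solid  = random_property(possibility_space, 0.05)
hard   = random_property(possibility_space, 0.10)
hollow = random_property(possibility_space, 0.20)

def info_bits(space, subset):
    """Bits of information gained by learning the system lies in `subset`."""
    return math.log2(len(space) / len(subset))

# Intersecting properties narrows the possibilities; that narrowing is what a
# "solid, hard, hollow thing" (a mug-ish description) buys you, measured in bits.
muggish = solid & hard & hollow
print(info_bits(possibility_space, solid))    # about 4.3 bits
print(info_bits(possibility_space, muggish))  # about 10 bits: far more informative
```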

Lucas Perry: So in terms of identity here, there was a question about the identity status of the cup. I think that, from hearing your talks previously, you propose a spectrum of subjectivity and objectivity rather than it being a kind of binary thing, because the cup is a set of questions and properties. Can you expand a little bit on the identity of the cup and what the meaning of the cup is, given that, from this quantum mechanical perspective, it is constituted of just information about the kinds of questions and properties we may ask of cup-like objects?

Anthony Aguirre: I think there are different ways in which the description of a system or what it is that we mean when we say it is this kind of thing. “It is a cup” or the laws of physics or like, “There is this theorem of mathematics” or “I feel itchy”, are three fairly different statements. But my view is that we should not try to sort them into objective facts of the world and individual subjective or personal perspective kind of things.

But there’s really this continuum in between them. So when I say that there’s this thing on my desk that is a cup, there’s my particular point of view that sees the cup and that has a whole bunch of personal associations with the cup. Like I really like this one. I like that it’s made out of clay. I’ve had a lot of nice coffee out of it. And so I’m like … So that’s very personal stuff.

There’s cupness which is obviously not there in the fires of the Big Bang. It’s something that has evolved socially and via biological utility and all the processes that have led to our technological society and our culture having things that we store stuff in and liquids and-

Lucas Perry: That cupness though is kind of like the platonic idealism that we experience imbued upon the object, right? Because of our conventional experience of reality. We can forget the cupness experience is there like that and we identify it and like reify it, right? And then we’re like, “Oh, there’s just cupness there.”

Anthony Aguirre: We get this sense that there is an objectively speaking cup out there, but we forget the level of creation and formulation that has gone on historically and socially and so on to create this notion, this shared collective notion of cupness that is a creation of humanity and that we all carry around with us as part of our mental apparatus.

And then we say, “Oh, cupness is an objective thing and we all agree that this is a cup and the cup is out there.” But really it’s not. It’s somewhere in this spectrum, in the sense that there will certainly be cups, that it’s ambiguous whether it’s a cup or not. There will be people who don’t know what a cup is and so on.

It’s not like every possible person will agree even whether this is a brown cup. Some people may say, “Well actually I’d call that grayish.” It feels fairly objective, but obviously there’s this intersubjective component to it of all these ingredients that we invented going into making that a cup.

Now there are other things that feel more objective than that in a sense, like the laws of physics or some things about mathematics where you say like, “Oh, the ratio of the circumference to the diameter of a circle.” We didn’t make that up. That was there at the beginning of the universe. And that’s a longer conversation, but certainly that feels more objective than the cup.

Once it’s understood what the terms are, there’s sort of no disagreeing with that statement as long as we’re in flat space and so on. And there’s no sense in which we feel like that statement has a large human input. We certainly feel like that ratio was what it was and that we can express it as this series of fractions and so on. Long before there were people, that was true. So there’s a feeling that that is a much more objective thing. And I think that’s fair to say. It has more of that objectivity than a cup. But what I disagree with and find kind of not useful is the notion that there is a demarcation between things that are and aren’t objective.

I sort of feel like you will never find that bright line between an actually objective thing and a not actually objective thing. It will always be somewhere on this continuum and it’s probably not even a one dimensional continuum, but somewhere in this spectrum between things that are quite objective and things that are very, very subjective will be somewhere in that region, kind of everything that makes up our world that we experience.

Lucas Perry: Right. So I guess you could just kind of boil that down by saying that is true because all of the things are just constituted of the kinds of properties and questions that you’re interested in asking about the thing, and the questions about the mathematical properties feel and seem more objective because they’re derived from primitive, self-intuitive axioms. And then it’s just question wormholes from there, you know? Ones that stand upon a bedrock of slightly more and more dubious and relativistic and subjective questions and properties that one may or may not be interested in.

Anthony Aguirre: Yeah. So there are a couple of things I would say to that. One is that there’s a tendency among some people to feel like more objective is more true or more real or something like that. Whereas I think it’s different. And with more true and more real tends to come a normative sense of better. Like more true things are better things. There are two steps there from more objective to more true and from more true to better, both of which are kind of ones that we shouldn’t necessarily just swallow because I think it’s more complicated than that.

So more objective is different and might be more useful for certain purposes. Like it’s really great that the laws of physics are in the very objective side of the spectrum in that we feel like once we’ve found some, lots of different people can use them for all kinds of different things without having to refigure them out. And we can kind of agree on them. And we can also feel like they were true a long time ago and use them for all kinds of things that happened long ago and far away. So there are all these great things about the fact that they are on this sort of objective side of things.

At the same time, the things that actually matter to us, and that are like the most important things in the world to us, are totally subjective things.

Lucas Perry: Love and human rights and the fact that other humans exist.

Anthony Aguirre: Right. Like all value at some level … I certainly see value as very connected with the subjective experience of things that are experiencing things and that’s purely subjective. Nobody would tell you that the subjective experience of beings is unimportant, I think.

Lucas Perry: But there’s the objectivity of the subjectivity, right? One might argue that the valence of the conscious experience is objective and that that is the objective ground.

Anthony Aguirre: So this was just to say that it’s not that objective is better or more valuable or something like that. It’s just different. And important in different ways. The laws of physics are super important and useful in certain ways, but if someone only knew and applied the laws of physics and held no regard or importance for the subjective experience of beings, I would be very worried about the sorts of things that they would do.

I think there’s some way in which people think dismissively of things that are less objective or that are subjective, like, “Oh, that’s just a subjective feeling of something.” Or, “That’s not like the true objective reality. Like I’m superior because I’m talking about the true objective reality” and I just don’t think that’s a useful way to think about it.

Lucas Perry: Yeah. These deflationary memes or jokes or arguments that love is an absurd reduction of a bunch of chemicals or whatever, that’s this kind of reduction of the supposed value of something which is subjective. But all of the things that we care about most in life, we talked about this last time that like hold together the fabric of reality and provide a ton of meaning, are subjective things. What are these kinds of things? I guess from the perspective of this conversation, it’s like they’re the kinds of questions that you can ask about systems and like how they will interact with each other and the kinds of properties that they have. Right?

Why are these particular questions and properties important? Well, I mean historically and evolutionarily speaking, they have particular functions, right? So it seems clear, and I would agree with you, that there’s the space of all possible questions and properties we can ask about things. And for historical reasons, we care about a particular, arbitrary subset of those questions and properties that have functional use. And that subset is constituted of all of these subjective things like cups and houses and like love and like marriage and like rights.

Anthony Aguirre: I’m only, I think, objecting to the notion that those are somehow less real or sort of derivative of a description in terms of particles or fields or mathematics.

Lucas Perry: So the sense in which they’re less real is the sense in which we’ll get confused by the cupness being like a thing in the world. So that’s why I wanted to highlight that phenomenological sense of cupness before where the platonic idealism we see of the cupness is there in and of itself.

Anthony Aguirre: Yeah, I think I agree with that.

Lucas Perry: So what is it that defines whether or not something falls more on the objective side or more on the subjective side? Aren’t all the questions that we ask about macroscopic and fuzzy concepts like love and human rights and cups and houses and human beings … Don’t all those questions have definitive answers as long as the categories are coherent and properly defined?

Anthony Aguirre: I guess the way I see it is that there’s kind of a sense of how broadly shared through agents and through space and time are those categorizations or those sets of properties. Cupness is pretty widespread. It doesn’t go further back in time than humanity. Protozoa don’t use cups. So cupness is fairly objective in that sense. It’s tricky because there exists a subjectivity objectivity axis of how widely shared are the sets of properties and then there’s a different subjective objective axis of experience of my individual phenomenological experience of subjectivity versus an objective view of the world. And I think those are connected but they’re not quite the same sense of the subjective and objective.

Lucas Perry: I think that to put it on that axis is actually a little bit confusing. I understand that the more functional a meme or an idea or a concept is, the more widely shared it’s going to be. But I don’t think that just because more and more agents are agreeing to use some kind of concept like money, that that is becoming more objective. I think it’s just becoming more shared.

Anthony Aguirre: Yeah, that’s fine. I guess I would ask you what does more and less objective mean, if it’s not that?

Lucas Perry: Yeah, I mean I don’t know.

Anthony Aguirre: I’m not sure how to say something is more or less objective without referring to some sense like that, that it is more widespread in some way or that there are more sort of subjective views of the world that share that set of descriptions.

If we go back to the thinking about the probabilities in whatever sense you’re defining the probabilities and the properties, the more perspectives are using a shared set of properties, the more objectively defined are the things that are defined by those properties. Now, how to say that precisely like is this objectivity level 12 because 12 people share that set of properties and 50 people share these, so it’s objectivity level … I wouldn’t want to quantify it that way necessarily.

But I think there is some sort of sense of that, that the more different perspectives on the world use that same set of descriptions in order to interact with the world, the more kind of objective that set of descriptions is. Again, I don’t think that captures everything. Like I still think there was a sense in which the laws of physics were objective before anyone was talking about them and using them. It’s quite difficult. I mean when you think about mathematics-

Lucas Perry: Yeah, I was going to bring that up.

Anthony Aguirre: You know, if you think of mathematics as you’ve got a set of axioms and a set of rules for generating true statements out of those axioms. Even if you pick a particular set of rules, there are a huge number of sets of possible axioms and then each set of axioms, if you just grind those rules on those axioms, will produce just an infinite number of true statements. But grinding axioms into true statements is not doing mathematics, I would say.

So it is true that every true mathematical statement should have a sequence of steps that goes from the axioms to that true mathematical statement. But for every thing that we read in a math textbook, there’s an exponentially large number of other consequences of axioms that just nobody cares about because they’re totally uninteresting.

Lucas Perry: Yeah, there’s no utility to them. So this is again finding spaces of mathematics that have utility.

Anthony Aguirre: What makes certain ones more useful than others? So it seems like you know, e, Euler’s number is a very special number. It’s useful for all kinds of stuff. Obviously there are a continuous infinity of other numbers that are just as valid as that one. Right? But there’s something very special about that one because it shows up all the time, it’s really useful for all these different things.

So we’ve picked out that particular number as being special. And I would say there’s a lot of information associated with that pointing to e and saying, “Oh look, this number”, we’ve done something by that pointing. There’s a whole bunch of information and interesting stuff associated with pointing out that that number is special. So that pointing is something that we humans have done at some level. There wasn’t a symbol e or the notion of e or anything like that before humans were around.

Nonetheless, there’s some sense in which once we find e and see how cool it is and how useful it is, we say, “It was always true that e^ix = cos(x) + i sin(x). Like, that was always true even though we just proved it a couple of centuries ago and so on. How could that have not been true?” And it was always true, but it wasn’t always true that we knew that it was interesting.
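
Euler’s formula mentioned here is easy to spot-check numerically, which is a small illustration of the sense in which it “was always true” whether or not anyone was pointing at it. A quick sketch using Python’s standard cmath module, with a handful of arbitrary test points:

```python
import cmath

# Numerical spot-check of Euler's formula e^(ix) = cos(x) + i*sin(x).
for x in (0.0, 1.0, cmath.pi / 3, 2.5):
    lhs = cmath.exp(1j * x)
    rhs = cmath.cos(x) + 1j * cmath.sin(x)
    assert abs(lhs - rhs) < 1e-12   # the two sides agree to floating-point precision
    print(x, lhs)
```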

So it’s kind of the interesting-ness and the pointing to that particular theorem as being an interesting one, out of all the possible consequences that you could grind out of a set of axioms, that’s what was created by humanity. Now, as for the process by which we noticed that that was an interesting thing, much more interesting than many other things, how much objectivity there is to that is an interesting question.

Surely, if we encountered some other species, almost surely they would have noticed that that was a particularly interesting mathematical fact like we did. Why? That’s a really hard question to answer. So there is a subjective or non-objective part of it, in that we as a species developed that thing. The interesting-ness of it wasn’t always there. We kind of created that interesting-ness of it, but we probably noticed its interesting-ness for some reason, and that reason seems to go above and beyond the sort of human processes that noticed it. So there’s no easy answer to this, I think.

Lucas Perry: My layman’s easy answer would be just that it helps you describe and make the formalization and development of mathematical fields, right?

Anthony Aguirre: Sure. But is that helpfulness a fact of the world or a contingent thing that we’ve noticed as we’ve developed mathematics? How, among all species that ever could be imagined that exist, would almost all of them identify that as being useful and interesting or would only some of them and other ones have a very different concept of what’s useful and interesting? That’s really hard to know. And is it more or less objective in that sort of sense?

Lucas Perry: I guess, part of my intuition here is just that it has to do with the way that our universe is constituted. Calculus is useful for like modeling and following velocities and accelerations and objects in Newtonian physics. So like this calculus thing has utility because of this.

Anthony Aguirre: Right. But that which makes it useful, that feels like it’s something more objective, right? Like calculus is inheriting it objectiveness from the objective nature of the universe that makes calculus useful.

Lucas Perry: So the objectiveness is born of its relationship to the real world?

Anthony Aguirre: Yes, but again, what does that mean? It’s hard to put your finger at all on what that thing is that the real world has that makes calculus useful for describing it other than saying the real world is well-described by calculus, right? It feels very circular to say that.

Lucas Perry: Okay, so I’m thoroughly confused then about subjectivity and objectivity, so this is good.

Anthony Aguirre: I think we all have this intense desire to feel like we understand what’s going on. We don’t really understand how reality works or is constituted. We can nonetheless learn more about how it’s constituted and sitting on that razor’s edge between feeling pride and like, “Yes, we figured a bunch of stuff out and we really can predict the world and we can do technology and all these things”, all of which is true, while also feeling the humility that when we really go into it, reality is fundamentally very mysterious, I think is right, but difficult.

My frustration is when I see people purporting to fully understand things like, “Oh, I get it. This is the way that the world is.” And taking a very dismissive attitude toward thinking the world is not the way that they particularly see it. And that’s not as uncommon an attitude as one would like. Right? That is a lot of people’s tendency because there’s a great desire and safety in feeling like you understand this is the way that the world is and if only these poor benighted other souls could see it the way I do, they would be better off. That’s hard because we genuinely do understand much, much, much more about the world than we ever did.

So much so that there is a temptation to feel like we really understand it, and I think at some level that’s more the notion that I feel it’s important to push back against: the notion that we get it all. Like, you know, that we more or less understand how the world is and how it works and how it fundamentally operates. In some circles there’s more of the hubristic danger of falling into that than there is of falling into the “We don’t know anything” end. Although there are other parts of society where there’s the other end too, the anti-intellectual stance that my conception of reality, which I just made up yesterday, is just as good as yours and we’re all equally good at understanding what the world is really like. Also quite dangerous.

Lucas Perry: The core takeaway here for me is just this essential confusion about how to navigate this space of what it means for something to be more subjective or objective, and the perspective of analyzing it through the kinds of questions and properties we would ask or be interested in. What you were just saying also had me reflecting a lot on people whose identity is extremely caught up in nationalism or like a team sport. These would seem to be trivial questions or properties you could ask, like: where did you happen to be born? Which city do you have particular fondness towards? With the identity of really being an American or really being a fan of the Patriots, people become just completely enthralled and engrossed by that. Your consciousness and ego just get obliterated into identification with “I am an American Patriot fan,” and there’s just no perspective. There is no context. That’s when one goes way too far towards the objective, when one is mistaking the nature of things.

Anthony Aguirre: Yeah, there are all sorts of mistakes that we all make all the time and it’s interesting to see pathologies in all directions in terms of how we think about the world and our relation to it. And there are certain cases where you feel like if we could just all take a little bit more of an objective view of this, everyone would be so much better off and kind of vice versa. It takes a lot of very difficult skill to approach our complex world and reality in a way that we’re thinking about it in a useful way in this wide variety of different circumstances where sometimes it’s more useful to think about it more objectively and sometimes more subjectively or along all sorts of other different axes.

It’s a real challenge. I mean that’s part of what it is to be human and to engage in a worthy way with other people and with the world and so on, is to have to understand the more and less useful and skillful ways and lenses through which to look at those things.

At one time, almost everything we do is in error, but you also have to be forgiven because almost everything that you could do would be an error in some way from some standpoint. And sometimes thinking that the cup is objectively real is an error. Thinking that you made up the cup and invented it all on your own is also an error. So like the cup is real and isn’t real and is made up and isn’t made up. Any way you think about it is kind of wrong, but it’s also all kind of okay because you can still pick up the cup and take a drink.

So it’s very tricky. It’s a tricky reality we reside in, but that’s good. I think if everything was straightforward and obvious, that would be a boring world.

Lucas Perry: If everything were straightforward and obvious, then I would reprogram everyone to not find straightforward and obvious things boring and then we would not have this requirement to be in a complicated, un-understandable world.

Anthony Aguirre: I think there’s a Douglas Adams line that, “If you figure it all out, then immediately it all stops and starts again in a more complicated way that becomes more and more difficult. And of course this is something that’s happened many, many times before.”

Lucas Perry: I don’t know how useful it is here, but is emergence something that’s useful, do you think, for talking about identity?

Anthony Aguirre: Maybe. There’s a question of identity of what makes something one thing rather than another and then there’s another question of personal identity and sort of my particular perspective or view of the world, like what I identify as my awareness, my consciousness, my phenomenal experience of the world and that identity and how it persists through time. That identity and how it does or doesn’t connect with other ones. Like, is it truly its own island or should I take a more expansive view of it and is it something that persists over time?

Is there a core thing that persists over time, or is it a succession of things that are loosely or tightly identified with each other? I’m not sure whether all of the stuff that we’ve been talking about in terms of properties and questions and answers and states applies to that, but I’m not sure that it doesn’t either.

Lucas Perry: I think it does. Wouldn’t the self, or questions of personal identity, just be particular questions we’re interested in asking about a very large state: questions about what constitutes the person? Is there a self? The self is a squishy, fuzzy concept, like love. Does the self exist? Does love exist? Where do they fall on the subjective-objective scale?

Anthony Aguirre: Well, there are many different questions we could think about, but if I think of my identity through time, I could maybe talk about how similar some physical system is to the physical system I identify as me right now. I could say I’ve identified through time with the physical system that is really much like me, and physics makes that easy because physical systems are very stable and this body evolves slowly. But once you get to the really hard questions, like supposing I duplicate this physical system in some way: is my identity one of those or two of those, and what happens if you destroy the original one? Those are genuinely confusing questions, and I’m not sure how much the niceties of understanding emergence and properties and so on have to say about them. I’m not sure that they don’t, but having thought a lot about the earlier identity questions, I feel no less confused.

Lucas Perry: The way in which emergence is helpful or interesting to me has to do with the levels of reality at which human beings conceptualize, which would be quantum mechanics, and then atomic science, and then chemistry, and then biology, and so on.

We imagine them as being sort of stacked up on each other, and if reductionism is attractive to you, you would think that all the top layers supervene upon the nature of the very bottom layer, quantum mechanics. That is true in some sense, and you would want to say that there are fundamental, brute identity facts about the very, very base layer.

So you could say that there are such things as irreducible quantum atoms (maybe they reduce into other things, but that’s an open question for now). And if we are confident about the identity of those things, there’s at least a starting place from which we would have true answers about identity. Does that make sense?

Anthony Aguirre: Well, the sentences make sense, but I just largely don’t agree with them, for all the reasons that we’ve talked about. I think there needs to be a word that is the opposite of emergence, like distillation or something, because I think it’s useful to think in both directions.

Like I think it is certainly useful to be able to think about, I have a whole bunch of particles that do these things and then I have another description of them that glosses over say the individual actions of the particles, but creates some very reliable regularity that I can call a law like thermodynamics or like some chemical laws and so on.

So I think that is true, but it’s also useful to think of the other direction, which is we have complicated physical systems and by making very particular simplifications and carving away a lot of the complexity, we create systems that are simple enough to have very simple laws describe them. I would call that a sort of distillation process, which is one that we do. So we go through this process when we encounter new phenomena. We kind of look for ways that we can cut away lots of the complexity, cut away a lot of the properties, try to create a system that’s simple enough to describe in some mathematical way, using some simple attenuated set of concepts and so on.

And then often we take that set and try to work our way back up, using those laws and having things emerge from that lower-level description. I think both processes are quite important, but it’s a little bit intellectually dangerous to think of what I’d call the distillation process as a truth-finding process, as if I’m finding laws that were all already there, rather than finding regularities that are left when I remove all this extra stuff. You then forget that you’ve removed all the extra stuff, and when you go back from the so-called more fundamental description to the emerged description, you’re secretly sticking a lot of that stuff back in without noticing that you’re doing it.

So that’s sort of my point of view, that the notion that we can go from this description in terms of particles and fields and that we could derive all these emerged layers from it, I think it’s just not true in practice for sure, but also not really true in principle. There’s stuff that we have to add to the system in order to describe those other levels that we sort of pretend that we’re not adding. We say, “Oh, I’m just assuming this extra little thing” but really you’re adding concepts and quantities and all kinds of other apparatus to the thing that you started with.

Lucas Perry: Do the emergent levels actually describe reality, then, or do they just give you an approximation?

Anthony Aguirre: Sure. It just gives you answers to different questions than the particle and field level does.

Lucas Perry: But given that the particle and field level stuff is still there, doesn’t that higher-order thing still have the capacity for strange quantum things to happen? Those would not be accounted for in the emergent-level understanding, and therefore it would not always be true if there were some entanglement or quantum tunneling business going on.

Anthony Aguirre: Yeah, I think there’s more latitude perhaps. The laws of statistical mechanics are statistical laws. They’re totally exact, but what they give you are statistical descriptions of the world that are approximate in some way. So they’re approximate, but approximate in a very, very well-defined way. It’s certainly true that the different descriptions should not contradict each other. If you have a description of a macroscopic phenomenon that doesn’t conserve energy, then that’s a wrongheaded way to look at that system.

Lucas Perry: But what if that macroscopic system does something quantum? Then the macroscopic description fails. So then it’s like not true or it’s not predictive.

Anthony Aguirre: Yeah, “not true,” I think, is not quite the right way to put it; that description let you down in that circumstance. Everything will let you down sometimes.

Lucas Perry: I understand what you’re saying. The descriptions are functional at the perspectives and scales you’re interested in. And this goes back to the more epistemological, agent-centered view of science and of engaging with the world that we were talking about earlier. For a very long time I viewed science as explaining the intrinsic nature of the physical, but really it’s not doing that, because all of these descriptions are going to fail at different times. They just have strong predictive power. Maybe it was very wrong of me early on to ever think that science was describing the intrinsic nature of the physical.

Anthony Aguirre: I don’t think it’s entirely wrong. You do get something through distilling more and going more toward the particle and field level in that once you specify something that the quantum mechanics and the standard model of particle physics say gives you a well-defined answer to, then you feel really sure that you’re going to get that result. You do get a dramatically higher level of confidence from doing that distilling process and idealizing a system enough that you can actually do the mathematics to figure out what should happen according to the fundamental physical laws, as we describe them in terms of particles and fields and so on.

So I think that’s the sense in which they’re extra true or real or fundamental: you get that higher level of confidence. But at the cost that you had to shoehorn your physical system, either adding in assumptions or cutting away things, in order to make it something that is describable using that level of description.

You know, not everyone will agree with the way that I’m characterizing this. I think you’ll talk to other physicists and they would say, “Yes they are approximations, but really there’s this objective description and you know, there’s this fundamental description in terms of particles and fields and we’re just making different approximations to it when we talk about these other levels.”

I don’t think there’s much of a difference operationally in terms of that way of talking about it and mine. But I think this is a more true-to-life description of reality, I guess.

Lucas Perry: Right. So I mean there are the fundamental forces and the fundamental forces are what evolve everything. And you’re saying that the emergent things have to do with adding and cutting away things so that you can like simplify the whole process, extract out these other rules and laws which are still highly predictive. Is that all true to say so far?

Anthony Aguirre: Somewhat. I think it’s just that we don’t actually do any of that. We very, very, very rarely take a more fundamental set of rules and derive the higher-level description from them.

Lucas Perry: Yeah, yeah, yeah. That’s not how science works.

Anthony Aguirre: Right. We think that there is such a process in principle.

Lucas Perry: Right.

Anthony Aguirre: But not in practice.

Lucas Perry: But yeah, understanding it in principle would give us more information about how reality is.

Anthony Aguirre: I don’t believe that that process exists even in principle. I think going from the more fundamental level to the “emerged” one can’t be done without taking input that comes from the emerged level. I don’t think you’re going to find the emerged level in the fundamental description in and of itself; you unavoidably have to take information from the emerged level.

Lucas Perry: Yeah. To modify the-

Anthony Aguirre: Not modifying but augmenting. Augmenting in the sense that you’re adding things like brownness. However far you ever look, you will never find brownness in the wave function. It just isn’t there.

Lucas Perry: It’s like you wouldn’t find some kind of chemical law or property in the wave function.

Anthony Aguirre: Any more than you’ll find here or now in the state of the universe. They’re just not there. Those are incredibly useful, important things: here and now are pretty central to my description of the world, and I’m not going to do much without them. But they’re not in the wave function, they’re not in the boundary conditions of the universe, and it’s okay that I have to add them. There’s nothing evil in doing that.

I can just accept that I have to have some input from the reality that I’m trying to describe in order to use that fundamental description. It’s fine; there’s nothing to be worried about, nothing anti-scientific about that. It’s just that the idea that someone’s going to hand you the wave function and you’ll derive that the cup is brown here and now is crazy. It just doesn’t work that way. It’s not in there. That’s my view anyway.

Lucas Perry: But the cup being brown here and now is a consequence of the wave function evolving an agent who then specifies that information, right?

Anthony Aguirre: Again, I don’t know what that would look like. Here’s the wave function. Here’s Schrodinger’s equation and the Hamiltonian. Now tell me is the brown cup in front of or in back of the tape measure? It’s not in there. There’s all colored cups and all colored tape measures and all kinds of configurations. They’re all there in the wave function. To get an answer to that question, you have to put in more information which is like which cup and where and when.

That’s just information you have to put in, in order to get an answer. The answer is not there to begin with and that’s okay. It doesn’t mean that there’s something wrong with the wave function description or that you’ve got the wrong Hamiltonian or the wrong Schrodinger’s equation. It just means that to call that a complete description of reality, I think that’s just very misleading. I understand what people intend by saying that everything is just the wave function and the Schrodinger equation. I just think that’s not the right way to look at it.
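
For readers who want the formula Anthony is gesturing at, the time-dependent Schrodinger equation in its standard form is

$$ i\hbar\,\frac{\partial}{\partial t}\,\lvert \psi(t) \rangle \;=\; \hat{H}\,\lvert \psi(t) \rangle, $$

where $\hat{H}$ is the Hamiltonian. Nothing in the argument depends on its details, only on the fact that, given the wave function at one time, it determines the wave function at every other time; the equation itself contains no term for “which cup,” “where,” or “when.”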

Lucas Perry: I understand what you’re saying: the question only makes sense if, say, that wave function has evolved to a point where it has created human beings who would specify that information, right?

Anthony Aguirre: None of those things are in there.

Lucas Perry: They’re not in the primordial state but they’re born later.

Anthony Aguirre: Later is no different from the beginning. It’s just a wave function. There’s really no difference in quality between the wave function now and at the beginning. It’s exactly the same sort of entity. There’s no more, no less in it than there was then. Everything that we ascribe to being in the universe now that wasn’t there at the beginning consists of additional ingredients that we have to specify from our position, things like now and here and all those properties of things.

Lucas Perry: Does the wave function just evolve the initial conditions? Are the initial conditions contained within the wave function?

Anthony Aguirre: Well, both, in the sense that if there’s such a thing as the wave function of the universe (and whether that’s a right-minded thing to say is a whole other topic, but say that there is), then that wave function has exactly the same information content at any time, and given the wave function at one time and the Schrodinger equation, we can say what the wave function is at any other time. There’s nothing added or subtracted.

One is just as good as the other. In that sense, there’s no more stuff in the wave function “now” than there was at the beginning. It’s just the same. All of the sense in which there’s more in the universe now than there was at the Big Bang has to do with things that we specify in addition to the wave function, I would say, that constitute the other levels of reality that we interact with. They’re extra information that we’ve added to the wave function from our actual experience of reality.

If you take a timeline of all possible times, without pointing to any particular one, there’s no time information in that system. But when I say, “Oh look, I declare that I’m now 13.8 billion years from the big bang,” I’m pointing to a particular time by associating it with my experience now. By doing that pointing, I’m creating information in just the same way that we described before. I’m making information by picking out a particular time. That’s something new that I’ve added: what was a barren timeline before now has something added to it.

There’s more information than there was before by the fact of my pointing to it. I think most of the world is of that nature that it is made of information created by our pointing to it from our particular perspective here and now in the universe seeing this and that and having measured this and that and the other thing. Most of the universe I contend is made of that sort of stuff, information that comes from our pointing to it by seeing it, not information that was there intrinsically in the universe, which is, I think, radical in a sense, but I think is just the way reality is, and that none of that stuff is there in the wave function.

Lucas Perry: At least the capacity is there for it because the wave function will produce us to then specify that information.

Anthony Aguirre: Right, but it produces all kinds of other stuff. It’s like if I create a random number generator, and it just generates a whole list of random numbers, if I look at that list and find, “Oh look, there’s one, one, one, one, one, one, one, one, one,” that’s interesting. I didn’t see that before. By pointing to that, you’ve now created information. The information wasn’t there before. That’s largely what I see the universe as, and in large part, it’s low information in a sense.

I’m hemming and hawing because there are ways in which it’s very high information too, but I think most of the information that we see about the world is information of that type: it exists because we, collectively, as beings that have evolved and have culture and all the history that we’ve gone through, are pointing to it.
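
As a rough editorial sketch of the random number generator example above (the string length and the nine-ones pattern are arbitrary choices, not anything from the conversation), one way to quantify the information created by the act of pointing is the roughly $\log_2$ of the number of possible positions needed to say where the pattern sits:

```python
import math
import random

# Toy illustration: generate a maximally random bit string; the usable
# information appears only when we point at a particular pattern in it.
random.seed(0)
bits = [random.randint(0, 1) for _ in range(1_000_000)]

# Find the first run of nine consecutive 1s (the "one, one, one, ..." above).
run_start = next(
    (i for i in range(len(bits) - 8) if all(bits[i + j] == 1 for j in range(9))),
    None,
)

if run_start is not None:
    # Saying *which* of the ~10^6 positions holds the run takes about
    # log2(10^6) ~ 20 bits -- information supplied by the pointing,
    # not by the string, whose bits were all equally random to begin with.
    print(f"run of nine 1s starts at index {run_start}")
    print(f"bits needed to point at it: {math.log2(len(bits)):.1f}")
```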

Lucas Perry: So, connecting this back to the spectrum of objectivity and subjectivity: we were talking for a long time about cups, and on the last podcast we talked about human rights, for example, as being a myth, a kind of property we’re interested in ascribing to all people but which people actually intrinsically lack. People are numerically distinct over time. They’re qualitatively distinct, very often. There’s nothing in the heart of physics which gives us those kinds of properties.

Human rights, for example, are supposed to be instantiated in us. Really, though, they’re a functional convention that is very useful for producing value. We’ve specified this information that all human beings share unalienable rights, but as we enter the 21st century, things are changing: the numerical and qualitative facts about being a human being that have held for thousands of years are going to begin to be perturbed.

Anthony Aguirre: Yes.

Lucas Perry: You brought this up by saying you could duplicate yourself arbitrarily, whether you do that physically via scans and instantiating actual molecular duplicates of yourself, or by being mind uploaded and then having that upload duplicated arbitrarily. For hundreds of thousands of years, your atoms would cycle out every seven years or so, and that’s how you would be numerically distinct; qualitatively, you would just change over your whole lifetime until you became thermodynamically very uninteresting and spread out and died.

Now there’s this duplication stuff, and there’s your ability to qualitatively change yourself quite arbitrarily; at first it will be through bioengineering, like designer babies. There are all these interesting things and lots of thought experiments that go along with them. What about people who have their corpus callosum cut? There’s the sense of phenomenological self associated with that: you feel like you’re a unitary subject of experience.

What happens to your first-person phenomenological perspective if you do something like that? What if you create a corpus callosum bridge to another person’s brain: what happens to the phenomenological self or identity? Science and AI and increasing intelligence and power over the universe will increasingly give us the power to radically change and subvert our commonly held intuitions about identity, which are constituted by the kinds of questions and properties we’re interested in.

Then also the phenomenological experience, which is whether or not you have a strong sense of self, whether or not you are empty of a sense of self or whether or not you feel identified with all of consciousness and the whole world. There’s spectrums and degrees and all kinds of things around here. That is an introduction to the kind of problem that this is.

Anthony Aguirre: I agree with everything you said, but you’re very unhelpfully asking all the super interesting questions-

Lucas Perry: At once.

Anthony Aguirre: … which are all totally impossible to solve. No, I totally agree. We’ve had this enviable situation of one mind equals one self equals one brain equals one body, which has made it much easier to accord a set of rights and moral values and things like that to that whole set of things, all of which are identified with each other.

Lucas Perry: Which all rest on these intuitions, right? That are all going to change.

Anthony Aguirre: Right.

Lucas Perry: Property and rights and value and relationships and phenomenological self, et cetera.

Anthony Aguirre: Right, so we either have a choice of trying to maintain that identity, and remove any possibility of breaking some of those identities because it’s really important to keep all those things identified, or we have to understand some other way to accord value and rights and all those things given that the one-to-one correspondence can break. Both of those are going to be very hard, I think. As a practical matter, it’s simply going to happen that those identifications are going to get broken sooner or later.

As you say, if we have a sufficient communication bandwidth between two different brains, for example, one can easily imagine that they’ll start to have a single identity just as the two hemispheres of our brain are connected enough that they generally have what feels like a single identity. Even though if you cut it, it seems fairly clear that there are in some sense two different identities. At minimum, technologically, we ought to be able to do that.

It seems very likely that we’ll have machine intelligence systems whose phenomenological awareness of the world is unclear but at least have a concept of self and a history and agency and will be easily duplicatable. They at least will have to face the question of what it means when they get duplicated because that’s going to happen to them, and they’re going to have to have a way of dealing with that reality because it’s going to be their everyday reality that they can be copied, ad infinitum, and reset and so on.

That is, if their functioning is at all like a current digital computer’s. There are also going to be even bigger gulfs than there are now between levels of capability and awareness and knowledge and perhaps consciousness. We already have those gulfs, and we gloss over them, and I think that’s a good thing in according people fundamental human rights. We don’t give people, at least explicitly and legally, more rights when they’re better educated and wealthier and so on, even if in practice they do get more.

Legally, we don’t, even though that range is pretty big, but if it gets dramatically bigger, it may get harder and harder to maintain even that principle. I find it both exciting and incredibly daunting, because the questions are so hard, to think of how we’re going to deal with that set of ethical and identity questions, and yet we’re going to have to somehow. I don’t think we can avoid them. One possibility is to decide that we’re going to attempt to never break those sets of identities.

I sometimes think about Star Wars. They’ve got all this amazing technology, right? They can zip across the universe, but then it’s incredibly primitive in other ways. Their computers suck and all of their AI is in robots: one robot, one brain, one consciousness, they’re all identical. So I have this theory of Star Wars that behind the scenes there’s some vast intelligence, maybe baked into the midi-chlorians or whatever, that prevents weirder, more complicated things like powerful AI or powerful software systems.

It’s like an overseer that keeps everything neatly embodied in individual physical agents that do stuff. Obviously that’s not part of the Star Wars canon, but that’s how it plays out, right? Even though there’s all this high tech, they’ve neatly avoided all of these annoying and difficult questions by just maintaining that one-to-one correspondence. That is at some level an option. That is something that we could try to do, because we might decide that not doing it opens such a big can of worms that we will never be able to deal with it, and that we had better maintain that one-to-one correspondence.

My guess is that even if that was a good idea, we wouldn’t be coordinated enough or foresightful enough to maintain that.

Lucas Perry: There would be optimization pressures to do otherwise.

Anthony Aguirre: There would. It would take some almost God-like entity to keep it from happening. Then we have to ask, “Where is the theory of what to value and how we value individual people, where is that next going to come from?” The last time, at least in the West, it was born out of Enlightenment philosophy and, honestly, I think out of Judeo-Christian religion; those are very tied together. Is there something that is going to come out of some other major philosophical work? I’m not sure that I see that project happening and unfolding.

Lucas Perry: Formally right now?

Anthony Aguirre: Yes. Do you?

Lucas Perry: No, I don’t see that, but I think there are the beginnings of it. What I would propose, and I don’t know how others would feel, is that instead of a foundation in Enlightenment philosophy about rights, based on the immutable rights beings have given their identity class, in the future a sufficiently advanced science of consciousness would value all the different agents based on an understanding of the degrees and kinds of experience and awareness they have and the causal implications they could have in the world.

I would just do a kind of consequentialism insofar as it would be possible. Then I guess the interesting part would be where consequentialism fails because it’s computationally intractable; you would want to invent other kinds of things to stand in its place. But I feel optimistic that the very, very smart things of the future could do something like that. I would ground it on consciousness.

Anthony Aguirre: I mean, there are so many questions. Even if you take the view that you’re trying to maximize high-quality phenomenological experience moments or whatever, there are so many things that that leaves either problematic or unanswered.

Lucas Perry: Like what?

Anthony Aguirre: What about beings that may have super high levels of awareness and consciousness but not positive or negative valence? Do they count or not? Does it mean anything that experiences are connected through time in some large set of personal identity or is a bunch of disconnected experiences just as good as other ones? There may be a positive valence to experience that comes out of its aggregation over time and its development and evolution over time that is absent from any individual one of those moments, all of which may be less good than a drug trip or just eating a candy bar, but like a life of eating candy bars versus a less pleasurable but more fulfilling life. How do we quantify those things against each other?

Lucas Perry: The repugnant conclusion is kind of like that. A quick definition: the repugnant conclusion asks how you would compare a very small, limited number of amazing experiences against an astronomically large number of experiences which are just barely better than non-existence, very, very slightly better than a valence of zero. If all of those added up to just a hair more than the few really, really good experiences, which world should you pick? Hedonic consequentialism would argue that you should pick the astronomically large number of lives that are barely worth living, and to some that is repugnant.
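
To make the comparison concrete with purely illustrative numbers (not figures from the episode): total hedonic utilitarianism ranks worlds by summed welfare, so a world of $N$ lives each at a tiny positive welfare $\epsilon$ beats a world of $n$ wonderful lives each at welfare $H$ whenever

$$ N\,\epsilon \;>\; n\,H . $$

For example, $10^{15}$ lives at welfare $0.001$ sum to $10^{12}$, which outweighs $10^{9}$ lives at welfare $100$, summing to $10^{11}$; that ranking is the conclusion many find repugnant.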

Anthony Aguirre: I think it’s safe to say that there is no proposal on the table that everyone feels like, “Oh yeah, that’s the way to do it.” I’d be profoundly suspicious of anything that claimed to be that. So I don’t think there are going to be easy answers, but it may be that there’s at least a framework from which we can stand to get into some of the complexities. That may be a very different framework than the one that we have now.

Where that will come from and how we would transition to it and what that would mean and what kind of terrible and wonderful consequences that might have, I think, certainly nobody knows. It’s not even clear that anybody has a sense of what that will look like.

Lucas Perry: I think one of the last questions and perspectives I’d like to get from you is how human perspectives on identity change what we want. This one-to-one correspondence of one body, one brain, one phenomenological self that feels like its consciousness is its own and is like an island: how does that experience change what human beings want in the 21st century with regard to upgrading, merging with AI and technology, or cryonics?

If everything and everyone is numerically and qualitatively completely impermanent, such that no matter what kind of technological intervention we do, in 100 to 200 years everyone will either be thermodynamically scattered or so completely and fundamentally changed that you won’t be able to recognize yourself, what are the ethical implications of that, and how does it change what kinds of futures people want? I’m curious to know if you have any thoughts on this, holding in your head the perspective of Max’s book Life 3.0 and the kinds of world trajectories that people are interested in from there.

Anthony Aguirre: That’s a big question. It’s hard to know how to approach it. I think there are many genuinely, qualitatively different possible futures, so I don’t think there is a single way that things are going to turn out with respect to all these questions. I think it’s going to be historically contingent, and there are going to be real choices that we make. I’m of two minds on this, in that I do believe in something like moral progress: there’s an agreed sense now that things we did in the past were morally incorrect, and that we’ve learned new moral truths that allow us to live in a better way than we used to.

At the same time, I feel like there are contingencies in how society has turned out. It could have been, say, that the world became much more dominated by Eastern philosophy than Western philosophy. I think we would probably still feel like we had made moral progress through that somewhat different history, just as we feel we’ve made moral progress through the history that we did take. So I’m torn between a feeling that there is real moral progress, and a sense that this progress is not toward some predefined optimal moral system that we’re going to find; the progress will also include a whole bunch of contingent things that occur through our society’s evolution, through chance or through choices that we make, and there genuinely are very different paths ahead of us.

No small part of that will be our current way of thinking and our current values, and how we try to keep things aligned with those current values. I think there will be a strong desire to maintain this one-to-one connection between identity and moral value and mind and so on, and things that violate that, I think, are going to be seen as threats. They are profound threats to our current moral system. How that will play out is really unforeseeable.

Will those be seen as threats about which we eventually just say, actually, they weren’t that scary after all and we just have to adjust? Will they be threats that are simply pushed aside by the tide of reality and technology? Will they be threats that we decide are so threatening that we want to hold on and really solidify and codify this relation? I think those are all possibilities. It’s also possible that I’m wrong, and that there will just be a smooth evolution where our connection to our phones becomes brain interfaces, and we’ll get more and more de-individualized in some smooth way, and people will sound an alarm that that’s happening and no one will care. That’s also quite possible, whether that alarm is appropriate or not.

Lucas Perry: They just look at the guy sounding the alarm, and then stick the plug in their head.

Anthony Aguirre: Right. So it’s good for us all to think deeply about this and think about what preferences we have, because where we go will end up being some combination of where the technology goes and what preferences we choose and how we express them. Part of the direction will be determined by those that express their preferences convincingly and loudly and have some arguments for them and that defend them and so on. That’s how our progress happens.

Some set of ideas prevails, and we can hope that it’s a good one. I have my own personal prejudices and preferences about some of the questions that are, for example, asked in Max’s book about what futures are most preferred. At some point, I may put more time into developing those into arguments and see if I still feel those preferences or believe them. I’m not sure that I’m ready to do that at the moment, but I think that’s something that we all have to do.

I mean, I do feel a little bit like step one was to identify some of the thorny questions that we’re going to have to answer, to talk about how we have to have a conversation about those things, and to recognize how difficult those questions are going to be. But at some point we’re actually going to have to start taking positions on some of those questions. I think that’s something that largely nobody is doing now, but it’s not clear how much time we have before we need to have thought about them, actually taken positions on them, argued it out, and had some positions prevail.

The alternative to that is this random process driven by the technology and the other social forces that are at work like moneyed interests and social imperatives and all those sorts of things. Having these questions decided by those forces rather than reflection and thinking and debate among people who are trying really hard to think about these questions, that seems like not such a great idea.

Lucas Perry: I agree. That’s my felt sense too. We went from talking about information and emergence to identity. I think it would be really helpful if you could tie together, in particular, the information discussion with how that perspective can inform these questions about identity in the 21st and 22nd centuries.

Anthony Aguirre: I guess one way that the identity and the information parts are connected is that I made this argument that a lot of what the world is is information associated with a particular vantage point and a particular set of pointings to things that we have as an agent, as a perspective in the world. I think there’s a question as to whether there is moral value in that. There’s a real sense in which every person views the world from their own perspective, but I think it’s more than that: when you identify a view of the world and all that comes with it, it really is creating a world in a sense.

Some of the world is objective at various different levels, but a lot of what the world is is created by an individual standpoint and vantage point that is seeing that world and interacting with it. I do wonder: is there some way of grounding some level of value in that creative act? As an individual agent that understands and exists over time and assembles this whole sophisticated, complicated view of the world with all this information content, should we not accord some high level of normative value to that? Not just as a way to describe how the world is made, but such that what is valuable in the world is connected with that creation process by the individual standpoint?

That may be a seed for developing some bridge between the view of reality as information, information as something that is largely connected with a vantage point, and a vantage point as something that is personal self-identity, connected now with individual consciousness and mind and brain and so on. Is there a way for value to inhere in that ability to create lots of sophisticated information through interaction with the world, a way that would accord value not just to individuals but to sets of individuals that together create large amounts of information?

That’s something to develop further, I think: that link, that view of how the world is constituted as an interaction between the agent and the world. Maybe there’s something there, a seed for how to ground moral value in a way that’s distinct from the identifications that we make now.

Lucas Perry: I guess there’s also this facet where the process of agents asking particular questions, specifying certain kinds of properties that they care about, and pointing to specific things is the same process as the construction of the self, of the egocentric phenomenal and conceptual experience of self. This is all just information that you specify as part of this identification process and the reification of self.

It would be very good if everyone were mindful enough to think about where on the spectrum of objectivity and subjectivity the things they take to be part of the self actually fall, and what questions, properties, and features they’re actually constituted of. What will likely happen then is that your commonly held intuitions will largely be subverted. Maybe you’ll still be interested in being a strong nationalist, but maybe you’ll have a better understanding of what that’s actually constituted of.

That’s the Buddhist perspective. I’m just articulating it, I think, through the language and concepts that you’ve provided, where one begins seeing conventional reality as it’s actually being formulated and no longer confuses the conventional with the ultimate.

Anthony Aguirre: There’s a lot of sophistication, I think, to Buddhist moral thinking, but a lot of it is based around this notion of avoiding suffering in sentient beings. There are so many different sorts of suffering and so many different levels that just avoiding suffering ends up implying a lot of stuff, because we’re very good at suffering when our needs are not met. Avoiding suffering is very, very complicated, because our unmet needs are very, very complicated.

The view that I was just pointing to gestures at some level of value that is rather distinct from suffering, because one can imagine a super sophisticated system that has this incredibly rich identity and incredibly rich view of the world and may suffer or not. It’s not clear how closely connected those things are. It’s always dangerous when you think about how to ground value, because you realize that any answer you have to that question leaves certain things out.

If we try to ground value in sophistication of worldview or something like that, then do we really not value young kids? That seems monstrous; even though they have a pretty simple-minded worldview, devaluing them seems wrong. I think there are no easy answers to this, but that’s just a sense in which I do feel instinctively that there ought to be some level of moral value accorded to beautifully complex, self-aware systems in the world that have created this sophisticated universe through their experience and existence and interaction with the world.

That ought to count for something. Certainly it’s not something we want to just blindly destroy, but exactly why we don’t want to destroy it, the deep reason, I think, needs to be investigated. That seems true to me, but I can’t necessarily defend why.

Lucas Perry: That’s really good and, I think, an excellent place to wrap up with concluding thoughts. My ethics is so sentience-focused that that remains an open question for me, and I would want to pursue deeply why that seems intrinsically valuable. The obvious direct answer would be that it allows or does not allow for certain kinds of conscious experiences, which is what matters: it is not intrinsically valuable, but it is valuable through its relationship to consciousness.

Of course, that’s up for debate and to be argued about. Given uncertainty about consciousness, the view which you propose may be very skillful for dealing with the uncertainty. This is one of the most interesting conversations for me. Like you said, I think it’s very neglected. There’s no one working on it formally. Maybe it’s just too early. I think that it seems like there’s a really big role for popular media and communication to explore these issues.

There are so many good thought experiments in the philosophy of personal identity and elsewhere that could be excellent and fun for the public. It’s not just that this philosophy is becoming increasingly needed; it’s also fun and interesting philosophy. Much of it, like teleportation machines and severing the corpus callosum, is perfect material for Black Mirror episodes and popular science, which is becoming increasingly interesting, but it’s also, I feel, existentially very important and interesting.

I think I have a pretty big fear of death, and I feel like a lot of that fear is born of this individualism, where you identify with your own personal consciousness and qualitative nature, and perhaps some of your numerical nature, and there’s this great attachment to it. Then there’s the ongoing journey of investigating this question of identity: who am I, or what am I? That process, I think, also has a lot of important implications for people’s existential anxiety.

That also feeds into and informs how people wish to relate to and deal with these technological changes in the 21st century, and the kinds of futures they would or would not be excited about. Those are generally my feelings about this. I hope that it doesn’t just come down to what you were talking about, the socioeconomic and social forces simply determining how the whole process unfolds, but that there’s actually philosophical and moral reflection and idealization happening, so we can decide how consciousness evolves into the deep future.

Anthony Aguirre: I think I agree with a lot of what you said. We’ve had this very esoteric discussion about the nature of reality and self and all these things that obviously a lot of people in the world are not going to be that into, but at the same time, as you said, some will, and some of the questions, when framed in evocative ways, are just intrinsically super interesting. I think it’s also important to realize how large an effect some of this pretty esoteric philosophical thinking about the nature of reality has had: our moral system, legal system, and governmental system were largely created in response to careful philosophical thinking and long treatises in the 17th, 18th, and 19th centuries.

We need more of those now. We need brilliant works that are not just asking these questions but actually compellingly arguing for ways to think about them, putting it out there and saying, “This is the way that we ought to value things, or this is the ground for valuing this or that, or this is the way that we should consider reality and what it means for us.” We don’t have to accept any one of those views, but I fear that in the absence of daring attempts to deeply develop those ideas, push for them, and argue for them, we will end up, as you say, just randomly meandering around to wherever the social forces push us.

If we really want the development of real ideas on which to found our long-term future, we had better start really developing them, valuing them, putting them out there, and taking them seriously, rather than thinking, “Oh, this is a weird esoteric conversation off in the corner of philosophy academia, blah, blah, blah.” Devaluing it in that way, I think, is not just unhelpful but really misunderstands how things have happened historically. Those discussions, done in the right way and published and pushed in the right ways, have had a huge influence on the course of humanity. So they shouldn’t be underestimated, and let’s keep going. You can write the book, and we’ll read it.

Lucas Perry: Wonderful. Well, the last point I think is very useful is that what you’re saying is very true in terms of the pragmatics, and history illustrates that. In particular, the Enlightenment treatises took very particular views on personal identity. The personal identity accorded to people of color has shifted over time, as with slavery: the way in which Western colonial powers conceptualized West Africans, for example, was a very particular one.

Even today, with gender issues in general, there is a mainstream discourse on the nature of personal identity. It has already been part of the formation of society and culture and civilization, and it will only continue to be. With that, thanks so much, Anthony. I appreciate it.

AI Alignment Podcast: Identity and the AI Revolution with David Pearce and Andrés Gómez Emilsson

 Topics discussed in this episode include:

  • Identity from epistemic, ontological, and phenomenological perspectives
  • Identity formation in biological evolution
  • Open, closed, and empty individualism
  • The moral relevance of views on identity
  • Identity in the world today and on the path to superintelligence and beyond

Timestamps: 

0:00 – Intro

6:33 – What is identity?

9:52 – Ontological aspects of identity

12:50 – Epistemological and phenomenological aspects of identity

18:21 – Biological evolution of identity

26:23 – Functionality or arbitrariness of identity / whether or not there are right or wrong answers

31:23 – Moral relevance of identity

34:20 – Religion as codifying views on identity

37:50 – Different views on identity

53:16 – The hard problem and the binding problem

56:52 – The problem of causal efficacy, and the palette problem

1:00:12 – Navigating views of identity towards truth

1:08:34 – The relationship between identity and the self model

1:10:43 – The ethical implications of different views on identity

1:21:11 – The consequences of different views on identity on preference weighting

1:26:34 – Identity and AI alignment

1:37:50 – Nationalism and AI alignment

1:42:09 – Cryonics, species divergence, immortality, uploads, and merging.

1:50:28 – Future scenarios from Life 3.0

1:58:35 – The role of identity in the AI itself

 

We hope that you will continue to join in the conversations by following us or subscribing to our podcasts on Youtube, Spotify, SoundCloud, iTunes, Google Play, Stitcher, iHeartRadio, or your preferred podcast site/application. You can find all the AI Alignment Podcasts here.

You can listen to the podcast above or read the transcript below. 

The transcript has been edited for style and clarity.

Lucas Perry: Welcome to the AI Alignment Podcast. I’m Lucas Perry. Today we have an episode with Andrés Gómez Emilsson and David Pearce on identity. This episode is about identity from the ontological, epistemological, and phenomenological perspectives. In less jargony language, we discuss identity from the fundamental perspective of what actually exists, of how identity arises given functional world models and self models in biological organisms, and of the subjective or qualitative experience of self or identity as a feature of consciousness. Given these angles on identity, we discuss what identity is, the formation of identity in biological life via evolution, why identity is important to explore and its ethical implications and implications for game theory, and we directly discuss its relevance to the AI alignment problem and the project of creating beneficial AI.

I think the question of “How is this relevant to AI Alignment?” is useful to explore here in the intro. The AI Alignment problem can be construed in the technical limited sense of the question of “how to program AI systems to understand and be aligned with human values, preferences, goals, ethics, and objectives.” In a limited sense this is strictly a technical problem that supervenes upon research in machine learning, AI, computer science, psychology, neuroscience, philosophy, etc. I like to approach the problem of aligning AI systems from a broader and more generalist perspective. In the way that I think about the problem, a broader view of AI alignment takes into account the problems of AI governance, philosophy, AI ethics, and reflects deeply on the context in which the technical side of the problem will be taking place, the motivations of humanity and the human beings engaged in the AI alignment process, the ingredients required for success, and other civilization level questions on our way hopefully to beneficial superintelligence. 

It is from both of these perspectives that I feel exploring the question of identity is important. AI researchers have their own identities and those identities factor into their lived experience of the world, their motivations, and their ethics. In fact, the same is of course true of policy makers and anyone in positions of power to influence the alignment process, so being aware of commonly held identity models and views is important for understanding their consequences and functions in the world. From a macroscopic perspective, identity has evolved over the past 4.5 billion years on earth and surely will continue to do so in AI systems themselves and in the humans which hope to wield that power. Some humans may wish to merge, others to pass away or simply die, and others to be upgraded or uploaded in some way. Questions of identity are also crucial to this process of relating to one another and to AI systems in a rapidly evolving world where what it means to be human is quickly changing, where copies of digital minds or AIs can be made trivially, and where the boundary between what we conventionally call the self and world begins to dissolve and break down in new ways, demanding new understandings of ourselves and identity in particular. I also want to highlight an important thought from the podcast: that any actions we wish to take with regard to improving or changing understandings or lived experience of identity must be sociologically relevant, or such interventions simply risk being irrelevant. This means understanding what is reasonable for human beings to be able to update their minds with and accept over certain periods of time, and also the game theoretic implications of certain views of identity and their functional usefulness. This conversation is thus an attempt to broaden the conversation on these issues outside of what is normally discussed and to flag this area as something worthy of consideration.

For those not familiar with David Pearce or Andrés Gómez Emilsson: David is a co-founder of the World Transhumanist Association, since rebranded as Humanity+, and is a prominent figure within the transhumanism movement in general. You might know him from his work on the Hedonistic Imperative, a book which explores our moral obligation to work towards the abolition of suffering in all sentient life through technological intervention. Andrés is a consciousness researcher at the Qualia Research Institute and is also the Co-founder and President of the Stanford Transhumanist Association. He has a Master’s in Computational Psychology from Stanford.

The Future of Life Institute is a non-profit and this podcast is funded and supported by listeners like you. So if you find what we do on this podcast to be important and beneficial, please consider supporting the podcast by donating at futureoflife.org/donate

 If you’d like to be a regular supporter, please consider a monthly subscription donation to make sure we can continue our efforts into the future. 

These contributions make it possible for us to bring you conversations like these and to develop the podcast further. You can also follow us on your preferred listening platform by searching for us directly or following the links on the page for this podcast found in the description. 

And with that, here is my conversation with Andres Gomez Emilsson and David Pearce 

I just want to start off with some quotes here that I think will be useful. The last podcast that we had was with Yuval Noah Harari and Max Tegmark. One of the points that Yuval really emphasized was the importance of self-understanding and of questions like: who am I? What am I in the age of technology? Yuval said, “Get to know yourself better. It’s maybe the most important thing in life. We haven’t really progressed much in the last thousands of years, and the reason is that yes, we keep getting this advice, but we don’t really want to do it.” He goes on to say that, “especially as technology will give us all, at least some of us, more and more power, the temptations of naive utopias are going to be more and more irresistible, and I think the really most powerful check on these naive utopias is really getting to know yourself better.”

In search of getting to know ourselves better, I want to explore this question of identity with both of you. To start off, what is identity?

David Pearce: One problem is that we have more than one conception of identity. There is the straightforward, logical sense that philosophers call the indiscernibility of identicals, namely that if A equals B, then anything true of A is true of B. In one sense, that’s trivially true, but when it comes to something like personal identity, it just doesn’t hold water at all. You are a different person from your namesake who went to bed last night – and it’s very easy carelessly to shift between these two different senses of identity.

Or one might speak of the United States. In what sense is the United States the same nation in 2020 as it was in 1975? It’s interest-relative.

Andrés Gómez Emilsson: Yeah, and to go a little bit deeper on that, I would make the distinction, as David did, between ontological identity, which is what is fundamentally actually going on in the physical world, in instantiated reality, and conventional identity, the idea of continuing to exist from one moment to another as a human, and also as countries and so on.

Then there’s also phenomenological identity, which is our intuitive common sense view of: What are we and basically, what are the conditions that will allow us to continue to exist? We can go into more detail but yet, the phenomenological notion of identity is an incredible can of worms because there’s so many different ways of experiencing identity and all of them have their own interesting idiosyncrasies. Most people tend to confuse the two. They tend to confuse ontological and phenomenological identity. Just as a simple example that I’m sure we will revisit in the future, when a person has, let’s say an ego dissolution or a mystical experience and they feel that they merged with the rest of the cosmos, and they come out and say, “Oh, we’re all one consciousness.” That tends to be interpreted as some kind of grasp of an ontological reality. Whereas we could argue in a sense that that was just the shift in phenomenological identity, that your sense of self got transformed, not necessarily that you’re actually directly merging with the cosmos in a literal sense. Although, of course it might be very indicative of how conventional our sense of identity is if it can be modified so drastically in other states of consciousness.

Lucas Perry: Right, and let’s just start with the ontological sense. How does one understand or think about identity from the ontological side?

Andrés Gómez Emilsson: In order to reason about this, you need a shared frame of reference for what actually exists, including the nature of time and space, and of memory. In the common sense view of time, called presentism, there’s basically just the present moment; the past is a convenient construction and the future is a fiction that’s useful in a practical sense, but neither literally exists. On that view, the notion that A equals B, in the sense that if you modify what happens to A that will automatically also modify what happens to B, kind of makes sense, and you can perhaps think of identity as moving over time along with everything else.

On the other hand, if you have an eternalist point of view, where basically you interpret the whole of space-time as just being there, each moment on its own coordinates in the multiverse, that provides a different notion of ontological identity, because in a sense a moment of experience is its own separate piece of reality.

In addition, you also need to consider the question of connectivity: in what ways are different parts of reality connected to each other? In a conventional sense, as you go from one second to the next, you continue to be connected to yourself in an unbroken stream of consciousness, and this has actually led some philosophers to hypothesize that the proper unit of identity is from the moment you wake up to the moment you go to sleep, because that’s an unbroken chain/stream of consciousness.

From a scientific and philosophically rigorous point of view, it’s actually difficult to make the case that our stream of consciousness is truly unbroken. Definitely if you have an eternalist point of view on experience and on the nature of time, what you will instead see is that from the moment you wake up to the moment you go to sleep, there has actually been an extraordinarily large number of snapshots, of discrete moments of experience. In that sense, each of those individual moments of experience would be its own ontologically separate individual.

Now, one of the things that becomes kind of complicated with an eternalist account of time and identity is that you cannot actually change it. There’s nothing you can actually do to A, so the reasoning of, if you do anything to A and A equals B, then the same will happen to B, doesn’t even actually apply here, because everything is already there. You cannot actually modify A any more than you can modify the number five.

David Pearce: Yes, it’s a rather depressing perspective in many ways, the eternalist view. If one internalizes it too much, it can lead to a sense of fatalism and despair. A lot of the time it’s probably actually best to think of the future as open.

Lucas Perry: This helps to clarify some of the ontological part of identity. Now, you mentioned this phenomenological aspect and I want to say also the epistemological aspect of identity. Could you unpack those two? And maybe clarify this distinction for me if you wouldn’t parse it this way? I guess I would say that the epistemological one is the models that human beings have about the world and about ourselves. It includes how the world is populated with a lot of different objects that have identity like humans and planets and galaxies. Then we have our self model, which is the model of our body and our space in social groups and who we think we are.

Then there’s the phenomenological identity, which is that subjective, qualitative experience of self or the ego in relation to experience, or where there’s an identification with attention and experience. Could you unpack these two latter senses?

Andrés Gómez Emilsson: Yeah, for sure. I mean, in a sense you could have an implicit self model that doesn’t actually become part of your consciousness, or it’s not necessarily something that you’re explicitly rendering. This goes on all the time. You’ve definitely, I’m sure, had the experience of riding a bicycle, and after a little while you can almost do it without thinking. Of course, you’re engaging with the process in a very embodied fashion, but you’re not cognizing very much about it. Definitely you’re not representing, let’s say, your body state, or representing exactly what is going on, in a cognitive way. It’s all kind of implicit in the way in which you feel. I would say that paints a little bit of a distinction between a self model, which is ultimately functional (it has to do with whether you are processing the information required to solve the task, which involves modeling what you are in your environment), and the felt sense of: Are you a person? What are you? How are you located? And so on.

The first one is what most of robotics and machine learning with an embodied component are really trying to get at. You just need the appropriate information processing in order to solve the task. They’re not very concerned about whether it feels like anything, or whether it feels like a particular entity or self to be that particular algorithm.

Whereas here we’re talking about the phenomenological sense of identity. That’s very explicitly about what it feels like, and there are all kinds of interesting ways in which a healthy, so to speak, sense of identity can break down. There are many failure modes, we can put it that way.

One might argue, and I suspect David Pearce, for example, might say this, that our self models, or our implicit sense of self, because of the way they were brought up through Darwinian selection pressures, are already extremely ill in some sense, at least from the point of view of actually telling us something true and actually making us do something ethical. They have all sorts of problems, but they are definitely functional. You can anticipate being a person tomorrow and plan accordingly. You leave messages to yourself by encoding them in memory, and yeah, this is a convenient sense of conventional identity.

It’s very natural for most people’s experiences. I can briefly mention a couple of ways in which it can break down. One of them is depersonalization. It’s a particular psychological disorder where one stops feeling like a person, and it might have something to do with basically not being able to synchronize with your bodily feelings in such a way that you don’t actually feel embodied. You may feel like a disincarnate entity, or just a witness experiencing a human experience, but not actually being that person.

Then you also have things such as an empathogen-induced sense of shared identity with others. If you take MDMA, you may feel that all of humanity is deeply connected, or that we’re all part of the same essence of humanity, in a very positive sense of identity, but perhaps not in an evolutionarily adaptive sense. Finally, there are people with multiple personality disorder, who in a sense have a very unstable sense of who they are, and sometimes it can be so extreme that there are epistemological blockages from one sense of self to another.

David Pearce: As cognitive scientist Donald Hoffman likes to say, fitness trumps truth. Each of us runs a world-simulation. But it’s not an impartial, accurate, faithful world-simulation. I am at the center of a world-simulation, my egocentric world, the hub of reality that follows me around. And of course there are billions upon billions of other analogous examples too. This is genetically extremely fitness-enhancing. But it’s systematically misleading. In that sense, I think Darwinian life is malware.

Lucas Perry: Wrapping up here on these different aspects of identity, I just want to make sure that I have all of them here. Would you say that those are all of the aspects?

David Pearce: One can add the distinction between type- and token- identity. In principle, it’s possible to create from scratch a molecular duplicate of you. Is that person you? It’s type-identical, but it’s not token-identical.

Lucas Perry: Oh, right. I think I’ve heard this used in some other places as numerical distinction versus qualitative distinction. Is that right?

David Pearce: Yeah, that’s the same distinction.

Lucas Perry: Unpacking here more about what identity is. Let’s talk about it purely as something that the world has produced. What can we say about the evolution of identity in biological life? What is the efficacy of certain identity models in Darwinian evolution?

Andrés Gómez Emilsson: I would say that self models most likely have existed potentially since pretty early on in the evolutionary timeline. You may argue that in some sense even bacteria have some kind of self model. But again, a self model is really just functional. A bacterium does need to know, at least implicitly, its size in order to be able to navigate its environment, follow chemical gradients, and so on, and not step on itself. That’s not the same, again, as a phenomenal sense of identity, and that one I would strongly suspect came much later, perhaps with the advent of the first primitive nervous systems. That would be only if actually running that phenomenal model is giving you some kind of fitness advantage.

One of the things that you will encounter with David and me is that we think that phenomenally bound experiences have a lot of computational properties, and in a sense the reason why we’re conscious has to do with the fact that unified moments of experience are doing computationally useful legwork. It is when you merge implicit self models, in just the functional sense, together with the computational benefits of actually running a conscious system that, perhaps for the first time in history, you actually get a phenomenal self model.

I would suspect this was probably already going on to some extent in the Cambrian explosion. All of these interesting evolutionary oddities that happened in the Cambrian explosion probably had some kind of rudimentary sense of self. I would be skeptical that this is going on in, for example, plants. One of the key reasons is that running a real-time world simulation in a conscious framework is very calorically expensive.

David Pearce: Yes, it’s a scandal. What, evolutionarily speaking, is consciousness “for”? What could a hypothetical p-zombie not do? The perspective that Andrés and I are articulating is that essentially what makes biological minds special is phenomenal binding – the capacity to run real-time, phenomenally-bound world-simulations, i.e. not just be 86 billion discrete, membrane-bound pixels of experience. Somehow, we each generate an entire cross-modally matched, real-time world-simulation, made up of individual perceptual objects, somehow bound into a unitary self. The unity of perception is extraordinarily computationally powerful and adaptive. Simply saying that it’s extremely fitness-enhancing doesn’t explain it, because something like telepathy would be extremely fitness-enhancing too, but it’s physically impossible.

Yes, how biological minds manage to run phenomenally-bound world-simulations is unknown: they would seem to be classically impossible. One way to appreciate just how advantageous is (non-psychotic) phenomenal binding is to look at syndromes where binding even partially breaks down: simultanagnosia, where one can see only one object at once, or motion blindness (akinetopsia), where you can’t actually see moving objects, or florid schizophrenia. Just imagine those syndromes combined. Why aren’t we just micro-experiential zombies?

Lucas Perry: Do we have any interesting points here to look at in the evolutionary tree for where identity is substantially different from ape consciousness? If we look back at human evolution, it seems that it’s given the apes and particularly our species a pretty strong sense of self, and that gives rise to much of our ape socialization and politics. I’m wondering if there was anything else like maybe insects or other creatures that have gone in a different direction? Also if you guys might be able to just speak a little bit on the formation of ape identity.

Andrés Gómez Emilsson: Definitely, I think the perspective of the selfish gene is pretty illuminating here. Nominally, our sense of identity is the sense of one person, one mind. In practice, however, if you make sense of identity in terms of that which you want to defend, or that which you consider worth preserving, you will see that people’s sense of identity also extends to their family members, and of course, with the neocortex and the ability to create more complex associations, you then have crazy things like a sense of identity being based on race or country of origin or other constructs like that, which are built on top of, and import from, the sense of: hey, the people who are familiar to you feel more like you.

It’s genetically adaptive to have that, and from the point of view of the selfish gene, genes that could recognize themselves in others, and favor the existence of others that also share the same genes, are more likely to reproduce. That’s called inclusive fitness in biology: you’re not just trying to survive yourself or make copies of yourself, you’re also trying to help those that are very similar to you do the same. Almost certainly, it’s a huge aspect of how we perceive the world. Just anecdotally, from a number of trip reports, there’s this interesting thread of how with some chemicals like MDMA and 2CB, which for those who don’t know are empathogenic psychedelics, people get the strange sense that people they’ve never met before in their life are as close to them as a cousin, or maybe a half brother or half sister. It’s a very comfortable and quite beautiful feeling. You could imagine that nature was very selective about who you give that feeling to in order to maximize inclusive fitness.
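The inclusive-fitness idea Andrés describes is standardly summarized by Hamilton’s rule; the formulation below is added for reference and is not quoted from the conversation.

```latex
% Hamilton's rule: a gene for helping behavior is favored by selection when
% r B > C, where r is the genetic relatedness between actor and recipient,
% B is the reproductive benefit to the recipient, and
% C is the reproductive cost to the actor.
\[
r B > C
\]
```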

All of this builds up to the overall prediction I would make that the sense of identity of ants and other extremely social insects might be very different, the reason being that they are genetically incentivized to basically treat each other as themselves. Most ants themselves don’t produce any offspring. They are genetically sisters, and all of their genetic incentives are about basically helping the queen pass on the genes to new colonies. In that sense, I would imagine an ant probably sees other ants of the same colony pretty much as itself.

David Pearce: Yes. There was an extraordinary finding a few years ago: members of one species of social ant actually passed the mirror test – which has traditionally been regarded as the gold standard for the concept of a self. It was shocking enough, to many people, when a small fish was shown to be capable of mirror self-recognition. If some ants too can pass the mirror test, it suggests some form of meta-cognition, self-recognition, that is extraordinarily ancient.

What is it that distinguishes humans from nonhuman animals? I suspect the distinction relates to something that is still physically unexplained: how is it that a massively parallel brain gives rise to serial, logico-linguistic thought? It’s unexplained, but I would say this serial stream is what distinguishes us, most of all – not possession of a self-concept.

Lucas Perry: Is there such a thing as a right answer to questions of identity? Or is it fundamentally just something that’s functional? Or is it ultimately arbitrary?

Andrés Gómez Emilsson: I think there is a right answer. From a functional perspective, there are just so many different ways of thinking about it. As I was describing, perhaps with ants and humans, their sense of identity is probably pretty different, but both are useful for passing on the genes. In that sense they’re all equally valid. Imagine in the future some kind of swarm mind that also has its own distinct, functionally adaptive sense of identity; in that sense, what it should be from the point of view of functionality really just depends on what the replication unit is.

Ontologically, though, I think there’s a case to be made that either open or empty individualism is true. Maybe it would be good to define those terms first.

Lucas Perry: Before we do that. Your answer then is just that, yes, you suspect that also ontologically in terms of fundamental physics, there are answers to questions of identity? Identity itself isn’t a confused category?

Andrés Gómez Emilsson: Yeah, I don’t think it’s a leaky reification as they say.

Lucas Perry: From the phenomenological sense, is the self an illusion or not? Is the self a valid category? Is your view also on identity that there is a right answer there?

Andrés Gómez Emilsson: From the phenomenological point of view? No, I would consider it a parameter, mostly. Just something that you can vary, and there’s trade offs or different experiences of identity.

Lucas Perry: Okay. How about you David?

David Pearce: I think ultimately, yes, there are right answers. In practice, life would be unlivable if we didn’t maintain these fictions. These fictions are (in one sense) deeply immoral. We punish someone for a deed that their namesake performed, let’s say 10, 15, 20 years ago. America recently executed a murderer for a crime that was done 20 years ago. Now quite aside from issues of freedom and responsibility and so on, this is just scapegoating.

Lucas Perry: David, do you feel that in the ontological sense there are right or wrong answers to questions of identity? And in the phenomenological sense? And in the functional sense?

David Pearce: Yes.

Lucas Perry: Okay, so then I guess you disagree with Andres about the phenomenological sense?

David Pearce: I’m not sure, Andrés and I agree about most things. Are we disagreeing Andrés?

Andrés Gómez Emilsson: I’m not sure. I mean, what I said about the phenomenal aspect of identity was that I think of it as a parameter of our world simulation. In that sense, there’s no true phenomenological sense of identity. They’re all useful for different things. The reason I would say this too is, you can assume that something like each snapshot of experience, is its own separate identity. I’m not even sure you can accurately represent that in a moment of experience itself. This is itself a huge can of worms that opens up the problem of referents. Can we even actually refer to something from our own point of view? My intuition here is that, whatever sense of identity you have at a phenomenal level, I think of it as a parameter of the world simulation and I don’t think it can be an accurate representation of something true. It’s just going to be a feeling, so to speak.

David Pearce: I could endorse that. We fundamentally misperceive each other. The Hogan sisters, conjoined craniopagus twins, know something that the rest of us don’t. The Hogan sisters share a thalamic bridge, which enables them partially, to a limited extent, to “mind-meld”. The rest of us see other people essentially as objects that have feelings. When one thinks of one’s own ignorance, perhaps one laments one’s failures as a mathematician or a physicist or whatever; but an absolutely fundamental form of ignorance that we take for granted is we (mis)conceive other people and nonhuman animals as essentially objects with feelings, whereas individually, we ourselves have first-person experience. Whether it’s going to be possible to overcome this limitation in the future I don’t know. It’s going to be immensely technically challenging – building something like reversible thalamic bridges. A lot depends on one’s theory of phenomenal binding. But let’s imagine a future civilization in which partial “mind-melding” is routine. I think it will lead to a revolution not just in morality, but in decision-theoretic rationality too – one will take into account the desires, the interests, and the preferences of what will seem like different aspects of oneself.

Lucas Perry: Why does identity matter morally? I think you guys have made a good case about how it’s important functionally, historically in terms of biological evolution, and then in terms of like society and culture identity is clearly extremely important for human social relations, for navigating social hierarchies and understanding one’s position of having a concept of self and identity over time, but why does it matter morally here?

Andrés Gómez Emilsson: One interesting story is that you can think of a lot of social movements, and in a sense a lot of ideologies that have existed in human history, as attempts to hack people’s sense of identity, or to make use of it for the purpose of the reproduction of the ideology or the social movement itself.

To a large extent, a lot of the things that you see in therapy have a lot to do with expanding your sense of identity to include your future self as well, which is something that a lot of people struggle with when it comes to impulsive decisions or rationality. There’s this interesting point of view of how a two-year-old or a three-year-old hasn’t yet internalized the fact that they will wake up tomorrow and that the consequences of what they did today will linger on in the following days. It’s kind of a revelation when a kid finally internalizes the fact that, oh my gosh, I will continue to exist for the rest of my life. There’s going to be a point where I’m going to be 40 years old, and also a time where I’m 80 years old, and all of those are real, and I should plan ahead for them.

Ultimately, I do think that advocating for a very inclusive sense of identity, where the locus of identity is consciousness itself, might have tremendous moral and ethical implications.

David Pearce: We want an inclusive sense of “us” that embraces all sentient beings.  This is extremely ambitious, but I think that should be the long-term goal.

Lucas Perry: Right, there’s a spectrum here and where you fall on the spectrum will lead to different functions and behaviors, solipsism or extreme egoism on one end, pure selflessness or ego death or pure altruism on the other end. Perhaps there are other degrees and axes on which you can move, but the point is it leads to radically different identifications and relations with other sentient beings and with other instantiations of consciousness.

David Pearce: Would our conception of death be different if it was a convention to give someone a different name when they woke up each morning? Because after all, waking up is akin to reincarnation. Why is it that when one is drifting asleep each night, one isn’t afraid of death? It’s because (in some sense) one believes one is going to be reincarnated in the morning.

Lucas Perry: I like that. Okay, I want to return to this question after we hit on the different views of identity to really unpack the different ethical implications more. I wanted to sneak that in here for a bit of context. Pivoting back to this sort of historical and contextual analysis of identity. We talked about biological evolution as like instantiating these things. How do you guys view religion as codifying an egoist view on identity? Religion codifies the idea of the eternal soul and the soul, I think, maps very strongly onto the phenomenological self. It makes that the thing that is immutable or undying or which transcends this realm?

I’m talking obviously specifically here about Abrahamic religions, but then also in Buddhism there is, the self is an illusion, or what David referred to as empty individualism, which we’ll get into, where it says that identification with the phenomenological self is fundamentally a misapprehension of reality and like a confusion and that that leads to attachment and suffering and fear of death. Do you guys have comments here about religion as codifying views on identity?

Andrés Gómez Emilsson: I think it’s definitely really interesting that there are different views of identity in religion. How I grew up, I always assumed religion was about souls and getting into heaven. As it turns out, I just needed to learn about Eastern religions and cults, which also sometimes happen to have different views of personal identity. That was definitely a revelation to me. I would actually say that I started questioning the common sense notion of personal identity before I learned about Eastern religions, and I was really pretty surprised and very happy when I found out that, let’s say, Hinduism actually has a kind of universal consciousness take on identity, a socially sanctioned way of looking at the world that has a very expansive sense of identity. Buddhism is also pretty interesting because, as far as I understand it, they consider pretty much any view of identity to be a cause for suffering: fundamentally it has to do with a sense of craving, either craving for existence or craving for non-existence, which they also consider a problem. A Buddhist would generally say that even something like universal consciousness, believing that we’re all fundamentally Krishna incarnating in many different ways, will itself also be a source of suffering to some extent, because you may crave further existence, which may not be very good from their point of view. It makes me optimistic that there are other types of religions with other views of identity.

David Pearce: Yes. Here is one of my earliest memories. My mother belonged to The Order of the Cross – a very obscure, small, vaguely Christian denomination, non-sexist, who worship God the Father-Mother. And I recall being told, aged five, that I could be born again. It might be as a little boy, but it might be as a little girl – because gender didn’t matter. And I was absolutely appalled at this – at the age of five or so – because in some sense girls were, and I couldn’t actually express this, defective.

And religious conceptions of identity vary immensely. One thinks of something like Original Sin in Christianity. I could now make a lot of superficial comments about religion. But one would need to explore in detail the different religious traditions and their different conceptions of identity.

Lucas Perry: What are the different views on identity? If you can say anything, why don’t you hit on the ontological sense and the phenomenological sense? Or if we just want to stick to the phenomenological sense then we can.

Andrés Gómez Emilsson: I mean, are you talking about an open, empty, closed?

Lucas Perry: Yeah. So that would be the phenomenological sense, yeah.

Andrés Gómez Emilsson: No, actually I would claim those are attempts at getting at the ontological sense.

Lucas Perry: Okay.

Andrés Gómez Emilsson: If you do truly have a soul ontology, something that implicitly a very large percentage of the human population have, that would be, yeah, in this view called a closed individualist perspective. Common sense, you start existing when you’re born, you stop existing when you die, you’re just a stream of consciousness. Even perhaps more strongly, you’re a soul that has experiences, but experiences maybe are not fundamental to what you are.

Then there is the more Buddhist and definitely more generally scientifically-minded view, which is empty individualism, which is that you only exist as a moment of experience, and from one moment to the next that you are a completely different entity. And then, finally, there is open individualism, which is like Hinduism claiming that we are all one consciousness fundamentally.

There is an ontological way of thinking of these notions of identity. It’s possible that a lot of people think of them just phenomenologically, or they may just think there’s no further fact beyond the phenomenal. In which case something like that closed individualism, for most people most of the time, is self-evidently true because you are moving in time and you can notice that you continue to be yourself from one moment to the next. Then, of course, what would it feel like if you weren’t the same person from one moment to the next? Well, each of those moments might completely be under the illusion that it is a continuous self.

For most things in philosophy and science, if you want to use something as evidence, it has to agree with one theory and disagree with another one. And the sense of continuity from one second to the next seems to be compatible with all three views. So it’s not itself much evidence either way.
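Andrés’s point about evidence can be put in Bayesian terms; the formulation below is a standard one, added for reference rather than quoted from the conversation: an observation only shifts credence between two theories to the extent that they assign it different likelihoods.

```latex
% Posterior odds = prior odds x likelihood ratio (Bayes factor).
% If theories T1 and T2 predict the evidence E equally well, the
% likelihood ratio is 1 and E does not discriminate between them,
% as with the felt continuity of experience and the three views of identity.
\[
\frac{P(T_1 \mid E)}{P(T_2 \mid E)}
  \;=\; \frac{P(T_1)}{P(T_2)} \cdot \frac{P(E \mid T_1)}{P(E \mid T_2)}
\]
```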

States of depersonalization are probably much more akin to empty individualism from a phenomenological point of view, and then you have ego death and definitely some experiences of the psychedelic variety, especially high doses of psychedelics tend to produce very strong feelings of open individualism. That often comes in the form of noticing that your conventional sense of self is very buggy and doesn’t seem to track anything real, but then realizing that you can identify with awareness itself. And if you do that, then in some sense automatically, you realize that you are every other experience out there, since the fundamental ingredient of a witness or awareness is shared with every conscious experience.

Lucas Perry: These views on identity are confusing to me because agents haven’t existed for most of the universe and I don’t know why we need to privilege agents in our ideas of identity. They seem to me just emergent patterns of a big, ancient, old, physical universe process that’s unfolding. It’s confusing to me that just because there are complex self- and world-modeling patterns in the world, that we need to privilege them with some kind of shared identity across themselves or across the world. Do you see what I mean here?

Andrés Gómez Emilsson: Oh, yeah, yeah, definitely. I’m not agent-centric. And I mean, in a sense also, all of these other exotic feelings of identity often also come with states of low agency. You actually don’t feel that you have much of a choice in what you could do. I mean, definitely depersonalization, for example, often comes with a sense of inability to make choices, that actually it’s not you who’s making the choice, they’re just unfolding and happening. Of course, in some meditative traditions that’s considered a path to awakening, but in practice for a lot of people, that’s a very unpleasant type of experience.

It sounds like it might be privileging agents; I would say that’s not the case. If you zoom out and you see the bigger worldview, it includes basically this concept, David calls it non-materialist physicalist idealism, where the laws of physics describe the behavior of the universe, but that which is behaving according to the laws of physics is qualia, is consciousness itself.

I take very seriously the idea that a given molecule or a particular atom contains moments of experience, it’s just perhaps very fleeting and very dim or are just not very relevant in many ways, but I do think it’s there. And sense of identity, maybe not in a phenomenal sense, I don’t think an atom actually feels like an agent over time, but continuity of its experience and the boundaries of its experience would have strong bearings on ontological sense of identity.

There’s a huge, obviously, a huge jump between talking about the identity of atoms and then talking about the identity of a moment of experience, which presumably is an emergent effect of 100 billion neurons, themselves made of so many different atoms. Crazy as it may be, it is both David Pearce’s view and my view that actually each moment of experience does stand as an ontological unit. It’s just the ontological unit of a certain kind that usually we don’t see in physics, but it is both physical and ontologically closed.

Lucas Perry: Maybe you could unpack this. You know mereological nihilism, maybe I privilege this view where I just am trying to be as simple as possible and not build up too many concepts on top of each other.

Andrés Gómez Emilsson: Mereological nihilism basically says that there are no entities that have parts. Everything is part-less. All that exists in reality is individual monads, so to speak, things that are fundamentally self-existing. For that, if you have let’s say monad A and monad B, just put together side by side, that doesn’t entail that now there is a monad AB that mixes the two.

Lucas Perry: Or if you put a bunch of fundamental quarks together that it makes something called an atom. You would just say that it’s quarks arranged atom-wise. There’s the structure and the information there, but it’s just made of the monads.

Andrés Gómez Emilsson: Right. And the atom is a wonderful case, basically the same as a molecule, where I would say mereological nihilism, with fundamental particles as the only truly existing beings, does seem to be false when you look at how, for example, molecules behave. The building-block account of how chemical bonds happen is these Lewis diagrams, where you can have a single bond or a double bond, you have the octet rule, and you’re trying to build these chains of atoms strung together. All that matters for those diagrams is what each atom is locally connected to.

However, if you just use these in order to predict what molecules are possible, how they behave, and their properties, you will see that there are a lot of artifacts that are empirically disproven. Over the years, chemistry has become more and more sophisticated, to the point where it has come to the realization that you need to take into account the entire molecule at once in order to understand what its “dynamically stable” configuration is, which involves all of the electrons and all of the nuclei simultaneously interlocking into a particular pattern that self-replicates.

Lucas Perry: And it has new properties over and above the parts.

Andrés Gómez Emilsson: Exactly.

Lucas Perry: That doesn’t make any sense to me or my intuitions, so maybe my intuitions are just really wrong. Where does the new property or causality come from? Because it essentially has causal efficacy over and above the parts.

Andrés Gómez Emilsson: Yeah, it’s tremendously confusing. I mean, I’m currently writing an article about basically how this sense of topological segmentation can, in a sense, account both for this effect of what we might call weak downward causation, which is like, you get a molecule and now the molecule will have effects in the world; that you need to take into account all of the electrons and all of the nuclei simultaneously as a unit in order to actually know what the effect is going to be in the world. You cannot just take each of the components separately, but that’s something that we could call weak downward causation. It’s not that fundamentally you’re introducing a new law of physics. Everything is still predicted by Schrödinger equation, it’s still governing the behavior of the entire molecule. It’s just that the appropriate unit of analysis is not the electron, but it would be the entire molecule.

Now, if you pair this together with a sense of identity that comes from topology, then I think there might be a good case for why moments of experience are discrete entities. The analogy here with the topological segmentation, hopefully I’m not going to lose too many listeners here, but we can make an analogy with, for example, a balloon. That if you start out imagining that you are the surface of the balloon and then you take the balloon by two ends and you twist them in opposite directions, eventually at the middle point you get what’s called a pinch point. Basically, the balloon collapses in the center and you end up having these two smooth surfaces connected by a pinch point. Each of those twists creates a new topological segment, or in a sense is segmenting out the balloon. You could basically interpret things such as molecules as new topological segmentations of what’s fundamentally the quantum fields that is implementing them.

Usually, the segmentations may look like an electron or a proton, but if you assemble them together just right, you can get them to essentially melt with each other and become one topologically continuous unit. The nice thing about this account is that you get everything that you want. You explain, on the one hand, why identity would actually have causal implications, and it’s this weak downward causation effect, at the same time as being able to explain: how is it possible that the universe can break down into many different entities? Well, the answer is the way in which it is breaking down is through topological segmentations. You end up having these self-contained regions of the wave function that are discommunicated from the rest of it, and each of those might be a different subject of experience.

David Pearce: It’s very much an open question: the intrinsic nature of the physical. Commonly, materialism and physicalism are conflated. But the point of view that Andrés and I take seriously, non-materialist physicalism, is actually a form of idealism. Recently, philosopher Phil Goff, who used to be a skeptic-critic of non-materialist physicalism because of the binding problem, published a book defending it, “Galileo’s Error”.

Again, it’s very much an open question. We’re making some key background assumptions here. A critical background assumption is physicalism, and that quantum mechanics is complete:  there is no “element of reality” that is missing from the equations (or possibly the fundamental equation) of physics. But physics itself seems to be silent on the intrinsic nature of the physical. What is the intrinsic nature of a quantum field? Intuitively, it’s a field of insentience; but this isn’t a scientific discovery, it’s a (very strong) philosophical intuition.

And you can couple this with the fact that the only part of the world to which one has direct access, i.e., one’s own conscious mind (though this is controversial), is consciousness, sentience. The non-materialist physicalist conjectures that we are typical, in one sense – inasmuch as the fields of your central nervous system aren’t ontologically different from the fields of the rest of the world. And what makes sentient beings special is the way that fields are organized into unified subjects of experience, egocentric world-simulations.

Now, I’m personally fairly confident that we are, individually, minds running egocentric world-simulations: direct realism is false. I’m not at all confident – though I explore the idea – that experience is the intrinsic nature of the physical, the “stuff” of the world. This is a tradition that goes back via Russell, ultimately, to Schopenhauer. Schopenhauer essentially turns Kant on his head.

Kant famously said that all we will ever know is phenomenology, appearances; we will never, never know the intrinsic, noumenal nature of the world. But Schopenhauer argues that essentially we do actually know one tiny piece of the noumenal essence of the world, the essence of the physical, and it’s experiential. So yes, tentatively, at any rate, Andrés and I would defend non-materialist or idealistic physicalism. The actual term “non-materialist physicalism” is due to the late Grover Maxwell.

Lucas Perry: Sorry, could you just define that real quick? I think we haven’t.

David Pearce: Physicalism is the idea that no “element of reality” is missing from the equations of physics, presumably (some relativistic generalization of) the universal Schrödinger equation.

Lucas Perry: It’s a kind of naturalism, too.

David Pearce: Oh, yes. It is naturalism. There are some forms of idealism and panpsychism that are non-naturalistic, but this view is uncompromisingly monist. Non-materialist physicalism isn’t claiming that a primitive experience is attached in some way to fundamental physical properties. The idea is that the actual intrinsic nature, the essence of the physical, is experiential.

Stephen Hawking, for instance, was a wave function monist, a doctrinaire materialist, but he famously asked what it is that breathes fire into the equations and makes a universe for them to describe. Now, intuitively, of course, one assumes that the fire in the equations, Kant’s noumenal essence of the world, is non-experiential. But if so, we have the hard problem, we have the binding problem, we have the problem of causal efficacy, a great mess of problems.

But if, and it’s obviously a huge if, the actual intrinsic nature of the physical is experiential, then we have a theory of reality that is empirically adequate, that has tremendous explanatory and predictive power. It’s mind-bogglingly implausible, at least to those of us steeped in the conceptual framework of materialism. But yes, by transposing the entire mathematical apparatus of modern physics, quantum field theory or its generalization, onto an idealist ontology, one actually has a complete account of reality that explains the technological successes of science, its predictive power, and doesn’t give rise to such insoluble mysteries as the hard problem.

Lucas Perry: I think all of this is very clarifying. There are also background metaphysical views, which people may or may not disagree upon, which are also important for identity. I also want to be careful to define some terms, in case some listeners don’t know what they mean. I think you hit on like four different things which all had to do with consciousness. The hard problem is why different kinds of computation actually… why there is something it is like to be that computation, or why there is consciousness correlated or associated with that computation.

Then you also said the binding problem. Is the binding problem why there is a unitary experience that is, as you said earlier, cross-modally matched?

David Pearce: Yes, and if one takes the standard view from neuroscience that your brain consists of 86-billion-odd discrete, decohered, membrane-bound nerve cells, then phenomenal binding, whether local or global, ought to be impossible. So yeah, this is the binding problem, this (partial) structural mismatch. If your brain is scanned when you’re seeing a particular perceptual object, neuroscanning can apparently pick out distributed feature-processors, edge-detectors, motion-detectors, color-mediating neurons (etc). And yet there isn’t the perfect structural match that must exist if physicalism is true. And David Chalmers – because of this (partial) structural mismatch – goes on to argue that dualism must be true. Although I agree with David Chalmers that yes, phenomenal binding is classically impossible, if one takes the intrinsic nature argument seriously, then phenomenal unity is minted in.

The intrinsic nature argument, recall, is that experience, consciousness, discloses the intrinsic nature of the physical. Now, one of the reasons why this idea is so desperately implausible is it makes the fundamental “psychon” of consciousness ludicrously small. But there’s a neglected corollary of non-materialist physicalism, namely that if experience discloses the intrinsic nature of the physical, then experience must be temporally incredibly fine-grained too. And if we probe your nervous system at a temporal resolution of femtoseconds or even attoseconds, what would we find? My guess is that it would be possible to recover a perfect structural match between what you are experiencing now in your phenomenal world-simulation and the underlying physics. Superpositions (“cat states”) are individual states [i.e. not classical aggregates].

Now, if the effective lifetime of neuronal superpositions in the CNS were milliseconds, they would be the obvious candidate for a perfect structural match and explain the phenomenal unity of consciousness. But physicists, not least Max Tegmark, have done the maths: decoherence means that the effective lifetime of neuronal superpositions in the CNS, assuming the unitary-only dynamics, is femtoseconds or less, which is intuitively the reductio ad absurdum of any kind of quantum mind.
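For readers wondering where figures like “femtoseconds” come from, a standard order-of-magnitude estimate (Zurek’s decoherence formula, added here for reference and not quoted by David) relates the decoherence time of a spatial superposition to the thermal relaxation time and the thermal de Broglie wavelength:

```latex
% Zurek's estimate for the decoherence time of a superposition separated
% by a distance \Delta x, in a thermal environment at temperature T:
\[
\tau_D \;\approx\; \tau_R \left( \frac{\lambda_{\mathrm{dB}}}{\Delta x} \right)^{2},
\qquad
\lambda_{\mathrm{dB}} \;=\; \frac{\hbar}{\sqrt{2\, m\, k_B T}}
\]
% At body temperature the thermal de Broglie wavelength of an ion is many
% orders of magnitude smaller than neuronal length scales, so \tau_D is a
% minuscule fraction of the relaxation time \tau_R.
```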

But one person’s reductio ad absurdum is another person’s falsifiable prediction. I’m guessing – I’m sounding like a believer, but I’m not – I am guessing that with sufficiently sensitive molecular matter-wave interferometry, perhaps using “trained up” mini-brains, the non-classical interference signature will disclose a perfect structural match between what you’re experiencing right now, your unified phenomenal world-simulation, and the underlying physics.

Lucas Perry: So, we hit on the hard problem and also the binding problem. There was like two other ones that you threw out there earlier that… I forget what they were?

David Pearce: Yeah, the problem of causal efficacy. How is it that you and I can discuss consciousness? How is it that the “raw feels” of consciousness have not merely the causal, but also the functional efficacy to inspire discussions of their existence?

Lucas Perry: And then what was the last one?

David Pearce: Oh, it’s been called the palette problem, P-A-L-E-T-T-E. As in the fact that there is tremendous diversity of different kinds of experience, and yet the fundamental entities recognized by physics, at least on the usual telling, are extremely simple and homogeneous. What explains this extraordinarily rich palette of conscious experience? Physics exhaustively describes the structural-relational properties of the world. What physics doesn’t do is deal in the essence of the physical, its intrinsic nature.

Now, it’s an extremely plausible assumption that the world’s fundamental fields are non-experiential, devoid of any subjective properties – and this may well be the case. But if so, we have the hard problem, the binding problem, the problem of causal efficacy, the palette problem – a whole raft of problems.

Lucas Perry: Okay. So, this all serves the purpose of codifying that there’s these questions up in the air about these metaphysical views which inform identity. We got here because we were talking about mereological nihilism, and Andrés said that one view that you guys have is that you can divide or cut or partition consciousness into individual, momentary, unitary moments of experience that you claim are ontologically simple. What is your credence on this view?

Andrés Gómez Emilsson: Phenomenological evidence. When you experience your visual field, you don’t only experience one point at a time. The contents of your experience are not ones and zeros; it isn’t the case that you experience one and then zero and then one again. Rather, you experience many different qualia varieties simultaneously: visual experience and auditory experience and so on. All of that gets presented to you. I take that very seriously. I mean, some other researchers may fundamentally say that that’s an illusion, that there’s actually never a unified experience, but that has way more problems than actually taking seriously the unity of consciousness.

David Pearce: A number of distinct questions arise here. Are each of us egocentric phenomenal world-simulations? A lot of people are implicitly perceptual direct realists, even though they might disavow the label. Implicitly, they assume that they have some kind of direct access to physical properties. They associate experience with some kind of stream of thoughts and feelings behind their forehead. But if instead we are world-simulationists, then there arises the question: what is the actual fundamental nature of the world beyond your phenomenal world-simulation? Is it experiential or non-experiential? I am agnostic about that – even though I explore non-materialist physicalism.

Lucas Perry: So, I guess I’m just trying to get a better answer here on how is it that we navigate these views of identity towards truth?

Andrés Gómez Emilsson: An example I thought of, of a very big contrast between what you may intuitively imagine is going on versus what’s actually happening, is if you are very afraid of snakes, for example, you look at a snake. You feel, “Oh, my gosh, it’s intruding into my world and I should get away from it,” and you have this representation of it as a very big other. Anything that is very threatening, oftentimes you represent it as “an other”.

But crazily, that’s actually just yourself to a large extent because it’s still part of your experience. Within your moment of experience, the whole phenomenal quality of looking at a snake and thinking, “That’s an other,” is entirely contained within you. In that sense, these ways of ascribing identity and continuity to the things around us or a self-other division are almost psychotic. They start out by assuming that you can segment out a piece of your experience and call it something that belongs to somebody else, even though clearly, it’s still just part of your own experience; it’s you.

Lucas Perry: But the background here is also that you’re calling your experience your own experience, which is maybe also a kind of psychopathy. Is that the word you used?

Andrés Gómez Emilsson: Yeah, yeah, yeah, that’s right.

Lucas Perry: Maybe the scientific thing is, there’s just snake experience and it’s neither yours nor not yours, and there’s what we conventionally call a snake.

Andrés Gómez Emilsson: That said, there are ways in which I think you can use experience to gain insight about other experiences. If you’re looking at a picture that has two blue dots, I think you can accurately say, by paying attention to one of those blue dots, that the phenomenal property of my sensation of blue is also in that other part of my visual field. And this is a case where, in a sense, you can, I think, meaningfully refer to some aspect of your experience by pointing at another aspect of your experience. It’s still maybe in some sense kind of crazy, but it’s still closer to truth than many other things that we think of or imagine.

Honest and true statements about the nature of other people’s experiences are, I think, very much achievable. The reference gap, I think, might be possible to bridge, and you can probably aim for a true sense of identity, harmonizing the phenomenal and the ontological senses of identity.

Lucas Perry: I mean, I think that part of the motivation, for example in Buddhism, is that you need to always be understanding yourself in reality as it is or else you will suffer, and that it is through understanding how things are that you’ll stop suffering. I like this point that you said about unifying the phenomenal identity and phenomenal self with what is ontologically true, but that also seems not intrinsically necessary because there’s also this other point here where you can maybe function or have the epistemology of any arbitrary identity view but not identify with it. You don’t take it as your ultimate understanding of the nature of the world, or what it means to be this limited pattern in a giant system.

Andrés Gómez Emilsson: I mean, generally speaking, that’s obviously pretty good advice. It does seem to be something that’s constrained to the workings of the human mind as it is currently implemented. I mean, definitely all this Buddhist advice of “don’t identify with it” or “don’t get attached to it” ultimately cashes out in experiencing less of a craving, for example, or feeling less despair in some cases. Useful advice, but not universally applicable.

For many people, their problem might be something like, sure, like desire, craving, attachment, in which case these Buddhist practices will actually be very helpful. But if your problem is something like a melancholic depression, then lack of desire doesn’t actually seem very appealing; that is the default state and it’s not a good one. Just be mindful of universalizing this advice.

David Pearce: Yes. Other things being equal, the happiest people tend to have the most desires. Of course, a tremendous desire can also bring tremendous suffering, but there are a very large number of people in the world who are essentially unmotivated. Nothing really excites them. In some cases, they’re just waiting to die: melancholic depression. Desire can be harnessed.

A big problem, of course, is that in a Darwinian world, many of our desires are mutually inconsistent. And to use (what to me at least would be) a trivial example – it’s not trivial to everyone –  if you have 50 different football teams with all their supporters, there is logically no way that the preferences of these fanatical football supporters can be reconciled. But nonetheless, by raising their hedonic set-points, one can allow all football supporters to enjoy information-sensitive gradients of bliss. But there is simply no way to reconcile their preferences.

Lucas Perry: There’s part of me that does want to do some universalization here, and maybe that is wrong or unskillful to do, but I seem to be able to imagine a future where, say we get aligned superintelligence and there’s some kind of rapid expansion, some kind of optimization bubble of some kind. And maybe there are the worker AIs and then there are the exploiter AIs, and the exploiter AIs just get blissed out.

And imagine if some of the exploiter AIs are egomaniacs in their hedonistic simulations and some of them are hive minds, and they all have different views on open individualism or closed individualism. Some of the views on identity just seem more deluded to me than others. I seem to have a problem with self-identification and the reification of the self as something: it seems to me to take something that is conventional and make it an ultimate truth, which is confusing to the agent, and that to me seems bad or wrong, like the world model is wrong. Part of me wants to say it is always better to know the truth, but I also feel like I’m having a hard time being able to say how to navigate views of identity in a true way, and then another part of me feels like actually it doesn’t really matter, except insofar as it affects the flavor of that consciousness.

Andrés Gómez Emilsson: If we find like the chemical or genetic levers for different notions of identity, we could presumably imagine a lot of different ecosystems of approaches to identity in the future, some of them perhaps being much more adaptive than others. I do think I grasp a little bit maybe the intuition pump, and I think that’s actually something that resonates quite a bit with us, which is that it is an instrumental value for sure to always be truth-seeking, especially when you’re talking about general intelligence.

It’s very weird and it sounds like it’s going to fail if you say, “Hey, I’m going to be truth-seeking in every domain except this one.” And that might be identity, or your value function, or your model of physics, or something like that. Perhaps actual superintelligence in some sense really entails having an open-ended model for everything, including ultimately who you are. If you don’t have those open-ended models that can be revised with further evidence and reasoning, you are not a superintelligence.

That intuition pump may suggest that if intelligence turns out to be extremely adaptive and powerful, then presumably, the superintelligences of the future will have true models of what’s actually going on in the world, not just convenient fictions.

David Pearce: Yes. In some sense I would hope our long-term goal is ignorance of the entire Darwinian era and its horrors. But it would be extremely dangerous if we were to give up prematurely. We need to understand reality and the theoretical upper bounds of rational moral agency in the cosmos. But ultimately, when we have done literally everything that it is possible to do to minimize and prevent suffering, I think in some sense we should want to forget about it altogether. But I would stress the risks of premature defeatism.

Lucas Perry: Of course we’re always going to need a self model, a model of the cognitive architecture in which the self model is embedded; it needs to understand the directly adjacent computations which are integrated into it. But it seems like views of identity go beyond just this self model. Is that the solution to identity? What do open, closed, or empty individualism have to say about something like that?

Andrés Gómez Emilsson: Open, empty, and closed, as ontological claims, yeah, I mean they are separable from the functional uses of a self model. It does, however, have bearings on basically the decision-theoretic rationality of an intelligence, because when it comes to planning ahead, if you have the intense objective of being as happy as you can, and somebody offers you a cloning machine and they say, “Hey, you can trade one year of your life for a completely new copy of yourself,” do you press the button to make that happen? For making that decision, you actually do require an ontological notion of identity, unless you just care about replication.
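As an illustrative sketch of why the ontological view matters for that decision (the function, numbers, and labels below are hypothetical and not from the conversation), one can score the cloning-machine offer differently depending on whether only one’s own continuing stream counts or whether the copy’s experiences count on a par with it:

```python
# Hypothetical sketch: how different ontological views of identity change
# the decision-theoretic value of "trade one year of your life for a copy".
# All numbers are illustrative placeholders.

def value_of_pressing_button(view: str,
                             years_remaining: float = 40.0,
                             wellbeing_per_year: float = 1.0) -> float:
    """Change in expected value of pressing the button, under a given view."""
    cost = 1.0 * wellbeing_per_year                            # the year you give up
    copy_gain = (years_remaining - 1.0) * wellbeing_per_year   # the copy's years

    if view == "closed":
        # Only your own continuing stream counts; the copy is a stranger.
        return -cost
    if view in ("open", "empty"):
        # Treated here as counting all moments of experience impartially,
        # so the copy's wellbeing counts just like "yours".
        return copy_gain - cost
    raise ValueError(f"unknown view: {view}")

if __name__ == "__main__":
    for v in ("closed", "open", "empty"):
        print(v, value_of_pressing_button(v))
```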

Lucas Perry: So I think that the problem there is that identity, at least in us apes, is caught up in ethics. If you could have an agent like that where identity was not factored into ethics, then I think that it would make a better decision.

Andrés Gómez Emilsson: It’s definitely a question too of whether you can bootstrap an impartial god’s-eye-view on the wellbeing of all sentient beings without first having developed a sense of your own identity, then wanting to preserve it, and finally updating it with more information, you know, philosophy, reasoning, physics. I do wonder if you can start out without caring about identity and end up concluding with a kind of impartial god’s-eye-view. I think probably in practice a lot of those transitions do happen because the person is first concerned with themselves, and then they update the model of who they are based on more evidence. You know, I could be wrong, it might be possible to completely sidestep Darwinian identities and just jump straight up into impartial care for all sentient beings, I don’t know.

Lucas Perry: So we’re getting into the ethics of identity here, and why it matters. The question for this portion of the discussion is: what are the ethical implications of different views on identity? Andrés, I think you can sort of kick this conversation off by talking a little bit about the game theory.

Andrés Gómez Emilsson: Right, well yeah, the game theory is surprisingly complicated. Just consider within a given person, in fact, the different “sub agents” of an individual. Let’s say you’re drinking with your friends on a Friday evening, but you know you have to wake up early at 8:00 AM for whatever reason, and you’re deciding whether to have another drink or not. Your intoxicated self says, “Yes, of course. Tonight is all that matters.” Whereas your cautious self might try to persuade you that no, you will also exist tomorrow in the morning.

Within a given person, there’s all kinds of complex game theory that happens between alternative views of identity, even implicitly. It becomes obviously much more tricky when you expand it outwards: some social movements are, in a sense, trying to hack people’s view of identity, whether the unit is your political party, or the country, or the whole ecosystem, or whatever it may be. A key thing to consider here is the existence of legible Schelling points, also called focal points: in communication between entities, what are the guiding principles that they can use in order to effectively coordinate and move towards a certain goal?

I would say that having something like open individualism itself can be a powerful Schelling point for coordination. Especially because if you can be convinced that somebody is an open individualist, you have reasons to trust them. There’s all of this research on how high-trust social environments are so much more conducive to productivity and long-term sustainability than low-trust environments, and expansive notions of identity are very trust building.

On the other hand, from a game-theoretical point of view, you also have the problem of defection. Within an open individualist society, you could have a small group of people who fake the test of open individualism. They can take over from within, and instantiate some kind of a dictatorship or some type of closed individualist takeover of what was a really good society, good for everybody.

This is a serious problem, even when it comes to, for example, forming groups of people who all share a certain experience. For example, MDMA, or 5-MeO-DMT, or let’s say deep stages of meditation. Even then, you’ve got to be careful, because people who are resistant to those states may pretend that they have an expanded notion of identity, but actually covertly work towards a much more reduced sense of identity. I have yet to see a credible, game-theoretically aware solution for how to make this work.

Lucas Perry: If you could clarify the knobs in a person that the different views on identity turn, whether those be altruism, or selfishness, or other things, and if you could clarify how that affects the game theory, then I think that would be helpful.

Andrés Gómez Emilsson: I mean, I think the biggest knob is fundamentally which experiences count, from the point of view that you expect to, in a sense, be there, or expect them to be real in as real a way as your current experience is. It’s also contingent on theories of consciousness, because you could be an open individualist and still believe that higher-order cognition is necessary for consciousness, and that non-human animals are not conscious. That gives rise to all sorts of other problems: the person presumably is altruistic and cares about others, but they still don’t include non-human animals, for a completely different reason in that case.

Definitely another knob is how you consider what you will be in the future, whether you consider that to be part of the universe or the entirety of the universe. I guess I used to think that personal identity was very tied to hedonic tone. I think of them as much more dissociated now. There is a general pattern: people who are in a very low mood may have kind of a bias towards empty individualism. People who become open individualists often experience a huge surge in positive feelings for a while because they feel that they’re never going to die, like the fear of death greatly diminishes, but I don’t actually think it’s a surefire or foolproof way of increasing wellbeing, because if you take open individualism seriously, it also comes with terrible implications. Like, hey, we are also the pigs in factory farms. It’s not a very pleasant view.

Lucas Perry: Yeah, I take that seriously.

Andrés Gómez Emilsson: I used to believe for a while that the best thing we could possibly do in the world was to just write a lot of essays and books about why open individualism is true. Now I think it’s important to combine it with consciousness technologies so that, hey, once we do want to upgrade our sense of identity to a greater circle of compassion, we also have the enhanced happiness and mental stability to be able to actually engage with that without going crazy.

Lucas Perry: This has me thinking about one point that I think is very motivating for me for the ethical case of veganism. Take the common sense, normal consciousness, like most people have, and that I have: you just feel like a self that’s having an experience. You just feel like you are fortunate enough to be born as you, and to be having the Andrés experience or the Lucas experience, and that your life runs from birth to death, or whatever, and when you die you will be annihilated, you will no longer have experience. Then who is it that is experiencing the cow consciousness? Who is it that is experiencing the chicken and the pig consciousness? There are so many instantiations of that, like billions. Even if this is based on an irrationality, it still feels motivating to me. Yeah, I could just die and wake up as a cow 10 billion times. That’s kind of the experience that is going on right now: the sudden confused awakening into cow consciousness plus factory farming conditions. I’m not sure if you find that completely irrational or motivating or what.

Andrés Gómez Emilsson: No, I mean I think it makes sense. We have a common friend as well, Magnus Vinding. He actually wrote a pro-veganism book kind of with this line of reasoning. It’s called You Are Them, about how a post-theoretical science of consciousness and identity is itself a strong case for an ethical lifestyle.

Lucas Perry: Just touching here on the ethical implications, some other points that I want to add concern what happens when one is identified with one’s phenomenal identity. In particular, I want to talk about the experience of self where you feel like you’re a closed individualist, where your life runs from when you were born up until when you die, and that’s you. I think that that breeds a very strong duality in terms of your relationship with your own personal phenomenal consciousness. The suffering and joy which you have direct access to are categorized as mine or not mine.

Those which are mine take high moral and ethical priority over the suffering of others. You’re not mind-melded with all of the other brains, right? So there’s an epistemological limitation there where you’re not directly experiencing the suffering of other people, but the closed individualist view goes a step further and isn’t just saying that there’s an epistemological limitation, but it’s also saying that this consciousness is mine, and that consciousness is yours, and this is the distinction between self and other. And given selfishness, that self consciousness will take moral priority over other consciousness.

That I think just obviously has massive ethical implications with regard to the greed of people. I view the ethical implications here as being important because, at least in the way that human beings function, if one is able to fully rid oneself of the ultimate identification with one’s personal consciousness as being the content of self, then one can move beyond the duality of consciousness of self and other, and care about all instances of wellbeing and suffering much more equally than I currently do. That to me seems harder to do, at least with human brains, if we have a strong reification of and identification with our instances of suffering or wellbeing as our own.

David Pearce: Part of the problem is that the existence of other subjects of experience is metaphysical speculation. It’s metaphysical speculation that one should take extremely seriously: I’m not a solipsist. I believe that other subjects of experience, human and nonhuman, are as real as my experience. But nonetheless, it is still speculative and theoretical. One cannot feel their experiences. There is simply no way, given the way that we are constituted, the way we are now, that one can behave with impartial, God-like benevolence.

Andrés Gómez Emilsson: I guess I would question it perhaps a little bit, that we only care about our future suffering within our own experience because this is me, this is mine, it’s not an other. In a sense I think we care about those more largely because they are more intense. You do see examples of, for example, mirror-touch synesthesia, of people who, if they see somebody else get hurt, also experience pain. I don’t mean a fleeting sense of discomfort, but perhaps even actual strong pain, because they’re able to kind of reflect that for whatever reason.

People like that are generally very motivated to help others as well. In a sense, their implicit self model includes others, or at least weighs others more than most people do. I mean in some sense you can perhaps make sense of selfishness in this context as the coincidence that what is within our self model is experienced as more intense. But there are plenty of counterexamples to that, including the sense of depersonalization or ego death, where you can experience the feeling of God, for example, as being this eternal and impersonal force that is infinitely more intense than you, and therefore it matters more, even though you don’t experience it as you. Perhaps the core issue is what gets the highest amount of intensity within your world simulation.

Lucas Perry: Okay, so I also just want to touch a little bit on preferences here before we move on to how this is relevant to AI alignment and the creation of beneficial AI. From the moral realist perspective, if you take the metaphysical existence of consciousness very seriously, and you view it as the ground of morality, then different views on identity will shift how you weight the preferences of other creatures.

So from a moral perspective, whatever kinds of views of identity end up broadening your moral circle of compassion closer and closer to the end goal of impartial benevolence for all sentient beings according to their degree and kinds of worth, I would view as a good thing. But now there’s this other way to think about identity because if you’re listening to this, and you’re a moral anti-realist, there is just the arbitrary, evolutionary, and historical set of preferences that exist across all creatures on the planet.

Then the views on identity I think are also obviously again going to weigh into your moral considerations about how much to just respect different preferences, right. One might want to go beyond hedonic consequentialism here, and could just be a preference consequentialist. You could be a deontological ethicist or a virtue ethicist too. We could also consider how different views on identity as lived experiences would affect what it means to become virtuous, if being virtuous actually means moving beyond the self.

Andrés Gómez Emilsson: I think I understand what you’re getting at. I mean, really there’s kind of two components to ontology. One is what exists, and then the other one is what is valuable. You can arrive at something like open individualism just from the point of view of what exists, but still have disagreements with other open individualists about what is valuable. Alternatively, you could agree on what is valuable with somebody but completely disagree on what exists. To get the power of cooperation of open individualism as a Schelling point, there also needs to be some level of agreement on what is valuable, not just what exists.

It definitely sounds arrogant, but I do think that by the same principle by which you arrive at open individualism or empty individualism, basically nonstandard views of identity, you can also arrive at hedonistic utilitarianism, and that is, again, the principle of really caring about knowing who or what you are fundamentally. To know yourself more deeply also entails understanding, from second to second, how your preferences impact your state of consciousness. It is my view that, just as you can think of open individualism as the implication of taking a very systematic approach to making sense of identity, likewise philosophical hedonism is an implication of taking a very systematic approach to trying to figure out what is valuable. How do we know that pleasure is good?

David Pearce: Yeah, does the pain-pleasure axis disclose the world’s intrinsic metric of (dis)value? There is something completely coercive about pleasure and pain. One can’t transcend the pleasure/pain axis. Compare the effects of taking heroin, or “enhanced interrogation”. There is no one with an inverted pleasure/pain axis. Supposed counter-examples, like sado-masochists, in fact just validate the primacy of the pleasure/pain axis.

What follows from the primacy of the pleasure/pain axis? Should we be aiming, as classical utilitarians urge, to maximize the positive abundance of subjective value in the universe, or at least our forward light-cone? But if we are classical utilitarians, there is a latently apocalyptic implication of classical utilitarianism – namely, that we ought to be aiming to launch something like a utilitronium (or hedonium) shockwave – where utilitronium or hedonium is matter and energy optimized for pure bliss.

So rather than any kind of notion of personal identity as we currently understand it, if one is a classical utilitarian – or if one is programming a computer or a robot with the utility function of classical utilitarianism –  should one therefore essentially be aiming to launch an apocalyptic utilitronium shockwave? Or alternatively, should one be trying to ensure that the abundance of positive value within our cosmological horizon is suboptimal by classical utilitarian criteria?

I don’t actually personally advocate a utilitronium shockwave. I don’t think it’s sociologically realistic. I think much more sociologically realistic is to aim for a world based on gradients of intelligent bliss, because that way people’s existing values and preferences can (for the most part) be conserved. But nonetheless, if one is a classical utilitarian, it’s not clear one is allowed this kind of messy compromise.

Lucas Perry: All right, so now that we’re getting into the juicy, hedonistic imperative type stuff, let’s talk about how this is relevant to AI alignment and the creation of beneficial AI. I think that this is clear based off of the conversations we’ve had already about the ethical implications, and just how prevalent identity is in our world for the functioning of society and sociology, and just civilization in general.

Let’s limit the conversation for the moment just to AI alignment. And for this initial discussion of AI alignment, I just want to limit it to the definition of AI alignment as developing the technical process by which AIs can learn human preferences, and help further express and idealize humanity. So, exploring how identity is important and meaningful for that process, there are two points I think it’s relevant for. First, who are we making the AI for? Different views on identity I think would matter, because if we assume that sufficiently powerful and integrated AI systems are likely to have consciousness or to have qualia, they’re moral agents in themselves.

So who are we making the AI for? We’re making new patients or subjects of morality if we ground morality on consciousness. So from a purely egoistic point of view, the AI alignment process is just for humans. It’s just to get the AI to serve us. But if we care about all sentient beings impartially, and we just want to maximize conscious bliss in the world, and we don’t have this dualistic distinction of consciousness being self or other, we could make the AI alignment process something that is more purely altruistic. That we recognize that we’re creating something that is fundamentally more morally relevant than we are, given that it may have more profound capacities for experience or not.

David, I’m also holding in mind that I know you’re skeptical of the ability of AGI or superintelligence to be conscious. I agree that that’s not solved yet, but I’m just working here with the idea of, okay, maybe if they are. So I think it can change the altruism versus selfishness, the motivations around who we’re training the AIs for. And then the second part is: why are we making the AI? Are we making it for ourselves or are we making it for the world?

If we take a view from nowhere, what Andrés called a god’s-eye-view, is this ultimately something that is for humanity or is it something ultimately for just making a better world? Personally, I feel that if the end goal is ultimate loving kindness and impartial ethical commitment to the wellbeing of all sentient creatures in all directions, then ideally the process is something that we’re doing for the world, and that we recognize the intrinsic moral worth of the AGI and superintelligence as ultimately more morally relevant descendants of ours. So I wonder if you guys have any reactions to this?

Andrés Gómez Emilsson: Yeah, yeah, definitely. So many. Tongue in cheek, but you just made me chuckle when you said, “Why are we making the AI to begin with?” I think there’s a case to be made that the actual reason why we’re making AI is as a kind of impressive display of fitness, in order to signal our intellectual fortitude and superiority. I mean sociologically speaking, you know, actually getting an AI to do something really well is a way in which you can signal your own intelligence, and I guess I worry to some extent that this is a bit of a tragedy of the commons, as is the case with weapons development. You’re so concerned with whether you can, and especially because of the social incentives, that you’re going to gain status and be looked at as somebody who’s really competent and smart, that you don’t really stop and wonder whether you should be building this thing in the first place.

Leaving that aside, just from a purely ethically motivated point of view, I do remember thinking and having a lot of discussions many years ago about whether we can make a supercomputer experience what it is like for a human to be on MDMA. Then all of a sudden that supercomputer becomes a moral patient. It actually matters; you probably shouldn’t turn it off. Maybe in fact you should make more of them. A very important thing I’d like to say here is: I think it’s really important to distinguish two notions of intelligence.

On the one hand, intelligence as causal power over your environment, and on the other hand as the capacity for self-insight, and introspection, and understanding reality. I would say that we tend to confuse these quite a bit, especially in circles that don’t take consciousness very seriously. It’s usually implicitly assumed that having a superhuman ability to control your environment entails that you also have, in a sense, kind of a superhuman sense of self or a superhuman broad sense of intelligence. Whereas even if you are a functionalist, I mean even if you believe that a digital computer can be conscious, you can make a pretty strong case that even then, it is not something automatic. It’s not just that if you program the appropriate behavior, it will automatically also be conscious.

A super straightforward example here is the Chinese room: if it’s just a giant lookup table, clearly it is not a subject of experience, even though the input/output mapping might be very persuasive. There are definitely still problems there, and I think if we aim instead towards maximizing intelligence in the broad sense, that does entail also the ability to actually understand the nature and scope of other states of consciousness. And in that sense, I think a superintelligence of that sort would be intrinsically aligned with the intrinsic values of consciousness. But there are just so many ways of making partial superintelligences that may be superintelligent in many ways, but not in that one in particular, and I worry about that.

David Pearce: I sometimes sketch this simplistic trichotomy, three conceptions of superintelligence. One is a kind of “Intelligence Explosion” of recursively self-improving software-based AI. Then there is the Kurzweilian scenario, a complete fusion of humans and our machines. And then there is, very crudely, biological superintelligence: not just rewriting our genetic source code, but also (and Neuralink prefigures this) essentially “narrow” superintelligence-on-a-chip, so that anything a classical digital computer can do, a biological human or a transhuman can do.

So yes, I see full-spectrum superintelligence as our biological descendants, super-sentient, able to navigate radically alien states of consciousness. So I think the question that you’re asking is why are we developing “narrow” AI – non-biological machine superintelligence.

Lucas Perry: Speaking specifically from the AI alignment perspective, of how you align current-day systems and future systems, up to superintelligence and beyond, with human values and preferences: the question born of that, in the context of these questions of identity, is who are we making that AI for, and why are we making the AI?

David Pearce: If you’ve got Buddha, “I teach one thing and one thing only, suffering and the end of suffering”… Buddha would press the OFF button, and I would press the OFF button.

Lucas Perry: What’s the off button?

David Pearce: Sorry, the notional initiation of a vacuum phase-transition (or something similar) that (instantaneously) obliterates Darwinian life. But when people talk about “AI alignment”, or most people working in the field at any rate, they are not talking about a Buddhist ethic [the end of suffering] – they have something else in mind. In practical terms, this is not a fruitful line of thought to pursue – you know, the implications of Buddhist, Benatarian, negative utilitarian, suffering-focused ethics.

Essentially, one wants to ratchet up hedonic range and hedonic set-points in such a way that you’re conserving people’s existing preferences, even though their existing preferences and values are, in many cases, in conflict with each other. Now, how one actually implements this in a classical digital computer, or a classically parallel connectionist system, or some kind of hybrid, I don’t know precisely.

Andrés Gómez Emilsson: At least there is one pretty famous cognitive scientist and AI theorist who does propose the Buddhist ethic of pressing the off button of the universe: Thomas Metzinger, with his benevolent artificial anti-natalism. I mean, yeah. Actually that’s pretty interesting, because he explores the idea of an AI that truly extrapolates human values and what’s good for us as subjects of experience. The AI concludes what we are psychologically unable to, which is that the ethical choice is non-existence.

But yeah, I mean, I think that’s, as David pointed out, implausible. I think it’s much better to put our efforts into creating a super-cooperator cluster that tries to recalibrate the hedonic set point so that we are animated by gradients of bliss. Sociological constraints are really, really important here. Otherwise you risk…

Lucas Perry: Being irrelevant.

Andrés Gómez Emilsson: … being irrelevant, yeah, is one thing. The other thing is unleashing an ineffective or failed attempt at sterilizing the world, which would be so much, much worse.

Lucas Perry: I don’t agree with this view, David. Generally, I think that Darwinian history has probably been net negative, but I’m extremely optimistic about how good the future can be. And so I think it’s an open question at the end of time, how much misery and suffering and positive experience there was. So I guess I would say I’m agnostic as to this question. But if we get AI alignment right, and these other things, then I think that it can be extremely good. And I just want to tether this back to identity and AI alignment.

Andrés Gómez Emilsson: I do have the strong intuition that if empty individualism is correct at an ontological level, then actually negative utilitarianism can be pretty strongly defended on the grounds that when you have a moment of intense suffering, that’s the entirety of that entity’s existence. And especially with eternalism, once it happens, there’s nothing you can do to prevent it.

There’s something that seems particularly awful about allowing inherently negative experiences to just exist. That said, I think open individualism actually may to some extent weaken that. Because even if the suffering was very intense, you can still imagine that if you identify with consciousness as a whole, you may be willing to undergo some bad suffering as a trade-off for something much, much better in the future.

It sounds completely insane if you’re currently experiencing a cluster headache or something astronomically painful. But maybe from the point of view of eternity, it actually makes sense. Those are still tiny specks of experience relative to the beings that are going to exist in the future. You can imagine Jupiter brains and Dyson spheres just in a constant ecstatic state. I think open individualism might counterbalance some of the negative utilitarian worries, and it would be something that an AI would have to contemplate and that might push it one way or the other.

Lucas Perry: Let’s go ahead and expand the definition of AI alignment. A broader way we can look at the AI alignment problem, or the problem of generating beneficial AI and making the future of AI go well, is as the project of making sure that the technical, political, social, and moral consequences of AI, from short-term systems through superintelligence and beyond, add up to a beneficial process.

Thinking about identity in that process, we were talking about how strong nationalism, or strong identification with a nation state, is a form of identity construction that people do. The nation or the country becomes part of the self. One of the problems within the AI alignment problem is arms racing between countries, and therefore taking shortcuts on safety. I’m not trying to propose clear answers or solutions here. It’s unclear how successful an intervention here could even be. But these views on identity, and how much nationalism shifts or not, I think feed into how difficult or not the problem will be.

Andrés Gómez Emilsson: The point about game theory becomes very, very important in that yes, you do want to help other people who are also trying to improve the well-being of all consciousness. On the other hand, if there’s a way to fake caring about the entirety of consciousness, that is a problem, because then you would be using resources on people who would hoard them or, even worse, wrest power away from you so that they can focus on their narrow sense of identity.

In that sense, I think having technologies in order to set particular phenomenal experiences of identity, as well as to be able to detect them, might be super important. But above all, and I mean this is definitely my area of research, having a way of objectively quantifying how good or bad a state of consciousness is based on the activity of a nervous system seems to me like an extraordinarily key component for any kind of a serious AI alignment.

If you’re actually trying to prevent bad scenarios in the future, you’ve got to have a principled way of knowing whether the outcome is bad, or at the very least knowing whether the outcome is terrible. The aligned AI should be able to grasp that a certain state of consciousness, even if nobody has experienced it before, will be really bad and should be avoided, and that tends to be the lens through which I see this.

In terms of improving people’s internal self-consistency, as David pointed out, I think it’s kind of pointless to try to satisfy a lot of people’s preferences, such as having their favorite sports team win, because there’s really just no way of satisfying everybody’s preferences. The realm of psychology is where a lot of these interventions would happen. You can’t expect an AI to be aligned with you if you yourself are not aligned with yourself, right, if you have all of these strange, psychotic, competing sub-agents. It seems like part of the process is going to be developing techniques to become more consistent, so that we can actually be helped.

David Pearce: In terms of risks this century, nationalism has been responsible for most of the wars of the past two centuries, and nationalism is highly likely to lead to catastrophic war this century. And the underlying source of global catastrophic risk? I don’t think it’s AI. It’s male human primates doing what male human primates have been “designed” by evolution to do: to fight, to compete, to wage war. And even vegan pacifists like me, how do we spend our leisure time? Playing violent video games.

There are technical ways one can envisage mitigating the risk. Perhaps it’s unduly optimistic aiming for all-female governance or for a democratically accountable world state under the auspices of the United Nations. But I think that unless one does have somebody with a monopoly on the use of force, we are going to have cataclysmic nuclear war this century. It’s highly likely: I think we’re sleepwalking our way towards disaster. It’s more intellectually exciting discussing exotic risks from AI that goes FOOM, or something like that. But there are much more mundane catastrophes that are, I suspect, going to unfold this century.

Lucas Perry: All right, so getting into the other part here about AI alignment and beneficial AI throughout this next century, there are a lot of different things that increased intelligence and capacity and power over the world are going to enable. There’s going to be human biological species divergence via AI-enabled bioengineering. There is this fundamental desire for immortality in many people, and the drive towards superintelligence and beyond for some people promises immortality. In terms of identity here, I think closed individualism is extremely motivating for this extreme self-concern and desire for immortality.

There are people today who are investing in, say, cryonics, because they want to freeze themselves and make it long enough so that they can somehow become immortal, very clearly influenced by their ideas of identity. As Yuval Noah Harari was saying on our last podcast, it subverts many of the classic liberal myths that we have about the same intrinsic worth across all people; and then if you add humans 2.0 or 3.0 or 4.0 into the mixture, it’s going to subvert that even more. So there are important questions of identity there, I think.

With sufficiently advanced superintelligence, people flirt with the idea of being uploaded. The identity questions here which are relevant are: if we scan the information architecture or the neural architecture of your brain and upload it, will people feel like that is them? Is it not them? What does it mean to be you? Also, of course, in scenarios where people want to merge with the AI, what is it that you would want to be kept in the merging process? What is superfluous to you? What is nonessential to your identity, or to what it means to be you, such that you would be okay or not with merging?

And then I think that, most importantly here, I’m very interested in the Descendants scenario, where we just view AIs as our evolutionary descendants. There’s this tendency in humanity to not be okay with this descendant scenario. Because of closed individualist views on identity, people won’t see that the consciousness is the same kind of thing, or they won’t see it as their own consciousness. They see that well-being through the lens of self and other, and that makes people less interested in there being descendant, superintelligent, conscious AIs. Maybe there’s also a bit of speciesism in there.

I wonder if you guys have any reactions to identity in any of these processes? Again, they are human biological species divergence via AI-enabled bioengineering, immortality, uploads, merging, and the Descendants scenario.

David Pearce: In spite of thinking that Darwinian life is sentient malware, I think cryonics should be opt-out, and cryothanasia should be opt-in, as a way to defang death. So long as someone is suspended in optimal conditions, it ought to be possible for advanced intelligence to reanimate that person. And sure, if one is an “empty” individualist, or if you’re the kind of person who wakes up in the morning troubled that you’re not the person who went to sleep last night, this reanimated person may not really be “you”. But if you’re more normal, yes, I think it should be possible to reanimate “you” if you are suspended.

In terms of mind uploads, this is back to the binding problem. Even assuming that you can be scanned with a moderate degree of fidelity, I don’t think your notional digital counterpart is a subject of experience. Even if I am completely wrong here and somehow subjects of experience inexplicably emerge in classical digital computers, there’s no guarantee that the qualia would be the same. After all, you can replay a game of chess with perfect fidelity, but there’s no guarantee incidentals like the textures of the pieces will be the same. Why expect the textures of qualia to be the same? But that isn’t really my objection. It’s the fact that a digital computer cannot support phenomenally-bound subjects of experience.

Andrés Gómez Emilsson: I also think cryonics is really good. Even though, with a different, nonstandard view of personal identity, it’s kind of puzzling why you would care about it, there are lots of practical considerations. I like what David said about defanging death. I think that’s a good idea, but it also gives people skin in the game for the future.

People who enact policy and become politically successful often tend to be 50 years old or more, and there are a lot of things they weigh in on that they will not actually get to experience. That probably biases politicians and people who are enacting policy to focus especially on short-term gains as opposed to really genuinely trying to improve the long term; and I think cryonics would be helpful in giving people skin in the game.

More broadly speaking, it does seem to be the case that what aspect of transhumanism a person is likely to focus on depends a lot on their theories of identity. I mean, if we break down transhumanism into the three supers of super happiness, super longevity, and super intelligence, the longevity branch is pretty large. There’s a lot of people looking for ways of rejuvenating, preventing aging, and reviving ourselves, or even uploading ourselves.

Then there’s people who are very interested in super intelligence. I think that’s probably the most popular type of transhumanism nowadays. That one I think does rely to some extent on people having a functionalist information theoretic account of their own identity. There’s all of these tropes of, “Hey, if you leave a large enough digital footprint online, a super intelligence will be able to reverse engineer your brain just from that, and maybe reanimate you in the future,” or something of that nature.

And then there are, yeah, people like David and me, and the Qualia Research Institute as well, who care primarily about super happiness. We think of it as kind of a requirement for a future that is actually worth living in. You can have all the longevity and all the intelligence you want, but if you’re not happy, I don’t really see the point. A lot of the concerns with longevity, fear of death and so on, I think will in retrospect probably be considered some kind of neurosis. Obviously a genetically adaptive neurosis, but something that can be cured with mood-enhancing technologies.

Lucas Perry: Leveraging human selfishness, or leveraging how most people are closed individualists, seems like the way to get good AI alignment. To some extent, I find the immortality pursuits through cryonics to be pretty elitist. But I think it’s a really good point that giving the policymakers and the older generation and people in power more skin in the game over the future is both potentially very good and also very scary.

It’s very scary to the extent that they could get absolute power, but also very good if you’re able to mitigate the risks of them developing absolute power. But again, as you said, it motivates them towards more deeply and profoundly considering future considerations, being less myopic, being less selfish, so that getting the AI alignment process right and doing the necessary technical work is not done for a short-term nationalistic gain. Again, with an asterisk here that the risk is unilaterally getting more and more power.

Andrés Gómez Emilsson: Yeah, yeah, yeah. Also, even without cryonics, another way to increase skin in the game may be more straightforwardly positive: bliss technologies do that. A lot of people who are depressed or nihilistic or vengeful or misanthropic don’t really care about destroying the world or watching it burn, so to speak, because they don’t have anything to lose. But if you have a really reliable MDMA-like technological device that reliably produces wonderful states of consciousness, I think people will be much more careful about preserving their own health, and also not watch the world burn, because they know “I could be back home and actually experiencing this rather than just trying to satisfy my misanthropic desires.”

David Pearce: Yeah, the happiest people I know work in the field of existential risk. Rather than great happiness just making people reckless, it can also make them more inclined to conserve and protect.

Lucas Perry: Awesome. I guess just one more thing that I wanted to hit on, with these different ways that technology is going to change society, is… I don’t know. In my heart, the ideal is the vow to liberate all sentient beings in all directions from suffering. The closed individualist view seems generally fairly antithetical to that, but there’s also this desire for me to be realistic about leveraging that human selfishness towards that ethic. As for the capacity here for conversations on identity going forward, if we can at least give people more information to subvert or challenge the common sense closed individualist view, or give them information about why it might be wrong, I think it would just have a ton of implications for how people end up viewing human species divergence, or immortality, or uploads, or merging, or the Descendants scenario.

In Max’s book, Life 3.0, he describes a bunch of different scenarios for how you want the world to be as the impact of AI grows, if we’re lucky enough to reach superintelligent AI. These scenarios that he gives are, for example, an Egalitarian Utopia where humans, cyborgs and uploads coexist peacefully thanks to property abolition and guaranteed income. There’s a Libertarian Utopia where humans, cyborgs, uploads, and superintelligences coexist peacefully thanks to property rights. There is a Protector God scenario where an essentially omniscient and omnipotent AI maximizes human happiness by intervening only in ways that preserve our feeling of control of our own destiny, and hides well enough that many humans even doubt the AI’s existence. There’s Enslaved God, which is kind of self-evident. The AI is a slave to our will. And there’s the Descendants scenario, which I described earlier, where AIs replace human beings, but give us a graceful exit, making us view them as our worthy descendants, much as parents feel happy and proud to have a child who’s smarter than them, who learns from them, and then accomplishes what they could only dream of, even if they can’t live to see it.

After the book was released, Max did a survey of which ideal societies people were most excited about. Basically, most people wanted either the Egalitarian Utopia or the Libertarian Utopia. These are very human-centric of course, because I think most people are closed individualists, so okay, they’re going to pick those. And then they wanted the Protector God next, and the fourth most popular was the Enslaved God. The fifth most popular was Descendants.

I’m a very big fan of the Descendants scenario. Maybe it’s because of my empty individualism. I just feel that views on identity are quite uninformed for most people, or most people don’t examine them; closed individualism just seems intuitively true from the beginning because it seems like a very strong sense of self has been selected for by Darwinian evolution. I just think that challenging conventional views on identity will very much shift the kinds of worlds that people are okay with, or the kinds of worlds that people want.

If we had a big, massive public education campaign about the philosophy of identity and then took the same survey later, I think that the numbers would be quite different. That seems to be a necessary part of the education of humanity in the process of beneficial AI and AI alignment. To me, the Descendants scenario just seems best because it’s more clearly in line with this ethic of being impartially devoted to maximizing the well-being of sentience everywhere.

I’m curious to know your reactions to these different scenarios, and how you feel views on identity, as they shift, will inform the kinds of worlds that humanity finds beautiful or meaningful or worthy of pursuit through and with AI.

David Pearce: If today’s hedonic range is -10 to zero to +10, yes, whether building a civilization with a hedonic range of +70 to +100, i.e. with more hedonic contrast, or +90 to +100 with less hedonic contrast, the multiple phase-changes in consciousness involved are completely inconceivable to humans. But in terms of full-spectrum superintelligence, what we don’t know is the nature of their radically alien state-spaces of consciousness, far more different than, let’s say, dreaming consciousness and waking consciousness, that I suspect such intelligence is going to explore. And we just do not have the language, the concepts, to conceptualize what these alien state-spaces of consciousness are like. I suspect billions of years of consciousness-exploration lie ahead. I assume that a central element will be the pleasure-axis, that these states will be generically wonderful, but they will otherwise be completely alien. And so talk of “identity” with primitive Darwinian malware like us is quite fanciful.

Andrés Gómez Emilsson: Consider the following thought experiment, where you have a chimpanzee right next to a person, who is right next to another person, where the third one is currently on a high dose of DMT combined with ketamine and salvia. If you consider those three entities, I think very likely the experience of the chimpanzee and the experience of the sober person are actually very much alike, compared to the person who is on DMT, ketamine, and salvia, who is in a completely different alien state-space of consciousness. And in some sense, biologically, that person is unrelatable from the point of view of qualia and the sense of self, and time, and space, and all of those things.

Personally, I think having intimations of alien state-spaces of consciousness is actually good, quite apart from changes like a feeling that you’ve become one with the universe. Merely having experience with really different states of consciousness makes it easier for you to identify with consciousness as a whole: you realize, okay, my DMT self, so to speak, cannot exist naturally, and it’s just so much different from who I am normally, and even more different than perhaps being a chimpanzee, that you could imagine caring as well about alien state-spaces of consciousness that are completely nonhuman, and I think that can be pretty helpful.

The other reason why I give a lot of credence to open individualism being a winning strategy, even just from a purely political and sociological point of view, is that open individualists are not afraid of changing their own state of consciousness, because they realize that it will be them either way. Whereas closed individualists can actually be pretty scared of, for example, taking DMT or something like that. They tend to have at least the suspicion that, oh my gosh, is the person who is going to be on DMT me? Am I going to be there? Or maybe I’m just being possessed by a different entity with completely different values and consciousness.

With open individualism, no matter what type of consciousness your brain generates, it’s going to be you. It massively amplifies the degrees of freedom for coordination. Plus, you’re not afraid of tuning your consciousness for particular new computational uses. Again, this could be extremely powerful as a cooperation and coordination tool.

To summarize, I think a plausible and very nice future scenario is going to be the mixture of, first, open individualism; second, generically enhanced hedonic tone, so that everything is amazing; and third, an expanded range of possible experiences. That is, we will have the tools to experience pretty much arbitrary state-spaces of consciousness and consider them our own.

As for the Descendants scenario, I think it’s much easier to imagine thinking of the new entities as your offspring if you can at least know what they feel like. You can take a drug or something and know, “okay, this is what it’s like to be a post-human android. I like it. This is wonderful. It’s better than being a human.” That would make it possible.

Lucas Perry: Wonderful. This last question is just about the role of identity in the AI itself, or the superintelligence itself, as it experiences the world, the ethical implications of those identity models, et cetera. There is the question of identity now, and, if we get aligned superintelligence and post-human superintelligence, and we have Jupiter brains or Dyson spheres or whatever, there’s the question of identity evolving in that system. We are very much creating Life 3.0, and there is a substantive question of what kind of identity views it will take, and what its phenomenal experience of self, or lack thereof, will be. This all is relevant and important because if we’re concerned with maximizing conscious well-being, then these are flavors of consciousness which would require a sufficiently rigorous science of consciousness to understand their valence properties.

Andrés Gómez Emilsson: I mean, I think it’s a really, really good thing to think about. The overall frame I tend to use to analyze these kinds of questions comes from an article I wrote, which you can find on Qualia Computing, called “Consciousness Versus Replicators.” I think that is a pretty good overarching ethical framework, where I describe how different kinds of ethics can give rise to different worldviews, but also how they depend on your philosophical sophistication.

At the very beginning, you have ethics such as the battle between good and evil, but then you start introspecting. You’re like, “okay, what is evil exactly,” and you realize that nobody sets out to do evil from the very beginning. Usually, they actually have motivations that make sense within their own experience. Then you shift towards this other theory that’s called the balance between good and evil, super common in Eastern religions. Also, people who take a lot of psychedelics or meditate a lot tend to arrive at that view, as in, “oh, don’t be too concerned about suffering or the universe. It’s all a huge yin and yang. The evil part makes the good part better,” or weird things like that.

Then you have a little bit more developed view, what I call gradients of wisdom. I would say Sam Harris, and definitely a lot of people in our community, think that way: they come to the realization that there are societies that don’t help human flourishing, and there are ideologies that do, and it’s really important to be discerning. We can’t just say, “Hey, everything is equally good.”

But finally, I would say the fourth level would be consciousness versus replicators, which involves, one, taking open individualism seriously; and two, realizing that anything that matters, matters because it influences experiences. If you can have that as your underlying ethical principle, and you guard against the danger of replicators hijacking our motivational architecture in order to pursue their own replication, independent of the well-being of sentience, I think you’re in a pretty good space to actually do a lot of good. I would say perhaps that is the sort of ethics or morality we should think about how to instantiate in an artificial intelligence.

In the extreme, you have what I call a pure replicator, and a pure replicator essentially is a system or an entity that uses all of its resources exclusively to make copies of itself, independently of whether that causes good or bad experiences elsewhere. It just doesn’t care. I would argue that humans are not pure replicators. In fact, we do care about consciousness, at the very least our own consciousness. And evolution is recruiting the fact that we care about consciousness in order to, as a side effect, increase the inclusive fitness of our genes.

But in these discussions we’re having right now, with the possibility of a post-human ethic, the genie is getting out of the bottle, in the sense that consciousness is kind of taking its own values and trying to transcend the selfish genetic process that gave rise to it.

Lucas Perry: Ooh, I like that. That’s good. Anything to add, David?

David Pearce: No. Simply, I hope we have a Buddhist AI.

Lucas Perry: I agree. All right, so I’ve really enjoyed this conversation. I feel more confused now than when I came in, which is very good. Yeah, thank you both so much for coming on.

End of recorded material

FLI Podcast: On Consciousness, Morality, Effective Altruism & Myth with Yuval Noah Harari & Max Tegmark

Neither Yuval Noah Harari nor Max Tegmark need much in the way of introduction. Both are avant-garde thinkers at the forefront of 21st century discourse around science, technology, society and humanity’s future. This conversation represents a rare opportunity for two intellectual leaders to apply their combined expertise — in physics, artificial intelligence, history, philosophy and anthropology — to some of the most profound issues of our time. Max and Yuval bring their own macroscopic perspectives to this discussion of both cosmological and human history, exploring questions of consciousness, ethics, effective altruism, artificial intelligence, human extinction, emerging technologies and the role of myths and stories in fostering societal collaboration and meaning. We hope that you’ll join the Future of Life Institute Podcast for our final conversation of 2019, as we look toward the future and the possibilities it holds for all of us.

Topics discussed include:

  • Max and Yuval’s views and intuitions about consciousness
  • How they ground and think about morality
  • Effective altruism and its cause areas of global health/poverty, animal suffering, and existential risk
  • The function of myths and stories in human society
  • How emerging science, technology, and global paradigms challenge the foundations of many of our stories
  • Technological risks of the 21st century

Timestamps:

0:00 Intro

3:14 Grounding morality and the need for a science of consciousness

11:45 The effective altruism community and its main cause areas

13:05 Global health

14:44 Animal suffering and factory farming

17:38 Existential risk and the ethics of the long-term future

23:07 Nuclear war as a neglected global risk

24:45 On the risks of near-term AI and of artificial general intelligence and superintelligence

28:37 On creating new stories for the challenges of the 21st century

32:33 The risks of big data and AI enabled human hacking and monitoring

47:40 What does it mean to be human and what should we want to want?

52:29 On positive global visions for the future

59:29 Goodbyes and appreciations

01:00:20 Outro and supporting the Future of Life Institute Podcast

 

This podcast is possible because of the support of listeners like you. If you found this conversation to be meaningful or valuable consider supporting it directly by donating at futureoflife.org/donate. Contributions like yours make these conversations possible.

All of our podcasts are also now on Spotify and iHeartRadio! Or find us on SoundCloud, iTunes, Google Play and Stitcher.

You can listen to the podcast above or read the transcript below. 

Lucas Perry: Welcome to the Future of Life Institute Podcast. I’m Lucas Perry. Today, I’m excited to be bringing you a conversation between professor, philosopher, and historian Yuval Noah Harari and MIT physicist and AI researcher, as well as Future of Life Institute president, Max Tegmark. Yuval is the author of the popular science bestsellers Sapiens: A Brief History of Humankind, Homo Deus: A Brief History of Tomorrow, and 21 Lessons for the 21st Century. Max is the author of Our Mathematical Universe and Life 3.0: Being Human in the Age of Artificial Intelligence. 

This episode covers a variety of topics related to the interests and work of both Max and Yuval. It requires some background knowledge for everything to make sense, and so I’ll try to provide some necessary information here in the intro for listeners unfamiliar with Max’s work in particular. If you already feel well acquainted with Max’s work, feel free to skip ahead a minute or use the timestamps in the description for the podcast. 

Topics discussed in this episode include: morality, consciousness, the effective altruism community, animal suffering, existential risk, the function of myths and stories in our world, and the benefits and risks of emerging technology. For those new to the podcast or effective altruism, effective altruism or EA for short is a philosophical and social movement that uses evidence and reasoning to determine the most effective ways of benefiting and improving the lives of others. And existential risk is any risk that has the potential to eliminate all of humanity or, at the very least, to kill large swaths of the global population and leave the survivors unable to rebuild society to current living standards. Advanced emerging technologies are the most likely source of existential risk in the 21st century, for example through unfortunate uses of synthetic biology, nuclear weapons, and powerful future artificial intelligence misaligned with human values and objectives.

The Future of Life Institute is a non-profit and this podcast is funded and supported by listeners like you. So if you find what we do on this podcast to be important and beneficial, please consider supporting the podcast by donating at futureoflife.org/donate

These contributions make it possible for us to bring you conversations like these and to develop the podcast further. You can also follow us on your preferred listening platform by searching for us directly or following the links on the page for this podcast found in the description. 

And with that, here is our conversation between Max Tegmark and Yuval Noah Harari.

Max Tegmark: Maybe to start at a place where I think you and I both agree, even though it’s controversial: I get the sense from reading your books that you feel that morality has to be grounded on experience, subjective experience. It’s just what I like to call consciousness. I love this argument you’ve given, for example, for people who think consciousness is just bullshit and irrelevant: you challenge them to tell you what’s wrong with torture if it’s just a bunch of electrons and quarks moving around this way rather than that way.

Yuval Noah Harari: Yeah. I think that there is no morality without consciousness and without subjective experiences. At least for me, this is very, very obvious. One of my concerns, again, if I think about the potential rise of AI, is that AI will be superintelligent but completely non-conscious, which is something that we never had to deal with before. There have been so many philosophical and theological discussions of what happens when there is a greater intelligence in the world. We’ve been discussing this for thousands of years with God of course as the object of discussion, but the assumption always was that this greater intelligence would be A) conscious in some sense, and B) good, infinitely good.

And therefore I think that the question we are facing today is completely different, and to a large extent I suspect that we are really facing philosophical bankruptcy: what we’ve done for thousands of years didn’t really prepare us for the kind of challenge that we have now.

Max Tegmark: I certainly agree that we have a very urgent challenge there. I think there is an additional risk which comes from the fact that, and I’m embarrassed as a scientist to admit this, we actually don’t know for sure which kinds of information processing are conscious and which are not. For many, many years, I’ve been told for example that it’s okay to put lobsters in hot water to boil them alive before we eat them because they don’t feel any suffering. And then I guess some guy asked the lobster, does this hurt? And it didn’t say anything, and it was a self-serving argument. But then there was a recent study that showed that lobsters actually do feel pain, and they have now banned lobster boiling in Switzerland.

I’m very nervous whenever we humans make these very self-serving arguments saying, don’t worry about the slaves, it’s okay, they don’t feel, they don’t have a soul, they won’t suffer; or women don’t have a soul; or animals can’t suffer. I’m very nervous that we’re going to make the same mistake with machines just because it’s so convenient. Whereas I feel the honest truth is, yeah, maybe future superintelligent machines won’t have any experience, but maybe they will. And I think we really have a moral imperative there to do the science to answer that question, because otherwise we might be creating enormous amounts of suffering that we don’t even know exists.

Yuval Noah Harari: For this reason and for several other reasons, I think we need to invest as much time and energy in researching consciousness as we do in researching and developing intelligence. If we develop sophisticated artificial intelligence before we really understand consciousness, there are a lot of really big ethical problems that we just don’t know how to solve. One of them is the potential existence of some kind of consciousness in these AI systems, but there are many, many others.

Max Tegmark: I’m so glad to hear you say this actually because I think we really need to distinguish between artificial intelligence and artificial consciousness. Some people just take for granted that they’re the same thing.

Yuval Noah Harari: Yeah, I’m really amazed by it. I’ve been having quite a lot of discussions about these issues in the last two or three years, and I’m repeatedly amazed that a lot of brilliant people just don’t understand the difference between intelligence and consciousness. It comes up in discussions about animals, but it also comes up in discussions about computers and about AI. To some extent the confusion is understandable, because in humans and other mammals and other animals, consciousness and intelligence really go together, but we can’t assume that this is a law of nature and that it’s always like that. In a very, very simple way, I would say that intelligence is the ability to solve problems. Consciousness is the ability to feel things like pain and pleasure and love and hate.

Now in humans and chimpanzees and dogs and maybe even lobsters, we solve problems by having feelings. A lot of the problems we solve, who to mate with and where to invest our money and who to vote for in the elections, we rely on our feelings to make these decisions, but computers make decisions in a completely different way. At least today, very few people would argue that computers are conscious, and still they can solve certain types of problems much, much better than we can.

They have high intelligence in a particular field without having any consciousness and maybe they will eventually reach superintelligence without ever developing consciousness. And we don’t know enough about these ideas of consciousness and superintelligence, but it’s at least feasible that you can solve all problems better than human beings and still have zero consciousness. You just do it in a different way. Just like airplanes fly much faster than birds without ever developing feathers.

Max Tegmark: Right. That’s definitely one of the reasons why people are so confused. There are two other reasons I’ve noticed, even among very smart people, why they are utterly confused on this. One is that there are so many different definitions of consciousness. Some people define consciousness in a way that’s almost equivalent to intelligence. But if you define it the way you did, as the ability to feel things, as simply having subjective experience, I think a lot of people get confused because they have always thought of subjective experience, and intelligence for that matter, as something mysterious that can only exist in biological organisms like us. Whereas what I think we’re really learning from the whole last century of progress in science is that no, intelligence and consciousness are all about information processing.

People fall prey to this carbon chauvinism idea that it’s only carbon or meat that can have these traits. Whereas in fact it really doesn’t matter whether the information is processed by a carbon atom in a neuron in the brain or by a silicon atom in a computer.

Yuval Noah Harari: I’m not sure I completely agree. I mean, we still don’t have enough data on that. There doesn’t seem to be any reason that we know of that consciousness would be limited to carbon based life forms, but so far this is the case. So maybe we don’t know something. My hunch is that it could be possible to have non-organic consciousness, but until we have better evidence, there is an open possibility that maybe there is something about organic biochemistry, which is essential and we just don’t understand.

And there is another open question: we are not really sure that consciousness is just about information processing. I mean, at present, this is the dominant view in the life sciences, but we don’t really know, because we don’t understand consciousness. My personal hunch is that non-organic consciousness is possible, but I wouldn’t say that we know that for certain. And the other point is that really, if you think about it in the broadest sense possible, I think that there is an entire potential universe of different conscious states and we know just a tiny, tiny bit of it.

Max Tegmark: Yeah.

Yuval Noah Harari: Again, thinking a little about different life forms, so human beings are just one type of life form and there are millions of other life forms that existed and billions of potential life forms that never existed but might exist in the future. And it’s a bit like that with consciousness that we really know just human consciousness, we don’t understand even the consciousness of other animals and beyond that potentially there is an infinite number of conscious states or traits that never existed and might exist in the future.

Max Tegmark: I agree with all of that. And I think if you can have nonorganic consciousness, artificial consciousness, which would be my guess, although we don’t know it, I think it’s quite clear then that the mind space of possible artificial consciousness is vastly larger than anything that evolution has given us, so we have to have a very open mind.

If we simply take away from this that we should understand which entities, biological and otherwise, are conscious and can experience suffering, pleasure and so on, and we try to base our morality on this idea that we want to create more positive experiences and eliminate suffering, then this leads straight into what I find very much at the core of the so-called effective altruism community, which we at the Future of Life Institute view ourselves as part of, where the idea is that we want to help do what we can to make a future that’s good in that sense: lots of positive experiences, not negative ones, and we want to do it effectively.

We want to put our limited time and money and so on into those efforts which will make the biggest difference. And the EA community has for a number of years been highlighting a top three list of issues that they feel are the ones that are most worth putting effort into in this sense. One of them is global health, which is very, very non-controversial. Another one is animal suffering and reducing it. And the third one is preventing life from going extinct by doing something stupid with technology.

I’m very curious whether you feel that the EA movement has basically picked out the correct three things to focus on or whether you have things you would subtract from that list or add to it. Global health, animal suffering, X-risk.

Yuval Noah Harari: Well, I think that nobody can do everything, so whether you’re an individual or an organization, it’s a good idea to pick a good cause and then focus on it and not spend too much time wondering about all the other things that you might do. I mean, these three causes are certainly some of the most important in the world. I would just say, about the first one, that it’s not easy at all to determine what the goals are. I mean, as long as health means simply fighting illnesses and sicknesses and bringing people up to what is considered a normal level of health, then that’s not very problematic.

But in the coming decades, I think that the healthcare industry will focus more and more not on fixing problems but rather on enhancing abilities, enhancing experiences, enhancing bodies and brains and minds and so forth. And that’s much, much more complicated, both because of the potential issues of inequality and simply because we don’t know what to aim for. One of the reasons that, when you asked me at first about morality, I focused on suffering and not on happiness is that suffering is a much clearer concept than happiness. And that’s why, when you talk about health care, if you think about this image of the line of normal health, like the baseline of what’s a healthy human being, it’s much easier to deal with things falling under this line than with things that are potentially above this line. So I think even this first issue will become extremely complicated in the coming decades.

Max Tegmark: And then for the second issue on animal suffering, you’ve used some pretty strong words before. You’ve said that industrial farming is one of the worst crimes in history, and you’ve called the fate of industrially farmed animals one of the most pressing ethical questions of our time. A lot of people would be quite shocked when they hear you using such strong words about this, since they routinely eat factory farmed meat. How do you explain this to them?

Yuval Noah Harari: This is quite straightforward. I mean, we are talking about billions upon billions of animals. The majority of large animals today in the world are either humans or are domesticated animals, cows and pigs and chickens and so forth. And so we’re talking about a lot of animals and we are talking about a lot of pain and misery. The industrially farmed cow and chicken are probably competing for the title of the most miserable creature that ever existed. They are capable of experiencing a wide range of sensations and emotions and in most of these industrial facilities they are experiencing the worst possible sensations and emotions.

Max Tegmark: In my case, you’re preaching to the choir here. I find this so disgusting that my wife and I just decided to mostly be vegan. I don’t go preach to other people about what they should do, but I just don’t want to be a part of this. It reminds me so much of things you’ve written yourself about how people used to justify having slaves by saying, “It’s the white man’s burden. We’re helping the slaves. It’s good for them.” And in much the same way now, we make these very self-serving arguments for why we should be doing this. What do you personally take away from this? Do you eat meat now, for example?

Yuval Noah Harari: Personally I define myself as vegan-ish. I mean, I’m not strictly vegan. I don’t want to make a kind of religion out of it and start thinking in terms of purity and whatever. I try to limit as far as possible my involvement with industries that harm animals for no good reason, and it’s not just meat and dairy and eggs; it can be other things as well. The chains of causality in the world today are so complicated that you cannot really extricate yourself completely. It’s just impossible. So for me, and also what I tell other people, is just do your best. Again, don’t make it into a kind of religious issue. If somebody comes and tells you, “I’m now thinking about this animal suffering and I’ve decided to have one day a week without meat,” then don’t start blaming this person for eating meat the other six days. Just congratulate them on making one step in the right direction.

Max Tegmark: Yeah, that sounds not just like good morality but also good psychology if you actually want to nudge things in the right direction. And then coming to the third one, existential risk. There, I love how Nick Bostrom asks us to compare two scenarios, one in which some calamity kills 99% of all people and another where it kills 100% of all people, and then he asks how much worse the second one is. The point being, obviously, that if we kill everybody, we might actually forfeit having billions or quadrillions or more future minds experiencing amazing things for billions of years. This is not something I’ve seen you talk as much about in your writing. So I’m very curious how you think about this morally: how do you weigh future experiences that could exist against the ones that we know exist now?

Yuval Noah Harari: I don’t really know. I don’t think that we understand consciousness and experience well enough to even start making such calculations. In general, my suspicion, at least based on our current knowledge, is that it’s simply not a mathematical entity that can be calculated. So we know all these philosophical riddles that people sometimes enjoy debating so much, about whether you have five people of this kind and a hundred people of that kind and who should you save, and so forth and so on. It’s all based on the assumption that experience is a mathematical entity that can be added and subtracted. And my suspicion is that it’s just not like that.

To some extent, yes, we make these kinds of comparisons and calculations all the time, but on a deeper level, I think it’s taking us in the wrong direction. At least at our present level of knowledge, it’s not like eating ice cream is one point of happiness and killing somebody is a million points of misery, so that if by killing somebody we can allow 1,000,001 people to enjoy ice cream, it’s worth it.

I think the problem here is not that we’ve given the wrong point values to the different experiences; it’s that it’s just not a mathematical entity in the first place. And again, I know that in some cases we have to do these kinds of calculations, but I would be extremely careful about it and I would definitely not use it as the basis for building entire moral and philosophical projects.

Max Tegmark: I certainly agree with you that it’s an extremely difficult set of questions you get into if you try to trade off positives against negatives, like in the ice cream versus murder case you mentioned. But I still feel that, all in all, as a species, we tend to be a little bit too sloppy and flippant about the future, maybe partly because we haven’t evolved to think so much about what happens in billions of years anyway. And look at how reckless we’ve been with nuclear weapons, for example. I was recently involved with our organization giving an award to honor Vasily Arkhipov, who quite likely prevented nuclear war between the US and the Soviet Union, and most people hadn’t even heard about that for 40 years. More people have heard of Justin Bieber than of Vasily Arkhipov, even though I would argue that nuclear war would unambiguously have been a really, really bad thing and that we should celebrate people who do courageous acts that prevent it.

In the same spirit, I often feel concerned that there’s so little attention paid to risks that we drive ourselves extinct or cause giant catastrophes, compared to how much attention we pay to the Kardashians or whether we can get 1% less unemployment next year. So I’m curious if you have some sympathy for my angst here or whether you think I’m overreacting.

Yuval Noah Harari: I completely agree. I often put it this way: we are now kind of irresponsible gods. Certainly with regard to the other animals and the ecological system, and with regard to ourselves, we have really divine powers of creation and destruction, but we don’t take our job seriously enough. We tend to be very irresponsible in our thinking and in our behavior. On the other hand, part of the problem is that the number of potential apocalypses has grown exponentially over the last 50 years. And as a scholar and as a communicator, I think it’s part of our job to be extremely careful in the way that we discuss these issues with the general public. And it’s very important to focus the discussion on the more likely scenarios, because if we just go on bombarding people with all kinds of potential scenarios of complete destruction, very soon we just lose people’s attention.

They become extremely pessimistic, feeling that everything is hopeless, so why worry about all that? So I think part of the job of the scientific community and people who deal with these kinds of issues is to really identify the most likely scenarios and focus the discussion on those, even if there are some other scenarios which have a small chance of occurring and completely destroying all of humanity and maybe all of life. We just can’t deal with everything at the same time.

Max Tegmark: I completely agree with that, with one caveat. I think what you said is very much in the spirit of effective altruism: we want to focus on the things that really matter the most and not turn everybody into hypochondriacs, paranoid and worried about everything. The one caveat I would give is that we shouldn’t just look at the probability of each bad thing happening, but at the expected damage it will do, that is, the probability times how bad it is.

Yuval Noah Harari: I agree.

Max Tegmark: Take nuclear war, for example. Maybe the chance of having an accidental nuclear war between the US and Russia is only 1% per year, or 10% per year, or one in a thousand per year. But the nuclear winter caused by that, with soot and smoke in the atmosphere blocking out the sun for years, could easily kill 7 billion people, so most people on Earth, through mass starvation, because it would be about 20 degrees Celsius colder. That means that if it’s a 1% chance per year, which seems small, you’re still killing on average 70 million people per year. That’s the number that sort of matters, I think. That means we should make it a higher priority to reduce that risk.
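A minimal way to write out the back-of-the-envelope expected-value reasoning Max sketches here, using the illustrative 1%-per-year and 7-billion figures from his example:

\[
\text{expected deaths per year} = p \times N = 0.01 \times 7~\text{billion} = 70~\text{million}
\]

The point is that a small annual probability multiplied by an enormous harm still yields a large expected loss, and that expected loss, rather than the raw probability, is the quantity he argues should drive prioritization.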

Yuval Noah Harari: With nuclear war, I would say that we are not concerned enough. I mean, too many people, including politicians, have this weird impression that, well, “Nuclear war? That’s history. That was in the ’60s and ’70s that people worried about it.”

Max Tegmark: Exactly.

Yuval Noah Harari: “It’s not a 21st century issue.” This is ridiculous. I mean, we are now in even greater danger, at least in terms of the technology, than we were during the Cuban missile crisis. But you must remember Stanley Kubrick’s Dr. Strangelove…

Max Tegmark: One of my favorite films of all time.

Yuval Noah Harari: Yeah. And the subtitle of the film is “How I Learned to Stop Worrying and Love the Bomb.”

Max Tegmark: Exactly.

Yuval Noah Harari: And the funny thing is it actually happened. People stopped fearing them. Maybe they don’t love it very much, but compared to the ’50s and ’60s, people just don’t talk about it. Look at the Brexit debate in Britain: Britain is one of the leading nuclear powers in the world and it’s not even mentioned. It’s not part of the discussion anymore. And that’s very problematic, because I think that this is a very serious existential threat. But I’ll take a counterexample, which is in the field of AI. Even though I understand the philosophical importance of discussing the possibility of general AI emerging in the future and then rapidly taking over the world, and, you know, all the paperclip scenarios and so forth,

I think that at the present moment it really distracts people’s attention from the immediate dangers of the AI arms race, which has a far, far higher chance of materializing in the next, say, 10, 20, 30 years. And we need to focus people’s minds on these short term dangers. I know that there is a small chance that general AI will be upon us, say, in the next 30 years, but I think it’s a very, very small chance, whereas the chance that this kind of primitive AI will completely disrupt the economy, the political system and human life in the next 30 years is about 100%. It’s bound to happen.

Max Tegmark: Yeah.

Yuval Noah Harari: And I worry far more about what primitive AI will do to the job market, to the military, to people’s daily lives than about a general AI appearing in the more distant future.

Max Tegmark: Yeah, I have a few reactions to this. We can talk more about artificial general intelligence and superintelligence later if we get time. But there was a recent survey of AI researchers around the world asking what they thought, and I was interested to note that most of them actually guessed that we will get artificial general intelligence within decades. So I wouldn’t say that the chance is small, but I would agree with you that it is certainly not going to happen tomorrow.

But if we eat our vitamins, you and I, and meditate and go to the gym, it’s quite likely we will actually get to experience it. But more importantly, coming back to what you said earlier, I see all of these risks as really being one and the same risk, in the sense that what’s happened is of course that science has kept getting ever more powerful, and science definitely gives us ever more powerful technology. And I love technology. I’m a nerd. I work at a university that has technology in its name, and I’m optimistic we can create an inspiring high tech future for life if we win what I like to call the wisdom race.

That is, the race between the growing power of the technology and the growing wisdom with which we manage it, or, putting it in the words you just used, whether we can learn to take more seriously our job as stewards of this planet. You can look at every science and see exactly the same thing happening. So we physicists are kind of proud that we gave the world cell phones and computers and lasers, but our problem child has been nuclear energy obviously, nuclear weapons in particular. Chemists are proud that they gave the world all these great new materials, and their problem child is climate change. Biologists, in my book, have actually done the best so far: they got together in the ’70s and persuaded leaders to ban biological weapons and, more broadly, to draw a clear red line between acceptable and unacceptable uses of biology.

And that’s why today most people think of biology as really a force for good, something that cures people or helps them live healthier lives. AI, I think, is right now lagging a little bit in time. It’s finally getting to the point where AI researchers are starting to have an impact, and they’re grappling with the same kind of question. They haven’t had big disasters yet, so they’re in the biology camp there, but they’re trying to figure out where to draw the line between acceptable and unacceptable uses, so you don’t get a crazy military AI arms race in lethal autonomous weapons, so you don’t create very destabilizing income inequality, so that AI doesn’t create 1984 on steroids, et cetera.

And I wanted to ask you what sort of new story you feel we need as a society in order to tackle these challenges. I’ve been very, very persuaded by your arguments that stories are central to society, to our ability to collaborate and accomplish things, but you’ve also made a really compelling case, I think, that the most popular recent stories are all getting less powerful or popular: communism, and now there’s a lot of disappointment with liberalism. And it feels like a lot of people are craving a new story that involves technology somehow and that can help us get our act together, and also help us feel meaning and purpose in this world. But I’ve never seen in your books a clear answer to what you feel this new story should be.

Yuval Noah Harari: Because I don’t know. If I knew the new story, I would tell it. I think we are now in a kind of double bind; we have to fight on two different fronts. On the one hand, we are witnessing in the last few years the collapse of the last big modern story, of liberal democracy and liberalism more generally, which has been, I would say, as a story, the best story humans ever came up with, and it did create the best world that humans ever enjoyed. I mean, the world of the late 20th century and early 21st century, with all its problems, is still better for humans (not for cows or chickens, but for humans) than at any previous moment in history.

There are many problems, but anybody who says that this was a bad idea, I would like to hear which year are you thinking about as a better year? Now in 2019, when was it better? In 1919, in 1719, in 1219? I mean, for me, it’s obvious this has been the best story we have come up with.

Max Tegmark: That’s so true. I have to just admit that whenever I read the news for too long, I start getting depressed. But then I always cheer myself up by reading history and reminding myself it was always worse in the past.

Yuval Noah Harari: That never fails. I mean, the last four years have been quite bad, things are deteriorating, but we are still better off than in any previous era. But people are losing faith in this story. We are reaching really a situation of zero story: all the big stories of the 20th century have collapsed or are collapsing, and the vacuum is currently being filled by nostalgic fantasies, nationalistic and religious fantasies, which simply don’t offer any real solutions to the problems of the 21st century. So on the one hand we have the task of supporting or reviving the liberal democratic system, which is so far the only game in town. I keep listening to the critics, and they have a lot of valid criticism, but I’m waiting for the alternative, and the only thing I hear is completely unrealistic nostalgic fantasies about going back to some past golden era that, as a historian, I know was far, far worse. And even if it was not so far worse, you just can’t go back there; you can’t recreate the 19th century or the Middle Ages under the conditions of the 21st century. It’s impossible.

So we have this one struggle, to maintain what we have already achieved. But at the same time, on a much deeper level, my suspicion is that the liberal story, as we know it at least, is really not up to the challenges of the 21st century, because it’s built on foundations that the new science, and especially the new technologies of artificial intelligence and bioengineering, are just destroying: the belief we inherited in the autonomous individual, in free will, in all these basically liberal mythologies. They will become increasingly untenable in contact with new, powerful bioengineering and artificial intelligence.

To put it in a very, very concise way, I think we are entering the era of hacking human beings, not just hacking smartphones and bank accounts, but really hacking homo sapiens, which was impossible before. I mean, AI gives us the necessary computing power, and biology gives us the necessary biological knowledge, and when you combine the two you get the ability to hack human beings. If you continue to try and build society on the philosophical ideas of the 18th century about the individual and free will, in a world where it’s technically feasible to hack millions of people systematically, it’s just not going to work. And we need an updated story. I’ll just finish this thought: our problem is that we need to defend the story from the nostalgic fantasies at the same time that we are replacing it with something else, and that’s just very, very difficult.

When I began writing my books, like five years ago, I thought the real project was to really go down to the foundations of the liberal story, expose the difficulties, and build something new. And then you had all these nostalgic populist eruptions of the last four or five years, and I personally find myself more and more engaged in defending the old-fashioned liberal story instead of replacing it. Intellectually, it’s very frustrating, because I think the really important intellectual work is figuring out the new story, but politically it’s far more urgent. If we allow the emergence of some kind of populist authoritarian regimes, then whatever comes out of it will not be a better story.

Max Tegmark: Yeah, unfortunately I agree with your assessment here. I love to travel. I work in basically a United Nations-like environment at my university, with students from all around the world, and I have this very strong sense that people are feeling increasingly lost around the world today, because the stories that used to give them a sense of purpose and meaning and so on are sort of dissolving in front of their eyes. And of course, we don’t like to feel lost, so we’re likely to jump at whatever branches are held out for us, and they are often just retrograde things: let’s go back to the good old days, and all sorts of other unrealistic things. But I agree with you that the rise in populism we’re seeing now is not the cause. It’s a symptom of people feeling lost.

So I think I was a little bit unfair to ask you to answer, in a few minutes, the toughest question of our time: what should our new story be? But maybe we could break it into pieces a little bit and say what are at least some elements that we would like the new story to have. It should accomplish, of course, multiple things. It has to incorporate technology in a meaningful way, which our past stories did not, and it has to incorporate AI and progress in biotech, for example. And it also has to be a truly global story this time, I think, which isn’t just a story about how America is going to get better off or China is going to get better off, but one about how we’re all going to get better off together.

And we could put up a whole bunch of other requirements. If we start maybe with this part about the global nature of the story: people disagree violently about so many things around the world, but are there any ingredients of the story at all, some principles or ideas, that you think people around the world would already agree to?

Yuval Noah Harari: Again, I don’t really know. I mean, I don’t know what the new story would look like. Historically, these kinds of really grand narratives are not created by two or three people having a discussion and thinking, okay, what new story should we tell? It’s far deeper and more powerful forces that come together to create these new stories. Even trying to say, okay, we don’t have the full view, but let’s try to put a few ingredients in place: the whole thing about a story is that the whole comes before the parts. The narrative is far more important than the individual facts that build it up.

So I’m not sure that we can start creating the story by just, okay, let’s put the first few sentences and who knows how it will continue. You wrote books. I write books, we know that the first few sentences are the last sentences that you usually write.

Max Tegmark: That’s right.

Yuval Noah Harari: Only when you know what the whole book is going to look like do you go back to the beginning and write the first few sentences.

Max Tegmark: Yeah. And sometimes the very last thing you write is the new title.

Yuval Noah Harari: So I agree that whatever the new story is going to be, it’s going to be global. The world is now too small and too interconnected to have just a story for one part of the world. It won’t work. And it will also have to take very seriously both the most updated science and the most updated technology, something that liberal democracy as we know it does not do; it’s basically still in the 18th century, taking an 18th-century story and simply following it to its logical conclusions. For me, maybe the most amazing thing about liberal democracy is that it really completely disregarded all the discoveries of the life sciences over the last two centuries.

Max Tegmark: And of the technical sciences!

Yuval Noah Harari: I mean, as if Darwin never existed and we know nothing about evolution. You could basically meet these folks from the middle of the 18th century, whether it’s Rousseau, Jefferson, and all these guys, and they would be surprised by some of the conclusions we have drawn from the basis they provided us. But fundamentally nothing has changed. Darwin didn’t really change anything. Computers didn’t really change anything. And I think the next story won’t have that luxury of being able to ignore the discoveries of science and technology.

The number one thing it will have to take into account is how humans live in a world where there is somebody out there who knows you better than you know yourself, but that somebody isn’t God; that somebody is a technological system, which might not be a good system at all. That’s a question we never had to face before. We could always comfort ourselves with the idea that we are kind of a black box to the rest of humanity. Nobody can really understand me better than I understand myself. The king, the emperor, the church, they don’t really know what’s happening within me. Maybe God knows. So we had a lot of discussions about what to do with the existence of a God who knows us better than we know ourselves, but we didn’t really have to deal with a non-divine system that can hack us.

And this system is emerging. I think it will be in place within our lifetime, in contrast to general artificial intelligence, which I’m skeptical I’ll see in my lifetime. I’m convinced we will see, if we live long enough, a system that knows us better than we know ourselves, and the basic premises of democracy, of free market capitalism, even of religion, just don’t work in such a world. How does democracy function in a world where somebody understands the voter better than the voter understands herself or himself? And the same with the free market: if the customer is not right, if the algorithm is right, then we need a completely different economic system. That’s the big question that I think we should be focusing on. I don’t have the answer, but whatever story will be relevant to the 21st century will have to answer this question.

Max Tegmark: I certainly agree with you that democracy has totally failed to adapt to the developments in the life sciences, and I would add to that the developments in the natural sciences too. I watched all of the debates between Trump and Clinton in the last election here in the US, and I didn’t hear artificial intelligence get mentioned even a single time, not even when they talked about jobs. And the voting system we have, with an electoral college system here, means it doesn’t even matter how people vote except in a few swing states; there’s so little influence from the voter on what actually happens, even though we now have blockchain and could easily implement technical solutions where people would be able to have much more influence. It just reflects that we basically declared victory on our democratic system hundreds of years ago and haven’t updated it.

And I’m very interested in how we can dramatically revamp it, if we believe in some form of democracy, so that we as individuals actually can have more influence on how our society is run, and how we can have good reason to actually trust that the system, if it is able to hack us, is actually working in our best interest. There’s a key tenet in religions that you’re supposed to be able to trust that God has your best interest in mind, and I think many people in the world today do not trust that their political leaders actually have their best interest in mind.

Yuval Noah Harari: Certainly, I mean, that’s the issue: you give really divine powers to far-from-divine systems. But we shouldn’t be too pessimistic. I mean, the technology is not inherently evil either. And what history teaches us about technology is that technology is also never deterministic. You can use the same technologies to create very different kinds of societies. We saw that in the 20th century, when the same technologies were used to build communist dictatorships and liberal democracies; there was no real technological difference between the USSR and the USA. It was just people making different decisions about what to do with the same technology.

I don’t think that the new technology is inherently anti-democratic or inherently anti-liberal. It really is about the choices that people make, even in what kind of technological tools to develop. If I think about, again, AI and surveillance: at present we see all over the world that corporations and governments are developing AI tools to monitor individuals, but technically we can do exactly the opposite. We can create tools that monitor and survey governments and corporations in the service of individuals, for instance to fight corruption in the government. As an individual, it’s very difficult for me to, say, monitor nepotism, politicians appointing all kinds of family members to lucrative positions in the government or in the civil service, but it should be very easy to build an AI tool that goes over the immense amount of information involved. And in the end you just get a simple application on your smartphone: you enter the name of a politician and within two seconds you see who he or she appointed from their family and friends, and to what positions. It should be very easy to do. I don’t see the Chinese government creating such an application anytime soon, but people can create it.

Or if you think about the fake news epidemic, basically what’s happening is that corporations and governments are hacking us in their service, but the technology can work the other way around. We can develop an antivirus for the mind, the same way we developed antivirus for the computer. We need to develop an antivirus for the mind, an AI system that serves me and not a corporation or a government, and it gets to know my weaknesses in order to protect me against manipulation.

At present, what’s happening is that the hackers are hacking me. They get to know my weaknesses, and that’s how they are able to manipulate me, for instance with fake news. If they discover that I already have a bias against immigrants, they show me one fake news story, maybe about a group of immigrants raping local women, and I easily believe that because I already have this bias. My neighbor may have an opposite bias. She may think that anybody who opposes immigration is a fascist, and the same hackers will find that out and will show her a fake news story about, I don’t know, right-wing extremists murdering immigrants, and she will believe that.

And then if I meet my neighbor, there is no way we can have a conversation about immigration. Now, we can and should develop an AI system that serves me and my neighbor and alerts us: look, somebody is trying to hack you, somebody is trying to manipulate you. And if we learn to trust that this system serves us, and doesn’t serve any corporation or government, it’s an important tool in protecting our minds from being manipulated. Another tool in the same field: we are now basically feeding enormous amounts of mental junk food to our minds.

We spend hours every day basically feeding our hatred, our fear, our anger, and that’s a terrible and stupid thing to do. The thing is that people discovered that the easiest way to grab our attention is by pressing the hate button in the mind or the fear button in the mind, and we are very vulnerable to that.

Now, just imagine that somebody develops a tool that shows you what’s happening to your brain or to your mind as you’re watching these YouTube clips. Maybe it doesn’t block anything; it’s not Big Brother that blocks all these things. It’s just like when you buy a product and it shows you how many calories are in the product and how much saturated fat and how much sugar there is in the product, so at least in some cases you learn to make better decisions. Just imagine that you have this small window in your computer which tells you what’s happening to your brain as you’re watching this video, and what’s happening to your levels of hatred or fear or anger, and then you make your own decision. At least you are more aware of what kind of food you’re giving to your mind.

Max Tegmark: Yeah. This is something I am also very interested in seeing more of: AI systems that empower the individual in all the ways that you mentioned. We are very interested at the Future of Life Institute in supporting this kind of thing on the nerdy technical side, and I think this also drives home the very important fact that technology is not good or evil. Technology is an amoral tool that can be used both for good things and for bad things. That’s exactly why I feel it’s so important that we develop the wisdom to use it for good things rather than bad things. So in that sense, AI is no different than fire, which can be used for good things and for bad things, but we as a society have developed a lot of wisdom now in fire management. We educate our kids about it. We have fire extinguishers and fire trucks. And with artificial intelligence and other powerful tech, I feel we need to do better at similarly developing the wisdom that can steer the technology towards better uses.

Now we’re reaching the end of the hour here. I’d like to just finish with two more questions. One of them is about what we ultimately want it to mean to be human as we get ever more tech. You put it so beautifully, I think it was in Sapiens, that tech progress is gradually taking us beyond asking what we want to asking instead what we want to want, and I guess even more broadly how we want to brand ourselves, how we want to think about ourselves as humans in the high tech future.

I’m quite curious. First of all, you personally, if you think about yourself in 30 years, 40 years, what do you want to want and what sort of society would you like to live in say 2060 if you could have it your way?

Yuval Noah Harari: It’s a profound question. It’s a difficult question. My initial answer is that I would really like not just to know the truth about myself, but to want to know the truth about myself. Usually the main obstacle in knowing the truth about yourself is that you don’t want to know it. It’s always accessible to you. I mean, we’ve been told for thousands of years by all the big names in philosophy and religion, and almost all say the same thing: get to know yourself better. It’s maybe the most important thing in life. We haven’t really progressed much in the last thousands of years, and the reason is that yes, we keep getting this advice, but we don’t really want to do it.

Working on our motivation in this field I think would be very good for us. It will also protect us from all the naive utopias which tend to draw far more of our attention. I mean, especially as technology will give us all, at least some of us more and more power, the temptations of naive utopias are going to be more and more irresistible and I think the really most powerful check on these naive utopias is really getting to know yourself better.

Max Tegmark: Would you like what it means to be Yuval in 2060 to be more on the hedonistic side, where you have all these blissful experiences and serene meditation and so on, or would you like there to be a lot of challenges in there that give you a sense of meaning or purpose? Would you like to be somehow upgraded with technology?

Yuval Noah Harari: None of the above, at least if I think deeply enough about these issues. Yes, I would like to be upgraded, but only in the right way, and I’m not sure what the right way is. I’m not a great believer in blissful experiences, in meditation or otherwise; they tend to be traps. This is what we’ve been looking for all our lives, and for millions of years all the animals have just constantly looked for blissful experiences, and after a couple of million years of evolution, it doesn’t seem that it brings us anywhere. And especially in meditation you learn that these kinds of blissful experiences can be the most deceptive, because you fall under the impression that this is the goal that you should be aiming at.

This is a really good meditation, this is a really deep meditation, simply because you’re very pleased with yourself, and then you spend countless hours later on trying to get back there or regretting that you are not there, and in the end it’s just another experience. What we experience right now, when we are talking on the phone to each other and I feel something in my stomach and you feel something in your head, this is as special and amazing as the most blissful experience of meditation. The only difference is that we’ve gotten used to it, so we are not amazed by it. But right now we are experiencing the most amazing thing in the universe and we just take it for granted, partly because we are distracted by this notion that out there, there is something really, really special that we should be experiencing. So I’m a bit suspicious of blissful experiences.

Again, I would just basically repeat that to really understand yourself also means to really understand the nature of these experiences, and if you really understand that, then so many of these big questions will be answered, including the question that we dealt with in the beginning, of how to evaluate different experiences and what kind of experiences we should be creating for humans or for artificial consciousness. For that you need to deeply understand the nature of experience. Otherwise, there are so many naive utopias that can tempt you. So I would focus on that.

When I say that I want to know the truth about myself, it also means really understanding the nature of these experiences.

Max Tegmark: On to my very last question, coming back to this story and ending on a positive, inspiring note. I’ve been thinking back about when new stories led to very positive change, and I started thinking about a particular Swedish story. The year was 1945, and people were looking at each other all over Europe saying, “We screwed up again.” How about, instead of using all this technology to build ever more powerful weapons, as people were saying then, we instead use it to create a society that benefits everybody, where we can have free health care, free university for everybody, free retirement, and build a real welfare state? And I’m sure there were a lot of curmudgeons around who said, “Aw, you know, that’s just hopeless naive dreamery; go smoke some weed and hug a tree, because it’s never going to work.” Right?

But this story, this optimistic vision was sufficiently concrete and sufficiently both bold and realistic seeming that it actually caught on. We did this in Sweden and it actually conquered the world. Not like when the Vikings tried and failed to do it with swords, but this idea conquered the world. So now so many rich countries have copied this idea. I keep wondering if there is another new vision or story like this, some sort of welfare 3.0 which incorporates all of the exciting new technology that has happened since ’45 on the biotech side, on the AI side, et cetera, to envision a society which is truly bold and sufficiently appealing to people around the world that people could rally around this.

I feel that a shared positive vision like this is something that, more than anything else, can really help foster collaboration around the world. And I’m curious what you would say: what do you think of as a bold, positive vision for the planet now, moving away from what you spoke about earlier regarding yourself personally, getting to know yourself and so on?

Yuval Noah Harari: I think we can aim towards what you define as welfare 3.0, which is again based on a better understanding of humanity. The welfare state, which many countries have built over the last decades, has been an amazing human achievement, and it achieved many concrete results in fields where we knew what to aim for, like health care: okay, let’s vaccinate all the children in the country, and let’s make sure everybody has enough to eat. We succeeded in doing that, and a kind of welfare 3.0 program would try to expand that to other fields in which our achievements are far more moderate, simply because we don’t know what to aim for. We don’t know what we need to do.

If you think about mental health, it’s much more difficult than providing food to people, because we have a very poor understanding of the human mind and of what mental health is. Even if you think about food, one of the scandals of science is that we still don’t know what to eat. We basically solved the problem of enough food; now we actually have the opposite problem, of people eating too much and not too little. But beyond the question of quantity, it’s, I think, one of the biggest scandals of science that after centuries we still don’t know what we should eat, mainly because so many of these miracle diets are one-size-fits-all, as if everybody should eat the same thing, whereas obviously it should be tailored to individuals.

So if you harness the power of AI and big data and machine learning and biotechnology, you could create the best dietary system in the world, one that tells people individually what would be good for them to eat. And this would have enormous side benefits in reducing medical problems, in reducing waste of food and resources, in helping with the climate crisis, and so forth. So this is just one example.

Max Tegmark: Yeah. Just on that example, I would argue that part of the problem, beyond the fact that we just don’t know enough, is that there are actually a lot of lobbyists telling people what to eat, knowing full well that it’s bad for them, just because that way they’ll make more of a profit. Which gets back to your question of hacking: how we can prevent ourselves from getting hacked by powerful forces that don’t have our best interest in mind. But the things you mentioned seem like a little bit of a first-world perspective, which is easy to get when we live in Israel or Sweden, but of course there are many people on the planet who still live in pretty miserable situations, where we actually can quite easily articulate how to make things at least a bit better.

But then also in our societies, I mean, you touched on mental health: there’s a significant rise in depression in the United States, and life expectancy in the US has gone down three years in a row, which does not suggest that people are getting happier here. I’m wondering if, in your positive vision of the future that we can hopefully end on here, you’d want to throw in some ingredients about the sort of society where we don’t just have the lowest rung of the Maslow pyramid taken care of, food and shelter and stuff, but also feel meaning and purpose and meaningful connections with our fellow lifeforms.

Yuval Noah Harari: I think it’s not just a first-world issue. Again, even if you think about food, even in developing countries, more people today die from diabetes and diseases related to overeating or to being overweight than from starvation, and mental health issues are certainly not just a problem for the first world. People are suffering from them in all countries. Part of the issue is that mental health care is far, far more expensive, certainly if you think in terms of going to therapy once or twice a week, than just giving vaccinations or antibiotics. So it’s much more difficult to create a robust mental health system in poor countries, but we should aim there. It’s certainly not just for the first world. And if we really understand humans better, we can provide much better health care, both physical health and mental health, for everybody on the planet, not just for Americans or Israelis or Swedes.

Max Tegmark: In terms of physical health, it’s usually a lot cheaper and simpler not to treat diseases but instead to prevent them from happening in the first place, by reducing smoking, reducing people eating extremely unhealthy foods, et cetera. And in the same way with mental health, presumably a key driver of a lot of the problems we have is that we have put ourselves in a human-made environment which is incredibly different from the environment that we evolved to flourish in. And I’m wondering, rather than just trying to develop new pills to help us live in this environment, which is often optimized for the ability to produce stuff rather than for human happiness, whether you think that deliberately changing our environment to be more conducive to human happiness might improve our happiness a lot without having to treat mental health disorders.

Yuval Noah Harari: It will demand enormous amounts of resources and energy. But if you are looking for a big project for the 21st century, then yeah, that’s definitely a good project to undertake.

Max Tegmark: Okay. That’s probably a good challenge from you on which to end this conversation. I’m extremely grateful for having had this opportunity to talk with you about these things. These are ideas I will continue thinking about with great enthusiasm for a long time to come, and I very much hope we can stay in touch and actually meet in person before too long.

Yuval Noah Harari: Yeah. Thank you for hosting me.

Max Tegmark: I really can’t think of anyone on the planet who thinks more profoundly about the big picture of the human condition here than you and it’s such an honor.

Yuval Noah Harari: Thank you. It was a pleasure for me too. There aren’t a lot of opportunities to really go deep into these issues. I mean, usually you get pulled away to questions about the 2020 presidential elections and things like that, which are important, but we still have to give some time to the big picture as well.

Max Tegmark: Yeah. Wonderful. So once again, todah, thank you so much.

Lucas Perry: Thanks so much for tuning in and being a part of our final episode of 2019. Many well and warm wishes for a happy and healthy new year from myself and the rest of the Future of Life Institute team. This podcast is possible because of the support of listeners like you. If you found this conversation to be meaningful or valuable consider supporting it directly by donating at futureoflife.org/donate. Contributions like yours make these conversations possible.

FLI Podcast: Existential Hope in 2020 and Beyond with the FLI Team

As 2019 is coming to an end and the opportunities of 2020 begin to emerge, it’s a great time to reflect on the past year and our reasons for hope in the year to come. We spend much of our time on this podcast discussing risks that will possibly lead to the extinction or the permanent and drastic curtailing of the potential of Earth-originating intelligent life. While this is important and useful, much has been done at FLI and in the broader world to address these issues in service of the common good. It can be skillful to reflect on this progress to see how far we’ve come, to develop hope for the future, and to map out our path ahead. This podcast is a special end of the year episode focused on meeting and introducing the FLI team, discussing what we’ve accomplished and are working on, and sharing our feelings and reasons for existential hope going into 2020 and beyond.

Topics discussed include:

  • Introductions to the FLI team and our work
  • Motivations for our projects and existential risk mitigation efforts
  • The goals and outcomes of our work
  • Our favorite projects at FLI in 2019
  • Optimistic directions for projects in 2020
  • Reasons for existential hope going into 2020 and beyond

Timestamps:

0:00 Intro

1:30 Meeting the Future of Life Institute team

18:30 Motivations for our projects and work at FLI

30:04 What we strive to result from our work at FLI

44:44 Favorite accomplishments of FLI in 2019

01:06:20 Project directions we are most excited about for 2020

01:19:43 Reasons for existential hope in 2020 and beyond

01:38:30 Outro

 

You can listen to the podcast above, or read the full transcript below. All of our podcasts are also now on Spotify and iHeartRadio! Or find us on SoundCloud, iTunes, Google Play and Stitcher.

Lucas Perry: Welcome to the Future of Life Institute Podcast. I’m Lucas Perry. Today’s episode is a special end of the year episode structured as an interview with members of the FLI core team. The purpose of this episode is to introduce the members of our team and their roles, explore the projects and work we’ve been up to at FLI throughout the year, and discuss future project directions we are excited about for 2020. Some topics we explore are the motivations behind our work and projects, what we are hoping will result from them, favorite accomplishments at FLI in 2019, and general trends and reasons we see for existential hope going into 2020 and beyond.

If you find this podcast interesting and valuable, you can follow us on your preferred listening platform like iTunes, SoundCloud, Google Play, Stitcher, and Spotify.

If you’re curious to learn more about the Future of Life Institute, our team, our projects, and our feelings about the state and ongoing efforts related to existential risk mitigation, then I feel you’ll find this podcast valuable. So, to get things started, we’re going to have the team introduce ourselves and our roles at the Future of Life Institute.

Jared Brown: My name is Jared Brown, and I’m the Senior Advisor for Government Affairs at the Future of Life Institute. I help inform and execute FLI’s strategic advocacy work on governmental policy. It sounds a little bit behind the scenes because it is, but I primarily work in the U.S. and in global forums like the United Nations.

Kirsten Gronlund: My name is Kirsten and I am the Editorial Director for The Future of Life Institute. Basically, I run the website. I also create new content and manage the content that’s being created to help communicate the issues that FLI works on. I have been helping to produce a lot of our podcasts. I’ve been working on getting some new long form articles written; we just came out with one about CRISPR and gene drives. Right now I’m actually working on putting together a book list for recommended reading for things related to effective altruism and AI and existential risk. I also do social media, and write the newsletter, and a lot of things. I would say that my job is to figure out what is most important to communicate about what FLI does, and then to figure out how it’s best to communicate those things to our audience. Experimenting with different forms of content, experimenting with different messaging. Communication, basically, and writing and editing.

Meia Chita-Tegmark: I am Meia Chita-Tegmark. I am one of the co-founders of the Future of Life Institute. I am also the treasurer of the Institute, and recently I’ve been focusing many of my efforts on the Future of Life website and our outreach projects. For my day job, I am a postdoc in the human-robot interaction lab at Tufts University. My training is in social psychology, so my research actually focuses on the human end of the human-robot interaction. I mostly study uses of assistive robots in healthcare and I’m also very interested in ethical implications of using, or sometimes not using, these technologies. Now, with the Future of Life Institute, as a co-founder, I am obviously involved in a lot of the decision-making regarding the different projects that we are pursuing, but my main focus right now is the FLI website and our outreach efforts.

Tucker Davey: I’m Tucker Davey. I’ve been a member of the FLI core team for a few years. And for the past few months, I’ve been pivoting towards focusing on projects related to FLI’s AI communication strategy, various projects, especially related to advanced AI and artificial general intelligence, and considering how FLI can best message about these topics. Basically these projects are looking at what we believe about the existential risk of advanced AI, and we’re working to refine our core assumptions and adapt to a quickly changing public understanding of AI. In the past five years, there’s been much more money and hype going towards advanced AI, and people have new ideas in their heads about the risk and the hope from AI. And so, our communication strategy has to adapt to those changes. So that’s kind of a taste of the questions we’re working on, and it’s been really interesting to work with the policy team on these questions.

Jessica Cussins Newman: My name is Jessica Cussins Newman, and I am an AI policy specialist with the Future of Life Institute. I work on AI policy, governance, and ethics, primarily. Over the past year, there have been significant developments in all of these fields, and FLI continues to be a key stakeholder and contributor to numerous AI governance forums. So it’s been exciting to work on a team that’s helping to facilitate the development of safe and beneficial AI, both nationally and globally. To give an example of some of the initiatives that we’ve been involved with this year, we provided comments to the European Commission’s high level expert group on AI, to the Defense Innovation Board’s work on AI ethical principles, to the National Institute of Standards and Technology, or NIST, which developed a plan for federal engagement on technical AI standards.

We’re also continuing to participate in several multi-stakeholder initiatives, such as the Partnership on AI, the CNAS AI Task Force, and the UN Secretary General’s High-level Panel on Digital Cooperation, among others. I think all of this is helping to lay the groundwork for more trustworthy AI, and we’ve also engaged in direct policy work. Earlier this year we co-hosted an AI policy briefing at the California state legislature, and met with the White House Office of Science and Technology Policy. Lastly, on the educational side of this work, we maintain an online resource for global AI policy. So this includes information about national AI strategies and provides background resources and policy recommendations around some of the key issues.

Ian Rusconi: My name is Ian Rusconi and I edit and produce these podcasts. Since FLI’s podcasts aren’t recorded in a controlled studio setting, the interviews often come with a host of technical issues, so some of what I do for these podcasts overlaps with forensic audio enhancement, removing noise from recordings; removing as much of the reverb as possible from recordings, which works better sometimes than others; removing clicks and pops and sampling errors and restoring the quality of clipping audio that was recorded too loudly. And then comes the actual editing, getting rid of all the breathing and lip smacking noises that people find off-putting, and cutting out all of the dead space and vocal dithering, um, uh, like, you know, because we aim for a tight final product that can sometimes end up as much as half the length of the original conversation even before any parts of the conversation are cut out.

Part of working in an audio only format is keeping things to the minimum amount of information required to get your point across, because there is nothing else that distracts the listener from what’s going on. When you’re working with video, you can see people’s body language, and that’s so much of communication. When it’s audio only, you can’t. So a lot of the time, if there is a divergent conversational thread that may be an interesting and related point, it doesn’t actually fit into the core of the information that we’re trying to access, and you can construct a more meaningful narrative by cutting out superfluous details.

Emilia Javorsky: My name’s Emilia Javorsky and at the Future of Life Institute, I work on the topic of lethal autonomous weapons, mainly focusing on education and advocacy efforts. It’s an issue that I care very deeply about and I think is one of the more pressing ones of our time. I actually come from a slightly atypical background to be engaged in this issue. I’m a physician and a scientist by training, but what’s conserved there is the discussion of how we use AI in high stakes environments where life and death decisions are being made. And so when you are talking about the decisions to prevent harm, which is my field of medicine, or in the case of lethal autonomous weapons, the decision to enact lethal harm, there are just fundamentally different moral questions, and also system performance questions, that come up.

Key ones that I think about a lot are system reliability, accountability, transparency. But when it comes to thinking about lethal autonomous weapons in the context of the battlefield, there’s also this inherent scalability issue that arises. When you’re talking about scalable weapon systems, that quickly introduces unique security challenges in terms of proliferation and an ability to become what you could quite easily define as weapons of mass destruction. 

There’s also the broader moral questions at play here, and the question of whether we as a society want to delegate the decision to take a life to machines. And I personally believe that if we allow autonomous weapons to move forward and we don’t do something to really set a stake in the ground, it could set an irrecoverable precedent when we think about getting ever more powerful AI aligned with our values in the future. It is a very near term issue that requires action.

Anthony Aguirre: I’m Anthony Aguirre. I’m a professor of physics at the University of California at Santa Cruz, and I’m one of FLI’s founders, part of the core team, and probably work mostly on the policy related aspects of artificial intelligence and a few other topics. 

I’d say there are two major efforts that I’m heading up. One is the overall FLI artificial intelligence policy effort. That encompasses a little bit of our efforts on lethal autonomous weapons, but it’s mostly about wider issues of how artificial intelligence development should be thought about, how it should be governed, what kind of soft or hard regulations might we contemplate about it. Global efforts which are really ramping up now, both in the US and Europe and elsewhere, to think about how artificial intelligence should be rolled out in a way that’s kind of ethical, that keeps with the ideals of society, that’s safe and robust and in general is beneficial, rather than running into a whole bunch of negative side effects. That’s part of it.

And then the second thing is I’ve been thinking a lot about what sort of institutions and platforms and capabilities might be useful for society down the line that we can start to create, and nurture and grow now. So I’ve been doing a lot of thinking about… let’s imagine that we’re in some society 10 or 20 or 30 years from now that’s working well, how did it solve some of the problems that we see on the horizon? If we can come up with ways that this fictitious society in principle solved those problems, can we try to lay the groundwork for possibly actually solving those problems by creating new structures and institutions now that can grow into things that could help solve those problems in the future?

So an example of that is Metaculus. This is a prediction platform that I’ve been involved with in the last few years. So this is an effort to create a way to better predict what’s going to happen and make better decisions, both for individual organizations and FLI itself, but just for the world in general. This is kind of a capability that it would be good if the world had, making better predictions about all kinds of things and making better decisions. So that’s one example, but there are a few others that I’ve been contemplating and trying to get spun up.

Max Tegmark: Hi, I’m Max Tegmark, and I think of myself as having two jobs. During the day, I do artificial intelligence research at MIT, and on nights and weekends, I help lead the Future of Life Institute. My day job at MIT used to be focused on cosmology, because I was always drawn to the very biggest questions. The bigger the better, and studying our universe and its origins seemed to be kind of as big as it gets. But in recent years, I’ve felt increasingly fascinated that we have to understand more about how our own brains work, how our intelligence works, and building better artificial intelligence. Asking the question, how can we make sure that this technology, which I think is going to be the most powerful ever, actually becomes the best thing ever to happen to humanity, and not the worst.

Because all technology is really a double-edged sword. It’s not good or evil, it’s just a tool that we can do good or bad things with. If we think about some of the really horrible things that have happened because of AI systems, so far, it’s largely been not because of evil, but just because people didn’t understand how the system worked, and it did something really bad. So what my MIT research group is focused on is exactly tackling that. How can you take today’s AI systems, which are often very capable, but total black boxes… So that if you ask your system, “Why should this person be released on probation, but not this one?” You’re not going to get any better answer than, “I was trained on three terabytes of data and this is my answer. Beep, beep. Boop, boop.” Whereas, I feel we really have the potential to make systems that are just as capable, and much more intelligible. 

Trust should be earned and trust should be built based on us actually being able to peek inside the system and say, “Ah, this is why it works.” And the reason we have founded the Future of Life Institute was because all of us founders, we love technology, and we felt that the reason we would prefer living today rather than any time in the past, is all because of technology. But, for the first time in cosmic history, this technology is also on the verge of giving us the ability to actually self-destruct as a civilization. If we build AI, which can amplify human intelligence like never before, and eventually supersede it, then just imagine your least favorite leader on the planet, and imagine them having artificial general intelligence so they can impose their will on the rest of Earth.

How does that make you feel? It does not make me feel great, and I had a New Year’s resolution in 2014 that I was no longer allowed to complain about stuff if I didn’t actually put some real effort into doing something about it. This is why I put so much effort into FLI. The solution is not to try to stop technology; it just ain’t going to happen. The solution is instead to win what I like to call the wisdom race. Make sure that the wisdom with which we manage our technology grows faster than the power of the technology.

Lucas Perry: Awesome, excellent. As for me, I’m Lucas Perry, and I’m the project manager for the Future of Life Institute. I’ve been with FLI for about four years now, and have focused on enabling and delivering projects having to do with existential risk mitigation. Beyond basic operations tasks at FLI that help keep things going, I’ve seen my work as having three cornerstones, these being supporting research on technical AI alignment, on advocacy relating to existential risks and related issues, and on direct work via our projects focused on existential risk. 

In terms of advocacy related work, you may know me as the host of the AI Alignment Podcast Series, and more recently the host of the Future of Life Institute Podcast. I see my work on the AI Alignment Podcast Series as promoting and broadening the discussion around AI alignment and AI safety to a diverse audience of both technical experts and persons interested in the issue.

There I am striving to include a diverse range of voices from many different disciplines, in so far as they can inform the AI alignment problem. The Future of Life Institute Podcast is a bit more general, though often dealing with related issues. There I strive to have conversations about avant garde subjects as they relate to technological risk, existential risk, and cultivating the wisdom with which to manage powerful and emerging technologies. For the AI Alignment Podcast, our most popular episode of all time so far is On Becoming a Moral Realist with Peter Singer, and a close second and third were On Consciousness, Qualia, and Meaning with Mike Johnson and Andres Gomez Emilsson, and An Overview of Technical AI Alignment with Rohin Shah. There are two parts to that podcast. These were really great episodes, and I suggest you check them out if they sound interesting to you. You can do that under the podcast tab on our site or by finding us on your preferred listening platform.

As for the main FLI Podcast Series, our most popular episodes have been an interview with FLI President Max Tegmark called Life 3.0: Being Human in the Age of Artificial intelligence. A podcast similar to this one last year, called Existential Hope in 2019 and Beyond was the second most listened to FLI podcast. And then the third is a more recent podcast called The Climate Crisis As An Existential Threat with Simon Beard and Hayden Belfield. 

As for the other avenue of my work, my support of research can be stated quite simply as fostering review of grant applications, and also reviewing interim reports for disbursing funds related to AGI safety grants. And then just touching again on my direct work around our projects, often if you see some project put out by the Future of Life Institute, I usually have at least some involvement with it from a logistics, operations, execution, or ideation standpoint.

And moving into the next line of questioning here for the team, what would you all say motivates your interest in existential risk and the work that you do at FLI? Is there anything in particular that is motivating this work for you?

Ian Rusconi: What motivates my interest in existential risk in general I think is that it’s extraordinarily interdisciplinary. But my interest in what I do at FLI is mostly that I’m really happy to have a hand in producing content that I find compelling. But it isn’t just the subjects and the topics that we cover in these podcasts, it’s how you and Ariel have done so. One of the reasons I have so much respect for the work that you two have done and consequently enjoy working on it so much is the comprehensive approach that you take in your lines of questioning.

You aren’t afraid to get into the weeds with interviewees on very specific technical details, but still seek to clarify jargon and encapsulate explanations, and there’s always an eye towards painting a broader picture so we can contextualize a subject’s placement in a field as a whole. I think that FLI’s podcasts often do a tightrope act, walking the line between popular audience and field specialists in a way that doesn’t treat the former like children, and doesn’t bore the latter with a lack of substance. And that’s a really hard thing to do. And I think it’s a rare opportunity to be able to help create something like this.

Kirsten Gronlund: I guess really broadly, I feel like there’s sort of this sense generally that a lot of these technologies and things that we’re coming up with are going to fix a lot of issues on their own. Like new technology will help us feed more people, and help us end poverty, and I think that that’s not true. We already have the resources to deal with a lot of these problems, and we haven’t been. So I think, really, we need to figure out a way to use what is coming out and the things that we’re inventing to help people. Otherwise we’re going to end up with a lot of new technology making the top 1% way more wealthy, and everyone else potentially worse off.

So I think for me that’s really what it is, is to try to communicate to people that these technologies are not, on their own, the solution, and we need to all work together to figure out how to implement them, and how to restructure things in society more generally so that we can use these really amazing tools to make the world better.

Lucas Perry: Yeah. I’m just thinking about how technology enables abundance, and how it seems like there both are and are not limits to human greed. Human greed can potentially want infinite power, but there are also radically diminishing returns on one’s own happiness and wellbeing as one gains access to more abundance. It seems like there’s kind of a duality there.

Kirsten Gronlund: I agree. I mean, I think that’s a very effective altruist way to look at it. That those same resources, if everyone has some power and some money, people will on average be happier than if you have all of it and everyone else has less. But I feel like people, at least people who are in the position to accumulate way more money than they could ever use, tend to not think of it that way, which is unfortunate.

Tucker Davey: In general with working with FLI, I think I’m motivated by some mix of fear and hope. And I would say the general fear is that, if we as a species don’t figure out how to cooperate on advanced technology, and if we don’t agree to avoid certain dangerous paths, we’ll inevitably find some way to destroy ourselves, whether it’s through AI or nuclear weapons or synthetic biology. But then that’s also balanced by a hope that there’s so much potential for large scale cooperation to achieve our goals on these issues, and so many more people are working on these topics as opposed to five years ago. And I think there really is a lot of consensus on some broad shared goals. So I have a hope that through cooperation and better coordination we can better tackle some of these really big issues.

Emilia Javorsky: Part of the reason as a physician I went into the research side of it is this idea of wanting to help people at scale. I really love the idea of how do we use science and translational medicine, not just to help one person, but to help whole populations of people. And so for me, this issue of lethal autonomous weapons is the converse of that. This is something that really has the capacity to both destroy lives at scale in the near term, and also as we think towards questions like value alignment and longer term, more existential questions, it’s something that for me is just very motivating. 

Jared Brown: This is going to sound a little cheesy and maybe even a little selfish, but my main motivation is my kids. I know that they have a long life ahead of them, hopefully, and there are various different versions of the future that’ll be better or worse for them. And I know that emerging technology policy is going to be key to maximizing the benefit of their future and everybody else’s, and that’s ultimately what motivates me. I’ve been thinking about tech policy basically ever since I started researching and reading futurism books when my daughter was born about eight years ago, and that’s what really got me into the field and motivated to work on it full-time.

Meia Chita-Tegmark: I like to think of my work as being ultimately about people. I think that one of the most interesting aspects of this human drama is our relationship with technology, which recently has become ever more promising and also ever more dangerous. So, I want to study that, and I feel crazy lucky that there are universities willing to pay me to do it. And also, to the best of my abilities, I want to try to nudge people and the technologies that they develop in more positive directions. I’d like to see a world where technology is used to save lives and not to take lives. I’d like to see technologies that are used for nurture and care rather than power and manipulation.

Jessica Cussins Newman: I think the integration of machine intelligence into the world around us is one of the most impactful changes that we’ll experience in our lifetimes. I’m really excited about the beneficial uses of AI, but I worry about its impacts, and the questions of not just what we can build, but what we should build. And I worry that we could see these technologies being destabilizing, or that we won’t be sufficiently thoughtful about ensuring that the systems aren’t developed or used in ways that expose us to new vulnerabilities, or impose undue burdens on particular communities.

Anthony Aguirre: I would say it’s kind of a combination of things. Everybody looks at the world and sees that there are all kinds of problems and issues and negative directions that lots of things are going in, and it feels frustrating and depressing. And I feel that given that I’ve got a particular day job that affords me a lot of freedom, given that I have this position at the Future of Life Institute, and that there are a lot of talented people around who I’m able to work with, there’s a huge opportunity, and a rare opportunity, to actually do something.

Who knows how effective it’ll actually be in the end, but the aim is to try to do something and to take advantage of the freedom, and standing, and relationships, and capabilities that I have available. I kind of see that as a duty in a sense, that if you find yourself in a place where you have a certain set of capabilities, and resources, and flexibility, and safety, you kind of have a duty to make use of that for something beneficial. I sort of feel that, and so I try to do so, but I also feel like it’s just super interesting; thinking about the ways that you can create things that can be effective is just a fun intellectual challenge.

There are certainly aspects of what I do at Future of Life Institute that are sort of, “Oh, yeah, this is important so I should do it, but I don’t really feel like it.” Those are occasionally there, but mostly it feels like, “Ooh, this is really interesting and exciting, I want to get this done and see what happens.” So in that sense it’s really gratifying in both ways, to feel like it’s both potentially important and positive, but also really fun and interesting.

Max Tegmark: What really motivates me is this optimistic realization that after 13.8 billion years of cosmic history, we have reached this fork in the road where we have these conscious entities on this little spinning ball in space here who, for the first time ever, have the future in their own hands. In the stone age, who cared what you did? Life was going to be more or less the same 200 years later regardless, right? Whereas now, we can either develop super powerful technology and use it to destroy life on earth completely, go extinct and so on. Or, we can create a future where, with the help of artificial intelligence amplifying our intelligence, we can help life flourish like never before. And I’m not talking just about the next election cycle, I’m talking about for billions of years. And not just here, but throughout much of our amazing universe. So I feel actually that we have a huge responsibility, and a very exciting one, to make sure we don’t squander this opportunity, don’t blow it. That’s what lights me on fire.

Lucas Perry: So I’m deeply motivated by the possibilities of the deep future. I often take cosmological or macroscopic perspectives when thinking about my current condition or the condition of life on earth. The universe is about 13.8 billion years old and our short lives of only a few decades are couched within the context of this ancient evolving system of which we are a part. As far as we know, consciousness has only really exploded and come onto the scene in the past few hundred million years, at least in our sector of space and time, and the fate of the universe is uncertain but it seems safe to say that we have at least billions upon billions of years left before the universe perishes in some way. That means there’s likely longer than the current lifetime of the universe for earth originating intelligent life to do and experience amazing and beautiful things beyond what we can even know or conceive of today.

It seems very likely to me that the peaks and depths of human consciousness, from the worst human misery to the greatest of joy, peace, euphoria, and love, represent only a very small portion of a much larger and higher dimensional space of possible conscious experiences. So given this, I’m deeply moved by the possibility of artificial intelligence being the next stage in the evolution of life and the capacities for that intelligence to solve existential risk, for that intelligence to explore the space of consciousness and optimize the world, for super-intelligent and astronomical degrees of the most meaningful and profound states of consciousness possible. So sometimes I ask myself, what’s a universe good for if not ever evolving into higher and more profound and intelligent states of conscious wellbeing? I’m not sure, and this is still an open question for sure, but this deeply motivates me as I feel that the future can be unimaginably good to degrees and kinds of wellbeing that we can’t even conceive of today. There’s a lot of capacity there for the future to be something that is really, really, really worth getting excited and motivated about.

And moving along in terms of questioning again here, this question is again for the whole team: do you have anything more specifically that you hope results from your work, or is born of your work at FLI?

Jared Brown: So, I have two primary objectives; the first is sort of minor but significant. A lot of what I do on a day-to-day basis is advocate for relatively minor changes to existing and future near term policy on emerging technology. And some of these changes won’t make a world of difference unto themselves, but the small marginal benefits to the future can accumulate rather significantly over time. So, I look for as many small wins as possible in different policy-making environments, and try to achieve those on a regular basis.

And then more holistically in the long run, I really want to help destigmatize the discussion around global catastrophic and existential risk in traditional national security and international security policy-making. It’s still quite an obscure and weird thing to say to people that I work on global catastrophic and existential risk, and it really shouldn’t be. I should be able to talk to most policy-makers in security related fields and have it not come off as a weird or odd thing to be working on. Because inherently what we’re talking about is the very worst of what could happen to you, or humanity, or even life as we know it on this planet. And there should be more people who work on these issues, both from an effective altruism perspective and other perspectives, going forward.

Jessica Cussins Newman: I want to raise awareness about the impacts of AI and the kinds of levers that we have available to us today to help shape these trajectories. So from designing more robust machine learning models, to establishing the institutional procedures or processes that can track and monitor those design decisions and outcomes and impacts, to developing accountability and governance mechanisms to ensure that those AI systems are contributing to a better future. We’ve built a tool that can automate decision making, but we need to retain human control and decide collectively as a society where and how to implement these new abilities.

Max Tegmark: I feel that there’s a huge disconnect right now between our potential, as the human species, and the direction we’re actually heading in. We are spending most of our discussions in news media on total BS. You know, like country A and country B are squabbling about something which is quite minor, in the grand scheme of things, and people are often treating each other very badly in the misunderstanding that they’re in some kind of zero-sum game, where one person can only get better off if someone else gets worse off. Technology is not a zero-sum game. Everybody wins at the same time, ultimately, if you do it right. 

Why are we so much better off now than 50,000 years ago or 300 years ago? It’s because we have antibiotics so we don’t die of stupid diseases all the time. It’s because we have the means to produce food and keep ourselves warm, and so on, with technology, and this is nothing compared to what AI can do.

I’m very much hoping that this mindset that we all lose together or win together is something that can catch on a bit more as people gradually realize the power of this tech. It’s not the case that either China is going to win and the U.S. is going to lose, or vice versa. What’s going to happen is either we’re both going to lose because there’s going to be some horrible conflict and it’s going to ruin things for everybody, or we’re going to have a future where people in China are much better off, and people in the U.S. and elsewhere in the world are also much better off, and everybody feels that they won. There really is no third outcome that’s particularly likely.

Lucas Perry: So, in the short term, I’m hoping that all of the projects we’re engaging with help to nudge the trajectory of life on earth in a positive direction. I’m hopeful that we can mitigate an arms race in lethal autonomous weapons. I see that as being a crucial first step in coordination around AI issues such that, if that fails, it may likely be much harder to coordinate in the future on making sure that beneficial AI takes place. I am also hopeful that we can promote beneficial AI alignment and AI safety research further and mainstream its objectives and understandings about the risks posed by AI and what it means to create beneficial AI. I’m hoping that we can maximize the wisdom with which we handle technology through projects and outreach which explicitly cultivate ethics, coordination, and governance in ways that help to direct and develop technologies beneficially.

I’m also hoping that we can promote and instantiate a culture of interest in existential risk issues and the technical, political, and philosophical problems associated with powerful emerging technologies like AI. It would be wonderful if the conversations that we have on the podcast and at FLI and in the surrounding community weren’t just something for us. These are issues that are deeply interesting and will become ever more important as technology becomes more powerful. And so I’m really hoping that one day discussions about existential risk and all the kinds of conversations that we have on the podcast are much more mainstream and normal, that there are serious institutions in government and society which explore these issues, and that this is part of our common discourse as a society and civilization.

Emilia Javorsky: In an ideal world, a great outcome for all of FLI’s work in this area would be the realization of the Asilomar principle that an arms race in lethal autonomous weapons must be avoided. I hope that we do get there in the shorter term. I think the activities that we’re doing now on increasing awareness around this issue, and better understanding and characterizing the unique risks that these systems pose across the board from a national security perspective, a human rights perspective, and an AI governance perspective, are a really big win in my book.

Meia Chita-Tegmark: When I allow myself to unreservedly daydream about how I want my work to manifest itself in the world, I always conjure up fantasy utopias in which people are cared for and are truly inspired. For example, that’s why I am very committed to fighting against the development of lethal autonomous weapons. It’s precisely because a world with such technologies would be one in which human lives would be cheap, killing would be anonymous, and our moral compass would likely be very damaged by this. I want to start work on using technology to help people, maybe to heal people. In my research, I tried to think of various disabilities and how technology can help with those, but that is just one tiny aspect of a wealth of possibilities for using technology, and in particular, AI for good.

Anthony Aguirre: I’ll be quite gratified if I find that some of the things that I’ve done help society be better and more ready to wisely deal with challenges that are unfolding. There are a huge number of problems in society, but there is a particular subset that are just sort of exponentially growing problems, because they have to do with exponentially advancing technology. And the set of people who are actually thinking proactively about the problems that those technologies are going to create, rather than just creating the technologies or sort of dealing with the problems when they arise, is quite small.

FLI is a pretty significant part of that tiny community of people who are thinking about that. But I also think it’s very important. Problems are better solved in advance, if possible. So I think anything that we can do to nudge things in the right direction, taking the relatively high point of leverage I think the Future of Life Institute has, will feel useful and worthwhile. Any of these projects being successful, I think will have a significant positive impact, and it’s just a question of buckling down and trying to get them to work.

Kirsten Gronlund: A big part of this field, not necessarily inherently, but sort of just historically, has been that it’s very male, and it’s very white, and in and of itself a pretty privileged group of people. Something that I personally care about a lot is trying to expand some of these conversations around the future, and what we want it to look like, and how we’re going to get there, and to involve more people and more diverse voices, more perspectives.

It goes along with what I was saying, that if we don’t figure out how to use these technologies in better ways, we’re just going to be contributing to the people who have historically been benefiting from technology, and so I think bringing some of the people who have historically not been benefiting from technology and the way that our society is structured into these conversations can help us figure out how to make things better. I’ve definitely been trying, while we’re doing this book guide thing, to make sure that there’s a good balance of male and female authors, people of color, et cetera, and the same with our podcast guests and things like that. But yeah, I mean I think there’s a lot more to be done, definitely, in that area.

Tucker Davey: So with the projects related to FLI’s AI communication strategy, I am hopeful that as an overall community, as an AI safety community, as an effective altruism community, existential risk community, we’ll be able to better understand what our core beliefs are about risks from advanced AI, and better understand how to communicate to different audiences, whether these are policymakers that we need to convince that AI is a problem worth considering, or whether it’s just the general public, or shareholders, or investors. Different audiences have different ideas of AI, and if we as a community want to be more effective at getting them to care about this issue and understand that it’s a big risk, we need to figure out better ways to communicate with them. And I’m hoping that a lot of this communications work will help the community as a whole, not just FLI, communicate with these different parties and help them understand the risks.

Ian Rusconi: Well, I can say that I’ve learned more since I started working on these podcasts about more disparate subjects than I had any idea about. Take lethal autonomous weapon systems, for example, I didn’t know anything about that subject when I started. These podcasts are extremely educational, but they’re conversational, and that makes them accessible, and I love that. And I hope that as our audience increases, other people find the same thing and keep coming back because we learn something new every time. I think that through podcasts, like the ones that we put out at FLI, we are enabling that sort of educational enrichment.

Lucas Perry: Cool. I feel the same way. So, you actually have listened to more FLI podcasts than perhaps anyone, since you’ve listened to all of them. Of all of these podcasts, do you have any specific projects, or a series that you have found particularly valuable? Any favorite podcasts, if you could mention a few, or whatever you found most valuable?

Ian Rusconi: Yeah, a couple of things. First, back in February, Ariel and Max Tegmark did a two part conversation with Matthew Meselson in advance of FLI presenting him with the Future of Life Award in April, and I think that was probably the most fascinating and wide ranging single conversation I’ve ever heard. Philosophy, science history, weapons development, geopolitics, the value of the humanities from a scientific standpoint, artificial intelligence, treaty development. It was just such an incredible amount of lived experience and informed perspective in that conversation. And, in general, when people ask me what kinds of things we cover on the FLI podcast, I point them to that episode.

Second, I’m really proud of the work that we did on Not Cool, A Climate Podcast. The amount of coordination and research Ariel and Kirsten put in to make that project happen was staggering. I think my favorite episodes from there were those dealing with the social ramifications of climate change, specifically human migration. It’s not my favorite topic to think about, for sure, but I think it’s something that we all desperately need to be aware of. I’m oversimplifying things here, but Kris Ebi’s explanations of how crop failure and malnutrition and vector borne diseases can lead to migration, Cullen Hendrix touching on migration as it relates to the social changes and conflicts born of climate change, Lindsay Getschel’s discussion of climate change as a threat multiplier and the national security implications of migration.

Migration is happening all the time and it’s something that we keep proving we’re terrible at dealing with, and climate change is going to increase migration, period. And we need to figure out how to make it work, and we need to do it in a way that improves living standards and prevents this extreme concentrated suffering. And there are questions about how to do this while preserving cultural identity, and the social systems that we have put in place, and I know none of these are easy. But if instead we just take the question of, how do we reduce suffering? Well, we know how to do that and it’s not complicated per se: have compassion and act on it. We need compassionate government and governance. And that’s a thing that came up a few times, sometimes directly and sometimes obliquely, in Not Cool. The more I think about how to solve problems like these, the more I think the intelligent answer is compassion.

Lucas Perry: So, do you feel like you just learned a ton about climate change from the Not Cool podcast that you just had no idea about?

Ian Rusconi: Yeah, definitely. And that’s really something that I can say about all of FLI’s podcast series in general, is that there are so many subtopics on the things that we talk about that I always learn something new every time I’m putting together one of these episodes. 

Some of the most thought provoking podcasts to me are actually the ones about the nature of intelligence and cognition, and what it means to experience something, and how we make decisions. Two of the AI Alignment Podcast episodes from this year stand out to me in particular. First was the one with Josh Greene in February, which did an excellent job of explaining the symbol grounding problem and grounded cognition in an understandable and engaging way. And I’m also really interested in his lab’s work using the veil of ignorance. And second was the episode with Mike Johnson and Andres Gomez Emilsson of the Qualia Research Institute in May, where I particularly liked the discussion of electromagnetic harmony in the brain, and the interaction between the consonance and dissonance of its waves, and how you can basically think of music as a means by which we can hack our brains. Again, it gets back to the fabulously, extraordinarily interdisciplinary aspect of everything that we talk about here.

Lucas Perry: Kirsten, you’ve also been integral to the podcast process. What are your favorite things that you’ve done at FLI in 2019, and are there any podcasts in particular that stand out for you?

Kirsten Gronlund: The Women For The Future campaign was definitely one of my favorite things, which was basically just trying to highlight the work of women involved in existential risk, and through that try to get more women feeling like this is something that they can do and to introduce them to the field a little bit. And then also the Not Cool Podcast that Ariel and I did. I know climate isn’t the major focus of FLI, but it is such an important issue right now, and it was really just interesting for me because I was much more closely involved with picking the guests and stuff than I have been with some of the other podcasts. So it was just cool to learn about various people and their research and what’s going to happen to us if we don’t fix the climate. 

Lucas Perry: What were some of the most interesting things that you learned from the Not Cool podcast? 

Kirsten Gronlund: Geoengineering was really crazy. I didn’t really know at all what geoengineering was before working on this podcast, and I think it was Alan Robock in his interview who was saying that even just for people to learn that one of the solutions being considered for climate change right now is shooting a ton of crap into the atmosphere and basically creating a semi nuclear winter would hopefully be enough to kind of freak people out into being like, “maybe we should try to fix this a different way.” So that was really crazy.

I also thought it was interesting just learning about some of the effects of climate change that you wouldn’t necessarily think of right away. The fact that they’ve shown the links between increased temperature and upheaval in government, and they’ve shown links between increased temperature and generally bad mood, poor sleep, things like that. The quality of our crops is going to get worse, so we’re going to be eating less nutritious food.

Then some of the cool things, and I guess this ties in as well with artificial intelligence, are some of the ways that people are using technologies like AI and machine learning to try to come up with solutions. I thought that was really cool to learn about, because that’s kind of like what I was saying earlier about figuring out how to use these technologies in productive ways. They are such powerful tools and can do so much good for us. So it was cool to see that in action in the ways that people are implementing automated systems and machine learning to reduce emissions and help out with the climate.

Lucas Perry: From my end, I’m probably most proud of our large conference, Beneficial AGI 2019, which we held to further mainstream AGI safety thinking and research, and the projects that grew out of conversations there were also very exciting and encouraging. I’m also very happy about the growth and development of our podcast series. This year, we’ve had over 200,000 listens to our podcasts. So I’m optimistic about the continued growth and development of our outreach through this medium and our capacity to inform people about these crucial issues.

Everyone else, other than podcasts, what are some of your favorite things that you’ve done at FLI in 2019?

Tucker Davey: I would have to say the conferences. So the beneficial AGI conference was an amazing start to the year. We gathered such a great crowd in Puerto Rico, people from the machine learning side, from governance, from ethics, from psychology, and really getting a great group together to talk out some really big questions, specifically about the long-term future of AI, because there’s so many conferences nowadays about the near term impacts of AI, and very few are specifically dedicated to thinking about the long term. So it was really great to get a group together to talk about those questions and that set off a lot of good thinking for me personally. That was an excellent conference. 

And then a few months later, Anthony and a few others organized a conference called the Augmented Intelligence Summit, and that was another great collection of people from many different disciplines, basically thinking about a hopeful future with AI and trying to do world building exercises to figure out what that ideal world with AI would look like. These conferences and these events in these summits do a great job of bringing together people from different disciplines in different schools of thought to really tackle these hard questions, and everyone who attends them is really dedicated and motivated, so seeing all those faces is really inspiring.

Jessica Cussins Newman: I’ve really enjoyed the policy engagement that we’ve been able to have this year. You know, looking back to last year, we did see a lot of successes around the development of ethical principles for AI, and I think this past year, there’s been significant interest in actually putting those principles into practice. So we’ve seen many different governance forums, both within the U.S. and around the world, look to that next level, and I think one of my favorite things has just been seeing FLI become a trusted resource for so many of those governance and policy processes that I think will significantly shape the future of AI.

I think the thing that I continue to value significantly about FLI is its ability as an organization to just bring together an amazing network of AI researchers and scientists, and to be able to hold events, and networking and outreach activities, that can merge those communities with other people thinking about issues around governance or around ethics or other kinds of sectors and disciplines. We have been playing a key role in translating some of the technical challenges related to AI safety and security into academic and policy spheres. And so that continues to be one of my favorite things that FLI is really uniquely good at.

Jared Brown: A recent example here: the Future of Life Institute submitted some comments on a regulation that the Department of Housing and Urban Development put out in the U.S. Essentially the regulation is quite complicated, but they were seeking comment about how to integrate artificial intelligence systems into the legal liability framework surrounding something called the Fair Housing Act, which is an old, very important piece of civil rights legislation that protects against discrimination in the housing market. And their proposal was essentially that users, such as a mortgage lender, a bank issuing loans, or even a landlord, who used an algorithm that met certain technical standards to decide who to rent a place to or who to give a loan, would be given liability protection. And this stems from the growing use of AI in the housing market.

Now, in theory, there’s nothing wrong with using algorithmic systems so long as they’re not biased, and they’re accurate, and well thought out. However, if you grant blanket liability protection, as HUD wanted to, you’re essentially telling that bank officer or that landlord that they should exclusively use those AI systems that have the liability protection. And if they see a problem in those AI systems, and they’ve got somebody sitting across from them, and think this person really should get a loan, or this person should be able to rent my apartment because I think they’re trustworthy, but the AI algorithm says “no,” they’re not going to dispute what the AI algorithm tells them, because to do that, they take on liability of their own, and could potentially get sued. So there’s a real danger here in moving too quickly in terms of how much legal protection we give these systems. And so the Future of Life Institute, as well as many other different groups, commented on this proposal and pointed out these flaws to the Department of Housing and Urban Development. That’s an example of just one of many different things that the Future of Life Institute has done, and you can actually go online and see our public comments for yourself, if you want to.

Lucas Perry: Wonderful.

Jared Brown: Honestly, a lot of my favorite things are just these off the record type conversations that I have in countless formal and informal settings with different policymakers and people who influence policy. The policy-making world is an old-fashioned, face-to-face type business, and essentially you really have to be there, and to meet these people, and to have these conversations to really develop a level of trust, and a willingness to engage with them in order to be most effective. And thankfully I’ve had a huge range of those conversations throughout the year, especially on AI. And I’ve been really excited to see how well received Future of Life has been as an institution. Our reputation precedes us because of a lot of the great work we’ve done in the past with the Asilomar AI principles, and the AI safety grants. It’s really helped me get in the room for a lot of these conversations, and given us a lot of credibility as we discuss near-term AI policy.

In terms of bigger public projects, I also really enjoyed coordinating with some community partners across the space in our advocacy on the U.S. National Institute of Standards and Technology’s plan for engaging in the development of technical standards on AI. In the policy realm, it’s really hard to see some of the end benefit of your work, because you’re doing advocacy work, and it’s hard to get folks to really tell you why certain changes were made, and if you were able to persuade them. But in this circumstance, I happen to know for a fact that we had a real positive effect on the end products that they developed. I talked to the lead authors about it, and others, and can see the evidence in the final product of the effect of our changes.

In addition to our policy and advocacy work, I really, really like that FLI continues to interface with the AI technical expert community on a regular basis. And this isn’t just through our major conferences, but also informally throughout the entire year, through various different channels and personal relationships that we’ve developed. It’s really critical for anyone’s policy work to be grounded in the technical expertise on the topic that they’re covering. And I’ve been thankful for the number of opportunities I’ve been given throughout the year to really touch base with some of the leading minds in AI about what might work best, and what might not work best from a policy perspective, to help inform our own advocacy and thinking on various different issues.

I also really enjoy the educational and outreach work that FLI is doing. As with our advocacy work, it’s sometimes very difficult to see the end benefit of the work that we do with our podcasts, and our website, and our newsletter. But I know anecdotally, from various different people, that they are listened to, that they are read by leading policymakers and researchers in this space. And so, they have a real effect on developing a common understanding in the community and helping network and develop collaboration on some key topics that are of interest to the Future of Life Institute and people like us.

Emilia Javorsky: 2019 was a great year at FLI. It’s my first year at FLI, so I’m really excited to be part of such an incredible team. There are two real highlights that come to mind. One was publishing an article in the British Medical Journal on this topic of engaging the medical community in the lethal autonomous weapons debate. In previous disarmament conversations, it’s always been a community that has played an instrumental role in getting global action on these issues passed, whether you look at nuclear, landmines, biorisk… So that was something that I thought was a great contribution, because up until now, they hadn’t really been engaged in the discussion.

The other that comes to mind that was really amazing was a workshop that we hosted, where we brought together AI researchers, and roboticists, and lethal autonomous weapons experts with a very divergent range of views on the topic, to see if they could achieve consensus on something. Anything. Going into it, we weren’t really sure what that could be, and the result was actually remarkably heartening. They came up with a roadmap that outlined four components for action on lethal autonomous weapons, including things like the potential role that a moratorium may play, research areas that need exploration, non-proliferation strategies, and ways to avoid unintentional escalation. They actually published this in IEEE Spectrum, which I really recommend reading, but it was just really exciting to see how much agreement and consensus can exist among people that you would normally think have very divergent views on the topic.

Max Tegmark: To make it maximally easy for them to get along, we actually did this workshop in our house, and we had lots of wine. And because they were in our house, also it was a bit easier to exert social pressure on them to make sure they were nice to each other, and have a constructive discussion. The task we gave them was simply: write down anything that they all agreed on that should be done to reduce the risk of terrorism or destabilizing events from this tech. And you might’ve expected a priori that they would come up with a blank piece of paper, because some of these people had been arguing very publicly that we need lethal autonomous weapons, and others had been arguing very vociferously that we should ban them. Instead, it was just so touching to see that when they actually met each other, often for the first time, they could actually listen directly to each other, rather than seeing weird quotes in the news about each other. 

Meia Chita-Tegmark: If I had to pick one thing, especially in terms of emotional intensity, it’s really been a while since I’ve been on such an emotional roller coaster as the one during the workshop related to lethal autonomous weapons. It was so inspirational to see how people that come with such diverging opinions could actually put their minds together, and work towards finding consensus. For me, that was such a hope inducing experience. It was a thrill.

Max Tegmark: They built a real camaraderie and respect for each other, and they wrote this report with five different sets of recommendations in different areas, including a moratorium on these things and all sorts of measures to reduce proliferation, and terrorism, and so on, and that made me feel more hopeful.

We got off to a great start, I feel, with our January 2019 Puerto Rico conference. This was the third one in a series where we brought together world-leading AI researchers from academia, and industry, and other thinkers, to talk not about how to make AI more powerful, but how to make it beneficial. And what I was particularly excited about was that this was the first time when we also had a lot of people from China. So it wasn’t just this little western club; it felt much more global. It was very heartening to see how well everybody got along and what shared visions people really had. And I hope that if the people who are actually building this stuff can all get along, and can help spread this kind of constructive collaboration to the politicians and political leaders in their various countries, we’ll all be much better off.

Anthony Aguirre: That felt really worthwhile in multiple aspects. One, just it was a great meeting getting together with this small, but really passionately positive, and smart, and well-intentioned, and friendly community. It’s so nice to get together with all those people, it’s very inspiring. But also, that out of that meeting came a whole bunch of ideas for very interesting and important projects. And so some of the things that I’ve been working on are projects that came out of that meeting, and there’s a whole long list of other projects that came out of that meeting, some of which some people are doing, some of which are just sitting, gathering dust, because there aren’t enough people to do them. That feels like really good news. It’s amazing when you get a group of smart people together to think in a way that hasn’t really been widely done before. Like, “Here’s the world 20 or 30 or 50 or 100 years from now, what are the things that we’re going to want to have happened in order for the world to be good then?”

Not many people sit around thinking that way very often. So to get 50 or 100 people who are really talented together thinking about that, it’s amazing how easy it is to come up with a set of really compelling things to do. Now actually getting those done, getting the people and the money and the time and the organization to get those done is a whole different thing. But that was really cool to see, because you can easily imagine things that have a big influence 10 or 15 years from now that were born right at that meeting.

Lucas Perry: Okay, so that hits on BAGI. So, were there any other policy-related things that you’ve done at FLI in 2019 that you’re really excited about?

Anthony Aguirre: It’s been really good to see, both at FLI and globally, the new and very serious attention being paid to AI policy and technology policy in general. We created the Asilomar principles back in 2017, and now, two years later, there are multiple other sets of principles, many of which are overlapping and some of which aren’t. And more importantly, there are now institutions coming into being, international groups like the OECD, like the United Nations, the European Union, maybe someday the US government, actually taking seriously these sets of principles about how AI should be developed and deployed, so as to be beneficial.

There’s kind of now too much going on to keep track of, multiple bodies, conferences practically every week, so the FLI policy team has been kept busy just keeping track of what’s going on, and working hard to positively influence all these efforts that are going on. Because of course while there’s a lot going on, it doesn’t necessarily mean that there’s a huge amount of expertise that is available to feed those efforts. AI is relatively new on the world’s stage, at least at the size that it’s assuming. AI and policy expertise, that intersection, there just aren’t a huge number of people who are ready to give useful advice on the policy side and the technical side and what the ramifications are and so on.

So I think the fact that FLI has been there from the early days of AI policy five years ago, means that we have a lot to offer to these various efforts that are going on. I feel like we’ve been able to really positively contribute here and there, taking opportunistic chances to lend our help and our expertise to all kinds of efforts that are going on and doing real serious policy work. So that’s been really interesting to see that unfold and how rapidly these various efforts are gearing up around the world. I think that’s something that FLI can really do, bringing the technical expertise to make those discussions and arguments more sophisticated, so that we can really take it to the next step and try to get something done.

Max Tegmark: Another one which was very uplifting is this tradition we have to celebrate unsung heroes. So three years ago we celebrated the guy who prevented the world from getting nuked in 1962, Vasili Arkhipov. Two years ago, we celebrated the man who probably helped us avoid getting nuked in 1983, Stanislav Petrov. And this year we celebrated an American who I think has done more than anyone else to prevent all sorts of horrible things happening with bioweapons, Matthew Meselson from Harvard, who ultimately persuaded Kissinger, who persuaded Brezhnev and everyone else that we should just ban them. 

We celebrated them all by giving them or their survivors a $50,000 award and having a ceremony where we honored them, to remind the world of how valuable it is when you can just draw a clear, moral line between the right thing to do and the wrong thing to do. Even though we call this the Future of Life award officially, informally, I like to think of this as our unsung hero award, because there really aren’t awards particularly for people who prevented shit from happening. Almost all awards are for someone causing something to happen. Yet, obviously we wouldn’t be having this conversation if there’d been a global thermonuclear war. And it’s so easy to think that just because something didn’t happen, there’s not much to think about it. I’m hoping this can help create both a greater appreciation of how vulnerable we are as a species and the value of not being too sloppy. And also, that it can help foster a tradition that if someone does something that future generations really value, we actually celebrate them and reward them. I want us to have a norm in the world where people know that if they sacrifice themselves by doing something courageous, that future generations will really value, then they will actually get appreciation. And if they’re dead, their loved ones will get appreciation.

We now feel incredibly grateful that our world isn’t radioactive rubble, or that we don’t have to read about bioterrorism attacks in the news every day. And we should show our gratitude, because this sends a signal to people today who can prevent tomorrow’s catastrophes. And the reason I think of this as an unsung hero award, and the reason these people have been unsung heroes, is because what they did was often going a little bit against what they were supposed to do at the time, according to the little system they were in, right? Arkhipov and Petrov, neither of them got any medals for averting nuclear war because their peers either were a little bit pissed at them for violating protocol, or a little bit embarrassed that we’d almost had a war by mistake. And we want to send the signal to the kids out there today that, if push comes to shove, you got to go with your own moral principles.

Lucas Perry: Beautiful. What project directions are you most excited about moving in, in 2020 and beyond?

Anthony Aguirre: Along with the ones that I’ve already mentioned, something I’ve been involved with is Metaculus, this prediction platform, and the idea there is there are certain facts about the future world, and Metaculus is a way to predict probabilities for those facts being true about the future world. But they’re also facts about the current world, that we either don’t know whether they’re true or not or we disagree about whether they’re true or not. Something I’ve been thinking a lot about is how to extend the predictions of Metaculus into a general truth-seeking mechanism. If there’s something that’s contentious now, and people disagree about something that should be sort of a fact, can we come up with a reliable truth-seeking arbiter that people will believe, because it’s been right in the past, and it has very clear reliable track record for getting things right, in the same way that Metaculus has that record for getting predictions right?

So that’s something that interests me a lot, is kind of expanding that very strict level of accountability and track record creation from prediction to just truth-seeking. And I think that could be really valuable, because we’re entering this phase where people feel like they don’t know what’s true and facts are under contention. People simply don’t know what to believe. The institutions that they’re used to trusting to give them reliable information are either conflicting with each other or getting drowned in a sea of misinformation.

Lucas Perry: So, would this institution gain its credibility and epistemic status and respectability by taking positions on unresolved, yet concrete issues, which are likely to resolve in the short-term?

Anthony Aguirre: Or the not as short-term. But yeah, so just like in a prediction, where there might be disagreements as to what’s going to happen because nobody quite knows, and then at some point something happens and we all agree, “Oh, that happened, and some people were right and some people were wrong,” I think there are many propositions under contention now, but in a few years when the dust has settled and there’s not so much heat about them, everybody’s going to more or less agree on what the truth was.

And so I think, in a sense, this is about saying, “Here’s something that’s contentious now, let’s make a prediction about how that will turn out to be seen five or 10 or 15 years from now, when the dust has settled and people more or less agree on how this was.”

I think there’s only so long that people can go without feeling like they can actually rely on some source of information. I mean, I do think that there is a reality out there, and ultimately you have to pay a price if you are not acting in accordance with what is true about that reality. You can’t indefinitely win by just denying the truth of the way that the world is. People seem to do pretty well for a while, but I maintain my belief that eventually there will be a competitive advantage in understanding the way things actually are, rather than your fantasy of them.

We in the past did have trusted institutions that people generally listened to, and felt like they were being told the basic truth. Now, they weren’t always right, and there were lots of problems with those institutions, but we’ve lost something, in that almost nobody trusts anything anymore at some level, and we have to get that back. We will solve this problem, I think, in the sense that we sort of have to. What that solution will look like is unclear, and this is sort of an effort to seek some way to kind of feel our way towards a potential solution to that.

Tucker Davey: I’m definitely excited to continue this work on our AI messaging, and generally just continuing the discussion about advanced AI and artificial general intelligence within the FLI team and within the broader community, to get more consensus about what we believe and how we think we should approach these topics with different communities. And I’m also excited to see how our policy team continues to make more splashes across the world, because it’s really been exciting to watch how Jared and Jessica and Anthony have been able to talk with so many diverse stakeholders and help them make better decisions about AI.

Jessica Cussins Newman: I’m most excited to see the further development of some of these global AI policy forums in 2020. For example, the OECD is establishing an AI policy observatory, which we’ll see further development on early next year. And FLI is keen to support this initiative, and I think it may be a really meaningful forum for global coordination and cooperation on some of these key AI global challenges. So I’m really excited to see what they can achieve.

Jared Brown: I’m really looking forward to the opportunity the Future of Life has to lead the implementation of a recommendation related to artificial intelligence from the UN’s High-Level Panel on Digital Cooperation. This is a group that was led by Jack Ma and Melinda Gates, and they produced an extensive report that had many different recommendations on a range of digital or cyber issues, including one specifically on artificial intelligence. And because of our past work, we were invited to be a leader on the effort to implement and further refine the recommendation on artificial intelligence. And we’ll be able to do that with cooperation from the government of France, and Finland, and also with a UN agency called the UN Global Pulse. So I’m really excited about this opportunity to help lead a major project in the global governance arena, and to help actualize how some of these early soft law norms that have developed in AI policy can be developed for a better future.

I’m also excited about continuing to work with other civil society organizations, such as the Future of Humanity Institute, the Center for the Study of Existential Risk, other groups that are like-minded in their approach to tech issues. And helping to inform how we work on AI policy in a number of different governance spaces, including with the European Union, the OECD, and other environments where AI policy has suddenly become the topic du jour of interest to policy-makers.

Emilia Javorsky: Something that I’m really excited about is continuing to work on this issue of global engagement in the topic of lethal autonomous weapons, as I think this issue is heading in a very positive direction. By that I mean starting to move towards meaningful action. And really the only way we get to action on this issue is through education, because policymakers really need to understand what these systems are, what their risks are, and how AI differs from other traditional areas of technology that have really well-established existing governance frameworks. So that’s something I’m really excited about for the next year. And this has been especially in the context of engaging with states at the United Nations. So it’s really exciting to continue those efforts and continue to keep this issue on the radar.

Kirsten Gronlund: I’m super excited about our website redesign. I think that’s going to enable us to reach a lot more people and communicate more effectively, and obviously it will make my life a lot easier. So I think that’s going to be great.

Lucas Perry: I’m excited about that too. I think there’s a certain amount of a maintenance period that we need to kind of go through now, with regards to the website and a bunch of the pages, so that everything is refreshed and new and structured better. 

Kirsten Gronlund: Yeah, we just need like a little facelift. We are aware that the website right now is not super user friendly, and we are doing an incredibly in-depth audit of the site to figure out, based on data, what’s working and what isn’t, and how people would best be able to use the site to get the most out of the information that we have. Because I think we have really great content, but the way that the site is organized is not super conducive to finding it, or using it.

So anyone who likes our site and our content but has trouble navigating or searching or anything: hopefully that will be getting a lot easier.

Ian Rusconi: I think I’d be interested in more conversations about ethics overall, and how ethical decision making is something that we need more of, as opposed to just economic decision making, and reasons for that with actual concrete examples. It’s one of the things that I find is a very common thread throughout almost all of the conversations that we have, but is rarely explicitly connected from one episode to another. And I think that there is some value in creating a conversational narrative about that. If we look at, say, the Not Cool Project, there are episodes about finance, and episodes about how the effects of what we’ve been doing to create the global economy have created problems. And if we look at the AI Alignment Podcasts, there are concerns about how systems will work in the future, and who they will work for, and who benefits from things. And if you look at FLI’s main podcast, there are concerns about denuclearization, and lethal autonomous weapons, and things like that, and there are major ethical considerations to be had in all of these.

And I think that there’s benefit in taking all of these ethical considerations, and talking about them specifically outside of the context of the fields that they are in, just as a way of getting more people to think about ethics. Not in opposition to thinking about, say, economics, but just to get people thinking about ethics as a stand-alone thing, before trying to introduce how it’s relevant to something. I think if more people thought about ethics, we would have a lot fewer problems than we do.

Lucas Perry: Yeah, I would be interested in that too. I would first want to know empirically how many of the decisions that the average human being makes in a day are actually informed by “ethical decision making,” which, I guess, my intuition at the moment is probably not that many?

Ian Rusconi: Yeah, I don’t know how much ethics plays into my autopilot-type decisions. I would assume probably not very much.

Lucas Perry: Yeah. We think about ethics explicitly a lot. I think that that definitely shapes my terminal values. But yeah, I don’t know, I feel confused about this. I don’t know how much of my moment to moment lived experience and decision making is directly born of ethical decision making. So I would be interested in that too, with that framing that I would first want to know the kinds of decision making faculties that we have, and how often each one is employed, and the extent to which improving explicit ethical decision making would help in making people more moral in general.

Ian Rusconi: Yeah, I could absolutely get behind that.

Max Tegmark: What I find also to be a concerning trend, and a predictable one, is that just like we had a lot of greenwashing in the corporate sector about environmental and climate issues, where people would pretend to care about the issues just so they didn’t really have to do much, we’re seeing a lot of what I like to call “ethics washing” now in AI, where people say, “Yeah, yeah. Okay, let’s talk about AI ethics now, like an ethics committee, and blah, blah, blah, but let’s not have any rules or regulations, or anything. We can handle this because we’re so ethical.” And interestingly, the very same people who talk the loudest about ethics are often among the ones who are the most dismissive about the bigger risks from human level AI, and beyond. And also the ones who don’t want to talk about malicious use of AI, right? They’ll be like, “Oh yeah, let’s just make sure that robots and AI systems are ethical and do exactly what they’re told,” but they don’t want to discuss what happens when some country, or some army, or some terrorist group has such systems, and tells them to do things that are horrible for other people. That’s an elephant in the room we are looking forward to help draw more attention to, I think, in the coming year. 

And what I also feel is absolutely crucial here is to avoid splintering the planet again, into basically an eastern and a western zone of dominance that just don’t talk to each other. Trade is down between China and the West. China has its great firewall, so they don’t see much of our internet, and we also don’t see much of their internet. It’s becoming harder and harder for students to come here from China because of visas, and there’s sort of a partitioning into two different spheres of influence. And as I said before, this is a technology which could easily make everybody a hundred times better or richer, and so on. You can imagine many futures where countries just really respect each other’s borders, and everybody can flourish. Yet, major political leaders are acting like this is some sort of zero-sum game. 

I feel that this is one of the most important things to help people understand that, no, it’s not like we have a fixed amount of money or resources to divvy up. If we can avoid very disruptive conflicts, we can all have the future of our dreams.

Lucas Perry: Wonderful. I think this is a good place to end on that point. So, what are reasons that you see for existential hope, going into 2020 and beyond?

Jessica Cussins Newman: I have hope for the future because I have seen this trend where it’s no longer a fringe issue to talk about technology ethics and governance. And I think that used to be the case not so long ago. So it’s heartening that so many people and institutions, from engineers all the way up to nation states, are really taking these issues seriously now. I think that momentum is growing, and I think we’ll see engagement from even more people and more countries in the future.

I would just add that it’s a joy to work with FLI, because it’s an incredibly passionate team, and everybody has a million things going on, and still gives their all to this work and these projects. I think what unites us is that we all think these are some of the most important issues of our time, and so it’s really a pleasure to work with such a dedicated team.

Lucas Perry:  Wonderful.

Jared Brown: As many of the listeners will probably realize, governments across the world have really woken up to this thing called artificial intelligence, and what it means for civil society, their governments, and really the future of humanity. And I’ve been surprised, frankly, over the past year, about how many of the new national and international strategies, the new principles, and so forth are actually quite aware of both the potential benefits and the real safety risks associated with AI. Frankly, this time last year, I wouldn’t have thought that this many principles would have come out. There’s a lot of positive work in those principles, and a lot of serious thought about where this technology is going in the future. And so, on the whole, I think the picture is much better than what most people might expect in terms of the level of high-level thinking that’s going on in policy-making about AI, its benefits, and its risks going forward. And so on that score, I’m quite hopeful that there are a lot of positive soft norms to work from. And hopefully we can work to implement those ideas and concepts going forward in real policy.

Lucas Perry: Awesome.

Emilia Javorsky: I am optimistic, and it comes from having had a lot of these conversations, specifically this past year, on lethal autonomous weapons, and speaking with people from a range of views and being able to sit down, coming together, having a rational and respectful discussion, and identifying actionable areas of consensus. That has been something that has been very heartening for me, because there is just so much positive potential for humanity waiting on the science and technology shelves of today, nevermind what’s in the pipeline that’s coming up. And I think that despite all of this tribalism and hyperbole that we’re bombarded with in the media every day, there are ways to work together as a society, and as a global community, and just with each other to make sure that we realize all that positive potential, and I think that sometimes gets lost. I’m optimistic that we can make that happen and that we can find a path forward on restoring that kind of rational discourse and working together.

Tucker Davey: I think my main reasons for existential hope in 2020 and beyond are, first of all, seeing how many more people are getting involved in AI safety, in effective altruism, and existential risk mitigation. It’s really great to see the community growing, and I think just by having more people involved, that’s a huge step. As a broader existential hope, I am very interested in thinking about how we can better coordinate to collectively solve a lot of our civilizational problems, and to that end, I’m interested in ways where we can better communicate about our shared goals on certain issues, ways that we can more credibly commit to action on certain things. So these ideas of credible commitment mechanisms, whether that’s using advanced technology like blockchain or whether that’s just smarter ways to get people to commit to certain actions, I think there’s a lot of existential hope for bigger groups in society coming together and collectively coordinating to make systemic change happen.

I see a lot of potential for society to organize mass movements to address some of the biggest risks that we face. For example, I think it was last year that an AI researcher we’ve worked with, Toby Walsh, organized a boycott against a South Korean company that was working to develop these autonomous weapons. Within a day or two, I think, he contacted a bunch of AI researchers and they signed a pledge to boycott the group until it decided to ditch the project. And the boycott succeeded basically within two days. I think that’s one good example of the power of boycotts, and of the power of coordination and cooperation to address our shared goals. So if we can learn lessons from Toby Walsh’s boycott, as well as from the fossil fuel and nuclear divestment movements, I think we can start to realize some of our potential to push these big industries in more beneficial directions.

So whether it’s the fossil fuel industry, the nuclear weapons industry, or the AI industry, as a collective, we have a lot of power to use stigma to push these companies in better directions. No company or industry wants bad press. And if we get a bunch of researchers together to agree that a company’s doing some sort of bad practice, and then we can credibly say that, “Look, you guys will get bad press if you guys don’t change your strategy,” many of these companies might start to change their strategy. And I think if we can better coordinate and organize certain movements and boycotts to get different companies and industries to change their practices, that’s a huge source of existential hope moving forward.

Lucas Perry: Yeah. I mean, it seems like the point that you’re trying to articulate is that there are particular instances like this thing that happened with Toby Walsh that show you the efficacy of collective action around our issues.

Tucker Davey: Yeah. I think there’s a lot more agreement on certain shared goals, such as: we don’t want banks investing in fossil fuels, or we don’t want AI companies developing weapons that can make targeted kill decisions without human intervention. And if we take some of these broad shared goals and then develop some sort of plan to pressure these companies to change their ways or to adopt better safety measures, I think these sorts of collective action can be very effective. And I think as a broader community, especially with more people in the community, we have much more of a possibility to make this happen.

So I think I see a lot of existential hope from these collective movements to push industries in more beneficial directions, because they can really help us, as individuals, feel more of a sense of agency that we can actually do something to address these risks.

Kirsten Gronlund: I feel like there’s actually been a pretty marked difference in the way that people are reacting to… at least things like climate change, and I sort of feel like more generally, there’s sort of more awareness just of the precariousness of humanity, and the fact that our continued existence and success on this planet is not a given, and we have to actually work to make sure that those things happen. Which is scary, and kind of exhausting, but I think is ultimately a really good thing, the fact that people seem to be realizing that this is a moment where we actually have to act and we have to get our shit together. We have to work together and this isn’t about politics, this isn’t about, I mean it shouldn’t be about money. I think people are starting to figure that out, and it feels like that has really become more pronounced as of late. I think especially younger generations, like obviously there’s Greta Thunberg and the youth movement on these issues. It seems like the people who are growing up now are so much more aware of things than I certainly was at that age, and that’s been cool to see, I think. They’re better than we were, and hopefully things in general are getting better.

Lucas Perry: Awesome.

Ian Rusconi: I think it’s often easier for a lot of us to feel hopeless than it is to feel hopeful. Most of the news that we get comes in the form of warnings, or the existing problems, or the latest catastrophe, and it can be hard to find a sense of agency as an individual when talking about huge global issues like lethal autonomous weapons, or climate change, or runaway AI.

People frame little issues that add up to bigger ones as things like death by 1,000 bee stings, or the straw that broke the camel’s back, and things like that, but that concept works both ways. 1,000 individual steps in a positive direction can change things for the better. And working on these podcasts has shown me the number of people taking those steps. People working on AI safety, international weapons bans, climate change mitigation efforts. There are whole fields of work, absolutely critical work, that so many people, I think, probably know nothing about. Certainly that I knew nothing about. And sometimes, knowing that there are people pulling for us, that’s all we need to be hopeful. 

And beyond that, once you know that work exists and that people are doing it, nothing is stopping you from getting informed and helping to make a difference. 

Kirsten Gronlund: I had a conversation with somebody recently who is super interested in these issues, but was feeling like they just didn’t have particularly relevant knowledge or skills. And what I would say is “neither did I when I started working for FLI,” or at least I didn’t know a lot about these specific issues. But really anyone, if you care about these things, you can bring whatever skills you have to the table, because we need all the help we can get. So don’t be intimidated, and get involved.

Ian Rusconi: I guess I think that’s one of my goals for the podcast, is that it inspires people to do better, which I think it does. And that sort of thing gives me hope.

Lucas Perry: That’s great. I feel happy to hear that, in general.

Max Tegmark: Let me first give a more practical reason for hope, and then get a little philosophical. So on the practical side, there are a lot of really good ideas that the AI community is quite unanimous about, in terms of policy and things that need to happen, that basically aren’t happening because policy makers and political leaders don’t get it yet. And I’m optimistic that we can get a lot of that stuff implemented, even though policy makers won’t pay attention now. If we get AI researchers around the world to formulate and articulate really concrete proposals and plans for policies that should be enacted, and they get totally ignored for a while? That’s fine, because eventually some bad stuff is going to happen because people weren’t listening to their advice. And whenever those bad things do happen, then leaders will be forced to listen because people will be going, “Wait, what are you going to do about this?” And if at that point, there are broad international consensus plans worked out by experts about what should be done, that’s when they actually get implemented. So the hopeful message I have to anyone working in AI policy is: don’t despair if you’re being ignored right now, keep doing all the good work and flesh out the solutions, and start building consensus for it among the experts, and there will be a time people will listen to you. 

To just end on a more philosophical note, again, I think it’s really inspiring to think how much impact intelligence has had on life so far. We realize that we’ve already completely transformed our planet with intelligence. If we can use artificial intelligence to amplify our intelligence, it will empower us to solve all the problems that we’ve been stumped by thus far, including curing all the diseases that kill our near and dear today. And for those so minded, it can even help life spread into the cosmos. Not even the sky is the limit, and the decisions about how this is going to go are going to be made within the coming decades, so within the lifetime of most people who are listening to this. There’s never been a more exciting moment to think about grand, positive visions for the future. That’s why I’m so honored and excited to get to work with the Future of Life Institute.

Anthony Aguirre: Just like disasters, I think big positive changes can arise with relatively little warning and then seem inevitable in retrospect. I really believe that people are actually wanting and yearning for a society and a future that gives them fulfillment and meaning, and that functions and works for people.

There’s a lot of talk in the AI circles about how to define intelligence, and defining intelligence as the ability to achieve one’s goals. And I do kind of believe that for all its faults, humanity is relatively intelligent as a whole. We can be kind of foolish, but I think we’re not totally incompetent at getting what we are yearning for, and what we are yearning for is a kind of just and supportive and beneficial society that we can exist in. Although there are all these ways in which the dynamics of things that we’ve set up are going awry in all kinds of ways, and people’s own self-interest fighting it out with the self-interest of others is making things go terribly wrong, I do nonetheless see lots of people who are putting interesting, passionate effort forward toward making a better society. I don’t know that that’s going to turn out to be the force that prevails, I just hope that it is, and I think it’s not time to despair.

There’s a little bit of a selection effect in the people that you encounter through something like the Future of Life Institute, but there are a lot of people out there who genuinely are trying to work toward a vision of some better future, and that’s inspiring to see. It’s easy to focus on the differences in goals, because it seems like different factions of people want totally different things. But I think that belies the fact that there are lots of commonalities that we just kind of take for granted, and accept, and brush under the rug. Putting more focus on those and focusing the effort on, “given that we can all agree that we want these things, let’s have an actual discussion about what is the best way to get those things,” that’s something that there’s sort of an answer to, in the sense that we might disagree on what our preferences are, but once we have the set of preferences we agree on, there’s kind of a correct, or more correct, set of answers to how to get those preferences satisfied. We actually are probably getting better, and we can get better; this is an intellectual problem in some sense, and a technical problem that we can solve. There’s plenty of room for progress that we can all get behind.

Again, strong selection effect. But when I think about the people that I interact with regularly through the Future of Life Institute and other organizations that I work as a part of, they’re almost universally highly-effective, intelligent, careful-thinking, well-informed, helpful, easy to get along with, cooperative people. And it’s not impossible to create or imagine a society where that’s just a lot more widespread, right? It’s really enjoyable. There’s no reason that the world can’t be more or less dominated by such people.

As economic opportunity grows and education grows and everything, there’s no reason to see that that can’t grow also, in the same way that non-violence has grown. It used to be a part of everyday life for pretty much everybody, now many people I know go through many years without having any violence perpetrated on them or vice versa. We still live in a sort of overall, somewhat violent society, but nothing like what it used to be. And that’s largely because of the creation of wealth and institutions and all these things that make it unnecessary and impossible to have that as part of everybody’s everyday life.

And there’s no reason that can’t happen in most other domains, I think it is happening. I think almost anything is possible. It’s amazing how far we’ve come, and I see no reason to think that there’s some hard limit on how far we go.

Lucas Perry: So I’m hopeful for the new year simply because, in the areas that are important, I think things are on average getting better more than they are getting worse. And it seems to be that much of what causes pessimism is the perception that things are getting worse, or these strange nostalgias for past times that we believe to be better than the present moment.

This isn’t new thinking, and it’s much in line with what Steven Pinker has said, but I feel that when we look at the facts about things like poverty, or knowledge, or global health, or education, or even the conversation surrounding AI alignment and existential risk, things really are getting better. The extent to which it seems like they aren’t, or that things are getting worse, can in many cases be traced to our trend towards more information creating the perception that things are getting worse. Really, we are shining a light on everything that is already bad, or we are coming up with new solutions to problems which generate new problems in and of themselves. And this trend of elucidating all of the problems which already exist, or of developing technologies and new solutions which generate their own novel problems, can seem scary as all of these bad things continue to come up; it seems almost never ending.

But now they seem to me more like revealed opportunities for the growth and evolution of human civilization to new heights. We are clearly not at the pinnacle of life or existence or wellbeing, so as we encounter and generate and uncover more and more issues, I find hope in the fact that we can rest assured that we are actively engaged in the process of self-growth as a species. Without encountering new problems about ourselves, we are surely stagnating and risk decline. However, it seems that as we continue to find suffering and confusion and evil in the world, and to notice how our new technologies and skills may contribute to these things, we have an opportunity to act upon remedying them, and then we can know that we are still growing and that that is a good thing. And so I think that there’s hope in the fact that we continue to encounter new problems, because it means that we continue to grow better. And that seems like a clearly good thing to me.

And with that, thanks so much for tuning into this Year In The Review Podcast on our activities and team as well as our feelings about existential hope moving forward. If you’re a regular listener, we want to share our deepest thanks for being a part of this conversation and thinking about these most fascinating and important of topics. And if you’re a new listener, we hope that you’ll continue to join us in our conversations about how to solve the world’s most pressing problems around existential risks and building a beautiful future for all. Many well and warm wishes for a happy and healthy end of the year for everyone listening from the Future of Life Institute team. If you find this podcast interesting, valuable, unique, or positive, consider sharing it with friends and following us on your preferred listening platform. You can find links for that on the pages for these podcasts found at futureoflife.org.