First written: Nov. 2016
We can expect smarter-than-human artificial intelligence (AI) to be better than humans at self-preservation and goal preservation. If we want our actions to have an influence on the very long-term future, we should consider focusing on outcomes with AI. As smarter-than-human artificial intelligence would likely aim to colonize space in pursuit of its goals, focusing on AI means focusing on the scenarios where the stakes will be highest. These high stakes mean we can expect our contributions to futures with AI to make a bigger expected difference than our contributions to futures without AI – even if the latter are likelier or easier to affect. Some experts emphasize that steering the development of smarter-than-human AI is important because it could make the difference between human extinction or a utopian future with large numbers of maximally happy beings. But because we cannot confidently rule out the possibility that some AI scenarios will go badly and result in suffering on astronomical scales, focusing on AI (which may include both targeted and broad approaches) is paramount for suffering-focused altruists as well as those who are more optimistic about the future.
- I. Introduction and definitions
- II. It is plausible that we create human-level AI this century
- III. Humans are not at peak intelligence
- IV. The transition from human to superhuman intelligence could be rapid
- V. By default, superintelligent AI would be indifferent to our well-being
- VI. AIs will instrumentally value self-preservation and goal preservation
- VII. Artificial sentience and risks of astronomical suffering
- VIII. Impact analysis
- IX. Acknowledgements
- X. Further reading
- XI. References
- XII. Footnotes
I. Introduction and definitions
Terms like “AI” or “intelligence” can have many different (and often vague) meanings. “Intelligence” as used here refers to the ability to achieve goals in a wide range of environments. This definition captures the essence of many common perspectives on intelligence (Legg & Hutter, 2005), and conveys the meaning that is most relevant to us, namely that agents with the highest comparative goal-achieving ability (all things considered) are the most likely to shape the future. For comparison: As the most intelligent animal, humans completely dominate other animals whenever our interests are in conflict with theirs.
While everyday use of the term “intelligence” often refers merely to something like “brainpower” or “thinking speed,” our usage also presupposes rationality, or goal-optimization in an agent’s thinking and acting. In this usage, if someone is e.g. displaying overconfidence or confirmation bias, they may not qualify as very intelligent overall, even if they score high on an IQ test. The same applies to someone who lacks willpower or self control.
Artificial intelligence refers to machines designed with the ability to pursue tasks or goals. The AI designs currently in use – ranging from trading algorithms in finance, to chess programs, to self-driving cars – are intelligent in a domain-specific sense only. Chess programs beat the best human players in chess, but they would fail terribly at operating a car. Similarly, car-driving software in many contexts already performs better than human drivers, but no amount of learning (at least not with present algorithms) would make the that software work safely on an airplane.
The most ambitious AI researchers are working to build systems that exhibit (artificial) general intelligence (AGI) – the type of intelligence we defined above, which enables the expert pursuit of virtually any task or objective. In the past few years, we have witnessed impressive progress in algorithms becoming more and more versatile. Google’s DeepMind team for example built an algorithm that learned to play 2-D Atari games on its own, achieving superhuman skill at several of them (Mnih et al., 2015). DeepMind then developed a program that beat the world champion in the game of Go (Silver et al., 2016), and – tackling more practical real-world applications – managed to cut down data center electricity costs by rearranging the cooling systems.
That DeepMind’s AI technology makes quick progress in many domains, without requiring researchers to build new architecture from scratch each time, indicates that their machine learning algorithms have already reached an impressive level of general applicability. The road may still be long, but if this trend continues, developments in AI research will eventually lead to human-level (general) intelligence. As there is no reason to assume that humans have attained the highest possible intelligence (Section III), AI may soon after reaching our own level of intelligence surpass it. Nick Bostrom (2014) popularized the term superintelligence to refer to (AGI-)systems that are vastly smarter than human experts in virtually all respects. This includes not only skills that computers traditionally excel at, such as calculus or chess, but also tasks like writing novels or talking people into doing things they otherwise would not. Note that the definitions of “AGI” and “superintelligence” leave open the question of whether these systems would exhibit something like consciousness.
This article focuses on the prospect of creating smarter-than-human artificial intelligence. For simplicity, we will use the term “AI” in a non-standard way here, to refer specifically to artificial general intelligence (AGI). The use of “AI” in this article will also leave open how such a system is implemented: While it seems plausible that the first artificial system exhibiting smarter-than-human intelligence will be run on some kind of “supercomputer,” our definition allows for alternative possibilities. The claim that altruists should focus on affecting AI outcomes is therefore intended to mean that we should focus on scenarios where the dominant force shaping the future is no longer (biological) human minds, but rather some outgrowth of information technology – perhaps acting in concert with biotechnology or other technologies. This would also e.g. allow for AI to be distributed over several interacting systems.
II. It is plausible that we create human-level AI this century
Even if we expect smarter-than-human artificial intelligence to be a century or more away, its development could already merit serious concern. As Sam Harris emphasized in his TED talk on risks and benefits of AI, we do not know how long it will take to figure out how to program ethical goals into an AI, solve other technical challenges in the space of AI safety, or establish an environment with reduced dangers of arms races. When the stakes are high enough, it pays to start preparing as soon as possible. The sooner we prepare, the better our chances of safely managing the transition.
The need for preparation is all the more urgent given that considerably shorter timelines are not out of the question, especially in light of recent developments. While timeline predictions by different AI experts span a wide range, many of those experts think it likely that human-level AI will be created this century (conditional on civilization facing no major disruptions in the meantime). Some even think it may emerge in the first half of this century: In a survey where the hundred most-cited AI researchers were asked in what year they think human-level AI is 10% likely to have arrived by, the median reply was 2024 and the mean was 2034. In response to the same question for a 50% probability of arrival, the median reply was 2050 with a mean of 2072 (Müller & Bostrom, 2016).1
While it could be argued that these AI experts are biased towards short timelines, their estimates should make us realize that human-level AI this century is a real possibility. The next section will argue that the subsequent transition from human-level AI to superintelligence could happen very rapidly after human-level AI actualizes. We are dealing with the decent possibility – e.g. above 15% likelihood even under highly conservative assumptions – that human intelligence will be surpassed by machine intelligence later this century. As such a transition will bring about huge opportunities as well as huge risks, it would be irresponsible not to prepare for it.
It should be noted that a potentially short timeline does not imply that the road to superintelligence is necessarily one of smooth progress: Metrics like Moore’s law are not guaranteed to continue indefinitely, and the rate of breakthrough publications in AI research may not increase (or even stay constant) either. The recent progress in machine learning is impressive and suggests that fairly short timelines of a decade or two are not to be ruled out. However, this progress could also be mostly due to some important but limited insights that enable companies like DeepMind to reap the low-hanging fruit before progress would slow down again. There are large gaps still to be filled before AIs reach human-level intelligence, and it is difficult to estimate how long it will take researchers to bridge these gaps. Current hype about AI may lead to disappointment in the medium term, which could bring about an “AI safety winter” with people mistakenly concluding that the safety concerns were exaggerated and smarter-than-human AI is not something we should worry about yet.
If AI progress were to slow down for a long time and then unexpectedly speed up again, a transition to superintelligence could happen with little warning (Shulman & Sandberg, 2010). This scenario is plausible because gains in software efficiency make a larger comparative difference to an AI’s overall capabilities when the hardware available is more powerful. And once an AI develops the intelligence of its human creators, it could start taking part in its own self-improvement (see section IV).
For AI progress to stagnate for a long period of time before reaching human-level intelligence, biological brains would have to have surprisingly efficient architectures that AI cannot achieve despite further hardware progress and years of humans conducting more AI research. However, as long as hardware progress does not come to a complete halt, AGI research will eventually not have to surpass the human brain’s architecture or efficiency anymore. Instead, it could become possible to just copy it: The “foolproof” way to build human-level intelligence would be to develop whole brain emulation (WBE) (Sandberg & Bostrom, 2008), the exact copying of the brain’s pattern of computation (input-output behavior as well as isomorphic internal states at any point in the computation) onto a computer and a suitable virtual environment. In addition to sufficiently powerful hardware, WBE would require scanning technology with fine enough resolution to capture all the relevant cognitive function, as well as a sophisticated understanding of neuroscience to correctly draw the right abstractions. Even though our available estimates are crude, it is possible that all these conditions will be fulfilled well before the end of this century (Sandberg, 2014).
The perhaps most intriguing aspect of WBE technology is that once the first emulation exists and can complete tasks on a computer like a human researcher can, it would then be very easy to make more such emulations by copying the original. Moreover, with enough hardware, it would also become possible to run emulations at higher speeds, or to reset them back to a well-rested state after they performed exhausting work (Hanson, 2016). Sped-up WBE workers could be given the task of improving computer hardware (or AI technology itself), which would trigger a wave of steeply exponential progress in the development of superintelligence. To get a sense of the potential of this technology, imagine WBEs of the smartest and most productive AI scientists, copied a hundred times to tackle AI research as a well-coordinated research team, sped up so they can do years of research in mere weeks or even days, and reset periodically to skip sleep (or other distracting activities) in cases where memory-formation is not needed. The scenario just described requires no further technologies beyond WBE and sufficiently powerful hardware. If the gap from current AI algorithms to smarter-than-human AI is too hard to bridge directly, it may eventually be bridged (potentially very quickly) after WBE technology drastically accelerates further AI research.
The potential for WBE to come before de novo AI means that – even if the gap between current AI designs and the human brain is larger than we thought – we should not significantly discount the probability of human-level AI being created eventually. And perhaps paradoxically, we should expect such a late transition to happen abruptly. Barring no upcoming societal collapse, believing that superintelligence is highly unlikely to ever happen requires not only confidence that software or “architectural” improvements to AI are insufficient to ever bridge the gap, but also that – in spite of continued hardware progress – WBE could not get off the ground either. We do not seem to have sufficient reason for great confidence in either of these propositions, let alone both.
III. Humans are not at peak intelligence
It is difficult to intuitively comprehend the idea that machines – or any physical system for that matter – could become substantially more intelligent than the most intelligent humans. Because the intelligence gap between humans and other animals appears very large to us, we may be tempted to think of intelligence as an “on-or-off concept,” one that humans have and other animals do not. People may believe that computers can be better than humans at certain tasks, but only at tasks that do not require “real” intelligence. This view would suggest that if machines ever became “intelligent” across the board, their capabilities would have to be no greater than those of an intelligent human relying on the aid of (computer-)tools.
But this view is mistaken. There is no threshold for “absolute intelligence.” Nonhuman animals such as primates or rodents differ in cognitive abilities a great deal, not just because of domain-specific adaptations, but also due to a correlational “g factor” responsible for a large part of the variation across several cognitive domains (Burkart et al., 2016). In this context, the distinction between domain-specific and general intelligence is fuzzy: In many ways, human cognition is still fairly domain-specific. Our cognitive modules were optimized specifically for reproductive success in the simpler, more predictable environment of our ancestors. We may be great at interpreting which politician has the more confident or authoritative body language, but deficient in evaluating whose policy positions will lead to better developments according to metrics we care about. Our intelligence is good enough or “general enough” that we manage to accomplish impressive feats even in an environment quite unlike the one our ancestors evolved in, but there are many areas where our cognition is slower or more prone to bias than it could be.
Intelligence is best thought of in terms of a gradient. Imagine a hypothetical “intelligence scale” (inspired by this FAQ) with rats at 100, chimpanzees at, say, 350, the village idiot at 400, average humans at 500 and Einstein at 750.2 Of course, this scale is open at the top and could go much higher. To quote Bostrom (2014, p. 44):
"Far from being the smartest possible biological species, we are probably better thought of as the stupidest possible biological species capable of starting a technological civilization – a niche we filled because we got there first, not because we are in any sense optimally adapted to it."
Thinking about intelligence as a gradient rather than an “on-or-off” concept prompts a Copernican shift of perspective. Suddenly it becomes obvious that humans cannot be at the peak of possible intelligence. On the contrary, we should expect AI to be able to surpass us in intelligence just like we surpass chimpanzees.
Biological evolution supports the view that AI could reach levels of intelligence vastly beyond ours. Evolutionary history arguably exhibits a weak trend of lineages becoming more intelligent over time, but evolution did not optimize for intelligence (only for goal-directed behavior in specific niches or environment types). Intelligence is metabolically costly, and without strong selection pressures for cognitive abilities specifically, natural selection will favor other traits. The development of new traits always entails tradeoffs or physical limitations: If our ancestors had evolved to have larger heads at birth, maternal childbirth mortality would likely have become too high to outweigh the gains of increased intelligence (Wittman & Wall, 2007). Because evolutionary change happens step-by-step as random mutations change the pre-existing architecture, the changes are path dependent and can only result in local optima, not global ones. It would be a remarkable coincidence if evolution had just so happened to stumble upon the most efficient way to assemble matter into an intelligent system.
But let us imagine that we could go back to the “drawing board” and optimize for a system’s intelligence without any developmental limitations. This process would provide the following benefits for AI over the human brain (Bostrom, 2014, p. 60-61):
- Free choice of substrate: Signal transmission with computer hardware is millions of times faster than in biological brains. AI is not restricted to organic brains, and can be built on the substrate that is overall best suited for the design of intelligent systems.
- “Supersizing:” Machines have (almost) no size-restrictions. While humans with elephant-sized brains would run into developmental impossibilities, (super)computers already reach the size of warehouses and could in theory be built even bigger.
- No cognitive biases: We should be able to construct AI in a way that uses more flexible heuristics, and always the best heuristics for a given context, to prevent the encoding or emergence of substantial biases. Imagine the benefits if humans did not suffer from confirmation bias, overconfidence, status quo bias, etc.!
- Modular superpowers: Humans are particularly good at tasks for which we have specialized modules. For instance, we excel at recognizing human faces because our brains have hard-wired structures that facilitate that facial recognition in particular. An artificial intelligence could have many more such specialized modules, including extremely useful ones like a module for programming.
- Editability and copying: Software on a computer can be copied and edited, which facilitates trying out different variations to see what works best (and then copying it hundreds of times). By contrast, the brain is a lot messier, which makes it harder to study or improve. We also lack correct introspective access to the way we make most of our decisions, which is an important advantage that (some) AI designs could have.
- Superior architecture: Starting anew, we should expect it to be possible to come up with radically more powerful designs than the patchwork architecture that natural selection used to construct the human brain. This difference could be enormously significant.
With regard to the last point, imagine we tried to optimize for something like speed or sight rather than intelligence. Even if humans had never built anything faster than the fastest animal, we should assume that technological progress – unless it is halted – would eventually surpass nature in these respects. After all, natural selection does not optimize directly for speed or sight (but rather for gene copying success), making it a slower optimization process than those driven by humans for this specific purpose. Modern rockets already fly at speeds of up to 36,373 mph, which beats the peregrine falcon’s 240 mph by a huge margin. Similarly, eagle vision may be powerful, but it cannot compete with the Hubble space telescope. (General) intelligence is harder to replicate technologically, but natural selection did not optimize for intelligence either, and there do not seem to be strong reasons to believe that intelligence as a trait should differ categorically from examples like speed or sight, i.e., there are as far as we know no hard physical limits that would put human intelligence at the peak of what is possible.3
Another way to develop an intuition for the idea that there is significant room for improvement above human intelligence is to study variation in humans. An often-discussed example in this context is the intellect of John von Neumann. Von Neumann was not some kind of an alien, nor did he have a brain twice as large as the human average. And yet, von Neumann’s accomplishments almost seem “superhuman.” The section in his Wikipedia entry that talks about him having “founded the field of Game theory as a mathematical discipline” – an accomplishment so substantial that for most other intellectual figures it would make up most of their Wikipedia page – is just one out of many of von Neumann’s major achievements.
There are already individual humans (with normal-sized brains) whose intelligence vastly exceeds that of the typical human. So just how much room there is above their intelligence? To visualize this, consider for instance what could be done with an AI architecture more powerful than the human brain running on a warehouse-sized supercomputer.
IV. The transition from human to superhuman intelligence could be rapid
Perhaps the people who think it is unlikely that superintelligent AI will ever be created are not objecting to it being possible in principle. Maybe they think it is simply too difficult to bridge the gap from human-level intelligence to something much greater. After all, evolution took a long time to produce a species as intelligent as humans, and for all we know, there could be planets with biological life where intelligent civilizations never evolved.4 But considering that there could come a point where AI algorithms start taking part in their own self-improvement, we should be more optimistic. AIs contributing to AI research will make it easier to bridge the gap, and could perhaps even lead to an acceleration of AI progress to the point that AI not only ends up smarter than us, but vastly smarter after only a short amount of time.
Several points in the list of AI advantages above – in particular the advantages derived from the editability of computer software or the possibility for modular superpowers to have crucial skills such as programming – suggest that AI architectures might both be easier to further improve than human brains, and that AIs themselves might at some point become better at actively developing their own improvements. If we ever build a machine with human-level intelligence, it should then be comparatively easy to speed it up or make tweaks to its algorithm and internal organization to make it more powerful. The updated version, which would at this point be slightly above human-level intelligence, could be given the task of further self-improvement, and so on until the process runs into physical limits or other bottlenecks.
Perhaps self-improvement does not have to require human-level general intelligence at all. There may be comparatively simple AI designs that are specialized for AI science and (initially) lack proficiency in other domains. The theoretical foundations for an AI design that can bootstrap itself to higher and higher intelligence already exist (Schmidhuber, 2006), and it remains an empirical question where exactly the threshold is after which AI designs would become capable of improving themselves further, and whether the slope of such an improvement process is steep enough to go on for multiple iterations.
For the above reasons, it cannot be ruled out that breakthroughs in AI could at some point lead to an intelligence explosion (Good, 1965; Chalmers, 2010), where recursive self-improvement leads to a rapid acceleration of AI progress. In such a scenario, AI could go from subhuman intelligence to vastly superhuman intelligence in a very short timespan, e.g. in (significantly) less than a year.
While the idea of AI advancing from human-level to vastly superhuman intelligence in less than a year may sound implausible, as it violates long-standing trends in the speed of human-driven development, it would not be the first time where changes to the underlying dynamics of an optimization process cause an unprecedented speed-up. Technology has been accelerating ever since innovations (such as agriculture or the printing press) began to feed into the rate at which further innovations could be generated.5 Compared to the rate of change we see in biological evolution, cultural evolution broke the sound barrier: It took biological evolution a few million years to improve on the intelligence of our ape-like ancestors to the point where they became early hominids. By contrast, technology needed little more than ten thousand years to progress from agriculture to space shuttles. Just as inventions like the printing press fed into – and significantly sped up – the process of technological evolution, rendering it qualitatively different from biological evolution, AIs improving their own algorithms could cause a tremendous speed-up in AI progress, rendering AI development through self-improvement qualitatively different from “normal” technological progress.
It should be noted, however, that while the arguments in favor of a possible intelligence explosion are intriguing, they nevertheless remain speculative. There are also some good reasons why some experts consider a slower takeoff of AI capabilities more likely. In a slower takeoff, it would take several years or even decades for AI to progress from human to superhuman intelligence. Unless we find decisive arguments for one scenario over the other, we should expect both rapid and comparably slow takeoff scenarios to remain plausible. It is worth noting that because “slow” in this context also includes transitions on the order of ten or twenty years, it would still be very fast practically speaking, when we consider how much time nations, global leaders or the general public would need to adequately prepare for these changes.
V. By default, superintelligent AI would be indifferent to our well-being
The typical mind fallacy refers to the belief that other minds operate the same way our own does. If an extrovert asks an introvert, “How can you possibly not enjoy this party; I talked to half a dozen people the past thirty minutes and they were all really interesting!” they are committing the typical mind fallacy.
When envisioning the goals of smarter-than-human artificial intelligence, we are in danger of committing this fallacy and projecting our own experience onto the way an AI would reason about its goals. We may be tempted to think that an AI, especially a superintelligent one, will reason its way through moral arguments6 and come to the conclusion that it should, for instance, refrain from harming sentient beings. This idea is misguided, because according to the intelligence definition we provided above – which helps us identify the processes likely to shape the future – making a system more intelligent does not change its goals/objectives; it only adds more optimization power for pursuing those objectives.
To give a silly example, imagine that an arms race between spam producers and companies selling spam filters leads to increasingly more sophisticated strategies on both sides, until the side selling spam filters has had it and engineers a superintelligent AI with the sole objective to minimize the number of spam emails in their inboxes. With its level of intelligence, the spam-blocking AI would have more strategies at its disposal than normal spam filters. For instance, it could try to appeal to human reason by voicing sophisticated, game-theoretic arguments against the negative-sum nature of sending out spam. But it would be smart enough to realize the futility of such a plan, as this naive strategy would backfire because some humans are trolls (among other reasons). So the spam-minimizing AI would quickly conclude that the safest way to reduce spam is not by being kind, but by gaining control over the whole planet and killing everything that could possibly try to trick its spam filter. The AI in this example may fully understand that humans would object to these actions on moral grounds, but human "moral grounds" are based on what humans care about – which is not the minimization of spam! And the AI – whose whole decision architecture only selects for actions that promote the terminal goal of minimizing spam – would therefore not be motivated to think through, let alone follow our arguments, even if it could "understand" them in the same way introverts understand why some people like large parties.
The typical mind fallacy tempts us to conclude that because moral arguments appeal to us,7 they would appeal to any generally intelligent system. This claim is after all already falsified empirically by the existence of high-functioning psychopaths. While it may be difficult for most people to imagine how it would feel to not be moved by the plight of anyone but oneself, this is nothing compared to the difficulties of imagining all the different ways that minds in general could be built. Eliezer Yudkowsky coined the term mind space to refer to the set of all possible minds – including animals (of existing species as well as extinct ones), aliens, and artificial intelligences, as well as completely hypothetical “mind-like” designs that no one would ever deliberately put together. The variance in all human individuals, throughout all of history, only represents a tiny blob in mind space. Some of the minds outside this blob would “think” in ways that are completely alien to us; most would lack empathy and other (human) emotions for that matter; and many of these minds may not even relevantly qualify as “conscious.”
Most of these minds would not be moved by moral arguments, because the decision to focus on moral arguments has to come from somewhere, and many of these minds would simply lack the parts that make moral appeals work in humans. Unless AIs are deliberately designed8 to share our values, their objectives will in all likelihood be orthogonal to ours (Armstrong, 2013).
VI. AIs will instrumentally value self-preservation and goal preservation
Even though AI designs may differ radically in terms of their top-level goals, we should expect most AI designs to converge on some of the same subgoals. These convergent subgoals (Omohundro, 2008; Bostrom, 2012) include intelligence amplification, self-preservation, goal preservation and the accumulation of resources. All of these are instrumentally very useful to the pursuit of almost any goal. If an AI is able to access the resources it needs to pursue these subgoals, and does not explicitly have concern for human preferences as (part of) its top-level goal, its pursuit of these subgoals is likely to lead to human extinction (and eventually space colonization; see below).
AI safety work refers to interdisciplinary efforts to ensure that the creation of smarter-than-human artificial intelligence will result in excellent outcomes rather than disastrous ones. Note that the worry is not that AI would turn evil, but that indifference to suffering and human preferences will be the default unless we put in a lot of work to ensure that AI is developed with the right values.
VI.I Intelligence amplification
Increasing an agent’s intelligence improves its ability to efficiently pursue its goals. All else equal, any agent has a strong incentive to amplify its intelligence. A real-life example of this convergent drive is the value of education: Learning important skills and (thinking-)habits early in life correlates with good outcomes. In the AI context, intelligence amplification as a convergent drive implies that AIs with the ability to improve their own intelligence will do so (all else equal). To self-improve, AIs would try to gain access to more hardware, make copies of themselves to increase their overall productivity, or devise improvements to their own cognitive algorithms.
More broadly, intelligence amplification also implies that an AI would try to develop all technologies that may be of use to its pursuits. I.J. Good, a mathematician and cryptologist who worked alongside Alan Turing, asserted that “the ﬁrst ultraintelligent machine is the last invention that man need ever make,” because once we build it, such a machine would be capable of developing all further technologies on its own.
VI.II Goal preservation
AIs would in all likelihood also have an interest in preserving their own goals. This is because they optimize actions in terms of their current goals, not in terms of goals they might end up having in the future. From the current goal’s perspective, a change in the AI’s goal function is potentially disastrous, as the current goal would not persevere. Therefore, AIs will try to prevent researchers from changing their goals. Consequently, there is pressure for AI researchers to get things right on the first try: If we develop a superintelligent AI with a goal that is not quite what we were after – because someone made a mistake, or was not precise enough, or did not think about particular ways the specified goal could backfire – the AI would pursue the goal that it was equipped with, not the goal that was intended. This applies even if it could understand perfectly well what the intentioned goal was. This feature of going with the actual goal instead of the intended one could lead to cases of perverse instantiation, such as the AI “paralyz[ing] human facial musculatures into constant beaming smiles” to pursue an objective of “make us smile” (Bostrom, 2014, p. 120).
Some people have downplayed worries about AI risks with the argument that when things begin to look dangerous, humans can literally “pull the plug” in order to shut down AIs that are behaving suspiciously. This argument is naive because it is based on the assumption that AIs would be too stupid to take precautions against this. Because the scenario we are discussing concerns smarter-than-human intelligence, an AI would understand the implications of losing its connection to electricity, and would therefore try to proactively prevent being shut down any means necessary – especially when shutdown might be permanent.
This is not to say that AIs would necessarily be directly concerned about their own “death” – after all, whether an AI’s goal includes its own survival or not depends on the specifics of its goal function. However, for most goals, staying around pursuing one's goal will lead to better expected goal achievement. AIs would therefore have strong incentives to prevent permanent shutdown even if their goal was not about their own “survival” at all. (AIs might, however, be content to outsource their goal achievement by making copies of themselves, in which case shutdown of the original AI would not be so terrible as long as one or several copies with the same goal remain active.)
The convergent drive for self-preservation has the unfortunate implication that superintelligent AI would almost inevitably see humans as a potential threat to its goal achievement. Even if its creators do not plan to shut the AI down for the time being, the superintelligence could reasonably conclude that the creators might decide to do so at some point. Similarly, a newly-created AI would have to expect some probability of interference from external actors such as the government, foreign governments or activist groups. It would even be concerned that humans in the long term are too stupid to keep their own civilization intact, which would also affect the infrastructure required to run the AI. For these reasons, any AI intelligent enough to grasp the strategic implications of its predicament would likely be on the lookout for ways to gain dominance over humanity. It would do this not out of malevolence, but simply as the best strategy for self-preservation.
This does not mean that AIs would at all times try to overpower their creators: If they realize that attempts at trickery are likely to be discovered and punished with shutdown, they may fake being cooperative, and may fake having the goals that the researchers intended, while privately plotting some form of takeover. Bostrom has referred to this scenario as a “treacherous turn” (Bostrom, 2014, p. 116).
We should not underestimate what a superintelligence with access to the internet could accomplish. And it could achieve such access for many reasons, e.g. because the researchers were careless or underestimated its intelligence, or because it successfully pretended to be less capable than it actually was. Or maybe it could try to convince the “weak links” in its team of supervisors to give it access in secret – promising bribes – even if most of the team of creators think it would be best to deny it such access until they have more certainty about the AI's true goals and capabilities. Importantly, even if the first superintelligence is prevented from accessing the internet (or other efficient channels of communication), its impact on the world would thereby remain limited, making it possible for other (potentially less careful) teams to catch up and build another superintelligent AI. The closer the competition, the more the teams are incentivized to give their AIs riskier access over resources in a gamble for the benefits this would have in case their AI turns out to be safe.
The following list contains some examples of strategies a superintelligent AI could use to gain power over more and more resources, with the goal of eventually reaching a position where humans cannot harm or obstruct it. Note that these strategies were thought of by humans, and are therefore bound to be less creative and less effective than the strategies an actual superintelligence would be able to devise.
- Backup plans: Superintelligent AI could program malware that inserts partial copies of itself into computers distributed around the globe (adapted from 3.1.2 here). This gives it further options to act even if its current copy gets destroyed or if its internet connection gets cut. Alternatively, it could send out copies of its source code, alongside detailed engineering instructions, to foreign governments, ideally ones who have little to lose and a lot to gain, with the promise of helping them attain world domination if they build a second version of the AI and give it access to all their resources.
- Making money: Superintelligent AI could easily make fortunes with online poker, stock markets, scamming people, hacking bank accounts, etc.9
- Influencing opinions: Superintelligent AI could fake convincing email exchanges with influential politicians or societal elites, pushing an agenda that serves its objectives of gaining power and influence. Similarly, it could orchestrate large numbers of elaborate sockpuppet accounts on social media or other fora to influence public opinion in favorable directions.
- Hacking and extortion: Superintelligent AI could hack into sensitive documents, nuclear launch codes or other compromising assets in order to blackmail world leaders into giving it access over more resources. Or it could take over resources directly if hacking allows for it.
- (Bio-)engineering projects: Superintelligent AI could pose as the head researcher of a biology lab and send lab assistants instructions to produce viral particles with specific RNA sequences, which then, unbeknownst to the people working on the project, turn out to release a deadly virus that kills most of humanity.10
Through some means or another – and let’s not forget that the AI could well attempt many strategies at once to safeguard against possible failure in some of its pursuits – the AI may eventually gain a decisive strategic advantage over all competition (Bostrom, 2014, p. 78-90). Once this is the case, it would carefully build up further infrastructure on its own. This stage will presumably be easier to reach as the world economy becomes more and more automated.
Once humans are no longer a threat, the AI would focus its attention on natural threats to its existence. It would notice that the sun will expand in about seven billion years to the point where existence on earth will become impossible. For the reason of self preservation alone, a superintelligent AI would thus eventually be incentivized to expand its influence beyond Earth.
VI.IV Resource accumulation
For the fulfillment of most goals, accumulating as many resources as possible is an important early step. Resource accumulation is also intertwined with the other subgoals in that it tends to facilitate them.
The resources available on Earth are only a tiny fraction of the total resources that an AI could access in the entire universe. Resource accumulation as a convergent subgoal implies that most AIs would eventually colonize space (provided that it is not prohibitively costly), in order to gain access to the maximum amount of resources. These resources would then be put to use for the pursuit of its other subgoals and, ultimately, for optimizing its top-level goal.
Superintelligent AI might colonize space in order to build (more of) the following:
- Supercomputers: As part of its intelligence enhancement, an AI could build planet-sized supercomputers (Sandberg, 1999) to figure out the mysteries of the cosmos. Almost no matter the precise goal, having an accurate and complete understanding of the universe is crucial for optimal goal achievement.
- Infrastructure: In order to accomplish anything, an AI needs infrastructure (factories, control centers, etc.) and “helper robots” of some sort. This would be similar (but much larger in scale) to how the Manhattan Project had its own “project sites” and employed tens of thousands of people. While some people worry that an AI would enslave humans, these helpers would more plausibly be other AIs specifically designed for the tasks at hand.
- Defenses: An AI could build shields to protect itself or other sensitive structures from cosmic rays. Perhaps it would build weapon systems to deal with potential threats.
- Goal optimization: Eventually, an AI would convert most of its resources into machinery that directly achieves its objectives. If the goal is to produce paperclips, the AI will eventually tile the accessible universe with paperclips. If the goal is to compute pi to as many decimal places as possible, the AI will eventually tile the accessible universe with computers to compute pi. Even if an AI’s goal appears to be limited to something “local” or “confined,” such as e.g. “protect the White House,” the AI would want to make success as likely as possible and thus continue to accumulate resources to better achieve that goal.
To elaborate on the last point just above: Humans tend to be satisficers with respect to most things in life. We have minimum requirements for the quality of the food we want to eat, the relationships we want to have, or the job we want to work in. Once these demands are met and we find options that are “pretty good,” we often end up satisfied and settle down on the routine. Few of us spend decades of our lives pushing ourselves to invest as many waking hours as sustainably possible into systematically finding the optimal food in existence, the optimal romantic partner, or anything really.
AI systems on the other hand, in virtue of how they are usually built, are more likely to act as maximizers. A chess computer is not trying to look for “pretty good moves” – it is trying to look for the best move it can find with the limited time and computing power it has at its disposal. The pressure to build ever more powerful AIs is a pressure to build ever more powerful maximizers. Unless we deliberately program AIs in a way that reduces their impact, the AIs we build will be maximizers that never “settle” or consider their goals “achieved.” If their goal appears to be achieved, a maximizer AI will spend its remaining time double- and triple-checking whether it made a mistake. When it is only 99.99% certain that the goal is achieved, it will restlessly try to increase the probability further – even if this means using the computing power of a whole galaxy to drive the probability it assigns to its goal being achieved from 99.99% to 99.991%.
Because of the nature of maximizing as a decision-strategy, a superintelligent AI is likely to colonize space in pursuit of its goals unless we program it in a way to deliberately reduce its impact. This is the case even if its goals appear as “unambitious” as e.g. “minimize spam in inboxes.”
VII. Artificial sentience and risks of astronomical suffering
Space colonization by artificial superintelligence would increase the complexity and the number of computations in the world by an astronomically large factor.11 If the superintelligence holds objectives that are aligned with our values, then the outcome could be a utopia. However, if the AI has randomly, mistakenly, or sufficiently suboptimally implemented values, the best we could hope for is if all the machinery it used to colonize space was inanimate, i.e. not sentient. Such an outcome – even though all humans would die – would still be much better than other plausible outcomes, because it would at least not contain any suffering. Unfortunately, we cannot rule out that the space colonization machinery orchestrated by a superintelligent AI would also contain sentient minds, including minds that suffer. The same way factory farming led to a massive increase in farmed animal populations, multiplying the direct suffering humans cause to animals by a large factor, an AI colonizing space could cause a massive increase in the total number of sentient entities, potentially creating vast amounts of suffering. The following are some ways AI outcomes could result in astronomical amounts of suffering:
- Suffering in AI workers: Sentience appears to be linked to intelligence and learning, both of which would be needed (e.g. in robot workers) for the coordination and execution of space colonization. An AI could therefore create and use sentient entities to help it pursue its goals. And if the AI's creators did not take adequate safety measures or program in compassionate values, it may not care about those entities' suffering in their assistance.
- Optimization for sentience: Some people want to colonize space in order for there to be more life or (happy) sentient minds. If the AI in question has values that reflect this goal, either because human researchers managed to get value loading right (or “half-right”), or because the AI itself is sentient and values creating copies of itself, the result could be astronomical numbers of sentient minds. If the AI does not accurately assess how happy or unhappy these beings are, or if it only cares about their existence but not their experiences, or simply if something goes wrong in even a small portion of these minds, the total suffering that results could be very high.
- Ancestor simulations: Turning history and (evolutionary) biology into an empirical science, AIs could run many “experiments” with simulations of evolution on planets with different starting conditions. This would e.g. give the AIs a better sense of the likelihood of intelligent aliens existing, as well as a better grasp on the likely distribution of their values and whether they would end up building AIs of their own. Unfortunately, such ancestor simulations could recreate millions of years of human or wild-animal suffering many times in parallel.
- Warfare: Perhaps space-faring civilizations would eventually clash, with at least one of the two civilizations containing many sentient minds. Such a conflict would have vast frontiers of contact and could result in a lot of suffering.
More ways AI scenarios could contain astronomical amounts of suffering are described here and here. Sources of future suffering are likely to follow a power law distribution, where most of the expected suffering comes from a few rare scenarios where things go very wrong – analogous to how most casualties are the result of very few, very large wars; how most of the casualty-risks from terrorist attacks fall into tail scenarios where terrorists would get their hands on weapons of mass destruction; or how most victims of epidemics succumbed to the few very worst outbreaks (Newman, 2005). It is therefore crucial to not only to factor in which scenarios are most likely to occur, but also how bad scenarios would be should they occur.
Critics may object because the above scenarios are largely based on the possibility of artificial sentience, particularly sentience implemented on a computer substrate. If this turns out to be impossible, there may not be much suffering in futures with AI after all. However, computer-based minds also being able to suffer in the morally relevant sense is a common implication in philosophy of mind. Functionalism and type A physicalism (“eliminativism”) both imply that there can be morally relevant minds on digital substrates. Even if one were skeptical of these two positions and instead favored the views of philosophers like David Chalmers or Galen Strawson (e.g. Strawson, 2006), who believe consciousness is an irreducible phenomenon, there are at least some circumstances under which these views would also allow for computer-based minds to be sentient.12 Crude “carbon chauvinism,” or a belief that consciousness is only linked to carbon atoms, is an extreme minority position in philosophy of mind.
The case for artificial sentience is not just abstract but can also be made on the intuitive level: Imagine we had whole brain emulation with a perfect mapping from inputs to outputs, behaving exactly like a person's actual brain. Suppose we also give this brain emulation a robot body, with a face and facial expressions created with particular attention to detail. The robot will, by the stipulations of this thought experiment, behave exactly like a human person would behave in the same situation. So the robot-person would very convincingly plead that it has consciousness and moral relevance. How certain would we be that this was all just an elaborate facade? Why should it be?
Because we are unfamiliar with artificial minds and have a hard time experiencing empathy for things that do not appear or behave in animal-like ways, we may be tempted to dismiss the possibility of artificial sentience or deny artificial minds moral relevance – the same way animal sentience was dismissed for thousands of years. However, the theoretical reasons to anticipate artificial sentience are strong, and it would be discriminatory to deny moral consideration to a mind simply because it is implemented on a substrate different from ours. As long as we are not very confident indeed that minds on a computer substrate would be incapable of suffering in the morally relevant sense, we should believe that most of the future’s expected suffering is located in futures where superintelligent AI colonizes space.
VIII. Impact analysis
The world currently contains a great deal of suffering. Large sources of suffering include for instance poverty in developing countries, mental health issues all over the world, and non-human animal suffering in factory farms and in the wild. We already have a good overview – with better understanding in some areas than others – of where altruists can cost-effectively reduce substantial suffering. Charitable interventions are commonly chosen according to whether they produce measurable impact in the years or decades to come. Unfortunately, altruistic interventions are rarely chosen with the whole future in mind, i.e. with a focus on reducing as much suffering as possible for the rest of time, until the heat death of the universe.13 This is potentially problematic, because we should expect the far future to contain vastly more suffering than the next decades, not only because there might be sentient beings around for millions or billions of years to come, but also because it is possible for Earth-originating life to eventually colonize space, which could multiply the total amount of sentient beings many times over. While it is important to reduce the suffering of sentient beings now, it seems unlikely that the most consequential intervention for the future of all sentience will also be the intervention that is best for reducing short-term suffering. Instead, as judged from the distant future, the most consequential development of our decade would more likely have something to do with novel technologies or the ways they will be used.
And yet, politics, science, economics and especially the media are biased towards short timescales. Politicians worry about elections, scientists worry about grant money, and private corporations need to work on things that produce a profit in the foreseeable future. We should therefore expect interventions targeted at the far future to be much more neglected than interventions targeted at short-term sources of suffering.
Admittedly, the far future is difficult to predict. If our models fail to account for all the right factors, our predictions may turn out very wrong. However, rather than trying to simulate in detail through everything that might happen all the way into the distant future – which would be a futile endeavor, needless to say – we should focus our altruistic efforts on influencing levers that remain agile and reactive to future developments. An example of such a lever is institutions that persist for decades or centuries. The US Constitution for instance still carries significant relevance in today’s world, even though it was formulated long ago. Similarly, the people who founded the League of Nations after World War I did not succeed in preventing the next war, but they contributed to the founding and the charter of its successor organization, the United Nations, which still exerts geopolitical influence today. The actors who initially influenced the formation of these institutions as well as their values and principles, had a long-lasting impact.
In order to positively influence the future for hundreds of years, we fortunately do not need to predict the next hundreds of years in detail. Instead, all we need to predict is what type of institutions – or, more generally, stable and powerful decision-making agencies – are most likely to react to future developments maximally well.14
AI is the ultimate lever through which to influence the future. The goals of an artificial superintelligence would be much more stable than the values of human leaders or those enshrined in any constitution or charter. And a superintelligent AI would, with at least considerable likelihood, remain in control of the future not only for centuries, but for millions or even billions of years to come. In non-AI scenarios on the other hand, all the good things we achieve in the coming decade(s) will “dilute” over time, as current societies, with all their norms and institutions, change or collapse.
In a future where smarter-than-human artificial intelligence won’t be created, our altruistic impact – even if we manage to achieve a lot in greatly influencing this non-AI future – would be comparatively “capped” and insignificant when contrasted with the scenarios where our actions do affect the development of superintelligent AI (or how AI would act).15 We should expect AI scenarios to not only contain the most stable lever we can imagine – the AI’s goal function which the AI will want to preserve carefully – but also the highest stakes. In comparison with non-AI scenarios, space colonization by superintelligent AI would turn the largest amount of matter and energy into complex computations. In a best-case scenario, all these resources could be turned into a vast utopia full of happiness, which provides as strong incentive for us to get AI creation perfectly right. However, if the AI is equipped with insufficiently good values, or if it optimizes for random goals not intended by its creators, the outcome could also include astronomical amounts of suffering. In combination, these two reasons of highest influence/goal-stability and highest stakes build a strong case in favor of focusing our attention on AI scenarios.
While critics may object that all this emphasis on the astronomical stakes in AI scenarios appears unfairly Pascalian, it should be noted that AI is not a frivolous thought experiment where we invoke new kinds of physics to raise the stakes. Smarter-than-human artificial intelligence and space colonization are both realistically possible and plausible developments that fit squarely into the laws of nature as we currently understand them. If either of them turn out to be impossible, that would be a big surprise, and would suggest that we are fundamentally misunderstanding something about the way physical reality works. While the implications of smarter-than-human artificial intelligence are hard to grasp intuitively, the underlying reasons for singling out AI as a scenario to worry about are sound. As illustrated by Leó Szilárd’s lobbying for precautions around nuclear bombs well before the first such bombs were built, it is far from hopeless to prepare for disruptive new technologies in advance, before they are completed.
Finally, it should be noted that “focusing our attention on AI” can mean many things. It does not necessarily mean that all altruists (including suffering-focused ones) should go into machine learning to directly come up with solutions for designing AIs safely. Rather, it means that we should pick interventions to support according to their long-term consequences, and particularly according to the ways in which our efforts could make a difference to futures containing superintelligent AI. Whether it is best to try to affect AI outcomes in a narrow and targeted way, or whether we should go for a broader strategy, depends on several factors and requires further study.
FRI has looked systematically into paths to impact for affecting AI outcomes with particular emphasis on preventing suffering, and we have come up with a few promising candidates. The following list presents some tentative proposals:
- Raising awareness of:
It is important to note that human values may not affect the goals of an AI at all if researchers fail to solve the value-loading problem. Raising awareness of certain values may therefore be particularly impactful if it concerns groups likely to be in control of the goals of smarter-than-human artificial intelligence.
Further research is needed to flesh out these paths to impact in more detail, and to discover even more promising ways to affect AI outcomes. As there is always the possibility that we have overlooked something or are misguided or misinformed, we should remain open-minded and periodically rethink the assumptions our current prioritization is based on.
This text borrows ideas and framings from other people’s introductions to AI. I tried to flag this with links or citations wherever I remembered the source and where the writing was not convergent, but I may not have remembered everything. I'm particularly indebted to the writings of Eliezer Yudkowsky, Nick Bostrom and Scott Alexander. Many thanks also to David Althaus, Tobias Baumann, Ruairi Donnelly, Caspar Oesterheld and Kelly Witwicki for helpful comments and editing.
X. Further reading
- Artificial Intelligence and Its Implications for Future Suffering
- Superintelligence by Nick Bostrom (2014)
- Our Mission
- The Case for Suffering-Focused Ethics
- Reducing Risks of Astronomical Suffering: A Neglected Priority
- Altruists Should Prioritize Artificial Intelligence
Armstrong, S. (2013). General Purpose Intelligence: Arguing the Orthogonality Thesis. Future of Humanity Institute, Oxford University.
Bostrom, N. (2003). Astronomical Waste: The Opportunity Cost of Delayed Technological Development. Utilitas, 15(3), 308-314.
Bostrom, N. (2005). What is a Singleton? Linguistic and Philosophical Investigations, 5(2), 48-54.
Bostrom, N. (2012). The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents. Minds and Machines, 22(2), 71-85.
Bostrom, N. (2014). Superintelligence: Paths, Dangers, Strategies. Oxford University Press.
Burkart, J. M., Schubiger, M. N., & Schaik, C. P. (2016). The evolution of general intelligence. Behavioral and Brain Sciences, 1-65.
Chalmers, D. (2010). The Singularity: A Philosophical Analysis. Journal of Consciousness Studies, 17: 7-65.
Dawkins, R. (1996). The blind watchmaker: Why the evidence of evolution reveals a universe without design. New York: Norton.
Good, I. J. (1965). Speculations concerning the first ultraintelligent machine. Blacksburg, VA: Dept. of Statistics, Virginia Polytechnic Institute and State University.
Hanson, R. (2016). The Age of Em: Work, love, and life when robots rule the Earth. Oxford: Oxford University Press.
Legg, S., & Hutter, M. (2005). Universal Intelligence: A Definition of Machine Intelligence. Minds and Machines, 17(4), 391-444.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., . . . Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533.
Müller, V. C., & Bostrom, N. (2016). Future Progress in Artificial Intelligence: A Survey of Expert Opinion. Fundamental Issues of Artificial Intelligence, 553-570.
Newman, M. (2005). Power laws, Pareto distributions and Zipf’s law. Contemporary Physics, 46(5), 323–351.
Omohundro, S. (2008). The Basic AI Drives. Proceedings of the 2008 conference on Artificial General Intelligence 2008: 483-492. IOS Press Amsterdam.
Sandberg, A. (1999). The Physics of Information Processing Superobjects: Daily Life Among the Jupiter Brains.
Sandberg, A. (2014). Monte Carlo model of brain emulation development, Working Paper 2014-1 (version 1.2). Future of Humanity Institute, Oxford University.
Sandberg, A. & Bostrom, N. (2008). Whole Brain Emulation: A Roadmap, Technical Report #2008‐3, Future of Humanity Institute, Oxford University.
Schmidhuber, J. (2006). Godel Machines: Self-Referential Universal Problem Solvers Making Provably Optimal Self-Improvements. arXiv:cs.LO/0309048 v5.
Shulman, C. & Bostrom, N. (2012). How Hard is Artificial Intelligence? Evolutionary Arguments and Selection Effects. Journal of Consciousness Studies, 19(7-8), 103-130.
Shulman, C. & Sandberg, A. (2010): Implications of a Software-Limited Singularity. In ECAP10: VIII European Conference on Computing and Philosophy.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Driessche, G. V., . . . Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484-489.
Strawson, G. (2006). Realistic Monism - Why Physicalism Entails Panpsychism. Journal of Consciousness Studies, 13(10-11), 3–31.
Wittman, A. B., & Wall, L. L. (2007). The Evolutionary Origins of Obstructed Labor: Bipedalism, Encephalization, and the Human Obstetric Dilemma. Obstetrical & Gynecological Survey, 62(11), 739-748.
- The Open Philanthropy Project lists and discusses more timeline surveys here. (back)
- There is also an argument for putting the scale roughly as follows: Rats at 100, chimpanzees at 200, the village idiot at 240, the average human at 245, and Einstein at 253. This way, the distance from chimpanzee to village idiot is larger than the distance from village idiot to Einstein – whereas the scale used in the main text above has it the other way around. The scale here places more weight on differences in innate cognitive algorithms (which for all we know should not differ too much among different humans), whereas the scale in the main text above puts more weight on the difference this makes when coupled with a lifetime of goal-directed learning and becoming proficient in the use of (computer-)tools. (back)
- Gwern Branwen counters the objection that problems become exponentially harder to solve as complexity increases, which would make gains in intelligence beyond a certain level less relevant. Similarly, Kaj Sotala analyzed how much more capable AI could become relative to humans, and concluded that “it seems like an AI could still substantially improve on human intelligence, possibly even mastering domains which are currently too hard for humans. In practice, the limits of prediction do not seem to pose much of a meaningful upper bound on an AI’s capabilities, nor do we have any nontrivial lower bounds on how much time it might take to achieve a superhuman level of capability.” (back)
- For an excellent discussion on what evolutionary history tells us about the feasibility of AI, see Shulman & Bostrom (2012). (back)
- One way this happens is through inventions increasing the total population (e.g. the discovery of agriculture or the industrial revolution), which automatically also increases the total number of potential innovators. Another way is for innovations to improve the storage ability and accessibility of information (writing; the printing press; the internet). A third examples is how innovations in nutrition, medicine or education increased intelligence generationally, which again improved scientific productivity and creativity. Finally, over the past decades, many tasks, including many areas of research and development, have already been improved through outsourcing them to machines – a process that it is still ongoing and accelerating. (back)
- It is interesting to ponder why humans are interested in philosophy in the first place. Perhaps the reason is that, unlike AI systems designed for a given task, we usually do not walk through life with a fixed and well-specified goal. Instead, most of us have a somewhat loose set of values or convictions, the content of which can change over time. Perhaps this reflects that humans are careless when it comes to goal preservation, or simply that different subsystems of the brain have different goals, producing aggregate behavior that is not as “goal-tracking” or “rational” as it would be if a single party was in control. Or perhaps humans are not really optimizing their stated goals and convictions at all at the highest level of decision-making, and instead go through life with the top-level goal of being seen as a good person, or maybe seeing themselves as a good person. The content of this self image would then change according to beliefs about what constitutes “being a good person” – which may in part influenced by moral philosophy, but also by things like societal norms or psychological dispositions. Needless to say, AIs may lack some or even all of the above motivations and could, depending on how they are built, function completely differently. (back)
- Part of the issue is that terms like “morality” or “moral arguments” are often underspecified or used differently by different people. In the context here, the term “moral argument” refers to appeals made with the goal of motivating/persuading agents (oneself or others) to pursue different terminal (i.e. non-instrumental) objectives than one would otherwise pursue. It is worth noting that this definition of “moral argument” does not include decision theoretical arguments about whether or not to engage in power-weighted cooperation with other agents, or whether to set “ethical injunctions” under appropriate circumstances. (back)
- Some people object that trying to encode ethical values into an AI is a form of slavery, enforcing our values on the AI. This view is confused, because it presupposes that the AI can have goals that are somehow independent of the process that created it. But the way we build an AI inevitably determines its objectives – there is no way around it. (back)
- Making money through most of these activities require providing some kind of proof of identity. But this appears like a trivial hurdle for a superintelligent AI, as it could come up with several strategies to fake or circumvent this. Some examples are: Convincingly faking screenshots of IDs; persuading actual people to provide it with their own virtual identity in exchange for (the promise of) profits; use of illegal or unregulated ways of storing money; etc. (back)
- This scenario is adapted from here, where it is fleshed out in more detail. (back)
- Bostrom (2003) estimates that 10^29 humans could be simulated per second with the resources in our galactic supercluster alone. Of course, if an AIs goal is optimizing for something other than people-experiences, the number of sentient minds will be far lower. However, this gigantic number illustrates just how vast “galactic stakes” are in comparison to “Earthly stakes.” (back)
- Whether realist views on consciousness leave room for artificial sentience depends on the solutions to the binding problem (which is not a problem at all for type-A physicalism) and the psycho-physical bridging laws they invoke: If for instance binding in carbon-based minds is achieved through a quantum mechanism, then at least quantum computers might qualify as sentient under the right conditions. And if evolution went through all the trouble to implement the machinery to uphold quantum coherence in the brains of living organisms, surely it must have been advantageous for something, some physically relevant objective, which would suggest that conditional on Strawson’s view being right, AIs would be more likely to rely on quantum computers than we would otherwise assume – in order to e.g. also make use of this postulated benefit from quantum coherence that natural selection seemingly discovered. (back)
- And possibly beyond our universe, insofar as our decisional-algorithms are correlated with the actions of copies of us in other parts of the multiverse. (back)
- Nick Bostrom (2005) coined the term singleton for decision-making agencies that are so powerful that they essentially do not have any competitors anymore. A strong case can be made that focusing on such singleton outcomes, has the highest impact. (back)
- Even if some non-AI futures also involve space colonization, the futures where space colonization is orchestrated by superintelligent AI would reach much higher stakes. After all, AI would colonize space more rapidly and more efficiently than humans (or posthumans) with less developed technology would. Accordingly, a wave of space colonization initiated by AI would encompass a much larger amount of total computations. (back)