Cross-posted from my website on cause prioritization research. Articles on the FRI blog reflect the opinions of individual researchers and not necessarily of FRI as a whole, nor have they necessarily been vetted by other team members. For more background, see our post on launching the FRI blog.
Efforts to mitigate the risks of advanced artificial intelligence may be a top priority for effective altruists. If this is true, what are the best means to shape AI? Should we write math-heavy papers on open technical questions, or opt for broader, non-technical interventions like values spreading?
The answer to these questions hinges on how we expect AI to unfold. That is, what do we expect advanced AI to look like, and how will it be developed?
Many of these issues have been discussed at length, but the implications for the action-guiding question of how to best work on the problem often remain unclear. This post aims to fill the gap with a rigorous analysis of how different views on AI scenarios relate to the possible ways to shape advanced AI.
We can slice the space of possible scenarios in infinitely many ways, some of which are more useful for our thinking than others. Commonly discussed questions about AI scenarios include:
- When will humanity build general artificial intelligence, assuming that it happens at all?[1]
- Will the takeoff be hard or soft? That is, how long will it take to get from a human-level AI to a superintelligence?
- To what extent will the goals of an AI be aligned with human values?
- What architecture will be used to build advanced AI? For instance, will it use an explicit utility function or a reward module? Will it be based on “clean” mathematical principles or on a “messy” collection of heuristics?
- Will advanced AI act as a single agent, as MIRI’s models tend to assume, or will superintelligence reside in a distributed system like the economy?[2]
The reason why we ask these questions is that the answers determine how we should work on the problem. We can choose from a plethora of possible approaches:
- We might work on technical aspects of the AI alignment problem.
- We could do other kinds of technical research, such as finding scalable solutions to short-term problems or specifically trying to prevent the worst possible outcomes.
- We could focus on philosophical and conceptual work to raise awareness of AI-related issues.
- We could work on AI policy or AI strategy.
- Instead of shaping AI directly, we might opt for broader, more indirect interventions such as improving international cooperation and spreading altruistic values.
Which factors determine the value of technical AI safety work?
To avoid the complexity of considering many strategic questions at the same time, I will focus on whether we should work on AI in a technical or non-technical way, which I believe to be the most action-guiding dimension.
The control problem
The value of technical work depends on whether it is possible to find well-posed and tractable technical problems whose solution is essential for a positive AI outcome. The most common candidate for this role is the control problem (and subproblems thereof), namely how to make superintelligent AI systems act in accordance with human values. The viability of technical work therefore depends to some extent on whether it makes sense to think about AI in this way – that is, whether the control problem is of central importance.[3]
This, in turn, depends on our outlook on AI scenarios. For instance, we might think that the technical problems of AI safety are less difficult than they seem, that they will likely be solved anyway, or that the most serious risks are instead related to security aspects, coordination problems, and selfish values.
The following views support work on the control problem:
- Uncontrolled AI is the “default outcome” or at least somewhat likely, which makes technical work on the problem a powerful lever for influencing the far future. Also, uncontrolled AI would mean that human values will matter less in the future, which renders many other interventions – such as values spreading – futile.
- A hard takeoff or intelligence explosion is likely. This matters because it correlates with the likelihood of uncontrolled AI. Also, it might mean that humans will quickly be “out of the loop”, making it more difficult to shape AI with non-technical means.
- We cannot rule out very short timelines. If timelines are short, the takeoff will be unexpected and research on AI safety will be more neglected, which means that we can have a larger impact.
In contrast, if AI is like the economy, then the control problem does not apply in its usual form – there is no unified agent to control. Influencing the technical development of AI would be harder because of its gradual nature, just as it was arguably difficult to influence industrialization in the past.
It is often argued that an agent-like superintelligence would ultimately emerge even if AI takes a different form at first. I think this is likely, but not certain. But even so, the strategic picture is radically different if economy-like AI comes first. This is because we can mainly hope to (directly) shape the first kind of advanced AI since it is hard to predict, and hard to influence, what happens afterward.
In other words, the first transition may constitute an “event horizon” and therefore be most relevant to strategic considerations. For example, if agent-like AI is built second, then the first kind of advanced AI systems will be the driving force. They will be intellectually superior to us by many orders of magnitude, which makes it all but impossible to (directly) influence the agent-like AI via technical work.
How much safety work will be done anyway?
This brings us to another intermediate variable, namely how much technical safety work will be done by others anyway. If the timeline to AI is long, if the takeoff is soft, or if AI is like the economy, then large amounts of money and skilled time may be dedicated to AI safety, much as climate change receives mainstream attention today.
As AI is applied to more and more industrial contexts, large-scale failures of AI systems will likely become dangerous or costly, so we can expect that the AI industry will be forced to make them safe, either because their customers demand it or because of regulation. We may also experience an AI Sputnik moment that leads to more investment in safety research.
Since the resources of effective altruists are small in comparison to those of large companies and governments, this scenario reduces the value of technical AI safety work. Non-technical approaches such as spreading altruistic values among AI researchers or work on AI policy might be more promising in these cases. However, the argument does not apply if we are interested in specific questions that would otherwise remain neglected, or if we think that safety techniques will no longer work once AI systems reach a certain threshold of capability. (It’s unclear to what extent this is the case.)
This shows that how we work on AI depends not only on our predictions of future scenarios, but also on our goals. Personally, I’m mostly interested in suffering-focused AI safety, that is, how to prevent s-risks of advanced AI. This may lead to slightly different strategic conclusions compared to AI safety efforts that focus on loading human values. For instance, it means that fewer people will work on the issues that matter most to me.
A related question is whether strong intelligence enhancement, such as emulations or iterated embryo selection, will become feasible (and is employed) before strong AI is built. In that case, the enhanced minds will likely work on AI safety, too, which might mean that future generations can tackle the problem more effectively (given sufficiently long timelines). In fact, this may be true even without intelligence enhancement because we are nearsighted with respect to time, that is, it is harder to predict and influence events that are further in the future.
It’s not clear whether strong intelligence enhancement technology will be available before advanced AI. But we can view modern tools such as blogs and online forums as a weak form of intelligence enhancement in that they facilitate the exchange of ideas; extrapolating this trend, future generations may be even more “intelligent” in a sense. Of course, if we think that AI may be built unexpectedly soon, then the argument is less relevant.
Uncertainty about AI scenarios
Technical work requires a sufficiently good model of what AI will look like, or else we cannot identify viable technical measures. The more uncertain we are about the parameters of how AI will unfold, the harder it is to influence its technical development. That said, radical uncertainty also affects other approaches to shaping AI, potentially making it a general argument against focusing on AI. Still, the argument applies to a larger extent to technical work than to non-technical work.
In a nutshell, AI scenarios inform our strategy via three intermediate variables:
- Is the control problem of central importance?
- How much (quality-adjusted) technical work will others do anyway?
- How certain can we be about how AI will develop?
Technical work seems more promising if we think the control problem is pivotal, if we think that others will not invest sufficient resources, and if we have a clear picture of what AI will look like.
AI strategy on the movement level
Effective altruists should coordinate their efforts, that is, think in terms of comparative advantages and what the movement should do on the margin rather than just considering individual actions. Applied to the problem of how to best shape AI, this might imply that we should pursue a variety of approaches as a movement rather than committing to any single approach.
Still, my impression is that non-technical work on AI is somewhat neglected in the EA community. (80,000 Hours’ guide on AI policy tends to agree.)
My thoughts on AI scenarios
My position on AI scenarios is close to Brian Tomasik’s, that is, I lean toward a soft takeoff, relatively long timelines, and distributed, economy-like AI rather than a single actor. Also, we should question the notion of general (super)intelligence. AI systems will likely achieve superhuman performance in more and more domain-specific tasks, but not across all domains at the same time, which makes it a gradual process rather than an intelligence explosion. But of course, I cannot justify high confidence in these views given that many experts disagree.
Following the analysis of this post, this is reason to be mildly sceptical about whether technical work on the control problem is the best way to shape AI. That said, it’s still a viable option because I might be wrong and because technical work has indirect benefits in that it influences the AI community to take safety concerns more seriously.
More generally, one of the best ways to handle pervasive uncertainty may be to focus on “meta” activities such as increasing the influence of effective altruists in the AI community by building expertise and credibility. This is valuable regardless of one’s views on AI scenarios.
[1] Depending on whether AI is agent-like or economy-like, this framing may be confusing because there may not be a single point in time when AI is built.
[2] This is not equivalent to being spatially distributed, as it’s possible that AI is spatially distributed, but still acts like a unified agent. It also raises questions about what exactly it means to say that an entity “acts like an agent”.
[3] However, even if the control problem is not as crucial as it first seems, we may come at it from a different angle. For instance, we may work on security aspects, or on specific subtopics that matter most for preventing worst-case outcomes.