This post is based on an internal document first written in 2015. Some information might be sketchy, dated or incomplete, but we figured that making it public would help people interested in FRI better understand the considerations we are currently investigating.
FRI’s mission is to identify the best intervention(s) for suffering reducers to work on. Figuring out the long-term consequences of our actions is tricky, and we are often left with significant uncertainty about the value (sometimes even the sign) of a particular intervention. The rate at which we have uncovered crucial considerations in the past suggests that more research on prioritization is still very valuable and likely to remain so for years. However, to avoid getting stuck in research indefinitely, we will eventually have to focus our efforts on an intervention directly targeted at improving the future. Therefore, besides efforts to grow FRI, it is important to start on time-sensitive “capacity building” now (connecting with more committed altruists, gathering resources and reputation, etc.), so that we are in a position to tackle a clear “path to impact” (PTI) with maximum force once we have identified one. By PTI, we mean the concrete intervention that a future FRI judges to be more valuable than further research.[1]
We do not yet know what our PTI(s) will be, which is why it makes sense to pursue a flexible approach focused on movement building and further research. But we should already be able to make competent guesses on the matter, and this is important: depending on our current assessment of plausible PTIs and how likely we are to pursue each of them, we might have reason to adjust our movement-building strategy already.
- Top PTI candidates per category and conditions that would favor them
- Thoughts on haste and timeline considerations
- Some tentative conclusions
The intent of this document is to sketch the broad categories of PTIs we currently consider likely, in order to then determine the most important subgoals to optimize for in our movement building. Examples of such subgoals include monetary resources, committed altruists, people with talent in AI safety, societal reputation, reputation within the effective-altruist (EA) community, and the timing of all of these (are there haste considerations anywhere?).
We start by sorting all plausible PTIs into logically exhaustive categories; categorizing them this way makes it less likely that we miss something important. One obvious distinction is whether we focus on short-term or long-term consequences. Within the long-term branch, most of the expected value comes from outcomes that involve affecting the way AI[2] scenarios unfold, so we can further distinguish four different ways of affecting AI-related outcomes:
- Short term
- Far future
  - Influencing whether AI happens at all
  - Influencing whether AI will be controlled or not
  - Improving controlled AI outcomes
  - Improving uncontrolled AI outcomes
Of course, this is not the only possible categorization. Note also that some interventions in these categories might turn out to be bad ideas to focus on. For instance, decreasing the probability that AI takeoff happens at all would be bad for reasons of cooperation (even if it decreased suffering overall), because many people care strongly about utopian outcomes that require value-aligned AI.
Top PTI candidates per category and conditions that would favor them
This section lists some plausible PTI candidates for each category. The ideas listed are not meant to be definitive or even particularly promising, but they give an overview of the sort of interventions we are considering.
Short term
The category as a whole: Short-term interventions might become our PTI if we ever assign a high probability to “doom soon,” e.g. as the explanation for the Fermi paradox, or because we think most of our copies are in short-lived simulations. Another reason to focus on the short term would be if years of research fail to bring more clarity to the uncertainties of the far-future picture. Finally, short-term interventions become appealing if we conclude that the general impacts of our decision algorithms throughout the multiverse dominate the specific impacts of our copies, in such a way that short-term actions seem favored.
Plausible, concrete interventions:
- Humane fish slaughter (as one example)
- Something related to wild-animal suffering
Influencing whether AI happens at all
The category as a whole: Approaches within this category have to be designed carefully to avoid greatly harming other value systems. For instance, considerations of cooperation rule out actively decreasing the probability that AI takeoff happens at all: even if doing so seemed positive, we should look for another intervention that is also positive but less opposed to other people’s interests.
Plausible, concrete intervention:
- Using honest arguments and/or moral trade to draw people away from work on reducing (non-AI) extinction risk
  - if we can’t identify less controversial interventions with a similar expected impact
  - or (in the moral trade case) if there’s sufficient interest
Influencing whether AI will be controlled or not
The category as a whole: Working directly on AI safety is unlikely to be our comparative advantage, because many people already care about it. That said, the problem seems difficult and will likely require a lot of work. AI safety might therefore become our PTI if the funding the cause is expected to receive in the near future cannot easily overcome its talent constraints, in which case we could, for example, help recruit talented researchers.
Even in the fairly unlikely case that we conclude uncontrolled AI causes less suffering in expectation, considerations of cooperation rule out trying to increase its probability. We should instead pursue another approach, e.g. a concrete intervention listed under the subsequent category “Improving controlled AI outcomes” or something in the domain of “fail-safe” AI safety.
Plausible, concrete intervention:
- Channeling money and talent to an organization working on AI safety (and/or founding our own), with a focus on the safety issues most relevant to preventing suffering
  - if there is still room for more funding and we find no alternative where our comparative advantage is higher
  - or if the alternatives are less positive for other value systems and we engage in moral trade
  - or if we identify (neglected) subproblems that are particularly relevant from a suffering-focused point of view
Improving controlled AI outcomes
The category as a whole: This set of interventions becomes particularly important if we think that controlled AI is worse than uncontrolled AI in expectation but spans a wide range of outcomes that differ in the amount of suffering they contain. It also becomes more important the more likely we think it is that AI will be controlled.
Plausible, concrete interventions:
- Value spreading: suffering-focused ethics
  - if we think suffering in physics and unknown unknowns dominate
- Value spreading: antisubstratism and concern for small minds
  - if we think suffering in physics and suffering subroutines dominate
- Improving international cooperation to reduce AI arms races by founding or funding a suitable organization
  - if we think the worst outcomes are likely to emerge after arms races (with added benefits for the preceding category, “Influencing whether AI will be controlled or not”)
Improving uncontrolled AI outcomes
The category as a whole: Focusing on this category is intriguing because we seem to be the only group that both takes AI risks seriously and cares strongly about the differences among the scenarios in which human values are not implemented. If the principle “focus on your comparative advantage” carries a lot of weight, this could turn out to be our PTI.
Plausible, concrete interventions:
- Work on ways to make AI “fail-safe,” i.e. make sure it fails in (comparatively) “benign” ways if it does fail
  - if we think there’s a practical way to do this and to hand the mechanism to the teams working on AI that is likely to be unsafe (though why would they care?)
- Differential progress in AI safety: speed up the parts most important for preventing worst-case scenarios
  - e.g. if we are concerned about some types of “almost-friendly AI” being the worst outcome;
  - or if we can identify other highly important steps for preventing AI-related dystopias
- Work on AI safety applied to the specific AI architectures most likely to bring about the worst types of uncontrolled AI
  - if we think the differences within uncontrolled-AI outcomes are larger than the differences in how likely each AI architecture is to lead to superintelligence;
  - or if others working on AI safety neglect a certain architecture type not because it is unlikely to lead to superintelligence, but because a superintelligence of that architecture would be harder to control (avoiding worst-case scenarios likely requires less control than getting everything right!)
In future documents building on this outline, we will list the pros and cons of each of the above proposals in order to assign rough weightings to them. These weightings will also have to account for the fact that some of the proposals are more far-fetched than others.
Thoughts on haste and timeline considerations
The main way in which FRI and its parent/partner organization, the Effective Altruism Foundation (EAF), might currently be pursuing the wrong priorities is by giving too little weight to strong haste considerations. AI takeoff represents a hard deadline, after which all our efforts are “graded.” If AI comes very soon, efforts that focus on influencing slow-moving variables, such as value spreading or promoting international cooperation, might count for nothing. It therefore seems important to get a good estimate of how strong we should expect AI-related haste considerations to be. Some thoughts:
- Earlier takeoff is more likely to be uncontrolled, because solving the value-loading and control problems likely takes time, and because hard takeoffs, which are more likely to be uncontrolled than slow takeoffs, account for most of the takeoff scenarios that happen very soon (e.g. within the next 10 years). If AI ends up uncontrolled, value spreading will be mostly irrelevant for the far future.
- Some people argue that those who care about the far future should gamble on influencing early hard-takeoff scenarios, because our impact in these scenarios would be huge (few others are doing the same!) and less diluted than it would be if we focused on the long-term promotion of AI safety decades down the line.
- On the other hand, there might be threshold effects and low-hanging fruit, such that pursuing several approaches and timelines, rather than putting all our eggs in one basket, could be the smarter strategy.
It seems important to gain more clarity on AI timelines, and on how to act strategically when the deadline is uncertain.
Some tentative conclusions
Based on the considerations in this document, we can draw the following tentative conclusions:
- Several paths to impact are best pursued in collaboration with other organizations that have built up expertise in fields such as AI. Moral trade could also be an important consideration. This suggests that networking with other effective altruists and with groups focused on the far future should receive high priority.
- Many of the proposed PTIs would be somewhat difficult to pitch to donors unfamiliar with the arguments for focusing on AI and the far future. This suggests that our movement building should take care to bridge inferential distances.
- Societal influence might be very important for interventions under the category “Improving controlled AI outcomes,” but many PTIs do not require convincing large segments of the population of anything. This is an argument for (highly) targeted rather than broad movement building.
- For further research as well as for some plausible PTIs, effective altruists with expertise in AI and computer science could be very valuable.
- Interventions targeted at improving uncontrolled-AI outcomes (so-called “fail-safe” measures) seem particularly promising because they are neglected and potentially more tractable than generic AI safety work.
Most of the ideas in this article are not my own; they summarize part of what FRI has been exploring or plans to explore further in the near future.
Special thanks to Brian Tomasik, Simon Knutsson and David Althaus for helpful comments and suggestions.
[1] The distinction between “research” and “direct intervention” is not necessarily clear-cut: if some activities are likely positive and low-effort, such as writing about a novel topic and spreading the arguments in relevant circles, they may already have some impact on their own. Furthermore, certain types of research, such as figuring out whether there are ways to make artificial intelligence (AI) cause less suffering in the event of the worst failure modes, are valuable both for prioritization and intrinsically: if initial research suggests that the topic is tractable, prioritization research gradually turns into object-level AI-safety research.
[2] The following is a short summary of why the assumption likely holds. Those who are skeptical may rest assured that a great deal of thought has gone into confirming the merits of this assumption internally, and that it remains open to re-evaluation as new insights are discovered.
If we somehow manage to affect the goals of a singleton-AI, our actions would have a future-shaping impact until the AI either ceases to exist or suffers a failure of goal-preservation. Whatever its goal, a powerful intelligence would instrumentally value self-preservation and goal-preservation, and by virtue of its superior intelligence it would be much better at both than humans and human societies ever were or could be. This suggests that focusing on AI-related outcomes makes it possible to predictably affect the future for millions, perhaps even billions of years to come, or in any case for longer than through any other foreseeable means. Moreover, because most possible goals for an AI would imply instrumentally valuing resource accumulation, we should expect singleton-AIs to ambitiously colonize space, rendering the stakes astronomical. Even if the AI in question has a goal unrelated to conscious beings, it might incidentally create suffering in the process of achieving it: for an AI without concern for suffering, even the slightest gains toward its goal would be worth creating vast amounts of suffering. Unless we can reject some of the ingredients of this argument (e.g. orthogonality, instrumental convergence, the feasibility of superintelligent AI in the first place) with extraordinary confidence, there seems to be no scenario of remotely similar likelihood in which our actions now could have a comparable impact on the far future.