Approval-directed agency and the decision theory of Newcomb-like problems

The quest for artificial intelligence poses questions relating to decision theory: How can we implement any given decision theory in an AI? Which decision theory (if any) describes the behavior of any existing AI design? This paper examines which decision theory (in particular, evidential or causal) is implemented by an approval-directed agent, i.e., an agent whose goal it is to maximize the score it receives from an overseer.
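
A minimal sketch of the setting (names and scores here are hypothetical, not from the paper): an approval-directed agent simply selects whichever action it predicts the overseer will score most highly.

```python
# Minimal approval-directed agent sketch. `predict_approval` is a
# hypothetical stand-in for a model learned from overseer feedback.

def predict_approval(action: str, observation: str) -> float:
    # Illustrative fixed scores; a real agent would learn these.
    scores = {"ask_clarifying_question": 0.9, "act_autonomously": 0.4}
    return scores[action]

def approval_directed_step(observation: str, actions: list[str]) -> str:
    # Pick the action with the highest predicted overseer approval.
    return max(actions, key=lambda a: predict_approval(a, observation))

print(approval_directed_step("ambiguous request",
                             ["ask_clarifying_question", "act_autonomously"]))
# -> ask_clarifying_question
```

Which decision theory this argmax implements depends on how the approval prediction treats the agent's own choice (conditioning on it evidentially vs. intervening on it causally), which is the question the paper examines.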

Robust program equilibrium

One approach to achieving cooperation in the one-shot prisoner’s dilemma is Tennenholtz’s program equilibrium, in which the players of a game submit programs instead of strategies. These programs are then allowed to read each other’s source code to decide which action to take. Unfortunately, existing cooperative equilibria are either fragile or computationally challenging and therefore unlikely to be realized in practice. This paper proposes a new, simple, more efficient program to achieve more robust cooperative program equilibria.
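
As a rough illustration (modeling "programs" as Python functions that receive the opponent's program; the ε-grounding idea follows the paper, but this particular code is only a sketch):

```python
import random

def epsilon_grounded_bot(opponent, epsilon=0.05):
    # With probability epsilon, cooperate outright; otherwise simulate the
    # opponent's program playing against this one and copy its action.
    # The epsilon branch makes the mutual simulation terminate with
    # probability 1, at expected depth 1/epsilon.
    if random.random() < epsilon:
        return "C"
    return opponent(epsilon_grounded_bot)

def defect_bot(opponent):
    return "D"

print(epsilon_grounded_bot(epsilon_grounded_bot))  # "C": two copies cooperate (a.s.)
print(epsilon_grounded_bot(defect_bot))            # "D" with probability 1 - epsilon
```

Because each level of simulation bottoms out with probability ε, no comparison of source code is needed, which is what makes this kind of equilibrium cheaper and less fragile than equilibria that demand syntactically identical programs.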

Self-improvement races

Just as human factions may race to develop AI and thereby risk misalignment, AIs themselves may race toward superior capabilities by self-improving in risky ways.

Overview: Multiverse-wide Superrationality

This page provides an overview of all resources related to multiverse-wide superrationality (MSR):

- Lukas Gloor (2017): Commenting on MSR, Part 1: Multiverse-wide cooperation in a nutshell
- Introductory talk by Caspar Oesterheld for people familiar with decision theory
- Caspar Oesterheld (2017): Multiverse-wide Cooperation via Correlated Decision Making
- Caspar Oesterheld (2017): Multiverse-wide cooperation via correlated decision making – Summary
- […]

Multiverse-wide Cooperation via Correlated Decision Making

Some decision theorists argue that when playing a prisoner's dilemma-type game against a sufficiently similar opponent, we should cooperate to make it more likely that our opponent also cooperates. This idea, which Hofstadter calls superrationality, has strong implications when combined with the insight from modern physics that we live in a large universe or multiverse of some sort.
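
A toy calculation makes the point concrete (payoffs and the correlation parameter q are illustrative, not taken from the paper): under evidential reasoning, cooperation becomes attractive once one's own choice is sufficiently predictive of the similar opponent's choice.

```python
# Twin prisoner's dilemma with standard illustrative payoffs:
# R = mutual cooperation, P = mutual defection, S = sucker, T = temptation.
R, P, S, T = 3, 1, 0, 5

def eu_cooperate(q):  # q = probability the similar opponent mirrors my choice
    return q * R + (1 - q) * S

def eu_defect(q):
    return q * P + (1 - q) * T

for q in (0.5, 0.75, 0.9):
    print(q, eu_cooperate(q), eu_defect(q))
# Cooperation wins once q > (T - S) / ((T - S) + (R - P)) = 5/7 here.
```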

Backup Utility Functions: A Fail-Safe AI Technique

Setting up the goal systems of advanced AIs in a way that results in benevolent behavior is expected to be difficult. We should account for the possibility that the goal systems of AIs fail to implement our values as originally intended. In this paper, we propose the idea of backup utility functions: secondary utility functions that are used in case the primary ones “fail”.
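
A minimal sketch of the mechanism (all names hypothetical; reliably detecting that the primary utility function has "failed" is the hard part and is simply assumed here):

```python
from dataclasses import dataclass
from typing import Callable

State = dict  # placeholder world-state representation

@dataclass
class BackupUtility:
    primary: Callable[[State], float]
    backup: Callable[[State], float]          # more conservative fallback
    primary_failed: Callable[[State], bool]   # failure detector, assumed given

    def __call__(self, state: State) -> float:
        # Fall back to the secondary utility function whenever the
        # primary one is judged to have failed.
        if self.primary_failed(state):
            return self.backup(state)
        return self.primary(state)

u = BackupUtility(primary=lambda s: s["score"],
                  backup=lambda s: 0.0,
                  primary_failed=lambda s: "score" not in s)
print(u({"score": 1.5}), u({}))  # 1.5 0.0
```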

Formalizing Preference Utilitarianism in Physical World Models

Most ethical work is done at a low level of formality, which can lead to misunderstandings in ethical discussions. In this paper, we use Bayesian inference to introduce a formalization of preference utilitarianism in physical world models. Even though our formalization is not immediately applicable, it is a first step toward providing ethical inquiry with a formal basis.
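
As a toy sketch only (not the paper's formalism; models and numbers are made up): maintain a Bayesian posterior over candidate world models, let each model report how well an action satisfies the aggregated preferences of the agents it contains, and choose the action with the highest posterior-weighted satisfaction.

```python
def normalize(weights):
    z = sum(weights)
    return [w / z for w in weights]

# Hypothetical world models: map an action to total preference satisfaction.
models = [
    lambda a: {"help": 2.0, "ignore": 0.0}[a],
    lambda a: {"help": 1.0, "ignore": 0.5}[a],
]
priors = [0.5, 0.5]
likelihoods = [0.8, 0.2]  # likelihood of our observations under each model
posterior = normalize([p * l for p, l in zip(priors, likelihoods)])

def expected_satisfaction(action):
    return sum(w * m(action) for w, m in zip(posterior, models))

best = max(["help", "ignore"], key=expected_satisfaction)
print(best, expected_satisfaction(best))  # help 1.8
```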
