
Session 04

AI Safety and Governance

“We are looking ahead, as is one of the first mandates given us as chiefs, to make sure and to make every decision that we make relate to the welfare and well-being of the seventh generation to come. ... What about that seventh generation? Where are you taking them? What will they have?”

—Oren Lyons (1980)

Required materials


This week, we’ll consider: What are the main claims about how AI poses an existential risk? What are the different ways of getting involved to reduce these risks?

For context on the field’s current perspectives on these questions, a 2020 survey of AI safety and governance researchers (Clarke et al., 2021) found that, on average, researchers currently guess there is:

  • A 10% chance of existential catastrophe from misaligned, influence-seeking AI

  • A 6% chance of existential catastrophe from AI-exacerbated war or AI misuse

  • A 7% chance of existential catastrophe from “other scenarios”

Note that there were high levels of uncertainty and disagreement in the above survey’s results. These imply that many researchers must be wrong about important questions, which arguably makes skeptical and questioning mindsets especially valuable in this field.

More to explore

The development of artificial intelligence

  1. AlphaGo - The Movie - DeepMind - A documentary exploring what artificial intelligence can reveal about the 3000-year-old game of Go, and what that can teach us about the future potential of artificial intelligence. (Video - 1 hour 30 mins.)

  2. The Artificial Intelligence Revolution: Part 1 - A fun and interesting exploration of artificial intelligence by the popular blogger Tim Urban. (45 mins.)


Further reading on AI alignment

  1. AGI Safety Fundamentals Curricula

  2. My personal cruxes for working on AI safety (65 mins.)

  3. Professor Stuart Russell on the flaws that make today's AI architecture unsafe & a new approach that could fix it (Podcast - 2 hours 15 mins.)

  4. Some Background on Our Views Regarding Advanced Artificial Intelligence - Open Philanthropy Project - An explication of why there is a serious possibility that progress in artificial intelligence could precipitate a transition comparable to the Neolithic and Industrial revolutions. (1 hour)

  5. *The Precipice* - Chapter 5 (pages 138-152) - Unaligned Artificial Intelligence (25 mins.)

  6. What Failure Looks Like - Two specific stories about what a very bad society-wide AI alignment failure could look like, which differ considerably from the classic “intelligence explosion” story. (12 mins.)

  7. AGI Safety from first principles - One AI researcher’s take on the specific factors that make aligning general AI a difficult and important problem. (1 hour 15 mins.)

  8. Human Compatible: Artificial Intelligence and The Problem of Control (Book)

  9. The Alignment Problem: Machine Learning and Human Values (Book)


Governance for artificial intelligence

  1. The new 30-person research team in DC investigating how emerging technologies could affect national security - 80,000 Hours - How might international security be altered if the impact of machine learning is similar in scope to that of electricity? (Podcast - 2 hours)

  2. Technology Roulette: Managing Loss of Control as Many Militaries Pursue Technological Superiority - Center for a New American Security - An argument for how advances in military technology (including but not limited to AI) can impede relevant decision making and create risk, thus demanding greater attention by the national security establishment. (60 mins.)


Technical AI alignment work

  1. AI Alignment Landscape (Video - 30 mins.)

  2. AI safety starter pack (7 mins.)

  3. How to pursue a career in technical AI alignment (59 mins.)

  4. Technical Alignment Curriculum (readings for a 7 week course)

  5. The Alignment Forum, especially their core sequences


Further criticisms of worries about AI risk
