
Session 04

AI Safety and Governance

“We are looking ahead, as is one of the first mandates given us as chiefs, to make sure and to make every decision that we make relate to the welfare and well-being of the seventh generation to come. ... What about that seventh generation? Where are you taking them? What will they have?”

—Oren Lyons (1980)

Required materials


This week, we’ll consider: What are the main claims about how AI poses an existential risk? What are the different ways of getting involved to reduce these risks?

For context on the field’s current perspectives on these questions, a 2020 survey of AI safety and governance researchers (Clarke et al., 2021) found that, on average, researchers currently guess there is:

  • A 10% chance of existential catastrophe from misaligned, influence-seeking AI

  • A 6% chance of existential catastrophe from AI-exacerbated war or AI misuse

  • A 7% chance of existential catastrophe from “other scenarios”

Note that there were high levels of uncertainty and disagreement in the above survey’s results. These imply that many researchers must be wrong about important questions, which arguably makes skeptical and questioning mindsets especially valuable in this field.

More to explore

The development of artificial intelligence

  1. AlphaGo - The Movie - DeepMind - A documentary exploring what artificial intelligence can reveal about the 3000-year-old game of Go, and what that can teach us about the future potential of artificial intelligence. (Video - 1 hour 30 mins.)

  2. The Artificial Intelligence Revolution: Part 1 - A fun and interesting exploration of artificial intelligence by the popular blogger Tim Urban. (45 mins.)


Further reading on AI alignment

  1. AGI Safety Fundamentals Curricula

  2. My personal cruxes for working on AI safety (65 mins.)

  3. Professor Stuart Russell on the flaws that make today's AI architecture unsafe & a new approach that could fix it (Podcast - 2 hours 15 mins.)

  4. Some Background on Our Views Regarding Advanced Artificial Intelligence - Open Philanthropy Project - An explication of why there is a serious possibility that progress in artificial intelligence could precipitate a transition comparable to the Neolithic and Industrial revolutions. (1 hour)

  5. *The Precipice* - Chapter 5 (pages 138-152) - Unaligned Artificial Intelligence (25 mins.)

  6. What Failure Looks Like - Two specific stories about what a very bad society-wide AI alignment failure could look like, which differ considerably from the classic “intelligence explosion” story. (12 mins.)

  7. AGI Safety from first principles - One AI researcher’s take on the specific factors that make aligning general AI a difficult and important problem. (1 hour 15 mins.)

  8. Human Compatible: Artificial Intelligence and The Problem of Control (Book)

  9. The Alignment Problem: Machine Learning and Human Values (Book)


Governance for artificial intelligence

  1. The new 30-person research team in DC investigating how emerging technologies could affect national security - 80,000 Hours - How might international security be altered if the impact of machine learning is similar in scope to that of electricity? (Podcast - 2 hours)

  2. Technology Roulette: Managing Loss of Control as Many Militaries Pursue Technological Superiority - Center for a New American Security - An argument for how advances in military technology (including but not limited to AI) can impede relevant decision making and create risk, thus demanding greater attention by the national security establishment. (60 mins.)


Technical AI alignment work

  1. AI Alignment Landscape (Video - 30 mins.)

  2. AI safety starter pack (7 mins.)

  3. How to pursue a career in technical AI alignment (59 mins.)

  4. Technical Alignment Curriculum (readings for a 7 week course)

  5. The Alignment Forum, especially their core sequences


Further criticisms of worries about AI risk
