While we haven’t yet built aligned AI, the field of alignment has been chugging along for several years now, producing many useful outputs. In this talk, Rohin will survey conceptual progress in AI alignment over the last two years. The talk assumes basic familiarity with the arguments for AI risk, but no technical knowledge is necessary.
Rohin is a 6th-year PhD student in Computer Science at the Center for Human-Compatible AI (CHAI) at UC Berkeley. He began his PhD working on program synthesis, but became convinced of the importance of building safe, aligned AI and moved to CHAI at the start of his fourth year. He now thinks about how to specify good behavior in ways other than reward functions, especially ones that do not require much human effort. He is best known for the Alignment Newsletter, a weekly publication summarizing recent content relevant to AI alignment, with over 1700 subscribers.
View Rohin’s slides here:
Subscribe to the Alignment Newsletter: