Episode 12: Jacob Steinhardt, UC Berkeley, on machine learning safety, alignment and measurement

Generally Intelligent - A podcast by Kanjun Qiu

Jacob Steinhardt (Google Scholar) (Website) is an assistant professor at UC Berkeley. His main research interest is in designing machine learning systems that are reliable and aligned with human values. Some of his specific research directions include robustness, rewards specification and reward hacking, as well as scalable alignment.

Highlights:
📜 “Test accuracy is a very limited metric.”
👨‍👩‍👧‍👦 “You might not be able to get lots of feedback on human values.”
📊 “I’m interested in measuring the progress in AI capabilities.”