Episode 12: Jacob Steinhardt, UC Berkeley, on machine learning safety, alignment and measurement
Generally Intelligent - A podcast by Kanjun Qiu
Jacob Steinhardt (Google Scholar) (Website) is an assistant professor at UC Berkeley. His main research interest is in designing machine learning systems that are reliable and aligned with human values. Some of his specific research directions include robustness, reward specification and reward hacking, as well as scalable alignment.
Highlights:
"Test accuracy is a very limited metric."
"You might not be able to get lots of feedback on human values."
"I'm interested in measuring the progress in AI capabilities."