[Crosspost] Adam Gleave on Vulnerabilities in GPT-4 APIs (+ extra Nathan Labenz interview)

The Inside View - A podcast by Michaël Trazzi

This is a special crosspost episode where Adam Gleave is interviewed by Nathan Labenz from the Cognitive Revolution. At the end, I also have a discussion with Nathan Labenz about his takes on AI. Adam Gleave is the founder of FAR AI. He and Nathan discuss finding vulnerabilities in GPT-4's fine-tuning and Assistants APIs, FAR AI's work exposing exploitable flaws in "superhuman" Go AIs through innovative adversarial strategies, accidental jailbreaking by naive developers during fine-tuning, and more.

OUTLINE

(00:00) Intro
(02:57) NATHAN INTERVIEWS ADAM GLEAVE: FAR AI's Mission
(05:33) Unveiling the Vulnerabilities in GPT-4's Fine-Tuning and Assistants APIs
(11:48) Divergence Between the Growth of System Capability and the Improvement of Control
(13:15) Finding Substantial Vulnerabilities
(14:55) Exploiting GPT-4 APIs: Accidentally Jailbreaking a Model
(18:51) On Fine-Tuned Attacks and Targeted Misinformation
(24:32) Malicious Code Generation
(27:12) Discovering Private Emails
(29:46) Harmful Assistants
(33:56) Hijacking the Assistant Based on the Knowledge Base
(36:41) The Ethical Dilemma of AI Vulnerability Disclosure
(46:34) Exploring AI's Ethical Boundaries and Industry Standards
(47:47) The Dangers of AI in Unregulated Applications
(49:30) AI Safety Across Different Domains
(51:09) Strategies for Enhancing AI Safety and Responsibility
(52:58) Taxonomy of Affordances and Minimal Best Practices for Application Developers
(57:21) Open Source in AI Safety and Ethics
(1:02:20) Vulnerabilities of Superhuman Go-Playing AIs
(1:23:28) Variation on AlphaZero-Style Self-Play
(1:31:37) The Future of AI: Scaling Laws and Adversarial Robustness
(1:37:21) MICHAEL TRAZZI INTERVIEWS NATHAN LABENZ
(1:37:33) Nathan's Background
(1:39:44) Where Nathan Falls on the Eliezer-to-Kurzweil Spectrum
(1:47:52) AI in Biology Could Spiral Out of Control
(1:56:20) Bioweapons
(2:01:10) Adoption Accelerationist, Hyperscaling Pauser
(2:06:26) Current Harms vs. Future Harms, Risk Tolerance
(2:11:58) Jailbreaks, Nathan's Experiments with Claude

Links:

The Cognitive Revolution: https://www.cognitiverevolution.ai/
Exploiting Novel GPT-4 APIs: https://far.ai/publication/pelrine2023novelapis/
Adversarial Policies Beat Superhuman Go AIs: https://far.ai/publication/wang2022adversarial/