Suhail Doshi: The Future of Computer Vision

The Gradient: Perspectives on AI - A podcast by Daniel Bashir

Episode 123

I spoke with Suhail Doshi about:

* Why benchmarks aren’t prepared for tomorrow’s AI models
* How he thinks about artists in a world with advanced AI tools
* Building a unified computer vision model that can generate, edit, and understand pixels

Suhail is a software engineer and entrepreneur known for founding Mixpanel, Mighty Computing, and Playground AI (they’re hiring!).

Reach me at [email protected] for feedback, ideas, guest suggestions.

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS

Follow The Gradient on Twitter

Outline:

* (00:00) Intro
* (00:54) Ad read: MLOps conference
* (01:30) Suhail is *not* in pivot hell but he *is* all-in on 50% AI-generated music
* (03:45) AI and music, similarities to Playground
* (07:50) Skill vs. creative capacity in art
* (12:43) What we look for in music and art
* (15:30) Enabling creative expression
* (18:22) Building a unified computer vision model, underinvestment in computer vision
* (23:14) Enhancing the aesthetic quality of images: color and contrast, benchmarks vs user desires
* (29:05) “Benchmarks are not prepared for how powerful these models will become”
* (31:56) Personalized models and personalized benchmarks
* (36:39) Engaging users and benchmark development
* (39:27) What a foundation model for graphics requires
* (45:33) Text-to-image is insufficient
* (46:38) DALL-E 2 and Imagen comparisons, FID
* (49:40) Compositionality
* (50:37) Why Playground focuses on images vs. 3d, video, etc.
* (54:11) Open source and Playground’s strategy
* (57:18) When to stop open-sourcing?
* (1:03:38) Suhail’s thoughts on AGI discourse
* (1:07:56) Outro

Links:

* Playground homepage
* Suhail on Twitter

Get full access to The Gradient at thegradientpub.substack.com/subscribe