AI/ML in Kubernetes, with Maciej Szulik, Clayton Coleman, and Dawn Chen

Kubernetes Podcast from Google - A podcast by Abdel Sghiouar, Kaslin Fields - Tuesdays

In this episode, we talk to three active leaders who have been around since the very beginning of Kubernetes. We explore how Kubernetes has changed since its inception, with a particular focus on current efforts in open source Kubernetes to support AI/ML-style workloads.

Maciej Szulik currently sits on the Kubernetes Steering Committee. He also leads the Special Interest Groups responsible for kubectl, workload, and batch controllers. Maciej has been contributing to Kubernetes since the early days, jumping from one area to another wherever help was needed. He authored the first version of audit and helped shape its current one, and has touched multiple other places in apimachinery. He was also responsible for designing and implementing the Job and CronJob controllers. In kubectl he was responsible for the plugin mechanism and several major refactors to simplify the code. In May 2024 he joined the ranks of Production Readiness Review (PRR) approvers, helping ensure high production standards for future Kubernetes releases.

Clayton Coleman is a long-time Kubernetes contributor who helped launch Kubernetes as open source, served on the bootstrap steering committee, and has worked across a number of SIGs to make Kubernetes a reliable and powerful foundation for workloads. At Red Hat he led OpenShift’s pivot onto Kubernetes and its growth across on-premises, edge, and cloud. At Google he is now focused on enabling the next generation of key workloads, especially AI/ML in Kubernetes and on GKE.

Dawn Chen has been a Principal Software Engineer at Google Cloud since May 2007. Dawn has worked on the open source project Kubernetes since before the project was founded. She has been one of the tech leads in both Kubernetes and GKE, and founded SIG Node from scratch. She also led the Anthos platform team for the last 4 years, and mainly focuses on the core infrastructure.
Prior to Kubernetes, she was one of the tech leads for Google's internal container infrastructure, Borg, for about 7 years. Outside of work, she is a wife, a mother of a 16-year-old boy, and a good friend. She enjoys reading, cooking, hiking, and traveling.

Do you have something cool to share? Some questions? Let us know:
- web: kubernetespodcast.com
- mail: [email protected]
- twitter: @kubernetespod

News of the week
- Kubernetes 1.31 Code Freeze is on July 9th

Links from the interview
- Kubernetes Working Group Batch
- Kubernetes Working Group Serving
- Blog: Introducing Indexed Jobs (2021)
- Docs: Kubernetes Jobs
- KEP: Elastic Indexed Jobs
- Docs: Kubernetes CronJobs
- KubeCon EU 2021: The Long, Winding and Bumpy Road to CronJob’s GA - Maciej Szulik, Red Hat & Alay Patel, Red Hat
- KubeCon EU 2018: Writing Kube Controllers for Everyone - Maciej Szulik, Red Hat (Beginner Skill Level)
- Kubernetes Working Group Device Management
- Kubernetes Enhancement Proposal process README
- DockerCon 2014: The announcement of Kubernetes at DockerCon
- Blog: AI & Kubernetes (by Kaslin)
- Kueue - “Kueue is a cloud-native job queueing system for batch, HPC, AI/ML, and similar applications in a Kubernetes cluster.”
- Whitepaper: Large-scale cluster management at Google with Borg
- Email: “Containers: Introduction” - An email introducing the concept of Linux containers to the Linux community

Links from the post-interview chat
- Blog - “Scaling Kubernetes to 7,500 nodes” - OpenAI
- Ray on Kubernetes