In-person
1-4 April 2025

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon Europe 2025 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is displayed in British Summer Time (BST) (UTC +1). The schedule is subject to change and session seating is available on a first-come, first-served basis.
Wednesday April 2, 2025 12:00 - 12:30 BST
Kubernetes was designed for microservices. With AI rapidly advancing, Kubernetes must adapt to also support both AI training and multi-node inference. It needs to improve not only at scheduling these workloads within the cluster, but also at fine-grained resource assignment on the nodes.

High Performance Computing (HPC) systems use workload managers such as Slurm. With over two decades of development, Slurm is the most widely used HPC workload manager and excels at gang scheduling, fair usage, job planning, and batch scheduling.

We will show the current state of Slinky, a fully open-source toolset designed to integrate Slurm with Kubernetes and to make AI clusters run with higher performance and efficiency. Slinky includes a Slurm operator, a Slurm client library, and a metrics exporter. We will outline our architecture and discuss the challenges of achieving the fine-grained control that Kubernetes needs to fully support AI and HPC workloads.
Speakers

Tim Wickberg

CTO, SchedMD LLC
Tim Wickberg is the Chief Technology Officer of SchedMD, and is responsible for the technical direction and development of the open-source Slurm Workload Manager.

Marlow Warnicke (Weston)

Principal Cloud Architect, SchedMD
Marlow is a Principal Cloud Engineer working on scheduling at SchedMD. She is also a chair of the CNCF Environmental Sustainability TAG. Marlow has expertise in resource management, the AI/ML Kubernetes cloud compute ecosystem, embedded systems, high performance compute system tools...
Level 1 | Hall Entrance S10 | Room A
  AI + ML
  • Content Experience Level Any
