The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon Europe 2025 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.
Kubernetes was designed for microservices. With AI advancing rapidly, Kubernetes must adapt to support both AI training and multi-node inference. It needs to improve not only at scheduling these workloads across the cluster, but also at fine-grained resource assignment on the nodes themselves.
High Performance Computing (HPC) systems rely on workload managers such as Slurm. Slurm, the most widely used HPC workload manager, with over two decades of development behind it, excels at gang scheduling, fair-share usage, job planning, and batch scheduling.
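To make the gang-scheduling idea concrete, here is a minimal, hypothetical sketch of its core rule: a job's tasks are admitted all-or-nothing, so a multi-node workload never starts with only part of its resources and deadlocks waiting for the rest. The function and job names are illustrative and are not Slurm's actual API.

```python
# Hypothetical sketch of gang scheduling's all-or-nothing admission rule.
# Each job requests a node count; it is started only if the whole request
# fits at once, otherwise it stays pending. Not Slurm's real scheduler.

def gang_schedule(jobs, free_nodes):
    """Admit each job only if ALL of its requested nodes fit at once."""
    scheduled, pending = [], []
    for name, nodes_needed in jobs:
        if nodes_needed <= free_nodes:
            free_nodes -= nodes_needed
            scheduled.append(name)
        else:
            pending.append(name)  # never start a partial gang
    return scheduled, pending

# Example: a 16-node cluster with three jobs.
jobs = [("train-a", 8), ("train-b", 12), ("infer-c", 4)]
print(gang_schedule(jobs, free_nodes=16))
# → (['train-a', 'infer-c'], ['train-b'])
```

Note that `train-b` waits rather than launching on 8 of its 12 requested nodes; real schedulers layer backfill, priorities, and fair-share accounting on top of this basic rule.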
We will show the current state of Slinky, a fully open-source toolset designed to integrate Slurm with Kubernetes and to address the difficulty of running AI clusters with higher performance and efficiency. Slinky includes a Slurm operator, a Slurm client library, and a metrics exporter. Here, we will outline our architecture and discuss the challenges of achieving the fine-grained control needed in Kubernetes to fully support AI and HPC workloads.
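A Kubernetes operator like the one described above is built around a reconcile loop: it compares the desired state declared in a custom resource with the state it observes and acts to close the gap. The sketch below shows that generic pattern in the simplest possible form; it is an illustration of the operator pattern, not Slinky's actual implementation.

```python
# Hypothetical sketch of the reconcile loop at the heart of a Kubernetes
# operator: compare desired state (e.g. a requested Slurm node count from
# a custom resource) with observed state, and emit the converging action.
# This is the generic operator pattern, not Slinky's real code.

def reconcile(desired_nodes: int, observed_nodes: int) -> str:
    """Return the action needed to converge observed state to desired."""
    if observed_nodes < desired_nodes:
        return f"scale-up:{desired_nodes - observed_nodes}"
    if observed_nodes > desired_nodes:
        return f"scale-down:{observed_nodes - desired_nodes}"
    return "in-sync"

print(reconcile(desired_nodes=4, observed_nodes=2))  # → scale-up:2
print(reconcile(desired_nodes=4, observed_nodes=4))  # → in-sync
```

In a real operator this loop runs continuously against the API server, so the cluster converges on the declared state even after failures or manual changes.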
Tim Wickberg is the Chief Technology Officer of SchedMD and is responsible for the technical direction and development of the open-source Slurm Workload Manager.