Loading…
In-person
1-4 April 2025
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon Europe 2025 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in British Summer Time (BST) (UTC +1). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis. 
Friday April 4, 2025 09:06 - 09:21 BST
Traditional load balancing approaches, including round robin or those relying on metrics like QPS are often ineffective when applied to LLM serving. LLM requests vary significantly in computational demands due to prompt length, the model differences and their autoregressive nature, leading to unpredictable request running times. Moreover, the emergence of model multiplexing techniques (e.g., LoRA) introduces new complexities that necessitate LLM-aware load balancing strategies.
In this talk, we introduce a new set of Kubernetes APIs for routing to LLM workloads that allow configuration of serving objectives and priorities for each use case. These APIs integrate seamlessly with Gateway API, and an included extension means that support for these APIs can easily be plugged into many Gateway API implementations to enable turnkey LLM routing support.
This talk will show this project in action, demonstrating the significant improvements it can enable across a variety of real world examples.
Speakers
avatar for Jiaxin

Jiaxin

Software Engineer, Bytedance
Jiaxin works at ByteDance Infrastructure Lab, focusing on serverless and AI infrastructure. He is also a co-chair of Kubernetes WG-Serving, Jiaxin drives innovations and contributes to the future of scalable AI systems.
avatar for Clayton Coleman

Clayton Coleman

Distinguished Engineer, Google
Architect, engineer, and strategic visionary for application platforms in the cloud. Core contributor to Kubernetes and OpenShift, the open source platform as a service and the containerized cluster manager. I helped launch the shift to cloud native applications and the platforms... Read More →
Friday April 4, 2025 09:06 - 09:21 BST
Level 0 | ICC Auditorium
  Keynote Sessions, AI + ML

Sign up or log in to save this to your schedule, view media, leave feedback and see who's attending!

Share Modal

Share this link via

Or copy link