In-person
1-4 April 2025
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon Europe 2025 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in British Summer Time (BST) (UTC +1). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis. 
Type: Maintainer Track
Friday, April 4
 

11:45 BST

Cloud Native AI: Harness the Power of Advanced Scheduling for High-Performance AI/ML Training - William Wang & Xuzheng Chang, Huawei
Friday April 4, 2025 11:45 - 12:15 BST
In the era of large models, as models and datasets grow ever larger, LLM workloads place extremely high demands on network throughput and latency.

However, Kubernetes has no awareness of either the parallel models of LLM workloads or the underlying high-speed network communication topology, which leads to a loss in training performance. Meanwhile, many expensive high-performance underlying resources are not utilized efficiently.

As one of the important projects for Cloud-native AI, Volcano has conducted in-depth research over the past year. It has remodeled the workloads in large model training and inference scenarios as well as the new network topologies, and designed and implemented high-performance scheduling features.

This talk will cover:
1. The complexities of intelligent scheduling, improving performance, and cost-effectiveness
2. A methodology for reconsidering the resource model and LLM workloads
3. Enhancements to Volcano that optimize AI/ML training
Speakers
William Wang (Leibo Wang)

Senior software engineer, Nvidia
Cloud native architect, open-source enthusiast, technical lead and maintainer of CNCF Volcano, software developer with a decade of experience in diverse domains including cloud native technology, large-scale cluster resource management, batch scheduling, BigData, and AI acceleration...
Xuzheng Chang

Senior engineer, Huawei Cloud
Xuzheng Chang is a maintainer of the Volcano community, with in-depth research and practical experience in the fields of batch computing and cloud-native AI scheduling. Xuzheng has spearheaded several significant features within the Volcano community, including network topology-aware...
Level 3 | ICC Capital Suite 14-16
 
