Loading…
In-person
1-4 April 2025
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon Europe 2025 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in British Summer Time (BST) (UTC +1). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis. 
Thursday April 3, 2025 17:30 - 18:00 BST
Large Language Models (LLM) require preprocessing vast amounts of data, a process that can span days due to its complexity and scale, often involving PetaBytes of data. This talk demonstrates how Kubeflow Pipelines (KFP) simplify LLM data processing with flexibility, repeatability, and scalability. These pipelines are being used daily at IBM Research to build indemnified LLMs tailored for enterprise applications.
Different data preparation toolkits are built on Kubernetes, Rust, Slurm, or Spark. How would you choose one for your own LLM experiments or enterprise use cases and why should you consider Kubernetes and KFP?
This talk describes how open source Data Prep Toolkit leverages KFP and KubeRay for scalable pipeline orchestration, e.g. deduplication, content classification, and tokenization.
We share challenges, lessons, and insights from our experience with KFP, highlighting its applicability for diverse LLM tasks, such as data preprocessing, RAG retrieval, and model fine-tuning.
Speakers
avatar for Alexey Roytman

Alexey Roytman

Software Architect, IBM
I am a Software Architect at IBM Research. I take pleasure in tackling technical challenges and discovering/ implementing innovative solutions. With over 20 years in my career, I have amassed experience in developing various middleware and cloud components. I have a keen interest... Read More →
avatar for Anish Asthana

Anish Asthana

Associate Manager, Engineering, Red Hat
Anish is an engineering manager at Red Hat in the OpenShift AI organization. He is working on making machine learning easier for the wider community by building a platform out with cloud capabilities at the core. Most recently, his interests have been focused on the Distributed Workloads... Read More →
Thursday April 3, 2025 17:30 - 18:00 BST
Level 1 | Hall Entrance N11
  AI + ML
  • Content Experience Level Any

Sign up or log in to save this to your schedule, view media, leave feedback and see who's attending!

Share Modal

Share this link via

Or copy link