KubeCon + CloudNativeCon Europe 2025: Full Schedule

In-person
1-4 April 2025

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon Europe 2025 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: All times in this schedule are shown in British Summer Time (BST, UTC+1). The schedule is subject to change, and session seating is available on a first-come, first-served basis.

11:15 BST

Lessons Learned From Architecting the Highest-scale Operational Systems in the World - Artur Bergman, Fastly

Wednesday April 2, 2025 11:15 - 11:45 BST

Level 0 | ICC Capital Hall | Room 2

Platform engineering for accelerating modern, resilient cloud-native systems requires a ruthless focus on the experience of both your customers and your developers. Restrictive vendor experiences, made worse by overreliance on single-point solutions, and the isolated bash script approaches from the past introduce unacceptable compromises to performance, security, and quality for continuous operations. As the founder and CTO of Fastly, Artur Bergman has spent decades optimizing the vendors in his stack and how he uses them to build a cohesive developer toolchain for Fastly’s internal teams and customer platform teams worldwide. This talk will cover: lessons learned from testing the limits of vendor systems to meet business needs, evaluating when to build versus buy platform engineering systems from first principles, and how to apply a rigorous experience design lens when architecting platforms for team success.

Speakers

Artur Bergman

Founder and CTO, Fastly

Artur Bergman currently serves as Chief Technology Officer of Fastly, Inc., a leading edge cloud platform. Artur founded Fastly in 2011 and served as its CEO until 2020, guiding the company through its IPO in 2019. Prior to becoming CTO in 2024, he held the role of Chief Architect...

Platform Engineering

Content Experience Level Intermediate

12:00 BST

Leveraging Internal Knowledge: Building AiKA at Spotify - Majd Salman & Jofre Mateu Matesanz, Spotify

Wednesday April 2, 2025 12:00 - 12:30 BST

Level 0 | ICC Capital Hall | Room 2

In the fast-paced world of technology, access to the right information at the right time is crucial for innovation and efficiency. Enter AiKA, Spotify's RAG-based internal “artificial intelligence knowledge assistance” platform, designed to empower our developers by providing instant access to the vast pool of internal knowledge through various surfaces. We'll cover why we developed AiKA, detailing the challenges of managing and retrieving information across a large organization. Learn about the technologies and methodologies we employed and how we integrated AiKA seamlessly into our existing infrastructure.

We'll highlight how AiKA's flexible API allows engineers to ingest their own custom knowledge, tailoring the tool to meet the unique needs of different teams. Discover how it not only enhances productivity but also fosters a culture of self-service and continuous learning.

Speakers

Jofre Mateu Matesanz

Software Engineer, Spotify

Jofre is a Senior Data Engineer at Spotify with a focus on making internal knowledge assistance and productivity tools for engineers.

Majd Salman

Senior Data Engineer, Spotify

Majd Salman is a Senior Data Engineer at Spotify with a focus on making internal knowledge assistance and productivity tools for engineers.

Platform Engineering

Content Experience Level Intermediate

14:30 BST

Many Cooks, One Platform: Balancing Ownership and Contribution for the Perfect Broth - Lian Li, lianmakesthings

Wednesday April 2, 2025 14:30 - 15:00 BST

Level 0 | ICC Capital Hall | Room 2

When I started contracting with the Dutch government to build a new internal developer platform, I found myself navigating competing demands from different teams. Development teams wanted support tailored to their processes, neighboring infrastructure teams aimed to protect their areas of responsibility, and management expected visible progress. These conflicting priorities kept pulling my team in multiple directions, making it challenging to stay aligned and focused.

Since I have a background in Developer Relations, I soon made it my goal to engage all involved parties, giving users a sense of ownership and collaboration, while keeping the platform cohesive.

In this talk, I’ll share the tools and processes that helped address these challenges. I’ll provide practical insights for aligning diverse stakeholders. If you’ve ever faced the challenge of “too many cooks” this session will show how to turn competing demands into a recipe for success.

Speakers

Lian Li

Cloud Native Human, lianmakesthings

Lian always wanted to save the world. After leaving law school, she decided to work with computers instead. While in Web Dev, she started attending tech events, and soon fell in love with the community. In her roles as Consultant and DevRel, Lian combined technical knowledge with...

Platform Engineering

Content Experience Level Any

15:15 BST

More Data Please: Hands on Green Cloud Experiments - Leonard Pahlke, BWI GmbH & Antonio Di Turi, Data Reply

Wednesday April 2, 2025 15:15 - 15:45 BST

Level 0 | ICC Capital Hall | Room 2

Sustainable cloud computing has been a topic for over a decade, but we lack concrete data on Kubernetes energy consumption. This session shares a case study of a microservice running on a k3s cluster, providing real energy metrics at every stage of Platform Engineering: Day 0 (manual setup with k3s, Cilium, microservice deployment), Day 1 (introducing ArgoCD, Falco for security), and Day 2 (adding observability with Prometheus, Grafana, OpenTelemetry, and Kepler). We use bare metal environments to ensure clean, measurable energy data, from idle setup to fully operational.

We’ll explore how tools like Kepler estimate energy consumption for Kubernetes components and compare them to actual plug measurements. For Day 3, we’ll present experiments: changing programming languages, OS images, VPA and KEDA. By sharing practical insights and data, we aim to inspire engineers to innovate and build a more sustainable cloud-native ecosystem.

Presented by TAG Environmental Sustainability Leads.

Speakers

Antonio Di Turi

Data Engineer, Data Reply

Co-chair of WG Green review in the CNCF TAG-environmental-sustainability. I am determined and dynamic, I like the crowd and I like to be exposed to new stimuli. DevOps and Sustainability are my passions. I feel very lucky because in my job I always find some fun.

Leonard Pahlke

Senior Expert Cloud Native Engineering, BWI GmbH

Leonard is a dedicated open source contributor and leader, currently chairing the CNCF TAG Environmental Sustainability. Previously, Leonard led the K8s release team for v1.26 and served as the emeritus advisor for v1.28. With a strong focus on emerging technologies, he advocates for open...

Platform Engineering

Content Experience Level Intermediate

16:15 BST

Making CRDs Delightful: Beyond the Pitfalls - Evan Anderson, Stacklok, Inc

Wednesday April 2, 2025 16:15 - 16:45 BST

Level 0 | ICC Capital Hall | Room 2

CRDs have a lot of traps for new operator authors; this is a different talk about developing for Kubernetes! If you're building Kubernetes resource types, let's talk about how to make them satisfying and enjoyable for your users. Using examples from multiple popular projects, Evan will provide 10 tips on how to make your APIs friendly to Kubernetes beginners and experts alike.

* Use status for humans and machines
* Condition super-powers with one simple rule!
* How to avoid needing to build a CLI
* When to build one anyway
* Day-1 RBAC for everyone
* Supporting GitOps gracefully
* Status-free objects: Policies and Classes
* The beauty of zero
* Borrowing is best: embedding known types
* Operating someone else's CRD: labels and annotations

Evan has been extending and operating Kubernetes for the last 6 years. The above patterns will be illustrated with examples from his experience with ArgoCD, Cert-Manager, Gateway-API, Knative, and Kubernetes, among others.
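
The abstract lists the tips without detail, so the following is not taken from the talk. As a minimal sketch of the "status for humans and machines" idea, assuming the condition fields described in the Kubernetes API conventions (type, status, reason, message, observedGeneration, lastTransitionTime), a controller might maintain its conditions roughly like this. The helper name `set_condition` is hypothetical.

```python
from datetime import datetime, timezone


def set_condition(conditions, cond_type, status, reason, message, generation):
    """Insert or update one condition in place.

    lastTransitionTime is preserved unless the status value changes,
    so readers can tell how long the object has been in this state.
    """
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    new = {
        "type": cond_type,                 # e.g. "Ready"
        "status": status,                  # "True", "False", or "Unknown"
        "reason": reason,                  # short CamelCase token for machines
        "message": message,                # full sentence for humans
        "observedGeneration": generation,  # spec generation this reflects
        "lastTransitionTime": now,
    }
    for i, old in enumerate(conditions):
        if old["type"] == cond_type:
            if old["status"] == status:
                # Same status: keep the original transition timestamp.
                new["lastTransitionTime"] = old["lastTransitionTime"]
            conditions[i] = new
            return conditions
    conditions.append(new)
    return conditions
```

Keeping `reason` machine-readable and `message` human-readable lets the same status block serve both automation and someone reading the object's status by hand.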

Speakers

Evan Anderson

Software Engineer, Stacklok, Inc

Founder and maintainer of the Knative serverless project. Currently at Stacklok working on supply chain security, previously at Google and VMware; recovering SRE.

Platform Engineering

Content Experience Level Advanced

17:00 BST

Platform Engineering for Software Developers and Architects (Redux) - Daniel Bryant, Syntasso

Wednesday April 2, 2025 17:00 - 17:30 BST

Level 0 | ICC Capital Hall | Room 2

Building on my KubeCon EU 2022 talk, "From Kubernetes to PaaS to... err, what's next," I aim to introduce platform engineering to the software developer and architect communities.

My primary goal is for developers to understand "what good looks like" with a successful platform build and help them understand how a platform can influence the SDLC (for better or worse!)

Key takeaways from the session:
- Explore how platform architecture influences software architecture and vice versa
- Learn why the principles of coupling and cohesion apply to platform components (and configuration) in the same way as they do with software components
- Understand what to expect from an effective platform, including how applications are built, shipped, and run
- Learn about key platform metrics grounded in developer experience frameworks such as DORA, SPACE, and DevEx

Speakers

Daniel Bryant

Platform Engineer and Head of Product Marketing, Syntasso

Daniel Bryant is a platform engineer and the Head of Product Marketing at Syntasso. Daniel is a long-time coder, platform engineer, and Java Champion, and he contributes to several open source projects. He also writes for InfoQ, O’Reilly, and The New Stack, and regularly presents...

Platform Engineering

Content Experience Level Any

17:45 BST

Scale Smarter Not Harder: How Extending Cluster Autoscaler Saves Millions - Rahul Rangith & Ben Hinthorne, Datadog

Wednesday April 2, 2025 17:45 - 18:15 BST

Level 0 | ICC Capital Hall | Room 2

“I need 100 instances with 32 CPUs and 128GB of memory each, with remote storage and up to 10GB/s of network bandwidth, and I need them now”! At Datadog, we make scaling requests like this thousands of times a day, across dozens of clusters in multiple cloud providers. At this scale, and with so many machine specifications to choose from, we realized the importance of asking the question: how do I select the best instance type in every environment?
Join us to learn how answering this question with every scale up decision significantly reduces our cloud costs. We’ll discuss the tools we use to score instance types, and strategies to plug these recommendations into the Kubernetes Cluster Autoscaler via its gRPC expander. Whether you’re operating a single cluster or a massive Kubernetes platform, this talk will teach you how to upgrade your infrastructure to make informed instance type selections that minimize your cloud spend.
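
The abstract does not describe how Datadog scores instance types, so the following is only an illustrative sketch under stated assumptions: the catalog, its prices, and the "cheapest total cost that covers the request" rule are all made up for this example. A real integration would expose the choice to Cluster Autoscaler through its gRPC expander service, which is not shown here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    cpus: int
    memory_gib: int
    hourly_price: float  # hypothetical figure, not a real cloud price


def score(instance, need_cpus, need_memory_gib):
    """Total hourly cost of covering the request with this instance type."""
    count = max(
        math.ceil(need_cpus / instance.cpus),
        math.ceil(need_memory_gib / instance.memory_gib),
    )
    return count * instance.hourly_price


def best_instance_type(candidates, need_cpus, need_memory_gib):
    """Pick the candidate with the lowest total cost (lower score is better)."""
    return min(candidates, key=lambda it: score(it, need_cpus, need_memory_gib))


# Hypothetical catalog; names and prices are invented for illustration.
catalog = [
    InstanceType("small-16x64", 16, 64, 0.80),
    InstanceType("large-32x128", 32, 128, 1.50),
    InstanceType("huge-64x256", 64, 256, 3.20),
]

# Total demand equivalent to 100 nodes of 32 CPUs and 128 GiB each.
choice = best_instance_type(catalog, need_cpus=3200, need_memory_gib=12800)
print(choice.name)
```

A production scorer would presumably weigh more than price (availability, network and storage capabilities, fragmentation), but the shape stays the same: rank the candidates, return the preferred one.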

Speakers

Rahul Rangith

Software Engineer, Datadog

Rahul Rangith has worked at Datadog since 2022 after graduating from the University of Waterloo. He works on Datadog’s Compute team which is responsible for the company’s Kubernetes platform. On the team, he focuses on node management and autoscaling. Rahul is active in the Kubernetes...

Ben Hinthorne

Software Engineer, Datadog

Ben Hinthorne joined Datadog’s Compute Team in 2021, which is responsible for building and scaling their Kubernetes platform. Recently, he has focused on the autoscaling ecosystem, working to optimize application performance, infrastructure cost, and resiliency through opinionated...

Platform Engineering

Content Experience Level Intermediate

11:00 BST

Extending Kubernetes Resource Model (KRM) Beyond Kubernetes Workloads - Mangirdas Judeikis, Cast AI & Nabarun Pal, Broadcom

Thursday April 3, 2025 11:00 - 11:30 BST

Level 0 | ICC Capital Hall | Room 2

Writing consistent APIs is hard. The Kubernetes Resource Model (KRM) is the foundation of Kubernetes’ success because it is consistent, predictable, and easy to understand, and it provides a declarative approach to managing infrastructure and applications. But what if KRM could transcend Kubernetes itself?

This talk will explore the paradigm shift of using KRM with kcp, or a generic Kubernetes control plane, to provide more than just workload management. This is not a new concept; Crossplane and many other tools are already doing this. But what if we could take this further? What if each cloud API looked and felt like the Kubernetes API? We will extensively cover how “kcp + friends” in the CNCF ecosystem fulfill that vision.

At the end of the talk, the audience will walk away with knowledge of KRM++ and of approaches to building a scalable multi-tenant control plane for managing resources in their multi-cluster, Kubernetes-based infrastructure, possibly in a hybrid cloud.

Speakers

MJ / Mangirdas Judeikis

Staff Engineer, kcp maintainer, Cast AI

Control planes, distributed systems and open source. All Kubernetes and kcp! A decade of Kubernetes experience, focusing on platform engineering based on Kubernetes. Doing platform engineering before it was cool. :) I thrive on Go, Kubernetes, and an Open Source...

Nabarun Pal

Principal Software Engineer, Broadcom

Nabarun is a Principal Software Engineer at Broadcom, a maintainer of the Kubernetes project, a chair of Kubernetes SIG Contributor Experience and an emeritus Kubernetes Steering Committee member. He has recently been contributing to kcp in various ways. He is a Release Manager...

Platform Engineering

Content Experience Level Intermediate

11:45 BST

Building a 5* Kubernetes Hotel - Dean Fuller, Fidelity International & Rachael Wonnacott, Fidelity International

Thursday April 3, 2025 11:45 - 12:15 BST

Level 0 | ICC Capital Hall | Room 2

When Fidelity International's public cloud journey began to slow, it became clear that our barrier to cloud was too high: with lower-cognitive-load platforms readily available on premises (CloudFoundry), why would anyone move? This sparked the realisation that we needed to build a public cloud container hosting platform that could provide the experience our developers had been used to for so many years. What was born became known as the "Kubernetes Hotel". By abstracting much of the K8s infrastructure complexity away from our internal developers, it allowed them to focus on the business logic, leaving the platform team to do the heavy engineering. In this talk we'll explore the highs and lows of the K8s hotel business, how our MVP was more of a motel, and what we believe a 5* K8s hotel might look like as we progress further on our journey.

Speakers

Rachael Wonnacott

Associate Director - Container Platform Engineering, Fidelity International

Rachael has spent the last decade focused on platform engineering. She places a conscious emphasis on improving flow and is on the quest to smooth the application lifecycle for developers in the enterprise. With a background in astrophysics, Rachael brings her scientific approach...

Dean Fuller

Director of Developer Platform Engineering, Fidelity International

Dean Fuller has spent the last 20 years working in the technology infrastructure domain, always looking for opportunities to challenge approach and focusing on value and quality of the outcomes. Today Dean oversees the Developer Platform Engineering group at Fidelity International...

Platform Engineering

Content Experience Level Any

14:15 BST

Set Your Developers Free: Fleet Management at Spotify - Tim Hansen, Spotify

Thursday April 3, 2025 14:15 - 14:45 BST

Level 0 | ICC Capital Hall | Room 2

Migrations, security patches, and dependency upgrades are a necessary toil, but not one that your developers have to suffer through. Learn about Spotify’s approach to managing its fleet of over 10,000 software components — and how we patched the Log4J vulnerability across most of our software in 6 hours.

Fleet Management has freed our developers to focus on impactful software development — rather than the toil of dependency upgrades and migrations. Through automation, our percentage of software that’s up-to-date jumped from 10% to 80%, and security vulnerabilities were cut in half. Spotify orchestrates hundreds of changes, across thousands of repositories, and releases them to production — all without developer intervention.

Speakers

Tim Hansen

Staff Engineer, Spotify

Tim is a staff engineer at Spotify who works in the Platform organization to decrease infrastructure toil for Spotify developers, focused on the open-source Backstage platform. Prior to this, he worked in FinOps at Spotify, focused on reducing cloud infrastructure costs.

Platform Engineering

Content Experience Level Any

15:00 BST

Breaking Free From the Cloud: Banking on Self-Hosted Kubernetes - Kārlis Akots Gribulis & Per Hedegaard Christiansen, Saxo Bank

Thursday April 3, 2025 15:00 - 15:30 BST

Level 0 | ICC Capital Hall | Room 2

What drives a global investment bank to transition from a managed cloud Kubernetes service to a self-hosted, on-premises solution? While managed Kubernetes in the cloud can simplify deployments, it often comes with significant trade-offs. At Saxo Bank, we made the decision to regain control by shifting to a self-hosted, on-premises Kubernetes platform.

This session will unpack our motivations, such as decreasing costs by 80%, reducing cluster creation time fifteenfold, and improving our CIS benchmark standing by 30%. We’ll dive into the architecture we adopted, the lessons learned from overcoming performance and resilience challenges, and how this change has reshaped our infrastructure, positioning Kubernetes as Saxo Bank’s cornerstone for the future.

Speakers

Per Hedegaard Christiansen

Head of Container Platform Engineering, Saxo Bank

Passionate about container technology and always eager to explore new tech stacks. With extensive experience in Docker, Kubernetes, and microservices, I design and optimize scalable, secure container environments. Constantly learning and embracing cutting-edge tools, I thrive in agile...

Kārlis Akots Gribulis

Senior Container Platform Engineer, Saxo Bank

Kārlis Akots Gribulis has hands-on experience working across various companies in the cloud-native space. Throughout his career, he has been deeply involved in deploying, managing, and optimizing Kubernetes clusters, helping organizations harness the full power of cloud-native technologies...

Platform Engineering

Content Experience Level Intermediate

16:00 BST

How We Progressively Deliver Changes To Kubernetes Using Canary Deployments and Feature Flags - Bob Walker, Octopus Deploy

Thursday April 3, 2025 16:00 - 16:30 BST

Level 0 | ICC Capital Hall | Room 2

This is the case study of how we changed how we ship software.

With thousands of customers, each in their own Kubernetes container, deploying updates was tough. Off-hours schedules meant it took over 24 hours to push a new version. If something broke, we had to scramble. Canary deployments let us update small groups of customers at a time. We built a tool to stop rollouts fast when issues appeared, limiting the damage.

In the past, new features went to everyone at once. Rolling back wasn't an option. If something failed it'd leave customers stuck in the mess. Now, using OpenFeature, we hide new functionality behind feature flags. We release features to small groups, gather feedback, and test internally for weeks. If things go wrong, we flip the flag off and move on.

This two-pronged approach lets us avoid risky big-bang releases. We went from deploying every 10 days to every 4, with fewer than 1% high-severity defects. Most of these are resolved before customers notice them.
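
The abstract does not show the rollout tool itself, so the following is a generic sketch of the canary idea it describes: update customers in small waves and stop as soon as a wave looks unhealthy. The function name and the `deploy` and `healthy` callbacks are hypothetical stand-ins, not Octopus Deploy or OpenFeature APIs.

```python
def canary_rollout(customers, wave_sizes, deploy, healthy):
    """Deploy wave by wave and stop at the first unhealthy wave.

    Returns (updated_customers, halted). Customers beyond the listed
    waves are left untouched.
    """
    updated = []
    start = 0
    for size in wave_sizes:
        wave = customers[start:start + size]
        if not wave:
            break  # ran out of customers before running out of waves
        for customer in wave:
            deploy(customer)
            updated.append(customer)
        if not all(healthy(c) for c in wave):
            # Halt: every later wave stays on the old version.
            return updated, True
        start += size
    return updated, False


# Usage with stand-in callbacks: customer "c2" fails its health check.
customers = [f"c{i}" for i in range(10)]
deployed = []
updated, halted = canary_rollout(
    customers, [1, 2, 7], deployed.append, lambda c: c != "c2"
)
print(updated, halted)
```

Starting with a wave of one keeps the blast radius of a bad release to a single customer. Feature flags complement this by letting a feature be switched off without another deployment.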

Speakers

Bob Walker

Field CTO, Octopus Deploy

Bob Walker is a Field CTO at Octopus Deploy. Bob started as a developer in the early days of .NET when web forms were the hottest new thing, and manual deployments were the norm. After one too many five-hour 2 AM Saturday deployments, he searched for any automation to stop that pain...

Platform Engineering

Content Experience Level Intermediate

16:45 BST

How Do You Measure Developer Productivity? - Jennifer Riggins, The New Stack; Cat Morris, Syntasso; Akshaya Aradhya, Oscilar; Laura Tacho, DX; Helen Greul, Multiverse.io

Thursday April 3, 2025 16:45 - 17:15 BST

Level 0 | ICC Capital Hall | Room 2

Engineering is a science, so we know we can't improve what we don't measure. But many ways of measuring developer productivity focus too much on output, and aren’t trusted by developers.
So how should we measure developer productivity, and quantify the impact of processes, tools, Gen AI and culture on the developer experience (DevEx)?
Then, how do you take this data and turn it into something that's actionable and effective? Should we collect quantitative or qualitative measurements? What about business impact? Cognitive load? Is there a way to measure the maturity of your platform strategy?
Join this panel to learn how from those who have been working with a Platform-as-a-Product mindset for years now. Join Multiverse's (ex-Backstage) Helen Greul, Oscilar’s (ex-GitHub, Netflix) Akshaya Aradhya, DX's Laura Tacho and Syntasso's Cat Morris in this epic panel hosted by The New Stack's Jennifer Riggins.

Speakers

Jennifer Riggins

Technology Journalist, The New Stack

Jennifer Riggins is a tech storyteller, journalist, writer, and event and podcast host, helping to share the stories where culture and technology collide and to translate the impact of the tech we are building. She has been a working writer since 2003, and is currently based in L...

Cat Morris

Staff Product Manager, Syntasso

Cat is the Product Manager at Syntasso delivering Kratix, an open-source cloud-native framework for building internal platforms. She has worked in tech for over 10 years, the last 6 of them in Platform Engineering across all kinds of domains. She specialises in bringing Product...

Helen Greul

VP Engineering at Multiverse, Multiverse.io

Helen is an engineering leader, speaker and a strong advocate for creating developer ecosystems that empower teams to thrive. Her journey has taken her from hands-on coding to steering engineering and platform teams, providing her with a holistic perspective on the challenges and...

Akshaya Aradhya

VP of Engineering, Oscilar

Akshaya is a seasoned engineering executive with deep technical knowledge of data, cloud, platform, machine learning, AI and infrastructure. Prior to joining Oscilar, she worked at companies like GitHub, Netflix, LiveRamp and Intuit. She is passionate about building high performing...

Laura Tacho

CTO, DX

Laura Tacho is CTO at DX, a developer intelligence platform. She previously led teams at companies like CloudBees, Aula Education, and Nova Credit, and is a Docker Captain alumni.

Platform Engineering

Content Experience Level Any

17:30 BST

Cloudy With a Chance of Kubernetes: Going From One To Three Cloud Providers - Laurent Bernaille & Maxime Visonneau, Datadog

Thursday April 3, 2025 17:30 - 18:00 BST

Level 0 | ICC Capital Hall | Room 2

Over the past five years, Datadog expanded from operating in a single region to six regions across three cloud providers. Kubernetes facilitated this expansion by abstracting the differences between cloud environments. However, we encountered several interesting challenges as some implementation details leaked through the abstraction.

This talk will begin with our rationale for adopting a multi-cloud strategy and the constraints it introduced. We will then share our insights on leveraging Kubernetes, the disparities among cloud provider implementations, and how these inconsistencies sometimes breached the Kubernetes abstraction. Finally, we will discuss how our platform teams created additional abstractions hiding most of these differences and the few remaining details that we have to expose to teams deploying on our platform.

Speakers

Maxime Visonneau

Engineering Manager, Datadog

Maxime is an experienced systems and software engineer known for his passion for building robust infrastructures for small to large businesses. Having successfully led his startup to acquisition by Twitter in 2021, he is currently leading teams in charge of the Kubernetes platform...

Laurent Bernaille

Principal Engineer, Datadog

Laurent Bernaille worked several years as a consultant specializing in cloud, containers, and automation and helped organizations migrate to the public cloud and adopt containers. He is now Principal Engineer at Datadog and works closely with infrastructure teams, which are responsible...

Platform Engineering

Content Experience Level Any

11:00 BST

Kubernetes and AI To Protect Our Forests: A Cloud Native Infrastructure for Wildfire Prevention - Andrea Giardini, Crossover Engineering BV

Friday April 4, 2025 11:00 - 11:30 BST

Level 0 | ICC Capital Hall | Room 2

As wildfires become increasingly devastating due to climate change, leveraging technology for environmental protection is crucial. This talk focuses on the infrastructure needed to support AI-driven wildfire prevention systems using Kubernetes and cloud-native technologies. We will discuss the challenges of managing robust data pipelines for processing satellite imagery and environmental data, emphasizing the importance of GPU acceleration for AI. Additionally, we will explore strategies for efficient storage solutions to handle large datasets, ensuring scalability and performance. Attendees will gain insights into the architectural considerations and operational challenges of deploying an effective, resilient wildfire monitoring and prevention infrastructure. Join us in understanding how we can harness the power of technology to protect our forests and mitigate the impact of wildfires on our environment.

Speakers

Andrea Giardini

Cloud Native Consultant / Trainer, Crossover Engineering

Andrea is a technical consultant passionate about infrastructure, cloud, and automation. Throughout his career, he has worked in different roles, from an individual contributor building infrastructure as code to an engineering manager growing a team from the ground up. He likes...

AI + ML

Content Experience Level Intermediate

11:45 BST

Kubernetes Meets Climate Science: Building Large-scale Feature Detection From Climate Data Records - Armagan Karatosun & Roope Tervo, European Organisation for the Exploitation of Meteorological Satellites

Friday April 4, 2025 11:45 - 12:15 BST

Level 0 | ICC Capital Hall | Room 2

The exponential growth of Earth Observation (EO) data volumes in the past decade has made downloading and processing EO data locally impractical. In response, the European public space sector launched initiatives to provide private cloud infrastructure, like the European Weather Cloud (EWC), allowing users to provision computing resources close to the data.

Leveraging these new possibilities introduced by cloud services and machine learning, the hydro-meteorological community has initiated projects to identify features from remote sensing data, including satellite imagery, to enhance early weather warnings and climate science. EUMETSAT and its Member States are now developing a collaborative environment within EWC for manual annotation, model development, and analyses to provide reliable feature identification from EO data.

Join us in our session to learn more about our solution, involving an environment for data preparation, community annotation tools, and a features database.

Speakers

Roope Tervo

European Weather Cloud service coordinator, EUMETSAT

Software professional with special interests in clouds, AI, ML, Open Data, APIs, team management, architecture and spatial services.

Armagan Karatosun

Cloud Data Services Expert, EUMETSAT (European Organisation for the Exploitation of Meteorological Satellites)

Armagan Karatosun (he/him) holds an MSc in High-Performance Computing from Istanbul Technical University and has 6+ years of industry experience. As a Cloud Data Services Expert at EUMETSAT, he specializes in crafting cloud-based solutions. His focus is on creating resilient and event-driven...

AI + ML

Content Experience Level Intermediate

13:45 BST

Optimizing Model Serving on Kubernetes With Model Streaming - Ekin Karabulut & Ronen Dar, Run:ai

Friday April 4, 2025 13:45 - 14:15 BST

Level 0 | ICC Capital Hall | Room 2

Deploying large language models in Kubernetes environments faces a critical challenge: the cold start problem. When auto-scaling workloads with tools like Knative, the latency from loading large model weights into GPU memory slows response times, degrades performance, and increases costs. Traditional methods rely on loading weights sequentially into CPU memory then to the GPU, which is slow and inefficient. This talk introduces Run:ai Model Streamer, an open-source tool that mitigates cold starts by streaming model weights to GPU memory while reading them from storage in parallel. It integrates seamlessly into inference engine containers and Kubernetes workflows, enabling parallelized weight streaming without modifying weight formats, making it an easy-to-adopt solution for Kubernetes-based AI deployments. We’ll share benchmarking results comparing storage backends like GP3 SSDs, IO2 SSDs, and S3, highlighting performance improvements, cost savings, and best practices from these experiments.
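
To make the contrast with sequential loading concrete, here is a generic producer-consumer sketch of the overlap the abstract describes: several reader threads pull chunks from storage while the main thread hands finished chunks onward. It is not the Run:ai Model Streamer API or implementation; `read_chunk` and `copy_to_device` are hypothetical stand-ins for a storage read and a host-to-GPU copy.

```python
import queue
import threading


def stream_weights(read_chunk, num_chunks, copy_to_device, readers=4, buffer_size=8):
    """Read chunks concurrently and pass each to copy_to_device on arrival."""
    chunks = queue.Queue(maxsize=buffer_size)  # bounded CPU-side buffer
    indices = queue.Queue()
    for i in range(num_chunks):
        indices.put(i)

    def reader():
        # Each reader claims the next unread chunk index until none remain.
        while True:
            try:
                i = indices.get_nowait()
            except queue.Empty:
                return
            chunks.put((i, read_chunk(i)))  # blocks while the buffer is full

    threads = [threading.Thread(target=reader) for _ in range(readers)]
    for t in threads:
        t.start()
    # Consume as chunks arrive (not necessarily in index order), so copying
    # overlaps with the storage reads still in flight.
    for _ in range(num_chunks):
        i, data = chunks.get()
        copy_to_device(i, data)
    for t in threads:
        t.join()


# Usage with in-memory stand-ins for storage and the device.
device = {}
stream_weights(lambda i: bytes([i]) * 4, 16, lambda i, d: device.__setitem__(i, d))
print(len(device))
```

The bounded queue caps how much weight data sits in CPU memory at once, which is the main design difference from loading the whole file before starting the transfer. Error handling for failed reads is omitted to keep the sketch short.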

Speakers

Ekin Karabulut

Data Scientist & Developer Advocate, Run:ai

Ekin is a data scientist at Run:ai. She specialized in the privacy implications of federated learning with DNNs. Through her journey, she focused on distributed training techniques and observed inefficiencies in GPU usage both in research and industry settings. She thus established...

Ronen Dar

CTO and Co-Founder, Run:ai

Ronen Dar, PhD, is the co-founder and CTO of Run:ai. Ronen has been responsible for building the Run:ai Atlas platform and the technology that powers the platform, from GPU API-level virtualization to advanced K8s-based scheduling capabilities.

AI + ML

Content Experience Level Any

14:30 BST

Unlocking How To Efficiently, Flexibly, Manage and Schedule Seven AI Chips in Kubernetes - Xiao Zhang, DaoCloud & Mengxuan Li, The 4th paradigm, Ltd

Friday April 4, 2025 14:30 - 15:00 BST

Level 0 | ICC Capital Hall | Room 2

More and more AI accelerator manufacturers have emerged in recent years. Data centers often face scenarios where AI accelerators from multiple vendors, such as Nvidia, AMD, and Intel, coexist.
Managing these heterogeneous devices therefore poses bigger challenges. The CNCF sandbox project HAMi (Heterogeneous AI Computing Virtualization Middleware) was created for this purpose.
This session will focus on efficiently managing heterogeneous AI chips through HAMi in Kubernetes clusters:
* A unified scheduler that is topology-aware and NUMA-aware, and supports binpack and spread scheduling policies on 7 AI accelerators
* Virtualization on 6 AI accelerators
* Task priority
* Memory oversubscription on k8s GPU tasks
* Observability in two dimensions: allocated resources and real usage
* HAMi+Volcano/Koordinator for collaborative orchestration and scheduling capabilities on AI batch tasks
* HAMi+Kueue for practice in training and inference scenarios
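
For readers unfamiliar with the two scheduling policies named in the list above, here is a generic sketch of the difference. It is not HAMi's scheduler; the function, the node names, and the single-number capacity model are assumptions made for illustration. Binpack prefers the fullest node that still fits the request, while spread prefers the emptiest.

```python
def pick_node(nodes, request, policy="binpack"):
    """Choose a node for a request of `request` device-memory units.

    nodes maps a node name to (allocated, capacity). Returns the chosen
    node name, or None if no node has enough free capacity.
    """
    fitting = {
        name: (alloc, cap)
        for name, (alloc, cap) in nodes.items()
        if cap - alloc >= request
    }
    if not fitting:
        return None

    def utilization(name):
        alloc, cap = fitting[name]
        return alloc / cap

    names = sorted(fitting)  # sorted for a deterministic tie-break
    if policy == "binpack":
        return max(names, key=utilization)  # fill busy nodes first
    if policy == "spread":
        return min(names, key=utilization)  # balance load across nodes
    raise ValueError(f"unknown policy: {policy}")


# Hypothetical cluster: node "c" is full, "a" is mostly used, "b" mostly free.
nodes = {"a": (6, 8), "b": (2, 8), "c": (8, 8)}
print(pick_node(nodes, 2, "binpack"), pick_node(nodes, 2, "spread"))
```

Binpack tends to leave whole devices free for large jobs, while spread reduces contention between neighbours. Which one suits a cluster depends on the workload mix.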

Speakers

Xiao Zhang

Senior Software Engineer, DaoCloud

Xiao Zhang is the leader of the Container team (focus on infra, AI, Multi-Cluster, Cluster - LCM, OCI). He is also an active community contributor and cloud native enthusiast. He is currently a member of Kubernetes / Kubernetes-sigs, maintainer of Karmada, kubean, HAMi, and cloudtty...

Mengxuan Li

System Architect, The 4th paradigm, Ltd

Member of the Volcano community; founder and maintainer of the CNCF sandbox project HAMi. Responsible for the development of the GPU virtualization mechanism on Volcano. It has been merged into the master branch of Volcano and will be released in v1.8.

AI + ML

Content Experience Level Any

15:15 BST

How To Supercharge AI/ML Observability With OpenTelemetry and Fluent Bit - Celalettin Calis, Chronosphere

Friday April 4, 2025 15:15 - 15:45 BST

Level 0 | ICC Capital Hall | Room 2

Keeping AI/ML models performant and reliable in production is no small task—especially when running on Kubernetes. Effective monitoring and observability are key to ensuring these systems deliver results at scale.

This session explores how to build an advanced open source observability stack tailored for AI/ML workloads using Fluent Bit and OpenTelemetry. We’ll cover:

- Logging and debugging popular models like GPT, BERT, and custom LLMs.
- Tracking prompts and their results to gain actionable insights.
- Monitoring agent performance in production environments.

Complementing OpenTelemetry’s robust tracing and error stack trace capabilities with Fluent Bit’s resource-efficient log processing, live tail, and metrics scraping creates a comprehensive observability solution tailored for AI/ML workloads. If you’re an AI/ML practitioner working with Kubernetes, this talk will equip you with the strategies and tools you need to enhance your system’s reliability and performance.

Speakers

Celalettin Calis

Member of Technical Staff, Chronosphere

Celalettin Calis is a Member of Technical Staff at Chronosphere. His career includes significant roles at Calyptia and SAP, where he focused on Kubernetes platform engineering, developing CI/CD pipelines, and managing containerized environments. As a cloud-native expert, he has extensive...

AI + ML

Content Experience Level Intermediate