
Scaling AI Reliability: Real-world lessons from Mistral AI

Learn how one of the world's leading AI companies monitors its infrastructure, manages incidents, and prepares for a future where agents respond to pages before humans do.

January 26, 2026


When you're running AI infrastructure at scale, reliability isn't optional; it's existential. In this session, Devon Mizelle, Senior Site Reliability Engineer at Mistral AI, pulls back the curtain on how his team ensures uptime and performance across a rapidly evolving model ecosystem.

Devon shares the real story behind Mistral's transition to a fully automated monitoring and alerting workflow, including how they dynamically generate synthetic checks for every model the moment it goes live. No manual configuration. No forgotten monitors. No inconsistent alerting thresholds.

Beyond the technical implementation, this conversation explores where observability is headed in the AI era. What happens when an agent gets paged before a human? How close are we to truly self-healing systems? And what does this mean for SREs who want to stay relevant?

What you'll learn:

  • How Mistral AI automatically generates and maintains monitoring for every model in production
  • The monitoring-as-code approach that eliminated manual check configuration
  • Why consistent alerting thresholds matter, and how to enforce them at scale
  • Real talk on AI SRE: when agents can (and can't) resolve incidents autonomously
  • The future of on-call: from pager fatigue to pager-to-agent workflows

Who you'll hear from:

Sylvain Kalache

Head of AI Labs

Rootly AI

Giovanni Rago

Head of Customer Solutions

Checkly

Devon Mizelle

Senior Site Reliability Engineer

Mistral AI