Mistral AI Paris, TX 75460
Posted 1 week ago
Mistral AI is looking for a Site Reliability Engineer (SRE) to shape the reliability, scalability, and performance of our platform and customer-facing applications. You will work closely with our software engineers to ensure our systems meet and exceed our customers' expectations.
Responsibilities
Make sure our inference and platform resources are always available and in good shape
Ensure our products are reliable and meet their SLAs
Design, build, and maintain scalable, highly available, and fault-tolerant standard and AI infrastructure to support our machine learning workloads and services
Implement and improve monitoring, alerting, and incident response systems to ensure optimal system performance and minimize downtime
Develop and maintain comprehensive documentation for infrastructure designs, processes, and best practices
Participate in on-call rotations to respond to incidents and perform root cause analysis to prevent future occurrences
Drive continuous improvement in infrastructure automation, deployment, and orchestration using tools like Kubernetes, Flux, Terraform, …
Collaborate with the security team to ensure infrastructure adheres to best security practices and compliance requirements
Evaluate and implement new tools, technologies, and processes to enhance our AI infrastructure's efficiency, reliability, and scalability
About you:
5+ years of experience in software engineering
Key technical skills: observability/alerting/operational maintenance
Familiar with bare Kubernetes/Grafana/Prometheus
Experience building cross-datacenter, highly available distributed systems
Experience profiling & optimizing stacks to the millisecond
Good programming skills in one language (Python/Go/C++/Rust)
Master's degree in Computer Science, Engineering, or a related field, or equivalent experience.
Proven experience as a Site Reliability Engineer, DevOps Engineer, or similar role, ideally in an AI/ML-focused environment.
Strong understanding of AI/ML infrastructure requirements
Experience with containerization and orchestration technologies like Docker and Kubernetes.
Familiarity with infrastructure-as-code tools such as Terraform
Solid understanding of cloud computing platforms like AWS, GCP, or Azure.
Experience with monitoring, logging, and alerting tools like Prometheus, Grafana, ELK Stack, …
Strong problem-solving skills and the ability to work independently and collaboratively in a fast-paced environment.
Excellent communication skills, both written and verbal.
What We Offer:
Ability to shape the exciting journey of AI and be part of the very early days of one of Europe's hottest startups
A fun, young, multicultural team and collaborative work environment - based in Paris and London
Competitive salary and bonus structure
Comprehensive benefits package
Opportunities for professional growth and development
We're a small team, composed of seasoned researchers and engineers in the AI field. We like to work hard and be at the edge of science. We are creative, low-ego, team-spirited, and have been passionate about AI for years. We hire people who thrive in competitive environments, because they find them more fun to work in. We hire passionate women and men from all over the world.
Developers are using our API via la Plateforme to build incredible AI-first applications powered by our models that can understand and generate natural language text and code. We are multilingual at our core. More recently, we released le Chat as a demonstrator of our models.