DevOps Engineer

  • Mistral AI
  • Paris
  • Full-Time
  • Posted 1 month ago

Job Description

We are seeking our first DevOps Engineer.


Responsibilities
- Collaborate with AI/ML engineers and researchers to develop and implement a CI/CD pipeline that enables safe and reproducible experiments
- Enable seamless replication of work environments across several HPC clusters
- Implement and maintain monitoring, logging and alerting systems for both our large training runs and our client-facing APIs
- Make sure training environments are always available and ready on several clusters
- Improve development processes while finding the right balance between rigor, speed and flexibility for a software development and research organization
- Develop and own internal tooling
- Collaborate with our AI/ML engineers and data scientists to build and maintain a secure, scalable, and efficient infrastructure.
- Develop and implement CI/CD pipelines to streamline the evaluation and development of AI/ML models and other applications.
- Ensure compliance with security best practices and industry standards.
- Work closely with the development team to troubleshoot and resolve issues in production environments.
- Develop and maintain containerization and orchestration systems using tools like Docker and Kubernetes.
- Document processes and procedures to ensure consistency and knowledge sharing across the team


About you: 
- Master’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
- 3+ years of experience in a DevOps role, preferably in an AI/ML-focused environment.
- Strong experience with Kubernetes-based cloud computing
- Proficiency in scripting languages such as Python, Bash, or PowerShell.
- Experience with CI/CD tools like Jenkins, GitLab CI, or CircleCI.
- Experience with containerization and orchestration technologies such as Docker and Kubernetes.
- Strong knowledge of good Python development practices
- Prior experience working with GPUs is a plus but not required
- Familiarity with infrastructure-as-code tools like Terraform or CloudFormation.
- Knowledge of monitoring, logging, and alerting tools like Prometheus, Grafana, ELK Stack, or Datadog.
- Ideally, you have experience with Slurm
- Strong understanding of networking, security, and system administration concepts.
- Excellent problem-solving and communication skills.
- Self-motivated and able to work well in a fast-paced startup environment.


What We Offer: 
- Ability to shape the exciting journey of AI and be part of the very early days of one of Europe’s hottest startups
- A fun, young, multicultural team and collaborative work environment, based in Paris and London
- Competitive salary and bonus structure 
- Comprehensive benefits package 
- Opportunities for professional growth and development