Our client:
Our client is a technology-focused company building high-performance, real-time ML inference systems. The team develops ultra-low-latency engines that process billions of requests per day, integrating ML models with business-critical decision-making pipelines. They are looking for an experienced backend engineer to own and scale production-grade ML services, with a strong focus on latency, reliability, and observability.
Your responsibilities:
- Lead the design and development of low-latency ML inference services handling massive request volumes.
- Build and scale real-time decision-making engines, integrating ML models with business logic under strict SLAs.
- Collaborate closely with data scientists to deploy ML models seamlessly and reliably in production.
- Design systems for model versioning, shadowing, and A/B testing at runtime.
- Ensure high availability, scalability, and observability of production systems.
- Continuously optimize latency, throughput, and cost-efficiency using modern tools and techniques.
- Work independently while collaborating with cross-functional teams including Algo, Infrastructure, Product, Engineering, and Business stakeholders.
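A core technique behind the low-latency, high-volume serving work described above is dynamic micro-batching: requests that arrive within a short window are grouped so the model runs once per batch instead of once per request. The sketch below is illustrative only and uses pure Python with asyncio; the `MicroBatcher` class, the toy model function, and the batch/window parameters are assumptions, not the client's actual stack.

```python
import asyncio

# Hypothetical dynamic micro-batcher: requests arriving within a short
# window are grouped into one batch, trading a few milliseconds of
# latency for much higher throughput. All names here are illustrative.

class MicroBatcher:
    def __init__(self, model_fn, max_batch=32, max_wait_ms=2.0):
        self.model_fn = model_fn            # batch of inputs -> list of outputs
        self.max_batch = max_batch          # flush when the batch is full...
        self.max_wait = max_wait_ms / 1000  # ...or when the window closes
        self.queue = asyncio.Queue()
        self._worker = None

    async def start(self):
        self._worker = asyncio.create_task(self._run())

    async def predict(self, features):
        # Each caller gets a future resolved when its batch is processed.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((features, fut))
        return await fut

    async def _run(self):
        while True:
            batch = [await self.queue.get()]
            deadline = asyncio.get_running_loop().time() + self.max_wait
            # Collect more requests until the batch is full or time runs out.
            while len(batch) < self.max_batch:
                timeout = deadline - asyncio.get_running_loop().time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            inputs = [features for features, _ in batch]
            outputs = self.model_fn(inputs)  # single model call per batch
            for (_, fut), out in zip(batch, outputs):
                fut.set_result(out)

async def main():
    # Toy "model" that doubles every input; stands in for real inference.
    batcher = MicroBatcher(lambda xs: [x * 2 for x in xs])
    await batcher.start()
    return await asyncio.gather(*(batcher.predict(i) for i in range(8)))

results = asyncio.run(main())
```

Frameworks named in this posting (Triton Inference Server, TorchServe, BentoML) ship batching of this kind built in; the sketch only shows the underlying idea.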
Required experience and skills:
- B.Sc. or M.Sc. in Computer Science, Software Engineering, or related technical field.
- 5+ years of experience building high-performance backend or ML inference systems.
- Expert-level Python skills and experience with low-latency APIs and real-time serving frameworks (e.g., FastAPI, Triton Inference Server, TorchServe, BentoML).
- Experience with scalable service architectures, message queues (Kafka, Pub/Sub), and asynchronous processing.
- Strong understanding of model deployment, online/offline feature parity, and real-time monitoring.
- Experience with cloud environments (AWS, GCP, OCI) and container orchestration (Kubernetes).
- Familiarity with in-memory and NoSQL databases (Aerospike, Redis, Bigtable) for ultra-fast data access.
- Experience with observability stacks (Prometheus, Grafana, OpenTelemetry) and alerting/diagnostics best practices.
- Strong ownership mindset and ability to deliver solutions end-to-end.
- Passion for performance, clean architecture, and impactful systems.
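Runtime model versioning and A/B testing, mentioned in the responsibilities above, usually rest on deterministic traffic splitting: the same request key always routes to the same model version. A minimal, illustrative sketch using stable hashing follows; the version names and the 90/10 split are assumptions for the example.

```python
import hashlib

# Hypothetical deterministic A/B router: each request key (e.g. a user id)
# is hashed into [0, 1) and mapped to a model version, so the same user
# always sees the same variant. Names and ratios are illustrative.

def bucket(key: str) -> float:
    """Map a key to a stable value in [0, 1)."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def route(key: str, splits: dict) -> str:
    """Pick a model version according to cumulative traffic shares."""
    b = bucket(key)
    cumulative = 0.0
    for version, share in splits.items():
        cumulative += share
        if b < cumulative:
            return version
    return next(reversed(splits))  # guard against floating-point rounding

splits = {"model_v1": 0.9, "model_v2": 0.1}  # 90/10 canary split
assignment = route("user-42", splits)
```

Because the hash depends only on the key, assignments are sticky across requests and servers without any shared state; shadowing works the same way, except the secondary model's output is logged rather than returned.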
Nice to have:
- Prior experience leading high-throughput, low-latency ML systems in production.
- Knowledge of real-time feature pipelines and streaming data platforms.
- Familiarity with advanced monitoring and profiling techniques for ML services.
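For context on the real-time feature pipeline point above: an online feature store typically maintains values such as per-key event counts over a sliding time window. The sketch below is a pure-stdlib illustration; the class and feature names are assumptions, not a specific platform's API.

```python
from collections import defaultdict, deque

# Hypothetical streaming feature: per-key event count over a sliding
# time window, the kind of value a real-time feature pipeline keeps
# fresh for online inference.

class SlidingWindowCount:
    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.events = defaultdict(deque)  # key -> timestamps inside the window

    def update(self, key: str, ts: float) -> int:
        """Record an event at time ts and return the current windowed count."""
        q = self.events[key]
        q.append(ts)
        # Evict timestamps that have fallen out of the window.
        while q and q[0] <= ts - self.window:
            q.popleft()
        return len(q)

feature = SlidingWindowCount(window_seconds=60)
counts = [feature.update("user-42", t) for t in (0, 10, 30, 65, 120)]
```

Keeping the same eviction logic in both the streaming path and the offline training path is one way to preserve the online/offline feature parity this posting calls for.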
Working conditions:
Five-day work week, eight-hour workday, flexible schedule;