Our client:
Our client is a technology-focused company building high-performance, real-time ML inference systems. The team develops ultra-low-latency engines that process billions of requests per day, integrating ML models with business-critical decision-making pipelines. They are looking for an experienced backend engineer to own and scale production-grade ML services, with a strong focus on latency, reliability, and observability.
Your responsibilities:
- Lead the design and development of low-latency ML inference services handling massive request volumes.
- Build and scale real-time decision-making engines, integrating ML models with business logic under strict SLAs.
- Collaborate closely with data scientists to deploy ML models seamlessly and reliably in production.
- Design systems for model versioning, shadowing, and A/B testing at runtime.
- Ensure high availability, scalability, and observability of production systems.
- Continuously optimize latency, throughput, and cost-efficiency using modern tools and techniques.
- Work independently while collaborating with cross-functional teams including Algo, Infrastructure, Product, Engineering, and Business stakeholders.
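A core technique behind the low-latency, high-volume serving work described above is dynamic micro-batching: requests that arrive within a short window are grouped so the model runs once per batch instead of once per request. The sketch below is illustrative only and uses pure Python with asyncio; the `MicroBatcher` class, the toy model function, and the batch/window parameters are assumptions, not the client's actual stack.

```python
import asyncio

# Hypothetical dynamic micro-batcher: requests arriving within a short
# window are grouped into one batch, trading a few milliseconds of
# latency for much higher throughput. All names here are illustrative.

class MicroBatcher:
    def __init__(self, model_fn, max_batch=32, max_wait_ms=2.0):
        self.model_fn = model_fn            # batch of inputs -> list of outputs
        self.max_batch = max_batch          # flush when the batch is full...
        self.max_wait = max_wait_ms / 1000  # ...or when the window closes
        self.queue = asyncio.Queue()
        self._worker = None

    async def start(self):
        self._worker = asyncio.create_task(self._run())

    async def predict(self, features):
        # Each caller gets a future resolved when its batch is processed.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((features, fut))
        return await fut

    async def _run(self):
        while True:
            batch = [await self.queue.get()]
            deadline = asyncio.get_running_loop().time() + self.max_wait
            # Collect more requests until the batch is full or time runs out.
            while len(batch) < self.max_batch:
                timeout = deadline - asyncio.get_running_loop().time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            inputs = [features for features, _ in batch]
            outputs = self.model_fn(inputs)  # single model call per batch
            for (_, fut), out in zip(batch, outputs):
                fut.set_result(out)

async def main():
    # Toy "model" that doubles every input; stands in for real inference.
    batcher = MicroBatcher(lambda xs: [x * 2 for x in xs])
    await batcher.start()
    return await asyncio.gather(*(batcher.predict(i) for i in range(8)))

results = asyncio.run(main())
```

Frameworks named in this posting (Triton Inference Server, TorchServe, BentoML) ship batching of this kind built in; the sketch only shows the underlying idea.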
Required experience and skills:
- B.Sc. or M.Sc. in Computer Science, Software Engineering, or related technical field.
- 5+ years of experience building high-performance backend or ML inference systems.
- Expert-level Python skills and experience with low-latency APIs and real-time serving frameworks (e.g., FastAPI, Triton Inference Server, TorchServe, BentoML).
- Experience with scalable service architectures, message queues (Kafka, Pub/Sub), and asynchronous processing.
- Strong understanding of model deployment, online/offline feature parity, and real-time monitoring.
- Experience with cloud environments (AWS, GCP, OCI) and container orchestration (Kubernetes).
- Familiarity with in-memory and NoSQL databases (Aerospike, Redis, Bigtable) for ultra-fast data access.
- Experience with observability stacks (Prometheus, Grafana, OpenTelemetry) and alerting/diagnostics best practices.
- Strong ownership mindset and ability to deliver solutions end-to-end.
- Passion for performance, clean architecture, and impactful systems.
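Runtime model versioning and A/B testing, mentioned in the responsibilities above, usually rest on deterministic traffic splitting: the same request key always routes to the same model version. A minimal, illustrative sketch using stable hashing follows; the version names and the 90/10 split are assumptions for the example.

```python
import hashlib

# Hypothetical deterministic A/B router: each request key (e.g. a user id)
# is hashed into [0, 1) and mapped to a model version, so the same user
# always sees the same variant. Names and ratios are illustrative.

def bucket(key: str) -> float:
    """Map a key to a stable value in [0, 1)."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def route(key: str, splits: dict) -> str:
    """Pick a model version according to cumulative traffic shares."""
    b = bucket(key)
    cumulative = 0.0
    for version, share in splits.items():
        cumulative += share
        if b < cumulative:
            return version
    return next(reversed(splits))  # guard against floating-point rounding

splits = {"model_v1": 0.9, "model_v2": 0.1}  # 90/10 canary split
assignment = route("user-42", splits)
```

Because the hash depends only on the key, assignments are sticky across requests and servers without any shared state; shadowing works the same way, except the secondary model's output is logged rather than returned.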
Nice to have:
- Prior experience leading high-throughput, low-latency ML systems in production.
- Knowledge of real-time feature pipelines and streaming data platforms.
- Familiarity with advanced monitoring and profiling techniques for ML services.
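For context on the real-time feature pipeline point above: an online feature store typically maintains values such as per-key event counts over a sliding time window. The sketch below is a pure-stdlib illustration; the class and feature names are assumptions, not a specific platform's API.

```python
from collections import defaultdict, deque

# Hypothetical streaming feature: per-key event count over a sliding
# time window, the kind of value a real-time feature pipeline keeps
# fresh for online inference.

class SlidingWindowCount:
    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.events = defaultdict(deque)  # key -> timestamps inside the window

    def update(self, key: str, ts: float) -> int:
        """Record an event at time ts and return the current windowed count."""
        q = self.events[key]
        q.append(ts)
        # Evict timestamps that have fallen out of the window.
        while q and q[0] <= ts - self.window:
            q.popleft()
        return len(q)

feature = SlidingWindowCount(window_seconds=60)
counts = [feature.update("user-42", t) for t in (0, 10, 30, 65, 120)]
```

Keeping the same eviction logic in both the streaming path and the offline training path is one way to preserve the online/offline feature parity this posting calls for.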
Working conditions:
Five-day work week, eight-hour workday, flexible schedule;