Senior ML Ops Engineer
Mappa
Operations, Software Engineering, Data Science
Latin America
USD 2k / month
Posted on Sep 18, 2025
- Full-Time
- Remote
- 2,000 USD
Languages requested
English (advanced)
Job description
Own model serving for multiple LLM/speech models on Modal. Build and maintain the APIs around those models. Create the feedback/eval loop to improve quality while meeting strict latency/cost SLOs.
Responsibilities
- Host and scale real-time & batch inference on Modal (autoscaling, images/volumes/secrets).
- Operate a multi-model fleet (versioning, routing, canaries/blue-green, traffic shaping).
- Ship endpoints with auth, RBAC, quotas, rate limits, and telemetry.
- Implement feedback pipelines, online A/B evals, and guardrails with actionable alerts.
- Drive performance: profiling, batching, quantization, KV-cache, runtime tuning.
- Establish observability and reliability (OTel, metrics/logs, SLOs, runbooks, on-call).
- Maintain CI/CD and IaC for reproducible builds and one-click rollbacks.
Must-haves
- 5+ years in ML Ops/Platform/SRE with production LLM/ML serving.
- Strong Python, including high-throughput async APIs (FastAPI/Starlette), and GitHub-based CI/CD.
- Deep experience with vLLM, TensorRT-LLM, Triton, or ONNX Runtime.
- Hands-on experience with Modal or an equivalent GPU/k8s platform.
- Solid observability practice (OTel) and experience with incident response and postmortems.
Preferred
- ONNX export expertise (PyTorch→ONNX), quantized/dynamic graphs, custom ops.
- Safety/guardrails and constrained decoding.
- Systems perf (CUDA/Triton kernels) or Rust for hot paths; load/chaos testing.