Senior ML Ops Engineer

Mappa

Operations, Software Engineering, Data Science
Latin America
USD 2k / month
Posted on Sep 18, 2025

  • Full-Time
  • Remote
  • 2,000 USD / month

Languages requested

Advanced English

Job description

Own model serving for multiple LLM/speech models on Modal. Build and maintain the APIs around those models. Create the feedback/eval loop to improve quality while meeting strict latency/cost SLOs.

Responsibilities

  • Host and scale real-time & batch inference on Modal (autoscaling, images/volumes/secrets).
  • Operate a multi-model fleet (versioning, routing, canaries/blue-green, traffic shaping).
  • Ship endpoints with auth, RBAC, quotas, rate limits, and telemetry.
  • Implement feedback pipelines, online A/B evals, and guardrails with actionable alerts.
  • Drive performance: profiling, batching, quantization, KV-cache, runtime tuning.
  • Establish observability and reliability (OTel, metrics/logs, SLOs, runbooks, on-call).
  • CI/CD and IaC for reproducible builds and one-click rollbacks.

Must-haves

  • 5+ years in ML Ops/Platform/SRE with production LLM/ML serving.
  • Strong Python; high-throughput async APIs (FastAPI/Starlette) and GitHub-based CI/CD.
  • Deep experience with vLLM, TensorRT-LLM, Triton, or ONNX Runtime.
  • Hands-on with Modal or equivalent GPU/k8s platform.
  • Solid observability (OTel) and incident response/postmortems.

Preferred

  • ONNX export expertise (PyTorch→ONNX), quantized/dynamic graphs, custom ops.
  • Safety/guardrails and constrained decoding.
  • Systems perf (CUDA/Triton kernels) or Rust for hot paths; load/chaos testing.