ML Engineer - Voice Pipeline & Data Architecture
Software Engineering, IT, Data Science
Latin America
Job Description
- Full-Time
- Remote
About the role
We are looking for a Senior Machine Learning Engineer to own the evolution of our voice ML system end to end.
We have a working voice model in production. The next step is to make the system more scalable, measurable, efficient, and easier to evolve. This role will own the connection between training data, model development, inference performance, and production architecture.
You will be responsible for improving how we collect and version training data, how we evaluate and ship models, how we optimize inference cost and speed, and how we simplify the overall ML pipeline.
This is a hands-on role for someone who has built production ML systems, understands audio and voice pipelines, and can make architecture decisions across data, modeling, and serving.
What you’ll do
- Own the training data pipeline end to end: collection, validation, labeling, versioning, and dataset creation.
- Build systems that allow training data to scale as the product and data volume grow.
- Identify which data gaps, quality issues, or labeling problems are limiting model performance.
- Train, evaluate, deploy, and maintain production ML models.
- Compare new models against the current production model and decide when they are ready to ship.
- Define what should be measured across accuracy, latency, cost, and failure modes.
- Run experiments to understand whether improvements should come from better data, model architecture, preprocessing, or serving changes.
- Work with audio and voice data, including speech-to-text outputs, speaker diarization, audio preprocessing, and feature extraction.
- Debug the full voice pipeline, including issues from Deepgram output, audio quality, preprocessing, model behavior, and final output.
- Benchmark and improve inference speed, cost, and reliability.
- Experiment with smaller, faster, or more efficient models using techniques like quantization, pruning, distillation, or architecture changes.
- Own or strongly influence the inference path from Deepgram to model output.
- Simplify the ML infrastructure and reduce unnecessary operational complexity.
- Make the pipeline easier to trace, debug, maintain, and evolve.
- Decide where the system should go next: better datasets, cohort-specific training, specialized models, model optimization, or architecture changes.
What we’re looking for
Must have
- Strong production ML experience: you have trained, evaluated, deployed, and maintained models in production.
- Deep understanding of PyTorch or TensorFlow.
- Strong Python skills for training scripts, data pipelines, experimentation, debugging, and automation.
- Experience designing ML data pipelines for collecting, validating, labeling, versioning, and serving training data.
- Strong understanding of data quality and how bad data affects model performance.
- Experience working with audio, speech, or voice systems.
- Understanding of audio preprocessing, speaker diarization, speech-to-text APIs, and audio feature extraction.
- Ability to debug where signal is being lost across the voice pipeline.
- Experience with model optimization and inference efficiency.
- Hands-on experience with quantization, pruning, distillation, architecture search, or smaller model architectures.
- Ability to benchmark accuracy, latency, cost, and performance tradeoffs.
- SQL skills to query databases and determine what data is available for training and evaluation.
- Ability to think architecturally across the full ML system: data, training, evaluation, serving, cost, and operations.
Nice to have
- Experience with Deepgram or similar speech-to-text providers.
- Experience owning or improving inference pipelines.
- Experience reducing ML infrastructure cost or operational complexity.
- Familiarity with Modal, containerization, async jobs, model serving, and production monitoring.
- Experience with cohort-specific training, specialized models, or model routing.
- Experience simplifying complex ML pipelines.
How you work
- You think in systems, not patches.
- You look for the root cause of a problem instead of only fixing the symptom.
- You prefer simple, traceable pipelines over complex systems that are hard to operate.
- You remove unnecessary steps and reduce moving parts.
- You are comfortable making decisions when the data is incomplete.
- You run experiments to answer questions instead of waiting for perfect requirements.
- You are rigorous about measurement and do not ship models without comparing them to production.
- You care about accuracy, latency, cost, reliability, and failure modes.
- You ask what is actually limiting the system before deciding what to optimize.
- You use AI tools to prototype, debug, explore data, and move faster, while understanding their limitations.
Ideal candidate
The ideal candidate is a senior ML engineer or ML systems engineer with strong experience in production ML, voice or audio processing, data pipeline architecture, and inference optimization.
You should be able to own a working voice model system and make it better across the full stack: better data, faster model iteration, clearer evaluation, cheaper inference, simpler infrastructure, and stronger production reliability.
This role is for someone who can move between model development, data engineering, audio processing, and ML systems architecture, and who can decide what the next technical step should be, not just execute predefined tasks.