Senior Speech & Audio ML Engineer
Mappa
Software Engineering, Data Science
Latina, Province of Latina, Italy
USD 2,500-5,500 / month
Posted on Aug 28, 2025
Location: Remote
Type: Full-time
Salary Range: $2,500–5,500 USD / month
Role Purpose
We are looking for a Senior ML Engineer to build and ship core models for a speech-driven behavioral engine. You will own modeling end to end, from raw long-form audio and layered annotations through to production inference.
Responsibilities include:
- Designing audio features and embeddings.
- Training and evaluating a suite of models.
- Delivering reproducible pipelines that meet targets for accuracy, robustness, latency, and cost.
Non-Negotiables
- Experience: 5+ years building production ML systems, including 2+ years in speech/audio.
- Speech & Signal Processing: VAD, diarization, segmentation, denoising, spectral features (log-mel/MFCC), prosody (pitch/energy), long-form audio handling.
- SOTA Audio Models & Embeddings: Wav2Vec2, HuBERT, WavLM (or similar); fine-tuning/self-supervised learning; contrastive/metric learning for downstream tasks.
- Data Engineering & Quality: SQL, Python data stack (Pandas/Polars), ETL for audio + metadata, stratified sampling, leakage prevention, feature stores.
- Evaluation Discipline: Golden sets, robust speaker/content splits, ROC/PR/calibration, fairness/bias checks, ablations, drift/shift detection on embeddings and audio quality.
- MLOps, Serving & Reproducibility: FastAPI/gRPC around HF/torchaudio models, experiment tracking (W&B/MLflow), artifact/model versioning, CI/CD, observability, scalable batch/streaming inference.
- Proven ability to create and document novel IP (methods, architectures, or training/eval techniques) with clear prior-art awareness.
Nice to Have
- Tooling: SpeechBrain, Lightning, openSMILE/Praat, Kaldi/Conformer/Emformer, Label Studio.
- Multimodal Skills: ASR (e.g., Whisper) + paralinguistic features; emotion/prosody modeling; speaker embeddings (x-vectors, ECAPA-TDNN).
- Performance & Deployment: Quantization/distillation, Triton/CUDA basics, distributed training, real-time/streaming inference, on-device DSP (Rust/C++).
- Publications/Patents/Competitions: Work that demonstrates novel audio modeling.
Details
- Full time
- Payment in USD [5000-5500 USD]
- Remote