Senior Machine Learning Engineer
Mappa
Software Engineering
United States
USD 4,000-5,500 / month
Posted on Sep 6, 2025
Job description
Job Title: Senior Speech & Audio ML Engineer
Location: Remote
Type: Full-time
Salary Range: 4,000 - 5,500 USD / month + commissions
What you will do:
Build and ship core ML models for a speech-driven behavioral engine. Own end-to-end modeling from raw, long-form audio and layered annotations to production inference. Design audio features/embeddings, train and evaluate a suite of models, and deliver reproducible pipelines that meet accuracy, robustness, latency, and cost targets.
Your skill set and experience:
- 5+ years building production ML systems, including 2+ years in speech/audio.
- Speech & signal processing: VAD, diarization, segmentation, denoising, spectral features (log-mel/MFCC), prosody (pitch/energy), long-form audio handling.
- SOTA audio models & embeddings: Wav2Vec2, HuBERT, WavLM (or similar); fine-tuning/self-supervised learning; contrastive/metric learning for downstream tasks.
- Data excellence: SQL, Python data stack (Pandas/Polars), ETL for audio and metadata, stratified sampling, leakage prevention, feature stores.
- ML training: PyTorch, Hugging Face Transformers/Hub, mixed precision, hyperparameter tuning, transfer learning, cross-validation.
- Evaluation discipline: golden sets, robust speaker/content splits, ROC/PR/calibration, fairness/bias checks, ablations, drift/shift detection on embeddings and audio quality.
- MLOps, serving & reproducibility: FastAPI/gRPC around HF/torchaudio models, experiment tracking (W&B/MLflow), artifact/model versioning, CI/CD, observability, scalable batch/streaming inference.
- Proven ability to create and document novel IP (methods, architectures, or training/eval techniques) with clear prior-art awareness.
Nice to have:
- Tooling: SpeechBrain, Lightning, OpenSMILE/Praat, Kaldi/Conformer/Emformer, Label Studio.
- Multimodal: ASR (e.g., Whisper) + paralinguistic features; emotion/prosody modeling; speaker embeddings (x-vectors, ECAPA-TDNN).
- Performance & deployment: quantization/distillation, Triton/CUDA basics, distributed training, real-time/streaming inference, on-device DSP (Rust/C++).
- Publications/patents/competition results demonstrating novel audio modeling work.