Senior Machine Learning Engineer

Mappa

Software Engineering
United States
USD 4,000-5,500 / month
Posted on Sep 6, 2025

Job description

Job Title: Senior Speech & Audio ML Engineer

Location: Remote

Type: Full-time

Salary Range: 4,000 - 5,500 USD / month + commissions

What you will do:

Build and ship core ML models for a speech-driven behavioral engine. Own end-to-end modeling from raw, long-form audio and layered annotations to production inference. Design audio features/embeddings, train and evaluate a suite of models, and deliver reproducible pipelines that meet accuracy, robustness, latency, and cost targets.

Your skill set and experience:

  • 5+ years building production ML systems, including 2+ years in speech/audio.
  • Speech & signal processing: VAD, diarization, segmentation, denoising, spectral features (log-mel/MFCC), prosody (pitch/energy), long-form audio handling.
  • SOTA audio models & embeddings: Wav2Vec2, HuBERT, WavLM (or similar); fine-tuning/self-supervised learning; contrastive/metric learning for downstream tasks.
  • Data excellence: SQL, Python data stack (Pandas/Polars), ETL for audio+metadata, stratified sampling, leakage prevention, feature stores.
  • ML training: PyTorch, Hugging Face Transformers/Hub, mixed precision, hyperparameter tuning, transfer learning, cross-validation.
  • Evaluation discipline: golden sets, robust speaker/content splits, ROC/PR/calibration, fairness/bias checks, ablations, drift/shift detection on embeddings and audio quality.
  • MLOps, serving & reproducibility: FastAPI/gRPC around HF/torchaudio models, experiment tracking (W&B/MLflow), artifact/model versioning, CI/CD, observability, scalable batch/streaming inference.
  • Proven ability to create and document novel IP (methods, architectures, or training/eval techniques) with clear prior-art awareness.

Nice to have:

  • Tooling: SpeechBrain, Lightning, openSMILE/Praat, Kaldi/Conformer/Emformer, Label Studio.
  • Multimodal: ASR (e.g., Whisper) + paralinguistic features; emotion/prosody modeling; speaker embeddings (x-vectors, ECAPA-TDNN).
  • Performance & deployment: quantization/distillation, Triton/CUDA basics, distributed training, real-time/streaming inference, on-device DSP (Rust/C++).
  • Publications/patents/competition results demonstrating novel audio modeling work.
