Lead Data Engineer

Mappa

Software Engineering, Data Science
USD 2,800-6,000 / month
Posted on Aug 1, 2025

Job description

About the Role

We are looking for a Senior or Lead Data Engineer to be the founding member of our new data team. In this high-impact role, you will report directly to the CTO and play a critical part in shaping the future of data at our company. You will work across three distinct product lines to standardize and unify data pipelines, processes, and governance. This is a unique opportunity to build from the ground up and to influence data architecture and strategy at a multi-product SaaS company.

Responsibilities

  • Design, build, and optimize scalable data pipelines using Apache Spark, Kafka, and related tools
  • Develop and orchestrate data workflows using Apache Airflow
  • Architect and manage infrastructure in AWS, including services like S3, Kinesis, Lambda, EMR, Glue, and Redshift
  • Build and support real-time data pipelines and event-driven systems for streaming data ingestion and processing
  • Establish standardized data ingestion, transformation, and storage processes across multiple product lines
  • Work closely with product and engineering teams to unify and document data models and schemas
  • Ensure data quality, lineage, and observability across all pipelines
  • Lead data governance, compliance, and security practices from day one
  • Mentor future hires and define engineering standards as the data team grows

Qualifications

  • 5+ years of experience in data engineering, ideally in SaaS or multi-product environments
  • Expertise in AWS services relevant to data infrastructure (S3, Glue, Redshift, EMR, Lambda, Kinesis, etc.)
  • Strong proficiency in Python for building pipelines, ETL/ELT jobs, and infrastructure automation
  • Deep hands-on experience with Apache Spark, Kafka, and the broader Apache ecosystem
  • Proven success designing and supporting real-time data streams and batch workflows
  • Solid experience with Airflow or equivalent orchestration tools
  • Strong understanding of distributed systems, data architecture, and scalable design patterns
  • Passion for clean, maintainable code and building robust, fault-tolerant systems

Nice to Have

  • Experience with Delta Lake, Apache Iceberg, or Apache Hudi
  • Familiarity with containerized workloads (Docker, Kubernetes, ECS/EKS)
  • Background in building internal data platforms or centralized data lakes
  • Experience supporting data science and BI/analytics teams
  • Knowledge of data privacy, compliance, and security practices (e.g., SOC 2, GDPR)

Details

  • Payment in USD (2,800-6,000 per month)
  • Full time