Mid-Senior level
Posted March 14, 2026
Responsibilities
- Design & scale data pipelines using Airflow/Dagster + dbt to ingest, transform, and serve healthcare data (claims, denials, insurance reimbursements, patient billing); a minimal orchestration sketch follows this list.
- Own the data infrastructure: reliability, quality, observability, and cost optimization for distributed systems handling massive datasets.
- Build production data systems for AI/ML products, coordinating real-time processing, batch jobs, and feature stores.
- Optimize database systems (SQL/NoSQL) for performance at scale, working with engineering, ML, and product teams to meet SLAs.
- Enable team self-service: democratize data access while maintaining governance and lineage.
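To make the orchestration responsibility concrete, here is a minimal sketch of the kind of DAG it implies, assuming Airflow 2.x and dbt Core; the DAG id, storage paths, and project directory are placeholders, not the company's actual pipeline.

```python
# Minimal sketch (assumes Airflow 2.x + dbt Core): ingest a daily claims batch,
# then run and test dbt models. All identifiers below are placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def ingest_claims_batch(**context):
    """Placeholder extract step: land the previous day's claims export in object storage."""
    ds = context["ds"]  # Airflow's logical date, e.g. "2026-03-14"
    print(f"Landing raw claims files for {ds} into s3://raw-claims/{ds}/")


with DAG(
    dag_id="claims_pipeline",  # hypothetical name
    start_date=datetime(2026, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
):
    ingest = PythonOperator(
        task_id="ingest_raw_claims",
        python_callable=ingest_claims_batch,
    )

    # Transform: staging and mart models for denials, reimbursements, billing.
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt/claims",
    )

    # Data-quality gate before ML features and dashboards read the marts.
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="dbt test --project-dir /opt/dbt/claims",
    )

    ingest >> dbt_run >> dbt_test
```

A Dagster job wrapping the same dbt project would look similar; the point is explicit dependencies, retries, and a test gate between ingestion and downstream consumers.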
Requirements
- 3+ years of Data/ML Engineering experience
- Experience at a VC-backed startup (Series A+) or a fast-moving tech company (Meta, Palantir, etc.)
- Built distributed data pipelines processing high-volume, complex datasets
- Python proficiency (data engineering, scripting, tooling)
- Expert in Airflow or Dagster + dbt for orchestration/transformation
- Experience with distributed systems (Spark, Kafka, Ray, etc.); an illustrative Spark sketch follows this list
- SQL mastery; cloud data warehouses (Snowflake, BigQuery, Redshift)
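As a rough illustration of the distributed-processing side of these requirements, the following PySpark snippet rolls up a large claims extract by payer and denial reason; the storage paths and column names are hypothetical, not the company's schema.

```python
# Illustrative PySpark job: aggregate denied claims by payer and denial reason.
# Paths and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("claims_denial_rollup").getOrCreate()

claims = spark.read.parquet("s3://warehouse/raw/claims/")  # placeholder location

denial_rollup = (
    claims
    .filter(F.col("status") == "DENIED")
    .groupBy("payer_id", "denial_reason_code")
    .agg(
        F.count("*").alias("denied_claims"),
        F.sum("billed_amount").alias("denied_billed_total"),
    )
    .orderBy(F.desc("denied_billed_total"))
)

# Partitioned summary for downstream dbt models and dashboards.
denial_rollup.write.mode("overwrite").parquet("s3://warehouse/marts/denial_rollup/")
```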
Preferred Skills
- Healthcare data experience strongly preferred (claims processing; HIPAA compliance a bonus)
- ML pipelines (MLflow, Kubeflow); an illustrative tracking example follows this list
- Healthcare data formats (HL7, FHIR)
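For the ML-pipeline item, a minimal MLflow tracking run is sketched below. Synthetic data stands in for real claims features, and the experiment name and model choice are illustrative assumptions rather than a description of this team's stack.

```python
# Illustrative MLflow tracking run for a denial-prediction model.
# Synthetic data stands in for features the claims pipelines would produce.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

mlflow.set_experiment("denial-prediction")  # hypothetical experiment name

with mlflow.start_run():
    model = GradientBoostingClassifier(n_estimators=200, max_depth=3)
    model.fit(X_train, y_train)

    auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
    mlflow.log_param("n_estimators", 200)
    mlflow.log_metric("val_auc", auc)
    mlflow.sklearn.log_model(model, "model")  # versioned artifact for downstream pipelines
```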
The Role
Own the end-to-end data infrastructure powering our AI-driven revenue analytics platform. As a Senior/Staff-level Data Engineer, you'll design scalable pipelines processing massive healthcare claims datasets, enabling ML models and product teams to deliver insights that directly recover millions for practices.