ML Data Engineer – Healthcare Data Curation & Cleaning (1 Year Fixed Term)

LinkedIn Inside Higher Ed Stanford, CA
Not Applicable Posted March 14, 2026
Requirements
  • Data Pipeline Engineering:
  • ML Data Engineering:
  • 3+ years of experience in software development and data engineering with a strong focus on data cleaning, transformation, and creation.
  • Proficiency in Python and experience with data processing libraries (e.g., Pandas, Polars, NumPy).
  • Hands-on experience in building and maintaining automated data pipelines for large-scale data processing.
  • Familiarity with machine learning frameworks (e.g., PyTorch, JAX, scikit-learn) as applied to data quality and augmentation tasks.
  • Expertise in working with healthcare data, including familiarity with the OMOP Common Data Model (OMOP CDM).
  • Strong experience in a Linux environment and comfort with UNIX command-line tools.
  • Proven ability to work collaboratively in multidisciplinary teams and communicate technical concepts effectively.
  • Experience with cloud platforms (e.g., GCP, AWS, or Azure) and distributed computing frameworks.
  • Proficiency with version control systems (e.g., Git) and containerization tools (e.g., Docker).
  • Familiarity with healthcare data standards and regulatory requirements.
  • Knowledge of key data structures, algorithms, and techniques pertinent to systems that support high volume, velocity, or variety datasets (including data mining, machine learning, NLP, data retrieval).
  • Experience with relational, NoSQL, or NewSQL database systems and data modeling, structured and unstructured.
  • Experience in parallel and distributed data processing techniques and platforms (MPI, Map/Reduce, Batch).
  • Experience writing and debugging code in scripting languages, and experience with high-performance/systems languages and techniques.
  • Knowledge of benchmark software development and programmable fields/systems; ability to analyze systems and data pipelines and propose solutions that leverage emerging technologies.
  • Ability to use and integrate security controls for web applications, mobile platforms, and backend systems.
  • Experience deploying reliable data systems and data quality management.
  • Ability to research, evaluate, architect, and deploy new tools, frameworks, and patterns to build scalable Big Data platforms.
  • Ability to document use cases, solutions and recommendations.
Preferred Skills
  • 3+ years of experience in software development and data engineering with a strong focus on data cleaning, transformation, and creation.
  • Proficiency in Python and experience with data processing libraries (e.g., Pandas, Polars, NumPy).
  • Hands-on experience in building and maintaining automated data pipelines for large-scale data processing.
  • Familiarity with machine learning frameworks (e.g., PyTorch, JAX, scikit-learn) as applied to data quality and augmentation tasks.
  • Expertise in working with healthcare data, including familiarity with the OMOP Common Data Model (OMOP CDM).
  • Strong experience in a Linux environment and comfort with UNIX command-line tools.
  • Proven ability to work collaboratively in multidisciplinary teams and communicate technical concepts effectively.
  • Experience with cloud platforms (e.g., GCP, AWS, or Azure) and distributed computing frameworks.
  • Proficiency with version control systems (e.g., Git) and containerization tools (e.g., Docker).
  • Familiarity with healthcare data standards and regulatory requirements.
  • Bachelor’s degree in a scientific or analytic field and five years of relevant experience, or a combination of education and relevant experience.
  • Knowledge of key data structures, algorithms, and techniques pertinent to systems that support high volume, velocity, or variety datasets (including data mining, machine learning, NLP, data retrieval).
  • Experience with relational, NoSQL, or NewSQL database systems and data modeling, structured and unstructured.
  • Experience deploying reliable data systems and data quality management.
Education
  • (Required) – Education & Experience (required)
  • (Not required) – Bachelor’s degree in a scientific or analytic field and five years of relevant experience, or a combination of education and relevant experience.