
AI/ML Scientist – Protein Foundation Models

Manifold Bio · Boston, MA
Posted April 5, 2026
Responsibilities

As a member of the ML team at Manifold Bio, you will:
  • Lead and advance foundation model training efforts (pretraining, fine-tuning, and evaluating folding, docking, language, and generative design models on proprietary experimental data), contributing deep expertise in training methodology, architecture selection, and optimization
  • Develop and scale distributed multi-GPU/multi-node training pipelines
  • Integrate model outputs into mBER to improve binder design and enable new capabilities
  • Design rigorous ML experiments with clear hypotheses, evaluation frameworks, and systematic analyses
  • Establish best practices for efficient large-scale training (e.g., mixed precision, gradient checkpointing)
  • Produce clear documentation and analyses to support architecture and training decisions
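For candidates unfamiliar with the terms, the efficiency practices named here (mixed precision and gradient checkpointing) can be sketched in a few lines of PyTorch. This is an illustrative toy model, not anything from Manifold Bio's codebase; the layer sizes and model names are made up:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    """A hypothetical residual MLP block standing in for a transformer layer."""
    def __init__(self, dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.net(x)

class TinyModel(nn.Module):
    def __init__(self, dim=32, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList(Block(dim) for _ in range(depth))
        self.head = nn.Linear(dim, 1)

    def forward(self, x):
        for blk in self.blocks:
            # Gradient checkpointing: discard intermediate activations in the
            # forward pass and recompute them during backward, trading compute
            # for a large reduction in activation memory.
            x = checkpoint(blk, x, use_reentrant=False)
        return self.head(x)

model = TinyModel()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
x = torch.randn(8, 32)
y = torch.randn(8, 1)

# Mixed precision: run the forward pass and loss in a lower-precision dtype
# where it is numerically safe, while keeping master weights in float32.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    loss = nn.functional.mse_loss(model(x), y)

loss.backward()
opt.step()
```

On GPU the same pattern uses `device_type="cuda"` (often with `torch.float16` plus a `torch.amp.GradScaler`); at multi-node scale the model would additionally be wrapped in a distributed strategy such as DDP or FSDP.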

Requirements
  • Demonstrated experience pretraining and/or fine-tuning protein foundation models (folding, docking, language models, or generative design) with published or otherwise demonstrable results
  • Strong familiarity with AlphaFold architecture and training methodology
  • 2+ years of hands-on experience with PyTorch and/or JAX for deep learning
  • Experience with large-scale model training: distributed training, multi-GPU/multi-node setups, mixed precision, gradient checkpointing
  • Solid understanding of deep learning architectures (transformers, attention mechanisms, diffusion/flow matching) and optimization techniques
  • Experience working with protein structure data (PDB, mmCIF) and/or protein sequence datasets
  • Strong statistical analysis and experimental design skills
  • Proficiency in Python scientific computing stack (NumPy, Pandas, scikit-learn)
  • Self-directed researcher who can balance guidance with independence
  • Excellent written and verbal communication skills for cross-functional collaboration
Preferred Skills
  • Experience with protein generative design methods (e.g., RFdiffusion, ProteinMPNN, flow matching approaches)
  • Experience with protein language models (e.g., ESM family)
  • Published research in computational biology, protein design, or structural biology
  • Experience training on proprietary or domain-specific biological datasets
  • Familiarity with Ray for distributed computing
  • Experience with Kubernetes (EKS) and cloud computing platforms (AWS)
  • Knowledge of protein engineering, directed evolution, or structural biology wet lab techniques
  • Experience working with agentic AI coding tools for fast, parallelized execution of modeling experiments
  • Previous biotech/pharma industry experience
In short: you have deep experience training protein foundation models and want to apply that expertise to some of the richest proprietary experimental datasets in the field.