AI Cloud Infrastructure Engineer

Scout AI, Sunnyvale, CA (via LinkedIn)
Mid-Senior level, posted March 13, 2026
Requirements
  • Stay current on best practices in MLOps, distributed training frameworks, and AI infrastructure at scale
  • 3+ years of experience in ML infrastructure, MLOps, or large-scale data systems
  • Proven experience with distributed training (PyTorch DDP, DeepSpeed, Ray, or similar) and workflow orchestration (Kubernetes, Airflow, or equivalent)
  • Strong proficiency in Python and cloud-native infrastructure (AWS, GCP, or Azure)
  • Deep understanding of data engineering (ETL pipelines, object storage, data versioning, metadata management)
  • Familiarity with containerization and deployment (Docker, Kubernetes) and monitoring systems (Prometheus, Grafana)
  • Experience optimizing GPU cluster utilization, scaling training jobs, and profiling model performance
  • Must be a U.S.
Preferred Skills
  • Bonus: Experience with edge-deployed ML systems, federated training, or robotic data collection pipelines
Education
  • (Not required) – Bachelor’s degree or higher in Computer Science, Electrical Engineering, or related technical field