AIML - Staff ML System Engineer, ML Platform Technologies (MLPT)

LinkedIn Apple Santa Clara, CA
Not Applicable Posted March 13, 2026
Thinking about this job
Not Met Priorities
What still needs stronger evidence
Requirements
  • Experience building large-scale deep learning infrastructure or platforms for distributed model training
  • Experience with large-scale AI training infra components, such as accelerators, network fabrics, CUDA, NCCL, RDMA
  • Strong programming skills in Python or Go
  • Understanding of data structures, software design principles, and algorithms
  • Experience building large-scale distributed systems with tools such as Kubernetes, Kafka, Prometheus, etc.
  • Experience with deep learning frameworks, such as PyTorch or JAX
  • Minimum of 7 years of industry experience
Preferred Skills
  • Experience working with public cloud vendors such as AWS, GCP, or Azure
  • Experience developing model parallel and data parallel training solutions and other training optimizations
  • Familiarity with recent developments in foundation model architectures for language and multimodal models
  • Publication record at ML conferences such as MLSys, NeurIPS, etc.
  • Advanced degree in the area of Computer Science or equivalent, or a related domain
Education
  • (Not required) – Bachelor's degree in the area of Computer Science or equivalent, or a related domain
  • (Not required) – Advanced degree in the area of Computer Science or equivalent, or a related domain