Reliability Engineer, AI & Data Platforms (AiDP)

LinkedIn Apple Sunnyvale, CA
Not Applicable Posted March 13, 2026
Requirements
  • 3+ years of professional software engineering experience with large-scale big data platforms, including strong programming skills in Java, Scala, Python, or Go.
  • Proven expertise in designing, building, and operating large-scale distributed data processing systems with a strong focus on Apache Spark.
  • Hands-on experience with table formats and data lake technologies such as Apache Iceberg, with a focus on scalability, reliability, and optimized query performance.
  • Skilled at coding for distributed systems and developing resilient data pipelines.
  • Strong background in incident management, including troubleshooting, root cause analysis, and performance optimization in complex production environments.
  • Proficient with Unix/Linux systems and command-line tools for debugging and operational support.
Preferred Skills
  • Expertise in designing, building, and operating critical, large-scale distributed systems with a focus on low latency, fault-tolerance, and high availability.
  • Experience contributing to open source projects is a plus.
  • Experience with multiple public cloud infrastructures, managing multi-tenant Kubernetes clusters at scale, and debugging Kubernetes/Spark issues.
  • Experience with workflow and data pipeline orchestration tools (e.g., Airflow, dbt).
  • Understanding of data modeling and data warehousing concepts.
  • Familiarity with the AI/ML stack, including GPUs, MLflow, or Large Language Models (LLMs).
  • A learning mindset and a drive to continuously improve oneself, the team, and the organization.
  • Solid understanding of software engineering best practices, including the full development lifecycle, secure coding, and experience building reusable frameworks or libraries.