
Staff Software Engineer - AI/ML Platform

LinkedIn GEICO Palo Alto, CA
Not Applicable Posted March 14, 2026
Requirements
  • 8+ years of software engineering experience with a focus on infrastructure, platform engineering, or MLOps
  • 3+ years of hands-on experience with machine learning infrastructure and deployment at scale
  • 2+ years of experience working with Large Language Models and transformer architectures
  • Proficient in Python; strong skills in Go, Rust, or Java preferred
  • Proven experience working with open source LLMs (Llama 2/3, Qwen, Mistral, Gemma, Code Llama, etc.)
  • Proficient in Kubernetes, including custom operators, Helm charts, and GPU scheduling
  • Deep expertise in Azure services (AKS, Azure ML, Container Registry, Storage, Networking)
  • Experience implementing and operating feature stores (Chronon, Feast, Tecton, Azure ML Feature Store, or custom solutions)
  • Hands-on experience with inference optimization using vLLM, TensorRT-LLM, Triton Inference Server, or similar
  DevOps & Platform Skills
  • Advanced experience with Azure DevOps, GitHub Actions, Jenkins, or similar CI/CD platforms
  • Proficiency with Terraform, ARM templates, Pulumi, or CloudFormation
  • Deep understanding of Docker, container optimization, and multi-stage builds
  • Experience with Prometheus, Grafana, ELK stack, Azure Monitor, and distributed tracing
  • Knowledge of both SQL and NoSQL databases, data warehousing, and vector databases
  Leadership & Soft Skills
  • Demonstrated track record of mentoring engineers and leading technical initiatives
  • Experience leading design reviews with a focus on compliance, performance, and reliability
  • Excellent ability to explain complex technical concepts to diverse audiences
  • Strong analytical and troubleshooting skills for complex distributed systems
Preferred Skills
  • Experience managing cross-functional technical projects and coordinating with multiple stakeholders
  Advanced Experience
  • Master’s degree in Computer Science, Machine Learning, or a related field
  • 8+ years of platform engineering or infrastructure experience
  • Experience with Staff Engineer or Tech Lead roles in ML/AI organizations
  • Background in distributed systems and high-performance computing
  • Open-source contributions to ML infrastructure projects or LLM frameworks
  • Multi-Cloud Experience: Hands-on experience with Azure, AWS (SageMaker, EKS) and/or GCP (Vertex AI, GKE)
  • Experience with specialized hardware (A100s, H100s, TPUs, TEEs) and optimization
  • RLHF & Fine-tuning: Experience with Reinforcement Learning from Human Feedback and LLM fine-tuning workflows
  • Experience with Milvus, Pinecone, Weaviate, Qdrant, or similar vector storage solutions
  • Deep experience with MLflow, Kubeflow, DataRobot, or similar platforms
  Industry Knowledge
  • Understanding of AI safety principles, model governance, and regulatory compliance
  • Background in regulated industries with an understanding of data privacy requirements
  • Experience supporting ML research teams and academic partnerships
  • Deep understanding of GPU optimization, memory management, and high-throughput systems
Education
  • (Not required) – Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent experience)
  • (Not required) – Master’s degree in Computer Science, Machine Learning, or a related field