Senior DevOps/Platform Engineer III

LinkedIn · Pacific Northwest National Laboratory · Seattle, WA
Not Applicable · Posted March 27, 2026
Requirements
  • Infrastructure Automation & Platform Engineering
  • Demonstrated proficiency in Python and working knowledge of at least one additional language (C#/.NET, Go, C++) for infrastructure automation and tooling development
  • Knowledge of Infrastructure as Code principles and tools including Terraform, CloudFormation, Pulumi, or ARM templates with emphasis on modular, reusable code patterns
  • Ability to design, implement, and maintain sophisticated CI/CD pipelines across multiple environments using tools such as Jenkins, GitLab CI, GitHub Actions, or Azure DevOps
  • Proficiency with version control workflows (Git), GitOps methodologies, automated testing frameworks for infrastructure code, and policy-as-code practices, with consistent use of AI-assisted tools (e.g., Claude, GitHub Copilot) to accelerate automation and troubleshooting
  • Cloud & Container Orchestration
  • Demonstrated experience designing and managing infrastructure across cloud platforms (AWS, Azure, or GCP); multi-cloud experience is highly valued
  • Strong expertise with containerization technologies (Docker) and container orchestration platforms (Kubernetes, EKS, AKS, or GKE) including advanced concepts like operators, custom resources, and cluster management
  • Ability to design and implement event-driven architectures using cloud-native services (AWS EventBridge, Azure Event Grid, Pub/Sub) and messaging systems with understanding of service mesh technologies (Istio, Linkerd) and API gateway patterns
  • Knowledge of networking concepts in cloud and containerized environments, including CNI plugins, ingress controllers, load balancing, and service discovery, plus familiarity with edge computing deployments and hybrid cloud architectures
  • Observability, Reliability & Security
  • Ability to implement comprehensive observability solutions including metrics collection (Prometheus, CloudWatch), distributed tracing (Jaeger, Tempo), and centralized logging (ELK Stack, Loki, Splunk)
  • Understanding of Site Reliability Engineering (SRE) principles, including SLOs, SLIs, error budgets, and incident response, and the ability to design and implement chaos engineering practices that improve system resilience
  • Experience implementing security best practices including secrets management (Vault, AWS Secrets Manager), vulnerability scanning, and DevSecOps tooling
  • Knowledge of disaster recovery strategies, backup automation, and business continuity planning with understanding of compliance frameworks and ability to implement automated compliance controls
  • Data Platform Operations & ML Infrastructure
  • Understanding of cloud-native data pipeline architectures and ETL/ELT orchestration (AWS Glue, Azure Data Factory, Airflow, Prefect) with ability to build and maintain infrastructure supporting ML pipelines, model training workflows, and MLOps practices
  • Knowledge of deploying and operating cloud-based data storage systems and platforms (S3, Redshift, Delta Lake, PostgreSQL, MongoDB, OpenSearch, Snowflake)
  • Understanding of distributed data processing frameworks (Spark/Databricks, Kafka, Flink) with experience operating Kubernetes-based platforms for data workloads including Spark on K8s, Ray clusters, or Kubeflow
  • Ability to implement infrastructure supporting large-scale data systems with appropriate monitoring, cost optimization, and performance tuning including storage tiering, data lifecycle management, and compute resource optimization
  • Collaboration & Operations
  • Strong problem-solving abilities with experience troubleshooting complex distributed systems spanning applications, infrastructure, and data layers
  • Excellent communication skills for collaborating with software engineers, data scientists, security teams, and business stakeholders, and the ability to write clear, comprehensive documentation for infrastructure designs, runbooks, and disaster recovery procedures
  • Demonstrated capacity to manage multiple infrastructure initiatives simultaneously while maintaining high availability and reliability standards, and proven ability to mentor team members on DevOps practices and operational excellence
  • Experience participating in on-call rotations, incident response, and post-mortem processes, and the ability to balance tactical operational needs with strategic infrastructure improvements
  • PhD and 1 year of software engineering experience -OR
  • MS/MA and 3 years of software engineering experience -OR
  • BS/BA and 5 years of software engineering experience -OR
  • AA and 14 years of software engineering experience in designing, architecting, programming, deploying, and automating software solutions in support of scientific research or consumer digital product development -OR
  • HS/GED and 16 years of software engineering experience in designing, architecting, programming, deploying, and automating software solutions in support of scientific research or consumer digital product development
  • Proven expertise in Python and proficiency in at least one other language (C#/.NET, C++, Go)
  • 3-5 years of hands-on DevOps, Platform Engineering, Site Reliability Engineering, or Infrastructure Engineering experience
  • This position requires the ability to obtain and maintain a federal security clearance.
  • All tiers of investigation include a declaration of illegal drug activities, including use, supply, possession, or manufacture within the last 1 to 7 years (depending on the applicable tier of investigation).
Preferred Skills
  • Degree in computer science, software engineering, or related field
  • Demonstrated ability to contribute to technical direction and independently structure complex problems into actionable work, in collaboration with senior engineers and cross-functional teams
  • Demonstrated contributions to open-source infrastructure projects or active participation in DevOps communities
Education
  • (Not required) – PhD and 1 year of software engineering experience -OR
  • (Not required) – MS/MA and 3 years of software engineering experience -OR
  • (Not required) – BS/BA and 5 years of software engineering experience -OR
  • (Not required) – AA and 14 years of software engineering experience in designing, architecting, programming, deploying, and automating software solutions in support of scientific research or consumer digital product development -OR
  • (Not required) – HS/GED and 16 years of software engineering experience in designing, architecting, programming, deploying, and automating software solutions in support of scientific research or consumer digital product development
  • (Not required) – Degree in computer science, software engineering, or related field