
Solutions Architect, Infrastructure

LinkedIn NVIDIA Redmond, WA
Not Applicable Posted April 2, 2026
Requirements
  • 4+ years experience in Solutions Architecture, Infrastructure Engineering, or similar technical roles.
  • Hands‑on experience with bring‑up and validation of large‑scale NVIDIA GPU platforms, including multi‑GPU and multi‑node architectures.
  • Understanding of high‑performance networking technologies (e.g., RDMA, congestion control, high‑bandwidth interconnects) and their role in distributed AI workloads.
  • Familiarity with NVIDIA system software stacks: CUDA, NCCL, NVSwitch/NVLink, driver behavior, and performance tuning.
  • Proficiency with Linux systems tools for identifying issues and evaluating system performance, such as: dmesg, journalctl, lspci, numactl, ethtool, iostat, perf, nvidia-smi, top/htop, ipmitool, container‑level tooling, and related utilities.
  • Understanding of server hardware architecture, including PCIe topologies, system firmware, NUMA, BIOS/UEFI configuration, power/thermal envelopes, and memory/subsystem behavior.
  • Understanding of BMC/IPMI/Redfish for remote management, hardware health monitoring, and out‑of‑band debugging during early‑stage bring‑up.
  • Strong Linux fundamentals across drivers, kernel subsystems, cgroups, containers, and node‑level performance analysis.
  • Ability to identify performance bottlenecks at the cluster, node, accelerator, network, or application layer.
Preferred Skills
  • Outstanding interpersonal skills and the ability to build clarity and direction across diverse, fast‑paced technical teams.
  • Knowledge of compute and networking infrastructure (e.g., instance types, networking primitives, high‑performance communication paths) at hyperscalers or cloud service providers.
  • Demonstrated leadership resolving multi‑team infrastructure challenges across engineering, product, and customer groups.
  • A consistent record of taking GPU or infrastructure products from pilot to high‑volume deployment in large data center environments.
  • Familiarity with modern deep learning, LLM architectures, and distributed training/inference challenges at scale.
Education
  • (Not required) – BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Physics, or similar, or equivalent experience.