
Principal Software Engineer

LinkedIn Microsoft AI Redmond, WA
Not Applicable Posted March 26, 2026
Requirements
  • Experience with model compression (quantization, distillation, SVD, low‑rank methods).
  • Experience in building high‑throughput inference serving stacks (continuous batching, KV‑cache optimizations, routing).
  • Familiarity with Microsoft’s DLIS, Talon routing, Triton/TensorRT‑LLM stack, and Azure/H100/A100 GPU environments.
  • Solid experience in GPU inference optimization (CUDA, TensorRT, Triton, or custom GPU kernels).
  • Proficiency in profiling tools (Nsight, TensorBoard, PyTorch profiler) and ability to identify CPU/GPU bottlenecks.
  • Deep understanding of LLM/SLM architectures (attention, embeddings, MoE, decoders).
  • Experience optimizing latency‑critical online services.
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
  • These requirements include, but are not limited to, the following specialized security screenings:
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Preferred Skills
  • Experience with model compression (quantization, distillation, SVD, low‑rank methods).
  • Experience in building high‑throughput inference serving stacks (continuous batching, KV‑cache optimizations, routing).
  • Familiarity with Microsoft’s DLIS, Talon routing, Triton/TensorRT‑LLM stack, and Azure/H100/A100 GPU environments.
  • Publications, competition wins, or real‑world deployments related to model efficiency.
  • Solid experience in GPU inference optimization (CUDA, TensorRT, Triton, or custom GPU kernels).
  • Proficiency in profiling tools (Nsight, TensorBoard, PyTorch profiler) and ability to identify CPU/GPU bottlenecks.
  • Deep understanding of LLM/SLM architectures (attention, embeddings, MoE, decoders).
  • Experience optimizing latency‑critical online services.
Education
  • (Not required) – Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • (Not required) – Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.