
Senior/Staff Software Engineer, Super Compute Memory

Pryon · Boston, MA (via LinkedIn)
Posted April 5, 2026
Requirements
  • Extensive experience in software development, with a proven track record of delivering complex, large-scale systems (8+ years for Senior, 12+ years for Staff)
  • Proven experience building distributed systems at 100M+ scale (documents, vectors, or equivalent)
  • Deep knowledge of parallel and distributed computing concepts including consensus algorithms, distributed coordination, and fault tolerance
  • Hands-on experience with vector databases (pgvector, Pinecone, Weaviate, Milvus, or equivalent)
  • Proficiency in systems programming languages such as C++, Go, or Rust
  • Experience with parallel programming models (e.g., MPI, OpenMP, CUDA)
  • Production experience optimizing GPU workloads for inference including batch optimization, quantization (INT8, FP16), and GPU memory management
  • Experience managing large-scale GPU infrastructure (10+ GPUs in production) including cluster orchestration, resource scheduling, and cost optimization
  • Deep understanding of GPU architectures (NVIDIA A100/H100, tensor cores) and inference frameworks (vLLM, TensorRT, Triton)
  • Deep understanding of memory hierarchies, cache optimization, and NUMA architectures
  • Experience with container orchestration (Kubernetes) and distributed computing frameworks (Ray, Dask, Spark, or equivalent)
  • Familiarity with performance analysis and optimization tools and techniques (profilers, tracers, benchmarking frameworks)
  • Strong systems programming background with evidence of performance-critical contributions (open source, papers, or production systems)
Preferred Skills
  • Experience with cloud-based HPC, including services on AWS (EC2 P4/P5 instances), GCP (A100/H100 VMs), or Azure (ND-series)
  • Knowledge of networking and storage technologies in the context of high-performance computing (RDMA, NVMe, distributed filesystems, GPU-Direct Storage)
  • Advanced GPU optimization experience including multi-GPU inference (model parallelism, pipeline parallelism), mixed-precision training/inference, and GPU profiling tools (NVIDIA Nsight, nvprof, PyTorch Profiler)
  • Experience with ML infrastructure including model serving frameworks (vLLM, TensorRT-LLM, Triton Inference Server), GPU resource management (NVIDIA MIG, GPU time-slicing), and inference optimization (continuous batching, speculative decoding)
  • Production experience with GPU monitoring and observability (DCGM, GPU metrics dashboards, cost-per-query optimization)
  • Background in information retrieval or vector search (FAISS, HNSW, IVF indices, approximate nearest neighbor algorithms)
  • Production experience with object storage (MinIO, S3, GCS) at petabyte scale
  • Familiarity with specific technologies: Kafka, PostgreSQL, pgvector, Kubernetes, FluxCD, Yugabyte
  • Contributions to open-source projects in the HPC, distributed systems, or vector search space
  • Experience with on-premises enterprise deployments and air-gapped environments (government/defense sector)