Official role
Founding Inference Systems Engineer
Location: San Francisco / Bay Area preferred. Remote is possible as an exception for the right person.
About the Role
This role is for a senior systems builder who can make production inference behavior understandable and actionable. You will work with serving engines, caches, routing, workload replay, and latency and cost tradeoffs, but the broader mission is to turn messy runtime behavior into clear evidence. Experience with vLLM, SGLang, LMCache, TensorRT-LLM, or MAX is helpful, but learning speed and systems judgment matter more.
What You'll Do
- Study inference workloads and identify the serving behaviors that matter for cost, latency, reliability, and utilization.
- Build instrumentation, experiments, and comparison artifacts around systems such as vLLM, SGLang, TensorRT-LLM, MAX, LMCache, and related runtimes when relevant.
- Reason about batching, routing, prefix caching, KV cache behavior, prefill/decode behavior, throughput, and tail latency at the right level of detail.
- Create reproducible workload replay and evaluation artifacts for internal and customer decisions.
- Partner across kernel, observability, product, and customer workstreams.
Skills and Qualifications
Minimum qualifications
- Fast learner with strong distributed systems, backend, or performance fundamentals.
- AI-native builder who uses AI tools to learn runtimes, prototype instrumentation, debug systems, and ship faster.
- Evidence-backed debugging discipline, with comfort turning runtime behavior into clear measurements and artifacts.
- Distributed, backend, or systems engineering experience.
- Experience with Python, C++, Rust, Go, or a similar language, or a clear ability to learn the needed language quickly.
- Strong Linux, profiling, and debugging fundamentals.
- Experience operating or improving production systems.
Preferred qualifications
- Experience with LLM serving engines, KV cache, or prefix caching.
- Exposure to CUDA, NCCL, Kubernetes, Slurm, or load testing.
- Experience with ML systems papers, reproducible benchmarks, or OSS infrastructure.
Logistics
- Full-time founding role.
- San Francisco / Bay Area preferred. Remote is possible as an exception for the right person.
- Further details are discussed during the hiring process.
When applying, select "Founding Inference Systems Engineer" in the Role or lane field.