Hiring · Senior founding role

Founding Inference Systems Engineer

Location: San Francisco / Bay Area preferred. Remote considered as an exception for the right person.

We look first for fast learners with high agency, AI-native workflows, clear technical communication, and evidence-backed judgment. Experience with a specific serving engine is not required: we can teach the tools if you learn fast, reason clearly, and ship evidence-backed systems.

Touchdown Labs helps teams make AI workloads cheaper and easier to own by turning inference behavior, traces, workload replay, GPU signals, and task-path evidence into reusable optimization artifacts.


About the Role

This role is for a senior systems builder who can make production inference behavior understandable and actionable. You will work with serving engines, caches, routing, workload replay, and latency and cost tradeoffs, but the broader mission is to turn messy runtime behavior into clear evidence. Experience with vLLM, SGLang, LMCache, TensorRT-LLM, or MAX is helpful, but learning speed and systems judgment matter more.

What You'll Do

  • Study inference workloads and identify the serving behaviors that matter for cost, latency, reliability, and utilization.
  • Build instrumentation, experiments, and comparison artifacts around systems such as vLLM, SGLang, TensorRT-LLM, MAX, LMCache, and related runtimes when relevant.
  • Reason about batching, routing, prefix caching, KV cache behavior, prefill/decode behavior, throughput, and tail latency at the right level of detail.
  • Create reproducible workload replay and evaluation artifacts for internal and customer decisions.
  • Partner across kernel, observability, product, and customer workstreams.

Skills and Qualifications

Minimum qualifications
  • Fast learner with strong distributed systems, backend, or performance fundamentals.
  • AI-native builder who uses AI tools to learn runtimes, prototype instrumentation, debug systems, and ship faster.
  • Evidence-backed debugging discipline, with comfort turning runtime behavior into clear measurements and artifacts.
  • Distributed, backend, or systems engineering experience.
  • Experience with Python, C++, Rust, Go, or similar, or clear ability to learn the needed tool quickly.
  • Strong Linux, profiling, and debugging fundamentals.
  • Experience operating or improving production systems.

Preferred qualifications
  • Experience with LLM serving engines, KV cache, or prefix caching.
  • Exposure to CUDA, NCCL, Kubernetes, Slurm, or load testing.
  • Experience with ML systems papers, reproducible benchmarks, or OSS infrastructure.

Logistics

  • Full-time founding role.
  • San Francisco / Bay Area preferred. Remote considered as an exception for the right person.
  • Further details are discussed during the interview process.

When applying, select "Founding Inference Systems Engineer" in the Role or lane field.