Hiring · Senior founding role

Founding Kernel Engineer, GPU Kernels

Location: San Francisco / Bay Area preferred. Remote is an exception we will make for the right person.

We look first for fast learners with high agency, AI-native workflows, clear technical communication, and evidence-backed judgment. An exact CUDA, HIP, or architecture keyword match is not required. If your fundamentals and learning speed are strong, we can teach specific tools.

Touchdown Labs helps teams make AI workloads cheaper and easier to own by turning inference behavior, traces, workload replay, GPU signals, and task-path evidence into reusable optimization artifacts.


About the Role

This is a performance engineering role for a senior builder who can learn unfamiliar hardware, reason from first principles, and produce trustworthy measurements. You will work near GPU kernels and numerical formats, but the core need is judgment: identify what matters, test it carefully, and turn low-level evidence into reusable optimization artifacts. Specific CUDA, HIP, Triton, Hopper, Blackwell, or CDNA experience is useful, not a hard gate.

What You'll Do

  • Investigate performance questions that affect inference cost, latency, and reliability.
  • Build or adapt kernels, microbenchmarks, and experiments when they are the right way to answer a question.
  • Work with technologies such as CUDA, HIP, Triton, WGMMA, TMA, tcgen05.mma, TMEM, MFMA, Hopper, Blackwell, CDNA, FP4, NVFP4, and other sub-byte formats where relevant.
  • Connect correctness, profiling, throughput, and system integration evidence into practical recommendations.
  • Translate low-level findings into artifacts the broader product and infrastructure can use.

Skills and Qualifications

Minimum qualifications
  • Fast learner with strong CS, systems, numerical, or performance fundamentals.
  • AI-native builder who uses AI tools to study hardware docs, prototype kernels, debug failures, and move faster while verifying results.
  • Evidence-backed measurement discipline, including careful profiling, benchmarking, and correctness checks.
  • GPU, kernel, compiler, numerical, or performance engineering experience.
  • Experience with C++, CUDA, HIP, Triton, or similar low-level performance tools, or clear ability to learn the needed stack quickly.
  • Comfort reasoning about numerical correctness and performance tradeoffs.
  • Ability to communicate careful measurements without overstating results.

Preferred qualifications
  • Experience with Hopper, Blackwell, CDNA, tensor cores, or MFMA.
  • Experience with quantization formats, inference kernels, compiler work, or code generation.
  • Published OSS, benchmark, kernel, compiler, or performance work.

Logistics

  • Full-time founding role.
  • San Francisco / Bay Area preferred. Remote is an exception we will make for the right person.
  • Further details are discussed during the interview process.

When applying, select "Founding Kernel Engineer, GPU Kernels" in the Role or lane field.