Official role
Founding Kernel Engineer, GPU Kernels
Location: San Francisco / Bay Area preferred. Remote is possible by exception for the right person.
About the Role
This is a performance engineering role for a senior builder who can learn unfamiliar hardware, reason from first principles, and produce trustworthy measurements. You will work close to GPU kernels and numerical formats, but the core need is judgment: identify what matters, test it carefully, and turn low-level evidence into reusable optimization artifacts. Specific CUDA, HIP, Triton, Hopper, Blackwell, or CDNA experience is useful but not a hard requirement.
What You'll Do
- Investigate performance questions that affect inference cost, latency, and reliability.
- Build or adapt kernels, microbenchmarks, and experiments when they are the right way to answer a question.
- Work with technologies such as CUDA, HIP, Triton, WGMMA, TMA, tcgen05.mma, TMEM, MFMA, Hopper, Blackwell, CDNA, FP4, NVFP4, and other sub-byte formats where relevant.
- Connect correctness, profiling, throughput, and system integration evidence into practical recommendations.
- Translate low-level findings into artifacts that the broader product and infrastructure efforts can use.
Skills and Qualifications
Minimum qualifications
- Fast learner with strong CS, systems, numerical, or performance fundamentals.
- AI-native builder who uses AI tools to study hardware docs, prototype kernels, debug failures, and move faster while verifying results.
- Evidence-backed measurement discipline, including careful profiling, benchmarking, and correctness checks.
- GPU, kernel, compiler, numerical, or performance engineering experience.
- Experience with C++, CUDA, HIP, Triton, or similar low-level performance tools, or clear ability to learn the needed stack quickly.
- Comfort reasoning about numerical correctness and performance tradeoffs.
- Ability to communicate careful measurements without overstating results.
Preferred qualifications
- Experience with Hopper, Blackwell, CDNA, tensor cores, or MFMA.
- Experience with quantization formats, inference kernels, compiler work, or code generation.
- Published OSS, benchmark, kernel, compiler, or performance work.
Logistics
- Full-time founding role.
- San Francisco / Bay Area preferred. Remote is possible by exception for the right person.
- Further details are discussed during the hiring process.
When applying, select "Founding Kernel Engineer, GPU Kernels" in the Role or lane field.