GPU-Native Inference
Custom attention kernels, tiered KV cache, and ROCm-first optimizations for local LLM inference at scale. Built on llama.cpp, benchmarked in production.
GPU-native intelligence infrastructure, built in the open.
What we build
Custom attention kernels, tiered KV cache, and ROCm-first optimizations for local LLM inference at scale. Built on llama.cpp, benchmarked in production.
Persistent, searchable memory that spans sessions. Agents that remember, learn, and improve over time — without sending your data to the cloud.
All kernels, benchmarks, and findings published as we go. Building in public, shipping to production, sharing everything that works.