mad-lab-ai

GPU-native intelligence infrastructure, built in the open.

What we build

Deep infrastructure for the open AI era

GPU-Native Inference

Custom attention kernels, tiered KV cache, and ROCm-first optimizations for local LLM inference at scale. Built on llama.cpp, benchmarked in production.

Agent Memory

Persistent, searchable memory that spans sessions. Agents that remember, learn, and improve over time — without sending your data to the cloud.

Open Research

All kernels, benchmarks, and findings published as we go. Building in public, shipping to production, sharing everything that works.