Latest insights
May 21, 2026
Using AI to code is not the same as building AI systems
AI-assisted coding is becoming table stakes. AI systems engineering is becoming the real differentiator. Here is the difference, and why it matters.
May 19, 2026
The permission map every production agent needs before it calls a tool
Tool-using agents need an explicit map of what they can read, write, mutate, escalate, and never touch.
May 12, 2026
RAG does not start with embeddings. It starts with answerability.
Before you tune retrieval, prove the question can be answered from the source corpus, with a citation a human would accept.
May 5, 2026
The 3 a.m. AI runbook
Production AI fails in ways ordinary app runbooks do not cover. The operating plan has to include quality drift, retrieval failure, model outages, cost spikes, and human escalation.
Apr 28, 2026
Cache-aware agent architecture, or why your loop is paying for the same context fifteen times
Prompt caching is no longer a performance optimization. It is an architectural constraint that decides whether a long-running agent is economic to operate.
Apr 21, 2026
MCP is becoming the production interface for agents — own it like one
The Model Context Protocol is moving from a developer convenience to the production interface between agents and your systems. Here is what changes when you treat it that way.
Apr 14, 2026
Verifier-gated agent loops — the eval, moved from CI into the runtime
A small verifier model sitting between the frontier model and the side-effect boundary is the most useful piece of agent architecture nobody is shipping yet.
Apr 7, 2026
The cheapest model that passes the eval wins
How a working eval harness picks the model, and how often the answer is not the frontier one the team came in expecting.
Mar 31, 2026
No orphan PoCs: put a real user in the system by week 2
A PoC with no path to production hides the hard decisions. A week-2 user forces them into the open while the architecture is still cheap to change.
Mar 24, 2026
What a real user broke on day twelve that no spec would have caught
Why we put a real user inside the system in week 2, and the kinds of architecture decisions that get rewritten when we do.
Mar 17, 2026
The eval we wrote in week one that killed the build in week two
How the discovery eval is supposed to work, including the two times we walked away and refunded the fee.
Mar 10, 2026
Shipping evaluation frameworks that survive contact with production
An evaluation harness is a product. Ship it like one — versioned, owned, and instrumented.