You shipped an AI feature, agentic workflow, or internal AI tool. Now you'd like a second opinion before the next release — on architecture, evaluation, tool use, cost, safety, or operability.

This is not a pre-sales audit. We do not use reviews to wedge our way into a rebuild. The output is a written assessment your team can act on whether or not we ever work together again.

The shape

Week 1: Intake and access. We read the architecture, prompts, evals, traces, cost reports, incident notes, and product constraints. If the system is not ready for review, we say that in writing.
Week 2: Findings and reproduction. We reproduce the riskiest paths, inspect model and retrieval behavior, and look for gaps between the stated success criteria and what the system actually measures.
Week 3: Report and readout. We deliver a prioritized report, run a 90-minute readout with engineering and product, and leave your team with the fixes we would make first.

Deliverables

A written assessment with prioritized findings.
A 90-minute readout for engineering and product.
A concrete risk register: release blockers, near-term fixes, and backlog items.
A cost and latency review with the model, retrieval, and orchestration paths separated.
A tool-use review: what the system can call, what it can mutate, what it logs, and where it should stop.
An evaluation gap analysis: what your current eval catches, what it misses, and what we would add.
A 30-day re-check, optional and fixed-scope.

What we look for

Prompts and agent paths that pass demos but fail repeatability.
Tool-calling paths with unclear permissions, weak schemas, missing idempotency, or no human approval before irreversible actions.
Retrieval systems with no answerability test or source-grounding threshold.
Safety claims that are not enforced in code, review workflow, or escalation paths.
Cost paths where a smaller model, cache, batch job, or narrower context window would pass the same eval.
Operational gaps: no rollback path, no owner, no incident playbook, no way to tell if quality is drifting.

The arithmetic

AI review is $15–25k, fixed price, usually three weeks. The lower end fits a bounded feature with clear access. The upper end fits multi-agent systems, regulated workflows, or systems with several retrieval and approval paths.

Who this works for

Teams with an AI feature close to release and enough implementation detail to review.
Leaders who need a defensible outside read before a security, procurement, or board conversation.
Teams that want the truth more than they want a clean bill of health.

We don't sell follow-on engagements off the back of a review. If you ask us to help implement findings, we scope that separately after the report is done.

Agentic system review

The shape

Deliverables

What we look for

The arithmetic

Who this works for