You shipped an AI feature, agentic workflow, or internal AI tool. Now you'd like a second opinion before the next release — on architecture, evaluation, tool use, cost, safety, or operability.
This is not a pre-sales audit. We do not use reviews to wedge our way into a rebuild. The output is a written assessment your team can act on whether or not we ever work together again.
The shape
- Week 1: Intake and access. We read the architecture, prompts, evals, traces, cost reports, incident notes, and product constraints. If the system is not ready for review, we say that in writing.
- Week 2: Findings and reproduction. We reproduce the riskiest paths, inspect model and retrieval behavior, and look for gaps between the stated success criteria and what the system actually measures.
- Week 3: Report and readout. We deliver a prioritized report, run a 90-minute readout with engineering and product, and leave your team with the fixes we would make first.
Deliverables
- A written assessment with prioritized findings.
- A 90-minute readout for engineering and product.
- A concrete risk register: release blockers, near-term fixes, and backlog items.
- A cost and latency review with the model, retrieval, and orchestration paths separated.
- A tool-use review: what the system can call, what it can mutate, what it logs, and where it should stop.
- An evaluation gap analysis: what your current eval catches, what it misses, and what we would add.
- A 30-day re-check, optional and fixed-scope.
What we look for
- Prompts and agent paths that pass demos but fail repeatability.
- Tool-calling paths with unclear permissions, weak schemas, missing idempotency, or no human approval before irreversible actions.
- Retrieval systems with no answerability test or source-grounding threshold.
- Safety claims that are not enforced in code, review workflow, or escalation paths.
- Cost paths where a smaller model, cache, batch job, or narrower context window would pass the same eval.
- Operational gaps: no rollback path, no owner, no incident playbook, no way to tell if quality is drifting.
The arithmetic
AI review is $15–25k, fixed price, usually three weeks. The lower end fits a bounded feature with clear access. The upper end fits multi-agent systems, regulated workflows, or systems with several retrieval and approval paths.
Who this works for
- Teams with an AI feature close to release and enough implementation detail to review.
- Leaders who need a defensible outside read before a security, procurement, or board conversation.
- Teams that want the truth more than they want a clean bill of health.
We don't sell follow-on engagements off the back of a review. If you ask us to help implement findings, we scope that separately after the report is done.