zoff.tech

Agentic system review

A fixed-scope review of an AI or agentic system already in or near production — architecture, evaluation, tool use, cost, safety, and operability.

You shipped an AI feature, agentic workflow, or internal AI tool. Now you'd like a second opinion before the next release — on architecture, evaluation, tool use, cost, safety, or operability.

This is not a pre-sales audit. We do not use reviews to wedge our way into a rebuild. The output is a written assessment your team can act on whether or not we ever work together again.

The shape

  • Week 1: Intake and access. We read the architecture, prompts, evals, traces, cost reports, incident notes, and product constraints. If the system is not ready for review, we say that in writing.
  • Week 2: Findings and reproduction. We reproduce the riskiest paths, inspect model and retrieval behavior, and look for gaps between the stated success criteria and what the system actually measures.
  • Week 3: Report and readout. We deliver a prioritized report, run a 90-minute readout with engineering and product, and leave your team with the fixes we would make first.

Deliverables

  • A written assessment with prioritized findings.
  • A 90-minute readout for engineering and product.
  • A concrete risk register: release blockers, near-term fixes, and backlog items.
  • A cost and latency review with the model, retrieval, and orchestration paths separated.
  • A tool-use review: what the system can call, what it can mutate, what it logs, and where it should stop.
  • An evaluation gap analysis: what your current eval catches, what it misses, and what we would add.
  • A 30-day re-check, optional and fixed-scope.

What we look for

  • Prompts and agent paths that pass demos but fail repeatability.
  • Tool-calling paths with unclear permissions, weak schemas, missing idempotency, or no human approval before irreversible actions.
  • Retrieval systems with no answerability test or source-grounding threshold.
  • Safety claims that are not enforced in code, review workflow, or escalation paths.
  • Cost paths where a smaller model, cache, batch job, or narrower context window would pass the same eval.
  • Operational gaps: no rollback path, no owner, no incident playbook, no way to tell if quality is drifting.

The arithmetic

AI review is $15–25k, fixed price, usually three weeks. The lower end fits a bounded feature with clear access. The upper end fits multi-agent systems, regulated workflows, or systems with several retrieval and approval paths.

Who this works for

  • Teams with an AI feature close to release and enough implementation detail to review.
  • Leaders who need a defensible outside read before a security, procurement, or board conversation.
  • Teams that want the truth more than they want a clean bill of health.

We don't sell follow-on engagements off the back of a review. If you ask us to help implement findings, we scope that separately after the report is done.