A team instruments their agent against a tracing vendor's SDK. It works. Six months later the vendor's pricing changes, or the team moves to Azure, or a regulated customer demands the telemetry never leave their network. Now every span in the codebase is written against a format that only one backend speaks, and the "quick migration" is a quarter of rewriting instrumentation that was never the product. They did not buy observability. They rented it, and the lease just came due.
This is the same trap as picking a model by the vendor who sold the room instead of the eval. The fix is the same too: own the contract, not the vendor.
The contract is OpenTelemetry
OpenTelemetry (OTel) is the vendor-neutral standard for emitting traces, metrics, and logs. You instrument once, against the open SDK, and the telemetry flows to any backend that accepts OTLP, OTel's wire protocol, without touching your instrumentation code. The instrumentation is decoupled from the destination. That decoupling is the entire point, and for agent systems it has gone from nice-to-have to load-bearing, because the destinations are now plural and changing: a cloud's native backend today, a specialized eval platform tomorrow, your own store for the regulated workload.
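As a minimal sketch of that decoupling, using only the Python standard library and not the real OTel SDK: the instrumented function knows only an emit callable, and the destination comes from configuration. OTEL_EXPORTER_OTLP_ENDPOINT is the standard OTel SDK environment variable; the function names, the span dict shape, and the "agent.step" span name are hypothetical, chosen for illustration.

```python
import os
from typing import Callable, Mapping

Span = dict  # illustrative stand-in for a real OTel span


def resolve_endpoint(environ: Mapping[str, str] = os.environ) -> str:
    """Pick the destination from configuration, not from code."""
    # OTEL_EXPORTER_OTLP_ENDPOINT is the standard OTel SDK variable;
    # http://localhost:4318 is the usual OTLP/HTTP default (a local Collector).
    return environ.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4318")


def make_emitter(endpoint: str, sink: list) -> Callable[[Span], None]:
    """Stand-in exporter: records (endpoint, span) instead of sending it."""
    def emit(span: Span) -> None:
        sink.append((endpoint, span))
    return emit


def run_agent_step(question: str, emit: Callable[[Span], None]) -> str:
    """Hypothetical agent step: emits a span and never names a backend."""
    answer = question.upper()  # placeholder for real agent work
    emit({"name": "agent.step",
          "attributes": {"gen_ai.operation.name": "chat"}})
    return answer
```

Pointing the same run_agent_step at a different endpoint changes only the value passed to make_emitter; the span it emits is identical, which is the property the real SDK and exporter configuration give you.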
What makes this real for agents rather than generic services is the GenAI semantic conventions — a standardized vocabulary for the things agents actually do. The model on a request is gen_ai.request.model whether it is GPT, Gemini, or an open-weights model. Token usage is gen_ai.usage.input_tokens regardless of vendor. Tool calls, operation names, and agent steps have agreed-upon attribute names. This is what lets a single trace span be understood identically by every backend instead of being a vendor-specific blob. The convention is the lingua franca; without it, "portable" traces are just bytes nobody else can read.
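To make the shared vocabulary concrete, here is a small sketch that maps two differently shaped usage payloads onto the same GenAI attribute names. The vendor labels ("openai_style", "gemini_style") and the helper name are hypothetical, and the payload shapes are simplified illustrations of what such responses tend to look like; gen_ai.usage.output_tokens and gen_ai.operation.name are conventions beyond the two named above.

```python
from typing import Any


def genai_attributes(vendor: str, model: str,
                     response: dict[str, Any]) -> dict[str, Any]:
    """Map a vendor-specific usage payload onto GenAI convention names."""
    if vendor == "openai_style":
        usage = response["usage"]
        input_tokens = usage["prompt_tokens"]
        output_tokens = usage["completion_tokens"]
    elif vendor == "gemini_style":
        usage = response["usageMetadata"]
        input_tokens = usage["promptTokenCount"]
        output_tokens = usage["candidatesTokenCount"]
    else:
        raise ValueError(f"unknown vendor shape: {vendor}")
    # Same keys regardless of vendor: this is what a backend can rely on.
    return {
        "gen_ai.operation.name": "chat",
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }
```

Both branches return a dict with identical keys, so anything downstream (a dashboard, a cost report, an eval job) can read token usage without knowing which vendor produced the span.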
The Collector is where the leverage is
Between your agents and any backend sits the OTel Collector, and it is the most underused piece of the stack. Because all telemetry passes through it, the Collector is where you do the things you do not want scattered through application code:
- Redact PII before it leaves your network. For regulated work this is not optional — the Collector strips or hashes sensitive fields so the prompt text and tool arguments that go to a backend are already clean. Do this here, once, not in fifty instrumentation sites.
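- Sample deliberately. Keep 100% of failures and 100% of autonomous agent modifications; sample routine successes. Failures are where the signal is; full-fidelity success traces are mostly cost. Because a trace's outcome is only known once it has finished, this means tail-based sampling at the Collector rather than a head-sampling decision made in the SDK.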
- Route and fan out. The same trace can go to your cloud's backend for ops and to an eval platform for LLM-as-judge scoring, with no change upstream.
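The three jobs above can be sketched in plain Python to make the logic concrete. In a real deployment this lives in Collector configuration (for example the contrib redaction or transform processors and the tail_sampling processor), not in application code. The span dict shape, the user.email and user.id keys, the agent.autonomous_modification flag, and the 10% success rate are all assumptions made for illustration.

```python
import hashlib
import random
import re
from typing import Callable, Iterable

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SENSITIVE_KEYS = {"user.email", "user.id"}  # hypothetical attribute names


def redact(span: dict) -> dict:
    """Hash sensitive attributes and mask e-mail addresses in string values."""
    clean = {}
    for key, value in span["attributes"].items():
        if key in SENSITIVE_KEYS:
            clean[key] = hashlib.sha256(str(value).encode()).hexdigest()[:16]
        elif isinstance(value, str):
            clean[key] = EMAIL.sub("[REDACTED_EMAIL]", value)
        else:
            clean[key] = value
    return {**span, "attributes": clean}


def keep(span: dict, success_rate: float = 0.1,
         rng: Callable[[], float] = random.random) -> bool:
    """Keep every failure and autonomous modification; sample the rest."""
    if span.get("status") == "ERROR":
        return True
    if span["attributes"].get("agent.autonomous_modification", False):
        return True  # hypothetical flag set by the instrumentation
    return rng() < success_rate


def process(spans: Iterable[dict],
            exporters: list[Callable[[dict], None]],
            rng: Callable[[], float] = random.random) -> None:
    """Sample, redact, then fan each kept span out to every exporter."""
    for span in spans:
        if not keep(span, rng=rng):
            continue
        clean = redact(span)  # redaction happens before anything is exported
        for export in exporters:
            export(clean)
```

Calling process(spans, [ops_sink.append, eval_sink.append]) with two plain lists delivers the same redacted spans to both, which mirrors how one Collector pipeline can feed an ops backend and an eval platform at once.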
Most agent frameworks already speak it
The good news is you are rarely starting from zero. The major agent frameworks — across the LangChain/LangGraph, CrewAI, and AutoGen families, and increasingly the cloud-native SDKs — emit OTel traces natively or through a plugin. The two big clouds both terminate OTel without translation: on Microsoft, traces land in Azure Monitor / Application Insights; on Google, in Cloud Trace and the broader Cloud Observability suite. We cover each of those stacks in its own field guide — Microsoft and Google — but the architectural point precedes the choice: if your spans are OTel with GenAI conventions, which cloud's backend you use is a routing decision, not a rewrite.
What you still own
OTel defines how to capture telemetry. It does not decide which insight to extract. Raw traces will tell you a failure rate; they will not, by themselves, cluster the failures, surface the coordination root cause, or tie a regression to last week's autonomous prompt change. That interpretation layer — dashboards, evals on traces, drift detection — is real work and where purpose-built platforms earn their place. But you want that layer reading a standard, portable trace underneath it, so the platform is a choice you can revisit instead of a wall you are bricked behind.
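A toy version of that interpretation layer, as a sketch only: group spans by one attribute and compare failure rates, which is the simplest way to see whether a prompt change lines up with a regression. The app.prompt.version attribute and the span dict shape are hypothetical; a real platform would do far more (clustering, judges, drift detection) on top of the same portable data.

```python
from collections import defaultdict
from typing import Iterable


def failure_rate_by(spans: Iterable[dict], attribute: str) -> dict[str, float]:
    """Group spans by one attribute and return {value: failure rate}."""
    totals: dict[str, int] = defaultdict(int)
    failures: dict[str, int] = defaultdict(int)
    for span in spans:
        key = span["attributes"].get(attribute, "unknown")
        totals[key] += 1
        if span.get("status") == "ERROR":
            failures[key] += 1
    return {key: failures[key] / totals[key] for key in totals}
```

Because the function reads only standard span fields plus one custom attribute, it works the same on spans pulled from any backend, so the analysis can move along with the traces.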
Closing
The agent is the easy part to instrument. The expensive mistake is instrumenting it against a format only one vendor reads.
OpenTelemetry with the GenAI semantic conventions is the contract: instrument once, redact and sample at the Collector, send the same trace to Azure Monitor, Cloud Trace, an eval platform, or your own store — and change your mind later without changing your code. Own the trace. Rent the backend.