OpenTelemetry SDK integration
Arbitex ships an opt-in OpenTelemetry (OTel) SDK integration. When enabled, the platform exports traces and metrics over OTLP to any compatible backend, such as Grafana Tempo, Jaeger, Honeycomb, or a vendor-hosted OTel Collector. Backends without native OTLP ingestion, such as Zipkin, can be reached through an OTel Collector.
OTel is disabled by default. It activates only when OTEL_EXPORTER_OTLP_ENDPOINT is set. All OTel functions are safe no-ops when the variable is absent, so there is no performance penalty in deployments that do not use it.
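The activation rule can be sketched as one check that every telemetry entry point consults before doing any work. This is a minimal illustration, not the code in backend/app/core/telemetry.py: otel_endpoint() and no_op_unless_enabled() are hypothetical names, and treating a blank value the same as an unset one is an assumption.

```python
import functools
import os
from typing import Callable, Mapping, Optional, TypeVar

T = TypeVar("T")


def otel_endpoint(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the OTLP endpoint if OTel should be active, else None.

    Hypothetical helper: a missing or blank OTEL_EXPORTER_OTLP_ENDPOINT
    keeps OTel disabled.
    """
    value = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "").strip()
    return value or None


def no_op_unless_enabled(func: Callable[..., T]) -> Callable[..., Optional[T]]:
    """Decorator (hypothetical): skip the wrapped call when OTel is disabled."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if otel_endpoint() is None:
            return None  # disabled: return immediately, no SDK work
        return func(*args, **kwargs)

    return wrapper
```

Under this pattern, a disabled deployment pays for one environment lookup per call and nothing else.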
Architecture overview
The integration lives in backend/app/core/telemetry.py. Its init_telemetry() entry point runs during FastAPI application startup and builds the following pipeline:

    FastAPI startup
    └─ init_telemetry()
       ├─ TracerProvider ──► BatchSpanProcessor ──► OTLPSpanExporter (gRPC)
       │  └─ _ContextVarBridgeProcessor (writes trace_id/span_id into observability.py ContextVars)
       └─ MeterProvider ──► PeriodicExportingMetricReader (60 s) ──► OTLPMetricExporter (gRPC)

Auto-instrumentation patches three libraries at startup:
| Library | Instrumentation package |
|---|---|
| FastAPI | opentelemetry-instrumentation-fastapi |
| httpx | opentelemetry-instrumentation-httpx |
| SQLAlchemy | opentelemetry-instrumentation-sqlalchemy |
Each package is imported at runtime — if any is absent, instrumentation for that library is silently skipped without failing startup.
Environment variables
| Variable | Required | Default | Description |
|---|---|---|---|
| OTEL_EXPORTER_OTLP_ENDPOINT | Yes (to enable OTel) | (none) | gRPC endpoint for the OTLP exporter, e.g. http://otel-collector:4317. Setting this variable activates OTel. Endpoints that start with https:// use TLS; all others use plaintext gRPC. |
| OTEL_SERVICE_NAME | No | arbitex-platform | Service name attached to all spans and metrics as the service.name resource attribute. |
| OTEL_RESOURCE_ATTRIBUTES | No | (none) | Additional resource attributes in key=value,key=value format, e.g. deployment.environment=production,region=us-east-1. Follows the standard OTel specification. |
These variables follow the OpenTelemetry SDK environment variable specification and are compatible with the standard OTel Collector.
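To make the OTEL_RESOURCE_ATTRIBUTES format concrete, here is a small parser sketch. The OTel Python SDK normally parses this variable itself when a Resource is created, so parse_resource_attributes() is a hypothetical helper for illustration or validation only; skipping malformed entries instead of raising is an assumption.

```python
from typing import Dict
from urllib.parse import unquote


def parse_resource_attributes(raw: str) -> Dict[str, str]:
    """Parse 'key=value,key=value' into a dict.

    Entries are comma-separated and split on the first '='; values may be
    percent-encoded. Entries without '=' or with an empty key are skipped.
    """
    attributes: Dict[str, str] = {}
    for entry in raw.split(","):
        key, sep, value = entry.partition("=")
        key = key.strip()
        if not sep or not key:
            continue  # blank or malformed entry
        attributes[key] = unquote(value.strip())
    return attributes
```

Because the comma separates entries, a value that itself contains a comma has to be percent-encoded as %2C.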
TLS note
The exporter uses insecure gRPC when the endpoint does not start with https://. For production deployments, use a TLS-terminated OTLP Collector endpoint:

    OTEL_EXPORTER_OTLP_ENDPOINT=https://otel-collector.internal:4317

Enabling OTel
Docker / docker-compose

    services:
      api:
        image: arbitex-platform:latest
        environment:
          OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector:4317"
          OTEL_SERVICE_NAME: "arbitex-platform"
          OTEL_RESOURCE_ATTRIBUTES: "deployment.environment=production"

Kubernetes (Helm)
Set the variables in your values-prod.yaml:

    api:
      env:
        OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector.monitoring.svc.cluster.local:4317"
        OTEL_SERVICE_NAME: "arbitex-platform"
        OTEL_RESOURCE_ATTRIBUTES: "deployment.environment=production,k8s.namespace.name=arbitex"

Outpost deployments
Outpost deployments use the same variables. Set them in the outpost's environment or Helm values. The OTel endpoint should point to a collector co-located in the air-gapped network; do not route spans or metrics through the public Arbitex cloud endpoint.
ContextVar bridge
Arbitex has a legacy observability.py module that stores trace_id and span_id in Python ContextVar instances. These are read by MetricsMiddleware and the structured JSON logger to correlate log lines with traces.
When OTel is enabled, the _ContextVarBridgeProcessor span processor runs on every span start and writes the OTel-generated trace and span IDs into those ContextVars:

    # On span start (ctx is the new span's SpanContext):
    trace_id = format(ctx.trace_id, "032x")  # 32-char hex
    span_id = format(ctx.span_id, "016x")    # 16-char hex
    set_trace_context(trace_id, span_id)

This means no changes are needed to existing log pipelines or metrics code: they automatically pick up OTel-generated IDs. Log lines written during a request contain the same trace_id that appears in your tracing backend, enabling log-to-trace correlation without a log-shipping plugin.
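A fuller sketch of such a bridge, written against the OTel SDK's SpanProcessor interface (on_start, on_end, shutdown, force_flush), might look like this. It is not the platform's _ContextVarBridgeProcessor: the ContextVar names trace_id_var and span_id_var and the body of set_trace_context() are assumptions standing in for observability.py, and the import fallback exists only so the sketch loads without opentelemetry-sdk.

```python
from contextvars import ContextVar
from typing import Any, Optional

# Hypothetical stand-ins for the ContextVars in observability.py.
trace_id_var: ContextVar[str] = ContextVar("trace_id", default="")
span_id_var: ContextVar[str] = ContextVar("span_id", default="")


def set_trace_context(trace_id: str, span_id: str) -> None:
    """Hypothetical version of the observability.py setter."""
    trace_id_var.set(trace_id)
    span_id_var.set(span_id)


try:  # Use the real SDK base class when opentelemetry-sdk is installed.
    from opentelemetry.sdk.trace import SpanProcessor as _BaseProcessor
except ImportError:  # Fall back so the sketch still imports without the SDK.
    _BaseProcessor = object


class ContextVarBridgeProcessor(_BaseProcessor):
    """Copy each new span's IDs into the legacy ContextVars on span start."""

    def on_start(self, span: Any, parent_context: Optional[Any] = None) -> None:
        ctx = span.get_span_context()
        set_trace_context(format(ctx.trace_id, "032x"), format(ctx.span_id, "016x"))

    def on_end(self, span: Any) -> None:
        pass  # nothing to do when a span finishes

    def shutdown(self) -> None:
        pass  # holds no resources

    def force_flush(self, timeout_millis: int = 30000) -> bool:
        return True  # nothing buffered, so flushing always succeeds
```

With the SDK installed, it would be registered next to the exporter's BatchSpanProcessor via tracer_provider.add_span_processor(ContextVarBridgeProcessor()). Because on_start runs synchronously where the span is started, the ContextVars are set in that execution context.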
Metrics export
The MeterProvider exports metrics every 60 seconds via PeriodicExportingMetricReader. All Prometheus metrics already emitted by the platform (request counts, DLP scan durations, provider latencies, etc.) are also available as OTel metrics when the SDK is enabled.
The metric exporter uses the same gRPC endpoint as the trace exporter. If your backend expects separate endpoints for traces and metrics, configure a Collector pipeline that fans out from a single receiver.
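Because both exporters share one endpoint, their settings can come from a single helper. This is an assumption about structure, not the platform's code: exporter_settings() and build_exporters() are hypothetical, while OTLPSpanExporter and OTLPMetricExporter (and their endpoint and insecure parameters) come from the opentelemetry-exporter-otlp-proto-grpc package. The TLS rule follows the TLS note: plaintext gRPC unless the endpoint starts with https://.

```python
from typing import Any, Dict, Optional, Tuple, Union


def exporter_settings(endpoint: str) -> Dict[str, Union[str, bool]]:
    """Keyword arguments shared by the span and metric exporters (hypothetical)."""
    endpoint = endpoint.strip()
    # TLS only for https:// endpoints; everything else is plaintext gRPC.
    return {"endpoint": endpoint, "insecure": not endpoint.startswith("https://")}


def build_exporters(endpoint: str) -> Optional[Tuple[Any, Any]]:
    """Return (span_exporter, metric_exporter), or None if the exporter package is missing."""
    try:
        from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
        from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
    except ImportError:
        return None
    kwargs = exporter_settings(endpoint)
    # Same endpoint and TLS choice for both signals.
    return OTLPSpanExporter(**kwargs), OTLPMetricExporter(**kwargs)
```

In an SDK pipeline, the metric exporter would then be wrapped as PeriodicExportingMetricReader(metric_exporter, export_interval_millis=60_000) to match the 60-second interval.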
Required Python packages
OTel packages are optional dependencies. Install them to enable the integration:

    pip install \
      opentelemetry-sdk \
      opentelemetry-exporter-otlp-proto-grpc \
      opentelemetry-instrumentation-fastapi \
      opentelemetry-instrumentation-httpx \
      opentelemetry-instrumentation-sqlalchemy

The platform Dockerfile includes these packages when the OTEL_EXTRAS=1 build argument is set:

    docker build --build-arg OTEL_EXTRAS=1 -t arbitex-platform:latest .

The production Helm values set this flag by default for Cloud deployments. Outpost deployments may need to set it explicitly.
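To confirm that an image actually contains the optional packages (for example, after a build with OTEL_EXTRAS=1), you can query installed distribution metadata from inside the container. This is a hypothetical helper, not part of the platform; it uses only importlib.metadata from the standard library and checks the distribution names from the pip install command.

```python
from importlib import metadata
from typing import Iterable, List

# Distribution names from the pip install command on this page.
OTEL_PACKAGES = (
    "opentelemetry-sdk",
    "opentelemetry-exporter-otlp-proto-grpc",
    "opentelemetry-instrumentation-fastapi",
    "opentelemetry-instrumentation-httpx",
    "opentelemetry-instrumentation-sqlalchemy",
)


def missing_otel_packages(packages: Iterable[str] = OTEL_PACKAGES) -> List[str]:
    """Return the distributions from `packages` that are not installed."""
    missing: List[str] = []
    for name in packages:
        try:
            metadata.version(name)  # raises PackageNotFoundError if absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_otel_packages()
    print("missing: " + ", ".join(absent) if absent else "all OTel packages installed")
```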
Graceful shutdown
On application shutdown, shutdown_telemetry() is called automatically. It flushes the BatchSpanProcessor (sending any buffered spans to the OTLP exporter) and shuts down the MeterProvider, so spans still buffered in memory are not dropped during rolling restarts or planned shutdowns.
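The shutdown step can be sketched as follows. Unlike the platform's no-argument shutdown_telemetry(), this hypothetical version takes the providers as parameters (None when OTel is disabled) so it can be exercised in isolation; catching and logging errors so telemetry never blocks application shutdown is a design assumption. TracerProvider.shutdown() and MeterProvider.shutdown() are real SDK methods.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


def shutdown_telemetry(tracer_provider: Optional[Any], meter_provider: Optional[Any]) -> None:
    """Flush and stop both providers; a no-op when OTel was never enabled."""
    if tracer_provider is not None:
        try:
            # Shutting down the TracerProvider shuts down its span processors;
            # a BatchSpanProcessor exports its remaining queue as it stops.
            tracer_provider.shutdown()
        except Exception:  # do not let telemetry block application shutdown
            logger.exception("Tracer provider shutdown failed")
    if meter_provider is not None:
        try:
            # Shuts down the metric readers, including the periodic exporter.
            meter_provider.shutdown()
        except Exception:
            logger.exception("Meter provider shutdown failed")
```

In a FastAPI app, a call like this would typically sit after the yield in a lifespan handler, so it runs once the server stops accepting requests.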
Verifying the integration
- Set OTEL_EXPORTER_OTLP_ENDPOINT and restart the platform.
- Check the application log for:

      OTel initialized: service=arbitex-platform, endpoint=http://otel-collector:4317

- Send a request to any API endpoint. The trace should appear in your backend within a few seconds.
- Confirm that _ContextVarBridgeProcessor is working by checking that structured log lines include a trace_id matching the spans in your tracing backend.
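For the last check, a small helper can pull the trace ID out of a captured log line so you can paste it into your backend's trace search. This sketch assumes the structured logger writes one JSON object per line with a top-level trace_id field; log_trace_id() is hypothetical. It rejects the all-zero ID, which W3C Trace Context defines as invalid.

```python
import json
import re
from typing import Optional

_TRACE_ID_RE = re.compile(r"[0-9a-f]{32}")  # 32 lowercase hex characters


def log_trace_id(line: str) -> Optional[str]:
    """Return the trace_id from one JSON log line, or None if absent or invalid."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    trace_id = record.get("trace_id")
    if isinstance(trace_id, str) and _TRACE_ID_RE.fullmatch(trace_id) and trace_id != "0" * 32:
        return trace_id
    return None
```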
If OTel is not initializing, look for one of these log messages:
- OTel disabled (OTEL_EXPORTER_OTLP_ENDPOINT not set): the variable is not set.
- OTel packages not installed, skipping initialization: install the optional packages listed under Required Python packages.
See also
- Grafana dashboard catalog: pre-built dashboards for the platform metrics
- Prometheus alerting reference: alert rules for DLP, providers, and infrastructure
- Audit Log: structured audit events (separate from OTel spans)