Distributed tracing guide
Arbitex Platform (platform-0046) and Outpost (outpost-0027) both ship OpenTelemetry SDK integrations. When both are configured with the same trace backend, you can follow a single request across the full chain: client → Outpost proxy → Platform API → provider call — all in one trace waterfall.
This guide focuses on cross-service distributed tracing. For per-service OTel configuration details (environment variables, packages, metrics), see the OpenTelemetry configuration guide.
How distributed tracing works
When a request enters the Outpost proxy, the OTel SDK starts a root span. As the request flows to the Platform API (for management-plane operations, audit sync, or policy evaluation), standard W3C traceparent headers propagate the trace context automatically. Both services export spans to the same OTLP collector, and the trace backend joins them into a single waterfall by trace ID.
```
Client request
      │
      ▼
┌─────────────────────────────────────────────────────────────┐
│ Arbitex Outpost                                             │
│  ├── span: POST /proxy/chat/completions (root span)         │
│  │    └── trace_id: a1b2c3d4e5f6...                         │
│  ├── span: policy cache lookup                              │
│  ├── span: DLP pipeline (NER + regex)                       │
│  └── span: httpx → Platform audit sync                      │
│       └── traceparent header propagated ───────────────┤    │
│                                                             │
│ Arbitex Platform API                                        │
│  ├── span: POST /api/internal/audit/ingest (child span)     │
│  │    └── trace_id: a1b2c3d4e5f6... (same trace!)           │
│  └── span: SQLAlchemy INSERT audit_logs                     │
└─────────────────────────────────────────────────────────────┘
      │
      ▼
OTel Collector → Jaeger / Grafana Tempo
```

Because both services use the OpenTelemetry SDK with automatic W3C trace context propagation (via `opentelemetry-instrumentation-httpx`), no manual instrumentation is required to link spans across services.
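The linking mechanism is just a shared trace ID carried in the `traceparent` header. The sketch below is illustrative Python, not the Arbitex or OTel SDK implementation: it shows how a W3C `traceparent` value is parsed and re-issued for a child call while keeping the same `trace_id`.

```python
import re

# W3C traceparent format: version-traceid-parentspanid-flags
# e.g. "00-a1b2c3d4e5f67890abcdef1234567890-1234567890abcdef-01"
TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<span_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)

def parse_traceparent(header: str) -> dict:
    """Extract the trace context fields from a traceparent header."""
    m = TRACEPARENT_RE.match(header)
    if not m:
        raise ValueError(f"malformed traceparent: {header!r}")
    return m.groupdict()

def child_headers(parent: dict, new_span_id: str) -> dict:
    """Build the outbound header for a child call: the trace_id is
    unchanged, and the caller's new span becomes the parent span ID."""
    return {
        "traceparent": f"00-{parent['trace_id']}-{new_span_id}-{parent['flags']}"
    }

ctx = parse_traceparent("00-a1b2c3d4e5f67890abcdef1234567890-1234567890abcdef-01")
out = child_headers(ctx, "fedcba0987654321")
# Both sides now share the same trace_id, which is what lets
# Jaeger/Tempo join their spans into one waterfall.
```

In the real services, `opentelemetry-instrumentation-httpx` performs this injection automatically on every outbound request.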
Prerequisites
- Arbitex Platform 0046+ with OTel packages installed (`OTEL_EXTRAS=1` build)
- Arbitex Outpost 0027+ with OTel packages installed
- An OTLP-compatible trace backend reachable from both services:
  - Jaeger (self-hosted, recommended for getting started)
  - Grafana Tempo (self-hosted or Grafana Cloud)
  - Any OTLP collector fanning out to your existing APM tool
Step 1: Deploy an OTel Collector
Both services should send to the same OTel Collector. The Collector fans out to the trace backend and optionally to multiple destinations.
Docker Compose example
```yaml
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.96.0
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
      - "8888:8888"   # Collector self-metrics (Prometheus)
```

Kubernetes example
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:0.96.0
          args: ["--config=/etc/otel/config.yaml"]
          volumeMounts:
            - name: config
              mountPath: /etc/otel
          ports:
            - containerPort: 4317
            - containerPort: 4318
      volumes:
        - name: config
          configMap:
            name: otel-collector-config
---
apiVersion: v1
kind: Service
metadata:
  name: otel-collector
  namespace: monitoring
spec:
  selector:
    app: otel-collector
  ports:
    - name: otlp-grpc
      port: 4317
      targetPort: 4317
    - name: otlp-http
      port: 4318
      targetPort: 4318
```

Collector configuration
```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
      http:
        endpoint: "0.0.0.0:4318"

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024

exporters:
  otlp/jaeger:
    endpoint: "jaeger:4317"
    tls:
      insecure: true
  # Or for Grafana Tempo:
  # otlp/tempo:
  #   endpoint: "tempo:4317"
  #   tls:
  #     insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]
```

Step 2: Configure the Platform
Set the OTLP endpoint environment variables on the Platform API:
```
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
OTEL_SERVICE_NAME=arbitex-platform
OTEL_RESOURCE_ATTRIBUTES=deployment.environment=production,service.version=0046
```

Kubernetes (Helm values):
```yaml
api:
  env:
    OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector.monitoring.svc.cluster.local:4317"
    OTEL_SERVICE_NAME: "arbitex-platform"
    OTEL_RESOURCE_ATTRIBUTES: "deployment.environment=production,k8s.namespace=arbitex"
```

Verify that OTel initialized at startup:
```
INFO  OTel initialized: service=arbitex-platform, endpoint=http://otel-collector:4317
```

Step 3: Configure the Outpost
Set the same OTLP endpoint on the Outpost. For a network-isolated outpost, point the endpoint at a Collector co-located in the customer’s private network — not through the public Arbitex cloud endpoint.
```
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector.corp.internal:4317
OTEL_SERVICE_NAME=arbitex-outpost
OTEL_RESOURCE_ATTRIBUTES=deployment.environment=production,outpost.id=your-outpost-uuid
```

Helm values (outpost-values-prod.yaml):
```yaml
# Add to the outpost Helm values under env:
env:
  OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector.monitoring.svc.cluster.local:4317"
  OTEL_SERVICE_NAME: "arbitex-outpost"
  OTEL_RESOURCE_ATTRIBUTES: "outpost.id=your-outpost-uuid,deployment.environment=production"
```

Jaeger setup
Jaeger 1.35+ natively supports OTLP gRPC on port 4317.
Docker Compose (all-in-one, development)
```yaml
services:
  jaeger:
    image: jaegertracing/all-in-one:1.54
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    ports:
      - "16686:16686"   # Jaeger UI
      - "4317:4317"     # OTLP gRPC (if using Jaeger directly, without a Collector)
```

Point the Collector’s `otlp/jaeger` exporter at `http://jaeger:4317`.
Kubernetes (Jaeger Operator)
```sh
# Install the Jaeger Operator
helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm install jaeger-operator jaegertracing/jaeger-operator \
  --namespace observability \
  --create-namespace
```

```sh
# Deploy a simple Jaeger instance
kubectl apply -f - <<EOF
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
  namespace: observability
spec:
  strategy: allInOne
  allInOne:
    options:
      collector:
        otlp:
          grpc:
            host-port: ":4317"
  storage:
    type: memory   # Use elasticsearch for production
EOF
```

Point the Collector at `http://jaeger-collector.observability.svc.cluster.local:4317`.
Navigating traces in Jaeger UI
- Open the Jaeger UI at port 16686.
- In the Service dropdown, select `arbitex-platform` or `arbitex-outpost`.
- Set the Lookback window (e.g., last 1 hour) and click Find Traces.
- Click a trace to open the waterfall view.
Cross-service traces appear as a single entry: the trace list shows both service names in the span count. A typical Outpost → Platform trace shows:

- Root span: `POST /proxy/chat/completions` (service: `arbitex-outpost`)
  - Child: DLP scan spans
  - Child: `POST /api/internal/audit/ingest` (service: `arbitex-platform`)
    - Child: SQLAlchemy insert spans
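The "joined by trace ID" behavior is easy to picture with a short sketch. Assuming a hypothetical exported-span shape (`span_id`, `parent_id`, `service`, `name`; this is illustrative, not the exact OTLP schema), nesting spans under their parents reproduces the waterfall the UI shows:

```python
from collections import defaultdict

# Hypothetical spans exported by both services for one trace.
spans = [
    {"span_id": "a1", "parent_id": None, "service": "arbitex-outpost",
     "name": "POST /proxy/chat/completions"},
    {"span_id": "b2", "parent_id": "a1", "service": "arbitex-outpost",
     "name": "dlp.tier2.ner_scan"},
    {"span_id": "c3", "parent_id": "a1", "service": "arbitex-outpost",
     "name": "httpx POST /api/internal/audit"},
    {"span_id": "d4", "parent_id": "c3", "service": "arbitex-platform",
     "name": "POST /api/internal/audit/ingest"},
]

def build_waterfall(spans):
    """Nest spans under their parents, as a trace backend does."""
    children = defaultdict(list)
    for s in spans:
        children[s["parent_id"]].append(s)

    def walk(parent_id, depth=0):
        lines = []
        for s in children[parent_id]:
            lines.append(f"{'  ' * depth}{s['service']}: {s['name']}")
            lines.extend(walk(s["span_id"], depth + 1))
        return lines

    return walk(None)  # root spans have no parent

waterfall = build_waterfall(spans)
```

Note that the Platform span nests under the Outpost's `httpx` span purely because its `parent_id` points there; the propagated traceparent header is what made that parent/child link possible across the process boundary.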
Grafana Tempo setup
Grafana Tempo is an OTLP-native trace backend designed to work with Grafana dashboards.
Docker Compose
```yaml
services:
  tempo:
    image: grafana/tempo:2.3.1
    command: ["-config.file=/etc/tempo.yaml"]
    volumes:
      - ./tempo.yaml:/etc/tempo.yaml
      - tempo-data:/var/tempo
    ports:
      - "3200:3200"   # Tempo query API
      - "4317:4317"   # OTLP gRPC (internal)

  grafana:
    image: grafana/grafana:10.2.3
    ports:
      - "3000:3000"
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
    volumes:
      - grafana-storage:/var/lib/grafana

volumes:
  tempo-data: {}
  grafana-storage: {}
```

The referenced tempo.yaml:

```yaml
server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: "0.0.0.0:4317"

ingester:
  max_block_duration: 5m

storage:
  trace:
    backend: local
    local:
      path: /var/tempo/traces
    wal:
      path: /var/tempo/wal

compactor:
  compaction:
    block_retention: 48h
```

Configure Grafana data source
- In Grafana, go to Connections > Data Sources > Add new data source.
- Select Tempo.
- Set URL to `http://tempo:3200`.
- Click Save & Test.
To enable trace-to-log correlation (linking trace IDs in logs to Tempo traces), also add a Loki data source and configure derived fields:
- In the Loki data source settings, go to Derived fields.
- Add a derived field:
  - Name: `trace_id`
  - Regex: `"trace_id":"([0-9a-f]{32})"`
  - Internal link: Tempo data source
  - Query: `${__value.raw}`
Reading cross-service trace waterfalls
A distributed trace waterfall shows all spans from both services for a single request. Here is what to look for:
Span anatomy
Each span shows:

- Service name (left column): `arbitex-platform` or `arbitex-outpost`
- Operation name, e.g. `POST /proxy/chat/completions`
- Duration: how long the span took
- Start offset: when the span started relative to the root span
Typical Outpost → Platform trace
```
arbitex-outpost    POST /proxy/chat/completions         [200ms root]
arbitex-outpost      policy.cache.lookup                [1ms]
arbitex-outpost      dlp.tier1.regex_scan               [3ms]
arbitex-outpost      dlp.tier2.ner_scan                 [45ms]
arbitex-outpost      httpx POST /api/internal/audit     [12ms]
arbitex-platform       POST /api/internal/audit         [10ms]
arbitex-platform         sqlalchemy INSERT audit_logs   [4ms]
arbitex-outpost      httpx POST provider.anthropic      [130ms]
```

Diagnosing latency
Section titled “Diagnosing latency”| What to look for | Likely cause |
|---|---|
| DLP scan span > 200ms | NER GPU pod not available; falling back to CPU |
| Provider span > 5s | Provider API latency; not an Arbitex issue |
audit/ingest span > 50ms | Platform API overloaded or database slow |
| Missing Platform spans | Platform OTel not configured; check sso_enabled log |
| Missing Outpost spans | Outpost OTel not configured; check startup log |
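When a parent span looks slow, check its self time (duration minus time spent in direct children) before blaming that service: a slow parent is often just waiting on a slow child. A minimal sketch with hypothetical span names and durations:

```python
def self_time_ms(span_durations: dict[str, float],
                 children: dict[str, list[str]]) -> dict[str, float]:
    """Duration of each span minus its direct children's durations.
    Assumes children run sequentially inside the parent (true for the
    proxy pipeline above; concurrent children would overcount)."""
    return {
        name: dur - sum(span_durations[c] for c in children.get(name, []))
        for name, dur in span_durations.items()
    }

# Durations loosely based on the example waterfall above (milliseconds)
durations = {
    "proxy.request": 200.0,
    "dlp.ner_scan": 45.0,
    "audit.sync": 12.0,
    "provider.call": 130.0,
}
tree = {"proxy.request": ["dlp.ner_scan", "audit.sync", "provider.call"]}

self_times = self_time_ms(durations, tree)
# proxy.request self time: 200 - (45 + 12 + 130) = 13 ms, i.e. the
# proxy itself adds little; most of the 200 ms is the provider call.
```

In this example the root span's 200 ms is dominated by the provider call, which matches the "Provider span > 5s" row above: large provider latency is visible in the trace but is not an Arbitex issue.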
Filtering by outpost
If you set `outpost.id` in `OTEL_RESOURCE_ATTRIBUTES`, you can filter traces in Grafana Tempo using TraceQL:
```
{ resource.outpost.id = "your-outpost-uuid" && span.http.status_code = 200 }
```

In Jaeger, add a tag filter: `outpost.id=your-outpost-uuid`.
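The same tag filter can be issued programmatically against Jaeger's query-service HTTP API (the `/api/traces` search endpoint; exact parameters may vary by Jaeger version, so treat this as a sketch). The snippet only builds the search URL, leaving the HTTP call to your client of choice:

```python
import json
from urllib.parse import urlencode

def jaeger_search_url(base: str, service: str, tags: dict,
                      lookback: str = "1h", limit: int = 20) -> str:
    """Build a Jaeger query-service search URL filtered by span tags.
    Jaeger expects the tags parameter as a JSON-encoded object."""
    params = {
        "service": service,
        "tags": json.dumps(tags),
        "lookback": lookback,
        "limit": limit,
    }
    return f"{base}/api/traces?{urlencode(params)}"

url = jaeger_search_url(
    "http://jaeger:16686",
    service="arbitex-outpost",
    tags={"outpost.id": "your-outpost-uuid"},
)
```

Fetching that URL returns a JSON document of matching traces, which is useful for scripted checks (e.g., alerting when no traces from a given outpost arrive within the lookback window).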
Cross-service log-to-trace correlation
When OTel is enabled, both Platform and Outpost inject the current trace_id into every structured log line:
```json
{
  "timestamp": "2026-03-12T14:00:00Z",
  "level": "INFO",
  "message": "DLP scan completed",
  "trace_id": "a1b2c3d4e5f67890abcdef1234567890",
  "span_id": "1234567890abcdef",
  "service": "arbitex-outpost"
}
```

The same `trace_id` appears in logs from both services for the same request. In Grafana:
- Search Loki for logs matching a specific `trace_id`:
  `{job="arbitex-outpost"} |= "a1b2c3d4e5f67890abcdef1234567890"`
- Click the Jump to trace link next to the `trace_id` field (requires derived-field configuration; see Grafana Tempo setup).
Troubleshooting
Spans from one service appear but not the other
Verify both services have `OTEL_EXPORTER_OTLP_ENDPOINT` set and the OTel packages are installed:
Platform:
```sh
# Check the startup log
docker logs arbitex-api 2>&1 | grep -i otel
# Expected: "OTel initialized: service=arbitex-platform, endpoint=..."
```

Outpost:
```sh
kubectl logs -n arbitex-outpost deployment/arbitex-outpost | grep -i otel
# Expected: OTel initialization log
```

Cross-service spans not connected (two separate traces)
If Platform spans appear as a separate trace from Outpost spans, the traceparent header is not being propagated. Check:
- Both services are running `opentelemetry-instrumentation-httpx`; it is required for automatic header injection on outbound HTTP calls
- The Outpost → Platform connection is made over HTTP (not a queue or other async mechanism that bypasses HTTP headers)
- The trace backend is receiving spans from both services on the same OTLP endpoint
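If a hop genuinely must go through a queue, carry the context by hand: put the `traceparent` value into the message envelope on the producer side and restore it on the consumer side before starting spans. A hedged sketch with a hypothetical message shape (the OTel SDK's propagator API can fill the same role for real carriers):

```python
import json

def enqueue_with_context(payload: dict, traceparent: str) -> str:
    """Producer side: embed the current traceparent in the message
    envelope so the consumer can continue the same trace."""
    return json.dumps({"traceparent": traceparent, "payload": payload})

def dequeue_with_context(message: str) -> tuple[dict, str]:
    """Consumer side: restore the trace context before starting spans."""
    envelope = json.loads(message)
    return envelope["payload"], envelope["traceparent"]

msg = enqueue_with_context(
    {"event": "audit"},
    "00-a1b2c3d4e5f67890abcdef1234567890-1234567890abcdef-01",
)
payload, tp = dequeue_with_context(msg)
# The consumer can now start a child span under the same trace_id.
```

Without this step, the consumer's SDK sees no parent context and starts a brand-new trace, which is exactly the "two separate traces" symptom described above.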
Collector is dropping spans
Check the Collector’s self-metrics at port 8888 (`/metrics`):

```
otelcol_exporter_send_failed_spans_total{exporter="otlp/jaeger"} 42
```

Common causes:
- Collector cannot reach Jaeger/Tempo: verify network connectivity
- Exporter backpressure: the Collector queue is full; increase `queue_size` in the `otlp` exporter config
- TLS mismatch: use `tls.insecure: true` for non-TLS backends
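To watch the failed-span counter over time, scrape the Collector's `/metrics` endpoint and pull out the value. A minimal parsing sketch (Prometheus text format; the HTTP fetch itself is omitted):

```python
import re

def failed_spans(metrics_text: str, exporter: str) -> int:
    """Read otelcol_exporter_send_failed_spans_total for one exporter
    from Prometheus text-format output; 0 if the series is absent."""
    pattern = re.compile(
        r'otelcol_exporter_send_failed_spans_total\{[^}]*'
        r'exporter="' + re.escape(exporter) + r'"[^}]*\}\s+([0-9.eE+]+)'
    )
    m = pattern.search(metrics_text)
    return int(float(m.group(1))) if m else 0

sample = 'otelcol_exporter_send_failed_spans_total{exporter="otlp/jaeger"} 42'
count = failed_spans(sample, "otlp/jaeger")  # -> 42
```

A counter that keeps climbing between scrapes means spans are still being dropped; a flat non-zero value usually reflects a past outage that has since recovered.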
See also
- OpenTelemetry configuration guide — per-service OTel setup, environment variables, metrics
- OpenTelemetry SDK integration — internal SDK architecture (TracerProvider, ContextVar bridge)
- Grafana dashboard catalog — pre-built dashboards for platform observability
- Kubernetes deployment guide — deploying the OTel Collector alongside Arbitex in Kubernetes