Distributed tracing guide

Arbitex Platform (platform-0046) and Outpost (outpost-0027) both ship OpenTelemetry SDK integrations. When both are configured with the same trace backend, you can follow a single request across the full chain: client → Outpost proxy → Platform API → provider call — all in one trace waterfall.

This guide focuses on cross-service distributed tracing. For per-service OTel configuration details (environment variables, packages, metrics), see OpenTelemetry configuration guide.


When a request enters the Outpost proxy, the OTel SDK starts a root span. As the request flows to the Platform API (for management-plane operations, audit sync, or policy evaluation), standard W3C traceparent headers propagate the trace context automatically. Both services export spans to the same OTLP collector, and the trace backend joins them into a single waterfall by trace ID.

```
Client request
┌─────────────────────────────────────────────────────────────┐
│ Arbitex Outpost                                             │
│ ├── span: POST /proxy/chat/completions (root span)          │
│ │     └── trace_id: a1b2c3d4e5f6...                         │
│ ├── span: policy cache lookup                               │
│ ├── span: DLP pipeline (NER + regex)                        │
│ └── span: httpx → Platform audit sync                       │
│       └── traceparent header propagated ────────────────────┤
│                                                             │
│ Arbitex Platform API                                        │
│ ├── span: POST /api/internal/audit/ingest (child span)      │
│ │     └── trace_id: a1b2c3d4e5f6... (same trace!)           │
│ └── span: SQLAlchemy INSERT audit_logs                      │
└─────────────────────────────────────────────────────────────┘
                  OTel Collector → Jaeger / Grafana Tempo
```

Because both services use the OpenTelemetry SDK with automatic W3C trace context propagation (via opentelemetry-instrumentation-httpx), no manual instrumentation is required to link spans across services.
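For intuition, the context being propagated is just the W3C `traceparent` header, of the form `version-traceid-parentid-flags`. The following pure-Python sketch of building and parsing that header is illustrative only; the function names are ours, and in the real services the OTel SDK does all of this automatically:

```python
import re

# W3C trace context header: "{version}-{trace_id}-{parent_span_id}-{flags}"
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

def build_traceparent(trace_id, span_id, sampled=True):
    """Roughly what the instrumented httpx client attaches to the outbound call."""
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"

def parse_traceparent(header):
    """Roughly what the receiving service extracts; both sides share trace_id."""
    m = TRACEPARENT_RE.match(header)
    if not m:
        raise ValueError(f"malformed traceparent: {header!r}")
    _version, trace_id, parent_span_id, flags = m.groups()
    return {"trace_id": trace_id, "parent_span_id": parent_span_id,
            "sampled": flags == "01"}

header = build_traceparent("a1b2c3d4e5f67890abcdef1234567890", "1234567890abcdef")
print(parse_traceparent(header)["trace_id"])  # -> a1b2c3d4e5f67890abcdef1234567890
```

Because the receiver records the sender's span ID as its parent, the backend can stitch both services' spans into one waterfall.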


  • Arbitex Platform 0046+ with OTel packages installed (OTEL_EXTRAS=1 build)
  • Arbitex Outpost 0027+ with OTel packages installed
  • An OTLP-compatible trace backend reachable from both services:
    • Jaeger (self-hosted, recommended for getting started)
    • Grafana Tempo (self-hosted or Grafana Cloud)
    • Any other APM tool your OTLP Collector can fan out to

Both services should send to the same OTel Collector, which forwards traces to the backend and can optionally fan out to multiple destinations.

otel-collector.yaml

```yaml
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.96.0
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
      - "8888:8888"   # Collector self-metrics (Prometheus)
```
otel-collector-deployment.yaml

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:0.96.0
          args: ["--config=/etc/otel/config.yaml"]
          volumeMounts:
            - name: config
              mountPath: /etc/otel
          ports:
            - containerPort: 4317
            - containerPort: 4318
      volumes:
        - name: config
          configMap:
            name: otel-collector-config
---
apiVersion: v1
kind: Service
metadata:
  name: otel-collector
  namespace: monitoring
spec:
  selector:
    app: otel-collector
  ports:
    - name: otlp-grpc
      port: 4317
      targetPort: 4317
    - name: otlp-http
      port: 4318
      targetPort: 4318
```
otel-collector-config.yaml

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
      http:
        endpoint: "0.0.0.0:4318"

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024

exporters:
  otlp/jaeger:
    endpoint: "jaeger:4317"
    tls:
      insecure: true
  # Or for Grafana Tempo:
  # otlp/tempo:
  #   endpoint: "tempo:4317"
  #   tls:
  #     insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]
```
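Before pointing the services at the Collector, you can smoke-test the OTLP HTTP receiver by hand-posting a single span to port 4318. A minimal sketch — the field names follow the OTLP/JSON encoding, but `make_test_span`, `send_test_span`, and the localhost endpoint are our own illustration:

```python
import json
import time
import urllib.request

def make_test_span(service_name="otlp-smoke-test"):
    """Build a minimal one-span OTLP/JSON payload for POST /v1/traces."""
    now_ns = time.time_ns()
    return {
        "resourceSpans": [{
            "resource": {"attributes": [{
                "key": "service.name",
                "value": {"stringValue": service_name},
            }]},
            "scopeSpans": [{"spans": [{
                "traceId": "a1b2c3d4e5f67890abcdef1234567890",  # 32 hex chars
                "spanId": "1234567890abcdef",                   # 16 hex chars
                "name": "smoke-test",
                "kind": 1,  # SPAN_KIND_INTERNAL
                "startTimeUnixNano": str(now_ns),
                "endTimeUnixNano": str(now_ns + 1_000_000),
            }]}],
        }]
    }

def send_test_span(collector_base="http://localhost:4318"):
    """POST the span to the Collector; HTTP 200 means it was accepted."""
    req = urllib.request.Request(
        f"{collector_base}/v1/traces",
        data=json.dumps(make_test_span()).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

payload = make_test_span()
print(len(payload["resourceSpans"][0]["scopeSpans"][0]["spans"]))  # -> 1
# send_test_span()  # uncomment with a Collector running on localhost
```

If the span is accepted, it should show up in the trace backend under service `otlp-smoke-test` within a few seconds.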

Set the OTLP endpoint environment variable on the Platform API:

```shell
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
OTEL_SERVICE_NAME=arbitex-platform
OTEL_RESOURCE_ATTRIBUTES=deployment.environment=production,service.version=0046
```

Kubernetes (Helm values):

```yaml
api:
  env:
    OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector.monitoring.svc.cluster.local:4317"
    OTEL_SERVICE_NAME: "arbitex-platform"
    OTEL_RESOURCE_ATTRIBUTES: "deployment.environment=production,k8s.namespace=arbitex"
```

Verify OTel initialized at startup:

```
INFO OTel initialized: service=arbitex-platform, endpoint=http://otel-collector:4317
```

Set the same OTLP endpoint on the Outpost. For a network-isolated outpost, point the endpoint at a Collector co-located in the customer’s private network — not through the public Arbitex cloud endpoint.

```shell
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector.corp.internal:4317
OTEL_SERVICE_NAME=arbitex-outpost
OTEL_RESOURCE_ATTRIBUTES=deployment.environment=production,outpost.id=your-outpost-uuid
```

Helm values (outpost-values-prod.yaml):

```yaml
# Add to the outpost Helm values under env:
env:
  OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector.monitoring.svc.cluster.local:4317"
  OTEL_SERVICE_NAME: "arbitex-outpost"
  OTEL_RESOURCE_ATTRIBUTES: "outpost.id=your-outpost-uuid,deployment.environment=production"
```

Jaeger 1.35+ natively supports OTLP gRPC on port 4317.

```yaml
services:
  jaeger:
    image: jaegertracing/all-in-one:1.54
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    ports:
      - "16686:16686"   # Jaeger UI
      - "4317:4317"     # OTLP gRPC (if using Jaeger directly, without Collector)
```

Point the Collector’s otlp/jaeger exporter at http://jaeger:4317.

```shell
# Install the Jaeger Operator
helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm install jaeger-operator jaegertracing/jaeger-operator \
  --namespace observability \
  --create-namespace

# Deploy a simple Jaeger instance
kubectl apply -f - <<EOF
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
  namespace: observability
spec:
  strategy: allInOne
  allInOne:
    options:
      collector:
        otlp:
          grpc:
            host-port: ":4317"
  storage:
    type: memory   # Use elasticsearch for production
EOF
```

Point the Collector at http://jaeger-collector.observability.svc.cluster.local:4317.

  1. Open the Jaeger UI at port 16686.
  2. In the Service dropdown, select arbitex-platform or arbitex-outpost.
  3. Set the Lookback window (e.g., last 1 hour) and click Find Traces.
  4. Click a trace to open the waterfall view.
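The same search can also be run programmatically against Jaeger's HTTP query API (the endpoint the UI itself calls; `jaeger_search_url` and `find_traces` are our own helper names, and the localhost base URL is an assumption):

```python
import json
import urllib.parse
import urllib.request

def jaeger_search_url(base, service, lookback="1h", limit=20):
    """Build the same trace search the Jaeger UI runs."""
    params = urllib.parse.urlencode(
        {"service": service, "lookback": lookback, "limit": limit})
    return f"{base}/api/traces?{params}"

def find_traces(base="http://localhost:16686", service="arbitex-outpost"):
    """Fetch recent traces; each entry carries its traceID and span list."""
    with urllib.request.urlopen(jaeger_search_url(base, service)) as resp:
        return json.load(resp)["data"]

print(jaeger_search_url("http://localhost:16686", "arbitex-outpost"))
# -> http://localhost:16686/api/traces?service=arbitex-outpost&lookback=1h&limit=20
# find_traces()  # uncomment with Jaeger running locally
```

This is handy for alerting or CI checks where opening the UI is not an option.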

Cross-service traces appear as a single entry; the trace list shows both service names alongside the span count. A typical Outpost → Platform trace shows:

  • Root span: POST /proxy/chat/completions (service: arbitex-outpost)
    • Child: DLP scan spans
    • Child: POST /api/internal/audit/ingest (service: arbitex-platform)
      • Child: SQLAlchemy insert spans

Grafana Tempo is an OTLP-native trace backend designed to work with Grafana dashboards.

```yaml
services:
  tempo:
    image: grafana/tempo:2.3.1
    command: ["-config.file=/etc/tempo.yaml"]
    volumes:
      - ./tempo.yaml:/etc/tempo.yaml
      - tempo-data:/var/tempo
    ports:
      - "3200:3200"   # Tempo query API
      - "4317:4317"   # OTLP gRPC (internal)
  grafana:
    image: grafana/grafana:10.2.3
    ports:
      - "3000:3000"
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
    volumes:
      - grafana-storage:/var/lib/grafana

volumes:
  tempo-data: {}
  grafana-storage: {}
```
tempo.yaml

```yaml
server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: "0.0.0.0:4317"

ingester:
  max_block_duration: 5m

storage:
  trace:
    backend: local
    local:
      path: /var/tempo/traces
    wal:
      path: /var/tempo/wal

compactor:
  compaction:
    block_retention: 48h
```

  1. In Grafana, go to Connections > Data Sources > Add new data source.
  2. Select Tempo.
  3. Set URL to http://tempo:3200.
  4. Click Save & Test.

To enable trace-to-log correlation (linking trace IDs in logs to Tempo traces), also add a Loki data source and configure derived fields:

  1. In the Loki data source settings, go to Derived fields.
  2. Add a derived field:
    • Name: trace_id
    • Regex: "trace_id":"([0-9a-f]{32})"
    • Internal link: Tempo data source
    • Query: ${__value.raw}
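The derived-field regex above can be sanity-checked against a sample log line before you configure it in Grafana. A quick sketch using the same pattern:

```python
import re

# Same pattern as the Grafana derived field: a 32-hex-char trace_id
DERIVED_FIELD_RE = re.compile(r'"trace_id":"([0-9a-f]{32})"')

log_line = ('{"level":"INFO","message":"DLP scan completed",'
            '"trace_id":"a1b2c3d4e5f67890abcdef1234567890"}')

match = DERIVED_FIELD_RE.search(log_line)
print(match.group(1))  # -> a1b2c3d4e5f67890abcdef1234567890
```

If the pattern fails to match your services' actual log format, adjust the regex rather than the logs.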

A distributed trace waterfall shows all spans from both services for a single request. Here is what to look for:

Each span shows:

  • Service name (left column) — arbitex-platform or arbitex-outpost
  • Operation name — e.g., POST /proxy/chat/completions
  • Duration — time the span took
  • Start offset — when the span started relative to the root span
```
arbitex-outpost    POST /proxy/chat/completions        [200ms, root]
  arbitex-outpost    policy.cache.lookup               [1ms]
  arbitex-outpost    dlp.tier1.regex_scan              [3ms]
  arbitex-outpost    dlp.tier2.ner_scan                [45ms]
  arbitex-outpost    httpx POST /api/internal/audit    [12ms]
    arbitex-platform   POST /api/internal/audit        [10ms]
      arbitex-platform   sqlalchemy INSERT audit_logs  [4ms]
  arbitex-outpost    httpx POST provider.anthropic     [130ms]
```
| What to look for | Likely cause |
| --- | --- |
| DLP scan span > 200ms | NER GPU pod not available; falling back to CPU |
| Provider span > 5s | Provider API latency; not an Arbitex issue |
| audit/ingest span > 50ms | Platform API overloaded or database slow |
| Missing Platform spans | Platform OTel not configured; check startup log |
| Missing Outpost spans | Outpost OTel not configured; check startup log |

If you set outpost.id in OTEL_RESOURCE_ATTRIBUTES, you can filter traces in Grafana Tempo using TraceQL:

{ resource.outpost.id = "your-outpost-uuid" && span.http.status_code = 200 }

In Jaeger, add a tag filter: outpost.id=your-outpost-uuid.


When OTel is enabled, both Platform and Outpost inject the current trace_id into every structured log line:

```json
{
  "timestamp": "2026-03-12T14:00:00Z",
  "level": "INFO",
  "message": "DLP scan completed",
  "trace_id": "a1b2c3d4e5f67890abcdef1234567890",
  "span_id": "1234567890abcdef",
  "service": "arbitex-outpost"
}
```
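Conceptually, this injection is just a logging filter that stamps the active span context onto every record. A simplified sketch — not the services' actual code, and `get_current_ids` is a hypothetical stand-in for reading the OpenTelemetry span context:

```python
import json
import logging

def get_current_ids():
    """Hypothetical stand-in: the real services read these IDs from the
    active OpenTelemetry span context."""
    return "a1b2c3d4e5f67890abcdef1234567890", "1234567890abcdef"

class TraceContextFilter(logging.Filter):
    """Stamp trace_id/span_id onto every record that passes through."""
    def filter(self, record):
        record.trace_id, record.span_id = get_current_ids()
        return True

class JsonFormatter(logging.Formatter):
    """Emit the structured shape shown above."""
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "trace_id": record.trace_id,
            "span_id": record.span_id,
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
handler.addFilter(TraceContextFilter())
logger = logging.getLogger("arbitex-outpost")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("DLP scan completed")  # emits a JSON line with trace_id attached
```

Because every record carries the same trace_id as the spans, log lines and traces can be joined after the fact.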

The same trace_id appears in logs from both services for the same request. In Grafana:

  1. Search Loki for logs matching a specific trace_id:
    {job="arbitex-outpost"} |= "a1b2c3d4e5f67890abcdef1234567890"
  2. Click the Jump to trace link next to the trace_id field (requires derived field configuration — see Grafana Tempo setup).

Spans from one service appear but not the other


Verify both services have OTEL_EXPORTER_OTLP_ENDPOINT set and the OTel packages are installed:

Platform:

```shell
# Check startup log
docker logs arbitex-api 2>&1 | grep -i otel
# Expected: "OTel initialized: service=arbitex-platform, endpoint=..."
```

Outpost:

```shell
kubectl logs -n arbitex-outpost deployment/arbitex-outpost | grep -i otel
# Expected: OTel initialization log
```

Cross-service spans not connected (two separate traces)


If Platform spans appear as a separate trace from Outpost spans, the traceparent header is not being propagated. Check:

  1. Both services are running opentelemetry-instrumentation-httpx — this is required for automatic header injection on outbound HTTP calls
  2. The Outpost → Platform connection is made via HTTP (not a queue or async mechanism that bypasses the HTTP headers)
  3. The trace backend is receiving spans from both services on the same OTLP endpoint

Check the Collector’s self-metrics at port 8888 (/metrics):

```
otelcol_exporter_send_failed_spans_total{exporter="otlp/jaeger"} 42
```

Common causes:

  • Collector cannot reach Jaeger/Tempo — verify network connectivity
  • Exporter backpressure — the Collector queue is full; increase sending_queue.queue_size in the otlp exporter config
  • TLS mismatch — use tls.insecure: true for non-TLS backends
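This self-metrics check is easy to automate by parsing the Prometheus text exposition for failed-span counters. A small sketch (`failed_spans` is our helper; fetching `http://<collector>:8888/metrics` is left to the caller):

```python
import re

# One line per exporter in the Prometheus text format, e.g.
# otelcol_exporter_send_failed_spans_total{exporter="otlp/jaeger"} 42
FAILED_RE = re.compile(
    r'^otelcol_exporter_send_failed_spans_total\{[^}]*exporter="([^"]+)"[^}]*\}\s+(\d+)',
    re.MULTILINE,
)

def failed_spans(metrics_text):
    """Map each exporter name to its cumulative failed-span count."""
    return {name: int(value) for name, value in FAILED_RE.findall(metrics_text)}

# In practice, metrics_text would come from the Collector's /metrics endpoint
sample = 'otelcol_exporter_send_failed_spans_total{exporter="otlp/jaeger"} 42\n'
print(failed_spans(sample))  # -> {'otlp/jaeger': 42}
```

A nonzero count that keeps growing means the Collector is accepting spans but cannot deliver them, which matches the causes listed above.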