Grafana dashboard catalog

Arbitex ships six Grafana dashboard JSON files in deploy/grafana/. Each dashboard targets a Prometheus data source and auto-refreshes every 30 seconds. All dashboards require Grafana 10.0.0 or later.


  1. In Grafana, navigate to Dashboards → Import.
  2. Upload the JSON file from deploy/grafana/ or paste its contents.
  3. When prompted, select your Prometheus data source for the DS_PROMETHEUS input.
  4. Click Import.
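The import steps above can also be scripted. The following is a minimal sketch, assuming Grafana's dashboard import endpoint (`POST /api/dashboards/import`) accepts a body of this shape and that recent Grafana versions expect the data source UID as the input value. The function name `build_import_payload` and the UID `prom-prod` are hypothetical; verify the request shape against your Grafana version before relying on it.

```python
import json


def build_import_payload(dashboard: dict, datasource_uid: str) -> dict:
    """Build a request body for POST /api/dashboards/import (assumed shape)."""
    return {
        "dashboard": dashboard,
        "overwrite": True,
        "inputs": [
            {
                # Name of the input the import dialog prompts for.
                "name": "DS_PROMETHEUS",
                "type": "datasource",
                "pluginId": "prometheus",
                # Assumption: the value is the UID of an existing data source.
                "value": datasource_uid,
            }
        ],
    }


# Trimmed stand-in for the parsed contents of a file in deploy/grafana/.
dashboard = {"uid": "arbitex-system-health", "panels": []}

payload = build_import_payload(dashboard, "prom-prod")
body = json.dumps(payload)  # send this as the JSON request body
```

The sketch only builds the request body; sending it requires an authenticated HTTP client and a Grafana URL, both of which depend on your deployment.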

Each dashboard uses the ${DS_PROMETHEUS} template variable to reference the data source — you can point multiple Arbitex environments at different Prometheus instances by importing the same JSON with a different data source selection.
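To illustrate how one JSON file can serve several environments, the sketch below binds the `${DS_PROMETHEUS}` placeholder textually, which is one possible approach when the JSON is loaded by tooling that does not prompt for inputs. The environment names, data source UIDs, and the trimmed panel structure are hypothetical stand-ins, not taken from the shipped files.

```python
import json

PLACEHOLDER = "${DS_PROMETHEUS}"


def bind_datasource(dashboard_json: str, datasource_uid: str) -> dict:
    """Return the dashboard with every ${DS_PROMETHEUS} reference replaced."""
    return json.loads(dashboard_json.replace(PLACEHOLDER, datasource_uid))


# Trimmed stand-in for one of the files in deploy/grafana/.
raw = json.dumps({
    "uid": "arbitex-dlp-analysis",
    "panels": [{"datasource": {"type": "prometheus", "uid": "${DS_PROMETHEUS}"}}],
})

# Hypothetical environments mapped to hypothetical data source UIDs.
environments = {"staging": "prom-staging", "production": "prom-prod"}
bound = {env: bind_datasource(raw, uid) for env, uid in environments.items()}
```

Plain string replacement is only safe for UIDs that need no JSON escaping, which holds for simple alphanumeric identifiers like the ones above.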


UID: arbitex-system-health
Tags: arbitex, platform, system-health
Variable: $job (selects the Prometheus scrape job or jobs; supports an all-jobs wildcard)

Tracks platform-level HTTP health.

| Panel | Type | Metric(s) | Description |
| --- | --- | --- | --- |
| Request Rate (by method) | Time series | http_requests_total by method | Per-method request rate (req/s). Useful for spotting unexpected changes in GET/POST ratios. |
| Request Latency Percentiles (p50/p95/p99) | Time series | http_request_duration_seconds_bucket | Latency percentiles across all endpoints. Threshold coloring: yellow >500 ms, red >2 s. |
| Error Rate (4xx/5xx) | Time series | http_requests_total by status_code | 4xx (orange) and 5xx (red) error rates. |
| Active Connections | Stat | http_active_connections | Current active HTTP connections. Threshold: yellow >500, red >1000. |
| 5xx Error Ratio | Stat | Derived from http_requests_total | 5xx errors as a fraction of all requests. Threshold: yellow >1%, red >5%. |
| Request Rate by Endpoint | Time series (stacked bars) | http_requests_total by endpoint | Per-endpoint traffic breakdown. Useful for identifying hot endpoints. |
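The percentile panels are computed from cumulative histogram buckets (the `_bucket` series). As a rough illustration of how a percentile is estimated from such buckets, the sketch below mirrors the linear interpolation that PromQL's `histogram_quantile()` applies to classic histograms, with edge cases simplified. The bucket boundaries and counts are made-up sample data, and the actual queries in the dashboard JSON may differ.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for i, (bound, count) in enumerate(buckets):
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return buckets[i - 1][0]
            # Interpolate linearly inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Made-up cumulative counts: 50 requests <= 0.1 s, 90 <= 0.5 s, 99 <= 2 s, 100 total.
sample = [(0.1, 50), (0.5, 90), (2.0, 99), (math.inf, 100)]
p50 = histogram_quantile(0.50, sample)
p95 = histogram_quantile(0.95, sample)
```

With this sample data, the p95 estimate lands between the 0.5 s and 2 s bucket bounds, so it would exceed the panel's 500 ms yellow threshold but not the 2 s red threshold. The estimate's precision depends on how finely the buckets are spaced around the thresholds.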

UID: arbitex-dlp-analysis
Tags: arbitex, platform, dlp
Variable: none (platform-wide)

Monitors DLP scan performance and trigger patterns.

| Panel | Type | Metric(s) | Description |
| --- | --- | --- | --- |
| DLP Scan Latency p95 | Stat | dlp_scan_duration_seconds_bucket | Current p95 scan latency. Threshold: yellow >100 ms, red >500 ms. Ties to the DLPScanLatencyHigh alert. |
| DLP Scan Latency p99 | Stat | dlp_scan_duration_seconds_bucket | Current p99 scan latency. Threshold: yellow >250 ms, red >1 s. |
| DLP Trigger Rate (per min) | Stat | dlp_trigger_total | DLP triggers per minute. Threshold: yellow >10, red >50. |
| Scan Throughput (ops/s) | Stat | dlp_scans_total | Total scan operations per second. |
| DLP Scan Latency Distribution (p50/p95/p99) | Time series | dlp_scan_duration_seconds_bucket | Full latency curve over time. |
| DLP Trigger Rate by Tier | Time series | dlp_trigger_total by tier | Trigger rate split by detection tier: Tier 1 (Regex), Tier 2 (NER/GLiNER), Tier 3 (DeBERTa NLI), Presidio Bridge. High Tier 3 rates increase latency. |
| Entity Types Detected (Top Triggers) | Table | dlp_trigger_total by entity_type | Instant snapshot of which entity types are firing most frequently. Color-graded by trigger rate. |
| Scan Throughput vs Trigger Rate | Time series | dlp_scans_total, dlp_trigger_total | Side-by-side comparison of total scans versus trigger rate; reveals what fraction of traffic is triggering DLP rules. |
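When reading the trigger and throughput stat panels together, note that one is expressed per minute and the other per second. The short sketch below shows the unit conversion needed to estimate the fraction of scans that trigger a rule. The input numbers are invented for illustration, and the ratio is only an approximation if a single scan can raise more than one trigger.

```python
def trigger_fraction(triggers_per_min: float, scans_per_sec: float) -> float:
    """Approximate triggers per scan after aligning both rates to per-second."""
    if scans_per_sec <= 0:
        return 0.0  # no scan traffic, so no meaningful ratio
    triggers_per_sec = triggers_per_min / 60.0
    return triggers_per_sec / scans_per_sec


# Invented readings: 30 triggers/min against 25 scans/s.
fraction = trigger_fraction(triggers_per_min=30.0, scans_per_sec=25.0)
percent = fraction * 100.0
```

In this example the trigger rate sits in the yellow band (>10 per minute), yet only about 2% of scans trigger a rule, which is the kind of context the comparison panel is meant to provide.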

Provider Performance (provider-performance.json)

UID: arbitex-provider-performance
Tags: arbitex, platform, providers
Variable: $provider (multi-select; filters all panels to the selected providers)

Per-provider LLM performance metrics.

| Panel | Type | Metric(s) | Description |
| --- | --- | --- | --- |
| Per-Provider Latency (p50/p95) | Time series | provider_request_duration_seconds_bucket by provider | Latency percentiles per provider. Useful for comparing upstream response times. |
| Per-Provider Error Rate | Time series | provider_errors_total by provider | Error rate per provider (errors/s). Ties to the ProviderErrorRateHigh alert. |
| Token Throughput by Provider (prompt/completion) | Time series | provider_tokens_total by provider, token_type | Prompt and completion token throughput split by provider. |
| Request Distribution by Model | Pie chart (donut) | provider_request_duration_seconds_count by model | Share of requests by model over the selected time range. |
| Provider / Model Summary Table | Table | Multi-query join | Per-provider, per-model table showing request rate (req_rate), error rate (error_rate, color-graded at 1%/5%), and p95 latency (p95_latency, color-graded at 2 s/10 s). |
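The summary table's multi-query join can be pictured as merging three query results on a shared key. The sketch below is an illustration only: it assumes each result is keyed by (provider, model) and that error_rate is errors divided by requests, neither of which is confirmed by the dashboard JSON. The provider and model names are hypothetical.

```python
def join_summary(req_rate: dict, err_rate: dict, p95_latency: dict) -> list[dict]:
    """Merge three per-(provider, model) results into table rows."""
    rows = []
    for key in sorted(req_rate):
        provider, model = key
        requests = req_rate[key]
        errors = err_rate.get(key, 0.0)  # a missing series means no errors seen
        rows.append({
            "provider": provider,
            "model": model,
            "req_rate": requests,
            "error_rate": errors / requests if requests > 0 else 0.0,
            "p95_latency": p95_latency.get(key),
        })
    return rows


# Hypothetical query results (requests/s, errors/s, seconds).
requests_per_sec = {("provider-a", "model-x"): 20.0, ("provider-b", "model-y"): 5.0}
errors_per_sec = {("provider-b", "model-y"): 0.5}
p95_seconds = {("provider-a", "model-x"): 1.2, ("provider-b", "model-y"): 12.0}

rows = join_summary(requests_per_sec, errors_per_sec, p95_seconds)
```

Here the second row has a 10% error ratio and a 12 s p95, so both cells would fall past the upper color-grading marks (5% and 10 s) described for the table.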

UID: arbitex-usage-billing
Tags: arbitex, platform, billing, usage
Variable: $org_id (multi-select; filters org-scoped panels to the selected organizations)

Tenant usage, token consumption, and budget utilization.

| Panel | Type | Metric(s) | Description |
| --- | --- | --- | --- |
| Total Chat Request Rate | Stat | http_requests_total{endpoint=~"/api/chat.*"} | Platform-wide chat request rate (req/s). |
| Total Token Throughput (tokens/s) | Stat | provider_tokens_total | Combined prompt + completion tokens per second. |
| Rate Limit Rejections (per min) | Stat | rate_limit_rejections_total | Rate-limited requests per minute. Threshold: yellow >10, red >100. |
| Max Budget Utilization (any org) | Stat | budget_utilization_ratio | Highest budget utilization ratio across all orgs. Threshold: yellow >70%, red >90%. Ties to the BudgetThreshold80/95 alerts. |
| Request Volume by Org | Time series | http_requests_total by org_id | Per-org request volume over time. |
| Token Counts by Org (prompt/completion) | Time series | provider_tokens_total by org_id, token_type | Per-org prompt and completion token consumption. |
| Budget Utilization by Org (%) | Gauge | budget_utilization_ratio by org_id | Gauge visualization showing each org’s budget usage (0–100%). Threshold coloring: green <70%, yellow 70–90%, red >90%. |
| Rate Limit Hits by Org | Time series (bars) | rate_limit_rejections_total by org_id | Per-org rate limit rejections. |
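The budget panels combine a per-org gauge with a single "worst org" stat. The sketch below shows both views under the assumption that `budget_utilization_ratio` reports a value between 0.0 and 1.0; the band boundaries follow the gauge description in the table, and the org IDs and ratios are hypothetical.

```python
def budget_band(ratio: float) -> str:
    """Map a utilization ratio (0.0 to 1.0) to the gauge's color band."""
    if ratio > 0.90:
        return "red"
    if ratio >= 0.70:
        return "yellow"
    return "green"


# Hypothetical per-org utilization ratios.
utilization = {"org-a": 0.42, "org-b": 0.76, "org-c": 0.93}

# Equivalent of the "Max Budget Utilization (any org)" stat.
worst_org = max(utilization, key=utilization.get)
bands = {org: budget_band(ratio) for org, ratio in utilization.items()}
```

Whether a value exactly at 70% or 90% counts as the lower or upper band is a choice made here for illustration; Grafana's threshold steps in the shipped JSON are the authority.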

UID: arbitex-security-events
Tags: arbitex, platform, security
Variable: none (platform-wide)

Monitors security signals: authentication failures, access controls, and anomaly detection.

| Panel | Type | Metric(s) | Description |
| --- | --- | --- | --- |
| Auth Failures (per min) | Stat | auth_failures_total | Auth failures per minute. Threshold: yellow >5, red >20. |
| Rate Limit Rejections (per min) | Stat | rate_limit_rejections_total | Rate-limited requests per minute. |
| mTLS Verification Failures (per min) | Stat | mtls_verification_failures_total | Mutual TLS handshake failures. Threshold: yellow >1, red >5. |
| IP Allowlist Blocks (per min) | Stat | ip_allowlist_blocks_total | Requests blocked by IP allowlist. Threshold: yellow >1, red >10. |
| GeoIP Anon IP Detections (per min) | Stat | geoip_anonymous_ip_detections_total | Requests from VPN/Tor/proxy IPs detected by GeoIP enrichment. |
| Auth Failures (window total) | Stat | auth_failures_total (increase over range) | Cumulative auth failures in the selected time window. |
| Security Event Timeline | Time series | All five security metrics | Combined time series: Auth Failures (red), Rate Limit (orange), mTLS Failures (purple), IP Blocks (yellow), Anon IP (light-blue). |
| Auth Failures by Reason | Time series | auth_failures_total by reason | Failure rate broken down by failure reason (e.g., invalid token, expired token, missing key). |
| Auth Failure Summary (window) | Table | auth_failures_total by reason (increase) | Ranked table of failure counts by reason over the time window. |
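The "window total" and summary panels rely on the increase of a counter over the selected range. The sketch below illustrates the idea, including how a counter reset (for example after a process restart) is handled by treating a drop as a restart from zero. It omits the extrapolation that PromQL's `increase()` performs, and the reason label values and sample readings are hypothetical.

```python
def counter_increase(samples: list[float]) -> float:
    """Total increase across samples, treating any drop as a counter reset."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # After a reset the counter restarts at zero, so the new value is the gain.
        total += cur - prev if cur >= prev else cur
    return total


# Hypothetical raw counter readings per failure reason across the window.
window = {
    "invalid_token": [100, 112, 130],
    "expired_token": [40, 45, 2, 6],  # reset between the second and third sample
    "missing_key": [7, 7, 8],
}

# Rank reasons by their increase, highest first, as the summary table does.
ranked = sorted(
    ((reason, counter_increase(samples)) for reason, samples in window.items()),
    key=lambda item: item[1],
    reverse=True,
)
```

Without the reset handling, the second series would appear to have decreased over the window, which is why raw first-to-last subtraction is not a reliable substitute.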

UID: arbitex-compliance
Tags: arbitex, platform, compliance
Variable: $framework (multi-select; filters policy violation panels to the selected compliance frameworks)

Policy enforcement and audit integrity metrics for compliance reporting.

| Panel | Type | Metric(s) | Description |
| --- | --- | --- | --- |
| Policy Violations (per min) | Stat | policy_violations_total | Real-time policy violation rate. Threshold: yellow >1, red >10. |
| Audit Chain Breaks (window) | Stat | audit_chain_breaks_total (increase over range) | HMAC chain verification failures. Any non-zero value is red; ties to the AuditChainBreak alert. |
| Policy Pack Evaluations (ops/s) | Stat | policy_pack_evaluations_total | Policy pack evaluation throughput. |
| Policy Violations (window total) | Stat | policy_violations_total (increase over range) | Cumulative violations in the selected time window. |
| Policy Violations by Framework | Time series | policy_violations_total by framework | Violation rate per compliance framework (e.g., HIPAA, SOC2, GDPR). |
| DLP Entity Distribution (window) | Pie chart (donut) | dlp_trigger_total by entity_type | Distribution of detected entity types over the selected range. |
| Violation Counts by Framework (window) | Table | policy_violations_total by framework (increase) | Ranked table of violation counts per framework. |
| Policy Pack Violation Rate by Pack | Bar gauge | policy_pack_evaluations_total by pack_name | Fraction of evaluations resulting in a violation, per pack. Threshold: green <50%, yellow 50–80%, red >80%. |

All six dashboards are editable in Grafana (the JSON files set "editable": true). To customize a dashboard without having your changes overwritten when you re-import an updated upstream file, work on a copy made with Grafana’s built-in copy feature:

  1. Open the dashboard.
  2. Click the gear icon → Save As — enter a new title (e.g., “Arbitex — DLP Analysis (Custom)”).
  3. The copy is stored in Grafana’s database and survives re-imports of the original file.
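If you prefer to prepare the copy as a file rather than through the UI, the same idea can be sketched in a few lines: give the copy its own UID and title so that importing it creates a separate dashboard. This assumes the standard Grafana dashboard JSON fields `id`, `uid`, and `title`; the function name, the UID suffix, and the trimmed dashboard structure are hypothetical, and the title is inferred from the example above.

```python
import copy


def make_custom_copy(dashboard: dict, suffix: str = "custom") -> dict:
    """Return a copy with its own identity so re-imports of the original leave it alone."""
    clone = copy.deepcopy(dashboard)
    clone["id"] = None  # let Grafana assign a new internal ID on import
    clone["uid"] = f"{dashboard['uid']}-{suffix}"
    clone["title"] = f"{dashboard['title']} (Custom)"
    return clone


# Trimmed stand-in for the parsed DLP dashboard JSON.
original = {
    "id": 12,
    "uid": "arbitex-dlp-analysis",
    "title": "Arbitex — DLP Analysis",
    "editable": True,
    "panels": [],
}

custom = make_custom_copy(original)
```

Because the copy has a different UID, importing a newer version of the original file updates only the original dashboard; any upstream panel changes you want in the copy must be merged by hand.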