Outpost Health Monitoring

Arbitex Hybrid Outpost reports health to the management plane via a dual-track heartbeat system. Every deployed outpost sends periodic heartbeats both to the Platform management plane (policy sync channel) and to the Cloud Portal (operational dashboard). This document explains the architecture, configuration, and how to use the portal dashboard.


Hybrid Outpost
└── HeartbeatSender
    ├── Platform heartbeat ──► Platform management plane
    │     POST /v1/orgs/{org_id}/outposts/{outpost_id}/heartbeat
    │     Auth: mTLS (same cert as policy sync)
    │     Interval: 120s (with backoff on failure)
    │
    └── Enhanced heartbeat ──► Cloud Portal
          POST {CLOUD_HEARTBEAT_URL}/v1/outpost/heartbeat
          Auth: mTLS preferred; Bearer token fallback
          Interval: CLOUD_HEARTBEAT_INTERVAL (default 60s)

Platform heartbeat carries the operational state used for policy sync decisions: version, uptime, policy sync status, DLP tier 3 activation, and pending audit event count.

Enhanced heartbeat carries the richer telemetry displayed in the Cloud Portal dashboard: DLP tier list, certificate expiry, and resource usage (CPU/memory/disk).

The enhanced heartbeat fires after every platform heartbeat attempt, regardless of whether the platform heartbeat succeeded.

When platform heartbeats fail (network errors, timeouts, HTTP non-2xx), the outpost applies exponential backoff:

delay = min(120s × 2^(failures−1), 900s) × ±10% jitter
  • First failure: 120s
  • Second failure: 240s
  • …capped at 900 seconds (15 minutes)

On the next successful heartbeat the interval resets to the base 120s. Jitter (±10%) prevents thundering-herd reconnection when many outposts recover simultaneously.
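The backoff rule above can be sketched in Python as follows. This is a minimal illustration of the stated formula, not the outpost's actual code; the function and constant names are hypothetical. Note that because jitter is applied after the cap, as the formula is written, a jittered delay can land slightly above 900 seconds.

```python
import random
from typing import Optional

BASE_INTERVAL_S = 120    # base platform heartbeat interval (from the text)
MAX_DELAY_S = 900        # 15-minute cap (from the text)
JITTER_FRACTION = 0.10   # +/-10% jitter (from the text)


def backoff_delay(failures: int, rng: Optional[random.Random] = None) -> float:
    """Seconds to sleep after `failures` consecutive failed heartbeats.

    Hypothetical helper: implements min(120 * 2^(failures-1), 900) * jitter.
    """
    if failures < 1:
        # No failures recorded: use the base interval without jitter.
        return float(BASE_INTERVAL_S)
    source = rng or random
    capped = min(BASE_INTERVAL_S * 2 ** (failures - 1), MAX_DELAY_S)
    # Jitter multiplies the capped delay by a factor in [0.9, 1.1].
    return capped * source.uniform(1 - JITTER_FRACTION, 1 + JITTER_FRACTION)
```

Passing a seeded `random.Random` makes the delays reproducible in tests; in normal use the argument can be omitted.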


Sent to POST /v1/orgs/{org_id}/outposts/{outpost_id}/heartbeat:

| Field | Type | Description |
|---|---|---|
| version | string | Outpost software version (e.g. 0.1.0) |
| uptime | int | Seconds since the outpost process started |
| policy_version | string | Version hash of the currently active policy bundle |
| last_sync_at | ISO-8601 string or null | Timestamp of the most recent successful policy sync |
| dlp_model_version | string | DeBERTa ONNX model identifier, or none if Tier 3 is inactive |
| pending_audit_events | int | Approximate count of unsynced audit events (capped at 100); -1 = error reading count |
| tier3_active | bool | Whether DeBERTa (Tier 3) contextual DLP is currently loaded and available |
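As a rough illustration of the platform payload, the sketch below models the fields as a Python dataclass and serialises them to JSON. The class and helper names are hypothetical, the model identifier in the usage example is a placeholder, and treating an unreadable count as `None` on input is an assumption made for the example.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

PENDING_EVENTS_CAP = 100  # cap stated in the field description


@dataclass
class PlatformHeartbeat:
    version: str
    uptime: int
    policy_version: str
    last_sync_at: Optional[str]   # ISO-8601 string, or None before the first sync
    dlp_model_version: str
    pending_audit_events: int
    tier3_active: bool

    def to_json(self) -> str:
        return json.dumps(asdict(self))


def report_pending_events(count: Optional[int]) -> int:
    """Apply the cap of 100; return -1 when the count could not be read.

    Assumption: the caller passes None (or a negative value) on a read error.
    """
    if count is None or count < 0:
        return -1
    return min(count, PENDING_EVENTS_CAP)
```

For example, `PlatformHeartbeat("0.1.0", 3600, "abc123", None, "example-model-id", report_pending_events(250), True).to_json()` produces a body whose `pending_audit_events` is 100.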

Sent to POST {CLOUD_HEARTBEAT_URL}/v1/outpost/heartbeat:

| Field | Type | Description |
|---|---|---|
| outpost_id | UUID | Outpost identifier |
| version | string | Outpost software version |
| uptime_seconds | int | Seconds since the outpost process started |
| last_policy_sync | ISO-8601 string or null | Timestamp of the most recent successful policy sync |
| dlp_tiers_active | string[] | Active DLP tiers: subset of ["regex", "ner", "deberta", "credint"] |
| cert_expiry | ISO-8601 string or null | mTLS certificate expiry date; null if unreadable |
| resource_usage | object | CPU/memory/disk percentages: {cpu_percent, memory_percent, disk_percent} |
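The enhanced payload could be assembled along these lines. This is a sketch under assumptions: the function name is hypothetical, and rejecting unknown tier names on the client side is an illustrative choice rather than documented behaviour. The `resource_usage` key is left out when no usage data is available.

```python
import uuid
from typing import List, Optional

ALLOWED_TIERS = ("regex", "ner", "deberta", "credint")  # from the field table


def build_enhanced_payload(
    outpost_id: str,
    version: str,
    uptime_seconds: int,
    last_policy_sync: Optional[str],
    dlp_tiers_active: List[str],
    cert_expiry: Optional[str],
    resource_usage: Optional[dict] = None,
) -> dict:
    """Build the enhanced heartbeat body as a plain dict (hypothetical helper)."""
    unknown = [t for t in dlp_tiers_active if t not in ALLOWED_TIERS]
    if unknown:
        raise ValueError(f"unknown DLP tiers: {unknown}")
    payload = {
        # uuid.UUID raises ValueError if the identifier is not a valid UUID.
        "outpost_id": str(uuid.UUID(outpost_id)),
        "version": version,
        "uptime_seconds": uptime_seconds,
        "last_policy_sync": last_policy_sync,
        "dlp_tiers_active": list(dlp_tiers_active),
        "cert_expiry": cert_expiry,
    }
    if resource_usage is not None:
        payload["resource_usage"] = resource_usage
    return payload
```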

The Platform management plane responds to a successful heartbeat with HTTP 200 and, optionally, a JSON body containing latest_version. For example, when the outpost is running an outdated version, the body looks like this:

{"latest_version": "0.2.0"}

The outpost logs a warning: Outpost version outdated: running=0.1.0 latest=0.2.0 — update recommended. No automatic action is taken; the operator must deploy the update.
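A minimal sketch of how the response might be handled is shown below. The function name and logger name are hypothetical, and comparing the two version strings for simple inequality is an assumption; the document does not say how the outpost decides that it is outdated.

```python
import json
import logging
from typing import Optional

logger = logging.getLogger("outpost.heartbeat")  # hypothetical logger name


def check_latest_version(running: str, body: bytes) -> Optional[str]:
    """Return latest_version if the response reports a different one, else None."""
    if not body:
        return None  # the JSON body is optional
    try:
        data = json.loads(body)
    except ValueError:
        return None  # not valid JSON: ignore rather than fail the heartbeat
    latest = data.get("latest_version") if isinstance(data, dict) else None
    if latest and latest != running:
        logger.warning(
            "Outpost version outdated: running=%s latest=%s — update recommended",
            running,
            latest,
        )
        return latest
    return None
```

The function only logs and reports; consistent with the text above, it takes no automatic action.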


Set these environment variables on the outpost:

| Variable | Required | Default | Description |
|---|---|---|---|
| CLOUD_HEARTBEAT_URL | No | "" | Base URL of the Cloud Portal heartbeat receiver. When empty, enhanced heartbeats are silently skipped. |
| CLOUD_HEARTBEAT_INTERVAL | No | 60 | Interval in seconds between enhanced heartbeats. |
| OUTPOST_CERT_PATH | Yes (production) | certs/outpost.pem | Path to the outpost mTLS client certificate. |
| OUTPOST_KEY_PATH | Yes (production) | certs/outpost.key | Path to the outpost mTLS private key. |
| OUTPOST_CA_PATH | Yes (production) | certs/ca.pem | Path to the Platform CA certificate for server verification. |
| PLATFORM_MANAGEMENT_URL | Yes | "" | Platform management plane base URL. Heartbeats are skipped if empty. |
| OUTPOST_ID | Yes | "" | Outpost UUID from the Cloud Portal registration. |
| ORG_ID | Yes | "" | Organisation UUID. Required for the heartbeat URL path. |

Note: The platform heartbeat interval is hardcoded at 120 seconds and is not configurable via environment variable. The CLOUD_HEARTBEAT_INTERVAL setting applies only to the enhanced (Cloud Portal) heartbeat channel.
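To make the variables and their defaults concrete, here is a sketch of how they could be read into a configuration object. The class, method, and function names are hypothetical; only the environment variable names, defaults, and URL paths come from this document.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class HeartbeatConfig:
    platform_management_url: str
    cloud_heartbeat_url: str
    cloud_heartbeat_interval: int
    cert_path: str
    key_path: str
    ca_path: str
    outpost_id: str
    org_id: str

    def platform_endpoint(self) -> Optional[str]:
        """Platform heartbeat URL, or None when the base URL is empty (skipped)."""
        if not self.platform_management_url:
            return None
        base = self.platform_management_url.rstrip("/")
        return f"{base}/v1/orgs/{self.org_id}/outposts/{self.outpost_id}/heartbeat"

    def cloud_endpoint(self) -> Optional[str]:
        """Enhanced heartbeat URL, or None when CLOUD_HEARTBEAT_URL is empty."""
        if not self.cloud_heartbeat_url:
            return None
        return f"{self.cloud_heartbeat_url.rstrip('/')}/v1/outpost/heartbeat"


def load_config(env: Mapping[str, str] = os.environ) -> HeartbeatConfig:
    """Read the documented variables, applying the documented defaults."""
    return HeartbeatConfig(
        platform_management_url=env.get("PLATFORM_MANAGEMENT_URL", ""),
        cloud_heartbeat_url=env.get("CLOUD_HEARTBEAT_URL", ""),
        cloud_heartbeat_interval=int(env.get("CLOUD_HEARTBEAT_INTERVAL", "60")),
        cert_path=env.get("OUTPOST_CERT_PATH", "certs/outpost.pem"),
        key_path=env.get("OUTPOST_KEY_PATH", "certs/outpost.key"),
        ca_path=env.get("OUTPOST_CA_PATH", "certs/ca.pem"),
        outpost_id=env.get("OUTPOST_ID", ""),
        org_id=env.get("ORG_ID", ""),
    )
```

Accepting any mapping rather than reading `os.environ` directly keeps the loader easy to exercise with a plain dict.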


The Outposts page in the Cloud Portal shows all registered outposts for the organisation. Each row shows:

  • Outpost name and region
  • Last heartbeat timestamp
  • Status badge (green / amber / red — see thresholds below)
  • Software version and whether an update is available
  • Active DLP tiers
  • Certificate expiry date (with warning when < 30 days remaining)

| Colour | Condition | Meaning |
|---|---|---|
| Green (healthy) | Heartbeat received within the last 5 minutes | Outpost is operating normally |
| Amber (stale) | Last heartbeat 5–30 minutes ago | Outpost may be experiencing connectivity issues or is under high backoff |
| Red (offline) | No heartbeat for > 30 minutes, or deregistered status | Outpost is unreachable or deregistered |
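The thresholds above amount to a simple classification by heartbeat age. The sketch below is illustrative only: the function name is hypothetical, and both the handling of the exact 5- and 30-minute boundaries and the treatment of an outpost that has never reported are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

HEALTHY_WINDOW = timedelta(minutes=5)
STALE_WINDOW = timedelta(minutes=30)


def outpost_status(
    last_heartbeat_at: Optional[datetime],
    now: datetime,
    deregistered: bool = False,
) -> str:
    """Classify an outpost as 'green', 'amber', or 'red'."""
    if deregistered or last_heartbeat_at is None:
        # Assumption: an outpost that has never sent a heartbeat is shown as red.
        return "red"
    age = now - last_heartbeat_at
    if age <= HEALTHY_WINDOW:
        return "green"
    if age <= STALE_WINDOW:
        return "amber"
    return "red"
```

Both datetimes should be timezone-aware (or both naive) so that the subtraction is valid.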

Navigate to an individual outpost and click Heartbeat History to view the last 50 heartbeat records (paginated, newest first). The admin API behind this view:

GET /v1/admin/outposts/{outpost_id}/heartbeats?limit=50&offset=0
Authorization: X-API-Key <admin-key>

Each record includes: received_at, status, version, uptime_seconds, policy_version, last_sync_at, dlp_tiers_active, cert_expiry, resource_usage.
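A request to the history endpoint could be built with the Python standard library as sketched below. The base URL is a placeholder and the function name is hypothetical; the header is reproduced exactly as written above, so adjust it if your deployment expects a different form.

```python
from urllib.parse import urlencode
from urllib.request import Request


def heartbeat_history_request(
    base_url: str,
    outpost_id: str,
    admin_key: str,
    limit: int = 50,
    offset: int = 0,
) -> Request:
    """Build (but do not send) a GET request for an outpost's heartbeat history."""
    query = urlencode({"limit": limit, "offset": offset})
    url = f"{base_url.rstrip('/')}/v1/admin/outposts/{outpost_id}/heartbeats?{query}"
    # Header value mirrors the documented form: "Authorization: X-API-Key <admin-key>".
    return Request(url, headers={"Authorization": f"X-API-Key {admin_key}"}, method="GET")
```

Passing the returned object to `urllib.request.urlopen` sends the request; raising `offset` in steps of `limit` pages through older records.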

GET /v1/admin/outposts
Authorization: X-API-Key <admin-key>

Returns all outposts across all organisations, ordered by most-recently-seen first.


Cause: The outpost process is generating heartbeats, but they are not getting through to the Platform or Cloud Portal.

Checks:

  1. Firewall rules. The outpost must be able to make outbound HTTPS connections to PLATFORM_MANAGEMENT_URL and CLOUD_HEARTBEAT_URL. Verify there is no egress firewall blocking TCP 443.
  2. mTLS certificate validity. The outpost will refuse to send heartbeats if OUTPOST_CERT_PATH, OUTPOST_KEY_PATH, or OUTPOST_CA_PATH are missing. Check outpost logs for mTLS certificates required but missing.
  3. Backoff state. After repeated failures the outpost may be sleeping for up to 15 minutes between attempts. Check logs for Heartbeat backoff — sleeping Xs (failure=N). Wait for the next attempt or restart the outpost to reset the backoff counter.
  4. Proxy/load balancer. If the outpost connects via a forward proxy, confirm the proxy allows connections to both the Platform management plane and the Cloud Portal.

Stale status in portal after outpost restart

The Cloud Portal status is derived from last_heartbeat_at. After restart there is a 120-second window before the first platform heartbeat and up to CLOUD_HEARTBEAT_INTERVAL seconds before the first enhanced heartbeat. The status will update automatically once the first heartbeat is received.

Missed heartbeats after policy sync disruption

The heartbeat sender is independent of the policy sync client. A failed policy sync does not prevent heartbeats from being sent. If heartbeats are missing while policy sync is also failing, the root cause is likely a network connectivity issue or an expired mTLS certificate.

When cert_expiry in the heartbeat is within 30 days, the portal shows a warning badge on the outpost row. Renew the outpost certificate before it expires:

# Via Cloud admin API
POST /v1/orgs/{org_id}/outposts/{outpost_id}/renew
Authorization: X-API-Key <admin-key>

The renewed certificate bundle (cert + key + CA) must be deployed to the outpost’s OUTPOST_CERT_PATH and OUTPOST_KEY_PATH. The outpost process picks up the new certificate the next time it creates an mTLS client, that is, on the first heartbeat cycle after the files are replaced in place.
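The 30-day warning rule can be checked against a `cert_expiry` value as sketched below. The function names are hypothetical, and two behaviours are assumptions: a null `cert_expiry` produces no warning, and a timestamp without an offset is treated as UTC.

```python
from datetime import datetime, timezone
from typing import Optional

WARNING_WINDOW_DAYS = 30  # threshold stated in the text


def days_until_expiry(cert_expiry: Optional[str], now: datetime) -> Optional[float]:
    """Days from `now` until the ISO-8601 `cert_expiry`, or None if it is null."""
    if cert_expiry is None:
        return None
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    expiry = datetime.fromisoformat(cert_expiry.replace("Z", "+00:00"))
    if expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return (expiry - now).total_seconds() / 86400


def needs_renewal_warning(cert_expiry: Optional[str], now: datetime) -> bool:
    days = days_until_expiry(cert_expiry, now)
    return days is not None and days < WARNING_WINDOW_DAYS
```

`now` must be timezone-aware, for example `datetime.now(timezone.utc)`.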

Resource usage fields missing from history

resource_usage is populated by the psutil library. If psutil is not installed in the outpost container image, resource fields are omitted from the enhanced heartbeat payload. This does not affect other heartbeat functionality. Install psutil to enable CPU/memory/disk reporting:

RUN pip install psutil
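The optional dependency on psutil can be handled with a guarded import, as in this sketch. The function name is hypothetical and measuring disk usage at the root path is an assumption; the psutil calls themselves (`cpu_percent`, `virtual_memory`, `disk_usage`) are standard.

```python
from typing import Optional

try:
    import psutil
except ImportError:  # psutil is optional; without it, resource fields are omitted
    psutil = None


def collect_resource_usage(disk_path: str = "/") -> Optional[dict]:
    """Return CPU/memory/disk percentages, or None when psutil is unavailable."""
    if psutil is None:
        return None
    return {
        # With interval=None the first call returns a meaningless 0.0;
        # later calls report usage since the previous call.
        "cpu_percent": psutil.cpu_percent(interval=None),
        "memory_percent": psutil.virtual_memory().percent,
        "disk_percent": psutil.disk_usage(disk_path).percent,
    }
```

A caller can then leave `resource_usage` out of the enhanced heartbeat payload whenever this returns `None`.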