Outpost Health Monitoring

Arbitex Hybrid Outpost reports its health through a dual-track heartbeat system. Every deployed outpost sends periodic heartbeats to both the Platform management plane (policy sync channel) and the Cloud Portal (operational dashboard). This document explains the heartbeat architecture, its configuration, and how to use the portal dashboard.

Heartbeat Architecture

Two heartbeat channels

┌─────────────────────────────────────────────────────────┐
│                    Hybrid Outpost                        │
│                                                          │
│  HeartbeatSender                                         │
│  ├── Platform heartbeat  ──────────────────────────────► Platform management plane
│  │   POST /v1/orgs/{org_id}/outposts/{outpost_id}/heartbeat │
│  │   Auth: mTLS (same cert as policy sync)              │
│  │   Interval: 120s (with backoff on failure)           │
│  │                                                       │
│  └── Enhanced heartbeat ───────────────────────────────► Cloud Portal
│      POST {CLOUD_HEARTBEAT_URL}/v1/outpost/heartbeat    │
│      Auth: mTLS preferred; Bearer token fallback         │
│      Interval: CLOUD_HEARTBEAT_INTERVAL (default 60s)   │
└─────────────────────────────────────────────────────────┘

Platform heartbeat carries the operational state used for policy sync decisions: version, uptime, policy sync status, DLP tier 3 activation, and pending audit event count.

Enhanced heartbeat carries the richer telemetry displayed in the Cloud Portal dashboard: the list of active DLP tiers, certificate expiry, and resource usage (CPU/memory/disk).

The enhanced heartbeat fires after every platform heartbeat attempt, regardless of whether the platform heartbeat succeeded.

Backoff on failure

When platform heartbeats fail (network errors, timeouts, HTTP non-2xx), the outpost applies exponential backoff:

delay = min(120s × 2^(failures−1), 900s) × (1 ± 0.10)

First failure: 120s
Second failure: 240s
…capped at 900 seconds (15 minutes)

On the next successful heartbeat the interval resets to the base 120s. Jitter (±10%) prevents thundering-herd reconnection when many outposts recover simultaneously.
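
The backoff rule above can be sketched in Python as follows. This is a minimal illustration, not the outpost's actual implementation: the function name `backoff_delay` and the constant names are hypothetical, jitter is assumed to be applied after the cap (as the formula is written), and the base interval is assumed to be unjittered when there are no failures.

```python
import random
from typing import Optional

BASE_INTERVAL_S = 120  # base platform heartbeat interval from this document
MAX_DELAY_S = 900      # backoff cap (15 minutes)
JITTER = 0.10          # +/-10% jitter


def backoff_delay(failures: int, rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before the next platform heartbeat attempt.

    `failures` is the number of consecutive failed attempts; 0 means the
    last heartbeat succeeded, so the base interval applies (assumption:
    no jitter in that case).
    """
    if failures < 1:
        return float(BASE_INTERVAL_S)
    source = rng if rng is not None else random
    nominal = min(BASE_INTERVAL_S * 2 ** (failures - 1), MAX_DELAY_S)
    # Jitter is applied after the cap, so a capped delay ranges 810-990s.
    return nominal * source.uniform(1 - JITTER, 1 + JITTER)
```

Under these assumptions, the nominal delays for one to four consecutive failures are 120s, 240s, 480s, and 900s before jitter.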

Heartbeat Payload Fields

Platform management plane payload

Sent to POST /v1/orgs/{org_id}/outposts/{outpost_id}/heartbeat:

Field	Type	Description
`version`	string	Outpost software version (e.g. `0.1.0`)
`uptime`	int	Seconds since the outpost process started
`policy_version`	string	Version hash of the currently active policy bundle
`last_sync_at`	ISO-8601 string | null	Timestamp of the most recent successful policy sync
`dlp_model_version`	string	DeBERTa ONNX model identifier, or `none` if Tier 3 is inactive
`pending_audit_events`	int	Approximate count of unsynced audit events (capped at 100); `-1` = error reading count
`tier3_active`	bool	Whether DeBERTa (Tier 3) contextual DLP is currently loaded and available
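
To make the field semantics concrete, here is a rough sketch of how a payload with these fields might be assembled. The helper names (`pending_audit_events`, `build_platform_payload`) and the `read_count` callback are hypothetical, and deriving `tier3_active` from the presence of a model identifier is an assumption made for illustration only.

```python
from datetime import datetime
from typing import Callable, Optional

AUDIT_COUNT_CAP = 100  # pending_audit_events is capped at 100 per the table


def pending_audit_events(read_count: Callable[[], int]) -> int:
    """Return the capped unsynced-event count, or -1 if it cannot be read."""
    try:
        return min(read_count(), AUDIT_COUNT_CAP)
    except Exception:
        return -1


def build_platform_payload(
    version: str,
    started_at: datetime,
    now: datetime,
    policy_version: str,
    last_sync_at: Optional[datetime],
    dlp_model_version: Optional[str],
    read_count: Callable[[], int],
) -> dict:
    """Assemble the platform heartbeat body from already-collected state."""
    # Assumption: Tier 3 counts as active whenever a model identifier is set.
    tier3_active = dlp_model_version is not None
    return {
        "version": version,
        "uptime": int((now - started_at).total_seconds()),
        "policy_version": policy_version,
        "last_sync_at": last_sync_at.isoformat() if last_sync_at else None,
        "dlp_model_version": dlp_model_version if tier3_active else "none",
        "pending_audit_events": pending_audit_events(read_count),
        "tier3_active": tier3_active,
    }
```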

Cloud Portal enhanced payload

Sent to POST {CLOUD_HEARTBEAT_URL}/v1/outpost/heartbeat:

Field	Type	Description
`outpost_id`	UUID	Outpost identifier
`version`	string	Outpost software version
`uptime_seconds`	int	Seconds since the outpost process started
`last_policy_sync`	ISO-8601 string | null	Timestamp of the most recent successful policy sync
`dlp_tiers_active`	string[]	Active DLP tiers: subset of `["regex", "ner", "deberta", "credint"]`
`cert_expiry`	ISO-8601 string | null	mTLS certificate expiry date; null if unreadable
`resource_usage`	object	CPU/memory/disk percentages: `{cpu_percent, memory_percent, disk_percent}`
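
As an illustration of the shape of this payload, the sketch below builds a JSON-serialisable body with the fields from the table. The function name `build_enhanced_payload` and the choice to reject unknown tier names are assumptions; the real sender may behave differently.

```python
import uuid
from datetime import datetime
from typing import Iterable, Optional

KNOWN_TIERS = ("regex", "ner", "deberta", "credint")


def _iso(ts: Optional[datetime]) -> Optional[str]:
    """Render a timestamp as ISO-8601, passing None through as null."""
    return ts.isoformat() if ts is not None else None


def build_enhanced_payload(
    outpost_id: uuid.UUID,
    version: str,
    uptime_seconds: int,
    last_policy_sync: Optional[datetime],
    dlp_tiers_active: Iterable[str],
    cert_expiry: Optional[datetime],
    resource_usage: dict,
) -> dict:
    """Assemble the enhanced heartbeat body sent to the Cloud Portal."""
    active = set(dlp_tiers_active)
    unknown = active - set(KNOWN_TIERS)
    if unknown:
        # Assumption: unknown tier names are treated as a programming error.
        raise ValueError(f"unknown DLP tiers: {sorted(unknown)}")
    return {
        "outpost_id": str(outpost_id),
        "version": version,
        "uptime_seconds": uptime_seconds,
        "last_policy_sync": _iso(last_policy_sync),
        # Emit tiers in a stable order so payloads are easy to compare.
        "dlp_tiers_active": [t for t in KNOWN_TIERS if t in active],
        "cert_expiry": _iso(cert_expiry),
        "resource_usage": resource_usage,
    }
```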

Platform acknowledgement

The Platform management plane responds to a successful heartbeat with HTTP 200 and, optionally, a JSON body containing latest_version. If the outpost is running an outdated version, the body looks like this:

{"latest_version": "0.2.0"}

The outpost logs a warning: Outpost version outdated: running=0.1.0 latest=0.2.0 — update recommended. No automatic action is taken; the operator must deploy the update.
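
A minimal sketch of how an acknowledgement body could be checked for a newer version is shown below. It assumes dotted numeric version strings and a client-side comparison; the function names `parse_version` and `update_available` are hypothetical, and the outpost's actual comparison logic is not described in this document.

```python
from typing import Mapping, Optional, Tuple


def parse_version(value: str) -> Tuple[int, ...]:
    """Parse a dotted numeric version such as '0.1.0' into a tuple of ints."""
    return tuple(int(part) for part in value.split("."))


def update_available(
    running: str, ack_body: Optional[Mapping[str, object]]
) -> Optional[str]:
    """Return the advertised version if it is newer than `running`, else None."""
    if not ack_body:
        return None  # empty or missing body: nothing to compare
    latest = ack_body.get("latest_version")
    if not isinstance(latest, str):
        return None
    # Tuple comparison orders 0.10.0 after 0.9.0, unlike string comparison.
    if parse_version(latest) > parse_version(running):
        return latest
    return None
```

A caller would log the warning when the function returns a version and otherwise do nothing, matching the "no automatic action" behaviour described above.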

Configuration

Set these environment variables on the outpost:

Variable	Required	Default	Description
`CLOUD_HEARTBEAT_URL`	No	`""`	Base URL of the Cloud Portal heartbeat receiver. When empty, enhanced heartbeats are silently skipped.
`CLOUD_HEARTBEAT_INTERVAL`	No	`60`	Interval in seconds between enhanced heartbeats.
`OUTPOST_CERT_PATH`	Yes (production)	`certs/outpost.pem`	Path to the outpost mTLS client certificate.
`OUTPOST_KEY_PATH`	Yes (production)	`certs/outpost.key`	Path to the outpost mTLS private key.
`OUTPOST_CA_PATH`	Yes (production)	`certs/ca.pem`	Path to the Platform CA certificate for server verification.
`PLATFORM_MANAGEMENT_URL`	Yes	`""`	Platform management plane base URL. Heartbeats are skipped if empty.
`OUTPOST_ID`	Yes	`""`	Outpost UUID from the Cloud Portal registration.
`ORG_ID`	Yes	`""`	Organisation UUID. Required for the heartbeat URL path.

Note: The platform heartbeat interval is hardcoded at 120 seconds and is not configurable via environment variable. The CLOUD_HEARTBEAT_INTERVAL setting applies only to the enhanced (Cloud Portal) heartbeat channel.
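
The variables and defaults in the table could be read as in the following sketch. The `HeartbeatConfig` class and `load_config` function are hypothetical names for illustration; the URL templates come from the architecture section of this document.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class HeartbeatConfig:
    platform_management_url: str
    outpost_id: str
    org_id: str
    cloud_heartbeat_url: str
    cloud_heartbeat_interval: int
    cert_path: str
    key_path: str
    ca_path: str

    def platform_heartbeat_url(self) -> Optional[str]:
        """Platform endpoint, or None when heartbeats should be skipped."""
        if not self.platform_management_url:
            return None
        base = self.platform_management_url.rstrip("/")
        return f"{base}/v1/orgs/{self.org_id}/outposts/{self.outpost_id}/heartbeat"

    def enhanced_heartbeat_url(self) -> Optional[str]:
        """Cloud Portal endpoint, or None when CLOUD_HEARTBEAT_URL is empty."""
        if not self.cloud_heartbeat_url:
            return None
        return f"{self.cloud_heartbeat_url.rstrip('/')}/v1/outpost/heartbeat"


def load_config(env: Mapping[str, str] = os.environ) -> HeartbeatConfig:
    """Read heartbeat settings, applying the defaults from the table."""
    return HeartbeatConfig(
        platform_management_url=env.get("PLATFORM_MANAGEMENT_URL", ""),
        outpost_id=env.get("OUTPOST_ID", ""),
        org_id=env.get("ORG_ID", ""),
        cloud_heartbeat_url=env.get("CLOUD_HEARTBEAT_URL", ""),
        cloud_heartbeat_interval=int(env.get("CLOUD_HEARTBEAT_INTERVAL", "60")),
        cert_path=env.get("OUTPOST_CERT_PATH", "certs/outpost.pem"),
        key_path=env.get("OUTPOST_KEY_PATH", "certs/outpost.key"),
        ca_path=env.get("OUTPOST_CA_PATH", "certs/ca.pem"),
    )
```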

Cloud Portal Dashboard

Outpost list (`/portal/outposts`)

The Outposts page in the Cloud Portal shows all registered outposts for the organisation. Each row shows:

Outpost name and region
Last heartbeat timestamp
Status badge (green / amber / red — see thresholds below)
Software version and whether an update is available
Active DLP tiers
Certificate expiry date (with warning when < 30 days remaining)

Status thresholds

Colour	Condition	Meaning
Green (healthy)	Heartbeat received within the last 5 minutes	Outpost is operating normally
Amber (stale)	Last heartbeat 5–30 minutes ago	Outpost may be experiencing connectivity issues or is in a long backoff interval
Red (offline)	No heartbeat for > 30 minutes, or `deregistered` status	Outpost is unreachable or deregistered
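
The thresholds in the table can be expressed as a small classification function. This is a sketch, not the portal's code: the name `classify_status` is hypothetical, the handling of the exact 5-minute and 30-minute boundaries is an assumption, and treating an outpost that has never sent a heartbeat as red is also an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional

HEALTHY_WINDOW = timedelta(minutes=5)
STALE_WINDOW = timedelta(minutes=30)


def classify_status(
    last_heartbeat_at: Optional[datetime],
    now: datetime,
    deregistered: bool = False,
) -> str:
    """Map the age of the last heartbeat to 'green', 'amber', or 'red'."""
    if deregistered or last_heartbeat_at is None:
        return "red"
    age = now - last_heartbeat_at
    if age <= HEALTHY_WINDOW:
        return "green"
    if age <= STALE_WINDOW:
        return "amber"
    return "red"
```

Note that an outpost at the 900-second backoff cap sends at most one platform heartbeat every 15 minutes or so, which is consistent with the amber description in the table.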

Heartbeat history

Navigate to an individual outpost and click Heartbeat History to view the last 50 heartbeat records (paginated, newest first). The admin API behind this view:

GET /v1/admin/outposts/{outpost_id}/heartbeats?limit=50&offset=0
Authorization: X-API-Key <admin-key>

Each record includes: received_at, status, version, uptime_seconds, policy_version, last_sync_at, dlp_tiers_active, cert_expiry, resource_usage.
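
For scripted access, the request above might be constructed as in this sketch using only the standard library. The function name `build_history_request` and the base URL are hypothetical. The header is reproduced literally as shown above (an `Authorization` header whose value is `X-API-Key <admin-key>`), which is one reading of that notation; confirm the expected header format for your deployment before relying on it.

```python
import urllib.parse
import urllib.request


def build_history_request(
    base_url: str,
    outpost_id: str,
    admin_key: str,
    limit: int = 50,
    offset: int = 0,
) -> urllib.request.Request:
    """Build (but do not send) a heartbeat-history request for one page."""
    query = urllib.parse.urlencode({"limit": limit, "offset": offset})
    url = f"{base_url.rstrip('/')}/v1/admin/outposts/{outpost_id}/heartbeats?{query}"
    return urllib.request.Request(
        url,
        method="GET",
        # Assumption: header sent exactly as written in this document.
        headers={"Authorization": f"X-API-Key {admin_key}"},
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; increasing `offset` by `limit` would request the next page.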

All-org outpost list (multi-org admins)

GET /v1/admin/outposts
Authorization: X-API-Key <admin-key>

Returns all outposts across all organisations, ordered by most-recently-seen first.

Troubleshooting

Outpost shows amber/red but is running

Cause: The outpost process is running and attempting to send heartbeats, but they are not reaching the Platform or the Cloud Portal.

Checks:

Firewall rules. The outpost must be able to make outbound HTTPS connections to PLATFORM_MANAGEMENT_URL and CLOUD_HEARTBEAT_URL. Verify there is no egress firewall blocking TCP 443.
mTLS certificate validity. The outpost refuses to send heartbeats if any of the files at OUTPOST_CERT_PATH, OUTPOST_KEY_PATH, or OUTPOST_CA_PATH is missing. Check the outpost logs for the message mTLS certificates required but missing.
Backoff state. After repeated failures the outpost may be sleeping for up to 15 minutes between attempts. Check logs for Heartbeat backoff — sleeping Xs (failure=N). Wait for the next attempt or restart the outpost to reset the backoff counter.
Proxy/load balancer. If the outpost connects via a forward proxy, confirm the proxy allows connections to both the Platform management plane and the Cloud Portal.

Stale status in portal after outpost restart

The Cloud Portal status is derived from last_heartbeat_at. After restart there is a 120-second window before the first platform heartbeat and up to CLOUD_HEARTBEAT_INTERVAL seconds before the first enhanced heartbeat. The status will update automatically once the first heartbeat is received.

Missed heartbeats after policy sync disruption

The heartbeat sender is independent of the policy sync client. A failed policy sync does not prevent heartbeats from being sent. If heartbeats are missing while policy sync is also failing, the root cause is likely a network connectivity issue or an expired mTLS certificate.

Certificate expiry warning in portal

When cert_expiry in the heartbeat is within 30 days, the portal shows a warning badge on the outpost row. Renew the outpost certificate before it expires:

# Via Cloud admin API
POST /v1/orgs/{org_id}/outposts/{outpost_id}/renew
Authorization: X-API-Key <admin-key>

The renewed certificate bundle (cert + key + CA) must be deployed to the outpost’s OUTPOST_CERT_PATH and OUTPOST_KEY_PATH. The outpost process picks up the new cert on next mTLS client creation (next heartbeat cycle after the files are replaced in-place).
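
The 30-day warning rule can be reproduced in monitoring scripts with a check like the one below. It is a sketch under assumptions: the name `cert_expiring_soon` is hypothetical, timestamps without an offset are treated as UTC, and an unreadable (null) expiry is reported as no warning, which may differ from how the portal handles it.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

WARNING_WINDOW = timedelta(days=30)


def cert_expiring_soon(cert_expiry: Optional[str], now: datetime) -> bool:
    """True if the ISO-8601 expiry is less than 30 days away or already past."""
    if cert_expiry is None:
        return False  # assumption: unreadable expiry raises no warning here
    # Older Python versions do not accept a trailing 'Z' in fromisoformat.
    expiry = datetime.fromisoformat(cert_expiry.replace("Z", "+00:00"))
    if expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)
    return expiry - now < WARNING_WINDOW
```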

Resource usage fields missing from history

resource_usage is populated by the psutil library. If psutil is not installed in the outpost container image, resource fields are omitted from the enhanced heartbeat payload. This does not affect other heartbeat functionality. Install psutil to enable CPU/memory/disk reporting:

RUN pip install psutil
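
The optional dependency on psutil described above could be handled as in the following sketch. The function names `collect_resource_usage` and `with_resource_usage` are hypothetical, and measuring disk usage at the root path `/` is an assumption; the psutil calls themselves (`cpu_percent`, `virtual_memory`, `disk_usage`) are part of its public API.

```python
from typing import Optional

try:
    import psutil  # optional third-party dependency
except ImportError:
    psutil = None


def collect_resource_usage() -> Optional[dict]:
    """Return CPU/memory/disk percentages, or None if psutil is unavailable."""
    if psutil is None:
        return None
    return {
        # interval=None compares against the previous call without blocking.
        "cpu_percent": psutil.cpu_percent(interval=None),
        "memory_percent": psutil.virtual_memory().percent,
        # Assumption: the root filesystem is the one worth reporting.
        "disk_percent": psutil.disk_usage("/").percent,
    }


def with_resource_usage(payload: dict) -> dict:
    """Return a copy of payload, adding resource_usage only when available."""
    result = dict(payload)
    usage = collect_resource_usage()
    if usage is not None:
        result["resource_usage"] = usage
    return result
```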
