
SIEM Integration — Splunk HEC

The SIEMDirectSink feature (outpost-0008-siem-parity) allows Arbitex Hybrid Outpost to stream audit events directly to your Splunk instance without routing traffic through Arbitex Cloud. Events are delivered over the Splunk HTTP Event Collector (HEC) endpoint using batched, HMAC-authenticated payloads with automatic retry and dead letter fallback.

This guide applies to Outpost deployments only. For Cloud-managed SIEM forwarding, see SIEM integration.


Before configuring the Outpost sink, ensure the following are in place:

  • Arbitex Hybrid Outpost version 1.8.0 or later deployed and registered to your organization
  • Splunk Enterprise 8.x or later, or Splunk Cloud (both support HEC)
  • HTTP Event Collector enabled on your Splunk instance (disabled by default in fresh installs)
  • A dedicated Splunk index named arbitex (or a name of your choosing) with appropriate retention settings
  • Network connectivity from the Outpost host to the Splunk HEC endpoint on port 8088 (or your configured HEC port)
  • A valid HEC token scoped to the target index

Step 1.1: Enable HTTP Event Collector

  1. Log in to Splunk Web as an administrator.
  2. Go to Settings → Data Inputs → HTTP Event Collector.
  3. Click Global Settings in the top-right corner.
  4. Set All Tokens to Enabled.
  5. Confirm the HTTP Port Number (default: 8088).
  6. Click Save.

Step 1.2: Create the index

  1. Go to Settings → Indexes → New Index.
  2. Set Index Name to arbitex (match this to SPLUNK_INDEX below).
  3. Configure Max Size of Entire Index and Retention according to your compliance requirements. Recommended minimum: 90 days for SOC 2 / ISO 27001 programs.
  4. Click Save.

Step 1.3: Create the HEC token

  1. Return to Settings → Data Inputs → HTTP Event Collector.
  2. Click New Token.
  3. Name: arbitex-outpost (descriptive label only).
  4. Click Next.
  5. Source type: select arbitex:audit (create it as a new source type if it does not exist).
  6. Default Index: select arbitex.
  7. Click Review, then Submit.
  8. Copy the generated token value. You will set this as SPLUNK_HEC_TOKEN.

Treat the HEC token as a secret and store it in your secrets manager immediately.


Set the following environment variables on your Outpost deployment (via .env file, Kubernetes Secret, or your secrets manager of choice):

| Variable | Required | Default | Description |
|---|---|---|---|
| SIEM_SINK | Yes | - | Set to splunk_hec to activate the Splunk sink |
| SPLUNK_HEC_URL | Yes | - | Full HEC collector URL, e.g. https://your-splunk:8088/services/collector/event |
| SPLUNK_HEC_TOKEN | Yes | - | HEC authentication token created in Step 1.3 |
| SPLUNK_INDEX | No | arbitex | Target Splunk index name |
| SPLUNK_SOURCE_TYPE | No | arbitex:audit | Splunk sourcetype applied to every event |
| SPLUNK_VERIFY_SSL | No | true | Verify TLS certificate on the HEC endpoint. Set to false only in isolated test environments |
| SIEM_BATCH_SIZE | No | 100 | Maximum number of events per HEC batch request |
| SIEM_FLUSH_INTERVAL_SECONDS | No | 10 | Maximum seconds between batch flushes, regardless of batch fill level |

.env file:

SIEM_SINK=splunk_hec
SPLUNK_HEC_URL=https://splunk.corp.example.com:8088/services/collector/event
SPLUNK_HEC_TOKEN=a1b2c3d4-e5f6-7890-abcd-ef1234567890
SPLUNK_INDEX=arbitex
SPLUNK_SOURCE_TYPE=arbitex:audit

Kubernetes Secret:

apiVersion: v1
kind: Secret
metadata:
  name: arbitex-outpost-siem
  namespace: arbitex
type: Opaque
stringData:
  SIEM_SINK: "splunk_hec"
  SPLUNK_HEC_URL: "https://splunk.corp.example.com:8088/services/collector/event"
  SPLUNK_HEC_TOKEN: "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
  SPLUNK_INDEX: "arbitex"
  SPLUNK_SOURCE_TYPE: "arbitex:audit"
  SPLUNK_VERIFY_SSL: "true"
  SIEM_BATCH_SIZE: "100"
  SIEM_FLUSH_INTERVAL_SECONDS: "10"

Reference the secret in your Outpost Deployment’s envFrom:

envFrom:
  - secretRef:
      name: arbitex-outpost-siem
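
As a pre-deployment sanity check, the variables listed above can be validated before Outpost is restarted. The sketch below is illustrative only: `SiemConfig` and `load_siem_config` are hypothetical helpers, not part of Outpost, and the boolean parsing of SPLUNK_VERIFY_SSL is an assumption. Only the variable names and defaults are taken from this guide.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variables this guide marks as required.
REQUIRED = ("SIEM_SINK", "SPLUNK_HEC_URL", "SPLUNK_HEC_TOKEN")


@dataclass(frozen=True)
class SiemConfig:
    """Hypothetical container for the settings documented in this guide."""
    sink: str
    hec_url: str
    hec_token: str
    index: str
    source_type: str
    verify_ssl: bool
    batch_size: int
    flush_interval_seconds: int


def load_siem_config(env: Mapping[str, str] = os.environ) -> SiemConfig:
    """Validate required variables and apply the documented defaults."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    if env["SIEM_SINK"] != "splunk_hec":
        raise ValueError("SIEM_SINK must be 'splunk_hec' to activate the Splunk sink")
    return SiemConfig(
        sink=env["SIEM_SINK"],
        hec_url=env["SPLUNK_HEC_URL"],
        hec_token=env["SPLUNK_HEC_TOKEN"],
        index=env.get("SPLUNK_INDEX", "arbitex"),
        source_type=env.get("SPLUNK_SOURCE_TYPE", "arbitex:audit"),
        # Assumption: any value other than "true" (case-insensitive) disables verification.
        verify_ssl=env.get("SPLUNK_VERIFY_SSL", "true").strip().lower() == "true",
        batch_size=int(env.get("SIEM_BATCH_SIZE", "100")),
        flush_interval_seconds=int(env.get("SIEM_FLUSH_INTERVAL_SECONDS", "10")),
    )
```

Calling `load_siem_config()` with no arguments reads the current process environment, so the check can run in the same shell or container that will start Outpost.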

Events accumulate in an in-process buffer. A batch is flushed to the HEC endpoint when either:

  • The buffer reaches SIEM_BATCH_SIZE events, or
  • SIEM_FLUSH_INTERVAL_SECONDS elapses since the last flush (whichever comes first).
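
The two flush triggers can be pictured with a small buffer class. This is a minimal sketch of the size-or-interval rule described above, not the Outpost implementation: `BatchBuffer`, its method names, and the injectable `clock` parameter are all hypothetical.

```python
import time
from typing import Callable, List, Optional


class BatchBuffer:
    """Releases a batch when it is full or when the flush interval has elapsed."""

    def __init__(self, batch_size: int = 100, flush_interval: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.batch_size = batch_size          # mirrors SIEM_BATCH_SIZE
        self.flush_interval = flush_interval  # mirrors SIEM_FLUSH_INTERVAL_SECONDS
        self._clock = clock
        self._events: List[dict] = []
        self._last_flush = clock()

    def add(self, event: dict) -> Optional[List[dict]]:
        """Buffer an event; return a batch if the size threshold was reached."""
        self._events.append(event)
        if len(self._events) >= self.batch_size:
            return self._drain()
        return None

    def poll(self) -> Optional[List[dict]]:
        """Call periodically; return a batch if the interval elapsed since the last flush."""
        elapsed = self._clock() - self._last_flush
        if self._events and elapsed >= self.flush_interval:
            return self._drain()
        return None

    def _drain(self) -> List[dict]:
        # Hand over the buffered events and restart the interval timer.
        batch, self._events = self._events, []
        self._last_flush = self._clock()
        return batch
```

A caller would invoke `add()` for each audit event and `poll()` from a timer, sending whatever batch either call returns.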

On HTTP 429 or 5xx responses, the sink retries with exponential backoff (up to 3 attempts). If all retries are exhausted, the batch is written to the dead letter file at /var/log/arbitex/splunk_dead_letter.jsonl. See Dead letter recovery below.
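
The retry and fallback behavior can be sketched as follows. Several details are assumptions rather than documented behavior: "up to 3 attempts" is read as three sends in total, the one-second base delay is arbitrary, and non-retryable statuses (for example 403) are written straight to the dead letter file. `deliver_with_retry` and its `send` callback are hypothetical names; the dead letter record layout follows the format shown in the dead letter section of this guide.

```python
import json
import time
from typing import Callable, List

MAX_ATTEMPTS = 3  # assumption: three sends in total


def is_retryable(status: int) -> bool:
    """HTTP 429 and 5xx responses are retried."""
    return status == 429 or 500 <= status <= 599


def deliver_with_retry(batch: List[dict],
                       send: Callable[[List[dict]], int],
                       dead_letter_path: str,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Send a batch; on exhaustion, append each event to the dead letter file."""
    status = 0
    for attempt in range(MAX_ATTEMPTS):
        status = send(batch)  # send() returns the HTTP status code
        if 200 <= status < 300:
            return True
        if not is_retryable(status):
            break
        if attempt < MAX_ATTEMPTS - 1:
            sleep(base_delay * 2 ** attempt)  # exponential backoff: 1s, 2s, ...
    with open(dead_letter_path, "a", encoding="utf-8") as fh:
        for event in batch:
            record = {
                "event": event,
                "error": f"HTTP {status}",
                "connector": "splunk_hec",
                "timestamp": time.time(),
            }
            fh.write(json.dumps(record) + "\n")  # one JSON object per line
    return False
```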


Events are delivered to Splunk wrapped in the standard HEC envelope. The event field contains the Arbitex native audit event as a JSON object. The OCSF-mapped format used by the Cloud connector is not used here — the Outpost sink delivers raw Arbitex audit events to minimize transform overhead at the edge.

{
  "time": 1772884800.000,
  "host": "outpost-prod-1.corp.example.com",
  "source": "arbitex:outpost",
  "sourcetype": "arbitex:audit",
  "index": "arbitex",
  "event": {
    "timestamp": "2026-03-07T12:00:00.000Z",
    "user_id": "a1b2c3d4-0001-0001-0001-000000000001",
    "action": "chat_completion",
    "conversation_id": "conv_01HZ_EXAMPLE",
    "model_id": "claude-sonnet-4-6",
    "provider": "anthropic",
    "prompt_text": "[REDACTED]",
    "response_text": "[REDACTED]",
    "token_count_input": 312,
    "token_count_output": 847,
    "cost_estimate": 0.0024,
    "latency_ms": 1840,
    "tenant_id": "org_acme",
    "metadata": {
      "client_ip": "10.0.1.42",
      "user_agent": "arbitex-sdk/2.1.0"
    },
    "hmac": "sha256:3f2a1b...",
    "previous_hmac": "sha256:7c4e9d...",
    "hmac_key_id": "key_2026_03"
  }
}

Field notes:

  • prompt_text and response_text are redacted at the Outpost level if the DLP Output Redaction policy is active. The literal string [REDACTED] appears in place of the original text.
  • hmac, previous_hmac, and hmac_key_id are chain integrity fields used for tamper detection. See Audit log verification for how to validate the chain.
  • cost_estimate is in USD, derived from the provider’s published per-token pricing at the time of the request.
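
For custom tooling that needs to produce the same shape, the envelope can be built as in the sketch below. It assumes the HEC `time` field is the event's own `timestamp` converted to epoch seconds, which this guide does not state explicitly. `to_hec_envelope` and `encode_batch` are hypothetical helpers. Stacking several event objects in one request body, rather than wrapping them in a JSON array, is standard HEC batching; whether Outpost separates them with newlines is not documented here.

```python
import json
from datetime import datetime
from typing import Iterable


def to_hec_envelope(event: dict, host: str, index: str = "arbitex",
                    sourcetype: str = "arbitex:audit",
                    source: str = "arbitex:outpost") -> dict:
    """Wrap a native Arbitex audit event in the standard HEC envelope."""
    # Assumption: "time" is derived from the event's ISO 8601 UTC timestamp.
    ts = datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))
    return {
        "time": ts.timestamp(),
        "host": host,
        "source": source,
        "sourcetype": sourcetype,
        "index": index,
        "event": event,
    }


def encode_batch(envelopes: Iterable[dict]) -> bytes:
    """Encode envelopes as stacked JSON objects, one per line, for a single POST."""
    return "\n".join(
        json.dumps(envelope, separators=(",", ":")) for envelope in envelopes
    ).encode("utf-8")
```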

After Outpost restarts with the new configuration, use these searches in Splunk to confirm events are flowing:

index=arbitex sourcetype="arbitex:audit"
| head 20
| table _time, event.user_id, event.action, event.model_id, event.tenant_id

Event volume by action type (last 24 hours)

index=arbitex sourcetype="arbitex:audit" earliest=-24h
| spath input=_raw output=action path=event.action
| stats count by action
| sort -count

Policy and DLP events (last 7 days)

index=arbitex sourcetype="arbitex:audit" earliest=-7d
| spath input=_raw output=action path=event.action
| search action IN ("policy_block", "dlp_trigger", "dlp_redaction")
| table _time, event.user_id, event.action, event.conversation_id, event.tenant_id
| sort -_time

HMAC chain fields (last hour)

index=arbitex sourcetype="arbitex:audit" earliest=-1h
| spath input=_raw output=hmac path=event.hmac
| spath input=_raw output=previous_hmac path=event.previous_hmac
| spath input=_raw output=key_id path=event.hmac_key_id
| table _time, hmac, previous_hmac, key_id
| sort _time

Use the hmac and previous_hmac values to verify chain continuity: each event's previous_hmac must equal the hmac of the immediately preceding event for the same tenant. Gaps indicate dropped events; mismatches indicate tampering.
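
The continuity rule can be checked offline against exported search results. The sketch below assumes the events are already sorted oldest first and that each carries tenant_id, hmac, and previous_hmac; `find_chain_breaks` is a hypothetical helper. It checks linkage only. It does not recompute any HMAC, which requires the signing key (see Audit log verification).

```python
from typing import Dict, Iterable, List


def find_chain_breaks(events: Iterable[dict]) -> List[dict]:
    """Return one entry per event whose previous_hmac does not match its predecessor."""
    last_hmac: Dict[str, str] = {}  # most recent hmac seen per tenant
    breaks: List[dict] = []
    for event in events:
        tenant = event["tenant_id"]
        expected = last_hmac.get(tenant)
        # The first event seen for a tenant has no predecessor in this window.
        if expected is not None and event["previous_hmac"] != expected:
            breaks.append({
                "tenant_id": tenant,
                "timestamp": event.get("timestamp"),
                "expected": expected,
                "found": event["previous_hmac"],
            })
        last_hmac[tenant] = event["hmac"]
    return breaks
```

An empty result means every event in the window links to its predecessor; it says nothing about events before or after the window.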


After restarting Outpost with the new environment variables, send a synthetic test event using the Arbitex admin API:

curl -X POST https://api.arbitex.ai/api/admin/siem/test/splunk_hec \
  -H "Authorization: Bearer arb_live_your-admin-api-key"

The response confirms whether the test event was accepted by HEC:

{
  "connector": "splunk_hec",
  "status": "ok",
  "message": "Test event delivered. Check index 'arbitex' for action: siem_test_event."
}

Then confirm in Splunk:

index=arbitex sourcetype="arbitex:audit"
| spath input=_raw output=action path=event.action
| search action="siem_test_event"
| table _time, event.action, event.tenant_id

If the event does not appear within 30 seconds:

  1. Check the Outpost container logs for [siem] log lines — connection errors and HTTP status codes are logged at ERROR level.
  2. Verify the HEC endpoint is reachable from the Outpost host: curl -k -H "Authorization: Splunk <token>" https://your-splunk:8088/services/collector/health
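  3. Confirm that the Outpost host trusts the HEC endpoint's TLS certificate. With SPLUNK_VERIFY_SSL=true (the default), an untrusted or self-signed certificate causes delivery to fail.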
  4. Inspect the dead letter file: /var/log/arbitex/splunk_dead_letter.jsonl.

When all retry attempts for a batch are exhausted, events are written to /var/log/arbitex/splunk_dead_letter.jsonl. Each line is a self-contained JSON object:

{
  "event": { "timestamp": "2026-03-07T12:00:00.000Z", "action": "chat_completion", "..." : "..." },
  "error": "HTTP 503: Service Unavailable",
  "connector": "splunk_hec",
  "timestamp": 1772884800.0
}
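
Before replaying, it can help to see how many events are waiting and why they failed. The sketch below tallies dead letter records by their error field. `summarize_dead_letters` is a hypothetical helper, and the only assumption about the file is the one-object-per-line layout shown above.

```python
import json
from collections import Counter


def summarize_dead_letters(path: str) -> dict:
    """Count dead letter records by error message, skipping unreadable lines."""
    errors: Counter = Counter()
    malformed = 0
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                malformed += 1
                continue
            if not isinstance(record, dict):
                malformed += 1
                continue
            errors[record.get("error", "unknown")] += 1
    return {
        "total": sum(errors.values()),
        "by_error": dict(errors),
        "malformed": malformed,
    }
```

A large total, or a mix of unrelated errors, is a reasonable signal to involve support rather than replay by hand.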

To replay dead letter events after the SIEM is restored, parse the JSONL and re-submit each event payload to the HEC endpoint:

# Re-submit each dead-lettered event; stop at the first request that HEC rejects.
jq -c '.event' /var/log/arbitex/splunk_dead_letter.jsonl | while IFS= read -r event; do
  curl -sS --fail -X POST "$SPLUNK_HEC_URL" \
    -H "Authorization: Splunk $SPLUNK_HEC_TOKEN" \
    -H "Content-Type: application/json" \
    -d "{\"sourcetype\": \"arbitex:audit\", \"index\": \"arbitex\", \"event\": $event}" \
    || break
done

Contact Arbitex support for assisted bulk recovery if the dead letter file is large.