# SIEM integration guide
Arbitex can forward audit events to your SIEM via two independent delivery paths: Platform connectors (cloud-side, OCSF-formatted, managed) and Outpost direct sink (edge-side, raw JSON, air-gap capable). This guide explains both paths, covers Splunk HEC, Microsoft Sentinel, and syslog configuration in detail, documents the CEF format mapping for ArcSight-compatible systems, and provides verification procedures.
## Choosing a delivery path

| Factor | Platform connectors | Outpost direct sink |
|---|---|---|
| Deployment | Cloud (SaaS) or Hybrid | Hybrid Outpost only |
| Event format | OCSF v1.1 (structured, SIEM-native) | Raw JSON (Arbitex native) |
| Air-gap compatible | No | Yes |
| Data residency | Events transit Arbitex Cloud | Events never leave your network |
| Configuration | Environment variables on the Platform | Environment variables on the Outpost |
| Connectors | Splunk HEC, Sentinel, Elastic, Datadog, Sumo Logic | Splunk HEC, syslog (RFC 5424) |
| Batching | Up to 100 events / 5 seconds | Ring buffer, drain loop |
| Retry + dead letter | Exponential backoff, JSONL dead letter | Dead letter file at configured path |
Use Platform connectors when your Outpost can reach Arbitex Cloud, you want OCSF-formatted events (no custom parsing), and you need one of the five supported connectors.
Use the Outpost direct sink when your security policy prohibits audit data from transiting Arbitex Cloud, you are operating in air-gapped mode, or you need sub-second latency between event generation and SIEM ingestion.
Both paths can run simultaneously. When running in parallel, the Platform relay delivers OCSF-formatted events and the Outpost direct sink delivers raw JSON. Use separate indexes or source types to avoid schema conflicts between the two streams.
## Platform connectors

Platform connectors are configured via environment variables on the Arbitex Platform (or your Helm values / Kubernetes secrets). All connectors emit events in OCSF v1.1 format. See SIEM integration for the complete OCSF schema reference and connector list.
### Splunk HEC (Platform)

Required environment variables:
| Variable | Description |
|---|---|
| `SPLUNK_HEC_URL` | HEC endpoint, e.g. `https://splunk.example.com:8088/services/collector` |
| `SPLUNK_HEC_TOKEN` | HEC authentication token |
Optional:
| Variable | Default | Description |
|---|---|---|
| `SPLUNK_HEC_INDEX` | `arbitex` | Target Splunk index |
| `SPLUNK_HEC_SOURCE` | `arbitex:audit` | Event source name |
| `SPLUNK_HEC_BATCH_SIZE` | `100` | Events per batch |
| `SPLUNK_HEC_FLUSH_INTERVAL` | `5` | Seconds between flushes |
| `SPLUNK_HEC_MAX_RETRIES` | `3` | Retry attempts on failure |
Splunk prerequisites:
- Enable HTTP Event Collector: Settings → Data Inputs → HTTP Event Collector → Global Settings → All Tokens: Enabled
- Create an HEC token with source type `arbitex:ocsf`
- Ensure the target index exists and the token has write access
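Before wiring the token into Arbitex, you can smoke-test it with a direct POST to the collector endpoint. The sketch below is illustrative rather than part of Arbitex (`build_hec_request` and the `hec_smoke_test` event are invented names); it assumes a reachable HEC endpoint and a valid token.

```python
import json
import urllib.request


def build_hec_request(hec_url: str, token: str, index: str = "arbitex",
                      sourcetype: str = "arbitex:ocsf") -> urllib.request.Request:
    """Build a test POST against the Splunk HEC collector endpoint."""
    body = {
        "index": index,
        "sourcetype": sourcetype,
        "event": {"action": "hec_smoke_test"},
    }
    return urllib.request.Request(
        hec_url,
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Splunk {token}",  # HEC uses the "Splunk" auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_hec_request("https://splunk.example.com:8088/services/collector",
                            "<hec_token>")
    # urllib.request.urlopen(req)  # uncomment to send; a healthy HEC answers
    #                              # with {"text": "Success", "code": 0}
```

If the send returns an authorization error, re-check the token and that All Tokens is enabled in Global Settings before debugging the Arbitex side.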
Verifying connectivity:
```shell
# List connector status
curl https://api.arbitex.ai/api/admin/siem/connectors \
  -H "Authorization: Bearer <admin_api_key>"

# Send a test event to Splunk HEC
curl -X POST https://api.arbitex.ai/api/admin/siem/test/splunk_hec \
  -H "Authorization: Bearer <admin_api_key>"
```

In Splunk, confirm with:

```spl
index=arbitex sourcetype="arbitex:ocsf"
| spath output=action path=api.operation
| search action="siem_test_event"
| table _time, action
```

### Microsoft Sentinel (Platform)

Required environment variables:
| Variable | Description |
|---|---|
| `SENTINEL_TENANT_ID` | Azure AD tenant ID |
| `SENTINEL_CLIENT_ID` | Azure AD application (client) ID |
| `SENTINEL_CLIENT_SECRET` | Azure AD application client secret |
| `SENTINEL_DCE_ENDPOINT` | Data Collection Endpoint URL |
| `SENTINEL_DCR_IMMUTABLE_ID` | Data Collection Rule immutable ID |
Optional:
| Variable | Default | Description |
|---|---|---|
| `SENTINEL_STREAM_NAME` | `Custom-ArbitexOCSF_CL` | DCR stream name |
| `SENTINEL_BATCH_SIZE` | `100` | Events per batch |
| `SENTINEL_FLUSH_INTERVAL` | `5` | Seconds between flushes |
Azure prerequisites:
- Create an Azure AD application registration; note tenant ID, client ID, and create a client secret
- Create a Data Collection Endpoint (DCE) in your Azure Monitor workspace
- Create a Data Collection Rule (DCR) with stream `Custom-ArbitexOCSF_CL` targeting a Log Analytics workspace
- Assign the Monitoring Metrics Publisher role to your app registration on the DCR resource
- Note the DCE endpoint URL and DCR immutable ID
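For reference, the connector authenticates with the OAuth2 client-credentials flow against Azure AD and posts batches to the Azure Monitor Logs Ingestion API. The sketch below shows the request shapes involved; the helper names are illustrative, and the `2023-01-01` API version reflects current Azure documentation at the time of writing.

```python
from urllib.parse import urlencode

MONITOR_SCOPE = "https://monitor.azure.com/.default"


def token_request(tenant_id: str, client_id: str, client_secret: str) -> tuple[str, str]:
    """OAuth2 client-credentials token request (Azure AD v2.0 endpoint).

    Returns the token URL and the urlencoded form body to POST to it."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": MONITOR_SCOPE,
    })
    return url, form


def ingestion_url(dce_endpoint: str, dcr_immutable_id: str,
                  stream: str = "Custom-ArbitexOCSF_CL") -> str:
    """Logs Ingestion API endpoint for a given DCE + DCR + stream."""
    return (f"{dce_endpoint.rstrip('/')}/dataCollectionRules/{dcr_immutable_id}"
            f"/streams/{stream}?api-version=2023-01-01")
```

Requests to the ingestion URL carry the bearer token from the first step and a JSON array of events matching the DCR stream schema.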
Test the connection:
```shell
curl -X POST https://api.arbitex.ai/api/admin/siem/test/sentinel \
  -H "Authorization: Bearer <admin_api_key>"
```

### Connector health API

```shell
# Overall SIEM health summary
curl https://api.arbitex.ai/api/admin/siem/health \
  -H "Authorization: Bearer <admin_api_key>"
```

Response:

```json
{
  "healthy": 2,
  "degraded": 0,
  "error": 0,
  "not_configured": 4,
  "total": 6
}
```

Status meanings:
| Status | Description |
|---|---|
| `healthy` | Endpoint reachable, credentials valid |
| `degraded` | Endpoint reachable but returning unexpected responses |
| `error` | Connection failed or credentials invalid |
| `not_configured` | Required environment variable(s) not set |
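A monitoring job can poll this endpoint and alert on non-healthy connectors. A minimal sketch, assuming the response shape shown above (`needs_attention` is a hypothetical helper, not an Arbitex API):

```python
def needs_attention(summary: dict) -> bool:
    """True when any configured connector is degraded or erroring.

    not_configured connectors are ignored: they are absent, not failing."""
    return summary.get("degraded", 0) > 0 or summary.get("error", 0) > 0


# Summary mirroring the example response above
summary = {"healthy": 2, "degraded": 0, "error": 0, "not_configured": 4, "total": 6}
print(needs_attention(summary))  # → False
```

Wire the boolean into whatever pager or metrics pipeline you already run; the health endpoint itself requires an admin API key, as in the curl example.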
## Outpost direct sink — Splunk HEC

The Outpost direct sink delivers events from the Outpost process directly to Splunk HEC without routing through Arbitex Cloud.

```shell
SIEM_DIRECT_ENABLED=true
SIEM_DIRECT_TYPE=splunk_hec
SIEM_DIRECT_URL=https://splunk.corp.example.com:8088
SIEM_DIRECT_TOKEN=a1b2c3d4-e5f6-7890-abcd-ef1234567890
SIEM_DIRECT_BUFFER_CAPACITY=10000
SIEM_DIRECT_DEAD_LETTER_PATH=/var/log/arbitex/siem-dead-letter.jsonl
```

The sink appends `/services/collector/event` to `SIEM_DIRECT_URL` automatically. Each event is a single HTTP POST with this body:

```json
{
  "event": { "...audit event fields..." },
  "time": 1741478400.123,
  "sourcetype": "arbitex:audit",
  "source": "arbitex-outpost"
}
```

Events delivered via the direct sink are raw JSON, not OCSF-formatted. If you are ingesting into the same Splunk index as Platform SIEM events, use a distinct source type or index to avoid schema conflicts.
For full configuration reference and ring buffer / dead letter behavior, see Outpost SIEM direct sink.
## Outpost direct sink — syslog (RFC 5424)

The Outpost can deliver audit events over syslog to any RFC 5424-compatible receiver.

### Configuration

```shell
SIEM_DIRECT_ENABLED=true
SIEM_DIRECT_TYPE=syslog
SIEM_DIRECT_URL=udp://syslog.corp.example.com:514
SIEM_DIRECT_DEAD_LETTER_PATH=/var/log/arbitex/siem-dead-letter.jsonl
```

Use `tcp://` for TCP syslog delivery (one connection per event):

```shell
SIEM_DIRECT_URL=tcp://syslog.corp.example.com:514
```

### RFC 5424 message format

Each event is formatted as a single RFC 5424 syslog message:

```
<134>1 {timestamp} - arbitex-outpost {event_id} - - {json_body}
```

| Field | Value |
|---|---|
| PRI | 134 (facility local0 = 16, severity informational = 6; 16×8+6=134) |
| VERSION | 1 |
| HOSTNAME | - (nil value) |
| APP-NAME | arbitex-outpost |
| PROCID | Event ID from the audit record |
| MSGID | - (nil value) |
| STRUCTURED-DATA | - (nil value) |
| MSG | Full audit event serialized as JSON |
Example message:
```
<134>1 2026-03-11T08:30:00.000Z - arbitex-outpost req_01jnx4 - - {"request_id":"req_01jnx4","timestamp":"2026-03-11T08:30:00.000Z","user_id":"usr_abc","action":"chat_completion","model_id":"claude-sonnet-4-6","provider":"anthropic","token_count_input":312,"token_count_output":847}
```

UDP vs TCP:
- UDP: Single datagram per event, no delivery confirmation. Datagrams may be silently dropped if the receiver is unavailable or if the datagram exceeds the network MTU. Use when your syslog infrastructure is UDP-native and low-overhead delivery is acceptable.
- TCP: New connection per event, terminated with a newline character. Delivery confirmed at the transport layer but higher per-event overhead. Connection or write errors cause the event to be dead-lettered.
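The layout and PRI arithmetic above can be checked with a short round-trip: format an event into the documented message shape, then recover the JSON payload by splitting off the seven header tokens. A sketch for validation purposes, not the Outpost's actual implementation:

```python
import json

PRI = 16 * 8 + 6  # facility local0 (16) × 8 + severity informational (6) = 134


def format_rfc5424(event: dict) -> str:
    """Render an audit event in the documented layout:
    nil hostname, app-name arbitex-outpost, event ID as PROCID."""
    body = json.dumps(event, separators=(",", ":"))
    return f"<{PRI}>1 {event['timestamp']} - arbitex-outpost {event['request_id']} - - {body}"


def parse_rfc5424(message: str) -> dict:
    """MSG is everything after the first seven space-separated header tokens."""
    parts = message.split(" ", 7)
    assert parts[0] == f"<{PRI}>1", "unexpected PRI/VERSION"
    return json.loads(parts[7])
```

Pointing a receiver at a test Outpost and running captured messages through `parse_rfc5424` is a quick way to confirm your pipeline preserves the payload intact.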
### Configuring common syslog receivers

rsyslog — add a UDP input and forward to your SIEM:

```
module(load="imudp")
input(type="imudp" port="514")

if $programname == 'arbitex-outpost' then {
  action(type="omfwd" target="siem.corp.example.com" port="514" protocol="tcp")
}
```

syslog-ng — accept from Outpost and forward:

```
source s_arbitex { udp(ip("0.0.0.0") port(514)); };
filter f_arbitex { program("arbitex-outpost"); };
destination d_siem { tcp("siem.corp.example.com" port(514)); };
log { source(s_arbitex); filter(f_arbitex); destination(d_siem); };
```

## CEF format mapping
Common Event Format (CEF) is a structured syslog message format used by ArcSight, IBM QRadar, and other SIEMs that expect standardized field names. Arbitex does not emit CEF natively, but the audit event fields map directly to CEF headers and extensions.
CEF message structure:
```
CEF:0|Vendor|Product|Version|DeviceEventClassId|Name|Severity|Extensions
```

### Mapping Arbitex OCSF fields to CEF
Use a syslog pipeline processor (Logstash, Fluentd, or a proprietary transformer) to convert OCSF events to CEF before ingestion.
CEF header mapping:
| CEF Field | Source | Value |
|---|---|---|
| `Version` | constant | `0` |
| `DeviceVendor` | `metadata.product.vendor_name` | `Arbitex` |
| `DeviceProduct` | `metadata.product.name` | `Arbitex` |
| `DeviceVersion` | `metadata.version` | `1.1.0` |
| `DeviceEventClassId` | `class_uid` | e.g., `6003` |
| `Name` | `class_name` + `api.operation` | e.g., `Api Activity: prompt_sent` |
| `Severity` | `severity_id` × 2 (CEF 0–10 scale) | `2` for Informational (`severity_id=1`) |
CEF extension field mapping:
| CEF Extension | OCSF Source Field | Notes |
|---|---|---|
| `rt` | `time` | Millisecond epoch (OCSF) → CEF receipt time |
| `suser` | `actor.user.uid` | User ID of the actor |
| `src` | `src_endpoint.ip` | Source IP address |
| `cs1` | `actor.user.org_uid` | Tenant/org ID (custom string field) |
| `cs1Label` | constant | `tenantId` |
| `cs2` | `api.service.uid` | Model identifier |
| `cs2Label` | constant | `modelId` |
| `cs3` | `api.service.name` | Provider name |
| `cs3Label` | constant | `provider` |
| `cn1` | `unmapped.token_count_input` | Input token count |
| `cn1Label` | constant | `tokenCountInput` |
| `cn2` | `unmapped.token_count_output` | Output token count |
| `cn2Label` | constant | `tokenCountOutput` |
| `cn3` | `unmapped.latency_ms` | Request latency (ms) |
| `cn3Label` | constant | `latencyMs` |
| `act` | `api.operation` | Action performed |
| `outcome` | `severity` | Outcome classification |
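Applied in code, the header and extension mapping looks like the following sketch. It is a simplified transformer: real CEF output must also escape `|`, `=`, and backslashes in field values, which is omitted here for brevity.

```python
def ocsf_to_cef(e: dict) -> str:
    """Render an OCSF event as a CEF line per the mapping tables above."""
    # CEF severity = severity_id × 2, clamped to the 0-10 scale
    sev = max(min(e.get("severity_id", 1) * 2, 10), 0)
    header = (f"CEF:0|Arbitex|Arbitex|{e['metadata']['version']}|{e['class_uid']}|"
              f"{e['class_name']}: {e['api']['operation']}|{sev}")
    u = e.get("unmapped", {})
    ext = (
        f"rt={e['time']} suser={e['actor']['user']['uid']} src={e['src_endpoint']['ip']} "
        f"cs1={e['actor']['user']['org_uid']} cs1Label=tenantId "
        f"cs2={e['api']['service']['uid']} cs2Label=modelId "
        f"cs3={e['api']['service']['name']} cs3Label=provider "
        f"cn1={u.get('token_count_input')} cn1Label=tokenCountInput "
        f"cn2={u.get('token_count_output')} cn2Label=tokenCountOutput "
        f"cn3={u.get('latency_ms')} cn3Label=latencyMs "
        f"act={e['api']['operation']} outcome={e['severity']}"
    )
    return f"{header}|{ext}"
```

Fed the field values from the example below, this function reproduces that CEF line exactly.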
Example CEF output for a chat completion event:
```
CEF:0|Arbitex|Arbitex|1.1.0|6003|Api Activity: prompt_sent|2|rt=1741564800000 suser=usr_01HZ_ALICE src=198.51.100.42 cs1=org_acme cs1Label=tenantId cs2=claude-sonnet-4-6 cs2Label=modelId cs3=anthropic cs3Label=provider cn1=312 cn1Label=tokenCountInput cn2=847 cn2Label=tokenCountOutput cn3=1840 cn3Label=latencyMs act=prompt_sent outcome=Informational
```

### Logstash filter for OCSF-to-CEF conversion
Section titled “Logstash filter for OCSF-to-CEF conversion”filter { if [class_uid] { # Map severity_id to CEF severity (0-10 scale) ruby { code => ' severity_id = event.get("[severity_id]") || 1 cef_severity = [[severity_id * 2, 10].min, 0].max event.set("[cef_severity]", cef_severity) ' } mutate { add_field => { "cef_header" => "CEF:0|Arbitex|Arbitex|%{[metadata][version]}|%{[class_uid]}|%{[class_name]}: %{[api][operation]}|%{[cef_severity]}" "cef_extensions" => "rt=%{[time]} suser=%{[actor][user][uid]} src=%{[src_endpoint][ip]} cs1=%{[actor][user][org_uid]} cs1Label=tenantId cs2=%{[api][service][uid]} cs2Label=modelId cs3=%{[api][service][name]} cs3Label=provider act=%{[api][operation]}" } } }}ArcSight CEF ingestion
ArcSight SmartConnectors can ingest CEF-formatted syslog messages directly. Configure a Syslog Daemon SmartConnector pointed at the port where your syslog pipeline emits the transformed CEF stream. Map `cs1` (tenantId), `cs2` (modelId), and `act` to ArcSight Active Channel columns for dashboard filtering.
## Dead letter recovery

When a connector fails after all retry attempts, events are written to a JSONL dead letter file. Each line is a complete JSON object (OCSF event for Platform connectors, raw audit event for the Outpost syslog sink).

```shell
# Replay Splunk dead letter after SIEM is restored
jq -c '.event' /var/log/arbitex/siem-dead-letter.jsonl | while read -r event; do
  curl -s -X POST "$SPLUNK_HEC_URL" \
    -H "Authorization: Splunk $SIEM_DIRECT_TOKEN" \
    -H "Content-Type: application/json" \
    -d "{\"sourcetype\": \"arbitex:audit\", \"index\": \"arbitex\", \"event\": $event}"
done
```

The dead letter file has no automatic size cap. Monitor disk usage and set up rotation if the SIEM endpoint is unavailable for an extended period.
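A more defensive replay can be scripted: read the JSONL file, skip unparseable lines (a partial final line is possible if the writer was interrupted mid-flush), and refuse to proceed past a size threshold. A sketch; the `load_dead_letter` helper and the 100 MB threshold are illustrative, not Arbitex defaults.

```python
import json
from pathlib import Path


def load_dead_letter(path: str, max_bytes: int = 100 * 1024 * 1024) -> list[dict]:
    """Parse a JSONL dead letter file into events ready for re-delivery."""
    p = Path(path)
    if p.stat().st_size > max_bytes:
        raise RuntimeError(f"{path} exceeds {max_bytes} bytes; rotate before replaying")
    events = []
    for line in p.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            events.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # skip truncated/corrupt lines rather than abort the replay
    return events
```

Each returned event can then be POSTed to your SIEM endpoint the same way the shell loop above does, with whatever batching and pacing your ingestion limits require.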
## See also

- SIEM integration admin reference — OCSF schema, connector comparison, admin UI
- Splunk HEC connector — Detailed Outpost direct sink for Splunk
- Microsoft Sentinel connector — DCR-based Sentinel integration
- Outpost SIEM direct sink — Ring buffer, monitoring, and dead letter behavior
- Audit log export — Export audit records with HMAC signature for offline analysis
- Audit log verification — HMAC chain tamper detection