
SIEM integration guide

Arbitex can forward audit events to your SIEM via two independent delivery paths: Platform connectors (cloud-side, OCSF-formatted, managed) and Outpost direct sink (edge-side, raw JSON, air-gap capable). This guide explains both paths, covers Splunk HEC, Microsoft Sentinel, and syslog configuration in detail, documents the CEF format mapping for ArcSight-compatible systems, and provides verification procedures.


| Factor | Platform connectors | Outpost direct sink |
| --- | --- | --- |
| Deployment | Cloud (SaaS) or Hybrid | Hybrid Outpost only |
| Event format | OCSF v1.1 (structured, SIEM-native) | Raw JSON (Arbitex native) |
| Air-gap compatible | No | Yes |
| Data residency | Events transit Arbitex Cloud | Events never leave your network |
| Configuration | Environment variables on the Platform | Environment variables on the Outpost |
| Connectors | Splunk HEC, Sentinel, Elastic, Datadog, Sumo Logic | Splunk HEC, syslog (RFC 5424) |
| Batching | Up to 100 events / 5 seconds | Ring buffer, drain loop |
| Retry + dead letter | Exponential backoff, JSONL dead letter | Dead letter file at configured path |

Use Platform connectors when your Outpost can reach Arbitex Cloud, you want OCSF-formatted events (no custom parsing), and you need one of the five supported connectors.

Use the Outpost direct sink when your security policy prohibits audit data from transiting Arbitex Cloud, you are operating in air-gapped mode, or you need sub-second latency between event generation and SIEM ingestion.

Both paths can run simultaneously. When running in parallel, the Platform relay delivers OCSF-formatted events and the Outpost direct sink delivers raw JSON. Use separate indexes or source types to avoid schema conflicts between the two streams.


Platform connectors are configured via environment variables on the Arbitex Platform (or your Helm values / Kubernetes secrets). All connectors emit events in OCSF v1.1 format. See SIEM integration for the complete OCSF schema reference and connector list.

For the Splunk HEC connector, set these required environment variables:

| Variable | Description |
| --- | --- |
| SPLUNK_HEC_URL | HEC endpoint, e.g. https://splunk.example.com:8088/services/collector |
| SPLUNK_HEC_TOKEN | HEC authentication token |

Optional:

| Variable | Default | Description |
| --- | --- | --- |
| SPLUNK_HEC_INDEX | arbitex | Target Splunk index |
| SPLUNK_HEC_SOURCE | arbitex:audit | Event source name |
| SPLUNK_HEC_BATCH_SIZE | 100 | Events per batch |
| SPLUNK_HEC_FLUSH_INTERVAL | 5 | Seconds between flushes |
| SPLUNK_HEC_MAX_RETRIES | 3 | Retry attempts on failure |
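The batch size and flush interval interact: a batch is sent when either threshold is hit. As a minimal sketch of this buffering logic (illustrative only, not the Arbitex implementation; the `HecBatcher` name and callback shape are ours):

```python
import json
import time

class HecBatcher:
    """Buffers events and flushes when either the batch size or the
    flush interval is reached, mirroring SPLUNK_HEC_BATCH_SIZE /
    SPLUNK_HEC_FLUSH_INTERVAL semantics."""

    def __init__(self, send, batch_size=100, flush_interval=5.0):
        self.send = send                  # callable taking the HEC request body
        self.batch_size = batch_size
        self.flush_interval = flush_interval
        self.buffer = []
        self.last_flush = time.monotonic()

    def add(self, event):
        self.buffer.append(event)
        if (len(self.buffer) >= self.batch_size
                or time.monotonic() - self.last_flush >= self.flush_interval):
            self.flush()

    def flush(self):
        if self.buffer:
            # Splunk HEC accepts multiple event envelopes concatenated
            # in a single POST body
            body = "\n".join(
                json.dumps({"event": e, "sourcetype": "arbitex:ocsf"})
                for e in self.buffer
            )
            self.send(body)
            self.buffer = []
        self.last_flush = time.monotonic()
```

A higher flush interval trades ingestion latency for fewer HTTP requests; the defaults (100 events / 5 seconds) favor throughput.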

Splunk prerequisites:

  1. Enable HTTP Event Collector: Settings → Data Inputs → HTTP Event Collector → Global Settings → All Tokens: Enabled
  2. Create an HEC token with source type arbitex:ocsf
  3. Ensure the target index exists and the token has write access

Verifying connectivity:

# List connector status
curl https://api.arbitex.ai/api/admin/siem/connectors \
  -H "Authorization: Bearer <admin_api_key>"

# Send a test event to Splunk HEC
curl -X POST https://api.arbitex.ai/api/admin/siem/test/splunk_hec \
  -H "Authorization: Bearer <admin_api_key>"

In Splunk, confirm with:

index=arbitex sourcetype="arbitex:ocsf"
| spath output=action path=api.operation
| search action="siem_test_event"
| table _time, action

For the Microsoft Sentinel connector, set these required environment variables:

| Variable | Description |
| --- | --- |
| SENTINEL_TENANT_ID | Azure AD tenant ID |
| SENTINEL_CLIENT_ID | Azure AD application (client) ID |
| SENTINEL_CLIENT_SECRET | Azure AD application client secret |
| SENTINEL_DCE_ENDPOINT | Data Collection Endpoint URL |
| SENTINEL_DCR_IMMUTABLE_ID | Data Collection Rule immutable ID |

Optional:

| Variable | Default | Description |
| --- | --- | --- |
| SENTINEL_STREAM_NAME | Custom-ArbitexOCSF_CL | DCR stream name |
| SENTINEL_BATCH_SIZE | 100 | Events per batch |
| SENTINEL_FLUSH_INTERVAL | 5 | Seconds between flushes |

Azure prerequisites:

  1. Create an Azure AD application registration; note tenant ID, client ID, and create a client secret
  2. Create a Data Collection Endpoint (DCE) in your Azure Monitor workspace
  3. Create a Data Collection Rule (DCR) with stream Custom-ArbitexOCSF_CL targeting a Log Analytics workspace
  4. Assign Monitoring Metrics Publisher role to your app registration on the DCR resource
  5. Note the DCE endpoint URL and DCR immutable ID
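The five environment variables map onto Azure's Logs Ingestion API flow: an OAuth2 client-credentials token is obtained for the `https://monitor.azure.com/.default` scope, then events are POSTed to the DCE at the DCR/stream path. The sketch below shows how the pieces fit together (helper names are ours, not Arbitex code; no network I/O is performed):

```python
from urllib.parse import urlencode

def token_request(tenant_id: str, client_id: str, client_secret: str):
    """Build the client-credentials token request for the Logs Ingestion API."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://monitor.azure.com/.default",
    })
    return url, body

def ingestion_url(dce_endpoint: str, dcr_immutable_id: str, stream_name: str):
    """Build the POST target for a batch of events (body is a JSON array)."""
    return (f"{dce_endpoint.rstrip('/')}/dataCollectionRules/"
            f"{dcr_immutable_id}/streams/{stream_name}"
            f"?api-version=2023-01-01")
```

This is why the Monitoring Metrics Publisher role assignment in step 4 matters: without it, the token is issued but the ingestion POST returns 403.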

Test the connection:

curl -X POST https://api.arbitex.ai/api/admin/siem/test/sentinel \
  -H "Authorization: Bearer <admin_api_key>"

To check all connectors at once, query the overall health summary:

# Overall SIEM health summary
curl https://api.arbitex.ai/api/admin/siem/health \
  -H "Authorization: Bearer <admin_api_key>"

Response:

{
  "healthy": 2,
  "degraded": 0,
  "error": 0,
  "not_configured": 4,
  "total": 6
}

Status meanings:

| Status | Description |
| --- | --- |
| healthy | Endpoint reachable, credentials valid |
| degraded | Endpoint reachable but returning unexpected responses |
| error | Connection failed or credentials invalid |
| not_configured | Required environment variable(s) not set |

The Outpost direct sink delivers events from the Outpost process directly to Splunk HEC without routing through Arbitex Cloud.

SIEM_DIRECT_ENABLED=true
SIEM_DIRECT_TYPE=splunk_hec
SIEM_DIRECT_URL=https://splunk.corp.example.com:8088
SIEM_DIRECT_TOKEN=a1b2c3d4-e5f6-7890-abcd-ef1234567890
SIEM_DIRECT_BUFFER_CAPACITY=10000
SIEM_DIRECT_DEAD_LETTER_PATH=/var/log/arbitex/siem-dead-letter.jsonl

The sink appends /services/collector/event to SIEM_DIRECT_URL automatically. Each event is a single HTTP POST with this body:

{
  "event": { "...audit event fields..." },
  "time": 1741478400.123,
  "sourcetype": "arbitex:audit",
  "source": "arbitex-outpost"
}

Events delivered via the direct sink are raw JSON, not OCSF-formatted. If you are ingesting into the same Splunk index as Platform SIEM events, use a distinct source type or index to avoid schema conflicts.

For full configuration reference and ring buffer / dead letter behavior, see Outpost SIEM direct sink.
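The ring-buffer behavior referenced above can be illustrated with a short sketch: when the buffer is full, the oldest event is dropped so that audit delivery never blocks the request path, and a drain loop ships events in FIFO order. This is illustrative only (the `RingBuffer` name and API are ours, not the Outpost internals):

```python
from collections import deque

class RingBuffer:
    """Bounded event buffer mirroring SIEM_DIRECT_BUFFER_CAPACITY
    semantics: when full, the oldest event is dropped."""

    def __init__(self, capacity: int):
        self.events = deque(maxlen=capacity)  # deque drops from the left when full
        self.dropped = 0

    def push(self, event) -> None:
        if len(self.events) == self.events.maxlen:
            self.dropped += 1  # count overwrites so loss is observable
        self.events.append(event)

    def drain(self, batch_size: int = 100) -> list:
        """Pop up to batch_size of the oldest buffered events (FIFO)."""
        batch = []
        while self.events and len(batch) < batch_size:
            batch.append(self.events.popleft())
        return batch
```

Sizing the capacity is a trade-off: a larger buffer tolerates longer SIEM outages before events spill to the dead letter file, at the cost of memory on the Outpost host.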


The Outpost can deliver audit events over syslog to any RFC 5424-compatible receiver.

SIEM_DIRECT_ENABLED=true
SIEM_DIRECT_TYPE=syslog
SIEM_DIRECT_URL=udp://syslog.corp.example.com:514
SIEM_DIRECT_DEAD_LETTER_PATH=/var/log/arbitex/siem-dead-letter.jsonl

Use tcp:// for TCP syslog delivery (one connection per event):

SIEM_DIRECT_URL=tcp://syslog.corp.example.com:514

Each event is formatted as a single RFC 5424 syslog message:

<134>1 {timestamp} - arbitex-outpost {event_id} - - {json_body}
| Field | Value |
| --- | --- |
| PRI | 134 (facility local0 = 16, severity informational = 6; 16×8+6 = 134) |
| VERSION | 1 |
| HOSTNAME | - (nil value) |
| APP-NAME | arbitex-outpost |
| PROCID | Event ID from the audit record |
| MSGID | - (nil value) |
| STRUCTURED-DATA | - (nil value) |
| MSG | Full audit event serialized as JSON |

Example message:

<134>1 2026-03-11T08:30:00.000Z - arbitex-outpost req_01jnx4 - - {"request_id":"req_01jnx4","timestamp":"2026-03-11T08:30:00.000Z","user_id":"usr_abc","action":"chat_completion","model_id":"claude-sonnet-4-6","provider":"anthropic","token_count_input":312,"token_count_output":847}
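The layout above can be reproduced with a few lines of code. A minimal sketch, assuming the event carries the `request_id` and `timestamp` fields shown in the example (the `format_syslog` helper is ours, for illustration):

```python
import json

# facility local0 (16) * 8 + severity informational (6) = 134
PRI = 16 * 8 + 6

def format_syslog(event: dict) -> str:
    """Render an audit event in the RFC 5424 layout described above:
    <PRI>VERSION TIMESTAMP - APP-NAME PROCID - - MSG."""
    return (f"<{PRI}>1 {event['timestamp']} - arbitex-outpost "
            f"{event['request_id']} - - "
            f"{json.dumps(event, separators=(',', ':'))}")
```

Note the two nil (`-`) fields between PROCID and the JSON body: MSGID and STRUCTURED-DATA, per the table above.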

UDP vs TCP:

  • UDP: Single datagram per event, no delivery confirmation. Datagrams may be silently dropped if the receiver is unavailable or if the datagram exceeds the network MTU. Use when your syslog infrastructure is UDP-native and low-overhead delivery is acceptable.
  • TCP: New connection per event, terminated with a newline character. Delivery confirmed at the transport layer but higher per-event overhead. Connection or write errors cause the event to be dead-lettered.
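The UDP trade-off is easy to see in code: delivery is a single `sendto` with no acknowledgement, so a missing receiver is indistinguishable from success. A self-contained sketch using a loopback receiver (illustrative; not the Outpost implementation):

```python
import socket

def send_udp(message: str, host: str, port: int) -> None:
    """Fire-and-forget UDP delivery: one datagram per event, no
    confirmation, silently lost if nothing is listening."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), (host, port))

# Loopback demonstration: bind an ephemeral receiver, then send to it.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(5)
port = receiver.getsockname()[1]
send_udp('<134>1 2026-03-11T08:30:00.000Z - arbitex-outpost req_x - - {"a":1}',
         "127.0.0.1", port)
datagram, _ = receiver.recvfrom(65535)
receiver.close()
```

With TCP the same send would fail loudly on connect or write, which is what allows the Outpost to dead-letter the event instead of losing it silently.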

rsyslog — add a UDP input and forward to your SIEM:

/etc/rsyslog.conf
module(load="imudp")
input(type="imudp" port="514")

if $programname == 'arbitex-outpost' then {
  action(type="omfwd" target="siem.corp.example.com" port="514" protocol="tcp")
}

syslog-ng — accept from Outpost and forward:

source s_arbitex {
  udp(ip("0.0.0.0") port(514));
};

filter f_arbitex {
  program("arbitex-outpost");
};

destination d_siem {
  tcp("siem.corp.example.com" port(514));
};

log {
  source(s_arbitex);
  filter(f_arbitex);
  destination(d_siem);
};

Common Event Format (CEF) is a structured syslog message format used by ArcSight, IBM QRadar, and other SIEMs that expect standardized field names. Arbitex does not emit CEF natively, but the audit event fields map directly to CEF headers and extensions.

CEF message structure:

CEF:0|Vendor|Product|Version|DeviceEventClassId|Name|Severity|Extensions

Use a syslog pipeline processor (Logstash, Fluentd, or a proprietary transformer) to convert OCSF events to CEF before ingestion.

CEF header mapping:

| CEF Field | Source | Value |
| --- | --- | --- |
| Version | constant | 0 |
| DeviceVendor | metadata.product.vendor_name | Arbitex |
| DeviceProduct | metadata.product.name | Arbitex |
| DeviceVersion | metadata.version | 1.1.0 |
| DeviceEventClassId | class_uid | e.g., 6003 |
| Name | class_name + api.operation | e.g., Api Activity: prompt_sent |
| Severity | severity_id × 2 (CEF 0–10 scale) | 2 for Informational (severity_id = 1) |

CEF extension field mapping:

| CEF Extension | OCSF Source Field | Notes |
| --- | --- | --- |
| rt | time | Millisecond epoch (OCSF) → CEF receipt time |
| suser | actor.user.uid | User ID of the actor |
| src | src_endpoint.ip | Source IP address |
| cs1 | actor.user.org_uid | Tenant/org ID (custom string field) |
| cs1Label | constant | tenantId |
| cs2 | api.service.uid | Model identifier |
| cs2Label | constant | modelId |
| cs3 | api.service.name | Provider name |
| cs3Label | constant | provider |
| cn1 | unmapped.token_count_input | Input token count |
| cn1Label | constant | tokenCountInput |
| cn2 | unmapped.token_count_output | Output token count |
| cn2Label | constant | tokenCountOutput |
| cn3 | unmapped.latency_ms | Request latency (ms) |
| cn3Label | constant | latencyMs |
| act | api.operation | Action performed |
| outcome | severity | Outcome classification |

Example CEF output for a chat completion event:

CEF:0|Arbitex|Arbitex|1.1.0|6003|Api Activity: prompt_sent|2|rt=1741564800000 suser=usr_01HZ_ALICE src=198.51.100.42 cs1=org_acme cs1Label=tenantId cs2=claude-sonnet-4-6 cs2Label=modelId cs3=anthropic cs3Label=provider cn1=312 cn1Label=tokenCountInput cn2=847 cn2Label=tokenCountOutput cn3=1840 cn3Label=latencyMs act=prompt_sent outcome=Informational
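If you build this transformation yourself rather than using the Logstash filter below, remember CEF's escaping rules: pipe and backslash must be escaped in the header, equals and backslash in extension values. A hedged sketch covering the severity clamp and a few representative extensions (function names are ours; extend the `exts` dict per the mapping table above):

```python
def cef_escape_header(value: str) -> str:
    # In the CEF prefix, backslash and pipe must be escaped
    return value.replace("\\", "\\\\").replace("|", "\\|")

def cef_escape_ext(value) -> str:
    # In extensions, backslash and equals must be escaped; no raw newlines
    return str(value).replace("\\", "\\\\").replace("=", "\\=").replace("\n", "\\n")

def cef_severity(severity_id: int) -> int:
    # severity_id × 2, clamped to the CEF 0-10 scale
    return max(0, min(severity_id * 2, 10))

def build_cef(ocsf: dict) -> str:
    header = "|".join([
        "CEF:0", "Arbitex", "Arbitex",
        cef_escape_header(ocsf["metadata"]["version"]),
        str(ocsf["class_uid"]),
        cef_escape_header(f"{ocsf['class_name']}: {ocsf['api']['operation']}"),
        str(cef_severity(ocsf["severity_id"])),
    ])
    exts = {
        "rt": ocsf["time"],
        "suser": ocsf["actor"]["user"]["uid"],
        "act": ocsf["api"]["operation"],
    }
    return header + "|" + " ".join(f"{k}={cef_escape_ext(v)}" for k, v in exts.items())
```

The clamp matters for high-severity OCSF events: severity_id 6 (Fatal) maps to 12 before clamping, so it lands at the CEF maximum of 10.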

Logstash filter for OCSF-to-CEF conversion

filter {
  if [class_uid] {
    # Map severity_id to CEF severity (0-10 scale)
    ruby {
      code => '
        severity_id = event.get("[severity_id]") || 1
        cef_severity = [[severity_id * 2, 10].min, 0].max
        event.set("[cef_severity]", cef_severity)
      '
    }
    mutate {
      add_field => {
        "cef_header" => "CEF:0|Arbitex|Arbitex|%{[metadata][version]}|%{[class_uid]}|%{[class_name]}: %{[api][operation]}|%{[cef_severity]}"
        "cef_extensions" => "rt=%{[time]} suser=%{[actor][user][uid]} src=%{[src_endpoint][ip]} cs1=%{[actor][user][org_uid]} cs1Label=tenantId cs2=%{[api][service][uid]} cs2Label=modelId cs3=%{[api][service][name]} cs3Label=provider act=%{[api][operation]}"
      }
    }
  }
}

ArcSight SmartConnectors can ingest CEF-formatted syslog messages directly. Configure a Syslog Daemon SmartConnector pointed at the port where your syslog pipeline emits the transformed CEF stream. Map cs1 (tenantId), cs2 (modelId), and act to ArcSight Active Channel columns for dashboard filtering.


When a connector fails after all retry attempts, events are written to a JSONL dead letter file. Each line is a complete JSON object (OCSF event for Platform connectors, raw audit event for the Outpost syslog sink).

# Replay Splunk dead letter after SIEM is restored
jq -c '.event' /var/log/arbitex/siem-dead-letter.jsonl | while read -r event; do
  curl -s -X POST "$SPLUNK_HEC_URL" \
    -H "Authorization: Splunk $SIEM_DIRECT_TOKEN" \
    -H "Content-Type: application/json" \
    -d "{\"sourcetype\": \"arbitex:audit\", \"index\": \"arbitex\", \"event\": $event}"
done

The dead letter file has no automatic size cap. Monitor disk usage and set up rotation if the SIEM endpoint is unavailable for an extended period.
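One way to set up that rotation is a logrotate rule. A sketch, assuming the path from the configuration above; size and rotate counts are placeholders to adapt to your retention policy, and `copytruncate` is used so the Outpost can keep appending to the same file descriptor (note its small window in which events written during the copy can be lost, and that rotated segments must be replayed before they age out):

```
/var/log/arbitex/siem-dead-letter.jsonl {
    size 100M
    rotate 10
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
```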