
Chat completions

POST /v1/chat/completions is the primary inference endpoint. It accepts requests in the standard OpenAI chat completions format and returns responses in the same format, with Arbitex-specific extensions for policy headers and DLP streaming events.


```http
POST https://api.arbitex.ai/v1/chat/completions
Content-Type: application/json
Authorization: Bearer arb_live_your-api-key-here
```

```mermaid
sequenceDiagram
    participant Client as Client App
    participant GW as Arbitex Gateway
    participant DLP as DLP Pipeline
    participant PE as Policy Engine
    participant LLM as LLM Provider
    Client->>GW: POST /v1/chat/completions
    GW->>DLP: Scan input (Tier 1-3)
    DLP-->>GW: Findings (entities, confidence)
    GW->>PE: Evaluate policy chain
    alt BLOCK
        PE-->>GW: BLOCK action
        GW-->>Client: 403 + X-Policy-Action: BLOCK
    else REDACT
        PE-->>GW: REDACT action
        GW->>GW: Apply redaction transforms
        GW->>LLM: Forward redacted prompt
    else ALLOW / ROUTE_TO
        PE-->>GW: ALLOW or ROUTE_TO
        GW->>LLM: Forward prompt
    end
    LLM-->>GW: Response (stream/batch)
    GW->>DLP: Scan output (if applies_to=output|both)
    GW-->>Client: Response + X-Policy-Action header
```

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `model` | string | Yes | Provider and model identifier in the format `provider/model-id`. Example: `anthropic/claude-sonnet-4-20250514`. Omitting the provider prefix triggers automatic resolution against your configured provider list. |
| `messages` | array | Yes | Ordered list of message objects. Each message has a `role` (`system`, `user`, or `assistant`) and `content` (string or content parts array). |
| `stream` | boolean | No | If `true`, the response is returned as a server-sent events (SSE) stream. Default: `false`. |
| `temperature` | float | No | Sampling temperature, 0.0–2.0. Higher values produce more varied output. Forwarded directly to the upstream provider; the gateway does not modify this value. |
| `max_tokens` | integer | No | Maximum number of tokens to generate. If your organization has a token budget policy, the effective limit is the lower of this value and the budget limit. |
| `top_p` | float | No | Nucleus sampling parameter. Forwarded to the upstream provider. |
| `stop` | string or string[] | No | Stop sequences. Forwarded to the upstream provider. |
| `n` | integer | No | Number of completions to generate. Forwarded to the upstream provider. |
| `user` | string | No | An optional end-user identifier that you can pass for your own logging purposes. Forwarded to the upstream provider and also recorded in the gateway audit log. |

Fields not listed here that are valid in the OpenAI API are forwarded to the upstream provider unchanged if the provider supports them. Unknown fields are silently dropped.


The response is in the standard OpenAI chat completions format. Arbitex adds two response headers and, in DLP redaction cases, modifies the response body.

```json
{
  "id": "chatcmpl-01HZ8X9K2P3QR4ST5UV6WX7YZ",
  "object": "chat.completion",
  "created": 1741442321,
  "model": "claude-sonnet-4-20250514",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here is a summary of the key provisions..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 148,
    "total_tokens": 172
  }
}
```

The `model` field in the response reflects the model that actually served the request. If a `ROUTE_TO` policy rule rerouted the request, this field carries the model the request was routed to, not the one in the original request.
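If your application needs to detect that a reroute happened, compare the served model against the one you requested. A small sketch (the helper name is ours, not part of the API); it strips the `provider/` prefix because the response `model` field carries the bare model id:

```python
def was_rerouted(requested_model: str, served_model: str) -> bool:
    """True if the model that served the request differs from the one
    requested. Requests use provider/model-id; the response model field
    is the bare model id, so strip the provider prefix before comparing."""
    bare_requested = requested_model.rpartition("/")[2]
    return served_model != bare_requested
```

For example, `was_rerouted("anthropic/claude-sonnet-4-20250514", response.model)` is `False` when the requested model served the call.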

Every response includes policy decision metadata:

| Header | Description |
| --- | --- |
| `X-Policy-Action` | The terminal action taken by the Policy Engine: `ALLOW`, `BLOCK`, `CANCEL`, `REDACT`, or `ROUTE_TO`. Always present. |
| `X-Matched-Rule` | The ID of the specific rule that matched. Omitted when the action is `ALLOW`. |
| `X-Request-ID` | The unique request identifier, matching `request_id` in the audit log. |
| `X-RateLimit-Limit` | Maximum requests permitted in the current rate limit window. |
| `X-RateLimit-Remaining` | Requests remaining in the current window. |
| `X-RateLimit-Reset` | Unix timestamp when the current window resets. |
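These are plain HTTP response headers, so any client can read them. A minimal sketch of pulling the policy metadata out of a response's header mapping (the helper is illustrative; note that a plain dict lookup is case-sensitive, whereas real HTTP header access is usually case-insensitive):

```python
def policy_metadata(headers):
    """Extract Arbitex policy headers from a response header mapping.
    X-Matched-Rule is omitted by the gateway when the action is ALLOW."""
    meta = {
        "action": headers.get("X-Policy-Action"),
        "matched_rule": headers.get("X-Matched-Rule"),  # None on ALLOW
        "request_id": headers.get("X-Request-ID"),
    }
    if "X-RateLimit-Remaining" in headers:
        meta["ratelimit_remaining"] = int(headers["X-RateLimit-Remaining"])
    return meta
```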

When the Policy Engine applies a REDACT action to the model’s output, the response body is modified before it is delivered to your application. Sensitive content in `choices[].message.content` is replaced with a placeholder string.

The default placeholder is `[REDACTED]`. Per-rule placeholders can be configured with custom strings such as `[CC-REMOVED]` or `[PHI-REDACTED]`.

Example — output with redacted content:

```json
{
  "id": "chatcmpl-01HZ8X9K2P3QR4ST5UV6WX7YZ",
  "object": "chat.completion",
  "created": 1741442321,
  "model": "claude-sonnet-4-20250514",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The patient's name is [REDACTED] and their SSN is [REDACTED]. The treatment plan recommends..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 148,
    "total_tokens": 172
  }
}
```

When output redaction has occurred, the response includes the header `X-Policy-Action: REDACT`.

Your application should treat `[REDACTED]` spans as data that was present in the model’s output but has been removed before delivery. Do not attempt to reconstruct or infer the original content. The audit log records which entity types were detected and redacted — query the audit log with the `X-Request-ID` value if you need to review what was found.
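If you want to treat redacted spans specially (for example, to highlight them in a UI), you can locate the placeholders in the delivered content. A sketch assuming the default `[REDACTED]` placeholder plus hyphenated custom placeholders like `[CC-REMOVED]`; the pattern is an assumption, so tune it to the placeholders your rules actually configure:

```python
import re

# Matches [REDACTED] and hyphenated uppercase placeholders such as
# [PHI-REDACTED]. Assumption: it may over-match ordinary bracketed
# uppercase text, so adjust it to your configured placeholder strings.
PLACEHOLDER = re.compile(r"\[(?:REDACTED|[A-Z]+(?:-[A-Z]+)+)\]")

def redacted_spans(content: str):
    """Return (start, end) character offsets of redaction placeholders."""
    return [m.span() for m in PLACEHOLDER.finditer(content)]
```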


```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.arbitex.ai/v1",
    api_key="arb_live_your-api-key-here",
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the key provisions of SOX Section 404."},
    ],
    temperature=0.3,
    max_tokens=512,
)

print(response.choices[0].message.content)
```

The OpenAI SDK sends the `Authorization: Bearer` header automatically. No other SDK configuration changes are needed.

```shell
curl https://api.arbitex.ai/v1/chat/completions \
  -H "Authorization: Bearer arb_live_your-api-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-20250514",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Summarize the key provisions of SOX Section 404."}
    ],
    "temperature": 0.3,
    "max_tokens": 512
  }'
```

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.arbitex.ai/v1",
  apiKey: "arb_live_your-api-key-here",
});

const response = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-20250514",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Summarize the key provisions of SOX Section 404." },
  ],
  temperature: 0.3,
  max_tokens: 512,
});

console.log(response.choices[0].message.content);
```

Set `"stream": true` to receive the response as a server-sent events (SSE) stream. The stream follows the standard OpenAI streaming format — each event is a `data:` line containing a JSON delta object, terminated by `data: [DONE]`.

```shell
curl https://api.arbitex.ai/v1/chat/completions \
  -H "Authorization: Bearer arb_live_your-api-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-20250514",
    "messages": [{"role": "user", "content": "Write a haiku about audit logs."}],
    "stream": true
  }'
```

```
data: {"id":"chatcmpl-01HZ","object":"chat.completion.chunk","created":1741442321,"model":"claude-sonnet-4-20250514","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-01HZ","object":"chat.completion.chunk","created":1741442321,"model":"claude-sonnet-4-20250514","choices":[{"index":0,"delta":{"content":"Every"},"finish_reason":null}]}

data: {"id":"chatcmpl-01HZ","object":"chat.completion.chunk","created":1741442321,"model":"claude-sonnet-4-20250514","choices":[{"index":0,"delta":{"content":" request"},"finish_reason":null}]}

...

data: [DONE]
```
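The chunks above are assembled client-side by concatenating each delta's `content`. A minimal sketch in plain Python (no SDK) that folds raw `data:` lines into the final message, stopping at the `[DONE]` sentinel:

```python
import json

def assemble_stream(sse_lines):
    """Assemble the assistant message from raw SSE lines in the standard
    OpenAI streaming format. Skips non-data lines (blank keep-alives,
    event: lines) and stops at the [DONE] sentinel."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            parts.append(delta["content"])
    return "".join(parts)
```

Run against the three chunks shown above, this yields the string `"Every request"`.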

Arbitex adds two event types that do not exist in the standard OpenAI streaming format. These events appear inline in the SSE stream before [DONE].

dlp_correction

Emitted when the Policy Engine applies output redaction to a content chunk that has already been streamed. This can occur when the contextual validator (DLP Tier 3) determines, after a delay, that content already delivered needs to be flagged.

```
event: dlp_correction
data: {"request_id":"req_01HZ8X9K2P3QR4ST5UV6WX7YZ","entity_type":"credit_card","replacement":"[REDACTED]","offset":142,"length":19}
```

| Field | Description |
| --- | --- |
| `request_id` | The request identifier, matching `X-Request-ID`. |
| `entity_type` | The DLP entity type that was detected. |
| `replacement` | The placeholder string that replaced the sensitive content. |
| `offset` | Character offset in the full assembled response where the replacement begins. |
| `length` | Length (in characters) of the replaced span. |

When your application receives a dlp_correction event, it should:

  1. Replace the characters at [offset, offset + length) in the assembled response with the replacement string
  2. If you have already rendered the original content to an end user, update the display accordingly
  3. Log the correction for audit purposes — the full correction record is also in the gateway audit log
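Step 1 above is a straightforward string splice against the full assembled response (the offsets are relative to it). A minimal sketch:

```python
def apply_dlp_correction(assembled: str, correction: dict) -> str:
    """Replace the span at [offset, offset + length) in the assembled
    response text with the placeholder from a dlp_correction event."""
    start = correction["offset"]
    end = start + correction["length"]
    return assembled[:start] + correction["replacement"] + assembled[end:]
```

If several corrections arrive for one response, applying them in descending offset order keeps earlier splices from shifting the offsets of later ones.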

This event is rare in practice. It occurs only when output scanning detects high-confidence sensitive data in an already-streamed chunk. The gateway attempts to buffer output scanning results before streaming content, but the contextual validator runs asynchronously and corrections may arrive after delivery.

output_blocked

Emitted when the Policy Engine determines mid-stream that the model’s output violates a policy rule and the stream must be terminated.

```
event: output_blocked
data: {"request_id":"req_01HZ8X9K2P3QR4ST5UV6WX7YZ","rule_id":"rule_01HZ_BLOCK_PHI_OUTPUT","message":"Output contains protected health information and cannot be delivered."}
```

| Field | Description |
| --- | --- |
| `request_id` | The request identifier. |
| `rule_id` | The ID of the policy rule that triggered the block. |
| `message` | The human-readable block message configured on the rule (if any). |

After `output_blocked`, the stream terminates. `[DONE]` is not emitted. Your application should treat the accumulated response up to this event as incomplete and discard it. Rendering partial blocked output to end users is not recommended.
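A consumer loop that honors both custom events might look like the sketch below. It takes (event_type, payload) pairs, with event_type `None` for ordinary `chat.completion.chunk` data; producing those pairs from the raw SSE bytes is left to your transport layer, and the function shape is ours, not part of the API.

```python
def consume_arbitex_stream(events):
    """Fold an Arbitex SSE stream into a final result. On output_blocked,
    the accumulated text is discarded per the termination semantics; a
    dlp_correction splices its replacement into the text gathered so far."""
    parts = []
    for event_type, payload in events:
        if event_type == "output_blocked":
            return {"content": None, "blocked_by": payload["rule_id"]}
        if event_type == "dlp_correction":
            text = "".join(parts)
            start = payload["offset"]
            end = start + payload["length"]
            parts = [text[:start] + payload["replacement"] + text[end:]]
        elif event_type is None:
            delta = payload["choices"][0]["delta"]
            if delta.get("content"):
                parts.append(delta["content"])
    return {"content": "".join(parts), "blocked_by": None}
```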


  • API reference overview — authentication, rate limiting, and error codes
  • Policy Engine overview — how BLOCK, REDACT, and ROUTE_TO actions are determined
  • DLP Overview — how the 3-tier pipeline produces the findings that drive output events
  • Audit log — querying the per-request audit record, including DLP findings and policy decisions