# Chat completions
POST /v1/chat/completions is the primary inference endpoint. It accepts requests in the standard OpenAI chat completions format and returns responses in the same format, with Arbitex-specific extensions for policy headers and DLP streaming events.
## Endpoint

```http
POST https://api.arbitex.ai/v1/chat/completions
Content-Type: application/json
Authorization: Bearer arb_live_your-api-key-here
```

```mermaid
sequenceDiagram
    participant Client as Client App
    participant GW as Arbitex Gateway
    participant DLP as DLP Pipeline
    participant PE as Policy Engine
    participant LLM as LLM Provider

    Client->>GW: POST /v1/chat/completions
    GW->>DLP: Scan input (Tier 1-3)
    DLP-->>GW: Findings (entities, confidence)
    GW->>PE: Evaluate policy chain
    alt BLOCK
        PE-->>GW: BLOCK action
        GW-->>Client: 403 + X-Policy-Action: BLOCK
    else REDACT
        PE-->>GW: REDACT action
        GW->>GW: Apply redaction transforms
        GW->>LLM: Forward redacted prompt
    else ALLOW / ROUTE_TO
        PE-->>GW: ALLOW or ROUTE_TO
        GW->>LLM: Forward prompt
    end
    LLM-->>GW: Response (stream/batch)
    GW->>DLP: Scan output (if applies_to=output|both)
    GW-->>Client: Response + X-Policy-Action header
```
## Request body

| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Provider and model identifier in the format `provider/model-id`. Example: `anthropic/claude-sonnet-4-20250514`. Omitting the provider prefix triggers automatic resolution against your configured provider list. |
| `messages` | array | Yes | Ordered list of message objects. Each message has a `role` (`system`, `user`, or `assistant`) and `content` (a string or an array of content parts). |
| `stream` | boolean | No | If `true`, the response is returned as a server-sent events (SSE) stream. Default: `false`. |
| `temperature` | float | No | Sampling temperature, 0.0–2.0. Higher values produce more varied output. Forwarded directly to the upstream provider; the gateway does not modify this value. |
| `max_tokens` | integer | No | Maximum number of tokens to generate. If your organization has a token budget policy, the effective limit is the lower of this value and the budget limit. |
| `top_p` | float | No | Nucleus sampling parameter. Forwarded to the upstream provider. |
| `stop` | string or string[] | No | Stop sequences. Forwarded to the upstream provider. |
| `n` | integer | No | Number of completions to generate. Forwarded to the upstream provider. |
| `user` | string | No | An optional end-user identifier you can pass for your own logging purposes. Forwarded to the upstream provider and also recorded in the gateway audit log. |
Fields not listed here that are valid in the OpenAI API are forwarded to the upstream provider unchanged, provided the provider supports them. Fields the gateway does not recognize are silently dropped.
## Response

The response is in the standard OpenAI chat completions format. Arbitex adds two response headers and, in DLP redaction cases, modifies the response body.
### Response body (non-streaming)

```json
{
  "id": "chatcmpl-01HZ8X9K2P3QR4ST5UV6WX7YZ",
  "object": "chat.completion",
  "created": 1741442321,
  "model": "claude-sonnet-4-20250514",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here is a summary of the key provisions..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 148,
    "total_tokens": 172
  }
}
```

The `model` field in the response reflects the model that actually served the request: if a `ROUTE_TO` policy rule rerouted the request, it names the model that was used, not the one in the original request.
### Response headers

Every response includes policy decision metadata:

| Header | Description |
|---|---|
| `X-Policy-Action` | The terminal action taken by the Policy Engine: `ALLOW`, `BLOCK`, `CANCEL`, `REDACT`, or `ROUTE_TO`. Always present. |
| `X-Matched-Rule` | The ID of the specific rule that matched. Omitted when the action is `ALLOW`. |
| `X-Request-ID` | The unique request identifier, matching `request_id` in the audit log. |
| `X-RateLimit-Limit` | Maximum requests permitted in the current rate limit window. |
| `X-RateLimit-Remaining` | Requests remaining in the current window. |
| `X-RateLimit-Reset` | Unix timestamp at which the current window resets. |
### DLP redaction in responses

When the Policy Engine applies a `REDACT` action to the model's output, the response body is modified before it is delivered to your application. Sensitive content in `choices[].message.content` is replaced with a placeholder string.

The default placeholder is `[REDACTED]`. Per-rule placeholders can be configured with custom strings such as `[CC-REMOVED]` or `[PHI-REDACTED]`.
Example — output with redacted content:

```json
{
  "id": "chatcmpl-01HZ8X9K2P3QR4ST5UV6WX7YZ",
  "object": "chat.completion",
  "created": 1741442321,
  "model": "claude-sonnet-4-20250514",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The patient's name is [REDACTED] and their SSN is [REDACTED]. The treatment plan recommends..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 148,
    "total_tokens": 172
  }
}
```

The response header `X-Policy-Action: REDACT` is set when output redaction occurred.
Your application should treat [REDACTED] spans as data that was present in the model’s output but has been removed before delivery. Do not attempt to reconstruct or infer the original content. The audit log records what entity types were detected and redacted — query the audit log with the X-Request-ID value if you need to review what was found.
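One way to surface redactions to your own logging is to check the header and collect the placeholder spans. A sketch under assumptions: the regex below matches the default placeholder and the bracketed uppercase pattern of the custom examples shown above, and would need extending for other org-configured placeholder strings.

```python
import re

# Matches [REDACTED] plus custom placeholders of the documented shape,
# e.g. [CC-REMOVED] or [PHI-REDACTED]. Assumption: placeholders are
# bracketed uppercase tokens; adjust for your org's configuration.
PLACEHOLDER = re.compile(r"\[[A-Z]+(?:-[A-Z]+)*\]")


def redaction_summary(headers: dict, content: str) -> tuple[bool, list[str]]:
    """Return (was_redacted, placeholder_spans) for one response body."""
    was_redacted = headers.get("X-Policy-Action") == "REDACT"
    spans = PLACEHOLDER.findall(content) if was_redacted else []
    return was_redacted, spans
```

Pair the `X-Request-ID` header with the number of spans found when writing your own audit trail, then query the gateway audit log for the detected entity types.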
## Code examples

### Python (OpenAI SDK)

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.arbitex.ai/v1",
    api_key="arb_live_your-api-key-here",
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the key provisions of SOX Section 404."},
    ],
    temperature=0.3,
    max_tokens=512,
)

print(response.choices[0].message.content)
```

The OpenAI SDK sends the `Authorization: Bearer` header automatically. No other SDK configuration changes are needed.
```sh
curl https://api.arbitex.ai/v1/chat/completions \
  -H "Authorization: Bearer arb_live_your-api-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-20250514",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Summarize the key provisions of SOX Section 404."}
    ],
    "temperature": 0.3,
    "max_tokens": 512
  }'
```
### Node.js (OpenAI SDK)

```ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.arbitex.ai/v1",
  apiKey: "arb_live_your-api-key-here",
});

const response = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-20250514",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Summarize the key provisions of SOX Section 404." },
  ],
  temperature: 0.3,
  max_tokens: 512,
});

console.log(response.choices[0].message.content);
```
## Streaming

Set `"stream": true` to receive the response as a server-sent events (SSE) stream. The stream follows the standard OpenAI streaming format — each event is a `data:` line containing a JSON delta object, terminated by `data: [DONE]`.
### Standard stream format

```sh
curl https://api.arbitex.ai/v1/chat/completions \
  -H "Authorization: Bearer arb_live_your-api-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-20250514",
    "messages": [{"role": "user", "content": "Write a haiku about audit logs."}],
    "stream": true
  }'
```

```
data: {"id":"chatcmpl-01HZ","object":"chat.completion.chunk","created":1741442321,"model":"claude-sonnet-4-20250514","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-01HZ","object":"chat.completion.chunk","created":1741442321,"model":"claude-sonnet-4-20250514","choices":[{"index":0,"delta":{"content":"Every"},"finish_reason":null}]}

data: {"id":"chatcmpl-01HZ","object":"chat.completion.chunk","created":1741442321,"model":"claude-sonnet-4-20250514","choices":[{"index":0,"delta":{"content":" request"},"finish_reason":null}]}

...

data: [DONE]
```
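Assembling the assistant text from such a stream can be sketched as follows. This is a minimal illustration that consumes pre-split `data:` lines; a production client would use an SSE library and also watch for the Arbitex extension events.

```python
import json


def assemble_stream(sse_data_lines: list[str]) -> str:
    """Concatenate content deltas from chat.completion.chunk events,
    stopping at the data: [DONE] sentinel."""
    parts = []
    for line in sse_data_lines:
        if not line.startswith("data: "):
            continue  # ignore event: lines and blank keep-alives
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content") or "")
    return "".join(parts)
```

Feeding it the three sample chunks above followed by `data: [DONE]` yields the string `"Every request"`.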
## Arbitex streaming extension events

Arbitex adds two event types that do not exist in the standard OpenAI streaming format. These events appear inline in the SSE stream before `[DONE]`.
### dlp_correction

Emitted when the Policy Engine applies output redaction to a content chunk that was already streamed. This can occur when the contextual validator (DLP Tier 3) determines — after a delay — that content already delivered needs to be flagged.

```
event: dlp_correction
data: {"request_id":"req_01HZ8X9K2P3QR4ST5UV6WX7YZ","entity_type":"credit_card","replacement":"[REDACTED]","offset":142,"length":19}
```

| Field | Description |
|---|---|
| `request_id` | The request identifier, matching `X-Request-ID`. |
| `entity_type` | The DLP entity type that was detected. |
| `replacement` | The placeholder string that replaced the sensitive content. |
| `offset` | Character offset in the full assembled response where the replacement begins. |
| `length` | Length (in characters) of the replaced span. |
When your application receives a `dlp_correction` event, it should:

- Replace the characters at `[offset, offset + length)` in the assembled response with the `replacement` string
- If you have already rendered the original content to an end user, update the display accordingly
- Log the correction for audit purposes — the full correction record is also in the gateway audit log
This event is rare in practice. It occurs only when output scanning detects high-confidence sensitive data in an already-streamed chunk. The gateway attempts to buffer output scanning results before streaming content, but the contextual validator runs asynchronously and corrections may arrive after delivery.
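The replacement step above can be sketched as a one-line string operation, assuming you have accumulated the streamed deltas into a single string and that `offset` and `length` are character counts as the field table describes:

```python
def apply_dlp_correction(assembled: str, offset: int, length: int, replacement: str) -> str:
    """Replace the span [offset, offset + length) of the assembled
    response text with the placeholder from the dlp_correction event."""
    return assembled[:offset] + replacement + assembled[offset + length:]


# A 19-character card number starting at offset 6 is replaced in place:
text = "card: 4111 1111 1111 1111 ok"
print(apply_dlp_correction(text, 6, 19, "[REDACTED]"))  # card: [REDACTED] ok
```

If corrections can arrive out of order, apply them from highest offset to lowest so earlier replacements do not shift the offsets of later ones.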
### output_blocked

Emitted when the Policy Engine determines mid-stream that the model's output violates a policy rule and the stream must be terminated.

```
event: output_blocked
data: {"request_id":"req_01HZ8X9K2P3QR4ST5UV6WX7YZ","rule_id":"rule_01HZ_BLOCK_PHI_OUTPUT","message":"Output contains protected health information and cannot be delivered."}
```

| Field | Description |
|---|---|
| `request_id` | The request identifier. |
| `rule_id` | The ID of the policy rule that triggered the block. |
| `message` | The human-readable block message configured on the rule (if any). |

After `output_blocked`, the stream terminates and `[DONE]` is not emitted. Your application should treat the accumulated response up to this event as incomplete and discard it. Rendering partial blocked output to end users is not recommended.
## See also

- API reference overview — authentication, rate limiting, and error codes
- Policy Engine overview — how `BLOCK`, `REDACT`, and `ROUTE_TO` actions are determined
- DLP overview — how the 3-tier pipeline produces the findings that drive output events
- Audit log — querying the per-request audit record, including DLP findings and policy decisions