
Chat completions

POST /v1/chat/completions is the primary inference endpoint. It accepts requests in the standard OpenAI chat completions format and returns responses in the same format, with Arbitex-specific extensions for policy headers and DLP streaming events.


```http
POST https://api.arbitex.ai/v1/chat/completions
Content-Type: application/json
Authorization: Bearer arb_live_your-api-key-here
```

```mermaid
sequenceDiagram
    participant Client as Client App
    participant GW as Arbitex Gateway
    participant DLP as DLP Pipeline
    participant PE as Policy Engine
    participant LLM as LLM Provider
    Client->>GW: POST /v1/chat/completions
    GW->>DLP: Scan input (Tier 1-3)
    DLP-->>GW: Findings (entities, confidence)
    GW->>PE: Evaluate policy chain
    alt BLOCK
        PE-->>GW: BLOCK action
        GW-->>Client: 403 + X-Policy-Action: BLOCK
    else REDACT
        PE-->>GW: REDACT action
        GW->>GW: Apply redaction transforms
        GW->>LLM: Forward redacted prompt
    else ALLOW / ROUTE_TO
        PE-->>GW: ALLOW or ROUTE_TO
        GW->>LLM: Forward prompt
    end
    LLM-->>GW: Response (stream/batch)
    GW->>DLP: Scan output (if applies_to=output|both)
    GW-->>Client: Response + X-Policy-Action header
```

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `model` | string | Yes | Provider and model identifier in the format `provider/model-id`. Example: `anthropic/claude-sonnet-4-20250514`. Omitting the provider prefix triggers automatic resolution against your configured provider list. |
| `messages` | array | Yes | Ordered list of message objects. Each message has a `role` (`system`, `user`, or `assistant`) and `content` (string or content parts array). |
| `stream` | boolean | No | If `true`, the response is returned as a server-sent events (SSE) stream. Default: `false`. |
| `temperature` | float | No | Sampling temperature, 0.0–2.0. Higher values produce more varied output. Forwarded directly to the upstream provider; the gateway does not modify this value. |
| `max_tokens` | integer | No | Maximum number of tokens to generate. If your organization has a token budget policy, the effective limit is the lower of this value and the budget limit. |
| `top_p` | float | No | Nucleus sampling parameter. Forwarded to the upstream provider. |
| `stop` | string or string[] | No | Stop sequences. Forwarded to the upstream provider. |
| `n` | integer | No | Number of completions to generate. Forwarded to the upstream provider. |
| `user` | string | No | An optional end-user identifier that you can pass for your own logging purposes. Forwarded to the upstream provider and also recorded in the gateway audit log. |

Fields not listed here that are valid in the OpenAI API are forwarded to the upstream provider unchanged if the provider supports them. Unknown fields are silently dropped.


The response is in the standard OpenAI chat completions format. Arbitex adds two response headers and, in DLP redaction cases, modifies the response body.

```json
{
  "id": "chatcmpl-01HZ8X9K2P3QR4ST5UV6WX7YZ",
  "object": "chat.completion",
  "created": 1741442321,
  "model": "claude-sonnet-4-20250514",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here is a summary of the key provisions..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 148,
    "total_tokens": 172
  }
}
```

The `model` field in the response reflects the model that actually served the request. If a `ROUTE_TO` policy rule rerouted the request, this field carries the model the request was routed to, not the one in the original request.
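If your application needs to detect that a reroute happened, compare the served model against the one you requested. A small sketch (the helper name is ours, not part of the API); it strips the `provider/` prefix because the response `model` field carries the bare model id:

```python
def was_rerouted(requested_model: str, served_model: str) -> bool:
    """True if the model that served the request differs from the one
    requested. Requests use provider/model-id; the response model field
    is the bare model id, so strip the provider prefix before comparing."""
    bare_requested = requested_model.rpartition("/")[2]
    return served_model != bare_requested
```

For example, `was_rerouted("anthropic/claude-sonnet-4-20250514", response.model)` is `False` when the requested model served the call.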

Every response includes policy decision metadata:

| Header | Description |
| --- | --- |
| `X-Policy-Action` | The terminal action taken by the Policy Engine: `ALLOW`, `BLOCK`, `CANCEL`, `REDACT`, or `ROUTE_TO`. Always present. |
| `X-Matched-Rule` | The ID of the specific rule that matched. Omitted when the action is `ALLOW`. |
| `X-Request-ID` | The unique request identifier, matching `request_id` in the audit log. |
| `X-RateLimit-Limit` | Maximum requests permitted in the current rate limit window. |
| `X-RateLimit-Remaining` | Requests remaining in the current window. |
| `X-RateLimit-Reset` | Unix timestamp when the current window resets. |
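These are plain HTTP response headers, so any client can read them. A minimal sketch of pulling the policy metadata out of a response's header mapping (the helper is illustrative; note that a plain dict lookup is case-sensitive, whereas real HTTP header access is usually case-insensitive):

```python
def policy_metadata(headers):
    """Extract Arbitex policy headers from a response header mapping.
    X-Matched-Rule is omitted by the gateway when the action is ALLOW."""
    meta = {
        "action": headers.get("X-Policy-Action"),
        "matched_rule": headers.get("X-Matched-Rule"),  # None on ALLOW
        "request_id": headers.get("X-Request-ID"),
    }
    if "X-RateLimit-Remaining" in headers:
        meta["ratelimit_remaining"] = int(headers["X-RateLimit-Remaining"])
    return meta
```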

When the Policy Engine applies a REDACT action to the model’s output, the response body is modified before it is delivered to your application. Sensitive content in `choices[].message.content` is replaced with a placeholder string.

The default placeholder is `[REDACTED]`. Per-rule placeholders can be configured with custom strings such as `[CC-REMOVED]` or `[PHI-REDACTED]`.

Example — output with redacted content:

```json
{
  "id": "chatcmpl-01HZ8X9K2P3QR4ST5UV6WX7YZ",
  "object": "chat.completion",
  "created": 1741442321,
  "model": "claude-sonnet-4-20250514",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The patient's name is [REDACTED] and their SSN is [REDACTED]. The treatment plan recommends..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 148,
    "total_tokens": 172
  }
}
```

When output redaction has occurred, the response includes the header `X-Policy-Action: REDACT`.

Your application should treat `[REDACTED]` spans as data that was present in the model’s output but has been removed before delivery. Do not attempt to reconstruct or infer the original content. The audit log records which entity types were detected and redacted — query the audit log with the `X-Request-ID` value if you need to review what was found.
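If you want to treat redacted spans specially (for example, to highlight them in a UI), you can locate the placeholders in the delivered content. A sketch assuming the default `[REDACTED]` placeholder plus hyphenated custom placeholders like `[CC-REMOVED]`; the pattern is an assumption, so tune it to the placeholders your rules actually configure:

```python
import re

# Matches [REDACTED] and hyphenated uppercase placeholders such as
# [PHI-REDACTED]. Assumption: it may over-match ordinary bracketed
# uppercase text, so adjust it to your configured placeholder strings.
PLACEHOLDER = re.compile(r"\[(?:REDACTED|[A-Z]+(?:-[A-Z]+)+)\]")

def redacted_spans(content: str):
    """Return (start, end) character offsets of redaction placeholders."""
    return [m.span() for m in PLACEHOLDER.finditer(content)]
```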


```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.arbitex.ai/v1",
    api_key="arb_live_your-api-key-here",
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the key provisions of SOX Section 404."},
    ],
    temperature=0.3,
    max_tokens=512,
)

print(response.choices[0].message.content)
```

The OpenAI SDK sends the `Authorization: Bearer` header automatically. No other SDK configuration changes are needed.

```shell
curl https://api.arbitex.ai/v1/chat/completions \
  -H "Authorization: Bearer arb_live_your-api-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-20250514",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Summarize the key provisions of SOX Section 404."}
    ],
    "temperature": 0.3,
    "max_tokens": 512
  }'
```

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.arbitex.ai/v1",
  apiKey: "arb_live_your-api-key-here",
});

const response = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-20250514",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Summarize the key provisions of SOX Section 404." },
  ],
  temperature: 0.3,
  max_tokens: 512,
});

console.log(response.choices[0].message.content);
```

Set `"stream": true` to receive the response as a server-sent events (SSE) stream. The stream follows the standard OpenAI streaming format — each event is a `data:` line containing a JSON delta object, terminated by `data: [DONE]`.

```shell
curl https://api.arbitex.ai/v1/chat/completions \
  -H "Authorization: Bearer arb_live_your-api-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-20250514",
    "messages": [{"role": "user", "content": "Write a haiku about audit logs."}],
    "stream": true
  }'
```

```
data: {"id":"chatcmpl-01HZ","object":"chat.completion.chunk","created":1741442321,"model":"claude-sonnet-4-20250514","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-01HZ","object":"chat.completion.chunk","created":1741442321,"model":"claude-sonnet-4-20250514","choices":[{"index":0,"delta":{"content":"Every"},"finish_reason":null}]}

data: {"id":"chatcmpl-01HZ","object":"chat.completion.chunk","created":1741442321,"model":"claude-sonnet-4-20250514","choices":[{"index":0,"delta":{"content":" request"},"finish_reason":null}]}

...

data: [DONE]
```
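The chunks above are assembled client-side by concatenating each delta's `content`. A minimal sketch in plain Python (no SDK) that folds raw `data:` lines into the final message, stopping at the `[DONE]` sentinel:

```python
import json

def assemble_stream(sse_lines):
    """Assemble the assistant message from raw SSE lines in the standard
    OpenAI streaming format. Skips non-data lines (blank keep-alives,
    event: lines) and stops at the [DONE] sentinel."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            parts.append(delta["content"])
    return "".join(parts)
```

Run against the three chunks shown above, this yields the string `"Every request"`.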

Arbitex adds two event types that do not exist in the standard OpenAI streaming format. These events appear inline in the SSE stream before [DONE].

dlp_correction

Emitted when the Policy Engine applies output redaction to a content chunk that has already been streamed. This can occur when the contextual validator (DLP Tier 3) determines, after a delay, that content already delivered needs to be flagged.

```
event: dlp_correction
data: {"request_id":"req_01HZ8X9K2P3QR4ST5UV6WX7YZ","entity_type":"credit_card","replacement":"[REDACTED]","offset":142,"length":19}
```

| Field | Description |
| --- | --- |
| `request_id` | The request identifier, matching `X-Request-ID`. |
| `entity_type` | The DLP entity type that was detected. |
| `replacement` | The placeholder string that replaced the sensitive content. |
| `offset` | Character offset in the full assembled response where the replacement begins. |
| `length` | Length (in characters) of the replaced span. |

When your application receives a dlp_correction event, it should:

  1. Replace the characters at [offset, offset + length) in the assembled response with the replacement string
  2. If you have already rendered the original content to an end user, update the display accordingly
  3. Log the correction for audit purposes — the full correction record is also in the gateway audit log
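Step 1 above is a straightforward string splice against the full assembled response (the offsets are relative to it). A minimal sketch:

```python
def apply_dlp_correction(assembled: str, correction: dict) -> str:
    """Replace the span at [offset, offset + length) in the assembled
    response text with the placeholder from a dlp_correction event."""
    start = correction["offset"]
    end = start + correction["length"]
    return assembled[:start] + correction["replacement"] + assembled[end:]
```

If several corrections arrive for one response, applying them in descending offset order keeps earlier splices from shifting the offsets of later ones.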

This event is rare in practice. It occurs only when output scanning detects high-confidence sensitive data in an already-streamed chunk. The gateway attempts to buffer output scanning results before streaming content, but the contextual validator runs asynchronously and corrections may arrive after delivery.

output_blocked

Emitted when the Policy Engine determines mid-stream that the model’s output violates a policy rule and the stream must be terminated.

```
event: output_blocked
data: {"request_id":"req_01HZ8X9K2P3QR4ST5UV6WX7YZ","rule_id":"rule_01HZ_BLOCK_PHI_OUTPUT","message":"Output contains protected health information and cannot be delivered."}
```

| Field | Description |
| --- | --- |
| `request_id` | The request identifier. |
| `rule_id` | The ID of the policy rule that triggered the block. |
| `message` | The human-readable block message configured on the rule (if any). |

After `output_blocked`, the stream terminates. `[DONE]` is not emitted. Your application should treat the accumulated response up to this event as incomplete and discard it. Rendering partial blocked output to end users is not recommended.
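A consumer loop that honors both custom events might look like the sketch below. It takes (event_type, payload) pairs, with event_type `None` for ordinary `chat.completion.chunk` data; producing those pairs from the raw SSE bytes is left to your transport layer, and the function shape is ours, not part of the API.

```python
def consume_arbitex_stream(events):
    """Fold an Arbitex SSE stream into a final result. On output_blocked,
    the accumulated text is discarded per the termination semantics; a
    dlp_correction splices its replacement into the text gathered so far."""
    parts = []
    for event_type, payload in events:
        if event_type == "output_blocked":
            return {"content": None, "blocked_by": payload["rule_id"]}
        if event_type == "dlp_correction":
            text = "".join(parts)
            start = payload["offset"]
            end = start + payload["length"]
            parts = [text[:start] + payload["replacement"] + text[end:]]
        elif event_type is None:
            delta = payload["choices"][0]["delta"]
            if delta.get("content"):
                parts.append(delta["content"])
    return {"content": "".join(parts), "blocked_by": None}
```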


  • API reference overview — authentication, rate limiting, and error codes
  • Policy Engine overview — how BLOCK, REDACT, and ROUTE_TO actions are determined
  • DLP Overview — how the 3-tier pipeline produces the findings that drive output events
  • Audit log — querying the per-request audit record, including DLP findings and policy decisions