Kill switch
The kill switch provides immediate, manual control to disable a specific provider, model, or (provider, model) pair. When activated, the gateway blocks all requests to the disabled entry — no traffic reaches the provider, and the fallback chain skips the entry automatically. The kill switch is designed for emergency response: a provider security incident, a model producing harmful output, or an operational issue that requires instant traffic cutoff.
What happens when the kill switch is activated
- All requests targeting the disabled (provider, model) pair are blocked immediately with a `503 provider_unavailable` error
- Fallback chains skip the disabled entry: if `gpt-4o` is kill-switched and a fallback chain lists it as a secondary entry, the gateway proceeds to the next entry in the chain
- The kill switch state is persisted to the database and survives gateway restarts
- The audit log records the activation with the identity of the admin who triggered it, the reason string, and a timestamp
The kill switch does not affect requests already in flight. Requests that have been forwarded to the provider before the switch is activated complete normally.
Activating the kill switch
Via the admin API

    POST https://api.arbitex.ai/api/admin/kill-switch
    Authorization: Bearer arb_live_your-api-key-here
    Content-Type: application/json

    {
      "provider": "openai",
      "model_id": "gpt-4o",
      "enabled": false,
      "reason": "Provider security incident — disabling pending investigation"
    }

| Field | Required | Description |
|---|---|---|
| `provider` | Yes | Provider identifier (e.g., `openai`, `anthropic`) |
| `model_id` | No | Model identifier. If omitted, the kill switch applies to all models for the provider. |
| `enabled` | Yes | Set to `false` to disable the entry. Set to `true` to re-enable. |
| `reason` | Yes | A human-readable reason recorded in the audit log. This field is required: every kill switch activation must document why. |
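The request above can also be assembled programmatically. The following is a minimal sketch using only the Python standard library. The URL, headers, and field names come from the example above; the helper name `build_kill_switch_request` and its client-side validation are assumptions made for illustration, not part of any official client. The helper only builds the request, so it can be inspected before anything is sent.

```python
import json
import urllib.request
from typing import Optional

# Endpoint as shown in the admin API example above.
KILL_SWITCH_URL = "https://api.arbitex.ai/api/admin/kill-switch"


def build_kill_switch_request(
    api_key: str,
    provider: str,
    enabled: bool,
    reason: str,
    model_id: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a kill switch request.

    Hypothetical helper: the local validation mirrors the "Required"
    column of the field table and is an assumption, not server behavior.
    """
    if not provider:
        raise ValueError("provider is required")
    if not reason or not reason.strip():
        raise ValueError("reason is required")

    payload = {"provider": provider, "enabled": enabled, "reason": reason}
    # Omitting model_id makes the switch apply to every model of the provider.
    if model_id is not None:
        payload["model_id"] = model_id

    return urllib.request.Request(
        KILL_SWITCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_kill_switch_request(
    api_key="arb_live_your-api-key-here",
    provider="openai",
    enabled=False,
    reason="Provider security incident, disabling pending investigation",
    model_id="gpt-4o",
)
# To actually send it: urllib.request.urlopen(req)
```

Passing `enabled=True` with a new reason builds the matching re-enable request.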
Via the admin portal
The admin portal provides a visual toggle on the provider management page (Settings > Providers > [Provider Name]). The toggle includes a confirmation dialog that requires a reason string before activation.
Re-enabling a provider or model
    POST https://api.arbitex.ai/api/admin/kill-switch
    Authorization: Bearer arb_live_your-api-key-here
    Content-Type: application/json

    {
      "provider": "openai",
      "model_id": "gpt-4o",
      "enabled": true,
      "reason": "Provider incident resolved — re-enabling after verification"
    }

Re-enabling restores the entry to normal operation. The health monitor state is evaluated independently: if the health monitor had previously disengaged the entry due to failures, re-enabling the entry through the kill switch does not override the health monitor state. The health monitor must independently verify the entry is healthy before routing live traffic.
Fallback chain interaction
The kill switch overrides fallback chain configuration. When an entry is kill-switched:
- As a primary target: the gateway immediately falls through to the next entry in the fallback chain, as if the primary returned a 5xx error
- As a fallback entry: the gateway skips the kill-switched entry and continues to the next fallback in the chain
- As the only entry: the request returns `503 provider_unavailable` with no further fallback attempts
Plan your fallback chains with the assumption that any single entry may be kill-switched at any time. A kill switch on a fallback chain entry does not generate an error — it is silently skipped.
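The three cases above can be sketched as a single pass over the chain. This is an illustrative model only, not the gateway's actual implementation: the names `is_kill_switched` and `resolve_target`, the representation of a provider-wide switch as `(provider, None)`, and the model identifiers `model-a` and `model-b` are all assumptions.

```python
from typing import List, Optional, Set, Tuple

Entry = Tuple[str, str]                 # (provider, model_id)
Switch = Tuple[str, Optional[str]]      # model_id None = provider-wide switch


def is_kill_switched(entry: Entry, disabled: Set[Switch]) -> bool:
    """True if the entry is covered by a pair-level or provider-wide switch."""
    provider, model_id = entry
    return (provider, None) in disabled or (provider, model_id) in disabled


def resolve_target(chain: List[Entry], disabled: Set[Switch]) -> Optional[Entry]:
    """Return the first entry that is not kill-switched, or None.

    None corresponds to the case where the gateway would answer
    503 provider_unavailable because no entry remains.
    """
    for entry in chain:
        if is_kill_switched(entry, disabled):
            continue  # skipped silently, no error is raised for this entry
        return entry
    return None


# Placeholder chain: "model-a" and "model-b" are made-up identifiers.
chain = [("openai", "gpt-4o"), ("anthropic", "model-a"), ("openai", "model-b")]

primary_disabled = resolve_target(chain, {("openai", "gpt-4o")})
provider_disabled = resolve_target(chain, {("openai", None), ("anthropic", "model-a")})
```

In this sketch `primary_disabled` is the second entry, while `provider_disabled` is `None` because the provider-wide switch on `openai` covers both of its entries and the remaining entry is switched off as a pair.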
Kill switch vs health monitoring
The kill switch and the automatic health monitor serve different purposes:
| | Kill switch | Health monitor |
|---|---|---|
| Trigger | Manual — admin action | Automatic — based on failure thresholds |
| Scope | Provider, model, or (provider, model) pair | Individual (provider, model) pair |
| Recovery | Manual — admin must re-enable | Automatic — test request after lockout period |
| Persistence | Survives restarts | Resets on restart |
| Use case | Emergency response, planned maintenance | Transient provider failures |
Both systems can be active simultaneously. If a (provider, model) pair is both kill-switched and health-monitor-disengaged, re-enabling the entry through the kill switch does not by itself restore traffic. The health monitor must also return to the active state.
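One way to picture the interaction is as two independent flags that must both be clear before traffic flows. The sketch below is a simplified model under that assumption; the class `EntryState` and its methods are hypothetical and only mirror the rows of the comparison table (manual versus automatic recovery, persistence across restarts).

```python
from dataclasses import dataclass


@dataclass
class EntryState:
    """Hypothetical per-(provider, model) state, for illustration only."""

    kill_switched: bool = False      # manual; persisted, survives restarts
    health_disengaged: bool = False  # automatic; resets on restart

    def routable(self) -> bool:
        # Traffic flows only when neither system is blocking the entry.
        return not self.kill_switched and not self.health_disengaged

    def after_restart(self) -> "EntryState":
        # Kill switch state is kept; health monitor state starts fresh.
        return EntryState(kill_switched=self.kill_switched, health_disengaged=False)


state = EntryState(kill_switched=True, health_disengaged=True)

state.kill_switched = False            # admin re-enables the entry
after_reenable = state.routable()      # still blocked by the health monitor

state.health_disengaged = False        # health monitor verifies the entry
after_recovery = state.routable()      # traffic is restored

restarted = EntryState(kill_switched=True, health_disengaged=True).after_restart()
```

After a restart in this model, the entry stays blocked because the kill switch is persisted, even though the health monitor state has been reset.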
Emergency runbook
When you need to disable a provider immediately:
- Activate the kill switch via the admin API or portal. Include a descriptive reason.
- Verify fallback chains are routing traffic to alternative providers. Check the audit log for `503 provider_unavailable` errors; these indicate requests with no viable fallback.
- Notify affected teams that the provider is disabled and which fallback providers are handling traffic.
- Monitor the audit log for the disabled provider. The kill switch audit entry records the activation, and any subsequent requests to the disabled entry produce `503` audit entries with the kill switch reason.
- When the incident is resolved, confirm via the health check endpoint that the provider is healthy, then re-enable it via the kill switch API.
Audit log
Kill switch activations and deactivations produce audit log entries with:
| Field | Description |
|---|---|
| `action` | `kill_switch_activated` or `kill_switch_deactivated` |
| `provider` | The affected provider |
| `model_id` | The affected model (or `null` for provider-wide switches) |
| `user_id` | The admin who triggered the switch |
| `reason` | The reason string provided at activation |
| `timestamp` | When the switch was toggled |
These entries are forwarded to your SIEM alongside request audit entries.
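On the SIEM side, these entries can be replayed to reconstruct which entries are currently disabled. The sketch below assumes each audit entry arrives as a dictionary keyed by the field names in the table above, and that `timestamp` is an ISO 8601 string in a single consistent format so that string ordering matches time ordering. Both are assumptions; the actual export format may differ. The function name `currently_disabled` and the sample data are illustrative only.

```python
from typing import Dict, Iterable, Optional, Tuple

KILL_SWITCH_ACTIONS = {"kill_switch_activated", "kill_switch_deactivated"}


def currently_disabled(entries: Iterable[dict]) -> Dict[Tuple[str, Optional[str]], str]:
    """Replay kill switch audit events and return the active switches.

    Keys are (provider, model_id), with model_id None for provider-wide
    switches; values are the reason given at activation.
    """
    events = [e for e in entries if e.get("action") in KILL_SWITCH_ACTIONS]
    disabled: Dict[Tuple[str, Optional[str]], str] = {}
    # Assumes uniformly formatted ISO 8601 timestamps, so sorting the
    # strings orders the events chronologically.
    for event in sorted(events, key=lambda e: e["timestamp"]):
        key = (event["provider"], event.get("model_id"))
        if event["action"] == "kill_switch_activated":
            disabled[key] = event["reason"]
        else:
            disabled.pop(key, None)
    return disabled


# Made-up sample entries, for illustration only.
sample = [
    {"action": "kill_switch_activated", "provider": "openai", "model_id": "gpt-4o",
     "user_id": "admin-1", "reason": "Security incident", "timestamp": "2024-01-01T10:00:00Z"},
    {"action": "kill_switch_activated", "provider": "anthropic", "model_id": None,
     "user_id": "admin-2", "reason": "Planned maintenance", "timestamp": "2024-01-01T11:00:00Z"},
    {"action": "kill_switch_deactivated", "provider": "openai", "model_id": "gpt-4o",
     "user_id": "admin-1", "reason": "Incident resolved", "timestamp": "2024-01-01T12:00:00Z"},
]

active = currently_disabled(sample)
```

With the sample data, only the provider-wide switch on `anthropic` remains active, since the `gpt-4o` switch was later deactivated.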