
Policy Engine — Deep Dive

The Arbitex Policy Engine is the enforcement core of the platform. Every AI request — whether routed through the cloud gateway or an on-premises Outpost — passes through the Policy Engine before the prompt reaches a model and again when the model response is returned. This document is the comprehensive technical reference for the Policy Engine. For a user-focused overview, see Policy Engine — User Guide. For admin configuration steps, see Policy Engine admin guide.

Source: backend/app/services/policy_engine.py


The Policy Engine uses a Palo Alto firewall-style rule model: rules are organized into packs, packs are organized into chains, and evaluation proceeds sequentially through the chain until a terminal action is reached (in first_applicable mode) or all rules have been evaluated (in deny_overrides mode).

Chain
├─ Pack 1
│  ├─ Rule 1 (sequence: 10)
│  └─ Rule 2 (sequence: 20)
└─ Pack 2
   ├─ Rule 3 (sequence: 10)
   └─ Rule 4 (sequence: 20)

An organization has exactly one org chain (the primary enforcement chain). Users may optionally have a user chain (personal overrides). The two chains interact in a defined evaluation order (see Chain evaluation order).

If no rule matches anywhere in the chain, the default action is ALLOW.


The combining algorithm controls how the Policy Engine responds to conflicting rule outcomes within a chain.

In first_applicable mode, the engine evaluates rules in sequence order and stops at the first matching rule. That rule's action is the final enforcement decision.

This is the default algorithm and the most commonly used. It mirrors how stateless firewalls process ACLs: rules are ordered by specificity or priority, and the most-specific matching rule wins.

Use when: You want predictable, ordered evaluation where rule order is explicit governance intent.

In deny_overrides mode, the engine evaluates all rules regardless of earlier matches. If any matching rule produces a block action, that action overrides any allow actions from other matching rules. Non-blocking actions accumulate (e.g., multiple log actions all fire).

Use when: You have multiple packs that might independently match and you want a conservative posture — any block anywhere kills the request.

{
  "combining_algorithm": "deny_overrides"
}

Set the algorithm on the chain object via the admin API. Org chains default to first_applicable.
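The two algorithms can be sketched in Python. This is a simplified illustration, not the engine's actual data model: `Rule`, its fields, and the function names are stand-ins.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    name: str
    matches: bool   # result of evaluating this rule's condition for the request
    action: str     # "allow", "block", "log", ...

def first_applicable(rules):
    """Stop at the first matching rule; its action is the decision."""
    for rule in rules:               # rules arrive in sequence order
        if rule.matches:
            return rule.action
    return "allow"                   # default action when nothing matches

def deny_overrides(rules):
    """Evaluate every rule; any matching block overrides any allow."""
    decision = "allow"
    logs = []                        # non-blocking actions accumulate
    for rule in rules:
        if not rule.matches:
            continue
        if rule.action == "block":
            decision = "block"       # keep going so later log rules still fire
        elif rule.action == "log":
            logs.append(rule.name)
    return decision, logs
```

Note how the same rule set can produce different outcomes: with an allow rule ahead of a block rule, first_applicable stops at the allow, while deny_overrides keeps going and lets the block win.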


The Policy Engine supports 8 condition types. All conditions are evaluated against the request context at the time of evaluation.

The keyword_match condition matches when the request text (prompt or response, depending on direction) contains one or more of the specified keywords.

{
  "condition_type": "keyword_match",
  "operator": "contains_any",
  "value": ["password", "secret", "api_key"],
  "case_sensitive": false
}

Operator        Behavior
contains_any    Matches if any keyword is present
contains_all    Matches if all keywords are present
contains_none   Matches if no keyword is present

case_sensitive defaults to false.
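A minimal sketch of the operator semantics; the function name and signature are illustrative, not the engine's API:

```python
def keyword_match(text, keywords, operator, case_sensitive=False):
    """Evaluate a keyword_match condition against request text."""
    haystack = text if case_sensitive else text.lower()
    needles = keywords if case_sensitive else [k.lower() for k in keywords]
    hits = [needle in haystack for needle in needles]
    if operator == "contains_any":
        return any(hits)
    if operator == "contains_all":
        return all(hits)
    if operator == "contains_none":
        return not any(hits)
    raise ValueError(f"unknown operator: {operator}")
```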

The regex_match condition matches when the request text satisfies a regular expression.

{
  "condition_type": "regex_match",
  "operator": "matches",
  "value": "\\b(?:sk-[a-zA-Z0-9]{48})\\b",
  "direction": "output"
}

The value field is an RE2-compatible regular expression. Patterns prone to catastrophic backtracking are rejected at rule creation time.

Operator       Behavior
matches        Matches if the pattern is found anywhere in the text
not_matches    Matches if the pattern is not found
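For illustration only, the operator semantics look like this in Python. Note the caveat: Python's re module is a backtracking engine, whereas the Policy Engine validates patterns against RE2-compatible syntax; the function name is a stand-in.

```python
import re

def regex_match(text, pattern, operator):
    """Evaluate a regex_match condition (illustrative; engine uses RE2-compatible syntax)."""
    found = re.search(pattern, text) is not None
    return found if operator == "matches" else not found
```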

The dlp_label condition matches when the DLP pipeline detects one or more data label types in the request or response.

{
  "condition_type": "dlp_label",
  "operator": "contains_any",
  "value": ["PII_EMAIL", "PII_SSN", "PCI_PAN", "PHI_DIAGNOSIS"]
}

DLP labels are produced by the configured DLP pipeline (Presidio, custom NER, or hybrid). See DLP Pipeline Configuration for the full label taxonomy.

Operator        Behavior
contains_any    Matches if any of the listed labels is detected
contains_all    Matches if all listed labels are detected
contains_none   Matches if none of the listed labels is detected

The model_id condition matches based on the AI model identifier requested.

{
  "condition_type": "model_id",
  "operator": "in",
  "value": ["gpt-4o", "gpt-4o-mini"]
}

Operator       Behavior
eq             Exact model ID match
neq            Model ID does not match
in             Model ID is in the list
not_in         Model ID is not in the list
starts_with    Model ID starts with the given prefix (e.g., gpt-4)

The model_risk_tier condition matches based on the model's registered risk tier in the Model Registry. Tiers are ordered from lowest to highest risk: tier_4 < tier_3 < tier_2 < tier_1.

{
  "condition_type": "model_risk_tier",
  "operator": "lte",
  "value": "tier_2"
}

Operator   Behavior
eq         Exact tier match
neq        Not this tier
lte        This tier or higher risk (e.g., lte tier_2 matches tier_2 and tier_1)
gte        This tier or lower risk (e.g., gte tier_3 matches tier_3 and tier_4)

Models not present in the registry are evaluated as tier_4 (lowest risk) by default. Set unregistered_model_tier on the chain to change this.
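The tier comparison semantics can be sketched as follows. The names are illustrative; the numeric mapping simply encodes the ordering stated above, with lower numbers meaning higher risk.

```python
TIER_NUM = {"tier_1": 1, "tier_2": 2, "tier_3": 3, "tier_4": 4}  # 1 = highest risk

def tier_condition(model_tier, operator, value, unregistered_model_tier="tier_4"):
    """Evaluate a model_risk_tier condition; None means the model is unregistered."""
    actual = TIER_NUM[model_tier if model_tier is not None else unregistered_model_tier]
    threshold = TIER_NUM[value]
    return {
        "eq": actual == threshold,
        "neq": actual != threshold,
        "lte": actual <= threshold,   # this tier or higher risk (lower number)
        "gte": actual >= threshold,   # this tier or lower risk (higher number)
    }[operator]
```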

The group_membership condition matches based on the requesting user's group membership.

{
  "condition_type": "group_membership",
  "operator": "in",
  "value": ["compliance-officers", "legal-team"]
}

Operator   Behavior
in         User is a member of any listed group
not_in     User is not a member of any listed group
all_in     User is a member of all listed groups

Groups are evaluated at request time against the current group membership state. Changes to group membership take effect immediately on the next request.

The token_count condition matches based on the token count of the prompt or response.

{
  "condition_type": "token_count",
  "operator": "gt",
  "value": 4096,
  "direction": "input"
}

Token counting uses the model’s native tokenizer where available, falling back to the cl100k_base tokenizer for unknown models.

Operator   Behavior
gt         Token count is greater than value
gte        Token count is greater than or equal to value
lt         Token count is less than value
lte        Token count is less than or equal to value
eq         Token count equals value
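The operators reduce to ordinary numeric comparisons. In this sketch, `token_count_condition` is an illustrative name, and the count is assumed to come from the tokenizer lookup described above:

```python
import operator as ops

# Map each token_count operator to its Python comparison.
OPS = {"gt": ops.gt, "gte": ops.ge, "lt": ops.lt, "lte": ops.le, "eq": ops.eq}

def token_count_condition(token_count, op, value):
    """Return True when the measured token count satisfies the condition."""
    return OPS[op](token_count, value)
```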

The content_category condition matches when the request is classified into one of the platform's content categories by the content classifier.

{
  "condition_type": "content_category",
  "operator": "in",
  "value": ["VIOLENCE", "SEXUAL_CONTENT", "SELF_HARM"]
}

Content categories are assigned by the configured content classifier (OpenAI Moderation API, Perspective API, or the built-in FastText classifier). See Content Categories for the full category list.

Operator   Behavior
in         Category is in the listed set
not_in     Category is not in the listed set

When a rule matches, the engine applies the rule’s action. There are 7 action types.

The allow action permits the request or response to proceed unmodified.

{ "action": "allow" }

In first_applicable mode, an allow action terminates evaluation and forwards the request. Use allow rules to create explicit exemptions before more-restrictive rules in the chain.

The block action rejects the request or response. The user receives a policy violation message.

{
  "action": "block",
  "block_message": "This request was blocked by your organization's AI use policy."
}

block_message is displayed to the user in the chat interface and returned in the API error body. If omitted, the default block message is shown. The audit log records the matched rule, pack, and chain.

The redact action redacts matched sensitive content before it is forwarded to the model (on input) or returned to the user (on output). It requires a dlp_label or regex_match condition to identify what to redact.

{
  "action": "redact",
  "redact_replacement": "[REDACTED]"
}

redact_replacement is the string substituted for each detected span. Defaults to [REDACTED].

Redaction is non-reversible in the forwarded content. The original, pre-redaction text is available in the audit log if audit_redacted_content is enabled (off by default).
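A minimal sketch of span redaction, assuming the matched spans arrive as (start, end) offsets from a regex or DLP detector; both function names are illustrative:

```python
import re

def regex_spans(text, pattern):
    """Find (start, end) offsets of every pattern match in the text."""
    return [m.span() for m in re.finditer(pattern, text)]

def redact(text, spans, replacement="[REDACTED]"):
    """Substitute the replacement string for each detected span."""
    # Replace right-to-left so earlier offsets stay valid after substitution.
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + replacement + text[end:]
    return text
```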

The warn action allows the request to proceed but displays a warning message to the user.

{
  "action": "warn",
  "warn_message": "This message contains sensitive terminology. Review your organization's AI use policy."
}

Warn actions do not halt evaluation in deny_overrides mode. In first_applicable mode, a warn action is terminal and the request proceeds with the warning shown.

The log action records the event in the audit trail without modifying the request or response. It is useful for monitoring without enforcement.

{
  "action": "log",
  "log_severity": "warning"
}

log_severity accepts info, warning, or critical. The audit event includes the matched rule, pack, chain, and the matched text span (if applicable).

In deny_overrides mode, log actions are non-terminal — evaluation continues and other rules may also match.

The require_approval action holds the request in a human-in-the-loop review queue before it is forwarded to the model.

{
  "action": "require_approval",
  "approval_group": "compliance-reviewers",
  "timeout_action": "block",
  "timeout_minutes": 60
}

Field              Description
approval_group     Group whose members can approve or reject the request
timeout_action     What happens if no reviewer acts within timeout_minutes: block or allow
timeout_minutes    How long to wait for approval. Default: 60.

See Human-in-the-Loop Governance for reviewer workflow details.

The rate_limit action applies a dynamic rate limit to the matching request pattern.

{
  "action": "rate_limit",
  "limit": 10,
  "window_seconds": 60,
  "scope": "user"
}

Field             Description
limit             Maximum requests allowed within window_seconds
window_seconds    Sliding window duration in seconds
scope             user (per-user), group (per-group), or org (org-wide)

When the rate limit is exceeded, the engine returns a 429 response. Rate limit state is stored in Redis. See Rate Limiting Architecture for implementation details.
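The sliding-window semantics can be sketched with an in-memory limiter. The production engine keeps this state in Redis, so the class below is purely illustrative:

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """In-memory sliding-window sketch; the real engine stores state in Redis."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.hits = defaultdict(deque)   # scope key -> request timestamps

    def allow(self, key, now=None):
        """Return True if the request fits in the window; False means HTTP 429."""
        now = time.monotonic() if now is None else now
        q = self.hits[key]
        while q and now - q[0] >= self.window:
            q.popleft()                  # drop hits that fell out of the window
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```

The scope field determines the key: per-user scope might use a key like "user:alice", while org scope shares one key across the organization.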


Each request is evaluated against chains in the following order:

1. User chain (if present and enabled for the org)
2. Org chain

A user chain is a personal policy chain attached to a specific user. User chains allow personal overrides — for example, a user may have a stricter personal policy than the org default, or an admin may grant elevated permissions to a specific user.

User chains are evaluated first. In first_applicable mode (the default), if a user chain rule matches and produces a terminal action, org chain evaluation is skipped entirely.

User chains are created via POST /api/admin/policy-chains with chain_type: "user" and a user_id.

The org chain is the primary enforcement chain applied to all requests in the organization. It is evaluated after the user chain (or on its own, if no user chain is configured or if the user chain did not match).

Every organization has exactly one org chain. It cannot be deleted, only modified.

Request: user "alice", model "gpt-4o", prompt contains PAN

1. Alice's user chain (first_applicable):
   - Rule 1: group_membership in [finance-power-users] → ALLOW ← alice is in this group
   → Terminal: ALLOW. Org chain skipped.

Request: user "bob", model "gpt-4o", prompt contains PAN

1. Bob's user chain: (not configured)
2. Org chain (first_applicable):
   - Rule 1: group_membership in [finance-power-users] → ALLOW ← bob is NOT in this group. No match.
   - Rule 2: dlp_label contains_any [PCI_PAN] → BLOCK
   → Terminal: BLOCK.
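The two-stage evaluation above can be sketched as follows, with chains simplified to lists of (condition, action) pairs; the function names are illustrative:

```python
def evaluate_chain(rules, context):
    """first_applicable over one chain: first matching rule's action, else None."""
    for condition, action in rules:                   # rules in sequence order
        if condition(context):
            return action
    return None                                       # no match in this chain

def evaluate_request(user_chain, org_chain, context):
    """User chain first; a terminal result there skips the org chain entirely."""
    if user_chain is not None:
        result = evaluate_chain(user_chain, context)
        if result is not None:
            return result
    return evaluate_chain(org_chain, context) or "allow"   # default: ALLOW
```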

Before deploying policy changes to production, you can test them against a simulated request.

POST /api/admin/policy-engine/simulate

Request body:

{
  "prompt": "Please send me the credit card number on file for customer 12345.",
  "model_id": "gpt-4o",
  "user_id": "user_abc",
  "direction": "input",
  "chain_id": "org"
}

Response 200 OK:

{
  "outcome": "block",
  "matched_rule": {
    "id": "rule_xyz",
    "name": "Block PAN in prompts",
    "pack": "PCI-DSS v4.0 Compliance Bundle",
    "condition_type": "dlp_label",
    "action": "block"
  },
  "evaluation_trace": [
    {
      "rule_id": "rule_aaa",
      "name": "Finance power user exemption",
      "matched": false,
      "reason": "User not in group finance-power-users"
    },
    {
      "rule_id": "rule_xyz",
      "name": "Block PAN in prompts",
      "matched": true,
      "reason": "DLP detected label PCI_PAN"
    }
  ],
  "dry_run": true
}

The evaluation_trace lists every rule evaluated in sequence order, whether it matched, and why. This is the primary debugging tool for unexpected policy behavior.

Using dry_run to test before deploying:

  1. Create or modify rules on a staging chain.
  2. Run simulate with representative test cases.
  3. Verify the outcome and evaluation_trace match your intent.
  4. Promote the chain configuration to the active org chain.

Arbitex ships pre-configured policy packs (bundles) for common compliance standards. Bundles are read-only and updated by Arbitex when standards change.

Bundle         Standard                                  Default rule count
pci-dss-v4     PCI-DSS v4.0                              14
hipaa          HIPAA Privacy + Security Rules            11
gdpr           GDPR Article 5 data minimization          9
glba           Gramm–Leach–Bliley Act                    7
sox            Sarbanes-Oxley Act                        6
ccpa           California Consumer Privacy Act           8
sec-17a-4      SEC Rule 17a-4 (broker-dealer records)    5
occ-sr-11-7    OCC SR 11-7 model risk management         8

To add a bundle to your org chain:

POST /api/admin/policy-chains/{chain_id}/packs
{
  "pack_id": "pci-dss-v4",
  "sequence": 100
}

Create a custom pack to group org-specific rules:

POST /api/admin/policy-packs
{
  "name": "Internal Data Classification Policy",
  "description": "Enforces Acme Corp data classification rules for AI interactions",
  "pack_type": "custom"
}

Add rules to the pack via POST /api/admin/policy-packs/{pack_id}/rules.

Packs within a chain are evaluated in ascending sequence order. Convention:

Sequence range   Purpose
1–99             High-priority user exemptions or overrides
100–299          Compliance bundles (PCI-DSS, HIPAA, etc.)
300–499          Custom org rules
500–699          Model governance rules (model_risk_tier, model_id)
700–899          Content policy rules
900–999          Default catch-all rules

This ordering ensures compliance bundles fire before custom rules, which fire before model governance rules. Adjust to suit your org’s risk posture.


All conditions and actions support a direction field that specifies when the rule is evaluated:

Direction   Evaluated when
input       Before the prompt is sent to the model
output      Before the model response is returned to the user
both        On both prompt and response

Most blocking rules target input to prevent sensitive content from reaching the model. Redaction and DLP detection rules often target both.