
Policy Engine — Deep Dive

The Arbitex Policy Engine is the enforcement core of the platform. Every AI request — whether routed through the cloud gateway or an on-premises Outpost — passes through the Policy Engine before the prompt reaches a model and again when the model response is returned. This document is the comprehensive technical reference for the Policy Engine. For a user-focused overview, see Policy Engine — User Guide. For admin configuration steps, see Policy Engine admin guide.

Source: backend/app/services/policy_engine.py


The Policy Engine uses a Palo Alto firewall-style rule model: rules are organized into packs, packs are organized into chains, and evaluation proceeds sequentially through the chain until a terminal action is reached (in first_applicable mode) or all rules have been evaluated (in deny_overrides mode).

Chain
├─ Pack 1
│  ├─ Rule 1 (sequence: 10)
│  └─ Rule 2 (sequence: 20)
└─ Pack 2
   ├─ Rule 3 (sequence: 10)
   └─ Rule 4 (sequence: 20)

An organization has exactly one org chain (the primary enforcement chain). Users may optionally have a user chain (personal overrides). The two chains interact in a defined evaluation order (see Chain evaluation order).

If no rule matches anywhere in the chain, the default action is ALLOW.


The combining algorithm controls how the Policy Engine responds to conflicting rule outcomes within a chain.

In first_applicable mode, the engine evaluates rules in sequence order and stops at the first matching rule. That rule's action is the final enforcement decision.

This is the default algorithm and the most commonly used. It mirrors how stateless firewalls process ACLs: rules are ordered by specificity or priority, and the most-specific matching rule wins.

Use when: You want predictable, ordered evaluation where rule order is explicit governance intent.

In deny_overrides mode, the engine evaluates all rules regardless of earlier matches. If any matching rule produces a block action, that action overrides any allow actions from other matching rules. Non-blocking actions accumulate (e.g., multiple log actions all fire).

Use when: You have multiple packs that might independently match and you want a conservative posture — any block anywhere kills the request.

{
  "combining_algorithm": "deny_overrides"
}

Set the algorithm on the chain object via the admin API. Org chains default to first_applicable.
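The two algorithms can be sketched in Python. This is a simplified illustration, not the engine's actual data model: `Rule`, its fields, and the function names are stand-ins.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    name: str
    matches: bool   # result of evaluating this rule's condition for the request
    action: str     # "allow", "block", "log", ...

def first_applicable(rules):
    """Stop at the first matching rule; its action is the decision."""
    for rule in rules:               # rules arrive in sequence order
        if rule.matches:
            return rule.action
    return "allow"                   # default action when nothing matches

def deny_overrides(rules):
    """Evaluate every rule; any matching block overrides any allow."""
    decision = "allow"
    logs = []                        # non-blocking actions accumulate
    for rule in rules:
        if not rule.matches:
            continue
        if rule.action == "block":
            decision = "block"       # keep going so later log rules still fire
        elif rule.action == "log":
            logs.append(rule.name)
    return decision, logs
```

Note how the same rule set can produce different outcomes: with an allow rule ahead of a block rule, first_applicable stops at the allow, while deny_overrides keeps going and lets the block win.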


The Policy Engine supports 8 condition types. All conditions are evaluated against the request context at the time of evaluation.

The keyword_match condition matches when the request text (prompt or response, depending on direction) contains one or more of the specified keywords.

{
  "condition_type": "keyword_match",
  "operator": "contains_any",
  "value": ["password", "secret", "api_key"],
  "case_sensitive": false
}

Operator        Behavior
contains_any    Matches if any keyword is present
contains_all    Matches if all keywords are present
contains_none   Matches if no keyword is present

case_sensitive defaults to false.
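A minimal sketch of the operator semantics; the function name and signature are illustrative, not the engine's API:

```python
def keyword_match(text, keywords, operator, case_sensitive=False):
    """Evaluate a keyword_match condition against request text."""
    haystack = text if case_sensitive else text.lower()
    needles = keywords if case_sensitive else [k.lower() for k in keywords]
    hits = [needle in haystack for needle in needles]
    if operator == "contains_any":
        return any(hits)
    if operator == "contains_all":
        return all(hits)
    if operator == "contains_none":
        return not any(hits)
    raise ValueError(f"unknown operator: {operator}")
```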

The regex_match condition matches when the request text satisfies a regular expression.

{
  "condition_type": "regex_match",
  "operator": "matches",
  "value": "\\b(?:sk-[a-zA-Z0-9]{48})\\b",
  "direction": "output"
}

The value field is an RE2-compatible regular expression. Patterns prone to catastrophic backtracking are rejected at rule creation time.

Operator       Behavior
matches        Matches if the pattern is found anywhere in the text
not_matches    Matches if the pattern is not found
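For illustration only, the operator semantics look like this in Python. Note the caveat: Python's re module is a backtracking engine, whereas the Policy Engine validates patterns against RE2-compatible syntax; the function name is a stand-in.

```python
import re

def regex_match(text, pattern, operator):
    """Evaluate a regex_match condition (illustrative; engine uses RE2-compatible syntax)."""
    found = re.search(pattern, text) is not None
    return found if operator == "matches" else not found
```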

The dlp_label condition matches when the DLP pipeline detects one or more data label types in the request or response.

{
  "condition_type": "dlp_label",
  "operator": "contains_any",
  "value": ["PII_EMAIL", "PII_SSN", "PCI_PAN", "PHI_DIAGNOSIS"]
}

DLP labels are produced by the configured DLP pipeline (Presidio, custom NER, or hybrid). See DLP Pipeline Configuration for the full label taxonomy.

Operator        Behavior
contains_any    Matches if any of the listed labels is detected
contains_all    Matches if all listed labels are detected
contains_none   Matches if none of the listed labels is detected

The model_id condition matches based on the AI model identifier requested.

{
  "condition_type": "model_id",
  "operator": "in",
  "value": ["gpt-4o", "gpt-4o-mini"]
}

Operator       Behavior
eq             Exact model ID match
neq            Model ID does not match
in             Model ID is in the list
not_in         Model ID is not in the list
starts_with    Model ID starts with the given prefix (e.g., gpt-4)

The model_risk_tier condition matches based on the model's registered risk tier in the Model Registry. Tiers are ordered from lowest to highest risk: tier_4 < tier_3 < tier_2 < tier_1.

{
  "condition_type": "model_risk_tier",
  "operator": "lte",
  "value": "tier_2"
}

Operator   Behavior
eq         Exact tier match
neq        Not this tier
lte        This tier or higher risk (e.g., lte tier_2 matches tier_2 and tier_1)
gte        This tier or lower risk (e.g., gte tier_3 matches tier_3 and tier_4)

Models not present in the registry are evaluated as tier_4 (lowest risk) by default. Set unregistered_model_tier on the chain to change this.
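The tier comparison semantics can be sketched as follows. The names are illustrative; the numeric mapping simply encodes the ordering stated above, with lower numbers meaning higher risk.

```python
TIER_NUM = {"tier_1": 1, "tier_2": 2, "tier_3": 3, "tier_4": 4}  # 1 = highest risk

def tier_condition(model_tier, operator, value, unregistered_model_tier="tier_4"):
    """Evaluate a model_risk_tier condition; None means the model is unregistered."""
    actual = TIER_NUM[model_tier if model_tier is not None else unregistered_model_tier]
    threshold = TIER_NUM[value]
    return {
        "eq": actual == threshold,
        "neq": actual != threshold,
        "lte": actual <= threshold,   # this tier or higher risk (lower number)
        "gte": actual >= threshold,   # this tier or lower risk (higher number)
    }[operator]
```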

The group_membership condition matches based on the requesting user's group membership.

{
  "condition_type": "group_membership",
  "operator": "in",
  "value": ["compliance-officers", "legal-team"]
}

Operator   Behavior
in         User is a member of any listed group
not_in     User is not a member of any listed group
all_in     User is a member of all listed groups

Groups are evaluated at request time against the current group membership state. Changes to group membership take effect immediately on the next request.

The token_count condition matches based on the token count of the prompt or response.

{
  "condition_type": "token_count",
  "operator": "gt",
  "value": 4096,
  "direction": "input"
}

Token counting uses the model’s native tokenizer where available, falling back to the cl100k_base tokenizer for unknown models.

Operator   Behavior
gt         Token count is greater than value
gte        Token count is greater than or equal to value
lt         Token count is less than value
lte        Token count is less than or equal to value
eq         Token count equals value
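The operators reduce to ordinary numeric comparisons. In this sketch, `token_count_condition` is an illustrative name, and the count is assumed to come from the tokenizer lookup described above:

```python
import operator as ops

# Map each token_count operator to its Python comparison.
OPS = {"gt": ops.gt, "gte": ops.ge, "lt": ops.lt, "lte": ops.le, "eq": ops.eq}

def token_count_condition(token_count, op, value):
    """Return True when the measured token count satisfies the condition."""
    return OPS[op](token_count, value)
```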

The content_category condition matches when the request is classified into one of the platform's content categories by the content classifier.

{
  "condition_type": "content_category",
  "operator": "in",
  "value": ["VIOLENCE", "SEXUAL_CONTENT", "SELF_HARM"]
}

Content categories are assigned by the configured content classifier (OpenAI Moderation API, Perspective API, or the built-in FastText classifier). See Content Categories for the full category list.

Operator   Behavior
in         Category is in the listed set
not_in     Category is not in the listed set

When a rule matches, the engine applies the rule’s action. There are 7 action types.

The allow action permits the request or response to proceed unmodified.

{ "action": "allow" }

In first_applicable mode, an allow action terminates evaluation and forwards the request. Use allow rules to create explicit exemptions before more-restrictive rules in the chain.

The block action rejects the request or response. The user receives a policy violation message.

{
  "action": "block",
  "block_message": "This request was blocked by your organization's AI use policy."
}

block_message is displayed to the user in the chat interface and returned in the API error body. If omitted, the default block message is shown. The audit log records the matched rule, pack, and chain.

The redact action redacts matched sensitive content before it is forwarded to the model (on input) or returned to the user (on output). It requires a dlp_label or regex_match condition to identify what to redact.

{
  "action": "redact",
  "redact_replacement": "[REDACTED]"
}

redact_replacement is the string substituted for each detected span. Defaults to [REDACTED].

Redaction is non-reversible in the forwarded content. The original, pre-redaction text is available in the audit log if audit_redacted_content is enabled (off by default).
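A minimal sketch of span redaction, assuming the matched spans arrive as (start, end) offsets from a regex or DLP detector; both function names are illustrative:

```python
import re

def regex_spans(text, pattern):
    """Find (start, end) offsets of every pattern match in the text."""
    return [m.span() for m in re.finditer(pattern, text)]

def redact(text, spans, replacement="[REDACTED]"):
    """Substitute the replacement string for each detected span."""
    # Replace right-to-left so earlier offsets stay valid after substitution.
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + replacement + text[end:]
    return text
```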

The warn action allows the request to proceed but displays a warning message to the user.

{
  "action": "warn",
  "warn_message": "This message contains sensitive terminology. Review your organization's AI use policy."
}

Warn actions do not halt evaluation in deny_overrides mode. In first_applicable mode, a warn action is terminal and the request proceeds with the warning shown.

The log action records the event in the audit trail without modifying the request or response. It is useful for monitoring without enforcement.

{
  "action": "log",
  "log_severity": "warning"
}

log_severity accepts info, warning, or critical. The audit event includes the matched rule, pack, chain, and the matched text span (if applicable).

In deny_overrides mode, log actions are non-terminal — evaluation continues and other rules may also match.

The require_approval action holds the request in a human-in-the-loop review queue before it is forwarded to the model.

{
  "action": "require_approval",
  "approval_group": "compliance-reviewers",
  "timeout_action": "block",
  "timeout_minutes": 60
}

Field              Description
approval_group     Group whose members can approve or reject the request
timeout_action     What happens if no reviewer acts within timeout_minutes: block or allow
timeout_minutes    How long to wait for approval. Default: 60.

See Human-in-the-Loop Governance for reviewer workflow details.

The rate_limit action applies a dynamic rate limit to the matching request pattern.

{
  "action": "rate_limit",
  "limit": 10,
  "window_seconds": 60,
  "scope": "user"
}

Field             Description
limit             Maximum requests allowed within window_seconds
window_seconds    Sliding window duration in seconds
scope             user (per-user), group (per-group), or org (org-wide)

When the rate limit is exceeded, the engine returns a 429 response. Rate limit state is stored in Redis. See Rate Limiting Architecture for implementation details.
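The sliding-window semantics can be sketched with an in-memory limiter. The production engine keeps this state in Redis, so the class below is purely illustrative:

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """In-memory sliding-window sketch; the real engine stores state in Redis."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.hits = defaultdict(deque)   # scope key -> request timestamps

    def allow(self, key, now=None):
        """Return True if the request fits in the window; False means HTTP 429."""
        now = time.monotonic() if now is None else now
        q = self.hits[key]
        while q and now - q[0] >= self.window:
            q.popleft()                  # drop hits that fell out of the window
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```

The scope field determines the key: per-user scope might use a key like "user:alice", while org scope shares one key across the organization.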


Each request is evaluated against chains in the following order:

1. User chain (if present and enabled for the org)
2. Org chain

A user chain is a personal policy chain attached to a specific user. User chains allow personal overrides — for example, a user may have a stricter personal policy than the org default, or an admin may grant elevated permissions to a specific user.

User chains are evaluated first. In first_applicable mode (the default), if a user chain rule matches and produces a terminal action, org chain evaluation is skipped entirely.

User chains are created via POST /api/admin/policy-chains with chain_type: "user" and a user_id.

The org chain is the primary enforcement chain applied to all requests in the organization. It is evaluated after the user chain (or on its own, if no user chain is configured or if the user chain did not match).

Every organization has exactly one org chain. It cannot be deleted, only modified.

Request: user "alice", model "gpt-4o", prompt contains PAN

1. Alice's user chain (first_applicable):
   - Rule 1: group_membership in [finance-power-users] → ALLOW ← alice is in this group
   → Terminal: ALLOW. Org chain skipped.

Request: user "bob", model "gpt-4o", prompt contains PAN

1. Bob's user chain: (not configured)
2. Org chain (first_applicable):
   - Rule 1: group_membership in [finance-power-users] → ALLOW ← bob is NOT in this group. No match.
   - Rule 2: dlp_label contains_any [PCI_PAN] → BLOCK
   → Terminal: BLOCK.
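The two-stage evaluation above can be sketched as follows, with chains simplified to lists of (condition, action) pairs; the function names are illustrative:

```python
def evaluate_chain(rules, context):
    """first_applicable over one chain: first matching rule's action, else None."""
    for condition, action in rules:                   # rules in sequence order
        if condition(context):
            return action
    return None                                       # no match in this chain

def evaluate_request(user_chain, org_chain, context):
    """User chain first; a terminal result there skips the org chain entirely."""
    if user_chain is not None:
        result = evaluate_chain(user_chain, context)
        if result is not None:
            return result
    return evaluate_chain(org_chain, context) or "allow"   # default: ALLOW
```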

Before deploying policy changes to production, you can test them against a simulated request.

POST /api/admin/policy-engine/simulate

Request body:

{
  "prompt": "Please send me the credit card number on file for customer 12345.",
  "model_id": "gpt-4o",
  "user_id": "user_abc",
  "direction": "input",
  "chain_id": "org"
}

Response 200 OK:

{
  "outcome": "block",
  "matched_rule": {
    "id": "rule_xyz",
    "name": "Block PAN in prompts",
    "pack": "PCI-DSS v4.0 Compliance Bundle",
    "condition_type": "dlp_label",
    "action": "block"
  },
  "evaluation_trace": [
    {
      "rule_id": "rule_aaa",
      "name": "Finance power user exemption",
      "matched": false,
      "reason": "User not in group finance-power-users"
    },
    {
      "rule_id": "rule_xyz",
      "name": "Block PAN in prompts",
      "matched": true,
      "reason": "DLP detected label PCI_PAN"
    }
  ],
  "dry_run": true
}

The evaluation_trace lists every rule evaluated in sequence order, whether it matched, and why. This is the primary debugging tool for unexpected policy behavior.

Using dry_run to test before deploying:

  1. Create or modify rules on a staging chain.
  2. Run simulate with representative test cases.
  3. Verify the outcome and evaluation_trace match your intent.
  4. Promote the chain configuration to the active org chain.

Arbitex ships pre-configured policy packs (bundles) for common compliance standards. Bundles are read-only and updated by Arbitex when standards change.

Bundle         Standard                                  Default rule count
pci-dss-v4     PCI-DSS v4.0                              14
hipaa          HIPAA Privacy + Security Rules            11
gdpr           GDPR Article 5 data minimization          9
glba           Gramm–Leach–Bliley Act                    7
sox            Sarbanes-Oxley Act                        6
ccpa           California Consumer Privacy Act           8
sec-17a-4      SEC Rule 17a-4 (broker-dealer records)    5
occ-sr-11-7    OCC SR 11-7 model risk management         8

To add a bundle to your org chain:

POST /api/admin/policy-chains/{chain_id}/packs
{
  "pack_id": "pci-dss-v4",
  "sequence": 100
}

Create a custom pack to group org-specific rules:

POST /api/admin/policy-packs
{
  "name": "Internal Data Classification Policy",
  "description": "Enforces Acme Corp data classification rules for AI interactions",
  "pack_type": "custom"
}

Add rules to the pack via POST /api/admin/policy-packs/{pack_id}/rules.

Packs within a chain are evaluated in ascending sequence order. Convention:

Sequence range   Purpose
1–99             High-priority user exemptions or overrides
100–299          Compliance bundles (PCI-DSS, HIPAA, etc.)
300–499          Custom org rules
500–699          Model governance rules (model_risk_tier, model_id)
700–899          Content policy rules
900–999          Default catch-all rules

This ordering ensures compliance bundles fire before custom rules, which fire before model governance rules. Adjust to suit your org’s risk posture.


All conditions and actions support a direction field that specifies when the rule is evaluated:

Direction   Evaluated when
input       Before the prompt is sent to the model
output      Before the model response is returned to the user
both        On both prompt and response

Most blocking rules target input to prevent sensitive content from reaching the model. Redaction and DLP detection rules often target both.