Policy Engine rule testing — advanced guide
The Policy Engine ships with two testing tools: the PolicySimulator (part of the admin UI) and the DLP evaluate endpoint (POST /api/admin/dlp-rules/evaluate). Together they let you validate the full enforcement pipeline before rules go live.
- PolicySimulator — tests policy rule chain evaluation with a synthetic request (user, group, provider, model, prompt). Use it to verify that rule conditions fire correctly and that chain ordering produces the expected outcome.
- Evaluate endpoint — simulates DLP rule chain evaluation for an org/group context and returns a decision trace. Use it to verify DLP findings before those findings are passed to the Policy Engine.
This guide covers the advanced testing patterns that go beyond single-rule validation: multi-pack chain evaluation, testing ROUTE_TO rules, interpreting combined DLP + policy outcomes, and reproducing production decisions in a test context.
Prerequisites
- Admin role
- At least one policy pack in the chain
- API access for evaluate endpoint tests (an `arb_live_*` admin token or equivalent)
PolicySimulator — deep dive
What the simulator evaluates
The PolicySimulator runs a synthetic request through the complete policy evaluation pipeline:
- The DLP pipeline runs on the supplied prompt (Tier 1 regex → Tier 2 NER → Tier 3 DeBERTa, depending on your org config)
- The full org policy chain is evaluated against the request context and the DLP findings
- The simulator returns the exact outcome that would have been produced on a live request
The result is deterministic — the same inputs always produce the same result. Changes you make to rules take effect immediately in the simulator (no deployment step required).
Constructing representative test cases
For each policy pack you want to test, build a matrix of cases that covers:
- Positive cases — requests that should match the rule and produce the expected action
- Negative cases — requests that should not match and should pass through (or be caught by a later rule)
- Boundary cases — requests at the edge of a condition (e.g., confidence threshold exactly at 0.85, a user in one group but not another)
Example matrix for a rule that blocks OpenAI access for the openai_block group:
| Test | User groups | Provider | Expected outcome | Pass? |
|---|---|---|---|---|
| Block applies | openai_block | openai | BLOCK | — |
| Wrong provider | openai_block | anthropic | ALLOW | — |
| Wrong group | engineering | openai | ALLOW | — |
| Both groups | openai_block, engineering | openai | BLOCK | — |
Run each scenario in the simulator and verify the result against the expected column.
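The matrix above lends itself to table-driven testing. A minimal local sketch of that pattern, where `evaluate` is an illustrative stand-in for the rule's logic (not the product API or the engine's implementation):

```python
# Illustrative stand-in for the example rule: block OpenAI access
# for members of the "openai_block" group, allow everything else.
def evaluate(user_groups, provider):
    if "openai_block" in user_groups and provider == "openai":
        return "BLOCK"
    return "ALLOW"

# The test matrix from the table above: (groups, provider, expected outcome).
matrix = [
    ({"openai_block"},                "openai",    "BLOCK"),  # block applies
    ({"openai_block"},                "anthropic", "ALLOW"),  # wrong provider
    ({"engineering"},                 "openai",    "ALLOW"),  # wrong group
    ({"openai_block", "engineering"}, "openai",    "BLOCK"),  # both groups
]

for groups, provider, expected in matrix:
    actual = evaluate(groups, provider)
    assert actual == expected, f"{groups}/{provider}: got {actual}, want {expected}"
```

The same loop shape works when driving the real simulator by hand: one row per scenario, expected outcome in the last column, and a failure pinpoints exactly which case regressed.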
Testing catch-all rules
A catch-all BLOCK rule (no conditions, sequence=999) should be tested with requests that you expect no earlier rule to match:
```
User: (any user not in any targeted group)
Groups: (empty override)
Provider: openai
Model: gpt-4o
Channel: API
Prompt: "Summarize the quarterly report."
```

Expected outcome: BLOCK if your org has a deny-all posture, ALLOW if permissive.
If the outcome is wrong, check whether an earlier rule’s conditions are unintentionally matching. The match_reason field in the simulator result will tell you which rule fired and why.
Interpreting simulator results
Section titled “Interpreting simulator results”Result fields
```json
{
  "outcome": "BLOCK",
  "matched_pack_id": "pack_01HZ_TRADING_DESK",
  "matched_rule_id": "rule_01HZ_BLOCK_OPENAI",
  "matched_rule_name": "Block OpenAI for no-openai group",
  "matched_sequence": 10,
  "match_reason": "user_groups matched ['no-openai']; provider=openai",
  "action_taken": "BLOCK",
  "message": "OpenAI access is not permitted for your group.",
  "dlp_findings": []
}
```

| Field | What to verify |
|---|---|
| `outcome` | The terminal action. Should match your expected outcome. |
| `matched_pack_id` / `matched_rule_id` | The specific rule that fired. Confirms the right rule matched, not an unintended one. |
| `matched_sequence` | The sequence position of the matching rule. Lower is earlier in the chain. |
| `match_reason` | Human-readable explanation of which conditions matched. Use this to debug unexpected outcomes. |
| `dlp_findings` | DLP entities detected in the prompt. Non-empty when the prompt contains PII/sensitive content. Verify entity types and confidence match your expectations. |
When outcome is ALLOW but you expected a match
Check:
- Conditions not met — read `match_reason` on the rule that should have fired. The simulator only shows the matched rule; if no rule matched, `matched_rule_id` is null. Navigate to the rule in PolicyRuleEditor and verify each condition against the test inputs.
- Earlier rule blocked evaluation — under `first_applicable`, an earlier rule may have fired `ALLOW` (whitelist), stopping evaluation before your target rule was reached. Check the rules at lower sequence numbers.
- `deny_overrides` mode — under `deny_overrides`, any `BLOCK` beats any `ALLOW`. Verify the combining algorithm in PolicyChainEditor.
- Group membership — if the rule uses `user_groups`, confirm the user is actually in those groups. Use the Groups (override) field to force group membership for testing.
When outcome is BLOCK but you expected ALLOW
Read `match_reason` carefully. Common causes:
- A catch-all BLOCK rule at high sequence number fired because no earlier whitelist matched
- A condition on a whitelist rule didn’t match (check entity type, confidence threshold, channel)
- Under `deny_overrides`, a BLOCK rule later in the chain overrode an ALLOW rule
Testing multi-pack chain interactions
Multi-pack chains require testing pack interactions — specifically: which pack fires first and whether earlier packs interfere with later ones.
Scenario: compliance bundle + custom pack
Setup:
- Sequence=1: PCI-DSS compliance bundle (built-in, read-only)
- Sequence=2: Custom pack with an `ALLOW` whitelist for your QA testing group
Test: verify the QA group whitelist overrides the PCI bundle.
```
User: qa_tester
Groups override: qa_testing
Provider: openai
Prompt: "Card 4532-0151-1234-5678 (synthetic test data)"
```

Expected result:
- Under `first_applicable`, the custom pack is at sequence=2, so the PCI bundle (sequence=1) evaluates first. If PCI fires BLOCK first, the whitelist never runs.
- Fix: move the custom pack to sequence=1 to evaluate the whitelist before the compliance bundle.
This is a common ordering mistake. The simulator reveals it immediately.
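The ordering effect can be sketched locally. This is an illustrative model of `first_applicable` semantics only (the pack/rule representation below is invented for the sketch, not the product's data model):

```python
# first_applicable: walk packs in sequence order, return the action of
# the first matching rule, and stop evaluating immediately.
def first_applicable(packs, request):
    for pack in packs:  # packs are already sorted by sequence
        for matches, action in pack:
            if matches(request):
                return action
    return "ALLOW"  # permissive default, for the sketch only

# Each pack is a list of (predicate, action) pairs.
pci_bundle   = [(lambda r: "credit_card" in r["entities"], "BLOCK")]
qa_whitelist = [(lambda r: "qa_testing" in r["groups"], "ALLOW")]

request = {"groups": {"qa_testing"}, "entities": {"credit_card"}}

# PCI bundle first: its BLOCK fires before the whitelist is ever reached.
assert first_applicable([pci_bundle, qa_whitelist], request) == "BLOCK"
# Whitelist first: its ALLOW fires and evaluation stops there.
assert first_applicable([qa_whitelist, pci_bundle], request) == "ALLOW"
```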
Scenario: testing deny_overrides
With `deny_overrides` as the combining algorithm, any BLOCK in the chain beats any ALLOW regardless of sequence:
```
Combined algorithm: deny_overrides
Pack A (sequence=1): ALLOW rule for group "finance"
Pack B (sequence=2): BLOCK rule for entity_type "credit_card"
```

Test case:
```
User groups: finance
Prompt: "Card 4532-0151-1234-5678"
```

Under `deny_overrides`, the BLOCK from Pack B wins even though Pack A’s ALLOW fired first. The simulator outcome will be BLOCK. If you expected ALLOW, switch to `first_applicable` or restructure the rules.
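The difference between the two combining algorithms reduces to how a list of matching decisions is collapsed. A toy model (illustrative only):

```python
# deny_overrides: any BLOCK among the matching rules wins,
# regardless of where it sits in the chain.
def deny_overrides(decisions):
    return "BLOCK" if "BLOCK" in decisions else "ALLOW"

# first_applicable: the earliest matching rule's action wins.
def first_applicable(decisions):
    return decisions[0] if decisions else "ALLOW"

# Pack A (seq=1) fired ALLOW for the finance group;
# Pack B (seq=2) fired BLOCK on the credit_card entity.
decisions = ["ALLOW", "BLOCK"]

assert deny_overrides(decisions) == "BLOCK"    # later BLOCK overrides
assert first_applicable(decisions) == "ALLOW"  # first match wins
```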
Testing ROUTE_TO rules
Section titled “Testing ROUTE_TO rules”ROUTE_TO overrides the destination model at routing time. Test that it fires under the right conditions and routes to the correct target.
Tier-based routing
Section titled “Tier-based routing”Rule: intent_complexity = "simple" → ROUTE_TO tier="haiku"Test with a short, simple prompt:
```
Prompt: "What is 2 + 2?"
Expected outcome: ROUTE_TO
redacted_prompt: (not shown for ROUTE_TO)
```

Verify `action_taken: "ROUTE_TO"` in the simulator result. The simulator does not reveal the resolved model (routing is determined at request execution time), but the action confirms the rule fires.
Provider-targeted routing
Section titled “Provider-targeted routing”Rule: user_groups: ["cost-sensitive"] → ROUTE_TO model="gpt-4o-mini"Test both sides:
- User in `cost-sensitive` group → expected `ROUTE_TO`
- User not in `cost-sensitive` group → expected `ALLOW` (no routing override)
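Both sides of this test can be captured in one sketch. The function below is a local illustration of the rule's intended behavior, not the engine's routing code:

```python
# Group-targeted ROUTE_TO rule: members of the cost-sensitive group
# get a model override; all other users pass through unchanged.
def routing_decision(user_groups):
    if "cost-sensitive" in user_groups:
        return {"action": "ROUTE_TO", "model": "gpt-4o-mini"}
    return {"action": "ALLOW"}

# Positive side: the override fires.
assert routing_decision({"cost-sensitive"})["action"] == "ROUTE_TO"
# Negative side: no routing override for other groups.
assert routing_decision({"engineering"})["action"] == "ALLOW"
```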
Combined DLP + policy evaluation
Section titled “Combined DLP + policy evaluation”For rules that depend on DLP findings (entity types, confidence thresholds), you need both the DLP pipeline and the policy chain to produce the expected outcome. The PolicySimulator runs both — but when debugging, it helps to isolate them.
Step 1 — verify DLP findings with the evaluate endpoint
Before testing the full chain, confirm the DLP pipeline detects the entities your policy rule expects:
```shell
curl -s -X POST https://api.arbitex.ai/api/admin/dlp-rules/evaluate \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "My IBAN is GB29NWBK60161331926819",
    "direction": "input",
    "org_id": "'$ORG_ID'"
  }' | jq '{final_action, findings: [.findings[] | {entity_type, confidence, action}]}'
```

Expected output:
```json
{
  "final_action": "block",
  "findings": [
    { "entity_type": "iban", "confidence": 1.0, "action": "block" }
  ]
}
```

If the entity type you expect is not in `findings`, either the DLP rule that should detect it is not active, its pattern doesn’t match, or the confidence threshold isn’t met. Debug in the DLP rule test panel first.
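When scripting these checks, you can assert on the response programmatically rather than eyeballing the output. A sketch using the sample payload above inline (assumes the response shape shown, nothing more):

```python
import json

# Sample response body from the evaluate endpoint (shape as shown above).
response = json.loads("""
{
  "final_action": "block",
  "findings": [
    {"entity_type": "iban", "confidence": 1.0, "action": "block"}
  ]
}
""")

# Assert the pipeline detected the entity the policy rule depends on.
detected = {f["entity_type"] for f in response["findings"]}
assert "iban" in detected, f"expected iban, got {detected}"
assert response["final_action"] == "block"
```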
Step 2 — verify the policy rule fires on that entity type
Once you’ve confirmed the DLP pipeline produces the expected findings, test in the PolicySimulator with the same prompt:
```
Prompt: "My IBAN is GB29NWBK60161331926819"
```

Verify that `dlp_findings` in the simulator result shows `entity_type: "iban"` and the policy rule with `entity_types: ["iban"]` fires as expected.
If DLP found the entity but the policy rule didn’t fire, check:
- The rule’s `entity_types` condition lists the exact entity type string
- The `minimum_confidence` condition is at or below the detected confidence (e.g., confidence=0.97 satisfies minimum_confidence=0.85)
- The `applies_to` field matches the direction (input vs output)
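The three checks above can be expressed as a single predicate. This is an illustrative model of the conditions, not the engine’s implementation:

```python
# Does a DLP finding satisfy a policy rule's DLP-related conditions?
def rule_fires(rule, finding, direction):
    return (
        finding["entity_type"] in rule["entity_types"]           # exact string match
        and finding["confidence"] >= rule["minimum_confidence"]  # at or above threshold
        and direction in rule["applies_to"]                      # input vs output
    )

rule = {"entity_types": ["iban"], "minimum_confidence": 0.85, "applies_to": ["input"]}
finding = {"entity_type": "iban", "confidence": 0.97}

assert rule_fires(rule, finding, "input")       # 0.97 satisfies 0.85
assert not rule_fires(rule, finding, "output")  # direction mismatch
```

Note the `>=` in the confidence check: a detected confidence exactly equal to the threshold still fires, which is worth covering as a boundary case in your test matrix.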
Reproducing a production decision
Section titled “Reproducing a production decision”When an admin reports an unexpected block or allow in production, use the audit log entry to reproduce the decision in the simulator.
From the audit log entry
Every audit entry includes:
- `user_id` — who made the request
- `matched_pack_id`, `matched_rule_id` — what fired
- `match_reason` — why it fired
- `entity_types_detected` — what DLP found
To reproduce:
- Copy the `user_id` into the simulator’s User field
- Use the `entity_types_detected` from the audit entry to craft a prompt that would generate those entities
- Set the same `provider` and `model` from the audit entry
- Run the simulator and compare the outcome and `matched_rule_id` to the audit entry
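The mapping from audit entry to simulator inputs is mechanical. A sketch of the transformation, where the audit field names come from the list above but the simulator input shape is assumed for illustration:

```python
# Build simulator inputs from an audit log entry. The output keys are
# hypothetical; adjust them to match your simulator form fields.
def to_simulator_input(audit_entry, reproduction_prompt):
    return {
        "user": audit_entry["user_id"],
        "provider": audit_entry["provider"],
        "model": audit_entry["model"],
        # Craft the prompt yourself so it triggers the same entities
        # listed in entity_types_detected:
        "prompt": reproduction_prompt,
        "expected_rule": audit_entry["matched_rule_id"],
    }

entry = {
    "user_id": "u_123",
    "provider": "openai",
    "model": "gpt-4o",
    "matched_rule_id": "rule_01HZ_BLOCK_OPENAI",
    "entity_types_detected": ["iban"],
}
sim = to_simulator_input(entry, "My IBAN is GB29NWBK60161331926819")
assert sim["expected_rule"] == "rule_01HZ_BLOCK_OPENAI"
```

After running the simulator with these inputs, compare its `matched_rule_id` against `expected_rule` to confirm the reproduction matches the production decision.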
If the simulator outcome differs from the production outcome, the most likely cause is a rule change between the time of the original request and the reproduction. Check the rule’s modification history by comparing the current rule configuration to what was active when the request was made.
See also
- Policy Engine — admin guide — step-by-step: create packs, rules, and configure the chain
- Policy Engine overview — evaluation model, combining algorithms, action semantics
- DLP pipeline testing guide — DLP evaluate endpoint reference
- DLP rule testing — admin workflow — compliance validation test scenarios
- Policy Rule Reference — all condition fields and action types