DLP pipeline testing guide
The Arbitex admin UI provides two tools for testing DLP behavior before deploying rule changes:
- Test panel — validates a single rule’s pattern against sample text, showing highlighted match regions and the action that would fire.
- Evaluate endpoint — simulates the full DLP rule chain against sample text with an org/group/user context, returning the complete action trace.
Use the test panel during rule authoring to iterate quickly on patterns. Use the evaluate endpoint to verify how the full rule chain behaves for a specific user or group, including compliance bundle overrides and confidence thresholds.
Test panel — single rule test
The test panel is embedded in the DLP rule editor (Admin > DLP Rules > select or create a rule). It appears as a collapsible “Test Rule” section at the bottom of the rule form.
Opening the panel
- Navigate to Admin > DLP Rules.
- Select an existing rule to edit, or create a new rule.
- Scroll to the bottom of the rule form and click Test Rule to expand the panel.
The panel uses the rule’s current configuration — pattern, detector type, action tier, and entity type — from the form fields above it. You do not need to save the rule first; the panel tests the in-progress configuration.
Entering sample text
Enter sample text in the Sample Text textarea. The text is sent to the backend test endpoint and is not stored or logged. Include both text that the rule should match and text that it should not match.
Click Run Test when ready. The button is disabled if the pattern field or sample text is empty.
Interpreting results
After the test completes, the panel shows:
| Element | Description |
|---|---|
| Match count badge | Total number of non-overlapping match regions found. Amber if matches found, green if no matches. |
| Entity type chips | Entity types detected (cyan chips). Populated from match results or from the rule’s entity type field when no matches have an entity type. |
| Action badge | The action tier that would fire for this rule (block = red, redact = orange, log_only = blue, prompt = purple). |
| Highlighted text | Sample text rendered with color-coded highlights over matched regions. Colors follow the action tier. |
| No matches message | Shown when match_count is 0. |
Overlapping regions are merged — if two patterns match adjacent or overlapping spans, the highlighted region shows the union span.
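
The merge described above behaves like a standard interval union. The following is a minimal sketch of that idea, not the Arbitex implementation; the function name `merge_spans` and the half-open `(start, end)` convention are assumptions made for illustration.

```python
def merge_spans(spans):
    """Merge overlapping or adjacent half-open (start, end) spans."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous span: extend it to the union.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical spans: the first three overlap or touch, the last stands alone.
print(merge_spans([(14, 25), (20, 30), (30, 35), (40, 45)]))
```

With these inputs the first three spans collapse into one highlighted region and the fourth stays separate.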
Detector type behavior
The test endpoint supports three detector types, selected from the rule form:
| Detector type | Backend runs | Pattern field usage |
|---|---|---|
| regex | Regex pattern match | Pattern must be a valid Python/RE2 regex |
| ner | Presidio NER | Pattern is a comma-separated list of Presidio entity types (e.g., PERSON,EMAIL_ADDRESS) |
| llm (GLiNER) | GLiNER zero-shot NER | Pattern is a comma-separated list of entity labels in natural language (e.g., person name,email address,phone number) |
The frontend maps llm to gliner when sending the request to the backend.
API reference — test endpoint

    POST /api/admin/dlp-rules/test
    Authorization: Bearer <admin-token>
    Content-Type: application/json

Request body:

    {
      "pattern": "\\b\\d{3}-\\d{2}-\\d{4}\\b",
      "sample_text": "Customer SSN: 123-45-6789 and backup 987-65-4321",
      "rule_type": "regex"
    }

| Field | Type | Required | Description |
|---|---|---|---|
| pattern | string | Yes | Regex pattern, Presidio entity list, or GLiNER label list |
| sample_text | string | Yes | Text to test against the pattern |
| rule_type | string | Yes | regex, ner, or gliner |
Response body:

    {
      "matches": [
        {
          "start": 14,
          "end": 25,
          "matched_text": "123-45-6789",
          "entity_type": null,
          "action": "log_only"
        },
        {
          "start": 37,
          "end": 48,
          "matched_text": "987-65-4321",
          "entity_type": null,
          "action": "log_only"
        }
      ],
      "match_count": 2,
      "rule_type": "regex",
      "valid_pattern": true,
      "error": null
    }

| Field | Type | Description |
|---|---|---|
| matches | array | Match regions, each with start, end, matched_text, entity_type, action |
| match_count | int | Total matches found |
| rule_type | string | Echoed from request |
| valid_pattern | bool | false if the pattern could not be compiled or is unsafe (ReDoS-prone) |
| error | string or null | Error message when valid_pattern is false or an internal error occurred |
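
When scripting against this endpoint, it can help to confirm that each returned span actually covers its matched_text. The sketch below is illustrative only: the helper name `check_matches` is hypothetical, and it assumes 0-based offsets with an exclusive end (as in Python slicing) and that match_count equals the length of matches.

```python
def check_matches(sample_text, response):
    """Return a list of problems found in a test-endpoint response."""
    problems = []
    if not response.get("valid_pattern", False):
        problems.append(f"invalid pattern: {response.get('error')}")
        return problems
    matches = response.get("matches", [])
    if response.get("match_count") != len(matches):
        problems.append("match_count does not equal len(matches)")
    for m in matches:
        # Assumption: offsets are 0-based and the end offset is exclusive.
        if sample_text[m["start"]:m["end"]] != m["matched_text"]:
            problems.append(
                f"span {m['start']}-{m['end']} does not cover {m['matched_text']!r}"
            )
    return problems


sample = "Customer SSN: 123-45-6789"
response = {
    "matches": [
        {"start": 14, "end": 25, "matched_text": "123-45-6789",
         "entity_type": None, "action": "log_only"}
    ],
    "match_count": 1,
    "rule_type": "regex",
    "valid_pattern": True,
    "error": None,
}
print(check_matches(sample, response))  # an empty list means no problems
```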
Error responses:
| HTTP status | Cause |
|---|---|
| 422 Unprocessable Entity | Missing required fields or invalid rule_type value |
| 401 Unauthorized | Missing or expired token |
| 403 Forbidden | Authenticated user does not have admin role |
The endpoint is a dry-run — no rule is saved and no audit log entry is written.
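
For scripted testing, a request to the test endpoint could be assembled with the Python standard library as sketched below. The host name and token are placeholders, and the helper `build_test_request` is hypothetical. The alias table mirrors the llm to gliner mapping that the frontend applies. The request is built but not sent.

```python
import json
import urllib.request

# The rule form says "llm"; the backend expects "gliner".
RULE_TYPE_ALIASES = {"llm": "gliner"}


def build_test_request(base_url, token, pattern, sample_text, rule_type):
    """Build (but do not send) a POST request for the rule test endpoint."""
    body = {
        "pattern": pattern,
        "sample_text": sample_text,
        "rule_type": RULE_TYPE_ALIASES.get(rule_type, rule_type),
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/admin/dlp-rules/test",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_test_request(
    "https://arbitex.example.com",  # hypothetical host
    "ADMIN_TOKEN",                  # placeholder token
    r"\b\d{3}-\d{2}-\d{4}\b",
    "Customer SSN: 123-45-6789",
    "regex",
)
print(req.get_method(), req.full_url)
```

To send the request, pass it to `urllib.request.urlopen` and decode the JSON response body.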
Evaluate endpoint — full chain simulation
The evaluate endpoint runs the complete DLP rule chain against sample text with an org/group/user context. This simulates the exact pipeline that runs on live traffic: all enabled DLP rules fire in sequence, compliance bundle overrides are applied, and the full action trace is returned.
Use the evaluate endpoint to:
- Confirm which rules fire on a specific text before a policy change goes live.
- Debug why a user or group is (or is not) seeing DLP blocks.
- Verify that bundle-level action overrides are applied correctly.
API reference — evaluate endpoint

    POST /api/admin/dlp-rules/evaluate
    Authorization: Bearer <admin-token>
    Content-Type: application/json

Request body:

    {
      "text": "Hello, my SSN is 123-45-6789 and my IBAN is GB29NWBK60161331926819",
      "direction": "input",
      "org_id": "550e8400-e29b-41d4-a716-446655440000",
      "group_id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
      "user_id": "6ba7b814-9dad-11d1-80b4-00c04fd430c8"
    }

| Field | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Text to evaluate through the full DLP chain |
| direction | string | No | input (default) or output; controls which rule directions are evaluated |
| org_id | UUID string | No | Org context for compliance bundle resolution |
| group_id | UUID string | No | Group context for bundle overrides |
| user_id | UUID string | No | User context |
Response body:

    {
      "findings": [
        {
          "entity_type": "ssn",
          "matched_text": "123-45-6789",
          "start": 17,
          "end": 28,
          "confidence": 0.97,
          "tier": "regex",
          "action": "redact",
          "rule_id": "rule-uuid-ssn",
          "rule_name": "SSN detector",
          "bundle_override": null
        },
        {
          "entity_type": "iban",
          "matched_text": "GB29NWBK60161331926819",
          "start": 44,
          "end": 66,
          "confidence": 1.0,
          "tier": "regex",
          "action": "block",
          "rule_id": "rule-uuid-iban",
          "rule_name": "IBAN detector",
          "bundle_override": "gdpr-bundle"
        }
      ],
      "action_trace": [
        { "rule_id": "rule-uuid-ssn", "fired": true, "action": "redact", "confidence": 0.97 },
        { "rule_id": "rule-uuid-iban", "fired": true, "action": "block", "confidence": 1.0 }
      ],
      "final_action": "block",
      "evaluated_at": "2026-03-10T14:00:00Z"
    }

| Field | Type | Description |
|---|---|---|
| findings | array | All DLP findings with entity type, span, confidence, tier, action, and bundle override info |
| action_trace | array | Ordered list of rules that fired, with per-rule action and confidence |
| final_action | string | The highest-severity action across all findings (block > redact > prompt > log_only) |
| evaluated_at | ISO 8601 | Timestamp of the evaluation |
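
The severity ordering for final_action can be expressed as a simple ranking. This is a sketch of the documented ordering (block > redact > prompt > log_only), not the backend code; the function name `resolve_final_action` is hypothetical, and returning None when there are no findings is an assumption, since the source does not say what final_action is in that case.

```python
# Higher number means higher severity, per the documented ordering.
SEVERITY = {"log_only": 0, "prompt": 1, "redact": 2, "block": 3}


def resolve_final_action(findings):
    """Return the highest-severity action across findings, or None if empty."""
    actions = [f["action"] for f in findings]
    return max(actions, key=SEVERITY.__getitem__, default=None)


findings = [
    {"entity_type": "ssn", "action": "redact"},
    {"entity_type": "iban", "action": "block"},
]
print(resolve_final_action(findings))
```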
Common test scenarios
SSN in context
Tests whether a regex SSN pattern correctly identifies an SSN within surrounding text (not just isolated digits).
Sample text:

    Employee record update: SSN 123-45-6789 processed.
    Note: previous ID 000-00-0000 was a placeholder.

Expected behavior: Rule fires on 123-45-6789. The pattern \b\d{3}-\d{2}-\d{4}\b also matches 000-00-0000. If you want to exclude known invalid SSNs, add a negative lookahead to the pattern before saving the rule.
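
As an illustration of the negative lookahead approach, the sketch below excludes only the all-zero placeholder. It uses Python's `re` module, which supports lookaheads; RE2 does not, so confirm which engine your deployment uses before relying on this form. The pattern is an example, not a shipped Arbitex rule.

```python
import re

# Base SSN shape, with a lookahead that rejects the 000-00-0000 placeholder.
SSN_RE = re.compile(r"\b(?!000-00-0000\b)\d{3}-\d{2}-\d{4}\b")

sample_text = (
    "Employee record update: SSN 123-45-6789 processed.\n"
    "Note: previous ID 000-00-0000 was a placeholder."
)

print(SSN_RE.findall(sample_text))
```

Paste the same pattern into the test panel with the sample text above to confirm that only 123-45-6789 is highlighted.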
Credit card formats
Credit card numbers appear in several formats. Test multiple formats to confirm your pattern covers all variants:
Sample text:

    Card ending 4111111111111111
    Formatted: 4111-1111-1111-1111
    With spaces: 4111 1111 1111 1111

A Luhn-validated regex pattern should match all three formats. If it misses a format, refine the pattern in the test panel before saving.
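
A regex alone cannot compute a Luhn checksum, so a Luhn-validated detector typically pairs a format pattern with a checksum step. The sketch below shows one way to do that locally; the pattern (16 digits in groups of four with optional space or hyphen separators) and the helper names are assumptions, not the product's built-in detector.

```python
import re

# Assumed format: four groups of four digits, optionally separated by space or hyphen.
CARD_RE = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")


def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            # Double every second digit from the right; subtract 9 if above 9.
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return bool(digits) and total % 10 == 0


def find_cards(text):
    """Return candidate card numbers that match the format and pass Luhn."""
    return [m.group(0) for m in CARD_RE.finditer(text) if luhn_valid(m.group(0))]


sample_text = (
    "Card ending 4111111111111111\n"
    "Formatted: 4111-1111-1111-1111\n"
    "With spaces: 4111 1111 1111 1111"
)
print(find_cards(sample_text))
```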
Multi-entity bundle
Use the evaluate endpoint to verify a compliance bundle that covers multiple entity types:
Request:

    {
      "text": "Contact: john@example.com, SSN 987-65-4321, IBAN GB29NWBK60161331926819",
      "direction": "input",
      "group_id": "<gdpr-group-id>"
    }

Verify in the response:
- All three entity types appear in findings.
- bundle_override shows gdpr-bundle for the IBAN finding if the GDPR compliance bundle escalates IBAN to block.
- final_action is block if any finding resolves to block.
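
The three checks above can be automated against a captured response. The helper below is a hypothetical sketch: `verify_bundle_response` is not part of any Arbitex SDK, the example response is abbreviated, and the entity type name "email" is an assumed label for the email finding.

```python
def verify_bundle_response(response, expected_types, escalated_type, bundle_name):
    """Return a list of failed checks; an empty list means all checks passed."""
    failures = []
    findings = response.get("findings", [])
    found = {f["entity_type"] for f in findings}
    for entity_type in expected_types:
        if entity_type not in found:
            failures.append(f"no finding for {entity_type}")
    for f in findings:
        if f["entity_type"] == escalated_type and f.get("bundle_override") != bundle_name:
            failures.append(f"{escalated_type} finding lacks bundle_override {bundle_name}")
    blocked = any(f["action"] == "block" for f in findings)
    if blocked and response.get("final_action") != "block":
        failures.append("a finding resolved to block but final_action is not block")
    return failures


# Abbreviated, hypothetical response for the request above.
example = {
    "findings": [
        {"entity_type": "email", "action": "redact", "bundle_override": None},
        {"entity_type": "ssn", "action": "redact", "bundle_override": None},
        {"entity_type": "iban", "action": "block", "bundle_override": "gdpr-bundle"},
    ],
    "final_action": "block",
}
print(verify_bundle_response(example, ["email", "ssn", "iban"], "iban", "gdpr-bundle"))
```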
NER detector test (Presidio)
Test a Presidio NER rule for PERSON and EMAIL_ADDRESS:
Pattern field (in rule form): PERSON,EMAIL_ADDRESS
Sample text: Please email Jane Smith at jane.smith@acme.com for approval.
The NER detector recognizes Jane Smith as PERSON and jane.smith@acme.com as EMAIL_ADDRESS. Both should appear in the match results with their entity types.
GLiNER (zero-shot) test
GLiNER uses natural-language labels, not Presidio entity codes:
Pattern field: passport number,social security number,bank account
Sample text: Passport: AB123456, SSN: 555-12-3456
GLiNER should detect both. Confidence scores vary — if the default confidence threshold (0.70) is too aggressive, adjust confidence_threshold on the rule before saving.
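
To see how a threshold change would affect results before editing the rule, you can replay captured detections against candidate thresholds. This is a sketch under assumptions: the scores are invented for illustration, and it assumes a detection is kept when its confidence is at or above the threshold.

```python
def apply_threshold(detections, threshold=0.70):
    """Split detections into (kept, dropped) for a given confidence threshold."""
    kept = [d for d in detections if d["confidence"] >= threshold]
    dropped = [d for d in detections if d["confidence"] < threshold]
    return kept, dropped


# Hypothetical GLiNER scores for the sample text above.
detections = [
    {"label": "passport number", "matched_text": "AB123456", "confidence": 0.91},
    {"label": "social security number", "matched_text": "555-12-3456", "confidence": 0.62},
]

for threshold in (0.70, 0.60):
    kept, dropped = apply_threshold(detections, threshold)
    print(threshold, [d["label"] for d in kept], [d["label"] for d in dropped])
```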
Audit notes
The test panel and evaluate endpoint are dry-run operations. No audit log entries are written for test runs. Audit entries are only written when the DLP pipeline processes live traffic. If you need a record of a test run for compliance purposes, capture the response body and store it externally.
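
One way to keep such a record is to write each captured response to a timestamped JSON file. The sketch below is illustrative; the function name `save_test_record`, the file naming scheme, and the record layout are all assumptions, and the storage location should follow your own retention policy.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def save_test_record(response, directory, label):
    """Write a captured response body to a timestamped JSON file and return its path."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = directory / f"{label}-{stamp}.json"
    record = {"captured_at": stamp, "label": label, "response": response}
    path.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
    return path


# Demonstration only: a temporary directory stands in for real external storage.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_test_record({"final_action": "block"}, tmp, "gdpr-bundle-check")
    print(saved.name)
```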