DLP pipeline testing guide

The Arbitex admin UI provides two tools for testing DLP behavior before deploying rule changes:

  • Test panel — validates a single rule’s pattern against sample text, showing highlighted match regions and the action that would fire.
  • Evaluate endpoint — simulates the full DLP rule chain against sample text with an org/group/user context, returning the complete action trace.

Use the test panel during rule authoring to iterate quickly on patterns. Use the evaluate endpoint to verify how the full rule chain behaves for a specific user or group, including compliance bundle overrides and confidence thresholds.


The test panel is embedded in the DLP rule editor (Admin > DLP Rules > select or create a rule). It appears as a collapsible “Test Rule” section at the bottom of the rule form.

  1. Navigate to Admin > DLP Rules.
  2. Select an existing rule to edit, or create a new rule.
  3. Scroll to the bottom of the rule form and click Test Rule to expand the panel.

The panel uses the rule’s current configuration — pattern, detector type, action tier, and entity type — from the form fields above it. You do not need to save the rule first; the panel tests the in-progress configuration.

Enter sample text in the Sample Text textarea. The text is sent to the backend test endpoint and is not stored or logged. Include both text that the rule should match and text that it should not match, so that the same run shows true matches and false positives.

Click Run Test when ready. The button is disabled if the pattern field or sample text is empty.

After the test completes, the panel shows:

| Element | Description |
| --- | --- |
| Match count badge | Total number of non-overlapping match regions found. Amber if matches found, green if no matches. |
| Entity type chips | Entity types detected (cyan chips). Populated from match results or from the rule’s entity type field when no matches have an entity type. |
| Action badge | The action tier that would fire for this rule (block = red, redact = orange, log_only = blue, prompt = purple). |
| Highlighted text | Sample text rendered with color-coded highlights over matched regions. Colors follow the action tier. |
| No matches message | Shown when match_count is 0. |

Overlapping regions are merged — if two patterns match adjacent or overlapping spans, the highlighted region shows the union span.
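
The merge described above can be sketched as a standard interval merge. This is a minimal illustration and not the panel's actual code: the function name `merge_spans` is hypothetical, and it assumes spans are (start, end) pairs with an exclusive end, so two spans count as adjacent when one ends exactly where the next begins.

```python
def merge_spans(spans):
    """Merge overlapping or adjacent (start, end) spans; end is exclusive."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous span: extend it to the union.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# (14, 25) and (20, 30) overlap, (30, 34) touches the result, (40, 45) is separate.
print(merge_spans([(14, 25), (20, 30), (30, 34), (40, 45)]))
# prints [(14, 34), (40, 45)]
```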

The test endpoint supports three detector types, selected from the rule form:

| Detector type | Backend runs | Pattern field usage |
| --- | --- | --- |
| regex | Regex pattern match | Pattern must be a valid Python/RE2 regex |
| ner | Presidio NER | Pattern is a comma-separated list of Presidio entity types (e.g., PERSON,EMAIL_ADDRESS) |
| llm (GLiNER) | GLiNER zero-shot NER | Pattern is a comma-separated list of entity labels in natural language (e.g., person name,email address,phone number) |

The frontend maps llm to gliner when sending the request to the backend, so callers that use the API directly pass gliner as the rule_type.
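
A script that mirrors the rule form could apply the same mapping before calling the API. The sketch below is an assumption-laden illustration: the helper names are hypothetical, it assumes regex and ner are sent unchanged, and it assumes that trimming whitespace around comma-separated items is harmless (the guide does not say how the backend treats stray spaces).

```python
# Assumption: only llm is renamed; regex and ner pass through unchanged.
UI_TO_API_RULE_TYPE = {"regex": "regex", "ner": "ner", "llm": "gliner"}


def to_api_rule_type(ui_detector_type):
    """Translate the rule form's detector type into the API's rule_type."""
    try:
        return UI_TO_API_RULE_TYPE[ui_detector_type]
    except KeyError:
        raise ValueError(f"unknown detector type: {ui_detector_type!r}") from None


def split_pattern_list(pattern):
    """Split a comma-separated ner/gliner pattern into trimmed, non-empty items."""
    return [item.strip() for item in pattern.split(",") if item.strip()]


print(to_api_rule_type("llm"))
print(split_pattern_list("person name,email address,phone number"))
```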

POST /api/admin/dlp-rules/test
Authorization: Bearer <admin-token>
Content-Type: application/json

Request body:

{
  "pattern": "\\b\\d{3}-\\d{2}-\\d{4}\\b",
  "sample_text": "Customer SSN: 123-45-6789 and backup 987-65-4321",
  "rule_type": "regex"
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| pattern | string | Yes | Regex pattern, Presidio entity list, or GLiNER label list |
| sample_text | string | Yes | Text to test against the pattern |
| rule_type | string | Yes | regex, ner, or gliner |
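
As a rough sketch of how a script might assemble this request, the following uses only the Python standard library. The base URL `https://arbitex.example.com` and the function name `build_test_request` are placeholders invented for illustration; the path, headers, and body fields come from the request description above. The request is only built here, not sent.

```python
import json
import urllib.request

ALLOWED_RULE_TYPES = {"regex", "ner", "gliner"}


def build_test_request(base_url, token, pattern, sample_text, rule_type):
    """Build (but do not send) a POST to the rule test endpoint."""
    if rule_type not in ALLOWED_RULE_TYPES:
        raise ValueError(f"rule_type must be one of {sorted(ALLOWED_RULE_TYPES)}")
    body = json.dumps(
        {"pattern": pattern, "sample_text": sample_text, "rule_type": rule_type}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/admin/dlp-rules/test",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# A raw string keeps the regex readable; json.dumps adds the JSON escaping.
request = build_test_request(
    "https://arbitex.example.com",  # hypothetical host
    "ADMIN_TOKEN",                  # placeholder token
    r"\b\d{3}-\d{2}-\d{4}\b",
    "Customer SSN: 123-45-6789 and backup 987-65-4321",
    "regex",
)
print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without network access.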

Response body:

{
  "matches": [
    {
      "start": 14,
      "end": 25,
      "matched_text": "123-45-6789",
      "entity_type": null,
      "action": "log_only"
    },
    {
      "start": 37,
      "end": 48,
      "matched_text": "987-65-4321",
      "entity_type": null,
      "action": "log_only"
    }
  ],
  "match_count": 2,
  "rule_type": "regex",
  "valid_pattern": true,
  "error": null
}

| Field | Type | Description |
| --- | --- | --- |
| matches | array | Match regions, each with start, end, matched_text, entity_type, action |
| match_count | int | Total matches found |
| rule_type | string | Echoed from request |
| valid_pattern | bool | false if the pattern could not be compiled or is unsafe (ReDoS-prone) |
| error | string or null | Error message when valid_pattern: false or an internal error occurred |
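
When consuming this response in a script, it can help to confirm that each reported span really covers its matched_text. The sketch below assumes offsets are 0-based character positions with an exclusive end, which is what the first match in the example suggests. The function name `check_match_spans` is hypothetical, and the response is a local stand-in built with Python's `re` module, not output from the backend.

```python
import re


def check_match_spans(sample_text, response):
    """Return a list of problems found in a test-endpoint style response."""
    if not response.get("valid_pattern", False):
        return [f"pattern rejected: {response.get('error')}"]
    problems = []
    for match in response.get("matches", []):
        start, end = match["start"], match["end"]
        if sample_text[start:end] != match["matched_text"]:
            problems.append(
                f"span {start}-{end} covers {sample_text[start:end]!r}, "
                f"not {match['matched_text']!r}"
            )
    return problems


# Stand-in for a real response, computed locally for illustration.
sample = "Customer SSN: 123-45-6789 and backup 987-65-4321"
found = [
    {"start": m.start(), "end": m.end(), "matched_text": m.group(0)}
    for m in re.finditer(r"\b\d{3}-\d{2}-\d{4}\b", sample)
]
response = {
    "matches": found,
    "match_count": len(found),
    "valid_pattern": True,
    "error": None,
}
print(check_match_spans(sample, response))  # prints []
```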

Error responses:

| HTTP status | Cause |
| --- | --- |
| 422 Unprocessable Entity | Missing required fields or invalid rule_type value |
| 401 Unauthorized | Missing or expired token |
| 403 Forbidden | Authenticated user does not have admin role |

The endpoint is a dry run: no rule is saved and no audit log entry is written.


Evaluate endpoint — full chain simulation

The evaluate endpoint runs the complete DLP rule chain against sample text with an org/group/user context. This simulates the exact pipeline that runs on live traffic: all enabled DLP rules fire in sequence, compliance bundle overrides are applied, and the full action trace is returned.

Use the evaluate endpoint to:

  • Confirm which rules fire on a specific text before a policy change goes live.
  • Debug why a user or group is (or is not) seeing DLP blocks.
  • Verify that bundle-level action overrides are applied correctly.

POST /api/admin/dlp-rules/evaluate
Authorization: Bearer <admin-token>
Content-Type: application/json

Request body:

{
  "text": "Hello, my SSN is 123-45-6789 and my IBAN is GB29NWBK60161331926819",
  "direction": "input",
  "org_id": "550e8400-e29b-41d4-a716-446655440000",
  "group_id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
  "user_id": "6ba7b814-9dad-11d1-80b4-00c04fd430c8"
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| text | string | Yes | Text to evaluate through the full DLP chain |
| direction | string | No | input (default) or output; controls which rule directions are evaluated |
| org_id | UUID string | No | Org context for compliance bundle resolution |
| group_id | UUID string | No | Group context for bundle overrides |
| user_id | UUID string | No | User context |
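
Because only text is required, a client can leave out any context it does not have. The following sketch shows one way to build the body; the function name `build_evaluate_payload` is hypothetical, and validating the IDs client-side with `uuid.UUID` is a convenience choice made here, not something the API is documented to require.

```python
import uuid


def build_evaluate_payload(text, direction="input", org_id=None,
                           group_id=None, user_id=None):
    """Build the evaluate request body, omitting optional fields left unset."""
    if direction not in ("input", "output"):
        raise ValueError("direction must be 'input' or 'output'")
    payload = {"text": text, "direction": direction}
    for key, value in (("org_id", org_id), ("group_id", group_id),
                       ("user_id", user_id)):
        if value is not None:
            # uuid.UUID raises ValueError on malformed IDs before any request is sent.
            payload[key] = str(uuid.UUID(str(value)))
    return payload


payload = build_evaluate_payload(
    "Hello, my SSN is 123-45-6789",
    group_id="6ba7b810-9dad-11d1-80b4-00c04fd430c8",
)
print(payload)
```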

Response body:

{
  "findings": [
    {
      "entity_type": "ssn",
      "matched_text": "123-45-6789",
      "start": 17,
      "end": 28,
      "confidence": 0.97,
      "tier": "regex",
      "action": "redact",
      "rule_id": "rule-uuid-ssn",
      "rule_name": "SSN detector",
      "bundle_override": null
    },
    {
      "entity_type": "iban",
      "matched_text": "GB29NWBK60161331926819",
      "start": 44,
      "end": 66,
      "confidence": 1.0,
      "tier": "regex",
      "action": "block",
      "rule_id": "rule-uuid-iban",
      "rule_name": "IBAN detector",
      "bundle_override": "gdpr-bundle"
    }
  ],
  "action_trace": [
    { "rule_id": "rule-uuid-ssn", "fired": true, "action": "redact", "confidence": 0.97 },
    { "rule_id": "rule-uuid-iban", "fired": true, "action": "block", "confidence": 1.0 }
  ],
  "final_action": "block",
  "evaluated_at": "2026-03-10T14:00:00Z"
}

| Field | Type | Description |
| --- | --- | --- |
| findings | array | All DLP findings with entity type, span, confidence, tier, action, and bundle override info |
| action_trace | array | Ordered list of rules that fired, with per-rule action and confidence |
| final_action | string | The highest-severity action across all findings (block > redact > prompt > log_only) |
| evaluated_at | ISO 8601 | Timestamp of the evaluation |
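
The precedence rule for final_action can be reproduced client-side to cross-check a response. This is a minimal sketch of the stated ordering (block > redact > prompt > log_only); the names `SEVERITY` and `highest_action` are hypothetical, and returning None for an empty findings list is an assumption, since this guide does not say what the endpoint reports in that case.

```python
# Higher number = higher severity, following block > redact > prompt > log_only.
SEVERITY = {"log_only": 0, "prompt": 1, "redact": 2, "block": 3}


def highest_action(findings):
    """Return the highest-severity action among findings, or None if empty."""
    if not findings:
        return None  # assumption: behavior with no findings is not documented here
    return max((f["action"] for f in findings), key=SEVERITY.__getitem__)


findings = [
    {"entity_type": "ssn", "action": "redact"},
    {"entity_type": "iban", "action": "block"},
]
print(highest_action(findings))  # prints block
```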

This test checks whether a regex SSN pattern correctly identifies an SSN within surrounding text (not just isolated digits).

Sample text:

Employee record update: SSN 123-45-6789 processed.
Note: previous ID 000-00-0000 was a placeholder.

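Expected behavior: the rule fires on 123-45-6789. The pattern \b\d{3}-\d{2}-\d{4}\b also matches the placeholder 000-00-0000. To exclude known invalid SSNs, add a negative lookahead to the pattern before saving the rule. Lookaheads are valid in Python's re syntax but are not supported by RE2, so run the revised pattern in the test panel and confirm that valid_pattern is still true.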
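
The effect of a negative lookahead can be previewed locally with Python's `re` module. The stricter pattern below is one possible choice, not a pattern shipped with the product: it rejects an all-zero area, group, or serial number. Local results only approximate the backend, which may use a different regex engine.

```python
import re

SAMPLE = (
    "Employee record update: SSN 123-45-6789 processed.\n"
    "Note: previous ID 000-00-0000 was a placeholder."
)

BASIC = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Each (?!...) rejects a segment made entirely of zeros.
STRICT = re.compile(r"\b(?!000)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")

print(BASIC.findall(SAMPLE))   # prints ['123-45-6789', '000-00-0000']
print(STRICT.findall(SAMPLE))  # prints ['123-45-6789']
```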

Credit card numbers appear in several formats. Test multiple formats to confirm your pattern covers all variants:

Sample text:

Card ending 4111111111111111
Formatted: 4111-1111-1111-1111
With spaces: 4111 1111 1111 1111

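A pattern that allows an optional hyphen or space between the four-digit groups should match all three formats. A regex cannot compute the Luhn checksum by itself; if the rule also applies Luhn validation, the test number 4111 1111 1111 1111 passes it. If the pattern misses a format, refine it in the test panel before saving.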
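
The three formats can be checked locally before trying them in the test panel. The sketch below pairs a separator-tolerant regex with a separate Luhn checksum step. Both the pattern and the helper names are illustrative assumptions: the pattern covers only 16-digit numbers in groups of four, and nothing here reflects how the backend detector is implemented.

```python
import re

# Four groups of four digits, each optionally followed by a space or hyphen.
CARD_RE = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")

SAMPLE = (
    "Card ending 4111111111111111\n"
    "Formatted: 4111-1111-1111-1111\n"
    "With spaces: 4111 1111 1111 1111"
)


def luhn_valid(number):
    """Return True if the digits in number pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return bool(digits) and total % 10 == 0


def find_cards(text):
    """Return regex matches that also pass the Luhn check."""
    return [m.group(0) for m in CARD_RE.finditer(text) if luhn_valid(m.group(0))]


print(find_cards(SAMPLE))
```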

Use the evaluate endpoint to verify a compliance bundle that covers multiple entity types:

Request:

{
  "text": "Contact: john@example.com, SSN 987-65-4321, IBAN GB29NWBK60161331926819",
  "direction": "input",
  "group_id": "<gdpr-group-id>"
}

Verify in the response:

  • All three entity types appear in findings.
  • bundle_override shows gdpr-bundle for the IBAN finding if the GDPR compliance bundle escalates IBAN to block.
  • final_action is block if any finding resolves to block.
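
The checks above can be automated against a saved response. This is a sketch under assumptions: the function name `check_bundle_response` is hypothetical, and the entity type label `email` is a guess made for illustration (only `ssn` and `iban` appear in the example response in this guide). The response in the usage example is hand-written test data, not real endpoint output.

```python
def check_bundle_response(response, expected_types, escalated_type, bundle_name):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    findings = response.get("findings", [])

    # 1. Every expected entity type appears in findings.
    missing = set(expected_types) - {f["entity_type"] for f in findings}
    if missing:
        problems.append(f"missing entity types: {sorted(missing)}")

    # 2. The escalated entity type carries the expected bundle override.
    for finding in findings:
        if (finding["entity_type"] == escalated_type
                and finding.get("bundle_override") != bundle_name):
            problems.append(f"{escalated_type} finding lacks override {bundle_name!r}")

    # 3. Any block-level finding forces final_action to block.
    if (any(f["action"] == "block" for f in findings)
            and response.get("final_action") != "block"):
        problems.append("a finding resolved to block but final_action is not block")
    return problems


# Hand-written stand-in for an evaluate response.
example = {
    "findings": [
        {"entity_type": "email", "action": "log_only", "bundle_override": None},
        {"entity_type": "ssn", "action": "redact", "bundle_override": None},
        {"entity_type": "iban", "action": "block", "bundle_override": "gdpr-bundle"},
    ],
    "final_action": "block",
}
print(check_bundle_response(example, ["email", "ssn", "iban"], "iban", "gdpr-bundle"))
```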

Test a Presidio NER rule for PERSON and EMAIL_ADDRESS:

Pattern field (in rule form): PERSON,EMAIL_ADDRESS

Sample text: Please email Jane Smith at jane.smith@acme.com for approval.

The NER detector recognizes Jane Smith as PERSON and jane.smith@acme.com as EMAIL_ADDRESS. Both should appear in the match results with their entity types.

GLiNER uses natural-language labels, not Presidio entity codes:

Pattern field: passport number,social security number,bank account

Sample text: Passport: AB123456, SSN: 555-12-3456

GLiNER should detect both. Confidence scores vary from one text to another. If the default confidence threshold (0.70) drops matches you expect, lower confidence_threshold on the rule before saving; if the rule produces false positives, raise it.
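
To reason about where to set the threshold, it can help to see how a cutoff partitions candidate detections. The sketch below assumes a detection is kept when its confidence is at or above the threshold, which this guide does not spell out. The function name `apply_threshold` is hypothetical and the scores are made-up illustrative values, not GLiNER output.

```python
def apply_threshold(candidates, confidence_threshold=0.70):
    """Keep candidates whose confidence meets the threshold (assumed inclusive)."""
    return [c for c in candidates if c["confidence"] >= confidence_threshold]


# Made-up scores for illustration only.
candidates = [
    {"label": "passport number", "text": "AB123456", "confidence": 0.64},
    {"label": "social security number", "text": "555-12-3456", "confidence": 0.91},
]

print([c["label"] for c in apply_threshold(candidates)])
print([c["label"] for c in apply_threshold(candidates, confidence_threshold=0.60)])
```

With these invented scores, the default cutoff keeps only the SSN candidate, while a threshold of 0.60 keeps both.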


The test panel and evaluate endpoint are dry-run operations. No audit log entries are written for test runs. Audit entries are only written when the DLP pipeline processes live traffic. If you need a record of a test run for compliance purposes, capture the response body and store it externally.
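
One way to keep such a record is to write each response body to a timestamped file. The sketch below is only a suggestion: the function name `archive_response`, the file naming scheme, and the use of a SHA-256 digest in the filename are choices made for this example, and where the files should live depends on your own retention requirements.

```python
import datetime
import hashlib
import json
import pathlib


def archive_response(response, directory):
    """Write a response body to a timestamped JSON file and return its path."""
    captured_at = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    body = json.dumps(response, sort_keys=True, indent=2)
    # A short content digest keeps filenames unique and makes later edits detectable.
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    path = pathlib.Path(directory) / f"dlp-test-{captured_at}-{digest[:12]}.json"
    path.write_text(body, encoding="utf-8")
    return path
```

Calling `archive_response(response_body, "/path/to/archive")` after each test run returns the path of the stored copy; the directory shown is a placeholder and must already exist.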