DLP pipeline testing guide

The Arbitex admin UI provides two tools for testing DLP behavior before deploying rule changes:

Test panel — validates a single rule’s pattern against sample text, showing highlighted match regions and the action that would fire.
Evaluate endpoint — simulates the full DLP rule chain against sample text with an org/group/user context, returning the complete action trace.

Use the test panel during rule authoring to iterate quickly on patterns. Use the evaluate endpoint to verify how the full rule chain behaves for a specific user or group, including compliance bundle overrides and confidence thresholds.

Test panel — single rule test

The test panel is embedded in the DLP rule editor (Admin > DLP Rules > select or create a rule). It appears as a collapsible “Test Rule” section at the bottom of the rule form.

Opening the panel

Navigate to Admin > DLP Rules.
Select an existing rule to edit, or create a new rule.
Scroll to the bottom of the rule form and click Test Rule to expand the panel.

The panel uses the rule’s current configuration — pattern, detector type, action tier, and entity type — from the form fields above it. You do not need to save the rule first; the panel tests the in-progress configuration.

Entering sample text

Enter sample text in the Sample Text textarea. The text is sent to the backend test endpoint and is not stored or logged. Include both text the rule should match and text it should not match, so you can check for missed matches and false positives in one run.

Click Run Test when ready. The button is disabled if the pattern field or sample text is empty.

Interpreting results

After the test completes, the panel shows:

Element	Description
Match count badge	Total number of non-overlapping match regions found. Amber if matches found, green if no matches.
Entity type chips	Entity types detected (cyan chips). Populated from match results or from the rule’s entity type field when no matches have an entity type.
Action badge	The action tier that would fire for this rule (block = red, redact = orange, log_only = blue, prompt = purple).
Highlighted text	Sample text rendered with color-coded highlights over matched regions. Colors follow the action tier.
No matches message	Shown when match_count is 0.

Overlapping regions are merged: if two matches cover overlapping or adjacent spans, the highlighted region shows their union span.
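The merge rule can be sketched as below. This is a minimal illustration, not the panel's code: it assumes end-exclusive offsets (as in the API's `start`/`end` fields, where the 11-character match `123-45-6789` spans 14 to 25) and treats spans that touch as adjacent. `merge_spans` is a hypothetical name.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) with end exclusive, like the API's offsets

def merge_spans(spans: List[Span]) -> List[Span]:
    """Merge overlapping or adjacent [start, end) spans into their union."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        # start <= previous end means the spans overlap or touch.
        if merged and start <= merged[-1][1]:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged

print(merge_spans([(14, 25), (20, 30), (30, 35), (40, 45)]))  # [(14, 35), (40, 45)]
```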

Detector type behavior

The test endpoint supports three detector types, selected from the rule form:

Detector type	Backend runs	Pattern field usage
`regex`	Regex pattern match	Pattern must be a valid Python/RE2 regex
`ner`	Presidio NER	Pattern is a comma-separated list of Presidio entity types (e.g., `PERSON,EMAIL_ADDRESS`)
`llm` (GLiNER)	GLiNER zero-shot NER	Pattern is a comma-separated list of entity labels in natural language (e.g., `person name,email address,phone number`)

The frontend maps `llm` to `gliner` when sending the request to the backend, so the test endpoint's `rule_type` field accepts `gliner`, not `llm`.
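A sketch of that mapping for when you build a test request by hand. `UI_TO_BACKEND_RULE_TYPE` and `build_test_payload` are hypothetical names; only the three backend `rule_type` values and the `llm` to `gliner` rename come from this guide.

```python
import json

# Hypothetical table: rule-form detector type -> backend rule_type.
UI_TO_BACKEND_RULE_TYPE = {"regex": "regex", "ner": "ner", "llm": "gliner"}

def build_test_payload(ui_detector: str, pattern: str, sample_text: str) -> str:
    """Build a JSON body for POST /api/admin/dlp-rules/test."""
    try:
        rule_type = UI_TO_BACKEND_RULE_TYPE[ui_detector]
    except KeyError:
        raise ValueError(f"unknown detector type: {ui_detector!r}") from None
    # Mirrors the panel: Run Test is disabled when either field is empty.
    if not pattern or not sample_text:
        raise ValueError("pattern and sample_text must both be non-empty")
    return json.dumps({"pattern": pattern, "sample_text": sample_text, "rule_type": rule_type})
```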

API reference — test endpoint

POST /api/admin/dlp-rules/test
Authorization: Bearer <admin-token>
Content-Type: application/json

Request body:

{
  "pattern": "\\b\\d{3}-\\d{2}-\\d{4}\\b",
  "sample_text": "Customer SSN: 123-45-6789 and backup 987-65-4321",
  "rule_type": "regex"
}

Field	Type	Required	Description
`pattern`	string	Yes	Regex pattern, Presidio entity list, or GLiNER label list
`sample_text`	string	Yes	Text to test against the pattern
`rule_type`	string	Yes	`regex`, `ner`, or `gliner`

Response body:

{
  "matches": [
    {
      "start": 14,
      "end": 25,
      "matched_text": "123-45-6789",
      "entity_type": null,
      "action": "log_only"
    },
    {
      "start": 37,
      "end": 48,
      "matched_text": "987-65-4321",
      "entity_type": null,
      "action": "log_only"
    }
  ],
  "match_count": 2,
  "rule_type": "regex",
  "valid_pattern": true,
  "error": null
}

Field	Type	Description
`matches`	array	Match regions, each with `start`, `end`, `matched_text`, `entity_type`, `action`
`match_count`	int	Total matches found
`rule_type`	string	Echoed from request
`valid_pattern`	bool	`false` if the pattern could not be compiled or is unsafe (ReDoS-prone)
`error`	string or null	Error message when `valid_pattern: false` or an internal error occurred
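Because `start` and `end` are character offsets with an exclusive end (`sample_text[start:end]` equals `matched_text`), you can cross-check a regex response locally. This sketch assumes the backend's regex engine agrees with Python's `re` for a simple pattern like the one in the request example above, which uses no engine-specific syntax.

```python
import re

# Same pattern and sample text as the request example above.
pattern = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
sample_text = "Customer SSN: 123-45-6789 and backup 987-65-4321"

expected = [
    {"start": m.start(), "end": m.end(), "matched_text": m.group()}
    for m in pattern.finditer(sample_text)
]
for match in expected:
    # The end offset is exclusive, so slicing recovers the matched text.
    assert sample_text[match["start"]:match["end"]] == match["matched_text"]
print(expected)
```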

Error responses:

HTTP status	Cause
`422 Unprocessable Entity`	Missing required fields or invalid `rule_type` value
`401 Unauthorized`	Missing or expired token
`403 Forbidden`	Authenticated user does not have admin role

The endpoint is a dry-run — no rule is saved and no audit log entry is written.

Evaluate endpoint — full chain simulation

The evaluate endpoint runs the complete DLP rule chain against sample text with an org/group/user context. This simulates the same pipeline that runs on live traffic: all enabled DLP rules are evaluated in sequence, compliance bundle overrides are applied, and the full action trace is returned.

Use the evaluate endpoint to:

Confirm which rules fire on a specific text before a policy change goes live.
Debug why a user or group is (or is not) seeing DLP blocks.
Verify that bundle-level action overrides are applied correctly.

API reference — evaluate endpoint

POST /api/admin/dlp-rules/evaluate
Authorization: Bearer <admin-token>
Content-Type: application/json

Request body:

{
  "text": "Hello, my SSN is 123-45-6789 and my IBAN is GB29NWBK60161331926819",
  "direction": "input",
  "org_id": "550e8400-e29b-41d4-a716-446655440000",
  "group_id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
  "user_id": "6ba7b814-9dad-11d1-80b4-00c04fd430c8"
}

Field	Type	Required	Description
`text`	string	Yes	Text to evaluate through the full DLP chain
`direction`	string	No	`input` (default) or `output` — controls which rule directions are evaluated
`org_id`	UUID string	No	Org context for compliance bundle resolution
`group_id`	UUID string	No	Group context for bundle overrides
`user_id`	UUID string	No	User context
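A sketch of building the evaluate request body with only the context fields you have. `build_evaluate_payload` is a hypothetical helper; validating IDs with Python's `uuid` module is an added client-side check, not something the endpoint is documented to require beyond the UUID string type.

```python
import json
import uuid
from typing import Optional

def build_evaluate_payload(text: str, direction: str = "input",
                           org_id: Optional[str] = None,
                           group_id: Optional[str] = None,
                           user_id: Optional[str] = None) -> str:
    """Build a JSON body for POST /api/admin/dlp-rules/evaluate."""
    if direction not in ("input", "output"):
        raise ValueError("direction must be 'input' or 'output'")
    payload = {"text": text, "direction": direction}
    # Optional context IDs are omitted when not supplied. uuid.UUID() raises
    # ValueError on a malformed ID, so typos fail before the request is sent.
    for name, value in (("org_id", org_id), ("group_id", group_id), ("user_id", user_id)):
        if value is not None:
            payload[name] = str(uuid.UUID(value))
    return json.dumps(payload)
```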

Response body:

{
  "findings": [
    {
      "entity_type": "ssn",
      "matched_text": "123-45-6789",
      "start": 17,
      "end": 28,
      "confidence": 0.97,
      "tier": "regex",
      "action": "redact",
      "rule_id": "rule-uuid-ssn",
      "rule_name": "SSN detector",
      "bundle_override": null
    },
    {
      "entity_type": "iban",
      "matched_text": "GB29NWBK60161331926819",
      "start": 44,
      "end": 66,
      "confidence": 1.0,
      "tier": "regex",
      "action": "block",
      "rule_id": "rule-uuid-iban",
      "rule_name": "IBAN detector",
      "bundle_override": "gdpr-bundle"
    }
  ],
  "action_trace": [
    { "rule_id": "rule-uuid-ssn", "fired": true, "action": "redact", "confidence": 0.97 },
    { "rule_id": "rule-uuid-iban", "fired": true, "action": "block", "confidence": 1.0 }
  ],
  "final_action": "block",
  "evaluated_at": "2026-03-10T14:00:00Z"
}

Field	Type	Description
`findings`	array	All DLP findings with entity type, span, confidence, tier, action, and bundle override info
`action_trace`	array	Ordered per-rule trace; each entry has `rule_id`, `fired`, `action`, and `confidence`
`final_action`	string	The highest-severity action across all findings (`block` > `redact` > `prompt` > `log_only`)
`evaluated_at`	ISO 8601	Timestamp of the evaluation
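The `final_action` rule is a maximum over the documented severity order. A sketch, assuming each finding carries an `action` string from the four documented values; returning `None` for an empty list is an assumption, since this guide does not say what the API returns when nothing matches.

```python
from typing import Dict, List, Optional

# Documented order for final_action: block > redact > prompt > log_only.
SEVERITY = {"log_only": 0, "prompt": 1, "redact": 2, "block": 3}

def final_action(findings: List[Dict]) -> Optional[str]:
    """Return the highest-severity action among the findings (None if there are none)."""
    if not findings:
        return None  # assumption: the real API may represent "no findings" differently
    return max((f["action"] for f in findings), key=lambda action: SEVERITY[action])
```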

Common test scenarios

SSN in context

Tests whether a regex SSN pattern correctly identifies an SSN within surrounding text (not just isolated digits).

Sample text:

Employee record update: SSN 123-45-6789 processed.
Note: previous ID 000-00-0000 was a placeholder.

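Expected behavior: The rule fires on `123-45-6789`. The pattern `\b\d{3}-\d{2}-\d{4}\b` also matches `000-00-0000`. To exclude known invalid SSNs, add negative lookaheads to the pattern before saving the rule. Lookaheads are supported by Python's `re` but not by RE2, so confirm in the test panel that the revised pattern is accepted (`valid_pattern: true`).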
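A sketch of such a pattern in Python `re` syntax. It rejects the all-zero area (`000`), group (`00`), and serial (`0000`) fields, which the SSA never issues. If your backend compiles patterns with RE2, this pattern will be rejected.

```python
import re

# Each negative lookahead rejects one all-zero field before consuming its digits.
SSN_PATTERN = re.compile(r"\b(?!000)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")

sample = ("Employee record update: SSN 123-45-6789 processed.\n"
          "Note: previous ID 000-00-0000 was a placeholder.")
print(SSN_PATTERN.findall(sample))  # ['123-45-6789']
```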

Credit card formats

Credit card numbers appear in several formats. Test multiple formats to confirm your pattern covers all variants:

Sample text:

Card ending 4111111111111111
Formatted: 4111-1111-1111-1111
With spaces: 4111 1111 1111 1111

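Your pattern should match all three formats. If it misses a format, refine the pattern in the test panel before saving. Note that a regex only checks the digit layout; Luhn checksum validation is an arithmetic check that a regex alone cannot perform.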
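A sketch of the two checks in Python: a hypothetical `CARD_PATTERN` for 16-digit numbers grouped consistently by nothing, hyphens, or spaces, followed by a standard Luhn checksum. Real card rules usually cover other lengths and issuer prefixes, and whether your detector applies a Luhn check after the regex is not specified here. The backreference `\1` forces the same separator throughout; like lookaheads, backreferences are not available in RE2.

```python
import re

# Hypothetical pattern: four groups of four digits, same optional separator each time.
CARD_PATTERN = re.compile(r"\b\d{4}([- ]?)\d{4}\1\d{4}\1\d{4}\b")

def luhn_valid(number: str) -> bool:
    """Luhn checksum over the digits of number; separators are ignored."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    # Double every second digit, counting from the rightmost (check) digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) > 0 and total % 10 == 0

samples = ["4111111111111111", "4111-1111-1111-1111", "4111 1111 1111 1111"]
for s in samples:
    print(s, bool(CARD_PATTERN.fullmatch(s)), luhn_valid(s))  # each line ends: True True
```

Mixed separators such as `4111-1111 1111-1111` do not match this pattern; that is a deliberate choice here, and you can relax it if your data mixes separators.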

Multi-entity bundle

Use the evaluate endpoint to verify a compliance bundle that covers multiple entity types:

Request:

{
  "text": "Contact: john@example.com, SSN 987-65-4321, IBAN GB29NWBK60161331926819",
  "direction": "input",
  "group_id": "<gdpr-group-id>"
}

Verify in the response:

All three entity types appear in findings.
bundle_override shows gdpr-bundle for the IBAN finding if the GDPR compliance bundle escalates IBAN to block.
final_action is block if any finding resolves to block.
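These checks can be automated against a saved evaluate response. A minimal sketch: `check_bundle_response` is a hypothetical helper, and the entity type strings you pass must match what your rules emit (`ssn` and `iban` appear in the response example above; the name your email rule uses is not documented here).

```python
from typing import Dict, List, Optional

def check_bundle_response(resp: Dict, expected_types: List[str],
                          expected_overrides: Optional[Dict[str, str]] = None) -> List[str]:
    """Return readable problems found in an evaluate response; empty means pass."""
    findings = resp.get("findings", [])
    by_type = {f["entity_type"]: f for f in findings}  # last finding per type wins
    problems = []
    # 1. Every expected entity type should produce a finding.
    for entity in expected_types:
        if entity not in by_type:
            problems.append(f"missing finding for {entity}")
    # 2. Expected bundle overrides, e.g. {"iban": "gdpr-bundle"}.
    for entity, bundle in (expected_overrides or {}).items():
        finding = by_type.get(entity)
        if finding is not None and finding.get("bundle_override") != bundle:
            problems.append(f"{entity}: expected bundle_override {bundle!r}, "
                            f"got {finding.get('bundle_override')!r}")
    # 3. Any finding that resolves to block must make final_action block.
    if any(f["action"] == "block" for f in findings) and resp.get("final_action") != "block":
        problems.append("a finding resolved to block but final_action is not block")
    return problems
```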

NER detector test (Presidio)

Test a Presidio NER rule for PERSON and EMAIL_ADDRESS:

Pattern field (in rule form): PERSON,EMAIL_ADDRESS
Sample text: Please email Jane Smith at jane.smith@acme.com for approval.

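The NER detector should recognize Jane Smith as PERSON and jane.smith@acme.com as EMAIL_ADDRESS, and both should appear in the match results with their entity types. Presidio's PERSON detection is model-based, so results can vary with name format and surrounding text.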

GLiNER (zero-shot) test

GLiNER uses natural-language labels, not Presidio entity codes:

Pattern field: passport number,social security number,bank account
Sample text: Passport: AB123456, SSN: 555-12-3456

GLiNER should detect both entities, but confidence scores vary. If valid entities are dropped because they score below the default confidence threshold (0.70), lower confidence_threshold on the rule before saving; if unrelated text is flagged, raise it.
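The effect of the threshold can be illustrated with a simple filter. The confidence values below are made up for illustration, not GLiNER output, and whether the backend compares with `>=` or `>` is an assumption; `apply_threshold` is a hypothetical name.

```python
from typing import Dict, List

DEFAULT_THRESHOLD = 0.70  # default confidence threshold cited in this guide

def apply_threshold(findings: List[Dict], threshold: float = DEFAULT_THRESHOLD) -> List[Dict]:
    """Keep findings whose confidence is at or above the threshold (>= is assumed)."""
    return [f for f in findings if f["confidence"] >= threshold]

# Illustrative scores only; real scores depend on the model and the text.
findings = [
    {"label": "passport number", "text": "AB123456", "confidence": 0.64},
    {"label": "social security number", "text": "555-12-3456", "confidence": 0.88},
]
print([f["label"] for f in apply_threshold(findings)])        # ['social security number']
print([f["label"] for f in apply_threshold(findings, 0.60)])  # ['passport number', 'social security number']
```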

Audit notes

The test panel and evaluate endpoint are dry-run operations. No audit log entries are written for test runs. Audit entries are only written when the DLP pipeline processes live traffic. If you need a record of a test run for compliance purposes, capture the response body and store it externally.
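If you archive results, a small wrapper can add a capture timestamp and a digest. A sketch under assumptions: the file naming, record layout, and `archive_test_response` name are hypothetical, and the SHA-256 digest only detects accidental changes; it is not tamper-proof unless the digest is also stored somewhere the file's editors cannot modify.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def archive_test_response(response: dict, directory: Path) -> Path:
    """Write a test/evaluate response body to a timestamped JSON file."""
    now = datetime.now(timezone.utc)
    # Canonical form (sorted keys, no whitespace) so the digest is reproducible.
    canonical = json.dumps(response, sort_keys=True, separators=(",", ":"))
    record = {
        "captured_at": now.isoformat(),
        "sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        "response": response,
    }
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"dlp-test-{now.strftime('%Y%m%dT%H%M%S%fZ')}.json"
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path
```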