DLP Pipeline Configuration
The Arbitex DLP pipeline inspects every prompt and model response before it reaches its destination. This guide explains how to configure each tier, define custom detection rules, map findings to policy actions, and test your configuration without affecting production traffic.
Configuration requires the Org Admin role.
Architecture overview
The pipeline runs three tiers in sequence. A finding from an earlier tier does not end inspection: the results of every tier that runs are merged before the policy engine evaluates actions.
```text
Request prompt
              │
              ▼
┌─────────────────────────────┐
│  Tier 1: Regex matching     │  ~1 ms    65+ built-in patterns + custom rules
└─────────────┬───────────────┘
              │
              ▼
┌─────────────────────────────┐
│  Tier 2: NER (GLiNER)       │  ~20 ms   Named entity recognition, zero-shot
└─────────────┬───────────────┘
              │
              ▼
┌─────────────────────────────┐
│  Tier 3: DeBERTa NLI        │  ~150 ms  Contextual validation of ambiguous hits
└─────────────┬───────────────┘
              │
              ▼
  Policy engine (ALLOW / REDACT / BLOCK)
              │
              ▼
       Model provider
```

Tiers 1 and 2 are always active. Tier 3 is configurable per organization and runs only on traffic that passes Tiers 1 and 2 without a definitive finding, which keeps latency low for clean traffic.
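The gating described above can be sketched in a few lines of Python. This is an illustration of the control flow only, not the gateway's implementation: the `Finding` dataclass, its `definitive` flag, and the three stub detectors are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    tier: int
    type: str
    match: str
    definitive: bool  # hypothetical flag: True when no contextual validation is needed


Detector = Callable[[str], List[Finding]]


def run_pipeline(text: str, tier1: Detector, tier2: Detector, tier3: Detector,
                 tier3_enabled: bool = True) -> List[Finding]:
    """Merge findings from Tiers 1 and 2, then add Tier 3 only when it is needed."""
    findings = tier1(text) + tier2(text)  # Tiers 1 and 2 are always active
    has_definitive = any(f.definitive for f in findings)
    if tier3_enabled and not has_definitive:
        findings = findings + tier3(text)  # skipped after a definitive hit
    return findings


# Stub detectors standing in for the real regex, GLiNER, and DeBERTa stages.
tier3_calls = []


def regex_stub(text: str) -> List[Finding]:
    hit = "123-45-6789"
    return [Finding(1, "SSN", hit, True)] if hit in text else []


def ner_stub(text: str) -> List[Finding]:
    return []


def nli_stub(text: str) -> List[Finding]:
    tier3_calls.append(text)  # record that the slow tier actually ran
    return []


with_ssn = run_pipeline("SSN is 123-45-6789", regex_stub, ner_stub, nli_stub)
clean = run_pipeline("What is the capital of France?", regex_stub, ner_stub, nli_stub)
```

In this sketch the prompt containing an SSN never reaches the Tier 3 stub, while the second prompt does.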
Tier 1: Regex matching
Built-in patterns
Arbitex ships 65+ platform patterns covering:
| Category | Examples |
|---|---|
| Financial | Credit card (Luhn-validated), IBAN+BIC, routing numbers, SWIFT |
| Government IDs | US SSN, EIN, passport formats (15+ countries) |
| Health | US NPI, DEA numbers, ICD-10 patterns |
| Cloud credentials | AWS keys, GCP service accounts, Azure SAS tokens |
| Generic secrets | Bearer tokens, JWTs, private key headers, connection strings |
| Contact | Email, US/international phone, postal codes |
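"Luhn-validated" means a digit run is reported as a credit card only if its checksum also passes, which removes most random 16-digit false positives. The guide does not show how Arbitex implements the check; the function below is the standard Luhn algorithm, written as a sketch so you can pre-screen your own test data.

```python
def luhn_valid(number: str) -> bool:
    """Return True when the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]  # ignore spaces and dashes
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True  (a widely used test card number)
print(luhn_valid("4111 1111 1111 1112"))  # False (last digit changed)
```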
To list all available platform patterns:
```bash
curl "https://gateway.arbitex.ai/api/admin/dlp-rules?source=platform&limit=100" \
  -H "Authorization: Bearer $ARBITEX_API_KEY"
```

Each pattern entry includes `name`, `rule_type` (`regex`), `pattern`, `enabled`, and `action_tier` (which tier triggers the action).
Disabling a built-in pattern
If a built-in pattern generates false positives for your workload, disable it at the org level without deleting the platform rule:

```bash
# First, get the platform rule ID you want to suppress
RULE_ID="dlprule_01HZ..."

# Create an org-level override that disables it
curl -X POST "https://gateway.arbitex.ai/api/orgs/{org_id}/dlp-rules" \
  -H "Authorization: Bearer $ARBITEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "rule_type": "regex",
    "name": "suppress-phone-false-positives",
    "target_rule_id": "'"$RULE_ID"'",
    "enabled": false
  }'
```

The org-level override takes precedence. The platform default is suppressed only for your organization.
Creating a custom regex rule
Add a custom pattern for data types specific to your organization, such as internal account numbers, proprietary identifiers, or industry-specific formats:

```bash
curl -X POST "https://gateway.arbitex.ai/api/admin/dlp-rules" \
  -H "Authorization: Bearer $ARBITEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "internal-employee-id",
    "rule_type": "regex",
    "pattern": "\\bEMP-[0-9]{6}\\b",
    "custom_entity_type": "EMPLOYEE_ID",
    "action_tier": 1,
    "enabled": true
  }'
```

| Field | Description |
|---|---|
| `name` | Human-readable rule name (must be unique within org) |
| `rule_type` | `"regex"`, `"ner"`, or `"gliner"` |
| `pattern` | Python `re`-compatible regex. Tested with a 1-second timeout per request |
| `custom_entity_type` | Label applied to findings from this rule (appears in audit log and DLP events) |
| `action_tier` | Which pipeline tier the finding is attributed to for action mapping |
| `enabled` | `true` to activate immediately |
Pattern best practices:
- Use word boundary anchors (`\b`) to avoid partial matches
- Test your pattern with the DLP test endpoint before enabling in production (see Testing rules)
- Avoid catastrophic backtracking: the gateway enforces a 1-second regex timeout, and rules that exceed it are disabled automatically
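Because patterns are Python `re`-compatible, the effect of the word boundary anchors can be checked locally before calling the gateway. The sketch below compares the documented employee ID pattern with an unanchored variant; the sample strings are made up for illustration.

```python
import re

bounded = re.compile(r"\bEMP-[0-9]{6}\b")   # pattern from the example rule
unbounded = re.compile(r"EMP-[0-9]{6}")     # same pattern without \b anchors

samples = [
    "EMP-042891",    # a well-formed ID
    "TEMP-0428910",  # "EMP-" embedded in a longer token
    "EMP-0428917",   # seven digits instead of six
]

for sample in samples:
    # Print whether each variant finds a match anywhere in the sample.
    print(sample, bool(bounded.search(sample)), bool(unbounded.search(sample)))
```

Only the first sample matches the anchored pattern, while the unanchored variant matches all three, which is the partial-match behavior the first bullet warns about.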
Testing a regex pattern
Before saving a rule, test it against sample text:

```bash
curl -X POST "https://gateway.arbitex.ai/api/admin/dlp-rules/test" \
  -H "Authorization: Bearer $ARBITEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "rule_type": "regex",
    "pattern": "\\bEMP-[0-9]{6}\\b",
    "text": "Please update EMP-042891 employee record with new address."
  }'
```

```json
{
  "matches": [
    { "match": "EMP-042891", "start": 14, "end": 24, "confidence": 1.0 }
  ],
  "elapsed_ms": 0.4
}
```

A `matches` array with at least one entry confirms that the pattern matched the sample text. `elapsed_ms` shows the regex execution time; keep it well under 100 ms for safe production performance.
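The offsets in the sample response can be reproduced offline with Python's `re` module, which is handy for drafting patterns before calling the test endpoint. This is a local approximation, not the endpoint itself: `try_pattern` is a hypothetical helper, the result shape is copied from the example response (minus `confidence`), and local timings will differ from the gateway's.

```python
import re
import time


def try_pattern(pattern: str, text: str) -> dict:
    """Run a regex locally and report matches with zero-based offsets."""
    compiled = re.compile(pattern)
    started = time.perf_counter()
    matches = [
        {"match": m.group(0), "start": m.start(), "end": m.end()}
        for m in compiled.finditer(text)
    ]
    elapsed_ms = (time.perf_counter() - started) * 1000
    return {"matches": matches, "elapsed_ms": round(elapsed_ms, 3)}


result = try_pattern(
    r"\bEMP-[0-9]{6}\b",
    "Please update EMP-042891 employee record with new address.",
)
print(result["matches"])
```

The single match starts at offset 14 and ends at offset 24, the same span the endpoint reports.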
Tier 2: NER (Named Entity Recognition)
Tier 2 uses a GLiNER zero-shot NER model to identify unstructured PII in free text that doesn’t match fixed regex patterns, such as person names, organization names, street addresses, and similar contextual entities.
Default NER entity types
The following entity types are active by default:

| Entity type | Examples |
|---|---|
| `PERSON` | “John Smith”, “Dr. Sarah Chen” |
| `ORG` | “Acme Corporation”, “First National Bank” |
| `LOCATION` | “123 Main St, Springfield”, “London” |
| `PHONE_NUMBER` | “555-867-5309”, “+1 (415) 000-0000” |
| `EMAIL_ADDRESS` | “alice@example.com” |
| `DATE_OF_BIRTH` | “born March 3, 1985” |
| `MEDICAL_RECORD` | Clinical notes, diagnosis references |
Creating a custom NER rule
Add NER rules targeting additional entity types the GLiNER model can identify:

```bash
curl -X POST "https://gateway.arbitex.ai/api/admin/dlp-rules" \
  -H "Authorization: Bearer $ARBITEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "detect-vendor-names",
    "rule_type": "gliner",
    "custom_entity_type": "VENDOR_NAME",
    "action_tier": 2,
    "enabled": true
  }'
```

For NER rules, `pattern` is not required: the entity type label tells the GLiNER model what to look for.
NER confidence threshold
NER detections include a confidence score (0.0–1.0). By default, findings with confidence ≥ 0.60 are reported. To adjust the threshold for your organization:

```bash
curl -X PATCH "https://gateway.arbitex.ai/api/admin/orgs/{org_id}/dlp-config" \
  -H "Authorization: Bearer $ARBITEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "ner_confidence_threshold": 0.75
  }'
```

Increasing the threshold reduces false positives but may miss true detections. Decreasing it catches more content but increases the false positive rate. The recommended starting point is 0.60–0.70 for most workloads.
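The trade-off is easy to see on a handful of detections. The sketch below applies the documented "confidence ≥ threshold" rule to a list of dictionaries; the detection values are invented for illustration and `apply_threshold` is a hypothetical helper, not a gateway API.

```python
def apply_threshold(detections: list, threshold: float = 0.60) -> list:
    """Keep only detections whose confidence meets the threshold."""
    return [d for d in detections if d["confidence"] >= threshold]


# Invented sample detections, not real model output.
detections = [
    {"type": "PERSON", "match": "John Smith", "confidence": 0.91},
    {"type": "ORG", "match": "Springfield", "confidence": 0.64},
    {"type": "LOCATION", "match": "Main", "confidence": 0.41},
]

default_kept = apply_threshold(detections)        # default threshold of 0.60
strict_kept = apply_threshold(detections, 0.75)   # value from the PATCH example

print([d["type"] for d in default_kept])  # ['PERSON', 'ORG']
print([d["type"] for d in strict_kept])   # ['PERSON']
```

Raising the threshold to 0.75 drops the borderline `ORG` detection, which may be a false positive or a missed true detection depending on your data.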
Tier 3: DeBERTa contextual validation
Tier 3 runs a DeBERTa NLI (Natural Language Inference) model on content that passes Tiers 1 and 2 without a definitive finding. It validates whether ambiguous content is genuinely sensitive in context.
Sensitivity modes
DeBERTa operates in one of two sensitivity modes:

| Mode | Ambiguous score range (0.35–0.70) | Behavior |
|---|---|---|
| `standard` (default) | Pass | Audit flag raised, no block |
| `high` | Soft block | Request rejected with `dlp_soft_block` |
A definitive DeBERTa score is handled the same way in both sensitivity modes: a score above 0.70 always triggers a block, and a score below 0.35 always passes.
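The table and the definitive-score rule combine into a small decision function. The sketch below assumes that higher scores mean "more likely sensitive" and that the band edges 0.35 and 0.70 count as ambiguous; apart from `dlp_soft_block`, the outcome labels are hypothetical names chosen for this example.

```python
AMBIGUOUS_LOW = 0.35
AMBIGUOUS_HIGH = 0.70


def tier3_outcome(score: float, sensitivity: str = "standard") -> str:
    """Map a DeBERTa score and sensitivity mode to an outcome label."""
    if score > AMBIGUOUS_HIGH:
        return "block"                    # definitive: same in both modes
    if score < AMBIGUOUS_LOW:
        return "pass"                     # definitive: same in both modes
    # Ambiguous band: behavior depends on the organization's sensitivity mode.
    if sensitivity == "high":
        return "dlp_soft_block"
    if sensitivity == "standard":
        return "pass_with_audit_flag"
    raise ValueError(f"unknown sensitivity mode: {sensitivity!r}")


for score in (0.10, 0.50, 0.90):
    print(score, tier3_outcome(score, "standard"), tier3_outcome(score, "high"))
```

Only the middle score produces different outcomes in the two modes, which is why switching modes has no effect on clearly clean or clearly sensitive content.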
Set your organization’s sensitivity:
```bash
curl -X PATCH "https://gateway.arbitex.ai/api/admin/orgs/{org_id}/dlp-config" \
  -H "Authorization: Bearer $ARBITEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "dlp_sensitivity": "high"
  }'
```

Use `"high"` for organizations handling regulated data (HIPAA, PCI-DSS, GDPR) where false negatives are more costly than false positives. Use `"standard"` for general-purpose workloads where latency and user experience take priority.
Enabling Credential Intelligence (CredInt)
CredInt runs in parallel with Tier 3. It checks detected credentials against a breach corpus of known-compromised secrets. When enabled, a credential that matches a known-breached value receives elevated severity regardless of DeBERTa’s confidence score.

```bash
curl -X PATCH "https://gateway.arbitex.ai/api/admin/orgs/{org_id}/dlp-config" \
  -H "Authorization: Bearer $ARBITEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "credint_enabled": true
  }'
```

Under `high` sensitivity, a CredInt hit in the `critical` or `high` frequency bucket triggers a soft block. Under `standard` sensitivity, it raises an elevated audit flag without blocking.
See Credential Intelligence for full details.
Policy action mapping
The DLP pipeline does not block requests directly. It produces findings that the policy engine evaluates, and you map DLP finding types to actions using policy rules.
Action types
| Action | Description |
|---|---|
| `ALLOW` | Finding is logged; request proceeds unmodified |
| `REDACT` | Matched text is replaced with `[REDACTED-{TYPE}]` before forwarding |
| `BLOCK` | Request is rejected with HTTP 400 |
| `REQUIRE_APPROVAL` | Request is held for human review; requester receives a pending status |
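To make the `REDACT` action concrete, here is a minimal sketch of span replacement. It assumes that `{TYPE}` is the finding's entity type label and that findings do not overlap; `redact` and `span` are hypothetical helpers, and the gateway's actual redaction logic is not shown in this guide.

```python
def redact(text: str, findings: list) -> str:
    """Replace each finding span with a [REDACTED-{TYPE}] placeholder."""
    # Work right to left so earlier offsets stay valid as the text changes length.
    for item in sorted(findings, key=lambda item: item["start"], reverse=True):
        placeholder = "[REDACTED-" + item["type"] + "]"
        text = text[:item["start"]] + placeholder + text[item["end"]:]
    return text


def span(text: str, value: str, entity_type: str) -> dict:
    """Build a finding for the first occurrence of `value` (illustrative helper)."""
    start = text.index(value)
    return {"type": entity_type, "start": start, "end": start + len(value)}


response = "Card 4111 1111 1111 1111 is linked to alice@example.com."
findings = [
    span(response, "4111 1111 1111 1111", "CREDIT_CARD"),
    span(response, "alice@example.com", "EMAIL_ADDRESS"),
]
redacted = redact(response, findings)
print(redacted)
# Card [REDACTED-CREDIT_CARD] is linked to [REDACTED-EMAIL_ADDRESS].
```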
Example: block SSN in prompts
Create a policy rule that blocks any prompt containing an SSN finding:

```bash
curl -X POST "https://gateway.arbitex.ai/api/admin/orgs/{org_id}/policy-rules" \
  -H "Authorization: Bearer $ARBITEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "block-ssn-in-prompt",
    "description": "Block requests where the prompt contains a Social Security Number",
    "conditions": {
      "dlp_finding_types": ["SSN"],
      "dlp_locations": ["prompt"]
    },
    "action": "BLOCK",
    "enabled": true
  }'
```

Example: redact credit card numbers in responses
Redact credit card numbers from model responses before they reach the user:

```bash
curl -X POST "https://gateway.arbitex.ai/api/admin/orgs/{org_id}/policy-rules" \
  -H "Authorization: Bearer $ARBITEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "redact-cc-in-response",
    "description": "Redact credit card numbers from model responses",
    "conditions": {
      "dlp_finding_types": ["CREDIT_CARD"],
      "dlp_locations": ["response"]
    },
    "action": "REDACT",
    "enabled": true
  }'
```

Example: require approval for medical records
Route requests containing medical identifiers to a human reviewer:

```bash
curl -X POST "https://gateway.arbitex.ai/api/admin/orgs/{org_id}/policy-rules" \
  -H "Authorization: Bearer $ARBITEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "approve-medical-record-access",
    "description": "Hold requests containing NPI or medical record references for approval",
    "conditions": {
      "dlp_finding_types": ["NPI", "MEDICAL_RECORD"]
    },
    "action": "REQUIRE_APPROVAL",
    "enabled": true
  }'
```

See the Policy Engine user guide and the Policy Engine API reference for full rule schema documentation.
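The `conditions` object in these examples can be read as "finding type is in the list, and the location is in the list". The sketch below encodes that reading. It assumes that omitting `dlp_locations`, as the medical record example does, means any location; that is an assumption rather than documented behavior, so check the rule schema reference. `rule_matches` is a hypothetical helper.

```python
def rule_matches(rule: dict, finding: dict) -> bool:
    """Return True when an enabled rule's conditions cover the finding."""
    if not rule.get("enabled", False):
        return False
    conditions = rule["conditions"]
    if finding["type"] not in conditions.get("dlp_finding_types", []):
        return False
    locations = conditions.get("dlp_locations")
    # Assumption: a rule without dlp_locations applies to prompts and responses.
    return locations is None or finding["location"] in locations


block_ssn = {
    "name": "block-ssn-in-prompt",
    "conditions": {"dlp_finding_types": ["SSN"], "dlp_locations": ["prompt"]},
    "action": "BLOCK",
    "enabled": True,
}
approve_medical = {
    "name": "approve-medical-record-access",
    "conditions": {"dlp_finding_types": ["NPI", "MEDICAL_RECORD"]},
    "action": "REQUIRE_APPROVAL",
    "enabled": True,
}

ssn_in_prompt = {"type": "SSN", "location": "prompt"}
ssn_in_response = {"type": "SSN", "location": "response"}
npi_in_response = {"type": "NPI", "location": "response"}

print(rule_matches(block_ssn, ssn_in_prompt))          # True
print(rule_matches(block_ssn, ssn_in_response))        # False
print(rule_matches(approve_medical, npi_in_response))  # True
```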
Testing with the policy simulator
Before deploying rule changes to production, use the policy simulator to preview what will happen to a given request:

```bash
curl -X POST "https://gateway.arbitex.ai/api/admin/orgs/{org_id}/policy/simulate" \
  -H "Authorization: Bearer $ARBITEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Please update employee EMP-042891. Their SSN is 123-45-6789.",
    "model": "anthropic/claude-sonnet-4-20250514",
    "user_id": "usr_test_alice"
  }'
```

```json
{
  "outcome": "BLOCK",
  "dlp_findings": [
    {
      "tier": 1,
      "type": "EMPLOYEE_ID",
      "match": "EMP-042891",
      "confidence": 1.0,
      "location": "prompt"
    },
    {
      "tier": 1,
      "type": "SSN",
      "match": "123-45-6789",
      "confidence": 1.0,
      "location": "prompt"
    }
  ],
  "policy_rules_evaluated": [
    {
      "rule_id": "pr_01HZ...",
      "name": "block-ssn-in-prompt",
      "matched": true,
      "action": "BLOCK"
    },
    {
      "rule_id": "pr_02HZ...",
      "name": "block-pii-in-prompt",
      "matched": true,
      "action": "BLOCK"
    }
  ],
  "effective_action": "BLOCK",
  "simulation_only": true
}
```

The simulator runs the full DLP pipeline and policy evaluation without making any real requests to the model or writing to the audit log. `simulation_only: true` confirms that no production traffic was affected.
Use the simulator to:
- Verify a new rule catches what you expect before enabling it
- Confirm that disabling a rule doesn’t leave a gap in coverage
- Test edge cases and boundary conditions with sample inputs
- Train your team on how the pipeline behaves
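These checks lend themselves to automation. The sketch below validates a simulator response against expectations so it can run in a test suite; `check_simulation` is a hypothetical helper, and the sample dictionary is a trimmed copy of the response shown above rather than live output.

```python
def check_simulation(response: dict, expected_action: str, expected_types: set) -> list:
    """Return a list of problems; an empty list means the simulation met expectations."""
    problems = []
    if response.get("simulation_only") is not True:
        problems.append("response is not marked simulation_only")
    if response.get("effective_action") != expected_action:
        problems.append(
            "expected action " + expected_action
            + ", got " + str(response.get("effective_action"))
        )
    found_types = {f["type"] for f in response.get("dlp_findings", [])}
    missing = expected_types - found_types
    if missing:
        problems.append("missing findings: " + ", ".join(sorted(missing)))
    return problems


# Trimmed copy of the sample simulator response.
sample = {
    "dlp_findings": [
        {"tier": 1, "type": "EMPLOYEE_ID", "match": "EMP-042891", "location": "prompt"},
        {"tier": 1, "type": "SSN", "match": "123-45-6789", "location": "prompt"},
    ],
    "effective_action": "BLOCK",
    "simulation_only": True,
}

print(check_simulation(sample, "BLOCK", {"SSN", "EMPLOYEE_ID"}))  # []
print(check_simulation(sample, "REDACT", {"CREDIT_CARD"}))
```

The second call reports two problems: the action differs and no `CREDIT_CARD` finding is present.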
See Outpost Policy Simulator for simulator usage from outpost deployments.
Viewing effective rules
To see the merged set of rules actually applied to your organization (platform defaults + org overrides):

```bash
curl "https://gateway.arbitex.ai/api/orgs/{org_id}/dlp-rules/effective" \
  -H "Authorization: Bearer $ARBITEX_API_KEY"
```

```json
{
  "rules": [
    {
      "id": "dlprule_platform_01",
      "source": "platform",
      "name": "credit-card-luhn",
      "rule_type": "regex",
      "enabled": true,
      "action_tier": 1,
      "custom_entity_type": "CREDIT_CARD"
    },
    {
      "id": "dlprule_org_01HZ",
      "source": "org",
      "name": "internal-employee-id",
      "rule_type": "regex",
      "enabled": true,
      "action_tier": 1,
      "custom_entity_type": "EMPLOYEE_ID"
    }
  ],
  "total": 67
}
```

The `source` field shows whether each rule comes from the platform defaults or your org-level configuration.
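One way to picture the merge is shown below. The gateway's actual merge logic is not documented here, so treat this as a sketch under assumptions: an org rule with `target_rule_id` only overrides the `enabled` flag of the platform rule it targets, suppressed rules stay in the list with `enabled` set to false, and the IDs `dlprule_platform_02` and `dlprule_org_02` and the name `us-phone` are invented for the example.

```python
def effective_rules(platform_rules: list, org_rules: list) -> list:
    """Apply org-level overrides to platform rules, then append org-defined rules."""
    overrides = {r["target_rule_id"]: r for r in org_rules if "target_rule_id" in r}
    merged = []
    for rule in platform_rules:
        override = overrides.get(rule["id"])
        # The org-level override takes precedence over the platform default.
        enabled = override["enabled"] if override else rule["enabled"]
        merged.append({**rule, "enabled": enabled})
    # Org rules that do not target a platform rule are custom rules in their own right.
    merged.extend(r for r in org_rules if "target_rule_id" not in r)
    return merged


platform = [
    {"id": "dlprule_platform_01", "source": "platform", "name": "credit-card-luhn", "enabled": True},
    {"id": "dlprule_platform_02", "source": "platform", "name": "us-phone", "enabled": True},
]
org = [
    {"id": "dlprule_org_02", "source": "org", "name": "suppress-phone-false-positives",
     "target_rule_id": "dlprule_platform_02", "enabled": False},
    {"id": "dlprule_org_01HZ", "source": "org", "name": "internal-employee-id", "enabled": True},
]

for rule in effective_rules(platform, org):
    print(rule["source"], rule["name"], rule["enabled"])
```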
Rule versioning
Every change to a DLP rule creates an immutable version record. View the history of a rule:

```bash
curl "https://gateway.arbitex.ai/api/admin/dlp-rules/{rule_id}/versions" \
  -H "Authorization: Bearer $ARBITEX_API_KEY"
```

```json
{
  "versions": [
    {
      "version": 2,
      "changed_by": "admin@example.com",
      "changed_at": "2026-03-12T14:00:00Z",
      "change_type": "update",
      "old_pattern": "\\bEMP-[0-9]{5}\\b",
      "new_pattern": "\\bEMP-[0-9]{6}\\b",
      "reason": "Employee IDs extended to 6 digits"
    },
    {
      "version": 1,
      "changed_by": "admin@example.com",
      "changed_at": "2026-03-01T09:00:00Z",
      "change_type": "create",
      "new_pattern": "\\bEMP-[0-9]{5}\\b"
    }
  ]
}
```

See the DLP Rules API reference for bulk export, import, and full version history APIs.
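Version history is useful when you need to know which revision of a rule would have caught a given value. The sketch below parses a trimmed copy of the history above and tests a sample against each version's `new_pattern`; `versions_matching` is a hypothetical helper. Note that the doubled backslashes are JSON escaping: after parsing, each pattern contains a single `\b`.

```python
import json
import re

# Trimmed copy of the version history response (raw string keeps the JSON escapes).
HISTORY = r'''
{
  "versions": [
    {"version": 2, "change_type": "update", "new_pattern": "\\bEMP-[0-9]{6}\\b"},
    {"version": 1, "change_type": "create", "new_pattern": "\\bEMP-[0-9]{5}\\b"}
  ]
}
'''


def versions_matching(versions: list, sample: str) -> list:
    """Return the version numbers whose pattern matches the sample text."""
    return [v["version"] for v in versions if re.search(v["new_pattern"], sample)]


versions = json.loads(HISTORY)["versions"]

print(versions_matching(versions, "EMP-04289"))   # [1]  five-digit ID, original rule only
print(versions_matching(versions, "EMP-042891"))  # [2]  six-digit ID, updated rule only
```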
Reference
| Resource | Link |
|---|---|
| DLP event monitoring | DLP Event Monitoring |
| DLP Rules API | API Reference: DLP Rules |
| Policy Engine user guide | Policy Engine User Guide |
| Policy Engine API | API Reference: Policy Engine |
| Policy simulator | Outpost Policy Simulator |
| Credential Intelligence | Credential Intelligence |