
DLP Pipeline Wiring Administration

The Arbitex DLP pipeline runs in three tiers — Regex (Tier 1), NER (Tier 2), and DeBERTa (Tier 3). Platform-0039 adds two layers of org-specific customization wired into the intake pipeline:

  • OrgDLPLayer — org-level custom regex patterns and suppression of platform defaults
  • GroupDLPConfig — per-group detector overrides using a most-restrictive-wins algorithm

This guide covers administration of these layers, including the data model, action tiers, and how the pipeline evaluates conflicting group overrides.


Request
  Stage 2: Payload Analysis
  ├─ 1. Platform DLP scan (Regex + NER + DeBERTa)
  ├─ 2. GroupDLPConfig overrides (T528)
  │     Load group IDs for user → fetch enabled GroupDLPConfig rows
  │     → compute most-restrictive action per detector
  │     → detectors with SKIP action are excluded before scan
  └─ 3. OrgDLPLayer application (T527)
        → append org custom pattern matches
        → filter_platform_matches (suppress platform-matched entities)

GroupDLPConfig overrides are applied before the DLP scan (to exclude detectors). OrgDLPLayer is applied after the platform scan (to add matches and suppress specific platform results).

Both layers are fail-safe: if either fails to load or execute, the pipeline continues without that layer and emits a warning log.
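The ordering and failure handling described above can be sketched as follows. This is a minimal illustration, not the platform's implementation: `run_payload_analysis`, the `Match` tuple shape, and the two callables passed in are hypothetical names introduced here, and the log wording is a placeholder.

```python
import logging
from typing import Callable, Iterable

logger = logging.getLogger("dlp.pipeline")

# Hypothetical match shape for this sketch: (entity_type, matched_text).
Match = tuple[str, str]


def run_payload_analysis(
    prompt: str,
    detectors: dict[str, Callable[[str], list[Match]]],
    load_skipped_detectors: Callable[[], Iterable[str]],
    apply_org_layer: Callable[[str, list[Match]], list[Match]],
) -> list[Match]:
    # Group overrides are resolved first: they decide which detectors run at all.
    try:
        skipped = set(load_skipped_detectors())
    except Exception:
        logger.warning("GroupDLPConfig failed to load; continuing without overrides")
        skipped = set()

    # Platform scan, minus any detector whose effective action is SKIP.
    matches: list[Match] = []
    for name, detect in detectors.items():
        if name not in skipped:
            matches.extend(detect(prompt))

    # The org layer runs last: it appends custom matches and suppresses platform ones.
    try:
        matches = apply_org_layer(prompt, matches)
    except Exception:
        logger.warning("OrgDLPLayer failed; continuing with platform matches only")
    return matches
```

If either callable raises, the function still returns the platform matches, which is the behavior the paragraph above describes.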


The OrgDLPLayer provides two operations:

  1. scan_custom_patterns(prompt_text) — runs org-defined regex patterns against the prompt and returns additional DLP matches. These are appended to the existing platform matches.
  2. filter_platform_matches(matches) — removes matches whose entity type is targeted by a suppress_default rule for the org.
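A rough sketch of the two operations, assuming a trimmed-down rule record and a simple match record. `Rule`, `Match`, and `OrgDLPLayerSketch` are hypothetical stand-ins introduced for illustration, and suppression is keyed on the platform entity name here, which is an assumption.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:  # hypothetical, trimmed-down stand-in for OrgDLPRule
    rule_type: str
    name: str
    pattern: Optional[str] = None
    custom_entity_type: Optional[str] = None
    enabled: bool = True


@dataclass
class Match:  # hypothetical match record
    entity_type: str
    text: str
    source: str


class OrgDLPLayerSketch:
    def __init__(self, rules: list[Rule]):
        active = [r for r in rules if r.enabled]
        self._custom = [r for r in active if r.rule_type == "custom_pattern" and r.pattern]
        # Assumption: a suppress_default rule names the platform entity it suppresses.
        self._suppressed = {r.name for r in active if r.rule_type == "suppress_default"}

    def scan_custom_patterns(self, prompt_text: str) -> list[Match]:
        """Run each enabled custom pattern and return the additional matches."""
        found = []
        for rule in self._custom:
            label = rule.custom_entity_type or "org_custom_pattern"
            for m in re.finditer(rule.pattern, prompt_text):
                found.append(Match(label, m.group(0), "org"))
        return found

    def filter_platform_matches(self, matches: list[Match]) -> list[Match]:
        """Drop platform matches whose entity type is suppressed for this org."""
        return [m for m in matches if m.entity_type not in self._suppressed]
```

A caller would typically filter the platform matches first and then append the custom matches, so that a suppression rule never removes an org-defined match.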

Org rules are stored in the org_dlp_rules table (OrgDLPRule) and managed via the org DLP rule service. Each rule has a rule_type:

| rule_type | Description |
| --- | --- |
| custom_pattern | Adds a new regex pattern to scan alongside platform defaults |
| suppress_default | Suppresses a specific platform-level default detector for this org |

Rules are soft-deleted — setting deleted_at excludes the rule from active queries while retaining audit history. The cache is invalidated after every mutation.

| Field | Type | Description |
| --- | --- | --- |
| id | UUID | Rule identifier |
| org_id | UUID | Owning organization UUID |
| rule_type | string | custom_pattern or suppress_default |
| name | string | Human-readable rule name |
| pattern | string | Regex pattern (required for custom_pattern, null for suppress_default) |
| target_rule_id | UUID | Platform rule ID to suppress (for suppress_default). Also matched by name. |
| enabled | bool | Whether the rule is currently active |
| action_tier | string | DLP action tier for custom patterns (see below) |
| custom_entity_type | string | Entity type label for matches. Defaults to org_custom_pattern if null. |
| created_by | UUID | UUID of the admin who created the rule |
| deleted_at | timestamp | Soft-delete timestamp (null if active) |

The action_tier field on a custom_pattern rule determines the DLP action applied when the pattern matches:

| action_tier | Behavior |
| --- | --- |
| log_only | Match is recorded in the audit log. Request proceeds unmodified. |
| redact | Matched text is redacted before the prompt reaches the provider. |
| block | Request is blocked with HTTP 400. |
| prompt | User is prompted for justification (ALLOW_WITH_OVERRIDE governance flow). |

Default: log_only.
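One way to picture the four tiers is as an enum with a small dispatch function. The sketch below is illustrative only: `ActionTier`, `resolve_tier`, `apply_tier`, the outcome strings, and the `[REDACTED]` placeholder are all assumptions made for this example, not names taken from the platform.

```python
from enum import Enum


class ActionTier(str, Enum):
    LOG_ONLY = "log_only"
    REDACT = "redact"
    BLOCK = "block"
    PROMPT = "prompt"


def resolve_tier(value):
    """Map a stored action_tier value to a tier, defaulting to log_only."""
    return ActionTier(value) if value else ActionTier.LOG_ONLY


def apply_tier(tier, prompt, matched_text):
    """Return (outcome, outgoing_prompt) for a single custom-pattern match."""
    if tier is ActionTier.BLOCK:
        # The caller would translate this outcome into the HTTP 400 response.
        return "block", None
    if tier is ActionTier.REDACT:
        # "[REDACTED]" is a placeholder token chosen for this sketch.
        return "forward", prompt.replace(matched_text, "[REDACTED]")
    if tier is ActionTier.PROMPT:
        # The caller would start the ALLOW_WITH_OVERRIDE justification flow.
        return "needs_justification", prompt
    # log_only: the match is recorded elsewhere and the prompt is untouched.
    return "forward", prompt
```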

The custom_entity_type field sets the entity type label for matches from this rule. This label appears in audit entries, DLP event records, and the OCSF events streamed to SIEM.

If custom_entity_type is null, matches are labeled org_custom_pattern.

Use descriptive labels that map to your data classification taxonomy, for example: internal_project_code, patient_id, contract_number.

Org DLP rules are cached per-org with a 60-second TTL (_org_rules_cache in org_dlp_rules.py). invalidate_org_rules_cache(org_id) is called after every create, update, or delete to ensure changes take effect within one TTL window.
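The caching behavior can be sketched as a per-org dictionary with expiry timestamps. This is a simplified illustration and the actual code in org_dlp_rules.py may differ: `_rules_cache_sketch`, `get_org_rules`, and the injectable `load` and `now` callables are hypothetical, while `invalidate_org_rules_cache` reuses the name mentioned above.

```python
import time
from typing import Any, Callable

TTL_SECONDS = 60.0
_rules_cache_sketch: dict[str, tuple[float, Any]] = {}  # org_id -> (expires_at, rules)


def get_org_rules(
    org_id: str,
    load: Callable[[str], Any],
    now: Callable[[], float] = time.monotonic,
) -> Any:
    """Return cached rules for the org, reloading once the entry has expired."""
    entry = _rules_cache_sketch.get(org_id)
    if entry is not None and entry[0] > now():
        return entry[1]
    rules = load(org_id)
    _rules_cache_sketch[org_id] = (now() + TTL_SECONDS, rules)
    return rules


def invalidate_org_rules_cache(org_id: str) -> None:
    """Drop the cached entry so the next read reloads from the database."""
    _rules_cache_sketch.pop(org_id, None)
```

Passing the clock in as `now` keeps the expiry logic testable without sleeping.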

get_effective_rules(db, org_id) returns the merged view of platform defaults and org-specific rules:

{
  "org_id": "...",
  "platform_rules_count": 42,
  "org_rules_count": 3,
  "suppressed_count": 1,
  "rules": [
    {
      "name": "credit_card",
      "source": "platform",
      "rule_type": "platform_default",
      "pattern": "...",
      "enabled": true,
      "suppressed": true,
      "suppressed_by": "suppress-rule-uuid"
    },
    {
      "name": "Internal Project Code",
      "source": "org",
      "rule_type": "custom_pattern",
      "pattern": "PRJ-[0-9]{4}",
      "enabled": true,
      "suppressed": false,
      "suppressed_by": null
    }
  ]
}
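A merged view of this shape could be assembled roughly as below. The function `merge_effective_rules` and the plain-dict inputs are hypothetical; the sketch also assumes that `org_rules_count` counts every org rule and that a suppression matches a platform rule by either ID or name.

```python
def merge_effective_rules(org_id, platform_rules, org_rules):
    """Build a merged rule view from plain dicts (sketch, not the real service)."""
    suppressors = [
        r for r in org_rules
        if r["rule_type"] == "suppress_default" and r["enabled"]
    ]
    merged = []
    suppressed_count = 0
    for p in platform_rules:
        # Assumption: a suppression applies if it targets the rule by id or by name.
        hit = next(
            (s for s in suppressors
             if s.get("target_rule_id") == p["id"] or s["name"] == p["name"]),
            None,
        )
        if hit is not None:
            suppressed_count += 1
        merged.append({
            "name": p["name"], "source": "platform",
            "rule_type": "platform_default", "pattern": p["pattern"],
            "enabled": p.get("enabled", True),
            "suppressed": hit is not None,
            "suppressed_by": hit["id"] if hit else None,
        })
    for r in org_rules:
        if r["rule_type"] == "custom_pattern":
            merged.append({
                "name": r["name"], "source": "org",
                "rule_type": "custom_pattern", "pattern": r["pattern"],
                "enabled": r["enabled"],
                "suppressed": False, "suppressed_by": None,
            })
    return {
        "org_id": org_id,
        "platform_rules_count": len(platform_rules),
        "org_rules_count": len(org_rules),
        "suppressed_count": suppressed_count,
        "rules": merged,
    }
```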

GroupDLPConfig allows group administrators to override per-detector DLP behavior for members of a group. When a user belongs to groups with conflicting detector configurations, the most-restrictive-wins algorithm determines the effective action.

GroupDLPConfig rows are stored in the groups table family (backend/app/models/group.py). Each row ties a group to a DLP detector with an action override:

| Field | Type | Description |
| --- | --- | --- |
| group_id | UUID | Group identifier |
| detector_name | string | DLP detector name (e.g., regex, ner, deberta) |
| action | GroupDLPAction | Override action for this detector when triggered |
| enabled | bool | Whether this override is active |

GroupDLPAction values:

| Value | Priority | Behavior |
| --- | --- | --- |
| BLOCK | 4 (highest) | Block the request |
| REDACT | 3 | Redact matched text |
| CANCEL | 2 | Cancel the request without error |
| SKIP | 1 (lowest) | Skip (exempt) this detector for group members |

When a user belongs to multiple groups with different actions for the same detector:

  1. Load all enabled GroupDLPConfig rows for the user’s groups.
  2. For each detector, track the highest-priority action across all groups.
  3. The effective action is the one with the highest priority number (BLOCK > REDACT > CANCEL > SKIP).

Example: A user is a member of Group A (detector ner, action SKIP) and Group B (detector ner, action REDACT). The effective action is REDACT (priority 3 > priority 1).

SKIP behavior: Detectors whose effective action is SKIP are excluded from the scan pipeline for that request. The detector does not run — it does not produce matches, not even log_only ones.

User in groups: [Group A, Group B]
GroupDLPConfig rows:
  - Group A, detector=ner, action=SKIP
  - Group B, detector=ner, action=REDACT
Effective: ner → REDACT (priority 3 wins over SKIP priority 1)
Result: ner detector runs, REDACT applied on match

User in groups: [Group A, Group C]
GroupDLPConfig rows:
  - Group A, detector=ner, action=SKIP
  - Group C, detector=regex, action=BLOCK
Effective: ner → SKIP, regex → BLOCK
Result: ner excluded from scan; regex runs and blocks on match
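The most-restrictive-wins resolution shown in the two scenarios above can be sketched in a few lines. Modeling `GroupDLPAction` as an `IntEnum` whose values are the priorities is an assumption made for brevity, and `effective_actions` and `detectors_to_skip` are hypothetical helper names.

```python
from enum import IntEnum


class GroupDLPAction(IntEnum):
    # Values double as priorities: higher is more restrictive.
    SKIP = 1
    CANCEL = 2
    REDACT = 3
    BLOCK = 4


def effective_actions(rows):
    """Resolve rows of (group_id, detector_name, action, enabled) per detector."""
    effective = {}
    for _group_id, detector, action, enabled in rows:
        if not enabled:
            continue  # disabled overrides never participate
        current = effective.get(detector)
        if current is None or action > current:
            effective[detector] = action
    return effective


def detectors_to_skip(effective):
    """Detectors that should be excluded from the scan for this request."""
    return {d for d, a in effective.items() if a is GroupDLPAction.SKIP}
```

A detector ends up in the skip set only when every enabled override for it is SKIP, because any higher-priority action from another group replaces it.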

If GroupDLPConfig loading fails (DB query error), the pipeline continues without group overrides — all detectors run with default platform behavior. A warning log is emitted:

Failed to load GroupDLPConfig for user <user_id> — continuing without overrides

All mutations to OrgDLPRule rows are recorded in the org_dlp_rule_audit table (OrgDLPRuleAudit):

| Field | Description |
| --- | --- |
| action | created, updated, deleted, enabled, or disabled |
| actor_id | UUID of the admin who performed the action |
| old_value | JSONB snapshot of the previous state (null on create) |
| new_value | JSONB snapshot of the new state (null on delete) |
| created_at | Timestamp of the audit entry |

The audit table provides an immutable history of all org DLP rule changes. Rows are never updated or deleted.
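As an illustration of how the before and after snapshots relate to the action value, the sketch below derives the action from which snapshot is missing. `AuditEntry` and `build_audit_entry` are hypothetical, and the rule that a change to `enabled` is recorded as enabled or disabled rather than updated is an assumption.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)  # frozen mirrors the append-only, never-updated contract
class AuditEntry:
    action: str
    actor_id: uuid.UUID
    old_value: Optional[dict]
    new_value: Optional[dict]
    created_at: datetime


def build_audit_entry(actor_id, old, new):
    """Derive the audit action from the old/new snapshots of a rule."""
    if old is None and new is None:
        raise ValueError("at least one snapshot is required")
    if old is None:
        action = "created"      # no previous state
    elif new is None:
        action = "deleted"      # no new state
    elif old.get("enabled") != new.get("enabled"):
        action = "enabled" if new.get("enabled") else "disabled"
    else:
        action = "updated"
    return AuditEntry(action, actor_id, old, new, datetime.now(timezone.utc))
```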


Add a custom pattern that redacts internal project codes

POST /api/admin/org-dlp-rules/
{
  "rule_type": "custom_pattern",
  "name": "Internal Project Code",
  "pattern": "PRJ-[0-9]{4}",
  "enabled": true,
  "action_tier": "redact",
  "custom_entity_type": "internal_project_code"
}
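For scripted administration, the request above could be built with the Python standard library along these lines. The host name, the bearer-token header, and the helper `build_admin_post` are placeholders and assumptions; the endpoint path and payload come from the example above.

```python
import json
import urllib.request

BASE_URL = "https://arbitex.example.com"  # placeholder host, not a real endpoint


def build_admin_post(path: str, payload: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to an admin endpoint."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the admin API accepts a bearer token.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_admin_post(
    "/api/admin/org-dlp-rules/",
    {
        "rule_type": "custom_pattern",
        "name": "Internal Project Code",
        "pattern": "PRJ-[0-9]{4}",
        "enabled": True,
        "action_tier": "redact",
        "custom_entity_type": "internal_project_code",
    },
    token="<admin-token>",
)
# Passing req to urllib.request.urlopen would send it; that step is left out here.
```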

Suppress the platform credit card detector for an org

POST /api/admin/org-dlp-rules/
{
  "rule_type": "suppress_default",
  "name": "credit_card",
  "target_rule_id": "<platform-credit-card-rule-uuid>",
  "enabled": true
}

Suppression also resolves by name: the rule's name field must equal the platform pattern name. Both target_rule_id and name are checked during suppression resolution.

Grant a group SKIP exemption on the DeBERTa detector

Useful for technical teams whose prompts regularly contain source code patterns that DeBERTa flags:

POST /api/admin/groups/{group_id}/dlp-config
{
  "detector_name": "deberta",
  "action": "SKIP",
  "enabled": true
}