ADR-002: Fail-Closed DLP Inference

Status: Accepted
Date: 2026-03
Deciders: Platform team (platform-0043, T558)

Context

The Arbitex Platform hosts GPU-accelerated DLP inference microservices (NER entity detection and DeBERTa contextual classification). These services can be temporarily unavailable due to:

  • Container restarts or OOM kills
  • GPU resource exhaustion
  • Rolling deployments
  • Startup latency (model loading takes time)

When a downstream inference microservice is unavailable, the platform must decide what to do with the pending DLP scan request: block the AI request (fail-closed) or allow it through unscanned (fail-open).

The original implementation was fail-open — the legacy path assumed that availability was more important than enforcement correctness. As Arbitex customers deployed into regulated environments (healthcare, finance, legal), this became unacceptable: an unavailable DLP scanner silently passing data to AI providers is a data protection failure.

Decision

Fail-closed by default. When a DLP inference microservice (NER or DeBERTa) is unavailable or returns an error, the affected request is blocked rather than passed unscanned.

The DLP_INFERENCE_FAIL_MODE environment variable controls this behaviour:

  • closed (default): Block the request when inference is unavailable.
  • open: Allow the request through unscanned (legacy behaviour).
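The decision logic above can be sketched as follows. This is a minimal illustration, not the platform's actual code; the function name `allow_unscanned` is a hypothetical helper invented for this example:

```python
import os

def allow_unscanned(request_id: str) -> bool:
    """Decide whether a request may proceed when DLP inference fails.

    Reads DLP_INFERENCE_FAIL_MODE; anything other than an explicit
    "open" is treated as fail-closed, so the secure default wins even
    if the variable's value is mistyped.
    """
    mode = os.environ.get("DLP_INFERENCE_FAIL_MODE", "closed")
    if mode == "open":
        # Legacy behaviour: availability over enforcement correctness.
        return True
    # Default: block the request rather than pass it unscanned.
    return False
```

Treating every value other than `open` as fail-closed keeps the control safe against misconfiguration as well as against missing configuration.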

Consequences

Positive:

  • Enforces the security invariant: when DLP is configured, no data reaches an AI provider without being scanned.
  • Aligns with the principle of “secure by default” — operators who want fail-open must explicitly opt in.
  • Satisfies HIPAA, SOC 2, and financial DLP policy requirements where bypass of scanning controls is prohibited.
  • Prevents silent data exfiltration during infrastructure degradation events.

Negative / trade-offs:

  • Increased availability impact: GPU microservice downtime now blocks AI requests outright instead of silently skipping scanning, so operators must keep DLP microservice SLAs high.
  • Outages that previously passed requests through unscanned now surface as hard errors; a flapping inference service causes user-visible request failures.
  • Existing deployments receive the fail-closed default automatically on upgrade. No configuration change is required, but customers who depend on the legacy behaviour must explicitly set DLP_INFERENCE_FAIL_MODE=open, so awareness is needed during upgrades.

Mitigations:

  • Platform health checks alert on GPU microservice unavailability before it impacts DLP scanning.
  • Retry logic with a short timeout window reduces the impact of transient failures.
  • DLP_INFERENCE_FAIL_MODE=open provides a documented escape hatch for availability-prioritised deployments.
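The retry mitigation can be sketched as below. `scan_fn`, the attempt count, and the backoff values are illustrative assumptions, not platform defaults:

```python
import time

def scan_with_retries(scan_fn, attempts: int = 3, backoff_s: float = 0.1):
    """Call an inference service, retrying transient failures.

    scan_fn is assumed to raise an exception on failure and return a
    scan verdict on success. If every attempt fails, the last error
    propagates so the caller can apply the fail-mode decision
    (block by default, pass only when explicitly configured open).
    """
    last_exc = None
    for attempt in range(attempts):
        try:
            return scan_fn()
        except Exception as exc:  # e.g. connection reset during a rolling deploy
            last_exc = exc
            time.sleep(backoff_s * (2 ** attempt))  # short exponential backoff
    raise last_exc
```

Keeping the total retry budget short matters here: with fail-closed semantics, every retry adds user-visible latency before a block decision.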