
DeBERTa Tier 3 — admin guide

Tier 3 adds contextual classification to the Outpost DLP pipeline. It runs a fine-tuned DeBERTa ONNX model (deberta-dlp-v2-r2) directly inside the Outpost process and classifies each text chunk as pii or clean. Chunks classified as pii with confidence ≥ 0.70 are appended to the merged findings from Tier 1 (regex) and Tier 2 (NER) and passed to the Policy Engine for enforcement.

Tier 3 is optional. When DEBERTA_MODEL_PATH is not set or the ONNX file is not present, the pipeline runs Tier 1 + Tier 2 only and logs an INFO message. No configuration change is required to run without Tier 3.


Tier 1 and Tier 2 identify entity spans using pattern matching and named-entity recognition. Tier 3 provides a second-pass contextual validation layer:

  1. The full request or response text is split into overlapping chunks of up to 450 characters at sentence boundaries.
  2. Each chunk is tokenized (max 512 tokens) and passed through the DeBERTa model.
  3. The model outputs logits for two classes: pii (class 0) and clean (class 1).
  4. The softmax probability of pii is compared against the 0.70 confidence threshold.
  5. Chunks that exceed the threshold produce a finding with tier: "deberta" and action_tier: "redact".

The model runs entirely within the Outpost — it makes no calls to Platform or Cloud.
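The steps above can be sketched in plain Python. This is a minimal illustration, not the Outpost implementation: `chunk_text`, `classify_chunks`, and the injected `run_model` callable are hypothetical names (`run_model` stands in for the actual ONNX forward pass), and the sketch omits the chunk overlap for brevity.

```python
import math
from typing import Callable, List

THRESHOLD = 0.70
LABELS = ["pii", "clean"]  # class 0 = pii, class 1 = clean

def chunk_text(text: str, max_chars: int = 450) -> List[str]:
    """Split text into chunks of up to max_chars, preferring sentence boundaries."""
    chunks, start = [], 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Back up to the last sentence boundary inside the window, if any.
            cut = text.rfind(". ", start, end)
            if cut > start:
                end = cut + 1
        chunks.append(text[start:end].strip())
        start = end
    return [c for c in chunks if c]

def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_chunks(text: str, run_model: Callable[[str], List[float]]) -> List[dict]:
    """Emit a finding for each chunk whose softmax P(pii) meets the threshold."""
    findings = []
    for chunk in chunk_text(text):
        probs = softmax(run_model(chunk))
        p_pii = probs[LABELS.index("pii")]
        if p_pii >= THRESHOLD:
            findings.append({
                "entity_type": "pii",
                "tier": "deberta",
                "action_tier": "redact",
                "confidence": round(p_pii, 3),
            })
    return findings
```

Chunks below the threshold simply produce no finding, matching step 4 above.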

Model: deberta-dlp-v2-r2. Validation set: 376 examples across 11 entity types.

| Category | Entity types | Macro F1 |
| --- | --- | --- |
| Contact info | email, telephone | 0.79 |
| Credentials | api_key, username_password_combo | 0.76 |
| Infrastructure | ip_address | 0.79 |
| Government ID | ssn, passport | 0.70 |
| Financial identifiers | iban, credit_card | 0.73 |
| MNPI | material_contract, earnings_announcement | 0.63 |
| Multi-entity | name | 0.89 |

Overall macro F1: 0.741. Overall accuracy: 0.606. The model is calibrated for high recall (0.89 overall) with moderate precision — false positives at the chunk level are expected and are filtered downstream by the Policy Engine confidence threshold.


| Metric | Value |
| --- | --- |
| Average inference latency (CPU) | 88.9 ms per request |
| P95 inference latency (CPU) | 164 ms per request |
| Chunk size | 450 chars / 512 tokens max |
| Minimum confidence for reporting | 0.70 |

CPU is functional but GPU is recommended for production. On CPU, each inference call adds ~89 ms to the DLP pipeline. On a GPU node (NVIDIA T4 or equivalent), inference drops to 5–15 ms per chunk.


The promoted ONNX artifact is:

/home/brian/models/Arbitex/deploy/deberta-dlp-v2/
├── model.onnx ← required
├── tokenizer.json ← required
├── tokenizer_config.json
└── vocab files ← required by AutoTokenizer

DEBERTA_MODEL_PATH must point to model.onnx. The tokenizer is loaded from the same directory (os.path.dirname(DEBERTA_MODEL_PATH)).
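A quick illustration of why the variable must point at the file rather than the directory (the path value here is hypothetical):

```python
import os

# DEBERTA_MODEL_PATH must point at the .onnx file itself.
model_path = "/app/models/deberta/model.onnx"

# Tokenizer files are resolved from the directory containing the model file.
tokenizer_dir = os.path.dirname(model_path)           # "/app/models/deberta"

# If the variable pointed at the directory instead, the tokenizer lookup
# would land one level too high:
wrong_dir = os.path.dirname("/app/models/deberta")    # "/app/models"
```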

Set the environment variable to the model file path:

Terminal window
DEBERTA_MODEL_PATH=/opt/arbitex/models/deberta/model.onnx
DLP_DEBERTA_ENABLED=true

Mount the model directory into the container:

docker-compose.yml
volumes:
  - /host/path/to/deberta-dlp-v2:/app/models/deberta:ro
environment:
  DEBERTA_MODEL_PATH: /app/models/deberta/model.onnx
  DLP_DEBERTA_ENABLED: "true"

Add a model volume to the Outpost pod and set the Helm values:

values-override.yaml
outpost:
  dlpDebertaEnabled: true
  debertaModelPath: /app/models/deberta/model.onnx
  gpuEnabled: true # set false for CPU-only deployment
  # Add a volume for the model
  extraVolumes:
    - name: deberta-model
      persistentVolumeClaim:
        claimName: deberta-model-pvc # or hostPath, NFS, etc.
  extraVolumeMounts:
    - name: deberta-model
      mountPath: /app/models/deberta
      readOnly: true

Deploy:

Terminal window
helm upgrade arbitex-outpost ./charts/arbitex-outpost \
  -f values-override.yaml \
  --set outpost.dlpDebertaEnabled=true \
  --set outpost.debertaModelPath=/app/models/deberta/model.onnx

For GPU nodes, the chart sets resource requests for nvidia.com/gpu: 1 when outpost.gpuEnabled: true.
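The rendered container spec would then carry a GPU resource entry along these lines (a sketch of the chart's output, not the chart source; Kubernetes requires extended resources such as nvidia.com/gpu to be set as limits):

```yaml
resources:
  limits:
    nvidia.com/gpu: 1
```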

The Outpost image must include at least one of these dependency sets:

| Runtime | Required packages | Notes |
| --- | --- | --- |
| Preferred (optimum ORT) | transformers, optimum[onnxruntime], torch | Richer HuggingFace API |
| Fallback (raw onnxruntime) | transformers, onnxruntime | Minimal deps, CPU only |

Install:

Terminal window
# Preferred
pip install transformers optimum[onnxruntime] torch
# Minimal fallback
pip install transformers onnxruntime

All settings are environment variables. The Outpost image uses Pydantic settings — prefix-free, case-insensitive.

| Environment variable | Helm value | Type | Default | Description |
| --- | --- | --- | --- | --- |
| DLP_DEBERTA_ENABLED | outpost.dlpDebertaEnabled | bool | false | Enable Tier 3. Must also set DEBERTA_MODEL_PATH. |
| DEBERTA_MODEL_PATH | outpost.debertaModelPath | string | "" | Absolute path to model.onnx. When empty, Tier 3 is inactive. |
| GPU_ENABLED | outpost.gpuEnabled | bool | false | Request GPU resources (nvidia.com/gpu: 1). Enables CUDA execution provider. |
| DLP_ENABLED | outpost.dlpEnabled | bool | true | Master DLP toggle. Tier 3 requires this to be true. |
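The activation logic implied by the table can be sketched with plain stdlib code. The real image uses Pydantic settings; `_bool` and `tier3_active` are illustrative names, and the case-insensitive lookup here is only a loose approximation of Pydantic's field matching.

```python
import os

def _bool(name: str, default: bool) -> bool:
    # Loose stand-in for Pydantic's prefix-free, case-insensitive env parsing.
    val = os.environ.get(name, os.environ.get(name.lower(), ""))
    if not val:
        return default
    return val.strip().lower() in ("1", "true", "yes", "on")

def tier3_active() -> bool:
    """Tier 3 runs only when all three conditions hold."""
    return (
        _bool("DLP_ENABLED", True)            # master toggle, default true
        and _bool("DLP_DEBERTA_ENABLED", False)  # opt-in, default false
        and bool(os.environ.get("DEBERTA_MODEL_PATH", ""))  # path must be set
    )
```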

The confidence threshold is fixed at 0.70 in the current model version. Chunks where P(pii) < 0.70 are silently discarded. Chunks at or above 0.70 generate a finding with:

{
  "entity_type": "pii",
  "tier": "deberta",
  "action_tier": "redact",
  "confidence": 0.84
}

The Policy Engine then applies compliance bundle rules. If a bundle’s DLP action for pii is log_only or block, that takes precedence over the Tier 3 default redact.
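The precedence rule can be expressed as a small sketch; `apply_bundle_action` and the `bundle_actions` mapping are illustrative names, not the real Policy Engine API.

```python
def apply_bundle_action(finding: dict, bundle_actions: dict) -> dict:
    """An explicit bundle DLP action for the entity type overrides
    the Tier 3 default of "redact"."""
    override = bundle_actions.get(finding["entity_type"])
    if override in ("log_only", "block"):
        return {**finding, "action_tier": override}
    return finding
```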

Tier 3 outputs only pii or clean — it does not identify specific entity types (SSN, email, etc.). Entity-type specificity comes from Tier 1 and Tier 2. Tier 3 confirms or discards their findings contextually.


On successful load, the Outpost logs at INFO level:

INFO outpost.dlp.deberta DeBERTa Tier 3 loaded via optimum ORT — contextual classification active

or:

INFO outpost.dlp.deberta DeBERTa Tier 3 loaded via onnxruntime — contextual classification active

If the model is not configured:

INFO outpost.dlp.deberta DeBERTa Tier 3 not configured — using regex+NER only. Set DEBERTA_MODEL_PATH to enable.

Each Tier 3 finding writes an audit log entry with the following fields:

| Field | Value |
| --- | --- |
| tier | deberta |
| action_tier | redact (default, may be overridden by Policy Engine) |
| confidence | Float, e.g. 0.847 |
| entity_type | pii |

To extract Tier 3 audit events:

Terminal window
# On the Outpost host
jq 'select(.tier == "deberta")' audit_buffer/audit.jsonl
# Via Platform audit API
GET /api/admin/audit?tier=deberta&limit=100

For each chunk, Tier 3 takes one of two actions:

  • Confirm (P(pii) ≥ 0.70): the chunk is appended to the findings and forwarded to the Policy Engine.
  • Discard (P(pii) < 0.70): the chunk is ignored; no audit entry is written for it.

The ratio of confirmed to discarded chunks appears in debug logs when LOG_LEVEL=debug.


Symptom: Startup log shows “DeBERTa Tier 3 not configured” even after setting DEBERTA_MODEL_PATH.

Check:

  1. Confirm the file exists at the configured path inside the container:
    Terminal window
    kubectl exec <pod> -- ls -la /app/models/deberta/model.onnx
  2. Confirm DLP_DEBERTA_ENABLED=true is set.
  3. Confirm the volume mount is correct — the path must match DEBERTA_MODEL_PATH exactly, pointing to the .onnx file, not the directory.

Symptom: Warning log: Failed to load DeBERTa ONNX model from '...': <error>. Falling back to regex+NER only.

Common causes and fixes:

| Error message | Cause | Fix |
| --- | --- | --- |
| transformers not installed | Package missing | pip install transformers onnxruntime |
| No such file or directory | Wrong path | Verify mount and DEBERTA_MODEL_PATH value |
| ORT ONNX model load failed | Corrupt or incompatible ONNX file | Re-export the model with a matching ONNX opset |
| Cannot allocate memory | Insufficient container memory | Increase the memory limit (minimum 2 Gi for CPU, 8 Gi for GPU) |

Symptom: Tier 3 consistently classifies clean text as pii or vice versa.

Root cause: The deberta-dlp-v2-r2 training checkpoint uses an inverted label order relative to the original spec — class 0 = pii, class 1 = clean. The runtime in outpost/dlp/deberta.py accounts for this correctly. If you load the model with a custom inference script, ensure DEBERTA_ENTITY_LABELS[0] = "pii" and DEBERTA_ENTITY_LABELS[1] = "clean".

Do not use argmin(logits) — use argmax(softmax(logits)) with the ["pii", "clean"] label order.
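A custom inference script can verify the label order with a few lines; `predict` is an illustrative helper, not part of outpost/dlp/deberta.py.

```python
import math

# Label order for deberta-dlp-v2-r2 (inverted relative to the original spec).
DEBERTA_ENTITY_LABELS = ["pii", "clean"]  # class 0 = pii, class 1 = clean

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits):
    """Return (label, P(pii)) using argmax over softmax probabilities."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)  # argmax, never argmin
    return DEBERTA_ENTITY_LABELS[idx], probs[0]
```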

Symptom: Each DLP scan adds 100–200 ms to request latency.

Fix: Move the Outpost to a GPU node and set GPU_ENABLED=true. GPU inference reduces latency to 5–15 ms per chunk. Alternatively, reduce the proportion of long-text requests that require chunking, or disable Tier 3 for latency-sensitive Policy Pack paths using dlp_deberta_enabled=false in the relevant compliance bundle.

Symptom: InvalidGraph: Load model from ... failed: ... opset X not supported.

Fix: The model was exported with a specific ONNX opset. Install the matching onnxruntime version:

Terminal window
pip install onnxruntime==1.17.3 # or the version used during export

Check the model’s opset:

import onnx
m = onnx.load("/app/models/deberta/model.onnx")
print(m.opset_import)