
DeBERTa Tier 3 — admin guide

Tier 3 adds contextual classification to the Outpost DLP pipeline. It runs a fine-tuned DeBERTa ONNX model (deberta-dlp-v2-r2) directly inside the Outpost process and classifies each text chunk as pii or clean. Chunks classified as pii with confidence ≥ 0.70 are appended to the merged findings from Tier 1 (regex) and Tier 2 (NER) and passed to the Policy Engine for enforcement.

Tier 3 is optional. When DEBERTA_MODEL_PATH is not set or the ONNX file is not present, the pipeline runs Tier 1 + Tier 2 only and logs an INFO message. No configuration change is required to run without Tier 3.


Tier 1 and Tier 2 identify entity spans using pattern matching and named-entity recognition. Tier 3 provides a second-pass contextual validation layer:

  1. The full request or response text is split into overlapping chunks of up to 450 characters at sentence boundaries.
  2. Each chunk is tokenized (max 512 tokens) and passed through the DeBERTa model.
  3. The model outputs logits for two classes: pii (class 0) and clean (class 1).
  4. The softmax probability of pii is compared against the 0.70 confidence threshold.
  5. Chunks that exceed the threshold produce a finding with tier: "deberta" and action_tier: "redact".

The model runs entirely within the Outpost — it makes no calls to Platform or Cloud.
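The steps above can be sketched in plain Python. This is a minimal illustration, not the Outpost implementation: `chunk_text`, `classify_chunks`, and the injected `run_model` callable are hypothetical names (`run_model` stands in for the actual ONNX forward pass), and the sketch omits the chunk overlap for brevity.

```python
import math
from typing import Callable, List

THRESHOLD = 0.70
LABELS = ["pii", "clean"]  # class 0 = pii, class 1 = clean

def chunk_text(text: str, max_chars: int = 450) -> List[str]:
    """Split text into chunks of up to max_chars, preferring sentence boundaries."""
    chunks, start = [], 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Back up to the last sentence boundary inside the window, if any.
            cut = text.rfind(". ", start, end)
            if cut > start:
                end = cut + 1
        chunks.append(text[start:end].strip())
        start = end
    return [c for c in chunks if c]

def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_chunks(text: str, run_model: Callable[[str], List[float]]) -> List[dict]:
    """Emit a finding for each chunk whose softmax P(pii) meets the threshold."""
    findings = []
    for chunk in chunk_text(text):
        probs = softmax(run_model(chunk))
        p_pii = probs[LABELS.index("pii")]
        if p_pii >= THRESHOLD:
            findings.append({
                "entity_type": "pii",
                "tier": "deberta",
                "action_tier": "redact",
                "confidence": round(p_pii, 3),
            })
    return findings
```

Chunks below the threshold simply produce no finding, matching step 4 above.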

Model: deberta-dlp-v2-r2. Validation set: 376 examples across 11 entity types.

| Category | Entity types | Macro F1 |
| --- | --- | --- |
| Contact info | email, telephone | 0.79 |
| Credentials | api_key, username_password_combo | 0.76 |
| Infrastructure | ip_address | 0.79 |
| Government ID | ssn, passport | 0.70 |
| Financial identifiers | iban, credit_card | 0.73 |
| MNPI | material_contract, earnings_announcement | 0.63 |
| Multi-entity | name | 0.89 |

Overall macro F1: 0.741. Overall accuracy: 0.606. The model is calibrated for high recall (0.89 overall) with moderate precision — false positives at the chunk level are expected and are filtered downstream by the Policy Engine confidence threshold.


| Metric | Value |
| --- | --- |
| Average inference latency (CPU) | 88.9 ms per request |
| P95 inference latency (CPU) | 164 ms per request |
| Chunk size | 450 chars / 512 tokens max |
| Minimum confidence for reporting | 0.70 |

CPU is functional but GPU is recommended for production. On CPU, each inference call adds ~89 ms to the DLP pipeline. On a GPU node (NVIDIA T4 or equivalent), inference drops to 5–15 ms per chunk.


The promoted ONNX artifact is:

/home/brian/models/Arbitex/deploy/deberta-dlp-v2/
├── model.onnx ← required
├── tokenizer.json ← required
├── tokenizer_config.json
└── vocab files ← required by AutoTokenizer

DEBERTA_MODEL_PATH must point to model.onnx. The tokenizer is loaded from the same directory (os.path.dirname(DEBERTA_MODEL_PATH)).
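A quick illustration of why the variable must point at the file rather than the directory (the path value here is hypothetical):

```python
import os

# DEBERTA_MODEL_PATH must point at the .onnx file itself.
model_path = "/app/models/deberta/model.onnx"

# Tokenizer files are resolved from the directory containing the model file.
tokenizer_dir = os.path.dirname(model_path)           # "/app/models/deberta"

# If the variable pointed at the directory instead, the tokenizer lookup
# would land one level too high:
wrong_dir = os.path.dirname("/app/models/deberta")    # "/app/models"
```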

Set the environment variable to the model file path:

Terminal window
DEBERTA_MODEL_PATH=/opt/arbitex/models/deberta/model.onnx
DLP_DEBERTA_ENABLED=true

Mount the model directory into the container:

docker-compose.yml
volumes:
  - /host/path/to/deberta-dlp-v2:/app/models/deberta:ro
environment:
  DEBERTA_MODEL_PATH: /app/models/deberta/model.onnx
  DLP_DEBERTA_ENABLED: "true"

Add a model volume to the Outpost pod and set the Helm values:

values-override.yaml
outpost:
  dlpDebertaEnabled: true
  debertaModelPath: /app/models/deberta/model.onnx
  gpuEnabled: true # set false for CPU-only deployment
  # Add a volume for the model
  extraVolumes:
    - name: deberta-model
      persistentVolumeClaim:
        claimName: deberta-model-pvc # or hostPath, NFS, etc.
  extraVolumeMounts:
    - name: deberta-model
      mountPath: /app/models/deberta
      readOnly: true

Deploy:

Terminal window
helm upgrade arbitex-outpost ./charts/arbitex-outpost \
  -f values-override.yaml \
  --set outpost.dlpDebertaEnabled=true \
  --set outpost.debertaModelPath=/app/models/deberta/model.onnx

For GPU nodes, the chart sets resource requests for nvidia.com/gpu: 1 when outpost.gpuEnabled: true.
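The rendered container spec would then carry a GPU resource entry along these lines (a sketch of the chart's output, not the chart source; Kubernetes requires extended resources such as nvidia.com/gpu to be set as limits):

```yaml
resources:
  limits:
    nvidia.com/gpu: 1
```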

The Outpost image must include at least one of these dependency sets:

| Runtime | Required packages | Notes |
| --- | --- | --- |
| Preferred (optimum ORT) | transformers, optimum[onnxruntime], torch | Richer HuggingFace API |
| Fallback (raw onnxruntime) | transformers, onnxruntime | Minimal deps, CPU only |

Install:

Terminal window
# Preferred
pip install transformers optimum[onnxruntime] torch
# Minimal fallback
pip install transformers onnxruntime

All settings are environment variables. The Outpost image uses Pydantic settings — prefix-free, case-insensitive.

| Environment variable | Helm value | Type | Default | Description |
| --- | --- | --- | --- | --- |
| DLP_DEBERTA_ENABLED | outpost.dlpDebertaEnabled | bool | false | Enable Tier 3. Must also set DEBERTA_MODEL_PATH. |
| DEBERTA_MODEL_PATH | outpost.debertaModelPath | string | "" | Absolute path to model.onnx. When empty, Tier 3 is inactive. |
| GPU_ENABLED | outpost.gpuEnabled | bool | false | Request GPU resources (nvidia.com/gpu: 1). Enables CUDA execution provider. |
| DLP_ENABLED | outpost.dlpEnabled | bool | true | Master DLP toggle. Tier 3 requires this to be true. |
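The activation logic implied by the table can be sketched with plain stdlib code. The real image uses Pydantic settings; `_bool` and `tier3_active` are illustrative names, and the case-insensitive lookup here is only a loose approximation of Pydantic's field matching.

```python
import os

def _bool(name: str, default: bool) -> bool:
    # Loose stand-in for Pydantic's prefix-free, case-insensitive env parsing.
    val = os.environ.get(name, os.environ.get(name.lower(), ""))
    if not val:
        return default
    return val.strip().lower() in ("1", "true", "yes", "on")

def tier3_active() -> bool:
    """Tier 3 runs only when all three conditions hold."""
    return (
        _bool("DLP_ENABLED", True)            # master toggle, default true
        and _bool("DLP_DEBERTA_ENABLED", False)  # opt-in, default false
        and bool(os.environ.get("DEBERTA_MODEL_PATH", ""))  # path must be set
    )
```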

The confidence threshold is fixed at 0.70 in the current model version. Chunks where P(pii) < 0.70 are silently discarded. Chunks at or above 0.70 generate a finding with:

{
  "entity_type": "pii",
  "tier": "deberta",
  "action_tier": "redact",
  "confidence": 0.84
}

The Policy Engine then applies compliance bundle rules. If a bundle’s DLP action for pii is log_only or block, that takes precedence over the Tier 3 default redact.
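The precedence rule can be expressed as a small sketch; `apply_bundle_action` and the `bundle_actions` mapping are illustrative names, not the real Policy Engine API.

```python
def apply_bundle_action(finding: dict, bundle_actions: dict) -> dict:
    """An explicit bundle DLP action for the entity type overrides
    the Tier 3 default of "redact"."""
    override = bundle_actions.get(finding["entity_type"])
    if override in ("log_only", "block"):
        return {**finding, "action_tier": override}
    return finding
```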

Tier 3 outputs only pii or clean — it does not identify specific entity types (SSN, email, etc.). Entity-type specificity comes from Tier 1 and Tier 2. Tier 3 confirms or discards their findings contextually.


On successful load, the Outpost logs at INFO level:

INFO outpost.dlp.deberta DeBERTa Tier 3 loaded via optimum ORT — contextual classification active

or:

INFO outpost.dlp.deberta DeBERTa Tier 3 loaded via onnxruntime — contextual classification active

If the model is not configured:

INFO outpost.dlp.deberta DeBERTa Tier 3 not configured — using regex+NER only. Set DEBERTA_MODEL_PATH to enable.

Each Tier 3 finding writes an audit log entry with the following fields:

| Field | Value |
| --- | --- |
| tier | deberta |
| action_tier | redact (default, may be overridden by Policy Engine) |
| confidence | Float, e.g. 0.847 |
| entity_type | pii |

To extract Tier 3 audit events:

Terminal window
# On the Outpost host
jq 'select(.tier == "deberta")' audit_buffer/audit.jsonl
# Via Platform audit API
GET /api/admin/audit?tier=deberta&limit=100

For each chunk, Tier 3 takes one of two actions:

  • Confirm (P(pii) ≥ 0.70): the chunk is appended to the findings and forwarded to the Policy Engine.
  • Discard (P(pii) < 0.70): the chunk is ignored; no audit entry is written for it.

The ratio of confirmed to discarded chunks appears in debug logs when LOG_LEVEL=debug.


Symptom: Startup log shows “DeBERTa Tier 3 not configured” even after setting DEBERTA_MODEL_PATH.

Check:

  1. Confirm the file exists at the configured path inside the container:
    Terminal window
    kubectl exec <pod> -- ls -la /app/models/deberta/model.onnx
  2. Confirm DLP_DEBERTA_ENABLED=true is set.
  3. Confirm the volume mount is correct — the path must match DEBERTA_MODEL_PATH exactly, pointing to the .onnx file, not the directory.

Symptom: Warning log: Failed to load DeBERTa ONNX model from '...': <error>. Falling back to regex+NER only.

Common causes and fixes:

| Error message | Cause | Fix |
| --- | --- | --- |
| transformers not installed | Package missing | pip install transformers onnxruntime |
| No such file or directory | Wrong path | Verify mount and DEBERTA_MODEL_PATH value |
| ORT ONNX model load failed | Corrupt or incompatible ONNX file | Re-export the model with a matching ONNX opset |
| Cannot allocate memory | Insufficient container memory | Increase the memory limit (minimum 2 Gi for CPU, 8 Gi for GPU) |

Symptom: Tier 3 consistently classifies clean text as pii or vice versa.

Root cause: The deberta-dlp-v2-r2 training checkpoint uses an inverted label order relative to the original spec — class 0 = pii, class 1 = clean. The runtime in outpost/dlp/deberta.py accounts for this correctly. If you load the model with a custom inference script, ensure DEBERTA_ENTITY_LABELS[0] = "pii" and DEBERTA_ENTITY_LABELS[1] = "clean".

Do not use argmin(logits) — use argmax(softmax(logits)) with the ["pii", "clean"] label order.
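A custom inference script can verify the label order with a few lines; `predict` is an illustrative helper, not part of outpost/dlp/deberta.py.

```python
import math

# Label order for deberta-dlp-v2-r2 (inverted relative to the original spec).
DEBERTA_ENTITY_LABELS = ["pii", "clean"]  # class 0 = pii, class 1 = clean

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits):
    """Return (label, P(pii)) using argmax over softmax probabilities."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)  # argmax, never argmin
    return DEBERTA_ENTITY_LABELS[idx], probs[0]
```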

Symptom: Each DLP scan adds 100–200 ms to request latency.

Fix: Move the Outpost to a GPU node and set GPU_ENABLED=true. GPU inference reduces latency to 5–15 ms per chunk. Alternatively, reduce the proportion of long-text requests that require chunking, or disable Tier 3 for latency-sensitive Policy Pack paths using dlp_deberta_enabled=false in the relevant compliance bundle.

Symptom: InvalidGraph: Load model from ... failed: ... opset X not supported.

Fix: The model was exported with a specific ONNX opset. Install the matching onnxruntime version:

Terminal window
pip install onnxruntime==1.17.3 # or the version used during export

Check the model’s opset:

import onnx
m = onnx.load("/app/models/deberta/model.onnx")
print(m.opset_import)