Credential Intelligence

Credential Intelligence (CredInt) is an optional DLP subsystem that detects known-compromised credentials in AI prompts and responses. When a user pastes a password, API key, or bearer token into an AI request, CredInt checks whether that credential has appeared in a known breach corpus and routes the request based on its breach frequency.

CredInt operates as a non-blocking side-channel: it never delays AI responses. Results arrive asynchronously and are written to the audit log.

Architecture

L1 Extraction — Credential Candidate Detection

Before any network call is made, a synchronous L1 extractor (CredentialExtractor) scans the request text for credential candidates using four pattern families:

Pattern family	Example matches
Explicit assignment	`password=s3cr3t`, `api_key=abc123`, `secret=VALUE`
Environment variable	`DB_PASSWORD=mypassword`, `AWS_SECRET_ACCESS_KEY=...`
Authorization header	`Authorization: Bearer TOKEN`, `X-API-Key: VALUE`
High-entropy token	Quoted or bare tokens with Shannon entropy > 3.5, 2+ character classes

The extractor deduplicates candidates by value and sorts them by position in the text. Candidate cleartext is never logged at any point in the pipeline — only the SHA-1 prefix (first 8 hex chars) appears in audit records.
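The dedupe-and-sort step can be sketched as follows. This is a minimal illustration, not the actual CredentialExtractor: the regexes, the `Candidate` type, and `extract_candidates` are hypothetical simplifications covering three of the four pattern families.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    value: str          # cleartext, held in memory only and never logged
    context_type: str   # which pattern family matched
    start: int          # offset of the value in the scanned text


# Hypothetical, simplified patterns; the real extractor's regexes are not documented here.
PATTERNS = [
    ("AUTHORIZATION_HEADER", re.compile(r"Authorization:\s*Bearer\s+(\S+)", re.I)),
    ("ENVIRONMENT_VARIABLE", re.compile(r"\b[A-Z0-9_]*(?:PASSWORD|SECRET|KEY)[A-Z0-9_]*=(\S+)")),
    ("EXPLICIT_ASSIGNMENT", re.compile(r"\b(?:password|api_key|secret)=(\S+)")),
]


def extract_candidates(text: str) -> list[Candidate]:
    """Collect matches, keep one candidate per distinct value, sort by position."""
    by_value: dict[str, Candidate] = {}
    for context_type, pattern in PATTERNS:
        for match in pattern.finditer(text):
            cand = Candidate(match.group(1), context_type, match.start(1))
            # Deduplicate by value, keeping the earliest occurrence.
            if cand.value not in by_value or cand.start < by_value[cand.value].start:
                by_value[cand.value] = cand
    return sorted(by_value.values(), key=lambda c: c.start)
```

Calling `extract_candidates` on text that repeats the same `password=...` assignment yields a single candidate for that value, positioned at its first occurrence.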

High-entropy token detection uses:

Shannon entropy threshold: > 3.5 bits/character
Minimum character class requirement: 2+ classes (letters, digits, symbols)
False-positive filters: UUIDs, URLs, and common English words are excluded
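A rough sketch of the entropy check described above, assuming per-character Shannon entropy over the token's own character distribution. The function names and the UUID/URL filters are illustrative, and the common-English-word filter is omitted because the document does not describe the word list.

```python
import math
import re
from collections import Counter

UUID_RE = re.compile(r"^[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")


def shannon_entropy(token: str) -> float:
    """Entropy in bits per character of the token's character distribution."""
    if not token:
        return 0.0
    n = len(token)
    return -sum((c / n) * math.log2(c / n) for c in Counter(token).values())


def char_classes(token: str) -> int:
    """Count how many of the classes letters, digits, symbols appear in the token."""
    has_letter = any(ch.isalpha() for ch in token)
    has_digit = any(ch.isdigit() for ch in token)
    has_symbol = any(not ch.isalnum() for ch in token)
    return has_letter + has_digit + has_symbol


def is_high_entropy_candidate(token: str) -> bool:
    # False-positive filters (assumed forms): UUIDs and URLs are never candidates.
    if UUID_RE.match(token) or token.startswith(("http://", "https://")):
        return False
    return shannon_entropy(token) > 3.5 and char_classes(token) >= 2
```

A token of 16 distinct characters mixing letters and digits has exactly 4.0 bits/character and passes; an ordinary word such as `password` scores 2.75 and is rejected on entropy alone.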

Breach Corpus — 861M+ Known Compromised Credentials

Arbitex integrates with a separate CredInt microservice (credint:8202) that maintains the breach corpus. The corpus contains SHA-1 hashes of over 861 million credentials drawn from known public breach datasets. The corpus is frequency-weighted: each hash carries a bucket label indicating how many times the credential has appeared across breaches.

The microservice is not part of the platform container image — it is a separately deployed service. Configuration:

CREDINT_SERVICE_URL=http://credint:8202   # default
CREDINT_SERVICE_TIMEOUT=3.0               # seconds
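
As an illustration of how a client might read these two settings, the sketch below loads them from the environment. `CredIntSettings` and `load_settings` are hypothetical names, and treating 3.0 seconds as the fallback timeout is an assumption based on the values shown above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CredIntSettings:
    service_url: str
    timeout_seconds: float


def load_settings(env: Optional[Mapping[str, str]] = None) -> CredIntSettings:
    """Read CredInt client settings, falling back to the documented values."""
    env = os.environ if env is None else env
    return CredIntSettings(
        service_url=env.get("CREDINT_SERVICE_URL", "http://credint:8202"),
        # Assumed fallback: the 3-second timeout shown in the configuration above.
        timeout_seconds=float(env.get("CREDINT_SERVICE_TIMEOUT", "3.0")),
    )
```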

k-Anonymity Lookup Protocol

CredInt uses a k-anonymity protocol to protect credential privacy during lookup:

The platform computes the SHA-1 hash of each credential candidate locally.
Only the 5-character hex prefix (20 bits) is sent to the CredInt microservice in a POST /v1/check request.
The microservice returns all corpus entries matching that prefix, along with their frequency buckets.
The platform checks the full hash locally — the microservice never sees the complete hash or the original cleartext.

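This follows the same k-anonymity model as Have I Been Pwned's Pwned Passwords range lookup. A 5-character hex prefix has 16^5 = 1,048,576 possible values, so with a corpus of 861 million hashes each prefix matches roughly 820 entries on average. Even a compromised CredInt microservice sees only the prefix and cannot tell which credential, if any, is being checked.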

The SHA-1 prefix stored in the audit log (sha1_prefix, the first 8 hex characters) is likewise a truncated prefix, never the full hash.
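
The client side of the lookup steps above can be sketched with the standard library alone. The helper names are hypothetical, and the response shape (a mapping from full SHA-1 hex digest to frequency bucket for one prefix) and UTF-8 encoding of the candidate are assumptions made for illustration.

```python
import hashlib
from typing import Mapping, Optional


def sha1_hex(credential: str) -> str:
    """Full SHA-1 digest, computed locally and never sent over the network."""
    return hashlib.sha1(credential.encode("utf-8")).hexdigest()


def lookup_prefix(credential: str) -> str:
    """The 5-character (20-bit) prefix that is sent to the microservice."""
    return sha1_hex(credential)[:5]


def match_locally(credential: str, prefix_entries: Mapping[str, str]) -> Optional[str]:
    """Compare the full hash against the entries returned for the prefix.

    prefix_entries maps full SHA-1 hex digests to frequency buckets
    (an assumed response shape). Returns the bucket, or None on a miss.
    """
    return prefix_entries.get(sha1_hex(credential))
```

For the string `password`, the SHA-1 digest begins with `5baa6`, so only those five characters would leave the platform.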

Frequency-Weighted Risk Buckets

Each breach corpus entry carries a frequency bucket indicating how many times it has appeared across known breaches:

Bucket	Frequency threshold	Risk level
`critical`	Very high frequency (widely circulated breach)	Highest risk
`high`	High frequency (appeared in multiple major breaches)	High risk
`medium`	Moderate frequency	Medium risk
`low`	Infrequent (appeared in limited breach data)	Low risk

When a lookup returns hits across multiple frequency buckets (multiple candidates in the same request), CredInt selects the worst bucket (highest severity) to represent the overall request risk.
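
A small sketch of worst-bucket selection, assuming the severity order low < medium < high < critical implied by the table above. `SEVERITY` and `worst_bucket` are illustrative names.

```python
from typing import Iterable, Optional

# Assumed severity ranking, taken from the bucket table (higher is worse).
SEVERITY = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def worst_bucket(buckets: Iterable[Optional[str]]) -> Optional[str]:
    """Return the highest-severity bucket among per-candidate results.

    None entries represent candidates with no corpus hit.
    """
    hits = [b for b in buckets if b is not None]
    if not hits:
        return None
    return max(hits, key=lambda b: SEVERITY[b])
```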

Routing Logic

CredInt’s routing decision depends on two inputs: the frequency bucket and the org’s DLP sensitivity setting.

Frequency bucket	DLP sensitivity	Action	Audit flag
`critical` or `high`	`high`	`soft_block` — request blocked	`true`
`critical` or `high`	`standard`	`pass` — elevated flag only	`true`
`medium` or `low`	any	`pass` — flag only	`true`
No hit	any	`pass` — no flag	`false`

soft_block means the AI request is blocked before reaching the model. The user sees a standard block message. The audit record captures credint_action: "soft_block".

The routing path is recorded in the audit log as credint_routing_path, which takes one of four values:

Path	Meaning
`credint_no_hit`	Credential not found in breach corpus
`credint_soft_block_high_sensitivity`	Critical/high breach hit + high sensitivity = block
`credint_elevated_flag_standard`	Critical/high breach hit + standard sensitivity = flag
`credint_medium_low_flag`	Medium/low breach hit = flag only
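
The two routing tables above reduce to a small decision function. The sketch below is one way to express them; the `route` function and its tuple return shape are hypothetical, while the action names and routing-path values are taken from the tables.

```python
from typing import Optional, Tuple


def route(bucket: Optional[str], sensitivity: str) -> Tuple[str, bool, str]:
    """Map (frequency bucket, DLP sensitivity) to (action, audit_flag, routing_path)."""
    if bucket is None:
        return ("pass", False, "credint_no_hit")
    if bucket in ("critical", "high"):
        if sensitivity == "high":
            return ("soft_block", True, "credint_soft_block_high_sensitivity")
        # Standard sensitivity: pass the request but raise an elevated flag.
        return ("pass", True, "credint_elevated_flag_standard")
    # Medium or low frequency: flag only, regardless of sensitivity.
    return ("pass", True, "credint_medium_low_flag")
```

Only the combination of a critical or high bucket with high sensitivity produces a block; every other hit passes with the audit flag set.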

Confidence Scoring

Each CredInt audit record includes a credint_confidence score:

Score	Meaning
`1.0`	Strong hit — direct corpus match
`0.5`	Partial or heuristic match
`0.0`	No match

Circuit Breaker Behavior

The CredInt client implements a circuit breaker to protect against CredInt microservice unavailability:

State	Condition	Behavior
Closed	Normal operation	Requests proceed to CredInt
Open	3 consecutive failures	All CredInt calls bypassed immediately
Auto-reset	60 seconds after opening	Circuit attempts to close on next request

The circuit breaker uses monotonic time (not wall clock) for the 60-second reset window, making it immune to system clock changes.
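
A minimal sketch of a breaker with these thresholds, built on `time.monotonic`. The class and method names are hypothetical and the actual client may structure this differently; the injectable `clock` parameter exists only to make the reset window testable.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = failure_threshold
        self._reset_seconds = reset_seconds
        self._clock = clock  # monotonic, so wall-clock changes do not affect it
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        """True if a CredInt call should be attempted."""
        if self._opened_at is None:
            return True  # closed: normal operation
        # Open: bypass calls until the reset window has elapsed, then allow a trial call.
        return self._clock() - self._opened_at >= self._reset_seconds

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None  # close the circuit

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self._threshold:
            self._opened_at = self._clock()  # open, or re-open after a failed trial
```

A caller would check `allow_request()` before each lookup and report the outcome with `record_success()` or `record_failure()`.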

Fail-Open Design

CredInt is designed to fail open. If the microservice is unreachable, times out, or returns an unexpected HTTP status, the platform:

Records credint_available: false in the audit log
Returns CredIntResult.unavailable() — the sentinel result that signals no check was performed
Does not block the AI request

This ensures CredInt microservice downtime never interrupts AI service delivery. The fail-open design is intentional — CredInt is a risk-flagging signal, not a hard gate.

credint_available: false in the audit log indicates that the check could not be completed. Monitor this field for CredInt microservice health.
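
The fail-open behavior can be sketched as a wrapper that converts any lookup failure into the unavailable sentinel. `CredIntResult.unavailable()` is named in this document, but the fields shown here and the `check_fail_open` helper are assumptions for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass(frozen=True)
class CredIntResult:
    available: bool                       # maps to credint_available in the audit log
    hit: bool = False
    frequency_bucket: Optional[str] = None

    @classmethod
    def unavailable(cls) -> "CredIntResult":
        """Sentinel meaning no check was performed."""
        return cls(available=False)


async def check_fail_open(lookup: Callable[[], Awaitable[CredIntResult]],
                          timeout_seconds: float = 3.0) -> CredIntResult:
    """Run a lookup; on timeout or any error, return the sentinel instead of raising."""
    try:
        return await asyncio.wait_for(lookup(), timeout=timeout_seconds)
    except Exception:
        # Covers timeouts, connection errors, and unexpected responses surfaced as
        # exceptions. The AI request is never blocked on this path.
        return CredIntResult.unavailable()
```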

Org Sensitivity Configuration

CredInt behavior scales with the org’s DLP sensitivity setting. The two sensitivity levels are:

# Standard (default) — flags critical/high hits but does not block
DLP_SENSITIVITY=standard

# High — blocks on critical/high frequency breach hits
DLP_SENSITIVITY=high

Configure via Admin → Organization Settings → DLP, or via the policy engine’s dlp_sensitivity parameter.

CredInt is disabled by default at the org level. Enable it explicitly:

# Via OrgDLPConfig (platform API)
curl -X PATCH https://your-platform/api/admin/org/dlp-config \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"credint_enabled": true}'

Audit Fields

Every request processed by CredInt (whether hit or miss) writes the following fields to the audit log:

Field	Type	Description
`credint_enabled`	`bool`	Whether CredInt was active for this request
`credint_hit`	`bool`	Whether a breach corpus match was found
`frequency_bucket`	`string`	Worst-case bucket: `critical`, `high`, `medium`, `low`, or `null`
`context_type`	`string`	L1 extraction context: `EXPLICIT_ASSIGNMENT`, `ENVIRONMENT_VARIABLE`, `AUTHORIZATION_HEADER`, `HIGH_ENTROPY_CODE`
`sha1_prefix`	`string`	First 8 hex characters of the matched SHA-1 hash (never full hash)
`credint_confidence`	`float`	Match confidence: 1.0, 0.5, or 0.0
`credint_available`	`bool`	Whether the CredInt microservice was reachable
`candidate_count`	`int`	Number of credential candidates extracted from the text

These fields are added as nullable columns in the audit_logs table (migration 041_credint_audit_fields). All of them are null when CredInt is disabled.

Performance

CredInt is designed to add sub-millisecond latency to the request path at the platform layer:

L1 extraction is synchronous CPU-bound work; typical runtime < 1 ms for prompt-sized inputs
CredInt microservice lookup is a single HTTP round-trip bounded by a 3-second timeout; while the microservice is failing, the circuit breaker skips the call entirely, so repeated timeouts do not accumulate
Fan-out: multiple credential candidates in a single request are checked in parallel via asyncio.gather
Non-blocking: CredInt runs as an asyncio.create_task; the AI request proceeds immediately, and CredInt completes asynchronously and writes to the audit log via callback

The on_complete callback in fire_credint_check() is used internally to write the audit fields once the CredInt result arrives, without blocking the streaming AI response.
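
The fan-out and fire-and-forget pattern described above can be sketched as follows. The signature of fire_credint_check() is not shown in this document, so `fire_check`, `check_all`, and their parameters are hypothetical stand-ins.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def check_all(candidates: Sequence[str],
                    check_one: Callable[[str], Awaitable[object]]) -> List[object]:
    """Check every candidate concurrently; results keep the input order."""
    return await asyncio.gather(*(check_one(c) for c in candidates))


def fire_check(candidates: Sequence[str],
               check_one: Callable[[str], Awaitable[object]],
               on_complete: Callable[[List[object]], None]) -> "asyncio.Task[None]":
    """Schedule the checks and return immediately.

    Must be called from within a running event loop. The caller should keep a
    reference to the returned task so it is not garbage-collected mid-flight.
    """
    async def _run() -> None:
        results = await check_all(candidates, check_one)
        on_complete(results)  # e.g. write the audit fields

    return asyncio.create_task(_run())
```

Because `fire_check` only schedules the task, the code that streams the AI response continues without awaiting it; the callback runs later, once all lookups have finished.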

Deployment

CredInt Microservice

The CredInt microservice is a separate deployment. It exposes:

POST /v1/check — accept {prefix: "5-char hex", candidates: [{sha1: "40-char hex"}]}, return frequency buckets
GET /health — liveness probe

In Kubernetes, deploy it as a sidecar or internal service and set CREDINT_SERVICE_URL on the platform deployment.

Air-Gap Deployments

In air-gap mode (OUTPOST_AIRGAP=true), the Outpost does not connect to the CredInt microservice. CredInt checks are skipped and credint_available: false is recorded. See Air-Gap Deployment for full details.

Monitoring

Monitor CredInt health via the audit log:

# Count credint unavailable events in the last hour (start time computed with GNU date)
FROM=$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)
curl "https://your-platform/api/admin/audit-logs?credint_available=false&from=${FROM}" \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq '.total'

Configure an alert on credint_available=false spikes to detect CredInt microservice outages — see Alert Configuration.

See Also

DLP Pipeline Configuration: full DLP rule pipeline, including CredInt integration
Compliance Frameworks: regulatory framework bundles
Alert Configuration: alerting on DLP and CredInt events
Air-Gap Deployment: offline operation without CredInt connectivity