
Credential Intelligence

Credential Intelligence (CredInt) is an optional DLP subsystem that detects known-compromised credentials in AI prompts and responses. When a user pastes a password, API key, or bearer token into an AI request, CredInt checks whether that credential has appeared in a known breach corpus and routes the request based on its breach frequency.

CredInt operates as a non-blocking side-channel: it never delays AI responses. Results arrive asynchronously and are written to the audit log.


L1 Extraction — Credential Candidate Detection

Before any network call is made, a synchronous L1 extractor (CredentialExtractor) scans the request text for credential candidates using four pattern families:

| Pattern family | Example matches |
|---|---|
| Explicit assignment | password=s3cr3t, api_key=abc123, secret=VALUE |
| Environment variable | DB_PASSWORD=mypassword, AWS_SECRET_ACCESS_KEY=... |
| Authorization header | Authorization: Bearer TOKEN, X-API-Key: VALUE |
| High-entropy token | Quoted or bare tokens with Shannon entropy > 3.5, 2+ character classes |

The extractor deduplicates candidates by value and sorts them by position in the text. Candidate cleartext is never logged at any point in the pipeline — only the SHA-1 prefix (first 8 hex chars) appears in audit records.
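
As a rough illustration of the deduplicate-then-sort behavior, the sketch below covers two of the four pattern families. The regular expressions, the `Candidate` type, and the `extract_candidates` function are assumptions made for this example; the actual patterns used by CredentialExtractor are not documented here.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only; not the real CredentialExtractor regexes.
PATTERNS = {
    "EXPLICIT_ASSIGNMENT": re.compile(
        r"\b(?:password|api_key|secret)\s*=\s*(?P<value>[^\s,;&]+)", re.IGNORECASE
    ),
    "AUTHORIZATION_HEADER": re.compile(
        r"\bAuthorization:\s*Bearer\s+(?P<value>\S+)", re.IGNORECASE
    ),
}


@dataclass(frozen=True)
class Candidate:
    value: str         # cleartext; kept in memory only, never logged
    context_type: str  # which pattern family matched
    start: int         # offset of the value in the scanned text


def extract_candidates(text: str) -> list[Candidate]:
    found: dict[str, Candidate] = {}
    for context_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            value = match.group("value")
            start = match.start("value")
            # Deduplicate by value, keeping the earliest occurrence.
            if value not in found or start < found[value].start:
                found[value] = Candidate(value, context_type, start)
    # Sort by position in the text.
    return sorted(found.values(), key=lambda c: c.start)
```

Keeping the earliest occurrence of a repeated value is one possible tie-breaking choice; the source does not say which occurrence the real extractor retains.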

High-entropy token detection uses:

  • Shannon entropy threshold: > 3.5 bits/character
  • Minimum character class requirement: 2+ classes (letters, digits, symbols)
  • False-positive filters: UUIDs, URLs, and common English words are excluded
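
The entropy and character-class checks can be sketched as follows. The function names are hypothetical, the definition of "symbol" as ASCII punctuation is an assumption, and the false-positive filters (UUIDs, URLs, English words) are deliberately left out.

```python
import math
import string
from collections import Counter

ENTROPY_THRESHOLD = 3.5  # bits per character, as stated above
MIN_CLASSES = 2          # minimum number of character classes, as stated above


def shannon_entropy(token: str) -> float:
    """Shannon entropy of the token's character distribution, in bits per character."""
    if not token:
        return 0.0
    n = len(token)
    return -sum((c / n) * math.log2(c / n) for c in Counter(token).values())


def character_classes(token: str) -> int:
    """Count how many of the classes (letters, digits, symbols) appear in the token."""
    has_letter = any(ch.isalpha() for ch in token)
    has_digit = any(ch.isdigit() for ch in token)
    has_symbol = any(ch in string.punctuation for ch in token)  # assumption: ASCII punctuation
    return has_letter + has_digit + has_symbol


def looks_high_entropy(token: str) -> bool:
    return (
        shannon_entropy(token) > ENTROPY_THRESHOLD
        and character_classes(token) >= MIN_CLASSES
    )
```

Note that per-character entropy is bounded by log2 of the token length, so a token needs at least 12 characters before it can exceed 3.5 bits/character under this definition.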

Breach Corpus — 861M+ Known Compromised Credentials

Arbitex integrates with a separate CredInt microservice (credint:8202) that maintains the breach corpus. The corpus contains SHA-1 hashes of over 861 million credentials drawn from known public breach datasets. The corpus is frequency-weighted: each hash carries a bucket label indicating how many times the credential has appeared across breaches.

The microservice is not part of the platform container image — it is a separately deployed service. Configuration:

CREDINT_SERVICE_URL=http://credint:8202 # default
CREDINT_SERVICE_TIMEOUT=3.0 # seconds

CredInt uses a k-anonymity protocol to protect credential privacy during lookup:

  1. The platform computes the SHA-1 hash of each credential candidate locally.
  2. Only the 5-character hex prefix (20 bits) is sent to the CredInt microservice in a POST /v1/check request.
  3. The microservice returns all corpus entries matching that prefix, along with their frequency buckets.
  4. The platform checks the full hash locally — the microservice never sees the complete hash or the original cleartext.

This follows the same k-anonymity model as Have I Been Pwned's range lookup. Because a 5-character prefix is shared by many unrelated hashes, even a compromised CredInt microservice cannot determine which credential is being checked.
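
The client side of this protocol can be sketched with the standard library alone. The helper names, the UTF-8 encoding of the candidate, and the shape of `prefix_entries` (a mapping from full SHA-1 hex digest to bucket label, as returned for one prefix) are assumptions for illustration; the actual response format is not specified here.

```python
import hashlib
from typing import Dict, Optional

PREFIX_LEN = 5  # hex characters sent to the microservice (20 bits)


def sha1_hex(credential: str) -> str:
    # Assumption: candidates are hashed as UTF-8 bytes with no normalization.
    return hashlib.sha1(credential.encode("utf-8")).hexdigest()


def lookup_prefix(credential: str) -> str:
    """The only value that leaves the platform."""
    return sha1_hex(credential)[:PREFIX_LEN]


def match_locally(credential: str, prefix_entries: Dict[str, str]) -> Optional[str]:
    """Compare the full hash against the entries returned for its prefix.

    Returns the frequency bucket on a match, or None if the credential
    is not among the returned corpus entries.
    """
    return prefix_entries.get(sha1_hex(credential))
```

The full hash is computed twice here for brevity; a real client would compute it once and reuse it for both the prefix and the local comparison.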

The SHA-1 prefix stored in the audit log is likewise truncated (the first 8 hex characters), never the full hash.

Each breach corpus entry carries a frequency bucket indicating how many times it has appeared across known breaches:

| Bucket | Frequency threshold | Risk level |
|---|---|---|
| critical | Very high frequency (widely circulated breach) | Highest risk |
| high | High frequency (appeared in multiple major breaches) | High risk |
| medium | Moderate frequency | Medium risk |
| low | Infrequent (appeared in limited breach data) | Low risk |

When a lookup returns hits across multiple frequency buckets (multiple candidates in the same request), CredInt selects the worst bucket (highest severity) to represent the overall request risk.
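
Worst-bucket selection amounts to taking a maximum over a severity ordering. In this minimal sketch the numeric ranks and the `worst_bucket` name are assumptions; only the ordering critical > high > medium > low comes from the text above.

```python
from typing import Iterable, Optional

# Assumed numeric ranks; only the relative order is taken from the documentation.
SEVERITY = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def worst_bucket(buckets: Iterable[Optional[str]]) -> Optional[str]:
    """Return the highest-severity bucket, ignoring candidates with no hit (None)."""
    hits = [b for b in buckets if b is not None]
    return max(hits, key=SEVERITY.__getitem__) if hits else None
```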


CredInt’s routing decision depends on two inputs: the frequency bucket and the org’s DLP sensitivity setting.

| Frequency bucket | DLP sensitivity | Action | Audit flag |
|---|---|---|---|
| critical or high | high | soft_block (request blocked) | true |
| critical or high | standard | pass (elevated flag only) | true |
| medium or low | any | pass (flag only) | true |
| No hit | any | pass (no flag) | false |

soft_block means the AI request is blocked before reaching the model. The user sees a standard block message. The audit record captures credint_action: "soft_block".

Routing path is recorded in the audit log as credint_routing_path, which takes one of four values:

| Path | Meaning |
|---|---|
| credint_no_hit | Credential not found in breach corpus |
| credint_soft_block_high_sensitivity | Critical/high breach hit + high sensitivity = block |
| credint_elevated_flag_standard | Critical/high breach hit + standard sensitivity = flag |
| credint_medium_low_flag | Medium/low breach hit = flag only |
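
The routing rules and path values above can be expressed as one small function. The `route` name and the tuple return shape are assumptions for this sketch; the actions and path strings are the ones listed in the tables.

```python
from typing import Optional, Tuple


def route(bucket: Optional[str], sensitivity: str) -> Tuple[str, str, bool]:
    """Return (action, credint_routing_path, audit_flag) for a worst-case bucket.

    `bucket` is None when no candidate matched the breach corpus.
    """
    if bucket is None:
        return "pass", "credint_no_hit", False
    if bucket in ("critical", "high"):
        if sensitivity == "high":
            return "soft_block", "credint_soft_block_high_sensitivity", True
        return "pass", "credint_elevated_flag_standard", True
    # medium or low: flag only, regardless of sensitivity
    return "pass", "credint_medium_low_flag", True
```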

Each CredInt audit record includes a credint_confidence score:

| Score | Meaning |
|---|---|
| 1.0 | Strong hit (direct corpus match) |
| 0.5 | Partial or heuristic match |
| 0.0 | No match |

The CredInt client implements a circuit breaker to protect against CredInt microservice unavailability:

| State | Condition | Behavior |
|---|---|---|
| Closed | Normal operation | Requests proceed to CredInt |
| Open | 3 consecutive failures | All CredInt calls bypassed immediately |
| Auto-reset | 60 seconds after opening | Circuit attempts to close on next request |

The circuit breaker uses monotonic time (not wall clock) for the 60-second reset window, making it immune to system clock changes.
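
A minimal circuit breaker with these parameters might look like the sketch below. The class and method names are hypothetical, and the injectable `clock` argument exists only to make the example testable; the thresholds (3 failures, 60 seconds) and the use of `time.monotonic` come from the description above.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after: float = 60.0,
        clock: Callable[[], float] = time.monotonic,  # monotonic, not wall clock
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        """True when closed, or when the reset window has elapsed (trial request)."""
        if self._opened_at is None:
            return True
        return self._clock() - self._opened_at >= self.reset_after

    def record_success(self) -> None:
        # A successful call closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            # Open (or re-open) the circuit and restart the reset window.
            self._opened_at = self._clock()
```

In this sketch a failed trial request after the reset window re-opens the circuit for another 60 seconds; whether the real client behaves the same way is not stated in the source.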

CredInt is designed fail-open: if the microservice is unreachable, times out, or returns an unexpected HTTP status, the platform:

  1. Records credint_available: false in the audit log
  2. Returns CredIntResult.unavailable() — the sentinel result that signals no check was performed
  3. Does not block the AI request

This ensures CredInt microservice downtime never interrupts AI service delivery. The fail-open design is intentional — CredInt is a risk-flagging signal, not a hard gate.

credint_available: false in the audit log indicates that the check could not be completed. Monitor this field for CredInt microservice health.
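
The fail-open behavior can be illustrated with a wrapper that converts any lookup failure into the unavailable sentinel. The `CredIntResult` class below is a simplified stand-in with assumed fields, not the platform's actual type, and `check_fail_open` is a hypothetical helper; the lookup coroutine is assumed to raise on connection errors and unexpected HTTP statuses.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass(frozen=True)
class CredIntResult:
    # Simplified stand-in; field names are assumptions for this sketch.
    available: bool
    hit: bool = False
    bucket: Optional[str] = None

    @classmethod
    def unavailable(cls) -> "CredIntResult":
        """Sentinel meaning no check was performed."""
        return cls(available=False)


async def check_fail_open(
    lookup: Callable[[], Awaitable[CredIntResult]],
    timeout: float = 3.0,
) -> CredIntResult:
    try:
        return await asyncio.wait_for(lookup(), timeout=timeout)
    except Exception:
        # Timeouts and lookup errors all fail open: the caller records
        # credint_available: false and lets the AI request proceed.
        return CredIntResult.unavailable()
```

Task cancellation (`asyncio.CancelledError`) is not caught by `except Exception` on Python 3.8 and later, so cancelling the surrounding task still propagates normally.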


CredInt behavior scales with the org’s DLP sensitivity setting. The two sensitivity levels are:

# Standard (default) — flags critical/high hits but does not block
DLP_SENSITIVITY=standard
# High — blocks on critical/high frequency breach hits
DLP_SENSITIVITY=high

Configure via Admin → Organization Settings → DLP, or via the policy engine’s dlp_sensitivity parameter.

CredInt is disabled by default at the org level. Enable it explicitly:

# Via OrgDLPConfig (platform API)
curl -X PATCH https://your-platform/api/admin/org/dlp-config \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"credint_enabled": true}'

Every request processed by CredInt (whether hit or miss) writes the following fields to the audit log:

| Field | Type | Description |
|---|---|---|
| credint_enabled | bool | Whether CredInt was active for this request |
| credint_hit | bool | Whether a breach corpus match was found |
| frequency_bucket | string | Worst-case bucket: critical, high, medium, low, or null |
| context_type | string | L1 extraction context: EXPLICIT_ASSIGNMENT, ENVIRONMENT_VARIABLE, AUTHORIZATION_HEADER, HIGH_ENTROPY_CODE |
| sha1_prefix | string | First 8 hex characters of the matched SHA-1 hash (never full hash) |
| credint_confidence | float | Match confidence: 1.0, 0.5, or 0.0 |
| credint_available | bool | Whether the CredInt microservice was reachable |
| candidate_count | int | Number of credential candidates extracted from the text |

These fields are added as nullable columns in the audit_logs table (migration 041_credint_audit_fields). All of these fields are null when CredInt is disabled.


CredInt is designed for sub-millisecond decision latency at the platform layer:

  • L1 extraction is synchronous CPU-bound work; typical runtime < 1 ms for prompt-sized inputs
  • CredInt microservice lookup is a single HTTP round-trip with a 3-second timeout; the circuit breaker ensures this never blocks indefinitely
  • Fan-out: multiple credential candidates in a single request are checked in parallel via asyncio.gather
  • Non-blocking: CredInt runs as an asyncio task (asyncio.create_task); the AI request proceeds immediately, while CredInt completes asynchronously and writes to the audit log via callback

The on_complete callback in fire_credint_check() is used internally to write audit fields after the CredInt result arrives without blocking the streaming AI response.
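
The fan-out and fire-and-forget pattern described above can be sketched as follows. The names `check_all` and `fire_check_sketch` and their signatures are assumptions for illustration; this is not the signature of fire_credint_check().

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Sequence


async def check_all(
    candidates: Sequence[str],
    check_one: Callable[[str], Awaitable[Any]],
) -> List[Any]:
    # Fan-out: check every candidate concurrently; results keep input order.
    return await asyncio.gather(*(check_one(c) for c in candidates))


def fire_check_sketch(
    candidates: Sequence[str],
    check_one: Callable[[str], Awaitable[Any]],
    on_complete: Callable[[List[Any]], None],
) -> asyncio.Task:
    """Schedule the checks without awaiting them; must be called from a running event loop."""

    async def runner() -> None:
        results = await check_all(candidates, check_one)
        on_complete(results)  # e.g. write the audit fields

    # Return the task so the caller can keep a reference to it; the event
    # loop holds only a weak reference to scheduled tasks.
    return asyncio.create_task(runner())
```

The caller returns to serving the AI response as soon as the task is scheduled; `on_complete` runs later on the same event loop once all lookups finish.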


The CredInt microservice is a separate deployment. It exposes:

  • POST /v1/check — accept {prefix: "5-char hex", candidates: [{sha1: "40-char hex"}]}, return frequency buckets
  • GET /health — liveness probe

In Kubernetes, deploy it as a sidecar or internal service and set CREDINT_SERVICE_URL on the platform deployment.

In air-gap mode (OUTPOST_AIRGAP=true), the Outpost does not connect to the CredInt microservice. CredInt checks are skipped and credint_available: false is recorded. See Air-Gap Deployment for full details.


Monitor CredInt health via the audit log:

# Count credint unavailable events since the "from" timestamp (set it to one hour ago for a last-hour count)
curl "https://your-platform/api/admin/audit-logs?credint_available=false&from=2026-03-13T00:00:00Z" \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq '.total'

Configure an alert on credint_available=false spikes to detect CredInt microservice outages — see Alert Configuration.