ADR-006: Bloom Filter + k-Anonymity for Credential Scanning

Status: Accepted
Date: 2026-03
Deciders: Outpost team (outpost-0021, T77-T80)


Arbitex Credential Intelligence (CredInt) scans AI request and response content for known-compromised credentials — API keys, passwords, and secrets that have appeared in public breaches. This is Tier 4 of the four-tier DLP pipeline.

The core challenge is matching content against a large corpus of known-compromised credentials (hundreds of millions of entries) efficiently on an embedded outpost with limited memory. Candidate credentials must not be sent to an external service for exact-match lookup, since that would itself be a data exfiltration risk.

Three approaches were evaluated:

  1. Exact hash lookup (SHA-256) against a local database: Requires gigabytes of storage per outpost; too large for embedded deployments.
  2. Direct API lookup against a breach service (e.g. Have I Been Pwned): Sends full hashes or raw values to a third party on every scan; unacceptable for air-gapped and privacy-sensitive deployments.
  3. Probabilistic bloom filter (local) + k-anonymity partial-hash API (optional): Local bloom filter provides fast, space-efficient membership testing with a controlled false positive rate. k-anonymity provides exact confirmation for bloom filter hits without exposing the full credential.

Tier 4 CredInt uses a two-stage approach:

Stage 1 — Local bloom filter:

  • A compact binary bloom filter file (.bf) is distributed by Arbitex and mounted on the outpost (CREDINT_BLOOM_PATH).
  • The filter encodes HMAC digests of known-compromised credentials.
  • At scan time, the candidate credential is hashed the same way and tested for membership in the filter. False positives are possible (configurable FPR, ~0.1% at the distributed filter size); false negatives are not, so a credential that is in the filter is never missed.
  • No network request is made for the bloom filter check.
  • In air-gapped deployments, the filter is updated by manual file replacement. In connected deployments, CDN-based automatic refresh is available (CREDINT_CDN_URL).
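
The Stage 1 check can be illustrated with a minimal in-memory sketch. It does not read the Arbitex .bf file format, which is not described here; the class name, the HMAC key handling, the choice of HMAC-SHA256, and the double-hashing scheme are all assumptions made for illustration.

```python
import hashlib
import hmac
import math


class BloomFilter:
    """Minimal in-memory bloom filter; NOT the distributed .bf file format."""

    def __init__(self, capacity: int, fpr: float, key: bytes) -> None:
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(8, math.ceil(-capacity * math.log(fpr) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)
        self.key = key  # hypothetical key; the source does not say how it is managed

    def _positions(self, credential: str):
        # HMAC the candidate, then derive k bit positions by double hashing.
        digest = hmac.new(self.key, credential.encode("utf-8"), hashlib.sha256).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd, non-zero stride
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, credential: str) -> None:
        for pos in self._positions(credential):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, credential: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(credential)
        )
```

Every added entry sets all of its bit positions, which is why a lookup can return a false positive (bits set by other entries) but never a false negative.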

Stage 2 — k-Anonymity partial-hash API (optional):

  • When CREDINT_KANON_ENABLED=true, bloom filter hits are escalated to a k-anonymity API (default: Have I Been Pwned range endpoint at api.pwnedpasswords.com/range).
  • The first 5 hexadecimal characters of the credential's SHA-1 hash are sent to the API. The API returns the suffixes of all known hashes that share that prefix; the outpost compares its own hash suffix against that list locally.
  • Neither the plaintext credential nor the full hash is ever sent to the API; only the 5-character prefix leaves the outpost.
  • k-Anonymity confirmation eliminates bloom filter false positives, converting “possible hit” to “confirmed hit” or “false positive”.
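
The Stage 2 exchange can be sketched as follows, assuming the Have I Been Pwned range format of one SUFFIX:COUNT entry per line. The function names, the User-Agent string, and the timeout are hypothetical; a custom endpoint may use a different response format.

```python
import hashlib
import urllib.request


def split_sha1(credential: str) -> tuple[str, str]:
    """Return (5-char prefix, 35-char suffix) of the uppercase SHA-1 hex digest."""
    digest = hashlib.sha1(credential.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def suffix_in_range(body: str, suffix: str) -> bool:
    """Check a range response body for our suffix; the comparison is local."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        # A count of 0 is treated as a non-match (assumed to be filler, not a breach hit).
        if candidate.upper() == suffix and count.strip() != "0":
            return True
    return False


def confirm_hit(credential: str, base_url: str = "https://api.pwnedpasswords.com/range") -> bool:
    """Send only the 5-character prefix; never the credential or the full hash."""
    prefix, suffix = split_sha1(credential)
    request = urllib.request.Request(
        f"{base_url}/{prefix}", headers={"User-Agent": "credint-sketch"}
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        body = response.read().decode("utf-8")
    return suffix_in_range(body, suffix)
```

Only `confirm_hit` touches the network; `split_sha1` and `suffix_in_range` are pure functions and can be tested offline.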

Positive:

  • Bloom filter check runs in constant time per candidate (a fixed number of hash probes) and is sub-millisecond on embedded hardware, so it adds negligible latency to request scanning.
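  • Filter file is compact relative to an exact-hash database: a standard bloom filter needs about 14.4 bits per entry at 0.1% FPR (roughly 180 MB per 100 million entries), versus 3.2 GB per 100 million raw SHA-256 digests.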
  • Air-gap compatible: bloom filter check requires no network access.
  • k-Anonymity API never sees full credentials — privacy-preserving by design.
  • FPR is bounded and disclosed in filter metadata; operators can tune acceptable false positive rates.
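
The trade-off between FPR and filter size follows the textbook bloom filter sizing formulas. The sketch below assumes an unoptimized standard filter; the distributed .bf format may differ, so the entry count and FPR in the filter metadata remain the authoritative figures. Function names are hypothetical.

```python
import math


def bloom_bits_per_entry(fpr: float) -> float:
    """Bits per entry for an optimally sized filter: -ln(p) / (ln 2)^2."""
    return -math.log(fpr) / (math.log(2) ** 2)


def bloom_size_bytes(entries: int, fpr: float) -> int:
    """Total filter size in bytes for a given entry count and target FPR."""
    return math.ceil(entries * bloom_bits_per_entry(fpr) / 8)


def optimal_hash_count(fpr: float) -> int:
    """Number of hash probes that minimizes the FPR: -log2(p), rounded."""
    return max(1, round(-math.log2(fpr)))


if __name__ == "__main__":
    for fpr in (0.01, 0.001, 0.0001):
        size_mb = bloom_size_bytes(100_000_000, fpr) / 1_000_000
        print(f"FPR {fpr}: {bloom_bits_per_entry(fpr):.1f} bits/entry, "
              f"{size_mb:.0f} MB per 100M entries, k={optimal_hash_count(fpr)}")
```

Each tenfold reduction in FPR costs about 4.8 additional bits per entry, so tightening the rate grows the file linearly rather than exponentially.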

Negative / trade-offs:

  • False positives exist in the bloom filter. Without k-anonymity confirmation enabled, a small fraction of non-compromised credentials (about 0.1% at the distributed FPR) will be flagged. Production deployments with strict accuracy requirements should set CREDINT_KANON_ENABLED=true.
  • The bloom filter requires periodic refresh to include newly discovered compromised credentials. Stale filters miss recent breaches.
  • k-Anonymity API availability (Have I Been Pwned or custom endpoint) introduces a network dependency for confirmed hit detection when enabled.
  • Air-gapped deployments cannot use k-anonymity confirmation or CDN refresh — manual operational procedures are required.
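
How the two stages and these trade-offs combine into a verdict can be sketched as below. The `Verdict` names and the `classify` function are hypothetical, and the fallback to "possible hit" when the API is unreachable is an assumption; the source does not specify that behavior.

```python
from enum import Enum
from typing import Callable, Optional


class Verdict(Enum):
    CLEAN = "clean"
    POSSIBLE_HIT = "possible_hit"      # bloom hit, not confirmed
    CONFIRMED_HIT = "confirmed_hit"    # bloom hit, confirmed via k-anonymity
    FALSE_POSITIVE = "false_positive"  # bloom hit, rejected via k-anonymity


def classify(
    credential: str,
    bloom_check: Callable[[str], bool],
    kanon_confirm: Optional[Callable[[str], bool]] = None,
) -> Verdict:
    # Stage 1: local and offline. A miss is definitive (no false negatives).
    if not bloom_check(credential):
        return Verdict.CLEAN
    # Stage 2 disabled (for example, air-gapped): the hit stays unconfirmed.
    if kanon_confirm is None:
        return Verdict.POSSIBLE_HIT
    try:
        confirmed = kanon_confirm(credential)
    except OSError:
        # Assumption: on network failure, keep flagging rather than suppress.
        return Verdict.POSSIBLE_HIT
    return Verdict.CONFIRMED_HIT if confirmed else Verdict.FALSE_POSITIVE
```

Passing `kanon_confirm=None` models both CREDINT_KANON_ENABLED being unset and an air-gapped outpost, where every bloom hit remains a possible hit.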

Configuration:

  • CREDINT_BLOOM_PATH — path to the bloom filter binary.
  • CREDINT_CDN_URL — CDN URL for automatic filter refresh (empty = air-gap mode).
  • CREDINT_KANON_ENABLED — enable k-anonymity confirmation stage.
  • CREDINT_KANON_URL — k-anonymity API base URL (default: https://api.pwnedpasswords.com/range).
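
As an illustration of how these variables might be read, the following sketch loads them into a typed structure. The `CredIntConfig` class and `load_config` function are hypothetical, and treating an unset CREDINT_KANON_ENABLED as false is an assumption based on the stage being described as optional.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class CredIntConfig:
    bloom_path: str
    cdn_url: str
    kanon_enabled: bool
    kanon_url: str

    @property
    def air_gapped(self) -> bool:
        # An empty CDN URL means no automatic refresh (air-gap mode).
        return self.cdn_url == ""


def load_config(env: Mapping[str, str] = os.environ) -> CredIntConfig:
    return CredIntConfig(
        bloom_path=env.get("CREDINT_BLOOM_PATH", ""),
        cdn_url=env.get("CREDINT_CDN_URL", "").strip(),
        # Assumption: only the literal value "true" (any case) enables Stage 2.
        kanon_enabled=env.get("CREDINT_KANON_ENABLED", "false").strip().lower() == "true",
        kanon_url=env.get(
            "CREDINT_KANON_URL", "https://api.pwnedpasswords.com/range"
        ).rstrip("/"),
    )
```

Accepting any mapping makes the loader testable without modifying the process environment.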