Data Retention & Archival

This guide covers data lifecycle management for the Arbitex AI Gateway: tiered audit log retention, the PostgreSQL table partitioning strategy, Azure Blob immutable storage for cold-tier archival, Redis cache TTL configuration, log rotation, and GDPR-compliant data deletion procedures.

Arbitex stores operational and compliance data across three tiers:

| Tier | Storage | Latency | Retention | Use Case |
| --- | --- | --- | --- | --- |
| Hot | PostgreSQL (primary) | < 10 ms | 0–30 days | Live query, real-time audit search |
| Warm | PostgreSQL (archive schema) | < 100 ms | 31–90 days | Compliance review, incident investigation |
| Cold | Azure Blob (WORM) | Minutes | > 90 days | Legal hold, long-term compliance |

Data transitions between tiers automatically via scheduled PostgreSQL jobs and the arbitex-archiver cron job.

Default retention windows, measured from record creation:

Hot tier: 30 days (configurable: AUDIT_HOT_RETENTION_DAYS)
Warm tier: 90 days, i.e. days 31–90 (configurable: AUDIT_WARM_RETENTION_DAYS)
Cold tier: 7 years (configurable: AUDIT_COLD_RETENTION_YEARS)
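Interpreted as ages measured from record creation, these windows imply a simple tier lookup. An illustrative helper (not part of the product), assuming the defaults above:

```python
from datetime import datetime, timedelta, timezone

def tier_for(created_at: datetime,
             now: datetime,
             hot_days: int = 30,
             warm_days: int = 90) -> str:
    """Return the storage tier a record belongs to, based on its age.

    Illustrative sketch: hot through day `hot_days`, warm through day
    `warm_days`, cold afterwards.
    """
    age = now - created_at
    if age <= timedelta(days=hot_days):
        return "hot"
    if age <= timedelta(days=warm_days):
        return "warm"
    return "cold"
```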

Configuring Retention via Environment Variables


Set in your platform deployment:

```sh
# Platform environment (docker-compose or Kubernetes Secret)
AUDIT_HOT_RETENTION_DAYS=30
AUDIT_WARM_RETENTION_DAYS=90
AUDIT_COLD_RETENTION_YEARS=7
AUDIT_ARCHIVER_SCHEDULE="0 2 * * *"   # 2 AM UTC daily
COLD_STORAGE_PROVIDER=azure_blob      # azure_blob | s3 | gcs
COLD_STORAGE_CONTAINER=arbitex-audit-cold
COLD_STORAGE_ACCOUNT=yourauditstorage
```

For Kubernetes, store cold-storage credentials in a Secret:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: arbitex-cold-storage
  namespace: arbitex
type: Opaque
stringData:
  AZURE_STORAGE_ACCOUNT: "yourauditstorage"
  AZURE_STORAGE_KEY: "your-storage-key"
  AZURE_STORAGE_CONTAINER: "arbitex-audit-cold"
```

Query and update retention policies via the admin API:

```sh
# Get current retention policy
curl -H "Authorization: Bearer $ADMIN_TOKEN" \
  https://api.arbitex.example.com/api/admin/retention/policy
```

Response:

```json
{
  "hot_retention_days": 30,
  "warm_retention_days": 90,
  "cold_retention_years": 7,
  "archiver_schedule": "0 2 * * *",
  "last_archival_run": "2026-03-12T02:00:00Z",
  "last_archival_status": "success",
  "rows_archived_last_run": 48293
}
```

```sh
# Update retention policy
curl -X PUT \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "hot_retention_days": 45,
    "warm_retention_days": 90
  }' \
  https://api.arbitex.example.com/api/admin/retention/policy
```
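Before issuing the PUT, the retention windows can be sanity-checked client-side; since warm retention is measured from record creation, it must be at least the hot window. A hypothetical helper (not part of the product API) to build the request body:

```python
def build_retention_update(hot_days: int, warm_days: int) -> dict:
    """Validate and build a PUT body for the retention policy endpoint.

    Hypothetical client-side check: warm retention counts from record
    creation, so it cannot be shorter than the hot window.
    """
    if hot_days < 1:
        raise ValueError("hot_retention_days must be >= 1")
    if warm_days < hot_days:
        raise ValueError("warm_retention_days must be >= hot_retention_days")
    return {"hot_retention_days": hot_days, "warm_retention_days": warm_days}
```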

The audit_logs table uses range partitioning on created_at to enable efficient data lifecycle management and partition pruning.

```sql
-- Master audit_logs table (partitioned)
CREATE TABLE audit_logs (
    id              UUID NOT NULL DEFAULT gen_random_uuid(),
    created_at      TIMESTAMPTZ NOT NULL,
    org_id          UUID NOT NULL,
    user_id         UUID,
    action          TEXT NOT NULL,
    resource_type   TEXT NOT NULL,
    resource_id     TEXT,
    request_id      UUID,
    ip_address      INET,
    user_agent      TEXT,
    details         JSONB,
    dlp_triggered   BOOLEAN DEFAULT FALSE,
    severity        TEXT DEFAULT 'info',
    hmac_signature  TEXT,
    PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

-- Monthly partitions (auto-created by arbitex-archiver)
CREATE TABLE audit_logs_2026_01
    PARTITION OF audit_logs
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

CREATE TABLE audit_logs_2026_02
    PARTITION OF audit_logs
    FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');

-- Indexes created per partition
CREATE INDEX idx_audit_logs_2026_01_org_id
    ON audit_logs_2026_01 (org_id, created_at DESC);
CREATE INDEX idx_audit_logs_2026_01_user_id
    ON audit_logs_2026_01 (user_id, created_at DESC);
```

The arbitex-archiver job creates next-month partitions on the 25th of each month:

```python
# Partition creation (executed by archiver job)
from datetime import date

from dateutil.relativedelta import relativedelta

def create_next_partition(conn, target_month: date):
    partition_name = f"audit_logs_{target_month.strftime('%Y_%m')}"
    start = target_month.replace(day=1)
    end = start + relativedelta(months=1)
    conn.execute(f"""
        CREATE TABLE IF NOT EXISTS {partition_name}
            PARTITION OF audit_logs
            FOR VALUES FROM ('{start}') TO ('{end}');
        CREATE INDEX IF NOT EXISTS idx_{partition_name}_org_id
            ON {partition_name} (org_id, created_at DESC);
        CREATE INDEX IF NOT EXISTS idx_{partition_name}_user_id
            ON {partition_name} (user_id, created_at DESC);
    """)
```

Data older than AUDIT_HOT_RETENTION_DAYS moves to the archive schema:

```sql
-- Archive schema uses identical partition structure
CREATE TABLE archive.audit_logs (
    LIKE public.audit_logs INCLUDING ALL
) PARTITION BY RANGE (created_at);

-- Move hot → warm (run nightly)
WITH moved AS (
    DELETE FROM public.audit_logs
    WHERE created_at < NOW() - INTERVAL '30 days'
    RETURNING *
)
INSERT INTO archive.audit_logs SELECT * FROM moved;

-- Check partition sizes
SELECT
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size,
    pg_size_pretty(pg_relation_size(schemaname || '.' || tablename)) AS data_size
FROM pg_tables
WHERE tablename LIKE 'audit_logs_%'
ORDER BY tablename;

-- Check row counts per partition
SELECT
    child.relname AS partition,
    pg_stat_user_tables.n_live_tup AS live_rows
FROM pg_inherits
JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
JOIN pg_class child ON pg_inherits.inhrelid = child.oid
JOIN pg_stat_user_tables ON pg_stat_user_tables.relname = child.relname
WHERE parent.relname = 'audit_logs'
ORDER BY child.relname;

-- Vacuum a specific partition after large deletes
VACUUM ANALYZE public.audit_logs_2026_01;

-- Drop a partition (after cold archival is confirmed)
DROP TABLE public.audit_logs_2026_01;

-- Detach a partition for offline processing without dropping
ALTER TABLE public.audit_logs
    DETACH PARTITION public.audit_logs_2026_01;
```
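The drop step is only safe once a month's cold archive has been verified. A sketch of selecting droppable partitions by name, assuming the audit_logs_YYYY_MM naming scheme (hypothetical helper, not part of the archiver):

```python
from datetime import date

def droppable_partitions(partitions: list[str], cold_cutoff: date) -> list[str]:
    """Partitions whose entire month ends on or before the cold cutoff.

    Drop these only after the cold-tier manifest for that month has been
    verified. Names are assumed to follow audit_logs_YYYY_MM.
    """
    out = []
    for name in partitions:
        year, month = int(name[-7:-3]), int(name[-2:])
        # First day of the following month is the partition's exclusive
        # upper bound, so it must not exceed the cutoff.
        upper = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        if upper <= cold_cutoff:
            out.append(name)
    return out
```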

Data older than 90 days is exported to Azure Blob Storage with immutability policies (WORM — Write Once, Read Many) for compliance and legal hold purposes.

1. Create the storage account with immutable storage:

```sh
# Create resource group
az group create --name arbitex-compliance --location eastus

# Create storage account with versioning enabled
az storage account create \
  --name yourauditstorage \
  --resource-group arbitex-compliance \
  --location eastus \
  --sku Standard_GRS \
  --kind StorageV2 \
  --enable-hierarchical-namespace false \
  --allow-blob-public-access false \
  --min-tls-version TLS1_2

# Create the audit container
az storage container create \
  --name arbitex-audit-cold \
  --account-name yourauditstorage \
  --auth-mode login

# Enable versioning and soft delete on the account
az storage account blob-service-properties update \
  --account-name yourauditstorage \
  --enable-versioning true \
  --enable-delete-retention true \
  --delete-retention-days 90
```

2. Set container-level immutability policy:

```sh
# Create time-based retention policy (7 years = 2557 days)
az storage container immutability-policy create \
  --account-name yourauditstorage \
  --container-name arbitex-audit-cold \
  --period 2557

# Lock the policy (irreversible — prevents deletion)
# Only do this after testing; locked policies cannot be shortened
az storage container immutability-policy lock \
  --account-name yourauditstorage \
  --container-name arbitex-audit-cold \
  --if-match $(az storage container immutability-policy show \
    --account-name yourauditstorage \
    --container-name arbitex-audit-cold \
    --query etag -o tsv)
```
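The 2557-day figure is 7 × 365 plus the two leap days that fall in a typical 7-year window (2028 and 2032 for a 2026 start). A quick sanity check:

```python
from datetime import date

def retention_days(start: date, years: int) -> int:
    """Days in a retention period of `years` calendar years from `start`.

    Naive sketch: ignores Feb 29 start dates, for which the anniversary
    date does not exist.
    """
    end = date(start.year + years, start.month, start.day)
    return (end - start).days
```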

Each cold-tier archive file is an NDJSON (newline-delimited JSON) export, gzip-compressed, accompanied by an HMAC-SHA256-signed manifest:

```text
arbitex-audit-cold/
├── 2026/
│   ├── 01/
│   │   ├── audit_logs_2026_01_001.ndjson.gz
│   │   ├── audit_logs_2026_01_002.ndjson.gz
│   │   └── MANIFEST.json
│   └── 02/
│       ├── audit_logs_2026_02_001.ndjson.gz
│       └── MANIFEST.json
```

Manifest format:

```json
{
  "period": "2026-01",
  "exported_at": "2026-02-01T02:15:33Z",
  "total_rows": 1294847,
  "files": [
    {
      "filename": "audit_logs_2026_01_001.ndjson.gz",
      "sha256": "e3b0c44298fc1c149afb...",
      "rows": 500000,
      "size_bytes": 104857600
    }
  ],
  "schema_version": "2",
  "hmac_signature": "sha256=abc123..."
}
```
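For illustration, here is how such a manifest could be signed and checked end to end. The canonical form (sorted-key JSON over every field except hmac_signature) is an assumption, chosen to be consistent with the verification code later in this guide:

```python
import hashlib
import hmac
import json

def _canonical(manifest: dict) -> bytes:
    # Sorted-key JSON over all fields except the signature itself
    return json.dumps(
        {k: v for k, v in manifest.items() if k != "hmac_signature"},
        sort_keys=True,
    ).encode()

def sign_manifest(manifest: dict, hmac_key: bytes) -> dict:
    """Return a copy of the manifest with hmac_signature attached."""
    sig = "sha256=" + hmac.new(hmac_key, _canonical(manifest), hashlib.sha256).hexdigest()
    return {**manifest, "hmac_signature": sig}

def verify_manifest(manifest: dict, hmac_key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = "sha256=" + hmac.new(hmac_key, _canonical(manifest), hashlib.sha256).hexdigest()
    return hmac.compare_digest(manifest["hmac_signature"], expected)
```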
```sh
# Trigger an immediate cold archival run
curl -X POST \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  https://api.arbitex.example.com/api/admin/retention/archive-now

# Check archival status
curl -H "Authorization: Bearer $ADMIN_TOKEN" \
  https://api.arbitex.example.com/api/admin/retention/archive-status
```

Response:

```json
{
  "status": "running",
  "started_at": "2026-03-12T10:00:00Z",
  "rows_processed": 12500,
  "current_partition": "audit_logs_2025_12",
  "estimated_completion": "2026-03-12T10:45:00Z"
}
```
```python
import hashlib
import hmac
import json

from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

def verify_archive_manifest(account_url: str, container: str, period: str, hmac_key: bytes):
    # Private containers need a credential; DefaultAzureCredential covers
    # managed identity, environment variables, and CLI login
    client = BlobServiceClient(account_url=account_url, credential=DefaultAzureCredential())
    container_client = client.get_container_client(container)
    year, month = period.split("-")
    manifest_data = container_client.download_blob(f"{year}/{month}/MANIFEST.json").readall()
    manifest = json.loads(manifest_data)

    # Verify HMAC signature over sorted-key JSON, excluding the signature field
    payload = json.dumps(
        {k: v for k, v in manifest.items() if k != "hmac_signature"},
        sort_keys=True,
    )
    expected_sig = "sha256=" + hmac.new(hmac_key, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(manifest["hmac_signature"], expected_sig):
        raise ValueError("Manifest HMAC signature mismatch — archive integrity compromised")

    # Verify each file's checksum (computed over the compressed blob)
    for file_entry in manifest["files"]:
        blob_data = container_client.download_blob(f"{year}/{month}/{file_entry['filename']}").readall()
        if hashlib.sha256(blob_data).hexdigest() != file_entry["sha256"]:
            raise ValueError(f"Checksum mismatch for {file_entry['filename']}")

    print(f"Archive {period}: integrity verified, "
          f"{manifest['total_rows']} rows across {len(manifest['files'])} files")
```

Arbitex uses Redis for session storage, rate-limit counters, DLP cache, and MFA state. Default TTLs are tunable via environment variables.

| Cache Key Pattern | Default TTL | Environment Variable | Description |
| --- | --- | --- | --- |
| `session:{token}` | 24h | `SESSION_TTL_SECONDS=86400` | User session tokens |
| `ratelimit:{key}` | 60s | `RATELIMIT_WINDOW_SECONDS=60` | Rate limit windows |
| `dlp_cache:{hash}` | 5m | `DLP_CACHE_TTL_SECONDS=300` | DLP result cache (deduplication) |
| `mfa_challenge:{id}` | 10m | `MFA_CHALLENGE_TTL_SECONDS=600` | Pending MFA challenges |
| `mfa_verified:{token}` | 1h | `MFA_VERIFIED_TTL_SECONDS=3600` | MFA step-up assertion |
| `budget_cache:{group}` | 5m | `BUDGET_CACHE_TTL_SECONDS=300` | Budget counter cache |
| `provider_health:{id}` | 30s | `PROVIDER_HEALTH_TTL_SECONDS=30` | Provider health cache |
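Each TTL resolves from its environment variable, falling back to the default in the table. A minimal resolver sketch (illustrative, not the product's actual config loader):

```python
TTL_DEFAULTS = {
    "SESSION_TTL_SECONDS": 86400,
    "RATELIMIT_WINDOW_SECONDS": 60,
    "DLP_CACHE_TTL_SECONDS": 300,
    "MFA_CHALLENGE_TTL_SECONDS": 600,
    "MFA_VERIFIED_TTL_SECONDS": 3600,
    "BUDGET_CACHE_TTL_SECONDS": 300,
    "PROVIDER_HEALTH_TTL_SECONDS": 30,
}

def resolve_ttls(env: dict) -> dict:
    """Merge environment overrides (string values) onto the defaults."""
    return {k: int(env.get(k, default)) for k, default in TTL_DEFAULTS.items()}
```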
```sh
# In .env or Kubernetes ConfigMap
SESSION_TTL_SECONDS=86400
DLP_CACHE_TTL_SECONDS=300
MFA_CHALLENGE_TTL_SECONDS=600
MFA_VERIFIED_TTL_SECONDS=3600
BUDGET_CACHE_TTL_SECONDS=300
```
```sh
# Monitor key expiry and memory usage
redis-cli INFO keyspace
redis-cli INFO memory

# List TTLs for active sessions (admin debugging)
redis-cli --scan --pattern "session:*" | head -20 | xargs -I{} redis-cli TTL {}

# Flush DLP cache if stale entries suspected
redis-cli --scan --pattern "dlp_cache:*" | xargs redis-cli DEL
```

Configure Redis maxmemory policy to evict volatile keys when memory pressure occurs:

```conf
# redis.conf
maxmemory 2gb
maxmemory-policy volatile-lru   # Evict only keys with TTLs set, LRU order
```

Platform containers write structured JSON logs to /var/log/arbitex/. Configure logrotate for VM-based deployments:

```conf
# /etc/logrotate.d/arbitex
/var/log/arbitex/*.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
    postrotate
        # Signal platform to reopen log files
        kill -USR1 $(cat /var/run/arbitex/platform.pid) 2>/dev/null || true
    endscript
}
```

For Kubernetes deployments, use a log aggregation sidecar (Fluent Bit or Vector) instead of logrotate:

```yaml
# values.yaml — enable log aggregation sidecar
platform:
  logAggregation:
    enabled: true
    sidecar: fluent-bit   # fluent-bit | vector | fluentd
    destination: loki     # loki | elasticsearch | splunk
    lokiEndpoint: http://loki:3100
```
On VM deployments, rotate the nginx access and error logs as well:

```conf
# /etc/logrotate.d/nginx
/var/log/nginx/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    sharedscripts
    postrotate
        nginx -s reopen 2>/dev/null || true
    endscript
}
```

For Kubernetes, configure pod log rotation at the kubelet level:

```yaml
# kubelet configuration
containerLogMaxSize: "100Mi"
containerLogMaxFiles: 5
```

When a user invokes their right to erasure, the platform must delete or anonymize all personal data. The deletion procedure is coordinated through the admin API.

```sh
# Submit a GDPR deletion request for a user
curl -X POST \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "usr_01HXYZ...",
    "reason": "gdpr_erasure",
    "requested_by": "privacy@example.com",
    "reference": "GDPR-2026-042"
  }' \
  https://api.arbitex.example.com/api/admin/gdpr/deletion-requests
```

Response:

```json
{
  "request_id": "gdpr_del_01HXYZ...",
  "user_id": "usr_01HXYZ...",
  "status": "queued",
  "estimated_completion": "2026-03-13T02:00:00Z",
  "scope": {
    "audit_logs": "anonymized",
    "sessions": "deleted",
    "api_keys": "deleted",
    "preferences": "deleted",
    "cold_tier": "legal_hold_check"
  }
}
```
| Data Type | Hot Tier Action | Warm Tier Action | Cold Tier Action |
| --- | --- | --- | --- |
| User sessions | Delete immediately | N/A | N/A |
| API keys | Delete + revoke | N/A | N/A |
| User preferences | Delete | Delete | Anonymize |
| Audit log entries | Anonymize (pseudonymize) | Anonymize | Legal hold check |
| DLP scan results | Delete | Anonymize | Anonymize |
| Usage statistics | Aggregate (remove user ref) | Aggregate | Retain (no PII) |

Audit log anonymization replaces PII fields with anonymized tokens:

```json
{
  "user_id": "GDPR_DELETED_01HXYZ",
  "ip_address": "0.0.0.0",
  "user_agent": "GDPR_DELETED",
  "details": {"gdpr_anonymized": true, "original_user_hash": "sha256:abc123..."}
}
```
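A sketch of this anonymization step. Field names follow the example above; the salted-hash scheme and the exact replacement tokens are assumptions, not the product's implementation:

```python
import hashlib

def anonymize_audit_row(row: dict, salt: bytes = b"") -> dict:
    """Replace PII fields with anonymized tokens, keeping a salted hash of
    the original user_id so rows from the same (deleted) user remain
    correlatable without identifying them.
    """
    user_hash = hashlib.sha256(salt + str(row.get("user_id", "")).encode()).hexdigest()
    out = dict(row)  # leave non-PII fields (action, resource, timestamps) intact
    out["user_id"] = "GDPR_DELETED"
    out["ip_address"] = "0.0.0.0"
    out["user_agent"] = "GDPR_DELETED"
    out["details"] = {"gdpr_anonymized": True, "original_user_hash": f"sha256:{user_hash}"}
    return out
```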
```sh
# Poll deletion request status
curl -H "Authorization: Bearer $ADMIN_TOKEN" \
  https://api.arbitex.example.com/api/admin/gdpr/deletion-requests/gdpr_del_01HXYZ
```

Completed response:

```json
{
  "request_id": "gdpr_del_01HXYZ...",
  "status": "completed",
  "completed_at": "2026-03-13T02:15:00Z",
  "actions_taken": {
    "sessions_deleted": 3,
    "api_keys_revoked": 2,
    "audit_log_rows_anonymized": 14829,
    "cold_tier_status": "legal_hold_exempt_archived"
  }
}
```

Cold-tier data under legal hold is exempt from erasure. Declare a hold before processing erasure requests:

```sh
# Place a legal hold on a user's cold-tier data
curl -X POST \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "usr_01HXYZ...",
    "reason": "litigation_hold",
    "reference": "LEGAL-2026-007",
    "expires": null
  }' \
  https://api.arbitex.example.com/api/admin/gdpr/legal-holds
```

To export all data for a user (GDPR Article 15):

```sh
# Request a DSAR export
curl -X POST \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"user_id": "usr_01HXYZ...", "format": "json"}' \
  https://api.arbitex.example.com/api/admin/gdpr/dsar-exports
```

When ready, the response includes a signed download URL (expires in 24h):

```json
{
  "export_id": "dsar_01HXYZ...",
  "status": "completed",
  "download_url": "https://...",
  "expires_at": "2026-03-13T10:00:00Z"
}
```
Recommended recurring maintenance checks:

  • Confirm the archival job ran successfully (/api/admin/retention/archive-status)
  • Verify manifest HMAC signature for previous month’s cold archive
  • Check partition sizes for growth trends
  • Review and drop partitions that have been fully archived to cold tier
  • Review retention periods against current compliance requirements
  • Audit active legal holds — release expired holds
  • Test GDPR deletion procedure on staging environment
  • Verify cold-tier immutability policy has not been modified
  • Check Azure Blob replication status (GRS)