# Data Retention & Archival

This guide covers data lifecycle management for the Arbitex AI Gateway: tiered audit log retention, PostgreSQL table partitioning strategy, Azure Blob immutable storage for cold-tier archival, Redis cache TTL configuration, log rotation, and GDPR-compliant data deletion procedures.
## Retention Architecture Overview

Arbitex stores operational and compliance data across three tiers:
| Tier | Storage | Latency | Retention | Use Case |
|---|---|---|---|---|
| Hot | PostgreSQL (primary) | < 10ms | 0–30 days | Live query, real-time audit search |
| Warm | PostgreSQL (archive schema) | < 100ms | 31–90 days | Compliance review, incident investigation |
| Cold | Azure Blob (WORM) | Minutes | > 90 days | Legal hold, long-term compliance |
Data transitions between tiers automatically via scheduled PostgreSQL jobs and the `arbitex-archiver` cron job.
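For reference, the tier boundaries in the table above can be expressed as a small age-to-tier lookup. This is an illustrative sketch only; routing is performed server-side by the scheduled jobs, and the `storage_tier` helper is not part of the product:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Tier boundaries from the table above (hot: 0-30 days, warm: 31-90 days).
HOT_DAYS = 30
WARM_DAYS = 90

def storage_tier(created_at: datetime, now: Optional[datetime] = None) -> str:
    """Return the tier a record of this age lives in under default retention."""
    now = now or datetime.now(timezone.utc)
    age = now - created_at
    if age <= timedelta(days=HOT_DAYS):
        return "hot"    # PostgreSQL primary
    if age <= timedelta(days=WARM_DAYS):
        return "warm"   # PostgreSQL archive schema
    return "cold"       # Azure Blob WORM

now = datetime(2026, 3, 12, tzinfo=timezone.utc)
print(storage_tier(now - timedelta(days=5), now))    # hot
print(storage_tier(now - timedelta(days=45), now))   # warm
print(storage_tier(now - timedelta(days=200), now))  # cold
```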
## Audit Log Retention Policies

### Default Retention Schedule

- **Hot tier:** 30 days (configurable: `AUDIT_HOT_RETENTION_DAYS`)
- **Warm tier:** 60 days (configurable: `AUDIT_WARM_RETENTION_DAYS`)
- **Cold tier:** 7 years (configurable: `AUDIT_COLD_RETENTION_YEARS`)

### Configuring Retention via Environment Variables

Set in your platform deployment:
```bash
# Platform environment (docker-compose or Kubernetes Secret)
AUDIT_HOT_RETENTION_DAYS=30
AUDIT_WARM_RETENTION_DAYS=90
AUDIT_COLD_RETENTION_YEARS=7
AUDIT_ARCHIVER_SCHEDULE="0 2 * * *"   # 2 AM UTC daily
COLD_STORAGE_PROVIDER=azure_blob      # azure_blob | s3 | gcs
COLD_STORAGE_CONTAINER=arbitex-audit-cold
COLD_STORAGE_ACCOUNT=yourauditstorage
```

For Kubernetes, store cold-storage credentials in a Secret:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: arbitex-cold-storage
  namespace: arbitex
type: Opaque
stringData:
  AZURE_STORAGE_ACCOUNT: "yourauditstorage"
  AZURE_STORAGE_KEY: "your-storage-key"
  AZURE_STORAGE_CONTAINER: "arbitex-audit-cold"
```

## Retention Policy API

Query and update retention policies via the admin API:
```bash
# Get current retention policy
curl -H "Authorization: Bearer $ADMIN_TOKEN" \
  https://api.arbitex.example.com/api/admin/retention/policy
```
Response:

```json
{
  "hot_retention_days": 30,
  "warm_retention_days": 90,
  "cold_retention_years": 7,
  "archiver_schedule": "0 2 * * *",
  "last_archival_run": "2026-03-12T02:00:00Z",
  "last_archival_status": "success",
  "rows_archived_last_run": 48293
}
```
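Before issuing an update, it can help to sanity-check the new values against the current policy. The sketch below is a hypothetical client-side helper, not part of the admin API, and it assumes the warm cutoff is an absolute age that must be at least the hot cutoff:

```python
# Hypothetical validation before PUT /api/admin/retention/policy.
# Assumption: warm_retention_days is an absolute age cutoff >= hot_retention_days.
def validate_retention_update(current: dict, update: dict) -> dict:
    merged = {**current, **update}
    if merged["hot_retention_days"] < 1:
        raise ValueError("hot retention must be at least 1 day")
    if merged["warm_retention_days"] < merged["hot_retention_days"]:
        raise ValueError("warm retention cutoff must be >= hot retention cutoff")
    if merged["cold_retention_years"] < 1:
        raise ValueError("cold retention must be at least 1 year")
    # Return only the fields the PUT endpoint accepts
    return {k: merged[k] for k in
            ("hot_retention_days", "warm_retention_days", "cold_retention_years")}

current = {"hot_retention_days": 30, "warm_retention_days": 90, "cold_retention_years": 7}
print(validate_retention_update(current, {"hot_retention_days": 45}))
# {'hot_retention_days': 45, 'warm_retention_days': 90, 'cold_retention_years': 7}
```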
```bash
# Update retention policy
curl -X PUT \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "hot_retention_days": 45, "warm_retention_days": 90 }' \
  https://api.arbitex.example.com/api/admin/retention/policy
```

## PostgreSQL Table Partitioning

The `audit_logs` table uses range partitioning on `created_at` to enable efficient data lifecycle management and partition pruning.
### Partition Schema

```sql
-- Master audit_logs table (partitioned)
CREATE TABLE audit_logs (
    id UUID NOT NULL DEFAULT gen_random_uuid(),
    created_at TIMESTAMPTZ NOT NULL,
    org_id UUID NOT NULL,
    user_id UUID,
    action TEXT NOT NULL,
    resource_type TEXT NOT NULL,
    resource_id TEXT,
    request_id UUID,
    ip_address INET,
    user_agent TEXT,
    details JSONB,
    dlp_triggered BOOLEAN DEFAULT FALSE,
    severity TEXT DEFAULT 'info',
    hmac_signature TEXT,
    PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

-- Monthly partitions (auto-created by arbitex-archiver)
CREATE TABLE audit_logs_2026_01 PARTITION OF audit_logs
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

CREATE TABLE audit_logs_2026_02 PARTITION OF audit_logs
    FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');

-- Indexes created per partition
CREATE INDEX idx_audit_logs_2026_01_org_id
    ON audit_logs_2026_01 (org_id, created_at DESC);
CREATE INDEX idx_audit_logs_2026_01_user_id
    ON audit_logs_2026_01 (user_id, created_at DESC);
```

### Automatic Partition Management
The `arbitex-archiver` job creates next-month partitions on the 25th of each month:
```python
from datetime import date

from dateutil.relativedelta import relativedelta

# Partition creation (executed by archiver job)
def create_next_partition(conn, target_month: date):
    partition_name = f"audit_logs_{target_month.strftime('%Y_%m')}"
    start = target_month.replace(day=1)
    end = start + relativedelta(months=1)

    conn.execute(f"""
        CREATE TABLE IF NOT EXISTS {partition_name}
            PARTITION OF audit_logs
            FOR VALUES FROM ('{start}') TO ('{end}');

        CREATE INDEX IF NOT EXISTS idx_{partition_name}_org_id
            ON {partition_name} (org_id, created_at DESC);
        CREATE INDEX IF NOT EXISTS idx_{partition_name}_user_id
            ON {partition_name} (user_id, created_at DESC);
    """)
```

### Warm Tier (Archive Schema)

Data older than `AUDIT_HOT_RETENTION_DAYS` moves to the `archive` schema:
```sql
-- Archive schema uses identical partition structure
CREATE TABLE archive.audit_logs (
    LIKE public.audit_logs INCLUDING ALL
) PARTITION BY RANGE (created_at);
```

```sql
-- Move hot → warm (run nightly)
WITH moved AS (
    DELETE FROM public.audit_logs
    WHERE created_at < NOW() - INTERVAL '30 days'
    RETURNING *
)
INSERT INTO archive.audit_logs SELECT * FROM moved;
```

### Monitoring Partition Health
```sql
-- Check partition sizes
SELECT
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size,
    pg_size_pretty(pg_relation_size(schemaname || '.' || tablename)) AS data_size
FROM pg_tables
WHERE tablename LIKE 'audit_logs_%'
ORDER BY tablename;
```

```sql
-- Check row counts per partition
SELECT
    child.relname AS partition,
    pg_stat_user_tables.n_live_tup AS live_rows
FROM pg_inherits
JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
JOIN pg_class child ON pg_inherits.inhrelid = child.oid
JOIN pg_stat_user_tables ON pg_stat_user_tables.relname = child.relname
WHERE parent.relname = 'audit_logs'
ORDER BY child.relname;
```

### Maintenance Tasks
```sql
-- Vacuum a specific partition after large deletes
VACUUM ANALYZE public.audit_logs_2026_01;
```

```sql
-- Drop a partition (after cold archival is confirmed)
DROP TABLE public.audit_logs_2026_01;
```

```sql
-- Detach a partition for offline processing without dropping
ALTER TABLE public.audit_logs DETACH PARTITION public.audit_logs_2026_01;
```

## Azure Blob WORM (Cold Tier Archival)
Data older than 90 days is exported to Azure Blob Storage with immutability policies (WORM — Write Once, Read Many) for compliance and legal hold purposes.
### Azure Blob Setup

1. Create the storage account with immutable storage:
```bash
# Create resource group
az group create --name arbitex-compliance --location eastus

# Create storage account with versioning enabled
az storage account create \
  --name yourauditstorage \
  --resource-group arbitex-compliance \
  --location eastus \
  --sku Standard_GRS \
  --kind StorageV2 \
  --enable-hierarchical-namespace false \
  --allow-blob-public-access false \
  --min-tls-version TLS1_2

# Create the audit container
az storage container create \
  --name arbitex-audit-cold \
  --account-name yourauditstorage \
  --auth-mode login

# Enable versioning on the account
az storage account blob-service-properties update \
  --account-name yourauditstorage \
  --enable-versioning true \
  --enable-delete-retention true \
  --delete-retention-days 90
```

2. Set container-level immutability policy:
```bash
# Create time-based retention policy (7 years = 2557 days)
az storage container immutability-policy create \
  --account-name yourauditstorage \
  --container-name arbitex-audit-cold \
  --period 2557

# Lock the policy (irreversible — prevents deletion)
# Only do this after testing; locked policies cannot be shortened
az storage container immutability-policy lock \
  --account-name yourauditstorage \
  --container-name arbitex-audit-cold \
  --if-match $(az storage container immutability-policy show \
      --account-name yourauditstorage \
      --container-name arbitex-audit-cold \
      --query etag -o tsv)
```

### Archival File Format
Each cold-tier archival file is an NDJSON (newline-delimited JSON) export, gzip-compressed, with an HMAC-SHA256 manifest:
```
arbitex-audit-cold/
├── 2026/
│   ├── 01/
│   │   ├── audit_logs_2026_01_001.ndjson.gz
│   │   ├── audit_logs_2026_01_002.ndjson.gz
│   │   └── MANIFEST.json
│   └── 02/
│       ├── audit_logs_2026_02_001.ndjson.gz
│       └── MANIFEST.json
```

Manifest format:
```json
{
  "period": "2026-01",
  "exported_at": "2026-02-01T02:15:33Z",
  "total_rows": 1294847,
  "files": [
    {
      "filename": "audit_logs_2026_01_001.ndjson.gz",
      "sha256": "e3b0c44298fc1c149afb...",
      "rows": 500000,
      "size_bytes": 104857600
    }
  ],
  "schema_version": "2",
  "hmac_signature": "sha256=abc123..."
}
```

### Running Cold Archival Manually
```bash
# Trigger an immediate cold archival run
curl -X POST \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  https://api.arbitex.example.com/api/admin/retention/archive-now
```

```bash
# Check archival status
curl -H "Authorization: Bearer $ADMIN_TOKEN" \
  https://api.arbitex.example.com/api/admin/retention/archive-status
```

Response:

```json
{
  "status": "running",
  "started_at": "2026-03-12T10:00:00Z",
  "rows_processed": 12500,
  "current_partition": "audit_logs_2025_12",
  "estimated_completion": "2026-03-12T10:45:00Z"
}
```

### Verifying Cold Archive Integrity
```python
import hashlib, hmac, gzip, json

from azure.storage.blob import BlobServiceClient

def verify_archive_manifest(account_url: str, container: str, period: str, hmac_key: bytes):
    client = BlobServiceClient(account_url=account_url)
    container_client = client.get_container_client(container)

    year, month = period.split("-")
    manifest_path = f"{year}/{month}/MANIFEST.json"

    manifest_data = container_client.download_blob(manifest_path).readall()
    manifest = json.loads(manifest_data)

    # Verify HMAC signature
    payload = json.dumps(
        {k: v for k, v in manifest.items() if k != "hmac_signature"},
        sort_keys=True,
    )
    expected_sig = "sha256=" + hmac.new(hmac_key, payload.encode(), hashlib.sha256).hexdigest()

    if not hmac.compare_digest(manifest["hmac_signature"], expected_sig):
        raise ValueError("Manifest HMAC signature mismatch — archive integrity compromised")

    # Verify each file checksum
    for file_entry in manifest["files"]:
        blob_data = container_client.download_blob(
            f"{year}/{month}/{file_entry['filename']}"
        ).readall()
        actual_sha256 = hashlib.sha256(blob_data).hexdigest()
        if actual_sha256 != file_entry["sha256"]:
            raise ValueError(f"Checksum mismatch for {file_entry['filename']}")

    print(f"Archive {period}: integrity verified, "
          f"{manifest['total_rows']} rows across {len(manifest['files'])} files")
```

## Redis Cache TTLs
Arbitex uses Redis for session storage, rate-limit counters, DLP cache, and MFA state. Default TTLs are tunable via environment variables.
### TTL Configuration Reference

| Cache Key Pattern | Default TTL | Environment Variable | Description |
|---|---|---|---|
| `session:{token}` | 24h | `SESSION_TTL_SECONDS=86400` | User session tokens |
| `ratelimit:{key}` | 60s | `RATELIMIT_WINDOW_SECONDS=60` | Rate limit windows |
| `dlp_cache:{hash}` | 5m | `DLP_CACHE_TTL_SECONDS=300` | DLP result cache (deduplication) |
| `mfa_challenge:{id}` | 10m | `MFA_CHALLENGE_TTL_SECONDS=600` | Pending MFA challenges |
| `mfa_verified:{token}` | 1h | `MFA_VERIFIED_TTL_SECONDS=3600` | MFA step-up assertion |
| `budget_cache:{group}` | 5m | `BUDGET_CACHE_TTL_SECONDS=300` | Budget counter cache |
| `provider_health:{id}` | 30s | `PROVIDER_HEALTH_TTL_SECONDS=30` | Provider health cache |
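The defaults above can be resolved into a single mapping at service startup. A minimal sketch, assuming only the documented variables and defaults (the `effective_ttls` helper itself is illustrative, not part of the gateway):

```python
import os
from typing import Optional

# Documented defaults for each TTL environment variable (from the table above).
TTL_DEFAULTS = {
    "SESSION_TTL_SECONDS": 86400,
    "RATELIMIT_WINDOW_SECONDS": 60,
    "DLP_CACHE_TTL_SECONDS": 300,
    "MFA_CHALLENGE_TTL_SECONDS": 600,
    "MFA_VERIFIED_TTL_SECONDS": 3600,
    "BUDGET_CACHE_TTL_SECONDS": 300,
    "PROVIDER_HEALTH_TTL_SECONDS": 30,
}

def effective_ttls(env: Optional[dict] = None) -> dict:
    """Resolve each TTL from the environment, falling back to the default."""
    env = os.environ if env is None else env
    return {name: int(env.get(name, default)) for name, default in TTL_DEFAULTS.items()}

# Every cache write then carries an explicit TTL, e.g. with redis-py:
#   r.setex(f"session:{token}", effective_ttls()["SESSION_TTL_SECONDS"], payload)
print(effective_ttls({"SESSION_TTL_SECONDS": "3600"})["SESSION_TTL_SECONDS"])  # 3600
```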
### Setting TTLs in Deployment

```bash
# In .env or Kubernetes ConfigMap
SESSION_TTL_SECONDS=86400
DLP_CACHE_TTL_SECONDS=300
MFA_CHALLENGE_TTL_SECONDS=600
MFA_VERIFIED_TTL_SECONDS=3600
BUDGET_CACHE_TTL_SECONDS=300
```

### Redis Memory Management
```bash
# Monitor key expiry and memory usage
redis-cli INFO keyspace
redis-cli INFO memory
```

```bash
# List TTLs for active sessions (admin debugging)
redis-cli --scan --pattern "session:*" | head -20 | xargs -I{} redis-cli TTL {}
```

```bash
# Flush DLP cache if stale entries suspected
redis-cli --scan --pattern "dlp_cache:*" | xargs redis-cli DEL
```

Configure Redis maxmemory policy to evict volatile keys when memory pressure occurs:

```
maxmemory 2gb
maxmemory-policy volatile-lru   # Evict keys with TTLs set, LRU order
```

## Log Rotation
### Application Log Rotation (logrotate)

Platform containers write structured JSON logs to `/var/log/arbitex/`. Configure logrotate for VM-based deployments:
```
/var/log/arbitex/*.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
    postrotate
        # Signal platform to reopen log files
        kill -USR1 $(cat /var/run/arbitex/platform.pid) 2>/dev/null || true
    endscript
}
```

For Kubernetes deployments, use a log aggregation sidecar (Fluent Bit or Vector) instead of logrotate:
```yaml
# values.yaml — enable log aggregation sidecar
platform:
  logAggregation:
    enabled: true
    sidecar: fluent-bit   # fluent-bit | vector | fluentd
    destination: loki     # loki | elasticsearch | splunk
    lokiEndpoint: http://loki:3100
```

### Nginx/Ingress Access Log Rotation
```
/var/log/nginx/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    sharedscripts
    postrotate
        nginx -s reopen 2>/dev/null || true
    endscript
}
```

### Kubernetes Log Management
For Kubernetes, configure pod log rotation at the kubelet level:

```yaml
# kubelet configuration
containerLogMaxSize: "100Mi"
containerLogMaxFiles: 5
```

## GDPR Data Deletion Procedures
### Right to Erasure (Article 17 GDPR)

When a user invokes their right to erasure, the platform must delete or anonymize all personal data. The deletion procedure is coordinated through the admin API.
### Initiating a Data Deletion Request

```bash
# Submit a GDPR deletion request for a user
curl -X POST \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "usr_01HXYZ...",
    "reason": "gdpr_erasure",
    "requested_by": "privacy@example.com",
    "reference": "GDPR-2026-042"
  }' \
  https://api.arbitex.example.com/api/admin/gdpr/deletion-requests
```

Response:

```json
{
  "request_id": "gdpr_del_01HXYZ...",
  "user_id": "usr_01HXYZ...",
  "status": "queued",
  "estimated_completion": "2026-03-13T02:00:00Z",
  "scope": {
    "audit_logs": "anonymized",
    "sessions": "deleted",
    "api_keys": "deleted",
    "preferences": "deleted",
    "cold_tier": "legal_hold_check"
  }
}
```

### Deletion Scope by Data Type
| Data Type | Hot Tier Action | Warm Tier Action | Cold Tier Action |
|---|---|---|---|
| User sessions | Delete immediately | — | — |
| API keys | Delete + revoke | — | — |
| User preferences | Delete | Delete | Anonymize |
| Audit log entries | Anonymize (pseudonymize) | Anonymize | Legal hold check |
| DLP scan results | Delete | Anonymize | Anonymize |
| Usage statistics | Aggregate (remove user ref) | Aggregate | Retain (no PII) |
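The "Anonymize" action in the table can be sketched as a pure transform over an audit row. Field names follow the `audit_logs` schema shown earlier; the salt handling and the `GDPR_DELETED_` token scheme here are assumptions for illustration, not the platform's exact algorithm:

```python
import hashlib

def anonymize_audit_row(row: dict, salt: bytes = b"") -> dict:
    """Replace PII fields with anonymized tokens, keeping a salted hash link."""
    user_hash = hashlib.sha256(salt + row["user_id"].encode()).hexdigest()
    return {
        **row,
        "user_id": f"GDPR_DELETED_{row['user_id'][-6:].upper()}",  # hypothetical token scheme
        "ip_address": "0.0.0.0",
        "user_agent": "GDPR_DELETED",
        "details": {"gdpr_anonymized": True, "original_user_hash": f"sha256:{user_hash}"},
    }

row = {"user_id": "usr_01hxyz", "ip_address": "203.0.113.7",
       "user_agent": "Mozilla/5.0", "details": {"query": "sensitive"}}
anon = anonymize_audit_row(row)
print(anon["user_id"])     # GDPR_DELETED_01HXYZ
print(anon["ip_address"])  # 0.0.0.0
```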
Audit log anonymization replaces PII fields with anonymized tokens:
```json
{
  "user_id": "GDPR_DELETED_01HXYZ",
  "ip_address": "0.0.0.0",
  "user_agent": "GDPR_DELETED",
  "details": {
    "gdpr_anonymized": true,
    "original_user_hash": "sha256:abc123..."
  }
}
```

### Checking Deletion Status
```bash
# Poll deletion request status
curl -H "Authorization: Bearer $ADMIN_TOKEN" \
  https://api.arbitex.example.com/api/admin/gdpr/deletion-requests/gdpr_del_01HXYZ
```

Completed response:

```json
{
  "request_id": "gdpr_del_01HXYZ...",
  "status": "completed",
  "completed_at": "2026-03-13T02:15:00Z",
  "actions_taken": {
    "sessions_deleted": 3,
    "api_keys_revoked": 2,
    "audit_log_rows_anonymized": 14829,
    "cold_tier_status": "legal_hold_exempt_archived"
  }
}
```

### Legal Hold Exemption
Cold-tier data under legal hold is exempt from erasure. Declare a hold before processing erasure requests:
```bash
# Place a legal hold on a user's cold-tier data
curl -X POST \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "usr_01HXYZ...",
    "reason": "litigation_hold",
    "reference": "LEGAL-2026-007",
    "expires": null
  }' \
  https://api.arbitex.example.com/api/admin/gdpr/legal-holds
```

### Data Subject Access Request (DSAR)
To export all data for a user (GDPR Article 15):
```bash
# Request DSAR export
curl -X POST \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"user_id": "usr_01HXYZ...", "format": "json"}' \
  https://api.arbitex.example.com/api/admin/gdpr/dsar-exports
```

Download when ready (signed URL, expires in 24h):

```json
{
  "export_id": "dsar_01HXYZ...",
  "status": "completed",
  "download_url": "https://...",
  "expires_at": "2026-03-13T10:00:00Z"
}
```

## Operational Checklist
### Monthly Archival Verification
Section titled “Monthly Archival Verification”- Confirm archival job ran successfully (
/api/admin/retention/archive-status) - Verify manifest HMAC signature for previous month’s cold archive
- Check partition sizes for growth trends
- Review and drop partitions that have been fully archived to cold tier
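The first checklist item can be automated against the retention policy payload shown earlier. The sketch below flags runs that failed or look stale; the 48-hour staleness threshold is an assumption for illustration, not a product default:

```python
from datetime import datetime, timedelta, timezone

def archival_problems(policy: dict, now: datetime,
                      max_age: timedelta = timedelta(hours=48)) -> list:
    """Return a list of problems with the last archival run (empty = healthy)."""
    problems = []
    if policy.get("last_archival_status") != "success":
        problems.append(f"last run status: {policy.get('last_archival_status')}")
    last_run = datetime.fromisoformat(policy["last_archival_run"].replace("Z", "+00:00"))
    if now - last_run > max_age:
        problems.append(f"last run is stale: {policy['last_archival_run']}")
    return problems

policy = {
    "last_archival_run": "2026-03-12T02:00:00Z",
    "last_archival_status": "success",
    "rows_archived_last_run": 48293,
}
now = datetime(2026, 3, 12, 9, 0, tzinfo=timezone.utc)
print(archival_problems(policy, now))  # []
```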
### Quarterly Retention Review

- Review retention periods against current compliance requirements
- Audit active legal holds — release expired holds
- Test GDPR deletion procedure on staging environment
- Verify cold-tier immutability policy has not been modified
- Check Azure Blob replication status (GRS)
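The legal-hold review item above can likewise be expressed in code. The sketch assumes each hold record carries the fields used in the create payload earlier (`reference`, `reason`, `expires`); a listing endpoint returning such records is an assumption, not a documented API:

```python
from datetime import datetime, timezone

def expired_holds(holds: list, now: datetime) -> list:
    """Return references of legal holds whose expiry timestamp has passed."""
    expired = []
    for hold in holds:
        expires = hold.get("expires")
        if expires is None:
            continue  # indefinite hold (e.g. litigation_hold), never auto-released
        if datetime.fromisoformat(expires.replace("Z", "+00:00")) < now:
            expired.append(hold["reference"])
    return expired

holds = [
    {"reference": "LEGAL-2026-007", "reason": "litigation_hold", "expires": None},
    {"reference": "LEGAL-2025-003", "reason": "audit", "expires": "2026-01-01T00:00:00Z"},
]
print(expired_holds(holds, datetime(2026, 3, 12, tzinfo=timezone.utc)))  # ['LEGAL-2025-003']
```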