# Data Retention & Archival

This guide covers data lifecycle management for the Arbitex AI Gateway: tiered audit log retention, PostgreSQL table partitioning strategy, Azure Blob immutable storage for cold-tier archival, Redis cache TTL configuration, log rotation, and GDPR-compliant data deletion procedures.
## Retention Architecture Overview

Arbitex stores operational and compliance data across three tiers:
| Tier | Storage | Latency | Retention | Use Case |
|---|---|---|---|---|
| Hot | PostgreSQL (primary) | < 10ms | 0–30 days | Live query, real-time audit search |
| Warm | PostgreSQL (archive schema) | < 100ms | 31–90 days | Compliance review, incident investigation |
| Cold | Azure Blob (WORM) | Minutes | > 90 days | Legal hold, long-term compliance |
Data transitions between tiers automatically via scheduled PostgreSQL jobs and the `arbitex-archiver` cron job.
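For reference, the tier boundaries in the table above can be expressed as a small age-to-tier lookup. This is an illustrative sketch only; routing is performed server-side by the scheduled jobs, and the `storage_tier` helper is not part of the product:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Tier boundaries from the table above (hot: 0-30 days, warm: 31-90 days).
HOT_DAYS = 30
WARM_DAYS = 90

def storage_tier(created_at: datetime, now: Optional[datetime] = None) -> str:
    """Return the tier a record of this age lives in under default retention."""
    now = now or datetime.now(timezone.utc)
    age = now - created_at
    if age <= timedelta(days=HOT_DAYS):
        return "hot"    # PostgreSQL primary
    if age <= timedelta(days=WARM_DAYS):
        return "warm"   # PostgreSQL archive schema
    return "cold"       # Azure Blob WORM

now = datetime(2026, 3, 12, tzinfo=timezone.utc)
print(storage_tier(now - timedelta(days=5), now))    # hot
print(storage_tier(now - timedelta(days=45), now))   # warm
print(storage_tier(now - timedelta(days=200), now))  # cold
```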
## Audit Log Retention Policies

### Default Retention Schedule

- **Hot tier:** 30 days (configurable: `AUDIT_HOT_RETENTION_DAYS`)
- **Warm tier:** 60 days (configurable: `AUDIT_WARM_RETENTION_DAYS`)
- **Cold tier:** 7 years (configurable: `AUDIT_COLD_RETENTION_YEARS`)

### Configuring Retention via Environment Variables

Set in your platform deployment:
```bash
# Platform environment (docker-compose or Kubernetes Secret)
AUDIT_HOT_RETENTION_DAYS=30
AUDIT_WARM_RETENTION_DAYS=90
AUDIT_COLD_RETENTION_YEARS=7
AUDIT_ARCHIVER_SCHEDULE="0 2 * * *"   # 2 AM UTC daily
COLD_STORAGE_PROVIDER=azure_blob      # azure_blob | s3 | gcs
COLD_STORAGE_CONTAINER=arbitex-audit-cold
COLD_STORAGE_ACCOUNT=yourauditstorage
```

For Kubernetes, store cold-storage credentials in a Secret:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: arbitex-cold-storage
  namespace: arbitex
type: Opaque
stringData:
  AZURE_STORAGE_ACCOUNT: "yourauditstorage"
  AZURE_STORAGE_KEY: "your-storage-key"
  AZURE_STORAGE_CONTAINER: "arbitex-audit-cold"
```

## Retention Policy API

Query and update retention policies via the admin API:
```bash
# Get current retention policy
curl -H "Authorization: Bearer $ADMIN_TOKEN" \
  https://api.arbitex.example.com/api/admin/retention/policy
```
Response:

```json
{
  "hot_retention_days": 30,
  "warm_retention_days": 90,
  "cold_retention_years": 7,
  "archiver_schedule": "0 2 * * *",
  "last_archival_run": "2026-03-12T02:00:00Z",
  "last_archival_status": "success",
  "rows_archived_last_run": 48293
}
```
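Before issuing an update, it can help to sanity-check the new values against the current policy. The sketch below is a hypothetical client-side helper, not part of the admin API, and it assumes the warm cutoff is an absolute age that must be at least the hot cutoff:

```python
# Hypothetical validation before PUT /api/admin/retention/policy.
# Assumption: warm_retention_days is an absolute age cutoff >= hot_retention_days.
def validate_retention_update(current: dict, update: dict) -> dict:
    merged = {**current, **update}
    if merged["hot_retention_days"] < 1:
        raise ValueError("hot retention must be at least 1 day")
    if merged["warm_retention_days"] < merged["hot_retention_days"]:
        raise ValueError("warm retention cutoff must be >= hot retention cutoff")
    if merged["cold_retention_years"] < 1:
        raise ValueError("cold retention must be at least 1 year")
    # Return only the fields the PUT endpoint accepts
    return {k: merged[k] for k in
            ("hot_retention_days", "warm_retention_days", "cold_retention_years")}

current = {"hot_retention_days": 30, "warm_retention_days": 90, "cold_retention_years": 7}
print(validate_retention_update(current, {"hot_retention_days": 45}))
# {'hot_retention_days': 45, 'warm_retention_days': 90, 'cold_retention_years': 7}
```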
```bash
# Update retention policy
curl -X PUT \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "hot_retention_days": 45, "warm_retention_days": 90 }' \
  https://api.arbitex.example.com/api/admin/retention/policy
```

## PostgreSQL Table Partitioning

The `audit_logs` table uses range partitioning on `created_at` to enable efficient data lifecycle management and partition pruning.
### Partition Schema

```sql
-- Master audit_logs table (partitioned)
CREATE TABLE audit_logs (
    id UUID NOT NULL DEFAULT gen_random_uuid(),
    created_at TIMESTAMPTZ NOT NULL,
    org_id UUID NOT NULL,
    user_id UUID,
    action TEXT NOT NULL,
    resource_type TEXT NOT NULL,
    resource_id TEXT,
    request_id UUID,
    ip_address INET,
    user_agent TEXT,
    details JSONB,
    dlp_triggered BOOLEAN DEFAULT FALSE,
    severity TEXT DEFAULT 'info',
    hmac_signature TEXT,
    PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

-- Monthly partitions (auto-created by arbitex-archiver)
CREATE TABLE audit_logs_2026_01 PARTITION OF audit_logs
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

CREATE TABLE audit_logs_2026_02 PARTITION OF audit_logs
    FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');

-- Indexes created per partition
CREATE INDEX idx_audit_logs_2026_01_org_id
    ON audit_logs_2026_01 (org_id, created_at DESC);
CREATE INDEX idx_audit_logs_2026_01_user_id
    ON audit_logs_2026_01 (user_id, created_at DESC);
```

### Automatic Partition Management
The `arbitex-archiver` job creates next-month partitions on the 25th of each month:
```python
from datetime import date

from dateutil.relativedelta import relativedelta

# Partition creation (executed by archiver job)
def create_next_partition(conn, target_month: date):
    partition_name = f"audit_logs_{target_month.strftime('%Y_%m')}"
    start = target_month.replace(day=1)
    end = start + relativedelta(months=1)

    conn.execute(f"""
        CREATE TABLE IF NOT EXISTS {partition_name}
            PARTITION OF audit_logs
            FOR VALUES FROM ('{start}') TO ('{end}');

        CREATE INDEX IF NOT EXISTS idx_{partition_name}_org_id
            ON {partition_name} (org_id, created_at DESC);
        CREATE INDEX IF NOT EXISTS idx_{partition_name}_user_id
            ON {partition_name} (user_id, created_at DESC);
    """)
```

### Warm Tier (Archive Schema)

Data older than `AUDIT_HOT_RETENTION_DAYS` moves to the `archive` schema:
```sql
-- Archive schema uses identical partition structure
CREATE TABLE archive.audit_logs (
    LIKE public.audit_logs INCLUDING ALL
) PARTITION BY RANGE (created_at);
```

```sql
-- Move hot → warm (run nightly)
WITH moved AS (
    DELETE FROM public.audit_logs
    WHERE created_at < NOW() - INTERVAL '30 days'
    RETURNING *
)
INSERT INTO archive.audit_logs SELECT * FROM moved;
```

### Monitoring Partition Health
```sql
-- Check partition sizes
SELECT
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size,
    pg_size_pretty(pg_relation_size(schemaname || '.' || tablename)) AS data_size
FROM pg_tables
WHERE tablename LIKE 'audit_logs_%'
ORDER BY tablename;
```

```sql
-- Check row counts per partition
SELECT
    child.relname AS partition,
    pg_stat_user_tables.n_live_tup AS live_rows
FROM pg_inherits
JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
JOIN pg_class child ON pg_inherits.inhrelid = child.oid
JOIN pg_stat_user_tables ON pg_stat_user_tables.relname = child.relname
WHERE parent.relname = 'audit_logs'
ORDER BY child.relname;
```

### Maintenance Tasks
```sql
-- Vacuum a specific partition after large deletes
VACUUM ANALYZE public.audit_logs_2026_01;
```

```sql
-- Drop a partition (after cold archival is confirmed)
DROP TABLE public.audit_logs_2026_01;
```

```sql
-- Detach a partition for offline processing without dropping
ALTER TABLE public.audit_logs DETACH PARTITION public.audit_logs_2026_01;
```

## Azure Blob WORM (Cold Tier Archival)
Data older than 90 days is exported to Azure Blob Storage with immutability policies (WORM — Write Once, Read Many) for compliance and legal hold purposes.
### Azure Blob Setup

1. Create the storage account with immutable storage:
```bash
# Create resource group
az group create --name arbitex-compliance --location eastus

# Create storage account with versioning enabled
az storage account create \
  --name yourauditstorage \
  --resource-group arbitex-compliance \
  --location eastus \
  --sku Standard_GRS \
  --kind StorageV2 \
  --enable-hierarchical-namespace false \
  --allow-blob-public-access false \
  --min-tls-version TLS1_2

# Create the audit container
az storage container create \
  --name arbitex-audit-cold \
  --account-name yourauditstorage \
  --auth-mode login

# Enable versioning on the account
az storage account blob-service-properties update \
  --account-name yourauditstorage \
  --enable-versioning true \
  --enable-delete-retention true \
  --delete-retention-days 90
```

2. Set container-level immutability policy:
```bash
# Create time-based retention policy (7 years = 2557 days)
az storage container immutability-policy create \
  --account-name yourauditstorage \
  --container-name arbitex-audit-cold \
  --period 2557

# Lock the policy (irreversible — prevents deletion)
# Only do this after testing; locked policies cannot be shortened
az storage container immutability-policy lock \
  --account-name yourauditstorage \
  --container-name arbitex-audit-cold \
  --if-match $(az storage container immutability-policy show \
      --account-name yourauditstorage \
      --container-name arbitex-audit-cold \
      --query etag -o tsv)
```

### Archival File Format
Each cold-tier archival file is an NDJSON (newline-delimited JSON) export, gzip-compressed, with an HMAC-SHA256 manifest:
```
arbitex-audit-cold/
├── 2026/
│   ├── 01/
│   │   ├── audit_logs_2026_01_001.ndjson.gz
│   │   ├── audit_logs_2026_01_002.ndjson.gz
│   │   └── MANIFEST.json
│   └── 02/
│       ├── audit_logs_2026_02_001.ndjson.gz
│       └── MANIFEST.json
```

Manifest format:
```json
{
  "period": "2026-01",
  "exported_at": "2026-02-01T02:15:33Z",
  "total_rows": 1294847,
  "files": [
    {
      "filename": "audit_logs_2026_01_001.ndjson.gz",
      "sha256": "e3b0c44298fc1c149afb...",
      "rows": 500000,
      "size_bytes": 104857600
    }
  ],
  "schema_version": "2",
  "hmac_signature": "sha256=abc123..."
}
```

### Running Cold Archival Manually
```bash
# Trigger an immediate cold archival run
curl -X POST \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  https://api.arbitex.example.com/api/admin/retention/archive-now
```

```bash
# Check archival status
curl -H "Authorization: Bearer $ADMIN_TOKEN" \
  https://api.arbitex.example.com/api/admin/retention/archive-status
```

Response:

```json
{
  "status": "running",
  "started_at": "2026-03-12T10:00:00Z",
  "rows_processed": 12500,
  "current_partition": "audit_logs_2025_12",
  "estimated_completion": "2026-03-12T10:45:00Z"
}
```

### Verifying Cold Archive Integrity
```python
import hashlib, hmac, gzip, json

from azure.storage.blob import BlobServiceClient

def verify_archive_manifest(account_url: str, container: str, period: str, hmac_key: bytes):
    client = BlobServiceClient(account_url=account_url)
    container_client = client.get_container_client(container)

    year, month = period.split("-")
    manifest_path = f"{year}/{month}/MANIFEST.json"

    manifest_data = container_client.download_blob(manifest_path).readall()
    manifest = json.loads(manifest_data)

    # Verify HMAC signature
    payload = json.dumps(
        {k: v for k, v in manifest.items() if k != "hmac_signature"},
        sort_keys=True,
    )
    expected_sig = "sha256=" + hmac.new(hmac_key, payload.encode(), hashlib.sha256).hexdigest()

    if not hmac.compare_digest(manifest["hmac_signature"], expected_sig):
        raise ValueError("Manifest HMAC signature mismatch — archive integrity compromised")

    # Verify each file checksum
    for file_entry in manifest["files"]:
        blob_data = container_client.download_blob(
            f"{year}/{month}/{file_entry['filename']}"
        ).readall()
        actual_sha256 = hashlib.sha256(blob_data).hexdigest()
        if actual_sha256 != file_entry["sha256"]:
            raise ValueError(f"Checksum mismatch for {file_entry['filename']}")

    print(f"Archive {period}: integrity verified, "
          f"{manifest['total_rows']} rows across {len(manifest['files'])} files")
```

## Redis Cache TTLs
Arbitex uses Redis for session storage, rate-limit counters, DLP cache, and MFA state. Default TTLs are tunable via environment variables.
### TTL Configuration Reference

| Cache Key Pattern | Default TTL | Environment Variable | Description |
|---|---|---|---|
| `session:{token}` | 24h | `SESSION_TTL_SECONDS=86400` | User session tokens |
| `ratelimit:{key}` | 60s | `RATELIMIT_WINDOW_SECONDS=60` | Rate limit windows |
| `dlp_cache:{hash}` | 5m | `DLP_CACHE_TTL_SECONDS=300` | DLP result cache (deduplication) |
| `mfa_challenge:{id}` | 10m | `MFA_CHALLENGE_TTL_SECONDS=600` | Pending MFA challenges |
| `mfa_verified:{token}` | 1h | `MFA_VERIFIED_TTL_SECONDS=3600` | MFA step-up assertion |
| `budget_cache:{group}` | 5m | `BUDGET_CACHE_TTL_SECONDS=300` | Budget counter cache |
| `provider_health:{id}` | 30s | `PROVIDER_HEALTH_TTL_SECONDS=30` | Provider health cache |
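The defaults above can be resolved into a single mapping at service startup. A minimal sketch, assuming only the documented variables and defaults (the `effective_ttls` helper itself is illustrative, not part of the gateway):

```python
import os
from typing import Optional

# Documented defaults for each TTL environment variable (from the table above).
TTL_DEFAULTS = {
    "SESSION_TTL_SECONDS": 86400,
    "RATELIMIT_WINDOW_SECONDS": 60,
    "DLP_CACHE_TTL_SECONDS": 300,
    "MFA_CHALLENGE_TTL_SECONDS": 600,
    "MFA_VERIFIED_TTL_SECONDS": 3600,
    "BUDGET_CACHE_TTL_SECONDS": 300,
    "PROVIDER_HEALTH_TTL_SECONDS": 30,
}

def effective_ttls(env: Optional[dict] = None) -> dict:
    """Resolve each TTL from the environment, falling back to the default."""
    env = os.environ if env is None else env
    return {name: int(env.get(name, default)) for name, default in TTL_DEFAULTS.items()}

# Every cache write then carries an explicit TTL, e.g. with redis-py:
#   r.setex(f"session:{token}", effective_ttls()["SESSION_TTL_SECONDS"], payload)
print(effective_ttls({"SESSION_TTL_SECONDS": "3600"})["SESSION_TTL_SECONDS"])  # 3600
```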
### Setting TTLs in Deployment

```bash
# In .env or Kubernetes ConfigMap
SESSION_TTL_SECONDS=86400
DLP_CACHE_TTL_SECONDS=300
MFA_CHALLENGE_TTL_SECONDS=600
MFA_VERIFIED_TTL_SECONDS=3600
BUDGET_CACHE_TTL_SECONDS=300
```

### Redis Memory Management
```bash
# Monitor key expiry and memory usage
redis-cli INFO keyspace
redis-cli INFO memory
```

```bash
# List TTLs for active sessions (admin debugging)
redis-cli --scan --pattern "session:*" | head -20 | xargs -I{} redis-cli TTL {}
```

```bash
# Flush DLP cache if stale entries suspected
redis-cli --scan --pattern "dlp_cache:*" | xargs redis-cli DEL
```

Configure Redis maxmemory policy to evict volatile keys when memory pressure occurs:

```
maxmemory 2gb
maxmemory-policy volatile-lru   # Evict keys with TTLs set, LRU order
```

## Log Rotation
### Application Log Rotation (logrotate)

Platform containers write structured JSON logs to `/var/log/arbitex/`. Configure logrotate for VM-based deployments:
```
/var/log/arbitex/*.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
    postrotate
        # Signal platform to reopen log files
        kill -USR1 $(cat /var/run/arbitex/platform.pid) 2>/dev/null || true
    endscript
}
```

For Kubernetes deployments, use a log aggregation sidecar (Fluent Bit or Vector) instead of logrotate:
```yaml
# values.yaml — enable log aggregation sidecar
platform:
  logAggregation:
    enabled: true
    sidecar: fluent-bit   # fluent-bit | vector | fluentd
    destination: loki     # loki | elasticsearch | splunk
    lokiEndpoint: http://loki:3100
```

### Nginx/Ingress Access Log Rotation
```
/var/log/nginx/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    sharedscripts
    postrotate
        nginx -s reopen 2>/dev/null || true
    endscript
}
```

### Kubernetes Log Management
For Kubernetes, configure pod log rotation at the kubelet level:

```yaml
# kubelet configuration
containerLogMaxSize: "100Mi"
containerLogMaxFiles: 5
```

## GDPR Data Deletion Procedures
### Right to Erasure (Article 17 GDPR)

When a user invokes their right to erasure, the platform must delete or anonymize all personal data. The deletion procedure is coordinated through the admin API.
### Initiating a Data Deletion Request

```bash
# Submit a GDPR deletion request for a user
curl -X POST \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "usr_01HXYZ...",
    "reason": "gdpr_erasure",
    "requested_by": "privacy@example.com",
    "reference": "GDPR-2026-042"
  }' \
  https://api.arbitex.example.com/api/admin/gdpr/deletion-requests
```

Response:

```json
{
  "request_id": "gdpr_del_01HXYZ...",
  "user_id": "usr_01HXYZ...",
  "status": "queued",
  "estimated_completion": "2026-03-13T02:00:00Z",
  "scope": {
    "audit_logs": "anonymized",
    "sessions": "deleted",
    "api_keys": "deleted",
    "preferences": "deleted",
    "cold_tier": "legal_hold_check"
  }
}
```

### Deletion Scope by Data Type
| Data Type | Hot Tier Action | Warm Tier Action | Cold Tier Action |
|---|---|---|---|
| User sessions | Delete immediately | — | — |
| API keys | Delete + revoke | — | — |
| User preferences | Delete | Delete | Anonymize |
| Audit log entries | Anonymize (pseudonymize) | Anonymize | Legal hold check |
| DLP scan results | Delete | Anonymize | Anonymize |
| Usage statistics | Aggregate (remove user ref) | Aggregate | Retain (no PII) |
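The "Anonymize" action in the table can be sketched as a pure transform over an audit row. Field names follow the `audit_logs` schema shown earlier; the salt handling and the `GDPR_DELETED_` token scheme here are assumptions for illustration, not the platform's exact algorithm:

```python
import hashlib

def anonymize_audit_row(row: dict, salt: bytes = b"") -> dict:
    """Replace PII fields with anonymized tokens, keeping a salted hash link."""
    user_hash = hashlib.sha256(salt + row["user_id"].encode()).hexdigest()
    return {
        **row,
        "user_id": f"GDPR_DELETED_{row['user_id'][-6:].upper()}",  # hypothetical token scheme
        "ip_address": "0.0.0.0",
        "user_agent": "GDPR_DELETED",
        "details": {"gdpr_anonymized": True, "original_user_hash": f"sha256:{user_hash}"},
    }

row = {"user_id": "usr_01hxyz", "ip_address": "203.0.113.7",
       "user_agent": "Mozilla/5.0", "details": {"query": "sensitive"}}
anon = anonymize_audit_row(row)
print(anon["user_id"])     # GDPR_DELETED_01HXYZ
print(anon["ip_address"])  # 0.0.0.0
```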
Audit log anonymization replaces PII fields with anonymized tokens:
```json
{
  "user_id": "GDPR_DELETED_01HXYZ",
  "ip_address": "0.0.0.0",
  "user_agent": "GDPR_DELETED",
  "details": {
    "gdpr_anonymized": true,
    "original_user_hash": "sha256:abc123..."
  }
}
```

### Checking Deletion Status
```bash
# Poll deletion request status
curl -H "Authorization: Bearer $ADMIN_TOKEN" \
  https://api.arbitex.example.com/api/admin/gdpr/deletion-requests/gdpr_del_01HXYZ
```

Completed response:

```json
{
  "request_id": "gdpr_del_01HXYZ...",
  "status": "completed",
  "completed_at": "2026-03-13T02:15:00Z",
  "actions_taken": {
    "sessions_deleted": 3,
    "api_keys_revoked": 2,
    "audit_log_rows_anonymized": 14829,
    "cold_tier_status": "legal_hold_exempt_archived"
  }
}
```

### Legal Hold Exemption
Cold-tier data under legal hold is exempt from erasure. Declare a hold before processing erasure requests:
```bash
# Place a legal hold on a user's cold-tier data
curl -X POST \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "usr_01HXYZ...",
    "reason": "litigation_hold",
    "reference": "LEGAL-2026-007",
    "expires": null
  }' \
  https://api.arbitex.example.com/api/admin/gdpr/legal-holds
```

### Data Subject Access Request (DSAR)
To export all data for a user (GDPR Article 15):
```bash
# Request DSAR export
curl -X POST \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"user_id": "usr_01HXYZ...", "format": "json"}' \
  https://api.arbitex.example.com/api/admin/gdpr/dsar-exports
```

Download when ready (signed URL, expires in 24h):

```json
{
  "export_id": "dsar_01HXYZ...",
  "status": "completed",
  "download_url": "https://...",
  "expires_at": "2026-03-13T10:00:00Z"
}
```

## Operational Checklist
### Monthly Archival Verification
Section titled “Monthly Archival Verification”- Confirm archival job ran successfully (
/api/admin/retention/archive-status) - Verify manifest HMAC signature for previous month’s cold archive
- Check partition sizes for growth trends
- Review and drop partitions that have been fully archived to cold tier
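The first checklist item can be automated against the retention policy payload shown earlier. The sketch below flags runs that failed or look stale; the 48-hour staleness threshold is an assumption for illustration, not a product default:

```python
from datetime import datetime, timedelta, timezone

def archival_problems(policy: dict, now: datetime,
                      max_age: timedelta = timedelta(hours=48)) -> list:
    """Return a list of problems with the last archival run (empty = healthy)."""
    problems = []
    if policy.get("last_archival_status") != "success":
        problems.append(f"last run status: {policy.get('last_archival_status')}")
    last_run = datetime.fromisoformat(policy["last_archival_run"].replace("Z", "+00:00"))
    if now - last_run > max_age:
        problems.append(f"last run is stale: {policy['last_archival_run']}")
    return problems

policy = {
    "last_archival_run": "2026-03-12T02:00:00Z",
    "last_archival_status": "success",
    "rows_archived_last_run": 48293,
}
now = datetime(2026, 3, 12, 9, 0, tzinfo=timezone.utc)
print(archival_problems(policy, now))  # []
```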
### Quarterly Retention Review

- Review retention periods against current compliance requirements
- Audit active legal holds — release expired holds
- Test GDPR deletion procedure on staging environment
- Verify cold-tier immutability policy has not been modified
- Check Azure Blob replication status (GRS)
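The legal-hold review item above can likewise be expressed in code. The sketch assumes each hold record carries the fields used in the create payload earlier (`reference`, `reason`, `expires`); a listing endpoint returning such records is an assumption, not a documented API:

```python
from datetime import datetime, timezone

def expired_holds(holds: list, now: datetime) -> list:
    """Return references of legal holds whose expiry timestamp has passed."""
    expired = []
    for hold in holds:
        expires = hold.get("expires")
        if expires is None:
            continue  # indefinite hold (e.g. litigation_hold), never auto-released
        if datetime.fromisoformat(expires.replace("Z", "+00:00")) < now:
            expired.append(hold["reference"])
    return expired

holds = [
    {"reference": "LEGAL-2026-007", "reason": "litigation_hold", "expires": None},
    {"reference": "LEGAL-2025-003", "reason": "audit", "expires": "2026-01-01T00:00:00Z"},
]
print(expired_holds(holds, datetime(2026, 3, 12, tzinfo=timezone.utc)))  # ['LEGAL-2025-003']
```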