
Cloud production deployment checklist

Use this checklist before every production deployment of the Arbitex cloud portal. Each item must be verified before running helm upgrade.

Helm values
  • values-prod.yaml exists and is up to date
  • Image tag matches the release version: image.tag
  • Replica count is appropriate for expected load: replicaCount
  • Resource requests/limits are set: resources.requests, resources.limits
  • Environment-specific overrides are correct (not leftover staging values)

TLS certificates
  • TLS secret exists in the target namespace: kubectl -n arbitex get secret arbitex-tls
  • Certificate is not expired: kubectl -n arbitex get secret arbitex-tls -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -enddate
  • Certificate SAN matches the production domain
  • cert-manager Certificate resource is healthy (if using automated renewal): kubectl -n arbitex get certificate
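The expiry check above can be scripted as a pass/fail gate. A minimal sketch, assuming GNU date; the 30-day threshold is an arbitrary choice, adjust it to your renewal window:

```shell
# days_until_expiry: days remaining, given openssl's "notAfter=..." output.
days_until_expiry() {
  local enddate="$1"          # e.g. "notAfter=Jun  1 12:00:00 2030 GMT"
  local expiry now
  expiry=$(date -d "${enddate#notAfter=}" +%s)   # GNU date
  now=$(date +%s)
  echo $(( (expiry - now) / 86400 ))
}

# Live usage (requires cluster access):
#   enddate=$(kubectl -n arbitex get secret arbitex-tls \
#     -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -enddate)
#   [ "$(days_until_expiry "$enddate")" -ge 30 ] || echo "certificate expires soon" >&2
```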

Secrets and environment
  • JWT_SECRET_KEY is set and matches across all replicas
  • DATABASE_URL points to the production PostgreSQL instance
  • REDIS_URL points to the production Redis instance
  • AZURE_KEYVAULT_URL is set (if using Key Vault for secret management)
  • API keys for external providers are configured
  • AUDIT_HMAC_KEY is set for audit chain integrity
  • Secrets are sourced from Key Vault / external secret store (not hardcoded in values)
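One way to verify the environment wiring in a single pass is to dump a pod's environment and diff it against the required names. A sketch; the variable list mirrors this checklist, extend it as needed:

```shell
# check_required_env: report any required variable missing from an env dump.
check_required_env() {
  local env_dump="$1"; shift
  local var missing=0
  for var in "$@"; do
    if ! grep -q "^${var}=" <<<"$env_dump"; then
      echo "MISSING: $var"
      missing=1
    fi
  done
  return "$missing"
}

# Live usage (requires cluster access):
#   env_dump=$(kubectl -n arbitex exec deploy/arbitex-cloud -- printenv)
#   check_required_env "$env_dump" JWT_SECRET_KEY DATABASE_URL REDIS_URL AUDIT_HMAC_KEY
```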

Ingress
  • ingress.enabled: true in values-prod.yaml
  • ingress.className: nginx (or your ingress controller class)
  • ingress.hosts lists the production domain
  • ingress.tls references the correct TLS secret
  • Ingress annotations include rate limiting and security headers if required

```yaml
# Expected values-prod.yaml ingress block
ingress:
  enabled: true
  className: nginx
  hosts:
    - host: app.arbitex.io
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: arbitex-tls
      hosts:
        - app.arbitex.io
```

DNS
  • DNS A/CNAME record points to the ingress controller’s external IP/hostname
  • TTL is low enough for quick rollback (recommend 300s during deploy)
  • DNS propagation verified: dig +short app.arbitex.io
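The DNS check can be made explicit by comparing the resolved answer against the ingress controller's external address. A sketch; the ingress-nginx service name and namespace below are assumptions, adjust them to your cluster:

```shell
# dns_matches: true when the resolved answer is non-empty and equals expected.
dns_matches() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Live usage (requires cluster and network access):
#   expected=$(kubectl -n ingress-nginx get svc ingress-nginx-controller \
#     -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
#   resolved=$(dig +short app.arbitex.io | tail -n1)
#   dns_matches "$resolved" "$expected" || echo "DNS not pointing at ingress" >&2
```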

Database
  • Alembic migrations are at the expected head: alembic current
  • Backup was taken before deploy (see DR runbook)
  • No pending migrations that could cause downtime
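The "migrations at head" check can be automated by comparing the output of alembic current against alembic heads. A sketch, assuming a single head (no branch points):

```shell
# migrations_current: true when the current DB revision equals the head revision.
migrations_current() {
  local cur head
  cur=$(awk '{print $1; exit}' <<<"$1")    # first token of `alembic current`
  head=$(awk '{print $1; exit}' <<<"$2")   # first token of `alembic heads`
  [ -n "$cur" ] && [ "$cur" = "$head" ]
}

# Live usage (run from the repo with alembic configured):
#   migrations_current "$(alembic current)" "$(alembic heads)" \
#     || echo "pending migrations" >&2
```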

Redis
  • Redis is reachable from the cluster: redis-cli -u $REDIS_URL ping
  • Redis memory policy is set (maxmemory-policy allkeys-lru)
  • Bloom filter will initialize on startup (sufficient memory allocated)
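The Redis checks can be wrapped the same way. The parsing helper below is offline-testable; the live commands need connectivity to the production Redis:

```shell
# policy_is: true when the output of `redis-cli config get maxmemory-policy`
# (two lines: key, then value) reports the expected policy.
policy_is() {
  [ "$(tail -n1 <<<"$1")" = "$2" ]
}

# Live usage:
#   redis-cli -u "$REDIS_URL" ping                            # expect PONG
#   out=$(redis-cli -u "$REDIS_URL" config get maxmemory-policy)
#   policy_is "$out" allkeys-lru || echo "unexpected eviction policy" >&2
```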

Deploy

```shell
helm upgrade arbitex-cloud ./charts/arbitex-cloud \
  -f values-prod.yaml \
  --namespace arbitex \
  --atomic \
  --timeout 10m \
  --wait
```

The --atomic flag automatically rolls back on failure.


Post-deploy verification
  • All pods are running: kubectl -n arbitex get pods
  • Rollout completed: kubectl -n arbitex rollout status deployment/arbitex-cloud
  • Health endpoint returns 200: curl -sf https://app.arbitex.io/health
  • Login flow works end-to-end
  • TLS certificate is served correctly: echo | openssl s_client -connect app.arbitex.io:443 2>/dev/null | openssl x509 -noout -subject
  • No error spikes in logs: kubectl -n arbitex logs deploy/arbitex-cloud --since=5m | grep -c ERROR
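The log check above can be turned into a pass/fail gate. A sketch; the threshold of 5 ERROR lines in the last 5 minutes is an arbitrary assumption to tune per service:

```shell
# error_spike: true when the log text contains more than $2 ERROR lines.
error_spike() {
  local count
  count=$(grep -c ERROR <<<"$1" || true)   # grep -c still prints 0 on no match
  [ "$count" -gt "$2" ]
}

# Live usage:
#   logs=$(kubectl -n arbitex logs deploy/arbitex-cloud --since=5m)
#   error_spike "$logs" 5 && echo "error spike detected, consider rollback" >&2
```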

Rollback

If issues are detected post-deploy:

```shell
# If --atomic was used, a failed upgrade has already been rolled back.
# Otherwise, roll back to the immediately previous revision:
helm rollback arbitex-cloud --namespace arbitex

# Or inspect history and roll back to a specific revision:
helm history arbitex-cloud --namespace arbitex
helm rollback arbitex-cloud <revision> --namespace arbitex
```