
Cloud production deployment checklist

Use this checklist before every production deployment of the Arbitex cloud portal. Each item must be verified before running helm upgrade.

Helm values
  • values-prod.yaml exists and is up to date
  • Image tag matches the release version: image.tag
  • Replica count is appropriate for expected load: replicaCount
  • Resource requests/limits are set: resources.requests, resources.limits
  • Environment-specific overrides are correct (not leftover staging values)

TLS certificates
  • TLS secret exists in the target namespace: kubectl -n arbitex get secret arbitex-tls
  • Certificate is not expired: kubectl -n arbitex get secret arbitex-tls -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -enddate
  • Certificate SAN matches the production domain
  • cert-manager Certificate resource is healthy (if using automated renewal): kubectl -n arbitex get certificate
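The expiry check above can be scripted as a pass/fail gate. A minimal sketch, assuming GNU date; the 30-day threshold is an arbitrary choice, adjust it to your renewal window:

```shell
# days_until_expiry: days remaining, given openssl's "notAfter=..." output.
days_until_expiry() {
  local enddate="$1"          # e.g. "notAfter=Jun  1 12:00:00 2030 GMT"
  local expiry now
  expiry=$(date -d "${enddate#notAfter=}" +%s)   # GNU date
  now=$(date +%s)
  echo $(( (expiry - now) / 86400 ))
}

# Live usage (requires cluster access):
#   enddate=$(kubectl -n arbitex get secret arbitex-tls \
#     -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -enddate)
#   [ "$(days_until_expiry "$enddate")" -ge 30 ] || echo "certificate expires soon" >&2
```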

Secrets and environment
  • JWT_SECRET_KEY is set and matches across all replicas
  • DATABASE_URL points to the production PostgreSQL instance
  • REDIS_URL points to the production Redis instance
  • AZURE_KEYVAULT_URL is set (if using Key Vault for secret management)
  • API keys for external providers are configured
  • AUDIT_HMAC_KEY is set for audit chain integrity
  • Secrets are sourced from Key Vault / external secret store (not hardcoded in values)
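One way to verify the environment wiring in a single pass is to dump a pod's environment and diff it against the required names. A sketch; the variable list mirrors this checklist, extend it as needed:

```shell
# check_required_env: report any required variable missing from an env dump.
check_required_env() {
  local env_dump="$1"; shift
  local var missing=0
  for var in "$@"; do
    if ! grep -q "^${var}=" <<<"$env_dump"; then
      echo "MISSING: $var"
      missing=1
    fi
  done
  return "$missing"
}

# Live usage (requires cluster access):
#   env_dump=$(kubectl -n arbitex exec deploy/arbitex-cloud -- printenv)
#   check_required_env "$env_dump" JWT_SECRET_KEY DATABASE_URL REDIS_URL AUDIT_HMAC_KEY
```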

Ingress
  • ingress.enabled: true in values-prod.yaml
  • ingress.className: nginx (or your ingress controller class)
  • ingress.hosts lists the production domain
  • ingress.tls references the correct TLS secret
  • Ingress annotations include rate limiting and security headers if required

```yaml
# Expected values-prod.yaml ingress block
ingress:
  enabled: true
  className: nginx
  hosts:
    - host: app.arbitex.io
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: arbitex-tls
      hosts:
        - app.arbitex.io
```

DNS
  • DNS A/CNAME record points to the ingress controller’s external IP/hostname
  • TTL is low enough for quick rollback (recommend 300s during deploy)
  • DNS propagation verified: dig +short app.arbitex.io
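The DNS check can be made explicit by comparing the resolved answer against the ingress controller's external address. A sketch; the ingress-nginx service name and namespace below are assumptions, adjust them to your cluster:

```shell
# dns_matches: true when the resolved answer is non-empty and equals expected.
dns_matches() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Live usage (requires cluster and network access):
#   expected=$(kubectl -n ingress-nginx get svc ingress-nginx-controller \
#     -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
#   resolved=$(dig +short app.arbitex.io | tail -n1)
#   dns_matches "$resolved" "$expected" || echo "DNS not pointing at ingress" >&2
```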

Database
  • Alembic migrations are at the expected head: alembic current
  • Backup was taken before deploy (see DR runbook)
  • No pending migrations that could cause downtime
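The "migrations at head" check can be automated by comparing the output of alembic current against alembic heads. A sketch, assuming a single head (no branch points):

```shell
# migrations_current: true when the current DB revision equals the head revision.
migrations_current() {
  local cur head
  cur=$(awk '{print $1; exit}' <<<"$1")    # first token of `alembic current`
  head=$(awk '{print $1; exit}' <<<"$2")   # first token of `alembic heads`
  [ -n "$cur" ] && [ "$cur" = "$head" ]
}

# Live usage (run from the repo with alembic configured):
#   migrations_current "$(alembic current)" "$(alembic heads)" \
#     || echo "pending migrations" >&2
```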

Redis
  • Redis is reachable from the cluster: redis-cli -u $REDIS_URL ping
  • Redis memory policy is set (maxmemory-policy allkeys-lru)
  • Bloom filter will initialize on startup (sufficient memory allocated)
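The Redis checks can be wrapped the same way. The parsing helper below is offline-testable; the live commands need connectivity to the production Redis:

```shell
# policy_is: true when the output of `redis-cli config get maxmemory-policy`
# (two lines: key, then value) reports the expected policy.
policy_is() {
  [ "$(tail -n1 <<<"$1")" = "$2" ]
}

# Live usage:
#   redis-cli -u "$REDIS_URL" ping                            # expect PONG
#   out=$(redis-cli -u "$REDIS_URL" config get maxmemory-policy)
#   policy_is "$out" allkeys-lru || echo "unexpected eviction policy" >&2
```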

Deploy

```shell
helm upgrade arbitex-cloud ./charts/arbitex-cloud \
  -f values-prod.yaml \
  --namespace arbitex \
  --atomic \
  --timeout 10m \
  --wait
```

The --atomic flag automatically rolls back on failure.


Post-deploy verification
  • All pods are running: kubectl -n arbitex get pods
  • Rollout completed: kubectl -n arbitex rollout status deployment/arbitex-cloud
  • Health endpoint returns 200: curl -sf https://app.arbitex.io/health
  • Login flow works end-to-end
  • TLS certificate is served correctly: echo | openssl s_client -connect app.arbitex.io:443 2>/dev/null | openssl x509 -noout -subject
  • No error spikes in logs: kubectl -n arbitex logs deploy/arbitex-cloud --since=5m | grep -c ERROR
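The log check above can be turned into a pass/fail gate. A sketch; the threshold of 5 ERROR lines in the last 5 minutes is an arbitrary assumption to tune per service:

```shell
# error_spike: true when the log text contains more than $2 ERROR lines.
error_spike() {
  local count
  count=$(grep -c ERROR <<<"$1" || true)   # grep -c still prints 0 on no match
  [ "$count" -gt "$2" ]
}

# Live usage:
#   logs=$(kubectl -n arbitex logs deploy/arbitex-cloud --since=5m)
#   error_spike "$logs" 5 && echo "error spike detected, consider rollback" >&2
```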

Rollback

If issues are detected post-deploy:

```shell
# If --atomic was used, a failed upgrade has already been rolled back.
# Otherwise, roll back to the immediately previous revision:
helm rollback arbitex-cloud --namespace arbitex

# Or inspect history and roll back to a specific revision:
helm history arbitex-cloud --namespace arbitex
helm rollback arbitex-cloud <revision> --namespace arbitex
```