
Epic M — deployment architecture overview

Epic M Phase A (MA1–MA6) delivers the production container and Kubernetes infrastructure for the Arbitex Platform. This document describes the deployment architecture: how the services are containerized, how the Helm chart deploys them to AKS, how the CI/CD pipeline builds and pushes images, what security controls are in place, and where to find the operational runbooks.

This is an architecture overview. Step-by-step operational procedures are in the ops runbooks linked at the end of this document.


| Sub-phase | Scope |
|---|---|
| MA1 | Multi-stage Dockerfiles for all four services (backend, frontend, NER GPU, DeBERTa validator) |
| MA2 | Helm chart (deploy/helm/arbitex-platform/) for AKS deployment — Deployments, Services, Ingress, PVC, HPA, PodDisruptionBudget |
| MA3 | GitHub Actions CI/CD pipeline: lint → test → Trivy vulnerability scan → build → ACR push |
| MA4 | Security hardening: non-root containers, read-only root filesystems, Azure Key Vault secrets, HSTS, mTLS CA verification |
| MA5 | NGINX Ingress configuration: request body limits, SSE long-poll support, TLS termination |
| MA6 | Ops runbooks: Alembic rollback, Docker→AKS migration, incident response playbook |

Four containers run as separate Kubernetes Deployments. All are built with multi-stage Dockerfiles pinned to digest-verified base images in production.

| Service | Image name | Internal port | Base image |
|---|---|---|---|
| FastAPI backend | platform-api | 8000 | python:3.12.8-slim |
| React/Nginx frontend | platform-frontend | 8080 | node:20.18-alpine (build) / nginx:1.27.3-alpine (runtime) |
| NER GPU microservice (GLiNER) | ner-gpu | 8200 | GPU-enabled PyTorch image |
| DeBERTa validator microservice | deberta-validator | 8201 | GPU-enabled PyTorch image |

The backend image (platform-api) is built in two stages:

  1. Builder — installs Python dependencies into /install from requirements.txt (production deps only; dev tooling is excluded).
  2. Production — copies the installed packages from the builder stage, creates a non-root user (appuser, UID 1000), exposes port 8000, and runs the docker-entrypoint.sh which executes Alembic migrations then starts uvicorn.

Security properties:

  • PYTHONDONTWRITEBYTECODE=1 (no .pyc files written) and PYTHONUNBUFFERED=1 (unbuffered stdout/stderr for clean container logging).
  • PYTHONPATH=/app to resolve backend.app.* import paths.
  • Healthcheck uses Python stdlib (urllib.request) — no curl installed.
  • Entrypoint uses exec form so Linux signals propagate correctly to uvicorn.
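The two stages and the properties above can be sketched as a minimal Dockerfile. This is an illustrative skeleton, not the repository's actual file: the /healthz path and some build details are assumptions; the entrypoint name, UID, ports, and environment variables come from this document.

```dockerfile
# --- Stage 1: builder --------------------------------------------------
FROM python:3.12.8-slim AS builder
COPY requirements.txt .
# Production dependencies only; dev tooling excluded from the image
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# --- Stage 2: production -----------------------------------------------
FROM python:3.12.8-slim
ENV PYTHONDONTWRITEBYTECODE=1 PYTHONUNBUFFERED=1 PYTHONPATH=/app
COPY --from=builder /install /usr/local
COPY . /app
RUN useradd --uid 1000 --create-home appuser
USER appuser
WORKDIR /app
EXPOSE 8000
# Healthcheck via Python stdlib -- no curl in the image
HEALTHCHECK CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/healthz')"
# Exec form so SIGTERM propagates to uvicorn
ENTRYPOINT ["./docker-entrypoint.sh"]
```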

The frontend image (platform-frontend) is also built in two stages:

  1. Builder — Node 20 Alpine image runs npm ci && npm run build to produce the Vite/React dist bundle.
  2. Production — Nginx 1.27.3 Alpine image serves the static bundle. The default nginx config is replaced with a custom nginx.conf (request size limits, SSE, proxy headers). Runs as nginx user (UID 101).

Healthcheck: wget -qO /dev/null http://localhost:8080/healthz.
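The custom nginx.conf mentioned above might look along these lines. A hedged sketch only: the listen port and /healthz path come from this document, while the document root and SPA fallback are conventional assumptions.

```nginx
# Illustrative sketch, not the shipped nginx.conf
server {
    listen 8080;
    client_max_body_size 50m;

    # Lightweight health endpoint for the container healthcheck
    location /healthz {
        return 200 "ok";
    }

    location / {
        root /usr/share/nginx/html;
        try_files $uri /index.html;   # SPA fallback for client-side routing
    }
}
```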

GLiNER zero-shot NER model (urchade/gliner_medium-v2.1) served over HTTP on port 8200. Runs on GPU node pool (accelerator=nvidia). Circuit breaker: 3 consecutive failures / 60-second reset window.

DeBERTa NLI validator served over HTTP on port 8201. Runs on GPU node pool. Provides /validate endpoint for fine-grained entity type classification. Circuit breaker: 3 consecutive failures / 60-second reset window.
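The circuit-breaker policy shared by both GPU services (open after 3 consecutive failures, probe again after a 60-second reset window) can be sketched in a few lines. This is a simplified in-process sketch; the actual client implementation may differ.

```python
import time


class CircuitBreaker:
    """Open after `threshold` consecutive failures; allow a probe after `reset_s`."""

    def __init__(self, threshold: int = 3, reset_s: float = 60.0):
        self.threshold = threshold
        self.reset_s = reset_s
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # circuit closed: requests pass through
        # Half-open: allow one probe once the reset window has elapsed
        return time.monotonic() - self.opened_at >= self.reset_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()


cb = CircuitBreaker()
for _ in range(3):
    cb.record_failure()
print(cb.allow())  # False: circuit is open after 3 consecutive failures
```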

Note: the Outpost also includes an in-process DeBERTa scanner (see DeBERTa Tier 3 admin guide) that runs independently of this microservice.


Chart location: deploy/helm/arbitex-platform/

deploy/helm/arbitex-platform/
├── Chart.yaml # name: arbitex-platform, version: 0.1.0, appVersion: 0026
├── values.yaml # default values
├── values-prod.yaml # production resource overrides
└── templates/
├── deployment-api.yaml # FastAPI backend Deployment
├── deployment-frontend.yaml # React/Nginx frontend Deployment
├── deployment-ner-gpu.yaml # GLiNER NER Deployment
├── deployment-deberta.yaml # DeBERTa validator Deployment
├── service-api.yaml # ClusterIP :8100
├── service-frontend.yaml # ClusterIP :3100
├── service-ner-gpu.yaml # ClusterIP :8200
├── service-deberta.yaml # ClusterIP :8201
├── ingress.yaml # NGINX Ingress with TLS
├── hpa.yaml # HorizontalPodAutoscaler
├── pdb.yaml # PodDisruptionBudget
├── job-alembic-migrate.yaml # pre-upgrade Helm hook for migrations
└── secret-provider-class.yaml # Azure Key Vault CSI SecretProviderClass
Values are layered in increasing precedence:

  1. values.yaml (base defaults)
  2. values-prod.yaml (production resource overrides)
  3. --set ... (CI/CD runtime overrides: image tags, digests)
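The layering above corresponds to an invocation along these lines. The release name and namespace are taken from the rollback example later in this document; the tag value is illustrative.

```shell
helm upgrade --install arbitex-platform deploy/helm/arbitex-platform \
  --namespace arbitex \
  -f deploy/helm/arbitex-platform/values-prod.yaml \
  --set api.image.tag=v0.29.0
```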
| Service | Cluster DNS | Port |
|---|---|---|
| Backend API | arbitex-platform-api | 8100 → container 8000 |
| Frontend | arbitex-platform-frontend | 3100 → container 8080 |
| NER GPU | arbitex-platform-ner-gpu | 8200 |
| DeBERTa | arbitex-platform-deberta | 8201 |

When alembic.runAsInitContainer: true (the default), every API pod rollout includes an alembic-migrate init container that runs python -m alembic upgrade head before the main container starts. If the migration fails, the pod never reaches Ready state and Kubernetes blocks the rollout.
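The init container wiring in deployment-api.yaml might look roughly like this. A simplified sketch: the template expressions and secret reference are assumptions consistent with the chart layout described in this document.

```yaml
# Sketch of the alembic-migrate init container (simplified, illustrative)
initContainers:
  - name: alembic-migrate
    image: "{{ .Values.api.image.repository }}:{{ .Values.api.image.tag }}"
    command: ["python", "-m", "alembic", "upgrade", "head"]
    envFrom:
      - secretRef:
          name: "{{ .Release.Name }}-api-secrets"   # provides DATABASE_URL
```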

A pre-upgrade Helm hook Job also runs before each helm upgrade (backoffLimit: 3, ttlSecondsAfterFinished: 300).

NER GPU and DeBERTa pods require GPU nodes. The chart sets nodeSelector: { accelerator: nvidia } and tolerations for nvidia.com/gpu: NoSchedule on those Deployments. The NVIDIA device plugin DaemonSet must be installed in the cluster (kube-system namespace) for GPU resource scheduling.
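In pod-spec terms, the GPU scheduling described above looks roughly like this; the nvidia.com/gpu resource limit is an assumption (it is the usual mechanism the device plugin exposes), the selector and toleration come from this document.

```yaml
nodeSelector:
  accelerator: nvidia
tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
resources:
  limits:
    nvidia.com/gpu: 1   # scheduled via the NVIDIA device plugin DaemonSet
```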


The pipeline runs on GitHub Actions. The trigger is a push to main or a release tag. Stages run in order; the pipeline halts on any failure.

  1. Lint (ruff, eslint)
  2. Unit tests (pytest, vitest)
  3. Trivy vulnerability scan (CRITICAL/HIGH findings fail the build)
  4. Docker build (multi-stage, base image digest pinned)
  5. ACR push (four images: platform-api, platform-frontend, ner-gpu, deberta-validator)
  6. Helm upgrade (--atomic --wait, 10-minute timeout)
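An abbreviated workflow sketch of this staging, showing one image for brevity. Job names, file paths, and the $ACR/$TAG variables are illustrative, not the repository's actual workflow file.

```yaml
on:
  push:
    branches: [main]
    tags: ["v*"]

jobs:
  lint-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ruff check backend
      - run: pytest
  scan-build-push:
    needs: lint-test            # halts the pipeline if lint or tests fail
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t "$ACR/platform-api:$TAG" -f backend/Dockerfile .
      - run: trivy image --severity CRITICAL,HIGH --exit-code 1 "$ACR/platform-api:$TAG"
      - run: docker push "$ACR/platform-api:$TAG"
```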

Each build produces two tags per image:

  • <acr>.azurecr.io/platform-api:<semver> — immutable release tag (e.g. v0.29.0)
  • <acr>.azurecr.io/platform-api:latest — mutable floating tag

Production Helm deployments pin by digest (--set api.image.digest=sha256:<digest>) to prevent tag-overwrite incidents.
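Resolving a tag to its digest and pinning the release could look like this; the az acr query is a standard way to fetch the digest, and the registry and tag names are illustrative.

```shell
# Resolve the release tag to an immutable digest, then pin the deployment
DIGEST=$(az acr repository show --name <acr> \
  --image platform-api:v0.29.0 --query digest -o tsv)
helm upgrade arbitex-platform deploy/helm/arbitex-platform \
  --atomic --wait --timeout 10m \
  --set api.image.digest="$DIGEST"
```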

Trivy scans all four images for OS package and Python dependency CVEs. CRITICAL and HIGH severity findings that do not have a --ignore-unfixed exemption fail the build. The scan report is uploaded as a GitHub Actions artifact.
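A scan gate matching this policy would look roughly like the following (flags per the Trivy CLI; the image reference is illustrative):

```shell
# Fail the build (exit code 1) on unexempted CRITICAL/HIGH findings
trivy image --severity CRITICAL,HIGH --exit-code 1 --ignore-unfixed \
  <acr>.azurecr.io/platform-api:v0.29.0
```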


The Ingress controller is NGINX (ingress-nginx/ingress-nginx Helm chart). The Arbitex Ingress resource routes api.arbitex.ai to the backend and frontend services.

The backend API accepts large file uploads and multi-turn conversation requests. The NGINX configuration sets:

```nginx
client_max_body_size 50m;
proxy_request_buffering off;
```

This is set via the Ingress annotation nginx.ingress.kubernetes.io/proxy-body-size: 50m in templates/ingress.yaml.
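In ingress.yaml this pair of directives maps to the standard ingress-nginx annotations:

```yaml
metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    nginx.ingress.kubernetes.io/proxy-request-buffering: "off"
```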

The chat completion endpoint streams responses as Server-Sent Events. NGINX is configured to disable proxy buffering for SSE routes:

```nginx
proxy_buffering off;
proxy_cache off;
```

The backend also sets the X-Accel-Buffering: no response header on SSE responses, which tells NGINX to keep buffering disabled for that response.

This is applied via annotation on the Ingress resource.
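The corresponding ingress-nginx annotations might look like this; the read-timeout value is an assumption made to support long-lived SSE connections, not a figure from this document.

```yaml
metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-buffering: "off"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"   # keep SSE streams open
```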

TLS terminates at the Ingress controller. Certificates are issued by cert-manager using a Let’s Encrypt ClusterIssuer (letsencrypt-prod). The Ingress references a TLS secret (arbitex-tls) that cert-manager populates.


Internal mTLS is used for Outpost-to-Platform communication. The Platform backend validates the Outpost’s client certificate against the Platform CA. Certificate paths are mounted as Kubernetes Secrets.

Sensitive environment variables (DATABASE_URL, SECRET_KEY, AUDIT_HMAC_KEY, REDIS_URL, POLICY_SIGNING_KEY) are stored in Azure Key Vault and surfaced to pods via one of two methods:

| Method | When to use |
|---|---|
| Kubernetes Secret (manual) | Simpler setup; requires manual rotation |
| Key Vault CSI driver (SecretProviderClass) | Recommended for production; automatic rotation via AKS managed identity |

The Helm template expects a Kubernetes Secret named <release-name>-api-secrets with at minimum DATABASE_URL and SECRET_KEY keys.
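With the manual method, creating that Secret could look like the following. The release name arbitex-platform is assumed from the rollback example later in this document; the values are placeholders.

```shell
kubectl create secret generic arbitex-platform-api-secrets \
  --namespace arbitex \
  --from-literal=DATABASE_URL='postgresql://...' \
  --from-literal=SECRET_KEY='...'
```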

All containers run with these security context settings:

```yaml
securityContext:
  runAsNonRoot: true
  runAsUser: 1000              # appuser (backend); 101 (nginx user, frontend)
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
```

The Ingress sets the Strict-Transport-Security response header via annotation:

```yaml
nginx.ingress.kubernetes.io/configuration-snippet: |
  add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
```

SCIM provisioning tokens are stored in Key Vault and rotated by the identity team. The SCIM_BEARER_TOKEN secret in Key Vault is updated out-of-band; the CSI driver re-syncs the Kubernetes Secret on the next sync interval (default: 2 minutes).

Non-GPU pods run under restricted Pod Security Standard. GPU pods (ner-gpu, deberta-validator) require baseline PSS due to device plugin requirements. If GPU and non-GPU pods share the arbitex namespace, set the namespace enforcement level to baseline with a warn=restricted label.
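Using the standard Pod Security Admission labels, that namespace configuration would be:

```shell
kubectl label namespace arbitex \
  pod-security.kubernetes.io/enforce=baseline \
  pod-security.kubernetes.io/warn=restricted \
  --overwrite
```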


The following runbooks are maintained in docs/ops/ of the platform repository:

| Runbook | Location | Use when |
|---|---|---|
| Alembic migration rollback | docs/ops/alembic-rollback.md | A migration breaks the API after deploy (startup errors, 500s, constraint violations) |
| Docker → AKS migration | docs/ops/migration-runbook.md | Migrating an existing Docker Compose deployment to AKS for the first time |
| Incident response | docs/ops/incident-response.md | Kill switch activation, provider failover, DLP bypass, HMAC integrity failure, platform outage |
| Symptom | Action |
|---|---|
| Pods stuck in init container failure after deploy | Alembic rollback runbook (run alembic downgrade -1 as a one-off Job) |
| API CrashLoopBackOff, no migration issues | helm rollback arbitex-platform 0 --namespace arbitex |
| Provider returning 5xx errors | Incident response → Incident 2 (provider failover / circuit breaker) |
| DLP blocking legitimate content | Incident response → Incident 3 (DLP bypass procedure) |
| HMAC audit chain failure | Incident response → Incident 4 (audit chain integrity) |
| Complete platform outage | Incident response → Incident 5 (diagnostic steps + restart sequence) |

The Alembic migration chain as of Epic M Phase A close:

| Revision | Description |
|---|---|
| 051_user_timezone | Current head — user timezone column |
| 050_org_scim_tokens | SCIM token storage for orgs |
| 049_audit_hmac_key_id | HMAC key ID for audit chain versioning |

The next migration will be 052_*. The full revision chain is documented in docs/ops/alembic-rollback.md, Section 8.