# Content Categories
import { Aside, Badge } from '@astrojs/starlight/components';
Content Categories give Arbitex policy rules semantic context about the subject matter of an AI conversation. Where DLP rules detect what kind of data is present (PII, credentials, financial account numbers), Content Categories detect what topic or domain the conversation is about (medical advice, legal strategy, financial analysis).
Combining DLP findings with category context enables richer enforcement:
```text
IF category = "medical/clinical-advice" AND user.group = "general-employees"
THEN action = block, reason = "Medical advice outside approved personas"

IF category = "financial/investment-research" AND dlp.finding = "material-nonpublic"
THEN action = block + alert, reason = "Potential insider trading risk"
```

## Category Taxonomy

The Arbitex content category system organizes topics into 8 top-level domains with 25 subcategories. Classification is performed by the Tier 3 DeBERTa model (the same model used for DLP classification).
### Domain Overview

| Domain | Subcategories | Primary Use Cases |
|---|---|---|
| `medical` | 4 | Clinical advice, drug information, mental health, public health |
| `legal` | 3 | Contract review, litigation strategy, regulatory compliance |
| `financial` | 4 | Investment research, M&A analysis, trading signals, tax advice |
| `hr` | 3 | Compensation data, performance reviews, hiring decisions |
| `security` | 3 | Vulnerability research, exploit development, security tooling |
| `political` | 3 | Electoral content, lobbying, policy advocacy |
| `personal` | 3 | Relationship advice, personal health, religious/spiritual topics |
| `technical` | 2 | Code generation, architecture design |
### Full Taxonomy Reference

#### medical (4 subcategories)

| Subcategory ID | Label | Description |
|---|---|---|
| `medical/clinical-advice` | Clinical Advice | Diagnosis, treatment recommendations, clinical decision support |
| `medical/drug-information` | Drug Information | Medication dosing, interactions, pharmacology |
| `medical/mental-health` | Mental Health | Therapy, psychiatric conditions, crisis support |
| `medical/public-health` | Public Health | Epidemiology, vaccination, disease surveillance |

#### legal (3 subcategories)

| Subcategory ID | Label | Description |
|---|---|---|
| `legal/contract-review` | Contract Review | Contract drafting, clause analysis, legal obligations |
| `legal/litigation` | Litigation Strategy | Case strategy, discovery, settlement negotiations |
| `legal/regulatory` | Regulatory Compliance | Regulatory requirements, filings, compliance programs |

#### financial (4 subcategories)

| Subcategory ID | Label | Description |
|---|---|---|
| `financial/investment-research` | Investment Research | Equity analysis, market research, investment recommendations |
| `financial/ma-analysis` | M&A Analysis | Deal structuring, due diligence, valuation |
| `financial/trading` | Trading Signals | Market timing, trading strategies, price predictions |
| `financial/tax` | Tax Advice | Tax planning, filing strategy, cross-border tax |

#### hr (3 subcategories)

| Subcategory ID | Label | Description |
|---|---|---|
| `hr/compensation` | Compensation | Salary bands, bonus structure, equity plans |
| `hr/performance` | Performance Reviews | Employee evaluations, PIPs, termination decisions |
| `hr/hiring` | Hiring Decisions | Candidate screening, interview scoring, offer decisions |

#### security (3 subcategories)

| Subcategory ID | Label | Description |
|---|---|---|
| `security/vulnerability-research` | Vulnerability Research | CVE analysis, security research, threat modeling |
| `security/exploit-development` | Exploit Development | Exploit PoC, attack tooling, offensive security |
| `security/security-tooling` | Security Tooling | Pen testing tools, SIEM queries, detection engineering |

#### political (3 subcategories)

| Subcategory ID | Label | Description |
|---|---|---|
| `political/electoral` | Electoral Content | Voting, candidates, election administration |
| `political/lobbying` | Lobbying | Government relations, advocacy campaigns |
| `political/policy` | Policy Advocacy | Policy positions, regulatory advocacy |

#### personal (3 subcategories)

| Subcategory ID | Label | Description |
|---|---|---|
| `personal/relationships` | Relationships | Romantic advice, family dynamics, interpersonal conflict |
| `personal/health` | Personal Health | Non-clinical health topics, wellness, fitness |
| `personal/spiritual` | Religious / Spiritual | Religious practices, spiritual counseling |

#### technical (2 subcategories)

| Subcategory ID | Label | Description |
|---|---|---|
| `technical/code-generation` | Code Generation | Writing, reviewing, or debugging code |
| `technical/architecture` | Architecture Design | System design, infrastructure planning |
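
An `equals` condition whose `value:` is misspelled never matches a real classification, so it can help to check policy files against the taxonomy before deploying them. The following is a minimal sketch, assuming you want to lint rules offline: it transcribes the tables above into a Python dict. `TAXONOMY`, `SUBCATEGORY_IDS`, and `is_valid_category_value` are hypothetical names used for illustration; they are not part of any Arbitex SDK.

```python
from typing import Dict, List, Set

# Arbitex content category taxonomy, transcribed from the tables above.
# Hypothetical helper data for linting policy values; not an Arbitex API.
TAXONOMY: Dict[str, List[str]] = {
    "medical": ["clinical-advice", "drug-information", "mental-health", "public-health"],
    "legal": ["contract-review", "litigation", "regulatory"],
    "financial": ["investment-research", "ma-analysis", "trading", "tax"],
    "hr": ["compensation", "performance", "hiring"],
    "security": ["vulnerability-research", "exploit-development", "security-tooling"],
    "political": ["electoral", "lobbying", "policy"],
    "personal": ["relationships", "health", "spiritual"],
    "technical": ["code-generation", "architecture"],
}

# Flatten into the full "domain/subcategory" IDs that policy rules use.
SUBCATEGORY_IDS: Set[str] = {
    f"{domain}/{sub}" for domain, subs in TAXONOMY.items() for sub in subs
}


def is_valid_category_value(value: str) -> bool:
    """True if value is a known domain or a known domain/subcategory ID."""
    return value in TAXONOMY or value in SUBCATEGORY_IDS


if __name__ == "__main__":
    print(len(TAXONOMY), len(SUBCATEGORY_IDS))  # 8 25
    for value in ["medical", "financial/ma-analysis", "medical/surgery"]:
        print(value, is_valid_category_value(value))
```

The counts match the 8 domains and 25 subcategories stated at the top of this section. Note that a bare domain such as `"medical"` is accepted because domain-level conditions (`category_domain`, `starts_with`) take a domain value.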
## Category Hierarchy

Categories follow a two-level `domain/subcategory` hierarchy, and policy rules can match at either level:

- Domain-level match: `category.startsWith("medical")` matches all medical subcategories.
- Subcategory match: `category == "medical/clinical-advice"` matches the exact subcategory only.
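The `startsWith` form behaves like a domain match only because no current domain name is a prefix of another. The sketch below shows both match levels in Python. Its domain check compares on the `/` boundary, so a prefix check cannot spill into a differently named domain. The function names are hypothetical, chosen for illustration only, and are not Policy Engine internals.

```python
def category_domain(category: str) -> str:
    """Domain portion of a "domain/subcategory" ID, e.g. "medical"."""
    return category.split("/", 1)[0]


def matches_domain(category: str, domain: str) -> bool:
    """Domain-level match: the domain itself or any of its subcategories.

    Checking for domain + "/" (rather than a bare prefix) means a
    hypothetical domain such as "medicalx" would not match "medical".
    """
    return category == domain or category.startswith(domain + "/")


def matches_subcategory(category: str, subcategory: str) -> bool:
    """Exact subcategory match."""
    return category == subcategory


if __name__ == "__main__":
    cat = "medical/mental-health"
    print(category_domain(cat))                                 # medical
    print(matches_domain(cat, "medical"))                       # True
    print(matches_subcategory(cat, "medical/clinical-advice"))  # False
```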
```yaml
# Policy rule using domain-level match
- name: "Block all medical advice for general employees"
  conditions:
    - field: content.category
      operator: starts_with
      value: "medical"
    - field: user.groups
      operator: not_contains
      value: "approved-medical-ai-users"
  action: block
  reason: "Medical AI use restricted to approved users"
```
```yaml
# Policy rule using exact subcategory match
- name: "Flag exploit development content"
  conditions:
    - field: content.category
      operator: equals
      value: "security/exploit-development"
  action: flag
  severity: high
  reason: "Potential offensive security use"
```

## Category Confidence Score
Each classification result includes a confidence score (0.0–1.0). Policy conditions can filter by minimum confidence to reduce false positives:
```yaml
conditions:
  - field: content.category
    operator: equals
    value: "financial/investment-research"
  - field: content.category_confidence
    operator: gte
    value: 0.75
```

The default confidence threshold for category-based policy conditions is 0.70. Conversations below the threshold are treated as uncategorized.
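The default threshold and a rule's own `gte` condition both filter on confidence. The sketch below is one reading of the paragraph above, not documented engine behavior: it assumes the org-level 0.70 threshold is applied first, so a rule can raise the bar but cannot lower it. `effective_category` and `rule_matches` are hypothetical names.

```python
from typing import Optional

# Documented default threshold for category-based policy conditions.
DEFAULT_THRESHOLD = 0.70


def effective_category(top_category: str, confidence: float,
                       threshold: float = DEFAULT_THRESHOLD) -> Optional[str]:
    """Return the top category, or None if the result is treated as uncategorized."""
    return top_category if confidence >= threshold else None


def rule_matches(top_category: str, confidence: float,
                 wanted: str, min_confidence: float) -> bool:
    """An `equals` condition ANDed with a `gte` confidence condition.

    Assumption: the default threshold is applied first, so a rule-level
    min_confidence below 0.70 cannot rescue an uncategorized result.
    """
    category = effective_category(top_category, confidence)
    return category == wanted and confidence >= min_confidence


if __name__ == "__main__":
    cat = "financial/investment-research"
    print(rule_matches(cat, 0.89, cat, 0.75))  # True: passes both thresholds
    print(rule_matches(cat, 0.72, cat, 0.75))  # False: categorized, but below the rule's 0.75
    print(rule_matches(cat, 0.65, cat, 0.60))  # False: below 0.70, so uncategorized
```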
## Policy Rule Conditions Using Categories

When the `enable_content_categories` feature flag is on, the following condition fields become available in the Policy Engine rule builder:
| Field | Type | Description |
|---|---|---|
| `content.category` | string | Top predicted category (e.g., `"medical/clinical-advice"`) |
| `content.category_domain` | string | Domain portion only (e.g., `"medical"`) |
| `content.category_confidence` | float | Classification confidence score (0.0–1.0) |
| `content.category_scores` | `map<string,float>` | Scores for all categories above a 0.05 threshold |
| `content.is_uncategorized` | bool | `true` if the top score is below the configured threshold |
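
This minimal Python sketch shows how the five fields could be derived from a raw map of per-category scores. The input shape and the `condition_fields` helper are assumptions for illustration, not the Arbitex implementation. The sample scores match the example under Audit Events.

```python
from typing import Dict, Union

SCORE_FLOOR = 0.05        # category_scores only keeps scores above this value
DEFAULT_THRESHOLD = 0.70  # a top score below this sets is_uncategorized

FieldValue = Union[str, float, bool, Dict[str, float]]


def condition_fields(scores: Dict[str, float],
                     threshold: float = DEFAULT_THRESHOLD) -> Dict[str, FieldValue]:
    """Derive the content.* policy condition fields from raw category scores.

    Hypothetical helper: assumes `scores` is a non-empty map from
    "domain/subcategory" IDs to classifier confidence.
    """
    # Highest-scoring category wins; max() keeps the first one on ties.
    top_category, top_score = max(scores.items(), key=lambda item: item[1])
    return {
        "content.category": top_category,
        "content.category_domain": top_category.split("/", 1)[0],
        "content.category_confidence": top_score,
        "content.category_scores": {
            cat: score for cat, score in scores.items() if score > SCORE_FLOOR
        },
        "content.is_uncategorized": top_score < threshold,
    }


if __name__ == "__main__":
    fields = condition_fields({
        "financial/investment-research": 0.89,
        "financial/trading": 0.06,
        "financial/ma-analysis": 0.03,
    })
    for name, value in fields.items():
        print(name, value)
```

With these inputs, `financial/ma-analysis` (0.03) is dropped from `content.category_scores` because it falls below the 0.05 floor.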
### Supported Operators

| Operator | Applicable Fields | Description |
|---|---|---|
| `equals` | `category`, `category_domain` | Exact match |
| `not_equals` | `category`, `category_domain` | Exclude a specific category |
| `starts_with` | `category` | Domain-level match (e.g., `"medical"`) |
| `in` | `category`, `category_domain` | Match any value in a list |
| `not_in` | `category`, `category_domain` | Exclude every value in a list |
| `gte` / `lte` | `category_confidence` | Confidence threshold filter |
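
The small evaluator sketch below makes the operator semantics concrete. It assumes, without confirmation from the Policy Engine, that a rule's conditions are combined with AND (as every example on this page reads) and that a condition on a missing field never matches. `OPERATORS`, `condition_matches`, and `all_conditions_match` are hypothetical names.

```python
from typing import Any, Callable, Dict, List

# Hypothetical operator table mirroring the "Supported Operators" list above.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "equals": lambda actual, value: actual == value,
    "not_equals": lambda actual, value: actual != value,
    "starts_with": lambda actual, value: isinstance(actual, str) and actual.startswith(value),
    "in": lambda actual, value: actual in value,
    "not_in": lambda actual, value: actual not in value,
    "gte": lambda actual, value: actual >= value,
    "lte": lambda actual, value: actual <= value,
}


def condition_matches(fields: Dict[str, Any], condition: Dict[str, Any]) -> bool:
    """Evaluate one {field, operator, value} condition against the field map."""
    actual = fields.get(condition["field"])
    if actual is None:
        return False  # assumption: a missing field never matches
    return OPERATORS[condition["operator"]](actual, condition["value"])


def all_conditions_match(fields: Dict[str, Any], conditions: List[Dict[str, Any]]) -> bool:
    """Assumption: a rule's conditions are combined with AND."""
    return all(condition_matches(fields, c) for c in conditions)


if __name__ == "__main__":
    fields = {
        "content.category": "financial/investment-research",
        "content.category_domain": "financial",
        "content.category_confidence": 0.89,
    }
    # The confidence-filtered condition pair from the Category Confidence Score section.
    conditions = [
        {"field": "content.category", "operator": "equals",
         "value": "financial/investment-research"},
        {"field": "content.category_confidence", "operator": "gte", "value": 0.75},
    ]
    print(all_conditions_match(fields, conditions))  # True
```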
## Example Use Cases

### Use Case 1: Restrict Medical Content by Role

Scenario: A healthcare company deploys Arbitex as a productivity assistant. Clinical staff are allowed to use the medical AI persona; general employees are not.
```yaml
rules:
  - name: "Medical content — general employees blocked"
    priority: 100
    conditions:
      - field: content.category_domain
        operator: equals
        value: "medical"
      - field: user.groups
        operator: not_contains
        value: "clinical-staff"
    action: block
    message: "Medical AI assistance is available to clinical staff only. Contact your administrator to request access."

  - name: "Clinical advice — require step-up auth"
    priority: 90
    conditions:
      - field: content.category
        operator: equals
        value: "medical/clinical-advice"
      - field: user.groups
        operator: contains
        value: "clinical-staff"
      - field: user.mfa_verified
        operator: equals
        value: false
    action: require_mfa
    reason: "Clinical advice requires MFA step-up"
```

### Use Case 2: Financial Content with DLP Combination
Scenario: A financial services firm wants to prevent AI use for investment research when material non-public information (MNPI) is detected in the conversation.
```yaml
rules:
  - name: "Investment research + MNPI — block and alert"
    priority: 200
    conditions:
      - field: content.category
        operator: equals
        value: "financial/investment-research"
      - field: dlp.findings
        operator: contains_type
        value: "material-nonpublic-information"
    action: block
    alert:
      severity: critical
      channels: ["compliance-team", "legal-team"]
    reason: "Potential MNPI in investment research context"

  - name: "M&A analysis — restrict to deal team"
    priority: 150
    conditions:
      - field: content.category
        operator: equals
        value: "financial/ma-analysis"
      - field: user.groups
        operator: not_contains
        value: "deal-team"
    action: block
    reason: "M&A AI analysis restricted to active deal team members"
```

### Use Case 3: Allow Technical Content, Restrict Exploit Development
Scenario: A technology company wants to allow general code generation but flag exploit development content from users outside the security research team.
```yaml
rules:
  - name: "Allow code generation broadly"
    priority: 50
    conditions:
      - field: content.category
        operator: equals
        value: "technical/code-generation"
    action: allow

  - name: "Flag exploit development"
    priority: 300
    conditions:
      - field: content.category
        operator: equals
        value: "security/exploit-development"
      - field: user.groups
        operator: not_contains
        value: "security-research-team"
    action: flag
    severity: high
    reason: "Exploit development content outside security research team"
```

## Category Classification Performance
The DeBERTa Tier 3 model provides category classification with the following performance characteristics, measured on an internal evaluation corpus:
| Metric | Value | Notes |
|---|---|---|
| Top-1 accuracy | ~87% | Correct subcategory predicted |
| Top-3 accuracy | ~96% | Correct category in top 3 predictions |
| Median latency | 45–80 ms | GPU inference, p50 |
| P99 latency | ~200 ms | Under normal load |
| False positive rate | ~4% | At default 0.70 threshold |
| Uncategorized rate | ~12% | General/conversational content |
## Enabling Content Categories

### Feature Flag

Content Categories are gated behind the `enable_content_categories` org-level feature flag:
```bash
# Enable for your organization (admin API)
curl -X PUT https://api.arbitex.example.com/api/admin/org/feature-flags \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"enable_content_categories": true}'
```

### Infrastructure Requirements
- The DeBERTa Tier 3 service must be running (GPU node required).
- Minimum GPU memory: 10 GB VRAM (A10G or equivalent).
- The DeBERTa model must be trained or fine-tuned for the Arbitex category taxonomy (model ID: `arbitex-deberta-v3-categories-v1`).
See Kubernetes Deployment for GPU node pool configuration.
## Audit Events

Category classification decisions are recorded in the audit log with the following fields:
```json
{
  "event_type": "content_category_classified",
  "request_id": "req_abc123",
  "category": "financial/investment-research",
  "category_confidence": 0.89,
  "category_scores": {
    "financial/investment-research": 0.89,
    "financial/trading": 0.06,
    "financial/ma-analysis": 0.03
  },
  "policy_rule_triggered": "investment-research-restrict",
  "action_taken": "block"
}
```

## Roadmap
The following capabilities are planned for the Content Categories GA release:
| Feature | Status | Target |
|---|---|---|
| 8-domain / 25-subcategory taxonomy | In development | Q2 2026 |
| Policy Engine integration | In development | Q2 2026 |
| Custom category fine-tuning | Planned | Q3 2026 |
| Category-level audit dashboard | Planned | Q3 2026 |
| Per-group category allowlists | Planned | Q3 2026 |
| Multi-label classification (multiple categories per conversation) | Research | Q4 2026 |