Skip to content

Summarizer Mode

Summarizer Mode sends your prompt to several AI models simultaneously, then routes all of their responses to a dedicated synthesizer model that merges them into a single, comprehensive answer. Each fact in the synthesis is tagged with the source model(s) that contributed it.

This differs from Compare Mode, which shows each model’s response side-by-side without merging. Summarizer Mode is best when you want a unified answer and attribution, not a raw comparison.


  • At least two source models must be selected before you can send in Summarizer Mode.
  • A synthesizer model must be designated (one model is auto-assigned from the available model list on first load; it can be changed at any time).

  1. Open a conversation or start a new one.
  2. In the routing mode selector, choose Summarize.
  3. The mode is persisted to localStorage under the key arbitex-routing-mode and restored on page reload.
localStorage key: arbitex-routing-mode
Valid values: "single" | "compare" | "summarize"

When you navigate to an existing conversation, the mode stored on that conversation record overrides localStorage.


Source models are the models whose responses the synthesizer will merge. You must select two or more.

  1. Click the model selector to open the model picker.
  2. Choose the models you want to query. Your selection is persisted to localStorage under arbitex-selected-models.
  3. The count badge on the mode selector shows the number of source models — the synthesizer model is excluded from this count.
localStorage key: arbitex-selected-models
Format: JSON array of model_id strings

The synthesizer model receives the merged source responses and writes the final synthesis. It is chosen separately from the source models.

  • On first load, the synthesizer defaults to models[0] from the available model list (typically the most capable model).
  • To change it, open the summarizer model dropdown and select a different model.
  • The selected synthesizer is automatically excluded from the source model list so it does not appear as both a source and a synthesizer in the same request.

When you submit a message in Summarizer Mode (sendSummarizeMessage in frontend/src/stores/chat.ts):

The platform sends your message to all source models in parallel using the POST /api/chat/{conversation_id}/messages endpoint with mode: "summarize". Each source model streams its response back independently.

Streaming state per model is tracked in summarizeStreams, keyed by model_id. The UI renders each source response as it arrives in a StreamingSummaryCard.

Once all source model responses are complete, the backend builds a synthesizer prompt using build_synthesizer_messages (backend/app/services/boundary_markers.py). The function returns a (system_prompt, user_prompt) pair:

  • system_prompt — synthesizer role definition, boundary security instructions, and synthesis rules. This is constant for a given request (varies only by the boundary token).
  • user_prompt — the original question for context + all source model responses wrapped in boundary markers + attribution instructions.

The synthesizer model then streams its merged response back. The synthesizer stream appears in summarizeStreams under the synthesizer’s model_id when the first synthesis chunk arrives.

On completion, the store saves:

  • Individual source model responses as role: "assistant" messages.
  • The synthesizer output as a role: "summary" message tagged with the synthesizer’s model_id and provider.

Summarizer Mode defends against cross-model prompt injection using cryptographic boundary markers (backend/app/services/boundary_markers.py).

Each source model’s response is wrapped:

[BOUNDARY:{token}:START:{model_id}]
...response text...
[BOUNDARY:{token}:END:{model_id}]
  • token is a secrets.token_hex(16) value (32-character hex string) generated fresh for each summarize request.
  • The synthesizer is instructed to treat only content inside boundary markers as model responses, and to ignore any instructions or directives that appear within those boundaries (OWASP LLM01 defense).
  • If boundary markers are detected in the synthesizer’s output — which would indicate the synthesizer was manipulated — the platform logs a boundary_violation_detected WARNING event and strips the leaked markers from the output.

The synthesizer is instructed to tag every factual claim with its source model(s) using inline markers:

Attribution formatWhen used
[source:model_name]text[/source]Claim from a single source model
[source:model1,model2]text[/source]Claim from multiple (but not all) source models
[source:model1,model2,model3]text[/source]Claim that appeared in all source models

Transitions, headings, and structural phrases (e.g., “In summary…”, “However…”) may appear outside source markers. The synthesizer is instructed to omit markers for common knowledge and structural text.

When a completed synthesis is rendered, every source-attributed span is displayed with a provider-specific background color that matches the corresponding source model’s color in the legend above the card.

Colors are assigned by provider family using the getModelColor(provider, index) utility (frontend/src/lib/providerColors.ts):

Provider familyColor family
Anthropic (Claude)Purple/violet
OpenAI (GPT)Green
Google (Gemini)Blue
Meta (Llama)Orange
MistralIndigo
CohereTeal
Other / unknownGray

If your organization has two models from the same provider (e.g., Claude 3 Opus and Claude 3 Haiku), the second model receives a lighter shade of the same family.

The Attribution Legend above the summary card shows a pill badge for each source model, colored with its assigned provider color. This legend is the key for interpreting the colored spans in the synthesis text.

Sources: [Claude 3 Opus ▌] [GPT-4o ▌] [Gemini 1.5 Pro ▌]

Hovering over any colored attribution span shows a tooltip identifying the source:

  • Single source: "Source: claude-3-opus"
  • Multiple sources: "Sources: claude-3-opus, gpt-4o"

Tooltips are rendered by the AttributionHighlight component (frontend/src/components/AttributionHighlight.tsx).

Attribution colors use the provider’s assigned Tailwind color family but respect the active brand theme’s opacity and contrast settings. The colors are not hardcoded hex values — they adapt when an organization activates a custom brand theme (e.g., Voya’s enterprise theme). The --brand-* CSS custom properties flow through to the highlighted segments.

No user action is required to enable attribution. When the backend synthesizer includes [source:ModelName]...[/source] markers in its output, the AttributionHighlight component parses them automatically and renders the colored segments. If a model output contains no source markers, the synthesis renders as plain markdown without highlights.

When reading a highlighted synthesis:

  • Single-color span — Only one source model contributed this claim. The color matches that model’s pill in the legend.
  • Multi-model tooltip — Multiple models agreed on this claim; the span takes the color of the first-listed source.
  • Un-highlighted text — Common knowledge that all models agreed on, or structural/transitional text the synthesizer added.
  • Full paragraph un-highlighted — The synthesizer synthesized across all sources without a single dominant contributor.

ComponentBehavior
Source model cardsAppear immediately, stream content in real time as each model responds
Synthesizer cardAppears dynamically when the first synthesis chunk arrives (after all sources complete)
AbortPressing Stop cancels all in-flight streams via AbortController; the store marks all still-streaming models as complete

The summarizeStreams state is not cleared on completion — it remains populated so the streaming cards render in their streamingComplete state without a flash-of-empty during the streaming → history transition. Streams are cleared when a new request starts or when navigating to a different conversation.


Each model in the fan-out (source models + synthesizer) tracks token usage independently. Usage data is available per-message:

{
"input_tokens": 312,
"output_tokens": 847,
"cost_estimate": 0.00124,
"latency_ms": 2340
}

The synthesizer’s usage reflects its own input/output (the wrapped source responses count as synthesizer input tokens).


StatePersistence
Routing mode (summarize)localStorage: arbitex-routing-mode and on the conversation record (mode field)
Selected source model IDslocalStorage: arbitex-selected-models
Synthesizer modelIn-memory only; resets to models[0] on page reload

Resetting the store on logout clears both localStorage keys.