OrcaRouter Lite — Self-hosted LLM router

Spend (7d)

$—

across 0 requests

Routing savings

$—

vs always-GPT-4o

vs hosted-auto +$—

p50 latency

— ms

p99 — ms

Models available

—

— providers configured

Quick start

You're up — point any OpenAI SDK at your base URL.

loading…

Use model="auto" to let the router pick the cheapest capable model.

Recent activity

Last 5 requests routed through this server.

No requests yet.

Once you send your first chat.completions call, it'll show up here.

Provider keys BYOK

Encrypted at rest with AES-256-GCM. Env vars override DB rows for the same provider.

Provider	Prefix	Status

No provider keys yet

Add at least one to start routing real traffic. Pick a provider above or click a quick-add chip.

Routing strategy

How the router picks between candidate models when you send model="auto".

Balanced

50/50 weighted blend of AA quality & cost. The sane default for most teams.

Recommended

Cheapest

Lowest per-token cost that still meets the request's capabilities.

Fastest

Highest throughput + lowest first-token latency, from Artificial Analysis benchmarks. Great for chat UIs.

Quality

Prefers frontier models. Best for hard reasoning tasks.

Pick a card to switch strategy. Saved automatically.

How `model="auto"` works

Three filters, applied in order.

Capability filter. The router inspects your request — is there an image? a tool definition? response_format=json? — and drops models that can't handle it.
Provider filter. Only models whose provider you've configured (or that hosted upstream covers) survive.
Strategy ranking. The remaining candidates are scored by your chosen strategy above. The winner is called.

The chosen model comes back to your client in the x-orca-resolved-model response header. The strategy in effect is echoed as x-orca-routing-strategy.

How each strategy maps to LiteLLM Router

Strategy	litellm `routing_strategy`	`model="auto"` picks
balanced	`None` (we rank ourselves)	50/50 normalized AA quality & inverted cost; strict two-axis coverage
cheapest	`cost-based-routing`	cheapest capable (blended 0.3 input + 0.7 output cost)
fastest	`None` (we rank ourselves)	50/50 normalized AA TPS & inverted TTFT; strict two-axis coverage
quality	`None` (we rank ourselves)	highest AA Intelligence Index (or manual override); unscored models rank below scored

The strategy controls two things: which model model="auto" resolves to, and how LiteLLM Router picks between deployments that serve the same model (e.g. local OpenAI key + hosted upstream).

Spend by model

—

Latency by provider

p50 and p99 — sourced from local request logs.

Provider	Requests	p50	p99

Recent requests

Newest first. Click a row to copy its trace ID.

When	Model	Provider	Tokens (in / out)	Latency	Status

API keys

Each key authenticates against this Lite workspace. Plaintext is shown once on creation.

Save this key — it won't be shown again.

Name	Prefix	Status	Last used

Models

Edit a row's Manual column to override the AA score for routing decisions. Manual values win over AA. Use this when your own evals disagree, or when AA hasn't scored a model yet.

Model	Provider	AA	Manual	Effective	TPS	TTFT	$/M blended	Status

Powered by Artificial Analysis — Intelligence Index aggregates MMLU-Pro, GPQA, MATH, HumanEval, and other benchmarks. Attribution required.

Welcome back

Overview

Quick start

Recent activity

Hosted fallback Not configured

New here? Get fully set up in 2 minutes

Hosted fallback Not configured

Provider keys BYOK

No provider keys yet

Routing strategy

How `model="auto"` works

Spend by model

Latency by provider

Recent requests

No traffic yet

API keys

Set up quality scoring

Quality scores

Models