Quick start
You're up — point any OpenAI SDK at your base URL.
loading…
Use model="auto" to let the router pick the cheapest capable model.
Recent activity
Last 5 requests routed through this server.
No requests yet.
Once you send your first chat.completions call, it'll show up here.
Hosted fallback Not configured
One key, every model. Standard fallback for any model you don't have a local key for. Free $5 trial credit on sign-up — no credit card.
sk-orca-* key →
New here? Get fully set up in 2 minutes
- Add at least one provider key
- Pick a routing strategy
- Send your first request (it'll appear in analytics)
Hosted fallback Not configured
Cover the long tail without per-provider sign-ups. Free $5 credit, billed at cost after.
Provider keys BYOK
Encrypted at rest with AES-256-GCM. Env vars override DB rows for the same provider.
| Provider | Prefix | Status |
|---|
No provider keys yet
Add at least one to start routing real traffic. Pick a provider above or click a quick-add chip.
Routing strategy
How the router picks between candidate models when you send model="auto".
Pick a card to switch strategy. Saved automatically.
How model="auto" works
Three filters, applied in order.
-
Capability filter.
The router inspects your request — is there an image? a tool definition?
response_format=json? — and drops models that can't handle it. - Provider filter. Only models whose provider you've configured (or that hosted upstream covers) survive.
- Strategy ranking. The remaining candidates are scored by your chosen strategy above. The winner is called.
The chosen model comes back to your client in the x-orca-resolved-model response header. The strategy in effect is echoed as x-orca-routing-strategy.
How each strategy maps to LiteLLM Router
| Strategy | litellm routing_strategy |
model="auto" picks |
|---|---|---|
| balanced | None (we rank ourselves) | 50/50 normalized AA quality & inverted cost; strict two-axis coverage |
| cheapest | cost-based-routing | cheapest capable (blended 0.3 input + 0.7 output cost) |
| fastest | None (we rank ourselves) | 50/50 normalized AA TPS & inverted TTFT; strict two-axis coverage |
| quality | None (we rank ourselves) | highest AA Intelligence Index (or manual override); unscored models rank below scored |
The strategy controls two things: which model model="auto"
resolves to, and how LiteLLM Router picks between deployments
that serve the same model (e.g. local OpenAI key + hosted upstream).
Spend by model
—
Latency by provider
p50 and p99 — sourced from local request logs.
| Provider | Requests | p50 | p99 |
|---|
Recent requests
Newest first. Click a row to copy its trace ID.
| When | Model | Provider | Tokens (in / out) | Latency | Status |
|---|
No traffic yet
Once you start sending requests, they'll appear here in real time.
API keys
Each key authenticates against this Lite workspace. Plaintext is shown once on creation.
| Name | Prefix | Status | Last used |
|---|
Set up quality scoring
Strategy quality currently picks the most expensive model — a proxy that broke when newer flagships (Claude Opus 4.7, GPT-5.x) shipped at lower prices than older ones. Set an Artificial Analysis API key to route by real benchmark scores instead.
- Sign up free at artificialanalysis.ai and generate an API key (free tier: 1,000 req/day, plenty for 1h-cached usage).
- Add it to your
.envasARTIFICIAL_ANALYSIS_API_KEY=...and restart. - Reload this page — scores will appear automatically.
Without a key, quality falls back to the legacy cost-based behavior. cheapest / balanced / fastest are unaffected. You can still set manual overrides on individual models in the table below — those work without an AA key and take precedence when present.
Quality scores
Loading…
model="auto" would resolve to:
Models
Edit a row's Manual column to override the AA score for routing decisions. Manual values win over AA. Use this when your own evals disagree, or when AA hasn't scored a model yet.
| Model | Provider | AA | Manual | Effective | TPS | TTFT | $/M blended | Status |
|---|
Powered by Artificial Analysis — Intelligence Index aggregates MMLU-Pro, GPQA, MATH, HumanEval, and other benchmarks. Attribution required.