Default plot · Quality vs. Speed
The Pareto frontier
Top-right is best: higher output speed and higher quality. Qorinix lanes sit on the frontier, pairing high quality with speeds public references cannot match.
Default plots · Per-metric ranking
Where each model lands on the metric that matters
Sorted bar plots make trade-offs explicit. Qorinix dominates speed, latency, and cost while staying competitive on quality.
Detailed leaderboard
Sortable, filterable leaderboard
Filter by category and sort by any column. Qorinix rows highlight in orange.
| # | Lane / Model | Quality | TTFT p50 (ms) | Total p95 | Output speed (tok/s) | JSON | Success | Cost / 1M out | Cache saving | Value |
|---|---|---|---|---|---|---|---|---|---|---|
Category winners
Best-in-class per workload
Different workloads value different trade-offs. Here are the winners by intent.
Real-time agents
Qorinix 3.1
TTFT under 150 ms, throughput above 230 tok/s — voice agents, gaming NPCs, and trading alerts where every millisecond matters.
Why: lowest TTFT and total latency, with adaptive routing across speed classes.
High-volume support automation
Qorinix 3.2
62% cache saving on repeated queries with quality matching frontier public models, at less than half the cost.
Why: semantic cache + Quality lane keeps unit economics healthy at scale.
Long-form reasoning
ChatGPT (GPT-4o)
Highest reasoning index in the public reference set; pair with Qorinix routing for speed-tiered fallback.
Caveat: 4–5× slower TTFT and ~3.5× the cost per million output tokens.
Cost-sensitive batch
DeepSeek
Cheapest non-Qorinix lane; useful for offline batch where latency does not matter.
Caveat: low cache saving and middle-of-the-pack quality.
Methodology
How the benchmark is computed
Transparency about prompt mix, measurement, and what is held server-side.
1 · Prompt mix
14,200 prompts per day distributed across reasoning (35%), code (25%), JSON / tool-use (20%), creative (15%), and short-form chat (5%). Prompts rotate every 72 hours.
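The mix above is a categorical distribution, so sampling a prompt category is one weighted draw. A minimal sketch, assuming the published shares; the category keys and function name are illustrative, not the benchmark's actual code:

```python
import random

# Shares taken from the published prompt mix; keys are illustrative.
PROMPT_MIX = {
    "reasoning": 0.35,
    "code": 0.25,
    "json_tool_use": 0.20,
    "creative": 0.15,
    "short_chat": 0.05,
}

def sample_category(rng: random.Random) -> str:
    """Draw one prompt category according to the mix weights."""
    categories = list(PROMPT_MIX)
    weights = list(PROMPT_MIX.values())
    return rng.choices(categories, weights=weights, k=1)[0]
```

Over a day of 14,200 draws, each category's observed share converges to its weight, which is what keeps the daily mix stable even as individual prompts rotate.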
2 · Latency measurement
TTFT measured server-side from request receipt to first byte of response. Total latency captured to last token. p50 is the median across the rolling 72-hour window; p95 is the slow-tail.
3 · Quality scoring
Composite of model-graded preference (LLM-as-judge with cross-model rotation), task-deterministic checks (HumanEval-lite for code, JSON-schema validation for tools), and reading-level coherence.
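A composite of three signals is typically a weighted blend on a common scale. A minimal sketch, assuming each signal is normalized to 0-1; the weights here are assumptions for illustration, not the published formula:

```python
def composite_quality(judge, checks, coherence, weights=(0.5, 0.3, 0.2)):
    """Blend judge preference, deterministic check pass rate, and
    coherence into one score. All inputs on a 0-1 scale; the weight
    split is an assumption, not the benchmark's actual weighting.
    """
    w_judge, w_checks, w_coherence = weights
    return w_judge * judge + w_checks * checks + w_coherence * coherence
```

Keeping the deterministic checks (HumanEval-lite pass rate, JSON-schema validity) as a separate term means a model cannot buy composite score with judge preference alone.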
4 · Cost
Listed as the per-1M output-token list price applicable to the lane on the measurement day. Cache savings are computed on Qorinix-internal traffic and assume a semantic cache hit rate of at least 40%.
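The blended price under caching is straightforward arithmetic: hits are billed at a discount, misses at list price. A minimal sketch of that arithmetic; this is illustrative, not the published billing formula:

```python
def effective_cost_per_m(list_price, hit_rate, saving_on_hit):
    """Blended cost per 1M output tokens when a fraction `hit_rate`
    of traffic hits the semantic cache and is billed at
    (1 - saving_on_hit) of list price. Illustrative only.
    """
    hit_cost = list_price * (1 - saving_on_hit)
    miss_cost = list_price
    return hit_rate * hit_cost + (1 - hit_rate) * miss_cost
```

For example, a $10/M lane with a 40% hit rate and fully discounted hits blends to $6/M; lower per-hit savings move the blended price proportionally back toward list.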
5 · What stays server-side
Exact provider model IDs, supplier API routes, credentials, failover order, and routing weights are never exposed in the public leaderboard. Only public benchmark names and observed metrics are shown.
6 · Updates
Numbers refresh continuously from production Arena traffic. The visible board is the rolling 72-hour aggregate. Anomalies (regional outage, supplier rate-limit) are flagged in the live status panel.
Test these numbers yourself.
Run the same prompt against all six lanes in the live Arena.
