
Independent AI Leaderboard · 72-hour rolling window

The Qorinix Benchmark

A live, independent leaderboard tracking AI speed, quality, throughput, and cost across the major model lanes — refreshed continuously from Qorinix Arena production traffic. Sort by what matters for your workload, not a single composite score.

Models tracked: 8 Qorinix lanes + 6 public references
Daily prompts: 14,200, across reasoning, code, JSON, creative
p50 TTFT (Qorinix): 148 ms, 4.6× faster than the public-reference average
Cost (Qorinix vs. references): 2.6× cheaper per million output tokens, weighted

Default plot · Quality vs. Speed

The Pareto frontier

Top-right is best: faster output speed and higher quality. Qorinix lanes sit on the frontier — high quality at speeds public references cannot match.
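
For readers who want to reproduce the plot from the leaderboard table, here is a minimal sketch of the dominance test behind a quality-vs-speed Pareto frontier. The model names and numbers are illustrative placeholders, not board data.

```python
from dataclasses import dataclass

@dataclass
class Point:
    name: str
    speed: float    # output speed, tok/s (higher is better)
    quality: float  # quality index, 0-100 (higher is better)

def pareto_frontier(points: list[Point]) -> list[Point]:
    """Keep every point that no other point dominates.

    A point is dominated when another point is at least as fast and
    at least as high-quality, and strictly better on one of the axes.
    """
    frontier = [
        p for p in points
        if not any(
            q.speed >= p.speed and q.quality >= p.quality
            and (q.speed > p.speed or q.quality > p.quality)
            for q in points
        )
    ]
    return sorted(frontier, key=lambda p: p.speed)  # left-to-right for plotting

# Illustrative data only -- not actual leaderboard numbers.
models = [
    Point("lane-a", 230.0, 86.0),
    Point("lane-b", 180.0, 91.0),
    Point("ref-x", 95.0, 93.0),
    Point("ref-y", 70.0, 84.0),   # dominated by ref-x
]
print([p.name for p in pareto_frontier(models)])  # ['ref-x', 'lane-b', 'lane-a']
```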

Default plots · Per-metric ranking

Where each model lands on the metric that matters

Sorted bar plots make trade-offs explicit. Qorinix dominates speed, latency, and cost while staying competitive on quality.

Output speed (tok/s) · higher is better

TTFT p50 (ms) · lower is better

Total p95 (ms) · lower is better

Cost per 1M output (US$) · lower is better

Quality index (0–100) · higher is better

JSON reliability (0–100) · higher is better

Detailed leaderboard

Sortable, filterable leaderboard

Filter by category and sort by any column. Qorinix rows are highlighted in orange.

Columns: # · Lane / Model · Quality · TTFT p50 · Total p95 · Output speed · JSON · Success · Cost / M · Cache saving · Value

Category winners

Best-in-class per workload

Different workloads value different trade-offs. Here are the winners by intent.

Real-time agents

Qorinix 3.1

TTFT under 150 ms and throughput above 230 tok/s: built for voice agents, gaming NPCs, and trading alerts where every millisecond matters.

Why: lowest TTFT and total latency, with adaptive routing across speed classes.

High-volume support automation

Qorinix 3.2

62% cache saving on repeated queries, with quality matching frontier public models at less than half the cost.

Why: semantic cache + Quality lane keeps unit economics healthy at scale.

Long-form reasoning

ChatGPT (GPT-4o)

Highest reasoning index in the public reference set; pair with Qorinix routing for speed-tiered fallback.

Caveat: 4–5× slower TTFT and ~3.5× cost per million output tokens.

Cost-sensitive batch

DeepSeek

Cheapest non-Qorinix lane; useful for offline batch where latency does not matter.

Caveat: low cache saving and middle-of-pack quality.

Methodology

How the benchmark is computed

Transparency about prompt mix, measurement, and what is held server-side.

1 · Prompt mix

14,200 prompts per day distributed across reasoning (35%), code (25%), JSON / tool-use (20%), creative (15%), and short-form chat (5%). Prompts rotate every 72 hours.
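
As a concrete check on the arithmetic, the stated mix translates into the per-category daily counts below (a sketch; the dictionary keys are our own labels):

```python
DAILY_PROMPTS = 14_200

# Mix from the methodology text above.
PROMPT_MIX = {
    "reasoning": 0.35,
    "code": 0.25,
    "json_tool_use": 0.20,
    "creative": 0.15,
    "short_form_chat": 0.05,
}

assert abs(sum(PROMPT_MIX.values()) - 1.0) < 1e-9  # shares must cover 100%

per_category = {k: round(DAILY_PROMPTS * v) for k, v in PROMPT_MIX.items()}
print(per_category)
# {'reasoning': 4970, 'code': 3550, 'json_tool_use': 2840,
#  'creative': 2130, 'short_form_chat': 710}
```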

2 · Latency measurement

TTFT is measured server-side from request receipt to the first byte of the response. Total latency is captured to the last token. p50 is the median across the rolling 72-hour window; p95 captures the slow tail.
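
A minimal sketch of the rolling-window percentile computation described here, using the nearest-rank method; the production pipeline may well use a streaming estimator such as a t-digest instead.

```python
import math
from datetime import datetime, timedelta

WINDOW = timedelta(hours=72)

def percentile(sorted_vals: list[float], pct: float) -> float:
    """Nearest-rank percentile over a pre-sorted sample list."""
    if not sorted_vals:
        raise ValueError("no samples in window")
    rank = max(1, math.ceil(pct / 100 * len(sorted_vals)))
    return sorted_vals[rank - 1]

def window_stats(samples: list[tuple[datetime, float]],
                 now: datetime) -> tuple[float, float]:
    """p50 and p95 of (timestamp, latency-ms) samples inside the window."""
    in_window = sorted(ms for ts, ms in samples if now - ts <= WINDOW)
    return percentile(in_window, 50), percentile(in_window, 95)
```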

3 · Quality scoring

Composite of model-graded preference (LLM-as-judge with cross-model rotation), deterministic task checks (HumanEval-lite for code, JSON-schema validation for tool use), and reading-level coherence.
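
The weights of the composite are not published, so the sketch below uses illustrative ones, and the deterministic JSON check is a simplified stand-in for full schema validation:

```python
import json

def json_check(raw: str, required_keys: set[str]) -> bool:
    """Simplified deterministic check: output parses as a JSON object
    and contains the required keys. (The harness described above
    validates against a full JSON Schema.)"""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and required_keys <= obj.keys()

def quality_index(judge_pref: float, det_pass: float, coherence: float,
                  weights: tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Blend the three components described above into a 0-100 index.

    judge_pref -- LLM-as-judge preference win rate, 0-1
    det_pass   -- deterministic check pass rate (code tests, JSON checks), 0-1
    coherence  -- reading-level coherence score, 0-1
    The weights here are illustrative; the board does not publish them.
    """
    w_judge, w_det, w_coh = weights
    return 100.0 * (w_judge * judge_pref + w_det * det_pass + w_coh * coherence)
```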

4 · Cost

Cost is the per-1M-output-token list price in effect for the lane on the measurement day. Cache savings are computed on Qorinix-internal traffic and assume a semantic-cache hit rate of at least 40%.
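
To make the cache assumption concrete, here is a sketch of effective unit cost under a given hit rate; the $5.00 list price is an arbitrary example, and the sketch assumes a cache hit avoids the paid model call entirely:

```python
def effective_cost_per_m(list_price_per_m: float, cache_hit_rate: float) -> float:
    """Effective US$ per 1M output tokens after semantic caching.

    Assumes a cache hit serves the response without a paid model call;
    partial-discount cache pricing would need an extra factor.
    """
    return list_price_per_m * (1.0 - cache_hit_rate)

# The methodology's assumed 40% hit-rate floor at an illustrative $5.00 price:
print(effective_cost_per_m(5.00, 0.40))  # 3.0
```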

5 · What stays server-side

Exact provider model IDs, supplier API routes, credentials, failover order, and routing weights are never exposed in the public leaderboard. Only public benchmark names and observed metrics are shown.

6 · Updates

Numbers refresh continuously from production Arena traffic. The visible board is the rolling 72-hour aggregate. Anomalies (regional outage, supplier rate-limit) are flagged in the live status panel.
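
The signals and thresholds behind the anomaly flag are not published; the sketch below is purely an assumed shape for that check:

```python
def flag_window(error_rate: float, rate_limit_hits: int,
                max_error_rate: float = 0.02,
                max_rate_limit_hits: int = 50) -> list[str]:
    """Return anomaly flags for a rolling 72-hour window.

    Both thresholds are illustrative assumptions; the board does not
    say how regional outages or supplier rate-limits are detected.
    """
    flags = []
    if error_rate > max_error_rate:
        flags.append("regional-outage-suspected")
    if rate_limit_hits > max_rate_limit_hits:
        flags.append("supplier-rate-limit")
    return flags
```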

Test these numbers yourself.

Run the same prompt against all six lanes in the live Arena.

Open Arena · Create workspace · View pricing