AI Model Leaderboard

AI model scores, ranked out of 10.

Every major LLM (GPT, Claude, Gemini, Llama, DeepSeek, and more), scored from public benchmarks and updated weekly.

106 models, 2 benchmarks, updated 2026-06.
Top model: Claude Fable 5 (Anthropic), 9.9/10, Elite tier, available via API.

AI model leaderboard, ranked by overall score out of 10.
Rank | Model | Provider | Score | Category
1 | Claude Fable 5 | Anthropic | 9.9 | Agentic, Coding
2 | Claude Opus 4.8 | Anthropic | 9.7 | Overall
3 | Claude Opus 4.7 | Anthropic | 9.6 | Intelligent, Reasoning
4 | Claude Opus 4.5 | Anthropic | 9.5 | Intelligent, Reasoning
5 | GPT-5.5 Pro | OpenAI | 9.5 | Agentic, Coding
6 | GPT-5.5 | OpenAI | 9.4 | Overall
7 | GPT-5 | OpenAI | 9.3 | Intelligent, Reasoning
8 | Claude Opus 4.6 | Anthropic | 9.2 | Intelligent, Coding
9 | GLM 5.2 | Z.ai | 9.2 | Intelligent, Coding
10 | GPT-5 Turbo | OpenAI | 9.1 | Intelligent, Fast
11 | GLM 5.1 | Z.ai | 9.1 | Intelligent, Coding
12 | DeepSeek V4 Pro | DeepSeek | 9.0 | Intelligent, Open Weight
13 | Qwen3.7 Max | Alibaba Cloud | 9.0 | Open Weight
14 | Claude Sonnet 4.6 | Anthropic | 9.0 | Agentic, Coding
15 | Qwen3.6 Max Preview | Alibaba Cloud | 8.9 | Intelligent, Reasoning
16 | GLM 5 | Z.ai | 8.9 | Intelligent, Coding
17 | GPT-5.4 | OpenAI | 8.9 | Intelligent, Coding
18 | Claude Sonnet 4 | Anthropic | 8.8 | Intelligent, Coding
19 | Grok 4.20 | xAI | 8.8 | Agentic, Intelligent
20 | Qwen3 235B A22B | Qwen | 8.8 | Open Weight, Intelligent
21 | Gemini 3.1 Pro Preview | Google | 8.8 | Intelligent, Coding
22 | Llama 4 405B | Meta | 8.7 | Open Weight
23 | Cogito v2.1 671B | Deep Cogito | 8.7 | Intelligent, Open Weight
24 | GLM 5V Turbo | Z.ai | 8.7 | Multimodal, Coding
25 | GLM 5 Turbo | Z.ai | 8.6 | Fast, Agentic
26 | Hy3 preview | Tencent | 8.6 | Agentic, Coding
27 | Grok 4.20 Multi-Agent | xAI | 8.6 | Agentic, Intelligent
28 | DeepSeek R1 | DeepSeek | 8.5 | Reasoning
29 | Gemini Pro 2 | Google DeepMind | 8.5 | Multimodal
30 | Grok 4.3 | xAI | 8.5 | Intelligent, Reasoning
31 | MiniMax M3 | MiniMax | 8.5 | Intelligent, Reasoning
32 | Qwen3.6 35B A3B | Alibaba Cloud | 8.5 | Local Model
33 | Nemotron 3 Ultra | NVIDIA | 8.5 | Intelligent, Agentic
34 | MiMo-V2.5-Pro | Xiaomi | 8.5 | Intelligent, Coding
35 | Mistral Large 3 2512 | Mistral AI | 8.5 | Intelligent, Coding
36 | Qwen3.7 Plus | Qwen | 8.5 | Intelligent, Image Input
37 | Kimi K2.7 Code | Moonshot AI | 8.5 | Coding, Intelligent
38 | Gemini 3.5 Flash | Google DeepMind | 8.4 | Cheap Model
39 | Kimi K2.6 | Moonshot AI | 8.4 | Coding
40 | Nova Premier 1.0 | Amazon | 8.4 | Intelligent, Multimodal
41 | DeepSeek V4 Flash | DeepSeek | 8.4 | Intelligent, Open Weight
42 | Kimi K2 0905 | Moonshot AI | 8.4 | Intelligent, Coding
43 | Sonar Pro Search | Perplexity | 8.4 | Intelligent, Agentic
44 | Seed-2.0-Lite | ByteDance Seed | 8.4 | Multimodal, Intelligent
45 | Ring-2.6-1T | inclusionAI | 8.4 | Reasoning, Coding
46 | Hermes 4 405B | Nous Research | 8.4 | Open Weight, Reasoning
47 | Codestral 2508 | Mistral AI | 8.4 | Coding
48 | GPT-5.3-Codex | OpenAI | 8.4 | Coding, Agentic
49 | Mistral Large 3 | Mistral AI | 8.3 | Intelligent, Coding
50 | Devstral 2 2512 | Mistral AI | 8.3 | Coding, Agentic
51 | Claude Haiku 4.5 | Anthropic | 8.3 | Fast, Coding
52 | Mistral Medium 3.5 | Mistral AI | 8.3 | Intelligent, Coding
53 | Palmyra X5 | Writer | 8.3 | Intelligent, Long Context
54 | o4 Mini | OpenAI | 8.3 | Reasoning
55 | INTELLECT-3 | Prime Intellect | 8.3 | Open Weight, Intelligent
56 | GPT-5.4 Mini | OpenAI | 8.3 | Intelligent, Coding
57 | Grok 3 | xAI | 8.2 | Intelligent, Reasoning
58 | MiMo-V2.5 | Xiaomi | 8.2 | Multimodal, Image Input
59 | Ling-2.6-1T | inclusionAI | 8.2 | Fast, Agentic
60 | gpt-oss-120b | OpenAI | 8.2 | Open Weight, Intelligent
61 | Solar Pro 3 | Upstage | 8.2 | Open Weight, Intelligent
62 | Nova 2 Lite | Amazon | 8.2 | Fast, Long Context
63 | Hunyuan A13B Instruct | Tencent | 8.2 | Open Weight, Intelligent
64 | Jamba Large 1.7 | AI21 | 8.2 | Intelligent, Long Context
65 | Qwen3.6 Plus | Qwen | 8.2 | Intelligent, Coding
66 | Qwen3.6 27B | Alibaba Cloud | 8.1 | Local Model
67 | Nemotron 3 Super | NVIDIA | 8.1 | Intelligent, Agentic
68 | Mercury 2 | Inception | 8.1 | Fast, Reasoning
69 | KAT-Coder-Pro V2 | Kwaipilot | 8.1 | Coding, Agentic
70 | Llama 3.3 Nemotron Super 49B V1.5 | NVIDIA | 8.1 | Agentic, Intelligent
71 | Grok Build 0.1 | xAI | 8.0 | Coding
72 | Qwen 3 72B | Alibaba Cloud | 8.0 | Intelligent, Open Weight
73 | Mistral Small 4 | Mistral AI | 8.0 | Fast, Coding
74 | Command A | Cohere | 8.0 | Open Weight, Intelligent
75 | MiniMax M2.7 | MiniMax | 8.0 | Intelligent, Coding
76 | Seed-2.0-Mini | ByteDance Seed | 7.9 | Fast, Multimodal
77 | Claude Haiku 4 | Anthropic | 7.8 | Fast Model
78 | Codestral | Mistral AI | 7.8 | Coding
79 | Llama 4 Scout | Meta | 7.8 | Intelligent, Fast
80 | Ling-2.6-flash | inclusionAI | 7.8 | Fast, Coding
81 | Gemma 4 31B | Google | 7.8 | Intelligent, Open Weight
82 | Nex-N2-Pro | Nex AGI | 7.8 | Agentic, Open Weight
83 | Qwen3.5 397B A17B | Qwen | 7.8 | Intelligent, Open Weight
84 | Nemotron 3 Nano 30B A3B | NVIDIA | 7.6 | Fast, Agentic
85 | GPT-5.4 Nano | OpenAI | 7.6 | Coding, Fast
86 | DeepSeek V3 | DeepSeek | 7.5 | Intelligent, Open Weight
87 | GPT-4o | OpenAI | 7.5 | Intelligent, Multimodal
88 | LFM2.5-8B-A1B | Liquid AI | 7.5 | Local Model
89 | Step 3.7 Flash | StepFun | 7.5 | Intelligent, Fast
90 | LFM2-24B-A2B | Liquid AI | 7.5 | Fast, Open Weight
91 | Qwen3.5-27B | Qwen | 7.3 | Open Weight
92 | Gemini Flash 2 | Google DeepMind | 7.2 | Cheap Model
93 | Qwen3.5-122B-A10B | Qwen | 7.2 | Open Weight
94 | MiniMax M2.5 | MiniMax | 7.2 | Intelligent
95 | Grok 3 Mini | xAI | 7.0 | Intelligent, Fast
96 | Qwen3 Max Thinking | Qwen | 7.0 | Reasoning, Intelligent
97 | Command R+ | Cohere | 6.8 | Intelligent, Open Weight
98 | Qwen3.5-35B-A3B | Qwen | 6.8 | Open Weight, Fast
99 | Phi-4 | Microsoft | 6.5 | Intelligent, Open Weight
100 | Gemini 3.1 Flash Lite Preview | Google | 6.4 | Fast
101 | Mistral Medium | Mistral AI | 6.2 | Intelligent, Coding
102 | Qwen3.5-9B | Qwen | 6.2 | Open Weight, Fast
103 | Qwen 2.5 72B | Alibaba Cloud | 6.0 | Intelligent, Open Weight
104 | Gemma 4 26B A4B | Google | 6.0 | Open Weight, Fast
105 | Llama 3.3 70B | Meta | 5.8 | Intelligent, Open Weight
106 | Qwen3 Coder Next | Qwen | 5.8 | Coding, Open Weight


About the LLM leaderboard

The LLM leaderboard gives every major large language model (GPT, Claude, Gemini, Llama, Mistral, DeepSeek, and others) a single score out of 10. The score is aggregated from public benchmarks such as MMLU, HumanEval, MATH, and Chatbot Arena. The results are then weighted into one comparable number, which also places each model in a tier. Frontier models cluster between 8.5 and 9.5, and capable general-purpose models land in the 7 to 8 band. The full weighting scheme and tier definitions are on the methodology page.
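The weighting scheme itself lives on the methodology page. The following is therefore only a minimal sketch of how a weighted aggregate of this kind can work, not the leaderboard's actual formula.

The values in `HYPOTHETICAL_WEIGHTS` are invented for illustration. The function `aggregate_score` also assumes every benchmark result has already been normalized to the 0 to 1 range. A Chatbot Arena rating, for example, would first need mapping onto that range, perhaps by min-max scaling across models, which is also an assumption.

```python
from typing import Mapping

# HYPOTHETICAL weights for illustration only; the leaderboard's real
# weighting scheme is documented on its methodology page, not here.
HYPOTHETICAL_WEIGHTS = {
    "MMLU": 0.3,
    "HumanEval": 0.3,
    "MATH": 0.2,
    "Chatbot Arena": 0.2,
}


def aggregate_score(
    normalized: Mapping[str, float],
    weights: Mapping[str, float] = HYPOTHETICAL_WEIGHTS,
) -> float:
    """Combine 0-1 benchmark results into one score out of 10.

    ASSUMPTION: every value in `normalized` is already scaled to 0-1.
    Benchmarks a model has no result for are skipped, and the remaining
    weights are renormalized so they still sum to 1.
    """
    present = {name: w for name, w in weights.items() if name in normalized}
    total_weight = sum(present.values())
    if total_weight == 0:
        raise ValueError("no weighted benchmark results for this model")
    weighted_mean = sum(w * normalized[name] for name, w in present.items()) / total_weight
    # Scale the 0-1 weighted mean to 0-10 and keep one decimal, as in the table.
    return round(10 * weighted_mean, 1)
```

With these made-up weights, normalized results of 0.9, 0.9, 0.8, and 0.8 on MMLU, HumanEval, MATH, and Chatbot Arena combine to 8.6. Renormalizing over only the benchmarks a model actually has is one design choice. An aggregator could instead count a missing result as zero, which penalizes untested models.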

Rankings are updated weekly, so a single new release can reshuffle the top three from one update to the next. An LLM leaderboard exists so you do not have to cross-reference a dozen separate benchmark papers to answer "which model should I use?" One comparable score per model, shown alongside price and category, tells you at a glance which model is strongest, which is cheapest, and which best fits the task you actually have.

Use the filters above to narrow by category (coding, reasoning, math, multilingual, and more), sort by score or price, or click any row for per-benchmark scores, context window, and pricing details.
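Under the hood, a category filter and a score sort amount to a tag match followed by a sort. The sketch below uses a handful of rows from the table, with each category string split into separate tags.

The `Entry` type and the `filter_and_sort` function are hypothetical and are not the site's code. Price sorting is left out because the table does not list prices.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    """One leaderboard row (hypothetical structure for illustration)."""
    rank: int
    model: str
    provider: str
    score: float
    categories: Tuple[str, ...]


# A few rows from the leaderboard table, categories split into tags.
ROWS = [
    Entry(1, "Claude Fable 5", "Anthropic", 9.9, ("Agentic", "Coding")),
    Entry(12, "DeepSeek V4 Pro", "DeepSeek", 9.0, ("Intelligent", "Open Weight")),
    Entry(28, "DeepSeek R1", "DeepSeek", 8.5, ("Reasoning",)),
    Entry(47, "Codestral 2508", "Mistral AI", 8.4, ("Coding",)),
    Entry(60, "gpt-oss-120b", "OpenAI", 8.2, ("Open Weight", "Intelligent")),
]


def filter_and_sort(
    rows: List[Entry],
    category: Optional[str] = None,
    descending: bool = True,
) -> List[Entry]:
    """Keep rows tagged with `category` (all rows if None), ordered by score."""
    selected = [r for r in rows if category is None or category in r.categories]
    # sorted() is stable even with reverse=True, so equal scores keep table order.
    return sorted(selected, key=lambda r: r.score, reverse=descending)


if __name__ == "__main__":
    for entry in filter_and_sort(ROWS, "Open Weight"):
        print(entry.rank, entry.model, entry.score)
```

Filtering by "Open Weight" here keeps DeepSeek V4 Pro (9.0) and gpt-oss-120b (8.2), in that order. Because Python's sort is stable, models with equal scores stay in their original table order rather than being shuffled.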