AI Model Leaderboard

Find the best AI models, faster.

Transparent, benchmark-based rankings you can trust.
Independent evaluations. Updated regularly.

Top Model Overall: #1 Claude Opus 4.8 (Anthropic), 9.7/10, Elite. Available via API.
Models ranked: 104
Benchmarks tracked: 2
Last updated: 2026-06

AI model leaderboard, ranked by overall score out of 10.
# | Model | Provider | Score | Category
1 | Claude Opus 4.8 | Anthropic | 9.7 | Overall
2 | Claude Opus 4.7 | Anthropic | 9.6 | Intelligent, Reasoning
3 | Anthropic: Claude Fable 5 | Anthropic | 9.6 | Intelligent, Coding
4 | Claude Opus 4.5 | Anthropic | 9.5 | Intelligent, Reasoning
5 | OpenAI: GPT-5.5 Pro | OpenAI | 9.5 | Intelligent, Reasoning
6 | GPT-5.5 | OpenAI | 9.4 | Overall
7 | GPT-5 | OpenAI | 9.3 | Intelligent, Reasoning
8 | Claude Opus 4.6 | Anthropic | 9.2 | Intelligent, Coding
9 | GPT-5 Turbo | OpenAI | 9.1 | Intelligent, Fast
10 | Z.ai: GLM 5.1 | Z Ai | 9.1 | Intelligent, Coding
11 | DeepSeek V4 Pro | DeepSeek | 9.0 | Intelligent, Open Weight
12 | Qwen3.7 Max | Alibaba Cloud | 9.0 | Open-Weight
13 | Anthropic: Claude Sonnet 4.6 | Anthropic | 9.0 | Intelligent, Coding
14 | Qwen3.6 Max Preview | Alibaba Cloud | 8.9 | Intelligent, Reasoning
15 | Z.ai: GLM 5 | Z Ai | 8.9 | Intelligent, Coding
16 | GPT-5.4 | OpenAI | 8.9 | Intelligent, Coding
17 | Claude Sonnet 4 | Anthropic | 8.8 | Intelligent, Coding
18 | xAI: Grok 4.20 | xAI | 8.8 | Intelligent, Reasoning
19 | Qwen: Qwen3 235B A22B | Qwen | 8.8 | Open Weight, Intelligent
20 | Gemini 3.1 Pro Preview | Google | 8.8 | Intelligent, Coding
21 | Llama 4 405B | Meta | 8.7 | Open-Weight
22 | Deep Cogito: Cogito v2.1 671B | Deepcogito | 8.7 | Intelligent, Open Weight
23 | Z.ai: GLM 5V Turbo | Z Ai | 8.7 | Multimodal, Coding
24 | Z.ai: GLM 5 Turbo | Z Ai | 8.6 | Fast, Agentic
25 | Tencent: Hy3 preview | Tencent | 8.6 | Agentic, Coding
26 | xAI: Grok 4.20 Multi-Agent | xAI | 8.6 | Agentic, Intelligent
27 | DeepSeek R1 | DeepSeek | 8.5 | Reasoning
28 | Gemini Pro 2 | Google DeepMind | 8.5 | Multimodal
29 | Grok 4.3 | xAI | 8.5 | Intelligent, Reasoning
30 | MiniMax M3 | MiniMax | 8.5 | Intelligent, Reasoning
31 | Qwen3.6 35B A3B | Alibaba Cloud | 8.5 | Local Model
32 | NVIDIA: Nemotron 3 Ultra | Nvidia | 8.5 | Intelligent, Agentic
33 | Xiaomi: MiMo-V2.5-Pro | Xiaomi | 8.5 | Intelligent, Coding
34 | Mistral: Mistral Large 3 2512 | Mistral AI | 8.5 | Intelligent, Coding
35 | Qwen: Qwen3.7 Plus | Qwen | 8.5 | Intelligent, Image Input
36 | Gemini 3.5 Flash | Google DeepMind | 8.4 | Cheap Model
37 | Kimi K2.6 | Moonshot | 8.4 | Coding
38 | Amazon: Nova Premier 1.0 | Amazon | 8.4 | Intelligent, Multimodal
39 | DeepSeek V4 Flash | DeepSeek | 8.4 | Intelligent, Open Weight
40 | MoonshotAI: Kimi K2 0905 | Moonshotai | 8.4 | Intelligent, Coding
41 | Perplexity: Sonar Pro Search | Perplexity | 8.4 | Intelligent, Agentic
42 | ByteDance Seed: Seed-2.0-Lite | Bytedance Seed | 8.4 | Multimodal, Intelligent
43 | inclusionAI: Ring-2.6-1T | Inclusionai | 8.4 | Reasoning, Coding
44 | Nous: Hermes 4 405B | Nousresearch | 8.4 | Open Weight, Reasoning
45 | Mistral: Codestral 2508 | Mistral AI | 8.4 | Coding
46 | GPT-5.3-Codex | OpenAI | 8.4 | Coding, Agentic
47 | Mistral Large 3 | Mistral AI | 8.3 | Intelligent, Coding
48 | Mistral: Devstral 2 2512 | Mistral AI | 8.3 | Coding, Agentic
49 | Anthropic: Claude Haiku 4.5 | Anthropic | 8.3 | Fast, Coding
50 | Mistral: Mistral Medium 3.5 | Mistral AI | 8.3 | Intelligent, Coding
51 | Writer: Palmyra X5 | Writer | 8.3 | Intelligent, Long Context
52 | OpenAI: o4 Mini | OpenAI | 8.3 | Reasoning
53 | Prime Intellect: INTELLECT-3 | Prime Intellect | 8.3 | Open Weight, Intelligent
54 | GPT-5.4 Mini | OpenAI | 8.3 | Intelligent, Coding
55 | Grok 3 | xAI | 8.2 | Intelligent, Reasoning
56 | Xiaomi: MiMo-V2.5 | Xiaomi | 8.2 | Multimodal, Image Input
57 | inclusionAI: Ling-2.6-1T | Inclusionai | 8.2 | Fast, Agentic
58 | OpenAI: gpt-oss-120b | OpenAI | 8.2 | Open Weight, Intelligent
59 | Upstage: Solar Pro 3 | Upstage | 8.2 | Open Weight, Intelligent
60 | Amazon: Nova 2 Lite | Amazon | 8.2 | Fast, Long Context
61 | Tencent: Hunyuan A13B Instruct | Tencent | 8.2 | Open Weight, Intelligent
62 | AI21: Jamba Large 1.7 | Ai21 | 8.2 | Intelligent, Long Context
63 | Qwen3.6 Plus | Qwen | 8.2 | Intelligent, Coding
64 | Qwen3.6 27B | Alibaba Cloud | 8.1 | Local Model
65 | NVIDIA: Nemotron 3 Super | Nvidia | 8.1 | Intelligent, Agentic
66 | Inception: Mercury 2 | Inception | 8.1 | Fast, Reasoning
67 | Kwaipilot: KAT-Coder-Pro V2 | Kwaipilot | 8.1 | Coding, Agentic
68 | NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 | Nvidia | 8.1 | Agentic, Intelligent
69 | Grok Build 0.1 | xAI | 8.0 | Coding
70 | Qwen 3 72B | Alibaba Cloud | 8.0 | Intelligent, Open Weight
71 | Mistral: Mistral Small 4 | Mistral AI | 8.0 | Fast, Coding
72 | Cohere: Command A | Cohere | 8.0 | Open Weight, Intelligent
73 | MiniMax M2.7 | Minimax | 8.0 | Intelligent, Coding
74 | ByteDance Seed: Seed-2.0-Mini | Bytedance Seed | 7.9 | Fast, Multimodal
75 | Claude Haiku 4 | Anthropic | 7.8 | Fast Model
76 | Codestral | Mistral AI | 7.8 | Coding
77 | Llama 4 Scout | Meta | 7.8 | Intelligent, Fast
78 | inclusionAI: Ling-2.6-flash | Inclusionai | 7.8 | Fast, Coding
79 | Google: Gemma 4 31B | Google | 7.8 | Intelligent, Open Weight
80 | Nex AGI: Nex-N2-Pro | Nex Agi | 7.8 | Agentic, Open Weight
81 | Qwen3.5 397B A17B | Qwen | 7.8 | Intelligent, Open Weight
82 | NVIDIA: Nemotron 3 Nano 30B A3B | Nvidia | 7.6 | Fast, Agentic
83 | GPT-5.4 Nano | OpenAI | 7.6 | Coding, Fast
84 | DeepSeek V3 | DeepSeek | 7.5 | Intelligent, Open Weight
85 | GPT-4o | OpenAI | 7.5 | Intelligent, Multimodal
86 | LFM2.5-8B-A1B | Liquid AI | 7.5 | Local Model
87 | Step 3.7 Flash | StepFun | 7.5 | Intelligent, Fast
88 | LiquidAI: LFM2-24B-A2B | Liquid | 7.5 | Fast, Open Weight
89 | Qwen3.5-27B | Qwen | 7.3 | Open Weight
90 | Gemini Flash 2 | Google DeepMind | 7.2 | Cheap Model
91 | Qwen3.5-122B-A10B | Qwen | 7.2 | Open Weight
92 | MiniMax M2.5 | Minimax | 7.2 | Intelligent
93 | Grok 3 Mini | xAI | 7.0 | Intelligent, Fast
94 | Qwen3 Max Thinking | Qwen | 7.0 | Reasoning, Intelligent
95 | Command R+ | Cohere | 6.8 | Intelligent, Open Weight
96 | Qwen3.5-35B-A3B | Qwen | 6.8 | Open Weight, Fast
97 | Phi-4 | Microsoft | 6.5 | Intelligent, Open Weight
98 | Gemini 3.1 Flash Lite Preview | Google | 6.4 | Fast
99 | Mistral Medium | Mistral AI | 6.2 | Intelligent, Coding
100 | Qwen3.5-9B | Qwen | 6.2 | Open Weight, Fast
101 | Qwen 2.5 72B | Alibaba Cloud | 6.0 | Intelligent, Open Weight
102 | Gemma 4 26B A4B | Google | 6.0 | Open Weight, Fast
103 | Llama 3.3 70B | Meta | 5.8 | Intelligent, Open Weight
104 | Qwen3 Coder Next | Qwen | 5.8 | Coding, Open Weight

About the LLM leaderboard

The LLM leaderboard gives every major large language model (GPT, Claude, Gemini, Llama, Mistral, DeepSeek, and others) a single score out of 10. The score is aggregated from public benchmarks such as MMLU, HumanEval, MATH, and Chatbot Arena: the results are weighted, combined into one number, and mapped to a tier such as Elite. Frontier models score 8.5 and above (the current top score is 9.7); capable general-purpose models land in the 7 to 8 band. The full weighting scheme and tier definitions live on the methodology page.
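
As a rough illustration of the aggregation described above, the following Python sketch computes a weighted mean of benchmark results and maps it to a tier. The weights, the tier cut-offs, the tier names other than Elite, and the assumption that every benchmark result has already been normalized to the range 0 to 1 are all hypothetical; the actual values are defined on the methodology page and may differ.

```python
from __future__ import annotations

# Hypothetical weights and tier cut-offs, for illustration only.
# The real weighting scheme and tier definitions are on the methodology page.
WEIGHTS = {"MMLU": 0.3, "HumanEval": 0.3, "MATH": 0.2, "Chatbot Arena": 0.2}
TIERS = [(9.5, "Elite"), (8.5, "Frontier"), (7.0, "Capable")]  # (minimum score, label)


def overall_score(normalized: dict[str, float]) -> float:
    """Weighted mean of benchmark results (each already scaled to 0..1), out of 10.

    Benchmarks missing from `normalized` are skipped and the remaining
    weights are renormalized so they still sum to 1.
    """
    used = {name: weight for name, weight in WEIGHTS.items() if name in normalized}
    if not used:
        raise ValueError("no known benchmark results supplied")
    total_weight = sum(used.values())
    weighted = sum(normalized[name] * weight for name, weight in used.items())
    return round(10 * weighted / total_weight, 1)


def tier(score: float) -> str:
    """Return the label of the first tier whose minimum the score reaches."""
    for minimum, label in TIERS:
        if score >= minimum:
            return label
    return "Other"


if __name__ == "__main__":
    results = {"MMLU": 0.9, "HumanEval": 0.8, "MATH": 0.7, "Chatbot Arena": 1.0}
    score = overall_score(results)
    print(score, tier(score))
```

Renormalizing the weights when a benchmark is missing is one possible design choice; a stricter scheme might refuse to score a model that lacks any tracked benchmark.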

Rankings move weekly, and a new release can reshuffle the top three overnight. The leaderboard exists so you do not have to cross-reference a dozen separate benchmark papers to answer "which model should I use?" One comparable score per model shows at a glance which is the strongest; combined with price sorting and the category filters, it also shows which is the cheapest and which best fits the task you actually have.

Use the filters above to narrow by category (coding, reasoning, math, multilingual, and more), sort by score or price, or click any row for per-benchmark scores, context window, and pricing details.
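
The same narrowing can be reproduced offline on a copy of the table. The sketch below is a minimal example, assuming each row is held as a small record with its category tags split into a tuple; the `Row` type and the `top_in_category` helper are hypothetical names, not part of any published API, and the sample rows are copied from the table above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One leaderboard row (hypothetical structure for illustration)."""
    model: str
    provider: str
    score: float
    categories: tuple[str, ...]


# A handful of rows taken from the leaderboard table.
ROWS = [
    Row("Claude Opus 4.8", "Anthropic", 9.7, ("Overall",)),
    Row("GPT-5.3-Codex", "OpenAI", 8.4, ("Coding", "Agentic")),
    Row("Kimi K2.6", "Moonshot", 8.4, ("Coding",)),
    Row("DeepSeek R1", "DeepSeek", 8.5, ("Reasoning",)),
    Row("Codestral", "Mistral AI", 7.8, ("Coding",)),
]


def top_in_category(rows: list[Row], category: str, limit: int = 3) -> list[Row]:
    """Keep rows tagged with `category`, highest score first.

    sorted() is stable, so rows with equal scores keep their input order.
    """
    matches = [row for row in rows if category in row.categories]
    return sorted(matches, key=lambda row: row.score, reverse=True)[:limit]


if __name__ == "__main__":
    for row in top_in_category(ROWS, "Coding"):
        print(f"{row.score:.1f}  {row.model} ({row.provider})")
```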