Find the best AI models, faster.
Transparent, benchmark-based rankings you can trust.
Independent evaluations. Updated regularly.
Latest models
Newest additions to the leaderboard
About the LLM leaderboard
The LLM leaderboard gives every major large language model (GPT, Claude, Gemini, Llama, Mistral, DeepSeek, and others) a single score out of 10. The score is aggregated from public benchmarks such as MMLU, HumanEval, MATH, and Chatbot Arena: the results are weighted and combined into one comparable number that also places each model in a tier. Frontier models cluster between 8.5 and 9.5; capable general-purpose models land in the 7 to 8 band. The full weighting scheme and tier definitions live on the methodology page.
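To make "weighted and combined" concrete, here is a minimal sketch of one way a benchmark aggregate on a 0 to 10 scale could work. It is not the leaderboard's method: the weights in `WEIGHTS`, the Arena rating range `ARENA_MIN`/`ARENA_MAX`, and the helper names `normalize` and `aggregate_score` are all hypothetical. The normalization step exists because Chatbot Arena reports ratings rather than percentages, so it has to be mapped onto the same 0 to 100 range as accuracy-style benchmarks before averaging.

```python
# Minimal sketch of a weighted benchmark aggregate on a 0-10 scale.
# ASSUMPTIONS (hypothetical, not the leaderboard's published method):
# the weights, the Arena rating range, and all example numbers.

# Hypothetical per-benchmark weights; they sum to 1.0.
WEIGHTS = {
    "MMLU": 0.3,
    "HumanEval": 0.25,
    "MATH": 0.25,
    "Chatbot Arena": 0.2,
}

# Hypothetical rating range used to map Arena ratings onto 0-100.
ARENA_MIN, ARENA_MAX = 1000.0, 1400.0


def normalize(benchmark: str, raw: float) -> float:
    """Map a raw result onto 0-100 so different benchmarks are comparable."""
    if benchmark == "Chatbot Arena":
        # Linear min-max scaling of a rating into the hypothetical range.
        scaled = (raw - ARENA_MIN) / (ARENA_MAX - ARENA_MIN) * 100.0
    else:
        # Accuracy-style benchmarks are assumed to be reported as percentages.
        scaled = raw
    # Clamp so out-of-range ratings cannot push the score past 0-100.
    return min(max(scaled, 0.0), 100.0)


def aggregate_score(results: dict[str, float]) -> float:
    """Weighted mean of normalized results, rescaled to a score out of 10.

    Benchmarks missing from `results` are skipped, and the remaining
    weights are renormalized so they still sum to 1.
    """
    present = {b: w for b, w in WEIGHTS.items() if b in results}
    if not present:
        raise ValueError("no known benchmarks in results")
    total_weight = sum(present.values())
    weighted = sum(w * normalize(b, results[b]) for b, w in present.items())
    # Divide by 10 to go from a 0-100 scale to a 0-10 score.
    return round(weighted / total_weight / 10.0, 1)


# Example with made-up results for a hypothetical model.
example = {"MMLU": 88.0, "HumanEval": 90.0, "MATH": 80.0, "Chatbot Arena": 1300.0}
print(aggregate_score(example))  # → 8.4
```

Renormalizing over the benchmarks that are present is one design choice among several; a real leaderboard might instead exclude models with missing results or impute them.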
Rankings move weekly: a new release can reshuffle the top three overnight. An LLM leaderboard exists so you do not have to cross-reference a dozen separate benchmark papers to answer "which model should I use?" One comparable score per model, shown alongside its price, tells you at a glance which model is strongest, which is cheapest, and which best fits the task you actually have.
Use the filters above to narrow by category (coding, reasoning, math, multilingual, and more), sort by score or price, or click any row for per-benchmark scores, context window, and pricing details.