How LMRank scores models
One clear number out of 10 for every model — here's exactly where it comes from.
The Simple Idea
LMRank assigns every AI model a single score out of 10. No wall of benchmark numbers, no spin: just a clear answer to "how good is this model?"
Score Sources
We aggregate results from established public benchmarks:
- MMLU — Knowledge and understanding across 57 subjects
- HumanEval — Code generation correctness
- MATH / GSM8K — Mathematical reasoning
- BIG-Bench — Hard reasoning tasks
- Chatbot Arena (LMSYS) — Human preference rankings
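This page does not spell out how the benchmark results above are combined, so the following is only an illustrative sketch, not LMRank's actual formula. It assumes each benchmark result has already been normalized to the range 0 to 1, takes a weighted mean (equal weights by default), and rescales it onto a 1.0 to 10 scale. The function name `aggregate_score`, the weights, and the rescaling are all hypothetical.

```python
from typing import Dict, Optional


def aggregate_score(
    normalized: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Combine normalized benchmark results into one score on a 1.0 to 10 scale.

    ASSUMPTION: inputs are already normalized to [0, 1]; the weighting and
    rescaling here are illustrative, not LMRank's published method.
    """
    if not normalized:
        raise ValueError("at least one benchmark result is required")
    for name, value in normalized.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be normalized to [0, 1], got {value}")

    # Default to equal weights when none are supplied.
    if weights is None:
        weights = {name: 1.0 for name in normalized}

    total_weight = sum(weights[name] for name in normalized)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    # Weighted mean of the normalized results, still in [0, 1].
    mean = sum(normalized[name] * weights[name] for name in normalized) / total_weight

    # Rescale [0, 1] onto [1.0, 10.0] and keep one decimal place.
    return round(1.0 + 9.0 * mean, 1)


# Made-up inputs, for illustration only (not real benchmark results).
example = aggregate_score({"MMLU": 0.8, "HumanEval": 0.6})
```

Passing a `weights` dictionary lets one benchmark count more than another; with no weights, every source contributes equally.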
Score Scale
| Range | Tier | Meaning |
|---|---|---|
| 9.0–10 | Elite | Frontier models, best in class |
| 8.0–8.9 | Excellent | Strong performers, near-frontier |
| 7.0–7.9 | Strong | Capable models for most tasks |
| 6.0–6.9 | Capable | Entry-level or older models |
| 1.0–5.9 | Basic | Limited capability, niche use cases |
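The tier table above can be read as a simple lookup from score to tier name. Here is a minimal sketch of that lookup; the function name `tier_for` is hypothetical, and it assumes scores are rounded to one decimal place before the lookup so that every value falls inside exactly one range.

```python
# Lower bound of each tier, highest first, taken from the table above.
TIERS = [
    (9.0, "Elite"),
    (8.0, "Excellent"),
    (7.0, "Strong"),
    (6.0, "Capable"),
    (1.0, "Basic"),
]


def tier_for(score: float) -> str:
    """Return the tier name for a score on the 1.0 to 10 scale."""
    # ASSUMPTION: scores are reported to one decimal place.
    rounded = round(score, 1)
    if not 1.0 <= rounded <= 10.0:
        raise ValueError(f"score must be between 1.0 and 10, got {score}")

    # Walk the tiers from the top and return the first one the score reaches.
    for lower_bound, name in TIERS:
        if rounded >= lower_bound:
            return name

    # Unreachable because of the range check above.
    raise AssertionError("no tier matched")
```

For example, `tier_for(8.9)` returns `"Excellent"` and `tier_for(9.0)` returns `"Elite"`.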
Updates
We update scores as new models are released and as benchmark results change. New models are typically added within one week of public launch.