MiMo V2.5
Excellent · 8.3 / 10
MiMo V2.5 is a native omnimodal model from Xiaomi that delivers Pro-level agentic performance at roughly half the inference cost of MiMo V2.5 Pro. It accepts text, image, audio, and video input and has a 1M-token context window. Its combination of reasoning, multimodal perception, and cost efficiency makes it a good fit for agent frameworks.
Specifications
| Attribute | Value |
|---|---|
| Lab | Xiaomi |
| Tags | Intelligent |
| Overall Score | 8.3/10 |
| Release Date | 2026-04 |
| Context Window | 1,048,576 tokens |
| Input Price / 1M | $0.14 |
| Output Price / 1M | $0.28 |
| Input Modalities | Text, image, audio, video |
| Output Modalities | Text |
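To make the pricing rows concrete, here is a minimal sketch of a per-request cost estimate using the list prices and context window from the table above. The function names are hypothetical, and the sketch assumes flat per-token pricing with no caching or volume discounts, and that input and output tokens both count toward the context window; neither assumption is confirmed by this page.

```python
# List prices from the Specifications table (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.14
OUTPUT_PRICE_PER_M = 0.28
# Context window from the Specifications table.
CONTEXT_WINDOW_TOKENS = 1_048_576


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at list prices (assumes flat pricing)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


def fits_context(input_tokens: int, output_tokens: int) -> bool:
    """Check a request against the window (assumes output counts toward it)."""
    return input_tokens + output_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    # Example: a 200k-token document with a 4k-token answer.
    print(f"${estimate_cost_usd(200_000, 4_000):.5f}")
    print(fits_context(200_000, 4_000))
```

Under these assumptions, a request with 200,000 input tokens and 4,000 output tokens costs about $0.029.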
Strengths
- Native omnimodal input — text, image, audio, and video
- Pro-level agentic performance at roughly half the inference cost of MiMo V2.5 Pro
- 1M-token context window for extended document and conversation processing
- Strong multimodal perception surpassing previous MiMo models
- Positive community reception, with 130+ points on Hacker News
Weaknesses
- Xiaomi's AI ecosystem is less established than those of Western frontier labs
- Text-only output — no native image or audio generation
Best For
- Cost-efficient agentic workflows and tool use
- Multimodal document and media understanding
- Extended context reasoning and long conversations
- Budget-conscious production AI deployments
Scores are aggregated from public benchmarks (MMLU, HumanEval, MATH, GSM8K, LMSYS) and normalized to a 1–10 scale.