MiMo V2.5

● Excellent

by Xiaomi | Intelligent | Rank #28 of 48

8.3 / 10

MiMo V2.5 is a native omnimodal model from Xiaomi that delivers Pro-level agentic performance at roughly half the inference cost of MiMo V2.5 Pro. It accepts text, image, audio, and video input and has a 1M-token context window. Its combination of reasoning, multimodal perception, and cost efficiency makes it a strong fit for agent frameworks.

Specifications

Specifications for MiMo V2.5

Attribute                      Value
Lab                            Xiaomi
Tags                           Intelligent
Overall Score                  8.3/10
Release Date                   2026-04
Context Window                 1,048,576 tokens
Input Price (per 1M tokens)    $0.14
Output Price (per 1M tokens)   $0.28
Input Modalities               Text, image, audio, video
Output Modalities              Text
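
The listed prices and context window translate directly into a per-request cost estimate. The sketch below is illustrative only: the function names are hypothetical, it assumes list prices apply uniformly to every token regardless of modality, and it assumes input and output tokens share the same 1,048,576-token window, none of which is confirmed by this page.

```python
# Values copied from the specifications above.
INPUT_PRICE_PER_M = 0.14    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.28   # USD per 1M output tokens
CONTEXT_WINDOW = 1_048_576  # tokens (2**20)


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at list prices."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_PRICE_PER_M
             + output_tokens * OUTPUT_PRICE_PER_M)
    return total / 1_000_000


def fits_context(input_tokens: int, output_tokens: int) -> bool:
    """Assumption: input and output tokens count against one shared window."""
    return input_tokens + output_tokens <= CONTEXT_WINDOW


if __name__ == "__main__":
    # Hypothetical agent step: 200k tokens of context, 4k tokens generated.
    prompt, completion = 200_000, 4_000
    print(f"fits in context: {fits_context(prompt, completion)}")
    print(f"estimated cost: ${estimate_cost(prompt, completion):.4f}")
```

At these prices a request that sends one million input tokens and receives one million output tokens would cost about $0.42, so long-context agent loops are dominated by how often the context is re-sent rather than by output length.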

Strengths

  • Native omnimodal input — text, image, audio, and video
  • Pro-level agentic performance at roughly half the inference cost of MiMo V2.5 Pro
  • 1M-token context window for extended document and conversation processing
  • Strong multimodal perception surpassing previous MiMo models
  • Positive community reception, with 130+ points on Hacker News

Weaknesses

  • Xiaomi's AI ecosystem is less established than those of Western frontier labs
  • Text-only output — no native image or audio generation

Best For

  • Cost-efficient agentic workflows and tool use
  • Multimodal document and media understanding
  • Extended context reasoning and long conversations
  • Budget-conscious production AI deployments

Scores are aggregated from public benchmarks (MMLU, HumanEval, MATH, GSM8K, LMSYS) and normalized to a 1–10 scale.