Xiaomi: MiMo-V2.5

Name: MiMo-V2.5
Author: Xiaomi

by Xiaomi Multimodal Image Input Video Input

MiMo-V2.5 is a native omnimodal model by Xiaomi. It delivers Pro-level agentic performance at roughly half the inference cost, while surpassing MiMo-V2-Omni in multimodal perception across image and video understanding...

Choose a model to compare against MiMo-V2.5

Specifications

Specifications for MiMo-V2.5
Attribute	Value
Lab	Xiaomi
Tags	Multimodal Image Input Video Input
Release Date	2026-04
Context Window	1,048,576 tokens
Input Price / 1M	$0.14
Output Price / 1M	$0.28
Input Modalities	Text, Audio, Image, Video
Output Modalities	Text

Strengths

Native omnimodal vision-language-audio
Pro-level agentic performance at lower cost
1M context window
Strong image and video understanding

Weaknesses

Below MiMo-V2.5-Pro on complex tasks
Limited Western market presence

Best For

Multimodal agent workflows
Video and image understanding tasks
Cost-effective omnimodal applications

See what MiMo-V2.5 built

Interactive pages generated from the same creative briefs given to every model.

Explore all 3

Quantum-Powered AI Lab Generate a credible landing page for an AI research lab powered by quantum computing. Safari Reserve Manager Generate a 3D management game about operating a wildlife reserve and safari business. Cambrian Open World Generate an explorable 3D Cambrian seafloor populated by strange prehistoric life.

In Depth: MiMo-V2.5

Summary

MiMo-V2.5 is an AI model from Xiaomi.

Released 2026-04. It currently appears in the Overall category on LMRank and 1 other category. It supports Text, Audio, Image, Video input and produces Text output, with a context window of 1,048,576 tokens. Input pricing is $0.14 per 1M tokens and output is $0.28 per 1M tokens on OpenRouter.

Sources & Further Reading

OpenRouter xiaomi/mimo-v2.5

Xiaomi: MiMo-V2.5

Specifications

✓ Strengths

! Weaknesses

★ Best For