Z.ai: GLM 5V Turbo
Rating: Excellent (8.7 / 10)
GLM-5V-Turbo is Z.ai’s first native multimodal agent foundation model, built for vision-based coding and agent-driven tasks. It natively handles image, video, and text inputs, excels at long-horizon planning, complex coding,...
Specifications
| Attribute | Value |
|---|---|
| Lab | Z.ai |
| Tags | Multimodal, Coding, Agentic, Image Input, Video Input |
| Overall Score | 8.7/10 |
| Release Date | 2026-04 |
| Context Window | 202,752 tokens |
| Input Price (per 1M tokens) | $1.20 |
| Output Price (per 1M tokens) | $4.00 |
| Input Modalities | Image, Text, Video |
| Output Modalities | Text |
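
To make the pricing rows concrete, here is a minimal sketch of how a per-request cost could be estimated from the listed prices and context window. The function names `estimate_cost` and `fits_context` are hypothetical helpers, not part of any Z.ai SDK. The sketch assumes that input and output tokens share the 202,752-token window and that image and video inputs are billed at the input-token rate; this page confirms neither.

```python
# Values copied from the specification table above.
INPUT_PRICE_PER_M = 1.20   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 4.00  # USD per 1M output tokens
CONTEXT_WINDOW = 202_752   # tokens


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


def fits_context(input_tokens: int, output_tokens: int) -> bool:
    """Check the request against the window.

    Assumption: input and output tokens count against the same window.
    """
    return input_tokens + output_tokens <= CONTEXT_WINDOW


if __name__ == "__main__":
    # Example: a 50,000-token prompt (text plus tokenized screenshots)
    # that produces a 2,000-token answer costs roughly 0.068 USD.
    print(f"{estimate_cost(50_000, 2_000):.4f} USD")
    print(fits_context(50_000, 2_000))
```

Because output tokens cost more than three times as much as input tokens, long generated answers dominate the bill sooner than large prompts do.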
Strengths
- Native multimodal vision-language understanding
- Strong vision-based coding from screenshots and diagrams
- Long-horizon planning with visual context
- Agentic task execution across modalities
Weaknesses
- Higher price than text-only GLM models
- Vision features add latency
Best For
- Vision-based coding and UI development
- Multimodal agent workflows
- Screenshot-to-code tasks
Scores are aggregated from public benchmarks (MMLU, HumanEval, MATH, GSM8K, LMSYS) and normalized to a 1–10 scale.