Z.ai: GLM 5V Turbo

● Excellent

by Z.ai. Tags: Multimodal, Coding, Agentic, Image Input, Video Input. Rank #20 of 75.

8.7 / 10

GLM-5V-Turbo is Z.ai’s first native multimodal agent foundation model, built for vision-based coding and agent-driven tasks. It natively handles image, video, and text inputs, excels at long-horizon planning, complex coding,...


Specifications

Attribute                  Value
Lab                        Z.ai
Tags                       Multimodal, Coding, Agentic, Image Input, Video Input
Overall Score              8.7/10
Release Date               2026-04
Context Window             202,752 tokens
Input Price / 1M tokens    $1.20
Output Price / 1M tokens   $4.00
Input Modalities           Image, Text, Video
Output Modalities          Text
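
The listed prices and context window can be turned into a rough per-request cost estimate. The sketch below is illustrative only: the function names are hypothetical, it assumes the prices are US dollars per 1M tokens, and it assumes input and output tokens share the 202,752-token window. How image and video inputs are converted to tokens is not stated here, so token counts are taken as given.

```python
from decimal import Decimal

# Figures from the specifications above (assumed to be USD per 1M tokens).
INPUT_PRICE_PER_M = Decimal("1.20")
OUTPUT_PRICE_PER_M = Decimal("4.00")
CONTEXT_WINDOW = 202_752  # tokens


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request at the listed prices."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / Decimal(1_000_000)


def fits_context(input_tokens: int, output_tokens: int) -> bool:
    """Check a request against the window.

    Assumption: input and output tokens count toward the same window.
    """
    return input_tokens + output_tokens <= CONTEXT_WINDOW


if __name__ == "__main__":
    # Hypothetical request: 100,000 input tokens and 2,000 output tokens.
    print(estimate_cost(100_000, 2_000))
    print(fits_context(100_000, 2_000))
```

Under these assumptions, a request with 100,000 input tokens and 2,000 output tokens comes to 0.128 in the listed currency. Decimal is used instead of float so the price arithmetic stays exact.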

Strengths

  • Native multimodal vision-language understanding
  • Strong vision-based coding from screenshots and diagrams
  • Long-horizon planning with visual context
  • Agentic task execution across modalities

Weaknesses

  • Higher price than text-only GLM models
  • Vision features add latency

Best For

  • Vision-based coding and UI development
  • Multimodal agent workflows
  • Screenshot-to-code tasks

Scores are aggregated from public benchmarks (MMLU, HumanEval, MATH, GSM8K, LMSYS) and normalized to a 1–10 scale.