GLM-4.5V Visual Reasoning Model Officially Released and Open-Sourced
Hexun Tech News, August 11th - Zhipu AI has released its open-source visual reasoning model GLM-4.5V (106B total parameters, 12B activated parameters) and made it available simultaneously on the ModelScope community and Hugging Face. An API is also available, priced as low as 2 yuan per million input tokens and 6 yuan per million output tokens, as sketched in the example below.
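The article states that the open-source weights are hosted on Hugging Face. The following is a minimal sketch of fetching them with the `huggingface_hub` library; the repository id "zai-org/GLM-4.5V" and the local directory are assumptions for illustration, so check the official model card for the actual id and the hardware needed for a 106B-parameter mixture-of-experts model.

```python
# Minimal sketch: download the open-source GLM-4.5V weights from Hugging Face.
# The repo_id below is an assumption for illustration, not a confirmed identifier.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="zai-org/GLM-4.5V",   # assumed repository id
    local_dir="./glm-4.5v",       # local download target
)
print(f"Model weights downloaded to {local_dir}")
```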
GLM-4.5V is built on Zhipu's latest flagship text model, GLM-4.5-Air, and continues the technical approach of GLM-4.1V-Thinking. Across 41 public multimodal benchmarks, its overall performance reaches state-of-the-art among open-source models of comparable scale, covering image understanding, video understanding, document understanding, and GUI agent tasks. For instance, GLM-4.5V can accurately identify, analyze, and locate target objects according to a user's query and output their bounding boxes, as sketched below.
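To illustrate the grounding capability described above, here is a minimal sketch of a visual-grounding request, assuming GLM-4.5V is served behind an OpenAI-compatible chat endpoint. The base URL, environment-variable name, model identifier, image URL, and the convention that the model replies with JSON bounding boxes are all assumptions for illustration, not the documented API contract.

```python
# Minimal sketch: ask GLM-4.5V to locate objects and return bounding boxes,
# assuming an OpenAI-compatible endpoint. Endpoint, model name, and the
# JSON reply format are illustrative assumptions.
import json
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GLM_API_KEY"],            # hypothetical env var
    base_url="https://example-glm-endpoint/v1",   # placeholder endpoint
)

response = client.chat.completions.create(
    model="glm-4.5v",  # assumed model identifier
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/street.jpg"}},
            {"type": "text",
             "text": "Locate every traffic light in the image and reply with "
                     "JSON: [{\"label\": str, \"bbox\": [x1, y1, x2, y2]}]."},
        ],
    }],
)

# Parse the bounding boxes the model was asked to return as JSON.
boxes = json.loads(response.choices[0].message.content)
for box in boxes:
    print(box["label"], box["bbox"])
```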
According to the announcement, multimodal reasoning is regarded as a key capability on the path to artificial general intelligence (AGI), enabling AI to perceive, understand, and make decisions about the world as comprehensively as humans do. Vision-language models (VLMs) are the core foundation for realizing multimodal reasoning.