> ## Documentation Index > Fetch the complete documentation index at: https://docs.z.ai/llms.txt > Use this file to discover all available pages before exploring further. # GLM-4.5 ## Overview GLM-4.5 and GLM-4.5-Air are Z.AI's models, purpose-built as foundational models for agent-oriented applications. Both leverage a Mixture-of-Experts (MoE) architecture. GLM-4.5 has a total parameter count of 355B with 32B active parameters per forward pass, while GLM-4.5-Air adopts a more streamlined design with 106B total parameters and 12B active parameters. Both models share a similar training pipeline: an initial pretraining phase on 15 trillion tokens of general-domain data, followed by targeted fine-tuning on datasets covering code, reasoning, and agent-specific tasks. The context length has been extended to 128k tokens, and reinforcement learning was applied to further enhance reasoning, coding, and agent performance. GLM-4.5 and GLM-4.5-Air are optimized for tool invocation, web browsing, software engineering, and front-end development. They can be integrated into code-centric agents such as Claude Code and Roo Code, and also support arbitrary agent applications through tool invocation APIs. Both models support hybrid reasoning modes, offering two execution modes: Thinking Mode for complex reasoning and tool usage, and Non-Thinking Mode for instant responses. These modes can be toggled via the `thinking.type`parameter (with `enabled` and `disabled` settings), and dynamic thinking is enabled by default. Text Text 128K 96K ## GLM-4.5 Serials

GLM

GLM-4.5

Our most powerful reasoning model, with 355 billion parameters

AIR

GLM-4.5-Air

Cost-Effective Lightweight Strong Performance

GLM-4.5-X

High Performance Strong Reasoning Ultra-Fast Response

AirX

GLM-4.5-AirX

Lightweight Strong Performance Ultra-Fast Response

FLASH

GLM-4.5-Flash

Free Strong Performance Excellent for Reasoning Coding & Agents

## Capability Enable deep thinking mode for more advanced reasoning and analysis Support real-time streaming responses to enhance user interaction experience Powerful tool invocation capabilities, enabling integration with various external toolsets Intelligent caching mechanism to optimize performance in long conversations Support for structured output formats like JSON, facilitating system integration ## Introducing GLM-4.5 ### Overview The first-principle measure of AGI lies in integrating more general intelligence capabilities without compromising existing functions. GLM-4.5 represents our first complete realization of this concept. It combines advanced reasoning, coding, and agent capabilities within a single model, achieving a significant technological breakthrough by natively fusing reasoning, coding, and agent abilities to meet the complex demands of agent-based applications. To comprehensively evaluate the model’s general intelligence, we selected 12 of the most representative benchmark suites, including MMLU Pro, AIME24, MATH 500, SciCode, GPQA, HLE, LiveCodeBench, SWE-Bench, Terminal-bench, TAU-Bench, BFCL v3, and BrowseComp. Based on the aggregated average scores, GLM-4.5 ranks second globally among all models, first among domestic models, and first among open-source models. Description

### **Higher Parameter Efficiency** GLM-4.5 has half the number of parameters of DeepSeek-R1 and one-third that of Kimi-K2, yet it outperforms them on multiple standard benchmark tests. This is attributed to the higher parameter efficiency of GLM architecture. Notably, GLM-4.5-Air, with 106 billion total parameters and 12 billion active parameters, achieves a significant breakthrough—surpassing models such as Gemini 2.5 Flash, Qwen3-235B, and Claude 4 Opus on reasoning benchmarks like Artificial Analysis, ranking among the top three domestic models in performance. On charts such as SWE-Bench Verified, the GLM-4.5 series lies on the Pareto frontier for performance-to-parameter ratio, demonstrating that at the same scale, the GLM-4.5 series delivers optimal performance. Description

### **Low Cost, High Speed** Beyond performance optimization, the GLM-4.5 series also achieves breakthroughs in cost and efficiency, resulting in pricing far lower than mainstream models: API call costs are as low as \$0.2 per million input tokens and \$1.1 per million output tokens. At the same time, the high-speed version demonstrates a generation speed exceeding 100 tokens per second in real-world tests, supporting low-latency and high-concurrency deployment scenarios—balancing cost-effectiveness with user interaction experience. Description

### **Real-World Evaluation** Real-world performance matters more than leaderboard rankings. To evaluate GLM-4.5’s effectiveness in practical Agent Coding scenarios, we integrated it into Claude Code and benchmarked it against Claude 4 Sonnet, Kimi-K2, and Qwen3-Coder. The evaluation consisted of 52 programming and development tasks spanning six major domains, executed in isolated container environments with multi-turn interaction tests. As shown in the results (below), GLM-4.5 demonstrates a strong competitive advantage over other open-source models, particularly in tool invocation reliability and task completion rate. While there remains room for improvement compared to Claude 4 Sonnet, GLM-4.5 delivers a largely comparable experience in most scenarios. To ensure transparency, we have released all [52 test problems along with full agent trajectories](https://huggingface.co/datasets/zai-org/CC-Bench-trajectories) for industry validation and reproducibility. Description

## Usage **Core Capability:** Coding Skills → Intelligent code generation | Real-time code completion | Automated bug fixing * Supports major languages including Python, JavaScript, and Java. * Generates well-structured, scalable, high-quality code based on natural language instructions. * Focuses on real-world development needs, avoiding templated or generic outputs. **Use Case:** Complete refactoring-level tasks within 1 hour; generate full product prototypes in 5 minutes.

**Core Capabilities:** Agent Abilities → Autonomous task planning | Multi-tool orchestration | Dynamic environment interaction * Automatically decomposes complex tasks into clear, executable step-by-step plans. * Flexibly invokes development tools to complete coding, debugging, and validation in a one-stop workflow. * Dynamically adjusts strategies based on real-time feedback, quickly adapting to task changes and continuously optimizing execution paths. **Use Case:** In multi-module collaborative development projects, delivery cycles were shortened by 40%, and manpower investment was reduced by approximately 30%.

**Core Capability:** PPT Creation → Clear logic | Complete content | Effective visual presentation * **Content Expansion by Theme:** Generates multi-slide PPT content from a single title or central concept. * **Logical Structure Organization:** Automatically segments content into introduction, body, and conclusion modules with well-organized semantic flow. * **Slide Layout Suggestions:** Works with template systems to recommend optimal presentation styles for the generated content. **Use Case:** Suitable for office automation platforms, AI presentation tools, and other productivity-focused products. **Core Capability:** Model reasoning power → Precise instruction parsing | Multi-turn logical reasoning | Domain knowledge integration * **Deep Natural Language Understanding** – Accurately interprets natural language instructions, extracts key intents, and converts them into executable tasks. * **Complex Multi-Turn Reasoning** – Supports multi-step logical reasoning chains, efficiently handling composite problems involving cross-step dependencies and multiple variables. * **Domain Knowledge Fusion** – Integrates domain-specific expertise with contextual information to enhance reasoning accuracy and output stability. **Use Case:** In complex business workflows, accuracy improves by *60%*, and reasoning efficiency improves by *70%*. **Core Capabilities:** Translation Proficiency → Strong contextual consistency | Accurate style preservation | Excellent handling of long passages * **Long, Complex Sentence Translation:** Maintains semantic coherence and structural accuracy, ideal for policy and academic materials. * **Style Retention and Adaptation:** Preserves the original tone or adapts to the target language’s commonly used expression style during translation. * **Support for Low-Resource Languages and Informal Contexts:** Preliminary coverage of 26 languages, with capabilities to translate social and informal texts. **Use Cases:** Suitable for publishing house translations, content localization for overseas markets, cross-border customer service, and social media platforms. **Core Capability:** Creative Writing → Natural expression | Rich emotion | Complete structure * Generates coherent literary texts with clear narrative flow based on given themes, characters, or worldviews. * Produces emotionally engaging copy tailored to audience profiles and product characteristics. * Supports short videos and new media scripts aligned with platforms like Douyin and Xiaohongshu, integrating emotion control and narrative pacing. **Use Case:** Ideal for deployment in content creation platforms, marketing toolchains, or AI writing assistants to enhance content production efficiency and personalization. **Core Capability:** Humanized Expression → Natural tone | Accurate emotional conveyance | Consistent character behavior * **Role-Playing Dialogue System:** Maintains consistent tone and behavior of the designated character across multi-turn conversations. * **Emotionally Rich Copywriting:** Delivers warm, relatable expressions suitable for building “humanized” brands or companion-style user products. * **Virtual Persona Content Creation:** Supports generation of content aligned with virtual streamers or character IPs, including social posts and fan interactions. **Use Case:** Ideal for virtual humans, social AI, and brand personification operations. ## Resources * [API Documentation](/api-reference/llm/chat-completion): Learn how to call the API. ## Quick Start ### Thinking Mode GLM-4.5 offers a “Deep Thinking Mode” that users can enable or disable by setting the `thinking.type` parameter. This parameter supports two values: `enabled` (enabled) and `disabled` (disabled). By default, dynamic thinking is enabled. * **Simple Tasks (No Thinking Required):** For straightforward requests that do not require complex reasoning (e.g., fact retrieval or classification), thinking is unnecessary. Examples include: * When was Z.AI founded? * Translate the sentence “I love you” into Chinese. * **Moderate Tasks (Default/Some Thinking Required):** Many common requests require stepwise processing or deeper understanding. The GLM-4.5 series can flexibly apply thinking capabilities to handle tasks such as: * Why does Jupiter have more moons than Saturn, despite Saturn being larger? * Compare the advantages and disadvantages of flying versus taking the high-speed train from Beijing to Shanghai. **Difficult Tasks (Maximum Thinking Capacity):** For truly complex challenges—such as solving advanced math problems, network-related questions, or coding issues—these tasks require the model to fully engage its reasoning and planning abilities, often involving many internal steps before arriving at an answer. Examples include: * Explain in detail how different experts in a Mixture-of-Experts (MoE) model collaborate. * Based on the recent week’s fluctuations of the Shanghai Composite Index and current political information, should I invest in a stock index ETF? Why? ### Samples Code **Basic Call** ```bash theme={null} curl -X POST "https://api.z.ai/api/paas/v4/chat/completions" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer your-api-key" \ -d '{ "model": "glm-4.5", "messages": [ { "role": "user", "content": "As a marketing expert, please create an attractive slogan for my product." }, { "role": "assistant", "content": "Sure, to craft a compelling slogan, please tell me more about your product." }, { "role": "user", "content": "Z.AI Open Platform" } ], "thinking": { "type": "enabled" }, "max_tokens": 4096, "temperature": 0.6 }' ``` **Streaming Call** ```bash theme={null} curl -X POST "https://api.z.ai/api/paas/v4/chat/completions" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer your-api-key" \ -d '{ "model": "glm-4.5", "messages": [ { "role": "user", "content": "As a marketing expert, please create an attractive slogan for my product." }, { "role": "assistant", "content": "Sure, to craft a compelling slogan, please tell me more about your product." }, { "role": "user", "content": "Z.AI Open Platform" } ], "thinking": { "type": "enabled" }, "stream": true, "max_tokens": 4096, "temperature": 0.6 }' ``` **Install SDK** ```bash theme={null} # Install latest version pip install zai-sdk # Or specify version pip install zai-sdk==0.2.3 ``` **Verify Installation** ```python theme={null} import zai print(zai.__version__) ``` **Basic Call** ```python theme={null} from zai import ZaiClient client = ZaiClient(api_key="your-api-key") # Your API Key response = client.chat.completions.create( model="glm-4.5", messages=[ {"role": "user", "content": "As a marketing expert, please create an attractive slogan for my product."}, {"role": "assistant", "content": "Sure, to craft a compelling slogan, please tell me more about your product."}, {"role": "user", "content": "Z.AI Open Platform"} ], thinking={ "type": "enabled", }, max_tokens=4096, temperature=0.6 ) # Get complete response print(response.choices[0].message) ``` **Streaming Call** ```python theme={null} from zai import ZaiClient client = ZaiClient(api_key="your-api-key") # Your API Key response = client.chat.completions.create( model="glm-4.5", messages=[ {"role": "user", "content": "As a marketing expert, please create an attractive slogan for my product."}, {"role": "assistant", "content": "Sure, to craft a compelling slogan, please tell me more about your product."}, {"role": "user", "content": "Z.AI Open Platform"} ], thinking={ "type": "enabled", # Optional: "disabled" or "enabled", default is "enabled" }, stream=True, max_tokens=4096, temperature=0.6 ) # Stream response for chunk in response: if chunk.choices[0].delta.reasoning_content: print(chunk.choices[0].delta.reasoning_content, end='', flush=True) if chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end='', flush=True) ``` **Install SDK** **Maven** ```xml theme={null} ai.z.openapi zai-sdk 0.3.5 ``` **Gradle (Groovy)** ```groovy theme={null} implementation 'ai.z.openapi:zai-sdk:0.3.5' ``` **Basic Call** ```java theme={null} import ai.z.openapi.ZaiClient; import ai.z.openapi.service.model.ChatCompletionCreateParams; import ai.z.openapi.service.model.ChatCompletionResponse; import ai.z.openapi.service.model.ChatMessage; import ai.z.openapi.service.model.ChatMessageRole; import ai.z.openapi.service.model.ChatThinking; import java.util.Arrays; public class BasicChat { public static void main(String[] args) { // Initialize client ZaiClient client = ZaiClient.builder().ofZAI() .apiKey("your-api-key") .build(); // Create chat completion request ChatCompletionCreateParams request = ChatCompletionCreateParams.builder() .model("glm-4.5") .messages(Arrays.asList( ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content("As a marketing expert, please create an attractive slogan for my product.") .build(), ChatMessage.builder() .role(ChatMessageRole.ASSISTANT.value()) .content("Sure, to craft a compelling slogan, please tell me more about your product.") .build(), ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content("Z.AI Open Platform") .build() )) .thinking(ChatThinking.builder().type("enabled").build()) .maxTokens(4096) .temperature(0.6f) .build(); // Send request ChatCompletionResponse response = client.chat().createChatCompletion(request); // Get response if (response.isSuccess()) { Object reply = response.getData().getChoices().get(0).getMessage(); System.out.println("AI Response: " + reply); } else { System.err.println("Error: " + response.getMsg()); } } } ``` **Streaming Call** ```java theme={null} import ai.z.openapi.ZaiClient; import ai.z.openapi.service.model.ChatCompletionCreateParams; import ai.z.openapi.service.model.ChatCompletionResponse; import ai.z.openapi.service.model.ChatMessage; import ai.z.openapi.service.model.ChatMessageRole; import ai.z.openapi.service.model.ChatThinking; import ai.z.openapi.service.model.Delta; import java.util.Arrays; public class StreamingChat { public static void main(String[] args) { // Initialize client ZaiClient client = ZaiClient.builder().ofZAI() .apiKey("your-api-key") .build(); // Create streaming chat completion request ChatCompletionCreateParams request = ChatCompletionCreateParams.builder() .model("glm-4.5") .messages(Arrays.asList( ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content("As a marketing expert, please create an attractive slogan for my product.") .build(), ChatMessage.builder() .role(ChatMessageRole.ASSISTANT.value()) .content("Sure, to craft a compelling slogan, please tell me more about your product.") .build(), ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content("Z.AI Open Platform") .build() )) .thinking(ChatThinking.builder().type("enabled").build()) .stream(true) // Enable streaming output .maxTokens(4096) .temperature(0.6f) .build(); ChatCompletionResponse response = client.chat().createChatCompletion(request); if (response.isSuccess()) { response.getFlowable().subscribe( // Process streaming message data data -> { if (data.getChoices() != null && !data.getChoices().isEmpty()) { Delta delta = data.getChoices().get(0).getDelta(); System.out.print(delta + "\n"); }}, // Process streaming response error error -> System.err.println("\nStream error: " + error.getMessage()), // Process streaming response completion event () -> System.out.println("\nStreaming response completed") ); } else { System.err.println("Error: " + response.getMsg()); } } } ``` **Install SDK** ```bash theme={null} # Install or upgrade to latest version pip install --upgrade 'openai>=1.0' ``` **Verify Installation** ```python theme={null} python -c "import openai; print(openai.__version__)" ``` **Usage Example** ```python theme={null} from openai import OpenAI client = OpenAI( api_key="your-Z.AI-api-key", base_url="https://api.z.ai/api/paas/v4/" ) completion = client.chat.completions.create( model="glm-4.5", messages=[ {"role": "system", "content": "You are a smart and creative novelist"}, {"role": "user", "content": "Please write a short fairy tale story as a fairy tale master"} ] ) print(completion.choices[0].message.content) ```