GLM-5.2 - Overview - Z.AI DEVELOPER DOCUMENT

Overview

GLM-5.2 is a flagship model built for the era of long-horizon tasks. With truly usable 1M-token context, it has been tested to handle project-scale engineering context, delivering more stable long-task execution, more reliable adherence to engineering standards, and higher success rates in development scenarios. A single task can complete the full development workflow—from requirements to deployable products across multiple platforms.

Positioning

Flagship Foundation Model

Input Modalities

Text

Output Modalitie

Text

Context Length

Maximum Output Tokens

128K

Capability

Thinking Mode

Offering multiple thinking modes for different scenarios

Streaming Output

Support real-time streaming responses to enhance user interaction experience

Function Call

Powerful tool invocation capabilities, enabling integration with various external toolsets

Context Caching

Intelligent caching mechanism to optimize performance in long conversations

Structured Output

Support for structured output formats like JSON, facilitating system integration

MCP

Flexibly integrate external MCP tools and data sources to expand application scenarios

Usage

Mobile Development

GLM-5.2 has a deeper understanding of client-side architecture, streaming messages, long-lived connection state, mobile interactions, and local state management. In mobile scenarios such as Android, it can leverage ADB, logcat, screenshots, and runtime logs to locate issues, continuously completing the debugging loop from code implementation, build and install, to on-device verification—closer to a real mobile engineering workflow.Recommended way to experience it: Choose a real Android or mini-program task, and let the model go from implementation to verification:

Please implement a native Android client in Kotlin that integrates with the existing server-side API, supporting multi-session, streaming messages, voice input, notifications, and reconnection on disconnect. After completion, install it on a real device using ADB, and complete debugging with logcat and screenshots.

Project-Level Engineering Takeover

Let GLM-5.2 take over a medium-to-large project, comb through module boundaries, core call chains, architectural constraints, technical debt and risk points, and plan a subsequent refactoring path. Thanks to stronger long-context capacity, the model can continuously retain architectural boundaries, interface contracts, and historical decisions throughout the session, greatly reducing context gaps in the later stages of complex tasks.Recommended way to experience it: Select a real repository containing backend, frontend, configuration, tests, and documentation, and let GLM-5.2 produce a complete project technical inventory:

Please read the current project and produce a system architecture map, core module responsibilities, key interface contracts, main data flows, potential risk points, and the engineering constraints that must be followed during subsequent refactoring.

Cross-Module Root Cause Analysis

The Debug value of GLM-5.2 is reflected in link-level investigation. It can trace the source of a problem across files, services, and call chains, and further check for similar defects and related impact surfaces. It is suitable for handling complex issues such as interface inconsistency, state desynchronization, message loss, configuration drift, and version adaptation failures.Recommended way to experience it: Give the model real logs, reproduction paths, related service code, configuration, and interface definitions, and let it analyze before fixing:

Please trace the root cause of the problem along the call chain, explaining which modules, configurations, interfaces, or data flows are involved. Please determine whether there are similar risks, and provide a minimal fix plan, verification steps, and a regression checklist.

Production-Grade Standards Stress Testing

GLM-5.2 has a higher degree of adherence to engineering standards, especially in long-context and multi-round execution. It is better able to comply with code style, architectural boundaries, dependency constraints, build processes, testing requirements, and commit boundaries, reducing the risk of out-of-scope modifications, invalid dependencies, skipped verification, and unauthorized commits.Recommended way to experience it: Hand the model the team’s real standards, such as CLAUDE.md, lint rules, build commands, testing requirements, commit conventions, and a list of prohibited operations. Then give it a real modification task:

Please strictly follow the current repository’s engineering standards. Do not introduce new dependencies, do not modify interface contracts, and do not proactively commit. After the modification is complete, run the build, lint, and tests, and explain the verification results and any uncovered risks.

Introducing GLM-5.2

1M Context: Making Long-Horizon Tasks Stable and Practical

The foundation of long-horizon tasks is not having a 1M context, but making 1M context truly usable. GLM-5.2 delivers a Solid 1M lossless context and has undergone months of specialized training for long-horizon Coding Agent scenarios, covering high-value tasks such as large-scale implementation, automated research, and performance optimization.Compared to solutions that merely extend context length, GLM-5.2 maintains more stable performance at ultra-long context, even surpassing Opus in select real-world benchmarks.GLM-5.2 delivers state-of-the-art long-horizon coding performance among open-source models. Across FrontierSWE, PostTrainBench, and SWE-Marathon, it consistently ranks among the top models overall—trailing Opus 4.8 by just 1% on FrontierSWE, outperforming GPT-5.5 and Opus 4.7 on multiple benchmarks, and remaining the highest-ranked open-source model across all three. These results demonstrate that GLM-5.2’s 1M context window translates into practical long-horizon engineering capability. Description

Coding Capabilities Validated by Both Benchmarks and Developers

On standard coding benchmarks, GLM-5.2 is the strongest open-source model, improving on GLM-5.1 by a wide margin: 81.0 vs. 62.0 on Terminal-Bench 2.1 and 62.1 vs. 58.4 on SWE-bench Pro. It also closes much of the gap to the closed-source frontier — on Terminal-Bench 2.1 (81.0) it lands within a few points of Claude Opus 4.8 (85.0) — while staying ahead of Gemini 3.1 Pro. Description

Before its official release, GLM-5.2 was made available in advance to GLM Coding Plan users. Developers reported improvements mainly in the following areas:

Stronger project-level context capacity, enabling an entire codebase to be placed within a single reasoning workflow;
More stable long-horizon task execution, allowing complex tasks to progress continuously without easily going off track;
More reliable adherence to production-grade engineering standards, helping enforce hard constraints in team development workflows;
Stronger client-side and mobile engineering capabilities, going beyond app generation to support a complete on-device debugging loop.

↗ Blog

Resources

API Documentation: Learn how to call the API.

Quick Start

The following is a full sample code to help you onboard GLM-5.2 with ease.

cURL
Official Python SDK
Official Java SDK
OpenAI Python SDK

Basic Call

curl -X POST "https://api.z.ai/api/paas/v4/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
  "model": "glm-5.2",
  "messages": [
    {
      "role": "system",
      "content": "You are a senior full-stack software engineer, proficient in frontend development, backend architecture design, and modern web technology stacks."
    },
    {
      "role": "user",
      "content": "Design and build a personal blog website for me, including a homepage, article list page, and article detail page, using React + Node.js technology stack."
    }
  ],
  "thinking": {
    "type": "enabled"
  },
  "reasoning_effort": "max",
  "max_tokens": 4096,
  "temperature": 1.0
}'

Streaming Call

curl -X POST "https://api.z.ai/api/paas/v4/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
  "model": "glm-5.2",
  "messages": [
    {
      "role": "system",
      "content": "You are a senior full-stack software engineer, proficient in frontend development, backend architecture design, and modern web technology stacks."
    },
    {
      "role": "user",
      "content": "Design and build a personal blog website for me, including a homepage, article list page, and article detail page, using React + Node.js technology stack."
    }
  ],
  "thinking": {
    "type": "enabled"
  },
  "reasoning_effort": "max",
  "stream": true,
  "max_tokens": 4096,
  "temperature": 1.0
}'

Install SDK

# Install latest version
pip install zai-sdk

# Or specify version
pip install zai-sdk==0.2.3

Verify Installation

import zai

print(zai.__version__)

Basic Call

from zai import ZaiClient

client = ZaiClient(api_key="your-api-key")  # Your API Key

response = client.chat.completions.create(
    model="glm-5.2",
    messages=[
        {
            "role": "system",
            "content": "You are a senior full-stack software engineer, proficient in frontend development, backend architecture design, and modern web technology stacks.",
        },
        {
            "role": "user",
            "content": "Design and build a personal blog website for me, including a homepage, article list page, and article detail page, using React + Node.js technology stack.",
        },
    ],
    thinking={
        "type": "enabled",
    },
    reasoning_effort="max",
    max_tokens=4096,
    temperature=1.0,
)

# Get complete response
print(response.choices[0].message)

Streaming Call

from zai import ZaiClient

client = ZaiClient(api_key="your-api-key")  # Your API Key

response = client.chat.completions.create(
    model="glm-5.2",
    messages=[
        {
            "role": "system",
            "content": "You are a senior full-stack software engineer, proficient in frontend development, backend architecture design, and modern web technology stacks.",
        },
        {
            "role": "user",
            "content": "Design and build a personal blog website for me, including a homepage, article list page, and article detail page, using React + Node.js technology stack.",
        },
    ],
    thinking={
        "type": "enabled",  # Optional: "disabled" or "enabled", default is "enabled"
    },
    reasoning_effort="max",
    stream=True,
    max_tokens=4096,
    temperature=0.6,
)

# Stream response
for chunk in response:
    if chunk.choices[0].delta.reasoning_content:
        print(chunk.choices[0].delta.reasoning_content, end="", flush=True)

    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Install SDKMaven

<dependency>
    <groupId>ai.z.openapi</groupId>
    <artifactId>zai-sdk</artifactId>
    <version>0.3.5</version>
</dependency>

Gradle (Groovy)

implementation 'ai.z.openapi:zai-sdk:0.3.5'

Basic Call

import ai.z.openapi.ZaiClient;
import ai.z.openapi.service.model.ChatCompletionCreateParams;
import ai.z.openapi.service.model.ChatCompletionResponse;
import ai.z.openapi.service.model.ChatMessage;
import ai.z.openapi.service.model.ChatMessageRole;
import ai.z.openapi.service.model.ChatThinking;
import java.util.Arrays;

public class BasicChat {
    public static void main(String[] args) {
        // Initialize client
        ZaiClient client = ZaiClient.builder().ofZAI().apiKey("your-api-key").build();

        // Create chat completion request
        ChatCompletionCreateParams request = ChatCompletionCreateParams.builder()
            .model("glm-5.2")
            .messages(
                Arrays.asList(
                    ChatMessage.builder()
                        .role(ChatMessageRole.SYSTEM.value())
                        .content(
                            "You are a senior full-stack software engineer, proficient in frontend development, backend architecture design, and modern web technology stacks.")
                        .build(),
                    ChatMessage.builder()
                        .role(ChatMessageRole.USER.value())
                        .content(
                            "Design and build a personal blog website for me, including a homepage, article list page, and article detail page, using React + Node.js technology stack.")
                        .build()))
            .thinking(ChatThinking.builder().type("enabled").build())
            .reasoningEffort("max")
            .maxTokens(4096)
            .temperature(1.0f)
            .build();

        // Send request
        ChatCompletionResponse response = client.chat().createChatCompletion(request);

        // Get response
        if (response.isSuccess()) {
            Object reply = response.getData().getChoices().get(0).getMessage();
            System.out.println("AI Response: " + reply);
        } else {
            System.err.println("Error: " + response.getMsg());
        }
    }
}

Streaming Call

import ai.z.openapi.ZaiClient;
import ai.z.openapi.service.model.ChatCompletionCreateParams;
import ai.z.openapi.service.model.ChatCompletionResponse;
import ai.z.openapi.service.model.ChatMessage;
import ai.z.openapi.service.model.ChatMessageRole;
import ai.z.openapi.service.model.ChatThinking;
import ai.z.openapi.service.model.Delta;
import java.util.Arrays;

public class StreamingChat {
    public static void main(String[] args) {
        // Initialize client
        ZaiClient client = ZaiClient.builder().ofZAI().apiKey("your-api-key").build();

        // Create streaming chat completion request
        ChatCompletionCreateParams request = ChatCompletionCreateParams.builder()
            .model("glm-5.2")
            .messages(
                Arrays.asList(
                    ChatMessage.builder()
                        .role(ChatMessageRole.SYSTEM.value())
                        .content(
                            "You are a senior full-stack software engineer, proficient in frontend development, backend architecture design, and modern web technology stacks.")
                        .build(),
                    ChatMessage.builder()
                        .role(ChatMessageRole.USER.value())
                        .content(
                            "Design and build a personal blog website for me, including a homepage, article list page, and article detail page, using React + Node.js technology stack.")
                        .build()))
            .thinking(ChatThinking.builder().type("enabled").build())
            .reasoningEffort("max")
            .stream(true) // Enable streaming output
            .maxTokens(4096)
            .temperature(1.0f)
            .build();

        ChatCompletionResponse response = client.chat().createChatCompletion(request);

        if (response.isSuccess()) {
            response.getFlowable()
                .subscribe(
                    // Process streaming message data
                    data -> {
                        if (data.getChoices() != null && !data.getChoices().isEmpty()) {
                            Delta delta = data.getChoices().get(0).getDelta();
                            System.out.print(delta + "\n");
                        }
                    },
                    // Process streaming response error
                    error -> System.err.println("\nStream error: " + error.getMessage()),
                    // Process streaming response completion event
                    () -> System.out.println("\nStreaming response completed"));
        } else {
            System.err.println("Error: " + response.getMsg());
        }
    }
}

Install SDK

# Install or upgrade to latest version
pip install --upgrade 'openai>=1.0'

Verify Installation

python -c "import openai; print(openai.__version__)"

Usage Example

from openai import OpenAI

client = OpenAI(
    api_key="your-Z.AI-api-key",
    base_url="https://api.z.ai/api/paas/v4/",
)

completion = client.chat.completions.create(
    model="glm-5.2",
    messages=[
        {"role": "system", "content": "You are a senior full-stack software engineer, proficient in frontend development, backend architecture design, and modern web technology stacks."},
        {
            "role": "user",
            "content": "Design and build a personal blog website for me, including a homepage, article list page, and article detail page, using React + Node.js technology stack.",
        },
    ],
)

print(completion.choices[0].message.content)

​Overview

Positioning

Input Modalities

Output Modalitie

Context Length

Maximum Output Tokens

​Capability

Thinking Mode

Streaming Output

Function Call

Context Caching

Structured Output

MCP

​Usage

​Introducing GLM-5.2

1M Context: Making Long-Horizon Tasks Stable and Practical

Coding Capabilities Validated by Both Benchmarks and Developers

​Resources

​Quick Start

Overview

Capability

Usage

Introducing GLM-5.2

Resources

Quick Start