This guide explains how to migrate your calls from GLM-4.5 or other earlier models to Z.AI GLM-4.6, our most powerful coding model to date, covering sampling parameter differences, streaming tool calls, and other key points.
GLM-4.6 Features
- Support for larger context and output: Maximum context 200K, maximum output 128K.
- New support for streaming output during the tool calling process (`tool_stream=true`), allowing real-time retrieval of tool call arguments.
- As with the GLM-4.5 series, supports deep thinking (`thinking={ type: "enabled" }`).
- Superior code performance and advanced reasoning capabilities.
Migration Checklist
- Update the model identifier to `glm-4.6`
- Sampling parameters: `temperature` defaults to 1.0, `top_p` defaults to 0.95; tune only one of the two
- Deep thinking: enable or disable `thinking={ type: "enabled" }` as needed for complex reasoning/coding tasks
- Streaming responses: enable `stream=true` and properly handle `delta.reasoning_content` and `delta.content`
- Streaming tool calls: enable `stream=true` and `tool_stream=true`, and concatenate the streamed `delta.tool_calls[*].function.arguments`
- Maximum output and context: set `max_tokens` appropriately (GLM-4.6 supports up to 128K output and a 200K context)
- Prompt optimization: pair prompts with deep thinking; use clearer instructions and constraints
- Development environment verification: run use-case and regression tests, focusing on randomness, latency, and parameter completeness in tool streams
Start Migration
1. Update Model Identifier
- Update `model` to `glm-4.6`.
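As a minimal sketch, the identifier change is a one-field update to the request payload. The payload below assumes an OpenAI-compatible chat-completions shape; the message content is illustrative.

```python
# Minimal chat-completions payload: only the "model" field changes
# when migrating from GLM-4.5 to GLM-4.6.
payload = {
    "model": "glm-4.6",  # previously "glm-4.5"
    "messages": [
        {"role": "user", "content": "Write a binary search in Python."},
    ],
}
print(payload["model"])  # glm-4.6
```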
2. Update Sampling Parameters
- `temperature`: controls randomness; higher values produce more divergent output, lower values more stable output.
- `top_p`: controls nucleus sampling; higher values expand the candidate set, lower values narrow it.
- `temperature` defaults to 1.0 and `top_p` defaults to 0.95; adjusting both simultaneously is not recommended.
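The "tune only one" recommendation can be sketched as a small helper (a hypothetical convenience, not part of any SDK) that starts from the GLM-4.6 defaults and refuses to override both knobs at once:

```python
# GLM-4.6 default sampling parameters, per this guide.
DEFAULTS = {"temperature": 1.0, "top_p": 0.95}

def sampling_params(temperature=None, top_p=None):
    """Return sampling parameters, overriding at most one knob."""
    if temperature is not None and top_p is not None:
        raise ValueError("Tune either temperature or top_p, not both.")
    params = dict(DEFAULTS)
    if temperature is not None:
        params["temperature"] = temperature
    if top_p is not None:
        params["top_p"] = top_p
    return params

print(sampling_params(temperature=0.6))  # {'temperature': 0.6, 'top_p': 0.95}
```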
3. Deep Thinking (Optional)
- GLM-4.6 continues to support deep thinking capability, enabled by default.
- Recommended to enable for complex reasoning and coding tasks.
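A sketch of a request with deep thinking enabled; the `thinking` field follows the shape shown in this guide, and the rest of the payload is illustrative:

```python
# Chat-completions payload with deep thinking enabled.
payload = {
    "model": "glm-4.6",
    "messages": [
        {"role": "user", "content": "Prove that sqrt(2) is irrational."},
    ],
    "thinking": {"type": "enabled"},  # use {"type": "disabled"} to turn it off
}
```

With `stream=true`, the intermediate reasoning arrives in `delta.reasoning_content` and the final answer in `delta.content`, so handle the two fields separately.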
4. Streaming Output and Tool Calls (Optional)
- GLM-4.6 newly supports real-time streaming construction and output during the tool calling process. This is disabled by default (`False`) and requires enabling both:
  - `stream=True`: enables streaming output for responses
  - `tool_stream=True`: enables streaming output for tool call arguments
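Client-side, the streamed argument fragments in `delta.tool_calls[*].function.arguments` must be concatenated per tool-call index until the stream ends. The sketch below uses simplified dicts in place of real SDK stream events:

```python
# Simplified stand-ins for streamed deltas: the name arrives once,
# the JSON arguments arrive as fragments to be concatenated.
chunks = [
    {"tool_calls": [{"index": 0, "function": {"name": "get_weather", "arguments": ""}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '{"city": "Bei'}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": 'jing"}'}}]},
]

calls = {}  # tool-call index -> {"name": ..., "arguments": accumulated string}
for delta in chunks:
    for tc in delta.get("tool_calls", []):
        entry = calls.setdefault(tc["index"], {"name": "", "arguments": ""})
        fn = tc["function"]
        if fn.get("name"):
            entry["name"] = fn["name"]
        entry["arguments"] += fn.get("arguments", "")

print(calls[0]["arguments"])  # {"city": "Beijing"}
```

Only parse the accumulated string as JSON after the stream finishes; intermediate fragments are generally not valid JSON on their own.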
5. Testing and Regression
First verify in a development environment that post-migration calls are stable, focusing on:
- Whether responses meet expectations, and whether output is overly random or overly conservative
- Whether streaming tool-call construction and output work as expected
- Latency and cost in long-context and deep-thinking scenarios
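A minimal smoke test along these lines can be scripted. Here `call_model` is a hypothetical stand-in for your actual GLM-4.6 API call (the sleep simulates network latency); swap in the real call when running against the API:

```python
import time

def call_model(prompt):
    """Hypothetical stand-in for the real GLM-4.6 API call."""
    time.sleep(0.01)  # simulate network latency
    return "def binary_search(arr, x): ..."

def regression_check(prompt, max_latency_s=30.0):
    """Post-migration smoke test: non-empty output within a latency budget."""
    start = time.monotonic()
    output = call_model(prompt)
    latency = time.monotonic() - start
    assert output, "empty response after migration"
    assert latency < max_latency_s, f"latency {latency:.1f}s exceeds budget"
    return latency

regression_check("Write a binary search in Python.")
```

Run such checks several times per prompt: repeated runs surface excessive randomness, while identical runs at different `max_tokens` settings surface truncation issues.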