Stream Tool Call is a feature of Z.ai's latest GLM-4.6 model. It provides real-time access to the reasoning process, response content, and tool call information while a tool is being invoked, giving users faster feedback and a better experience.

Features

Tool calling in the latest GLM-4.6 model now supports streaming output. When calling chat.completions, developers can stream tool call parameters as they are generated, without waiting for buffering or JSON validation, which reduces call latency and provides a better user experience.

Core Parameter Description

  • stream=True: Enables streaming output; required and must be set to True
  • tool_stream=True: Enables streaming output for tool calls
  • model: A model that supports streaming tool calls; currently limited to glm-4.6

Response Parameter Description

The delta object in streaming responses contains the following fields:
  • reasoning_content: Text content of the model’s reasoning process
  • content: Text content of the model’s response
  • tool_calls: Tool call information, including function names and parameters
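
Each of these fields arrives in fragments that the client accumulates into the full reasoning text, response text, and tool call arguments. As a minimal sketch of that accumulation, using simple stand-in objects rather than the SDK's actual chunk types (the field names mirror the delta fields documented above, but the fragment contents here are illustrative):

```python
from types import SimpleNamespace

def make_delta(reasoning=None, content=None, args=None):
    """Build a stand-in delta; real deltas come from the SDK's streaming response."""
    tool_calls = None
    if args is not None:
        fn = SimpleNamespace(name="get_weather", arguments=args)
        tool_calls = [SimpleNamespace(index=0, function=fn)]
    return SimpleNamespace(reasoning_content=reasoning, content=content, tool_calls=tool_calls)

# A pretend stream: reasoning first, then content, then argument fragments
stream = [
    make_delta(reasoning="User wants Beijing weather."),
    make_delta(content="Let me check."),
    make_delta(args='{"loc'),
    make_delta(args='ation": "Beijing"}'),
]

reasoning, content, arguments = "", "", ""
for delta in stream:
    if delta.reasoning_content:
        reasoning += delta.reasoning_content
    if delta.content:
        content += delta.content
    if delta.tool_calls:
        for call in delta.tool_calls:
            arguments += call.function.arguments

print(arguments)  # the concatenated fragments form complete JSON
```

Note that tool call arguments are only guaranteed to be valid JSON after the stream ends; individual fragments (like '{"loc' above) are not parseable on their own.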

Code Example

Setting the tool_stream=True parameter enables streaming tool calls:
Install SDK
# Install latest version
pip install zai-sdk

# Or specify version
pip install zai-sdk==0.0.4
Verify Installation
import zai
print(zai.__version__)
Complete Example
from zai import ZaiClient

# Initialize client
client = ZaiClient(api_key='Your API key')

# Create streaming tool call request
response = client.chat.completions.create(
    model="glm-4.6",  # Use model that supports tool calling
    messages=[
        {"role": "user", "content": "How's the weather in Beijing?"},
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get current weather conditions for a specified location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "City, e.g.: Beijing, Shanghai"},
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                    },
                    "required": ["location"]
                }
            }
        }
    ],
    stream=True,        # Enable streaming output
    tool_stream=True    # Enable tool call streaming output
)

# Initialize variables to collect streaming data
reasoning_content = ""      # Reasoning process content
content = ""               # Response content
final_tool_calls = {}      # Tool call information
reasoning_started = False  # Reasoning process start flag
content_started = False    # Content output start flag

# Process streaming response
for chunk in response:
    if not chunk.choices:
        continue

    delta = chunk.choices[0].delta

    # Handle streaming reasoning process output
    if hasattr(delta, 'reasoning_content') and delta.reasoning_content:
        if not reasoning_started and delta.reasoning_content.strip():
            print("\n🧠 Thinking Process:")
            reasoning_started = True
        reasoning_content += delta.reasoning_content
        print(delta.reasoning_content, end="", flush=True)

    # Handle streaming response content output
    if hasattr(delta, 'content') and delta.content:
        if not content_started and delta.content.strip():
            print("\n\nπŸ’¬ Response Content:")
            content_started = True
        content += delta.content
        print(delta.content, end="", flush=True)

    # Handle streaming tool call information
    if hasattr(delta, 'tool_calls') and delta.tool_calls:
        for tool_call in delta.tool_calls:
            index = tool_call.index
            if index not in final_tool_calls:
                # First fragment of a new tool call
                final_tool_calls[index] = tool_call
            else:
                # Append argument fragments (streamed incrementally)
                final_tool_calls[index].function.arguments += tool_call.function.arguments

# Output final tool call information
if final_tool_calls:
    print("\nπŸ“‹ Function Calls Triggered:")
    for index, tool_call in final_tool_calls.items():
        print(f"  {index}: Function Name: {tool_call.function.name}, Parameters: {tool_call.function.arguments}")
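
Once the stream ends, each accumulated arguments string is complete JSON and can be parsed and dispatched to your own implementation of the tool. A minimal sketch of that step, where the get_weather handler and its return shape are illustrative, not part of the SDK:

```python
import json

def get_weather(location, unit="celsius"):
    """Illustrative local handler; a real one would query a weather service."""
    return {"location": location, "temperature": 25, "unit": unit}

# Map tool names to local handlers (an assumed application-side convention)
handlers = {"get_weather": get_weather}

# Stand-ins for one accumulated tool call after the stream has finished
name = "get_weather"
arguments = '{"location": "Beijing"}'

# Parse the completed JSON arguments and invoke the matching handler
result = handlers[name](**json.loads(arguments))
print(result)
```

In a full round trip, you would typically append the handler's result to the conversation as a tool message and make a second chat.completions request so the model can produce its final answer from the tool output.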

Use Cases

Intelligent Customer Service System

  • Real-time display of query progress
  • Improve waiting experience

Code Assistant

  • Real-time code analysis process
  • Display tool call chain