Stream Tool Call is a feature of Z.ai's latest model, GLM-4.6, that provides real-time access to the reasoning process, response content, and tool call information during tool invocation, giving users immediate feedback instead of a silent wait.
Features
Tool calling in the latest GLM-4.6 model now supports streaming output. When calling chat.completions, developers can stream tool call parameters as they are generated, instead of buffering the full response and validating the complete JSON first, which reduces perceived latency and improves the user experience.
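In practice this means the arguments of a tool call arrive as raw JSON fragments spread across several chunks, and the accumulated string only parses once the stream completes. A minimal sketch of that accumulation, using made-up fragments rather than actual API output:

```python
import json

# Hypothetical argument fragments as they might arrive across chunks
fragments = ['{"loc', 'ation": "Bei', 'jing", "unit": "celsius"}']

arguments = ""
for fragment in fragments:
    arguments += fragment  # append each partial piece as it streams in
    # the partial string is typically not yet valid JSON, so defer parsing

print(json.loads(arguments))  # parse only once the stream is complete
```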
Core Parameter Description
stream=True : Enables streaming output; must be set to True
tool_stream=True : Enables streaming output for tool calls
model : A model that supports tool calling; currently limited to glm-4.6
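Taken together, a minimal request enabling both flags might look like this sketch (the API key is a placeholder and the tools list is elided; the full tool schema appears in the Complete Example below):

```python
from zai import ZaiClient

client = ZaiClient(api_key='Your API Key')  # placeholder key

response = client.chat.completions.create(
    model="glm-4.6",   # model that supports tool calling
    messages=[{"role": "user", "content": "How's the weather in Beijing?"}],
    tools=[...],       # tool definitions go here; see the Complete Example below
    stream=True,       # enable streaming output (required)
    tool_stream=True,  # enable tool call streaming output
)
```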
Response Parameter Description
The delta object in streaming responses contains the following fields:
reasoning_content : Text content of the model's reasoning process
content : Text content of the model's response
tool_calls : Tool call information, including function names and arguments
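For orientation, the helper below prints whichever of these fields a given chunk carries. It is a sketch based on the field list above; the note that a function name tends to appear only in a call's first chunk, while arguments arrive as fragments to be concatenated by index, reflects common streaming-API behavior rather than a documented guarantee:

```python
def describe_delta(delta) -> None:
    # Reasoning and answer text stream in as incremental fragments
    if getattr(delta, 'reasoning_content', None):
        print('reasoning fragment:', delta.reasoning_content)
    if getattr(delta, 'content', None):
        print('content fragment:', delta.content)
    # Tool calls are identified by index; the name usually arrives first,
    # followed by chunks carrying pieces of the JSON arguments
    for tool_call in getattr(delta, 'tool_calls', None) or []:
        print('tool call', tool_call.index,
              tool_call.function.name, tool_call.function.arguments)
```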
Code Examples
By setting the tool_stream=True parameter, you can enable streaming tool call functionality:
Install SDK

```bash
# Install the latest version
pip install zai-sdk

# Or pin a specific version
pip install zai-sdk==0.0.4
```
Verify Installation

```python
import zai

print(zai.__version__)
```
Complete Example

```python
from zai import ZaiClient

# Initialize client
client = ZaiClient(api_key='Your API Key')

# Create streaming tool call request
response = client.chat.completions.create(
    model="glm-4.6",  # Use a model that supports tool calling
    messages=[
        {"role": "user", "content": "How's the weather in Beijing?"},
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get current weather conditions for a specified location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "City, e.g.: Beijing, Shanghai"},
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                    },
                    "required": ["location"]
                }
            }
        }
    ],
    stream=True,      # Enable streaming output
    tool_stream=True  # Enable tool call streaming output
)

# Initialize variables to collect streaming data
reasoning_content = ""     # Reasoning process content
content = ""               # Response content
final_tool_calls = {}      # Tool call information, keyed by index
reasoning_started = False  # Reasoning process start flag
content_started = False    # Content output start flag

# Process streaming response
for chunk in response:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta

    # Handle streaming reasoning process output
    if hasattr(delta, 'reasoning_content') and delta.reasoning_content:
        if not reasoning_started and delta.reasoning_content.strip():
            print("\n🧠 Thinking Process:")
            reasoning_started = True
        reasoning_content += delta.reasoning_content
        print(delta.reasoning_content, end="", flush=True)

    # Handle streaming response content output
    if hasattr(delta, 'content') and delta.content:
        if not content_started and delta.content.strip():
            print("\n\n💬 Response Content:")
            content_started = True
        content += delta.content
        print(delta.content, end="", flush=True)

    # Handle streaming tool call information
    if delta.tool_calls:
        for tool_call in delta.tool_calls:
            index = tool_call.index
            if index not in final_tool_calls:
                # New tool call: keep the first chunk, which carries the function name
                final_tool_calls[index] = tool_call
            else:
                # Append argument fragments (streamed JSON construction)
                final_tool_calls[index].function.arguments += tool_call.function.arguments

# Output final tool call information
if final_tool_calls:
    print("\n📋 Function Calls Triggered:")
    for index, tool_call in final_tool_calls.items():
        print(f"{index}: Function Name: {tool_call.function.name}, Arguments: {tool_call.function.arguments}")
```
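Once the stream ends, a typical next step is to parse the accumulated arguments, run the tool, and send the result back so the model can produce its final answer. The sketch below assumes an OpenAI-compatible tool message format (an assistant message echoing the tool call, then a "tool" message carrying the result and tool_call_id); get_weather here is a stand-in for your own implementation, so check the Z.ai API reference for the authoritative message fields:

```python
import json

def get_weather(location, unit="celsius"):
    # Placeholder implementation; swap in a real weather lookup
    return json.dumps({"location": location, "temperature": 25, "unit": unit})

if final_tool_calls:
    tool_call = final_tool_calls[0]
    args = json.loads(tool_call.function.arguments)  # valid JSON only after the stream ends
    result = get_weather(**args)

    # Second round trip: return the tool result and stream the final answer
    follow_up = client.chat.completions.create(
        model="glm-4.6",
        messages=[
            {"role": "user", "content": "How's the weather in Beijing?"},
            {"role": "assistant", "tool_calls": [{
                "id": tool_call.id,
                "type": "function",
                "function": {"name": tool_call.function.name,
                             "arguments": tool_call.function.arguments},
            }]},
            {"role": "tool", "tool_call_id": tool_call.id, "content": result},
        ],
        stream=True,
    )
    for chunk in follow_up:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
```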
Application Scenarios
Intelligent Customer Service
- Real-time query progress display
- Improved waiting experience
Code Assistant
- Real-time view of the code analysis process
- Display of tool call chains