Stream Tool Call is a unique feature of Z.ai's latest GLM-4.6 model. It gives real-time access to the model's reasoning process, response content, and tool call information during tool invocation, providing faster feedback and a better user experience.
Features
Tool calling in the latest GLM-4.6 model now supports streaming output. When calling chat.completions, developers can receive tool call arguments incrementally as they are generated, without buffering the full payload or waiting for JSON validation, thereby reducing call latency and providing a better user experience.
Core Parameter Description
stream=True: Enable streaming output; must be set to True
tool_stream=True: Enable tool call streaming output
model: Use a model that supports tool calling; currently limited to glm-4.6
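Taken together, a request that enables streamed tool calls looks like the following skeleton (a minimal sketch only; the tool definition is abbreviated here, and the complete runnable example appears below):

from zai import ZaiClient

client = ZaiClient(api_key='Your API key')
response = client.chat.completions.create(
    model='glm-4.6',   # a model that supports tool calling
    messages=[{'role': 'user', 'content': "How's the weather in Beijing?"}],
    tools=[{'type': 'function', 'function': {
        'name': 'get_weather',  # abbreviated; full definition in the complete example
        'parameters': {'type': 'object', 'properties': {}},
    }}],
    stream=True,       # enable streaming output (required)
    tool_stream=True,  # additionally stream tool call arguments
)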
Response Parameter Description
The delta object in streaming responses contains the following fields:
reasoning_content: Text content of the model's reasoning process
content: Text content of the model's response
tool_calls: Tool call information, including the function name and its arguments
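As a rough picture of how these fields interleave across chunks (illustrative values only, following the OpenAI-style streaming convention that the example below relies on): the function name and call index arrive with the first tool_calls fragment, while the JSON arguments string arrives in pieces at the same index:

# Illustrative delta sequence (made-up values, not real API output):
#   chunk 1: delta.reasoning_content = "User asks about Beijing weather..."
#   chunk 2: delta.content           = "Let me look that up."
#   chunk 3: delta.tool_calls[0] -> index=0, function.name="get_weather",
#            function.arguments='{"loca'
#   chunk 4: delta.tool_calls[0] -> index=0, function.name=None,
#            function.arguments='tion": "Beijing"}'
# Concatenating the argument fragments per index yields the complete
# JSON string: '{"location": "Beijing"}'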
Code Example
By setting the tool_stream=True parameter, you can enable streaming tool call output:
Install SDK
# Install the latest version
pip install zai-sdk
# Or pin a specific version
pip install zai-sdk==0.0.4
Verify Installation
import zai
print(zai.__version__)
Complete Example
from zai import ZaiClient

# Initialize client
client = ZaiClient(api_key='Your API key')

# Create streaming tool call request
response = client.chat.completions.create(
    model="glm-4.6",  # Use a model that supports tool calling
    messages=[
        {"role": "user", "content": "How's the weather in Beijing?"},
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get current weather conditions for a specified location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "City, e.g.: Beijing, Shanghai"},
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                    },
                    "required": ["location"]
                }
            }
        }
    ],
    stream=True,      # Enable streaming output
    tool_stream=True  # Enable tool call streaming output
)

# Initialize variables to collect streaming data
reasoning_content = ""     # Reasoning process content
content = ""               # Response content
final_tool_calls = {}      # Tool call information, keyed by call index
reasoning_started = False  # Reasoning process start flag
content_started = False    # Content output start flag

# Process streaming response
for chunk in response:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta

    # Handle streaming reasoning process output
    if hasattr(delta, 'reasoning_content') and delta.reasoning_content:
        if not reasoning_started and delta.reasoning_content.strip():
            print("\n🧠 Thinking Process:")
            reasoning_started = True
        reasoning_content += delta.reasoning_content
        print(delta.reasoning_content, end="", flush=True)

    # Handle streaming response content output
    if hasattr(delta, 'content') and delta.content:
        if not content_started and delta.content.strip():
            print("\n\n💬 Response Content:")
            content_started = True
        content += delta.content
        print(delta.content, end="", flush=True)

    # Handle streaming tool call information
    if delta.tool_calls:
        for tool_call in delta.tool_calls:
            index = tool_call.index
            if index not in final_tool_calls:
                # First fragment of a new tool call: keep the whole object
                final_tool_calls[index] = tool_call
                final_tool_calls[index].function.arguments = tool_call.function.arguments
            else:
                # Subsequent fragments: append the argument text (streamed in pieces)
                final_tool_calls[index].function.arguments += tool_call.function.arguments

# Output final tool call information
if final_tool_calls:
    print("\n🔧 Function Calls Triggered:")
    for index, tool_call in final_tool_calls.items():
        print(f"{index}: Function Name: {tool_call.function.name}, Parameters: {tool_call.function.arguments}")
Use Cases
Intelligent Customer Service System: display query progress in real time and improve the waiting experience.
Code Assistant: show the code analysis process in real time and display the tool call chain.