Streaming messages deliver content in real time as the model generates a response, instead of waiting for the complete response to finish. This can significantly improve user experience, especially for long outputs, since users see text begin to appear immediately.

Features

Streaming messages use an incremental generation mechanism: content is transmitted in chunks as it is generated, rather than returned all at once after the complete response is finished. This mechanism allows developers to provide:
  • Real-time Response: Content displays progressively, with no need to wait for the full response
  • Improved Experience: Users get instant feedback instead of a long wait
  • Reduced Latency: Content is transmitted as it is generated, lowering perceived latency
  • Flexible Processing: Chunks can be processed and displayed as they arrive

Core Parameter Description

  • stream=True: Enables streaming output; must be set to True
  • model: A model that supports streaming output, such as glm-4.6 or glm-4.5

Response Format Description

Streaming responses use Server-Sent Events (SSE) format, with each event containing:
  • choices[0].delta.content: Incremental text content
  • choices[0].delta.reasoning_content: Incremental reasoning content
  • choices[0].finish_reason: Completion reason (only appears in the last chunk)
  • usage: Token usage statistics (only appears in the last chunk)
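For illustration, a minimal Python sketch of pulling these fields out of one chunk, assuming each SSE `data:` payload has already been JSON-decoded into a dict (the helper name `extract_delta` is our own, not part of any SDK):

```python
import json

def extract_delta(chunk: dict) -> dict:
    """Pull the incremental fields out of one decoded streaming chunk."""
    choice = chunk["choices"][0]
    delta = choice.get("delta", {})
    return {
        "content": delta.get("content"),
        "reasoning_content": delta.get("reasoning_content"),
        "finish_reason": choice.get("finish_reason"),  # non-null only in the last chunk
        "usage": chunk.get("usage"),                   # present only in the last chunk
    }

# Example chunk in the documented format
raw = '{"id":"1","created":1677652288,"model":"glm-4.6","choices":[{"index":0,"delta":{"content":"Spring"},"finish_reason":null}]}'
fields = extract_delta(json.loads(raw))
```

Fields that are absent from a given chunk simply come back as `None`, so the same helper works for intermediate and final chunks.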

Code Examples

curl --location 'https://api.z.ai/api/paas/v4/chat/completions' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "model": "glm-4.6",
    "messages": [
        {
            "role": "user",
            "content": "Write a poem about spring"
        }
    ],
    "stream": true
}'
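The same request can be sketched in Python using only the standard library (an assumption for illustration; the official SDK may offer a higher-level interface). It POSTs the payload with `"stream": true` and parses the `data:` lines of the SSE stream:

```python
import json
import urllib.request

API_URL = "https://api.z.ai/api/paas/v4/chat/completions"

def iter_content_deltas(lines):
    """Yield incremental text from an iterable of raw SSE lines."""
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":  # stream terminator
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if delta.get("content"):
            yield delta["content"]

def stream_chat(api_key: str, prompt: str, model: str = "glm-4.6"):
    """Send a streaming chat request and print text as it arrives."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode("utf-8")
    req = urllib.request.Request(
        API_URL,
        data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    # The response object is file-like, so iterating yields one line at a time
    with urllib.request.urlopen(req) as resp:
        for text in iter_content_deltas(resp):
            print(text, end="", flush=True)
```

Calling `stream_chat("YOUR_API_KEY", "Write a poem about spring")` prints the poem progressively as chunks arrive, instead of after the full response completes.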

Response Example

The streaming response format is as follows:
data: {"id":"1","created":1677652288,"model":"glm-4.6","choices":[{"index":0,"delta":{"content":"Spring"},"finish_reason":null}]}

data: {"id":"1","created":1677652288,"model":"glm-4.6","choices":[{"index":0,"delta":{"content":" comes"},"finish_reason":null}]}

data: {"id":"1","created":1677652288,"model":"glm-4.6","choices":[{"index":0,"delta":{"content":" with"},"finish_reason":null}]}

...

data: {"id":"1","created":1677652288,"model":"glm-4.6","choices":[{"index":0,"finish_reason":"stop","delta":{"role":"assistant","content":""}}],"usage":{"prompt_tokens":8,"completion_tokens":262,"total_tokens":270,"prompt_tokens_details":{"cached_tokens":0}}}

data: [DONE]

Application Scenarios

Chat Applications

  • Real-time conversation experience
  • Character-by-character reply display
  • Reduced waiting time

Content Generation

  • Article writing assistant
  • Code generation tools
  • Creative content creation

Educational Applications

  • Online Q&A systems
  • Learning assistance tools
  • Knowledge Q&A platforms

Customer Service Systems

  • Intelligent customer service bots
  • Real-time problem solving
  • User support systems