Streaming messages let you receive content in real time while the model is still generating its response, instead of waiting for the full response to finish. This can significantly improve the user experience, especially for long outputs, because users see text begin to appear immediately.
Features
Streaming messages use an incremental generation mechanism: content is transmitted in chunks, in real time, as it is generated, rather than being returned all at once after the complete response is ready. This gives developers:
- Real-time Response: No need to wait for the complete response; content displays progressively
- Improved Experience: Reduce user waiting time, provide instant feedback
- Reduced Latency: Content is transmitted as it's generated, reducing perceived latency
- Flexible Processing: Real-time processing and display during reception
Core Parameters
- `stream=True`: Enables streaming output; must be set to `True`
- `model`: A model that supports streaming output, such as `glm-4.6` or `glm-4.5`
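For reference, a minimal request body sketch showing both parameters (the prompt text is illustrative):

```python
payload = {
    "model": "glm-4.6",  # a model that supports streaming output
    "messages": [
        {"role": "user", "content": "Hello!"}  # illustrative prompt
    ],
    "stream": True,  # must be True to enable streaming output
}
```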
Response Format
Streaming responses use the Server-Sent Events (SSE) format. Each event may contain:
- `choices[0].delta.content`: Incremental text content
- `choices[0].delta.reasoning_content`: Incremental reasoning content
- `choices[0].finish_reason`: Completion reason (only appears in the last chunk)
- `usage`: Token usage statistics (only appears in the last chunk)
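As a sketch, handling one decoded event might look like the following; `handle_chunk` is a hypothetical helper (not part of any SDK), and it assumes the event has already been parsed from JSON:

```python
def handle_chunk(chunk: dict) -> None:
    """Process one parsed streaming event (field names follow the SSE description above)."""
    choice = chunk["choices"][0]
    delta = choice.get("delta", {})

    # Incremental reasoning content, if the model emits it
    if delta.get("reasoning_content"):
        print(delta["reasoning_content"], end="", flush=True)

    # Incremental answer text
    if delta.get("content"):
        print(delta["content"], end="", flush=True)

    # Only the last chunk carries the finish reason and token usage
    if choice.get("finish_reason"):
        print(f"\n[finished: {choice['finish_reason']}]")
    if chunk.get("usage"):
        print(f"[usage: {chunk['usage']}]")
```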
Code Examples
The example below is in Python; an equivalent request can be made with cURL (using the `-N`/`--no-buffer` flag so output is not buffered) or any HTTP client that supports SSE.
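A minimal sketch using the `requests` library. The endpoint URL and authentication header shape here are assumptions based on the common OpenAI-compatible chat-completions convention; substitute your platform's actual values:

```python
import json

import requests

# Assumed endpoint and auth header shape -- check your platform's docs
URL = "https://open.bigmodel.cn/api/paas/v4/chat/completions"
API_KEY = "your-api-key"

payload = {
    "model": "glm-4.6",
    "messages": [{"role": "user", "content": "Write a short poem about the sea."}],
    "stream": True,  # enable streaming output
}

with requests.post(
    URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    stream=True,  # keep the HTTP connection open so chunks arrive as sent
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue  # skip SSE keep-alive blank lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # sentinel marking the end of the stream
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        if delta.get("content"):
            # display incremental text as soon as it arrives
            print(delta["content"], end="", flush=True)
print()  # final newline after the stream ends
```

The last event before `data: [DONE]` carries `finish_reason` and `usage`; each parsed chunk could also be passed to a handler like the sketch in the Response Format section above.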
Response Example
The streaming response format is as follows:
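An illustrative excerpt (the ID and token counts are invented for illustration; the field layout follows the SSE description above):

```
data: {"id":"chatcmpl-123","choices":[{"index":0,"delta":{"content":"Hello"}}]}

data: {"id":"chatcmpl-123","choices":[{"index":0,"delta":{"content":" there"}}]}

data: {"id":"chatcmpl-123","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":9,"completion_tokens":2,"total_tokens":11}}

data: [DONE]
```

Application Scenarios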
Chat Applications
- Real-time conversation experience
- Character-by-character reply display
- Reduced waiting time
Content Generation
- Article writing assistant
- Code generation tools
- Creative content creation
Educational Applications
- Online Q&A systems
- Learning assistance tools
- Knowledge Q&A platforms
Customer Service Systems
- Intelligent customer service bots
- Real-time problem solving
- User support systems