POST /paas/v4/chat/completions
curl --request POST \
--url https://api.z.ai/api/paas/v4/chat/completions \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: application/json' \
--data '{
"model": "glm-4.6",
"messages": [
{
"role": "system",
"content": "You are a useful AI assistant."
},
{
"role": "user",
"content": "Please tell us about the development of artificial intelligence."
}
],
"temperature": 1,
"max_tokens": 65536,
"stream": false
}'
{
  "id": "<string>",
  "request_id": "<string>",
  "created": 123,
  "model": "<string>",
  "choices": [
    {
      "index": 123,
      "message": {
        "role": "assistant",
        "content": "<string>",
        "reasoning_content": "<string>",
        "tool_calls": [
          {
            "function": {
              "name": "<string>",
              "arguments": {}
            },
            "id": "<string>",
            "type": "<string>"
          }
        ]
      },
      "finish_reason": "<string>"
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 123,
    "prompt_tokens_details": {
      "cached_tokens": 123
    },
    "total_tokens": 123
  },
  "web_search": [
    {
      "title": "<string>",
      "content": "<string>",
      "link": "<string>",
      "media": "<string>",
      "icon": "<string>",
      "refer": "<string>",
      "publish_date": "<string>"
    }
  ]
}

Authorizations

Authorization
string
header
required

Use the following format for authentication: Bearer <your api key>

Headers

Accept-Language
enum<string>
default:en-US,en

Configures the desired response language for HTTP requests.

Available options:
en-US,en
Example:

"en-US,en"

Body

application/json
  • Text Model
  • Vision Model
model
enum<string>
default:glm-4.6
required

The model code to be called. GLM-4.6 is the latest flagship model series: foundational models designed specifically for agent applications.

Available options:
glm-4.6,
glm-4.5,
glm-4.5-air,
glm-4.5-x,
glm-4.5-airx,
glm-4.5-flash,
glm-4-32b-0414-128k
Example:

"glm-4.6"

messages
(User Message Β· object | System Message Β· object | Assistant Message Β· object | Tool Message Β· object)[]
required

The current conversation message list, provided as the model's prompt input in JSON array format, e.g., {"role": "user", "content": "Hello"}. Possible message types include system messages, user messages, assistant messages, and tool messages; a sketch combining them follows the list below. Note: the input must not consist of system messages or assistant messages only.

Minimum length: 1
  • User Message
  • System Message
  • Assistant Message
  • Tool Message
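
As a sketch, a minimal multi-turn messages array combining system, user, and assistant messages (tool messages follow the same pattern):

"messages": [
  {"role": "system", "content": "You are a helpful AI assistant."},
  {"role": "user", "content": "Hello"},
  {"role": "assistant", "content": "Hi! How can I help you today?"},
  {"role": "user", "content": "Please tell me about the development of artificial intelligence."}
]
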
request_id
string

Passed in by the client and must be unique; used to distinguish each request. If not provided, the platform generates one by default.

do_sample
boolean
default:true

When do_sample is true, the sampling strategy is enabled; when do_sample is false, sampling parameters such as temperature and top_p have no effect. The default value is true.

Example:

true
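
For example, to request near-deterministic output, disable sampling in the request body (a sketch; with sampling off, temperature and top_p are ignored):

{
  "model": "glm-4.6",
  "messages": [{"role": "user", "content": "Hello"}],
  "do_sample": false
}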

stream
boolean
default:false

Set this parameter to false, or omit it, for synchronous calls: the model returns all content at once after generation completes. The default value is false. If set to true, the model returns the generated content in chunks via a standard Event Stream; when the Event Stream ends, a data: [DONE] message is returned.

Example:

false
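
As a sketch, a streaming call and the resulting event stream might look like the following (the delta chunk payloads are illustrative, not verbatim server output):

curl --request POST \
--url https://api.z.ai/api/paas/v4/chat/completions \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: application/json' \
--data '{
"model": "glm-4.6",
"messages": [{"role": "user", "content": "Hello"}],
"stream": true
}'

# Each chunk arrives as a server-sent event:
# data: {"choices": [{"delta": {"content": "Hi"}}], ...}
# data: {"choices": [{"delta": {"content": " there"}}], ...}
# data: [DONE]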

thinking
object

Only supported by the GLM-4.5 series and later models. This parameter controls whether the model enables its chain of thought.
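
A sketch of the request fragment; the exact object shape is not spelled out above, so the type field and its "enabled" value are assumptions here:

"thinking": {"type": "enabled"}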

temperature
number
default:1

Sampling temperature controls the randomness of the output and must fall within the range [0.0, 1.0]. The GLM-4.6 series default is 1.0, the GLM-4.5 series default is 0.6, and the GLM-4-32B-0414-128K default is 0.75.

Required range: 0 <= x <= 1
Example:

1

top_p
number
default:0.95

Nucleus sampling, an alternative to temperature sampling; the value range is (0.0, 1.0]. The GLM-4.6 and GLM-4.5 series default is 0.95; the GLM-4-32B-0414-128K default is 0.9.

Required range: 0 <= x <= 1
Example:

0.95

max_tokens
integer

The maximum number of tokens for model output. The GLM-4.6 series supports up to 128K output tokens, the GLM-4.5 series up to 96K, the GLM-4.5V series up to 16K, and GLM-4-32B-0414-128K up to 16K.

Required range: 1 <= x <= 98304
Example:

1024

tool_stream
boolean
default:false

Whether to enable streaming responses for function calls. The default value is false. Only supported by GLM-4.6. Refer to Stream Tool Call.

Example:

false

tools
(Function Call Β· object | Retrieval Β· object | Web Search Β· object)[]

A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for; a sketch follows the list below. A maximum of 128 functions is supported.

  • Function Call
  • Retrieval
  • Web Search
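
As a sketch of a function tool definition paired with tool_choice (the get_weather function and its parameters are hypothetical; the type/function/parameters layout assumes the common OpenAI-compatible schema):

"tools": [
  {
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {
          "city": {"type": "string", "description": "City name"}
        },
        "required": ["city"]
      }
    }
  }
],
"tool_choice": "auto"
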
tool_choice
enum<string>

Controls how the model selects which function to call. This is only applicable when the tool type is function. The default value is auto, and only auto is currently supported.

Available options:
auto
stop
string[]

Stop word list. Generation stops when the model encounters any of the specified strings. Currently, only one stop word is supported, in the format ["stop_word1"].

Maximum length: 1
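
For example, to stop generation at a blank line (the stop string here is illustrative):

"stop": ["\n\n"]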
response_format
object

Specifies the response format of the model. Defaults to text. Two formats are supported: { "type": "text" }, plain text mode, which returns natural language text, and { "type": "json_object" }, JSON mode, which returns valid JSON data. When using JSON mode, it's recommended to explicitly request JSON output in the prompt.
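
A sketch of a JSON-mode request body (the prompt wording is illustrative):

{
  "model": "glm-4.6",
  "messages": [
    {"role": "user", "content": "List three milestones in AI history as JSON with fields year and event."}
  ],
  "response_format": {"type": "json_object"}
}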

user_id
string

Unique ID for the end user, 6-128 characters. Avoid using sensitive information.

Required string length: 6 - 128

Response

Processing successful

id
string

Task ID

request_id
string

Request ID

created
integer

Request creation time, Unix timestamp in seconds

model
string

Model name

choices
object[]

List of model responses

usage
object

Token usage statistics returned when the model call ends.

web_search
object[]

Search results.
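
As a usage sketch, the assistant's reply can be pulled out of a synchronous response with jq (assuming jq is installed):

curl --request POST \
--url https://api.z.ai/api/paas/v4/chat/completions \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: application/json' \
--data '{"model": "glm-4.6", "messages": [{"role": "user", "content": "Hello"}]}' \
| jq -r '.choices[0].message.content'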