Audio Transcriptions

Speech to Text

curl --request POST \
  --url https://api.z.ai/api/paas/v4/audio/transcriptions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form model=glm-asr-2512 \
  --form stream=false \
  --form file='@example-file'

{
  "id": "<string>",
  "created": 123,
  "request_id": "<string>",
  "model": "<string>",
  "text": "<string>"
}
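For readers calling the endpoint from code rather than the command line, the following is a minimal sketch of the same request built with the Python standard library. The helper names `encode_multipart` and `build_request` are hypothetical, and the `application/octet-stream` content type on the file part is an assumption; only the URL, the form fields, and the Bearer header come from the curl example above.

```python
import uuid
import urllib.request

# URL taken from the curl example.
API_URL = "https://api.z.ai/api/paas/v4/audio/transcriptions"


def encode_multipart(fields, filename, file_bytes, boundary=None):
    """Encode text fields plus one file part as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        parts.append(f"{value}\r\n".encode())
    parts.append(f"--{boundary}\r\n".encode())
    parts.append(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    # Assumption: a generic binary content type is acceptable for the file part.
    parts.append(b"Content-Type: application/octet-stream\r\n\r\n")
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(token, filename, file_bytes):
    """Build (but do not send) the POST request shown in the curl example."""
    body, content_type = encode_multipart(
        {"model": "glm-asr-2512", "stream": "false"}, filename, file_bytes
    )
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": content_type},
    )
```

Passing the result to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be inspected without network access.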

POST https://api.z.ai/api/paas/v4/audio/transcriptions

Authorizations

Authorization (string, header, required)

Use the following format for authentication: Bearer <token>

Body

multipart/form-data

file (required unless file_base64 is provided)

The audio file to transcribe. Supported formats: .wav and .mp3. Limits: file size ≤ 25 MB, audio duration ≤ 30 seconds.
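The limits above can be checked locally before uploading. This is a rough sketch, not part of the API: the function names are hypothetical, "25 MB" is assumed to mean 25 × 1024 × 1024 bytes (the page does not say which definition applies), and the duration helper only handles .wav files because the standard library has no MP3 decoder.

```python
import os
import wave

MAX_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" read as 25 MiB
MAX_SECONDS = 30
ALLOWED_EXTENSIONS = {".wav", ".mp3"}


def check_audio_file(path):
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension {ext!r}")
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, limit is {MAX_BYTES}")
    return problems


def wav_duration_seconds(path):
    """Duration of a PCM .wav file, computed from its frame count and rate."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()
```

A caller might reject the file when `check_audio_file` returns anything, or when `wav_duration_seconds` exceeds `MAX_SECONDS`.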

model (enum<string>, required, default: glm-asr-2512)

The model ID to invoke. Available options: glm-asr-2512

file_base64 (string)

Base64-encoded audio file. Provide either file or file_base64; if both are provided, file takes precedence.
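A small sketch of producing the file_base64 value and of the stated precedence rule. It assumes the standard Base64 alphabet with no data-URI prefix, which the page does not specify; both function names are hypothetical.

```python
import base64


def encode_audio_base64(audio_bytes):
    """Return raw audio bytes as standard Base64 text."""
    return base64.b64encode(audio_bytes).decode("ascii")


def choose_audio_field(file_bytes=None, file_b64=None):
    """Mirror the documented precedence: file wins when both are given."""
    if file_bytes is not None:
        return "file", file_bytes
    if file_b64 is not None:
        return "file_base64", file_b64
    raise ValueError("provide either file or file_base64")
```

The second helper only decides which form field a client would send; the server applies the same precedence on its side according to the description above.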

prompt (string)

In long-text scenarios, you can provide previous transcription results as context. Keeping the prompt under 8000 characters is recommended.
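One way to stay under the recommended length is to keep only the most recent part of the earlier transcription. Keeping the tail rather than the head is an assumption about what makes useful context, and `build_prompt` is a hypothetical helper name.

```python
MAX_PROMPT_CHARS = 8000  # the recommendation is to stay below this


def build_prompt(previous_text, limit=MAX_PROMPT_CHARS - 1):
    """Keep at most `limit` trailing characters of earlier transcription output."""
    if limit <= 0:
        return ""
    # Slicing from the end returns the whole string when it is already short enough.
    return previous_text[-limit:]
```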

hotwords (string[])

A list of hotwords that improves recognition accuracy for domain-specific vocabulary. Example format: ["person_name","place_name"]. Maximum array length: 100.
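The sketch below cleans a hotword list and enforces the 100-item cap before a request is built. Serializing the list as JSON text mirrors the format example above, but how the array should be encoded inside a multipart body is not stated on this page, so treat `hotwords_as_json` as an assumption; both function names are hypothetical.

```python
import json

MAX_HOTWORDS = 100


def prepare_hotwords(words):
    """Strip whitespace, drop blanks and duplicates (keeping order), enforce the cap."""
    seen = set()
    cleaned = []
    for word in words:
        word = word.strip()
        if word and word not in seen:
            seen.add(word)
            cleaned.append(word)
    if len(cleaned) > MAX_HOTWORDS:
        raise ValueError(f"{len(cleaned)} hotwords given, maximum is {MAX_HOTWORDS}")
    return cleaned


def hotwords_as_json(words):
    """Assumed wire format: compact JSON array, as in the format example."""
    return json.dumps(prepare_hotwords(words), ensure_ascii=False, separators=(",", ":"))
```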

stream (boolean, default: false)

Set to false or omit for synchronous calls, in which case the model returns the full result at once after generation completes. If set to true, the model returns the generated content in chunks as a standard event stream, which ends with a data: [DONE] message.
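When stream is true, a client has to read the event stream line by line and stop at the data: [DONE] sentinel. The sketch below shows that loop. It assumes each data: payload is a JSON object, which is common for this kind of API but is not confirmed here, and it makes no claim about the fields inside each chunk. `iter_stream_chunks` is a hypothetical name.

```python
import json


def iter_stream_chunks(lines):
    """Yield the decoded payload of each data: line until data: [DONE] is seen."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, and other event-stream fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        # Assumption: the payload is JSON.
        yield json.loads(payload)
```

`lines` can be any iterable of text or byte lines, for example an open HTTP response object that is iterated line by line.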

request_id (string)

A client-supplied identifier that must be unique for each request. If omitted, the platform generates one.
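If the client supplies its own request_id, a random UUID is one simple way to meet the uniqueness requirement. The page does not prescribe a format, so the 32-character hex form used here is only an assumption, and `new_request_id` is a hypothetical helper.

```python
import uuid


def new_request_id():
    """Return a random 32-character hexadecimal identifier."""
    return uuid.uuid4().hex
```

Logging this value alongside the response makes it easier to match a transcription to the request that produced it.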

user_id (string)

A unique ID for the end user, which helps the platform intervene in illegal activity, generation of illegal or inappropriate content, and other abuse by end users. Length: 6 to 128 characters.
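The length rule can be enforced on the client before sending. The second helper shows one possible way to derive a stable ID without exposing an internal account identifier; hashing is a design choice made for this sketch and is not something the API asks for. Both function names are hypothetical.

```python
import hashlib

MIN_USER_ID_LEN = 6
MAX_USER_ID_LEN = 128


def validate_user_id(user_id):
    """Return user_id unchanged if its length is within the documented bounds."""
    if not MIN_USER_ID_LEN <= len(user_id) <= MAX_USER_ID_LEN:
        raise ValueError(
            f"user_id must be {MIN_USER_ID_LEN} to {MAX_USER_ID_LEN} characters, "
            f"got {len(user_id)}"
        )
    return user_id


def pseudonymous_user_id(internal_id):
    """Derive a stable 64-character ID from an internal identifier (assumed approach)."""
    return validate_user_id(hashlib.sha256(internal_id.encode("utf-8")).hexdigest())
```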

Response

Request processed successfully

id (string)

Task ID.

created (integer<int64>)

Request creation time, as a Unix timestamp in seconds.

request_id (string)

The unique identifier of the request: the value passed by the client, or one generated by the platform if the client did not provide it.

model (string)

Model name.

text (string)

The complete transcribed content of the audio.
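The response fields listed above map naturally onto a small typed record. The sketch below parses a non-streaming response body and converts created from Unix seconds to a timezone-aware datetime; the class and function names are hypothetical, and error responses are not handled because their shape is not described here.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Transcription:
    id: str
    created: datetime
    request_id: str
    model: str
    text: str


def parse_transcription(raw_json):
    """Parse a successful, non-streaming response body into a Transcription."""
    data = json.loads(raw_json)
    return Transcription(
        id=data["id"],
        # created is documented as a Unix timestamp in seconds.
        created=datetime.fromtimestamp(data["created"], tz=timezone.utc),
        request_id=data["request_id"],
        model=data["model"],
        text=data["text"],
    )
```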
