Speech to Text

POST /paas/v4/audio/transcriptions

Example request:
curl --request POST \
  --url https://api.z.ai/api/paas/v4/audio/transcriptions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form model=glm-asr-2512 \
  --form stream=false \
  --form file='@example-file'
Example response:
{
  "id": "<string>",
  "created": 123,
  "request_id": "<string>",
  "model": "<string>",
  "text": "<string>"
}
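If you are calling the endpoint from Python rather than curl, the following sketch builds the same multipart/form-data request using only the standard library. It constructs the request but does not send it. The function name build_transcription_request is hypothetical. The application/octet-stream type on the file part is an assumption; curl would instead infer a type from the file name.

```python
import urllib.request
import uuid

# URL taken from the curl example above.
API_URL = "https://api.z.ai/api/paas/v4/audio/transcriptions"


def build_transcription_request(api_key: str, audio: bytes, filename: str,
                                model: str = "glm-asr-2512") -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a multipart POST
    equivalent to the curl command."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text fields, as in the curl command (--form model=... --form stream=false).
    for name, value in (("model", model), ("stream", "false")):
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    # The audio file part. application/octet-stream is an assumption.
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n".encode()
        + audio
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            # The boundary must be declared in the Content-Type header.
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send the request, pass the result to urllib.request.urlopen and decode the JSON body it returns.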

Authorizations

Authorization
string
header
required

Use the following format for authentication: Bearer <token>

Body

multipart/form-data
file
file
required

The audio file to transcribe. Supported formats: .wav and .mp3. Limits: file size ≤ 25 MB and audio duration ≤ 30 seconds.
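A client-side pre-check can catch files that exceed these limits before they are uploaded. This sketch assumes that 25 MB means 25 * 1024 * 1024 bytes, which the page does not specify. It measures duration only for WAV files, using Python's wave module, and leaves MP3 duration unchecked. check_audio is a hypothetical helper.

```python
import io
import os
import wave

MAX_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" interpreted as MiB
MAX_SECONDS = 30.0
ALLOWED_EXTENSIONS = {".wav", ".mp3"}


def check_audio(filename: str, data: bytes) -> list[str]:
    """Hypothetical pre-check. Returns a list of problems; an empty list means OK."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension {ext!r}")
    if len(data) > MAX_BYTES:
        problems.append(f"file is {len(data)} bytes, limit is {MAX_BYTES}")
    # The standard library can read the duration only for WAV files;
    # MP3 would need a third-party decoder, so it is not checked here.
    if ext == ".wav":
        with wave.open(io.BytesIO(data), "rb") as w:
            seconds = w.getnframes() / w.getframerate()
        if seconds > MAX_SECONDS:
            problems.append(f"audio is {seconds:.1f} s, limit is {MAX_SECONDS:.0f} s")
    return problems
```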

model
enum<string>
default:glm-asr-2512
required

The model ID to invoke.

Available options:
glm-asr-2512
file_base64
string

Base64-encoded audio file. Provide either file_base64 or file. If both are provided, file takes precedence.
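The file_base64 field carries the audio bytes as Base64 text. The sketch below encodes raw bytes and emits only one of the two audio fields, so the request never depends on the precedence rule. It assumes plain Base64 without a data: URI prefix, which the page does not state. encode_audio and audio_field are hypothetical names.

```python
import base64


def encode_audio(data: bytes) -> str:
    """Hypothetical helper: Base64-encode raw audio bytes for file_base64.
    Assumes plain Base64 with no data: URI prefix."""
    return base64.b64encode(data).decode("ascii")


def audio_field(data: bytes, as_base64: bool) -> dict:
    """Hypothetical helper: return exactly one audio field, never both."""
    if as_base64:
        return {"file_base64": encode_audio(data)}
    return {"file": data}
```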

prompt
string

For long content transcribed across several requests, you can pass earlier transcription results as context. Keep the prompt under 8000 characters.
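Each request accepts at most 30 seconds of audio, so a longer recording has to be sent in segments. One way to use prompt is to carry forward the text already transcribed. The sketch below assumes the most recent text is the most useful context: it joins earlier results and keeps only the last 8000 characters. build_prompt is a hypothetical helper.

```python
MAX_PROMPT_CHARS = 8000  # recommended upper bound from the parameter description


def build_prompt(previous_segments: list[str], limit: int = MAX_PROMPT_CHARS) -> str:
    """Hypothetical helper: join earlier transcriptions and keep the last `limit` chars."""
    context = " ".join(s.strip() for s in previous_segments if s.strip())
    # Keep the tail, assuming the text nearest the next segment matters most.
    if len(context) > limit:
        return context[len(context) - limit:]
    return context
```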

hotwords
string[]

List of hotwords that improves recognition accuracy for domain-specific vocabulary such as names of people and places. Format example: ["person_name","place_name"]. Use no more than 100 items.

Maximum array length: 100
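The page does not say how a string[] value is encoded in a multipart form. The sketch below assumes it is sent as a JSON array string, following the format example in the hotwords description, and cleans, deduplicates, and caps the list before serializing it. hotwords_field is a hypothetical name. If the API instead expects one repeated form field per word, the serialization step would need to change.

```python
import json

MAX_HOTWORDS = 100  # maximum array length from the parameter description


def hotwords_field(words: list[str]) -> str:
    """Hypothetical helper: serialize hotwords as a JSON array string (assumed encoding)."""
    # Strip whitespace, drop empty entries, and remove duplicates while keeping order.
    cleaned = list(dict.fromkeys(w.strip() for w in words if w.strip()))
    if len(cleaned) > MAX_HOTWORDS:
        raise ValueError(f"{len(cleaned)} hotwords given; at most {MAX_HOTWORDS} are allowed")
    return json.dumps(cleaned, ensure_ascii=False)
```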
stream
boolean
default:false

Set to false, or omit, for a synchronous call: the model generates the whole transcription and returns it in a single response. If set to true, the model returns content incrementally as a standard event stream (server-sent events), and the stream ends with a data: [DONE] message.
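When stream is true, the response body is an event stream that ends with data: [DONE]. This page does not document the chunk payload schema. The sketch below therefore only decodes each data: line as JSON and stops at the terminator. The "delta" key in the test data is invented for illustration. It also assumes each event carries a single data: line, which simplifies the full server-sent events format. iter_stream_events is a hypothetical name.

```python
import json
from typing import Iterable, Iterator


def iter_stream_events(lines: Iterable[str]) -> Iterator[dict]:
    """Hypothetical helper: yield each data: payload as JSON until data: [DONE].
    Assumes one data: line per event."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end of stream
        yield json.loads(payload)
```

When reading from an HTTP response object, decode each line first, for example iter_stream_events(line.decode("utf-8") for line in resp).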

request_id
string

Unique identifier for the request, supplied by the client. If the client does not provide one, the platform generates it.
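If you want to supply request_id yourself, a random UUID is a simple way to meet the uniqueness requirement. The asr- prefix and the name new_request_id are hypothetical choices, not part of the API.

```python
import uuid


def new_request_id(prefix: str = "asr") -> str:
    """Hypothetical helper: random, practically collision-free request ID."""
    # uuid4().hex is 32 hexadecimal characters.
    return f"{prefix}-{uuid.uuid4().hex}"
```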

user_id
string

Unique ID of the end user. It helps the platform intervene in illegal activity, generation of illegal or inappropriate content, or other abuse by end users. Length: 6 to 128 characters.
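One way to meet the 6 to 128 character rule without sending raw personal identifiers is to hash an internal user ID. This is a design suggestion, not an API requirement. A SHA-256 hex digest is always 64 characters long, so it always falls within the allowed range. Both function names are hypothetical.

```python
import hashlib

MIN_LEN, MAX_LEN = 6, 128  # length limits from the parameter description


def validate_user_id(user_id: str) -> str:
    """Hypothetical helper: enforce the documented length limits."""
    if not MIN_LEN <= len(user_id) <= MAX_LEN:
        raise ValueError(f"user_id must be {MIN_LEN}-{MAX_LEN} characters, got {len(user_id)}")
    return user_id


def pseudonymous_user_id(internal_id: str) -> str:
    """Hypothetical helper: stable, opaque user_id derived from an internal ID."""
    # A SHA-256 hex digest is always 64 characters long.
    return validate_user_id(hashlib.sha256(internal_id.encode("utf-8")).hexdigest())
```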

Response

Request processed successfully

id
string

Task ID

created
integer<int64>

Request creation time, as a Unix timestamp in seconds.

request_id
string

Identifier of the request: the value supplied by the client, or the one generated by the platform if the client did not supply one.

model
string

Model name

text
string

The complete transcribed content of the audio.
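For a non-streaming call, the documented response fields map directly onto a small typed record. The sketch below parses the JSON body and converts created from Unix seconds to a timezone-aware UTC datetime. Transcription and parse_transcription are hypothetical names.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Transcription:
    """Hypothetical record mirroring the documented response fields."""
    id: str
    created: datetime
    request_id: str
    model: str
    text: str


def parse_transcription(body: str) -> Transcription:
    """Hypothetical helper: parse a non-streaming response body."""
    data = json.loads(body)
    return Transcription(
        id=data["id"],
        # created is a Unix timestamp in seconds.
        created=datetime.fromtimestamp(data["created"], tz=timezone.utc),
        request_id=data["request_id"],
        model=data["model"],
        text=data["text"],
    )
```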