Use the GLM-ASR-2512 model to transcribe audio files into text. It supports multiple languages and real-time streaming transcription.
The audio file to be transcribed. Supported audio file formats: .wav / .mp3. Specifications: file size ≤ 25 MB, audio duration ≤ 30 seconds.
The model ID to invoke: glm-asr-2512.
Base64-encoded audio file. Provide either file_base64 or file; if both are provided, file takes precedence.
For long-form transcription, you can provide previous transcription results as context. The recommended length is under 8000 characters.
A list of hotwords that improves recognition accuracy for domain-specific vocabulary. Format example: ["person_name","place_name"]. The recommended maximum is 100 items.
Whether to stream the response. Set this parameter to false or omit it for synchronous calls, in which case the model returns the full result at once after generation completes. Default is false. If set to true, the model returns generated content in chunks via a standard Event Stream, and a data: [DONE] message is sent when the stream ends.
A unique identifier for the request, supplied by the client. It must be unique across requests. If the client does not provide one, the platform generates it by default.
A unique ID for the end user. It helps the platform intervene in illegal activities, the generation of illegal or inappropriate content, and other abusive behavior by end users. Length: at least 6 and at most 128 characters.
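The request parameters above can be checked on the client before anything is sent. The following is a minimal sketch, not the platform's official client: only file_base64 and file are named in the text, so the other field names used here (model, prompt, hotwords, stream, request_id, user_id) and the helper build_transcription_payload are assumptions made for illustration. The 30-second duration limit is not checked, since that requires decoding the audio.

```python
import base64
import warnings

MAX_FILE_BYTES = 25 * 1024 * 1024    # documented file size limit (25 MB)
MAX_PROMPT_CHARS = 8000              # recommended context length
MAX_HOTWORDS = 100                   # recommended hotword count
ALLOWED_SUFFIXES = (".wav", ".mp3")  # documented audio formats


def build_transcription_payload(audio_bytes, filename, prompt=None,
                                hotwords=None, stream=False,
                                request_id=None, user_id=None):
    """Validate inputs against the documented limits and return a JSON-ready dict.

    Hypothetical helper: every field name except file_base64 is an assumption.
    """
    # Hard limits: reject the request locally.
    if not filename.lower().endswith(ALLOWED_SUFFIXES):
        raise ValueError("audio must be a .wav or .mp3 file")
    if len(audio_bytes) > MAX_FILE_BYTES:
        raise ValueError("audio file exceeds 25 MB")
    if user_id is not None and not 6 <= len(user_id) <= 128:
        raise ValueError("user ID must be 6 to 128 characters long")

    # Recommendations: warn instead of failing.
    if prompt is not None and len(prompt) >= MAX_PROMPT_CHARS:
        warnings.warn("context is 8000 characters or longer")
    if hotwords is not None and len(hotwords) > MAX_HOTWORDS:
        warnings.warn("hotword list has more than 100 items")

    payload = {
        "model": "glm-asr-2512",
        "file_base64": base64.b64encode(audio_bytes).decode("ascii"),
        "stream": bool(stream),
    }
    # Optional fields are omitted when unset; the platform generates a
    # request identifier itself if the client does not pass one.
    optional = {"prompt": prompt, "hotwords": hotwords,
                "request_id": request_id, "user_id": user_id}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

The returned dict can be serialized with json.dumps and sent with any HTTP client. The endpoint URL and authentication scheme are not given in this section, so they are left out here.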
Request processed successfully
Task ID
Request creation time, as a Unix timestamp in seconds.
The unique identifier of the request: the value passed by the client, or the one generated by the platform if the client did not provide one.
Model name
The complete transcribed content of the audio.
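The response fields and the Event Stream behavior described above can be handled along these lines. This is a sketch under assumptions: the JSON keys id, created, request_id, model, and text are guessed from the field descriptions, and the shape of each streamed chunk is not specified in this section, so iter_stream_events simply yields whatever JSON object each data: line carries and stops at data: [DONE].

```python
import json
from dataclasses import dataclass


@dataclass
class Transcription:
    id: str          # task ID
    created: int     # request creation time, Unix timestamp in seconds
    request_id: str  # client-supplied or platform-generated identifier
    model: str       # model name
    text: str        # complete transcribed content


def parse_transcription(body):
    """Parse a synchronous response body (key names are assumptions)."""
    data = json.loads(body)
    return Transcription(
        id=data["id"],
        created=int(data["created"]),
        request_id=data["request_id"],
        model=data["model"],
        text=data["text"],
    )


def iter_stream_events(lines):
    """Yield decoded JSON payloads from Event Stream lines until data: [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return    # end-of-stream marker
        yield json.loads(data)
```

For a streaming call, lines would be the decoded text lines of the HTTP response body; for a synchronous call, parse_transcription takes the whole body as a string.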