# Agent Chat

Source: https://docs.z.ai/api-reference/agents/agent

POST /v1/agents

General Translation: The General Translation API provides large model-based multilingual translation services, including general translation, paraphrase translation, two-step translation, and three-pass translation strategies. It supports automatic language detection, glossary customization, translation suggestions, and streaming output. Users only need to call the Translation API, input the text to be processed, and specify the source language (auto-detection supported) and target language to receive high-quality translation results.

Popular Special Effects Videos: Popular special effects videos are intelligent templates based on trending features from pan-entertainment platforms, designed for short video creative production. Currently, three effect templates are available: `French Kiss`, `Body Shake Dance`, and `Sexy Me`. After selecting a template, users only need to upload an image and enter the corresponding prompts to generate a special effects video.

GLM Slide/Poster Agent: An intelligent creation agent built for working people and creators. It goes beyond traditional engineering-style assembly tools, supporting one-click generation of slides or posters from natural language instructions. By natively integrating content generation with layout aesthetics and design conventions, it helps you quickly produce polished, professional-grade materials while lowering design barriers and boosting creative efficiency.

# Conversation History

Source: https://docs.z.ai/api-reference/agents/agent-conversation

POST /v1/agents/conversation

This endpoint is used to query the agent conversation history. Only `slides_glm_agent` is supported.

# File Upload

Source: https://docs.z.ai/api-reference/agents/file-upload

POST /paas/v4/files

This API is designed for uploading auxiliary files (such as glossaries and terminology lists) to support the translation service.
It allows users to upload reference materials that can enhance translation accuracy and consistency.

## File Limitations

* Maximum 100 files
* Maximum 100MB per file
* Files retained for 180 days
* Supported formats: pdf, doc, xlsx, ppt, txt, jpg, png

# Retrieve Result

Source: https://docs.z.ai/api-reference/agents/get-async-result

POST /v1/agents/async-result

This endpoint is used to query the result of an asynchronous request.

# Errors

Source: https://docs.z.ai/api-reference/api-code

When calling the API, the response code consists of two parts: the outer layer is the HTTP status code, and the inner layer is the business error code defined by Z.AI in the response body, which provides a more detailed error description.

## HTTP Status Code

| Code | Reason | Solution |
| :--- | :--- | :--- |
| 200 | Business processing successful | - |
| 400 | Parameter error | Check if the interface parameters are correct |
| 400 | File content anomaly | Check if the jsonl file content meets the requirements |
| 401 | Authentication failure or Token timeout | Confirm if the API KEY and authentication token are correctly generated |
| 404 | Fine-tuning feature not available | Contact customer service to activate this feature |
| 404 | Fine-tuning task does not exist | Ensure the fine-tuning task ID is correct |
| 429 | Interface request concurrency exceeded | Adjust the request frequency or contact business to increase concurrency |
| 429 | File upload frequency too fast | Wait briefly and then request again |
| 429 | Account balance exhausted | Recharge the account to ensure sufficient balance |
| 429 | Account anomaly | Account has violation, please contact platform customer service to unlock |
| 429 | Terminal account anomaly | Terminal user has violation, account has been locked |
| 434 | No API permission, fine-tuning API and file management API are in beta phase, we will open soon | Wait for the interface to be officially open or contact platform customer service to apply for beta |
| 435 | File size exceeds 100MB | Use a jsonl file smaller than 100MB or upload in batches |
| 500 | Server error occurred while processing the request | Try again later or contact customer service |

## Business Error Codes

| Error Category | Code | Error Message |
| :--- | :--- | :--- |
| Basic Error | 500 | Internal Error |
| Authentication Error | 1000 | Authentication Failed |
| | 1001 | Authentication parameter not received in Header, unable to authenticate |
| | 1002 | Invalid Authentication Token, please confirm the correct transmission of the Authentication Token |
| | 1003 | Authentication Token expired, please regenerate/obtain |
| | 1004 | Authentication failed with the provided Authentication Token |
| | 1100 | Account Read/Write |
| Account Error | 1110 | Your account is currently inactive. Please check your account information |
| | 1111 | Your account does not exist |
| | 1112 | Your account has been locked, please contact customer service to unlock |
| | 1113 | Your account is in arrears, please recharge and try again |
| | 1120 | Unable to successfully access your account, please try again later |
| | 1121 | Account has irregular activities and has been locked |
| API Call Error | 1200 | API Call Error |
| | 1210 | Incorrect API call parameters, please check the documentation |
| | 1211 | Model does not exist, please check the model code |
| | 1212 | Current model does not support `${method}` call method |
| | 1213 | `${field}` parameter not received properly |
| | 1214 | Invalid `${field}` parameter. Please check the documentation |
| | 1215 | `${field1}` and `${field2}` cannot be set simultaneously, please check the documentation |
| | 1220 | You do not have permission to access `${API_name}` |
| | 1221 | API `${API_name}` has been taken offline |
| | 1222 | API `${API_name}` does not exist |
| | 1230 | API call process error |
| | 1231 | You already have a request: `${request_id}` |
| | 1234 | Network error, error id: `${error_id}`, please contact customer service |
| | 1261 | Prompt exceeds max length |
| API Policy Block Error | 1300 | API call blocked by policy |
| | 1301 | System detected potentially unsafe or sensitive content in input or generation. Please avoid using prompts that may generate sensitive content. Thank you for your cooperation. |
| | 1302 | High concurrency usage of this API, please reduce concurrency or contact customer service to increase limits |
| | 1303 | High frequency usage of this API, please reduce frequency or contact customer service to increase limits |
| | 1304 | Daily call limit for this API reached. For more requests, please contact customer service to purchase |
| | 1305 | The API has triggered a rate limit. |
| | 1308 | Usage limit reached for . Your limit will reset at `${next_flush_time}` |
| | 1309 | Your GLM Coding Plan package has expired and is temporarily unavailable. You can resume using it after renewing the subscription on the official website. [https://z.ai/subscribe](https://z.ai/subscribe) |
| | 1310 | Weekly/Monthly Limit Exhausted. Your limit will reset at `${next_flush_time}` |
| | 1311 | Your current subscription plan does not yet include access to `${model_name}` |
| | 1312 | This model is currently experiencing high traffic. Please try again later, or switch to another model such as `${model_name}` |
| | 1313 | Your usage violates the Fair Use Policy. Your request rate has been restricted. See Subscription Service Agreement for details. To restore access, go to Personal Center → My Subscription and request to lift the restriction |

## Error Shapes

Errors are always returned as JSON, with a top-level error object that includes a `code` and `message` value.

```json theme={null}
{
  "error": {
    "code": "1214",
    "message": "Input cannot be empty"
  }
}
```

## Error Example

The following is the response message of a curl request, where 401 is the HTTP status code and 1002 is the business error code.

```
* We are completely uploaded and fine
< HTTP/2 401
< date: Wed, 20 Mar 2024 03:06:05 GMT
< content-type: application/json
< set-cookie: acw_tc=76b20****a0e42;path=/;HttpOnly;Max-Age=1800
< server: nginx/1.21.6
< vary: Origin
< vary: Access-Control-Request-Method
< vary: Access-Control-Request-Headers
<
* Connection #0 to host open.z.ai left intact
{"error":{"code":"1002","message":"Authorization Token is invalid, please ensure that the Authorization Token is correctly provided."}}
```

> **Note**: When using streaming (SSE) calls, if the API terminates abnormally during inference, the above error codes will not be returned. Instead, the reason for the exception will be provided in the `finish_reason` parameter of the response body. For details, please refer to the description of the `finish_reason` parameter.

# Audio Transcriptions

Source: https://docs.z.ai/api-reference/audio/audio-transcriptions

POST /paas/v4/audio/transcriptions

Use the [GLM-ASR-2512](/guides/audio/glm-asr-2512) model to transcribe audio files into text, supporting multiple languages and real-time streaming transcription.

# Generate Image

Source: https://docs.z.ai/api-reference/image/generate-image

POST /paas/v4/images/generations

Use [GLM-Image](/guides/image/glm-image) series models to generate high-quality images from text prompts. Through quick and accurate understanding of user text descriptions, `AI` image expression becomes more precise and personalized.
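Any of the endpoints above can fail with the error shape described in the Errors section. The following is a minimal sketch of how a client might read the business error code out of a response body and decide whether a retry is worthwhile. The `ApiError` class, the `parse_error` helper, and the choice of which codes count as retryable are assumptions made for illustration; they are not part of any Z.AI SDK.

```python
import json
from dataclasses import dataclass

# Business codes that the Errors table describes as concurrency, frequency,
# rate-limit, or high-traffic conditions. Treating exactly these as retryable
# is an assumption of this sketch, not a documented rule.
RETRYABLE_CODES = {"1302", "1303", "1305", "1312"}


@dataclass
class ApiError:
    http_status: int
    code: str
    message: str

    @property
    def retryable(self) -> bool:
        # HTTP 500 is listed with the advice "Try again later".
        return self.http_status == 500 or self.code in RETRYABLE_CODES


def parse_error(http_status: int, body: str) -> ApiError:
    """Extract the business code and message from an error response body.

    Falls back to empty strings when the body is not the documented
    {"error": {"code": ..., "message": ...}} JSON shape.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        error = {}
    return ApiError(
        http_status=http_status,
        code=str(error.get("code", "")),
        message=str(error.get("message", "")),
    )


if __name__ == "__main__":
    sample = '{"error":{"code":"1002","message":"Authorization Token is invalid"}}'
    err = parse_error(401, sample)
    print(err.code, err.retryable)
```

Keeping the HTTP status and the business code side by side mirrors the two-layer response code described in the Errors section.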
# Generate Image(Async)

Source: https://docs.z.ai/api-reference/image/generate-image-async

POST /paas/v4/async/images/generations

Use the [GLM-Image](/guides/image/glm-image) series models to generate high-quality images from text prompts. Through quick and accurate understanding of user text descriptions, `AI` image expression becomes more precise and personalized. Only the `GLM-Image` model is supported.

# Retrieve Result

Source: https://docs.z.ai/api-reference/image/get-image-status

GET /paas/v4/async-result/{id}

This endpoint is used to query the result of an asynchronous request.

# Introduction

Source: https://docs.z.ai/api-reference/introduction

The API reference describes the RESTful APIs you can use to interact with the Z.AI platform. Z.AI provides standard HTTP API interfaces that support multiple programming languages and development environments, with [SDKs](/guides/develop/python/introduction) also available.

## API Endpoint

The Z.AI platform's general API endpoint is as follows:

```
https://api.z.ai/api/paas/v4
```

> **Note**: When using the [GLM Coding Plan](/devpack/overview), you need to configure the dedicated Coding endpoint, [https://api.z.ai/api/coding/paas/v4](https://api.z.ai/api/coding/paas/v4), instead of the general endpoint, [https://api.z.ai/api/paas/v4](https://api.z.ai/api/paas/v4).

> **Note**: The GLM Coding Plan API endpoint is intended for [supported tools](https://docs.z.ai/devpack/tool/others#step-1-supported-tools) only. For other use cases, we recommend using the general API endpoint.

## Authentication

The Z.AI API uses standard **HTTP Bearer** authentication. An API key is required, which you can create or manage on the [API Keys Page](https://z.ai/manage-apikey/apikey-list).

Provide the API key via HTTP Bearer authentication in the HTTP request headers.

```
Authorization: Bearer ZAI_API_KEY
```

## Playground

The API Playground allows developers to quickly try out API calls.
Simply click **Try it** on the API details page to get started.

* On the API details page, there are many interactive options, such as **switching input types**, **switching tabs**, and **adding new content**.
* You can click **Add an item** or **Add new property** to add more properties that the API needs.
* **Note** that after switching tabs, you need to re-enter or re-select the previous property values.

## Call Examples

### cURL

```bash theme={null}
curl -X POST "https://api.z.ai/api/paas/v4/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Accept-Language: en-US,en" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "glm-5.1",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful AI assistant."
      },
      {
        "role": "user",
        "content": "Hello, please introduce yourself."
      }
    ],
    "temperature": 1.0,
    "stream": true
  }'
```

### Official Python SDK

**Install SDK**

```bash theme={null}
# Install latest version
pip install zai-sdk

# Or specify version
pip install zai-sdk==0.2.2
```

**Verify Installation**

```python theme={null}
import zai

print(zai.__version__)
```

**Usage Example**

```python theme={null}
from zai import ZaiClient

# Initialize client
client = ZaiClient(api_key="YOUR_API_KEY")

# Create chat completion request
response = client.chat.completions.create(
    model="glm-5.1",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful AI assistant."
        },
        {
            "role": "user",
            "content": "Hello, please introduce yourself."
        }
    ]
)

# Get response
print(response.choices[0].message.content)
```

### Official Java SDK

**Install SDK**

**Maven**

```xml theme={null}
<dependency>
    <groupId>ai.z.openapi</groupId>
    <artifactId>zai-sdk</artifactId>
    <version>0.3.3</version>
</dependency>
```

**Gradle (Groovy)**

```groovy theme={null}
implementation 'ai.z.openapi:zai-sdk:0.3.3'
```

**Usage Example**

```java theme={null}
import ai.z.openapi.ZaiClient;
import ai.z.openapi.service.model.*;
import java.util.Arrays;

public class QuickStart {
    public static void main(String[] args) {
        // Initialize client
        ZaiClient client = ZaiClient.builder().ofZAI()
            .apiKey("YOUR_API_KEY")
            .build();

        // Create chat completion request
        ChatCompletionCreateParams request = ChatCompletionCreateParams.builder()
            .model("glm-5.1")
            .messages(Arrays.asList(
                ChatMessage.builder()
                    .role(ChatMessageRole.USER.value())
                    .content("Hello, who are you?")
                    .build()
            ))
            .stream(false)
            .build();

        // Send request
        ChatCompletionResponse response = client.chat().createChatCompletion(request);

        // Get response
        System.out.println(response.getData().getChoices().get(0).getMessage().getContent());
    }
}
```

### OpenAI Python SDK

**Install SDK**

```bash theme={null}
# Install or upgrade to latest version
pip install --upgrade 'openai>=1.0'
```

**Verify Installation**

```bash theme={null}
python -c "import openai; print(openai.__version__)"
```

**Usage Example**

```python theme={null}
from openai import OpenAI

client = OpenAI(
    api_key="your-Z.AI-api-key",
    base_url="https://api.z.ai/api/paas/v4/"
)

completion = client.chat.completions.create(
    model="glm-5.1",
    messages=[
        {"role": "system", "content": "You are a smart and creative novelist"},
        {"role": "user", "content": "Please write a short fairy tale story as a fairy tale master"}
    ]
)

print(completion.choices[0].message.content)
```

### OpenAI Node.js SDK

**Install SDK**

```bash theme={null}
# Install or upgrade to latest version
npm install openai

# Or using yarn
yarn add openai
```

**Usage Example**

```javascript theme={null}
import OpenAI from "openai";

const client = new OpenAI({
    apiKey: "your-Z.AI-api-key",
    baseURL: "https://api.z.ai/api/paas/v4/"
});

async function main() {
    const completion = await client.chat.completions.create({
        model: "glm-5.1",
        messages: [
            { role: "system", content: "You are a helpful AI assistant." },
            { role: "user", content: "Hello, please introduce yourself." }
        ]
    });

    console.log(completion.choices[0].message.content);
}

main();
```

### OpenAI Java SDK

**Install SDK**

**Maven**

```xml theme={null}
<dependency>
    <groupId>com.openai</groupId>
    <artifactId>openai-java</artifactId>
    <version>2.20.1</version>
</dependency>
```

**Gradle (Groovy)**

```groovy theme={null}
implementation 'com.openai:openai-java:2.20.1'
```

**Usage Example**

```java theme={null}
import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import com.openai.models.chat.completions.ChatCompletion;
import com.openai.models.chat.completions.ChatCompletionCreateParams;

public class QuickStart {
    public static void main(String[] args) {
        // Initialize client
        OpenAIClient client = OpenAIOkHttpClient.builder()
            .apiKey("your-Z.AI-api-key")
            .baseUrl("https://api.z.ai/api/paas/v4/")
            .build();

        // Create chat completion request
        ChatCompletionCreateParams params = ChatCompletionCreateParams.builder()
            .addSystemMessage("You are a helpful AI assistant.")
            .addUserMessage("Hello, please introduce yourself.")
            .model("glm-5.1")
            .build();

        // Send request and get response
        ChatCompletion chatCompletion = client.chat().completions().create(params);
        Object response = chatCompletion.choices().get(0).message().content();
        System.out.println(response);
    }
}
```

# Chat Completion

Source: https://docs.z.ai/api-reference/llm/chat-completion

POST /paas/v4/chat/completions

Creates a chat completion that generates AI replies for the given conversation messages. It supports multimodal inputs (text, images, audio, video, file), offers configurable parameters (such as temperature, max tokens, and tool use), and supports both streaming and non-streaming output modes.
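For the streaming output mode, a client has to reassemble the reply from individual events and check `finish_reason`, which is where the Errors section says an abnormal termination is reported. The sketch below assumes the stream follows the OpenAI-compatible SSE convention (`data:` lines carrying JSON chunks with `choices[].delta.content`, ended by a `data: [DONE]` sentinel); that wire format is an assumption here, and `collect_stream` is a hypothetical helper, not an SDK function.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Join streamed text deltas and return (text, last finish_reason seen).

    `lines` is any iterable of decoded SSE lines, for example the lines of
    an HTTP response body read incrementally.
    """
    parts = []
    finish_reason = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("content"):
                parts.append(delta["content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason


if __name__ == "__main__":
    sample = [
        'data: {"choices":[{"delta":{"content":"Hel"}}]}',
        "",
        'data: {"choices":[{"delta":{"content":"lo"},"finish_reason":"stop"}]}',
        "data: [DONE]",
    ]
    print(collect_stream(sample))
```

A caller can compare the returned `finish_reason` with the values documented for that parameter to tell a normal stop from an abnormal one.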
# Rate Limits

Source: https://docs.z.ai/api-reference/rate-limit

# Layout Parsing

Source: https://docs.z.ai/api-reference/tools/layout-parsing

POST /paas/v4/layout_parsing

Use the [GLM-OCR](/guides/vlm/glm-ocr) model to parse the layout of documents and images and extract text content. It supports OCR recognition of images and PDF documents and returns detailed layout information and visualization results.

# Tokenizer

Source: https://docs.z.ai/api-reference/tools/tokenizer

POST /paas/v4/tokenizer

`Tokenizer` splits text into `tokens` recognizable by the model and counts them. It receives user input text, tokenizes it with the model, and returns the corresponding `token` count. It is suitable for text length evaluation, model input estimation, dialogue context truncation, cost calculation, etc.

# Web Reader

Source: https://docs.z.ai/api-reference/tools/web-reader

POST /paas/v4/reader

Reads and parses the content of the specified URL. Supports selectable return formats, cache control, image retention, and summary options.

# Web Search

Source: https://docs.z.ai/api-reference/tools/web-search

POST /paas/v4/web_search

[Web Search](/guides/tools/web-search) is a specialized search engine for large language models. Building upon traditional search engine capabilities like web crawling and ranking, it enhances intent recognition to return results better suited for LLM processing (including webpage titles, URLs, summaries, site names, favicons, etc.).

# Generate Video(Async)

Source: https://docs.z.ai/api-reference/video/generate-video

POST /paas/v4/videos/generations

CogVideoX is a large video generation model developed by Z.AI with powerful video generation capabilities. Simply input text or images to create a video.

Vidu: A high-performance large video model that combines high consistency and high dynamism, with precise semantic understanding and exceptional reasoning speed.
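Asynchronous endpoints like this one return before the output is ready, and the result is fetched separately through `GET /paas/v4/async-result/{id}`. The following is a rough sketch of a polling loop around that endpoint. The helper names (`make_fetch`, `poll_async_result`), the polling interval, and the attempt limit are all assumptions; because the response fields are not described in this section, the caller supplies the completion check instead of the sketch hard-coding one.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

# General API endpoint from the Introduction.
BASE_URL = "https://api.z.ai/api/paas/v4"


def make_fetch(api_key: str) -> Callable[[str], Dict[str, Any]]:
    """Build a function that GETs /async-result/{id} with Bearer auth."""
    def fetch(task_id: str) -> Dict[str, Any]:
        request = urllib.request.Request(
            f"{BASE_URL}/async-result/{task_id}",
            headers={"Authorization": f"Bearer {api_key}"},
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return fetch


def poll_async_result(
    fetch: Callable[[str], Dict[str, Any]],
    task_id: str,
    is_finished: Callable[[Dict[str, Any]], bool],
    interval: float = 5.0,       # seconds between attempts (arbitrary default)
    max_attempts: int = 60,      # give up after this many fetches (arbitrary)
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch(task_id) until is_finished(result) is true, then return it."""
    for attempt in range(1, max_attempts + 1):
        result = fetch(task_id)
        if is_finished(result):
            return result
        if attempt < max_attempts:
            sleep(interval)
    raise TimeoutError(f"task {task_id} not finished after {max_attempts} attempts")
```

A call might look like `poll_async_result(make_fetch("YOUR_API_KEY"), task_id, is_finished=my_check)`, where `my_check` inspects whatever status field the async-result response actually carries.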
# Retrieve Result

Source: https://docs.z.ai/api-reference/video/get-video-status

GET /paas/v4/async-result/{id}

This endpoint is used to query the result of an asynchronous request.

# Invite Friends, Get Credits

Source: https://docs.z.ai/devpack/credit-campaign-rules

Please read and fully understand all the rules below before participating in the campaign. Participation constitutes your acceptance and agreement to be bound by these rules.

Last Update: March 15, 2026

## 1. General Principles

* **Campaign Organizer**: This campaign is initiated and operated by the service provider of Z.ai Platform (hereinafter referred to as "we," "us," or the "Platform").
* **Campaign Purpose**: To reward users and enhance product experience, we are launching the "Invite Friends, Get Credits" campaign.
* **Campaign Nature**: This campaign offers incentive Credits. All Credits are preferential benefits provided by the Platform, **not cash, currency, or financial instruments**, and do not possess monetary attributes.
* **User Agreement**: By participating, you represent that you have **read, understood, and agreed** to be bound by these rules. We reserve the right to modify or adjust these rules based on laws, regulations, and operational needs. Changes will be announced on the campaign page in advance.

Event link: [Click here to join](https://z.ai/subscribe?invitedialog=true)

## 2. Benefits for Invited Users

* **Invitation Validity Period**:
  * Friends enter via a referral link and log in on the event page to bind the invitation. The Inviter receives the bonus if the friend completes their first subscription payment within 72 hours. If a friend clicks multiple referral links, the reward goes to the Inviter associated with the last valid touchpoint.
* **Instant Discount**:
  * Invited users who complete registration via a unique invitation link or code and **successfully pay for their first subscription, exclusively to the GLM Coding service**, will enjoy a **5% instant discount** on the order amount (the discount will be deducted directly from the total amount payable).
  * **Important Payment Note**: Due to requirements of our payment processor (Stripe), the final amount payable **after applying all discounts and Credits must be at least 0.50 USD**. If the discount reduces the payable amount below 0.50 USD, the discount applied may be adjusted to ensure the minimum payment threshold is met.
* **Eligible Users**:
  * Only **newly registered users** of the Platform, or **existing registered users who have never had any paid subscription history**.
  * Each user (determined by mobile number or email address) can only enjoy the first-order discount **once as an invited user**.
* **Exclusions**:
  * This discount **applies only to the current initial subscription order**.
  * **Subsequent renewals, upgrades, downgrades, or any other follow-up subscription orders are NOT eligible** for this discount.
  * This discount cannot be stacked with other similar first-order discount campaigns on the Platform.

## 3. Benefits for Inviters

Inviting users (hereinafter "Inviter") can earn the following Credits rewards after successfully inviting friends (i.e., "Invited Users") who complete registration and their first valid payment:

* **Bonus Eligibility Threshold (Payout begins once 3 people are reached)**:
  * The referrer must **successfully invite a total of 3 eligible friends** who have completed their first package purchase before receiving the bonus reward.
  * **Once the 3-person threshold is met, the referral bonuses for the first 3 valid friends will be disbursed in a single lump-sum payment**; thereafter, the corresponding bonus for each new valid friend will be disbursed immediately.
  * While the 3-person threshold has not yet been met, any accrued referral bonuses will remain in a **pending** status. They can be viewed in the referrer’s account but cannot be used until the 3-person threshold is met, at which point they will be automatically unlocked and disbursed.
* **Base Reward**:
  * For each successfully invited valid friend, the Inviter receives Credits equivalent to **10% of the friend's first order Actual Payment Amount** (defined below).
  * There is **no upper limit** on the number of invites; rewards accumulate continuously.
* **Tiered Extra Reward**:
  * For **every cumulative 30** successfully invited valid friends, the Inviter receives additional one-time Credits equivalent to **10% of the total Actual Payment Amount** from those 30 friends.
  * Extra rewards are calculated per "**round**" (each set of 30 invites). Rounds can repeat and accumulate without an upper limit.
* **Definition of "Actual Payment Amount"**:
  * Refers to **the final cash amount actually paid** by the Invited User for their first subscription order. This amount specifically includes payments made via cash balance in the account or third-party payment methods.
  * This amount **excludes** any reductions from coupons, discount codes, credits, points, cash rebates, or any other promotional benefits, **regardless of** any subsequent refunds.
* **Credits Nature & Usage**:
  * **Nature**: Credits are **non-cash** preferential benefits provided by the Platform.
  * **Usage**: Credits can only be used to offset the following consumption on Z.ai Platform:
    * Purchasing or renewing GLM Coding and other subscription services.
    * Purchasing virtual products like resource packs or feature expansion packs within the platform.
    * Paying for platform API call fees.
  * **Restrictions**: Credits are **non-refundable, non-transferable, non-giftable**, and **cannot be used to request refunds for subscription services**.
  * **Validity Period**: Credits **remain valid indefinitely** unless the user account is deactivated or the Platform ceases operation. We reserve the right to adjust the Credits validity period based on business needs in the future, with 30 days' advance notice provided.

## 4. Important Notes & Risk Warnings

* **Validity Criteria & Single Account Principle**:
  * The **sole criterion** for a valid invitation is: The friend registers via the Inviter's **unique invitation link or code** and completes the first subscription payment without refund within the campaign period.
  * **Single Account Principle**: Each natural person may only use **one platform account** to participate. The Platform determines user identity based on multiple factors including **phone number, email address, device ID (e.g., IMEI/OAID/IMSI), and IP address. The same phone number, device, payment account, or network environment may be deemed the same user**.
  * The Platform's backend system records are the **sole and final basis** for determining invitation relationships, validity, and user identity. In case of discrepancy with user records, the system records prevail.
  * If you have objections regarding invitation records or Credits issuance, please contact customer service within **15 days** of the objection arising for investigation. Objections raised after this period will be considered waived.
* **User Information Authorization & Privacy**:
  * To facilitate invitation tracking, Credits distribution, and anti-fraud checks, you understand and authorize the Platform to process necessary account information (e.g., account ID, registration time, device info, payment info) for you and your friends within the scope of this campaign.
  * The Platform will strictly adhere to its privacy policy to protect all users' personal information. Invitation relationship information will not be disclosed to third parties without user consent.
* **Violations & Penalties**:
  * Users must comply with the Platform's User Agreement, Terms of Service, Subscription Agreement, Top-Up Agreement, campaign rules, etc. **Any form of cheating, fraud, or disruption is strictly prohibited**, including but not limited to:
    * **Technical Cheating**: Using plugins, scripts, emulators, virtual location software, hacker tools, automated scripts, bulk registration tools, etc., to fake invitations, registrations, payments, or tamper with device/network data.
    * **False Info & Account Trading**: Using false information for registration/verification, infringing on third-party rights, buying/selling platform accounts.
    * **Artificial Engagement**: Paying for services, joining booster groups, or exchanging non-genuine social engagements to complete tasks.
    * **Disruption**: Launching network attacks, exploiting loopholes, bypassing security measures, or any activity disrupting normal platform operations.
  * **Account Association**: If any account is found involved in fraud/cheating, we reserve the right to **disqualify all associated accounts** from the campaign and **reclaim all rewards** earned through those accounts based on reasonable judgment.
  * **Penalties**: If users are found violating these rules per our risk control systems or reasonable judgment, we may take one or more actions **without prior notice**:
    * **Deduct** all issued but unused Credits; **Reclaim the cash value** of any consumed Credits.
    * **Restrict, freeze, or permanently ban** violating accounts.
    * **Permanently prohibit** violating and associated accounts from all current and future Platform campaigns and benefits.
    * **Reserve the right to claim compensation** for all direct/indirect losses caused (e.g., extra operational costs, reputational damage, recovery costs) and **report to authorities**.
    * **Pursue legal liability** for severe cases involving significant amounts or illegal activities.
* **Campaign Changes, Suspension, & Termination**:
  * We reserve the right to **adjust, pause, or terminate** this campaign early based on actual operations, laws, regulations, or force majeure, announced via our website, in-app messages, or other reasonable means. Users may not hold the Platform liable for such actions.
* **Final Interpretation Right**:
  * **To the extent permitted by law**, Z.ai Platform reserves the **final right to interpret** all terms, details, and reward distributions related to this campaign. We have the right to make final judgments on unforeseen circumstances or ambiguities in a **fair, reasonable manner consistent with the campaign's purpose**.

***

**If you have any questions about these rules, please contact our customer service. Thank you for your participation!**

# Coding Tool Helper

Source: https://docs.z.ai/devpack/extension/coding-tool-helper

A command-line assistant that helps GLM Coding Plan users centrally manage and configure CLI tools such as Claude Code.

**NPM Package**: [@z\_ai/coding-helper](https://www.npmjs.com/package/@z_ai/coding-helper)

**Prerequisite**: [Node.js >= v18.0.0](https://nodejs.org/en/download/)

## Tool Overview

Coding Tool Helper is a coding-tool companion that quickly loads **GLM Coding Plan** into your favorite **Coding Tools**. Install and run it, then follow the on-screen guidance to automatically install tools, configure the plan, and manage MCP servers.

The coding tools currently supported are:

* **Claude Code**
* **OpenCode**
* **Crush**
* **Factory Droid**

## Key Features

* Friendly setup guidance
* Connect GLM Plan to your preferred coding tools
* Automatically detect, install, and configure coding tools
* Easily manage MCP services
* Secure local storage configuration
* Interfaces support multiple languages

## Quick Start

Visit the [Z.AI Open Platform](https://z.ai/model-api) to retrieve your API Key.
Prerequisite: You need [Node.js 18+ or newer](https://nodejs.org/en/download/) \ Choose either installation method below. Best for occasional users—no global install required. Run via npx to start instantly. ```shell theme={null} ## Run Coding Tool Helper directly in the terminal npx @z_ai/coding-helper ``` Ideal for heavy users. Install globally, then launch with `coding-helper` or `chelper`. If `npm install` fails with `permission denied`, add sudo (macOS/Linux) or run the terminal as administrator (Windows).\ Example: `sudo npm install -g @z_ai/coding-helper` \ Alternatively, simply use npx: `npx @z_ai/coding-helper` ```shell theme={null} ## Install @z_ai/coding-helper globally npm install -g @z_ai/coding-helper ## Then run coding-helper or chelper coding-helper ``` Inside the wizard, use the arrow keys to choose options and Enter to confirm. Follow the guide to: \ Select UI language --> Choose a coding plan --> Enter API key --> Pick tools to manage \ \--> Auto-install tools (if needed) --> Open the tool management menu --> Load plan into tools \ \--> Manage MCP services (optional) --> Finish setup and launch your coding tools ## Additional Information ### Command List > Beyond the interactive wizard, Coding Tool Helper also supports running specific commands via `coding-helper` or `chelper` with arguments: ```bash theme={null} # Launch the initialization wizard coding-helper init # Language management coding-helper lang show # Display the current language coding-helper lang set en_US # Switch to English coding-helper lang --help # View language command help # API key management coding-helper auth # Configure the key interactively coding-helper auth glm_coding_plan_global # Select the Global plan and set the key directly coding-helper auth revoke # Remove the stored key coding-helper auth reload claude # Load the latest plan into Claude Code coding-helper auth --help # View auth command help coding-helper doctor # Check system configuration and tool status 
coding-helper --help      # Show help information
coding-helper --version   # Show version
```

### Troubleshooting

If issues arise, run `coding-helper doctor` first for a health check.

**Issue:** Network errors such as `Network Error` appear when saving or validating the API key, or during other network operations.

**Solution:**

1. Check your network connection or configure a proxy.
2. Note: Node.js does not automatically use the system proxy settings. If you need a proxy to reach external networks, set the `HTTP_PROXY` and `HTTPS_PROXY` environment variables so that Node.js uses it.

```shell theme={null}
# Example:
export HTTP_PROXY=http://your.proxy.server:port
export HTTPS_PROXY=http://your.proxy.server:port
```

**Issue:** Timeouts or other network errors appear when running or installing the coding tools.

**Solution:**

1. Check the network connection or configure a proxy.

**Issue:** `npm install -g` fails with `EACCES: permission denied`.

**Solution:**

1. Retry with sudo (macOS / Linux).
2. Run the terminal as administrator (Windows).
3. Start directly via `npx @z_ai/coding-helper`.
4. Use nvm to manage Node.js versions and avoid global permission issues.

**Issue:** While using the Claude Code Marketplace, the plugin status is incorrect (e.g., it shows “not installed” even though it is already installed).

**Solution:**

1. Run `claude update` to upgrade Claude Code to version 2.0.70 or later.

**Issue:** API Key reported as invalid.

**Solution:**

1. Confirm the API Key was copied correctly.
2. Check that the associated account has sufficient balance.

**Issue:** Service connection timed out.

**Solution:**

1. Check network connectivity.
2. Verify firewall settings.
3. Ensure Node.js and the network environment are ready.

# FAQ

Source: https://docs.z.ai/devpack/faq

## GLM Coding Plan Details

**Q: Which plans support GLM-5.1? How can I switch between models?**

**A:** All plans support **GLM-5.1**, GLM-5-Turbo, GLM-4.7 and GLM-4.5-Air.
The default mapping between Claude Code's internal model environment variables and GLM models is as follows:

* `ANTHROPIC_DEFAULT_OPUS_MODEL`: `GLM-4.7`
* `ANTHROPIC_DEFAULT_SONNET_MODEL`: `GLM-4.7`
* `ANTHROPIC_DEFAULT_HAIKU_MODEL`: `GLM-4.5-Air`

If you want to use `GLM-5.1`, you need to manually change the model to “GLM-5.1” in your custom configuration (e.g., `~/.claude/settings.json` in Claude Code).

***

**Q: Why is the quota consumed faster when using GLM-5.1? Is there a way to save quota?**

**A:** As advanced models, GLM-5.1 and GLM-5-Turbo are comparable to the Claude Opus model. **Usage for these models is deducted at 3 times the standard rate during peak hours and 2 times the standard rate during off-peak hours.** We recommend switching to GLM-5.1 for complex tasks and continuing to use GLM-4.7 for general tasks to avoid depleting your plan’s usage quota too quickly. **As a limited-time benefit, GLM-5.1 and GLM-5-Turbo will only consume 1× quota during off-peak hours, valid through the end of June.**

Usage suggestions for more efficient quota utilization:

* GLM-4.7 is comparable to Claude's Sonnet-level model and is sufficient for daily development and routine tasks.
* GLM-5.1 and GLM-5-Turbo are comparable to Claude's Opus-level model; switch to GLM-5.1 for high-difficulty scenarios such as complex reasoning and large-scale engineering tasks.

Combining models of different capability levels in this way keeps quota consumption under control without sacrificing results.

**Recommended Configuration (taking Claude Code as an example):**

* Edit or add the `settings.json` file
  * macOS & Linux: `~/.claude/settings.json`
  * Windows: `/.claude/settings.json` relative to your user directory
* Add or modify the `env` field inside
* Replace `your_Z.ai_api_key` with the API Key you obtained in the previous step.
``` { "env": { "ANTHROPIC_AUTH_TOKEN": "your_Z.ai_api_key", "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic", "API_TIMEOUT_MS": "3000000", "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": 1, "ANTHROPIC_DEFAULT_OPUS_MODEL": "glm-5.1", "ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-4.7", "ANTHROPIC_DEFAULT_HAIKU_MODEL": "glm-4.5-air" } } ``` *** **Q: How much usage quota does the plan provide?** **A:** Subscribe once, unlock unmatched usage and unbeatable value. * **Lite Plan**: Up to \~80 prompts every 5 hours — about 3× the usage quota of the Claude Pro plan. * **Pro Plan**: Up to \~400 prompts every 5 hours — about 5× the usage quota of the Lite plan. * **Max Plan**: Up to \~1600 prompts every 5 hours — about 4× the usage quota of the Pro plan. In terms of token consumption, each prompt typically allows 15–20 model calls, giving a total monthly allowance of tens of billions of tokens — all at only \~1% of standard API pricing, making it extremely cost-effective. > * The numbers above are estimates; actual usage may vary depending on task complexity, codebase size, and whether auto-accept is enabled. *** **Q: Apart from the 5-hour usage limit, are there any other restrictions on usage?** **A:** To manage resources and ensure fair access for all users, we have introduced a weekly usage quota limit. This quota starts counting from the time you place your order, and the quota is refreshed and reset on a 7-day cycle. You can view the progress of your quota consumption in the [subscription](https://z.ai/manage-apikey/subscription). Users who subscribe and enable auto-renewal before February 12 will enjoy unlimited weekly usage for the duration of their subscription. *** **Q: Which tools are supported?** **A:** The GLM Coding Plan is strictly limited to use within officially [supported tools and products](https://docs.z.ai/devpack/tool/others#step-1-supported-tools). The subscriber shall not use the subscription benefits in any unsupported tools or scenarios.
Please refer to our tool guide for step-by-step setup. All supported coding tools share the same usage quota under your subscription.

***

**Q: What happens when my plan quota runs out? Will the system consume my account balance?**

**A:** No. GLM calls made in supported tools will **only** use your Coding Plan quota.

* Once the quota is used up, you’ll need to wait until the next **5-hour cycle** for it to refresh. The system will not deduct from your account balance.
* Users subscribed to the Coding Plan can only make calls via the plan’s quota in supported tools. API calls outside the plan are not available.

***

## MCP Call

**Q: Which plans support the Vision Understanding, Web Search and Web Reader MCP tools?**

**A:** All plans support these MCP tools, but the Lite plan includes only a limited quota for trial purposes.

***

**Q: What is the quota for the Vision Understanding, Web Search and Web Reader MCP tools in each plan?**

**A:** The MCP quotas for the Lite, Pro and Max plans are as follows:

* **Lite Plan**: Includes a total of 100 web searches and web reader calls per month, along with the plan's 5-hour maximum prompt pool for vision understanding.
* **Pro Plan**: Includes a total of 1,000 web searches and web reader calls per month, along with the plan's 5-hour maximum prompt pool for vision understanding.
* **Max Plan**: Includes a total of 4,000 web searches and web reader calls per month, along with the plan's 5-hour maximum prompt pool for vision understanding.

***

**Q: Besides the GLM Coding Plan resource package, can I call the Vision Understanding, Web Search and Web Reader MCP tools in any other way?**

**A:** No. Apart from the resource package, we currently do not provide any other way to call these three MCP tools. Billing issues incurred while using other, similar MCP tools are outside the scope of this package.
***

## Subscription Management

**Q: Will my subscription renew automatically?**

**A:** Yes. Your subscription will automatically renew at the end of each billing cycle, and the fee will be charged to your saved payment method.

***

**Q: How are subscription fees deducted?**

**A:** The system deducts fees in the following order:

1. Credits balance is used first.
2. If that is insufficient, cash balance is used.
3. If still insufficient, the remaining amount is charged to your linked payment method (e.g., bank card or PayPal).

Please note that a small minimum applies when charging your credit card. If the remaining amount is below this minimum, the deduction is rounded up to meet it.

***

**Q: How can I cancel my subscription?**

**A:** You can cancel your subscription on the subscription management page. Please make sure to cancel at least 24 hours before your next billing date to avoid auto-renewal. After cancellation, your current plan remains valid until it expires.

***

**Q: Can I get a refund?**

**A:** Please note that subscriptions are non-refundable once purchased. Even if you haven’t used your full plan, the fees cannot be returned. We recommend choosing a subscription plan and billing cycle that best fits your usage needs.

***

## Upgrade your Plan

**Q: How can I upgrade my coding plan?**

**A:** For **same-tier plan upgrades** (e.g., monthly plan to annual plan), the newly purchased plan will **not take effect immediately**. It becomes active only after your current plan expires, so the validity periods of the two plans add up. Example: Upgrading from a **Lite Monthly Plan** to a **Lite Annual Plan** will grant you a total of 13 months of service validity.

For **cross-tier plan upgrades**, the new plan will take effect immediately. The remaining value of your original plan will be converted into account balance on a pro-rated basis, which can be used to offset the price difference of the upgrade.
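As a rough illustration of the cross-tier case, the sketch below computes a pro-rated remaining value and the amount still due. The linear per-day formula, the function name `prorated_upgrade`, and the prices are all assumptions made for illustration; the platform's actual calculation is not specified here.

```python
from decimal import Decimal, ROUND_HALF_UP


def prorated_upgrade(old_price, days_total, days_used, new_price):
    """Hypothetical linear proration; not the platform's documented formula."""
    old_price = Decimal(str(old_price))
    new_price = Decimal(str(new_price))
    days_left = max(days_total - days_used, 0)
    # Remaining value of the current plan, converted to account balance.
    remaining = (old_price * days_left / days_total).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    # The balance offsets the new plan's price; the amount due never goes below zero.
    amount_due = max(new_price - remaining, Decimal("0.00"))
    return remaining, amount_due


if __name__ == "__main__":
    # Made-up prices: a 10 USD plan, 12 of 30 days used, upgrading to a 30 USD plan.
    remaining, due = prorated_upgrade(old_price=10, days_total=30, days_used=12, new_price=30)
    print(f"remaining value: {remaining} USD, still due: {due} USD")
```

With these made-up numbers, 18 unused days out of 30 leave 6.00 USD of value, so 24.00 USD would remain to be paid for the new plan.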
***

**Q: How can I downgrade my plan?**

**A:** On the subscription page, select your desired plan. The change will take effect after the current billing cycle ends. You can also cancel your current subscription and re-subscribe to the desired plan after the billing cycle ends. Steps are as follows:

1. Go to your [subscription page](https://z.ai/subscribe?utm_source=zai\&utm_medium=index\&utm_term=glm-coding-plan\&utm_campaign=Platform_Ops&_channel_track_key=6lShUDnv).
2. Click “Subscribe” and select your desired plan.
3. Confirm the change.
4. The new plan will take effect after the current billing cycle ends.

***

**Q: How can I change my billing cycle?**

**A:** On the subscription page, select your desired billing cycle. The change will take effect after the current billing cycle ends. You can also cancel your current subscription and re-subscribe with the desired billing cycle after the current one ends. Steps are as follows:

1. Go to your [subscription page](https://z.ai/subscribe?utm_source=zai\&utm_medium=index\&utm_term=glm-coding-plan\&utm_campaign=Platform_Ops&_channel_track_key=6lShUDnv).
2. Click “Subscribe” and select your desired billing cycle.
3. Confirm the change.
4. The new cycle will take effect after the current billing period ends.

***

## 'Invite Friends, Get Credits'

**Q: Where can I view the credits I earned?**

**A:** You can view your credit rewards in the \[Invite Friends, Earn Credits pop-up - Your Credit Rewards] or check the details under \[Billing - Transaction History].

***

**Q: Where can I use my credits? Can I withdraw them?**

**A:** Credits can be used on \[Z.ai platform] to offset various purchases, including subscriptions like GLM Coding Plan, resource packs, feature extensions, and API calls. Credits are promotional benefits and cannot be withdrawn, transferred, or refunded.

***

**Q: How is a successful invitation defined? What does my friend need to do?**

**A:** A successful invitation must meet all of the following:

1.
The referred friend shall access the campaign page through the inviter's unique referral link or code. 2. Your friend is a new user (has never paid for a subscription). 3. They complete their first GLM Coding Plan subscription payment within 72 hours. 4. The order is not refunded within 24 hours. *** **Q: When will I receive the credits after inviting a friend?** **A:** Credits are usually issued within 24-48 hours after your friend’s payment is confirmed and the order passes review. Check \[Finance - Transaction History]. If not received after 72 hours, contact support. *** **Q: What does "friend’s actual payment amount" mean? How are credits calculated if a coupon is used?** **A:** "Actual payment amount" refers to the final cash amount your friend paid for their first order. For example: If a plan costs 100 USD, and your friend gets a 50% first-order discount, an extra 10% off from your referral, and uses 5 USD in credits, the final actual payment is 40 USD. Your credits will be calculated based on 40 USD (e.g., 10% of 40 USD = 4 USD). Credit calculation does not include any coupons, discounts, point deductions, or refunded amounts. *** **Q: Why didn’t I receive credits after referring a friend?** **A:** Common reasons include: 1. Your friend didn’t register using your exclusive link/code. 2. Your friend completed payment after 72 hours. 3. Your friend was not a new subscribing user. 4. The order was refunded. 5. The system detected unusual activity (e.g., bulk registrations, fraud). **If none of these apply, please contact customer support with your account and your friend's registration information.** *** ## Technical Support *** **Q:Why does it still report error "1113 Insufficient Balance" after purchasing the coding package? 
Why is the account balance still deducted after purchasing the coding package?**

**A:** An insufficient-balance error or a balance deduction usually means that the usage conditions of the GLM Coding Plan are not met:

1. The GLM Coding Plan is strictly limited to use within officially supported tools and products.
2. A specific base URL must be configured:
   * API endpoint for Claude Code and Goose: `https://api.z.ai/api/anthropic`
   * API endpoint for other tools: `https://api.z.ai/api/coding/paas/v4`
3. Only the following four models can be called: GLM-5.1, GLM-5-Turbo, GLM-4.7, and GLM-4.5-Air.

For detailed instructions, please refer to: [https://docs.z.ai/devpack/overview](https://docs.z.ai/devpack/overview)

# Web Reader MCP Server

Source: https://docs.z.ai/devpack/mcp/reader-mcp-server

The Web Reader MCP Server is a Z.AI implementation based on the Model Context Protocol (MCP). It provides Claude Code, Cline, and other MCP-compatible clients with web content extraction capabilities, including full-page content retrieval and structured data extraction.

## Overview

This remote MCP server with web content reading capability is available to users on **GLM Coding Plan**, enabling your code agent to fetch detailed page content and structured data.

## Features

* Fetch the complete content of any webpage, including text and links
* Extract structured data such as title, main body, and metadata
* HTTP-based remote MCP service, no local installation required

## Tools

This server implements the Model Context Protocol and works with any MCP-compatible client. Currently, it provides the following tool:

* **`webReader`** - Fetch webpage content for a specified URL. Returns the page title, main content, metadata, list of links, and more.
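For readers curious about what a client sends under the hood, here is a minimal sketch of an MCP `tools/call` request for `webReader`, built as a plain JSON-RPC 2.0 payload. The argument name `url` and the helper name `build_web_reader_call` are assumptions for illustration; MCP clients normally construct this message for you, and the authoritative input schema is whatever the server reports in its `tools/list` response.

```python
import json


def build_web_reader_call(url, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` message for the webReader tool.

    The `url` argument name is an assumption; check the schema returned by
    the server's `tools/list` response before relying on it.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "webReader",
            "arguments": {"url": url},
        },
    }


if __name__ == "__main__":
    # Print the payload only; nothing is sent over the network here.
    print(json.dumps(build_web_reader_call("https://example.com/docs"), indent=2))
```

The sketch only builds the message body; transport, session setup, and the `Authorization` header are left to the MCP client.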
## Example Scenarios Automatically fetch and parse titles, body content, examples, and release notes from official documentation pages, distilling key takeaways to accelerate integration and implementation. Parse project websites or repository pages (such as README, release notes, and usage guides) to extract core information and link lists, assisting evaluation and integration. Extract steps, commands, and caveats from blogs, tutorials, and guide pages, organizing unstructured content into actionable developer notes and task lists. For issue remediation, read the publicly available steps on the specified web page and use them as references to resolve the problem. Convert content from designated web pages into structured data and leverage in-page links for incremental synchronization to build a team technical knowledge base. ## Installation and Usage ### Quick Start Visit [Z.AI Console](https://z.ai/manage-apikey/apikey-list) to get your api key According to the client you’re using, **choose the corresponding installation method from the options below**. 
### Supported Clients

**One-click install command**

Replace `your_api_key` with the API key you obtained in the previous step.

```bash theme={null}
claude mcp add -s user -t http web-reader https://api.z.ai/api/mcp/web_reader/mcp --header "Authorization: Bearer your_api_key"
```

**Manual configuration**

Edit the MCP section of the Claude Code configuration file `.claude.json` in your home directory:

```json theme={null}
{
  "mcpServers": {
    "web-reader": {
      "type": "http",
      "url": "https://api.z.ai/api/mcp/web_reader/mcp",
      "headers": {
        "Authorization": "Bearer your_api_key"
      }
    }
  }
}
```

Add the MCP server configuration in the Cline extension settings:

Replace `your_api_key` with the API key you obtained in the previous step.

```json theme={null}
{
  "mcpServers": {
    "web-reader": {
      "type": "streamableHttp",
      "url": "https://api.z.ai/api/mcp/web_reader/mcp",
      "headers": {
        "Authorization": "Bearer your_api_key"
      }
    }
  }
}
```

If your Cline version is too old to support StreamableHttp MCP servers, use the SSE configuration instead:

```json theme={null}
{
  "mcpServers": {
    "web-reader": {
      "type": "sse",
      "url": "https://api.z.ai/api/mcp/web_reader/sse?Authorization=your_api_key"
    }
  }
}
```

Add the MCP server configuration in OpenCode settings:

See the [OpenCode MCP documentation](https://opencode.ai/docs/mcp-servers)

Replace `your_api_key` with the API key you obtained in the previous step.

```json theme={null}
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "web-reader": {
      "type": "remote",
      "url": "https://api.z.ai/api/mcp/web_reader/mcp",
      "headers": {
        "Authorization": "Bearer your_api_key"
      }
    }
  }
}
```

Add the MCP server configuration in Crush settings:

Replace `your_api_key` with the API key you obtained in the previous step.

```json theme={null}
{
  "$schema": "https://charm.land/crush.json",
  "mcp": {
    "web-reader": {
      "type": "http",
      "url": "https://api.z.ai/api/mcp/web_reader/mcp",
      "headers": {
        "Authorization": "Bearer your_api_key"
      }
    }
  }
}
```

Goose is not supported at the moment; refer to
[Issue](https://github.com/block/goose/issues/6576) Add the MCP server in Goose: Go to `Extensions` -> `Add custom extension` Set Extension Name to `web-reader`, Type to `HTTP`, and use the following endpoint: ``` https://api.z.ai/api/mcp/web_reader/mcp ``` Set Request Headers Add `Authorization` : `your_api_key` Click `Add Extension` at the bottom. Remember to replace `your_api_key` with the API key you obtained in the previous step. For Roo Code, Kilo Code, and other MCP-compatible clients, use the following general configuration: Replace `your_api_key` with the API key you obtained in the previous step ```json theme={null} { "mcpServers": { "web-reader": { "type": "streamable-http", "url": "https://api.z.ai/api/mcp/web_reader/mcp", "headers": { "Authorization": "Bearer your_api_key" } } } } ``` ## Troubleshooting **Issue:** Received an invalid access token error **Solutions:** 1. Verify the token was copied correctly 2. Check that the token is activated 3. Ensure the token has sufficient balance 4. Confirm the Authorization header format is correct **Issue:** Connection to the MCP server timed out **Solutions:** 1. Check your network connection 2. Verify firewall settings 3. Ensure the server URL is correct 4. Increase client timeout settings **Issue:** Web content reading returned empty result or error **Solutions:** 1. Confirm the target URL is accessible 2. Check if the page has anti-scraping mechanisms 3. Try different URLs 4. Ensure network connectivity is normal 5. Contact technical support for assistance ## Quota The MCP quotas for the Lite, Pro and Max plans are as follows: * **Lite:** Include a total of 100 web searches and web readers, along with the 5-hour maximum prompt resource pool of the package for vision understanding. * **Pro:** Include a total of 1,000 web searches and web readers, along with the 5-hour maximum prompt resource pool of the package for vision understanding. 
* **Max:** Include a total of 4,000 web searches and web readers, along with the 5-hour maximum prompt resource pool of the package for vision understanding. ## Resources * [Model Context Protocol (MCP) Documentation](https://modelcontextprotocol.io/) * [Claude Code MCP Configuration Guide](https://docs.anthropic.com/en/docs/claude-code/mcp) * [Z.AI API Reference](/api-reference/introduction) * [GLM Coding Plan Overview](/devpack/overview) # Web Search MCP Server Source: https://docs.z.ai/devpack/mcp/search-mcp-server Web Search MCP Server is a Z.AI search capability implementation based on the Model Context Protocol (MCP), providing powerful Z.AI search capabilities for MCP-compatible clients such as Claude Code and Cline, including web search, real-time information retrieval, and other features. ## Product Overview This Remote MCP Server with search capabilities is an exclusive server developed by Z.AI for **GLM Coding Plan users**, empowering your Code Agent with search capabilities and unlimited access to real-time information and web resources. ## Features Supports comprehensive web search to retrieve the latest web information and resources Retrieves real-time updated information including news, stock prices, weather, and more HTTP protocol-based remote MCP service, no local installation required ## Supported Tools This server implements the Model Context Protocol and can be used with any MCP-compatible client. Currently provides the following tools: * **`webSearchPrime`** - Search web information, returning results including page titles, URLs, summaries, site names, site icons, and more. ## Installation and Usage ### Quick Start Visit [Z.AI Console](https://z.ai/manage-apikey/apikey-list) to get your api key According to the client you’re using, **choose the corresponding installation method from the options below**. ### Supported Clients **One-click Installation Command** Be sure to replace `your_api_key` with the API Key you obtained. 
```bash theme={null}
claude mcp add -s user -t http web-search-prime https://api.z.ai/api/mcp/web_search_prime/mcp --header "Authorization: Bearer your_api_key"
```

**Manual Configuration**

Edit the MCP section of Claude Code's configuration file `.claude.json` in the user directory:

```json theme={null}
{
  "mcpServers": {
    "web-search-prime": {
      "type": "http",
      "url": "https://api.z.ai/api/mcp/web_search_prime/mcp",
      "headers": {
        "Authorization": "Bearer your_api_key"
      }
    }
  }
}
```

Add the MCP server configuration in the Cline extension settings:

Be sure to replace `your_api_key` with the API Key you obtained.

```json theme={null}
{
  "mcpServers": {
    "web-search-prime": {
      "type": "streamableHttp",
      "url": "https://api.z.ai/api/mcp/web_search_prime/mcp",
      "headers": {
        "Authorization": "Bearer your_api_key"
      }
    }
  }
}
```

If your Cline version is too old to support StreamableHttp MCP servers, use the SSE configuration instead:

```json theme={null}
{
  "mcpServers": {
    "web-search-prime": {
      "type": "sse",
      "url": "https://api.z.ai/api/mcp/web_search_prime/sse?Authorization=your_api_key"
    }
  }
}
```

Add the MCP server configuration in OpenCode settings:

Refer to the [OpenCode MCP Doc](https://opencode.ai/docs/mcp-servers)

Be sure to replace `your_api_key` with the API Key you obtained.

```json theme={null}
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "web-search-prime": {
      "type": "remote",
      "url": "https://api.z.ai/api/mcp/web_search_prime/mcp",
      "headers": {
        "Authorization": "Bearer your_api_key"
      }
    }
  }
}
```

Add the MCP server configuration in Crush settings:

Be sure to replace `your_api_key` with the API Key you obtained.
```json theme={null}
{
  "$schema": "https://charm.land/crush.json",
  "mcp": {
    "web-search-prime": {
      "type": "http",
      "url": "https://api.z.ai/api/mcp/web_search_prime/mcp",
      "headers": {
        "Authorization": "Bearer your_api_key"
      }
    }
  }
}
```

Goose is not supported at the moment; refer to this [Issue](https://github.com/block/goose/issues/6576).

Add the MCP server configuration in Goose settings:

Click `Extensions` -> `Add custom extension`.

Set `Extension Name` to `web-search-prime` and `Type` to `HTTP`, and use the following `Endpoint`:

```
https://api.z.ai/api/mcp/web_search_prime/mcp
```

Under Request Headers, add `Authorization` : `your_api_key`.

Finally, click `Add Extension` at the bottom.

Remember to replace `your_api_key` with the API Key you obtained in the previous step.

For Roo Code, Kilo Code and other clients that support the MCP protocol, use the following general configuration:

Be sure to replace `your_api_key` with the API Key you obtained.

```json theme={null}
{
  "mcpServers": {
    "web-search-prime": {
      "type": "streamable-http",
      "url": "https://api.z.ai/api/mcp/web_search_prime/mcp",
      "headers": {
        "Authorization": "Bearer your_api_key"
      }
    }
  }
}
```

## Usage Example

Once the Search MCP server is installed in your client, you can use search directly in conversations in your coding client, for example:

* "Help me search for the latest AI technology developments"
* "Find best practices for Python asynchronous programming"

## Troubleshooting

**Issue:** Receiving an invalid API key error

**Solutions:**

1. Confirm the API key was copied correctly
2. Check if the API key is activated
3. Confirm the API key has sufficient balance
4. Check if the Authorization header format is correct

**Issue:** MCP server connection timeout

**Solutions:**

1. Check network connection
2. Confirm firewall settings
3. Verify the server URL is correct
4. Increase timeout settings

**Issue:** Search returns empty results

**Solutions:**

1. Try using different search keywords
2.
Check if the search query is too specific
3. Confirm network connection is normal
4. Contact technical support for assistance

## Quota

The MCP quotas for the Lite, Pro and Max plans are as follows:

* **Lite:** Includes a total of 100 web searches and web reader calls, along with the plan's 5-hour maximum prompt pool for vision understanding.
* **Pro:** Includes a total of 1,000 web searches and web reader calls, along with the plan's 5-hour maximum prompt pool for vision understanding.
* **Max:** Includes a total of 4,000 web searches and web reader calls, along with the plan's 5-hour maximum prompt pool for vision understanding.

## Related Resources

* [Model Context Protocol (MCP) Official Documentation](https://modelcontextprotocol.io/)
* [Claude Code MCP Configuration Guide](https://docs.anthropic.com/en/docs/claude-code/mcp)
* [Z.AI API Reference](/api-reference/introduction)
* [GLM Coding Plan Overview](/devpack/overview)

# Vision MCP Server

Source: https://docs.z.ai/devpack/mcp/vision-mcp-server

Vision MCP Server is a Z.AI capability implementation based on the Model Context Protocol (MCP). It brings Z.AI GLM-4.6V capabilities, including image analysis and video understanding, to MCP-compatible clients such as Claude Code and Cline.

**NPM Package**: [@z\_ai/mcp-server](https://www.npmjs.com/package/@z_ai/mcp-server) \
**Prerequisites**: [Node.js >= v22.0.0](https://nodejs.org/en/download/)

Please install the latest version (>= 0.1.2) of the Vision MCP server to use the GLM-4.6V capability. \
Existing users might still be running a cached older version. Clear the npx cache, or append the `@latest` tag to `@z_ai/mcp-server` to force-install the newest version (i.e., `@z_ai/mcp-server@latest`).

## Product Overview

This Local MCP Server is an exclusive server developed by Z.AI for **GLM Coding Plan users**, giving your Code Agent eyes and vision understanding.
Except in Claude Code, pasting an image directly into the client cannot call this MCP Server, as the client will by default transcode the image and call the model interface directly. \ The best practice is to place the image in a local directory and invoke the MCP Server by specifying the image name or path in the conversation. \ For example: `What does demo.png describe?` ## Features Supports intelligent analysis and content understanding of multiple image formats, giving your AI Agent visual capabilities Supports visual understanding of both local and remote videos One-click installation, quick integration with Claude Code and other MCP-compatible clients ## Supported Tools This server implements the Model Context Protocol and can be used with any MCP-compatible client. Currently provides the following tools: * **`ui_to_artifact`** - Turn UI screenshots into code, prompts, specs, or descriptions. * **`extract_text_from_screenshot`** - OCR screenshots for code, terminals, docs, and general text. * **`diagnose_error_screenshot`** - Analyze error snapshots and propose actionable fixes. * **`understand_technical_diagram`** - Interpret architecture, flow, UML, ER, and system diagrams. * **`analyze_data_visualization`** - Read charts and dashboards to surface insights and trends. * **`ui_diff_check`** - Compare two UI shots to flag visual or implementation drift. * **`image_analysis`** - General-purpose image understanding when other tools don’t fit. * **`video_analysis`** - Inspect videos (local/remote ≤8 MB; MP4/MOV/M4V) to describe scenes, moments, and entities. 
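Before handing a video to `video_analysis`, a quick local pre-check against the limits listed above can save a failed call. The sketch below is illustrative only: the helper name `check_video` is hypothetical, and it assumes that "8 MB" means 8 × 1024 × 1024 bytes, which the tool description does not specify.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".mp4", ".mov", ".m4v"}  # formats listed for video_analysis
MAX_BYTES = 8 * 1024 * 1024                  # assumption: 8 MB read as 8 MiB


def check_video(path, size_bytes=None):
    """Return a list of problems; an empty list means the file looks acceptable."""
    p = Path(path)
    problems = []
    if p.suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported format: {p.suffix or '(none)'}")
    # Callers may pass a known size (e.g. for a remote file); otherwise stat the local file.
    if size_bytes is None:
        size_bytes = p.stat().st_size
    if size_bytes > MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems


if __name__ == "__main__":
    print(check_video("demo.mp4", size_bytes=5_000_000))
    print(check_video("clip.avi", size_bytes=9_000_000))
```

Passing `size_bytes` explicitly keeps the check usable for remote videos whose size is known from elsewhere, for instance a `Content-Length` header.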
## Environment Variable Configuration

### Detailed Configuration

| Environment Variable | Description                | Default Value | Optional Values |
| :------------------- | :------------------------- | :------------ | :-------------- |
| `Z_AI_API_KEY`       | Z.AI API KEY               | Required      | Your API key    |
| `Z_AI_MODE`          | Service platform selection | Required      | `ZAI`           |

## Installation and Usage

### Quick Start

Visit [Z.AI Open Platform](https://z.ai/manage-apikey/apikey-list) to get your API Key

Prerequisites: [Node.js >= v22.0.0](https://nodejs.org/en/download/) \
According to the client you’re using, **choose the corresponding installation method from the options below**.

### Supported Clients

**Method A: One-click Installation Command**

Be sure to replace `your_api_key` with the API Key you obtained.

```bash theme={null}
claude mcp add -s user zai-mcp-server --env Z_AI_API_KEY=your_api_key Z_AI_MODE=ZAI -- npx -y "@z_ai/mcp-server"
```

If you forgot to replace the API Key, uninstall the old MCP Server before running the installation command again:

```bash theme={null}
claude mcp list
claude mcp remove zai-mcp-server
```

**Method B: Manual Configuration**

Edit the `mcpServers` section of Claude Code's configuration file `.claude.json`: \
Be sure to replace `your_api_key` with the API Key you obtained.

```json theme={null}
{
  "mcpServers": {
    "zai-mcp-server": {
      "type": "stdio",
      "command": "npx",
      "args": [
        "-y",
        "@z_ai/mcp-server"
      ],
      "env": {
        "Z_AI_API_KEY": "your_api_key",
        "Z_AI_MODE": "ZAI"
      }
    }
  }
}
```

Add the MCP server configuration in the Cline extension settings:

Be sure to replace `your_api_key` with the API Key you obtained.
```json theme={null} { "mcpServers": { "zai-mcp-server": { "type": "stdio", "command": "npx", "args": [ "-y", "@z_ai/mcp-server" ], "env": { "Z_AI_API_KEY": "your_api_key", "Z_AI_MODE": "ZAI" } } } } ``` Add MCP server configuration in OpenCode settings: Refer [OpenCode MCP Doc](https://opencode.ai/docs/mcp-servers) Be sure to replace `your_api_key` with the API Key you obtained. ```json theme={null} { "$schema": "https://opencode.ai/config.json", "mcp": { "zai-mcp-server": { "type": "local", "command": ["npx","-y","@z_ai/mcp-server"], "environment": { "Z_AI_API_KEY": "your_api_key", "Z_AI_MODE": "ZAI" } } } } ``` Add MCP server configuration in Crush settings: Be sure to replace `your_api_key` with the API Key you obtained. ```json theme={null} { "$schema": "https://charm.land/crush.json", "mcp": { "zai-mcp-server": { "type": "stdio", "command": "npx", "args": [ "-y", "@z_ai/mcp-server" ], "env": { "Z_AI_API_KEY": "your_api_key", "Z_AI_MODE": "ZAI" } } } } ``` For Roo Code, Kilo Code and other clients that support MCP protocol, use the following general configuration: Be sure to replace `your_api_key` with the API Key you obtained. ```json theme={null} { "mcpServers": { "zai-mcp-server": { "type": "stdio", "command": "npx", "args": [ "-y", "@z_ai/mcp-server" ], "env": { "Z_AI_API_KEY": "your_api_key", "Z_AI_MODE": "ZAI" } } } } ``` ## Usage Example Except in Claude Code, pasting an image directly into the client cannot call this MCP Server, as the client will by default transcode the image and call the model interface directly. \ The best practice is to place the image in a local directory and invoke the MCP Server by specifying the image name or path in the conversation. \ For example: `What does demo.png describe?` Through the previous step of installing the Vision MCP server to the client, you can directly use MCP in your Coding client. 
For example, in Claude Code, entering `hi describe this xx.png` in the conversation causes the MCP Server to process the image and return a description. (This requires the image to be in your current directory.)

![Description](https://cdn.bigmodel.cn/markdown/1760501186683image.png?attname=image.png)

![code](https://cdn.bigmodel.cn/markdown/1757345118471code.jpg?attname=code.jpg)

## Troubleshooting

Run the following command in your local terminal to verify that the server can be installed locally and to rule out environment, permission, and other issues:

```bash Linux/macOS theme={null}
Z_AI_API_KEY=your_api_key npx -y @z_ai/mcp-server
```

```cmd Windows Cmd theme={null}
set "Z_AI_API_KEY=your_api_key" && npx -y @z_ai/mcp-server
```

```powershell Windows PowerShell theme={null}
$env:Z_AI_API_KEY="your_api_key"; npx -y @z_ai/mcp-server
```

* If the installation succeeds, the environment is fine and the problem is probably in the client configuration. Check the client's MCP configuration.
* If the installation fails, troubleshoot based on the error message. Pasting the error message into a large language model for analysis can help.

Other common issues:

**Issue:** MCP server connection closed

**Solutions:**

1. Check whether Node.js 22 or a newer version is installed locally.
2. Run `node -v` and `npx -v` to verify that the execution environment is available.
3. Check that the environment variable `Z_AI_API_KEY` is configured correctly.

**Issue:** Receiving an invalid API Key error

**Solutions:**

1. Confirm the API Key was copied correctly
2. Check that the API Key is activated
3. Confirm the selected platform (`Z_AI_MODE`) matches the API Key
4. Check that the API Key has sufficient balance

**Issue:** MCP server connection timeout

**Solutions:**

1. Check network connection
2. Confirm firewall settings
3. Increase timeout settings

## Quota

The MCP quotas for the Lite, Pro and Max plans are as follows:

* **Lite:** Includes a total of 100 web search and web reader calls, along with the plan's 5-hour maximum prompt resource pool for vision understanding.
* **Pro:** Includes a total of 1,000 web search and web reader calls, along with the plan's 5-hour maximum prompt resource pool for vision understanding.
* **Max:** Includes a total of 4,000 web search and web reader calls, along with the plan's 5-hour maximum prompt resource pool for vision understanding.

## Related Resources

* [Model Context Protocol (MCP) Official Documentation](https://modelcontextprotocol.io/)
* [Claude Code MCP Configuration Guide](https://docs.anthropic.com/en/docs/claude-code/mcp)
* [Z.AI API Reference](/api-reference/introduction)
* [Vision Model Introduction](/guides/vlm/glm-4.6v)

# Zread MCP Server

Source: https://docs.z.ai/devpack/mcp/zread-mcp-server

The Zread MCP Server is a Z.AI implementation based on the Model Context Protocol (MCP). Powered by [zread.ai](https://zread.ai), it gives Claude Code, Cline, and other MCP-compatible clients access to knowledge documentation and code for open source repositories.

## Overview

This remote MCP server with open source repository Q\&A capability is available to users on **GLM Coding Plan**, enabling your code agent to deeply understand open source projects and efficiently fetch documentation, code structure, and file content.

## Features

* Search documentation, code, and comments in GitHub repositories
* Get the directory structure and file list of GitHub repositories to quickly grasp the project layout
* Read the complete code content of specified files in GitHub repositories to analyze implementation details in depth

## Tools

This server implements the Model Context Protocol and works with any MCP-compatible client.
Currently, it provides the following tools: * **`search_doc`** — Search for knowledge documentation corresponding to the GitHub repository, quickly understanding repository knowledge, news, recent issues, PRs, and contributors. * **`get_repo_structure`** — Get the directory structure and file list of the GitHub repository to understand project module splitting and directory organization. * **`read_file`** — Read the complete code content of specified files in the GitHub repository to deeply analyze the implementation details of the file code. ## Example Scenarios Quickly understand the core concepts, installation steps, and code organization of open source libraries by searching documentation and obtaining repository structures, accelerating the learning curve. When encountering problems, search the repository's Issue and Commit history to find solutions or fix records for similar problems. Directly read the code content of core files, analyze implementation logic, and assist in secondary development or Debugging. Before introducing a new dependency library, evaluate its activity, code quality, and maintenance status by viewing its repository structure and documentation. ## Installation and Usage ### Quick Start Visit [Z.AI Console](https://z.ai/manage-apikey/apikey-list) to get your api key According to the client you’re using, **choose the corresponding installation method from the options below**. 
### Supported Clients

**One-click install command**

Replace `your_api_key` with the API key you obtained in the previous step.

```bash theme={null}
claude mcp add -s user -t http zread https://api.z.ai/api/mcp/zread/mcp --header "Authorization: Bearer your_api_key"
```

**Manual configuration**

Edit the MCP section of the Claude Code configuration file `.claude.json` in your home directory:

```json theme={null}
{
  "mcpServers": {
    "zread": {
      "type": "http",
      "url": "https://api.z.ai/api/mcp/zread/mcp",
      "headers": {
        "Authorization": "Bearer your_api_key"
      }
    }
  }
}
```

Add the MCP server configuration in the Cline extension settings:

Replace `your_api_key` with the API key you obtained in the previous step.

```json theme={null}
{
  "mcpServers": {
    "zread": {
      "type": "streamableHttp",
      "url": "https://api.z.ai/api/mcp/zread/mcp",
      "headers": {
        "Authorization": "Bearer your_api_key"
      }
    }
  }
}
```

If your Cline version is too old to support StreamableHttp MCP servers, you can use the SSE configuration instead:

```json theme={null}
{
  "mcpServers": {
    "zread": {
      "type": "sse",
      "url": "https://api.z.ai/api/mcp/zread/sse?Authorization=your_api_key"
    }
  }
}
```

Add the MCP server configuration in OpenCode settings; see the [OpenCode MCP documentation](https://opencode.ai/docs/mcp-servers).

Replace `your_api_key` with the API key you obtained in the previous step.

```json theme={null}
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "zread": {
      "type": "remote",
      "url": "https://api.z.ai/api/mcp/zread/mcp",
      "headers": {
        "Authorization": "Bearer your_api_key"
      }
    }
  }
}
```

Add the MCP server configuration in Crush settings:

Replace `your_api_key` with the API key you obtained in the previous step.

```json theme={null}
{
  "$schema": "https://charm.land/crush.json",
  "mcp": {
    "zread": {
      "type": "http",
      "url": "https://api.z.ai/api/mcp/zread/mcp",
      "headers": {
        "Authorization": "Bearer your_api_key"
      }
    }
  }
}
```

Goose is not supported at the moment; see this [issue](https://github.com/block/goose/issues/6576).

Add the MCP server in Goose:

Go to `Extensions` -> `Add custom extension`.

Set Extension Name to `zread`, Type to `HTTP`, and use the following endpoint:

```
https://api.z.ai/api/mcp/zread/mcp
```

Under Request Headers, add `Authorization`: `your_api_key`.

Click `Add Extension` at the bottom.

Remember to replace `your_api_key` with the API key you obtained in the previous step.

For Roo Code, Kilo Code, and other MCP-compatible clients, use the following general configuration:

Replace `your_api_key` with the API key you obtained in the previous step.

```json theme={null}
{
  "mcpServers": {
    "zread": {
      "type": "streamable-http",
      "url": "https://api.z.ai/api/mcp/zread/mcp",
      "headers": {
        "Authorization": "Bearer your_api_key"
      }
    }
  }
}
```

## Troubleshooting

**Issue:** Received an invalid access token error

**Solutions:**

1. Verify the token was copied correctly
2. Check that the token is activated
3. Ensure the token has sufficient balance
4. Confirm the Authorization header format is correct

**Issue:** Connection to the MCP server timed out

**Solutions:**

1. Check your network connection
2. Verify firewall settings
3. Ensure the server URL is correct
4. Increase client timeout settings

**Issue:** Unable to search or read specified repository content

**Solutions:**

1. Confirm the repository exists and is open source (public)
2. Check that the repository name is spelled correctly (owner/repo)
3. Search zread.ai to check whether this open source repository is supported

## Quota

The MCP quotas for the Lite, Pro and Max plans are as follows:

* **Lite:** Includes a total of 100 web search, web reader, and Zread MCP calls, along with the plan's 5-hour maximum prompt resource pool for vision understanding.
* **Pro:** Includes a total of 1,000 web search, web reader, and Zread MCP calls, along with the plan's 5-hour maximum prompt resource pool for vision understanding.
* **Max:** Includes a total of 4,000 web search, web reader, and Zread MCP calls, along with the plan's 5-hour maximum prompt resource pool for vision understanding.

## Resources

* [Model Context Protocol (MCP) Documentation](https://modelcontextprotocol.io/)
* [Claude Code MCP Configuration Guide](https://docs.anthropic.com/en/docs/claude-code/mcp)
* [Z.AI API Reference](/api-reference/introduction)
* [GLM Coding Plan Overview](/devpack/overview)

# Overview

Source: https://docs.z.ai/devpack/overview

The GLM Coding Plan is a subscription package designed specifically for AI-powered coding.

The GLM Coding Plan is strictly limited to use within officially supported tools and products. The subscriber shall not use the subscription benefits in any unsupported tools or scenarios. If the system detects usage through unauthorized or unsupported tools (such as SDK-based access or other third-party integrations), some subscription benefits may be restricted to ensure fairness and service stability.

## Usage

The plan can be applied to coding tools such as Claude Code, Cline, and OpenCode, covering a wide range of development scenarios:

* Describe requirements in plain language to automatically generate plans, write code, debug issues, and ensure smooth execution.
* Get real-time, context-aware completion suggestions that reduce manual typing and significantly improve productivity.
* Input error messages or descriptions to automatically analyze your codebase, locate problems, and provide fixes.
* Ask questions about your team’s codebase anytime, maintain global understanding, and receive precise answers with external data integration.
* Automatically fix lint issues, resolve merge conflicts, and generate release notes, allowing developers to stay focused on core logic.
## Advantages

* **Access to a High-Intelligence Coding Model:** Upon release, the GLM series achieved SOTA performance among open-source models in reasoning, coding, and agent capabilities, delivering outstanding results in tool use and complex task execution.
* **Works with Multiple Tools:** Beyond Claude Code, it also supports Cline, OpenCode, and certain other tools, giving you flexibility across development workflows.
* **Generous Usage at a Fair Price:** Get higher call limits than standard plans. Plans start at just 18 USD per month, with Pro and Max plans designed for high-frequency, complex projects.
* **Expanded Capabilities:** All plans support Vision Understanding, Web Search MCP, Web Reader MCP and Zread MCP, helping you tackle a wider range of development tasks.

## Benefits

### Supported Models

All plans support **GLM-5.1**, GLM-5-Turbo, GLM-4.7 and GLM-4.5-Air.

### Usage Instruction

To manage resources and ensure fair access for all users, we apply usage limits on a 5-hour and weekly basis. You can check your quota consumption progress in [Usage Statistics](https://z.ai/manage-apikey/subscription).

One prompt refers to one query. Each prompt is estimated to invoke the model 15–20 times. **The monthly available quota is converted based on API pricing, equivalent to approximately 15–30× the monthly subscription fee (weekly caps already factored in).**

| Plan Type | 5-Hour Limit (Dynamically refreshed; quota resets 5 hours after consumption) | Weekly Limit (Activated upon subscription; resets every 7 days) |
| --------- | ---------------------------------------------------------------------------- | --------------------------------------------------------------- |
| Lite Plan | Up to approx. 80 prompts                                                     | Up to approx. 400 prompts                                       |
| Pro Plan  | Up to approx. 400 prompts                                                    | Up to approx. 2,000 prompts                                     |
| Max Plan  | Up to approx. 1,600 prompts                                                  | Up to approx. 8,000 prompts                                     |

The above figures are estimates. Actual available usage may vary depending on project complexity, repository size, and whether auto-accept is enabled.

**GLM-5.1** and **GLM-5-Turbo** are advanced models designed to rival the Claude Opus model. Their usage is deducted at **3× during peak hours** and **2× during off-peak hours**. We recommend switching to GLM-5.1 for complex tasks and continuing to use GLM-4.7 for routine tasks to avoid rapid quota consumption. **As a limited-time benefit, GLM-5.1 and GLM-5-Turbo will only consume 1× quota during off-peak hours, valid through the end of June.** Peak hours are 14:00–18:00 (UTC+8).

### Exclusive MCP Access

**MCP Usage Limits**

* **Lite Plan**: Includes a total of 100 web search and web reader calls per month, along with the plan's 5-hour maximum prompt resource pool for vision understanding.
* **Pro Plan**: Includes a total of 1,000 web search and web reader calls per month, along with the plan's 5-hour maximum prompt resource pool for vision understanding.
* **Max Plan**: Includes a total of 4,000 web search and web reader calls per month, along with the plan's 5-hour maximum prompt resource pool for vision understanding.

Once the monthly quota is reached, these MCP tools will be unavailable until the next billing cycle.

**For Vision Understanding MCP**, all plans share a 5-hour maximum prompt resource pool with the underlying model. Once the limit is reached, the quota will automatically reset after a 5-hour window.

### How to Switch Models

By default, Claude Code's internal model environment variables map to GLM models as follows:

* `ANTHROPIC_DEFAULT_OPUS_MODEL`: `GLM-4.7`
* `ANTHROPIC_DEFAULT_SONNET_MODEL`: `GLM-4.7`
* `ANTHROPIC_DEFAULT_HAIKU_MODEL`: `GLM-4.5-Air`

If adjustments are needed, you can directly modify the configuration file (for example, `~/.claude/settings.json` in Claude Code) to switch to GLM-4.5 or other models.
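As a rough illustration of such an adjustment, the sketch below overrides the Opus-tier mapping for the current shell session only. It rests on two assumptions that this page does not confirm: that Claude Code also picks these variables up from the process environment (not only from the `env` block of `~/.claude/settings.json`), and that `GLM-5.1` is the exact model identifier, by analogy with the defaults listed above.

```bash
# Sketch: route Opus-tier requests to GLM-5.1 and keep the documented defaults elsewhere.
# Assumption: "GLM-5.1" follows the same naming scheme as the default identifiers.
export ANTHROPIC_DEFAULT_OPUS_MODEL="GLM-5.1"
export ANTHROPIC_DEFAULT_SONNET_MODEL="GLM-4.7"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="GLM-4.5-Air"

# Print the resulting mapping so it can be reviewed before launching the tool.
env | grep '^ANTHROPIC_DEFAULT_' | sort
```

Keeping GLM-4.7 on the Sonnet and Haiku tiers reflects the recommendation above to reserve GLM-5.1 for complex tasks, since it consumes quota at a higher multiplier.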
## How to Integrate with Coding Tools

## Next Steps

* Get up and running in minutes, from subscribing to the plan to using it in your coding tools.
* Learn about account usage rules, rate limits, refund policies, and other important guidelines.
* Find answers to common questions about subscriptions, promotions, and using the plan.

# Quick Start

Source: https://docs.z.ai/devpack/quick-start

This guide will help you get started with [GLM Coding Plan](https://z.ai/subscribe?utm_source=zai\&utm_medium=link\&utm_term=quickstart\&utm_campaign=Platform_Ops&_channel_track_key=DRUfXN42) in minutes, from subscribing to using GLM models in officially [supported tools and products](https://docs.z.ai/devpack/tool/others#step-1-supported-tools).

When using the GLM Coding Plan, configure the dedicated Coding API `https://api.z.ai/api/coding/paas/v4` instead of the General API `https://api.z.ai/api/paas/v4`.

## Getting Started

* Go to the [Z.AI Open Platform](https://z.ai/model-api) and register or log in.

After logging in, navigate to the [GLM Coding Plan](https://z.ai/subscribe?utm_source=zai\&utm_medium=link\&utm_term=quickstart\&utm_campaign=Platform_Ops&_channel_track_key=DRUfXN42) page to select your preferred subscription plan.

![Description](https://cdn.bigmodel.cn/markdown/1776859780386image.png?attname=image.png)

After subscribing, navigate to your account dashboard and click [API Keys](https://z.ai/manage-apikey/apikey-list) to generate a new API Key.

Safeguard your API Key by keeping it confidential and avoiding hard-coding it in your code. We recommend storing it in environment variables or configuration files.

The GLM Coding Plan is strictly limited to use within officially [supported tools and products](https://docs.z.ai/devpack/tool/others#step-1-supported-tools).

Choose based on your preference:

Using Claude Code as an example, configure the GLM model:

**1. Install Claude Code**

Prerequisite: You need to install [Node.js 18 or later](https://nodejs.org/en/download/)

```bash theme={null}
# Open your terminal and install Claude Code
npm install -g @anthropic-ai/claude-code

# Create your working directory (e.g., `your-project`) and navigate to it using `cd`
mkdir -p your-project && cd your-project

# After installation, run `claude` to enter the Claude Code interactive interface
claude
```

**2. Configure Environment Variables**

After installing Claude Code, set up the environment variables using one of the following methods, entering the commands in the **macOS terminal** or **Windows cmd**:

**Note**: When setting environment variables, the terminal will not return any output. This is normal: as long as no error message appears, the configuration has been applied successfully.

**Method 1: Automated Coding Tool Helper**

Coding Tool Helper is a coding-tool companion that quickly loads **GLM Coding Plan** into your favorite **Coding Tools**. Install and run it, then follow the on-screen guidance to automatically install tools, configure your plan, and manage MCP servers.

```bash theme={null}
# Run Coding Tool Helper directly in the terminal
npx @z_ai/coding-helper
```

For more details, please refer to the [Coding Tool Helper](/devpack/extension/coding-tool-helper) documentation.

**Method 2: Using a Script (Recommended for First-Time Users)**

Just run the following command in your terminal.

Note: this method supports only macOS and Linux; it does not support Windows.

```bash theme={null}
curl -O "https://cdn.bigmodel.cn/install/claude_code_zai_env.sh" && bash ./claude_code_zai_env.sh
```

**Method 3: Manual Configuration**

If you have previously configured environment variables for Claude Code, you can manually configure them as follows. Open a new terminal window for the changes to take effect.
```bash MacOS & Linux theme={null}
# Edit the Claude Code configuration file `~/.claude/settings.json`
# Add or modify the env fields ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN
# Be sure to replace `your_zai_api_key` with the API Key you obtained in the previous step
{
  "env": {
    "ANTHROPIC_AUTH_TOKEN": "your_zai_api_key",
    "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
    "API_TIMEOUT_MS": "3000000"
  }
}
```

```cmd Windows Cmd theme={null}
REM Run the following commands in Cmd
REM Be sure to replace `your_zai_api_key` with the API Key you obtained in the previous step
setx ANTHROPIC_AUTH_TOKEN your_zai_api_key
setx ANTHROPIC_BASE_URL https://api.z.ai/api/anthropic
```

```powershell Windows PowerShell theme={null}
# Run the following commands in PowerShell
# Be sure to replace `your_zai_api_key` with the API Key you obtained in the previous step
[System.Environment]::SetEnvironmentVariable('ANTHROPIC_AUTH_TOKEN', 'your_zai_api_key', 'User')
[System.Environment]::SetEnvironmentVariable('ANTHROPIC_BASE_URL', 'https://api.z.ai/api/anthropic', 'User')
```

**General Configuration Method**

For other supported coding tools, configure as follows:

* **API Provider**: Select OpenAI Compatible
* **Base URL**: Enter `https://api.z.ai/api/coding/paas/v4` (the dedicated Coding API noted at the top of this guide, not the General API)
* **API Key**: Enter your Z.AI API Key
* **Model**: Select `GLM-4.7`

Refer to each tool's detailed documentation for specific configuration steps.

Once configured, you can begin coding with GLM models!

```bash theme={null}
# Using Natural Language Commands in Claude Code
Please create a React component containing a user login form
```

GLM will automatically:

* Analyze requirements and formulate an implementation plan
* Generate complete React component code
* Include form validation and styling
* Ensure the code runs directly

```bash theme={null}
# Describe the Issue Encountered
My API request returns a 404 error. Please help me check the code.
```

GLM will automatically:

* Analyze your codebase
* Pinpoint potential causes of issues
* Provide specific fixes
* Explain the root causes

```bash theme={null}
# Code Optimization
This function performs poorly. Please optimize it for me.
```

GLM will automatically:

* Analyze performance bottlenecks in your code
* Provide optimization suggestions and refactoring plans
* Preserve existing functionality
* Improve execution efficiency

## Feature Examples

Generates real-time completion suggestions based on context, reducing manual input and significantly boosting development efficiency.

```javascript theme={null}
// Type function name, GLM-4.7 auto-completes implementation
function calculateTotal(items) {
  // GLM-4.7 automatically generates complete function implementation
}
```

Ask questions about your team's codebase anytime to maintain a holistic understanding.

```
Q: How is user authentication implemented in this project?
A: GLM-4.7 analyzes your codebase and provides detailed explanations of the authentication process and related files.
```

One-click fixes for lint issues, merge conflicts, and release note generation.

```
# Auto-fix code style issues
Fix all ESLint errors

# Auto-generate documentation
Generate detailed documentation for this API
```

## Advanced Features

All users can utilize the Vision MCP Server, which employs the flagship vision reasoning model GLM-4.6V to comprehend and analyze image content.

* Analyze UI design mockups and generate corresponding code
* Understand flowcharts and architecture diagrams
* Extract text and information from screenshots

For detailed usage instructions, refer to the [Vision MCP Server](/devpack/mcp/vision-mcp-server) documentation.

All users can utilize the Web Search MCP Server to access the latest technical information.
* Search for the latest technical documentation and API changes
* Obtain the latest information on open-source projects
* Find solutions and best practices

For detailed usage instructions, refer to the [Web Search MCP Server](/devpack/mcp/search-mcp-server) documentation.

All users can utilize the Web Reader MCP Server to fetch full webpage content and extract structured data.

* Fetch complete webpage content, including text and links
* Extract structured data such as title, body, and metadata
* Remote HTTP-based MCP service, no local installation required

For detailed usage instructions, refer to the [Web Reader MCP Server](/devpack/mcp/reader-mcp-server) documentation.

# Best Practice

Source: https://docs.z.ai/devpack/resources/best-practice

Best Practices for Coding Agents: Managing Prompts, Plans, Skills, and Workflows

As foundation models continue to improve, AI coding tools are evolving from simple code-completion assistants into **coding agents** that can participate in the full software development lifecycle. Unlike traditional copilot-style tools, coding agents do more than generate code from prompts. They can read and navigate codebases, modify files, run commands, invoke external tools, and complete complex tasks through multi-step interaction.

As this shift continues, developers need more than prompt-writing techniques. They need a reliable way to work with coding agents in practice. Across leading tools, a shared usage pattern is starting to emerge: provide clear task context, plan execution steps, capture project-level guidance, connect external tools and systems, and automate repetitive workflows so agents can collaborate effectively in real development environments over time.

Drawing on official guidance from these tools, this article outlines a more general **best practices framework for coding agents**.

## **1. Treat Coding Agents as Collaborators, Not One-Off Assistants**

A common mistake when using a coding agent is to treat it like a one-off question-and-answer tool:

> Ask a question, get a piece of code, and end the interaction.

In practice, that approach does not make full use of what a coding agent can do. A coding agent is better understood as a configurable collaborator that can be refined over time. Through project-level guidance files, tool integrations, and reusable skills, developers can continuously shape the agent’s behavior so it becomes better aligned with the team’s development workflow.

In other words, the value of a coding agent does not come from model capability alone. It comes from the combination of model capability and the development workflow around it.

## **2. Structure Task Inputs: Context Matters More Than Prompting Tricks**

When working with a coding agent, many developers focus too much on prompt-writing techniques and not enough on what matters more: **task context**.

In a complex codebase, an effective task description typically includes four elements:

* Clearly describe what needs to be built or changed, such as fixing a bug, implementing an endpoint, or refactoring a module.
* Provide the relevant files, error messages, documentation, or examples. For example, specify which files, functions, or modules are involved.
* List the engineering requirements the agent should follow, such as coding standards, architectural rules, security requirements, or dependency limitations.
* Define how completion should be evaluated, such as tests passing, behavior changing as expected, or the bug no longer reproducing.

This kind of structured task input reduces unnecessary guesswork and makes the agent’s changes more consistent and easier to review. In most coding agents, you can provide this context by referencing files, supplying code snippets, or stating the relevant details explicitly in the prompt.
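The four elements can be turned into a small reusable template. The following is only a sketch: the `task_brief` function, its section labels, and the file paths in the example call are hypothetical and not part of any coding agent's interface.

```bash
# Sketch: assemble a task description from the four elements described above.
# Arguments, in order: what to change, relevant context, constraints, completion criteria.
task_brief() {
  printf 'Goal: %s\n' "$1"
  printf 'Context: %s\n' "$2"
  printf 'Constraints: %s\n' "$3"
  printf 'Done when: %s\n' "$4"
}

# Example call with made-up project details; paste the output into the agent prompt.
task_brief \
  "Fix the 404 returned by the user profile endpoint" \
  "See src/routes/profile.js and the failing request in logs/api.log" \
  "Keep public route names unchanged; follow the existing lint rules" \
  "The profile route test passes and the 404 no longer reproduces"
```

Keeping the template in a script means every task handed to the agent carries the same four sections, which makes gaps (for example, a missing completion criterion) easy to spot before the agent starts work.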
Once that context is in place, the next step for complex work is to plan before making changes.

## **3. Plan Before Execution for Complex Tasks**

Once a task has clear context, the next challenge is execution. For complex requests, coding agents are most effective when they plan before they act: asking the agent to start writing code immediately often leads to logic errors, unnecessary rework, or repeated revisions. A more effective approach is to have the agent **produce a plan first, then move into implementation**. This planning phase typically includes:

[Figure: Complex Task Planning]
Claude Code, for example, encourages an analysis-and-planning step for complex tasks, such as exploring the codebase, identifying the scope of changes, and confirming the implementation approach before making edits. Some coding agents also provide a dedicated Plan mode that generates a complete execution plan before implementation begins. This shifts the agent from simply generating code on demand to completing work step by step against an explicit plan. ## **4. Capture Repeated Rules in Project-Level Configuration Files** In practice, many prompts end up repeating the same project rules, such as: ``` - Project directory structure - Build commands - Test workflow - Coding standards - PR submission process ``` If these rules are restated in every prompt, the workflow becomes inefficient and the instructions can easily drift over time. For this reason, most coding agents provide a way to store **long-lived project guidance** in project-level configuration files, so the agent can automatically load the relevant context when carrying out tasks. In some tools, these configuration files take the form of agent-facing guidance documents that describe the repository structure, how to run the project, and the development conventions to follow. Other systems capture the same information through configuration files, scripts, or related project settings, allowing the agent to apply consistent project constraints across sessions. Regardless of the implementation, the goal is the same: move information that would otherwise need to be repeated in conversation into **stable project context**. From a practical standpoint, this can be reduced to one simple rule: **put temporary instructions in the prompt, and put long-lived rules in project-level configuration files.** ## **5. The Execution Environment Defines What the Agent Can Do** When working with coding agents, developers often attribute inconsistent results to model capability. 
In practice, many of these issues are caused by an incomplete or poorly configured **execution environment**. Unlike traditional code-completion tools, coding agents are typically expected to operate in a real development environment and carry out tasks such as: ``` - Reading and modifying source files - Running build or test commands - Calling external tools or APIs - Interacting with version control systems ``` As a result, agent behavior depends not only on model capability, but also on whether its **execution environment is complete, stable, and accessible**. When the environment is not configured properly, the agent can easily run into problems such as: * Being unable to locate the correct project directory * Lacking permission to read or modify critical files * Failing to run build or test commands * Being unable to access external tools or services These issues often appear as model misunderstanding or poor code quality, but the underlying problem is usually that the agent does not have enough execution capability or access to the right context. Most leading coding agents now provide some form of **environment configuration** to define the boundaries of agent behavior within a project. For example: ``` - Setting the default model or reasoning level - Controlling file permissions and sandbox policies - Defining which commands the agent is allowed to run - Configuring connections to external tools or services ``` While the implementation differs across tools, the objective is largely the same: to provide the agent with a **stable, controlled, and repeatable execution environment**. 
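Several of the environment problems listed above can be caught with a quick preflight check before an agent session starts. The following is a minimal sketch, assuming a project that needs `git` and `node` on the PATH; the `preflight` function and the tool list are illustrative and should be adapted to your own project.

```bash
# Sketch: report missing tools or permissions before an agent session starts.
# Usage: preflight DIR TOOL...   Returns 0 when every check passes, 1 otherwise.
preflight() {
  dir="$1"; shift
  rc=0
  # The agent must be able to locate, read, and modify the project directory.
  if [ ! -d "$dir" ]; then
    echo "missing directory: $dir"; rc=1
  elif [ ! -r "$dir" ] || [ ! -w "$dir" ]; then
    echo "no read/write access: $dir"; rc=1
  fi
  # Every required command must be resolvable on PATH.
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing command: $tool"; rc=1
    fi
  done
  return "$rc"
}

# Example: check the current directory plus the tools this project is assumed to need.
preflight . git node && echo "environment looks ready"
```

A check like this separates environment failures from model failures: if it reports a problem, fix the environment first rather than rewording the prompt.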
In practice, a reliable execution environment usually depends on a few categories of configuration: ``` - The project working directory and code access permissions - The allowed command surface, such as build, test, and lint - Connections to external tools or data sources - Shared team defaults for agent behavior ``` Once the execution environment is configured correctly, a coding agent is much more likely to behave consistently across sessions and complete multi-step tasks reliably. At a higher level, a coding agent depends on three types of context: * **Task Context**: the prompt and input for the current task * **Project Context**: the repository structure and engineering rules * **Environment Context**: the tools, permissions, and execution environment Of these, Environment Context determines **what the agent can do, and how far it can go**. ## **6. Involve Coding Agents in the Full Development Loop** Once a coding agent has the right execution environment, the next step is to involve it in the full development loop rather than using it for code generation alone. In real software development, a code change is rarely judged on generation alone. It also needs to pass tests, comply with engineering standards, and go through appropriate review. A more effective approach is to have the coding agent participate in the **full development loop**, rather than treating it as a tool for code generation only. A typical agent-driven development loop includes the following steps:
![Description](https://cdn.bigmodel.cn/markdown/1774428645882mermaid-diagram%20%283%29.png?attname=mermaid-diagram+%283%29.png)
1. **Implement code changes** Modify existing code or add new code based on the task requirements. 2. **Write or update tests** Add test coverage for new functionality or for the bug being fixed. 3. **Run the test suite** Execute unit or integration tests to verify that the changes behave as expected. 4. **Run code checks** Run linting, formatting, or type-checking tools to ensure the changes meet engineering standards. 5. **Review the code changes** Inspect the diff to identify potential issues, regression risks, or unintended modifications. In this workflow, the coding agent is no longer just a code generator. It becomes an active participant in **implementation, validation, and review**. Most leading coding agents now support this way of working. Some can automatically run tests or build commands after making changes. Others can execute validation scripts or trigger review steps once code has been modified. Through these mechanisms, agents can participate continuously in code quality workflows, whether in a local development environment or in CI pipelines. From a workflow perspective, this shifts the coding agent from a traditional **code generator** to an **execution node within the development loop**. Once an agent is able to participate in the full loop, its role extends beyond writing code. It can also reduce repetitive manual work in testing, validation, and review, improving overall development efficiency. ## **7. Extend Agent Context with MCP** In real development workflows, the information a coding agent needs does not always live inside the code repository. 
Much of the context that shapes implementation decisions is often spread across external systems, such as: ``` - Issue tracking and requirements systems - CI/CD status and execution results - Database schemas or production data - API documentation and external service references ``` If this information has to be copied and pasted manually each time, the workflow becomes inefficient and the context passed to the agent is often fragmented and unreliable. For coding agents that are expected to carry out multi-step tasks over time, this approach does not scale well to more complex workflows. For this reason, many coding agent tools support the **Model Context Protocol (MCP)**, which provides a standard way to connect external tools and systems so the agent can access the real-time information it needs beyond the repository itself. Through MCP, a coding agent can typically access resources such as: ``` - Code hosting and collaboration platforms - Databases and query interfaces - API services and technical documentation - Internal tools and automation systems ``` This changes how coding agents obtain information. Instead of relying entirely on what a developer describes in the prompt, the agent can retrieve the relevant context directly from connected tools. From a workflow perspective, this is a meaningful shift. When an agent can only use the information provided in the prompt, it is usually limited to localized tasks. Once it can connect to external systems, it becomes capable of participating in more complete development workflows, such as reading issue context, investigating failed CI runs, checking API definitions, or analyzing problems against database schemas. The value of external tool integration is therefore not just that it gives the agent more tools. It expands the agent’s context boundary, allowing it to evolve from a **repository-level executor** into **a collaborative node that operates within a real engineering environment**. ## **8. 
Capture Repeated Workflows as Skills** Over time, teams often find that certain tasks come up again and again when working with coding agents. Common examples include: ``` - PR review - Log analysis - Release note generation - Standard debugging workflows ``` If these tasks are described manually in the prompt each time, the result is unnecessary repetition and less consistent outcomes. For this reason, many coding agent systems provide a **Skill** mechanism for packaging common workflows into reusable workflow templates. At a high level, a Skill can be understood as a **structured workflow template**. It abstracts execution logic that would otherwise be scattered across prompts, allowing the agent to apply the same workflow consistently when handling similar tasks. For a deeper discussion of what Skills are and how to write them, see [How to create Skills: Key steps, limitations, and examples](https://claude.com/blog/how-to-create-skills-key-steps-limitations-and-examples). Different coding agent tools manage Skills in different ways. Some define them through dedicated skill files, while others register reusable workflows through configuration or scripting mechanisms. Although the implementations vary, the goal is the same: **turn ad hoc prompting into reusable workflows**. In practice, one simple rule works well: > **If a prompt pattern or task flow is used repeatedly, it should probably be captured as a Skill.** This shifts the use of coding agents from one-off, conversation-driven interaction toward more workflow-oriented task execution. As the skill library grows, agent behavior also becomes more consistent and predictable. ## **9. Automate Stable Workflows** Once a Skill can be executed reliably, the next step is often to automate it. In long-running development workflows, many tasks are inherently repetitive or time-based. 
For example: ``` - Generate commit summaries on a regular basis - Automatically investigate failed CI runs - Scan for potential bugs or abnormal logs - Produce daily or weekly engineering reports ``` Even when these tasks have already been encapsulated as Skills, they still create unnecessary manual work if developers need to trigger them every time. For this reason, many coding agent systems provide an **Automation layer** that allows workflows to run automatically based on a schedule or a defined trigger condition. Automation can be understood as the next layer above Skills. A Skill defines how a workflow is executed, while Automation determines when **that workflow runs and how it continues to operate over time**. For example, a Skill for generating release notes might be configured to run: ``` - whenever a new release is published - once a week to produce a release summary - automatically after CI completes ``` This allows the coding agent to keep performing tasks in the background without requiring developers to start each run manually. The significance of this model is that it shifts the coding agent from an **interactive tool** to a **continuous development assistant**. Once a workflow has been abstracted into a stable Skill, adding automation can significantly reduce manual effort and allow the agent to provide ongoing support within the development environment. ## **10. Manage Agent Sessions Deliberately** When working with coding agents, a session is more than just a chat history. In practice, it functions as a **working context** that accumulates context, intermediate reasoning, and execution results over time. As a task progresses, the agent gradually builds up information within the same session, including: ``` - the task objective - relevant code context - changes that have already been made - intermediate reasoning and decisions ``` Together, these form the agent’s **active working context** for the task. 
If sessions are not managed carefully, unrelated tasks can accumulate in the same session and make the context unnecessarily complex. This often reduces the quality of the agent’s reasoning and execution. For that reason, deliberate session management is an important part of working effectively with coding agents. Common practices include:

* Avoid mixing unrelated tasks in the same session so the working context stays clear.
* When a session accumulates too much history, summarize or compress the older parts of the conversation to reduce pressure on the context window.
* If the task opens up a new line of investigation, continue it in a separate session instead of piling more changes into the original one.

In more complex development scenarios, teams may also adopt a **multi-agent collaboration model**. For example, subtasks such as exploring the codebase, running tests, or investigating failures can be delegated to separate agents, while a primary agent coordinates the overall task. This helps preserve clarity in the main session while improving execution efficiency for more complex work.

At a higher level, session management is a form of **context management**. A well-structured session strategy helps coding agents maintain a clear reasoning path across multi-step tasks, which in turn improves consistency and execution quality.

## Conclusion

The effectiveness of a coding agent does not come from the model alone. It also depends on how developers structure the workflow around it. In practice, a mature coding agent workflow typically includes the following stages:

![Description](https://cdn.bigmodel.cn/markdown/1774433767376image.png?attname=image.png)

Through this workflow, a coding agent can gradually evolve from a simple code generation tool into a collaborative system that participates across the full software development lifecycle.
# Memory-mechanism Source: https://docs.z.ai/devpack/resources/memory-mechanism Memory enables a coding agent to retain context across tasks and sessions, reducing repeated input and improving execution efficiency. With a well-designed memory system, an agent can continuously understand the project structure, engineering conventions, and user preferences, and automatically reuse that information in future work. In coding agent systems, memory is typically organized into multiple layers, such as **automatic memory, project memory**, and **session memory**. ## **Why do coding agents need memory?** Traditional large language models do not preserve state between calls. As a result, they cannot remember project context across sessions, accumulate problem-solving experience over time, or consistently adapt to user preferences. Agent systems address this limitation through **external memory**. A typical architecture looks like this: ``` User input ↓ Memory retrieval ↓ Context assembly ↓ LLM reasoning ↓ Action / tool call ↓ Memory update ``` In other words, the agent retrieves relevant memory before starting a task and updates memory after the task is completed. This architecture is a common pattern in modern agent systems such as LangGraph, AutoGPT, and Devin. ## **A complete memory architecture for modern coding agents** At a high level, a complete agent memory architecture typically looks like this: ``` Short-term memory ↓ Session context Long-term memory ├ semantic memory ├ episodic memory └ procedural memory ``` ## **Core memory types in coding agents** Session memory is the contextual information associated with the current task. It includes the current conversation history, recent tool outputs, the current execution plan, and the contents of the files currently in scope. This information typically lives in the model’s context window. 
For example:

```
User: Fix this Python bug
Agent: Analyze the error
Agent: Modify the code
Agent: Run tests
```

These execution steps all fall under session memory.

Project memory stores **long-lived information about the entire codebase**, such as the project architecture, coding standards, build workflows, and frequently used commands. This kind of memory is typically written into `.md` files and loaded at the start of a session. For example:

```
your-project/
├── .claude/
│   ├── CLAUDE.md          # Main project instructions
│   └── rules/
│       ├── code-style.md  # Code style guide
│       ├── testing.md     # Testing conventions
│       └── security.md    # Security requirements
```

This way, the agent automatically follows these rules when modifying the code.

Semantic memory stores factual knowledge and reference information. Examples include API documentation, programming language rules, and project knowledge bases. In practice, this is often implemented through RAG (retrieval-augmented generation). A typical flow looks like this:

```
query
  ↓
embedding
  ↓
vector search
  ↓
retrieve documents
  ↓
LLM reasoning
```

This is also one of the most commonly used memory mechanisms in coding agents.

Episodic memory records an agent’s past experiences, such as the steps used to fix a previous bug, the root cause of a previous build failure, or a debugging strategy that worked well before. This type of memory helps the agent learn from prior experience. For example:

```
Episode: CI failure caused by missing dependency
Solution: upgrade pip package
```

Procedural memory stores strategies or step-by-step workflows for completing tasks. For example:

```
Debug_Workflow.md
1. read error log
2. locate file
3. write patch
4. run tests
```

These memories are typically used in system prompt engineering, workflow templates, and agent policies.

## **The standard memory pattern used by coding agents**

In real-world systems, agents typically follow a consistent memory workflow.
Before starting a task, the agent retrieves relevant project memory, knowledge base entries, and prior experience, then injects them into the working context. The retrieved memories are assembled into a complete context and passed to the model. After the task is completed, the agent decides whether to write new memories—for example, newly discovered project rules, debugging experience, or user preferences. ## **How to use memory correctly in coding agents** In mainstream agent systems, memory is generally designed to be **layered, controllable, retrievable, and updatable**. In most cases, memory is divided into **short-term memory** and **long-term memory**. Short-term memory is mainly used to preserve state within the current thread or session, while long-term memory is maintained through explicit files, rule configurations, vector retrieval, or other persistent storage mechanisms. Take **Claude Code** as an example. Its official documentation explicitly states that each session begins with a fresh context window. Knowledge is carried across sessions primarily through persistent instruction files such as CLAUDE.md and through **auto memory**. Similarly, in **LangChain / LangGraph**, memory is also divided into **thread-scoped short-term memory** and **long-term memory** that persists across sessions. In practice, the most effective approach is not to rely on the model to “remember everything” automatically, but to establish a clear memory management pattern. For example: what should be written into project-level memory files, what should be retrieved from a knowledge base or vector store, what only needs to remain in the current session, and what is worth promoting into long-term memory after a task is completed. 
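The retrieve, assemble, and update steps described in this section can be sketched in a few lines of Python. This is a minimal illustration, not how any particular agent implements memory: `MemoryStore`, `assemble_context`, and the sample entries are hypothetical, and plain keyword overlap stands in for the embedding-based retrieval a real system would more likely use.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Hypothetical long-term store; a real agent might use files or a vector index."""

    entries: list[str] = field(default_factory=list)

    def retrieve(self, task: str, limit: int = 3) -> list[str]:
        # Keyword overlap is a stand-in for embedding-based retrieval.
        words = set(task.lower().split())
        scored = [(len(words & set(e.lower().split())), e) for e in self.entries]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
        return [entry for _, entry in ranked[:limit]]

    def update(self, note: str) -> None:
        # Write back only notes that are not already stored.
        if note not in self.entries:
            self.entries.append(note)


def assemble_context(task: str, memories: list[str]) -> str:
    """Combine retrieved memories and the task into one prompt string."""
    lines = ["Relevant memory:"] + [f"- {m}" for m in memories] + ["", f"Task: {task}"]
    return "\n".join(lines)


# Illustrative entries only; the notes below are made up for the example.
store = MemoryStore([
    "CI failure caused by missing dependency",
    "Run pnpm test after modifying business logic",
])
task = "Investigate the CI failure on main"

context = assemble_context(task, store.retrieve(task))    # 1. retrieve, 2. assemble
# 3. the model would run on `context` here
store.update("CI failure on main traced to an outdated lockfile")  # 4. update
```

Keeping retrieval, assembly, and write-back as separate steps makes it easier to decide deliberately what gets promoted into long-term memory instead of storing everything the session produced.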
### \* Separate instruction memory from learning memory One of the most practical principles for general-purpose coding agents is to distinguish between two fundamentally different kinds of memory: * **Instruction memory**: written by humans to tell the agent how it should work. This usually includes coding standards, directory conventions, build commands, test procedures, naming conventions, commit requirements, and team-level safety rules. In Claude Code, this maps to persistent instruction files such as `CLAUDE.md`. * **Learning memory**: not predefined in advance, but accumulated by the agent over time from your corrections, preferences, failed attempts, common commands, and project habits. Claude Code refers to this capability as auto memory, and its documentation states that it is loaded at the start of every conversation together with instruction files. For subagents, Claude Code can also maintain a separate persistent memory directory, and the first 200 lines of `MEMORY.md` are included automatically. If these two types of memory are mixed together, system behavior often drifts over time. A better approach is to assign them clearly separated roles: * Write **rules, policies, and behavioral constraints** into **instruction memory**, so the agent’s behavior stays stable and predictable. * Write **experience, user preferences, temporary discoveries, and retrospective takeaways** into **learning memory**, so decisions can improve over time in future tasks. This separation helps prevent experience-driven notes from gradually polluting the system’s core rules, which in turn keeps agent behavior more stable and controllable. ### \* Layered memory management This layer contains rules defined and distributed at the team or company level, and applies across all developers and all relevant projects. 
Typical examples include:

* security and compliance requirements
* baseline code review standards
* restricted directories that must not be read from or written to
* dependency and license constraints
* organization-wide engineering standards

At the organizational level, a shared `system.md` can be deployed to a system-level path and should not be easily excluded by individual users. In practice, this can also be distributed through centralized management tools such as hosted configuration, MDM, Group Policy, or Ansible. In a more general agent architecture, this means **organization-level memory should be treated as the highest-priority governance layer and should not be casually bypassed**.

The project layer is the team-shared project context, and it is the most important memory layer for a coding agent. It should be version-controlled and shared across all collaborators. Typical examples include:

* project architecture documentation
* directory structure conventions
* build and test commands
* where APIs should live
* naming conventions
* common development workflows

Claude Code recommends storing this kind of information in a project-level `CLAUDE.md` (a generic agent could use an equivalent file such as `project.md`), and its `/init` command can generate an initial draft automatically. That draft can then be refined with rules the model is unlikely to infer on its own. The key property of this layer is that it is **shared across the project, tracked in version control, and stable over time**.

The user layer captures a developer’s personal preferences that apply across projects. It is best stored under the user’s home directory and treated as reusable personal context for all workspaces. In Claude Code, user instructions are stored separately from project instructions, and both are loaded at session start. This layer is a good place for:

* your preferred coding style
* your usual debugging sequence
* your preferred output format
* your personal workflow shortcuts

It should complement project conventions, not override them.
This layer is specific to your local copy of a project, but should not be committed to Git. A file such as local.md is a good place to store project-specific preferences that should remain private or machine-specific, such as: * personal test accounts * local development ports * temporary mock service endpoints * machine-specific runtime notes * experimental workflows that are not ready to share The value of this layer is that it **allows individuals to work efficiently without polluting team-shared memory**. Another pattern worth generalizing is role-specific memory for subagents. Different subagents can maintain their own memory scopes rather than sharing a single global memory. This is especially important in multi-agent systems, where one of the most common failure modes is context pollution across roles. A better pattern is to let each subagent retain only the memory relevant to its role: * let the **testing agent** remember test commands, CI behavior, and assertion style * let the **refactoring agent** remember module boundaries, restricted dependencies, and migration strategies * let the **documentation agent** remember glossary terms, documentation templates, and audience-specific style This keeps memory shorter, more precise, and more stable. ### \* Loading `.md` files by path Claude Code’s official documentation offers a very useful pattern for organizing memory in large codebases. For larger repositories, it recommends splitting instructions into multiple Markdown files under `.claude/rules/`, with each file focused on a single topic such as `testing.md`, `api-design.md`, or `security.md`. Claude Code also supports **scoping rules to specific subdirectories or file types**, and these rules are loaded only when Claude is working with matching files. That reduces irrelevant noise and helps conserve context window space. 
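The idea of loading a rule file only when the agent touches matching files can be sketched as follows. This is a hedged illustration of the general pattern, not Claude Code's actual configuration format: the `RULE_SCOPES` mapping, the patterns, and the `rules_for` helper are all hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping of rule files to the path patterns they apply to.
RULE_SCOPES = {
    "rules/testing.md": ["tests/*", "*_test.py"],
    "rules/api-design.md": ["src/api/*"],
    "rules/security.md": ["*"],  # applies to every file
}


def rules_for(path: str) -> list[str]:
    """Return the rule files whose patterns match the file being edited."""
    return [
        rule
        for rule, patterns in RULE_SCOPES.items()
        # fnmatchcase lets "*" match across "/" and is case-sensitive on all platforms.
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


api_rules = rules_for("src/api/handlers/users.py")
test_rules = rules_for("tests/test_users.py")
```

With this mapping, editing a file under `src/api/` brings in the API design and security rules but leaves the testing rules out of the context window.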
As a general design pattern for coding agents, this can be summarized in three principles: * **Keep the main memory file limited to global shared context**, such as project background, high-level architecture, and cross-project conventions. * **Keep specialized rules modular**, with one rule file per topic. * If a rule can be loaded by path, **do not load it globally; bring it into context only when needed**. Based on this approach, a project’s memory structure could look like this: ``` agent-memory/ ├── project.md # Project overview ├── rules/ │ ├── code-style.md # Code style │ ├── testing.md # Testing conventions │ ├── api-design.md # API design guidelines │ ├── security.md # Security requirements │ └── frontend/ │ └── react.md # Frontend-specific rules └── local/ └── developer.local.md ``` This structure offers three advantages: 1. Easier to maintain. Each rule file focuses on a single topic, so the rule set is less likely to become bloated or disorganized. Claude Code explicitly recommends topic-specific files with descriptive names. 2. Easier to load on demand. When the agent is working on tests, it does not need to load frontend conventions or database-specific rules into the context window. 3. Better for team collaboration. Different teams or subteams can maintain their own rule directories instead of competing to edit a single monolithic instruction file. ### \* Write memory rules as concrete instructions When writing agent memory, use **specific, verifiable rules** whenever possible rather than abstract principles. The clearer the instructions are, the more stable the agent’s behavior will be. 
In general, it is recommended to: * keep instructions **concise and explicit** * keep rules **consistent** with one another * keep the main memory file **under 200 lines** where possible * use **Markdown headings and lists** to improve readability * phrase requirements as rules that can be **checked and executed** For example, avoid writing: * Keep the code clean * Write good tests * Be mindful of API design * Split modules when appropriate Instead, prefer rules like: * Use **2-space indentation** in all new TypeScript files * **Run `pnpm test`** after modifying business logic * Place **all API handlers under `src/api/handlers/`** * Keep React page components **under 300 lines**; split larger ones into hooks or child components Concrete rules significantly reduce the agent’s room for interpretation, which improves behavioral consistency. ### \* Separate shared rules from personal preferences When designing an agent memory structure, it is important to clearly define **the scope of each rule and who is responsible for it**. A common approach is to organize memory by scope: * **Project**: shared by all team members and maintained through version control * **Organization**: defined centrally by IT or DevOps, such as security standards or development processes * **User**: applies only to an individual, such as personal coding habits * **Local**: applies only to the current machine or working environment and should not be committed to Git * **Role / Agent-specific**: used only by a specific specialized agent The core principle of this hierarchy is: > who owns it, who shares it, and who it applies to. For example: * team-wide conventions → project level * company security policies → organization level * personal coding habits → user level * machine-specific configuration → local level * rules for a specialized agent → role level Defining these boundaries during the memory design phase helps avoid rule sprawl and duplicate definitions. 
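One way to make these scope boundaries concrete is to merge rules in a fixed precedence order and record which scope supplied each one. The sketch below is illustrative only: `SCOPE_ORDER`, `resolve_rules`, and the sample rules are hypothetical, and the exact ordering is an assumption. The text above states only that organization-level rules rank highest and that user preferences should not override project conventions. Role-specific memory is left out for brevity.

```python
# Lowest to highest precedence. This ordering is an assumption for illustration.
SCOPE_ORDER = ["local", "user", "project", "organization"]


def resolve_rules(
    rules_by_scope: dict[str, dict[str, str]],
) -> dict[str, tuple[str, str]]:
    """Merge rules so that a higher-precedence scope wins on conflicts.

    Returns a mapping of rule key -> (value, scope that supplied it).
    """
    resolved: dict[str, tuple[str, str]] = {}
    for scope in SCOPE_ORDER:
        for key, value in rules_by_scope.get(scope, {}).items():
            # Later (higher-precedence) scopes overwrite earlier ones.
            resolved[key] = (value, scope)
    return resolved


# Sample rules, made up for the example.
rules = {
    "organization": {"secrets": "never commit credentials"},
    "project": {"indent": "2 spaces", "test_command": "pnpm test"},
    "user": {"indent": "4 spaces", "output": "short summaries"},
    "local": {"dev_port": "3001"},
}
resolved = resolve_rules(rules)
```

Here the user's `indent` preference conflicts with the project convention, so the project value is kept, while non-conflicting user and local entries pass through unchanged. Recording the source scope also makes duplicate or conflicting definitions easier to spot.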
### \* Reuse memory through imports and rule packages In real projects, many rules are **shared engineering conventions across repositories**. Rewriting them in every repo increases maintenance overhead and makes inconsistency more likely. Using **Claude Code** as an example, its documentation explains that: * `CLAUDE.md` can import other rule files using `@path/to/import` * `.claude/rules/` can share rules through **symbolic links** (symlinks) * imported content can be **expanded recursively**, and symlinks are resolved normally This makes it possible for teams to build **reusable rule packages**, such as: * `company-security-rules` * `frontend-react-rules` * `backend-api-rules` * `python-testing-rules` Each project only needs to reference the rule modules it needs, rather than maintaining a full copy of the entire rule set. This approach brings two direct benefits: 1. **rules can be maintained centrally and updated consistently** 2. **different projects can share the same engineering language**, making agent behavior more consistent across repositories ## **Memory troubleshooting** `.md` memory files are typically provided to the agent as contextual instructions, not as enforced configuration. The agent will read them and try to follow them, but it cannot guarantee strict compliance when the rules are vague, unclear, or conflicting. If the agent is not following the rules, you can check the following: * Run `/memory` (or the equivalent command) to confirm that the .md memory files have been loaded. * Check whether the `.md` files are located in a path or scope that is allowed to load in the current session. * Check whether there are conflicting rules across multiple `.md` files. If different files give different instructions for the same behavior, the agent may choose one arbitrarily. Most coding agents maintain auto memory in the background to capture project context, user preferences, or common actions. 
You can inspect it in the following ways:

* Run `/memory` (or a similar command) to view the current auto memory directory.
* Auto memory is typically stored as Markdown files that you can read, edit, or delete directly.

Oversized memory files consume more of the context window, reduce the agent’s adherence to instructions, and increase the likelihood of conflicts. It is recommended to split detailed content into multiple Markdown files and use file references or imports (such as `@path/to/file`), or move rules into a dedicated rules directory such as `rules/`.

Many coding agents **compress or summarize context** during long conversations in order to reduce context length. In most cases, memory files are **reloaded from disk** after compression, so only content that has been written into memory files will persist. If certain rules disappear after compression, that means those rules **existed only in the conversation** and were never written into a memory file. To fix this:

* write long-term instructions into `.md` memory files
* do not rely on the conversation alone to preserve rules

# Tool Integration

Source: https://docs.z.ai/devpack/tool/others

## Supported Tools

The GLM Coding Plan is limited to use within the following officially supported tools and product environments; users may not use their subscription benefits for tools or scenarios outside of this scope.

### 1. Coding Agent Tool

Click on the tool documentation below that you wish to use, and follow the corresponding integration guide to set it up.

The Claude Code IDE plugin supports VSCode and JetBrains. An AI-first code editor that supports custom model configurations. An AI programming extension for VS Code that supports code generation and file operations. An AI editor capable of independently completing various development tasks.
An agentic coding platform designed for real software development. An enterprise-grade AI coding agent that runs in the terminal to handle end-to-end workflows. A powerful VS Code extension for code generation and project management. A smart VS Code extension for code writing and refactoring. A terminal-based AI programming tool that supports both CLI and TUI interfaces. An AI agent tool that supports local execution and automated engineering tasks. A desktop AI agent built on a multi-agent architecture, capable of automating browser, terminal, and MCP-powered workflows.

### 2. General-purpose Agent Tool

The general-purpose agent tools listed below are also supported and will continue to be served on a best-effort basis. Under high inference load (typically around 2–6 PM Singapore time, though this may shift), some requests may face temporary rate limits.

An open-source AI assistant that runs on local devices and supports multi-platform use. An open-source, evolving AI agent with persistent memory that gets smarter with use. A highly customizable AI chat frontend for immersive roleplay with multi-model & media support.

## Coding Endpoint

GLM Coding Plan provides endpoints for two protocols, depending on the coding tool:

* `OpenAI Chat Completions` protocol endpoint: `https://api.z.ai/api/coding/paas/v4`
* `Anthropic Messages` protocol endpoint: `https://api.z.ai/api/anthropic`

Select the endpoint that matches the protocol your tool uses. With an incorrect endpoint, requests cannot use your GLM Coding Plan subscription quota.

## Config Example

> Core Steps:
>
> 1. Select an OpenAI-compatible provider.
> 2. **`Add/Replace the OpenAI Base URL with https://api.z.ai/api/coding/paas/v4.`**
> 3.
**`Enter your Z.AI API Key and select GLM-5.1, GLM-5, GLM-5-Turbo, GLM-4.7 or GLM-4.5-air.`** Using **Cursor** as an example (Note: Custom configuration is only supported in Cursor Pro and higher versions), the following steps demonstrate how to integrate the `GLM-4.7` model via the OpenAI protocol. Similarly, other tools supporting the OpenAI protocol can adopt the same configuration approach. ### 1. Install Cursor Download and install Cursor from the official website. ### 2. Create a New Provider/Model In Cursor, navigate to the "**Models**" section and click the "**Add Custom Model**". ![Description](https://cdn.bigmodel.cn/markdown/176032216013820251013-100735.jpeg?attname=20251013-100735.jpeg) * Select the **OpenAI Protocol**. * Configure the **OpenAI API Key** (obtained from the Z.AI). * In **Override OpenAI Base URL**, replace the default URL with `https://api.z.ai/api/coding/paas/v4`. * Enter the model you wish to use, such as `GLM-5.1`, `GLM-5` or `GLM-4.5-air`. * Note: In Cursor, the model name must be entered in uppercase, such as `GLM-4.7`. ![Description](https://cdn.bigmodel.cn/markdown/176032218295020251013-100740.jpeg?attname=20251013-100740.jpeg) ### 3. Save and Switch Models After configuration, save your settings and select the newly created **GLM-4.7 Provider** on the homepage. ### 4. Get Started With this setup, you can begin using the GLM model for code generation, debugging, task analysis, and more. ![Description](https://cdn.bigmodel.cn/markdown/176032221518820251013-100745.jpeg?attname=20251013-100745.jpeg) ## Step 3: Replacing the API URL 1. **Locate the API configuration section in your tool**: For example, in **Goose**, this is typically where you set the API address in the configuration file; In **VS Code** plugins or **IntelliJ IDEA** plugins, configuration is usually done through the plugin's settings interface. 2. **Replace the OpenAI Base URL**: Replace the default OpenAI API URL with `https://api.z.ai/api/coding/paas/v4`. 3. 
**Enter API Key and Select Model**: * Enter your **Z.ai API Key**. * Select `GLM-5.1`(standard, complex tasks) or `GLM-4.5-air`(lightweight, faster response) based on your requirements. # Legacy Plan Migration Notice Source: https://docs.z.ai/devpack/transition Thank you for your continued support of GLM Coding Plan. To provide a more consistent subscription structure and better support future product upgrades, GLM Coding Plan will gradually migrate remaining legacy subscribers to the current plan system. The details of this update are as follows. ## 1. Changes and Applicability Legacy plans (without weekly usage limits) will be phased out and will no longer remain as a supported subscription option going forward. **This update applies to users who are on a legacy plan with auto-renew enabled as of April 30, 2026 (Singapore Time). It does not affect users who are already on the current plans.** ## 2. Details of the Changes ### 2.1 Auto-renew for legacy plans will be turned off Starting on **April 30, 2026 (Singapore Time)**, auto-renew for eligible legacy plans will be automatically canceled. This change **will not affect your current paid subscription period**. Your existing plan will remain active until the end of its current billing cycle. ### 2.2 Migration support To make the migration as smooth as possible, we will provide the following migration support to eligible users. #### 2.2.1 2 complimentary months of the equivalent online version plan We will automatically provide 2 complimentary months of the equivalent online version plan tier. This will be issued on April 30, 2026 (Singapore Time) and will be added after your current subscription ends, with no manual action required. 
| Current Legacy Plan | Transition Support | | :------------------ | :--------------------------------------- | | Legacy Lite | 2 months of the online version Lite plan | | Legacy Pro | 2 months of the online version Pro plan | | Legacy Max | 2 months of the online version Max plan | #### 2.2.2 Extra benefits when migrating to the online version plan We understand that the current plan pricing has already gone through a recent price increase. As part of this policy, eligible users will be able to purchase the latest plan at 50% off based on the latest discounted price. This benefit will remain available from April 30, 2026 until 3 months after the end of the user’s current legacy plan period. During this period: * The discounted price will dynamically reflect **50%** of the latest discounted price available at the time of purchase. ![Description](https://cdn.bigmodel.cn/markdown/1777461936885image.png?attname=image.png) * Users may purchase or renew multiple times within the eligibility period, and renewals will continue to be charged at the discounted rate. After the eligibility period ends, this migration support will no longer apply. 
The available renewal prices are illustrated below (based on current pricing, for reference only):

Monthly Plan

| Plan | Current Standard Price | 50% off (example based on current pricing) |
| :--- | :--------------------- | :----------------------------------------- |
| Lite | \$18 | \$9 |
| Pro | \$72 | \$36 |
| Max | \$160 | \$80 |

Quarterly Plan

| Plan | Current Discounted Price | 50% off (example based on current pricing) |
| :--- | :----------------------- | :----------------------------------------- |
| Lite | \$48.60 | \$24.30 |
| Pro | \$194.40 | \$97.20 |
| Max | \$432 | \$216 |

Annual Plan

| Plan | Current Discounted Price | 50% off (example based on current pricing) |
| :--- | :----------------------- | :----------------------------------------- |
| Lite | \$172.80 | \$86.40 |
| Pro | \$691.20 | \$345.60 |
| Max | \$1,536 | \$768 |

## 3. What this means for you

* Your current legacy subscription will remain active until the end of the current billing cycle.
* Auto-renew for the legacy plan will be canceled on **April 30, 2026 (Singapore Time)**.
* You will automatically receive 2 complimentary months of the equivalent online version plan tier, which will begin after your current subscription ends.
* As part of the migration support, you will remain eligible for the 50% discount from April 30, 2026 until 3 months after the end of your current legacy plan period.

## 4. FAQ

**Q: Will my current legacy plan stop immediately once auto-renew is canceled?**

No. Your current subscription will remain available through the end of the existing billing period.

**Q: Do I need to manually claim the complimentary 2 months?**

No. The migration support will be granted automatically on April 30, 2026 (Singapore Time) and added after your current plan expires.

**Q: Will the complimentary online version plan be the same tier as my current legacy plan?**

Yes.
We will grant 2 months of the corresponding online version plan based on the tier of your current legacy plan. **Q: Can the value of the 2 complimentary months be used as credit when subscribing to a new plan?** This depends on whether the new plan purchased by the user is considered an “upgrade”. 1. If the new plan is an **upgrade** (e.g., from Pro to Max) All existing plan benefits in the user’s account, including both paid plans and complimentary plans, will become invalid immediately and will be used to offset the cost of the new plan according to the applicable rules. If there is any remaining value after the offset, the excess amount will be returned to your account as bonus credits. 2. If the new plan is a **downgrade or the same tier** (e.g., from Pro to Pro, or Pro to Lite) The newly purchased plan will not take effect immediately. Instead, it will be scheduled to start after all current plan benefits, including the complimentary months, have been fully used or expired. # Usage Policy Source: https://docs.z.ai/devpack/usage-policy The GLM Coding Plan is a subscription package designed specifically for AI-powered coding. ## Rate Limits Rate (concurrency) limits are tied to your plan tier. The platform dynamically adjusts these limits based on resource availability, with the general principle being **Max > Pro > Lite**. For each project, you can use methods like Subagent to make concurrent model calls. Our recommended number of concurrent projects is as follows: * **Lite**: Recommended for developing a single project at a time * **Pro**: Recommended for developing 1–2 projects simultaneously * **Max**: Recommended for developing 2 or more projects simultaneously Plan users will enjoy higher concurrency limits during off-peak hours (dynamically increased), enabling support for a greater number of concurrent projects. ## Account Usage Policy The **GLM Coding Plan** subscription is intended for **individual use by the subscriber**. 
If multiple users share the same account and subscription, it may be considered improper usage. In such cases, we may apply restrictions to the subscription benefits, and in severe cases the account itself may be affected. The GLM Coding Plan is strictly limited to use within officially supported tools and products. The subscriber shall not use the subscription benefits in any unsupported tools or scenarios. If the system detects usage through unauthorized or unsupported tools (such as SDK-based access or other third-party integrations), some subscription benefits may be restricted to ensure fairness and service stability. * To protect the rights of all subscribers, if your account violates the [Usage Rules](/legal-agreement/subscription-terms#4-usage-rules), and triggers risk control rules, it may be subject to risk control measures, including high-intensity throttling, account suspension, or permanent ban. * If your account triggers a risk control rule, you may receive a notification. If you believe the restriction was applied in error, you may submit an appeal or [contact our support team](https://z.ai/contact) for review. Violating the Usage Rules three or more times will result in an account ban. ## Manage Your Subscription You can manage your subscription, view billing details, and cancel the subscription as follows: 1. Log in to the Z.ai [API Platform](https://z.ai/subscribe?utm_source=zai\&utm_medium=index\&utm_term=glm-coding-plan\&utm_campaign=Platform_Ops&_channel_track_key=6lShUDnv). 2. Click your profile icon in the top-right corner → [Payment Method](https://z.ai/manage-apikey/billing). 3. In the left menu, select [Subscription](https://z.ai/manage-apikey/subscription). 4. To view billing history, go to Billing → [Billing History](https://z.ai/manage-apikey/billing). ## Subscription Period & Renewal Your subscription will **automatically renew at the end of each billing cycle**, and the fee will be charged to your linked payment method. 
The system will charge you in the following order: 1. Priority is given to the bonus balance in your platform account. 2. If the bonus balance is insufficient, the cash balance in your platform account will be used. 3. If the above balances are insufficient, the payment will be deducted from your linked third-party payment method (e.g., WeChat Pay, Alipay). If you need to cancel your subscription, you can do so on the [Subscription](https://z.ai/manage-apikey/subscription) page. Please be sure to cancel **at least 3 days** before the next billing date to avoid automatic renewal. After cancellation, the current subscription period remains valid and will not be renewed upon expiration. ## Refund Policy Please note that once a subscription service is purchased, it is considered confirmed, and refunds are not supported. Even if you have not used up your plan, the fees cannot be refunded. We recommend that you choose the appropriate subscription plan and term based on your usage needs. # How to Switch Models Source: https://docs.z.ai/devpack/using5.1 The GLM Coding Plan now supports the latest GLM-5.1 model for all users (Max, Pro, and Lite), and you can switch between models within your preferred Coding Agent. ## Switching Models in Claude Code ### Step 0 Claude Code default configuration By default, Claude Code maps its internal model environment variables to GLM models as follows: * `ANTHROPIC_DEFAULT_OPUS_MODEL`:`GLM-4.7` * `ANTHROPIC_DEFAULT_SONNET_MODEL`:`GLM-4.7` * `ANTHROPIC_DEFAULT_HAIKU_MODEL`:`GLM-4.5-Air` ### Step 1 Update the default configuration #### For Windows Users Locate the configuration file at `~/.claude/settings.json`, as shown below: #### For macOS Users Method 1: In the terminal, run `vim ~/.claude/settings.json` to open and edit the file. When you're done, press Esc, type `:wq`, and save your changes. Method 2: In Finder, choose Go → Go to Folder, then enter `~/.claude/settings.json` to locate and edit the configuration file. 
You can also ask Claude Code directly to locate the file.

To use GLM-5.1, add or replace the following environment variables in `settings.json`:

```json
{
  "env": {
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "glm-4.5-air",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-5.1",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "glm-5.1"
  }
}
```

### Step 2 Confirm that the model has been switched

Open a new terminal window and run `claude` to launch Claude Code. Then type `/status` in Claude Code to verify the current model:

![Description](https://cdn.bigmodel.cn/markdown/1774606248371image.png?attname=image.png)

## Switching Models in OpenClaw

> If you already use OpenClaw and cannot switch to glm-5.1 by selecting a provider model directly, you can update the configuration manually after completing the provider setup in the previous step.

### Step 1 Update the default configuration

Locate the configuration file at `~/.openclaw/openclaw.json`.

In that file, add the `glm-5.1` model to the `models.providers.zai.models` array. Append the following object after the last model in the array, and add a comma after the preceding object so that the JSON stays valid:

```json
{
  "id": "glm-5.1",
  "name": "GLM-5.1",
  "reasoning": true,
  "input": [
    "text"
  ],
  "cost": {
    "input": 0,
    "output": 0,
    "cacheRead": 0,
    "cacheWrite": 0
  },
  "contextWindow": 204800,
  "maxTokens": 131072
}
```

Then update the default model under `agents.defaults.model.primary`:

```
"primary": "zai/glm-5",
```

Change it to:

```
"primary": "zai/glm-5.1",
```

Next, under `agents.defaults.models`, add:

```
"zai/glm-5.1": {}
```

### Step 2 Updated configuration reference

**Below are reference snippets showing what the updated configuration should look like:**

1.
`models.providers.zai.models` section: ``` "models": [ { "id": "glm-5", "name": "GLM-5", "reasoning": true, "input": ["text"], "cost": {"input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0}, "contextWindow": 204800, "maxTokens": 131072 }, { "id": "glm-4.7", "name": "GLM-4.7", "reasoning": true, "input": ["text"], "cost": {"input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0}, "contextWindow": 204800, "maxTokens": 131072 }, { "id": "glm-5.1", "name": "GLM-5.1", "reasoning": true, "input": ["text"], "cost": {"input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0}, "contextWindow": 204800, "maxTokens": 131072 } ] ``` 2. `agents.defaults.model.primary` section: ``` "model": { "primary": "zai/glm-5.1", "fallbacks": ["zai/glm-4.7"] } ``` 3. agents.defaults.models section: ``` "models": { "zai/glm-5": {"alias": "GLM"}, "zai/glm-4.7": {}, "zai/glm-5.1": {} } ``` After making these changes, restart the gateway by running: `openclaw gateway restart` Once restarted, you should be able to use the glm-5.1 model directly. You can run `openclaw tui` in the terminal to enter the chat interface, where you should see that glm-5.1 is now in use. ## Switching Models in Other Tools This currently only works with coding agents that support custom model configuration. If your agent tool does not allow custom model settings, you will need to wait for official support in a future release. 
### Using Cline as an example: ![Description](https://cdn.bigmodel.cn/markdown/177461553487720260327-204436.jpeg?attname=20260327-204436.jpeg) Please use the following settings: * API Provider:Select OpenAI Compatible * Base URL:Enter [https://api.z.ai/api/coding/paas/v4](https://api.z.ai/api/coding/paas/v4) * API Key:Enter your Z.AI API key * Model:Choose Custom Model and enter the model name, such as glm-5.1 or glm-5 * Other settings: * Uncheck Support Images * Set Context Window Size to 200000 * Adjust temperature and any other parameters based on your task requirements # GLM Slide/Poster Agent(beta) Source: https://docs.z.ai/guides/agents/slide A Slide & Poster Agent powered by the native capabilities of the GLM model — integrating information retrieval, content structuring, and visual layout design, enabling you to effortlessly create professional-grade slides and posters. ## Overview GLM Slide/Poster Agent is an intelligent creation agent built for working people and creators. Powered by the deeply optimized GLM LLM, it goes beyond traditional engineering-style assembly tools—supporting one-click generation of slides or posters from natural language instructions. By natively integrating content generation with layout aesthetics and design conventions, it helps you quickly produce polished, professional-grade materials while lowering design barriers and boosting creative efficiency. ## Resources * [API Documentation](https://docs.z.ai/api-reference/agents/agent): Learn how to call the API. ## Examples Help an AI startup team create a big model market analysis PPT
Make a poster for the release of my new visual understanding model
## Core Features * **Smart Information Search:** Automatically retrieve and organize relevant materials, aggregate multi-source content for richness and accuracy, and support real-time web data access to ensure up-to-date information. * **Elegant Visual Design:** Built-in professional visual standards with intelligent layout algorithms, ensuring clean, balanced, and visually impressive slides or posters. * **Rich Content Display:** Seamless integration of text, charts, and multimedia elements—delivering data-rich, visually engaging content that communicates at a glance. * **Efficient Creation Workflow:** Generates polished slides or posters , supporting custom page count to meet your exact content needs. ## Usage | Scenarios | Typical Needs | | :------------------- | :--------------------------------------------------------------------------------------------------------------------------- | | Workplace Reporting | Weekly Reports, Project Roadshows, Business Presentations, Internal Briefings, etc. | | Education & Training | Teaching Courseware, Academic Thesis Presentations, Research Posters, Workshop Materials, Training Program Slides, etc. | | Personal Use | Summary Reporting, Activity Planning Programs, Personal Portfolios, Knowledge Sharing Presentations, Event Invitations, etc. | ## Use GLM Slide/Poster Agent Describe your slide or poster needs in one sentence, and GLM Slide/Poster Agent will generate polished, professional-level content for you! 1. **Describe Your Needs:** Enter the topic or purpose of your slide/poster in natural language 2. **Smart Information Gathering:** The intelligent agent automatically searches and organizes relevant content 3. **Slide/Poster Generation:** Instantly create visually engaging slides or posters based on curated information 4. **Refinement & Adjustment:** Modify and iterate based on your feedback (*in-browser editing coming soon*) 5. 
**Export and Use:** Export as PDF (*PPTX and other formats coming soon*) ## Price * **Pay-as-you-go based on Token consumption, \$0.7/M Tokens** * Measurement Scope: Total Tokens generated by all task flow nodes of the intelligent agent # Translation Agent Source: https://docs.z.ai/guides/agents/translation ## Overview An AI-powered translation agent designed for multilingual communication and content localization, supporting automatic language detection and translation across 40+ languages. It offers six flexible translation strategies—General, Paraphrasing, Two-Step, Three-Stage, Reflection, and COT—to handle diverse scenarios such as social media, literary works, and academic papers. Users can further enhance translation accuracy through terminology customization and expert guidance. This translation agent also provides a corresponding API service, making it a powerful tool for individuals, enterprises, developers, and professional translators. ## Core Features * **Multilingual Translation Support:** Covers translation and language detection for 40 languages, including niche languages such as Classical Chinese and Cantonese. * **Six Professional Translation Strategies:** Includes General, Paraphrasing, Two-Step, Three-Stage, Reflective, and COT translation modes to meet diverse text translation needs. * **Terminology Customization:** Enterprises or users can upload terminology glossaries for term alignment, enhancing domain-specific accuracy. * **Expert Guidance and Style Adaptation:** Allows specification of stylistic preferences and cultural context to ensure translations better fit the intended usage scenarios. * **Pinyin and IPA Support:** Converts text into Pinyin or International Phonetic Alphabet, suitable for language learning or voice-assistive content creation. 
* **Intra-Language “Translation” Support:** When the source and target languages are the same, applying strategies like Two-Step, Reflective, or COT will paraphrase, rewrite, or polish the text, which helps eliminate ambiguity in secondary translations.
* **Translation Explainability:** COT and Reflective strategies can output the reasoning behind the translation process.

| **Feature** | **General-Purpose Translation Agent** | **Traditional Translation API** | **General LLM** |
| :------------------------- | :------------------------------------ | :------------------------------ | :------------------------------------ |
| Multilingual Support | ✅ 40 languages | ✅ Typically supports many | ✅ Typically supports many |
| Professional Strategies | ✅ 6 specialized strategies | ❌ Limited strategies | ❌ No dedicated translation strategies |
| Terminology Support | ✅ Full support | ⚠️ Partial support | ❌ Not supported |
| Translation Suggestions | ✅ Detailed suggestions supported | ❌ Not supported | ❌ Requires manual input |
| Translation Explainability | ✅ Provides reasoning | ❌ Not available | ⚠️ Requires special prompting |
| Auto Language Detection | ✅ High accuracy | ✅ Supported | ✅ Supported |
| Special Language Support | ✅ Classical Chinese, Cantonese, etc. | ❌ Typically unsupported | ⚠️ Limited support |

## Usage

| **Target Users** | **Application Scenarios** |
| :----------------------------------------- | :---------------------------------------------------------------------------------------- |
| Individual Developers & Language Learners | Provides translation assistance and phonetic annotation to enhance learning efficiency |
| Content Creators / New Media Operators | Supports multilingual content publishing and social media localization |
| Enterprise Clients / Translation Providers | Handles batch translation tasks, manages custom terminology, and ensures compliant output |
| Educational / Research Institutions | Supports translation of papers, course materials, and academic communication |

| **Strategy** | **Description** | **Applicable Scenarios** | **Key Features** |
| :---------------------- | :-------------- | :----------------------- | :--------------- |
| General Translation | A basic translation method that retains the original format, accounts for the target language’s cultural and linguistic context, balances accuracy and fluency, and resolves over 95% of prompt injection issues. | General-purpose content translation; suitable for most daily use cases and handling prompt injection. | - Preserves original structure<br>- Follows translation rules<br>- Supports terminology glossaries<br>- Adapts to regional language variants |
| Paraphrased Translation | Respects the original meaning while rewriting content in the target language’s natural expression. | When natural adaptation to target culture is preferred; non-literal translation scenarios. | - Focuses on conveying meaning over form<br>- Adapts more naturally to the target language<br>- Allows greater flexibility in restructuring |
| Two-Step Translation | Involves literal translation first, followed by freer expression; combines direct and adaptive approaches. | Literary works or content needing both accuracy and expressiveness. | - Balances literal and intended meanings<br>- Enhances comprehension<br>- Stepwise approach improves both accuracy and fluency |
| Three-Stage Translation | Based on the Chinese traditional theory of "faithfulness, expressiveness, elegance"; ensures content accuracy, fluent delivery, and stylistic beauty. Currently supports classical/literary styles only. | High-quality literary, poetic, or culturally expressive texts. | - Follows "faithfulness, expressiveness, elegance" principle<br>- Focuses on cultural and philosophical nuance<br>- Produces refined, elegant output |
| Reflective Translation | Starts with a literal translation, then prompts the model to act as an expert to reflect on the result by specific dimensions, and finally optimizes the output accordingly. | Professional publications, formal documents, or content requiring high accuracy and quality. | - Improves output through iterative feedback<br>- Corrects errors and enhances style<br>- Ideal for complex or domain-specific texts |
| COT Translation | Applies a Chain of Thought reasoning process to analyze the source text before translating into the target language. | Conceptually complex or professional domain content requiring deeper understanding. | - Performs explicit reasoning before translation<br>- Enhances comprehension of complex input<br>- Increases transparency<br>- Useful in expert domains |

## Use Translation Agent

1. **Select Source Language:** Specify the source language of the text to be translated (can be left as auto-detect by default).
2. **Select Target Language:** Choose the language you want the text to be translated into.
3. **Choose Translation Strategy:** Select a suitable translation strategy based on your needs (e.g., General, Paraphrasing, Two-Step, etc.).
4. **Enter Source Text:** Paste the text you want to translate into the input box.
5. **Get Translation Result:** The agent will generate a high-quality translation based on the selected strategy.

## Examples

> Original English Text:
>
> "Two roads diverged in a wood, and I took the one less traveled by, and that has made all the difference."
>
> — Robert Frost

1. General Translation:

> “林中有两条路分叉,而我选择了那条少有人走的路,而这让一切变得不同。”
>
> ——罗伯特·弗罗斯特

Uses a basic literal translation approach with terminology support, preserving the original format while accommodating the cultural context and linguistic norms of the target language. **It is suitable for over 95% of general translation needs.**

2. Paraphrased Translation:

> “树林中分出两条路,我选择了人迹罕至的一条,而这决定了我一生的道路。”
>
> ——罗伯特·弗罗斯特

While staying true to the original meaning, this approach rewrites sentence structure and expressions to present the content in a more natural and native way in the target language. **It is suitable for translation scenarios that do not require word-for-word correspondence and prioritize cultural adaptation and linguistic conventions.**

3. Two-Step Translation:

> "林中有两条路分岔,而我选择了那条少有人走的路,这一切因此而截然不同。"
>
> ——罗伯特·弗罗斯特

This approach first performs a literal word-for-word translation, then refines the result through free translation to improve overall quality and readability. **It is well-suited for scenarios like literary works, where semantic understanding and stylistic expression are especially important.**

4.
Three-Principle Translation: > "林间歧路分,吾择幽径以行,此别天地宽。" > > ——罗伯特·弗罗斯特 Based on the traditional Chinese translation principles of "faithfulness, expressiveness, and elegance", this approach ensures that the translation is accurate, fluent, and stylistically refined. **It is ideal for literary content such as poetry and prose**, supporting high-register styles like Classical Chinese, and aims to balance form and meaning. 5. Reflective Translation: > "林中两条路岔开,而我选择了那条少有人走的,而这,让一切截然不同。" > > ——罗伯特·弗罗斯特 Building on an initial literal translation, this approach introduces an expert role to reflect on and provide feedback, followed by refinement and optimization of the output. It enhances accuracy and stylistic consistency, making **it suitable for high-quality formal texts** such as publications, legal documents, or professional materials. 6. COT Translation: ```python theme={null} First, we need to understand the basic meaning of the poem. Robert Frost’s poem describes a person faced with a fork in the woods, symbolizing a moment of choice. The “two roads” represent different paths in life, and the speaker chooses “the one less traveled by,” indicating a preference for uniqueness and individuality. This choice, which “has made all the difference,” suggests that it had a significant impact on the speaker’s life. When translating, we need to pay attention to the following points: 1. “Two roads diverged in a wood”中的“diverged”表示分叉,可以翻译为“分叉”或“岔开”。 2. “I took the one less traveled by”中的“less traveled”表示少有人走,可以翻译为“少有人走的”。 3. “And that has made all the difference”中的“made all the difference”表示造成了巨大的影响,可以翻译为“造成了所有的不同”或“这一切都不同了”。 In addition, to preserve the poem’s rhythm and aesthetic appeal, we should strive to make the translation concise and poetic. ``` > 树林中两条路分叉,而我选择了那条少有人走的,这一切都不同了。 > > —— 罗伯特·弗罗斯特 By following a “first understand, then translate” reasoning process, this approach enhances the model’s ability to comprehend complex concepts and specialized contexts. 
**It is well-suited for high-expertise domains such as science and medicine**, resulting in more logical and transparent translations that help users grasp the underlying knowledge structure behind the language. ## Price Pay-as-you-go by Token usage, \$3 per 1 million Tokens # Video Effect Template Agent Source: https://docs.z.ai/guides/agents/video-template ## Overview Includes three popular special effects video templates: french\_kiss, bodyshake, and sexy\_me. Users can upload a single image to generate a professional-grade special effects video with a single click. The app is easy to use, produces smooth images, and effectively improves the efficiency and quality of short video creation. ## Examples
| Template | Prompt |
| :------- | :----- |
| french\_kiss | The two figures in the painting gradually draw closer, then passionately kiss, alternating between deep and intense moments. |
| bodyshake | Video content: The character performs a rhythmic dance sequence in an indoor setting. She first sways her hips, then turns to the other side, briefly shaking her hips in a playful manner. Her movements are fluid and confident, consistently emphasizing body rhythm and expressiveness. Requirements: Movement Level: High |
| sexy\_me | The woman's attire undergoes a seamless transformation, her original clothing smoothly transitioning into a fashionable bikini. In the final moment, she confidently places her hands on her hips, exuding elegance and poise. |

## Core Features * **Excellent Dynamic Performance**:Natural and smooth movements with a wide range of motion, eliminating the sluggishness commonly found in AI videos, and suitable for a variety of creative special effects scenarios. * **Accurate Semantic Understanding**: Able to efficiently generate a variety of images based on prompts, providing a smoother card-drawing experience. * **Excellent Anime Style**: Stable video generation quality without sudden changes. * **Strong Subject Consistency**:No need for additional first frame images; generate highly consistent videos with a single click, greatly simplifying the creation process. ## Usage | **Target Users** | **Application Scenarios** | | :-------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------ | | Individual Creators | Quickly produce popular special effects short videos (such as costume changes/dance challenges) and efficiently produce creative content at low cost. | | MCN Agencies | Batch generate standardized special effects videos (such as popular costume change templates) to meet the large-scale content needs of matrix accounts. | | Short Video Platforms | Provide an integrated special effects template library to lower user creation barriers, enhance platform content diversity, and boost user engagement. | ## Use Video Templates 1. **French Kiss Template:** * Image Quantity: Required; Only 1 image can be uploaded. * Number of People in Image: Only supports two-person collages or group photos. * Guidelines: Subjects should be shown from the front and upper body only, with no props in hand for best results. * Prompt: The two figures in the image gradually move closer, then passionately kiss with alternating deep and firm intensity. 
* Effect Limitations: If a full-body image is uploaded, the result may show no kiss, a weaker kiss, or a shorter duration—failing to convey the intended passionate effect. 2. **Bodyshake Template:** * Image Quantity: Required; Only 1 image can be uploaded. * Number of People in Image: Supports single-person photos in either realistic or anime style. * Guidelines: Best results are achieved when the character is alone and the image shows at least above-thigh framing. * Prompt: "Video content: The character performs a rhythmic dance sequence in an indoor setting. She starts by swaying her hips, then turns to the other side, briefly shaking her hips in a playful manner. Her movements are smooth and confident, consistently emphasizing rhythm and expressiveness. Requirement: Movement intensity – high." * Effect Limitations: During the turning motion, body structure distortion or hand clipping may occasionally occur. For hip-shaking actions, motion may sometimes appear disconnected or visually unnatural. 3. **Sexy\_Me Template:** * Image Quantity: Required; Only 1 image can be uploaded. * Number of People in Image: Only supports single-person photos (female or male) in either realistic or anime style. * Guidelines: Best results are achieved with half-body or full-body images. If the subject in the input image is female, the output will feature a bikini transformation video showcasing the body. If the subject is male, the output will show a shirtless transformation video highlighting muscular physique. * Prompt: "Video content: The transformation varies depending on the subject's gender. If the image shows a female: 'The woman's clothing transforms seamlessly—her outfit smoothly changes into a stylish bikini. At the final moment, she confidently places her hands on her waist, exuding grace and poise.' If the image shows a male: 'The man swiftly removes his shirt, revealing a muscular physique matching his skin tone. He then steps forward.' Requirements: 1. 
If the image is a close-up or medium shot, set the camera motion to "zoom out." 2. Movement intensity: high." * Effect Limitations: If the subject's gender features are unclear, incorrect gender identification may occur, resulting in mixed or incorrect bikini/muscle transformations. In bikini transformations, clothing may not be fully removed. If a two-person image is uploaded, one subject may be missing in the output. ## Price Pay-as-you-go based on number of videos, \$ 0.2 per video # GLM-ASR-2512 Source: https://docs.z.ai/guides/audio/glm-asr-2512 ## Overview GLM-ASR-2512 is Z.AI's next-generation speech recognition model, enabling real-time conversion of speech into high-quality text. Whether for daily conversations, meeting minutes, work documents, or scenarios involving specialized terminology, it delivers precise recognition and conversion, significantly boosting input and recording efficiency. The model maintains industry-leading recognition performance across diverse scenarios and accents, achieving a Character Error Rate (CER) of just 0.0717. This delivers a fast and reliable voice input experience. Audio / File Text * Audio duration ≤ 30 seconds * File size ≤ 25 MB ## Usage Transcribe online meetings instantly, automatically organizing structured summaries to significantly boost efficiency. High-precision transcription of support calls enhances QA efficiency and enables multi-scenario analysis. Provides real-time synchronized subtitles for news broadcasts, educational courses, or video conferences with low latency and high accuracy. Rapidly generate work documents, emails, and proposal drafts via voice input, dramatically accelerating content creation. Supports cross-language speech comprehension for cross-border exchanges, online meetings, and educational settings. Instantly recognizes extensive medical terminology, enabling doctors to dictate patient histories for swift electronic medical record generation. 
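The input limits stated in the overview (audio duration ≤ 30 seconds, file size ≤ 25 MB) can be checked on the client before a file is uploaded. The following is a minimal sketch under stated assumptions: the helper name `check_asr_limits` is hypothetical, the caller is assumed to already know the file size and duration, and 25 MB is read as 25 × 1024 × 1024 bytes because the source does not say whether decimal or binary megabytes are meant.

```python
MAX_DURATION_SECONDS = 30          # limit stated in the overview
MAX_FILE_BYTES = 25 * 1024 * 1024  # assumption: 25 MB read as binary megabytes


def check_asr_limits(size_bytes: int, duration_seconds: float) -> list[str]:
    """Return a list of problems; an empty list means the input fits the limits.

    Hypothetical client-side helper; the service performs its own validation.
    """
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_FILE_BYTES}")
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append(
            f"audio is {duration_seconds:.1f}s, limit is {MAX_DURATION_SECONDS}s"
        )
    return problems


if __name__ == "__main__":
    # A 1 MB, 12.5-second clip fits; a 30 MB, 45-second clip breaks both limits.
    print(check_asr_limits(1_000_000, 12.5))
    print(check_asr_limits(30 * 1024 * 1024, 45))
```

Longer recordings would need to be split into segments that each satisfy both limits before being sent.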
## Resources

* [API Documentation](/api-reference/audio/audio-transcriptions): Learn how to call the API.

## Introducing GLM-ASR-2512

* Precise Recognition: In the latest competitive evaluation, GLM-ASR-2512 achieved a Character Error Rate (CER) of just 0.0717, reaching internationally leading standards and matching the world's top speech recognition models.
* Efficient Custom Dictionary: Users can import specialized vocabulary, project codes (e.g., AutoGLM, Zhipu AI Input Method), and uncommon names or locations through simple configuration. Terms added once in settings no longer need repeated manual correction.
* Complex Scenario Advantages: Whether handling mixed Chinese-English expressions, command-based text, industry-specific terminology, long sentences, or colloquial speech, GLM-ASR-2512 consistently delivers high-quality transcriptions, with overall performance significantly outperforming competitors.

GLM-ASR-2512 excels in multilingual and dialect processing, accurately transcribing major global languages and regional speech:

* Chinese: Supports Mandarin, along with major dialects including Sichuanese, Cantonese, Min Nan, and Wu
* English: Supports multiple accents such as American and British
* Other supported languages: Dozens of widely used languages, including French, German, Japanese, Korean, Spanish, Arabic, and more

## Quick Start

The following complete sample code helps you get started with `GLM-ASR-2512`.
**Basic Call**

```bash
curl --request POST \
  --url https://api.z.ai/api/paas/v4/audio/transcriptions \
  --header 'Authorization: Bearer API_Key' \
  --header 'Content-Type: multipart/form-data' \
  --form model=glm-asr-2512 \
  --form stream=false \
  --form file=@example-file
```

**Streaming Call**

```bash
curl --request POST \
  --url https://api.z.ai/api/paas/v4/audio/transcriptions \
  --header 'Authorization: Bearer API_Key' \
  --header 'Content-Type: multipart/form-data' \
  --form model=glm-asr-2512 \
  --form stream=true \
  --form file=@example-file
```

# Context Caching
Source: https://docs.z.ai/guides/capabilities/cache

Context caching functionality significantly reduces token consumption and response latency by caching repeated context content. When you repeatedly use the same system prompts or conversation history in dialogues, the caching mechanism automatically identifies and reuses this content, thereby improving performance and reducing costs.

## Features

* **Automatic Cache Recognition**: Implicit caching that intelligently identifies repeated context content without manual configuration
* **Significant Cost Reduction**: Cached tokens are billed at lower prices, dramatically saving costs
* **Improved Response Speed**: Reduces processing time for repeated content, accelerating model responses
* **Transparent Billing**: Detailed display of cached token counts in response field `usage.prompt_tokens_details.cached_tokens`
* **Wide Compatibility**: Supports all mainstream models, including GLM-5, GLM-4.7, GLM-4.6, GLM-4.5 series, etc.

> Context caching works by computing input message content and identifying content that is identical or highly similar to previous requests. When repeated content is detected, the system reuses previous computation results, avoiding redundant token processing.
This mechanism is particularly suitable for the following scenarios: * System prompt reuse: In multi-turn conversations, system prompts usually remain unchanged, and caching can significantly reduce token consumption for this part. * Repetitive tasks: For tasks that process similar content with consistent instructions multiple times, caching can improve efficiency. * Multi-turn conversation history: In complex conversations, historical messages often contain a lot of repeated information, and caching can effectively reduce token usage for this part. ## Code Examples **Basic Caching Example** ```bash theme={null} # First request - establish cache curl --location 'https://api.z.ai/api/paas/v4/chat/completions' \ --header 'Authorization: Bearer YOUR_API_KEY' \ --header 'Content-Type: application/json' \ --data '{ "model": "glm-5", "messages": [ { "role": "system", "content": "You are a professional data analyst, skilled at explaining data trends and providing business insights." }, { "role": "user", "content": "How to analyze user retention rate?" } ] }' ``` **Cache Reuse Example** ```bash theme={null} # Second request - reuse system prompt cache curl --location 'https://api.z.ai/api/paas/v4/chat/completions' \ --header 'Authorization: Bearer YOUR_API_KEY' \ --header 'Content-Type: application/json' \ --data '{ "model": "glm-5", "messages": [ { "role": "system", "content": "You are a professional data analyst, skilled at explaining data trends and providing business insights." }, { "role": "user", "content": "What is funnel analysis?" 
} ] }' ``` **Install SDK** ```bash theme={null} # Install latest version pip install zai-sdk # Or specify version pip install zai-sdk==0.2.2 ``` **Verify Installation** ```python theme={null} import zai print(zai.__version__) ``` **Basic Conversation Example** ```python theme={null} from zai import ZaiClient # Initialize client client = ZaiClient(api_key='Your API key') # First request - establish cache response1 = client.chat.completions.create( model="glm-5", messages=[ { "role": "system", "content": "You are a professional technical documentation assistant, skilled at explaining complex technical concepts. Please answer user questions with clear and concise language, and provide practical code examples." }, { "role": "user", "content": "What is RESTful API?" } ] ) print("First request result:") print(f"Reply: {response1.choices[0].message.content}") print(f"Total tokens: {response1.usage.total_tokens}") print(f"Cached tokens: {response1.usage.prompt_tokens_details.cached_tokens if hasattr(response1.usage, 'prompt_tokens_details') else 0}") # Second request - reuse system prompt cache response2 = client.chat.completions.create( model="glm-5", messages=[ { "role": "system", "content": "You are a professional technical documentation assistant, skilled at explaining complex technical concepts. Please answer user questions with clear and concise language, and provide practical code examples." # Same system prompt }, { "role": "user", "content": "What are the differences between GraphQL and RESTful API?" 
} ] ) print("\nSecond request result:") print(f"Reply: {response2.choices[0].message.content}") print(f"Total tokens: {response2.usage.total_tokens}") print(f"Cached tokens: {response2.usage.prompt_tokens_details.cached_tokens if hasattr(response2.usage, 'prompt_tokens_details') else 0}") ``` **Long Document Analysis Example** ```python theme={null} from zai import ZaiClient # Initialize client client = ZaiClient(api_key='Your API key') # Long document content (simulated) long_document = """ This is a detailed technical specification document that includes system architecture, API design, database structure, and many other aspects. The document is very long and contains a lot of technical details and implementation instructions... [Large amount of document content omitted here] """ # First analysis - establish document cache response1 = client.chat.completions.create( model="glm-5", messages=[ { "role": "system", "content": f"Please answer user questions based on the following technical document:\n\n{long_document}" }, { "role": "user", "content": "What is the main architecture of this system?" } ] ) print("First analysis:") print(f"Total tokens: {response1.usage.total_tokens}") print(f"Cached tokens: {response1.usage.prompt_tokens_details.cached_tokens if hasattr(response1.usage, 'prompt_tokens_details') else 0}") # Second analysis - reuse document cache response2 = client.chat.completions.create( model="glm-5", messages=[ { "role": "system", "content": f"Please answer user questions based on the following technical document:\n\n{long_document}" # Same document content }, { "role": "user", "content": "What are the characteristics of the API design?" 
} ] ) print("\nSecond analysis:") print(f"Total tokens: {response2.usage.total_tokens}") print(f"Cached tokens: {response2.usage.prompt_tokens_details.cached_tokens if hasattr(response2.usage, 'prompt_tokens_details') else 0}") print(f"Cache savings: {response2.usage.prompt_tokens_details.cached_tokens / response2.usage.total_tokens * 100:.1f}%") ``` **Multi-turn Conversation Caching Example** ```python theme={null} from zai import ZaiClient # Initialize client client = ZaiClient(api_key='Your API key') # Build conversation history conversation_history = [ {"role": "system", "content": "You are a Python programming assistant, helping users solve programming problems."}, {"role": "user", "content": "How to create a simple Flask application?"}, {"role": "assistant", "content": "Creating a Flask application is simple, first install Flask..."}, {"role": "user", "content": "How to add routes?"}, {"role": "assistant", "content": "In Flask, add routes using the @app.route decorator..."}, ] # Continue conversation - reuse conversation history cache response = client.chat.completions.create( model="glm-5", messages=conversation_history + [ {"role": "user", "content": "How to handle POST requests?"} ] ) print("Conversation reply:") print(f"Content: {response.choices[0].message.content}") print(f"Total tokens: {response.usage.total_tokens}") print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens if hasattr(response.usage, 'prompt_tokens_details') else 0}") # Calculate cache efficiency if hasattr(response.usage, 'prompt_tokens_details') and response.usage.prompt_tokens_details.cached_tokens: cache_ratio = response.usage.prompt_tokens_details.cached_tokens / response.usage.prompt_tokens * 100 print(f"Cache hit rate: {cache_ratio:.1f}%") ``` **Batch Processing Optimization Example** ````python theme={null} from zai import ZaiClient import time # Initialize client client = ZaiClient(api_key='Your API key') # Common system prompt system_prompt = """ You are a 
professional code review assistant. Please analyze the provided code from the following aspects: 1. Code quality and readability 2. Performance optimization suggestions 3. Security considerations 4. Best practice recommendations Please provide specific improvement suggestions. """ # List of code snippets to review code_snippets = [ "def calculate_sum(numbers): return sum(numbers)", "class User: def __init__(self, name): self.name = name", "for i in range(len(items)): print(items[i])", "if user_input == 'yes' or user_input == 'y': return True" ] results = [] total_cached_tokens = 0 for i, code in enumerate(code_snippets): start_time = time.time() response = client.chat.completions.create( model="glm-5", messages=[ {"role": "system", "content": system_prompt}, {"role": "user", "content": f"Please review the following code:\n```python\n{code}\n```"} ] ) end_time = time.time() # Count cache effects cached_tokens = 0 if hasattr(response.usage, 'prompt_tokens_details') and response.usage.prompt_tokens_details.cached_tokens: cached_tokens = response.usage.prompt_tokens_details.cached_tokens total_cached_tokens += cached_tokens results.append({ 'code': code, 'review': response.choices[0].message.content, 'total_tokens': response.usage.total_tokens, 'cached_tokens': cached_tokens, 'response_time': end_time - start_time }) print(f"Code snippet {i+1} review completed:") print(f" Response time: {end_time - start_time:.2f}s") print(f" Cached tokens: {cached_tokens}") print(f" Total tokens: {response.usage.total_tokens}") print() print(f"Batch processing completed, total cached tokens: {total_cached_tokens}") ```` Response contains context cache token usage information: ```json theme={null} { "usage": { "prompt_tokens": 1200, "completion_tokens": 300, "total_tokens": 1500, "prompt_tokens_details": { "cached_tokens": 800 } } } ``` ## Best Practices Use stable system prompts ```python theme={null} # Recommended: Use stable system prompts system_prompt = """ You are a professional 
technical consultant with the following characteristics: - Deep technical background and rich project experience - Able to provide accurate and practical technical advice - Good at explaining complex concepts in clear and concise language Please provide professional technical guidance based on user questions. """ ``` Use long documents as system messages ```python theme={null} # Recommended: Use long documents as system messages def create_document_based_chat(document_content, user_question): return client.chat.completions.create( model="glm-5", messages=[ { "role": "system", "content": f"Please answer user questions based on the following document content:\n\n{document_content}" }, { "role": "user", "content": user_question } ] ) # Multiple calls with the same document, system prompts will be cached questions = ["What is the main content of the document?", "What are the key points?", "How to implement these suggestions?"] for question in questions: response = create_document_based_chat(document_content, question) # Second and subsequent calls will hit the cache ``` Manage conversation history to improve cache efficiency ```python theme={null} class ConversationManager: def __init__(self, client, system_prompt): self.client = client self.system_prompt = system_prompt self.history = [{"role": "system", "content": system_prompt}] def add_message(self, role, content): self.history.append({"role": role, "content": content}) def get_response(self, user_message): # Add user message self.add_message("user", user_message) # Get reply (conversation history will be cached) response = self.client.chat.completions.create( model="glm-5", messages=self.history ) # Add assistant reply to history assistant_message = response.choices[0].message.content self.add_message("assistant", assistant_message) return response def get_cache_stats(self, response): """Get cache statistics""" if hasattr(response.usage, 'prompt_tokens_details'): cached = 
response.usage.prompt_tokens_details.cached_tokens or 0 total = response.usage.prompt_tokens return f"Cache hit: {cached}/{total} ({cached/total*100:.1f}%)" return "No cache information" # Usage example manager = ConversationManager(client, "You are a programming assistant...") response1 = manager.get_response("How to learn Python?") response2 = manager.get_response("Recommend some learning resources") # Will reuse previous conversation cache ``` ## Use Cases * Intelligent customer service systems * Personal assistant services * Code review batch processing * Content batch analysis * Report generation templates * Standardized process handling * Homework grading assistance * Learning material analysis ## Important Notes * Caching is automatically triggered based on content similarity, no manual configuration required * Identical content has the highest cache hit rate * Minor formatting differences may affect cache effectiveness * Cache has reasonable time limits, will recalculate after expiration * Cached tokens are billed at lower prices * Long documents and repeated content have the most significant cache effects * Design system prompts reasonably to improve reuse rates * Monitor cache hit rates and optimize usage patterns * Caching can significantly improve response speed * First request to establish cache may be slightly slower * Manage conversation history length reasonably * Avoid overly frequent content changes * Use stable system prompt templates * Process long documents as system messages * Organize conversation history structure reasonably * Regularly analyze cache effectiveness and optimize ## Billing Information Context caching uses a differentiated billing strategy: * New content tokens: Billed at standard prices * Cache hit tokens: Billed at discounted prices (usually 50% of standard price) * Output tokens: Billed at standard prices Billing example: ``` Assuming standard price is 0.01 /1K tokens: Request details: - Total input tokens: 2000 - Cache hit 
tokens: 1200
- New content tokens: 800
- Output tokens: 500

Billing calculation:
- New content cost: 800 × 0.01/1000 = 0.008
- Cache cost: 1200 × 0.005/1000 = 0.006
- Output cost: 500 × 0.01/1000 = 0.005
- Total cost: 0.019

Compared to no cache (2500 × 0.01/1000 = 0.025), saves 24%
```

# Function Calling
Source: https://docs.z.ai/guides/capabilities/function-calling

Function Calling allows AI models to call external functions and APIs, greatly expanding what intelligent agents can do and enabling them to perform specific operations and obtain real-time data.

## Features

Function calling gives AI models the ability to interact with external systems, supporting a range of complex application scenarios and integration requirements.

### Core Parameter Description

* **`tools`**: Defines the list of callable functions, including function names, descriptions, and parameter specifications
* **`tool_choice`**: Controls the function calling strategy. The default is `auto`, which is the only supported value
* **`model`**: A model that supports function calling, such as `glm-4-plus`, `glm-4.6`, etc.
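The `tools` parameter is a list of function definitions, each with a name, a description, and a parameter specification. A minimal sketch of a helper that assembles one such definition follows; `make_tool` is a hypothetical name, not part of the Z.AI SDK, and the dictionary layout mirrors the tool definitions used in this guide's examples.

```python
import json


def make_tool(name: str, description: str, properties: dict, required: list) -> dict:
    """Build one entry for the `tools` list (hypothetical helper for illustration)."""
    # Every required parameter must also be described under `properties`.
    missing = [key for key in required if key not in properties]
    if missing:
        raise ValueError(f"required parameters not defined in properties: {missing}")
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": list(required),
            },
        },
    }


weather_tool = make_tool(
    name="get_weather",
    description="Get current weather information for a specified city",
    properties={"city": {"type": "string", "description": "City name, e.g.: Beijing"}},
    required=["city"],
)

# The resulting list can be passed as `tools=...` together with `tool_choice="auto"`.
tools = [weather_tool]
print(json.dumps(tools, indent=2))
```

Building definitions through one function keeps the structure consistent across tools and catches a `required` entry that has no matching property before the request is sent.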
### Response Parameter Description Key fields in function calling responses: * **`tool_calls`**: Contains information about functions the model decides to call * **`function.name`**: Name of the called function * **`function.arguments`**: Function call parameters (JSON format string) * **`id`**: Unique identifier for the tool call ## Code Examples By defining function tools and handling function calls, AI models can perform various external operations: **Install SDK** ```bash theme={null} # Install latest version pip install zai-sdk # Or specify version pip install zai-sdk==0.2.2 ``` **Verify Installation** ```python theme={null} import zai print(zai.__version__) ``` **Complete Example** ```python theme={null} import json from zai import ZaiClient # Initialize client client = ZaiClient(api_key='your_api_key') # Define weather query function def get_weather(city: str) -> dict: """Get weather information for specified city""" # This should call a real weather API weather_data = { "city": city, "temperature": "22°C", "condition": "Sunny", "humidity": "65%", "wind_speed": "5 km/h" } return weather_data # Define function tools tools = [ { "type": "function", "function": { "name": "get_weather", "description": "Get current weather information for specified city", "parameters": { "type": "object", "properties": { "city": { "type": "string", "description": "City name, e.g.: Beijing, Shanghai" } }, "required": ["city"] } } } ] # Make conversation request response = client.chat.completions.create( model="glm-5", # Use model that supports function calling messages=[ {"role": "user", "content": "How's the weather in Beijing today?"} ], tools=tools, # Pass function tools tool_choice="auto" # Automatically choose whether to call functions ) # Handle function calls message = response.choices[0].message messages = [{"role": "user", "content": "How's the weather in Beijing today?"}] messages.append(message.model_dump()) if message.tool_calls: for tool_call in message.tool_calls: if 
tool_call.function.name == "get_weather": # Parse parameters and call function args = json.loads(tool_call.function.arguments) weather_result = get_weather(args.get("city")) # Return function result to model messages.append({ "role": "tool", "content": json.dumps(weather_result, ensure_ascii=False), "tool_call_id": tool_call.id }) # Get final answer final_response = client.chat.completions.create( model="glm-5", messages=messages, tools=tools ) print(final_response.choices[0].message.content) else: print(message.content) ``` ## Scenario Examples When using function calling, please ensure proper security validation and permission control for external APIs and database operations. ```python theme={null} import json import requests from datetime import datetime from zai import ZaiClient class FunctionAgent: def __init__(self, api_key): self.client = ZaiClient(api_key=api_key) self.tools = self._define_tools() def _define_tools(self): return [ { "type": "function", "function": { "name": "get_current_time", "description": "Get current time", "parameters": { "type": "object", "properties": {}, "required": [] } } }, { "type": "function", "function": { "name": "calculate", "description": "Perform mathematical calculations", "parameters": { "type": "object", "properties": { "expression": { "type": "string", "description": "Mathematical expression, e.g.: 2+3*4" } }, "required": ["expression"] } } }, { "type": "function", "function": { "name": "search_web", "description": "Search web information", "parameters": { "type": "object", "properties": { "query": { "type": "string", "description": "Search keywords" } }, "required": ["query"] } } } ] def get_current_time(self): """Get current time""" return { "current_time": datetime.now().strftime("%Y-%m-%d %H:%M:%S"), "timezone": "Asia/Shanghai" } def calculate(self, expression: str): """Safe mathematical calculation""" try: # Simple security check allowed_chars = set('0123456789+-*/().') if not all(c in allowed_chars or c.isspace() 
for c in expression): return {"error": "Expression contains disallowed characters"} result = eval(expression) return { "expression": expression, "result": result } except Exception as e: return {"error": f"Calculation error: {str(e)}"} def search_web(self, query: str): """Simulate web search""" # This should call a real search API return { "query": query, "results": [ {"title": f"Search result 1 about {query}", "url": "https://example1.com"}, {"title": f"Search result 2 about {query}", "url": "https://example2.com"} ] } def execute_function(self, function_name: str, arguments: dict): """Execute function call""" if function_name == "get_current_time": return self.get_current_time() elif function_name == "calculate": return self.calculate(arguments.get("expression", "")) elif function_name == "search_web": return self.search_web(arguments.get("query", "")) else: return {"error": f"Unknown function: {function_name}"} def chat(self, user_message: str): """Handle user message""" messages = [{"role": "user", "content": user_message}] response = self.client.chat.completions.create( model="glm-5", messages=messages, tools=self.tools, tool_choice="auto" ) message = response.choices[0].message messages.append(message.model_dump()) # Handle function calls if message.tool_calls: for tool_call in message.tool_calls: function_name = tool_call.function.name arguments = json.loads(tool_call.function.arguments) # Execute function result = self.execute_function(function_name, arguments) # Add function result messages.append({ "role": "tool", "content": json.dumps(result, ensure_ascii=False), "tool_call_id": tool_call.id }) # Get final answer final_response = self.client.chat.completions.create( model="glm-5", messages=messages, tools=self.tools ) return final_response.choices[0].message.content else: return message.content # Usage example agent = FunctionAgent("your_api_key") # Test different types of requests print(agent.chat("What time is it now?")) print(agent.chat("Help me 
calculate 15 * 23 + 7")) print(agent.chat("Search for the latest developments in artificial intelligence")) ``` ```python theme={null} import sqlite3 def query_database(sql: str) -> dict: """Execute database query""" try: conn = sqlite3.connect('example.db') cursor = conn.cursor() cursor.execute(sql) results = cursor.fetchall() conn.close() return { "success": True, "data": results, "row_count": len(results) } except Exception as e: return { "success": False, "error": str(e) } # Function definition db_tool = { "type": "function", "function": { "name": "query_database", "description": "Execute SQL query", "parameters": { "type": "object", "properties": { "sql": { "type": "string", "description": "SQL query statement" } }, "required": ["sql"] } } } ``` ```python theme={null} import os import json def file_operations(operation: str, file_path: str, content: str = None) -> dict: """File operation function""" try: if operation == "read": with open(file_path, 'r', encoding='utf-8') as f: content = f.read() return {"success": True, "content": content} elif operation == "write": with open(file_path, 'w', encoding='utf-8') as f: f.write(content) return {"success": True, "message": "File written successfully"} elif operation == "list": files = os.listdir(file_path) return {"success": True, "files": files} else: return {"success": False, "error": "Unsupported operation"} except Exception as e: return {"success": False, "error": str(e)} # Function definition file_tool = { "type": "function", "function": { "name": "file_operations", "description": "Execute file operations", "parameters": { "type": "object", "properties": { "operation": { "type": "string", "enum": ["read", "write", "list"], "description": "Operation type" }, "file_path": { "type": "string", "description": "File path" }, "content": { "type": "string", "description": "Content to write (only required for write operation)" } }, "required": ["operation", "file_path"] } } } ``` ```python theme={null} import requests 
def call_external_api(url: str, method: str = "GET", headers: dict = None, data: dict = None) -> dict: """Call external API""" try: if method.upper() == "GET": response = requests.get(url, headers=headers, params=data) elif method.upper() == "POST": response = requests.post(url, headers=headers, json=data) else: return {"success": False, "error": "Unsupported HTTP method"} return { "success": True, "status_code": response.status_code, "data": response.json() if response.headers.get('content-type', '').startswith('application/json') else response.text } except Exception as e: return {"success": False, "error": str(e)} # Function definition api_tool = { "type": "function", "function": { "name": "call_external_api", "description": "Call external API", "parameters": { "type": "object", "properties": { "url": { "type": "string", "description": "API endpoint URL" }, "method": { "type": "string", "enum": ["GET", "POST"], "description": "HTTP method" }, "headers": { "type": "object", "description": "Request headers" }, "data": { "type": "object", "description": "Request data" } }, "required": ["url"] } } } ``` ## Best Practices * Single responsibility: Each function should do one thing * Clear naming: Function and parameter names should be meaningful * Complete description: Provide detailed function and parameter descriptions * Input validation: Strictly validate all input parameters * Permission control: Limit function access permissions * Logging: Record function call logs ### Parameter Design ```python theme={null} # Good parameter design { "type": "object", "properties": { "city": { "type": "string", "description": "City name, supports Chinese and English, e.g.: Beijing, Shanghai, New York", "examples": ["Beijing", "Shanghai", "New York"] }, "unit": { "type": "string", "enum": ["celsius", "fahrenheit"], "description": "Temperature unit", "default": "celsius" } }, "required": ["city"] } ``` ### Error Handling ```python theme={null} def robust_function(param: str) -> 
dict: """Robust function implementation""" try: # Parameter validation if not param or not isinstance(param, str): return { "success": False, "error": "Invalid parameter", "error_code": "INVALID_PARAM" } # Business logic result = process_data(param) return { "success": True, "data": result, "timestamp": datetime.now().isoformat() } except ValueError as e: return { "success": False, "error": f"Data error: {str(e)}", "error_code": "DATA_ERROR" } except Exception as e: return { "success": False, "error": f"System error: {str(e)}", "error_code": "SYSTEM_ERROR" } ``` ### Input Validation ```python theme={null} def secure_function(user_input: str) -> dict: """Secure function implementation""" # Input length limit if len(user_input) > 1000: return {"error": "Input too long"} # Dangerous character filtering dangerous_chars = ['<', '>', '&', '"', "'"] if any(char in user_input for char in dangerous_chars): return {"error": "Input contains dangerous characters"} # SQL injection protection sql_keywords = ['DROP', 'DELETE', 'UPDATE', 'INSERT'] if any(keyword in user_input.upper() for keyword in sql_keywords): return {"error": "Input contains dangerous keywords"} return {"success": True, "processed_input": user_input} ``` ### Permission Control ```python theme={null} def check_permissions(user_id: str, operation: str) -> bool: """Check user permissions""" user_permissions = get_user_permissions(user_id) return operation in user_permissions def protected_function(user_id: str, operation: str, data: dict) -> dict: """Function requiring permission validation""" if not check_permissions(user_id, operation): return { "success": False, "error": "Insufficient permissions", "error_code": "PERMISSION_DENIED" } # Execute operation return perform_operation(operation, data) ``` It is recommended to provide detailed documentation and examples for each function to help the model better understand the function's purpose and usage. Function calling involves code execution. 
Please ensure appropriate security measures are implemented, including input validation, permission control, and error handling.

# Tool Streaming Output
Source: https://docs.z.ai/guides/capabilities/stream-tool

Stream Tool Call is a feature of Z.AI's latest models that gives real-time access to the reasoning process, response content, and tool call information during tool invocation, providing a better user experience and real-time feedback.

## Features

Tool calling in the latest GLM-5, GLM-4.7, and GLM-4.6 models now supports streaming output for responses. This allows developers to stream tool usage parameters without buffering or JSON validation when calling `chat.completions`, reducing call latency and providing a better user experience.

### Core Parameter Description

* **`stream=True`**: Enable streaming output, must be set to `True`
* **`tool_stream=True`**: Enable tool call streaming output
* **`model`**: Use a model that supports tool calling, limited to `glm-5`

### Response Parameter Description

The `delta` object in streaming responses contains the following fields:

* **`reasoning_content`**: Text content of the model's reasoning process
* **`content`**: Text content of the model's response
* **`tool_calls`**: Tool call information, including function names and parameters

## Code Examples

By setting the `tool_stream=True` parameter, you can enable streaming tool call functionality:

**Install SDK**

```bash
# Install latest version
pip install zai-sdk

# Or specify version
pip install zai-sdk==0.2.2
```

**Verify Installation**

```python
import zai
print(zai.__version__)
```

**Complete Example**

```python
from zai import ZaiClient

# Initialize client
client = ZaiClient(api_key='Your API Key')

# Create streaming tool call request
response = client.chat.completions.create(
    model="glm-5",  # Use model that supports tool calling
    messages=[
        {"role": "user", "content": "How's the weather in Beijing?"},
    ],
    tools=[
        {
            "type": "function",
"function": { "name": "get_weather", "description": "Get current weather conditions for a specified location", "parameters": { "type": "object", "properties": { "location": {"type": "string", "description": "City, e.g.: Beijing, Shanghai"}, "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]} }, "required": ["location"] } } } ], stream=True, # Enable streaming output tool_stream=True # Enable tool call streaming output ) # Initialize variables to collect streaming data reasoning_content = "" # Reasoning process content content = "" # Response content final_tool_calls = {} # Tool call information reasoning_started = False # Reasoning process start flag content_started = False # Content output start flag # Process streaming response for chunk in response: if not chunk.choices: continue delta = chunk.choices[0].delta # Handle streaming reasoning process output if hasattr(delta, 'reasoning_content') and delta.reasoning_content: if not reasoning_started and delta.reasoning_content.strip(): print("\n🧠 Thinking Process:") reasoning_started = True reasoning_content += delta.reasoning_content print(delta.reasoning_content, end="", flush=True) # Handle streaming response content output if hasattr(delta, 'content') and delta.content: if not content_started and delta.content.strip(): print("\n\n💬 Response Content:") content_started = True content += delta.content print(delta.content, end="", flush=True) # Handle streaming tool call information if delta.tool_calls: for tool_call in delta.tool_calls: index = tool_call.index if index not in final_tool_calls: # New tool call final_tool_calls[index] = tool_call final_tool_calls[index].function.arguments = tool_call.function.arguments else: # Append tool call parameters (streaming construction) final_tool_calls[index].function.arguments += tool_call.function.arguments # Output final tool call information if final_tool_calls: print("\n📋 Function Calls Triggered:") for index, tool_call in final_tool_calls.items(): print(f" 
{index}: Function Name: {tool_call.function.name}, Parameters: {tool_call.function.arguments}") ``` ## Application Scenarios * Real-time query progress display * Improved waiting experience * Real-time code analysis process * Display tool call chains # Streaming Messages Source: https://docs.z.ai/guides/capabilities/streaming Streaming Messages allow real-time content retrieval while the model generates responses, without waiting for the complete response to be generated. This approach can significantly improve user experience, especially when generating long text content, as users can immediately see output beginning to appear. ## Features Streaming messages use an incremental generation mechanism, transmitting content in chunks in real-time during the generation process, rather than waiting for the complete response to be generated before returning it all at once. This mechanism allows developers to: * **Real-time Response**: No need to wait for complete response, content displays progressively * **Improved Experience**: Reduce user waiting time, provide instant feedback * **Reduced Latency**: Content is transmitted as it's generated, reducing perceived latency * **Flexible Processing**: Real-time processing and display during reception ### Core Parameter Description * **`stream=True`**: Enable streaming output, must be set to `True` * **`model`**: Models that support streaming output, such as `glm-5`, `glm-4.7`, `glm-4.6`, `glm-4.5`, etc. 
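
To make the incremental mechanism concrete, the following is a minimal sketch of how a client might reassemble a reply from raw `data:` lines of a Server-Sent Events stream. The helper name `collect_stream_text` and the sample lines are hypothetical and only illustrate the idea; in practice the SDK performs this parsing for you.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate the `delta.content` fragments found in SSE `data:` lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)


# Hypothetical sample events, trimmed to the fields the helper reads
sample = [
    'data: {"choices":[{"index":0,"delta":{"content":"Spring"}}]}',
    '',
    'data: {"choices":[{"index":0,"delta":{"content":" comes"}}]}',
    '',
    'data: [DONE]',
]
print(collect_stream_text(sample))  # prints "Spring comes"
```

Collecting the fragments in a list and joining once at the end avoids repeated string concatenation on long responses.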
### Response Format Description

Streaming responses use Server-Sent Events (SSE) format, with each event containing:

* `choices[0].delta.content`: Incremental text content
* `choices[0].delta.reasoning_content`: Incremental reasoning content
* `choices[0].finish_reason`: Completion reason (only appears in the last chunk)
* `usage`: Token usage statistics (only appears in the last chunk)

## Code Examples

```bash theme={null}
curl --location 'https://api.z.ai/api/paas/v4/chat/completions' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "model": "glm-5",
    "messages": [
        {
            "role": "user",
            "content": "Write a poem about spring"
        }
    ],
    "stream": true
}'
```

**Install SDK**

```bash theme={null}
# Install latest version
pip install zai-sdk

# Or specify version
pip install zai-sdk==0.2.2
```

**Verify Installation**

```python theme={null}
import zai
print(zai.__version__)
```

**Complete Example**

```python theme={null}
from zai import ZaiClient

# Initialize client
client = ZaiClient(api_key='Your API Key')

# Create streaming message request
response = client.chat.completions.create(
    model="glm-5",
    messages=[
        {"role": "user", "content": "Write a poem about spring"}
    ],
    stream=True  # Enable streaming output
)

# Process streaming response
full_content = ""
for chunk in response:
    if not chunk.choices:
        continue

    delta = chunk.choices[0].delta

    # Handle incremental content
    if hasattr(delta, 'content') and delta.content:
        full_content += delta.content
        print(delta.content, end="", flush=True)

    # Check if completed
    if chunk.choices[0].finish_reason:
        print(f"\n\nCompletion reason: {chunk.choices[0].finish_reason}")
        if hasattr(chunk, 'usage') and chunk.usage:
            print(f"Token usage: Input {chunk.usage.prompt_tokens}, Output {chunk.usage.completion_tokens}")

print(f"\n\nComplete content:\n{full_content}")
```

### Response Example

The streaming response format is as follows:

```
data: {"id":"1","created":1677652288,"model":"glm-5","choices":[{"index":0,"delta":{"content":"Spring"},"finish_reason":null}]}

data: {"id":"1","created":1677652288,"model":"glm-5","choices":[{"index":0,"delta":{"content":" comes"},"finish_reason":null}]}

data: {"id":"1","created":1677652288,"model":"glm-5","choices":[{"index":0,"delta":{"content":" with"},"finish_reason":null}]}

...

data: {"id":"1","created":1677652288,"model":"glm-5","choices":[{"index":0,"finish_reason":"stop","delta":{"role":"assistant","content":""}}],"usage":{"prompt_tokens":8,"completion_tokens":262,"total_tokens":270,"prompt_tokens_details":{"cached_tokens":0}}}

data: [DONE]
```

## Application Scenarios

* Real-time conversation experience
* Character-by-character reply display
* Reduced waiting time
* Article writing assistant
* Code generation tools
* Creative content creation
* Online Q\&A systems
* Learning assistance tools
* Knowledge Q\&A platforms
* Intelligent customer service bots
* Real-time problem solving
* User support systems

# Structured Output

Source: https://docs.z.ai/guides/capabilities/struct-output

Structured output (JSON mode) ensures that AI returns JSON data conforming to predefined formats, providing reliable guarantees for programmatic processing of AI outputs.

## Features

The structured output feature provides AI models with strict data format control capabilities, supporting various complex data structures and validation requirements.

### Core Parameters

* **`response_format`**: Specifies the response format, set to `{"type": "json_object"}` to enable JSON mode
* **`model`**: Use models that support structured output, such as `glm-5`, `glm-4.7`, `glm-4.5`, `glm-4.6`, etc.
* **`messages`**: Define the expected JSON structure and field requirements in system messages ## Code Examples **Install SDK** ```bash theme={null} # Install latest version pip install zai-sdk # Or specify version pip install zai-sdk==0.2.2 ``` **Verify Installation** ```python theme={null} import zai print(zai.__version__) ``` **Complete Example** The following is a complete structured output example demonstrating how to perform sentiment analysis and return structured JSON results: ```python theme={null} from zai import ZaiClient import json # Initialize client client = ZaiClient(api_key="your-api-key") # Basic JSON mode response = client.chat.completions.create( model="glm-5", messages=[ { "role": "system", "content": """ You are a sentiment analysis expert. Please return analysis results in the following JSON format: { "sentiment": "positive/negative/neutral", "confidence": 0.95, "emotions": ["joy", "excitement"], "keywords": ["weather", "mood"], "analysis": "Detailed analysis explanation" } """ }, { "role": "user", "content": "Please analyze the sentiment of this sentence: 'The weather is really nice today, I'm feeling very happy!'" } ], response_format={ "type": "json_object" } ) # Parse results result = json.loads(response.choices[0].message.content) print(f"Sentiment: {result['sentiment']}") print(f"Confidence: {result['confidence']}") print(f"Emotions: {result['emotions']}") ``` ## Basic Usage **Simple JSON Output** ```python theme={null} from zai import ZaiClient client = ZaiClient(api_key="your-api-key") # Basic JSON mode response = client.chat.completions.create( model="glm-5", messages=[ { "role": "user", "content": "Please analyze the sentiment of this sentence: 'The weather is really nice today, I'm feeling very happy!'" } ], response_format={ "type": "json_object" } ) import json result = json.loads(response.choices[0].message.content) print(result) ``` ### Specify JSON Structure ```python theme={null} # Specify specific JSON structure response = 
client.chat.completions.create( model="glm-5", messages=[ { "role": "system", "content": """ You are a sentiment analysis expert. Please return analysis results in the following JSON format: { "sentiment": "positive/negative/neutral", "confidence": 0.95, "emotions": ["joy", "excitement"], "keywords": ["weather", "mood"], "analysis": "Detailed analysis explanation" } """ }, { "role": "user", "content": "Please analyze the sentiment of this sentence: 'The weather is really nice today, I'm feeling very happy!'" } ], response_format={ "type": "json_object" } ) result = json.loads(response.choices[0].message.content) print(f"Sentiment: {result['sentiment']}") print(f"Confidence: {result['confidence']}") print(f"Emotions: {result['emotions']}") ``` ### Using JSON Schema Validation ```python theme={null} import jsonschema from jsonschema import validate # Define JSON Schema schema = { "type": "object", "properties": { "sentiment": { "type": "string", "enum": ["positive", "negative", "neutral"] }, "confidence": { "type": "number", "minimum": 0, "maximum": 1 }, "emotions": { "type": "array", "items": {"type": "string"} }, "keywords": { "type": "array", "items": {"type": "string"} }, "analysis": { "type": "string" } }, "required": ["sentiment", "confidence", "analysis"] } def analyze_sentiment_with_validation(text): """Sentiment analysis with validation""" response = client.chat.completions.create( model="glm-5", messages=[ { "role": "system", "content": f""" Please return sentiment analysis results according to the following JSON Schema format: {json.dumps(schema, indent=2, ensure_ascii=False)} """ }, { "role": "user", "content": f"Please analyze the sentiment of this sentence: '{text}'" } ], response_format={"type": "json_object"} ) try: result = json.loads(response.choices[0].message.content) # Validate JSON structure validate(instance=result, schema=schema) return result except jsonschema.exceptions.ValidationError as e: print(f"JSON validation failed: {e}") return None 
except json.JSONDecodeError as e: print(f"JSON parsing failed: {e}") return None # Usage example result = analyze_sentiment_with_validation("The weather is really nice today, I'm feeling very happy!") if result: print("Analysis result:", result) ``` ## Scenario Examples When using JSON mode for data extraction, please ensure the quality and format of input data to achieve the best extraction results. ```python theme={null} class DataExtractor: def __init__(self, api_key): self.client = ZaiClient(api_key=api_key) def extract_contact_info(self, text): """Extract contact information""" schema = { "type": "object", "properties": { "contacts": { "type": "array", "items": { "type": "object", "properties": { "name": {"type": "string"}, "phone": {"type": "string"}, "email": {"type": "string"}, "company": {"type": "string"}, "position": {"type": "string"}, "address": {"type": "string"} }, "required": ["name"] } }, "total_count": {"type": "integer"}, "extraction_confidence": {"type": "number"} }, "required": ["contacts", "total_count"] } response = self.client.chat.completions.create( model="glm-5", messages=[ { "role": "system", "content": f""" You are an information extraction expert. 
Please extract all contact information from the text, return in the following JSON format: {json.dumps(schema, indent=2, ensure_ascii=False)} Note: - If a field has no information, do not include that field - phone field should be in standardized phone number format - email field should be a valid email address - extraction_confidence represents overall extraction confidence (0-1) """ }, { "role": "user", "content": f"Please extract contact information from the following text:\n\n{text}" } ], response_format={"type": "json_object"} ) try: result = json.loads(response.choices[0].message.content) validate(instance=result, schema=schema) return result except Exception as e: print(f"Extraction failed: {e}") return None def extract_product_info(self, product_description): """Extract product information""" schema = { "type": "object", "properties": { "product_name": {"type": "string"}, "brand": {"type": "string"}, "category": {"type": "string"}, "price": { "type": "object", "properties": { "amount": {"type": "number"}, "currency": {"type": "string"}, "original_price": {"type": "number"}, "discount": {"type": "number"} } }, "specifications": { "type": "object", "additionalProperties": True }, "features": { "type": "array", "items": {"type": "string"} }, "availability": { "type": "object", "properties": { "in_stock": {"type": "boolean"}, "quantity": {"type": "integer"}, "shipping_time": {"type": "string"} } }, "ratings": { "type": "object", "properties": { "average_rating": {"type": "number"}, "total_reviews": {"type": "integer"} } } }, "required": ["product_name"] } response = self.client.chat.completions.create( model="glm-5", messages=[ { "role": "system", "content": f""" Please extract structured information from product description, return in the following format: {json.dumps(schema, indent=2, ensure_ascii=False)} Note: - Price information should accurately extract values and currency units - specifications should include all technical specifications - 
features should list main functional features - Do not guess if information is unclear """ }, { "role": "user", "content": f"Product description:\n{product_description}" } ], response_format={"type": "json_object"} ) try: result = json.loads(response.choices[0].message.content) validate(instance=result, schema=schema) return result except Exception as e: print(f"Product information extraction failed: {e}") return None def extract_event_info(self, event_text): """Extract event information""" schema = { "type": "object", "properties": { "events": { "type": "array", "items": { "type": "object", "properties": { "title": {"type": "string"}, "description": {"type": "string"}, "start_time": {"type": "string"}, "end_time": {"type": "string"}, "location": {"type": "string"}, "organizer": {"type": "string"}, "participants": { "type": "array", "items": {"type": "string"} }, "category": {"type": "string"}, "priority": { "type": "string", "enum": ["high", "medium", "low"] }, "status": { "type": "string", "enum": ["scheduled", "ongoing", "completed", "cancelled"] } }, "required": ["title", "start_time"] } } }, "required": ["events"] } response = self.client.chat.completions.create( model="glm-5", messages=[ { "role": "system", "content": f""" Please extract all event information from the text, return in the following format: {json.dumps(schema, indent=2, ensure_ascii=False)} Time format requirements: - Use ISO 8601 format: YYYY-MM-DDTHH:MM:SS - If only date available, use: YYYY-MM-DD - If time is unclear, try to infer reasonable time """ }, { "role": "user", "content": f"Please extract event information from the following text:\n\n{event_text}" } ], response_format={"type": "json_object"} ) try: result = json.loads(response.choices[0].message.content) validate(instance=result, schema=schema) return result except Exception as e: print(f"Event information extraction failed: {e}") return None # Usage example extractor = DataExtractor("your_api_key") # Extract contact information 
contact_text = """ Zhang San, mobile: 13800138000, email: zhangsan@example.com, works as Technical Director at Beijing Technology Co., Ltd. Company address: No. 123, Technology Park, Chaoyang District, Beijing. Li Si, phone: 010-12345678, work email: lisi@company.com, is a Product Manager at Shanghai Innovation Company. """ contacts = extractor.extract_contact_info(contact_text) if contacts: print(f"Extracted {contacts['total_count']} contacts") for contact in contacts['contacts']: print(f"Name: {contact['name']}") if 'phone' in contact: print(f"Phone: {contact['phone']}") ``` ```python theme={null} class APIResponseFormatter: def __init__(self, api_key): self.client = ZaiClient(api_key=api_key) def format_search_results(self, query, raw_results): """Format search results""" schema = { "type": "object", "properties": { "query": {"type": "string"}, "total_results": {"type": "integer"}, "results": { "type": "array", "items": { "type": "object", "properties": { "title": {"type": "string"}, "url": {"type": "string"}, "snippet": {"type": "string"}, "relevance_score": {"type": "number"}, "source_type": {"type": "string"}, "publish_date": {"type": "string"}, "tags": { "type": "array", "items": {"type": "string"} } }, "required": ["title", "url", "snippet"] } }, "suggestions": { "type": "array", "items": {"type": "string"} }, "filters": { "type": "object", "properties": { "date_range": {"type": "string"}, "source_types": { "type": "array", "items": {"type": "string"} }, "languages": { "type": "array", "items": {"type": "string"} } } } }, "required": ["query", "total_results", "results"] } response = self.client.chat.completions.create( model="glm-5", messages=[ { "role": "system", "content": f""" Please format search results into standard JSON format: {json.dumps(schema, indent=2, ensure_ascii=False)} Requirements: - Calculate relevance score for each result (0-1) - Identify content type (article, video, image, document, etc.) 
- Extract publish date (if available) - Generate relevant tags - Provide search suggestions """ }, { "role": "user", "content": f"Query: {query}\n\nRaw results:\n{raw_results}" } ], response_format={"type": "json_object"} ) try: result = json.loads(response.choices[0].message.content) validate(instance=result, schema=schema) return result except Exception as e: print(f"Formatting failed: {e}") return None def format_analytics_data(self, raw_data, metrics): """Format analytics data""" schema = { "type": "object", "properties": { "summary": { "type": "object", "properties": { "total_records": {"type": "integer"}, "date_range": { "type": "object", "properties": { "start_date": {"type": "string"}, "end_date": {"type": "string"} } }, "key_insights": { "type": "array", "items": {"type": "string"} } } }, "metrics": { "type": "object", "additionalProperties": { "type": "object", "properties": { "current_value": {"type": "number"}, "previous_value": {"type": "number"}, "change_percentage": {"type": "number"}, "trend": { "type": "string", "enum": ["up", "down", "stable"] }, "unit": {"type": "string"} } } }, "time_series": { "type": "array", "items": { "type": "object", "properties": { "timestamp": {"type": "string"}, "values": { "type": "object", "additionalProperties": {"type": "number"} } } } }, "segments": { "type": "array", "items": { "type": "object", "properties": { "name": {"type": "string"}, "value": {"type": "number"}, "percentage": {"type": "number"}, "color": {"type": "string"} } } } }, "required": ["summary", "metrics"] } response = self.client.chat.completions.create( model="glm-5", messages=[ { "role": "system", "content": f""" Please format analytics data into standard format: {json.dumps(schema, indent=2, ensure_ascii=False)} Focus indicators:{', '.join(metrics)} Requirements: - Calculate change percentage and trend - Provide key insights - Time series data sorted by time - Segments data contain percentage """ }, { "role": "user", "content": f"Raw data: 
\n{raw_data}" } ], response_format={"type": "json_object"} ) try: result = json.loads(response.choices[0].message.content) validate(instance=result, schema=schema) return result except Exception as e: print(f"Analytics data formatting failed: {e}") return None # Usage example formatter = APIResponseFormatter("your_api_key") # Format search results raw_search = """ 1. Python Programming Tutorial - https://example.com/python-tutorial Detailed introduction to Python basic syntax and programming concepts... 2. Python Data Analysis Practice - https://example.com/python-data Using pandas and numpy for data processing... """ formatted_results = formatter.format_search_results("Python Tutorial", raw_search) if formatted_results: print(f"Found {formatted_results['total_results']} results") for result in formatted_results['results']: print(f"Title: {result['title']}") print(f"Relevance: {result['relevance_score']}") ``` ```python theme={null} class ConfigurationManager: def __init__(self, api_key): self.client = ZaiClient(api_key=api_key) def parse_config_file(self, config_text, config_type="general"): """Parse configuration file""" schemas = { "database": { "type": "object", "properties": { "connections": { "type": "array", "items": { "type": "object", "properties": { "name": {"type": "string"}, "host": {"type": "string"}, "port": {"type": "integer"}, "database": {"type": "string"}, "username": {"type": "string"}, "ssl": {"type": "boolean"}, "pool_size": {"type": "integer"} }, "required": ["name", "host", "database"] } }, "settings": { "type": "object", "properties": { "timeout": {"type": "integer"}, "retry_attempts": {"type": "integer"}, "log_level": { "type": "string", "enum": ["DEBUG", "INFO", "WARNING", "ERROR"] } } } }, "required": ["connections"] }, "api": { "type": "object", "properties": { "endpoints": { "type": "array", "items": { "type": "object", "properties": { "name": {"type": "string"}, "url": {"type": "string"}, "method": { "type": "string", "enum": ["GET", 
"POST", "PUT", "DELETE"] }, "headers": {"type": "object"}, "timeout": {"type": "integer"}, "rate_limit": {"type": "integer"} }, "required": ["name", "url", "method"] } }, "authentication": { "type": "object", "properties": { "type": { "type": "string", "enum": ["bearer", "basic", "api_key"] }, "credentials": {"type": "object"} } } }, "required": ["endpoints"] } } schema = schemas.get(config_type, { "type": "object", "additionalProperties": True }) response = self.client.chat.completions.create( model="glm-5", messages=[ { "role": "system", "content": f""" Please parse the configuration file and convert to JSON format: {json.dumps(schema, indent=2, ensure_ascii=False)} Configuration type: {config_type} Requirements: - Identify configuration items and values - Convert data types (string, number, boolean) - Handle arrays and nested objects - Validate required fields - Provide default values (if applicable) """ }, { "role": "user", "content": f"Configuration file content:\n{config_text}" } ], response_format={"type": "json_object"} ) try: result = json.loads(response.choices[0].message.content) validate(instance=result, schema=schema) return result except Exception as e: print(f"Configuration parsing failed: {e}") return None def validate_configuration(self, config_data, validation_rules): """Validate configuration""" response = self.client.chat.completions.create( model="glm-5", messages=[ { "role": "system", "content": f""" Please validate configuration data and return validation results: Return format: {{ "is_valid": true/false, "errors": [ {{ "field": "field_name", "error": "error_description", "severity": "error/warning/info" }} ], "warnings": [ {{ "field": "field_name", "message": "warning_message" }} ], "suggestions": [ "improvement_suggestion_1", "improvement_suggestion_2" ] }} Validation rules: {validation_rules} """ }, { "role": "user", "content": f"Configuration data:\n{json.dumps(config_data, indent=2, ensure_ascii=False)}" } ], response_format={"type": 
"json_object"} ) try: result = json.loads(response.choices[0].message.content) return result except Exception as e: print(f"Configuration validation failed: {e}") return None # Usage example config_manager = ConfigurationManager("your_api_key") # Parse database configuration db_config_text = """ [database] host = localhost port = 5432 database = myapp username = admin ssl = true pool_size = 10 [settings] timeout = 30 retry_attempts = 3 log_level = INFO """ config = config_manager.parse_config_file(db_config_text, "database") if config: print("Parsed configuration:", json.dumps(config, indent=2, ensure_ascii=False)) # Validate configuration validation_rules = [ "Port number must be in range 1-65535", "Database name cannot be empty", "Connection pool size should be greater than 0", "Timeout should be reasonable (1-300 seconds)" ] validation_result = config_manager.validate_configuration(config, validation_rules) if validation_result: print(f"Configuration validity: {validation_result['is_valid']}") if validation_result['errors']: print("Errors:", validation_result['errors']) if validation_result['warnings']: print("Warnings:", validation_result['warnings']) ``` ## Best Practices * Clarity: Field names and types should be clear and explicit * Completeness: Include all necessary validation rules * Flexibility: Consider future expansion needs * Multi-layer validation: Schema validation + business logic validation * Fallback plan: Prepare simplified backup Schema * Logging: Record detailed error information JSON mode requires AI to strictly output according to specified format, but in some complex scenarios it may affect the naturalness of responses. It's recommended to find a balance between functionality and user experience. When designing JSON Schema, it's recommended to start with simple structures and gradually increase complexity. 
Also, providing detailed descriptions and examples for key fields helps the model understand the requirements and generate conforming JSON data.

# Deep Thinking

Source: https://docs.z.ai/guides/capabilities/thinking

Deep Thinking is an advanced reasoning feature that enables Chain of Thought mechanisms, allowing the model to perform deep analysis and reasoning before answering questions. This approach significantly improves the model's accuracy and interpretability in complex tasks, and is particularly suitable for scenarios requiring multi-step reasoning, logical analysis, and problem-solving.

## Features

The Deep Thinking feature currently supports the latest models in the GLM-5.1, GLM-5, GLM-5-Turbo, GLM-5V-Turbo, GLM-4.5, GLM-4.6, and GLM-4.7 series. By enabling deep thinking, the model can:

* **Multi-step Reasoning**: Break down complex problems into multiple steps for gradual analysis and resolution
* **Logical Analysis**: Provide clear reasoning processes and logical chains
* **Improved Accuracy**: Reduce errors and improve answer quality through deep thinking
* **Enhanced Interpretability**: Display the thinking process to help users understand the model's reasoning logic
* **Intelligent Judgment**: The model automatically determines whether deep thinking is needed to optimize response efficiency

### Core Parameters

* **`thinking.type`**: Controls the deep thinking mode
  * `enabled` (default): Enable dynamic thinking, model automatically determines if deep thinking is needed
  * `disabled`: Disable deep thinking, provide direct answers
* **`model`**: Models that support deep thinking, such as `glm-5`, `glm-5.1`, `glm-4.7`, `glm-4.6`, `glm-4.5`, `glm-4.5v`, etc.
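
As a minimal sketch of how the `thinking.type` switch might be set per request, the helper below builds a chat completions request body with the mode chosen explicitly. The names `build_chat_payload` and `deep_thinking` are hypothetical; only the `model`, `messages`, and `thinking.type` fields come from the parameters described above.

```python
import json


def build_chat_payload(model: str, prompt: str, deep_thinking: bool = True) -> dict:
    """Build a request body with the `thinking.type` switch set explicitly."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # "enabled" lets the model decide how much to think; "disabled" answers directly
        "thinking": {"type": "enabled" if deep_thinking else "disabled"},
    }


# A lightweight question where direct answers are preferred
payload = build_chat_payload("glm-5", "How is the weather today?", deep_thinking=False)
print(json.dumps(payload, indent=2))
```

Setting the field explicitly on every request, rather than relying on the default, keeps the behavior visible when the same code path serves both simple and complex prompts.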
## Code Examples

**Basic Call (Enable Deep Thinking)**

```bash theme={null}
curl --location 'https://api.z.ai/api/paas/v4/chat/completions' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "model": "glm-5.1",
    "messages": [
        {
            "role": "user",
            "content": "Explain in detail the basic principles of quantum computing and analyze its potential impact in the field of cryptography"
        }
    ],
    "thinking": {
        "type": "enabled"
    },
    "max_tokens": 4096,
    "temperature": 1.0
}'
```

**Streaming Call (Deep Thinking + Streaming Output)**

```bash theme={null}
curl --location 'https://api.z.ai/api/paas/v4/chat/completions' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "model": "glm-5.1",
    "messages": [
        {
            "role": "user",
            "content": "Design a recommendation system architecture for an e-commerce website, considering user behavior, product features, and real-time requirements"
        }
    ],
    "thinking": {
        "type": "enabled"
    },
    "stream": true,
    "max_tokens": 4096,
    "temperature": 1.0
}'
```

**Disable Deep Thinking**

```bash theme={null}
curl --location 'https://api.z.ai/api/paas/v4/chat/completions' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "model": "glm-5.1",
    "messages": [
        {
            "role": "user",
            "content": "How is the weather today?"
        }
    ],
    "thinking": {
        "type": "disabled"
    }
}'
```

**Install SDK**

```bash theme={null}
# Install latest version
pip install zai-sdk

# Or specify version
pip install zai-sdk==0.2.2
```

**Verify Installation**

```python theme={null}
import zai
print(zai.__version__)
```

**Basic Call (Enable Deep Thinking)**

```python theme={null}
from zai import ZaiClient

# Initialize client
client = ZaiClient(api_key='your_api_key')

# Create deep thinking request
response = client.chat.completions.create(
    model="glm-5.1",
    messages=[
        {"role": "user", "content": "Explain in detail the basic principles of quantum computing and analyze its potential impact in the field of cryptography"}
    ],
    thinking={
        "type": "enabled"  # Enable deep thinking mode
    },
    max_tokens=4096,
    temperature=1.0
)

print("Model response:")
print(response.choices[0].message.content)
print("\n---")
print(response.choices[0].message.reasoning_content)
```

**Streaming Call (Deep Thinking + Streaming Output)**

```python theme={null}
from zai import ZaiClient

# Initialize client
client = ZaiClient(api_key='your_api_key')

# Create streaming deep thinking request
response = client.chat.completions.create(
    model="glm-5.1",
    messages=[
        {"role": "user", "content": "Design a recommendation system architecture for an e-commerce website, considering user behavior, product features, and real-time requirements"}
    ],
    thinking={
        "type": "enabled"  # Enable deep thinking mode
    },
    stream=True,  # Enable streaming output
    max_tokens=4096,
    temperature=1.0
)

# Process streaming response
reasoning_content = ""
thinking_started = False  # Set once the thinking header has been printed
answer_started = False  # Set once the answer header has been printed

for chunk in response:
    if not chunk.choices:
        continue

    delta = chunk.choices[0].delta

    # Process thinking process (if any)
    if hasattr(delta, 'reasoning_content') and delta.reasoning_content:
        reasoning_content += delta.reasoning_content
        if not thinking_started:
            print("🧠 Thinking...", end="", flush=True)
            thinking_started = True
        print(delta.reasoning_content, end="", flush=True)

    # Process answer content
    if hasattr(delta, 'content') and delta.content:
        if not answer_started:
            # Print the header once, whether or not reasoning preceded the answer
            print("\n\n💡 Answer:")
            answer_started = True
        print(delta.content, end="", flush=True)
```

**Disable Deep Thinking**

```python theme={null}
from zai import ZaiClient

# Initialize client
client = ZaiClient(api_key='your_api_key')

# Disable deep thinking for quick response
response = client.chat.completions.create(
    model="glm-5.1",
    messages=[
        {"role": "user", "content": "How is the weather today?"}
    ],
    thinking={
        "type": "disabled"  # Disable deep thinking mode
    }
)

print(response.choices[0].message.content)
```

### Response Example

Response format with deep thinking enabled:

```json theme={null}
{
  "created": 1677652288,
  "model": "glm-5.1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Artificial intelligence has tremendous application prospects in medical diagnosis...",
        "reasoning_content": "Let me analyze this question from multiple angles. First, I need to consider the technical advantages of AI in medical diagnosis..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "completion_tokens": 239,
    "prompt_tokens": 8,
    "prompt_tokens_details": {
      "cached_tokens": 0
    },
    "total_tokens": 247
  }
}
```

## Best Practices

**Scenarios where enabling is recommended:**

* Complex problem analysis and solving
* Multi-step reasoning tasks
* Technical solution design
* Strategy planning and decision-making
* Academic research and analysis
* Creative writing and content creation

**Scenarios where it can be disabled:**

* Simple factual queries
* Basic translation tasks
* Simple classification
* Quick question-and-answer needs

## Application Scenarios

* Research method design
* Data analysis and interpretation
* Theoretical derivation and proof
* System architecture design
* Technical solution evaluation
* Problem diagnosis and solution
* Market trend analysis
* Business model design
* Investment decision support
* Complex concept explanation
* Learning path planning
* Knowledge system building

## Notes

1. **Response time**: Enabling deep thinking increases response time, particularly for complex tasks
2. **Token consumption**: The thinking process consumes additional tokens, so budget your token usage accordingly
3. **Model support**: Ensure you are using a model that supports deep thinking
4. **Task matching**: Choose whether to enable deep thinking according to task complexity
5. **Streaming output**: Combine deep thinking with streaming output to show the thinking process as it is generated, improving user experience

# Thinking Mode

Source: https://docs.z.ai/guides/capabilities/thinking-mode

GLM offers multiple thinking modes for different scenarios. The sections below explain how to enable each mode, key considerations, and example usage.

## **Default Thinking Behaviour**

Thinking is activated by default in the GLM-5.1, GLM-5, and GLM-4.7 series, unlike GLM-4.6, which defaults to hybrid thinking.

> If you want to disable thinking, use:

```bash theme={null}
"thinking": {
    "type": "disabled"
}
```

## **Interleaved thinking**

We support **interleaved thinking** by default (supported since GLM-4.5), allowing GLM to think between tool calls and after receiving tool results. This enables more complex, step-by-step reasoning: interpreting each tool output before deciding what to do next, chaining multiple tool calls with reasoning steps, and making finer-grained decisions based on intermediate results.

When using interleaved thinking with tools, **thinking blocks should be explicitly preserved and returned together with the tool results.**

The detailed interleaved thinking process is as follows.

![Description](https://cdn.bigmodel.cn/markdown/1766025484368img_v3_02t3_4677ac48-b748-44d8-a56f-8cbd599b51ag.jpg?attname=img_v3_02t3_4677ac48-b748-44d8-a56f-8cbd599b51ag.jpg)

## **Preserved thinking**

**We introduce a new capability** in coding scenarios: the model can retain **reasoning content from previous assistant turns** in the context.
This helps preserve reasoning continuity and conversation integrity, improves model performance, and increases cache hit rates, saving tokens in real tasks.

This capability is **enabled by default** on the **Coding Plan endpoint** and **disabled by default** on the **standard API endpoint**. If you want to enable **Preserved Thinking** in your product (primarily recommended for coding/agent scenarios), you can turn it on for the API endpoint by setting **"clear\_thinking": false**, and **you must return the complete**, unmodified reasoning\_content back to the API.

All consecutive reasoning\_content blocks must **exactly match the original sequence** generated by the model during the initial request. Do not reorder or edit these blocks; otherwise, performance may degrade and cache hit rates may be affected.

The detailed Preserved thinking process is as follows.

![Description](https://cdn.bigmodel.cn/markdown/176641919972020251222-235942.jpeg?attname=20251222-235942.jpeg)

## Turn-level Thinking

“Turn-level Thinking” is a capability that **lets you control reasoning computation on a per-turn basis**: within the same session, each request can independently choose to enable or disable thinking. This is a new capability introduced in GLM-4.7, with the following advantages:

* **More flexible cost/latency control:** For lightweight turns like “asking a fact” or “tweaking wording,” you can disable thinking to get faster responses; for heavier tasks like “complex planning,” “multi-constraint reasoning,” or “code debugging,” you can enable thinking to improve accuracy and stability.
* **Smoother multi-turn experience:** The thinking switch can be toggled at any point within a session.
The model stays coherent across turns and keeps a consistent output style, making it feel “smarter when things are hard, faster when things are simple.”
* **Better for agent/tool-use scenarios:** On turns that require quick tool execution, you can reduce reasoning overhead; on turns that require making decisions based on tool results, you can turn on deeper thinking, dynamically balancing efficiency and quality.

## Example Usage

This applies to both **Interleaved Thinking** and **Preserved Thinking**; no manual differentiation is required. **Remember to return the historical** `reasoning_content` **to keep the reasoning coherent.**

```python theme={null}
"""Interleaved Thinking + Tool Calling Example"""
import json

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.z.ai/api/paas/v4/",
)

tools = [{"type": "function", "function": {
    "name": "get_weather",
    "description": "Get weather information",
    "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]},
}}]

messages = [
    {"role": "system", "content": "You are an assistant"},
    {"role": "user", "content": "What's the weather like in Beijing?"},
]

# Round 1: the model reasons and then calls a tool
response = client.chat.completions.create(
    model="glm-4.7",
    messages=messages,
    tools=tools,
    stream=True,
    extra_body={
        "thinking": {
            "type": "enabled",
            "clear_thinking": False,  # False for Preserved Thinking
        }
    },
)

# Accumulate the streamed reasoning, answer text, and tool-call fragments
reasoning, content, tool_calls = "", "", []
for chunk in response:
    delta = chunk.choices[0].delta
    if hasattr(delta, "reasoning_content") and delta.reasoning_content:
        reasoning += delta.reasoning_content
    if hasattr(delta, "content") and delta.content:
        content += delta.content
    if hasattr(delta, "tool_calls") and delta.tool_calls:
        for tc in delta.tool_calls:
            # A new index means a new tool call has started
            if tc.index >= len(tool_calls):
                tool_calls.append({"id": tc.id, "function": {"name": "", "arguments": ""}})
            if tc.function.name:
                tool_calls[tc.index]["function"]["name"] = tc.function.name
            if tc.function.arguments:
                tool_calls[tc.index]["function"]["arguments"] += tc.function.arguments

print(f"Reasoning: {reasoning}\nTool calls: {tool_calls}")

# Key: return reasoning_content to keep the reasoning coherent
messages.append({
    "role": "assistant",
    "content": content,
    "reasoning_content": reasoning,
    "tool_calls": [{"id": tc["id"], "type": "function", "function": tc["function"]} for tc in tool_calls],
})
messages.append({
    "role": "tool",
    "tool_call_id": tool_calls[0]["id"],
    "content": json.dumps({"weather": "Sunny", "temp": "25°C"}),
})

# Round 2: the model continues reasoning based on the tool result and responds
response = client.chat.completions.create(
    model="glm-4.7",
    messages=messages,
    tools=tools,
    stream=True,
    extra_body={
        "thinking": {
            "type": "enabled",
            "clear_thinking": False,  # False for Preserved Thinking
        }
    },
)

reasoning, content = "", ""
for chunk in response:
    delta = chunk.choices[0].delta
    if hasattr(delta, "reasoning_content") and delta.reasoning_content:
        reasoning += delta.reasoning_content
    if hasattr(delta, "content") and delta.content:
        content += delta.content

print(f"Reasoning: {reasoning}\nReply: {content}")
```

# HTTP API Calls

Source: https://docs.z.ai/guides/develop/http/introduction

Z.AI provides standard HTTP API interfaces that support multiple programming languages and development environments, allowing you to easily integrate Z.AI's powerful capabilities.

### Core Advantages

* Supports all programming languages and platforms that support the HTTP protocol
* Based on RESTful design, follows HTTP standards, easy to understand and use
* Can be integrated into any existing applications and systems
* Supports synchronous and asynchronous calls to meet different scenario requirements

## Get API Key

1. Access [Z.AI Open Platform](https://z.ai/model-api), Register or Login.
2. Create an API Key in the [API Keys](https://z.ai/manage-apikey/apikey-list) management page.
3. Copy your API Key for use.
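Once you have a key, one way to keep it out of source code is to read it from the environment when building request headers. The following is a minimal sketch, not an official helper: the function name `build_headers` is hypothetical, and the variable name `ZAI_API_KEY` is assumed here because other guides in these docs use it. The header layout follows the `Authorization: Bearer` format used throughout these docs.

```python
import os


def build_headers(api_key=None):
    """Return HTTP headers for a Z.AI request (hypothetical helper).

    Falls back to the ZAI_API_KEY environment variable (an assumed name)
    when no key is passed explicitly.
    """
    key = api_key or os.environ.get("ZAI_API_KEY")
    if not key:
        raise ValueError("No API key: pass api_key or set ZAI_API_KEY")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }
```

The returned dictionary can be passed as the `headers` argument of any HTTP client, which keeps the key itself in the environment rather than in the codebase.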
## API Basic Information

### General API Endpoint

```
https://api.z.ai/api/paas/v4/
```

Note: When using the [GLM Coding Plan](/devpack/overview), you need to configure the dedicated Coding endpoint, [https://api.z.ai/api/coding/paas/v4](https://api.z.ai/api/coding/paas/v4), instead of the general endpoint, [https://api.z.ai/api/paas/v4](https://api.z.ai/api/paas/v4).

Note: The Coding API endpoint is only for Coding scenarios and is not applicable to general API scenarios. Please use them accordingly.

### Request Header Requirements

```http theme={null}
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
```

### Supported Authentication Methods

The simplest authentication method, directly using your API Key:

```bash theme={null}
curl --location 'https://api.z.ai/api/paas/v4/chat/completions' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--header 'Accept-Language: en-US,en' \
--header 'Content-Type: application/json' \
--data '{
    "model": "glm-5.1",
    "messages": [
        {
            "role": "user",
            "content": "Hello"
        }
    ]
}'
```

Use JWT Token for authentication, suitable for scenarios requiring higher security:

Install PyJWT

```shell theme={null}
pip install PyJWT
```

```python theme={null}
import time
import jwt

def generate_token(apikey: str, exp_seconds: int):
    try:
        id, secret = apikey.split(".")
    except Exception as e:
        raise Exception("invalid apikey", e)

    payload = {
        "api_key": id,
        "exp": int(round(time.time() * 1000)) + exp_seconds * 1000,
        "timestamp": int(round(time.time() * 1000)),
    }

    return jwt.encode(
        payload,
        secret,
        algorithm="HS256",
        headers={"alg": "HS256", "sign_type": "SIGN"},
    )

# Use the generated token
token = generate_token("your-api-key", 3600)  # 1 hour validity
```

## Basic Call Examples

### Simple Conversation

```bash theme={null}
curl --location 'https://api.z.ai/api/paas/v4/chat/completions' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--header 'Accept-Language: en-US,en' \
--header 'Content-Type: application/json' \
--data '{
    "model":
"glm-5.1", "messages": [ { "role": "user", "content": "Please introduce the development history of artificial intelligence" } ], "temperature": 1.0, "max_tokens": 1024 }' ``` ### Streaming Response ```bash theme={null} curl --location 'https://api.z.ai/api/paas/v4/chat/completions' \ --header 'Authorization: Bearer YOUR_API_KEY' \ --header 'Accept-Language: en-US,en' \ --header 'Content-Type: application/json' \ --data '{ "model": "glm-5.1", "messages": [ { "role": "user", "content": "Write a poem about spring" } ], "stream": true }' ``` ### Multi-turn Conversation ```bash theme={null} curl --location 'https://api.z.ai/api/paas/v4/chat/completions' \ --header 'Authorization: Bearer YOUR_API_KEY' \ --header 'Accept-Language: en-US,en' \ --header 'Content-Type: application/json' \ --data '{ "model": "glm-5.1", "messages": [ { "role": "system", "content": "You are a professional programming assistant" }, { "role": "user", "content": "What is recursion?" }, { "role": "assistant", "content": "Recursion is a programming technique where a function calls itself to solve problems..." }, { "role": "user", "content": "Can you give me an example of Python recursion?" 
} ] }' ``` ## Common Programming Language Examples ```python theme={null} import requests import json def call_zai_api(messages, model="glm-5.1"): url = "https://api.z.ai/api/paas/v4/chat/completions" headers = { "Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json", "Accept-Language": "en-US,en" } data = { "model": model, "messages": messages, "temperature": 1.0 } response = requests.post(url, headers=headers, json=data) if response.status_code == 200: return response.json() else: raise Exception(f"API call failed: {response.status_code}, {response.text}") # Usage example messages = [ {"role": "user", "content": "Hello, please introduce yourself"} ] result = call_zai_api(messages) print(result['choices'][0]['message']['content']) ``` ```javascript theme={null} async function callZAPI(messages, model = 'glm-4.7') { const url = 'https://api.z.ai/api/paas/v4/chat/completions'; const response = await fetch(url, { method: 'POST', headers: { 'Authorization': 'Bearer YOUR_API_KEY', 'Content-Type': 'application/json', 'Accept-Language': 'en-US,en' }, body: JSON.stringify({ model: model, messages: messages, temperature: 1.0 }) }); if (!response.ok) { throw new Error(`API call failed: ${response.status}`); } return await response.json(); } // Usage example const messages = [ { role: 'user', content: 'Hello, please introduce yourself' } ]; callZAPI(messages) .then(result => { console.log(result.choices[0].message.content); }) .catch(error => { console.error('Error:', error); }); ``` ```java theme={null} import com.fasterxml.jackson.databind.ObjectMapper; import okhttp3.MediaType; import okhttp3.OkHttpClient; import okhttp3.Request; import okhttp3.RequestBody; import okhttp3.Response; import java.util.Collections; import java.util.HashMap; import java.util.Map; public class AgentExample { public static void main(String[] args) throws Exception { OkHttpClient client = new OkHttpClient(); ObjectMapper mapper = new ObjectMapper(); Map messages = new 
HashMap<>(8); messages.put("role", "user"); messages.put("content", "Hello, please introduce yourself"); Map requestBody = new HashMap<>(); requestBody.put("model", "glm-5.1"); requestBody.put("messages", Collections.singletonList(messages)); requestBody.put("temperature", 1.0); String jsonBody = mapper.writeValueAsString(requestBody); MediaType JSON = MediaType.get("application/json; charset=utf-8"); RequestBody body = RequestBody.create(JSON, jsonBody); Request request = new Request.Builder() .url("https://api.z.ai/api/paas/v4/chat/completions") .addHeader("Authorization", "Bearer your_api_key") .addHeader("Content-Type", "application/json") .addHeader("Accept-Language", "en-US,en") .post(body) .build(); try (Response response = client.newCall(request).execute()) { System.out.println(response.body().string()); } } } ``` ## Best Practices * Properly secure API Keys, do not hard-code them in your code * Use environment variables or configuration files to store sensitive information * Regularly rotate API Keys * Implement connection pooling and session reuse * Set reasonable timeout values * Use asynchronous requests for high-concurrency scenarios * Implement exponential backoff retry mechanisms * Log detailed error information * Set reasonable timeout and retry limits * Monitor API call frequency and success rates * Track response times and error rates * Set up alerting mechanisms ## Get More View complete API interface documentation and parameter descriptions Get technical support and assistance It is recommended to use HTTPS protocol in production environments and implement appropriate security measures to protect your API keys and data transmission. # Official Java SDK Source: https://docs.z.ai/guides/develop/java/introduction Z.AI Java SDK is the official Java development toolkit provided by Z.AI, offering Java developers convenient and efficient AI model integration solutions. 
### Core Advantages

* Designed for enterprise applications, supports high concurrency and high availability
* Clean API design, comprehensive documentation, quick integration into existing projects
* Complete type definitions, compile-time error checking, reducing runtime errors
* Optimized network request handling, supports connection pooling and asynchronous calls

### Supported Features

* **💬 Chat Conversations**: Support for single-turn and multi-turn conversations, streaming and non-streaming responses
* **🔧 Function Calling**: Enable AI models to call your custom functions
* **👁️ Vision Understanding**: Image analysis, visual understanding
* **🎨 Image Generation**: Generate high-quality images from text descriptions
* **🎬 Video Generation**: Creative content generation from text to video
* **🔊 Speech Processing**: Speech-to-text, text-to-speech
* **📊 Text Embedding**: Text vectorization, supporting semantic search
* **🤖 Intelligent Assistants**: Build professional AI assistant applications

## Technical Specifications

### Environment Requirements

* **Java Version**: Java 1.8 or higher
* **Build Tools**: Maven 3.6+ or Gradle 6.0+
* **Network Requirements**: HTTPS connection support
* **API Key**: Valid Z.AI API key required

### Dependency Management

The SDK adopts a modular design, allowing you to selectively import functional modules as needed:

* **Core Module**: Basic API calling functionality
* **Async Module**: Asynchronous and concurrent processing support
* **Utility Module**: Utility tools and auxiliary functions

## Quick Start

### Environment Requirements

* Java 1.8 or higher
* Maven 3.6+ or Gradle 6.0+
* Supports Java 8, 11, 17, 21 versions, cross-platform compatible with Windows, macOS, Linux

### Add Dependencies

```xml theme={null}
<dependency>
    <groupId>ai.z.openapi</groupId>
    <artifactId>zai-sdk</artifactId>
    <version>0.3.3</version>
</dependency>
```

```gradle theme={null}
implementation 'ai.z.openapi:zai-sdk:0.3.3'
```

### Get API Key

1. Access [Z.AI Open Platform](https://z.ai/model-api), Register or Login.
2.
Create an API Key in the [API Keys](https://z.ai/manage-apikey/apikey-list) management page. 3. Copy your API Key for use. It is recommended to set the API Key as an environment variable: `export ZAI_API_KEY=your-api-key` Z.AI domestic platform uses ZaiClient ``` API URL: https://api.z.ai/api/paas/v4/ ``` #### Create Client ```java theme={null} import ai.z.openapi.ZaiClient; public class QuickStart { public static void main(String[] args) { // Read API Key from environment variable ZaiClient client = ZaiClient.builder().ofZAI() .apiKey(System.getenv("ZAI_API_KEY")) .build(); // Or use directly (if environment variable is set) ZaiClient client2 = ZaiClient.builder().ofZAI().build(); } } ``` ```java theme={null} import ai.z.openapi.ZaiClient; public class QuickStart { public static void main(String[] args) { // Set API Key directly ZaiClient client = ZaiClient.builder().ofZAI() .apiKey("YOUR_API_KEY") .build(); } } ``` #### Basic Conversation ```java theme={null} import ai.z.openapi.ZaiClient; import ai.z.openapi.service.model.*; import ai.z.openapi.core.Constants; import java.util.Arrays; public class BasicChat { public static void main(String[] args) { // Initialize client ZaiClient client = ZaiClient.builder().ofZAI() .apiKey("YOUR_API_KEY") .build(); // Create chat completion request ChatCompletionCreateParams request = ChatCompletionCreateParams.builder() .model("glm-5.1") .messages(Arrays.asList( ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content("Hello, please introduce yourself") .build() )) .build(); // Send request ChatCompletionResponse response = client.chat().createChatCompletion(request); // Get reply if (response.isSuccess()) { Object reply = response.getData().getChoices().get(0).getMessage().getContent(); System.out.println("AI Reply: " + reply); } else { System.err.println("Error: " + response.getMsg()); } } } ``` #### Streaming Conversation ```java theme={null} import ai.z.openapi.ZaiClient; import ai.z.openapi.service.model.*; 
import ai.z.openapi.core.Constants; import java.util.Arrays; public class StreamingChat { public static void main(String[] args) { ZaiClient client = ZaiClient.builder().ofZAI() .apiKey("YOUR_API_KEY") .build(); // Create streaming chat request ChatCompletionCreateParams request = ChatCompletionCreateParams.builder() .model("glm-5.1") .messages(Arrays.asList( ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content("Write a poem about spring") .build() )) .stream(true) .build(); // Handle streaming response ChatCompletionResponse response = client.chat().createChatCompletion(request); if (response.isSuccess() && response.getFlowable() != null) { response.getFlowable().subscribe( data -> { // Handle streaming data chunks if (data.getChoices() != null && !data.getChoices().isEmpty()) { Delta content = data.getChoices().get(0).getDelta(); if (content != null) { System.out.print(content); } } }, error -> System.err.println("\nStreaming error: " + error.getMessage()), () -> System.out.println("\nStreaming completed") ); } } } ``` ### Complete Example ```java theme={null} import ai.z.openapi.ZaiClient; import ai.z.openapi.service.model.*; import ai.z.openapi.core.Constants; import java.util.*; public class ChatBot { private final ZaiClient client; private final List conversation; public ChatBot(String apiKey) { this.client = ZaiClient.builder().ofZAI() .apiKey(apiKey) .build(); this.conversation = new ArrayList<>(); // Add system message this.conversation.add(ChatMessage.builder() .role(ChatMessageRole.SYSTEM.value()) .content("You are a friendly AI assistant") .build()); } public Object chat(String userInput) { try { // Add user message conversation.add(ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content(userInput) .build()); // Create request ChatCompletionCreateParams request = ChatCompletionCreateParams.builder() .model("glm-5.1") .messages(conversation) .temperature(0.6f) .maxTokens(1000) .build(); // Send request ChatCompletionResponse 
response = client.chat().createChatCompletion(request); if (response.isSuccess()) { // Get AI response Object aiResponse = response.getData().getChoices().get(0).getMessage().getContent(); // Add AI response to conversation history conversation.add(ChatMessage.builder() .role(ChatMessageRole.ASSISTANT.value()) .content(aiResponse) .build()); return aiResponse; } else { return "Error occurred: " + response.getMsg(); } } catch (Exception e) { return "Error occurred: " + e.getMessage(); } } public static void main(String[] args) { ChatBot bot = new ChatBot(System.getenv("ZAI_API_KEY")); Scanner scanner = new Scanner(System.in); System.out.println("Welcome to Z.ai chatbot! Type 'quit' to exit."); while (true) { System.out.print("You: "); String input = scanner.nextLine(); if ("quit".equalsIgnoreCase(input)) { break; } Object response = bot.chat(input); System.out.println("AI: " + response); } System.out.println("Goodbye!"); scanner.close(); } } ``` ## Advanced Features ### Function Calling Function calling allows AI models to call functions you define to get real-time information or perform specific operations. #### Defining and Using Functions ```java theme={null} import ai.z.openapi.ZaiClient; import ai.z.openapi.service.model.*; import ai.z.openapi.core.Constants; import java.util.*; public class FunctionCallingExample { // Simulate weather API public static Map getWeather(String location, String date) { Map weather = new HashMap<>(); weather.put("location", location); weather.put("date", date != null ? 
date : "today"); weather.put("weather", "sunny"); weather.put("temperature", "25°C"); weather.put("humidity", "60%"); return weather; } // Simulate stock API public static Map getStockPrice(String symbol) { Map stock = new HashMap<>(); stock.put("symbol", symbol); stock.put("price", 150.25); stock.put("change", "+2.5%"); return stock; } public static void main(String[] args) { ZaiClient client = ZaiClient.builder().ofZAI() .apiKey(System.getenv("ZAI_API_KEY")) .build(); // Define function tools Map properties = new HashMap<>(); ChatFunctionParameterProperty locationProperty = ChatFunctionParameterProperty .builder().type("string").description("City name, for example: Beijing").build(); properties.put("location", locationProperty); ChatFunctionParameterProperty unitProperty = ChatFunctionParameterProperty .builder().type("string").enums(Arrays.asList("celsius", "fahrenheit")).build(); properties.put("unit", unitProperty); ChatTool weatherTool = ChatTool.builder() .type(ChatToolType.FUNCTION.value()) .function(ChatFunction.builder() .name("get_weather") .description("Get weather information for a specified location") .parameters(ChatFunctionParameters.builder() .type("object") .properties(properties) .required(Collections.singletonList("location")) .build()) .build()) .build(); // Create request ChatCompletionCreateParams request = ChatCompletionCreateParams.builder() .model("glm-5.1") .messages(Collections.singletonList( ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content("How's the weather in Beijing today?") .build() )) .tools(Collections.singletonList(weatherTool)) .toolChoice("auto") .build(); // Send request ChatCompletionResponse response = client.chat().createChatCompletion(request); if (response.isSuccess()) { // Handle function calling ChatMessage assistantMessage = response.getData().getChoices().get(0).getMessage(); if (assistantMessage.getToolCalls() != null && !assistantMessage.getToolCalls().isEmpty()) { for (ToolCalls toolCall : 
assistantMessage.getToolCalls()) { String functionName = toolCall.getFunction().getName(); if ("get_weather".equals(functionName)) { Map result = getWeather("Beijing", null); System.out.println("Weather info: " + result); } } } else { System.out.println(assistantMessage.getContent()); } } else { System.err.println("Error: " + response.getMsg()); } } } ``` ## Getting Help View source code, submit issues, contribute View complete API documentation Browse more practical application examples Learn best practices for SDK usage This SDK is developed based on the latest API specifications from Z.AI, ensuring synchronization with platform features. It is recommended to regularly update to the latest version for the best experience. # LangChain Integration Source: https://docs.z.ai/guides/develop/langchain/introduction LangChain is a framework for developing applications powered by language models. Z.AI's integration with LangChain enables you to: * Use LangChain's chain calling functionality * Build intelligent agents and tool calling * Implement complex conversation memory management ## Core Advantages Access to LangChain's rich ecosystem and toolchain Quickly build complex AI applications using pre-built components Flexibly combine different components to meet various needs Enjoy active open source community and rich resources ## Environment Requirements Python 3.8 or higher langchain\_community version 0.0.32 or higher Please ensure langchain\_community version is 0.0.32 or higher for optimal compatibility and feature support. 
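Once the packages are installed, the version requirement above can be checked programmatically. The sketch below is only an illustration: the helper names `version_tuple`, `meets_minimum`, and `check_langchain_community` are hypothetical, and the comparison is deliberately simplified (it looks only at the leading digits of each dotted component, so pre-release suffixes such as `rc1` are ignored). It assumes the package is published under the distribution name `langchain-community`.

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """Convert a dotted version such as '0.0.32' into a tuple of ints.

    Only the leading digits of each component are used, so '0.2.0rc1'
    becomes (0, 2, 0). This is a simplification, not full version parsing.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum="0.0.32"):
    """Return True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_langchain_community(minimum="0.0.32"):
    """Return True if langchain-community is installed and recent enough."""
    try:
        installed = version("langchain-community")
    except PackageNotFoundError:
        return False
    return meets_minimum(installed, minimum)
```

Calling `check_langchain_community()` at application start-up gives an early, explicit failure signal instead of an import error deeper in the code.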
## Install Dependencies ### Basic Installation ```bash theme={null} # Install LangChain and related dependencies pip install langchain langchainhub httpx_sse # Install OpenAI compatible package pip install langchain-openai ``` ### Complete Installation ```bash theme={null} # Install all related packages at once pip install langchain langchain-openai langchainhub httpx_sse # Verify installation python -c "import langchain; print(langchain.__version__)" ``` ## Quick Start ### Get API Key 1. Access [Z.AI Open Platform](https://z.ai/model-api), Register or Login. 2. Create an API Key in the [API Keys](https://z.ai/manage-apikey/apikey-list) management page. 3. Copy your API Key for use. It is recommended to set the API Key as an environment variable: `export ZAI_API_KEY=your-api-key` ### Basic Configuration ```python theme={null} import os from langchain_openai import ChatOpenAI # Create Z.AI LLM instance llm = ChatOpenAI( temperature=0.6, model="glm-5.1", openai_api_key="your-Z.AI-api-key", openai_api_base="https://api.z.ai/api/paas/v4/" ) # Or use environment variables llm = ChatOpenAI( temperature=0.6, model="glm-5.1", openai_api_key=os.getenv("ZAI_API_KEY"), openai_api_base="https://api.z.ai/api/paas/v4/" ) ``` ## Basic Usage Examples ### Simple Conversation ```python theme={null} from langchain_openai import ChatOpenAI from langchain.schema import HumanMessage, SystemMessage # Create LLM instance llm = ChatOpenAI( temperature=0.7, model="glm-5.1", openai_api_key="your-Z.AI-api-key", openai_api_base="https://api.z.ai/api/paas/v4/" ) # Create messages messages = [ SystemMessage(content="You are a helpful AI assistant"), HumanMessage(content="Please introduce the development history of artificial intelligence") ] # Call the model response = llm(messages) print(response.content) ``` ### Using Prompt Templates ```python theme={null} from langchain.prompts import ChatPromptTemplate from langchain_openai import ChatOpenAI # Create LLM llm = ChatOpenAI( model="glm-5.1", 
openai_api_key="your-Z.AI-api-key", openai_api_base="https://api.z.ai/api/paas/v4/" ) # Create prompt template prompt = ChatPromptTemplate.from_messages([ ("system", "You are a professional {domain} expert"), ("human", "Please explain the concept and applications of {topic}") ]) # Create chain chain = prompt | llm # Invoke chain response = chain.invoke({ "domain": "machine learning", "topic": "deep learning" }) print(response.content) ``` ### Conversation Memory Management ```python theme={null} from langchain_openai import ChatOpenAI from langchain.prompts import ( ChatPromptTemplate, MessagesPlaceholder, SystemMessagePromptTemplate, HumanMessagePromptTemplate, ) from langchain.chains import LLMChain from langchain.memory import ConversationBufferMemory # Create LLM llm = ChatOpenAI( temperature=1.0, model="glm-5.1", openai_api_key="your-Z.AI-api-key", openai_api_base="https://api.z.ai/api/paas/v4/" ) # Create prompt template prompt = ChatPromptTemplate( messages=[ SystemMessagePromptTemplate.from_template( "You are a nice chatbot having a conversation with a human." 
), MessagesPlaceholder(variable_name="chat_history"), HumanMessagePromptTemplate.from_template("{question}") ] ) # Create memory memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True) # Create conversation chain conversation = LLMChain( llm=llm, prompt=prompt, verbose=True, memory=memory ) # Have conversations response1 = conversation.invoke({"question": "tell me a joke"}) print("AI:", response1['text']) response2 = conversation.invoke({"question": "tell me another one"}) print("AI:", response2['text']) ``` ## Advanced Features ### Intelligent Agent ```python theme={null} import os from langchain import hub from langchain.agents import AgentExecutor, create_react_agent from langchain_community.tools.tavily_search import TavilySearchResults from langchain_openai import ChatOpenAI # Set search tool API key os.environ["TAVILY_API_KEY"] = "your-tavily-api-key" # Create LLM llm = ChatOpenAI( model="glm-5.1", openai_api_key="your-Z.AI-api-key", openai_api_base="https://api.z.ai/api/paas/v4/" ) # Create tools tools = [TavilySearchResults(max_results=2)] # Get prompt template prompt = hub.pull("hwchase17/react") # Create agent agent = create_react_agent(llm, tools, prompt) agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True) # Execute task result = agent_executor.invoke({"input": "what is LangChain?"}) print(result['output']) ``` ### Streaming Output ```python theme={null} from langchain_openai import ChatOpenAI from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler from langchain.schema import HumanMessage # Create LLM with streaming output llm = ChatOpenAI( model="glm-5.1", openai_api_key="your-Z.AI-api-key", openai_api_base="https://api.z.ai/api/paas/v4/", streaming=True, callbacks=[StreamingStdOutCallbackHandler()] ) # Send message (output will be displayed in real-time streaming) response = llm([HumanMessage(content="Write a poem about spring")]) ``` ## Best Practices * Enable LangChain caching 
mechanism * Use batch processing to reduce API calls * Set reasonable max\_tokens limits * Use async processing for better concurrency * Implement retry mechanisms and exponential backoff * Set reasonable timeout values * Log detailed error information * Provide fallback solutions * Use ConversationBufferWindowMemory to limit history length * Regularly clean unnecessary conversation history * Monitor memory usage * Implement conversation summarization * Use environment variables to store API keys * Implement input validation and filtering * Monitor API usage and costs * Rotate API keys regularly ## Getting Help View complete Z.AI API documentation View LangChain official documentation and tutorials LangChain is a rapidly evolving framework. It is recommended to update to the latest version regularly for optimal functionality and performance. Meanwhile, Z.AI will continue to optimize integration with LangChain to ensure the best compatibility and user experience. # OpenAI Python SDK Source: https://docs.z.ai/guides/develop/openai/python Z.AI provides interfaces compatible with OpenAI API, which means you can use existing OpenAI SDK code and seamlessly switch to Z.AI's model services by simply modifying the API key and base URL. This compatibility allows you to: * Quickly migrate existing OpenAI applications * Use familiar development patterns and tools * Enjoy the powerful capabilities of Z.AI models * Maintain code consistency and maintainability In some scenarios, there are still differences between Z.AI and OpenAI interfaces, but this does not affect overall compatibility. 
### Core Advantages

* If you are already familiar with the OpenAI SDK, you can start using it immediately
* Existing OpenAI applications can be quickly migrated to the Z.AI platform
* Compatible with various tools and frameworks in the OpenAI ecosystem
* Follow OpenAI SDK updates to maintain latest feature support

## Environment Requirements

* Python 3.7.1 or higher
* OpenAI SDK version 1.0.0 or higher

Please ensure you are using OpenAI SDK 1.0.0 or higher; older versions may have compatibility issues.

## Install OpenAI Python SDK

### Install using pip

```bash theme={null}
# Install or upgrade to latest version
pip install --upgrade 'openai>=1.0'

# Verify installation
python -c "import openai; print(openai.__version__)"
```

### Install using poetry

```bash theme={null}
poetry add "openai>=1.0"
```

## Quick Start

### Get API Key

1. Access [Z.AI Open Platform](https://z.ai/model-api), Register or Login.
2. Create an API Key in the [API Keys](https://z.ai/manage-apikey/apikey-list) management page.
3. Copy your API Key for use.
It is recommended to set the API Key as an environment variable: `export ZAI_API_KEY=your-api-key`

### Create Client

```python theme={null}
from openai import OpenAI

# Create Z.AI client
client = OpenAI(
    api_key="your-Z.AI-api-key",
    base_url="https://api.z.ai/api/paas/v4/"
)
```

```python theme={null}
from openai import OpenAI
import os

# Use environment variables
client = OpenAI(
    api_key=os.getenv("ZAI_API_KEY"),
    base_url="https://api.z.ai/api/paas/v4/"
)
```

```python theme={null}
from openai import OpenAI
from dataclasses import dataclass

# Group the connection settings in one place
@dataclass
class ZaiConfig:
    api_key: str
    base_url: str = "https://api.z.ai/api/paas/v4/"
    timeout: int = 30
    max_retries: int = 3

config = ZaiConfig(api_key="your-api-key")
client = OpenAI(
    api_key=config.api_key,
    base_url=config.base_url,
    timeout=config.timeout,
    max_retries=config.max_retries
)
```

## Quick Start Examples

### Basic Chat

```python theme={null}
from openai import OpenAI

client = OpenAI(
    api_key="your-Z.AI-api-key",
    base_url="https://api.z.ai/api/paas/v4/"
)

completion = client.chat.completions.create(
    model="glm-5.1",
    messages=[
        {"role": "system", "content": "You are a smart and creative novelist"},
        {"role": "user", "content": "Please write a short fairy tale story as a fairy tale master"}
    ]
)

print(completion.choices[0].message.content)
```

### Streaming Response

```python theme={null}
from openai import OpenAI

client = OpenAI(
    api_key="your-Z.AI-api-key",
    base_url="https://api.z.ai/api/paas/v4/"
)

stream = client.chat.completions.create(
    model="glm-5.1",
    messages=[
        {"role": "user", "content": "Write a poem about artificial intelligence"}
    ],
    stream=True,
    temperature=1.0
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)

print()  # New line
```

### Multi-turn Conversation

```python theme={null}
from openai import OpenAI

class ChatBot:
    def __init__(self, api_key: str):
        self.client = OpenAI(
            api_key=api_key,
            base_url="https://api.z.ai/api/paas/v4/"
        )
        self.conversation = [
            {"role": "system", "content": "You are a helpful AI assistant"}
        ]

    def chat(self, user_input: str) -> str:
        # Add user message
        self.conversation.append({"role": "user", "content": user_input})

        # Call API
        response = self.client.chat.completions.create(
            model="glm-5.1",
            messages=self.conversation,
            temperature=1.0
        )

        # Get AI response
        ai_response = response.choices[0].message.content

        # Add to conversation history
        self.conversation.append({"role": "assistant", "content": ai_response})

        return ai_response

    def clear_history(self):
        """Clear conversation history, keep system prompt"""
        self.conversation = self.conversation[:1]

# Usage example
bot = ChatBot("your-api-key")
print(bot.chat("Hello, please introduce yourself"))
print(bot.chat("Can you help me write code?"))
print(bot.chat("Write a Python quicksort algorithm"))
```

## Advanced Features

### Thinking Mode

In thinking mode, GLM-4.6, GLM-4.5 and GLM-4.5-Air can solve complex reasoning problems, including mathematics, science, and logic problems. The parameter `thinking.type` can be either `enabled` or `disabled`.
```python theme={null}
from openai import OpenAI

client = OpenAI(api_key='your-api-key', base_url='https://api.z.ai/api/paas/v4/')

response = client.chat.completions.create(
    model='glm-4.7',
    messages=[
        {"role": "system", "content": "you are a helpful assistant"},
        {"role": "user", "content": "what is the revolution of llm?"}
    ],
    stream=True,
    extra_body={
        "thinking": {
            "type": "enabled",
        },
    }
)

for chunk in response:
    delta = chunk.choices[0].delta
    # reasoning_content is not a declared field in the OpenAI SDK types,
    # so read it defensively in case a chunk does not carry it
    reasoning = getattr(delta, "reasoning_content", None)
    if reasoning:
        print(reasoning, end='')
    if delta.content:
        print(delta.content, end='')
```

### Function Calling

```python theme={null}
import json
from openai import OpenAI

client = OpenAI(
    api_key="your-Z.AI-api-key",
    base_url="https://api.z.ai/api/paas/v4/"
)

def get_weather(location: str) -> str:
    """Get weather information for specified location"""
    # This should call a real weather API
    return f"Weather in {location}: Sunny, 25°C"

# Define function description
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather information for specified location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "Location name, e.g.: Beijing, Shanghai"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

# Call conversation with functions
response = client.chat.completions.create(
    model="glm-5.1",
    messages=[
        {"role": "user", "content": "How's the weather in Beijing today?"}
    ],
    tools=tools,
    tool_choice="auto"
)

# Handle function calls
message = response.choices[0].message
if message.tool_calls:
    for tool_call in message.tool_calls:
        if tool_call.function.name == "get_weather":
            args = json.loads(tool_call.function.arguments)
            result = get_weather(args["location"])
            print(f"Function call result: {result}")
```

## Parameter Configuration

### Common Parameters

| Parameter   | Type         | Default  | Description                      |
| ----------- | ------------ | -------- | -------------------------------- |
| model       | string       | Required | Model name to use                |
| messages    | array        | Required | List of conversation messages    |
| temperature | float        | 0.6      | Controls output randomness (0-1) |
| top\_p      | float        | 0.95     | Nucleus sampling parameter (0-1) |
| max\_tokens | integer      | -        | Maximum output tokens            |
| stream      | boolean      | false    | Whether to use streaming output  |
| stop        | string/array | -        | Stop generation tokens           |

Note: The temperature parameter range is (0,1); do\_sample = False (temperature = 0) is not applicable in OpenAI calls.

## Best Practices

* Use connection pooling and session reuse
* Set reasonable timeout values
* Implement async calls for high concurrency
* Cache frequently used responses
* Set reasonable max\_tokens limits
* Use appropriate models (don't overuse powerful models)
* Implement request deduplication
* Monitor API usage
* Use environment variables to store API keys
* Implement input validation and filtering
* Log and monitor API calls
* Rotate API keys regularly
* Implement retry mechanisms and error handling
* Monitor API status and response times
* Prepare fallback solutions

## Migration Guide

### Migrating from OpenAI

If you're already using the OpenAI API, migrating to Z.AI requires only two changes to the client:

```python theme={null}
# Original OpenAI code
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",  # OpenAI API Key
    # base_url uses default value
)

# Migrate to Z.AI, only need to modify two places
client = OpenAI(
    api_key="your-Z.AI-api-key",  # Replace with Z.AI API Key
    base_url="https://api.z.ai/api/paas/v4/"  # Add Z.AI base_url
)

# Other code remains unchanged
response = client.chat.completions.create(
    model="glm-5.1",  # Use Z.AI model
    messages=[{"role": "user", "content": "Hello!"}]
)
```

## Getting Help

* View complete API interface documentation
* Refer to OpenAI official documentation for more usage

Z.AI is committed to maintaining compatibility with the OpenAI API.
If you encounter any issues during migration, please contact our technical support team.

# Official Python SDK

Source: https://docs.z.ai/guides/develop/python/introduction

Z.AI Python SDK is the official Python development toolkit provided by Z.AI, offering Python developers convenient and efficient AI model integration solutions.

### Core Advantages

* Pythonic API design, comprehensive documentation and examples for quick start
* Supports Z.AI's full range of models, including language, vision, image generation, etc.
* Async support, connection pool management, optimized network request handling
* Complete type hints, IDE-friendly, reducing development errors

### Supported Features

* **💬 Chat Conversations**: Support for single-turn and multi-turn conversations, streaming and non-streaming responses
* **🔧 Function Calling**: Enable AI models to call your custom functions
* **👁️ Vision Understanding**: Image analysis, visual understanding
* **🎨 Image Generation**: Generate high-quality images from text descriptions
* **🎬 Video Generation**: Creative content generation from text to video
* **🔊 Speech Processing**: Speech-to-text, text-to-speech
* **📊 Text Embedding**: Text vectorization, supporting semantic search
* **🤖 Intelligent Assistants**: Build professional AI assistant applications
* **🛡️ Content Moderation**: Text and image content safety detection

## Technical Specifications

### Environment Requirements

* **Python Version**: Python 3.8 or higher
* **Package Manager**: pip or poetry
* **Network Requirements**: HTTPS connection support
* **API Key**: Valid Z.AI API key required

### Dependency Management

The SDK adopts a modular design, allowing you to selectively install functional modules as needed:

* **Core Module**: Basic API calling functionality
* **Async Module**: Asynchronous and concurrent processing support
* **Utility Module**: Utility tools and auxiliary functions

## Quick Start

### Environment Requirements

* **Python Version**: Python 3.8 or higher
* **Package Manager**: poetry (recommended), uv (recommended), pip

Supports Python 3.8, 3.9, 3.10, 3.11, and 3.12, and is cross-platform compatible with Windows, macOS, and Linux.

### Install SDK

#### Install using pip

```bash theme={null}
# Install latest version
pip install zai-sdk

# Or specify version
pip install zai-sdk==0.2.2
```

#### Verify Installation

```python theme={null}
import zai
print(zai.__version__)
```

### Get API Key

1. Access [Z.AI Open Platform](https://z.ai/model-api), Register or Login.
2. Create an API Key in the [API Keys](https://z.ai/manage-apikey/apikey-list) management page.
3. Copy your API Key for use.

It is recommended to set the API Key as an environment variable: `export ZAI_API_KEY=your-api-key`

The Z.AI platform uses `ZaiClient`:

```
API URL: https://api.z.ai/api/paas/v4/
```

#### Create Client

```python theme={null}
from zai import ZaiClient
import os

# Read API Key from environment variable
client = ZaiClient(api_key=os.getenv("ZAI_API_KEY"))

# Or use directly (if environment variable is set)
client = ZaiClient()
```

```python theme={null}
from zai import ZaiClient

# Set API Key directly
client = ZaiClient(api_key="abc123.def456")
```

#### Basic Conversation

```python theme={null}
from zai import ZaiClient

# Initialize client
client = ZaiClient(api_key="your-api-key")

# Create chat completion
response = client.chat.completions.create(
    model="glm-5.1",
    messages=[
        {"role": "user", "content": "Hello, please introduce yourself, Z.ai!"}
    ]
)
print(response.choices[0].message.content)
```

#### Streaming Conversation

```python theme={null}
# Create streaming chat request
from zai import ZaiClient

# Initialize client
client = ZaiClient(api_key="your-api-key")

# Create chat completion
response = client.chat.completions.create(
    model='glm-4.7',
    messages=[
        {'role': 'system', 'content': 'You are an AI writer.'},
        {'role': 'user', 'content': 'Tell a story about AI.'},
    ],
    stream=True,
)

# Print each text fragment as it arrives
for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='')
```

#### Multi-turn Conversation

```python theme={null}
from zai import ZaiClient

client = ZaiClient(api_key="your-api-key")
response = client.chat.completions.create(
    model="glm-5.1",  # Please fill in the model name you want to call
    messages=[
        {"role": "user", "content": "As a marketing expert, please create an attractive slogan for my product"},
        {"role": "assistant", "content": "Of course, to create an attractive slogan, please tell me some information about your product"},
        {"role": "user", "content": "Z.AI Open Platform"},
        {"role": "assistant", "content": "Ignite the future, Z.AI draws infinite possibilities, making innovation within reach!"},
        {"role": "user", "content": "Create a more precise and attractive slogan"}
    ],
)
print(response.choices[0].message.content)
```

### Complete Example

```python theme={null}
from zai import ZaiClient
import os

def main():
    # Initialize client
    client = ZaiClient(api_key=os.getenv("ZAI_API_KEY"))

    print("Welcome to Z.ai Chatbot! Type 'quit' to exit.")

    # Conversation history
    conversation = [
        {"role": "system", "content": "You are a friendly AI assistant"}
    ]

    while True:
        # Get user input
        user_input = input("You: ")

        if user_input.lower() == 'quit':
            break

        try:
            # Add user message
            conversation.append({"role": "user", "content": user_input})

            # Create chat request
            response = client.chat.completions.create(
                model="glm-5.1",
                messages=conversation,
                temperature=1.0,
                max_tokens=1000
            )

            # Get AI response
            ai_response = response.choices[0].message.content
            print(f"AI: {ai_response}")

            # Add AI response to conversation history
            conversation.append({"role": "assistant", "content": ai_response})

        except Exception as e:
            print(f"Error occurred: {e}")

    print("Goodbye!")

if __name__ == "__main__":
    main()
```

### Error Handling

```python theme={null}
from zai import ZaiClient
import zai

def robust_chat(message):
    client = ZaiClient(api_key="your-api-key")

    try:
        response = client.chat.completions.create(
            model="glm-5.1",
            messages=[{"role": "user", "content": message}]
        )
        return response.choices[0].message.content

    except zai.core.APIStatusError as err:
        return f"API status error: {err}"
    except zai.core.APITimeoutError as err:
        return f"Request timeout: {err}"
    except Exception as err:
        return f"Other error: {err}"

# Usage example
result = robust_chat("Hello")
print(result)
```

### Advanced Configuration

```python theme={null}
import httpx
from zai import ZaiClient

# Custom HTTP client
httpx_client = httpx.Client(
    limits=httpx.Limits(
        max_keepalive_connections=20,
        max_connections=100
    ),
    timeout=30.0
)

# Create client with custom configuration
client = ZaiClient(
    api_key="your-api-key",
    base_url="https://api.z.ai/api/paas/v4/",
    timeout=httpx.Timeout(timeout=300.0, connect=8.0),
    max_retries=3,
    http_client=httpx_client
)
```

## Advanced Features

### Function Calling

Function calling allows AI models to call functions you define to get real-time information or perform specific operations.
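Once the model requests a function, the usual next step is to run it and hand the result back in a follow-up request. The helper below is a minimal sketch of that step, not SDK code: `run_tool_calls` and `registry` are hypothetical names, and it assumes the tool calls are plain dicts in the OpenAI-compatible shape (`id`, `function.name`, `function.arguments`) and that results are returned as `tool`-role messages keyed by `tool_call_id`.

```python
import json
from typing import Any, Callable, Dict, List


def run_tool_calls(
    tool_calls: List[Dict[str, Any]],
    registry: Dict[str, Callable[..., Any]],
) -> List[Dict[str, str]]:
    """Run each requested function and build `tool`-role messages with the results."""
    messages = []
    for call in tool_calls:
        name = call["function"]["name"]
        # Arguments arrive as a JSON string; treat an empty string as "no arguments"
        args = json.loads(call["function"]["arguments"] or "{}")
        if name in registry:
            result = registry[name](**args)
        else:
            # Report unknown tools back to the model instead of raising
            result = {"error": f"unknown tool: {name}"}
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result, ensure_ascii=False),
        })
    return messages
```

The returned messages would be appended to the conversation after the assistant message that contained the tool calls, and the conversation sent again so the model can phrase a final answer. A registry keeps the dispatch in one place as the number of functions grows.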
#### Defining and Using Functions

```python theme={null}
from zai import ZaiClient
import json

# Define functions
def get_weather(location, date=None):
    """Get weather information"""
    # Simulate weather API call
    return {
        "location": location,
        "date": date or "today",
        "weather": "sunny",
        "temperature": "25°C",
        "humidity": "60%"
    }

def get_stock_price(symbol):
    """Get stock price"""
    # Simulate stock API call
    return {
        "symbol": symbol,
        "price": 150.25,
        "change": "+2.5%"
    }

# Function descriptions
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather information for a specified location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "Location name"
                    },
                    "date": {
                        "type": "string",
                        "description": "Date in YYYY-MM-DD format"
                    }
                },
                "required": ["location"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "description": "Get current stock price",
            "parameters": {
                "type": "object",
                "properties": {
                    "symbol": {
                        "type": "string",
                        "description": "Stock symbol"
                    }
                },
                "required": ["symbol"]
            }
        }
    }
]

# Use function calling
client = ZaiClient(api_key="your-api-key")
response = client.chat.completions.create(
    model='glm-4.7',
    messages=[
        {'role': 'user', 'content': 'How\'s the weather in Beijing today?'}
    ],
    tools=tools,
    tool_choice="auto"
)

# Handle function calling
if response.choices[0].message.tool_calls:
    for tool_call in response.choices[0].message.tool_calls:
        function_name = tool_call.function.name
        function_args = json.loads(tool_call.function.arguments)

        if function_name == "get_weather":
            result = get_weather(**function_args)
            print(f"Weather info: {result}")
        elif function_name == "get_stock_price":
            result = get_stock_price(**function_args)
            print(f"Stock info: {result}")
else:
    print(response.choices[0].message.content)
```

### Web Search Tool

```python theme={null}
from zai import ZaiClient

# Initialize client
client = ZaiClient(api_key="your-api-key")

# Use web search tool
response = client.chat.completions.create(
    model='glm-4.7',
    messages=[
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': 'What is artificial intelligence?'},
    ],
    tools=[
        {
            'type': 'web_search',
            'web_search': {
                'search_query': 'What is artificial intelligence?',
                'search_result': True,
            },
        }
    ],
    temperature=0.5,
    max_tokens=2000,
)

print(response)
```

### Video Generation

```python theme={null}
from zai import ZaiClient
import time

client = ZaiClient(api_key="your-api-key")

# Placeholder value: replace with your own image URL or Base64-encoded image
image_url = "https://example.com/your-image.png"

# Submit generation task
response = client.videos.generations(
    model="cogvideox-3",  # Video generation model to use
    image_url=image_url,  # Provided image URL or Base64 encoding
    prompt="Make the scene come alive",
    quality="speed",  # Output mode: "quality" for quality priority, "speed" for speed priority
    with_audio=True,
    size="1920x1080",  # Video resolution, supports up to 4K (e.g., "3840x2160")
    fps=30,  # Frame rate, can be 30 or 60
)
print(response)

# Get generation result
time.sleep(60)  # Wait for a while to ensure video generation is complete
result = client.videos.retrieve_videos_result(id=response.id)
print(result)
```

### Streaming Processing

```python theme={null}
from zai import ZaiClient

class StreamProcessor:
    def __init__(self, client):
        self.client = client
        self.full_response = ""

    def stream_chat(self, messages, model="glm-5.1", callback=None):
        """Streaming chat processing"""
        stream = self.client.chat.completions.create(
            model=model,
            messages=messages,
            stream=True
        )

        self.full_response = ""
        for chunk in stream:
            if chunk.choices[0].delta.content is not None:
                content = chunk.choices[0].delta.content
                self.full_response += content

                if callback:
                    callback(content, self.full_response)
                else:
                    print(content, end="", flush=True)

        print()  # New line
        return self.full_response

# Usage example
client = ZaiClient(api_key="your-api-key")
processor = StreamProcessor(client)

# Custom callback function
def on_token_received(token, full_text):
    # You can implement real-time processing logic here
    print(token, end="", flush=True)

response = processor.stream_chat(
    messages=[{"role": "user", "content": "Write a Python function to calculate Fibonacci sequence"}],
    callback=on_token_received
)
```

## Getting Help

* View source code, submit issues, contribute
* View complete API documentation
* Browse more practical application examples
* Learn best practices for SDK usage

This SDK is developed based on the latest API specifications from Z.AI, ensuring synchronization with platform features. It is recommended to regularly update to the latest version for the best experience.

# CogView-4

Source: https://docs.z.ai/guides/image/cogview-4

## Overview

CogView-4 is Z.AI’s first open-source text-to-image model. It has comprehensive improvements in semantic understanding, image generation quality, and the ability to generate both English and Chinese text. It supports bilingual input of any length in Chinese and English and can generate images of any resolution within a specified range.

* Price: \$0.01 / image
* Input: Text
* Output: Image

## Usage

Generates visually appealing, detailed, and realistic food images based on dish names, ingredient characteristics, and style requirements, incorporating creative text elements. Suitable for menu design, food delivery platform displays, and offline posters.

Quickly generates high-resolution product display images based on product features and selling points, adding bilingual promotional text as needed. Fits the image requirements for different product pages and campaign visuals on e-commerce platforms.

Produces high-resolution, detailed character illustrations and concept art based on game worldviews and character settings, meeting the needs of multi-resolution production.

Analyzes teaching text content and automatically generates matching illustrations and scene images, adapted to the layout and resolution requirements of various educational materials, enhancing the visualization of knowledge.
Generates promotional images in different sizes based on cultural and tourism themes, skillfully combining text with region-specific visual elements to increase the appeal of cultural and tourism marketing.

## Resources

* [API Documentation](/api-reference/image/generate-image): Learn how to call the API.

## Introducing CogView-4

DPG-Bench (Dense Prompt Graph Benchmark) is a benchmark for evaluating text-to-image generation models, focusing on the model’s performance in complex semantic alignment and instruction following. At the time of release, CogView-4 ranked first overall in the DPG-Bench benchmark test, achieving SOTA performance among open-source text-to-image models.

![Description](https://cdn.bigmodel.cn/markdown/1749449849627DPG-Bench.png?attname=DPG-Bench.png)

Technically, CogView-4 replaced the English-only T5 encoder with the bilingual GLM-4 encoder and trained the model using bilingual image-text data, enabling the model to handle bilingual prompts.

CogView-4 supports Chinese and English prompts and is especially good at understanding and following Chinese prompts, greatly lowering the prompt threshold for users. It is the first open-source text-to-image model capable of generating Chinese characters in the images, making it particularly suitable for creative needs in advertising, short videos, and other fields.

CogView-4 implements a mixed training paradigm of text descriptions (captions) of any length and images of any resolution. The model supports input prompts of any length and can generate images at any resolution within the supported range. This not only provides users with more creative freedom but also improves training efficiency.

## Examples

Close-up, commercial food photography, intense indoor lighting, extreme detail. A Christmas dinner table, a corner of the table where a long-haired orange tabby cat leans its head close to a plate, greedily sniffing the festive feast with an expression of pure delight. The table features roast chicken, plants, salad, champagne, and gold-rimmed porcelain tea sets. Afternoon sunlight bathes the cat's profile in golden light, casting a soft glow over both the food and its fur. A Christmas tree adorns the background. The image emphasizes the texture of the food and the cat's coat, featuring strong lighting and a warm, festive Christmas atmosphere.

Two opaque, non-reflective white milk tea cups are adorned with intricate golden patterns of varying sizes. The designs feature Christmas motifs, including reindeer and pine trees, set against a warm red background and twinkling holiday lights. Displayed within a miniature snow scene, they are illuminated by natural light.

Cyberpunk samurai with a glowing katana and a robotic arm, standing in a neon-lit alley in Tokyo, rain reflecting on the wet pavement, Blade Runner aesthetic, cinematic, highly detailed, volumetric lighting -- ar 2:3.

The dazzling nightscape of Victoria Harbour in Hong Kong employs double exposure techniques to seamlessly blend the bustling city skyline with spectacular fireworks. Multiple fireworks burst across the night sky, forming a massive heart shape perfectly superimposed at the center of the frame. The fireworks display a kaleidoscope of colors—gold, red, blue, and purple intertwine, illuminating the entire night sky. City lights twinkle in the background, with skyscraper silhouettes clearly visible. Neon lights along the streets accentuate the city's vibrant energy. The entire scene exudes a dreamlike and romantic atmosphere, immersing the viewer in the dazzling nightscape of Hong Kong.

## Quick Start

```bash theme={null}
curl --request POST \
  --url https://api.z.ai/api/paas/v4/images/generations \
  --header 'Authorization: Bearer your-api-key' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "cogView-4-250304",
    "prompt": "A cute little kitten sitting on a sunny windowsill, with the background of blue sky and white clouds.",
    "size": "1024x1024"
  }'
```

**Install SDK**

```bash theme={null}
# Install latest version
pip install zai-sdk

# Or specify version
pip install zai-sdk==0.2.2
```

**Verify Installation**

```python theme={null}
import zai
print(zai.__version__)
```

**Call Example**

```python theme={null}
from zai import ZaiClient

client = ZaiClient(api_key="your-api-key")

response = client.images.generations(
    model="cogView-4-250304",
    prompt="A cute little kitten sitting on a sunny windowsill, with the background of blue sky and white clouds.",
)

print(response.data[0].url)
```

**Install SDK**

**Maven**

```xml theme={null}
<dependency>
    <groupId>ai.z.openapi</groupId>
    <artifactId>zai-sdk</artifactId>
    <version>0.3.3</version>
</dependency>
```

**Gradle (Groovy)**

```groovy theme={null}
implementation 'ai.z.openapi:zai-sdk:0.3.3'
```

**Call Example**

```java theme={null}
import ai.z.openapi.ZaiClient;
import ai.z.openapi.core.Constants;
import ai.z.openapi.service.image.CreateImageRequest;
import ai.z.openapi.service.image.ImageResponse;

public class CogView4Example {
    public static void main(String[] args) {
        ZaiClient client = ZaiClient.builder().ofZAI().apiKey("YOUR_API_KEY").build();

        // Create image generation request
        CreateImageRequest request = CreateImageRequest.builder()
            .model(Constants.ModelCogView4250304)
            .prompt("A cute little kitten sitting on a sunny windowsill, with the background of blue sky and white clouds.")
            .size("1024x1024")
            .build();

        ImageResponse response = client.images().createImage(request);
        System.out.println(response.getData());
    }
}
```

Please note that the output of the CogView-4 model is an image URL. You will need to download the image using this URL.
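Since the API returns a URL rather than image bytes, saving the result takes one more step. The following is a minimal sketch using only the Python standard library; `filename_from_url` and `download_image` are hypothetical helper names, and the sketch assumes the returned URL can be fetched with a plain GET request and that its path ends in a usable file name.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def filename_from_url(url: str, default: str = "image.png") -> str:
    """Derive a local file name from the URL path, ignoring any query string."""
    name = os.path.basename(urlparse(url).path)
    return name or default


def download_image(url: str, dest_dir: str = ".", timeout: float = 30.0) -> str:
    """Download the image at `url` into `dest_dir` and return the local path."""
    os.makedirs(dest_dir, exist_ok=True)
    path = os.path.join(dest_dir, filename_from_url(url))
    # Stream the response body to disk instead of loading it all into memory
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return path
```

With the Python call example above, this could be used as `download_image(response.data[0].url, "outputs")`. If the URL expires after some time, downloading right after generation is the safer choice.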
# GLM-Image

Source: https://docs.z.ai/guides/image/glm-image

## Overview

GLM-Image is Z.AI's new flagship image generation model. It adopts an original hybrid architecture of "autoregressive + diffusion decoder" that balances global instruction understanding with local detail rendering, overcoming the challenges of knowledge-intensive scenarios such as posters, PPTs, and science popularization diagrams. It represents an important exploration of the new generation of "cognitive generative" technology paradigm represented by Nano Banana Pro.

* Price: \$0.015 / image
* Input: Text
* Output: Image
* Aspect ratios: Supports 1:1, 3:4, 4:3, 16:9, etc.

**Recommended common resolutions:** 1280×1280, 1568×1056, 1056×1568, 1472×1088, 1088×1472, 1728×960, 960×1728.

**Custom parameters:** Both width and height must be within the range of 512px–2048px, and each must be a multiple of 32.

Please note that the output of the GLM-Image model is an image URL. You need to download the image via the provided URL.

## Usage

Generates festival posters and commercial promotional images with complete composition, clear visual hierarchy, and a strong overall sense of design. It supports precise embedding and stable presentation of text content, and is suitable for commercial scenarios such as brand communication and market promotion.

Adept at creating popular science illustrations and schematic diagrams of principles that include complex logical relationships, process descriptions, and text annotations, clearly and accurately conveying the knowledge structure and core information while keeping the visuals aesthetically appealing.

When generating multi-panel images such as e-commerce display images and story comics, GLM-Image can effectively maintain the consistency of the overall content style and the main subject's image, while significantly improving the accuracy of text generation in multiple locations to ensure content coherence and unified expression.
Suitable for creating social media graphic content with relatively complex cover design and layout structure. It supports flexible typesetting and diverse expression, making the creative process more efficient and the presentation richer.

## Resources

* [API Documentation](/api-reference/image/generate-image): Learn how to call the API.

## Introducing GLM-Image

GLM-Image is an important exploration of ours in the "cognitive generative" technology paradigm, and it is the first open-source industrial-grade discrete autoregressive image generation model.

GLM-Image introduces a hybrid architecture of "autoregressive + diffusion decoder", integrating a 9B autoregressive model with a 7B DiT diffusion decoder. The former leverages the advantages of its language model base, focusing on enhancing semantic understanding of instructions and global composition of images; the latter, in conjunction with the Glyph Encoder text encoder, focuses on restoring high-frequency details of images and text strokes, thereby alleviating the model's "forgetting characters while writing" phenomenon.

![Description](https://cdn.bigmodel.cn/markdown/1768305604344image.png?attname=image.png)

*decoder formulation*

Based on the above architectural innovation, GLM-Image has reached the open-source SOTA level on authoritative text rendering leaderboards.

![Description](https://cdn.bigmodel.cn/markdown/1768308056990image.png?attname=image.png)

The CVTG-2K (Complex Visual Text Generation) leaderboard primarily evaluates the accuracy of models in simultaneously generating multiple text instances within an image. In terms of multi-region text generation accuracy, GLM-Image ranks first among open-source models, with a Word Accuracy score of 0.9116. On the NED (Normalized Edit Distance) metric, GLM-Image also leads with a score of 0.9557, indicating that the text it generates is highly consistent with the target text, with fewer typos and omissions.
The LongText-Bench (Long Text Rendering) leaderboard evaluates the accuracy of models in rendering long texts and multi-line texts, covering 8 text-intensive scenarios such as signboards, posters, PPTs, and dialog boxes, with separate bilingual tests in Chinese and English. GLM-Image ranked first among open-source models with scores of 0.9524 in English and 0.9788 in Chinese.

## Examples

A Hasselblad film–style portrait set in soft indoor lighting. A long-haired woman stands within gentle shadows, while branches outside the window sway in the breeze, casting dappled light across her face and shoulders. Sheer fabric drapes softly in the background, creating a hazy, romantic atmosphere. Rim lighting outlines her relaxed, natural posture, and her slightly tousled hair lifts gently in the air, each strand catching subtle highlights from the sunlight. A close-up composition captures the moment she gazes deeply into the camera. Her skin appears clear and finely textured under high exposure and strong light–shadow contrast. The background is softly blurred, with bloom and diffusion blending into a dreamy glow. Film-like grain and delicate reflections add richness and realism, freezing a poetic instant of afternoon light and breeze.

![Description](https://cdn.bigmodel.cn/markdown/1768310904165image.png?attname=image.png)

Winter OOTD outfit cover in a retro collage style. The main subject is a female outfit (light blue loose sweater + yellow plaid inner shirt + burgundy skirt + pink-and-white patterned scarf + pink-toned handbag), surrounded by 2–3 smaller images of winter looks from the same series (such as a blue down jacket with black wide-leg pants, or a brown coat with navy trousers). The background blends a light gray grid wall with partial outdoor street scenery. Add large light-blue decorative text reading “OOTD,” handwritten-style annotations (such as “autumn/win” and “work/date”), and small embellishments like stars, hand-drawn arrows, a coffee cup icon, and a play button. The overall color palette is soft and warm, with layered elements arranged dynamically to create a lively, winter outfit inspiration vibe.

![Description](https://cdn.bigmodel.cn/markdown/1768309855615image.png?attname=image.png)

A dark, artistic Burberry brand campaign poster. The overall composition uses a low-saturation dark gray background, with a color palette centered on black and white (two horses) and Burberry’s iconic red-and-black plaid pattern (with white and light brown lines). All text and logos are white. The main subjects are two highly realistic horses, one pure white on the left and one pure black on the right, both with their eyes covered by Burberry’s classic red-and-black plaid silk scarves, rendered with naturally draping fabric textures. A white Burberry equestrian logo is placed in the top-right corner, while the bottom features the brand name “BURBERRY” in large white sans-serif type. Lighting is soft and restrained, highlighting the fine details of the horses’ coats and the plaid scarf textures. The overall style conveys a high-end, artistic fashion aesthetic with a mysterious atmosphere that aligns with the brand’s iconic identity.

![Description](https://cdn.bigmodel.cn/markdown/1768309771376image.png?attname=image.png)
## Quick Start

```bash theme={null}
curl --request POST \
  --url https://api.z.ai/api/paas/v4/images/generations \
  --header 'Authorization: Bearer your-api-key' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "glm-image",
    "prompt": "A cute little kitten sitting on a sunny windowsill, with the background of blue sky and white clouds.",
    "size": "1280x1280"
  }'
```

**Install SDK**

```bash theme={null}
# Install latest version
pip install zai-sdk

# Or specify version
pip install zai-sdk==0.2.2
```

**Verify Installation**

```python theme={null}
import zai
print(zai.__version__)
```

**Call Example**

```python theme={null}
from zai import ZaiClient

client = ZaiClient(api_key="your-api-key")

response = client.images.generations(
    model="glm-image",
    prompt="A cute little kitten sitting on a sunny windowsill, with the background of blue sky and white clouds.",
)

print(response.data[0].url)
```

**Install SDK**

**Maven**

```xml theme={null}
<dependency>
    <groupId>ai.z.openapi</groupId>
    <artifactId>zai-sdk</artifactId>
    <version>0.3.3</version>
</dependency>
```

**Gradle (Groovy)**

```groovy theme={null}
implementation 'ai.z.openapi:zai-sdk:0.3.3'
```

**Call Example**

```java theme={null}
import ai.z.openapi.ZaiClient;
import ai.z.openapi.core.Constants;
import ai.z.openapi.service.image.CreateImageRequest;
import ai.z.openapi.service.image.ImageResponse;

public class GlmImageExample {
    public static void main(String[] args) {
        ZaiClient client = ZaiClient.builder().ofZAI().apiKey("YOUR_API_KEY").build();

        // Create image generation request
        CreateImageRequest request = CreateImageRequest.builder()
            .model("glm-image")
            .prompt("A cute little kitten sitting on a sunny windowsill, with the background of blue sky and white clouds.")
            .size("1280x1280")
            .build();

        ImageResponse response = client.images().createImage(request);
        System.out.println(response.getData());
    }
}
```

Please note that the output of the GLM-Image model is an image URL. You will need to download the image using this URL.
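The custom size rule stated in the GLM-Image overview (width and height each within 512px–2048px and a multiple of 32) can be checked on the client before a request is sent. The snippet below is a small sketch of such a check; `validate_size` is a hypothetical helper, not part of the SDK, and it assumes the `WIDTHxHEIGHT` string form used by the `size` values in the examples above.

```python
from typing import Tuple

MIN_SIDE = 512   # lower bound in pixels, from the GLM-Image overview
MAX_SIDE = 2048  # upper bound in pixels, from the GLM-Image overview
STEP = 32        # each side must be a multiple of this value


def validate_size(size: str) -> Tuple[int, int]:
    """Parse a "WIDTHxHEIGHT" string and check it against the stated limits."""
    try:
        width_text, height_text = size.lower().split("x")
        width, height = int(width_text), int(height_text)
    except ValueError as err:
        raise ValueError(f"size must look like '1280x1280', got {size!r}") from err

    for name, value in (("width", width), ("height", height)):
        if not MIN_SIDE <= value <= MAX_SIDE:
            raise ValueError(f"{name} {value} is outside {MIN_SIDE}-{MAX_SIDE}")
        if value % STEP != 0:
            raise ValueError(f"{name} {value} is not a multiple of {STEP}")
    return width, height
```

For example, `validate_size("1568x1056")` returns the parsed pair, while `validate_size("1000x1000")` raises `ValueError` because 1000 is not a multiple of 32. The server remains the authority on accepted sizes; a local check only shortens the feedback loop.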
# GLM-4-32B-0414-128K

Source: https://docs.z.ai/guides/llm/glm-4-32b-0414-128k

## Overview

GLM-4-32B-0414-128K is a highly cost-effective foundation language model. It can efficiently perform complex tasks and has significantly enhanced capabilities in tool use, online search, and code-related intelligent tasks.

* Price: \$0.1 per million tokens
* Input modality: Text
* Output modality: Text
* Context length: 128K
* Maximum output tokens: 16K

## Usage

* Supports real-time online search to retrieve the latest information, accurately parses complex queries on e-commerce product inquiries, financial service terms, and education course Q\&A, and generates precise, professional answers based on enterprise knowledge bases.
* Accurately identifies and extracts key information and business fields from complex texts such as customer service tickets, automating analyses like sales pitch inspection and risk identification, strictly adhering to SOP processes and greatly reducing data processing time.
* Cleanses financial data in real time, automatically extracts key insights, and detects potential trends and correlations. Supports scenarios such as bid document analysis, financial report interpretation, and market trend monitoring.
* Based on intent decomposition and logical reasoning, accurately generates initial code frameworks or key functions in mainstream languages such as Python, Java, and JavaScript. Supports multi-turn contextual iterative development, intelligent comments, and rewriting functions to add clear annotations to code.
* Analyzes job descriptions and resumes in depth using real-time job information, industry salary trends, and talent demand, providing precise talent matching recommendations for enterprises and analyzing employment trends and career development paths for job seekers.

## Resources

* [API Documentation](/api-reference/llm/chat-completion): Learn how to call the API.
## Introducing GLM-4-32B-0414-128K

GLM-4-32B-0414-128K was pre-trained on 15T of high-quality data, including abundant synthetic reasoning data to lay a solid foundation for subsequent reinforcement learning. In the post-training phase, besides aligning with human preferences in dialogue scenarios, we also applied techniques like rejection sampling and reinforcement learning to enhance instruction following, engineering code generation, and function calling, strengthening the model’s fundamental capabilities for intelligent tasks.

The model performs comparably to much larger domestic and international mainstream models, with some benchmark indicators approaching or even exceeding models like GPT-4o and DeepSeek-V3-0324 (671B).

## Capability

The search engine supports Jina AI, at a price of \$0.01 per use.

## Quick Start

**Basic Call**

```bash theme={null}
curl -X POST "https://api.z.ai/api/paas/v4/chat/completions" \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4-32b-0414-128k",
    "messages": [
      {
        "role": "user",
        "content": "As a marketing expert, please create an attractive slogan for my product."
      }
    ]
  }'
```

**Streaming Call**

```bash theme={null}
curl -X POST "https://api.z.ai/api/paas/v4/chat/completions" \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4-32b-0414-128k",
    "messages": [
      {
        "role": "user",
        "content": "As a marketing expert, please create an attractive slogan for my product."
} ], "stream": true }' ``` **Install SDK** ```bash theme={null} # Install latest version pip install zai-sdk # Or specify version pip install zai-sdk==0.2.2 ``` **Verify Installation** ```python theme={null} import zai print(zai.__version__) ``` **Basic Call** ```python theme={null} from zai import ZaiClient # Initialize the client client = ZaiClient(api_key="your-api-key") # Create a chat completion request response = client.chat.completions.create( model="glm-4-32b-0414-128k", messages=[ {"role": "user", "content": "As a marketing expert, please create an attractive slogan for my product."} ] ) # Get the response print(response.choices[0].message.content) ``` **Streaming Call** ```python theme={null} from zai import ZaiClient # Initialize the client client = ZaiClient(api_key="your-api-key") # Create a streaming chat completion request stream = client.chat.completions.create( model="glm-4-32b-0414-128k", messages=[ {"role": "user", "content": "As a marketing expert, please create an attractive slogan for my product."} ], stream=True ) # Process streaming response for chunk in stream: if chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="") ``` **Install SDK** **Maven** ```xml theme={null} ai.z.openapi zai-sdk 0.3.3 ``` **Gradle (Groovy)** ```groovy theme={null} implementation 'ai.z.openapi:zai-sdk:0.3.3' ``` **Basic Call** ```java theme={null} import ai.z.openapi.ZaiClient; import ai.z.openapi.service.model.ChatCompletionCreateParams; import ai.z.openapi.service.model.ChatCompletionResponse; import ai.z.openapi.service.model.ChatMessage; import ai.z.openapi.service.model.ChatMessageRole; import java.util.Arrays; public class BasicChat { public static void main(String[] args) { // Initialize the client ZaiClient client = ZaiClient.builder().ofZAI() .apiKey("your-api-key") .build(); // Create a chat completion request ChatCompletionCreateParams request = ChatCompletionCreateParams.builder() .model("glm-4-32b-0414-128k") 
.messages(Arrays.asList( ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content("As a marketing expert, please create an attractive slogan for my product.") .build() )) .build(); // Send request ChatCompletionResponse response = client.chat().createChatCompletion(request); // Get the response if (response.isSuccess()) { Object reply = response.getData().getChoices().get(0).getMessage().getContent(); System.out.println("AI Reply: " + reply); } else { System.err.println("Error: " + response.getMsg()); } } } ``` **Streaming Call** ```java theme={null} import ai.z.openapi.ZaiClient; import ai.z.openapi.service.model.ChatCompletionCreateParams; import ai.z.openapi.service.model.ChatCompletionResponse; import ai.z.openapi.service.model.ChatMessage; import ai.z.openapi.service.model.ChatMessageRole; import ai.z.openapi.service.model.Delta; import java.util.Arrays; public class StreamingChat { public static void main(String[] args) { // Initialize the client ZaiClient client = ZaiClient.builder().ofZAI() .apiKey("your-api-key") .build(); // Create a streaming chat completion request ChatCompletionCreateParams request = ChatCompletionCreateParams.builder() .model("glm-4-32b-0414-128k") .messages(Arrays.asList( ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content("As a marketing expert, please create an attractive slogan for my product.") .build() )) .stream(true) .build(); ChatCompletionResponse response = client.chat().createChatCompletion(request); if (response.isSuccess()) { response.getFlowable().subscribe( // Process streaming message data data -> { if (data.getChoices() != null && !data.getChoices().isEmpty()) { Delta delta = data.getChoices().get(0).getDelta(); System.out.print(delta + "\n"); }}, // Process streaming response error error -> System.err.println("\nStream error: " + error.getMessage()), // Process streaming response completion event () -> System.out.println("\nStreaming response completed") ); } else { 
System.err.println("Error: " + response.getMsg()); } } }
```

**Install SDK**

```bash theme={null}
# Install or upgrade to latest version
pip install --upgrade 'openai>=1.0'
```

**Verify Installation**

```bash theme={null}
python -c "import openai; print(openai.__version__)"
```

**Usage Example**

```python theme={null}
from openai import OpenAI

# Point the OpenAI-compatible client at the Z.AI endpoint
client = OpenAI(
    api_key="your-Z.AI-api-key",
    base_url="https://api.z.ai/api/paas/v4/"
)

completion = client.chat.completions.create(
    model="glm-4-32b-0414-128k",
    messages=[
        {"role": "system", "content": "You are a smart and creative novelist"},
        {"role": "user", "content": "Please write a short fairy tale story as a fairy tale master"}
    ]
)

print(completion.choices[0].message.content)
```

# GLM-4.5

Source: https://docs.z.ai/guides/llm/glm-4.5

## Overview

GLM-4.5 and GLM-4.5-Air are Z.AI's models, purpose-built as foundational models for agent-oriented applications. Both leverage a Mixture-of-Experts (MoE) architecture. GLM-4.5 has a total parameter count of 355B with 32B active parameters per forward pass, while GLM-4.5-Air adopts a more streamlined design with 106B total parameters and 12B active parameters.

Both models share a similar training pipeline: an initial pretraining phase on 15 trillion tokens of general-domain data, followed by targeted fine-tuning on datasets covering code, reasoning, and agent-specific tasks. The context length has been extended to 128k tokens, and reinforcement learning was applied to further enhance reasoning, coding, and agent performance.

GLM-4.5 and GLM-4.5-Air are optimized for tool invocation, web browsing, software engineering, and front-end development. They can be integrated into code-centric agents such as Claude Code and Roo Code, and also support arbitrary agent applications through tool invocation APIs.

Both models support hybrid reasoning with two execution modes: Thinking Mode for complex reasoning and tool usage, and Non-Thinking Mode for instant responses.
These modes can be toggled via the `thinking.type` parameter (with `enabled` and `disabled` settings), and dynamic thinking is enabled by default.

* Input modality: Text
* Output modality: Text
* Context length: 128K
* Maximum output tokens: 96K

## GLM-4.5 Series
* **GLM-4.5**: Our most powerful reasoning model, with 355 billion parameters
* **GLM-4.5-Air**: Cost-effective, lightweight, strong performance
* **GLM-4.5-X**: High performance, strong reasoning, ultra-fast response
* **GLM-4.5-AirX**: Lightweight, strong performance, ultra-fast response
* **GLM-4.5-Flash**: Free, strong performance, excellent for reasoning, coding, and agents
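The overview states that the two execution modes are switched through the `thinking.type` parameter. As a rough sketch of what that looks like on the wire, the snippet below builds (but does not send) a chat completion request with the standard library. The endpoint, field names, and the `glm-4.5` model ID are taken from this guide's curl examples; the helper name `build_chat_request` is a hypothetical illustration.

```python
import json
import urllib.request

# Endpoint used by the curl examples in this guide
API_URL = "https://api.z.ai/api/paas/v4/chat/completions"


def build_chat_request(api_key, prompt, model="glm-4.5", thinking=True, max_tokens=4096):
    """Build a POST request whose body toggles Thinking Mode.

    Hypothetical helper for illustration; nothing is sent over the network.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # "enabled" selects Thinking Mode, "disabled" selects Non-Thinking Mode
        "thinking": {"type": "enabled" if thinking else "disabled"},
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# A quick factual lookup rarely needs deep reasoning, so thinking is switched off
request = build_chat_request("your-api-key", "When was Z.AI founded?", thinking=False)
print(json.loads(request.data)["thinking"])
```

Passing the returned object to `urllib.request.urlopen` would send it; the official SDK examples later in this guide are the more convenient route for real use.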

## Capability

* Enable deep thinking mode for more advanced reasoning and analysis
* Support real-time streaming responses to enhance user interaction experience
* Powerful tool invocation capabilities, enabling integration with various external toolsets
* Intelligent caching mechanism to optimize performance in long conversations
* Support for structured output formats like JSON, facilitating system integration

## Introducing GLM-4.5

### Overview

The first-principle measure of AGI lies in integrating more general intelligence capabilities without compromising existing functions. GLM-4.5 represents our first complete realization of this concept. It natively fuses advanced reasoning, coding, and agent capabilities within a single model to meet the complex demands of agent-based applications.

To comprehensively evaluate the model’s general intelligence, we selected 12 of the most representative benchmark suites, including MMLU Pro, AIME24, MATH 500, SciCode, GPQA, HLE, LiveCodeBench, SWE-Bench, Terminal-bench, TAU-Bench, BFCL v3, and BrowseComp. Based on the aggregated average scores, GLM-4.5 ranks second globally among all models, first among domestic models, and first among open-source models.

### **Higher Parameter Efficiency**

GLM-4.5 has half the number of parameters of DeepSeek-R1 and one-third that of Kimi-K2, yet it outperforms them on multiple standard benchmark tests. This is attributed to the higher parameter efficiency of the GLM architecture. Notably, GLM-4.5-Air, with 106 billion total parameters and 12 billion active parameters, achieves a significant breakthrough, surpassing models such as Gemini 2.5 Flash, Qwen3-235B, and Claude 4 Opus on reasoning benchmarks like Artificial Analysis and ranking among the top three domestic models in performance.
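The size comparison above can be checked with simple arithmetic. The sketch below uses the GLM-4.5 figures given in this guide; the DeepSeek-R1 and Kimi-K2 totals are not stated here and are assumed from commonly cited public figures (about 671B and about 1T total parameters).

```python
# Parameter counts in billions. GLM figures come from this guide; the two
# comparison totals are assumptions based on commonly cited public numbers.
GLM_45 = {"total": 355, "active": 32}
GLM_45_AIR = {"total": 106, "active": 12}
DEEPSEEK_R1_TOTAL = 671  # assumption, not stated in this guide
KIMI_K2_TOTAL = 1000     # assumption (about 1T), not stated in this guide

# Size of GLM-4.5 relative to each comparison model
size_vs_r1 = GLM_45["total"] / DEEPSEEK_R1_TOTAL
size_vs_k2 = GLM_45["total"] / KIMI_K2_TOTAL

# Share of parameters that are active in one forward pass (MoE sparsity)
active_share = GLM_45["active"] / GLM_45["total"]
active_share_air = GLM_45_AIR["active"] / GLM_45_AIR["total"]

print(f"GLM-4.5 vs DeepSeek-R1: {size_vs_r1:.1%}")
print(f"GLM-4.5 vs Kimi-K2:     {size_vs_k2:.1%}")
print(f"GLM-4.5 active share:   {active_share:.1%}")
print(f"GLM-4.5-Air active share: {active_share_air:.1%}")
```

Under those assumed totals, the ratios come out at roughly one half and one third, consistent with the wording above, and only about 9% to 11% of each model's parameters are active per forward pass.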
On charts such as SWE-Bench Verified, the GLM-4.5 series lies on the Pareto frontier for performance-to-parameter ratio, demonstrating that at the same scale, the GLM-4.5 series delivers optimal performance.

### **Low Cost, High Speed**

Beyond performance optimization, the GLM-4.5 series also achieves breakthroughs in cost and efficiency, resulting in pricing far lower than mainstream models: API call costs are as low as \$0.2 per million input tokens and \$1.1 per million output tokens.

At the same time, the high-speed version demonstrates a generation speed exceeding 100 tokens per second in real-world tests, supporting low-latency and high-concurrency deployment scenarios while balancing cost-effectiveness with user interaction experience.

### **Real-World Evaluation**

Real-world performance matters more than leaderboard rankings. To evaluate GLM-4.5’s effectiveness in practical Agent Coding scenarios, we integrated it into Claude Code and benchmarked it against Claude 4 Sonnet, Kimi-K2, and Qwen3-Coder. The evaluation consisted of 52 programming and development tasks spanning six major domains, executed in isolated container environments with multi-turn interaction tests.

In the results, GLM-4.5 demonstrates a strong competitive advantage over other open-source models, particularly in tool invocation reliability and task completion rate. While there remains room for improvement compared to Claude 4 Sonnet, GLM-4.5 delivers a largely comparable experience in most scenarios.

To ensure transparency, we have released all [52 test problems along with full agent trajectories](https://huggingface.co/datasets/zai-org/CC-Bench-trajectories) for industry validation and reproducibility.

## Usage

**Core Capability:** Coding Skills → Intelligent code generation | Real-time code completion | Automated bug fixing

* Supports major languages including Python, JavaScript, and Java.
* Generates well-structured, scalable, high-quality code based on natural language instructions.
* Focuses on real-world development needs, avoiding templated or generic outputs.

**Use Case:** Complete refactoring-level tasks within 1 hour; generate full product prototypes in 5 minutes.

**Core Capabilities:** Agent Abilities → Autonomous task planning | Multi-tool orchestration | Dynamic environment interaction

* Automatically decomposes complex tasks into clear, executable step-by-step plans.
* Flexibly invokes development tools to complete coding, debugging, and validation in a one-stop workflow.
* Dynamically adjusts strategies based on real-time feedback, quickly adapting to task changes and continuously optimizing execution paths.

**Use Case:** In multi-module collaborative development projects, delivery cycles were shortened by 40%, and manpower investment was reduced by approximately 30%.

**Core Capability:** PPT Creation → Clear logic | Complete content | Effective visual presentation

* **Content Expansion by Theme:** Generates multi-slide PPT content from a single title or central concept.
* **Logical Structure Organization:** Automatically segments content into introduction, body, and conclusion modules with well-organized semantic flow.
* **Slide Layout Suggestions:** Works with template systems to recommend optimal presentation styles for the generated content.

**Use Case:** Suitable for office automation platforms, AI presentation tools, and other productivity-focused products.

**Core Capability:** Model reasoning power → Precise instruction parsing | Multi-turn logical reasoning | Domain knowledge integration

* **Deep Natural Language Understanding** – Accurately interprets natural language instructions, extracts key intents, and converts them into executable tasks.
* **Complex Multi-Turn Reasoning** – Supports multi-step logical reasoning chains, efficiently handling composite problems involving cross-step dependencies and multiple variables.
* **Domain Knowledge Fusion** – Integrates domain-specific expertise with contextual information to enhance reasoning accuracy and output stability.

**Use Case:** In complex business workflows, accuracy improves by *60%*, and reasoning efficiency improves by *70%*.

**Core Capabilities:** Translation Proficiency → Strong contextual consistency | Accurate style preservation | Excellent handling of long passages

* **Long, Complex Sentence Translation:** Maintains semantic coherence and structural accuracy, ideal for policy and academic materials.
* **Style Retention and Adaptation:** Preserves the original tone or adapts to the target language’s commonly used expression style during translation.
* **Support for Low-Resource Languages and Informal Contexts:** Preliminary coverage of 26 languages, with capabilities to translate social and informal texts.

**Use Cases:** Suitable for publishing house translations, content localization for overseas markets, cross-border customer service, and social media platforms.

**Core Capability:** Creative Writing → Natural expression | Rich emotion | Complete structure

* Generates coherent literary texts with clear narrative flow based on given themes, characters, or worldviews.
* Produces emotionally engaging copy tailored to audience profiles and product characteristics.
* Supports short videos and new media scripts aligned with platforms like Douyin and Xiaohongshu, integrating emotion control and narrative pacing.

**Use Case:** Ideal for deployment in content creation platforms, marketing toolchains, or AI writing assistants to enhance content production efficiency and personalization.

**Core Capability:** Humanized Expression → Natural tone | Accurate emotional conveyance | Consistent character behavior

* **Role-Playing Dialogue System:** Maintains consistent tone and behavior of the designated character across multi-turn conversations.
* **Emotionally Rich Copywriting:** Delivers warm, relatable expressions suitable for building “humanized” brands or companion-style user products.
* **Virtual Persona Content Creation:** Supports generation of content aligned with virtual streamers or character IPs, including social posts and fan interactions.

**Use Case:** Ideal for virtual humans, social AI, and brand personification operations.

## Resources

* [API Documentation](/api-reference/llm/chat-completion): Learn how to call the API.

## Quick Start

### Thinking Mode

GLM-4.5 offers a “Deep Thinking Mode” that users can turn on or off by setting the `thinking.type` parameter to `enabled` or `disabled`. By default, dynamic thinking is enabled.

* **Simple Tasks (No Thinking Required):** For straightforward requests that do not require complex reasoning (e.g., fact retrieval or classification), thinking is unnecessary. Examples include:
  * When was Z.AI founded?
  * Translate the sentence “I love you” into Chinese.
* **Moderate Tasks (Default/Some Thinking Required):** Many common requests require stepwise processing or deeper understanding. The GLM-4.5 series can flexibly apply thinking capabilities to handle tasks such as:
  * Why does Jupiter have more moons than Saturn, despite Saturn being larger?
  * Compare the advantages and disadvantages of flying versus taking the high-speed train from Beijing to Shanghai.
* **Difficult Tasks (Maximum Thinking Capacity):** Truly complex challenges, such as advanced math problems, network-related questions, or coding issues, require the model to fully engage its reasoning and planning abilities, often involving many internal steps before arriving at an answer. Examples include:
  * Explain in detail how different experts in a Mixture-of-Experts (MoE) model collaborate.
  * Based on the recent week’s fluctuations of the Shanghai Composite Index and current political information, should I invest in a stock index ETF? Why?

### Sample Code

**Basic Call**

```bash theme={null}
curl -X POST "https://api.z.ai/api/paas/v4/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "glm-4.5",
    "messages": [
      {
        "role": "user",
        "content": "As a marketing expert, please create an attractive slogan for my product."
      },
      {
        "role": "assistant",
        "content": "Sure, to craft a compelling slogan, please tell me more about your product."
      },
      {
        "role": "user",
        "content": "Z.AI Open Platform"
      }
    ],
    "thinking": {
      "type": "enabled"
    },
    "max_tokens": 4096,
    "temperature": 0.6
  }'
```

**Streaming Call**

```bash theme={null}
curl -X POST "https://api.z.ai/api/paas/v4/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "glm-4.5",
    "messages": [
      {
        "role": "user",
        "content": "As a marketing expert, please create an attractive slogan for my product."
      },
      {
        "role": "assistant",
        "content": "Sure, to craft a compelling slogan, please tell me more about your product."
}, { "role": "user", "content": "Z.AI Open Platform" } ], "thinking": { "type": "enabled" }, "stream": true, "max_tokens": 4096, "temperature": 0.6 }' ``` **Install SDK** ```bash theme={null} # Install latest version pip install zai-sdk # Or specify version pip install zai-sdk==0.2.2 ``` **Verify Installation** ```python theme={null} import zai print(zai.__version__) ``` **Basic Call** ```python theme={null} from zai import ZaiClient client = ZaiClient(api_key="your-api-key") # Your API Key response = client.chat.completions.create( model="glm-4.5", messages=[ {"role": "user", "content": "As a marketing expert, please create an attractive slogan for my product."}, {"role": "assistant", "content": "Sure, to craft a compelling slogan, please tell me more about your product."}, {"role": "user", "content": "Z.AI Open Platform"} ], thinking={ "type": "enabled", }, max_tokens=4096, temperature=0.6 ) # Get complete response print(response.choices[0].message) ``` **Streaming Call** ```python theme={null} from zai import ZaiClient client = ZaiClient(api_key="your-api-key") # Your API Key response = client.chat.completions.create( model="glm-4.5", messages=[ {"role": "user", "content": "As a marketing expert, please create an attractive slogan for my product."}, {"role": "assistant", "content": "Sure, to craft a compelling slogan, please tell me more about your product."}, {"role": "user", "content": "Z.AI Open Platform"} ], thinking={ "type": "enabled", # Optional: "disabled" or "enabled", default is "enabled" }, stream=True, max_tokens=4096, temperature=0.6 ) # Stream response for chunk in response: if chunk.choices[0].delta.reasoning_content: print(chunk.choices[0].delta.reasoning_content, end='', flush=True) if chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end='', flush=True) ``` **Install SDK** **Maven** ```xml theme={null} ai.z.openapi zai-sdk 0.3.3 ``` **Gradle (Groovy)** ```groovy theme={null} implementation 'ai.z.openapi:zai-sdk:0.3.3' ``` 
**Basic Call** ```java theme={null} import ai.z.openapi.ZaiClient; import ai.z.openapi.service.model.ChatCompletionCreateParams; import ai.z.openapi.service.model.ChatCompletionResponse; import ai.z.openapi.service.model.ChatMessage; import ai.z.openapi.service.model.ChatMessageRole; import ai.z.openapi.service.model.ChatThinking; import java.util.Arrays; public class BasicChat { public static void main(String[] args) { // Initialize client ZaiClient client = ZaiClient.builder().ofZAI() .apiKey("your-api-key") .build(); // Create chat completion request ChatCompletionCreateParams request = ChatCompletionCreateParams.builder() .model("glm-4.5") .messages(Arrays.asList( ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content("As a marketing expert, please create an attractive slogan for my product.") .build(), ChatMessage.builder() .role(ChatMessageRole.ASSISTANT.value()) .content("Sure, to craft a compelling slogan, please tell me more about your product.") .build(), ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content("Z.AI Open Platform") .build() )) .thinking(ChatThinking.builder().type("enabled").build()) .maxTokens(4096) .temperature(0.6f) .build(); // Send request ChatCompletionResponse response = client.chat().createChatCompletion(request); // Get response if (response.isSuccess()) { Object reply = response.getData().getChoices().get(0).getMessage(); System.out.println("AI Response: " + reply); } else { System.err.println("Error: " + response.getMsg()); } } } ``` **Streaming Call** ```java theme={null} import ai.z.openapi.ZaiClient; import ai.z.openapi.service.model.ChatCompletionCreateParams; import ai.z.openapi.service.model.ChatCompletionResponse; import ai.z.openapi.service.model.ChatMessage; import ai.z.openapi.service.model.ChatMessageRole; import ai.z.openapi.service.model.ChatThinking; import ai.z.openapi.service.model.Delta; import java.util.Arrays; public class StreamingChat { public static void main(String[] args) { // 
Initialize client ZaiClient client = ZaiClient.builder().ofZAI() .apiKey("your-api-key") .build(); // Create streaming chat completion request ChatCompletionCreateParams request = ChatCompletionCreateParams.builder() .model("glm-4.5") .messages(Arrays.asList( ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content("As a marketing expert, please create an attractive slogan for my product.") .build(), ChatMessage.builder() .role(ChatMessageRole.ASSISTANT.value()) .content("Sure, to craft a compelling slogan, please tell me more about your product.") .build(), ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content("Z.AI Open Platform") .build() )) .thinking(ChatThinking.builder().type("enabled").build()) .stream(true) // Enable streaming output .maxTokens(4096) .temperature(0.6f) .build(); ChatCompletionResponse response = client.chat().createChatCompletion(request); if (response.isSuccess()) { response.getFlowable().subscribe( // Process streaming message data data -> { if (data.getChoices() != null && !data.getChoices().isEmpty()) { Delta delta = data.getChoices().get(0).getDelta(); System.out.print(delta + "\n"); }}, // Process streaming response error error -> System.err.println("\nStream error: " + error.getMessage()), // Process streaming response completion event () -> System.out.println("\nStreaming response completed") ); } else { System.err.println("Error: " + response.getMsg()); } } } ``` **Install SDK** ```bash theme={null} # Install or upgrade to latest version pip install --upgrade 'openai>=1.0' ``` **Verify Installation** ```python theme={null} python -c "import openai; print(openai.__version__)" ``` **Usage Example** ```python theme={null} from openai import OpenAI client = OpenAI( api_key="your-Z.AI-api-key", base_url="https://api.z.ai/api/paas/v4/" ) completion = client.chat.completions.create( model="glm-4.5", messages=[ {"role": "system", "content": "You are a smart and creative novelist"}, {"role": "user", "content": "Please 
write a short fairy tale story as a fairy tale master"} ] ) print(completion.choices[0].message.content)
```

# GLM-4.6

Source: https://docs.z.ai/guides/llm/glm-4.6

## Overview

GLM-4.6 achieves comprehensive enhancements across multiple domains, including real-world coding, long-context processing, reasoning, searching, writing, and agentic applications. Details are as follows:

* **Longer context window**: The context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks.
* **Superior coding performance**: The model achieves higher scores on code benchmarks and demonstrates better real-world performance in applications such as Claude Code, Cline, Roo Code, and Kilo Code, including improvements in generating visually polished front-end pages.
* **Advanced reasoning**: GLM-4.6 shows a clear improvement in reasoning performance and supports tool use during inference, leading to stronger overall capability.
* **More capable agents**: GLM-4.6 exhibits stronger performance in tool use and search-based agents, and integrates more effectively within agent frameworks.
* **Refined writing**: Better aligns with human preferences in style and readability, and performs more naturally in role-playing scenarios.

Model specifications:

* Input modality: Text
* Output modality: Text
* Context length: 200K
* Maximum output tokens: 128K

## Introducing GLM-4.6

### 1. Comprehensive Evaluation

In evaluations across 8 authoritative benchmarks for general model capabilities (including AIME 25, GPQA, LCB v6, HLE, and SWE-Bench Verified), GLM-4.6 achieves performance on par with Claude Sonnet 4/Claude Sonnet 4.6 on several leaderboards, solidifying its position as the top model developed in China.

### 2. Real-World Coding Evaluation

To better test the model's capabilities in practical coding tasks, we conducted 74 real-world coding tests within the Claude Code environment. The results show that GLM-4.6 surpasses Claude Sonnet 4 and other domestic models in these real-world tests.
In terms of average token consumption, GLM-4.6 is over 30% more efficient than GLM-4.5, achieving the lowest consumption rate among comparable models.

To ensure transparency and credibility, Z.AI has publicly released all test questions and agent trajectories for verification and reproduction: [https://huggingface.co/datasets/zai-org/CC-Bench-trajectories](https://huggingface.co/datasets/zai-org/CC-Bench-trajectories).

## Usage

Supports mainstream languages including Python, JavaScript, and Java, delivering superior aesthetics and logical layout in frontend code. Natively handles diverse agent tasks with enhanced autonomous planning and tool invocation capabilities. Excels in task decomposition, cross-tool collaboration, and dynamic adjustments, enabling flexible adaptation to complex development or office workflows.

Significantly enhances presentation quality in PowerPoint creation and office automation scenarios. Generates aesthetically advanced layouts with clear logical structures while preserving content integrity and expression accuracy, making it ideal for office automation systems and AI presentation tools.

Translation quality for minor languages (French, Russian, Japanese, Korean) and informal contexts has been further optimized, making it particularly suitable for social media, e-commerce content, and short drama translations. It maintains semantic coherence and stylistic consistency in lengthy passages while achieving superior style adaptation and localized expression, meeting the diverse needs of global enterprises and cross-border services.

Supports diverse content production including novels, scripts, and copywriting, achieving more natural expression through contextual expansion and emotional regulation.

Maintains consistent tone and behavior across multi-turn conversations, ideal for virtual humans, social AI, and brand personification operations, making interactions warmer and more authentic.
Enhances user intent understanding, tool retrieval, and result integration. Not only does it return more precise search results, but it also deeply synthesizes outcomes to support Deep Research scenarios, delivering more insightful answers to users. ## Resources * [API Documentation](/api-reference/llm/chat-completion): Learn how to call the API. ## Quick Start The following is a full sample code to help you onboard GLM-4.6 with ease. **Basic Call** ```bash theme={null} curl -X POST "https://api.z.ai/api/paas/v4/chat/completions" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer your-api-key" \ -d '{ "model": "glm-4.6", "messages": [ { "role": "user", "content": "As a marketing expert, please create an attractive slogan for my product." }, { "role": "assistant", "content": "Sure, to craft a compelling slogan, please tell me more about your product." }, { "role": "user", "content": "Z.AI Open Platform" } ], "thinking": { "type": "enabled" }, "max_tokens": 4096, "temperature": 1.0 }' ``` **Streaming Call** ```bash theme={null} curl -X POST "https://api.z.ai/api/paas/v4/chat/completions" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer your-api-key" \ -d '{ "model": "glm-4.6", "messages": [ { "role": "user", "content": "As a marketing expert, please create an attractive slogan for my product." }, { "role": "assistant", "content": "Sure, to craft a compelling slogan, please tell me more about your product." 
      },
      {
        "role": "user",
        "content": "Z.AI Open Platform"
      }
    ],
    "thinking": {
      "type": "enabled"
    },
    "stream": true,
    "max_tokens": 4096,
    "temperature": 1.0
  }'
```

**Install SDK**

```bash theme={null}
# Install latest version
pip install zai-sdk

# Or specify version
pip install zai-sdk==0.2.2
```

**Verify Installation**

```python theme={null}
import zai

print(zai.__version__)
```

**Basic Call**

```python theme={null}
from zai import ZaiClient

client = ZaiClient(api_key="your-api-key")  # Your API Key

response = client.chat.completions.create(
    model="glm-4.6",
    messages=[
        {"role": "user", "content": "As a marketing expert, please create an attractive slogan for my product."},
        {"role": "assistant", "content": "Sure, to craft a compelling slogan, please tell me more about your product."},
        {"role": "user", "content": "Z.AI Open Platform"}
    ],
    thinking={
        "type": "enabled",
    },
    max_tokens=4096,
    temperature=1.0
)

# Get complete response
print(response.choices[0].message)
```

**Streaming Call**

```python theme={null}
from zai import ZaiClient

client = ZaiClient(api_key="your-api-key")  # Your API Key

response = client.chat.completions.create(
    model="glm-4.6",
    messages=[
        {"role": "user", "content": "As a marketing expert, please create an attractive slogan for my product."},
        {"role": "assistant", "content": "Sure, to craft a compelling slogan, please tell me more about your product."},
        {"role": "user", "content": "Z.AI Open Platform"}
    ],
    thinking={
        "type": "enabled",  # Optional: "disabled" or "enabled", default is "enabled"
    },
    stream=True,
    max_tokens=4096,
    temperature=0.6
)

# Stream response: print reasoning tokens and answer tokens as they arrive
for chunk in response:
    if chunk.choices[0].delta.reasoning_content:
        print(chunk.choices[0].delta.reasoning_content, end='', flush=True)

    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='', flush=True)
```

**Install SDK**

**Maven**

```xml theme={null}
<dependency>
    <groupId>ai.z.openapi</groupId>
    <artifactId>zai-sdk</artifactId>
    <version>0.3.3</version>
</dependency>
```

**Gradle (Groovy)**

```groovy theme={null}
implementation 'ai.z.openapi:zai-sdk:0.3.3'
```
**Basic Call**

```java theme={null}
import ai.z.openapi.ZaiClient;
import ai.z.openapi.service.model.ChatCompletionCreateParams;
import ai.z.openapi.service.model.ChatCompletionResponse;
import ai.z.openapi.service.model.ChatMessage;
import ai.z.openapi.service.model.ChatMessageRole;
import ai.z.openapi.service.model.ChatThinking;
import java.util.Arrays;

public class BasicChat {
    public static void main(String[] args) {
        // Initialize client
        ZaiClient client = ZaiClient.builder().ofZAI()
            .apiKey("your-api-key")
            .build();

        // Create chat completion request
        ChatCompletionCreateParams request = ChatCompletionCreateParams.builder()
            .model("glm-4.6")
            .messages(Arrays.asList(
                ChatMessage.builder()
                    .role(ChatMessageRole.USER.value())
                    .content("As a marketing expert, please create an attractive slogan for my product.")
                    .build(),
                ChatMessage.builder()
                    .role(ChatMessageRole.ASSISTANT.value())
                    .content("Sure, to craft a compelling slogan, please tell me more about your product.")
                    .build(),
                ChatMessage.builder()
                    .role(ChatMessageRole.USER.value())
                    .content("Z.AI Open Platform")
                    .build()
            ))
            .thinking(ChatThinking.builder().type("enabled").build())
            .maxTokens(4096)
            .temperature(1.0f)
            .build();

        // Send request
        ChatCompletionResponse response = client.chat().createChatCompletion(request);

        // Get response
        if (response.isSuccess()) {
            Object reply = response.getData().getChoices().get(0).getMessage();
            System.out.println("AI Response: " + reply);
        } else {
            System.err.println("Error: " + response.getMsg());
        }
    }
}
```

**Streaming Call**

```java theme={null}
import ai.z.openapi.ZaiClient;
import ai.z.openapi.service.model.ChatCompletionCreateParams;
import ai.z.openapi.service.model.ChatCompletionResponse;
import ai.z.openapi.service.model.ChatMessage;
import ai.z.openapi.service.model.ChatMessageRole;
import ai.z.openapi.service.model.ChatThinking;
import ai.z.openapi.service.model.Delta;
import java.util.Arrays;

public class StreamingChat {
    public static void main(String[] args) {
        // Initialize client
        ZaiClient client = ZaiClient.builder().ofZAI()
            .apiKey("your-api-key")
            .build();

        // Create streaming chat completion request
        ChatCompletionCreateParams request = ChatCompletionCreateParams.builder()
            .model("glm-4.6")
            .messages(Arrays.asList(
                ChatMessage.builder()
                    .role(ChatMessageRole.USER.value())
                    .content("As a marketing expert, please create an attractive slogan for my product.")
                    .build(),
                ChatMessage.builder()
                    .role(ChatMessageRole.ASSISTANT.value())
                    .content("Sure, to craft a compelling slogan, please tell me more about your product.")
                    .build(),
                ChatMessage.builder()
                    .role(ChatMessageRole.USER.value())
                    .content("Z.AI Open Platform")
                    .build()
            ))
            .thinking(ChatThinking.builder().type("enabled").build())
            .stream(true)  // Enable streaming output
            .maxTokens(4096)
            .temperature(1.0f)
            .build();

        ChatCompletionResponse response = client.chat().createChatCompletion(request);

        if (response.isSuccess()) {
            response.getFlowable().subscribe(
                // Process streaming message data
                data -> {
                    if (data.getChoices() != null && !data.getChoices().isEmpty()) {
                        Delta delta = data.getChoices().get(0).getDelta();
                        System.out.print(delta + "\n");
                    }
                },
                // Process streaming response error
                error -> System.err.println("\nStream error: " + error.getMessage()),
                // Process streaming response completion event
                () -> System.out.println("\nStreaming response completed")
            );
        } else {
            System.err.println("Error: " + response.getMsg());
        }
    }
}
```

**Install SDK**

```bash theme={null}
# Install or upgrade to latest version
pip install --upgrade 'openai>=1.0'
```

**Verify Installation**

```bash theme={null}
python -c "import openai; print(openai.__version__)"
```

**Usage Example**

```python theme={null}
from openai import OpenAI

client = OpenAI(
    api_key="your-Z.AI-api-key",
    base_url="https://api.z.ai/api/paas/v4/"
)

completion = client.chat.completions.create(
    model="glm-4.6",
    messages=[
        {"role": "system", "content": "You are a smart and creative novelist"},
        {"role": "user", "content":
"Please write a short fairy tale story as a fairy tale master"} ] ) print(completion.choices[0].message.content) ``` # GLM-4.7 Source: https://docs.z.ai/guides/llm/glm-4.7 Tired of limits? Get premium performance at a fraction of the cost, fully compatible with top coding tools like Claude Code and Cline. Starting from just \$10/month. [Try it now →](https://z.ai/subscribe?utm_campaign=Platform_Ops&_channel_track_key=DaprgHIc) ## Overview GLM-4.7 Series are Z.AI's models, featuring upgrades in two key areas: enhanced programming capabilities and more stable multi-step reasoning/execution. It demonstrates significant improvements in executing complex agent tasks while delivering more natural conversational experiences and superior front-end aesthetics. Text Text 200K 128K Lightweight, High-Speed,and Affordable Text Text 200K 128K Lightweight, Completely Free Text Text 200K 128K ## Capability Offering multiple thinking modes for different scenarios Support real-time streaming responses to enhance user interaction experience Powerful tool invocation capabilities, enabling integration with various external toolsets Intelligent caching mechanism to optimize performance in long conversations Support for structured output formats like JSON, facilitating system integration ## Usage GLM-4.7 focuses on “task completion” rather than single-point code generation. It autonomously accomplishes requirement comprehension, solution decomposition, and multi-technology stack integration starting from target descriptions. In complex scenarios involving frontend-backend coordination, real-time interaction, and peripheral device calls, it directly generates structurally complete, executable code frameworks. This significantly reduces manual assembly and iterative debugging costs, making it ideal for complex demos, prototype validation, and automated development workflows. 
In scenarios requiring cameras, real-time input, and interactive controls, GLM-4.7 demonstrates superior system-level comprehension. It integrates visual recognition, logic control, and application code into unified solutions, enabling rapid construction of interactive applications like gesture control and real-time feedback. This accelerates the journey from concept to operational application.

GLM-4.7 has a significantly enhanced understanding of visual code and UI specifications. It provides more aesthetically pleasing and consistent default solutions for layout structures, color harmony, and component styling, reducing time spent on repetitive “fine-tuning” of styles. It is well suited for low-code platforms, AI frontend generation tools, and rapid prototyping scenarios.

GLM-4.7 maintains context and constraints more reliably during multi-turn conversations. It responds more directly to simple queries while continuously clarifying objectives and advancing resolution paths for complex issues. It functions as a collaborative “problem-solving partner,” ideal for high-frequency collaboration scenarios like development support, solution discussions, and decision-making assistance.

GLM-4.7 delivers more nuanced, vividly descriptive prose that builds atmosphere through sensory details like scent, sound, and light. In role-playing and narrative creation, it maintains consistent adherence to world-building and character archetypes, advancing plots with natural tension. It is ideal for interactive storytelling, IP content creation, and character-based applications.

In office creation, GLM-4.7 demonstrates significantly enhanced layout consistency and aesthetic stability. It reliably adapts to mainstream aspect ratios like 16:9, minimizes template-like elements in typography hierarchy, white space, and color schemes, and produces near-ready-to-use results. This makes it ideal for AI presentation tools, enterprise office systems, and automated content generation scenarios.
GLM-4.7 has enhanced capabilities in user intent understanding, information retrieval, and result integration. For complex queries and research tasks, it not only returns information but also performs structured organization and cross-source consolidation. Through multi-round interactions, it progressively narrows in on core conclusions, making it suitable for in-depth research and decision-support scenarios.

## Introducing GLM-4.7

GLM-4.7 achieves significant breakthroughs across three dimensions: programming, reasoning, and agent capabilities:

* **Enhanced Programming Capabilities**: Substantially improves model performance in multi-language coding and terminal agent applications; GLM-4.7 now implements a “think before acting” mechanism within programming frameworks like Claude Code, Kilo Code, TRAE, Cline, and Roo Code, delivering more stable performance on complex tasks.
* **Enhanced Frontend Aesthetics**: GLM-4.7 shows marked progress in frontend generation quality, producing visually superior webpages, PPTs, and posters.
* **Enhanced Tool Invocation Capabilities**: GLM-4.7 demonstrates improved tool invocation skills, scoring 67 points on the BrowseComp web task evaluation and achieving an open-source SOTA of 84.7 points on the τ²-Bench interactive tool invocation benchmark, surpassing Claude Sonnet 4.5.
* **Enhanced Reasoning Capabilities**: Significantly improved mathematical and reasoning skills, achieving 42.8% on the HLE (“Humanity's Last Exam”) benchmark, a 41% increase over GLM-4.6, and surpassing GPT-5.1.
* **Enhanced General Capabilities**: GLM-4.7 delivers more concise, intelligent, and empathetic conversations, with more eloquent and immersive writing and role-playing.

*Code Arena: A professional coding evaluation system with millions of global users participating in blind tests. GLM-4.7 ranks first among open-source models and domestic models, outperforming GPT-5.2.*

In mainstream benchmarks, GLM-4.7's coding capabilities are on par with Claude Sonnet 4.5:

* It achieves the top open-source ranking on SWE-bench Verified.
* It reaches an open-source SOTA score of 84.9 on LiveCodeBench V6, surpassing Claude Sonnet 4.5.
* It achieves 73.8% on SWE-bench Verified (a 5.8-point improvement over GLM-4.6), 66.7% on SWE-bench Multilingual (a 12.9-point improvement), and 41% on Terminal Bench 2.0 (a 16.5-point improvement).

![Description](https://cdn.bigmodel.cn/markdown/1766459089466image.png?attname=image.png)

In mainstream benchmarks like SWE-bench Verified and τ²-Bench, GLM-4.7-Flash achieves open-source SOTA scores among models of comparable size. In internal programming tests, it also outperforms similarly sized models on both frontend and backend development tasks. Beyond programming scenarios, we also recommend trying GLM-4.7-Flash in general-purpose applications such as Chinese writing, translation, long-form text processing, and emotional/role-playing interactions.

![Description](https://cdn.bigmodel.cn/markdown/176886970126120260120-084119.jpeg?attname=20260120-084119.jpeg)

In the Claude Code environment, we tested 100 real programming tasks covering core capabilities like frontend, backend, and instruction following. Results show GLM-4.7 demonstrates significant improvements over GLM-4.6 in both stability and deliverability.

With enhanced programming capabilities, developers can more naturally organize their development workflow around “task delivery,” forming an end-to-end closed loop from requirement understanding to implementation.

GLM-4.7 builds on the interleaved reasoning capabilities introduced in GLM-4.5 by adding retention-based reasoning and round-level reasoning, making complex task execution more stable and controllable.
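Round-level control of reasoning comes down to choosing the `thinking` setting for each request. The sketch below only builds request payloads and sends nothing; it reuses the fields shown in this guide's Quick Start (`model`, `messages`, `thinking`, `max_tokens`). The helper name `build_chat_request` and the `needs_reasoning` flag are hypothetical, and deciding which turns count as complex is left to the caller.

```python
import json


def build_chat_request(messages, needs_reasoning, model="glm-4.7", max_tokens=4096):
    """Build a chat-completions payload, toggling reasoning for this turn only.

    Hypothetical helper: the field names mirror the Quick Start requests,
    but the function itself is not part of any SDK.
    """
    return {
        "model": model,
        "messages": list(messages),  # copy so later turns do not alter this payload
        "thinking": {"type": "enabled" if needs_reasoning else "disabled"},
        "max_tokens": max_tokens,
    }


# Simple lookup: skip reasoning to reduce latency.
history = [{"role": "user", "content": "What is the default HTTP port?"}]
simple_turn = build_chat_request(history, needs_reasoning=False)

# Later, harder turn in the same session: switch reasoning back on.
history += [
    {"role": "assistant", "content": "80"},
    {"role": "user", "content": "Refactor this module to use async I/O and explain each step."},
]
complex_turn = build_chat_request(history, needs_reasoning=True)

print(json.dumps(simple_turn["thinking"]))
print(json.dumps(complex_turn["thinking"]))
```

Each payload could then be passed as the JSON body of a chat completion request; keeping the toggle in one helper makes the latency and accuracy trade-off explicit per turn.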
* Interleaved Reasoning: Performs reasoning before each response or tool invocation, improving compliance with complex instructions and code generation quality.
* Retention-Based Reasoning: Automatically preserves reasoning blocks across multi-turn dialogues, improving cache hit rates and reducing computational costs. This suits long-running complex tasks.
* Round-Level Reasoning: Enables round-based control of reasoning overhead within a single session: disable reasoning for simple tasks to reduce latency, or enable it for complex tasks to boost accuracy and stability.

Related documentation: [https://docs.z.ai/guides/capabilities/thinking-mode](https://docs.z.ai/guides/capabilities/thinking-mode)

GLM-4.7 demonstrates superior task decomposition and technology stack integration in complex tasks, delivering **complete, executable code** in a single step while clearly identifying critical dependencies and execution steps, significantly reducing manual debugging costs.

GLM-4.7 enhances its comprehension of visual code. In frontend design, it better interprets UI design specifications, offering more aesthetically pleasing default solutions for layout structures, color harmony, and component styling. This reduces the time developers spend on style “fine-tuning.”

## Resources

* [API Documentation](/api-reference/llm/chat-completion): Learn how to call the API.

## Quick Start

The following complete sample code helps you get started with GLM-4.7.

**Basic Call**

```bash theme={null}
curl -X POST "https://api.z.ai/api/paas/v4/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "glm-4.7",
    "messages": [
      {
        "role": "user",
        "content": "As a marketing expert, please create an attractive slogan for my product."
      },
      {
        "role": "assistant",
        "content": "Sure, to craft a compelling slogan, please tell me more about your product."
}, { "role": "user", "content": "Z.AI Open Platform" } ], "thinking": { "type": "enabled" }, "max_tokens": 4096, "temperature": 1.0 }' ``` **Streaming Call** ```bash theme={null} curl -X POST "https://api.z.ai/api/paas/v4/chat/completions" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer your-api-key" \ -d '{ "model": "glm-4.7", "messages": [ { "role": "user", "content": "As a marketing expert, please create an attractive slogan for my product." }, { "role": "assistant", "content": "Sure, to craft a compelling slogan, please tell me more about your product." }, { "role": "user", "content": "Z.AI Open Platform" } ], "thinking": { "type": "enabled" }, "stream": true, "max_tokens": 4096, "temperature": 1.0 }' ``` **Install SDK** ```bash theme={null} # Install latest version pip install zai-sdk # Or specify version pip install zai-sdk==0.2.2 ``` **Verify Installation** ```python theme={null} import zai print(zai.__version__) ``` **Basic Call** ```python theme={null} from zai import ZaiClient client = ZaiClient(api_key="your-api-key") # Your API Key response = client.chat.completions.create( model="glm-4.7", messages=[ { "role": "user", "content": "As a marketing expert, please create an attractive slogan for my product.", }, { "role": "assistant", "content": "Sure, to craft a compelling slogan, please tell me more about your product.", }, {"role": "user", "content": "Z.AI Open Platform"}, ], thinking={ "type": "enabled", }, max_tokens=4096, temperature=1.0, ) # Get complete response print(response.choices[0].message) ``` **Streaming Call** ```python theme={null} from zai import ZaiClient client = ZaiClient(api_key="your-api-key") # Your API Key response = client.chat.completions.create( model="glm-4.7", messages=[ { "role": "user", "content": "As a marketing expert, please create an attractive slogan for my product.", }, { "role": "assistant", "content": "Sure, to craft a compelling slogan, please tell me more about your product.", }, {"role": "user", 
"content": "Z.AI Open Platform"}, ], thinking={ "type": "enabled", # Optional: "disabled" or "enabled", default is "enabled" }, stream=True, max_tokens=4096, temperature=0.6, ) # Stream response for chunk in response: if chunk.choices[0].delta.reasoning_content: print(chunk.choices[0].delta.reasoning_content, end="", flush=True) if chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="", flush=True) ``` **Install SDK** **Maven** ```xml theme={null} ai.z.openapi zai-sdk 0.3.3 ``` **Gradle (Groovy)** ```groovy theme={null} implementation 'ai.z.openapi:zai-sdk:0.3.3' ``` **Basic Call** ```java theme={null} import ai.z.openapi.ZaiClient; import ai.z.openapi.service.model.ChatCompletionCreateParams; import ai.z.openapi.service.model.ChatCompletionResponse; import ai.z.openapi.service.model.ChatMessage; import ai.z.openapi.service.model.ChatMessageRole; import ai.z.openapi.service.model.ChatThinking; import java.util.Arrays; public class BasicChat { public static void main(String[] args) { // Initialize client ZaiClient client = ZaiClient.builder().ofZAI().apiKey("your-api-key").build(); // Create chat completion request ChatCompletionCreateParams request = ChatCompletionCreateParams.builder() .model("glm-4.7") .messages( Arrays.asList( ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content( "As a marketing expert, please create an attractive slogan for my product.") .build(), ChatMessage.builder() .role(ChatMessageRole.ASSISTANT.value()) .content( "Sure, to craft a compelling slogan, please tell me more about your product.") .build(), ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content("Z.AI Open Platform") .build())) .thinking(ChatThinking.builder().type("enabled").build()) .maxTokens(4096) .temperature(1.0f) .build(); // Send request ChatCompletionResponse response = client.chat().createChatCompletion(request); // Get response if (response.isSuccess()) { Object reply = response.getData().getChoices().get(0).getMessage(); 
System.out.println("AI Response: " + reply); } else { System.err.println("Error: " + response.getMsg()); } } } ``` **Streaming Call** ```java theme={null} import ai.z.openapi.ZaiClient; import ai.z.openapi.service.model.ChatCompletionCreateParams; import ai.z.openapi.service.model.ChatCompletionResponse; import ai.z.openapi.service.model.ChatMessage; import ai.z.openapi.service.model.ChatMessageRole; import ai.z.openapi.service.model.ChatThinking; import ai.z.openapi.service.model.Delta; import java.util.Arrays; public class StreamingChat { public static void main(String[] args) { // Initialize client ZaiClient client = ZaiClient.builder().ofZAI().apiKey("your-api-key").build(); // Create streaming chat completion request ChatCompletionCreateParams request = ChatCompletionCreateParams.builder() .model("glm-4.7") .messages( Arrays.asList( ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content( "As a marketing expert, please create an attractive slogan for my product.") .build(), ChatMessage.builder() .role(ChatMessageRole.ASSISTANT.value()) .content( "Sure, to craft a compelling slogan, please tell me more about your product.") .build(), ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content("Z.AI Open Platform") .build())) .thinking(ChatThinking.builder().type("enabled").build()) .stream(true) // Enable streaming output .maxTokens(4096) .temperature(1.0f) .build(); ChatCompletionResponse response = client.chat().createChatCompletion(request); if (response.isSuccess()) { response.getFlowable() .subscribe( // Process streaming message data data -> { if (data.getChoices() != null && !data.getChoices().isEmpty()) { Delta delta = data.getChoices().get(0).getDelta(); System.out.print(delta + "\n"); } }, // Process streaming response error error -> System.err.println("\nStream error: " + error.getMessage()), // Process streaming response completion event () -> System.out.println("\nStreaming response completed")); } else { 
            System.err.println("Error: " + response.getMsg());
        }
    }
}
```

**Install SDK**

```bash theme={null}
# Install or upgrade to latest version
pip install --upgrade 'openai>=1.0'
```

**Verify Installation**

```bash theme={null}
python -c "import openai; print(openai.__version__)"
```

**Usage Example**

```python theme={null}
from openai import OpenAI

client = OpenAI(
    api_key="your-Z.AI-api-key",
    base_url="https://api.z.ai/api/paas/v4/",
)

completion = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "system", "content": "You are a smart and creative novelist"},
        {
            "role": "user",
            "content": "Please write a short fairy tale story as a fairy tale master",
        },
    ],
)

print(completion.choices[0].message.content)
```

# GLM-5

Source: https://docs.z.ai/guides/llm/glm-5

Tired of limits? GLM-5 access is currently available for GLM Coding Plan **Pro and Max** subscribers, with monthly access to world-class models and compatibility with top coding tools like Claude Code and Open Code. [Try it now →](https://z.ai/subscribe?utm_campaign=Platform_Ops&_channel_track_key=DaprgHIc)

## Overview

**GLM-5** is Z.AI's new-generation foundation model, designed for **Agentic Engineering** and capable of providing reliable productivity in complex system engineering and long-range agent tasks. In coding and agent capabilities, GLM-5 achieves state-of-the-art (SOTA) performance among open-source models, with its usability in real programming scenarios approaching that of Claude Opus 4.5.
* Positioning: Foundation Model
* Input Modalities: Text
* Output Modalities: Text
* Context Length: 200K
* Maximum Output Tokens: 128K

## Capability

* Offers multiple thinking modes for different scenarios.
* Supports real-time streaming responses to enhance the user interaction experience.
* Provides powerful tool invocation capabilities, enabling integration with various external toolsets.
* Uses an intelligent caching mechanism to optimize performance in long conversations.
* Supports structured output formats like JSON, facilitating system integration.

## Usage

It can automatically generate runnable code from natural language, covering development processes such as front-end, back-end, and data processing, significantly shortening the iteration cycle from requirements to products.

With autonomous decision-making and tool invocation, it can complete end-to-end agent tasks, from understanding and planning to execution and self-checking, even when objectives are ambiguous and complex, going from a single-sentence input to complete deliverables.

With strong long-range planning and memory capabilities, it can reliably complete complex work tasks that span multiple stages, involve multiple steps, and have strong logical connections, ensuring instruction compliance and goal consistency.

It can accurately understand and consistently maintain character settings, remain consistent in narrative, emotion, and logic, and deliver a natural, evolvable, and highly immersive role-playing experience.

With significantly improved long-text consistency and complex character development, it reliably produces high-quality script content that can go directly into production.

It accurately converts formal texts into professional translations that follow the conventions of the target language, aligning semantics, terminology, and expression.
It can accurately extract key fields and logical relationships from complex texts such as contracts, announcements, and financial reports, reliably convert the original content into analyzable structured data, and support enterprise data governance and automation.

It can accurately identify key information in complex texts such as customer service tickets and automatically complete quality inspection and risk identification, significantly improving operational efficiency.

## Introducing GLM-5

The brand-new GLM-5 foundation lays a solid groundwork for the capability evolution from "writing code" to "building entire projects":

* **Expanded Parameter Scale**: The parameter count increases from 355B (32B activated) to 744B (40B activated), and pre-training data grows from 23T to 28.5T. Larger-scale pre-training compute significantly improves the model's general intelligence.
* **Asynchronous Reinforcement Learning**: A new "Slime" framework supports larger model scales and more complex reinforcement learning tasks, improving the efficiency of post-training workflows. An asynchronous agent reinforcement learning algorithm enables the model to learn continuously from long-range interactions and more fully realize the potential of the pre-trained model.
* **Sparse Attention Mechanism**: DeepSeek Sparse Attention is integrated for the first time, maintaining lossless long-text performance while drastically reducing model deployment costs and improving token efficiency.

GLM-5 performs on par with Claude Opus 4.5 in software engineering tasks, **reaching the highest scores among open-weight models across widely recognized industry benchmarks**. On SWE-bench Verified and Terminal Bench 2.0, GLM-5 records leading open-model scores of 77.8 and 56.2, respectively, surpassing Gemini 3.0 Pro in overall performance.
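As a quick check on the scale figures quoted above, the arithmetic below derives the share of parameters listed as activated and the growth factors between the two sets of numbers. It uses only the values stated in this section; the dictionary names and the "previous" label are illustrative, and the resulting ratios are derived here rather than published figures.

```python
# Figures quoted in this section: parameters in billions, pre-training data
# in the "T" unit used by the text.
previous = {"total_b": 355, "active_b": 32, "data_t": 23.0}
glm5 = {"total_b": 744, "active_b": 40, "data_t": 28.5}


def active_share(model):
    """Fraction of all parameters that the text lists as activated."""
    return model["active_b"] / model["total_b"]


def growth(key):
    """Growth factor from the previous figures to GLM-5 for one quantity."""
    return glm5[key] / previous[key]


summary = {
    "previous_active_pct": round(100 * active_share(previous), 1),
    "glm5_active_pct": round(100 * active_share(glm5), 1),
    "total_growth": round(growth("total_b"), 2),
    "active_growth": round(growth("active_b"), 2),
    "data_growth": round(growth("data_t"), 2),
}
print(summary)
```

On these numbers, total parameters roughly double while activated parameters grow by a quarter, so the activated share falls from about 9% to about 5%.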
![Description](https://cdn.bigmodel.cn/markdown/177083028071620260212-011355.jpeg?attname=20260212-011355.jpeg) In internal evaluations aligned with the Claude Code task distribution, GLM-5 demonstrates substantial gains over GLM-4.7 across frontend development, backend systems engineering, and long-horizon execution tasks. The model can autonomously perform agentic long-range planning, backend refactoring, and deep debugging with minimal human intervention—delivering a development experience that approaches Opus 4.5 in both reliability and execution depth. ![Description](https://cdn.bigmodel.cn/markdown/177082439894420260211-233935.jpeg?attname=20260211-233935.jpeg) GLM-5 achieves state-of-the-art performance among open-weight models in agentic capability, ranking first across multiple authoritative benchmarks. On BrowseComp (web-scale retrieval and information synthesis), MCP-Atlas (tool invocation and multi-step task execution), and τ²-Bench (complex multi-tool planning and orchestration), GLM-5 delivers top open-model results across the board. ![Description](https://cdn.bigmodel.cn/markdown/177083065584320260212-012319.jpeg?attname=20260212-012319.jpeg) These capabilities define the core of Agentic Engineering. A capable agent must go beyond generating code or completing isolated tasks — it must sustain goal alignment over long horizons, manage intermediate resources, coordinate tool usage, and resolve multi-step dependencies without losing coherence. ## Resources * [API Documentation](/api-reference/llm/chat-completion): Learn how to call the API. ## Quick Start The following is a full sample code to help you onboard GLM-5 with ease. **Basic Call** ```bash theme={null} curl -X POST "https://api.z.ai/api/paas/v4/chat/completions" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer your-api-key" \ -d '{ "model": "glm-5", "messages": [ { "role": "user", "content": "As a marketing expert, please create an attractive slogan for my product." 
}, { "role": "assistant", "content": "Sure, to craft a compelling slogan, please tell me more about your product." }, { "role": "user", "content": "Z.AI Open Platform" } ], "thinking": { "type": "enabled" }, "max_tokens": 4096, "temperature": 1.0 }' ``` **Streaming Call** ```bash theme={null} curl -X POST "https://api.z.ai/api/paas/v4/chat/completions" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer your-api-key" \ -d '{ "model": "glm-5", "messages": [ { "role": "user", "content": "As a marketing expert, please create an attractive slogan for my product." }, { "role": "assistant", "content": "Sure, to craft a compelling slogan, please tell me more about your product." }, { "role": "user", "content": "Z.AI Open Platform" } ], "thinking": { "type": "enabled" }, "stream": true, "max_tokens": 4096, "temperature": 1.0 }' ``` **Install SDK** ```bash theme={null} # Install latest version pip install zai-sdk # Or specify version pip install zai-sdk==0.2.2 ``` **Verify Installation** ```python theme={null} import zai print(zai.__version__) ``` **Basic Call** ```python theme={null} from zai import ZaiClient client = ZaiClient(api_key="your-api-key") # Your API Key response = client.chat.completions.create( model="glm-5", messages=[ { "role": "user", "content": "As a marketing expert, please create an attractive slogan for my product.", }, { "role": "assistant", "content": "Sure, to craft a compelling slogan, please tell me more about your product.", }, {"role": "user", "content": "Z.AI Open Platform"}, ], thinking={ "type": "enabled", }, max_tokens=4096, temperature=1.0, ) # Get complete response print(response.choices[0].message) ``` **Streaming Call** ```python theme={null} from zai import ZaiClient client = ZaiClient(api_key="your-api-key") # Your API Key response = client.chat.completions.create( model="glm-5", messages=[ { "role": "user", "content": "As a marketing expert, please create an attractive slogan for my product.", }, { "role": "assistant", 
"content": "Sure, to craft a compelling slogan, please tell me more about your product.", }, {"role": "user", "content": "Z.AI Open Platform"}, ], thinking={ "type": "enabled", # Optional: "disabled" or "enabled", default is "enabled" }, stream=True, max_tokens=4096, temperature=0.6, ) # Stream response for chunk in response: if chunk.choices[0].delta.reasoning_content: print(chunk.choices[0].delta.reasoning_content, end="", flush=True) if chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="", flush=True) ``` **Install SDK** **Maven** ```xml theme={null} ai.z.openapi zai-sdk 0.3.3 ``` **Gradle (Groovy)** ```groovy theme={null} implementation 'ai.z.openapi:zai-sdk:0.3.3' ``` **Basic Call** ```java theme={null} import ai.z.openapi.ZaiClient; import ai.z.openapi.service.model.ChatCompletionCreateParams; import ai.z.openapi.service.model.ChatCompletionResponse; import ai.z.openapi.service.model.ChatMessage; import ai.z.openapi.service.model.ChatMessageRole; import ai.z.openapi.service.model.ChatThinking; import java.util.Arrays; public class BasicChat { public static void main(String[] args) { // Initialize client ZaiClient client = ZaiClient.builder().ofZAI().apiKey("your-api-key").build(); // Create chat completion request ChatCompletionCreateParams request = ChatCompletionCreateParams.builder() .model("glm-5") .messages( Arrays.asList( ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content( "As a marketing expert, please create an attractive slogan for my product.") .build(), ChatMessage.builder() .role(ChatMessageRole.ASSISTANT.value()) .content( "Sure, to craft a compelling slogan, please tell me more about your product.") .build(), ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content("Z.AI Open Platform") .build())) .thinking(ChatThinking.builder().type("enabled").build()) .maxTokens(4096) .temperature(1.0f) .build(); // Send request ChatCompletionResponse response = client.chat().createChatCompletion(request); // 
Get response if (response.isSuccess()) { Object reply = response.getData().getChoices().get(0).getMessage(); System.out.println("AI Response: " + reply); } else { System.err.println("Error: " + response.getMsg()); } } } ``` **Streaming Call** ```java theme={null} import ai.z.openapi.ZaiClient; import ai.z.openapi.service.model.ChatCompletionCreateParams; import ai.z.openapi.service.model.ChatCompletionResponse; import ai.z.openapi.service.model.ChatMessage; import ai.z.openapi.service.model.ChatMessageRole; import ai.z.openapi.service.model.ChatThinking; import ai.z.openapi.service.model.Delta; import java.util.Arrays; public class StreamingChat { public static void main(String[] args) { // Initialize client ZaiClient client = ZaiClient.builder().ofZAI().apiKey("your-api-key").build(); // Create streaming chat completion request ChatCompletionCreateParams request = ChatCompletionCreateParams.builder() .model("glm-5") .messages( Arrays.asList( ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content( "As a marketing expert, please create an attractive slogan for my product.") .build(), ChatMessage.builder() .role(ChatMessageRole.ASSISTANT.value()) .content( "Sure, to craft a compelling slogan, please tell me more about your product.") .build(), ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content("Z.AI Open Platform") .build())) .thinking(ChatThinking.builder().type("enabled").build()) .stream(true) // Enable streaming output .maxTokens(4096) .temperature(1.0f) .build(); ChatCompletionResponse response = client.chat().createChatCompletion(request); if (response.isSuccess()) { response.getFlowable() .subscribe( // Process streaming message data data -> { if (data.getChoices() != null && !data.getChoices().isEmpty()) { Delta delta = data.getChoices().get(0).getDelta(); System.out.print(delta + "\n"); } }, // Process streaming response error error -> System.err.println("\nStream error: " + error.getMessage()), // Process streaming response 
completion event () -> System.out.println("\nStreaming response completed")); } else { System.err.println("Error: " + response.getMsg()); } } } ``` **Install SDK** ```bash theme={null} # Install or upgrade to latest version pip install --upgrade 'openai>=1.0' ``` **Verify Installation** ```python theme={null} python -c "import openai; print(openai.__version__)" ``` **Usage Example** ```python theme={null} from openai import OpenAI client = OpenAI( api_key="your-Z.AI-api-key", base_url="https://api.z.ai/api/paas/v4/", ) completion = client.chat.completions.create( model="glm-5", messages=[ {"role": "system", "content": "You are a smart and creative novelist"}, { "role": "user", "content": "Please write a short fairy tale story as a fairy tale master", }, ], ) print(completion.choices[0].message.content) ``` # GLM-5-Turbo Source: https://docs.z.ai/guides/llm/glm-5-turbo ## Overview **GLM-5-Turbo is a foundation model deeply optimized for the OpenClaw scenario.** It has been specifically optimized for the core requirements of OpenClaw tasks since the training phase, enhancing key capabilities such as tool invocation, command following, timed and persistent tasks, and long-chain execution. 
* Positioning: ClawBench Enhanced Model
* Input Modality: Text
* Output Modality: Text
* Context Length: 200K
* Maximum Output Tokens: 128K

## Capability

* Offers multiple thinking modes for different scenarios
* Supports real-time streaming responses to enhance the user interaction experience
* Provides powerful tool invocation capabilities, enabling integration with various external toolsets
* Uses an intelligent caching mechanism to optimize performance in long conversations
* Supports structured output formats such as JSON, facilitating system integration
* Flexibly integrates external MCP tools and data sources to expand use cases

## Introducing GLM-5-Turbo

From training data construction to the design of optimization objectives, we have systematically constructed a variety of OpenClaw task scenarios based on real-world agent workflows, ensuring that the model is truly capable of executing complex, dynamic, and long-chain tasks. We have significantly enhanced the following core capabilities:

* **Tool Calling (Precise Invocation, No Failures)**: GLM-5-Turbo has strengthened its ability to invoke external tools and various skills, ensuring greater stability and reliability in multi-step tasks, thereby enabling OpenClaw tasks to move from dialogue to execution.
* **Instruction Following (Enhanced Decomposition of Complex Instructions)**: The model demonstrates stronger comprehension and decomposition capabilities for complex, multi-layered, and long-chain instructions. It can accurately identify objectives, plan steps, and support collaborative task division among multiple agents.
* **Scheduled and Persistent Tasks (Better Understanding of Time Dimensions, Uninterrupted Long Tasks)**: Significantly optimized for scenarios involving scheduled triggers, continuous execution, and long-running tasks. It better understands time-related requirements and maintains execution continuity during complex, long-running tasks.
* **High-Throughput Long Chains (Faster and More Stable Execution)**: For OpenClaw tasks involving high data throughput and long logical chains, GLM-5-Turbo further enhances execution efficiency and response stability, making it better suited for integration into real-world business workflows.

With the growing adoption of **OpenClaw**, evaluating model performance in OpenClaw workflows has become a key focus across the industry. Based on extensive analysis of real OpenClaw use cases, we introduce **ZClawBench**, an end-to-end benchmark designed specifically for agent tasks in the OpenClaw ecosystem.

Current OpenClaw workloads span a wide range of task types, including environment setup, software development, information retrieval, data analysis, and content creation. The user base has also expanded beyond early developer adopters to include productivity users, financial professionals, operations engineers, content creators, and research analysts. Meanwhile, the usage of **Skills** has increased rapidly, from **26% to 45% in a short period of time**, highlighting a clear shift toward a more modular and skill-driven agent ecosystem.

Benchmark results show that **GLM-5-Turbo** delivers substantial improvements over **GLM-5** in OpenClaw scenarios, outperforming several leading models across multiple key task categories.

![Description](https://cdn.bigmodel.cn/markdown/1773654632089img_v3_02vr_d484b822-00d0-4ac0-8b7d-c02a4bb3213g.png?attname=img_v3_02vr_d484b822-00d0-4ac0-8b7d-c02a4bb3213g.png)

The **ZClawBench** dataset and full evaluation trajectories are now publicly available. We welcome the community to validate, reproduce, and further improve the benchmark.

## Resources

* [API Documentation](/api-reference/llm/chat-completion): Learn how to call the API.
* [OpenClaw Guide](/devpack/tool/openclaw#switching-to-glm-5-turbo-model): Learn how to integrate with OpenClaw.

## Quick Start

The following complete sample code helps you get started with GLM-5-Turbo.

**Basic Call** ```bash theme={null} curl -X POST "https://api.z.ai/api/paas/v4/chat/completions" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer your-api-key" \ -d '{ "model": "glm-5-turbo", "messages": [ { "role": "user", "content": "As a marketing expert, please create an attractive slogan for my product." }, { "role": "assistant", "content": "Sure, to craft a compelling slogan, please tell me more about your product." }, { "role": "user", "content": "Z.AI Open Platform" } ], "thinking": { "type": "enabled" }, "max_tokens": 4096, "temperature": 1.0 }' ``` **Streaming Call** ```bash theme={null} curl -X POST "https://api.z.ai/api/paas/v4/chat/completions" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer your-api-key" \ -d '{ "model": "glm-5-turbo", "messages": [ { "role": "user", "content": "As a marketing expert, please create an attractive slogan for my product." }, { "role": "assistant", "content": "Sure, to craft a compelling slogan, please tell me more about your product." 
}, { "role": "user", "content": "Z.AI Open Platform" } ], "thinking": { "type": "enabled" }, "stream": true, "max_tokens": 4096, "temperature": 1.0 }' ``` **Install SDK** ```bash theme={null} # Install latest version pip install zai-sdk # Or specify version pip install zai-sdk==0.2.2 ``` **Verify Installation** ```python theme={null} import zai print(zai.__version__) ``` **Basic Call** ```python theme={null} from zai import ZaiClient client = ZaiClient(api_key="your-api-key") # Your API Key response = client.chat.completions.create( model="glm-5-turbo", messages=[ { "role": "user", "content": "As a marketing expert, please create an attractive slogan for my product.", }, { "role": "assistant", "content": "Sure, to craft a compelling slogan, please tell me more about your product.", }, {"role": "user", "content": "Z.AI Open Platform"}, ], thinking={ "type": "enabled", }, max_tokens=4096, temperature=1.0, ) # Get complete response print(response.choices[0].message) ``` **Streaming Call** ```python theme={null} from zai import ZaiClient client = ZaiClient(api_key="your-api-key") # Your API Key response = client.chat.completions.create( model="glm-5-turbo", messages=[ { "role": "user", "content": "As a marketing expert, please create an attractive slogan for my product.", }, { "role": "assistant", "content": "Sure, to craft a compelling slogan, please tell me more about your product.", }, {"role": "user", "content": "Z.AI Open Platform"}, ], thinking={ "type": "enabled", # Optional: "disabled" or "enabled", default is "enabled" }, stream=True, max_tokens=4096, temperature=0.6, ) # Stream response for chunk in response: if chunk.choices[0].delta.reasoning_content: print(chunk.choices[0].delta.reasoning_content, end="", flush=True) if chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="", flush=True) ``` **Install SDK** **Maven** ```xml theme={null} ai.z.openapi zai-sdk 0.3.3 ``` **Gradle (Groovy)** ```groovy theme={null} implementation 
'ai.z.openapi:zai-sdk:0.3.3' ``` **Basic Call** ```java theme={null} import ai.z.openapi.ZaiClient; import ai.z.openapi.service.model.ChatCompletionCreateParams; import ai.z.openapi.service.model.ChatCompletionResponse; import ai.z.openapi.service.model.ChatMessage; import ai.z.openapi.service.model.ChatMessageRole; import ai.z.openapi.service.model.ChatThinking; import java.util.Arrays; public class BasicChat { public static void main(String[] args) { // Initialize client ZaiClient client = ZaiClient.builder().ofZAI().apiKey("your-api-key").build(); // Create chat completion request ChatCompletionCreateParams request = ChatCompletionCreateParams.builder() .model("glm-5-turbo") .messages( Arrays.asList( ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content( "As a marketing expert, please create an attractive slogan for my product.") .build(), ChatMessage.builder() .role(ChatMessageRole.ASSISTANT.value()) .content( "Sure, to craft a compelling slogan, please tell me more about your product.") .build(), ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content("Z.AI Open Platform") .build())) .thinking(ChatThinking.builder().type("enabled").build()) .maxTokens(4096) .temperature(1.0f) .build(); // Send request ChatCompletionResponse response = client.chat().createChatCompletion(request); // Get response if (response.isSuccess()) { Object reply = response.getData().getChoices().get(0).getMessage(); System.out.println("AI Response: " + reply); } else { System.err.println("Error: " + response.getMsg()); } } } ``` **Streaming Call** ```java theme={null} import ai.z.openapi.ZaiClient; import ai.z.openapi.service.model.ChatCompletionCreateParams; import ai.z.openapi.service.model.ChatCompletionResponse; import ai.z.openapi.service.model.ChatMessage; import ai.z.openapi.service.model.ChatMessageRole; import ai.z.openapi.service.model.ChatThinking; import ai.z.openapi.service.model.Delta; import java.util.Arrays; public class StreamingChat { public 
static void main(String[] args) { // Initialize client ZaiClient client = ZaiClient.builder().ofZAI().apiKey("your-api-key").build(); // Create streaming chat completion request ChatCompletionCreateParams request = ChatCompletionCreateParams.builder() .model("glm-5-turbo") .messages( Arrays.asList( ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content( "As a marketing expert, please create an attractive slogan for my product.") .build(), ChatMessage.builder() .role(ChatMessageRole.ASSISTANT.value()) .content( "Sure, to craft a compelling slogan, please tell me more about your product.") .build(), ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content("Z.AI Open Platform") .build())) .thinking(ChatThinking.builder().type("enabled").build()) .stream(true) // Enable streaming output .maxTokens(4096) .temperature(1.0f) .build(); ChatCompletionResponse response = client.chat().createChatCompletion(request); if (response.isSuccess()) { response.getFlowable() .subscribe( // Process streaming message data data -> { if (data.getChoices() != null && !data.getChoices().isEmpty()) { Delta delta = data.getChoices().get(0).getDelta(); System.out.print(delta + "\n"); } }, // Process streaming response error error -> System.err.println("\nStream error: " + error.getMessage()), // Process streaming response completion event () -> System.out.println("\nStreaming response completed")); } else { System.err.println("Error: " + response.getMsg()); } } } ``` **Install SDK** ```bash theme={null} # Install or upgrade to latest version pip install --upgrade 'openai>=1.0' ``` **Verify Installation** ```python theme={null} python -c "import openai; print(openai.__version__)" ``` **Usage Example** ```python theme={null} from openai import OpenAI client = OpenAI( api_key="your-Z.AI-api-key", base_url="https://api.z.ai/api/paas/v4/", ) completion = client.chat.completions.create( model="glm-5-turbo", messages=[ {"role": "system", "content": "You are a smart and creative 
novelist"}, { "role": "user", "content": "Please write a short fairy tale story as a fairy tale master", }, ], ) print(completion.choices[0].message.content) ``` # GLM-5.1 Source: https://docs.z.ai/guides/llm/glm-5.1 ## Overview **GLM-5.1** is Z.AI’s latest flagship model, designed for **long-horizon tasks**. It can work continuously and autonomously on a single task for up to 8 hours, completing the full loop from planning and execution to iterative optimization and delivering production-grade results.

In both general capability and coding performance, GLM-5.1 is broadly aligned with Claude Opus 4.6. It demonstrates stronger sustained execution in **long-horizon autonomous tasks, complex engineering optimization, and real-world development workflows**, making it an ideal foundation for building autonomous agents and long-horizon coding agents.

* Positioning: Flagship Foundation Model
* Input Modality: Text
* Output Modality: Text
* Context Length: 200K
* Maximum Output Tokens: 128K

## Capability

* Offers multiple thinking modes for different scenarios
* Supports real-time streaming responses to enhance the user interaction experience
* Provides powerful tool invocation capabilities, enabling integration with various external toolsets
* Uses an intelligent caching mechanism to optimize performance in long conversations
* Supports structured output formats such as JSON, facilitating system integration
* Flexibly integrates external MCP tools and data sources to expand application scenarios

## Usage

* Further optimized for agentic coding workflows such as Claude Code and OpenClaw, GLM-5.1 offers stronger long-horizon planning, stepwise execution, process adjustment, and result delivery. It performs significantly better on long-running development tasks and complex coding problems, making it well suited for real-world engineering work with multiple stages and strong interdependencies.
* More robust in open-ended Q\&A, complex instruction following, and multi-turn interactions, with richer responses, more complete content, stronger instruction adherence, and better long-context understanding. It is well suited for high-quality everyday assistance and complex information workflows.
* Further improved in literary expression, plot development, character portrayal, and style control, making it suitable for fiction excerpts, story concepts, and copywriting tasks that require strong expressiveness and consistency.
* Well suited for website generation, interactive pages, and front-end prototyping. Outputs show less templated structure, more diverse visual expression, and higher overall task completion quality, enabling a faster path from requirements to usable deliverables.
* Broadly improved across PowerPoint, Word, PDF, and Excel tasks, with stronger capabilities in complex content organization, layout design, and structured output. Default visual quality and overall polish are significantly improved, making it suitable for high-intensity production scenarios such as long-form documents, reports, teaching materials, and research papers.

## Introducing GLM-5.1

GLM-5.1 ranks among the world’s top-tier models in both overall capability and coding performance, with overall performance aligned with Claude Opus 4.6 and leading results across multiple key benchmarks.

![Description](https://cdn.bigmodel.cn/markdown/1775571965455img_v3_0210h_e53bcf0a-11aa-481c-aa2b-896e1b902eeg.png?attname=img_v3_0210h_e53bcf0a-11aa-481c-aa2b-896e1b902eeg.png)

On SWE-Bench Pro, GLM-5.1 achieves a score of **58.4**, outperforming GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro, setting a new state-of-the-art result. At the same time, across 12 representative benchmarks covering reasoning, coding, agents, tool use, and browsing, GLM-5.1 demonstrates a broad and well-balanced capability profile.

![Description](https://cdn.bigmodel.cn/markdown/1775572152820img_v3_0210h_7e69658d-d027-40da-b7e5-da12c080e41g.png?attname=img_v3_0210h_7e69658d-d027-40da-b7e5-da12c080e41g.png)

This shows that GLM-5.1 is not a single-metric improvement. Instead, it advances simultaneously across **general intelligence, real-world coding, and complex task execution**, making it a stronger foundation for general-purpose agent systems and engineering production scenarios.

GLM-5.1 shows especially strong gains on long-horizon tasks, with major improvements in **sustained execution, closed-loop optimization, and engineering delivery** under complex objectives.
Compared with models primarily designed for minute-level interactions, GLM-5.1 can work autonomously on a single task for up to 8 hours, completing the full process from planning and execution to testing, fixing, and delivery. Under the same evaluation standard, GLM-5.1 is one of the few models capable of 8-hour sustained execution, and the first Chinese model to reach this level. The way we evaluate model capability is shifting from “how smart it is in a single turn” to “how long it can work reliably on a long-horizon task, and what it can actually deliver.” This capability is not simply about having a longer context window. It requires the model to **maintain goal alignment over extended execution, reducing strategy drift, error accumulation, and ineffective trial and error**, and enabling truly autonomous execution for complex engineering tasks. One of GLM-5.1’s key breakthroughs is its ability to form an autonomous “**experiment–analyze–optimize**” loop in long-horizon tasks, rather than stopping at one-shot code generation. The model can proactively run benchmarks, identify bottlenecks, adjust strategies, and continuously improve results through iterative refinement. In representative cases, GLM-5.1 can build a complete Linux desktop system from scratch within 8 hours. It can autonomously carry out 655 iterations, completing the entire optimization pipeline and boosting vector database query throughput to 6.9× that of the initial production version. On the KernelBench Level 3 optimization benchmark, it performs thousands of tool-invocation-driven optimizations on real machine learning workloads, achieving a 3.6× geometric mean speedup—significantly surpassing the 1.49× achieved by torch.compile in max-autotune mode. 
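
For readers unfamiliar with the metric, the sketch below shows how a geometric mean speedup is computed and why it differs from a plain average. The per-workload speedups here are hypothetical numbers chosen for easy arithmetic; they are not KernelBench data, and the `geometric_mean` helper is an illustration, not part of any benchmark tooling.

```python
import math


def geometric_mean(values):
    """Return the geometric mean of a non-empty list of positive numbers."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty list of positive numbers")
    # Average the logarithms, then exponentiate: exp(mean(log(v))).
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-workload speedups (made up for illustration only).
speedups = [2.0, 8.0]

geo = geometric_mean(speedups)
arith = sum(speedups) / len(speedups)

print(f"geometric mean speedup:  {geo:.2f}x")
print(f"arithmetic mean speedup: {arith:.2f}x")
```

The geometric mean is the usual choice for averaging ratios such as speedups, because a single very large speedup inflates it less than it would inflate an arithmetic mean.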
These results show that GLM-5.1 is already capable of autonomous exploration, continuous improvement, and stable delivery in complex engineering environments, enabling it to take on higher-value tasks such as system building, performance optimization, and long-horizon coding agents. ## Resources * [API Documentation](/api-reference/llm/chat-completion): Learn how to call the API. ## Quick Start The following is a full sample code to help you onboard GLM-5.1 with ease. **Basic Call** ```bash theme={null} curl -X POST "https://api.z.ai/api/paas/v4/chat/completions" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer your-api-key" \ -d '{ "model": "glm-5.1", "messages": [ { "role": "user", "content": "As a marketing expert, please create an attractive slogan for my product." }, { "role": "assistant", "content": "Sure, to craft a compelling slogan, please tell me more about your product." }, { "role": "user", "content": "Z.AI Open Platform" } ], "thinking": { "type": "enabled" }, "max_tokens": 4096, "temperature": 1.0 }' ``` **Streaming Call** ```bash theme={null} curl -X POST "https://api.z.ai/api/paas/v4/chat/completions" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer your-api-key" \ -d '{ "model": "glm-5.1", "messages": [ { "role": "user", "content": "As a marketing expert, please create an attractive slogan for my product." }, { "role": "assistant", "content": "Sure, to craft a compelling slogan, please tell me more about your product." 
}, { "role": "user", "content": "Z.AI Open Platform" } ], "thinking": { "type": "enabled" }, "stream": true, "max_tokens": 4096, "temperature": 1.0 }' ``` **Install SDK** ```bash theme={null} # Install latest version pip install zai-sdk # Or specify version pip install zai-sdk==0.2.2 ``` **Verify Installation** ```python theme={null} import zai print(zai.__version__) ``` **Basic Call** ```python theme={null} from zai import ZaiClient client = ZaiClient(api_key="your-api-key") # Your API Key response = client.chat.completions.create( model="glm-5.1", messages=[ { "role": "user", "content": "As a marketing expert, please create an attractive slogan for my product.", }, { "role": "assistant", "content": "Sure, to craft a compelling slogan, please tell me more about your product.", }, { "role": "user", "content": "Z.AI Open Platform" }, ], thinking={ "type": "enabled", }, max_tokens=4096, temperature=1.0, ) # Get complete response print(response.choices[0].message) ``` **Streaming Call** ```python theme={null} from zai import ZaiClient client = ZaiClient(api_key="your-api-key") # Your API Key response = client.chat.completions.create( model="glm-5.1", messages=[ { "role": "user", "content": "As a marketing expert, please create an attractive slogan for my product.", }, { "role": "assistant", "content": "Sure, to craft a compelling slogan, please tell me more about your product.", }, { "role": "user", "content": "Z.AI Open Platform" }, ], thinking={ "type": "enabled", # Optional: "disabled" or "enabled", default is "enabled" }, stream=True, max_tokens=4096, temperature=0.6, ) # Stream response for chunk in response: if chunk.choices[0].delta.reasoning_content: print(chunk.choices[0].delta.reasoning_content, end="", flush=True) if chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="", flush=True) ``` **Install SDK** **Maven** ```xml theme={null} ai.z.openapi zai-sdk 0.3.3 ``` **Gradle (Groovy)** ```groovy theme={null} implementation 
'ai.z.openapi:zai-sdk:0.3.3' ``` **Basic Call** ```java theme={null} import ai.z.openapi.ZaiClient; import ai.z.openapi.service.model.ChatCompletionCreateParams; import ai.z.openapi.service.model.ChatCompletionResponse; import ai.z.openapi.service.model.ChatMessage; import ai.z.openapi.service.model.ChatMessageRole; import ai.z.openapi.service.model.ChatThinking; import java.util.Arrays; public class BasicChat { public static void main(String[] args) { // Initialize client ZaiClient client = ZaiClient.builder().ofZAI().apiKey("your-api-key").build(); // Create chat completion request ChatCompletionCreateParams request = ChatCompletionCreateParams.builder() .model("glm-5.1") .messages( Arrays.asList( ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content( "As a marketing expert, please create an attractive slogan for my product.") .build(), ChatMessage.builder() .role(ChatMessageRole.ASSISTANT.value()) .content( "Sure, to craft a compelling slogan, please tell me more about your product.") .build(), ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content("Z.AI Open Platform") .build())) .thinking(ChatThinking.builder().type("enabled").build()) .maxTokens(4096) .temperature(1.0f) .build(); // Send request ChatCompletionResponse response = client.chat().createChatCompletion(request); // Get response if (response.isSuccess()) { Object reply = response.getData().getChoices().get(0).getMessage(); System.out.println("AI Response: " + reply); } else { System.err.println("Error: " + response.getMsg()); } } } ``` **Streaming Call** ```java theme={null} import ai.z.openapi.ZaiClient; import ai.z.openapi.service.model.ChatCompletionCreateParams; import ai.z.openapi.service.model.ChatCompletionResponse; import ai.z.openapi.service.model.ChatMessage; import ai.z.openapi.service.model.ChatMessageRole; import ai.z.openapi.service.model.ChatThinking; import ai.z.openapi.service.model.Delta; import java.util.Arrays; public class StreamingChat { public static 
void main(String[] args) { // Initialize client ZaiClient client = ZaiClient.builder().ofZAI().apiKey("your-api-key").build(); // Create streaming chat completion request ChatCompletionCreateParams request = ChatCompletionCreateParams.builder() .model("glm-5.1") .messages( Arrays.asList( ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content( "As a marketing expert, please create an attractive slogan for my product.") .build(), ChatMessage.builder() .role(ChatMessageRole.ASSISTANT.value()) .content( "Sure, to craft a compelling slogan, please tell me more about your product.") .build(), ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content("Z.AI Open Platform") .build())) .thinking(ChatThinking.builder().type("enabled").build()) .stream(true) // Enable streaming output .maxTokens(4096) .temperature(1.0f) .build(); ChatCompletionResponse response = client.chat().createChatCompletion(request); if (response.isSuccess()) { response.getFlowable() .subscribe( // Process streaming message data data -> { if (data.getChoices() != null && !data.getChoices().isEmpty()) { Delta delta = data.getChoices().get(0).getDelta(); System.out.print(delta + "\n"); } }, // Process streaming response error error -> System.err.println("\nStream error: " + error.getMessage()), // Process streaming response completion event () -> System.out.println("\nStreaming response completed")); } else { System.err.println("Error: " + response.getMsg()); } } } ``` **Install SDK** ```bash theme={null} # Install or upgrade to latest version pip install --upgrade 'openai>=1.0' ``` **Verify Installation** ```python theme={null} python -c "import openai; print(openai.__version__)" ``` **Usage Example** ```python theme={null} from openai import OpenAI client = OpenAI( api_key="your-Z.AI-api-key", base_url="https://api.z.ai/api/paas/v4/", ) completion = client.chat.completions.create( model="glm-5.1", messages=[ {"role": "system", "content": "You are a smart and creative novelist"}, { 
"role": "user", "content": "Please write a short fairy tale story as a fairy tale master", }, ], ) print(completion.choices[0].message.content) ``` # Core Parameters Source: https://docs.z.ai/guides/overview/concept-param When interacting with models, you can control the model's output by adjusting different parameters to meet the needs of various scenarios. Understanding these core parameters will help you better utilize the model's capabilities. ## Quick Reference | Parameter | Type | Default Value | Description | | :-------------------------- | :------ | :-------------------- | :-------------------------------------------------------------------------------------------- | | [do\_sample](#do_sample) | Boolean | `true` | Whether to sample the output to increase diversity. | | [temperature](#temperature) | Float | (Model dependent) | Controls the randomness of output, higher values are more random. | | [top\_p](#top_p) | Float | (Model dependent) | Controls diversity through nucleus sampling, recommended to use either this or `temperature`. | | [max\_tokens](#max_tokens) | Integer | (Model dependent) | Limits the maximum number of tokens generated in a single call. | | [stream](#stream) | Boolean | `false` | Whether to return responses in streaming mode. | | [thinking](#thinking) | Object | `{"type": "enabled"}` | Whether to enable chain-of-thought deep thinking, only supported by `GLM-4.5` and above. | *** ## Parameter Details ### do\_sample `do_sample` is a boolean value (`true` or `false`) that determines whether to sample the model's output. * `true` (default): Performs random sampling based on the probability distribution of each token, increasing text diversity and creativity. Suitable for content creation, dialogue, and other scenarios. * `false`: Uses a greedy strategy, always selecting the token with the highest probability. Provides high deterministic output, suitable for scenarios requiring precise, factual answers. 

Best Practices:

* Set to `false` when you need reproducible, deterministic output.
* Set to `true` when you want the model to generate more diverse and interesting content, and use it in combination with `temperature` or `top_p`.

### temperature

The `temperature` parameter controls the randomness of the model's output.

* Lower values (e.g., 0.2): Make the probability distribution sharper, resulting in more deterministic and conservative output.
* Higher values (e.g., 0.8): Make the probability distribution flatter, resulting in more random and diverse output.

Best Practices:

* For scenarios requiring rigor and factual accuracy (such as knowledge Q\&A), use a lower `temperature`.
* For scenarios requiring creativity (such as content creation), try a higher `temperature`.
* Adjust only one of `temperature` and `top_p`.

### top\_p

`top_p` (nucleus sampling) controls diversity by sampling from the smallest set of tokens whose cumulative probability exceeds the threshold.

* Lower values (e.g., 0.2): Limit the sampling range, resulting in more deterministic output.
* Higher values (e.g., 0.9): Expand the sampling range, resulting in more diverse output.

Best Practices:

* If you want some diversity while maintaining content quality, `top_p` is a good choice (recommended values 0.8-0.95).
* Modifying both `temperature` and `top_p` at the same time is generally not recommended.

### max\_tokens

`max_tokens` limits the maximum number of tokens the model can generate in a single call. GLM-4.6 supports a maximum output length of 128K and GLM-4.5 supports 96K; setting `max_tokens` to no less than 1024 is recommended. Tokens are the basic units of text: 1 token typically equals approximately 0.75 English words or 1.5 Chinese characters. Setting an appropriate `max_tokens` controls response length and cost and avoids overly long outputs.
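
The rule of thumb above can be turned into a rough budget calculation. The sketch below is only an approximation under stated assumptions: the helper names `estimate_tokens` and `suggest_max_tokens` are hypothetical, the conversion ratios are the approximate figures quoted in the text, and the 1024 floor and 128K ceiling are the recommendation and the GLM-4.6 limit mentioned above. An actual tokenizer gives the real count.

```python
import math

# Approximate ratios quoted in the text (rules of thumb, not exact values).
WORDS_PER_TOKEN = 0.75   # about 0.75 English words per token
CHARS_PER_TOKEN = 1.5    # about 1.5 Chinese characters per token


def estimate_tokens(english_words=0, chinese_chars=0):
    """Roughly estimate the token count of a text from its word/character counts."""
    return math.ceil(english_words / WORDS_PER_TOKEN + chinese_chars / CHARS_PER_TOKEN)


def suggest_max_tokens(expected_words, floor=1024, ceiling=128 * 1024):
    """Clamp an estimated output size between a floor and a model ceiling.

    floor=1024 follows the "no less than 1024" recommendation; ceiling is the
    128K output limit quoted for GLM-4.6. Both are assumptions to adjust per model.
    """
    estimate = estimate_tokens(english_words=expected_words)
    return max(floor, min(estimate, ceiling))


print(estimate_tokens(english_words=750))   # 750 / 0.75 = 1000
print(estimate_tokens(chinese_chars=300))   # 300 / 1.5 = 200
print(suggest_max_tokens(300))              # estimate 400, raised to the 1024 floor
print(suggest_max_tokens(6000))             # 6000 / 0.75 = 8000
```

When thinking is enabled, leave extra headroom beyond the estimate, since this sketch only sizes the visible answer.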

If the model completes its answer before reaching the `max_tokens` limit, it ends naturally; if it reaches the limit, the output may be truncated.

* Purpose: Prevents generating overly long text and controls API call costs.
* Note: `max_tokens` limits the length of generated content and does not include the input.

Best Practices:

* Set `max_tokens` according to your application scenario. If you need short answers, you can set it to a smaller value (e.g., 50).

Default `max_tokens` and maximum supported `max_tokens` for each model:

| Model Code | Default max\_tokens | Maximum max\_tokens |
| :------------------ | :-----------------: | :-----------------: |
| glm-5.1 | 65536 | 131072 |
| glm-5v-turbo | 65536 | 131072 |
| glm-5-turbo | 65536 | 131072 |
| glm-5 | 65536 | 131072 |
| glm-4.7 | 65536 | 131072 |
| glm-4.6 | 65536 | 131072 |
| glm-4.6v | 16384 | 32768 |
| glm-4.6v-flash | 16384 | 32768 |
| glm-4.6v-flashx | 16384 | 32768 |
| glm-4.5 | 65536 | 98304 |
| glm-4.5-air | 65536 | 98304 |
| glm-4.5-x | 65536 | 98304 |
| glm-4.5-airx | 65536 | 98304 |
| glm-4.5-flash | 65536 | 98304 |
| glm-4.5v | 16384 | 16384 |
| glm-4-32b-0414-128k | 16384 | 16384 |

### stream

`stream` is a boolean value that controls how the API returns the response.

* `false` (default): Returns the complete response at once. Simple to implement, but with longer waiting times.
* `true`: Returns content in streaming (SSE) mode, significantly improving the experience of real-time interactive applications.

Best Practices:

* For chatbots, real-time code generation, and similar applications, setting this to `true` is strongly recommended.

### thinking

The `thinking` parameter controls whether the model enables "Chain of Thought" for deeper thinking and reasoning.

* Type: Object
* Supported Models: `GLM-4.5` and above

Properties:

* `type` (string):
  * `enabled` (default): Enable chain of thought. `GLM-4.6` and `GLM-4.5` decide automatically whether thinking is needed, while `GLM-4.5V` always thinks.
* `disabled`: Disable chain of thought. Best Practices: * It's recommended to enable this when you need the model to perform complex reasoning and planning. * For simple tasks, you can disable it to get faster responses. *** ## Related Concepts Tokens are the basic units for model text processing. Usage calculation includes both input and output parts. * **Input Token Count:** The number of tokens contained in the text you send to the model. * **Output Token Count:** The number of tokens contained in the text generated by the model. * **Total Token Count:** The sum of input and output, usually used as the billing basis. You can call the `tokenizer` API to estimate the token count of text. Maximum Output Tokens refers to the maximum number of tokens a model can generate in a single request. It's different from the `max_tokens` parameter - `max_tokens` is the upper limit you set in your request, while Maximum Output Tokens is the architectural limitation of the model itself. For example, a model's context window might be 8k tokens, but its maximum output capability might be limited to 4k tokens. The Context Window refers to the total number of tokens a model can process in a single interaction, including all tokens from both **input text** and **generated text**. * **Importance:** The context window determines how much historical information the model can "remember". If the total length of input and expected output exceeds the model's context window, the model will be unable to process it. * **Note:** Different models have different context window sizes. When conducting long conversations or processing long documents, special attention should be paid to context window limitations. Concurrency refers to the number of API requests you can initiate simultaneously. This is set by the platform to ensure service stability and fair resource allocation. * **Limits:** Different users or subscription plans may have different concurrency quotas. 
* **Overages:** If you exceed the concurrency limit, new requests may fail or wait in a queue.

If your application requires high-concurrency processing, check your account limits or contact platform support.

# Migrate to GLM-5.1

Source: https://docs.z.ai/guides/overview/migrate-to-glm-new

This guide explains how to migrate your calls from GLM-5, GLM-4.7, or other earlier models to Z.AI GLM-5.1, our most powerful coding model to date. It covers sampling parameter differences, streaming tool calls, and other key points.

## GLM-5.1 Features

* Support for larger context and output: maximum context 200K, maximum output 128K.
* New support for streaming output during the tool calling process (`tool_stream=true`), so tool call parameters can be retrieved in real time.
* Like the GLM-5 series, supports deep thinking (`thinking={ type: "enabled" }`); when enabled, the model always thinks.
* Superior code performance and advanced reasoning capabilities.

## Migration Checklist

* [ ] Update the model identifier to `glm-5.1`
* [ ] Sampling parameters: `temperature` defaults to `1.0` and `top_p` defaults to `0.95`; tune only one of them
* [ ] Deep thinking: enable or disable `thinking={ type: "enabled" }` as needed for complex reasoning and coding
* [ ] Streaming response: enable `stream=true` and handle both `delta.reasoning_content` and `delta.content`
* [ ] Streaming tool calls: enable `stream=true` and `tool_stream=true`, and concatenate the streamed `delta.tool_calls[*].function.arguments`
* [ ] Maximum output and context: set `max_tokens` appropriately (GLM-5.1 maximum output 128K, context 200K)
* [ ] Prompt optimization: pair deep thinking with clearer instructions and constraints
* [ ] Development environment verification: run use case and regression tests, focusing on randomness, latency, and parameter completeness in tool streams

## Start Migration

### 1. Update Model Identifier

* Update `model` to `glm-5.1`.
```python theme={null}
from zai import ZaiClient

# Initialize the client (same setup as in the Quick Start)
client = ZaiClient(api_key="YOUR_API_KEY")

resp = client.chat.completions.create(
    model="glm-5.1",
    messages=[{"role": "user", "content": "Briefly describe the advantages of GLM-5"}]
)
```

### 2. Update Sampling Parameters

* `temperature`: Controls randomness; higher values are more divergent, lower values are more stable.
* `top_p`: Controls nucleus sampling; higher values expand the candidate set, lower values narrow it.
* `temperature` defaults to `1.0` and `top_p` defaults to `0.95`; adjusting both at the same time is not recommended.

```python theme={null}
# Plan A: Use temperature (recommended)
resp = client.chat.completions.create(
    model="glm-5.1",
    messages=[{"role": "user", "content": "Write a more creative brand introduction"}],
    temperature=1.0
)

# Plan B: Use top_p
resp = client.chat.completions.create(
    model="glm-5.1",
    messages=[{"role": "user", "content": "Generate more stable technical documentation"}],
    top_p=0.8
)
```

### 3. Deep Thinking (Optional)

* GLM-5.1 continues to support deep thinking, which is enabled by default.
* Enabling it is recommended for complex reasoning and coding tasks:

```python theme={null}
resp = client.chat.completions.create(
    model="glm-5.1",
    messages=[{"role": "user", "content": "Design a three-tier microservice architecture for me"}],
    thinking={"type": "enabled"}
)
```

### 4. Streaming Output and Tool Calls (Optional)

* GLM-5.1 supports real-time streaming construction and output during the tool calling process. It is disabled by default (`False`) and requires enabling both:
  * `stream=True`: Enable streaming output for responses
  * `tool_stream=True`: Enable streaming output for tool call parameters

```python theme={null}
response = client.chat.completions.create(
    model="glm-5.1",
    messages=[{"role": "user", "content": "How's the weather in Beijing"}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get current weather conditions for a specified location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "City, e.g.: Beijing, Shanghai"},
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                    },
                    "required": ["location"]
                }
            }
        }
    ],
    stream=True,
    tool_stream=True,
)

# Initialize streaming collection variables
reasoning_content = ""
content = ""
final_tool_calls = {}
reasoning_started = False
content_started = False

# Process streaming response
for chunk in response:
    if not chunk.choices:
        continue

    delta = chunk.choices[0].delta

    # Streaming reasoning process output
    if hasattr(delta, 'reasoning_content') and delta.reasoning_content:
        if not reasoning_started and delta.reasoning_content.strip():
            print("\n🧠 Thinking Process:")
            reasoning_started = True
        reasoning_content += delta.reasoning_content
        print(delta.reasoning_content, end="", flush=True)

    # Streaming answer content output
    if hasattr(delta, 'content') and delta.content:
        if not content_started and delta.content.strip():
            print("\n\n💬 Answer Content:")
            content_started = True
        content += delta.content
        print(delta.content, end="", flush=True)

    # Streaming tool call information (parameter concatenation)
    if delta.tool_calls:
        for tool_call in delta.tool_calls:
            idx = tool_call.index
            if idx not in final_tool_calls:
                # First fragment for this tool call: store it as the accumulator
                final_tool_calls[idx] = tool_call
                final_tool_calls[idx].function.arguments = tool_call.function.arguments
            else:
                # Later fragments: append to the accumulated argument string
                final_tool_calls[idx].function.arguments += tool_call.function.arguments

# Output final tool call information
if final_tool_calls:
    print("\n📋 Function Calls Triggered:")
    for idx, tool_call in final_tool_calls.items():
        print(f"  {idx}: Function Name: {tool_call.function.name}, Parameters: {tool_call.function.arguments}")
```

See: [Tool Streaming Output Documentation](/guides/tools/stream-tool)

### 5. Testing and Regression

> First verify in a development environment that post-migration calls are stable. Focus on:

* Whether responses meet expectations, and whether the output is excessively random or excessively conservative
* Whether tool streaming construction and output work normally
* Latency and cost in long-context and deep thinking scenarios

## More Resources

* Common model parameter concepts and sampling recommendations
* View tool streaming output usage details
* View complete API documentation
* Get technical support and help

# Overview

Source: https://docs.z.ai/guides/overview/overview

Z.AI offers a variety of models and agents to meet the needs of different scenarios. Choosing the right model can help you complete tasks more efficiently.

## Featured Models

Leading Open-Source Coding Capabilities
Significant Gains on Long-Horizon Tasks
Multimodal coding model, specializing in visual programming. Supports text-to-image generation, achieving open-source state-of-the-art in complex scenarios
## Models, Agents and Tools

To help you find the best fit for your use case, we've created a table outlining the core features and strengths of each model in the Z.AI family. For pricing information, go directly to [Pricing](/guides/overview/pricing).

### Text Models

Our model matrix includes text models with built-in reasoning capabilities, as well as vision-language models (VLMs) that extend the same reasoning power to multimodal understanding.

| Model | Strength | Language | Context | Resource |
| :--- | :--- | :--- | :--- | :--- |
| GLM-5.1 | Coding proficiency aligned with Opus 4.6; Ability to work independently and consistently for up to 8 hours on a single task | English & Chinese | 200K | [Guide](/guides/llm/glm-5.1), [API Reference](/api-reference/llm/chat-completion) |
| GLM-5 | Programming ability; Agentic Long-Term Planning and Execution; Backend refactoring and in-depth debugging | English & Chinese | 200K | [Guide](/guides/llm/glm-5), [API Reference](/api-reference/llm/chat-completion) |
| GLM-5-Turbo | Optimization of Core Requirements for OpenClaw Tasks; Improved continuity in the execution of complex tasks | English & Chinese | 200K | [Guide](/guides/llm/glm-5-turbo), [API Reference](/api-reference/llm/chat-completion) |
| GLM-4.7 | SOTA Performance; Enhanced General Capabilities; Optimized Agentic Coding | English & Chinese | 200K | [Guide](/guides/llm/glm-4.7), [API Reference](/api-reference/llm/chat-completion) |
| GLM-4.7-FlashX | Enhanced General Capabilities; Optimized Agentic Coding; Lightweight & High-Speed | English & Chinese | 200K | [Guide](/guides/llm/glm-4.7), [API Reference](/api-reference/llm/chat-completion) |
| GLM-4.6 | High Performance; Strong Coding; More Versatile | English & Chinese | 200K | [Guide](/guides/llm/glm-4.6), [API Reference](/api-reference/llm/chat-completion) |
| GLM-4.5 | Better Performance; Strong Reasoning; More Versatile | English & Chinese | 128K | [Guide](/guides/llm/glm-4.5), [API Reference](/api-reference/llm/chat-completion) |
| GLM-4.5-X | Good Performance; Strong Reasoning; Ultra-Fast Response | English & Chinese | 128K | [Guide](/guides/llm/glm-4.5), [API Reference](/api-reference/llm/chat-completion) |
| GLM-4.5-Air | Cost-Effective; Lightweight; High Performance | English & Chinese | 128K | [Guide](/guides/llm/glm-4.5), [API Reference](/api-reference/llm/chat-completion) |
| GLM-4.5-AirX | Lightweight; High Performance; Ultra-Fast Response | English & Chinese | 128K | [Guide](/guides/llm/glm-4.5), [API Reference](/api-reference/llm/chat-completion) |
| GLM-4-32B-0414-128K | High intelligence at unmatched cost-efficiency | English & Chinese | 128K | [Guide](/guides/llm/glm-4-32b-0414-128k), [API Reference](/api-reference/llm/chat-completion) |
| GLM-4.7-Flash | Free, Lightweight; High Performance | English & Chinese | 200K | [Guide](/guides/llm/glm-4.7), [API Reference](/api-reference/llm/chat-completion) |
| GLM-4.5-Flash | Free, Lightweight; Strong Reasoning | English & Chinese | 200K | [Guide](/guides/llm/glm-4.5), [API Reference](/api-reference/llm/chat-completion) |

### Vision Models

Vision models process images or videos for recognition and analysis.

| Model | Strength | Language | Context | Resource |
| :--- | :--- | :--- | :--- | :--- |
| GLM-5V-Turbo | Multimodal Coding Capabilities; Context Size Increased to 200K; Deep Integration with Agent Workflows | English & Chinese | 200K | [Guide](/guides/vlm/glm-5v-turbo), [API Reference](/api-reference/llm/chat-completion) |
| GLM-4.6V | Native Function Call Support; Thinking Mode Switch Support | English & Chinese | 128K | [Guide](/guides/vlm/glm-4.6v), [API Reference](/api-reference/llm/chat-completion) |
| GLM-OCR | Document Parsing; Information Extraction | Multiple | / | [Guide](/guides/vlm/glm-ocr), [API Reference](/api-reference/tools/layout-parsing) |
| GLM-4.6V-FlashX | Native Function Call Support; Thinking Mode Switch Support; Lightweight & High-Speed | English & Chinese | 128K | [Guide](/guides/vlm/glm-4.6v), [API Reference](/api-reference/llm/chat-completion) |
| GLM-4.5V | Multimodal; Flexible Reasoning | English & Chinese | 64K | [Guide](/guides/vlm/glm-4.5v), [API Reference](/api-reference/llm/chat-completion) |
| GLM-4.6V-Flash | Free, Native Function Call Support | English & Chinese | 128K | [Guide](/guides/vlm/glm-4.6v), [API Reference](/api-reference/llm/chat-completion) |

### Built-in Tools

A suite of built-in tools designed to streamline workflows and boost productivity.

| Tool | Capability |
| :--- | :--- |
| Web Search | Provide real-time, concise, direct answers; Accurately parse complex HTML and convert it into clean Markdown or JSON |

### Image Generation Models

Image Generation Models learn from massive image data to automatically generate high-quality images from text.

| Model | Strength | Language | Resolution | Resource |
| :--- | :--- | :--- | :--- | :--- |
| GLM-Image | Stronger in complex instruction and knowledge-intensive scenarios; Open-source SOTA in text rendering | English & Chinese | multiple resolutions | [Guide](/guides/image/glm-image), [API Reference](/api-reference/image/generate-image) |
| CogView-4 | High-quality image generation; Diverse styles; Rich in detail | English & Chinese | multiple resolutions | [Guide](/guides/image/cogview-4), [API Reference](/api-reference/image/generate-image) |

### Video Generation Models

Video Generation Models turn text, images, or clips into dynamic video content, accelerating creativity for film, virtual avatars, animation, and marketing.

| Model | Strength | Language | Resolution | Resource |
| :--- | :--- | :--- | :--- | :--- |
| CogVideoX-3 | Significant improvements in image quality, stability, and physical realism simulation | English & Chinese | multiple resolutions | [Guide](/guides/video/cogvideox-3), [API Reference](/api-reference/video/cogvideox-3\&vidu) |
| ViduQ1 | Theatrical quality with seamless temporal flow | English & Chinese | 1080P | [Guide](/guides/video/vidu-q1), [API Reference](/api-reference/video/cogvideox-3\&vidu) |
| Vidu2 | Fast delivery with smart style preservation | English & Chinese | 720P | [Guide](/guides/video/vidu2), [API Reference](/api-reference/video/cogvideox-3\&vidu) |

### Audio Models

Audio models are a class of multimodal models that process audio and video signals, enabling the understanding, generation, or editing of audiovisual content.

| Model | Strength | Multimodal Support | Resource |
| :--- | :--- | :--- | :--- |
| GLM-ASR-2512 | CER as low as 0.0717; Support user-defined vocabularies; Support multiple mainstream languages and dialects | Audio | [Guide](/guides/audio/glm-asr-2512), [API Reference](/api-reference/audio/audio-transcriptions) |

### Agents

A set of ready-made agents empowers users to create and communicate effortlessly.

| Agent | Capability | Resource |
| :--- | :--- | :--- |
| GLM Slide/Poster Agent(beta) | Combine content generation with professional design | [Guide](/guides/agents/slide) |
| General-Purpose Translation | Support 40+ languages, flexible strategies, and terminology customization | [Guide](/guides/agents/slide) |
| Popular Special Effects Video Templates | Special effects video templates like French\_Kiss, BodyShake, and Sexy\_Me | [Guide](/guides/agents/slide) |

# Pricing

Source: https://docs.z.ai/guides/overview/pricing

This page provides pricing information for Z.AI’s models and tools. All prices are in USD.

## Models

### Text Models

Prices per 1M tokens.

| Model | Input | Cached Input | Cached Input Storage | Output |
| :--- | :--- | :--- | :--- | :--- |
| GLM-5.1 | \$1.4 | \$0.26 | Limited-time Free | \$4.4 |
| GLM-5 | \$1 | \$0.2 | Limited-time Free | \$3.2 |
| GLM-5-Turbo | \$1.2 | \$0.24 | Limited-time Free | \$4.0 |
| GLM-4.7 | \$0.6 | \$0.11 | Limited-time Free | \$2.2 |
| GLM-4.7-FlashX | \$0.07 | \$0.01 | Limited-time Free | \$0.4 |
| GLM-4.6 | \$0.6 | \$0.11 | Limited-time Free | \$2.2 |
| GLM-4.5 | \$0.6 | \$0.11 | Limited-time Free | \$2.2 |
| GLM-4.5-X | \$2.2 | \$0.45 | Limited-time Free | \$8.9 |
| GLM-4.5-Air | \$0.2 | \$0.03 | Limited-time Free | \$1.1 |
| GLM-4.5-AirX | \$1.1 | \$0.22 | Limited-time Free | \$4.5 |
| GLM-4-32B-0414-128K | \$0.1 | - | - | \$0.1 |
| GLM-4.7-Flash | Free | Free | Free | Free |
| GLM-4.5-Flash | Free | Free | Free | Free |

### Vision Models

Prices per 1M tokens.
| Model | Input | Cached Input | Cached Input Storage | Output | | :-------------- | :----- | :----------- | :------------------- | :----- | | GLM-5V-Turbo | \$1.2 | \$0.24 | Limited-time Free | \$4 | | GLM-4.6V | \$0.3 | \$0.05 | Limited-time Free | \$0.9 | | GLM-OCR | \$0.03 | \\ | \\ | \$0.03 | | GLM-4.6V-FlashX | \$0.04 | \$0.004 | Limited-time Free | \$0.4 | | GLM-4.5V | \$0.6 | \$0.11 | Limited-time Free | \$1.8 | | GLM-4.6V-Flash | Free | Free | Free | Free | ### Built-in Tools | Tool | Cost | | :--------- | :----------- | | Web Search | \$0.01 / use | ### Image Generation Models Prices per image. | Model | Price | | :-------- | :------ | | GLM-Image | \$0.015 | | CogView-4 | \$0.01 | ### Video Generation Models Prices per video. | Model | Price | | :--------------- | :---- | | CogVideoX-3 | \$0.2 | | ViduQ1-Text | \$0.4 | | ViduQ1-Image | \$0.4 | | ViduQ1-Start-End | \$0.4 | | Vidu2-Image | \$0.2 | | Vidu2-Start-End | \$0.2 | | Vidu2-Reference | \$0.4 | ### Audio Models | Model | Price | | :----------- | :---------------------------------------------------------- | | GLM-ASR-2512 | \$0.03 / MTok (equivalent to approximately \$0.0024/minute) | ### Agents | Agent | Price | | :-------------------------------------- | :------------ | | GLM Slide/Poster Agent(beta) | \$0.7 / MTok | | General-Purpose Translation | \$3 / MTok | | Popular Special Effects Video Templates | \$0.2 / video | # Quick Start Source: https://docs.z.ai/guides/overview/quick-start Tired of limits? GLM Coding Plan — monthly access to world-class models, compatible with top coding tools like Claude Code and Cline. All from just \$18/month. [Try it now →](https://z.ai/subscribe?utm_campaign=Platform_Ops&_channel_track_key=DaprgHIc) ## Getting Started * Access [Z.AI Open Platform](https://z.ai/model-api), Register or Login. * Access [Billing Page](https://z.ai/manage-apikey/billing) to top up if needed. 
* Create an API Key in the [API Keys](https://z.ai/manage-apikey/apikey-list) management page.
* Copy your API Key for use.

> The platform offers multiple models, and you can select the appropriate model based on your needs. For detailed model introductions, please refer to the [Models & Agents](pricing).

* Zai's new-generation flagship foundation model, targeting Agentic Engineering, enables a paradigm shift from code to engineering.
* Multimodal Coding model, specializing in visual programming.
* Supports text-to-image generation, achieving open-source state-of-the-art (SOTA) in complex scenarios.
* New frame generation capabilities that significantly improve image stability and clarity.

Our platform provides various development approaches; you can select the best fit for your project needs and tech stack.

* Standard RESTful API, compatible with all programming languages.
* Official Python SDK, featuring full type hints and async support.
* Official Java SDK, designed for high concurrency and availability.
* OpenAI SDK compatibility, for quick migration from OpenAI.
* Complete API documentation with parameter descriptions.

After preparing your `API Key` and selecting a model, you can start making API calls. Here are examples using `curl`, `Python SDK`, and `Java SDK`:

Note: When using the [GLM Coding Plan](/devpack/overview), you need to configure the dedicated Coding endpoint, [https://api.z.ai/api/coding/paas/v4](https://api.z.ai/api/coding/paas/v4), instead of the general endpoint, [https://api.z.ai/api/paas/v4](https://api.z.ai/api/paas/v4).

Note: The GLM Coding Plan API endpoint is intended for [supported tools](https://docs.z.ai/devpack/tool/others#step-1-supported-tools) only. For other use cases, we recommend using the general API endpoint.
```bash theme={null}
curl -X POST "https://api.z.ai/api/paas/v4/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Accept-Language: en-US,en" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "glm-5.1",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful AI assistant."
      },
      {
        "role": "user",
        "content": "Hello, please introduce yourself."
      }
    ]
  }'
```

**Install SDK**

```bash theme={null}
# Install latest version
pip install zai-sdk

# Or specify version
pip install zai-sdk==0.2.2
```

**Verify Installation**

```python theme={null}
import zai
print(zai.__version__)
```

**Usage Example**

```python theme={null}
from zai import ZaiClient

# Initialize client
client = ZaiClient(api_key="YOUR_API_KEY")

# Create chat completion request
response = client.chat.completions.create(
    model="glm-5.1",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful AI assistant."
        },
        {
            "role": "user",
            "content": "Hello, please introduce yourself."
        }
    ]
)

# Get response
print(response.choices[0].message.content)
```

**Install SDK**

**Maven**

```xml theme={null}
<dependency>
    <groupId>ai.z.openapi</groupId>
    <artifactId>zai-sdk</artifactId>
    <version>0.3.3</version>
</dependency>
```

**Gradle (Groovy)**

```groovy theme={null}
implementation 'ai.z.openapi:zai-sdk:0.3.3'
```

**Usage Example**

```java theme={null}
import ai.z.openapi.ZaiClient;
import ai.z.openapi.service.model.*;
import java.util.Arrays;

public class QuickStart {
    public static void main(String[] args) {
        // Initialize client
        ZaiClient client = ZaiClient.builder().ofZAI()
            .apiKey("YOUR_API_KEY")
            .build();

        // Create chat completion request
        ChatCompletionCreateParams request = ChatCompletionCreateParams.builder()
            .model("glm-5.1")
            .messages(Arrays.asList(
                ChatMessage.builder()
                    .role(ChatMessageRole.USER.value())
                    .content("Hello, who are you?")
                    .build()
            ))
            .stream(false)
            .build();

        // Send request
        ChatCompletionResponse response = client.chat().createChatCompletion(request);

        // Get response
        System.out.println(response.getData().getChoices().get(0).getMessage().getContent());
    }
}
```

**Install SDK**

```bash theme={null}
# Install or upgrade to latest version
pip install --upgrade 'openai>=1.0'
```

**Verify Installation**

```bash theme={null}
python -c "import openai; print(openai.__version__)"
```

**Usage Example**

```python theme={null}
from openai import OpenAI

client = OpenAI(
    api_key="your-Z.AI-api-key",
    base_url="https://api.z.ai/api/paas/v4/"
)

completion = client.chat.completions.create(
    model="glm-5.1",
    messages=[
        {"role": "system", "content": "You are a smart and creative novelist"},
        {"role": "user", "content": "Please write a short fairy tale story as a fairy tale master"}
    ]
)

print(completion.choices[0].message.content)
```

**Install SDK**

```bash theme={null}
# Install or upgrade to latest version
npm install openai

# Or using yarn
yarn add openai
```

**Usage Example**

```javascript theme={null}
import OpenAI from "openai";

const client = new OpenAI({
    apiKey: "your-Z.AI-api-key",
    baseURL: "https://api.z.ai/api/paas/v4/"
});

async function main() {
    const completion = await client.chat.completions.create({
        model: "glm-5.1",
        messages: [
            { role: "system", content: "You are a helpful AI assistant." },
            { role: "user", content: "Hello, please introduce yourself." }
        ]
    });

    console.log(completion.choices[0].message.content);
}

main();
```

**Install SDK**

**Maven**

```xml theme={null}
<dependency>
    <groupId>com.openai</groupId>
    <artifactId>openai-java</artifactId>
    <version>2.20.1</version>
</dependency>
```

**Gradle (Groovy)**

```groovy theme={null}
implementation 'com.openai:openai-java:2.20.1'
```

**Usage Example**

```java theme={null}
import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import com.openai.models.chat.completions.ChatCompletion;
import com.openai.models.chat.completions.ChatCompletionCreateParams;

public class QuickStart {
    public static void main(String[] args) {
        // Initialize client
        OpenAIClient client = OpenAIOkHttpClient.builder()
            .apiKey("your-Z.AI-api-key")
            .baseUrl("https://api.z.ai/api/paas/v4/")
            .build();

        // Create chat completion request
        ChatCompletionCreateParams params = ChatCompletionCreateParams.builder()
            .addSystemMessage("You are a helpful AI assistant.")
            .addUserMessage("Hello, please introduce yourself.")
            .model("glm-5.1")
            .build();

        // Send request and get response
        ChatCompletion chatCompletion = client.chat().completions().create(params);
        Object response = chatCompletion.choices().get(0).message().content();

        System.out.println(response);
    }
}
```

### Get More

* Access API Reference.
* Access Python SDK Github
* Access Java SDK Github

# Stream Tool Call

Source: https://docs.z.ai/guides/tools/stream-tool

Stream Tool Call is a unique feature of Z.ai's latest GLM-4.6 model. It gives real-time access to the reasoning process, response content, and tool call information during tool invocation, providing a better user experience and real-time feedback.

## Features

Tool calling in the latest GLM model now supports streaming output for responses. This allows developers to stream tool usage parameters without buffering or JSON validation when calling `chat.completions`, thereby reducing call latency and providing a better user experience.
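Because tool call arguments are streamed as unvalidated string fragments, the caller is left to concatenate them and parse the result once the stream ends. The sketch below shows that step in isolation, using hard-coded fragments in place of a real stream. The fragment values and the `assemble_arguments` helper are illustrative assumptions, not part of the SDK.

```python
import json


def assemble_arguments(fragments):
    """Join streamed argument fragments and parse them as JSON.

    Returns the parsed dict, or None if the joined text is not valid JSON
    (for example, when the stream was cut off early).
    """
    raw = "".join(fragments)
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None


# Simulated fragments, as they might arrive across several chunks.
fragments = ['{"loc', 'ation": "Bei', 'jing", "unit"', ': "celsius"}']

print(assemble_arguments(fragments))      # {'location': 'Beijing', 'unit': 'celsius'}
print(assemble_arguments(fragments[:2]))  # None (incomplete JSON)
```

Parsing only after the final fragment keeps the streaming loop cheap; a partial string can still be shown to the user as-is while the call is being built.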
### Core Parameter Description * **`stream=True`**: Enable streaming output, must be set to `True` * **`tool_stream=True`**: Enable tool call streaming output * **`model`**: Use a model that supports tool calling, limited to `glm-4.6` `glm-4.7` `glm-5` ### Response Parameter Description The `delta` object in streaming responses contains the following fields: * **`reasoning_content`**: Text content of the model's reasoning process * **`content`**: Text content of the model's response * **`tool_calls`**: Tool call information, including function names and parameters ## Code Example By setting the `tool_stream=True` parameter, you can enable streaming tool call functionality: **Install SDK** ```bash theme={null} # Install latest version pip install zai-sdk # Or specify version pip install zai-sdk==0.2.2 ``` **Verify Installation** ```python theme={null} import zai print(zai.__version__) ``` **Complete Example** ```python theme={null} from zai import ZaiClient # Initialize client client = ZaiClient(api_key='Your API key') # Create streaming tool call request response = client.chat.completions.create( model="glm-4.6", # Use model that supports tool calling messages=[ {"role": "user", "content": "How's the weather in Beijing?"}, ], tools=[ { "type": "function", "function": { "name": "get_weather", "description": "Get current weather conditions for a specified location", "parameters": { "type": "object", "properties": { "location": {"type": "string", "description": "City, e.g.: Beijing, Shanghai"}, "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]} }, "required": ["location"] } } } ], stream=True, # Enable streaming output tool_stream=True # Enable tool call streaming output ) # Initialize variables to collect streaming data reasoning_content = "" # Reasoning process content content = "" # Response content final_tool_calls = {} # Tool call information reasoning_started = False # Reasoning process start flag content_started = False # Content output start flag # 
Process streaming response for chunk in response: if not chunk.choices: continue delta = chunk.choices[0].delta # Handle streaming reasoning process output if hasattr(delta, 'reasoning_content') and delta.reasoning_content: if not reasoning_started and delta.reasoning_content.strip(): print("\n🧠 Thinking Process:") reasoning_started = True reasoning_content += delta.reasoning_content print(delta.reasoning_content, end="", flush=True) # Handle streaming response content output if hasattr(delta, 'content') and delta.content: if not content_started and delta.content.strip(): print("\n\n💬 Response Content:") content_started = True content += delta.content print(delta.content, end="", flush=True) # Handle streaming tool call information if delta.tool_calls: for tool_call in delta.tool_calls: index = tool_call.index if index not in final_tool_calls: # New tool call final_tool_calls[index] = tool_call final_tool_calls[index].function.arguments = tool_call.function.arguments else: # Append tool call parameters (streaming construction) final_tool_calls[index].function.arguments += tool_call.function.arguments # Output final tool call information if final_tool_calls: print("\n📋 Function Calls Triggered:") for index, tool_call in final_tool_calls.items(): print(f" {index}: Function Name: {tool_call.function.name}, Parameters: {tool_call.function.arguments}") ``` ## Use Cases * Real-time display of query progress * Improve waiting experience * Real-time code analysis process * Display tool call chain # Web Search Source: https://docs.z.ai/guides/tools/web-search Z.AI provides developers with a full suite of AI search tools, covering three core services: **basic search (Web Search API)**, **retrieval-augmented generation (Web Search in Chat)**, and **intelligent search agents (Search Agent)**. Through a unified API, offering end-to-end capabilities—from raw web data retrieval and fusion of search results with LLM output, to multi-turn dialogue context management. 
This empowers developers to build trustworthy, real-time, and traceable AI applications at **lower cost**. * View [Product Price](/guides/overview/pricing) * View Your [API Key](https://chat.z.ai) ## Product Overview | Service Module | Developer Value | Technical Features | | :----------------- | :--------------------------------------------------------------------------------------------- | :------------------------------------------------------------- | | Web Search API | Directly obtain **structured search results** (title/summary/link, etc.) | Multi-Search Engine Support | | Web Search in Chat | Incorporate search results into large model-generated **answers with cited** **web sources**. | Seamless Integration of Real-Time Retrieval and LLM Generation | ## Web Search API Web Search API is a search engine specifically designed for large language models. Building on the traditional capabilities of webpage crawling and ranking, it enhances intent recognition and returns results that are more suitable for LLM processing (webpage titles, URLs, summaries, site names, and favicons). 1. Intent-Enhanced Retrieval: Intelligently identifies the user’s query intent and automatically determines whether web search is needed. 2. Structured Output: Returns data formats optimized for LLMs, including titles, URLs, summaries, site names, and icons. 3. Customizable Search Scope: Allows developers to specify the number of results, domain constraints, and time ranges. It also supports adjustable summary lengths for fine-grained search behavior control. 4. Time-Aware Output Control: The response can include the webpage’s publication time, facilitating timeliness analysis and result ranking. 
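To illustrate the timeliness analysis mentioned in item 4, the sketch below orders result entries by publication date. It assumes each result is a plain dict with `title`, `link`, and `publish_date` (formatted as `YYYY-MM-DD`) keys; the sample data and the `sort_by_recency` helper are made up for illustration, and a real response object may need converting to dicts first.

```python
from datetime import date


def sort_by_recency(results):
    """Return results ordered newest first; entries without a date go last."""
    def key(item):
        raw = item.get("publish_date")
        # Missing or empty dates sort as the earliest possible date.
        return date.fromisoformat(raw) if raw else date.min

    return sorted(results, key=key, reverse=True)


sample = [
    {"title": "A", "link": "https://example.com/a", "publish_date": "2025-05-21"},
    {"title": "B", "link": "https://example.com/b", "publish_date": "2025-05-23"},
    {"title": "C", "link": "https://example.com/c", "publish_date": ""},
]

for item in sort_by_recency(sample):
    print(item["title"], item["publish_date"] or "(no date)")
```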
**API Call**

* API Docs: [Web Search API](/api-reference/tools/web-search)
* Example: Search Financial News

```python Search Financial News theme={null}
from zai import ZaiClient

client = ZaiClient(api_key="")  # Fill in your own APIKey

response = client.web_search.web_search(
    search_engine="search-prime",
    search_query="search economic events",
    count=15,  # The number of results to return, ranging from 1-50, default 10
    search_domain_filter="www.sohu.com",  # Only access content from specified domain names.
    search_recency_filter="noLimit",  # Search for content within specified date ranges
)
print(response)
```

```json theme={null}
WebSearchResp(
{
  "created": 1748261757,
  "id": "20250526201557dda85ca6801b467b",
  "request_id": "20250526201557dda85ca6801b467b",
  "search_result": [
    {
      "content": "1. China's outward direct investment from January to April reached $57.54 billion, a year-on-year increase of 7.5%. The trade-in policy continues to show results, with retail sales of home appliances maintaining double-digit growth for eight consecutive months.\n2. China has implemented multiple measures to accelerate the construction of a sci-tech financial system. The CSRC will support tech companies breaking through key core technologies by granting them 'green channel' access. The PBOC stated that nearly 100 institutions have issued over 250 billion yuan in sci-tech innovation bonds.\n3. In May, China's passenger car retail sales are estimated at around 1.85 million units, up 8.5% year-on-year and 5.4% month-on-month. New energy vehicle sales are expected to reach 980,000, maintaining a penetration rate of approximately 52.9%.\n4. The Beijing Humanoid Robot Innovation Center released the world's first 'Humanoid Robot Intelligence Grading' standard, establishing a 'four-dimensional, five-level' evaluation framework covering 'perception and cognition, decision-making and learning, execution performance, and collaborative interaction.'\n5. 
Federal Reserve Governor Waller stated that if the Trump administration's tariffs stabilize around 10%, the Fed may cut interest rates in the second half of 2025.\n6. The U.S. manufacturing PMI rose to a three-month high of 52.3 in May. The services PMI preliminary reading was 52.3, hitting a two-month high. New orders grew at their fastest pace in over a year, while price indicators reached a nearly three-year peak.\n7. The eurozone manufacturing PMI preliminary reading improved slightly to 49.2 in May, but the services PMI unexpectedly dropped sharply to 48.9, marking the worst performance in 16 months. This dragged the eurozone composite PMI down to 49.5 from 50.4 in April. Return to Sohu for more.", "icon": "https://sfile.chatglm.cn/searchImage/sohu_icon_new.jpg", "link": "https://www.sohu.com/a/897879632_121123890", "media": "Sohu", "publish_date": "2025-05-23", "refer": "ref_1", "title": "Financial Morning Briefing for May 23, 2025" }, { "content": "1. The CSRC and seven other departments issued 'Several Measures to Support Financing for Small and Micro Enterprises,' proposing 23 specific initiatives across eight areas: increasing financing supply, reducing comprehensive financing costs, improving financing efficiency, enhancing support precision, implementing regulatory policies, strengthening risk management, improving policy safeguards, and ensuring effective implementation.\nComment: Increased support for small and micro enterprise financing, with 23 measures targeting the real economy to alleviate financing difficulties.\n2. Ministry of Commerce: From January to April 2025, China's outward direct investment across all sectors totaled $57.54 billion, up 7.5% year-on-year.\nComment: Steady growth in China's outward investment, with 'Belt and Road' cooperation likely a key driver.\nIndustry Insights\n3. Ministry of Commerce: From September 2024 to April 2025, retail sales of home appliances maintained double-digit growth for eight consecutive months. 
In April, retail sales of household appliances and audio-visual equipment in above-quota units surged 38.8% year-on-year, ranking first among 16 major consumer goods categories.\nComment: Strong demand for home appliances, driven by policy incentives and consumption upgrades.\n4. National Energy Administration: As of end-April, China's total installed power generation capacity reached 3.49 billion kW, up 15.9% year-on-year. From January to April, major power enterprises completed 193.3 billion yuan in power source engineering investments, up 1.6% year-on-year, while grid engineering investments totaled 140.8 billion yuan, up 14.6%.\nComment: Accelerated energy infrastructure development, with grid investment growth outpacing power source investment.\nGlobal Perspective\n5. The U.S. House Rules Committee approved Trump's comprehensive tax cut bill, paving the way for a full House vote.\nComment: Progress on Trump's tax cuts, which could stimulate U.S. consumption but exacerbate fiscal deficit pressures if enacted.\nFinancial Markets\n6. Most domestic commodity futures closed lower, with weak energy performance, widespread pressure on chemical futures, and declines in the black commodities sector. Agricultural products, particularly apple futures, also saw notable drops.\nComment: Broad-based commodity market declines reflect weak demand expectations.\n7. On Thursday, the onshore yuan closed at 7.2040 against the dollar at 16:30, up 25 basis points from the previous session. The central parity rate was adjusted up 34 basis points to 7.1903. Treasury futures mostly flatlined, with the 30-year contract down 0.04% and the 10-year contract up 0.01%.\nComment: Stabilizing yuan exchange rate, with parity rate adjustments signaling policy support.\n8. On May 22, the three major indices closed lower. Defense stocks led gains, while innovative drug concepts were active. 
Bank stocks bucked the trend, while pet economy concepts lagged, beauty care stocks plunged, and solid-state battery concepts retreated.\nComment: Structural divergence in A-shares as risk aversion rises, with caution advised over June U.S. debt maturity risks and potential global market corrections. Return to Sohu for more.", "icon": "https://sfile.chatglm.cn/searchImage/sohu_icon_new.jpg", "link": "https://www.sohu.com/a/897874861_121123901", "media": "Sohu", "publish_date": "2025-05-23", "refer": "ref_2", "title": "Financial Morning Briefing for May 23" }, { "content": "According to Zhitong Finance APP, Macao's Composite Consumer Price Index (CPI) rose 0.23% year-on-year and 0.16% month-on-month in April 2025. The data, released by Macao's Statistics and Census Service on May 23, reflects subtle shifts in Macao's consumer market.\n1. Changes in Macao's CPI\nData shows that the 12-month average composite CPI as of April 2025 rose 0.42% compared to the same period a year earlier (May 2023–April 2024). Among major categories, the price index for Recreation, Sports, and Culture rose 2.47% year-on-year, while Miscellaneous Goods and Services increased 1.82%, signaling potential economic recovery.\n2. Impact of Food and Rent\nFood and non-alcoholic beverage prices rose 0.48% year-on-year, driven by higher dining-out and takeaway costs. Housing and fuel prices also increased 0.21% due to rising rents. As tourism recovers, dining demand has pushed up service fees.\nHowever, prices for Communication (-3.01%), Clothing (-2.35%), and Transport (-1.67%) declined, complicating the overall CPI trend.\n3. Month-on-Month Changes\nCompared to March 2025, April's CPI rose 0.16%. Recreation, Sports, and Culture saw the largest increase (2.6%) due to higher hotel rates. 
Summer clothing launches and rebounding airfares also lifted Clothing (+0.78%) and Transport (+0.42%) prices.\nDespite this, Food and non-alcoholic beverages dipped slightly (-0.04%) as lower prices for fresh fish, seafood, vegetables, and fruit offset higher dining costs.\n4. CPI Classification\nMacao's CPI is compiled into three series to reflect price impacts on households with different spending patterns. Category A (50% of households) covers monthly expenditures of 11,000–35,999 MOP, while Category B (30%) covers 36,000–71,999 MOP.\nHousing and fuel (34.47%), food (29.47%), and transport (8.33%) carry the highest weights, ensuring precise reflection of consumption trends across income groups.\n5. Outlook\nMacao's consumer market will continue to face multiple influences. While some categories show price declines, the overall mild CPI rise suggests recovery potential. With tourism rebounding and demand growing, CPI may maintain a gradual upward trend.\nIn summary, April 2025's CPI changes reflect market dynamics and new cost-of-living challenges. Authorities must monitor price impacts to craft effective policies ensuring stable living standards. Return to Sohu for more.", "icon": "https://sfile.chatglm.cn/searchImage/sohu_icon_new.jpg", "link": "https://www.sohu.com/a/898091955_121956424", "media": "Sohu", "publish_date": "2025-05-23", "refer": "ref_3", "title": "Macao's April 2025 CPI Rises Slightly: Key Drivers Explained" }, { "content": "Zhitong Finance APP learned that on May 23, the Asset Management Association of China (AMAC) released its monthly report on private fund manager registrations and product filings. In April 2025, 1,606 new private funds were filed nationwide, with a total size of 643.72 billion yuan. This included 1,189 private securities investment funds (366.95 billion yuan), 143 private equity funds (161.01 billion yuan), and 274 venture capital funds (115.76 billion yuan).\n1. 
Private Fund Manager Registration Overview\n(1) Monthly Registrations\nIn April 2025, 16 institutions passed AMAC's AMBERS system, including 4 private securities investment fund managers and 12 private equity/venture capital fund managers. During the same period, 76 private fund managers were deregistered.\n(2) Existing Managers\nAs of end-April 2025, there were 19,891 private fund managers overseeing 141,579 funds with 20.22 trillion yuan in assets. This included 7,827 private securities investment fund managers, 11,867 private equity/venture capital managers, 6 asset allocation managers, and 191 others.\n(3) Geographic Distribution\nBy registered location (across 36 regions), 72.20% of managers were concentrated in Shanghai (3,714), Beijing (3,244), Shenzhen (3,012), Guangdong (ex-Shenzhen, 1,594), Zhejiang (ex-Ningbo, 1,590), and Jiangsu (1,207), accounting for 18.67%, 16.31%, 15.14%, 8.01%, 7.99%, and 6.07% respectively.\nBy AUM, the top regions were Shanghai (5.08 trillion yuan, 25.12%), Beijing (4.68 trillion yuan, 23.13%), Shenzhen (1.97 trillion yuan, 9.74%), Guangdong (ex-Shenzhen, 1.30 trillion yuan, 6.43%), Jiangsu (1.18 trillion yuan, 5.82%), and Zhejiang (ex-Ningbo, 967.16 billion yuan, 4.78%), totaling 75.02%.\n2. 
Private Fund Filings Overview\n(1) Monthly Filings\nApril 2025 saw 1,606 new private funds (643.72 billion yuan), including 1,189 private securities funds (366.95 billion yuan), 143 private equity funds (161.01 billion yuan), and 274 venture capital funds (115.76 billion yuan).\n(2) Existing Funds\nAs of end-April 2025, there were 141,579 private funds (20.22 trillion yuan), including 84,673 private securities funds (5.51 trillion yuan), 30,205 private equity funds (10.96 trillion yuan), and 25,830 venture capital funds (3.41 trillion yuan).",
      "icon": "https://sfile.chatglm.cn/searchImage/sohu_icon_new.jpg",
      "link": "https://www.sohu.com/a/898137965_323087",
      "media": "Sohu",
      "publish_date": "2025-05-23",
      "refer": "ref_4",
      "title": "AMAC: 1,606 New Private Funds Filed in April, Totaling 643.72 Billion Yuan"
    }
  ]
}
)
```

### MCP Server

Access the [official MCP documentation](https://modelcontextprotocol.io/introduction) to learn more about the protocol.

**Installation Guide**

* Use clients that support the MCP protocol, such as Cursor and Cherry Studio.
* Obtain an [API Key](https://z.ai/manage-apikey/apikey-list) from the Z.AI Platform.

**Using in Cursor**

Cursor 0.45.6 includes MCP functionality. Cursor acts as an MCP service client and can connect to the MCP service with simple configuration.

Navigation Path: Cursor Settings → \[Features] → \[MCP Servers]

**Configure MCP Server**

```json theme={null}
{
  "mcpServers": {
    "z.ai-web-search-sse": {
      "url": "https://api.z.ai/api/mcp/web_search/sse?Authorization=YOUR API Key"
    }
  }
}
```

**Cursor MCP Usage**

Cursor MCP must be used in Composer's Agent mode.

## Web Search in Chat

Web Search in Chat allows the Completions API to call search engines, combining real-time web retrieval results with GLM's generative capabilities to provide up-to-date and verifiable answers.
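To make an answer verifiable in practice, an application can resolve the reference tags cited in the generated text back to the search results that carry them. The sketch below is a minimal illustration, assuming the response has been converted to a plain dict with `choices[0].message.content` and a top-level `web_search` list whose items have `refer`, `title`, and `link` fields, as in the sample response in this section. The `ref_N` citation style depends on the prompt and model output, so it is an assumption rather than a guaranteed contract. The helper name `resolve_citations` and the sample data are hypothetical.

```python
import re


def resolve_citations(response):
    """Map each ref tag cited in the answer to the matching result's title and link."""
    answer = response["choices"][0]["message"]["content"]
    sources = {item["refer"]: item for item in response.get("web_search", [])}

    cited = []
    for tag in re.findall(r"ref_\d+", answer):
        # Keep first-mention order and skip tags with no matching search result.
        if tag in sources and tag not in cited:
            cited.append(tag)

    return [(tag, sources[tag]["title"], sources[tag]["link"]) for tag in cited]


# Hypothetical response dict; titles and links are placeholders.
sample = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": "Rates were held. [Source: ref_2] Exports rose. "
                       "[Source: ref_1] See also [Source: ref_2].",
        }
    }],
    "web_search": [
        {"refer": "ref_1", "title": "First", "link": "https://example.com/1"},
        {"refer": "ref_2", "title": "Second", "link": "https://example.com/2"},
        {"refer": "ref_3", "title": "Third", "link": "https://example.com/3"},
    ],
}

for tag, title, link in resolve_citations(sample):
    print(f"{tag}: {title} <{link}>")
```

Results that are retrieved but never cited (here `ref_3`) are left out, which gives a short source list to show next to the answer.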
* API Docs: [Web Search in Chat](/api-reference/tools/web-search)
* Example: Financial Analysis Summary

```python Financial Analysis Summary theme={null}
from zai import ZaiClient

# Initialize the ZaiClient client
client = ZaiClient(api_key="YourAPIKey")

# Define tool parameters
tools = [{
    "type": "web_search",
    "web_search": {
        "enable": "True",
        "search_engine": "search-prime",
        "search_result": "True",
        "search_prompt": "You are a financial analyst. Please use concise language to summarize the key information in {{search_result}} from the web search, ranked by importance and citing the source date. Today's date is April 11, 2025.",
        "count": "5",
        "search_domain_filter": "www.sohu.com",
        "search_recency_filter": "noLimit",
        "content_size": "high"
    }
}]

# Define user message
messages = [{
    "role": "user",
    "content": "Key financial events, policy changes, and market data in April 2025"
}]

# Call the API to get a response
response = client.chat.completions.create(
    model="glm-4-air",  # Model identifier
    messages=messages,  # User messages
    tools=tools  # Tool parameters
)

# Print the response result
print(response)
```

```json theme={null}
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Based on the documents you provided, here are the important financial events for April 2025, ranked by importance:\n\n1. **G20 Finance and Central Bank Ministers Meeting** - Date to be determined. The G20 meeting will discuss key issues such as global economic recovery, financial stability, and sustainable development. This will have a profound impact on the coordination of global economic policies and financial market sentiment. [Source: ref_1]\n\n2. **Preliminary Manufacturing PMI Releases for Multiple Countries and Regions** - Including France, Germany, the Eurozone, and the UK. These data will reveal the activity status of their respective manufacturing sectors, providing critical insights for investors. [Source: ref_1]\n\n3. 
**Eurozone Seasonally Adjusted Trade Balance for February** - Released at 17:00. The performance of the trade balance will significantly impact the euro and related markets, reflecting the region's trade conditions. [Source: ref_1]\n\n4. **Speeches by Fed Officials Moussalem and Waller** - At 21:30. Speeches by Fed officials will play a crucial role in driving market volatility and sparking discussions about the future direction of U.S. monetary policy. [Source: ref_1]\n\n5. **U.S. April S&P Global Manufacturing and Services PMI Preliminaries** - Released at 21:45. The PMI data for manufacturing and services will indicate the overall trend of economic activity, influencing investment decisions. [Source: ref_1]\n\n6. **U.S. March New Home Sales Annualized Total** - Released at 22:00. As an important indicator of economic health, new home sales will signal changes in market demand. [Source: ref_1]\n\n7. **U.S. EIA Crude Oil Inventory for the Week Ending April 18** - Released at 22:30. Crude oil inventory data will provide a key reference point for the market, affecting oil price fluctuations. [Source: ref_1]\n\n8. **Fed Releases the Beige Book on Economic Conditions** - At 02:00 the next day. The Fed's Beige Book report will summarize economic development trends, impacting investor confidence and strategies. [Source: ref_1]\n\n9. **Speech by Fed Governor Lisa Cook** - At 07:10. Governor Cook's speech may reflect the Fed's current thinking on the economic situation, influencing market sentiment and asset price fluctuations. [Source: ref_2]\n\n10. **Release of China's Core Economic Data** - Including the monthly report on residential sales prices in 70 large and medium-sized cities, Q1 GDP annual rate, total retail sales of consumer goods, and industrial added value above a designated size. These will have a significant impact on global economic growth and the healthy development of China's economy. [Source: ref_2]\n\n11. 
**Release of European Economic Data** - Including the UK's March CPI monthly rate, the Eurozone's seasonally adjusted current account for February, and the final annual CPI rate for March. These will provide investors with a comprehensive view of the European economy. [Source: ref_2]\n\n12. **WTO Global Trade Outlook Report** - Released at 20:00. The report may reveal the future direction of global trade and changes among countries. [Source: ref_2]\n\n13. **Release of Key U.S. Economic Indicators** - Including the March retail sales monthly rate, industrial production monthly rate, and NAHB housing market index. These will influence investors' assessment of the strength of the U.S. economic recovery. [Source: ref_2]\n\n14. **Bank of Canada Interest Rate Decision** - Released at 21:45. The direction of Canada's monetary policy is also worth noting. [Source: ref_2]\n\n15. **China's State Council Information Office Releases Q1 Import and Export Data** - Released at 10:00. The growth or decline in import and export data will directly affect GDP growth expectations and investor confidence. [Source: ref_3]\n\n16. **Canada's February Wholesale Sales Monthly Rate Data** - Released at 20:30. Wholesale sales are an important indicator of retail demand expectations, directly reflecting the health of the economy. [Source: ref_3]\n\n17. **U.S. One-Year Inflation Expectations Data for March** - Released at 23:00. This data is a key reference for assessing U.S. consumers' and businesses' views on future economic conditions. [Source: ref_3]\n\n18. **OPEC Monthly Crude Oil Market Report** - Release time to be determined. The report typically influences global crude oil market price trends. [Source: ref_3]\n\n19. **Speeches by Fed Officials** - At 00:00 and 06:00 the next day. Speeches by Fed officials will outline the future direction of Fed policy, affecting global investor expectations. 
[Source: ref_3]\n\nThe above events and data will have a significant impact on global market trends. Investors should closely monitor them and adjust their strategies accordingly.", "role": "assistant" } } ], "created": 1748311718, "id": "20250527100811da2f8f7243f94b02", "model": "glm-4-air", "request_id": "20250527100811da2f8f7243f94b02", "usage": { "completion_tokens": 868, "prompt_tokens": 4199, "total_tokens": 5067 }, "web_search": [ { "content": "In the current volatile global economic climate, April 23, 2025, is set to be a major focus day for the financial world. Several important financial events and economic data will be released, keeping market participants on their toes as they anticipate the upcoming dynamics. Below is an in-depth analysis of the key events and data scheduled for the day to help you gain insights into future market trends.\n1. G20 Finance and Central Bank Ministers Meeting (Time to be determined)\nToday, the G20 has become a crucial platform for global economic governance. Finance and central bank ministers will discuss issues such as economic recovery, financial stability, and sustainable development. This meeting will not only impact the coordination of global economic policies but also have a profound effect on financial market sentiment.\n2. France April Preliminary Manufacturing PMI (15:15)\nAs the economic engine of the Eurozone, the current state of France's manufacturing sector will directly influence the region's recovery. The preliminary data will indicate the fragility or stability of France's economic recovery, providing key insights for investors.\n3. Germany April Preliminary Manufacturing PMI (15:30)\nGermany's manufacturing performance is always closely watched, serving as a barometer of the country's economic health and a significant indicator of the Eurozone's overall economic performance. The preliminary data will offer valuable references for investors to determine future market strategies.\n4. 
Eurozone April Preliminary Manufacturing PMI (16:00)\nFollowing this, the Eurozone's preliminary manufacturing PMI will also be released. This data is expected to reveal the activity status of the region's manufacturing sector, with markets keenly observing whether the upward trend can be sustained. This will also affect investor confidence in the euro.\n5. UK April Preliminary Manufacturing PMI (16:30)\nAfter facing multiple challenges, the recovery of the UK's manufacturing sector is critical. The preliminary PMI will further clarify whether the UK economy is on a positive trajectory, providing economic context for the upcoming general election.\n6. UK April Preliminary Services PMI (16:30)\nAs a key pillar of the UK's national economy, the services PMI directly impacts overall GDP trends. Changes in this data will offer practical suggestions for the flexibility of economic policies, especially during a period when the UK government faces numerous reform challenges.\n7. Eurozone Seasonally Adjusted Trade Balance for February (17:00)\nThe performance of the trade balance will significantly impact the euro and related markets. The positive or negative figures will directly reflect the region's trade conditions, influencing central bank monetary policies and market liquidity.\n8. Speeches by Fed Officials Moussalem and Waller (21:30)\nSpeeches by Fed officials often play a crucial role in driving market volatility. The remarks by Moussalem and Waller will spark discussions about the future direction of U.S. monetary policy, affecting interest rate decisions and economic guidance.\n9. U.S. April S&P Global Manufacturing PMI Preliminary (21:45)\nKey data on the recovery of U.S. industries is coming. Will the market see strong economic signals that boost global market sentiment? The preliminary manufacturing PMI will be the focal point of both skepticism and anticipation.\n10. U.S. 
April S&P Global Services PMI Preliminary (21:45)\nThe services PMI reflects the overall trend of economic activity, indicating consumer confidence and spending power. Changes in this value will also provide more guidance for investment decisions.\n11. U.S. March New Home Sales Annualized Total (22:00)\nAs an important indicator of economic health, the changes in new home sales this month may signal shifts in market demand, especially under current fluctuating interest rates.\n12. U.S. EIA Crude Oil Inventory for the Week Ending April 18 (22:30)\nAmid the recovery of global energy demand and geopolitical uncertainties, crude oil inventory data will provide a key reference point for the market, influencing oil price fluctuations.\n13. Fed Releases the Beige Book on Economic Conditions (02:00 the next day)\nThe Fed's Beige Book report will summarize economic development trends, often sparking widespread market discussions and trading volatility upon release. Whether this report brings new economic insights will directly affect investor confidence and strategies.\nSummary\nIn summary, the financial events and economic data on April 23, 2025, will profoundly impact global market trends, potentially offering rare opportunities and warnings for investors. Whether you are a policy analyst or a market participant, it is essential to closely monitor these key events and data, adjusting strategies promptly to seize market opportunities. Return to Sohu for more.", "icon": "https://sfile.chatglm.cn/searchImage/www_sohu_com_icon_new.jpg", "link": "https://www.sohu.com/a/887836650_121956424", "media": "Sohu", "publish_date": "", "refer": "ref_1", "title": "April 23 Financial Highlights: Analyzing Major Global Market Events and New Data Trends (Published: 2025-04-23 07:07:00)" }, { "content": "In the current ever-changing global economy, the impact of financial events on the market cannot be ignored. 
April 16, 2025, will witness a series of highly anticipated financial activities and important data releases, forming a focal point for investors and stirring the nerves of international markets.\n1. Market Volatility and Investor Expectations\nRecently, global markets have faced numerous uncertainties, particularly regarding inflation, interest rate policies, and the winding path of economic recovery, all of which have immersed investors in a sea of analysis and predictions.\nOn April 16, the U.S., Europe, China, and other regions will release a series of heavyweight economic data, which investors eagerly anticipate. These data may provide crucial clues for future market trends. Just as every fluctuation in the stock market is a process of information transmission, only by deeply understanding this information can one make more effective investment decisions.\n2. Prelude to Key Data\n1. Important Fed Speeches\nAt 07:10, Fed Governor Lisa Cook's speech will undoubtedly attract market attention. As a member of the Fed, her views and remarks may reflect the Fed's current thinking on the economic situation. If she mentions the outlook on interest rates or views on economic recovery, this will directly impact market sentiment and the price fluctuations of various asset classes, becoming a key factor influencing the opening of the New York market.\n2. Release of China's Core Economic Data\nFollowing this, at 09:30, China will release the monthly report on residential sales prices in 70 large and medium-sized cities, outlining the current state of China's real estate market. At 10:00, a series of heavyweight economic data will be released, including Q1 GDP annual rate, total retail sales of consumer goods, and industrial added value above a designated size. The release of these data will have a significant impact on global economic growth and the healthy development of China's economy, especially against the backdrop of complex and changing international economics. 
Analyzing these data will provide a profound understanding of the direction of China's market.\n3. Importance of European Data\nAt 14:00, the release of the UK's March CPI monthly rate will reveal the current inflationary pressures in the UK economy. At 16:00, the Eurozone's seasonally adjusted current account for February and the final annual CPI rate for March will be released, offering investors a more comprehensive view of the European economy. In recent years, the policies of the European Central Bank have also garnered widespread market attention, and changes in inflation trends may influence future monetary policy directions.\n3. Global Trade Outlook from a Global Perspective\nAt 20:00, the World Trade Organization (WTO) will release its Global Trade Outlook Report. Faced with geopolitical risks and challenges to global supply chains, the WTO report may reveal the future direction of global trade and changes among countries. If the trends mentioned in the report diverge from investor expectations, it is bound to trigger strong market fluctuations.\n4. Release of Key U.S. Economic Indicators\nNews from the U.S. market is equally unmissable. At 20:30, the U.S. March retail sales monthly rate will be released, followed by the industrial production monthly rate at 21:15 and the NAHB housing market index before 22:00. These will directly influence investors' assessment of the strength of the U.S. economic recovery. These data, reflecting consumption and production activities, are vital indicators of economic health and can guide investors toward future trends.\nAt 22:30, the U.S. EIA crude oil inventory data for the week ending April 11 will reveal the dynamics of the global energy market. As the global economy remains volatile, fluctuations in energy prices will have a profound impact on prices, inflation, and more.\n5. Canadian Household Confidence and Interest Rate Decision\nFinally, at 21:45, the Bank of Canada will announce its interest rate decision. 
Faced with global economic uncertainties, the direction of Canada's monetary policy is also worth noting.\nSummary and Reflections\nBehind every data release lies profound economic significance. Although the market is full of uncertainties, for investors, in-depth analysis of each event can help build more comprehensive and scientific investment strategies.\nOn the upcoming April 16, paying attention to the changes in these financial events and data will be a shared responsibility for all market participants. We hope everyone can find their footing in this turbulent market, rationally respond to various information, and make wise choices. Return to Sohu for more.", "icon": "https://sfile.chatglm.cn/searchImage/www_sohu_com_icon_new.jpg", "link": "https://www.sohu.com/a/884753377_122066678", "media": "Sohu", "publish_date": "", "refer": "ref_2", "title": "April 16 Financial Events and Economic Data: Focus Points for Global Markets (Published: 2025-04-16 06:32:00)" }, { "content": "As the global economic landscape continues to evolve, adjustments in macroeconomic policies and data releases by various countries are having a profound impact on the market. On April 14, 2025, multiple important financial events and data will emerge, which not only relate to the health of national economies but may also trigger chain reactions in global markets. This article provides a detailed analysis of the day's major financial events, offering deep insights for readers.\n1. Introduction to Market Phenomena: The Fluid Changes in the Global Economic Landscape\nIn recent years, the global economic situation has been highly volatile, with financial markets reacting sensitively. The economic data for Q1 2025 is about to be released, and its impact on foreign exchange, stock, and commodity markets cannot be underestimated. 
Especially amid the mutual influences of major economies like China, the U.S., and Europe, market investors are increasingly focusing on real-time data and changes in economic policies.\n2. State Council Information Office Releases Q1 Import and Export Data\nTime: 10:00\nChina's State Council Information Office will hold a press conference to release Q1 import and export data. This information is not just about numbers but also reveals the close ties between China and the global economic chain. The growth or decline in import and export data will directly affect GDP growth expectations, thereby influencing investor confidence.\nBy analyzing recent trade relations, it is evident that China plays a pivotal role in the global supply chain. Especially in the face of international economic fluctuations, China's import and export situation will serve as a market barometer, closely linked to trends in international commodity prices and the foreign exchange market. Achieving strong import and export performance will not only boost market confidence but also inject more vitality into domestic economic development.\n3. Canadian Wholesale Sales Data and Economic Analysis\nTime: 20:30\nOn the same day, Canada will release its February wholesale sales monthly rate data. Wholesale sales are an important indicator of retail demand expectations, directly reflecting the health of the economy.\nAnalysts point out that if the data shows strong sales growth, it may indicate a recovery in Canadian consumer confidence, while the opposite would suggest potential economic risks. Particularly during a rate hike cycle, changes in consumer spending will be especially important, influencing the Bank of Canada's monetary policy decisions and, to some extent, global market liquidity.\n4. Importance of U.S. Inflation Expectations\nTime: 23:00\nNext, the New York Fed will release the one-year inflation expectations data for March. This data is a key reference for assessing U.S. 
consumers' and businesses' views on future economic conditions.\nCurrently, the Fed is at a crossroads in addressing high inflation and reducing interest rate pressures. Rising inflation expectations may signal the need for further rate hikes, affecting capital market pricing. Conversely, if inflation shows signs of easing, it would indicate that policies are taking effect, potentially boosting market risk appetite.\n5. OPEC Crude Oil Market Report: A Barometer for Energy Prices\nTime: To be determined\nAround the same time, OPEC will release its monthly crude oil market report. This report typically influences global crude oil price trends.\n·\nExpand full text\nIn 2025, global energy agencies are generally facing the dual challenges of balancing market supply and demand with emission reduction policies. As global warming issues intensify, the balance between new energy development and traditional energy consumption in various countries will become increasingly important. Especially for countries reliant on oil and gas resources, this report will be a key reference for observing market adjustments.\n6. Views from Fed Officials: A Critical Moment for Market Interpretation\nTime: 00:00 and 06:00 the next day\nFed officials will deliver speeches on navigating economic uncertainties and sharing the Fed's role. These speeches will not only outline the future direction of Fed policy but also cast a shadow over economic uncertainties.\nThe market's focus on the Fed will influence global investor expectations, especially under the current intertwined scenarios of high inflation and high interest rates. 
Investors need to closely monitor the Fed's stance, which will not only affect the dollar's trajectory but also directly relate to the stability of commodity markets and the global financial system.\nConclusion: Financial Signals Investors Cannot Ignore\nIn summary, the major financial events and economic data on April 14, 2025, herald new changes in the global economy, warranting close attention from investors. On this day, the release of economic data and policy interpretations will intertwine, forming a barometer for market trends.\nWhile the global economy remains uncertain, investors must stay vigilant, monitor real-time information changes, and respond promptly. At the same time, diversifying investments and controlling risks will be wise choices to cope with market fluctuations. How to find stable investment opportunities in a changing economic environment is a crucial question every investor must consider.\nFinally, we hope that under the impetus of these financial events and data, a clearer and more stable development environment can emerge. Let us stay tuned and seize every possible opportunity. Return to Sohu for more.", "icon": "https://sfile.chatglm.cn/searchImage/www_sohu_com_icon_new.jpg", "link": "https://www.sohu.com/a/883713425_122066678", "media": "Sohu", "publish_date": "", "refer": "ref_3", "title": "Global Financial Hotspots: Key Events and Data Analysis Not to Be Missed on April 14 (Published: 2025-04-14 07:25:00)" }, { "content": "As 2024 approaches, the winds of change in financial markets will continue to shape the global economic landscape. In this upcoming new year, we need to closely monitor several key events that will influence financial developments in 2025. This concerns not only investors but also every individual who cares about their financial future. 
This article presents the financial calendar for 2025, covering key events, policy changes, and investment opportunities in major global economies, helping you navigate your financial direction.\n1. Introduction: Where is the Future of Financial Markets?\nFinancial markets are a barometer of economic operations, with national fiscal policies, monetary policies, trade wars, and economic growth directly influencing market trends. In 2025, the economic situations in the U.S., China, and Europe will be the focus of many investors. Against the backdrop of globalization, the relationships between these regions have become particularly complex. Let’s start with the key financial events of 2025.\n2. Key Events in the 2025 Financial Calendar\nChina's Two Sessions (March)\nThe annual National People's Congress and the Chinese People's Political Consultative Conference are important indicators of China's economic policies. The 2025 Two Sessions will discuss new national economic strategies and policies. This is a critical time for investors to watch, as new policy directions will directly affect investment opportunities in the Chinese market.\nFed Interest Rate Decision Meetings (Quarterly)\nThe Fed's interest rate decisions impact not only the U.S. but also cause ripples in the global economy. In 2025, the Fed's monetary policy may undergo significant changes, and investors need to stay alert to how interest rate changes affect capital flows and stock markets.\nEuropean Central Bank Policy Meetings (Quarterly)\nAs another major economy, Europe's policy direction will directly influence global markets. The ECB may adopt tightening or easing policies during the year, and investors should monitor its relationship with inflation and economic growth.\nGlobal Climate Summit (November)\nAs climate change issues become increasingly severe, the international community has recognized the importance of environmental policies for economic growth. 
The 2025 Climate Summit will discuss global emission reduction targets and their impact on national economies, offering opportunities for sustainable investments.\nG20 Summit (October)\nLeaders from various countries will gather to discuss global economic and trade cooperation issues. The G20 Summit is not only a platform for policy dialogue but also a key signal for investments among economies. New cooperation frameworks and economic policies may be introduced during the summit.\nChina's Two Sessions (March)\nThe annual National People's Congress and the Chinese People's Political Consultative Conference are important indicators of China's economic policies. The 2025 Two Sessions will discuss new national economic strategies and policies. This is a critical time for investors to watch, as new policy directions will directly affect investment opportunities in the Chinese market.\nFed Interest Rate Decision Meetings (Quarterly)\nThe Fed's interest rate decisions impact not only the U.S. but also cause ripples in the global economy. In 2025, the Fed's monetary policy may undergo significant changes, and investors need to stay alert to how interest rate changes affect capital flows and stock markets.\n·\nExpand full text\nEuropean Central Bank Policy Meetings (Quarterly)\nAs another major economy, Europe's policy direction will directly influence global markets. The ECB may adopt tightening or easing policies during the year, and investors should monitor its relationship with inflation and economic growth.\nGlobal Climate Summit (November)\nAs climate change issues become increasingly severe, the international community has recognized the importance of environmental policies for economic growth. 
The 2025 Climate Summit will discuss global emission reduction targets and their impact on national economies, offering opportunities for sustainable investments.\nG20 Summit (October)\nLeaders from various countries will gather to discuss global economic and trade cooperation issues. The G20 Summit is not only a platform for policy dialogue but also a key signal for investments among economies. New cooperation frameworks and economic policies may be introduced during the summit.\nIn 2025, we need to conduct in-depth analysis of policy trends in various countries to assess their impact on the market.\nChina\nIn 2025, China is expected to further promote \"digital economy\" and \"green economy\" policies, thereby boosting the development of technology and environmental industries. Investors can seek to increase investments in these areas.\nUnited States\nThe Fed may continue to raise interest rates while seeking a moderate inflation target, potentially causing short-term stock market volatility. In the long run, the technology and healthcare sectors will continue to receive strong support, making them suitable for investor focus.\nEuropean Union\nFacing ongoing economic growth pressures, the ECB may adopt moderately accommodative policies, with manufacturing and services sectors receiving more support, making them suitable for related investments.\nChina\nIn 2025, China is expected to further promote \"digital economy\" and \"green economy\" policies, thereby boosting the development of technology and environmental industries. Investors can seek to increase investments in these areas.\nUnited States\nThe Fed may continue to raise interest rates while seeking a moderate inflation target, potentially causing short-term stock market volatility. 
In the long run, the technology and healthcare sectors will continue to receive strong support, making them suitable for investor focus.\nEuropean Union\nFacing ongoing economic growth pressures, the ECB may adopt moderately accommodative policies, with manufacturing and services sectors receiving more support, making them suitable for related investments.\nIn 2025, investors will still face opportunities and challenges. Identifying potential investment opportunities is crucial, and we can start from the following aspects:\nTechnology Industry\nAreas such as artificial intelligence, blockchain, and new energy will remain hot. It is recommended to focus on related companies and investment funds.\nGreen Investments\nIn response to climate change, countries are injecting more capital into renewable energy. Investors should consider companies related to green energy to capture investment opportunities.\nConsumer Market\nWith the expansion of the middle class, the consumer market will continue to grow, especially in China, where household consumption will become a key driver of economic growth.\nTechnology Industry\nAreas such as artificial intelligence, blockchain, and new energy will remain hot. It is recommended to focus on related companies and investment funds.\nGreen Investments\nIn response to climate change, countries are injecting more capital into renewable energy. Investors should consider companies related to green energy to capture investment opportunities.\nConsumer Market\nWith the expansion of the middle class, the consumer market will continue to grow, especially in China, where household consumption will become a key driver of economic growth.\nHowever, investors must also be wary of potential risks, including policy risks and market volatility. Thorough research before investing is essential.\n5. Summary and Expectations\nLooking ahead to 2025, the global economy still faces many uncertainties, but opportunities and challenges coexist. 
As investors, we must continuously learn and adapt to new situations, track global economic dynamics, and adjust our investment portfolios accordingly. We hope this 2025 financial calendar provides valuable insights, and we look forward to a promising future.\nIn the coming year, we remind everyone to stay alert, keep up with financial news, especially policy changes, international relations, and market trends. We hope every reader can find suitable investment opportunities in 2025 and reap abundant rewards!\nReturn to Sohu for more.", "icon": "https://sfile.chatglm.cn/searchImage/www_sohu_com_icon_new.jpg", "link": "https://www.sohu.com/a/844117122_122066678", "media": "Sohu", "publish_date": "", "refer": "ref_4", "title": "2025 Financial Calendar: Major Financial Events and Investment Opportunities Not to Be Missed (Published: 2025-01-01 10:37:00)" }, { "content": "April 16, 2025, will be an important financial day, with investors needing to closely monitor a series of key economic indicators and events. First, at 07:10, Fed Governor Lisa Cook will deliver a speech, expected to provide guidance on future monetary policy directions.\nIn the morning, China will release multiple heavyweight economic data. At 09:30, the monthly report on residential sales prices in 70 large and medium-sized cities will be released, revealing trends in the real estate market. At 10:00, China's Q1 GDP annual rate, March total retail sales of consumer goods, and industrial added value above a designated size will be released, once again drawing market attention to the strength of economic recovery.\nIn the afternoon, the UK will release its March Consumer Price Index (CPI) monthly rate at 14:00, an indicator that will influence market views on inflation and central bank decisions. 
Following this, at 16:00, the Eurozone's seasonally adjusted current account and the final annual CPI rate for March will also be released, further affecting assessments of the Eurozone's economic situation.\nIn the evening, the World Trade Organization will release its global trade outlook report at 20:00. Subsequently, at 20:30, the U.S. March retail sales monthly rate and at 21:15 the industrial production monthly rate will be released, drawing high market attention to the health of the U.S. economy.\nAt 21:45, the Bank of Canada will announce its latest interest rate decision, with investors closely watching its impact on future monetary policy. Finally, at 22:00, the U.S. April NAHB housing market index will debut, along with the release of the U.S. February business inventory monthly rate data, providing investors with a more comprehensive market understanding.\nAt 22:30, the U.S. EIA crude oil inventory data for the week ending April 11 will be released, potentially significantly impacting oil prices and related market sentiment. Overall, April 16 will be a busy day for financial activities, holding great importance for market analysis and decision-making. Return to Sohu for more.", "icon": "https://sfile.chatglm.cn/searchImage/www_sohu_com_icon_new.jpg", "link": "https://www.sohu.com/a/884753440_122006510", "media": "Sohu", "publish_date": "", "refer": "ref_5", "title": "April 16 Financial Focus: Overview of Global Economic Data and Monetary Policy Decisions (Published: 2025-04-16 06:32:00)" } ] }
```

# CogVideoX-3

Source: https://docs.z.ai/guides/video/cogvideox-3

## Overview

CogVideoX-3 features new frame generation capabilities that significantly improve image stability and clarity. It also delivers superior performance in handling subjects with significant movement, better adheres to instructions, and provides more realistic simulations. Additionally, it enhances the rendering of high-definition real-world and 3D-style scenes.
* Price: \$0.2 / video
* Input: Image / Text / Start and End Frame
* Output: Video

## Usage

* Input product images or copy to quickly generate dynamic ads in multiple styles, supporting scene transitions and realistic lighting rendering.
* Convert single-frame images or text scripts into smooth, naturally animated short videos, covering both realistic and 3D styles.
* Upload scenic spot photos and promotional text to generate immersive tourism short videos with realistic natural landscapes.
* Input storyboards or character design images to automatically generate dynamic preview clips, simulating seamless camera movements and realistic physical interactions.

## Resources

* [API Documentation](/api-reference/video/cogvideox-3\&vidu): Learn how to call the API.

## Introducing CogVideoX-3

Videos generated by CogVideoX-3 feature clear subjects, stable frames, reduced distortion issues, and support for subjects to move extensively, resulting in more natural and fluid dynamic performance.
Prompt Video
The petals were blown by the wind, spinning continuously and transforming into a person.
Nezha happily took a sip of wine, then showed off the brand of wine.
Deeply understands the intent of text commands and accurately reproduces creative requirements. Whether it is having a character perform specific actions or simulating natural physical phenomena, it can be presented in accordance with real-world logic.
Prompt Video
A pair of hands holding a fruit knife, slicing a whole red tomato into slices.
In an open-plan office, an employee is looking down at his phone. Suddenly, the manager appears and taps him on the shoulder. Startled, he quickly puts his phone away.
For realistic styles, it can create high-definition textures akin to real-life photography; when switching to 3D styles, it can precisely shape three-dimensional forms and scene atmospheres, effortlessly handling multiple styles.
Prompt Video
A high-angle shot captures Dou E and the sky. Dou E was an innocent woman from ancient China who was wrongfully accused. At this moment, she is looking up and shouting. Under the scorching June sun, white snow falls from the sky, scattering upon contact with bloodstains. Her clothes flutter slightly, accompanied by a 3D particle wind.
A stylish anthropomorphic snow leopard wearing a white leopard-print fashion coat, super fluffy, plush, thick, and luxurious, walking the runway in ultra-high definition with a cinematic feel, reminiscent of a blockbuster movie, like a Victoria's Secret fashion show. The runway is lined with spectators taking photos on both sides.
Supports users providing start and end frames, automatically generating seamless transition content, allowing static frames to naturally connect into dynamic narratives, and linking complete creative concepts.
Prompt Start Frame End Frame Video
The Dragon King transforms into Ao Bing, with ink wash-style shading. The main character slowly transforms, highlighting the details of the transformation. The camera rotates smoothly, creating a natural and fluid transition.
The character holds a gun in both hands and shoots wildly at the computer screen. The computer catches fire, explodes, and shatters into pieces, sending debris flying everywhere. The office lights flicker.
## Quick Start

**Install SDK**

```bash theme={null}
# Install latest version
pip install zai-sdk

# Or specify version
pip install zai-sdk==0.2.2
```

**Verify Installation**

```python theme={null}
import zai

print(zai.__version__)
```

**Text-to-Video**

```python theme={null}
from zai import ZaiClient

# Initialize the client with your own API key.
client = ZaiClient(api_key="your-api-key")

# Generate video
response = client.videos.generations(
    model="cogvideox-3",
    prompt="A cat is playing with a ball.",
    quality="quality",  # Output mode: "quality" prioritizes quality, "speed" prioritizes speed
    with_audio=True,  # Whether to include audio
    size="1920x1080",  # Video resolution, supports up to 4K (e.g., "3840x2160")
    fps=30,  # Frame rate, can be 30 or 60
)
print(response)

# Get video result
result = client.videos.retrieve_videos_result(id=response.id)
print(result)
```

**Image-to-Video**

```python theme={null}
from zai import ZaiClient

# Initialize the client with your own API key.
client = ZaiClient(api_key="your-api-key")

# Define the URL of the image. Replace it with your own image URL.
image_url = "https://img.iplaysoft.com/wp-content/uploads/2019/free-images/free_stock_photo.jpg"

# Call the video generation interface.
response = client.videos.generations(
    model="cogvideox-3",  # The video generation model used
    image_url=image_url,  # Image URL or Base64 encoding
    prompt="Make the picture move.",
    quality="quality",  # Output mode: "quality" prioritizes quality, "speed" prioritizes speed
    with_audio=True,  # Whether to include audio
    size="1920x1080",  # Video resolution, supports up to 4K (e.g., "3840x2160")
    fps=30,  # Frame rate, can be 30 or 60
)

# Print the returned result.
print(response)

# Get video result
result = client.videos.retrieve_videos_result(id=response.id)
print(result)
```

**Start and End Frame**

```python theme={null}
from zai import ZaiClient

# Initialize the client with your own API key.
client = ZaiClient(api_key="your-api-key")

# Define URLs for the first frame and the last frame.
sample_first_frame = "https://gd-hbimg.huaban.com/ccee58d77afe8f5e17a572246b1994f7e027657fe9e6-qD66In_fw1200webp"
sample_last_frame = "https://gd-hbimg.huaban.com/cc2601d568a72d18d90b2cc7f1065b16b2d693f7fa3f7-hDAwNq_fw1200webp"

# Call the video generation API with both frames.
response = client.videos.generations(
    model="cogvideox-3",  # Video generation model to use
    image_url=[sample_first_frame, sample_last_frame],  # List of URLs for the first and last frames
    prompt="Animate the scene",
    quality="quality",  # Output mode: "quality" prioritizes quality, "speed" prioritizes speed
    with_audio=True,  # Whether to include audio
    size="1920x1080",  # Video resolution, supports up to 4K (e.g., "3840x2160")
    fps=30,  # Frame rate, can be 30 or 60
)

# Print response
print(response)

# Get video result
result = client.videos.retrieve_videos_result(id=response.id)
print(result)
```

# Vidu Q1

Source: https://docs.z.ai/guides/video/vidu-q1

## Overview

Vidu Q1 is the next-generation video generation model from Vidu, designed for high-quality video creation. It consistently outputs 5-second, 24-frame, 1080P video clips. Through advanced optimization of visual clarity, Vidu Q1 delivers significantly enhanced image quality with notable improvements in issues such as hand distortion and frame jitter. The model achieves photorealistic rendering that closely resembles real-world scenes, while maintaining stylistic accuracy in 2D animation.
Transitions between the first and last frames are exceptionally smooth, making Vidu Q1 well-suited for demanding creative applications in film, advertising, and animated short productions.

| Mode | Price | Duration | Resolution |
| :--- | :--- | :--- | :--- |
| Image-to-Video Generation | \$0.4 / video | 5S | 1080P |
| Start and End Frame | \$0.4 / video | 5S | 1080P |
| Text-to-Video Generation | \$0.4 / video | 5S | 1080P |

## Capability Description

* Generate a video by providing a starting frame or both starting and ending frames along with corresponding text descriptions.
* Support input of two images: the first uploaded image is treated as the starting frame, and the second as the ending frame. The model uses these images as input parameters to generate the video.
* Generate a video from a text prompt; currently supports both a general style and an anime style optimized for animation.

The URL link for the video generated by the model is valid for one day. Please save it as soon as possible if needed.

## Usage

* By inputting script excerpts, concept art, and other materials, users can generate promotional videos, visual effects shots, and auxiliary film assets
* Delivers theatrical-level clarity and visual quality with complete frame details
* Provides professional-grade video transitions with natural scene flow
* Input character designs and storyboard scripts to quickly generate 2D animated sequences and stylized anime shorts
* Supports styles such as Chinese animation and Japanese anime
* Enables storyline extension and creative regeneration of classic IPs
* Automatically generate short videos or micro-dramas from novel chapters or scripted scenes
* Covers diverse genres such as romance, mystery, and historical drama
* Optimized for multi-platform distribution needs
* Quickly generate highly engaging brand ads, e-commerce product videos, and interactive ads (e.g., virtual try-on) based on product images and feature descriptions
* Supports adaptation to various platform dimensions and creative formats

## Resources

[API Documentation](/api-reference/video/cogvideox-3\&vidu): Learn how to call the API.

## Introducing Vidu Q1

The model delivers a comprehensive upgrade in visual detail restoration. Movements are smooth and natural: hand gestures during product demonstrations in e-commerce livestreams are accurately rendered and compliant. Visual jitter is minimized through dynamic frame interpolation technology, ensuring fluid and stable footage even in motion-heavy scenes such as running shots or vehicle perspectives.

The realistic style aims for lifelike visuals: urban landscapes and character portraits in city promos are rendered with striking realism. The animated style focuses on authenticity, accurately capturing everything from the hand-drawn lines of Japanese anime to the saturated colors of Western cartoons. By inputting anime character designs, the model generates dynamic story segments that closely match the original IP’s visual style, boosting the efficiency of derivative content creation.

The start and end frame transition technology reaches a new level, using dynamic frame prediction and style fusion algorithms to overcome the limitations of "mechanical stitching" in video transitions.

## Quick Start

### 1. Text-to-Video Generation

```bash theme={null}
curl --location --request POST 'https://api.z.ai/api/paas/v4/videos/generations' \
--header 'Authorization: Bearer {your apikey}' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "viduq1-text",
    "style": "anime",
    "prompt": "Peter Rabbit drives a small car along the road, his face filled with joy and happiness.",
    "duration": 5,
    "aspect_ratio": "16:9",
    "size": "1920x1080",
    "movement_amplitude": "auto"
}'
```

**Install SDK**

```bash theme={null}
# Install latest version
pip install zai-sdk

# Or specify version
pip install zai-sdk==0.2.2
```

**Verify Installation**

```python theme={null}
import zai

print(zai.__version__)
```

```python theme={null}
from zai import ZaiClient

# Initialize the client with your own API key.
client = ZaiClient(api_key="your-api-key")

# Text-to-video request
response = client.videos.generations(
    model="viduq1-text",
    prompt="Peter Rabbit driving a car, wandering on the road, with a happy and joyful expression on his face.",
    style="general",
    duration=5,
    aspect_ratio="16:9",
    size="1920x1080",
    movement_amplitude="auto"
)
print(response)
```

### 2. Image-to-Video Generation

```bash theme={null}
curl --location --request POST 'https://api.z.ai/api/paas/v4/videos/generations' \
--header 'Authorization: Bearer {your apikey}' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "viduq1-image",
    "image_url": "https://example.com/path/to/your/image.jpg",
    "prompt": "Peter Rabbit drives a small car along the road, his face filled with joy and happiness.",
    "duration": 5,
    "size": "1920x1080",
    "movement_amplitude": "auto"
}'
```

```python theme={null}
from zai import ZaiClient

# Initialize the client with your own API key.
client = ZaiClient(api_key="your-api-key")

# Image-to-video request
response = client.videos.generations(
    model="viduq1-image",
    image_url="https://example.com/path/to/your/image.jpg",
    prompt="Peter Rabbit driving a car, wandering on the road, with a happy and joyful expression on his face.",
    duration=5,
    size="1920x1080",
    movement_amplitude="auto"
)
print(response)
```

### 3. Start and End Frame

```bash theme={null}
curl --location --request POST 'https://api.z.ai/api/paas/v4/videos/generations' \
--header 'Authorization: Bearer {your apikey}' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "viduq1-start-end",
    "image_url": ["https://example.com/path/to/your/image.jpg", "https://example.com/path/to/your/image1.jpg"],
    "prompt": "Peter Rabbit drives a small car along the road, his face filled with joy and happiness.",
    "duration": 5,
    "size": "1920x1080",
    "movement_amplitude": "auto"
}'
```

```python theme={null}
from zai import ZaiClient

# Initialize the client with your own API key.
client = ZaiClient(api_key="your-api-key")

# Define URLs for the first frame and the last frame.
sample_first_frame = "https://gd-hbimg.huaban.com/ccee58d77afe8f5e17a572246b1994f7e027657fe9e6-qD66In_fw1200webp"
sample_last_frame = "https://gd-hbimg.huaban.com/cc2601d568a72d18d90b2cc7f1065b16b2d693f7fa3f7-hDAwNq_fw1200webp"

# Video generation request (first and last frame mode)
response = client.videos.generations(
    model="viduq1-start-end",
    image_url=[sample_first_frame, sample_last_frame],  # The first and last frame images
    prompt="Peter Rabbit driving a car, wandering on the road, with a happy and joyful expression on his face.",
    duration=5,  # Video duration (seconds)
    size="1920x1080",  # Video resolution
    movement_amplitude="auto",  # Movement amplitude
)

# Print the response result
print(response)
```

# Vidu 2

Source: https://docs.z.ai/guides/video/vidu2

## Overview

Vidu 2 is a next-generation video generation model that strikes a balance between speed and quality. It focuses on image-to-video generation and keyframe-based video creation, supporting 720P resolution for videos up to 4 seconds long.
With significantly faster generation speed and reduced cost, it addresses color distortion issues in image-to-video outputs, delivering stable and controllable visuals ideal for e-commerce scenarios. Enhanced semantic understanding between keyframes and improved consistency with multiple reference images make Vidu 2 a highly efficient tool for mass production in pan-entertainment, internet content, anime short series, and advertising.

| Mode | Price | Duration | Resolution |
| :--- | :--- | :--- | :--- |
| Image-to-Video Generation | \$0.2 / video | 4S | 720P |
| Start and End Frame | \$0.2 / video | 5S | 720P |
| Reference-based Video Generation | \$0.4 / video | 4S | 720P |

## Capability Description

* Generate a video by providing a starting frame or both starting and ending frames along with corresponding text descriptions.
* Support input of two images: the first uploaded image is treated as the starting frame, and the second as the ending frame. The model uses these images as input parameters to generate the video.
* Generate a video from a text prompt; currently supports both a general style and an anime style optimized for animation.

The URL link for the video generated by the model is valid for one day. Please save it as soon as possible if needed.
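Because the generated video link is only valid for one day, it can help to copy the file to your own storage as soon as the task finishes. The following is a minimal sketch using only the Python standard library; the helper name `save_video` and the `.part` temporary-file convention are assumptions for illustration and are not part of the Z.AI SDK. How you obtain the video URL from the task result is not shown here, since the result field names are not described in this guide.

```python
import shutil
import urllib.request
from pathlib import Path


def save_video(url: str, dest: str, timeout: float = 60.0) -> Path:
    """Stream the file at `url` to `dest` and return the destination path.

    Hypothetical helper for illustration; not part of the Z.AI SDK.
    """
    dest_path = Path(dest)
    dest_path.parent.mkdir(parents=True, exist_ok=True)

    # Download to a temporary sibling file first, so an interrupted
    # download never leaves a truncated file at the final location.
    tmp_path = dest_path.with_name(dest_path.name + ".part")
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(tmp_path, "wb") as out:
        shutil.copyfileobj(resp, out)

    # Move the completed download into place.
    tmp_path.replace(dest_path)
    return dest_path
```

For example, `save_video(video_url, "videos/clip.mp4")` would store the clip locally, where `video_url` stands for the link returned for your own task.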
## Usage

* Input a single frame or IP elements to quickly generate short videos with coherent storylines and interactive special effects
* Supports diverse visual styles from anime-inspired to realistic
* Tailored for mass production of UGC creative content on short video platforms
* Input static character images or keyframes to generate smooth animated sequences and micro-dramas
* Accurately reproduce detailed character movements (e.g., facial expressions)
* Supports mass production in various styles such as Chinese and Japanese anime
* Designed to meet animation studios’ needs for IP-based content expansion
* Input real product images to intelligently generate dynamic advertising videos
* Clearly showcase product features such as 3C (computer, communication, and consumer electronics) details and beauty product textures
* Automatically adapt to various platform formats, such as vertical videos for TikTok and horizontal layouts for social feeds

## Resources

[API Documentation](/api-reference/video/cogvideox-3\&vidu): Learn how to call the API.

## Introducing Vidu 2

With optimized model computing architecture, video rendering efficiency is significantly enhanced. This allows daily content teams to respond quickly to trending topics, and enables e-commerce sellers to mass-produce product display videos on demand, greatly reducing content delivery time and helping creators seize traffic windows.

The cost of generating 720P resolution videos has dropped to 40% of the Vidu Q1 version. Small and medium-sized brands can now create batch videos for multiple SKUs, while advertising teams can test creative concepts like "product close-ups + scenario storytelling" at a lower cost, meeting full-platform marketing needs without breaking the content budget.

* The model addresses the "texture color shift" issue, accurately restoring details like the silky glow of satin or the matte finish of leather in clothing videos. In e-commerce scenarios, product colors are displayed more realistically.
* Dynamic frame compensation is optimized, ensuring smooth, shake-free motion for rotating 3C products or hand demonstrations in beauty tutorials. * Multiple visual styles are supported, enabling eye-catching content like “product close-up + stylized camera movement,” ideal for e-commerce main images and short-form promotional videos. The model strikes a balance between creativity and stability, delivering significantly improved performance and semantic understanding—making it the optimal solution for keyframe-based video generation. By accurately analyzing scene logic and action continuity, transitions between frames are smooth and natural, enhancing narrative coherence throughout the content. When inputting multi-element materials, the visual style of the generated video (such as tone and lighting) can be highly unified. For example, in a cultural tourism promotional video, the transition between scenes such as the sunrise over an ancient city, street market scenes, and folk performances maintains consistency with the “Chinese style filter.” In anime IP derivative content, the actions and expressions of characters in different plot scenes can also strictly adhere to the original settings, facilitating the coherent creation of multi-scene, multi-element content. 020f485a Fb03 4698 8a6c F9f89b5b7361 Jpe ## Quick Start ### 1. 
Image-to-Video Generation

```bash theme={null}
curl --location --request POST 'https://api.z.ai/api/paas/v4/videos/generations' \
--header 'Authorization: Bearer {your apikey}' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model":"vidu2-image",
    "image_url":"https://example.com/path/to/your/image.jpg",
    "prompt":"Peter Rabbit drives a small car along the road, his face filled with joy and happiness.",
    "duration":4,
    "size":"720x480",
    "movement_amplitude":"auto"
}'
```

**Install SDK**

```bash theme={null}
# Install latest version
pip install zai-sdk

# Or specify version
pip install zai-sdk==0.2.2
```

**Verify Installation**

```python theme={null}
import zai
print(zai.__version__)
```

```python theme={null}
from zai import ZaiClient

# Initialize the client, please replace your-api-key with your own APIKey.
client = ZaiClient(api_key="your-api-key")

# Video generation example from images.
response = client.videos.generations(
    model="vidu2-image",
    image_url="https://example.com/path/to/your/image.jpg",
    prompt="Peter Rabbit is driving a small car, cruising on the road, with a face full of happiness and joy.",
    duration=4,
    size="1280x720",
    movement_amplitude="auto"
)

# Print the response result.
print(response)
```

**Install SDK**

**Maven**

```xml theme={null}
<dependency>
    <groupId>ai.z.openapi</groupId>
    <artifactId>zai-sdk</artifactId>
    <version>0.3.3</version>
</dependency>
```

**Gradle (Groovy)**

```groovy theme={null}
implementation 'ai.z.openapi:zai-sdk:0.3.3'
```

```java theme={null}
import ai.z.openapi.ZaiClient;
import ai.z.openapi.service.videos.VideoCreateParams;
import ai.z.openapi.service.videos.VideosResponse;

public class Vidu2Example {
    public static void main(String[] args) throws InterruptedException {
        String apiKey = "your_api_key"; // Please fill in your own APIKey.
        ZaiClient client = ZaiClient.builder().ofZAI().apiKey(apiKey).build();

        // Construct video generation request parameters.
        VideoCreateParams request = VideoCreateParams.builder()
            .model("vidu2-image")
            .imageUrl("https://example.com/path/to/your/image.jpg")
            .prompt("Peter Rabbit is driving a small car, cruising on the road, with a face full of happiness and joy.")
            .duration(4)
            .size("1280x720")
            .build();

        // Initiate video generation request.
        VideosResponse response = client.videos().videoGenerations(request);
        System.out.println(response.getData());

        // Wait for 10 minutes, then asynchronously retrieve the final generated video using the task ID.
        Thread.sleep(600000L);
        VideosResponse videosResponse = client.videos().videoGenerationsResult(response.getData().getId());
        System.out.println(videosResponse.getData().getVideoResult());
    }
}
```

### 2. Start and End Frame

```bash theme={null}
curl --location --request POST 'https://api.z.ai/api/paas/v4/videos/generations' \
--header 'Authorization: Bearer {your apikey}' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model":"vidu2-start-end",
    "image_url":["https://example.com/path/to/your/image1.jpg","https://example.com/path/to/your/image2.jpg"],
    "prompt":"Peter Rabbit drives a small car along the road, his face filled with joy and happiness.",
    "duration":4,
    "size":"720x480",
    "movement_amplitude":"auto"
}'
```

```python theme={null}
from zai import ZaiClient

# Initialize the client, please replace "your-api-key" with your own APIKey.
client = ZaiClient(api_key="your-api-key")

# Define URLs for first frame and last frame
sample_first_frame = "https://gd-hbimg.huaban.com/ccee58d77afe8f5e17a572246b1994f7e027657fe9e6-qD66In_fw1200webp"
sample_last_frame = "https://gd-hbimg.huaban.com/cc2601d568a72d18d90b2cc7f1065b16b2d693f7fa3f7-hDAwNq_fw1200webp"

# Video generation request (first and last frame mode)
response = client.videos.generations(
    model="vidu2-start-end",
    image_url=[sample_first_frame, sample_last_frame],  # The first and last frame images
    prompt="Peter Rabbit driving a car, wandering on the road, with a happy and joyful expression on his face.",
    duration=4,  # Video duration (seconds)
    size="1280x720",  # Video resolution
    movement_amplitude="auto",  # Movement amplitude
)

# Print the response result
print(response)
```

### 3. Reference-based Video Generation

```bash theme={null}
curl --location --request POST 'https://api.z.ai/api/paas/v4/videos/generations' \
--header 'Authorization: Bearer {your apikey}' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model":"vidu2-reference",
    "image_url":["https://example.com/path/to/your/image1.jpg","https://example.com/path/to/your/image2.jpg","https://example.com/path/to/your/image3.jpg"],
    "prompt":"Peter Rabbit drives a small car along the road, his face filled with joy and happiness.",
    "duration":4,
    "aspect_ratio":"16:9",
    "size":"720x480",
    "movement_amplitude":"auto",
    "with_audio":true
}'
```

```python theme={null}
from zai import ZaiClient

# Initialize client with your API key (replace 'your-api-key')
client = ZaiClient(api_key="your-api-key")

ref_image_url = [
    "https://gd-hbimg.huaban.com/ccee58d77afe8f5e17a572246b1994f7e027657fe9e6-qD66In_fw1200webp",
    "https://gd-hbimg.huaban.com/cc2601d568a72d18d90b2cc7f1065b16b2d693f7fa3f7-hDAwNq_fw1200webp",
    "https://gd-hbimg.huaban.com/cc2601d568a72d18d90b2cc7f1065b16b2d693f7fa3f7-hDAwNq_fw1200webp"
]

# Generate video using reference images
response = client.videos.generations(
    model="vidu2-reference",  # Using reference image model
    image_url=ref_image_url,  # List of reference image URLs
    prompt="Peter Rabbit driving a car, wandering on the road, with a happy and joyful expression on his face.",
    duration=4,  # Video duration in seconds
    aspect_ratio="16:9",  # Standard widescreen aspect ratio
    size="1280x720",  # HD resolution
    movement_amplitude="auto",  # Automatic motion control
    with_audio=True,  # Enable audio generation
)

# Print API response
print(response)
```

# AutoGLM-Phone-Multilingual

Source: https://docs.z.ai/guides/vlm/autoglm-phone-multilingual

## Overview

AutoGLM-Phone-Multilingual is a mobile intelligent assistant framework built on vision-language models. It understands phone screen content in a multimodal manner and helps users complete tasks through automated operations. The system controls devices via ADB (Android Debug Bridge), perceives screens, and generates and executes operation workflows through intelligent planning. Users simply describe their needs in natural language, such as "Open eBay and search for wireless earphones," and AutoGLM-Phone-Multilingual will complete the entire workflow.

New model launched, free for a limited time!

* Task Instructions
* Task Action
* English & Chinese
* Android Phone

## Usage

* Place orders for specific products from designated merchants on food delivery platforms, or request to reorder the meal you most recently purchased.
* Place orders on shopping websites or check product reviews.
* Route planning, nearby searches, flight and ticket booking, hotel reservations, and more.
* Search for news, play songs and videos, and interact through likes, comments, and favorites.
* Search for rentals based on location, budget, layout, and other criteria.

## Resources

* [API Documentation](/api-reference/llm/chat-completion#vision-model): Learn how to call the API.
## Introducing AutoGLM-Phone-Multilingual

* **Technical Breadth:** Powered by the AutoGLM multimodal model combined with ADB-based device control, integrating a complete capability stack including visual understanding, task planning, and tool execution.
* **Commercial Validation:** Its practicality and stability have been verified across multiple partnerships and testing scenarios.
* **Application Value:** Delivers true end-to-end intelligence, enabling a “say it, get it” mobile control experience.

AutoGLM-Phone-Multilingual supports 50+ mainstream applications:

| Category                 | Apps                                                                                              |
| ------------------------ | ------------------------------------------------------------------------------------------------- |
| Social & Messaging       | X, Tiktok, WhatsApp, Telegram, FacebookMessenger, GoogleChat, Quora, Reddit, Instagram            |
| Productivity & Office    | Gmail, GoogleCalendar, GoogleDrive, GoogleDocs, GoogleTasks, Joplin                               |
| Life, Shopping & Finance | Amazon shopping, Temu, Bluecoins, Duolingo, GoogleFit, ebay                                       |
| Utilities & Media        | GoogleClock, Chrome, GooglePlayStore, GooglePlayBooks, FilesbyGoogle                              |
| Travel & Navigation      | GoogleMaps, [Booking.com](http://booking.com/), [Trip.com](http://trip.com/), Expedia, OpenTracks |

To see the full list of supported apps, run the scripts in [github](https://github.com/zai-org/Open-AutoGLM/blob/main/README.md#%E6%94%AF%E6%8C%81%E7%9A%84%E5%BA%94%E7%94%A8) (feel free to give us a star\~)

AutoGLM-Phone-Multilingual can perform the following actions:

| Action     | Description                             |
| ---------- | --------------------------------------- |
| Launch     | Launch an app                           |
| Tap        | Tap at specified coordinates            |
| Type       | Input text                              |
| Swipe      | Swipe the screen                        |
| Back       | Go back to previous page                |
| Home       | Return to home screen                   |
| Long Press | Long press                              |
| Double Tap | Double tap                              |
| Wait       | Wait for page to load                   |
| Take\_over | Request manual takeover (login/captcha) |

## Examples

## Invocation Guide

### **Environment Setup**

#### 1. Python Environment

It is recommended to use **Python 3.10**.

#### 2. ADB (Android Debug Bridge)

* Download the official ADB package and extract it to a custom directory. [https://developer.android.com/tools/releases/platform-tools?hl=zh-cn](https://developer.android.com/tools/releases/platform-tools?hl=zh-cn)
* Configure environment variables:
  * **macOS:** `export PATH=${PATH}:~/Downloads/platform-tools`
  * **Windows:** Refer to third-party tutorials to configure environment variables.
* Verify that ADB is installed successfully:

```
# adb --version
Android Debug Bridge version 1.0.41
Version 36.0.0-13206524
Installed as /opt/homebrew/bin/adb
Running on Darwin 22.4.0 (arm64)
```

#### 3. **Android Device Configuration**

* Android 7.0+ device or emulator
* Enable Developer Mode: *Settings → About phone → Tap "Build number" 10 times consecutively*
* Enable USB Debugging: *Settings → Developer options → USB debugging*

#### 4. **Install ADB Keyboard**

Download **ADBKeyboard.apk** and install it on the device. After installation, go to *Settings → Input method* and enable **ADB Keyboard**.

[https://github.com/senzhk/ADBKeyBoard/blob/master/ADBKeyboard.apk](https://github.com/senzhk/ADBKeyBoard/blob/master/ADBKeyboard.apk)

### **Deployment Preparation**

#### 1. Clone the Repository

```
git clone https://github.com/zai-org/Open-AutoGLM.git
```

#### 2. Install Dependencies

```
pip install -r requirements.txt
pip install -e .
```

#### 3. Configure ADB Connection

```
# Check connected devices
adb devices

# Output should show your device, e.g.
# List of devices attached
# emulator-5554 device
```

#### 4. Configure Model API

```
python main.py --base-url https://api-inference.modelscope.cn/v1 --model "ZAI/AutoGLM-Phone-9B" --apikey "your-zai-api-key" "Open Chrome browser"
```

# GLM-4.5V

Source: https://docs.z.ai/guides/vlm/glm-4.5v

## Overview

GLM-4.5V is Z.AI's new generation of visual reasoning models based on the MoE architecture.
With a total of 106B parameters and 12B active parameters, it achieves SOTA performance among open-source VLMs of the same scale across various benchmarks, covering common tasks such as image, video, and document understanding, as well as GUI tasks.

* Input: \$0.6 per million tokens
* Output: \$1.8 per million tokens

Video / Image / Text / File Text 16K

## Usage

* Analyze webpage screenshots or screen recording videos, understand layout and interaction logic, and generate complete and usable webpage code with one click.
* Precisely identify and locate target objects, suitable for practical scenarios such as security checks, quality inspections, content reviews, and remote sensing monitoring.
* Recognize and process screen images and support execution of commands like clicking and sliding, providing reliable support for intelligent agents to complete operational tasks.
* Deeply analyze complex documents spanning dozens of pages, with support for summarization, translation, and chart extraction, and propose insights based on the content.
* Strong reasoning ability and rich world knowledge, capable of deducing background information of images without using search.
* Parse long video content and accurately infer the time, characters, events, and logical relationships within the video.
* Solve complex problems that combine text and images, suitable for problem-solving and explanation in K12 educational scenarios.

## Resources

* [API Documentation](/api-reference/llm/chat-completion): Learn how to call the API.

## Introducing GLM-4.5V

GLM-4.5V, based on Z.AI's flagship GLM-4.5-Air, continues the iterative upgrade of the GLM-4.1V-Thinking technology route, achieving comprehensive performance on par with open-source SOTA models of the same scale across 42 public visual multimodal benchmarks, covering common tasks such as image, video, and document understanding, as well as GUI tasks.
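For budgeting, the per-token prices listed in the Overview can be turned into a rough cost estimate. The sketch below is a minimal illustration: the function name `estimate_cost_usd` is hypothetical, and it assumes billing is strictly linear in token counts and that every generated token is charged at the output rate, which this page does not confirm.

```python
from decimal import Decimal

# Prices quoted in the Overview, in USD per million tokens.
INPUT_USD_PER_MILLION = Decimal("0.6")
OUTPUT_USD_PER_MILLION = Decimal("1.8")


def estimate_cost_usd(input_tokens, output_tokens):
    """Hypothetical helper: linear cost estimate for one or more requests."""
    million = Decimal(1_000_000)
    input_cost = Decimal(input_tokens) * INPUT_USD_PER_MILLION
    output_cost = Decimal(output_tokens) * OUTPUT_USD_PER_MILLION
    return (input_cost + output_cost) / million


# 200,000 input tokens and 50,000 output tokens:
# (200000 * 0.6 + 50000 * 1.8) / 1,000,000 = 0.21 USD
print(estimate_cost_usd(200_000, 50_000))
```

`Decimal` is used instead of floats so that small per-request amounts add up without rounding drift when summed over many calls.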
![Description](https://cdn.bigmodel.cn/markdown/1754969019359glm-4.5v-16.jpeg?attname=glm-4.5v-16.jpeg)

GLM-4.5V introduces a new "Thinking Mode" switch, allowing users to freely switch between quick response and deep reasoning, flexibly balancing processing speed and output quality according to task requirements.

## Examples

![Description](https://cdn.bigmodel.cn/markdown/1754969059126glm-4.5v-17.png?attname=glm-4.5v-17.png)

Please generate a high-quality UI interface using CSS and HTML based on the webpage I provided.

Screenshot of the rendered web page:

![Description](https://cdn.bigmodel.cn/markdown/1754969077749glm-4.5v-18.png?attname=glm-4.5v-18.png)

![Description](https://cdn.bigmodel.cn/markdown/1754968632215glm-4.5v-6.png?attname=glm-4.5v-6.png)

Modify the data in the first row on slide 4 to "89", "21", "900" and "None"

Modification result:

![Description](https://cdn.bigmodel.cn/markdown/1754968746754glm-4.5v-7.png?attname=glm-4.5v-7.png)

![Description](https://cdn.bigmodel.cn/markdown/1754968758489glm-4.5v-8.png?attname=glm-4.5v-8.png)

Convert the table in the image to Markdown format

Rendered result:

![Description](https://cdn.bigmodel.cn/markdown/1754968768530glm-4.5v-9.png?attname=glm-4.5v-9.png)

![Description](https://cdn.bigmodel.cn/markdown/1754968795362glm-4.5v-12.png?attname=glm-4.5v-12.png)

Tell me the position of the couple in the picture. The short-haired guy is wearing a pink top and blue shorts, and the girl is in a cyan dress. Answer in \[x1,y1,x2,y2] format.

```
The position of the couple in the picture, where the short-haired guy is wearing a pink top and blue shorts, and the girl is in a cyan dress, is [835,626,931,883].
```

Rendered result:

![Description](https://cdn.bigmodel.cn/markdown/1754968823292glm-4.5v-13.png?attname=glm-4.5v-13.png)

## Quick Start

**Basic Call**

```bash theme={null}
curl --location 'https://api.z.ai/api/paas/v4/chat/completions' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--header 'Accept-Language: en-US,en' \
--header 'Content-Type: application/json' \
--data '{
    "model": "glm-4.5v",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cloudcovert-1305175928.cos.ap-guangzhou.myqcloud.com/%E5%9B%BE%E7%89%87grounding.PNG"
                    }
                },
                {
                    "type": "text",
                    "text": "Where is the second bottle of beer from the right on the table? Provide coordinates in [[xmin,ymin,xmax,ymax]] format"
                }
            ]
        }
    ],
    "thinking": {
        "type":"enabled"
    }
}'
```

**Streaming Call**

```bash theme={null}
curl --location 'https://api.z.ai/api/paas/v4/chat/completions' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--header 'Accept-Language: en-US,en' \
--header 'Content-Type: application/json' \
--data '{
    "model": "glm-4.5v",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cloudcovert-1305175928.cos.ap-guangzhou.myqcloud.com/%E5%9B%BE%E7%89%87grounding.PNG"
                    }
                },
                {
                    "type": "text",
                    "text": "Where is the second bottle of beer from the right on the table? Provide coordinates in [[xmin,ymin,xmax,ymax]] format"
                }
            ]
        }
    ],
    "thinking": {
        "type":"enabled"
    },
    "stream": true
}'
```

**Install SDK**

```bash theme={null}
# Install the latest version
pip install zai-sdk

# Or specify a version
pip install zai-sdk==0.2.2
```

**Verify installation**

```python theme={null}
import zai
print(zai.__version__)
```

**Basic Call**

```python theme={null}
from zai import ZaiClient

client = ZaiClient(api_key="")  # Enter your own APIKey

response = client.chat.completions.create(
    model="glm-4.5v",  # Enter the name of the model you want to call
    messages=[
        {
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cloudcovert-1305175928.cos.ap-guangzhou.myqcloud.com/%E5%9B%BE%E7%89%87grounding.PNG"
                    }
                },
                {
                    "type": "text",
                    "text": "Where is the second bottle of beer from the right on the table? Provide coordinates in [[xmin,ymin,xmax,ymax]] format"
                }
            ],
            "role": "user"
        }
    ],
    thinking={
        "type":"enabled"
    }
)
print(response.choices[0].message)
```

**Streaming Call**

```python theme={null}
from zai import ZaiClient

client = ZaiClient(api_key="")  # Enter your own APIKey

response = client.chat.completions.create(
    model="glm-4.5v",  # Enter the name of the model you want to call
    messages=[
        {
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cloudcovert-1305175928.cos.ap-guangzhou.myqcloud.com/%E5%9B%BE%E7%89%87grounding.PNG"
                    }
                },
                {
                    "type": "text",
                    "text": "Where is the second bottle of beer from the right on the table? Provide coordinates in [[xmin,ymin,xmax,ymax]] format"
                }
            ],
            "role": "user"
        }
    ],
    thinking={
        "type":"enabled"
    },
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.reasoning_content:
        print(chunk.choices[0].delta.reasoning_content, end='', flush=True)
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='', flush=True)
```

**Install SDK**

**Maven**

```xml theme={null}
<dependency>
    <groupId>ai.z.openapi</groupId>
    <artifactId>zai-sdk</artifactId>
    <version>0.3.3</version>
</dependency>
```

**Gradle (Groovy)**

```groovy theme={null}
implementation 'ai.z.openapi:zai-sdk:0.3.3'
```

**Basic Call**

```java theme={null}
import ai.z.openapi.ZaiClient;
import ai.z.openapi.service.model.*;
import ai.z.openapi.core.Constants;
import java.util.Arrays;

public class GLM45VExample {
    public static void main(String[] args) {
        String apiKey = ""; // Enter your own APIKey
        ZaiClient client = ZaiClient.builder().ofZAI()
            .apiKey(apiKey)
            .build();

        ChatCompletionCreateParams request = ChatCompletionCreateParams.builder()
            .model("glm-4.5v")
            .messages(Arrays.asList(
                ChatMessage.builder()
                    .role(ChatMessageRole.USER.value())
                    .content(Arrays.asList(
                        MessageContent.builder()
                            .type("text")
                            .text("Where is the second bottle of beer from the right on the table? Provide coordinates in [[xmin,ymin,xmax,ymax]] format")
                            .build(),
                        MessageContent.builder()
                            .type("image_url")
                            .imageUrl(ImageUrl.builder()
                                .url("https://cloudcovert-1305175928.cos.ap-guangzhou.myqcloud.com/%E5%9B%BE%E7%89%87grounding.PNG")
                                .build())
                            .build()))
                    .build()))
            .thinking(ChatThinking.builder()
                .type("enabled")
                .build())
            .build();

        ChatCompletionResponse response = client.chat().createChatCompletion(request);

        if (response.isSuccess()) {
            Object reply = response.getData().getChoices().get(0).getMessage();
            System.out.println(reply);
        } else {
            System.err.println("Error: " + response.getMsg());
        }
    }
}
```

**Streaming Call**

```java theme={null}
import ai.z.openapi.ZaiClient;
import ai.z.openapi.service.model.*;
import ai.z.openapi.core.Constants;
import java.util.Arrays;

public class GLM45VStreamExample {
    public static void main(String[] args) {
        String apiKey = ""; // Enter your own APIKey
        ZaiClient client = ZaiClient.builder().ofZAI()
            .apiKey(apiKey)
            .build();

        ChatCompletionCreateParams request = ChatCompletionCreateParams.builder()
            .model("glm-4.5v")
            .messages(Arrays.asList(
                ChatMessage.builder()
                    .role(ChatMessageRole.USER.value())
                    .content(Arrays.asList(
                        MessageContent.builder()
                            .type("text")
                            .text("Where is the second bottle of beer from the right on the table? Provide coordinates in [[xmin,ymin,xmax,ymax]] format")
                            .build(),
                        MessageContent.builder()
                            .type("image_url")
                            .imageUrl(ImageUrl.builder()
                                .url("https://cloudcovert-1305175928.cos.ap-guangzhou.myqcloud.com/%E5%9B%BE%E7%89%87grounding.PNG")
                                .build())
                            .build()))
                    .build()))
            .thinking(ChatThinking.builder()
                .type("enabled")
                .build())
            .stream(true)
            .build();

        ChatCompletionResponse response = client.chat().createChatCompletion(request);

        if (response.isSuccess()) {
            response.getFlowable().subscribe(
                // Process streaming message data
                data -> {
                    if (data.getChoices() != null && !data.getChoices().isEmpty()) {
                        Delta delta = data.getChoices().get(0).getDelta();
                        System.out.print(delta + "\n");
                    }
                },
                // Process streaming response error
                error -> System.err.println("\nStream error: " + error.getMessage()),
                // Process streaming response completion event
                () -> System.out.println("\nStreaming response completed")
            );
        } else {
            System.err.println("Error: " + response.getMsg());
        }
    }
}
```

# GLM-4.6V

Source: https://docs.z.ai/guides/vlm/glm-4.6v

## Overview

The GLM-4.6V series is Z.ai's new iteration of multimodal large language models. GLM-4.6V scales its context window to 128K tokens in training and achieves SOTA performance in visual understanding among models of similar parameter scale. Crucially, GLM-4.6V integrates native Function Calling capabilities for the first time. This effectively bridges the gap between "visual perception" and "executable action," providing a unified technical foundation for multimodal agents in real-world business scenarios.

* High Performance: Video / Image / Text / File, Text, 128K
* Lightweight, High-Speed, and Affordable: Video / Image / Text / File, Text, 128K
* Lightweight, Completely Free: Video / Image / Text / File, Text, 128K

## Resources

* [API Documentation](/api-reference/llm/chat-completion): Learn how to call the API.
## Introducing GLM-4.6V

Traditional tool usage in LLMs often relies on pure text, requiring multiple intermediate conversions when dealing with images, videos, or complex documents. This process introduces information loss and engineering complexity. GLM-4.6V is equipped with native multimodal tool calling capability:

* **Multimodal Input:** Images, screenshots, and document pages can be passed directly as tool parameters without being converted to text descriptions first, minimizing signal loss.
* **Multimodal Output:** The model can visually comprehend results returned by tools (such as search results, statistical charts, rendered web screenshots, or retrieved product images) and incorporate them into subsequent reasoning chains.

This native support allows GLM-4.6V to close the loop from perception to understanding to execution, enabling complex tasks like mixed text-image content creation and visual web searching.

GLM-4.6V can accept multimodal inputs (mixed text/image papers, reports, or slides) and automatically generate high-quality, structured image-text interleaved content.

* **Complex Document Understanding:** Accurately understands structured information from documents containing text, charts, figures, and formulas.
* **Visual Tool Retrieval:** During generation, the model can autonomously call search tools to find candidate images or crop key visuals from the source multimodal context.
* **Visual Audit & Layout:** The model performs a "visual audit" on candidate images to assess relevance and quality, filtering out noise to produce structured articles ready for social media or knowledge bases.

GLM-4.6V delivers an end-to-end multimodal search-and-analysis workflow, enabling the model to move seamlessly from visual perception to online retrieval and finally to structured and illustrated report generation. Across the entire process, it maintains multimodal context awareness and performs reasoning grounded in both textual and visual information.

* **Intent Recognition and Search Planning:** For queries containing images, text, or both, GLM-4.6V identifies the user’s search intent and determines what information is needed. It then autonomously selects and triggers the appropriate search tools (e.g., text-to-image search, image-to-text search, or image-to-image matching) to retrieve relevant data.
* **Multimodal Results Alignment:** The model reviews the mixed visual and textual information returned by the search tools, identifies the parts most relevant to the query, and uses them to support the subsequent reasoning and generation process.
* **Reasoning & Rich Media Report Generation:** Leveraging relevant visual and textual cues retrieved from search, the model performs the necessary reasoning steps and generates a structured report, seamlessly integrating both visual and textual elements to match the task requirements.

We have optimized GLM-4.6V for frontend development, significantly shortening the "design to code" cycle.

* **Pixel-Level Replication:** By uploading a screenshot or design file, the model identifies layouts, components, and color schemes to generate high-fidelity HTML/CSS/JS code.
* **Visual Debugging:** Users can circle an area on a generated page screenshot and give natural language instructions (e.g., "Move this button left and make it dark blue"). The model automatically locates and corrects the corresponding code snippet.

GLM-4.6V aligns its visual encoder with a 128K context length, giving the model a massive memory capacity. In practice, this equates to processing \~150 pages of complex documents, 200 slide pages, or a one-hour video in a single inference pass.

* **Financial Analysis:** In testing, GLM-4.6V successfully processed financial reports from four different public companies simultaneously, extracting core metrics across documents and synthesizing a comparative analysis table without losing key details.
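The capacity figures quoted for the 128K context (about 150 document pages, 200 slides, or one hour of video) imply a rough token budget per unit. The sketch below works that arithmetic through. It assumes "128K" means 128 × 1024 tokens and that the whole window is spent on the input, neither of which this page states; actual token usage per page or per second of video depends on the content.

```python
# Assumption: "128K" is read as 128 * 1024 = 131,072 tokens.
CONTEXT_TOKENS = 128 * 1024


def tokens_per_unit(units, context_tokens=CONTEXT_TOKENS):
    """Average token budget per unit if `units` items fill the whole window."""
    if units <= 0:
        raise ValueError("units must be positive")
    return context_tokens // units


print(tokens_per_unit(150))      # per document page
print(tokens_per_unit(200))      # per slide
print(tokens_per_unit(60 * 60))  # per second of a one-hour video
```

These averages are useful mainly as a sanity check when deciding whether a batch of documents is likely to fit into a single request; they leave no room for the prompt or the generated answer.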
We evaluated GLM-4.6V on over 20 mainstream multimodal benchmarks, including **MMBench**, **MathVista**, and **OCRBench**. The model achieves SOTA performance among open-source models of comparable scale in key capabilities such as multimodal interaction, logical reasoning, and long-context understanding.

![Description](https://cdn.bigmodel.cn/markdown/1765165989046glm-4.6v-1.jpeg?attname=glm-4.6v-1.jpeg)

## Examples

**Highlights Analysis**

* Supports native multimodality, enabling direct processing of documents containing visual elements (e.g., images, tables, curves). This eliminates the need for cumbersome and error-prone preprocessing steps such as OCR and document parsing.
* In addition to text output, the model can decide on its own which pages and regions contain the relevant content. It can also invoke tools via MCP for screenshot capture and embedding, generating well-illustrated reports.
* Building on in-depth paper reading, information analysis, and consolidation, the model applies its reasoning capabilities to express its own "insights" on specific topics.

![Description](https://cdn.bigmodel.cn/markdown/1765174318683image.png?attname=image.png)

Based on the key visualizations from the two papers, this research report delivers a comprehensive, illustration-integrated analysis of the bench described in the literature.

`Think`

```
Alright, here's the deal. The user wants me to act as a research report generator, analyzing two papers on AI-generated video benchmarks. I need to synthesize the information, focusing on the key visualizations and presenting it in a clear, integrated way. This is a common task, and I'm ready to dive in. First, I need to understand the core of each paper. Paper 1, "AIGCBench," seems to be about evaluating Image-to-Video (I2V) generation. It's got a framework, datasets, and metrics. I'll need to dig into its structure, especially Figure 1, which shows the overall setup.
Then, I'll look at the evaluation metrics, probably in Table 2. The visual comparisons of the generated videos in Figure 3 will be crucial for understanding the results. And, of course, the radar chart in Figure 4 will summarize the performance. Next, Paper 2, "FETV," focuses on Text-to-Video (T2V) generation. It emphasizes fine-grained evaluation, categorizing prompts by content, attributes, and complexity. I'll need to understand the categorization framework, probably in Figure 1. The data distribution across these categories, shown in Figure 3 and 4, will be important. The radar charts in Figure 5 and 6 will show the model performance across different categories. And the leaderboards in Figure 7 will provide a quick overview of the rankings. Now, the real work begins: synthesizing these two papers. I'll need to structure the report logically. I'll start with an introduction to set the stage. Then, I'll dive into the benchmarks themselves, highlighting their different focuses. I'll need to explain how AIGCBench is structured (Figure 1) and how FETV categorizes its prompts (Figure 1). Then, I'll compare the evaluation metrics, probably using Table 1 and Table 2. I'll need to discuss the visual results, using Figure 3 and Figures 5-7. Finally, I'll wrap it up with a conclusion, emphasizing the strengths and weaknesses of each benchmark. I need to make sure I'm using the images effectively, referencing them correctly, and providing a clear, concise analysis. The user wants a "comprehensive, illustration-integrated analysis," so I need to deliver on that promise. Let's get started! ``` `Answer` ``` Of course. Here is a research report that provides a comprehensive, illustration-integrated analysis of the two benchmarks described in the literature. Research Report: A Comparative Analysis of AIGCBench and FETV for Evaluating AI-Generated Video This report analyzes two key research papers that introduce novel benchmarks for evaluating AI-generated video content. 
The first, AIGCBench, focuses on a comprehensive evaluation of Image-to-Video (I2V) generation, while the second, FETV, proposes a benchmark for the fine-grained evaluation of open-domain Text-to-Video (T2V) generation. By examining their methodologies, datasets, evaluation metrics, and results, we can understand the current state and challenges of automated video generation assessment. 1. AIGCBench: A Comprehensive Benchmark for Image-to-Video Generation AIGCBench is designed to address the limitations of existing benchmarks by providing a unified and scalable framework for evaluating I2V models. Its primary goal is to offer a fair comparison of different algorithms under equivalent conditions. Framework and Datasets: The AIGCBench framework is divided into three core modules: datasets, video generation models, and evaluation metrics. As illustrated in Figure 1, the benchmark utilizes two types of datasets: real-world video-text pairs and a novel set of image-text pairs generated through a proprietary pipeline. ![1, [[102, 75, 887, 296]], "Figure 1: Illustration of the AIGCBench framework."] This pipeline, detailed in Figure 2, involves combining subjects, behaviors, backgrounds, and styles to create diverse prompts, which are then processed by GPT-4 for enhancement before being fed to a text-to-image model (Stable Diffusion) to generate the final image. ![1, [[102, 71, 908, 466]], "Figure 2: Image-text dataset generation pipeline and results."] Evaluation Metrics and Results: AIGCBench evaluates I2V models across four critical dimensions: Control-Video Alignment: How well the generated video matches the input image. Motion Effects: The quality and plausibility of the motion. Temporal Consistency: The coherence of the video across time. Video Quality: The overall aesthetic and technical quality. The benchmark evaluates five state-of-the-art models: VideoCrafter, I2VGen-XL, SVD, Pika, and Gen-2. 
The quantitative results, summarized in Table 2, show that Gen-2 performs best in video quality, while SVD and Pika show strong performance in control-video alignment and temporal consistency. ![1, [[90, 75, 907, 235]], "Table 2: Quantitative analysis for different Image-to-Video algorithms."] Qualitative comparisons in Figure 3 visually demonstrate the performance differences. For example, in the "knight running" prompt, Gen-2 produces a highly detailed and aesthetically pleasing video, whereas VideoCrafter and I2VGen-XL struggle to preserve the spatial structure of the original image. ![1, [[92, 71, 910, 722]], "Figure 3: Qualitative comparison of five I2V algorithms on three different prompts."] A user study with 42 participants was conducted to validate the metrics. The results, visualized in a radar chart in Figure 4, show that Pika excels in motion effects and temporal consistency, while Gen-2 is favored for video quality. ![1, [[385, 75, 614, 266]], "Figure 4: Radar chart showing user preferences for different I2V algorithms."] 2. FETV: A Benchmark for Fine-Grained Text-to-Video Evaluation FETV addresses the need for more nuanced evaluation of T2V models by introducing a multi-aspect categorization of prompts. Unlike benchmarks that provide only a single overall score, FETV breaks down evaluation into specific categories to reveal a model's strengths and weaknesses. Multi-Aspect Categorization: FETV's core innovation is its categorization of text prompts based on three orthogonal aspects, as shown in Figure 1: Major Content: The primary subject (e.g., people, animals, scenery). Attribute Control: Specific attributes the prompt aims to control (e.g., color, speed, motion direction). Prompt Complexity: The complexity of the text prompt (simple, medium, complex). This framework allows for a "fine-grained evaluation" that goes beyond a single score. 
![2, [[181, 71, 815, 314]], "Figure 1: Illustration of FETV's multi-aspect categorization."] Evaluation and Results: FETV evaluates four T2V models: CogVideo, Text2Video-zero, ModelScopeT2V, and ZeroScope. The evaluation is performed manually across four perspectives: static quality, temporal quality, overall alignment, and fine-grained alignment. The results, presented in radar charts in Figure 5 and Figure 6, reveal significant variations in model performance across different categories. For instance, Figure 5 shows that ModelScopeT2V performs well on "static quality" for "people" and "scenery" but poorly on "artifacts" and "vehicles." ![2, [[267, 71, 731, 472]], "Figure 5: Manual evaluation of static and temporal video quality across different major contents."] Furthermore, Figure 6 demonstrates that models struggle with controlling specific attributes. For example, in the "fine-grained alignment" chart, all models show weakness in controlling "event order." ![2, [[181, 525, 807, 722]], "Figure 6: Manual evaluation of video-text alignment across different attributes and complexities."] The benchmark also includes leaderboards (Figure 7) that rank the models based on their overall performance in static quality, temporal quality, and alignment. ![2, [[175, 71, 817, 196]], "Figure 7: Leaderboard on the FETV benchmark based on manual evaluation."] 3. Comparative Analysis and Conclusion Both AIGCBench and FETV represent significant steps forward in the standardized evaluation of AI-generated video, but they target different tasks and evaluation philosophies. Scope: AIGCBench focuses specifically on the I2V task, evaluating how well a model can animate a static image. FETV, in contrast, is designed for the more open-ended T2V task, where the model must generate a video from a text description. Evaluation Granularity: FETV's primary strength is its "fine-grained" approach, using a detailed categorization system to pinpoint model weaknesses. 
AIGCBench provides a more holistic, four-dimensional score, offering a broader performance overview. Methodology: AIGCBench combines both reference-dependent (video-dependent) and reference-free (video-free) metrics, along with human validation. FETV relies heavily on comprehensive manual evaluation to establish a ground truth, which it then uses to diagnose the reliability of automatic metrics. In conclusion, these two benchmarks provide a more robust and multi-faceted way to assess AI video generation models. AIGCBench offers a comprehensive framework for fair I2V model comparison, while FETV pushes for a deeper, more nuanced understanding of T2V model capabilities. Together, they highlight the importance of diverse and detailed evaluation in driving progress in the field of AIGC. Part 2. Image reference records ![1, [[102, 75, 887, 296]], "Figure 1: Illustration of the AIGCBench framework."] ![1, [[102, 71, 908, 466]], "Figure 2: Image-text dataset generation pipeline and results."] ![1, [[92, 71, 910, 722]], "Figure 3: Qualitative comparison of five I2V algorithms on three different prompts."] ![1, [[385, 75, 614, 266]], "Figure 4: Radar chart showing user preferences for different I2V algorithms."] ![2, [[181, 71, 815, 314]], "Figure 1: Illustration of FETV's multi-aspect categorization."] ![2, [[267, 71, 731, 472]], "Figure 5: Manual evaluation of static and temporal video quality across different major contents."] ![2, [[181, 525, 807, 722]], "Figure 6: Manual evaluation of video-text alignment across different attributes and complexities."] ![2, [[175, 71, 817, 196]], "Figure 7: Leaderboard on the FETV benchmark based on manual evaluation."] Part 3. Table reference records ![1, [[90, 75, 907, 235]], "Table 2: Quantitative analysis for different Image-to-Video algorithms."] Next, I need to call the image_reference tool to get the URL of the relevant image. 
```

`Rendered Result`:

![Description](https://cdn.bigmodel.cn/markdown/1765174533702image.png?attname=image.png)

**Highlights Analysis**

* Enhanced code execution & visual consistency
* True support for continuous multi-round iterative web page generation capabilities
* Support for long web page generation and multi-round editing capabilities

`Think`

```
This is a complex task that requires me to recreate a dynamic webpage based on a series of video frames. The frames show the TripAdvisor website in Chinese. I need to build a React/Tailwind single-page application that mimics the interactions shown in the video.

Let me analyze the key components and interactions from the video:

1. Initial State: The homepage with a search bar, navigation menu, and a large hero image with a "Search" button.

2. Navigation: The top navigation has options like "Discover" (发现), "Travel" (旅行), "Reviews" (点评), "More" (更多), a currency selector (USD), and a login button (登录).

3. Search Bar: A prominent search bar with placeholder text for attractions, hotels, etc.

4. Category Tabs: Below the search bar, there are tabs for different categories like "Search All" (搜索全部), "Hotels" (酒店), "Attractions" (景点玩乐), "Food" (美食).

5. Scrolling Content: As the user scrolls down, different sections appear:
   - Hero banner with rotating images
   - "Find attractions that match your interests" section
   - Category cards for "Outdoor Activities" (户外活动), "Food" (美食), "Culture" (文化), "Water Activities" (水上活动)
   - "Explore Abu Dhabi" section
   - "Unmissable activities in exciting locations" section with activity cards
   - "Go far beyond: Discover more magical moments in Saudi" promotional section
   - "Inspiration to help you plan" section
   - "Big cities, happy holidays" section
   - "Unmissable iconic locations" section
   - "Travelers' Choice Awards" section
   - "Browse theme series" section

6. Footer: The footer contains links to various pages and social media icons.

Now I'll create the HTML/React code to replicate this interface. I'll focus on creating reusable components for repeating elements like activity cards and category cards.

Let me start building the code:
```

`Rendered Result`:

**Highlights Analysis**

* Stronger multi-image relational reasoning: Understands multiple screenshots from different sources and layouts, integrating information across images without requiring user-provided context per image.
* Precise multi-field extraction and validation: Automatically identifies key fields (order status, paid amount, merchant name, etc.) and filters entries based on business logic (e.g., excluding refunded orders).
* End-to-end cross-image logical reasoning: Performs the full reasoning chain (identify → validate → aggregate) to compute final results without manual step-by-step guidance.
* High robustness in real-world scenarios: Maintains consistent and reliable field recognition and logical judgment across diverse UI styles and layouts.

![Description](https://cdn.bigmodel.cn/markdown/1765174798178image.png?attname=image.png)

![Description](https://cdn.bigmodel.cn/markdown/1765174817262image.png?attname=image.png)

![Description](https://cdn.bigmodel.cn/markdown/1765174823912image.png?attname=image.png)

I want to buy AirPods 4. Could you circle the cheapest one for me?

`Think`

```
Okay, let's see. The user wants me to circle the cheapest AirPods 4. I've got three images to work with, and I need to find the best deal. First, I'll look at the first image. It shows the standard AirPods 4 for $129 and the Active Noise Cancellation version for $179. The standard one is cheaper. Next, I'll check the second image. Here, I see two AirPods 4 options: one at $141.98 and another at $99.00. The $99.00 one is clearly the cheapest so far. Finally, the third image shows an AirPods 4 for $169.00.
Comparing all the prices, the $99.00 AirPods 4 from the second image is the cheapest. I need to circle that one. The bounding box for that product is at [599, 99, 799, 599]. I'll use that to draw the circle.
```

`Rendered Result`:

![Description](https://cdn.bigmodel.cn/markdown/1765174909264image.png?attname=image.png)

**Highlights Analysis**

* Full object detection and recognition: Given a description of target categories or characteristics, the model outputs all matching detection boxes and their corresponding labels.

![Description](https://cdn.bigmodel.cn/markdown/1765174983998image.png?attname=image.png)

Identify the breeds of all cats in the image. Return the results in valid JSON format. The result should be a list, where each element in the list corresponds to a dictionary of target detection results. The dictionary keys are label and bbox_2d, with values being the detected cat breed and the result bounding box coordinates respectively. For example: `[{"label": "Golden Shorthair-1", "bbox_2d": [1,2,3,4]},{"label": "Golden Shorthair-2", "bbox_2d": [4,5,6,7]}]`

`Think`

```
I now need to address the user’s question, and first, I have to recall the breeds of the cats in the image. Let’s start with the first cat on the left: it has stripes and a raised tail, so it should be a tabby cat (e.g., an American Shorthair or British Shorthair, but this type of striped pattern is typically tabby). The second cat is also striped with a slightly curved tail; could it also be a tabby? Wait, no. The third cat is light brown and white; could it be a Siamese cat? Because Siamese cats are usually pointed or bicolor with a light coat. The fourth cat is a small gray-and-white kitten; could it be a Ragdoll? Do Ragdolls have long fur? No, the kitten in the image appears to have short fur, so maybe an American Shorthair? Wait, let me think again: the fifth cat is striped with a very high, raised tail; could it be a Maine Coon?
Because Maine Coons have long, fluffy tails, yes, the fifth cat has a large, bushy tail, so it should be a Maine Coon. The sixth cat is a small gray shorthair—probably a British Shorthair? The seventh cat is a long-haired gray cat with a fluffy tail—could it be a Norwegian Forest Cat? Because Norwegian Forest Cats have long fur and large tails. The eighth cat is a small striped kitten—likely an American Shorthair tabby. Now let’s count the number of cats in the image: counting left to right, there are 8 cats total. Let me confirm each one: First cat on the left: tabby pattern, short fur, upright tail—likely a "tabby cat". Second cat: tabby pattern, curved tail, short fur—likely a "tabby cat" (same breed, different individual). Third cat: light brown and white with pointed ears—likely a "Siamese cat" (Siamese cats are characterized by pointed coloration, pointed ears, and large eyes). Fourth cat: gray-and-white, small build, short fur—likely an "American Shorthair" (American Shorthair: short fur, round face). Fifth cat: striped with an especially long, fluffy tail—likely a "Maine Coon" (Maine Coons have long, dense fur on their tails). Sixth cat: small gray shorthair—likely a "British Shorthair" (British Shorthair: round face, short fur). Seventh cat: long-haired gray with a fluffy tail—likely a "Norwegian Forest Cat" (Norwegian Forest Cats have long fur, large tails, and a sturdy build). Eighth cat (last on the right): small striped kitten, short fur—likely an "American Shorthair tabby". Next, I need to verify if the breed names are accurate and if there are more precise labels. For example, the third cat: Siamese cats are marked by pointed coloration (e.g., an M-shaped marking on the head), pointed ears, and blue or green eyes. The third cat in the image has pointed ears and a light brown-and-white coat, which matches Siamese characteristics. The fifth cat’s tail is indeed long and dense, so it is a Maine Coon. 
The seventh cat fits the profile of a Norwegian Forest Cat with its long fur and large tail. The others are common tabby cats, and American/British Shorthairs are common shorthair breeds. Then, I need to define the position of each cat using bbox_2d, which represents coordinates in the format [x1, y1, x2, y2] (where y1 = top, y2 = bottom, x1 = left, x2 = right). For example: The first cat (far left): approximate bbox is [100, 150, 190, 820] The second cat: [180, 330, 310, 850] The third cat: [300, 350, 430, 900] The fourth cat: [420, 520, 510, 920] The fifth cat: [500, 260, 605, 850] The sixth cat: [600, 450, 700, 870] The seventh cat: [680, 120, 800, 830] The eighth cat: [800, 480, 885, 830] These coordinates are approximate (based on position in the image) and do not require precise pixel values—only general ranges. Finally, organize this into a JSON list where each element is a dictionary. The label should be the breed plus a serial number (e.g., "Tabby Cat-1", "Tabby Cat-2", and so on). 
```

`Answer`

```json theme={null}
[
  {"label": "Tabby Cat-1", "bbox_2d": [95, 152, 192, 825]},
  {"label": "Tabby Cat-2", "bbox_2d": [185, 332, 310, 852]},
  {"label": "Siamese Cat-1", "bbox_2d": [295, 352, 428, 902]},
  {"label": "American Shorthair-1", "bbox_2d": [415, 520, 508, 922]},
  {"label": "Maine Coon-1", "bbox_2d": [498, 262, 603, 852]},
  {"label": "British Shorthair-1", "bbox_2d": [603, 452, 697, 872]},
  {"label": "Norwegian Forest Cat-1", "bbox_2d": [685, 120, 797, 832]},
  {"label": "Tabby Cat-3", "bbox_2d": [802, 482, 882, 832]}
]
```

## Quick Start

**Basic Call**

```bash theme={null}
curl -X POST \
  https://api.z.ai/api/paas/v4/chat/completions \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4.6v",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cloudcovert-1305175928.cos.ap-guangzhou.myqcloud.com/%E5%9B%BE%E7%89%87grounding.PNG"
            }
          },
          {
            "type": "text",
            "text": "Where is the second bottle of beer from the right on the table? Provide coordinates in [[xmin,ymin,xmax,ymax]] format"
          }
        ]
      }
    ],
    "thinking": {
      "type": "enabled"
    }
  }'
```

**Streaming Call**

```bash theme={null}
curl -X POST \
  https://api.z.ai/api/paas/v4/chat/completions \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4.6v",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cloudcovert-1305175928.cos.ap-guangzhou.myqcloud.com/%E5%9B%BE%E7%89%87grounding.PNG"
            }
          },
          {
            "type": "text",
            "text": "Where is the second bottle of beer from the right on the table? Provide coordinates in [[xmin,ymin,xmax,ymax]] format"
          }
        ]
      }
    ],
    "thinking": {
      "type": "enabled"
    },
    "stream": true
  }'
```

**Install SDK**

```bash theme={null}
# Install the latest version
pip install zai-sdk

# Or specify a version
pip install zai-sdk==0.2.2
```

**Verify installation**

```
import zai
print(zai.__version__)
```

**Basic Call**

```python theme={null}
from zai import ZaiClient

client = ZaiClient(api_key="")  # Enter your own APIKey

response = client.chat.completions.create(
    model="glm-4.6v",  # Enter the name of the model you want to call
    messages=[
        {
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cloudcovert-1305175928.cos.ap-guangzhou.myqcloud.com/%E5%9B%BE%E7%89%87grounding.PNG"
                    }
                },
                {
                    "type": "text",
                    "text": "Where is the second bottle of beer from the right on the table? Provide coordinates in [[xmin,ymin,xmax,ymax]] format"
                }
            ],
            "role": "user"
        }
    ],
    thinking={
        "type": "enabled"
    }
)
print(response.choices[0].message)
```

**Streaming Call**

```python theme={null}
from zai import ZaiClient

client = ZaiClient(api_key="")  # Enter your own APIKey

response = client.chat.completions.create(
    model="glm-4.6v",  # Enter the name of the model you want to call
    messages=[
        {
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cloudcovert-1305175928.cos.ap-guangzhou.myqcloud.com/%E5%9B%BE%E7%89%87grounding.PNG"
                    }
                },
                {
                    "type": "text",
                    "text": "Where is the second bottle of beer from the right on the table? Provide coordinates in [[xmin,ymin,xmax,ymax]] format"
                }
            ],
            "role": "user"
        }
    ],
    thinking={
        "type": "enabled"
    },
    stream=True
)

# Print reasoning tokens and answer tokens as they arrive
for chunk in response:
    if chunk.choices[0].delta.reasoning_content:
        print(chunk.choices[0].delta.reasoning_content, end='', flush=True)

    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='', flush=True)
```

**Install SDK**

**Maven**

```xml theme={null}
<dependency>
  <groupId>ai.z.openapi</groupId>
  <artifactId>zai-sdk</artifactId>
  <version>0.3.3</version>
</dependency>
```

**Gradle (Groovy)**

```groovy theme={null}
implementation 'ai.z.openapi:zai-sdk:0.3.3'
```

**Basic Call**

```java theme={null}
import ai.z.openapi.ZaiClient;
import ai.z.openapi.service.model.*;
import ai.z.openapi.core.Constants;
import java.util.Arrays;

public class GLM45VExample {
  public static void main(String[] args) {
    String apiKey = ""; // Enter your own APIKey
    ZaiClient client = ZaiClient.builder().ofZAI()
        .apiKey(apiKey)
        .build();

    ChatCompletionCreateParams request = ChatCompletionCreateParams.builder()
        .model("glm-4.6v")
        .messages(Arrays.asList(
            ChatMessage.builder()
                .role(ChatMessageRole.USER.value())
                .content(Arrays.asList(
                    MessageContent.builder()
                        .type("text")
                        .text("Describe this image.")
                        .build(),
                    MessageContent.builder()
                        .type("image_url")
                        .imageUrl(ImageUrl.builder()
                            .url("https://aigc-files.bigmodel.cn/api/cogview/20250723213827da171a419b9b4906_0.png")
                            .build())
                        .build()))
                .build()))
        .build();

    ChatCompletionResponse response = client.chat().createChatCompletion(request);

    if (response.isSuccess()) {
      Object reply = response.getData().getChoices().get(0).getMessage();
      System.out.println(reply);
    } else {
      System.err.println("Error: " + response.getMsg());
    }
  }
}
```

**Streaming Call**

```java theme={null}
import ai.z.openapi.ZaiClient;
import ai.z.openapi.service.model.*;
import ai.z.openapi.core.Constants;
import java.util.Arrays;

public class GLM45VStreamExample {
  public static void main(String[] args) {
    String apiKey = ""; // Enter your own APIKey
    ZaiClient client = ZaiClient.builder().ofZAI()
        .apiKey(apiKey)
        .build();

    ChatCompletionCreateParams request = ChatCompletionCreateParams.builder()
        .model("glm-4.6v")
        .messages(Arrays.asList(
            ChatMessage.builder()
                .role(ChatMessageRole.USER.value())
                .content(Arrays.asList(
                    MessageContent.builder()
                        .type("text")
                        .text("Where is the second bottle of beer from the right on the table? Provide coordinates in [[xmin,ymin,xmax,ymax]] format")
                        .build(),
                    MessageContent.builder()
                        .type("image_url")
                        .imageUrl(ImageUrl.builder()
                            .url("https://cloudcovert-1305175928.cos.ap-guangzhou.myqcloud.com/%E5%9B%BE%E7%89%87grounding.PNG")
                            .build())
                        .build()))
                .build()))
        .stream(true)
        .build();

    ChatCompletionResponse response = client.chat().createChatCompletion(request);

    if (response.isSuccess()) {
      response.getFlowable().subscribe(
          // Process streaming message data
          data -> {
            if (data.getChoices() != null && !data.getChoices().isEmpty()) {
              Delta delta = data.getChoices().get(0).getDelta();
              System.out.print(delta + "\n");
            }
          },
          // Process streaming response error
          error -> System.err.println("\nStream error: " + error.getMessage()),
          // Process streaming response completion event
          () -> System.out.println("\nStreaming response completed")
      );
    } else {
      System.err.println("Error: " + response.getMsg());
    }
  }
}
```

# GLM-5V-Turbo

Source: https://docs.z.ai/guides/vlm/glm-5v-turbo

## Overview

GLM-5V-Turbo is Z.AI's first **multimodal coding foundation model**, built for **vision-based coding tasks**. It can natively process multimodal inputs such as images, video, and text, while also excelling at long-horizon planning, complex coding, and action execution. **Deeply optimized for agent workflows**, it works seamlessly with agents such as Claude Code and OpenClaw to complete the full loop of "understand the environment → plan actions → execute tasks".
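As a rough sketch of how an image and a text instruction are combined in a single request, the helper below assembles a request body in the same shape as the Quick Start examples in this documentation. The function name `build_vision_request` and the placeholder image URL are hypothetical; only the field names (`model`, `messages`, `image_url`, `text`, `thinking`) are taken from those examples, and the code builds a dictionary without calling the API.

```python
import json


def build_vision_request(image_url, prompt, model="glm-5v-turbo", thinking=True):
    """Assemble a chat-completions request body with one image and one text part.

    Hypothetical helper: it only builds a dict and performs no network call.
    """
    content = [
        {"type": "image_url", "image_url": {"url": image_url}},
        {"type": "text", "text": prompt},
    ]
    body = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    if thinking:
        # Mirrors the "thinking" field used in the Quick Start examples.
        body["thinking"] = {"type": "enabled"}
    return body


if __name__ == "__main__":
    request_body = build_vision_request(
        "https://example.com/mockup.png",  # placeholder URL, not a real asset
        "Describe the layout of this page.",
    )
    print(json.dumps(request_body, indent=2))
```

The resulting dictionary could be serialized and sent as the JSON body of a `POST` to `https://api.z.ai/api/paas/v4/chat/completions` with a bearer token, as the curl examples do.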
* Positioning: Multimodal Coding Model
* Input Modalities: Video / Image / Text / File
* Output Modality: Text
* Context Length: 200K
* Maximum Output Tokens: 128K

## Capability

* Offers multiple thinking modes for different scenarios.
* Provides powerful vision understanding, with support for images, video, and files.
* Supports real-time streaming responses to improve the user interaction experience.
* Provides powerful tool invocation, enabling integration with various external toolsets.
* Uses an intelligent caching mechanism to optimize performance in long conversations.

## Usage

* Send a design mockup or reference image, and the model can directly understand the layout, color palette, component hierarchy, and interaction logic, then generate a complete runnable frontend project. For wireframes, it reconstructs structure and functionality; for high-fidelity designs, it aims for pixel-level visual consistency.
* Works with frameworks such as Claude Code to autonomously browse target websites, map page transitions, collect visual assets and interaction details, and directly generate code based on the exploration results, upgrading from "recreating from a screenshot" to "recreating through autonomous exploration."
* Accepts screenshots of buggy pages and automatically identifies rendering issues such as layout misalignment, component overlap, and color mismatches, helping locate frontend problems and generate fix code to improve debugging efficiency.
* After integrating GLM-5V-Turbo, OpenClaw can understand webpage layouts, GUI elements, and chart information, helping the agent handle complex real-world tasks that combine perception, planning, and execution.

## Resources

* [API Documentation](/api-reference/llm/chat-completion): Learn how to call the API.

## Introducing GLM-5V-Turbo

Across benchmarks for multimodal coding and agentic tasks, as well as pure-text coding, GLM-5V-Turbo delivers strong performance with a smaller model size.
In multimodal coding and agentic tasks, GLM-5V-Turbo achieved leading results across benchmarks for design-to-code generation, visual code generation, multimodal retrieval and question answering, and visual exploration. It also delivered strong performance on benchmarks such as AndroidWorld and WebVoyager, which evaluate an agent’s ability to operate in real GUI environments.

![Description](https://cdn.bigmodel.cn/markdown/1775044860800img_v3_0210b_c42d0355-22af-4be7-8582-16fc6e7d0bfg.png?attname=img_v3_0210b_c42d0355-22af-4be7-8582-16fc6e7d0bfg.png)

In pure-text coding, GLM-5V-Turbo maintained solid performance across the three core benchmarks in CC-Bench-V2 (Backend, Frontend, and Repo Exploration), showing that the addition of visual capabilities did not come at the expense of its text-only strengths. At the same time, the model also delivered strong results on PinchBench, ClawEval, and ZClawBench, which evaluate Lobster-style Agent task execution quality, further validating its overall capability in complex task execution scenarios.

![Description](https://cdn.bigmodel.cn/markdown/1774864152000img_v3_02109_6af1ad38-f794-43ec-9ac7-73a43628d7cg.png?attname=img_v3_02109_6af1ad38-f794-43ec-9ac7-73a43628d7cg.png)

GLM-5V-Turbo combines strong vision and coding capabilities while achieving leading performance with a smaller parameter size, powered by systematic upgrades across **model architecture, training methods, data construction, and tooling**:

* **Native Multimodal Fusion**: From pretraining through post-training, the model continuously strengthens visual-text alignment. Combined with the new CogViT vision encoder and an inference-friendly MTP architecture, it improves multimodal understanding and reasoning efficiency.
* **30+ Task Joint Reinforcement Learning**: During RL, the model is jointly optimized across 30+ task types, spanning STEM, grounding, video, GUI agents, and coding agents, resulting in more robust gains in perception, reasoning, and agentic execution.
* **Agentic Data and Task Construction**: To address the scarcity of agent data and the difficulty of verification, we built a multi-level, controllable, and verifiable data system, and injected agentic meta-capabilities during pretraining to strengthen action prediction and execution.
* **Expanded Multimodal Toolchain**: Adds multimodal tools such as box drawing, screenshots, and webpage reading (including image understanding), extending agent capabilities from pure text to visual interaction and supporting a more complete perception–planning–execution loop.

## Official Skills

Beyond vision-based coding and Claw-style tasks, GLM-5V-Turbo also shows major gains in a broader range of agentic scenarios, including multimodal search, deep research, GUI agents, and perceptual grounding. To support these use cases, we provide a set of official Skills.
* The ability to automatically analyze image content and generate natural-language descriptions; it can not only identify objects in an image, but also understand relationships between objects, scene atmosphere, and actions, turning them into accurate and fluent textual descriptions.
* The ability to precisely locate the corresponding object or region in an image based on a natural-language description; it establishes alignment between text and visual pixels, typically marking the target location with a bounding box, enabling more grounded interactions and assisting with fine-grained image analysis.
* The ability to understand and extract key information from user-provided documents (such as PDFs and Word files) and then generate text in a specified format; this ensures the output remains tightly grounded in the document content, making it useful for document interpretation, report generation, news writing, or proposal drafting.
* The ability to read candidate resumes and intelligently compare them against job requirements; it can quickly extract key information such as education, work experience, and skill tags, assess candidate-job fit, and provide rankings or recommendations, significantly improving recruiting efficiency.
* The ability to automatically generate high-quality, structured prompts based on reference images/videos and the user's intended goal; by understanding the content and characteristics of the image/video, refining wording, and adding relevant detail, it produces instructions that are easier for AI models to follow, leading to more accurate and higher-quality image/video generation results.

Additionally, based on the previously released specialized models GLM-OCR and GLM-Image, we have created five Skills to support a wider range of scenarios and tasks.

The above Skills are now available on ClawHub: [Install Now](https://clawhub.ai/jaredforreal/glm-master-skill)!
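Grounding output such as the `bbox_2d` lists in the detection example earlier in this documentation usually has to be mapped onto the original image before anything can be drawn. The following is a minimal sketch, assuming the reply is a JSON list of `{"label", "bbox_2d"}` objects and that boxes are `[xmin, ymin, xmax, ymax]` on a 0-1000 normalized grid. The grid size is an assumption that this text does not confirm, so check the API reference before relying on it; the function name `to_pixel_boxes` is hypothetical.

```python
import json


def to_pixel_boxes(reply_text, image_width, image_height, grid=1000):
    """Parse a JSON detection list and scale each box to pixel coordinates.

    Assumes boxes are [xmin, ymin, xmax, ymax] on a grid x grid scale
    (the default of 1000 is an assumption, not a documented guarantee).
    """
    detections = json.loads(reply_text)
    results = []
    for det in detections:
        xmin, ymin, xmax, ymax = det["bbox_2d"]
        results.append({
            "label": det["label"],
            "box": (
                round(xmin * image_width / grid),
                round(ymin * image_height / grid),
                round(xmax * image_width / grid),
                round(ymax * image_height / grid),
            ),
        })
    return results


if __name__ == "__main__":
    sample = '[{"label": "Tabby Cat-1", "bbox_2d": [100, 200, 300, 800]}]'
    for item in to_pixel_boxes(sample, image_width=2000, image_height=1000):
        print(item["label"], item["box"])
```

If the model turns out to return absolute pixel coordinates instead, passing `grid` equal to the image size along each axis is not possible with a single parameter, so the scaling step would simply be skipped.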
## Examples

![Description](https://cdn.bigmodel.cn/markdown/1774856146926image.png?attname=image.png)

> Please recreate the mobile pages based on the design mockups in the images. The left side shows the welcome page, and the center shows the homepage image. You will also need to create mockups for the remaining two pages.

![Description](https://cdn.bigmodel.cn/markdown/1774856692508image.png?attname=image.png)

![Description](https://cdn.bigmodel.cn/markdown/1774856720373image.png?attname=image.png)

![Description](https://cdn.bigmodel.cn/markdown/1774856745613image.png?attname=image.png)

![Description](https://cdn.bigmodel.cn/markdown/1774857139202d744c7be15c6e13a5f651c26f93866f8.png?attname=d744c7be15c6e13a5f651c26f93866f8.png)

![Description](https://cdn.bigmodel.cn/markdown/1774856815088image.png?attname=image.png)

![Description](https://cdn.bigmodel.cn/markdown/1774857357422795d3365-74a8-4cf9-944a-7fe904895f95_01.png?attname=795d3365-74a8-4cf9-944a-7fe904895f95_01.png)

> Create an academic website based on the content of this article.

![Description](https://cdn.bigmodel.cn/markdown/1774857049017image.png?attname=image.png)

![Description](https://cdn.bigmodel.cn/markdown/1774859348730A%20Very%20Big%20Video%20Reasoning%20Suite_01.png?attname=A+Very+Big+Video+Reasoning+Suite_01.png)

> Summarize the main content and key arguments of the paper.

```
Paper Summary: A Very Big Video Reasoning Suite (VBVR)

1. Overview

As video generation models (such as Sora and Veo) continue to improve rapidly in visual quality, their limitations in reasoning are becoming increasingly apparent. Existing video reasoning benchmarks are often relatively small in scale and lack systematic theoretical grounding. To address this gap, this paper introduces the VBVR (Very Big Video Reasoning) suite, which includes a large-scale dataset, VBVR-Dataset, and a verifiable evaluation framework, VBVR-Bench.
VBVR is designed to advance video reasoning models beyond pure “visual generation” toward capabilities grounded in “physical-world commonsense and logical reasoning” through a systematic task design based on cognitive architecture. Figure 1: Overview of VBVR. The left panel shows a representative task grid covering five cognitive faculties; the center panel compares the scale of VBVR with other datasets (2.015M samples); the radar chart in the lower-right corner shows model performance across different cognitive faculties. 2. VBVR-Dataset: A Large-Scale and Systematic Dataset 2.1 Cognitive Architecture and Task Design The core innovation of VBVR-Dataset lies in its task taxonomy, which is built on theories from human cognitive science (such as Kant and Anderson). The dataset decomposes video reasoning into five core cognitive faculties: Perception: extracting structured representations from sensory input. Transformation: operating on and composing mental representations. Spatiality: intuitive understanding of position and navigation. Abstraction: distilling general patterns from concrete experience. Knowledge: applying prior knowledge and logical rules. The dataset contains 200 tasks and approximately 2.015 million samples in total (1 million training samples and 1 million test samples). 2.2 Scale Comparison Compared with existing video reasoning benchmarks (such as Video-Zero-Shot and Ruler-Bench), VBVR-Dataset improves by an order of magnitude in both number of tasks and dataset size, providing sufficient “fuel” for model training. Table 1: Comparison between VBVR-Dataset and existing video reasoning benchmarks. VBVR far exceeds previous datasets in both task count (200) and sample size (2M+). 2.3 Task Examples VBVR features highly diverse task designs, ranging from simple geometric shape recognition to complex physical simulation and logical planning. 
The figure below shows sample tasks, including polygon recognition, pipe connection, grid navigation, maze solving, and sliding puzzle tasks. Figure 2: Sample tasks generated by the VBVR parametric task suite. Each sequence illustrates the reasoning process required to arrive at a valid solution, covering perception, spatial reasoning, and logical operations. 2.4 Data Generation Pipeline To ensure both data quality and scale, VBVR adopts a distributed parametric generation pipeline. Tasks are first rigorously designed and reviewed, then implemented through standardized generator templates, and finally generated and stored at scale in parallel using cloud services (AWS Lambda). Figure 3: System pipeline for VBVR data generation. Starting from cognitive architecture, tasks proceed through task design and generator implementation, and are ultimately written to S3 storage through distributed Lambda workers. 3. VBVR-Bench: A Verifiable Evaluation Framework 3.1 Evaluation Results VBVR-Bench provides a rule-based, reproducible, and interpretable evaluation framework. The research team conducted a comprehensive evaluation of current state-of-the-art video generation models, including open-source models such as CogVideoX and closed-source models such as Sora 2 and Veo 3.1. The results show that even the best-performing model (such as Sora 2) achieves an overall score of only around 0.546, far below human performance (0.974). This indicates that current models still face major challenges on video tasks requiring strict logical reasoning and physical consistency. Table 3: Benchmark results on VBVR-Bench. Shows model performance on in-domain and out-of-domain tasks. Bold indicates the best result, and underlining indicates the second best. 3.2 Human Alignment Analysis To validate the effectiveness of VBVR-Bench, the study conducted a large-scale human preference alignment analysis. 
The results show that VBVR’s automatic evaluation scores are highly correlated with human preference scores (correlation coefficient ρ>0.93), demonstrating that the benchmark accurately reflects models’ actual reasoning capabilities. Figure 4: Alignment analysis between VBVR-Bench evaluation results and human preferences. The left panel shows in-domain tasks, and the right panel shows out-of-domain tasks; both exhibit strong linear correlation. 4. Scaling Laws and Emergent Capabilities 4.1 The Effect of Data Scale The study further explores how data scale affects model reasoning ability. By fine-tuning the Wan2.2 model on VBVR-Dataset, the team found that as training data increased from 0 to 500K, model performance showed a steady upward trend across all metrics. This demonstrates the critical role of high-quality, large-scale reasoning data in improving model performance. Table 4: Model performance at different data scales (VBVR-Wan2.2). As the amount of training data increases, the model’s scores on both in-domain and out-of-domain tasks improve significantly. 4.2 Qualitative Analysis and Emergent Behaviors In qualitative analysis, the model fine-tuned on VBVR data (VBVR-Wan2.2) demonstrated more controllable execution than Sora 2. For example, on tasks such as “delete a specific symbol” or “precisely rotate an object,” VBVR-Wan2.2 was able to follow instructions strictly, whereas Sora 2 was more prone to object deformation or execution failure. In addition, the study observed several emergent behaviors, such as the model spontaneously choosing a completion strategy (Self-chosen completion policy) or providing “rationalized” interpretations of a scene (Rationalizing), suggesting that large-scale reasoning training may unlock deeper cognitive capabilities. Figure 6: Qualitative comparison between VBVR-Wan2.2 and Sora 2. 
Panel A shows VBVR-Wan2.2’s advantages on controlled execution tasks; Panel B shows an emergent self-chosen completion policy; Panel C shows the model’s limitations when handling complex constraints. 5. Conclusion By providing the largest and most systematic video reasoning dataset and evaluation benchmark to date, the VBVR suite reveals the shortcomings of current video generation models in logical reasoning and validates the feasibility of improving model capabilities through more high-quality reasoning data. This lays the foundation for future development of general-purpose video agents with genuine physical-world commonsense and logical reasoning abilities. ``` ## Quick Start **Basic Call** ```bash theme={null} curl -X POST \ https://api.z.ai/api/paas/v4/chat/completions \ -H "Authorization: Bearer your-api-key" \ -H "Content-Type: application/json" \ -d '{ "model": "glm-5v-turbo", "messages": [ { "role": "user", "content": [ { "type": "image_url", "image_url": { "url": "https://cloudcovert-1305175928.cos.ap-guangzhou.myqcloud.com/%E5%9B%BE%E7%89%87grounding.PNG" } }, { "type": "text", "text": "Where is the second bottle of beer from the right on the table? Provide coordinates in [[xmin,ymin,xmax,ymax]] format" } ] } ], "thinking": { "type":"enabled" } }' ``` **Streaming Call** ```bash theme={null} curl -X POST \ https://api.z.ai/api/paas/v4/chat/completions \ -H "Authorization: Bearer your-api-key" \ -H "Content-Type: application/json" \ -d '{ "model": "glm-5v-turbo", "messages": [ { "role": "user", "content": [ { "type": "image_url", "image_url": { "url": "https://cloudcovert-1305175928.cos.ap-guangzhou.myqcloud.com/%E5%9B%BE%E7%89%87grounding.PNG" } }, { "type": "text", "text": "Where is the second bottle of beer from the right on the table? 
Provide coordinates in [[xmin,ymin,xmax,ymax]] format" } ] } ], "thinking": { "type":"enabled" }, "stream": true }' ``` **Install SDK** ```bash theme={null} # Install the latest version pip install zai-sdk # Or specify a version pip install zai-sdk==0.2.2 ``` **Verify installation** ``` import zai print(zai.__version__) ``` **Basic Call** ```python theme={null} from zai import ZaiClient client = ZaiClient(api_key="") # Enter your own APIKey response = client.chat.completions.create( model="glm-5v-turbo", # Enter the name of the model you want to call messages=[ { "content": [ { "type": "image_url", "image_url": { "url": "https://cloudcovert-1305175928.cos.ap-guangzhou.myqcloud.com/%E5%9B%BE%E7%89%87grounding.PNG" } }, { "type": "text", "text": "Where is the second bottle of beer from the right on the table? Provide coordinates in [[xmin,ymin,xmax,ymax]] format" } ], "role": "user" } ], thinking={ "type":"enabled" } ) print(response.choices[0].message) ``` **Streaming Call** ```python theme={null} from zai import ZaiClient client = ZaiClient(api_key="") # Enter your own APIKey response = client.chat.completions.create( model="glm-5v-turbo", # Enter the name of the model you want to call messages=[ { "content": [ { "type": "image_url", "image_url": { "url": "https://cloudcovert-1305175928.cos.ap-guangzhou.myqcloud.com/%E5%9B%BE%E7%89%87grounding.PNG" } }, { "type": "text", "text": "Where is the second bottle of beer from the right on the table? 
Provide coordinates in [[xmin,ymin,xmax,ymax]] format"
                }
            ],
            "role": "user"
        }
    ],
    thinking={
        "type":"enabled"
    },
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.reasoning_content:
        print(chunk.choices[0].delta.reasoning_content, end='', flush=True)

    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='', flush=True)
```

**Install SDK**

**Maven**

```xml theme={null}
<dependency>
    <groupId>ai.z.openapi</groupId>
    <artifactId>zai-sdk</artifactId>
    <version>0.3.3</version>
</dependency>
```

**Gradle (Groovy)**

```groovy theme={null}
implementation 'ai.z.openapi:zai-sdk:0.3.3'
```

**Basic Call**

```java theme={null}
import ai.z.openapi.ZaiClient;
import ai.z.openapi.service.model.*;
import ai.z.openapi.core.Constants;
import java.util.Arrays;

public class GLM45VExample {
    public static void main(String[] args) {
        String apiKey = ""; // Enter your own APIKey
        ZaiClient client = ZaiClient.builder().ofZAI()
            .apiKey(apiKey)
            .build();

        ChatCompletionCreateParams request = ChatCompletionCreateParams.builder()
            .model("glm-5v-turbo")
            .messages(Arrays.asList(
                ChatMessage.builder()
                    .role(ChatMessageRole.USER.value())
                    .content(Arrays.asList(
                        MessageContent.builder()
                            .type("text")
                            .text("Describe this image.")
                            .build(),
                        MessageContent.builder()
                            .type("image_url")
                            .imageUrl(ImageUrl.builder()
                                .url("https://aigc-files.bigmodel.cn/api/cogview/20250723213827da171a419b9b4906_0.png")
                                .build())
                            .build()))
                    .build()))
            .build();

        ChatCompletionResponse response = client.chat().createChatCompletion(request);

        if (response.isSuccess()) {
            Object reply = response.getData().getChoices().get(0).getMessage();
            System.out.println(reply);
        } else {
            System.err.println("Error: " + response.getMsg());
        }
    }
}
```

**Streaming Call**

```java theme={null}
import ai.z.openapi.ZaiClient;
import ai.z.openapi.service.model.*;
import ai.z.openapi.core.Constants;
import java.util.Arrays;

public class GLM45VStreamExample {
    public static void main(String[] args) {
        String apiKey = ""; // Enter your own APIKey
        ZaiClient client = ZaiClient.builder().ofZAI()
            .apiKey(apiKey)
            .build();
ChatCompletionCreateParams request = ChatCompletionCreateParams.builder() .model("glm-5v-turbo") .messages(Arrays.asList( ChatMessage.builder() .role(ChatMessageRole.USER.value()) .content(Arrays.asList( MessageContent.builder() .type("text") .text("Where is the second bottle of beer from the right on the table? Provide coordinates in [[xmin,ymin,xmax,ymax]] format") .build(), MessageContent.builder() .type("image_url") .imageUrl(ImageUrl.builder() .url("https://cloudcovert-1305175928.cos.ap-guangzhou.myqcloud.com/%E5%9B%BE%E7%89%87grounding.PNG") .build()) .build())) .build())) .stream(true) .build(); ChatCompletionResponse response = client.chat().createChatCompletion(request); if (response.isSuccess()) { response.getFlowable().subscribe( // Process streaming message data data -> { if (data.getChoices() != null && !data.getChoices().isEmpty()) { Delta delta = data.getChoices().get(0).getDelta(); System.out.print(delta + "\n"); }}, // Process streaming response error error -> System.err.println("\nStream error: " + error.getMessage()), // Process streaming response completion event () -> System.out.println("\nStreaming response completed") ); } else { System.err.println("Error: " + response.getMsg()); } } } ``` # GLM-OCR Source: https://docs.z.ai/guides/vlm/glm-ocr ## Overview GLM-OCR is a lightweight professional OCR model with parameters as small as 0.9B, yet it achieves state-of-the-art performance across multiple capabilities. It sets a new benchmark for document parsing with its “small size and high accuracy.” Key features include: * **Performance SOTA**: Scored 94.62 points to top OmniDocBench V1.5 and achieved current best performance across **multiple mainstream document understanding benchmarks** including tables and formulas at launch. * **Optimized for Real-World Scenarios**: Delivers stable, leading accuracy in complex environments like code documentation, intricate tables, and stamp recognition. 
Maintains exceptional recognition precision even with complex layouts, diverse fonts, or mixed text-image content. * **Efficient and Cost-Effective**: With just 0.9B parameters, supports VLLM and SGLang deployment, significantly reducing inference latency and computational overhead. - PDF, images (JPG, PNG) - Single image ≤ 10MB, PDF ≤ 50MB - Maximum support: 100 pages Text / Image Links / MD Documents Support Chinese, English, French, Spanish, Russian, German, Japanese, Korean, etc. For detailed pricing information on GLM-OCR, please visit the [Pricing Page](/guides/overview/pricing). ## Usage Recognize text content from photos, screenshots, documents, and scans, supporting printed text, handwriting, and mathematical formulas. Applicable to diverse scenarios including education, research, and office work. Identify table structures and content, converting them into HTML-formatted sequences. Suitable for scenarios involving table data entry, conversion, and editing. Extract key information from various cards, certificates, receipts, and forms, outputting structured JSON data. Supports applications in banking, insurance, government services, legal, logistics, and other industries. Support high-volume document recognition and parsing with high accuracy and standardized output formats, providing a robust foundation for RAG. ## Resources * [API Documentation](/api-reference/tools/layout-parsing): Learn how to call the API. ## Introducing GLM-OCR Thanks to its proprietary CogViT visual encoder and deep scene optimization, GLM-OCR achieves “compact size, high accuracy.” With only 0.9B parameters, GLM-OCR achieved SOTA on the authoritative document parsing benchmark OmniDocBench V1.5 with a score of 94.6. It outperforms multiple specialized OCR models across four key domains—text, formula, table recognition, and information extraction—with performance approaching that of Gemini-3-Pro. 
![Description](https://cdn.bigmodel.cn/markdown/1770048309140img_v3_02uh_d2a8a208-0969-4c06-9a14-fa1d6aa705dg.png?attname=img_v3_02uh_d2a8a208-0969-4c06-9a14-fa1d6aa705dg.png) Beyond public benchmarks, we conducted internal evaluations across six core real-world scenarios. Results show GLM-OCR delivers significant advantages across dimensions including code documentation, real-world tables, handwriting, multilingual text, seal recognition, and invoice extraction. ![Description](https://cdn.bigmodel.cn/markdown/1770048316118img_v3_02uh_c048a7e7-327c-4591-a620-b04113f6acfg.png?attname=img_v3_02uh_c048a7e7-327c-4591-a620-b04113f6acfg.png) For speed, we compared different OCR methods under identical hardware and testing conditions (single replica, single concurrency), evaluating their performance in parsing and exporting Markdown files from both image and PDF inputs. Results show GLM-OCR achieves a throughput of 1.86 pages/second for PDF documents and 0.67 images/second for images, significantly outperforming comparable models. ![Description](https://cdn.bigmodel.cn/markdown/1770038419131img_v3_02uh_8dd8ba6c-3ba0-4a13-9894-53700c931ffg.png?attname=img_v3_02uh_8dd8ba6c-3ba0-4a13-9894-53700c931ffg.png) Pricing is uniform for both API input and output, costing just \$0.03 per million tokens. 
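As a rough illustration of the flat pricing above, the cost of a request scales linearly with its total token count. The sketch below assumes the quoted rate of \$0.03 per million tokens applies equally to input and output, as stated; the helper name `estimate_glm_ocr_cost` and the sample token counts are hypothetical and not part of the Z.AI SDK.

```python
from decimal import Decimal

# Flat rate quoted above: $0.03 per one million tokens, input and output alike.
PRICE_PER_MILLION_TOKENS = Decimal("0.03")


def estimate_glm_ocr_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: estimated USD cost of one GLM-OCR request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total_tokens = input_tokens + output_tokens
    return PRICE_PER_MILLION_TOKENS * total_tokens / Decimal(1_000_000)


if __name__ == "__main__":
    # Illustrative token counts only, not measured values.
    print(estimate_glm_ocr_cost(input_tokens=1_200_000, output_tokens=800_000))
```

`Decimal` is used instead of `float` so that small per-request amounts do not accumulate binary rounding error when summed over many documents. Actual billing should always be confirmed against the Pricing Page.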
## Examples

![Description](https://cdn.bigmodel.cn/markdown/1770035979049image.png?attname=image.png)

![Description](https://cdn.bigmodel.cn/markdown/1770036049307image.png?attname=image.png)

![Description](https://cdn.bigmodel.cn/markdown/1770036076806image.png?attname=image.png)

![Description](https://cdn.bigmodel.cn/markdown/1770036127560image.png?attname=image.png)

![Description](https://cdn.bigmodel.cn/markdown/1770036142376image.png?attname=image.png)

![Description](https://cdn.bigmodel.cn/markdown/1770036153340image.png?attname=image.png)

![Description](https://cdn.bigmodel.cn/markdown/1770036167626image.png?attname=image.png)

![Description](https://cdn.bigmodel.cn/markdown/1770036178291image.png?attname=image.png)

## Quick Start

```bash theme={null}
curl --location --request POST 'https://api.z.ai/api/paas/v4/layout_parsing' \
--header 'Authorization: Bearer your-api-key' \
--header 'Content-Type: application/json' \
--data-raw '{
  "model": "glm-ocr",
  "file": "https://cdn.bigmodel.cn/static/logo/introduction.png"
}'
```

**Install SDK**

```bash theme={null}
# Install the latest version
pip install zai-sdk

# Or specify a version
pip install zai-sdk==0.2.2
```

**Verify installation**

```python theme={null}
import zai
print(zai.__version__)
```

**Basic Call**

```python theme={null}
from zai import ZaiClient

# Initialize client
client = ZaiClient(api_key="your-api-key")

image_url = "https://cdn.bigmodel.cn/static/logo/introduction.png"

# Call layout parsing API
response = client.layout_parsing.create(
    model="glm-ocr",
    file=image_url
)

# Output result
print(response)
```

**Install SDK**

**Maven**

```xml theme={null}
<dependency>
    <groupId>ai.z.openapi</groupId>
    <artifactId>zai-sdk</artifactId>
    <version>0.3.3</version>
</dependency>
```

**Gradle (Groovy)**

```groovy theme={null}
implementation 'ai.z.openapi:zai-sdk:0.3.3'
```

**Basic Call**

```java theme={null}
import ai.z.openapi.ZaiClient;
import ai.z.openapi.service.layoutparsing.LayoutParsingCreateParams;
import ai.z.openapi.service.layoutparsing.LayoutParsingResponse;
import
ai.z.openapi.service.layoutparsing.LayoutParsingResult;

public class LayoutParsing {
    public static void main(String[] args) {
        // Initialize client
        ZaiClient client = ZaiClient.builder()
            .ofZAI()
            .apiKey("your-api-key")
            .build();

        String model = "glm-ocr";
        String file = "https://cdn.bigmodel.cn/static/logo/introduction.png";

        // Create layout parsing request
        LayoutParsingCreateParams params = LayoutParsingCreateParams.builder()
            .model(model)
            .file(file)
            .build();

        // Send request
        LayoutParsingResponse response = client.layoutParsing().layoutParsing(params);

        // Handle response
        if (response.isSuccess()) {
            System.out.println("Parsing result: " + response.getData());
        } else {
            System.err.println("Error: " + response.getMsg());
        }
    }
}
```

# FAQ
Source: https://docs.z.ai/help/faq

#### How does caching work in GLM-4.5?

The caching mechanism is currently in open beta. Its underlying logic (including but not limited to how cache hits are triggered and how long cached content is retained) has therefore not yet been announced. Part of your request content may be cached on our cloud platform, and storing cached content is currently free of charge. If a request triggers a cache hit, the cached content is billed at 1/5 of the original price.

#### Why hasn’t my account balance changed after I used the API?

* Our billing system currently has a processing delay, so there is no need to worry if your account balance appears unchanged right after you call the API.
* The billing history reflects daily consumption records and therefore displays the billing status from the previous day (n-1).
* Current-day consumption is not immediately visible in the billing details.

#### How can I check the rate limits?

You can check your rate limits here: [https://z.ai/manage-apikey/rate-limits](https://z.ai/manage-apikey/rate-limits)

#### How should I recharge?
You can recharge from the billing page: [https://z.ai/manage-apikey/billing](https://z.ai/manage-apikey/billing)

#### Why can't I recharge when using a credit card?

When recharging with a credit card, please make sure the card does not use 3DS verification. Our platform does not currently support 3DS verification.

# Privacy Policy
Source: https://docs.z.ai/legal-agreement/privacy-policy

**Last Update:** September 29, 2025

Thank you for using Z.ai!

JINGSHENG HENGXING TECHNOLOGY PTE.LTD, along with its affiliates (collectively referred to as "we", "our", or "us"), is committed to respecting your privacy and diligently protecting any information we gather from or about you. This Privacy Policy outlines how we process Personal Data that we collect from or about you in connection with your use of our websites, applications, and other services on Z.ai (together referred to as the "Services").

This Privacy Policy only applies to individual users and does not apply to content that we process on behalf of customers of our business offerings. If you are an enterprise or developer using the API Services ("API Services") available on Z.ai, please refer to the Data Processing Addendum for API Services.

## 1. Data Controller

The Services are provided and controlled by JINGSHENG HENGXING TECHNOLOGY PTE.LTD with its registered address at 10 ANSON ROAD, #26-03, INTERNATIONAL PLAZA, SINGAPORE 0759903. JINGSHENG HENGXING TECHNOLOGY PTE.LTD is the controller and is responsible for the processing of your Personal Data as described in this Privacy Policy.

If you have any questions about how we use your personal data, please contact [user\_feedback@z.ai](mailto:user_feedback@z.ai).

## 2. What Personal Data We Collect

### 2.1 We collect personal data relating to you ("Personal Data") as follows:

**Personal Data You Provide to Us Directly**

We collect Personal Data if you create an account to use our Services or communicate with us as follows:

* **Account information.** We collect information that you provide when you set up an account, such as your date of birth (where applicable), username (where applicable), email address, and password.
* **Communication Information.** When you contact us, we collect the information you send us, such as proof of identity or age, contact details, feedback or inquiries and the contents of any messages you send.
* **User Content.** This includes any text prompts, images, or other data you input. This information is processed in real-time to provide you with the Service.

**Personal Data We Receive from Your Use of the Services:**

We automatically collect certain information from you when you use the Services, including internet or other network activity information such as your IP address, unique device identifiers, and cookies.

* **Device and Network Information.** In accordance with the permissions set on your device or browser, your device or browser automatically provides us with information regarding the timing and manner in which you install, access, or use our Services. This information may include details such as your device type, operating system, browser data and referring web pages, mobile network and connection details, mobile carrier or internet service provider (ISP), time zone configuration, IP address (including location information inferred from your IP address), and various identifiers (such as device or advertising identifiers, probabilistic identifiers, and other unique personal or online identifiers).
* **Usage Information.** We collect information about your use of the Services, such as the dates and times of access, browsing history, information about the links you click, pages you view, and other information about how you use the Services, and technology on the devices you use to access the Services.
* **Log Information.** We collect information about how our Services are performing when you use them. This information includes log files. In the event that you or your device encounters an error, we may collect details about the error, the time it occurred, the feature in use at that moment, the application's state when the error happened, and any communications or content present at the time of the error.
* **Cookies & Similar Technologies.** We use cookies and similar tracking technologies to operate and provide the Service. For example, we use cookies to remember your language preferences, and for security purposes. We will obtain your consent to our use of cookies where required by law. For more information about cookies and instructions on how to set up your browser to accept, delete or disable cookies, see Section 10.

**Personal Data We Receive from Other Sources**

We may receive the information described in this Privacy Policy from other sources, such as:

* **Log-in, Sign-up, or Linked Services.** Where available, if you choose to sign up or log in to the Services using a third-party service such as Google or GitHub, or link your account to a third-party service, we may collect information from the service, such as an access token.
* **Security Information.** We receive information from our trusted partners, such as security partners, to protect against fraud, abuse, and other security threats to our Services.
* **Public Information.** We may obtain publicly available information from Internet sources in order to train our models and provide services.

## 3. How We Use Your Personal Data

We use your information to operate, provide, develop, and improve the Service, including for the following purposes set out in the table below. We may also aggregate or anonymize Personal Data in such a way that it no longer identifies you and use this data for the purposes mentioned above, such as analyzing how our Services are being utilized, enhancing and adding new features, and conducting research. We will keep and use the anonymized data in its de-identified form and will not attempt to re-identify it, unless required by law.

| Purpose | Type of Personal Data processed, depending on the processing activity | Legal basis, depending on the processing activity |
| --- | --- | --- |
| To provide, analyze, and maintain our Services | Account Information<br />Communication Information<br />Data We Receive from Other Sources<br />Log Information<br />Usage Information<br />Device and Network Information<br />Cookies and Similar Technologies<br />Payment Personal Data | Where necessary to perform a contract with you, such as processing a user's prompts to provide a response. |
| To improve and develop our Services and conduct research | Account Information<br />Communication Information<br />Data We Receive from Other Sources<br />Log Information<br />Usage Information<br />Device and Network Information<br />Cookies and Similar Technologies | Where necessary for our legitimate interests and those of third parties and broader society, including in developing, improving, or promoting our Services, such as when we train and improve our models. |
| To communicate with you, including to send you information about our Services and events | Account Information<br />Communication Information<br />Log Information<br />Usage Information<br />Device and Network Information<br />Cookies and Similar Technologies | Where necessary to perform a contract with you, such as processing your contact information to send you a technical announcement about the Services.<br /><br />Your consent when we ask for it to process your Personal Data for a specific purpose that we communicate to you, such as processing your contact information to send you certain forms of marketing communications. |
| To prevent fraud, illegal activity, or misuses of our Services, and to protect the security of our systems and Services | Account Information<br />Communication Information<br />Data We Receive from Other Sources<br />Log Information<br />Usage Information<br />Device and Network Information<br />Cookies and Similar Technologies<br />Payment Personal Data | Where necessary to comply with a legal obligation.<br /><br />Where we are not under a specific legal obligation, where necessary for our legitimate interests and those of third parties, including in protecting our Services from abuse, fraud, or security risks, such as processing data from security partners to protect against fraud, abuse and security threats in our Services. |
| To comply with legal obligations and to protect the rights, privacy, safety, or property of our users, Z.ai, or third parties | Account Information<br />Communication Information<br />Data We Receive from Other Sources<br />Log Information<br />Usage Information<br />Device and Network Information<br />Cookies and Similar Technologies<br />Payment Personal Data | Where necessary to comply with a legal obligation, such as retaining transaction information to comply with record-keeping obligations.<br /><br />Where we are not under a specific legal obligation, where necessary for our legitimate interests and those of third parties and broader society, including in protecting our or our affiliates', users', or third parties' rights, safety, and property, such as analyzing log data to identify fraud and abuse in our Services. |

## 4. How We Share Your Personal Data

We may disclose your personal data to the following categories of recipients:

* **Affiliates:** our affiliated companies in the Z.ai Group and/or their designated service providers, who provide data processing services necessary to provide you with our Services;
* **Vendors and Service Providers:** third-party vendors and service provider partners who provide data processing services to us as necessary to provide you with our Services, or assist us in delivering our products, services, websites and platforms as well as improving and optimising the same, or who otherwise process personal data for purposes described in this Privacy Policy. Specifically, we may use analytics service providers, such as Google Analytics or other analytics platforms, that collect pseudonymized or anonymized technical and interaction data, such as device type, operating system, browser version, IP address (with anonymization applied, where applicable), page views, session duration, and on-site event interactions (e.g., link or button clicks). Such information helps us understand usage patterns, enhance user experience, and improve our website and platform functionality.
* **Business Transfers:** any business partner, investor, assignee or transferee (actual or prospective) to facilitate business asset transactions (which may extend to any merger, acquisition or any debt or asset sale) involving any of the Companies; * **Government Authorities or Other Third Parties:** any competent law enforcement body, regulatory, government agency, court or other third party (such as our professional advisers) where we believe disclosure is necessary (i) as a matter of applicable law or regulation, (ii) to exercise, establish or defend our legal rights or so a third party can defend theirs, or (iii) to protect your vital interests or those of any other person; and * **With your consent:** any other person, with your consent to the disclosure. ## 5. Your Rights and Choices Depending on where you live, you may have certain rights with respect to your personal information, such as the right to know how we collect and use your personal information. You may also have the right to access, change, oppose, request a copy of your authorization, file complaints before the competent authorities, withdraw your consent, or limit our collection and use of your personal information as well as to request that we delete it, and potentially others. In certain circumstances, you can also ask us to provide additional information about our collection and use of your personal information. Please note that your exercise of certain rights may impact your ability to use some or all of Services' features and functionalities. * **Right to know:** the right to know what personal data we process about you, including the categories of personal data, the categories of sources from which it is collected, the business or commercial purposes for collection, and the categories of third parties to whom we disclose it. * **Access & data portability:** the right to request a copy of the personal data we process about you, subject to certain exceptions and conditions. 
In certain cases and subject to applicable law, you have the right to port your information. * **Deletion:** the right to request that we delete personal data collected from you when you use our Services, subject to certain exceptions. You also are able to delete individual conversations, which will be removed immediately from your conversation history and automatically deleted from our back-end. * **Correction:** the right to request that we correct inaccurate personal data we retain about you, subject to certain exceptions. Please note that we cannot guarantee the factual accuracy of Outputs. If Outputs contain factually inaccurate personal data relating to you, you can submit a correction request and we will make a reasonable effort to correct this information—but due to the technical complexity of our large language models, it may not always be possible for us to do so. * **Objection:** the right to object to processing of your personal data, including profiling conducted on grounds of public or legitimate interest. In places where such a right applies, we will no longer process the personal data in case of such objection unless we demonstrate compelling legitimate grounds for the processing which override your interests, rights, and freedoms, or for the establishment, exercise or defense of legal claims. * **Restriction:** the right to restrict our processing of your personal data in certain circumstances. * **Withdrawal of consent.** Where our processing of your personal data is based on consent, you have the right to withdraw your consent. The withdrawal of consent will not affect the lawfulness of processing based on consent before its withdrawal. 
* **Automated decision-making:** We do not engage in decision making based solely on automated processing or profiling in a manner which produces a legal effect (i.e., impacts your legal rights) or significantly affects you in a similar way (e.g., significantly affects your financial circumstances or ability to access essential goods or services). Please contact us by using the contact information provided in this Privacy Policy if you would like to exercise any of your rights. We will respond to your request consistent with applicable law and subject to proper verification. You can control and access some of your personal information directly through settings. For example, you can manage your chat history. Should you choose to do so, you may also delete your chat history via your settings. If you choose to delete your account, you will not be able to reactivate your account or retrieve any of the content or information in connection with your account. ## 6. The Security of Your Personal Data The security of your personal data is a priority for us. We implement commercially reasonable technical, administrative, and physical security measures to protect your personal data from unauthorized access, theft, disclosure, alteration, or loss. We regularly assess our security practices to incorporate new technologies and methods. However, please note that no Internet or email transmission is completely secure or error-free. Specifically, emails sent to or from us may not be fully secure. As a result, we advise you to exercise caution when deciding what information to send to us via the Services or email. Additionally, we are not responsible for any bypassing of privacy settings or security measures on the Service or third-party websites. ## 7. How Long Do We Keep Your Personal Data We retain personal data for as long as it is needed to provide our Services and for the other purposes outlined in this Privacy Policy. 
Additionally, we retain personal data when required to fulfill contractual and legal obligations, when we have a legitimate business interest (such as improving and developing our Services and enhancing their safety, security, and stability), and for the exercise or defense of legal claims. Retention periods vary based on the volume, type, and sensitivity of the personal data, the purposes for which we use it, and any legal requirements, among other factors. For instance, when we process your personal data to provide you with the Services, we retain it as long as you have an account. This includes your account information and input. If you violate any of our terms, policies, or guidelines, we may immediately remove your account information and input from public view, but we will keep the personal data as needed to address the violation. When the personal data collected is no longer required by us, we and our service providers will perform the necessary procedures for destroying, deleting, erasing, or converting it into an anonymous form as permitted or required under applicable laws. ## 8. Where We Store Your Personal Data Depending on your location and the features of our Services that you access or use, your personal data may be transferred and processed outside the jurisdiction in which you reside, including to an affiliated company within the Z.ai Group or a third party (as outlined in section 4) located overseas. We generally provide the Services from Singapore, and our group companies and their designated service providers are typically located in Singapore. As a result, your personal data is generally processed in Singapore. We implement appropriate safeguards to ensure that your personal data remains protected in line with this Privacy Policy and applicable laws, and we only transfer your data abroad in compliance with applicable laws and legally recognized transfer mechanisms. ## 9.
Personal Data relating to children Our Services are not directed to, or intended for, individuals under 18. We do not knowingly collect Personal Data from children under 13. If you have reason to believe that a child under 13 has provided Personal Data to us through the Services, please contact us via the contact details set out below. We will investigate any notification and, if appropriate, delete the Personal Data from our systems. Users under 18 must have permission from their parent or guardian to use our Services. ## 10. How Do We Use Cookies and Similar Technologies ### 10.1 Cookies and similar technologies we use We may use cookies and similar technologies to gather information resulting from access and/or use of our Services. The technologies we use may include: I. **Cookies or Browser Cookies:** Cookies are small text files stored in your computing or other electronic devices when you visit our website and platforms for record keeping purposes. Cookies can allow us to provide you with a personalized user experience, such as by allowing us to retrieve information which you previously provided us. Cookies can be "Persistent" or "Session" Cookies. Persistent Cookies may remain on your personal computer or mobile device across browsing sessions, while Session Cookies are generally deleted as soon as you close your web browser. We may also use cookies and similar tracking technologies to analyze usage trends, administer the website, monitor user interactions within the Services, and gain insights into how users engage with our offerings. To support these purposes, we may automatically collect certain technical information, which helps us understand aggregate usage patterns and optimize our Services.
This information may include your Internet Protocol (IP) address, browser type, Internet Service Provider (ISP), referring and exit pages, files accessed on our website (e.g., HTML pages and graphics), operating system, date and time stamps, and clickstream data. II. **Web Beacons:** Certain sections of our Services may contain small electronic files known as web beacons (also referred to as clear GIFs, pixel tags, and single-pixel GIFs) that permit us, for example, to count users who have visited those pages and understand how our Services are used (for example, to assess the popularity of certain pages and services) and to improve the performance and reliability of our Services. ### 10.2 Clear/disable cookies You can manage your cookie preferences or enable cookie-related features through the settings available in commonly used web browsers. We will respect your choices regarding the use of cookies and similar technologies in accordance with applicable laws and your expressed preferences. Please note that certain features of our Services rely on the use of cookies. If you choose to disable or delete cookies, or do not consent to their use, some parts of the Services may not function properly or as intended. For guidance on how to manage cookies, please refer to the help documentation provided by your browser. For your convenience, we have included links to cookie management instructions for popular browsers. However, we are not responsible for the accuracy, reliability, or completeness of any third-party content linked therein. * [Microsoft Internet Explorer](#) * [Microsoft Edge](#) * [Mozilla Firefox](#) * [Google Chrome](#) * [Safari for macOS](#) * [Safari for iOS](#) ### 10.3 More information about cookies For more information about cookies and instructions on how to set up your browser to accept, delete or disable cookies, see [www.allaboutcookies.org](http://www.allaboutcookies.org). ## 11.
Privacy Policy Updates We may update this Privacy Policy from time to time. When we do, we will publish an updated version and effective date on this page, unless another type of notice is required by applicable law. We recommend that you review the Privacy Policy each time you access our Services to stay informed of our privacy practices. ## 12. Contact Us Questions, comments and requests regarding this policy should be addressed to [user\_feedback@z.ai](mailto:user_feedback@z.ai). ## Data Processing Addendum for API Services This Data Processing Addendum ("DPA") governs the processing of the data provided by business and enterprise users ("Customers") through API Services offered by JINGSHENG HENGXING TECHNOLOGY PTE.LTD, along with its affiliates (collectively referred to as "Company") on the Z.ai platform ("Z.ai") and is hereby incorporated into the Additional Terms for API Services. Company and Customer each agree to comply with their respective obligations under applicable data privacy and data protection laws (collectively, "Data Protection Laws") in connection with the API Services. Customer is the person that determines the purposes and means for which Customer Data (as defined below) is processed (a "Data Controller"), whereas Company processes Customer Data in accordance with the Data Controller's instructions and on behalf of the Data Controller (as a "Data Processor"). "Data Controller" and "Data Processor" also mean the equivalent concepts under Data Protection Laws. For the purposes of this DPA, (i) "Personal Data" has the meaning assigned to the term "personal data" or "personal information" under applicable Data Protection Laws; and (ii) "Customer Data" means Personal Data that Customer provides to Company that Company processes on behalf of Customer to provide the Services.
Company will process Customer Data as Customer's Data Processor to provide or maintain the Services and for the purposes set forth in this DPA, the Terms and/or in any other applicable agreements between Customer and Company. ### 1. Obligations of the Company As a Data Processor, the Company agrees to: a) Process Customer Data only (i) on Customer's behalf for the purpose of providing and supporting API Services; (ii) in compliance with the written instructions received from Customer; and (iii) in a manner that provides no less than the level of privacy protection required by Data Protection Laws. b) Adopt reasonable and appropriate organizational and institutional measures to safeguard the security of Customer Data in accordance with the requirements under Data Protection Laws. c) Provide reasonable assistance with requests from Customer to fulfill its legal obligations in accordance with Data Protection Laws. d) Inform Customer if Company receives any complaint or request (in particular, requests for access to, rectification or blocking of Customer Data) directly from Customer's data subjects. e) Inform Customer if Company receives any legally binding request to disclose Customer Data to a law enforcement authority unless otherwise prohibited by laws. ### 2. Obligations of the Customer a) Customer shall comply with all Data Protection Laws. b) Customer represents, warrants and covenants that it has and shall maintain throughout the term all necessary rights, consents and authorizations to provide the Customer Data to Company and to authorize Company to use, disclose, retain and otherwise process Customer Data as contemplated by this DPA, the Terms and/or other processing instructions provided to Company. c) Customer shall be solely responsible for performing any of its obligations with regard to any requests from Customer's data subjects. d) Customer shall not provide Customer Data to Company except through agreed mechanisms. ### 3.
International Data Transfer a) Company generally provides the Services from Singapore, and our group companies and their designated service providers are typically located in Singapore. As a result, Customer Data is generally processed in Singapore. b) Company implements appropriate safeguards to ensure that Customer Data is only transferred abroad in compliance with applicable Data Protection Laws and legally recognized transfer mechanisms. ### 4. Data Return and Deletion a) This DPA shall remain in effect as long as Company carries out Customer Data processing operations on Customer's behalf or until the termination of the Terms. b) The Company does not store any of the content the Customer or its End Users provide or generate while using our Services. This includes any text or other data you input. This information is processed in real-time to provide the Customer and End Users with the API Service and is not saved on our servers. c) For Customer Data other than those provided under Section 4(b), Company will temporarily store such data for the purposes of providing the API Services or in compliance with applicable laws. The Company will delete such data after the termination of the Terms unless otherwise required by applicable laws. # Subscriptions, Fees, and Payment Source: https://docs.z.ai/legal-agreement/subscription-terms ## 1. Fees and Billing * Access to certain Z.ai services or features may require payment. Applicable fees are listed on **Z.ai**, unless otherwise communicated in writing. * To purchase services, you must provide complete and accurate billing information (“Payment Method”). You authorize us to charge your Payment Method for the applicable fees and any taxes.
* If you proactively check the box, confirm, or otherwise select to enable automatic renewal as presented on the relevant page, you will be deemed to have authorized Z.ai or its partners (if any), upon expiration of the current subscription term, to automatically charge your payment method for the next renewal term in accordance with the fee standard applicable to such next renewal term, as then displayed on the platform or service page, including the then-effective pricing and service content. If you have not enabled automatic renewal, no automatic charge will be made. * If you subscribe through a third-party distributor (e.g., an app store), your payment, billing, and refund terms are governed by the distributor’s policies. * Except where required by law or explicitly stated, all payments are **non-refundable**. Please review your order carefully before confirming. ## 2. Subscriptions and Renewal * Some services are available by subscription (“Subscription”). The scope, features, and duration of your Subscription will be described during checkout. Z.ai may update the content or features from time to time. * Paid Subscriptions **renew automatically** at the end of each subscription term, and your Payment Method will be charged for the renewal term. * If your Subscription has an initial minimum term (“Initial Term”), it will automatically renew for the same duration (“Renewal Term”) unless canceled. * You may cancel your Subscription at any time using the tools provided by Z.ai. If purchased through a distributor, cancellation must follow the distributor’s process. * To avoid renewal charges, cancel at least **24 hours before the end of the current term**. Cancellation does not entitle you to a refund, but you will continue to have access until the end of the paid term. * For services with automatic renewal enabled, you may unilaterally stop automatic renewal at any time before the service validity period of the current subscription term expires. 
Z.ai may also unilaterally discontinue providing automatic renewal services to you as required by its operational strategy, in which case the subscription service term will terminate upon expiration of the current service period. In such event, Z.ai will notify you by announcement, in-site notice, or other reasonable means. Under these circumstances, the current subscription term that has already been paid for and has taken effect will remain unaffected; however, whether the service will continue to be available in the next term, how it may be renewed, and which rules will apply shall be subject to the page display or notice in effect at that time. **If an adjustment to the fee standard has a materially adverse effect on you, you should promptly discontinue the service. If you continue to use the service, you will be deemed to have accepted such adjustment.** ## 3. Service and Fee Adjustments Z.ai reserves the right to adjust its products and pricing standards due to business adjustments, operational strategy needs, unforeseeable circumstances, changed circumstances, or other reasonable causes, including but not limited to changes in content, price adjustments, product removal, service cancellation, reduction or addition of content, or termination of all or part of the service. Z.ai will notify you by page display, announcement, in-site notice, or other reasonable means before any new pricing takes effect. Unless otherwise expressly stated by Z.ai, the price for automatic renewal shall be the price, applicable rules, and promotional arrangements displayed on the page and actually applied by the system on the date the charge is made, rather than the original price at the time of your initial subscription. If you do not agree to such adjustments, you may turn off automatic renewal before the renewal takes place. 
If you continue to keep automatic renewal enabled or continue using the relevant service after the new price takes effect, you will be deemed to have agreed to the new subscription price. ## 4. Usage Rules 1. **Basic rules** * During your use, you should comply with applicable laws, regulations, policies, and industry standards. You shall not use the subscription service for any unlawful purposes, nor for engaging in any activities that are contrary to public policy and morals or infringe upon the rights and interests of others. * You shall comply with the Terms of Use, Service Agreements, usage rules and other relevant agreements of [Z.ai](http://Z.ai). * Violation of this agreement may result in the suspension or termination of services, and paid fees will not be refunded. 2. **Usage Scenario Restrictions** * You understand and agree that the usage quota under the GLM Coding Plan may only be used within [officially supported tools](https://docs.z.ai/devpack/tool/others). If the system detects usage through unauthorized or unsupported tools (such as SDK-based access or other third-party integrations), some subscription benefits may be restricted to ensure fairness and service stability. * You shall not use the GLM Coding Plan quota for general-purpose API access or any scenarios outside such tools, including but not limited to directly invoking model APIs from your own applications, bots, websites, SaaS products or other systems, unless you have entered into a separate written agreement with [Z.ai](http://Z.ai). * Unless otherwise agreed in writing, you may not resell, sub-resell, repackage, aggregate, proxy or otherwise provide the GLM Coding Plan to any third party, whether on a paid or free basis, nor may you use the GLM Coding Plan to provide model capabilities as a service to third parties. 3. **Personal-Use Only** * The GLM Coding Plan subscription is tied to a single account and is licensed only to the individual natural person associated with such account.
* You shall not share your account or subscription, or allow any other person (including but not limited to your colleagues, friends, customers or any organization) to use your GLM Coding Plan quota, whether by sharing credentials, renting, lending, transferring, sublicensing or by any other means. * If [Z.ai](http://Z.ai) reasonably suspects that you are engaging in account sharing, bulk or automated usage on behalf of others, resale of access, or any other conduct that may harm the platform’s fair-use order or the legitimate rights and interests of third parties, [Z.ai](http://Z.ai) is entitled to take measures including, without limitation, restricting certain features, reducing or limiting your usage quota, suspending or terminating the service, reclaiming any remaining quota, and pursuing further liability in accordance with this Agreement and applicable laws. ## 5. Limitation of Liability * Z.ai shall exert reasonable efforts to ensure the stability and security of the subscription service. However, Z.ai **does not warrant** that the service will be entirely error-free or meet your expectations. Z.ai **shall not be liable** for service interruptions caused by **force majeure**, network failures, or third-party actions. * Unless otherwise expressly required by applicable laws or regulations, our total direct liability to you, regardless of the cause of action or form of claim, shall not exceed **the total amount you have spent in the most recent calendar month.** ## 6. Amendment and Termination of the Agreement * Z.ai reserves the right to amend this Agreement in accordance with changes in laws, regulations, and regulatory policies, market conditions, service needs, or business strategies. Any amended version of this Agreement will be published on the platform and will take effect upon publication. * If you continue to use the subscription services after the Agreement has been amended, you will be deemed to have accepted the amended Agreement. 
* If you violate any applicable laws, regulations, policies, the User Agreement, the Usage Guidelines, or any other platform rules or notices, Z.ai shall have the right to suspend or terminate your access to and use of the services. ## 7. Other Terms * Additional services or features may require separate fees, which you will have the chance to review and accept before being charged. * Z.ai is not liable for errors or issues caused by third-party payment processors. # Terms of Use Source: https://docs.z.ai/legal-agreement/terms-of-use **Last Update:** April 14, 2026 Welcome to Z.ai! **These terms of use (“Terms”) form an agreement between JINGSHENG HENGXING TECHNOLOGY PTE.LTD (“we”, “our”, or “us”) and you (“you”) as a user of the Z.ai platform accessible via z.ai/model-api (“Services”), governing your access and use of Z.ai for services or tools for your own purposes, internal organizational use, or for the benefit of end users. If you are an enterprise or developer using the API services (“API Services”) available on Z.ai, you are also subject to the Additional Terms for API Services (“Additional Terms”).** Please ensure that you carefully read and fully understand all the terms and conditions herein. These Terms, along with our Privacy Policy, guidelines, rules, interpretations, announcements, notices, and other content that have been published or may be published in the future by Z.ai (**“Platform Rules”**), constitute the terms and conditions you must adhere to when using Z.ai. If you are a minor under the laws of your country or state of residence, you should review these Terms with your legal guardian. By using our Services or confirming your acceptance of these Terms (whichever occurs first), you acknowledge and agree to be legally bound by these Terms, which are subject to change and may be updated from time to time.
Notification of changes to these Terms may be provided by us through posting the updated Terms on our website at Z.ai, by email to you, or via any other method we deem suitable. Should you object to any such changes, you must promptly notify us in writing to terminate your account and immediately cease all access to and use of the Services. Your continued account maintenance or use of the Services after such notification will be deemed as your acceptance of the amended Terms. If you have any questions regarding these Terms, you may consult Z.ai customer service. Except as otherwise expressly provided, any new functions or products provided by Z.ai and their associated services shall be unconditionally governed by these Terms. ## I. Definition **"Z.ai"** shall mean the platform operated by JINGSHENG HENGXING TECHNOLOGY PTE.LTD that provides generative AI services (such as API services) to individual and business users, including new service types we may develop from time to time (also referred to as the **“Services”**). **"Platform Rules"** shall mean all rules, interpretations, announcements, notices, and other content that have been published or may be published in the future by Z.ai. ## II. Registration and Security Z.ai may be accessed without registration. However, to use the core services or features of Z.ai, you may be required to apply for an account and log into our Services using your account. Please keep your created key(s) secure, prevent any form of leakage, do not share your key(s) with others or disclose them publicly, and do not expose your key(s) in browser or other client-side code. You will be solely responsible for any fees incurred and losses caused by sharing or leaking your key(s). We have the right to restrict, suspend or terminate your access to or use of Z.ai or certain Services (or any features within the Services) if in our sole and absolute opinion, you are in breach of the Terms. ## III. Using Z.ai Services 1.
You have the right to use the Services provided by Z.ai in accordance with the terms and conditions of these Terms and Platform Rules. As generative artificial intelligence technology, models, and products continue to evolve, along with changes in laws and regulations, we may add, upgrade, modify, suspend, or terminate services, or make necessary adjustments to the technology, method, and performance of the Services. If such changes may affect your rights, we will promptly notify you through in-site notifications, website announcements, or via email. If you continue to use the Services after the changes, it indicates your acceptance of the adjusted service content. 2. As a user of our Service, you should comply with the requirements of these Terms, adhere to principles of voluntariness, equality, fairness, and good faith, not use this service to infringe upon the legitimate rights and interests of others or to seek improper gains, and not disrupt the normal order of the internet platform. You should comply with any and all applicable laws, regulations and policies of jurisdictions including but not limited to the United States, the European Union and/or other government authorities. 3. When using Z.ai services, you should comply with relevant laws, regulations, and platform rules, ensuring that you possess the necessary qualifications and capacities required by laws and regulations in the jurisdiction where the services are used. You are prohibited from engaging in or facilitating any of the following actions or any other illegal activities, including but not limited to: a) Using the Services in any way that compromises or threatens critical infrastructure or national security. b) Harming national interests of any country or jurisdiction. c) Inciting racial discrimination or hate crimes, or inciting, facilitating, or promoting violent extremism, terrorism, or hateful conduct.
d) Undermining freedom of religion or inciting discrimination based on religious beliefs. e) Using the Service to create, publish or disseminate false information that may mislead or harm the public or any individual. f) Creating or disseminating obscene, pornographic, violent, murderous, terroristic, or criminal incitement content. g) Submitting, creating or disseminating content that is sexually explicit, suggestive, visually shocking, or otherwise disturbing. h) Using the Z.ai service in any manner that violates the rights of any third party, including but not limited to fraud, scams, spamming (SMS/email), or engaging in disrespectful, aggressive, hateful, abusive, insulting, or defamatory conduct toward any third party. i) Without proper authorization, you may not provide to Z.ai or collect, process, disclose, infer, or generate the personal information of others through Z.ai. You are strictly prohibited from using the Service in any manner that may infringe upon the personal information protection rights and privacy of any third party, including but not limited to the unauthorized collection, storage, or dissemination of any individual’s personal data, home address, phone number, email address, identification details (such as national identification number, social security number, passport number), or credit card information. j) Using the Services in any manner that endangers or exploits children, including any form of child sexual abuse or exploitation, as well as developing applications, tools, products, or services that are inappropriate for minors. k) Promote, encourage, or provide instructions on self-harm, suicide, or other dangerous activities that could lead to serious harm or death. l) Directly or indirectly, manipulate, redirect, divert, or hijack traffic, readership, or engagement related to Z.ai’s products and services, whether independently or in collaboration with third parties. 
m) Conducting any other behavior prohibited by applicable laws or regulations. 4. You should not engage in any activities that endanger the network and system security of Z.ai during your use of the Services, including but not limited to: a) Unauthorized access to networks, interference with normal network functions, data theft, intentional dissemination of malicious programs or viruses, or any other activities that endanger network security and order are strictly prohibited. b) The use of deep linking, page scraping, social bots, spiders, or other automated means to access this service or third-party services through this service, for the purpose of accessing, obtaining, or monitoring any unauthorized data, content, or servers/accounts, is strictly forbidden. c) Providing technical support, advertising, payment processing, or any other assistance to facilitate activities that endanger network security is prohibited. d) Reverse engineering, decompiling, disassembling, or attempting to extract data from Z.ai algorithms, source code, mechanisms, or any related components is strictly prohibited. e) You may not bypass or circumvent the security measures implemented by Z.ai to prevent actions prohibited under these Terms, or disseminate methods or techniques designed to do so. f) You may not use Z.ai to develop, train, or enhance any algorithms, models, or technologies that directly or indirectly compete with us. g) Any other usage that may harm our interests is strictly forbidden. 5.
You should comply with the following rules when using the service: a) You may not use this service dishonestly, including but not limited to falsely claiming that AI generated content (“Outputs”) is human-created, disguising human-created works as Outputs, engaging in deceptive advertising, marketing, transactions, fabricating discussions, generating fake public opinion trends, or manipulating engagement metrics such as readership, views, likes, plays, shares, or inducing interactions such as tipping, liking, or voting. b) You may not distribute or publish inappropriate Outputs using Z.ai through any means, including but not limited to copying, taking screenshots, screen recording, screen sharing, or live streaming. c) Unless otherwise agreed, you may not remove, modify, or obscure any AI identifiers added to Outputs by Z.ai, regardless of the form in which such identifiers are presented. d) You shall adopt technical measures to add identifiers and maintain logs in compliance with applicable laws. AI-generated Outputs shall be prominently marked at reasonable locations to indicate that it is generated by AI. e) You may not violate these Terms, any applicable law, or any rights of third parties, including but not limited to intellectual property rights, confidentiality obligations, or prior contractual rights. 6. You should follow these rules in the scenarios listed below: a) You shall not use Z.ai to conduct any decision-making activities, including but not limited to high-risk automated decision-making in areas that may materially affect the safety, rights and interests, or well‑being of individuals and society, such as health, education, credit, and critical infrastructure management. 
b) You may not use Z.ai for any services that require subject qualification or professional review, or as a substitute for professional services, including but not limited to professional fields such as medical care, legal services, news reporting, education, and investment and financial management. c) You may not use the Services to generate content intended for or related to political campaigns. d) If you create virtual characters using the Z.ai, you shall ensure that the virtual characters’ images, avatars, names, character settings, and data and content in the creation process do not infringe third-party rights, and shall not use virtual characters to engage in any acts that may constitute unfair competition. e) You may not steal or misappropriate domain names, websites, links, text, images, videos, data or any other content from Z.ai’s products and services, whether directly or indirectly, through methods including but not limited to link hijacking, redundant extraction, unauthorized scraping, simulated downloads, deep linking, or fraudulent registration. Additionally, you may not delete, obscure, or alter domain names, platform-specific identifiers, trademarks, or other proprietary notices in any manner. 7. If you discover any vulnerabilities or defects in the services provided by Z.ai, you are encouraged to report them through the feedback feature available within the interface or through the designated contact methods specified in these Terms. You must not publicly disclose, distribute, or publish any vulnerabilities or defects through the internet or other channels. 8. If you add new data, during your use of the Z.ai and models within the platform, the resulting responsibilities shall be borne by yourself. 
If you suffer damage after training, fine-tuning and development and claim that we should bear the responsibility, you shall prove that the damage is unrelated to your training, fine-tuning and development, otherwise we shall be exempted from liability for the damage. 9. If you use our Services to provide services to any third party, you assume full responsibility for all associated rights, obligations, and liabilities. We shall not be liable for any losses resulting from such use. 10. Third Party Services a) The Services may include third-party services, such as third-party plugins, APIs, or other applications. You acknowledge and agree that these third-party services are provided to you directly by the respective third-party service providers. You must comply with the terms and conditions set forth by these third parties (such as Terms of Use, Privacy Policy, Content Policy, etc.), and any disputes arising from your use of such services must be resolved independently between you and the third-party provider. b) You acknowledge and agree that your use of third-party data, models, plug-ins, APIs, or other applications is subject to the terms and agreements of the respective third-party providers. If your Input contains illegal or inappropriate content, certain third-party services may refuse to generate Outputs for you. 11. If you use our service in the United States, you, together with your affiliates, employees, service providers, end users and any third parties under your direction or control, shall not use the Services to: process, store, or transmit any technical data, software, or content subject to the ITAR administered by the U.S. Department of State, or the EAR administered by the U.S.
Department of Commerce; a) process, store, or transmit any Protected Health Information (PHI) or other data regulated under the Health Insurance Portability and Accountability Act (HIPAA), its implementing regulations, or any amendments thereto; b) process, store, or transmit any Nonpublic Personal Information (NPI) subject to the GLBA, or any regulations promulgated thereunder; or c) process or store personal information of children under the age of 13 in a manner that violates COPPA or any regulations promulgated thereunder. ## IV. Content 1. **User Content.** From time to time, and at our sole discretion, we may permit you to submit text, documents, or other materials to our Services for processing (**“Prompts”**). Prompts and Outputs are collectively referred to herein as “User Content”. 2. You assume full responsibility for all Prompts submitted to our Services. By submitting Prompts, you represent and warrant that: a) You own or hold all necessary rights, licenses, consents, and permissions to submit your Prompts to us. You hereby grant us all rights required to process your Prompts for the purposes set forth in these Terms, including the right to process such Prompts in compliance with applicable laws and these Terms. b) All content included in your Prompts is either owned by you, or you have obtained all valid authorizations, licenses, and permissions from the relevant rights holders to submit, share, and allow us to process such content as provided under these Terms. c) Any Prompts provided by you to us shall not infringe any third party’s intellectual property rights, trade secrets, or other legitimate rights and interests. You further warrant that you shall not use such User Content for any purpose that may constitute an infringement of third-party intellectual property rights.
d) You have not and will not, in any way, violate data protection laws or regulations, infringe upon the legal rights of any individual or entity, or use User Content for any unlawful purposes. You also ensure compliance with all applicable laws and regulations in protecting the personal data of your end-users, including issuing legally effective privacy notices to end-users. You confirm that you have obtained and retained any required consent from end-users for authorizing us to process your Prompts. 3. You further agree and acknowledge that: a) Service Improvement and Development. For individual users, we reserve the right to process any User Content to improve our existing Services and/or to develop new products and services, including for our internal business operations and for the benefit of other customers. You hereby explicitly authorize and consent to our: (i) utilization and storage of User Content that does not constitute personal data for the purpose of developing and improving our machine learning and artificial intelligence technologies; and (ii) processing and storage of such User Content in locations outside of the jurisdiction where you access or use the Services. b) Data Collection and Usage. We may collect performance and usage metrics and data pertaining to your access and/or utilization of the Services, which may include, but is not limited to, model versions, inferences, upload times, as well as processing, diagnostic, and other technical data. We retain the right to use such metrics and data to enhance the quality, functionality, and other aspects of the Services and their underlying technologies. c) Public Sharing of User Content.
Should you activate any features or functionalities of the Services (as applicable, and which we may, at our sole discretion, make available from time to time) that enable the public sharing of any User Content, you hereby agree and acknowledge that, notwithstanding any other provision within these Terms, any individual or entity with whom you have shared such content shall have the unrestricted right to access, use, save, reproduce, distribute, display, and transmit such content without limitation. You further agree that we shall not be held liable to you for any consequences arising from such public sharing and subsequent use of the User Content by third parties. 4. **Ownership of User Content.** As between you and us, and to the fullest extent permitted by applicable law, you retain all rights, title, and interest in the Prompts you submit and the Outputs generated specifically at your request and provided to you as a response to your submitted Prompts. You acknowledge and agree that the Outputs generated may lack uniqueness and could be similar or identical to Outputs generated for other users or any Third Party. Consequently, your rights in specific Outputs, if any, may not extend to Outputs generated at the request of other users or any Third Party. 5. **Our Use of User Content.** For individual users, we may use User Content to provide, maintain, develop, and improve our Services, comply with laws, enforce our terms, and ensure Service safety. Contact us with any concerns about our use of User Content. You grant us and our affiliates an unconditional, irrevocable, non-exclusive, royalty-free, fully transferable, sub-licensable, perpetual, worldwide license to access, use, host, modify, communicate, reproduce, adapt, create derivative works from, publish, perform, and distribute your User Content, and to authorize others to do so, in any format and on any platform. 
You also grant us a similar license to use your username and profile picture to identify you as the source of User Content. We may disclose your identity to third parties, including advisors, subcontractors, and those claiming User Content infringes their rights. 6. You waive any rights to prior review or approval of marketing materials related to User Content, and all privacy and publicity rights. You also waive any moral rights under the Copyright Act 2021 of Singapore, the rights described in Article 6bis of the Berne Convention for Protection of Literary and Artistic Works 1886, or any other rights under similar laws. If any such rights cannot be waived, you agree not to assert them. 7. **For enterprises and developers using API Services, we will not use your User Content for developing or improving Services unless you explicitly agree to such use. Please refer to Additional Terms for further details.** ## V. IP Rights and Usage Terms 1. Each party retains ownership of their intellectual property as it stood prior to using the Services. Neither party’s rights will be transferred to the other, nor will they become jointly owned, as a result of fulfilling these Terms. To avoid any doubt, the ownership and intellectual property rights of our models, including but not limited to their parameters, algorithms, code, and framework structure, belong to us. 2.
Without our prior written consent, you shall not at any time, in any manner or through any means (whether directly or indirectly, expressly or impliedly, including but not limited to any marketing, advertising, promotion or other purposes): use, refer to, publish, copy, display (publicly or non-publicly), mention, register or apply to register: (i) any trademarks (whether registered or not), trade names, names, logos or identifiers owned or authorized for use by us or our affiliates, or enable third parties to identify us or any of our affiliates through specific descriptions; or (ii) any names, trademarks, logos, words, terms, symbols or any combinations thereof in any language that are similar to the foregoing. If you violate the foregoing agreement, we have the right to require you to immediately delete the relevant content and be responsible for eliminating the impact and compensating us for all losses. 3. You shall not use Z.ai in any manner that infringes the copyrights or other intellectual property rights of any third party. Any content you input that is owned by a third party shall remain the property of its respective owner. 4. We are not responsible for any intellectual property infringement resulting from your use of Z.ai in the following ways: a) Using infringing or potentially infringing content when you knew or should have known about its infringement. b) Using unauthorized content within products or services on Z.ai. c) Inducing others to infringe on intellectual property rights through your use of Z.ai. d) Using content that explicitly belongs to a third party without permission. e) Removing or modifying copyright notices or other explicit attributions displayed by Z.ai. f) Using, modifying or otherwise altering AI-generated content in a way that gives rise to the infringement of any third party’s intellectual property rights.
g) Combining generated content with external third-party content, products, or services in a manner that leads to infringement. h) Using generated content for trademark or brand identity purposes without authorization. By using Z.ai, you acknowledge and agree to these terms and accept full responsibility for any violations. 5. You agree to defend and indemnify us from all losses, damages, and expenses (including reasonable attorneys' fees) arising from any third-party claims alleging that: a) your Prompts or any content you submit infringe third-party intellectual property rights; b) you have used the Services in combination with unauthorized third-party products, services, or content in a manner that results in infringement; or c) you continued to use the Services after receiving notice from us to cease such use due to potential infringement. ## VI. Copyright Protection Statement 1. **Reporting intellectual property or legal rights violations:** We have established measures and procedures to protect the legitimate rights of rights holders. If you believe that any part of Z.ai or its generated content infringes upon your intellectual property rights or other legal rights, you may submit an infringement notice, in the form prescribed by and containing the information required under the applicable laws, along with supporting evidence via email to [user\_feedback@z.ai](mailto:user_feedback@z.ai). We will take appropriate action, including removing or disabling relevant content that we believe violates these Terms. Additionally, we may terminate the access of users who repeatedly infringe upon the rights of others, as deemed appropriate. 2. **Legal responsibility for false claims.** Please note that if a rights notice contains false statements, the submitter of the notice shall bear full legal responsibility for any resulting consequences, including but not limited to damages incurred by us and associated legal fees.
If you are uncertain whether the content in question infringes your rights, we strongly recommend seeking professional legal advice before submitting a notice. 3. **Required documentation for a rights notice:** To ensure an efficient review of your claim, please provide the following information: a) Your detailed contact information, including your full name, a copy of your ID card or passport (if you are an individual), a copy of your business registration certificate (if you are an organization), your mailing address, telephone number, and email address. b) Proof of ownership of the allegedly infringed trademark, copyright, or any other legally enforceable rights. c) A clear and detailed description of the content that allegedly infringes upon your legal rights, along with screenshots or other evidence. d) A specific identification of the content on the alleged infringing webpage, indicating which part violates your rights. e) The following declaration in your notice: “I hereby confirm that the information in this notice is true and accurate. I am the legitimate rights holder of the reported content, or I have been authorized to act on its behalf. The reported content infringes my legal rights. If any information contained herein is false, I will bear all legal responsibilities resulting from such false statement.” f) If you are an individual, you must sign the notice. If you are an organization, the notice must bear your official seal. 4. **Processing and response time.** Upon verifying the authenticity of your claim, we will remove or disable access to the reported content. Please be aware that due to the time required for artificial intelligence model training, our removal or disabling measures may not take effect immediately. We appreciate your understanding. 5. We will only process English-language copyright notices that strictly comply with the applicable laws and these Terms. Non-compliant notices will not be processed. 6.
We will duly consider all compliant notices. You agree that before pursuing any legal action or remedy against us regarding infringing material, you must first provide us with a compliant notice and a reasonable opportunity to remove the material after our receipt. If we remove the material in response to your notice, you irrevocably waive any right of action against us under applicable law for any infringing material appearing on our Services before our removal under this section. 7. You expressly agree and acknowledge that we have no control over, and are not responsible or liable for, any content or material on third-party platforms or services. ## VII. Paid Accounts 1. When you use our fee-based Services, you may need to prepay for such Services. Your balance will be consumed if you use our paid services, and we reserve the right to suspend services if your balance is insufficient. 2. We have the right to determine the charging standards for fee-based services. The specific charging standards including but not limited to recharge methods, billing prices, consumption methods, and free quota for paid services are subject to the standards published on the relevant fee-based service pages. You may choose and pay for the corresponding Services according to your needs. 3. We may adjust our pricing model based on our business strategy. If such adjustments significantly affect your rights, we will promptly notify you through prominent methods such as in-site notifications, website announcements, or emails. If you continue to use the Services after the adjustments, it signifies your acceptance of the adjustments. 4. When recharging, please carefully confirm your order, account and information. You shall bear full responsibility for any losses incurred due to improper handling, inadequate comprehension, or insufficient understanding of the charge and billing procedures. You may cancel your paid subscription at any time.
However, all payments are non-refundable unless otherwise required by applicable law. These Terms do not override any mandatory local laws that grant you specific cancellation rights. 5. You promise and guarantee the legality of the funds used to recharge. Otherwise, we have the right to cooperate with judicial or other government authorities' requirements to take corresponding measures on your account, including but not limited to locking, sealing, or restricting its use. ## VIII. Disclaimer of Warranties TO THE FULLEST EXTENT PERMITTED BY APPLICABLE LAW: 1. OUR SERVICES ARE PROVIDED STRICTLY ON AN “AS IS” AND “AS AVAILABLE” BASIS; 2. WE, ALONG WITH OUR AFFILIATES AND LICENSORS, MAKE NO REPRESENTATIONS, WARRANTIES, GUARANTEES, OR UNDERTAKINGS (WHETHER EXPRESS, IMPLIED, STATUTORY, OR OTHERWISE) REGARDING THE SERVICES, INCLUDING ANY OUTPUTS GENERATED. WE EXPRESSLY DISCLAIM ALL SUCH WARRANTIES, INCLUDING BUT NOT LIMITED TO THOSE RELATED TO MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, SATISFACTORY QUALITY, NON-INFRINGEMENT, RELIABILITY, IMPARTIALITY, CONSISTENCY, INTEROPERABILITY, CONFORMITY TO DESCRIPTION, SPECIFICATIONS, OR CRITERIA, COMPLIANCE WITH APPLICABLE LAWS AND THIRD-PARTY RIGHTS (INCLUDING BUT NOT LIMITED TO DATA PROTECTION, EXPORT CONTROL, AND INTELLECTUAL PROPERTY LAWS), AND UNINTERRUPTED ENJOYMENT. WE FURTHER DISCLAIM ANY GUARANTEES THAT THE SERVICES OR OUTPUTS WILL BE FREE FROM MALICIOUS, ERRONEOUS, MANIPULATED, INAPPROPRIATE, COMPROMISED, OR OTHERWISE HARMFUL CODE OR DATA, AS WELL AS ANY WARRANTIES ARISING FROM TRADE USAGE OR PRIOR DEALINGS; 3. WE DO NOT GUARANTEE THAT THE SERVICES WILL BE UNINTERRUPTED, ACCURATE, ERROR-FREE, OR SECURE, NOR DO WE WARRANT THAT ANY CONTENT WILL REMAIN INTACT, UNALTERED, OR FREE FROM LOSS; AND 4. YOU ACKNOWLEDGE AND AGREE THAT YOUR USE OF ANY OUTPUTS GENERATED BY OUR SERVICES IS ENTIRELY AT YOUR OWN RISK.
YOU MUST NOT RELY ON SUCH OUTPUTS AS YOUR SOLE SOURCE OF TRUTH OR FACTUAL INFORMATION, NOR SHOULD THEY BE USED AS A SUBSTITUTE FOR PROFESSIONAL ADVICE. ## IX. Limitation of Liability 1. **Service limitations and no guarantees.** You acknowledge that despite our best efforts, due to inherent technical limitations in the technology supporting Z.ai’s services, we cannot guarantee the following: a) Z.ai may not meet your specific needs or expectations, and any products, services, or content obtained through us may not fulfill your intended purposes. b) We do not guarantee that the content generated by Z.ai will be 100% accurate, reliable, complete, functional, timely, secure, error-free, uninterrupted, or continuously stable. You should not rely on Z.ai’s Outputs as your sole source of factual information or as a substitute for professional advice. c) Although Z.ai applies AI-driven filtering mechanisms, some Outputs may still contain inaccuracies, inconsistencies, or material that could be deemed inappropriate. The Output of our Services does not represent the views or opinions of us, and you are responsible for assessing its accuracy and suitability, including conducting human review where necessary. d) Z.ai does not comprehend user inputs in the same way as a human, nor does it recognize underlying risks or ethical implications. The Outputs do not constitute advice or recommendations. You must make independent decisions based on your circumstances, and we are not liable for any risks or direct or indirect losses arising from reliance on Z.ai. 2. **Exemption from liability for service disruptions.** Z.ai fulfills its basic obligations as required by law; however, we are not liable for any failure, defect, delay, or modification in service performance due to: a) Force majeure events, including but not limited to natural disasters, strikes, riots, wars, governmental actions, and judicial or administrative orders.
b) Failures in public utilities or third-party services, such as power outages or network disruptions. c) Situations where Z.ai has exercised reasonable management and due diligence, but disruptions occur due to routine or emergency system maintenance, system failures, software malfunctions, or cybersecurity threats. 3. **"As is" and "as available" disclaimer.** Unless otherwise explicitly stated in a written agreement, all information, content, materials, products, and services provided through Z.ai, including its generated content, are offered strictly on an “as is” and “as available” basis without any warranties of any kind. 4. **Exclusion of certain damages.** Neither we nor our affiliates, licensors, suppliers, or distributors shall be liable for any indirect, incidental, special, consequential, or exemplary damages. This includes, but is not limited to, damages for lost profits, loss of goodwill, loss of data, loss of use, or other intangible losses, even if we were advised of the possibility of such damages. 5. **Cap on liability.** Our total liability under these Terms shall not exceed the greater of: a) The amount you paid for the service that led to the claim within the 6 months preceding the liability, or b) One hundred dollars (\$100). 6. **Jurisdictional variations.** Some jurisdictions do not allow certain warranty disclaimers or limitations on liability. In such cases, the limitations set forth in this section will apply only to the maximum extent permitted by the laws of your country or state of residence, and you may have additional rights under local law. 7. **User commitments and liability.** You acknowledge the limitations of Z.ai and agree to comply with these Terms and all applicable laws and regulations. You agree not to engage in any actions that would result in legal claims, arbitration, compensation, or losses against Z.ai, its affiliates, or its employees in connection with your use of the platform. 
In the event of any such claims or disputes, you will be solely responsible for any resulting liabilities. 8. **Minor protection and liability disclaimer.** You acknowledge and agree that, despite our implementation of comprehensive measures to protect minors, we cannot fully guarantee that their use of this Service will always be controlled due to technical limitations, intentional misuse, or improper use by users. We bear no legal responsibility for any violations of laws, these Terms, or Platform Rules by minors, nor for any loss or harm resulting from a minor’s actions or a guardian’s failure to provide proper supervision. We strongly recommend that minors use this service under the guidance of their guardians, who are responsible for supervising their activities. ## X. Export Control and Sanctions 1. You understand that your use of Services, providing Inputs to and obtaining Outputs via Services, might be subject to all applicable laws and regulations of export controls and sanctions laws (collectively **“Export Control and Sanctions Laws”**). You confirm that you and any person or entity you represent are not listed on any export control or economic sanction lists, nor are you or persons and entities you represent deemed to be included (including due to equity, control, or other affiliation relationships) in any such lists. 2. Such lists include but are not limited to the United Nations Security Council Sanctions List; the U.S. Specially Designated Nationals and Blocked Persons List (SDN List); the U.S. Entity List; the EU Consolidated List of Persons, Groups and Entities Subject to Financial Sanctions; China’s Unreliable Entities List, Export Control List, and Countermeasures List; and all other applicable control and sanctions lists issued by competent authorities with jurisdiction over you. 3. You confirm that you are not located in the following regions: Iran, North Korea, Cuba, Crimea, Donetsk, or Zaporizhzhia. 4.
You understand that your use of the service, the inputs you provide to the service provider, and the Outputs you obtain through the service may be subject to all applicable export control and sanctions laws (collectively, "Export Control and Sanctions Laws"). 5. You confirm that you are solely responsible for complying with all applicable Export Control and Sanctions Laws related to your use of the service and your end users’ access to and use of the services. 6. You represent and warrant that the service shall not be used for the benefit of, nor exported, re-exported, or transferred to: (a) any person or entity located in Iran, North Korea, Cuba, Crimea, Donetsk, or Zaporizhzhia; (b) any party subject to restricted lists under applicable Export Control and Sanctions Laws that would prohibit your use of the service. 7. Our service shall not be used for any prohibited end use, including military purposes or end uses involving human rights violations. You and your end users shall not provide inputs containing materials or information that require export control licenses. ## XI. Termination 1. **Termination by you.** You may terminate these Terms through any of the following methods: a) By closing your account; b) By ceasing to use the services and explicitly rejecting any proposed changes before they take effect; c) By expressing your intent to discontinue using our Services and meeting the platform's termination conditions. If you submit a request to delete your account, we will review and process your request. Once approved, your account will be deleted, and these Terms or any other terms as between you and us will be terminated. 2. **Termination by us.** We reserve the right to terminate your access to the services by deleting your account under the following circumstances: a) **Violation of the service agreement** – If you breach the terms of these Terms, we may terminate your access in accordance with its enforcement policies. 
You will be notified upon termination. If you attempt to re-register under your own or any third party’s name after termination, we reserve the right to terminate your access again. b) **Inability to contact you** – If the email or phone number provided during registration is no longer valid or cannot receive messages, and you fail to update your contact details within three business days after being notified, or if we cannot reach you through any means provided at registration. c) **False registration information** – If we determine that the key details in your registration information are false. d) **Rejection of updated terms** – If these Terms are terminated or updated, and you explicitly refuse to accept the new terms. e) **Illegal or unauthorized activity** – If you post illegal content, infringe on others' legal rights, or engage in other serious violations of laws or Platform Rules. f) **Repeated violations of Platform Rules** – If you repeatedly violate Z.ai’s platform policies, and the violations are deemed serious, leading to account suspension under Platform Rules. g) **Other grounds for termination** – If we determine that your account should be terminated for any other justified reason. ## XII. General 1. **Entire agreement.** These Terms represent the complete and final agreement between you and us regarding the Services and any related matters, replacing any previous written or verbal agreements on the same subject. 2. **Updates and changes.** We continually develop and improve our Services and may update these Terms or the Services themselves from time to time. Changes may occur due to: a) Legal or regulatory updates, b) Security or safety concerns, c) Events beyond our reasonable control, d) Routine service improvements, or e) Adaptation to new technologies. We will notify you of these changes by updating our website, sending an email, or using other appropriate communication methods with at least 30 days' advance notice.
Any changes take effect as soon as they are posted on our website. If you disagree with the updated Terms, you must stop using the Services immediately and notify us in writing to close your account. Continuing to use the Services after notification means you accept the updated Terms. 3. **Independent Relationship.** You and we are independent entities. These Terms do not create an agency, partnership, joint venture, employment, or franchise relationship between us. 4. **Service Discontinuation.** We reserve the right to discontinue, suspend or modify the Services at any time, and may provide you with prior notice where reasonably practicable. 5. **Severability.** If any part of these Terms is found to be illegal, invalid, or unenforceable under the law of any jurisdiction, the remaining provisions will still be valid and enforceable to the fullest extent possible. 6. **No waiver of rights.** If we delay or choose not to enforce any part of these Terms, it does not mean we waive our rights. We retain the right to act on any future or similar breaches. Our rights and remedies under these Terms are in addition to any other legal rights we have. 7. **No assignment without consent.** You cannot transfer, assign, sublicense, or subcontract your rights or obligations under these Terms to anyone else without our prior written consent. 8. **Breach and remedial actions.** If you engage in any of the following actions, it will be considered a breach of contract, and we reserve the right to take appropriate measures: a) We have the right to review your registration information and service usage. If we identify or suspect any issues, we may issue inquiries or require corrective action. If you fail to provide a reasonable response within a reasonable timeframe, we may take measures such as restricting, blocking, or deleting your registration information or service access.
b) We have the right, without prior notice, to delete or impose restrictions on content you publish through our services if we reasonably believe that: i. It contains fraudulent, malicious, or harmful false information. ii. It disrupts the normal order of the platform. iii. It violates public interest or may seriously harm the legitimate rights of us or other users. If your breach causes us any loss, we have the right to seek compensation from you. Such losses include, but are not limited to, damage to our reputation, lost business opportunities, legal costs (such as litigation, arbitration, attorney fees, travel expenses, investigation costs, notarization, and appraisal fees), as well as any compensation, penalties, or fines paid to third parties, including those arising from administrative or criminal sanctions. ## XIII. Governing Law and Dispute Resolution 1. You understand and agree that when using Z.ai products in different regions, you must comply with all applicable laws, regulations, and policy requirements of those regions. 2. You may not resell, export, or transfer Z.ai products, or any products developed using Z.ai, to specific individuals or countries subject to regulatory restrictions. Furthermore, you may not directly or indirectly export Z.ai products or any technology containing Z.ai products without obtaining all necessary prior approvals from the relevant government authorities and the explicit prior consent of the Z.ai team. 3. These Terms, along with any disputes, claims, or legal actions arising from or relating to their subject matter, interpretation, enforcement, or formation—including those based on tort or other non-contractual grounds—shall be governed by the laws of Singapore. 4. Any disputes arising from or related to your use of the Services shall be resolved through mutual negotiation between you and us. You may contact us using the details provided at the bottom of these Terms. 5. 
If a resolution cannot be reached through negotiation, any dispute, controversy, difference or claim arising out of or relating to these Terms, including the existence, validity, interpretation, performance, breach or termination thereof, shall be referred to and finally resolved by arbitration administered by the Singapore International Arbitration Centre (“SIAC”) in accordance with the Arbitration Rules of the Singapore International Arbitration Centre (“SIAC Rules”) for the time being in force, which rules are deemed to be incorporated by reference in this clause. The seat of the arbitration shall be Singapore. 6. If any provision of these Terms is deemed null, invalid, or unenforceable, such provision shall be considered severable and shall not affect the validity or enforceability of the remaining provisions of these Terms. ## XIV. Contact You have the right to submit complaints and reports by emailing [user\_feedback@z.ai](mailto:user_feedback@z.ai). We will assess your complaint or report and take appropriate action. *** ## Additional Terms for API Services The use of API services (**“API Services”**) by businesses and developers on Z.ai platform is also governed by these Additional Terms for API Services (**“Additional Terms”**). Terms that are not defined in these Additional Terms have the meanings given in the Terms. For businesses’ and developers’ use of API Services, in case any inconsistency exists between the Terms and Additional Terms, the Additional Terms shall prevail. ### 1. Use API Services a) We grant you a non-exclusive right to access and use the API Services during the valid term, which includes the right to use Z.ai’s API to integrate the Services into your applications or to develop downstream systems, applications or functions for your end users (**“End Users”**). b) You are solely responsible for the systems, applications, or functions developed by using API Services.
You must assume the corresponding legal responsibilities and obligations and establish agreements with your End Users regarding their rights and obligations. It is also your duty to regulate and manage the usage behavior of End Users. c) You shall establish organizational and technical measures in accordance with the requirements provided under applicable laws and regulations, including but not limited to user management, data security, monitoring, warning, and emergency disposal, to ensure the integrity, confidentiality, and availability of your systems, networks, information, and data, preventing risks such as data security, public opinion, or misuse and abuse of products or services. d) You must truthfully and accurately disclose the use of models from the Z.ai (and other models, if applicable) in accordance with relevant regulations and supervisory requirements. You may not make false or misleading disclosures, labels or displays regarding products and services that are not making use of models from the Z.ai e) Given that API-returned information may not be flawless, you shall promptly take deletion, blocking, withdrawal and other disposal measures in accordance with the law for illegal and harmful information discovered or notified, to ensure lawful and compliant operation of applications. f) You and End Users must not engage in any of the following improper uses of the Z.ai during your use of our API Services, including but not limited to: i. Use the Services and End User Content (as defined in Section 3) in a way that violate applicable laws or our policies; ii. Any attempt to extract data, parameters, or code from the Z.ai; iii. Any use of the Z.ai for any services requiring specific qualifications, including but not limited to healthcare, finance, investments, insurance, credit, education, employment, housing, legal affairs, medical decisions, or other significant decisions; iv. Any employment of the Z.ai for decision-making activities; v. 
Any removal of copyright information from the Z.ai or its internal models; vi. Any reverse engineering, decompiling, disassembling, or attempting to extract data from Z.ai algorithms, source code, mechanisms, or any related components; vii. Any exploitation of our intellectual property through use, lease, lending, copying, modification, linking, reposting, compilation, publication, or establishment of mirror sites; viii. Except for specific functionalities expressly authorized by us, any unauthorized reproduction, modification, augmentation, deletion, interception, or creation of derivative works from: (i) the Z.ai; (ii) data cached in terminal memory during platform operation; (iii) client-server interaction data during software execution; or (iv) essential system operation data, including but not limited to access through unauthorized plugins, extensions, or third-party tools/services; ix. Any tampering with or falsification of software instructions or operational data to modify, enhance, reduce, or otherwise alter platform functionality or performance characteristics, or the commercialization or public dissemination of software/methods designed for such purposes, regardless of commercial intent; x. Any access to or use of our software and services through unauthorized third-party software, plugins, add-ons, or systems, or any development/publication/distribution of such tools; xi. Any direct or authorized interference with the Z.ai, including its components, modules, or data, whether conducted personally or through third-party software; xii. Except for authorized integration with your specific business scenarios, any use of the Z.ai's models, prompts, or model-generated content for the development, training, labeling, fine-tuning, optimization, iteration, or similar activities related to external models is strictly prohibited. 
Furthermore, any utilization of platform models or exported data to develop, train, or enhance algorithms or models that compete with us; xiii. Any generation of malicious code by using the Z.ai or its internal models; xiv. Any other unauthorized actions or usage potentially detrimental to our interests. ### 2. Data Protection a) We will collect and process data you provide as set forth in the Data Processing Addendum for API Services. ### 3. End User Content a) You and End User might provide Input and generate Output (collectively **"End User Content"**) through the applications or services you provided/developed by using our API Services. We won't claim ownership over your Input and End User Content. b) We will only use End User Content as necessary to provide you with the API Services, comply with applicable law, enforce our policies, and prevent abuse. \[We will not use End User Content to develop or improve Services, unless you explicitly agree to such use.] c) You are responsible for all End User Content. You warrant that the Input has all rights, licenses and permission required to provide Input to the Services. You are responsible for managing the accuracy, safety, legality, and compliance of End User Content, establishing mechanisms including but not limited to content review, user management, data security, monitoring and early warning, and emergency response. d) Due to the nature of the Services and artificial intelligence generally, Outputs may not be unique, and other users may receive similar content from the Services. Responses that are requested by and generated for other users are not considered your Outputs. You acknowledge that we may generate the same or similar content for others and that we reserve all rights to do so. ### 4. IP Rights a) Except as expressly set forth herein, the Additional Terms does not grant: (a) us any IP Rights in End User Content; or (b) you any IP Rights in the Services. 
You only have a limited right to use the API Services, and no ownership rights are transferred to you or End Users under the Additional Terms. b) You grant us only the limited rights that are reasonably necessary for us to deliver the API Services. This limited permission also extends to subcontractors or sub-processors. c) If you provide feedback, you grant us the right to use and exploit feedback without restriction or compensation. ### 5. Termination and Effect of Termination a) The Additional Terms will be terminated once the conditions set under Section XI of the Terms are met. b) If the Terms and the Additional Terms terminate, the rights granted by us to you will cease immediately. Termination or expiration will not affect any rights or obligations, including the payment of amounts due, which have accrued under these Terms and separate order forms (if any) up to the date of termination. c) Section 4 of the Additional Terms shall survive the termination of the Additional Terms. # New Released Source: https://docs.z.ai/release-notes/new-released Follow along with updates across Z.AI’s models ## Models * Designed for long-horizon tasks, GLM-5.1 can work independently for up to 8 hours in a single run, enabling a full loop from planning and execution to iterative refinement and final delivery. * It demonstrates stronger engineering intelligence across autonomous planning, sustained execution, bug fixing, and strategy iteration, while achieving comprehensive capability alignment with Claude Opus 4.6. Built with multi-turn SFT, RL, and a process-quality evaluation framework, GLM-5.1 further improves stability, consistency, and tool use over extended tasks. Learn more in our [documentation](/guides/llm/glm-5.1).\* * Designed for vision-based coding and claw-style Agent workflows, GLM-5V-Turbo brings native multimodal understanding to images, video, and text, enabling stronger planning, coding, and action execution in visually grounded tasks. 
* It strengthens multimodal tool use and GUI task execution, allowing the model to better interpret interfaces, design mocks, documents, and charts, plan the next action, and complete end-to-end Agent workflows while maintaining strong pure-text coding and reasoning performance. Learn more in our [documentation](/guides/vlm/glm-5v-turbo).\* * Designed for high-throughput OpenClaw lobster workloads, GLM-5-Turbo focuses on improving stability and efficiency in long-chain Agent tasks, enabling smoother execution for complex, multi-step workflows. * It strengthens tool and Skills integration and enhances complex instruction decomposition, allowing the model to better identify task goals, plan execution steps, coordinate across multiple agents, and maintain temporal consistency in extended tasks.Learn more in our [documentation](/guides/llm/glm-5-turbo).\* * Designed for complex system engineering and long-range Agent tasks, GLM-5 shifts the paradigm from coding to engineering, demonstrating strong deep-reasoning performance in backend architecture, complex algorithms, and stubborn bug fixing. * It directly benchmarks against Claude Opus 4.5 in code-logic density and systems-engineering capability, and integrates DeepSeek Sparse Attention for higher token efficiency while preserving long-context quality.Learn more in our [documentation](/guides/llm/glm-5).\* * We’ve launched GLM-OCR, a compact and high-performance optical character recognition model powered by the self-developed CogViT and GLM-0.5B encoder-decoder architecture, enabling efficient cross-modal alignment through its dedicated connection layer. * The update leverages CLIP pre-training on billions of image-text pairs to deliver robust visual semantic understanding and key token extraction capabilities, while maintaining a lightweight design for fast inference. 
Learn more in our [documentation](/guides/vlm/glm-ocr).\* * We’ve launched GLM-4.7-Flash, a lightweight and efficient model designed as the free-tier version of GLM-4.7, delivering strong performance across coding, reasoning, and generative tasks with low latency and high throughput. * The update brings competitive coding capabilities at its scale, offering best-in-class general abilities in writing, translation, long-form content, role play, and aesthetic outputs for high-frequency and real-time use cases. Learn more in our [documentation](/guides/llm/glm-4.7).\* * We’ve launched GLM-Image, a state-of-the-art image generation model built on a multimodal architecture and fully trained on domestic chips, combining autoregressive semantic understanding with diffusion-based decoding to deliver high-quality, controllable visual generation. * The update significantly enhances performance in knowledge-intensive scenarios, with more stable and accurate text rendering inside images, making GLM-Image especially well suited for commercial design, educational illustrations, and content-rich visual applications.Learn more in our [documentation](/guides/image/glm-image).\* * We’ve released GLM-4.7, our foundation model with significant improvements in coding, reasoning, and agentic capabilities. It delivers more reliable code generation, stronger long-context understanding, and improved end-to-end task execution across real-world development workflows. * The update brings open-source SOTA performance on major coding and reasoning benchmarks, enhanced agentic coding for goal-driven, multi-step tasks, and improved front-end and document generation quality. Learn more in our [documentation](/guides/llm/glm-4.7).\* * We’ve launched AutoGLM-Phone-Multilingual, our latest multimodal mobile automation framework that understands screen content and executes real actions through ADB. 
It enables natural-language task execution across 50+ mainstream apps, delivering true end-to-end mobile control. * The update introduces multilingual support (English & Chinese), enhanced workflow planning capabilities, and improved task execution reliability. Learn more in our [documentation](/guides/vlm/autoglm-phone-multilingual).\* * We’ve launched GLM-ASR-2512, our ASR model, delivering industry-leading accuracy with a Character Error Rate of just 0.0717, and significantly improved performance across real-world multilingual and accent-rich scenarios. * The update introduces enhanced custom dictionary support and expanded specialized terminology recognition. Learn more in our [documentation](/guides/audio/glm-asr-2512).\* * We’re excited to introduce GLM-4.6V, Z.ai’s latest iteration in multimodal large language models. This version enhances vision understanding, achieving state-of-the-art performance in tasks involving images and text. * The update also expands the context window to 128K, enabling more efficient processing of long inputs and complex multimodal tasks. Learn more in our [documentation](/guides/vlm/glm-4.6v).\* * We’ve launched GLM-4.6, the flagship coding model, showcasing enhanced performance in both public benchmarks and real-world programming tasks, making it the leading coding model in China. * The update also expands the context window to 200K, improving its ability to handle longer code and complex agent tasks. Learn more in our [documentation](/guides/llm/glm-4.6).\* * We’ve launched GLM-4.5V, a 100B-scale open-source vision reasoning model, supporting a broad range of visual tasks including video understanding, visual grounding, GUI agents and etc. * The update also adds a new thinking mode. 
Learn more in our [documentation](/guides/vlm/glm-4.5v).\* * We’ve launched GLM Slide/Poster Agent, an AI-powered creation agent that combines information retrieval, content structuring, and visual layout design to generate professional-grade slides and posters from natural language instructions. * The update also brings a seamless integration of content generation with design conventions. Learn more in our [documentation](/guides/agents/slide).\* * We’ve launched GLM-4.5, our latest native agentic LLM, delivering doubled parameter efficiency and strong reasoning, coding, and agentic capabilities. * It also offers seamless one-click compatibility with the Claude Code framework. Learn more in our [documentation](/guides/llm/glm-4.5).\* * We’ve launched CogVideoX-3, an incremental upgrade to our video generation model with improved quality and new features. * It adds support for start and end frame synthesis. Learn more in our [documentation](/guides/video/cogvideox-3).\* # Claude Code Source: https://docs.z.ai/scenario-example/develop-tools/claude Methods for Using the GLM Coding Plan in Claude Code Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows -- all through natural language commands. Claude Code is even more powerful with the [GLM Coding Plan](https://z.ai/subscribe?utm_source=zai\&utm_medium=link\&utm_term=devpack-integration\&utm_campaign=Platform_Ops&_channel_track_key=w3mNdY8g)— giving you 3× the usage at a fraction of the cost. Code faster, debug smarter, and manage workflows seamlessly with more tokens, and rock-solid reliability. 
For users who used the service before 2025-09-30:

The default model for the GLM Coding Plan has been upgraded to GLM-4.7, with a seamless user experience. However, if you previously configured fixed model mappings for GLM-4.5 in `settings.json`, refer to the "How to Switch the Model in Use" section in the FAQ below to adjust them and make sure you are using the latest GLM-4.7 model.

After the subscription is configured, the default server-side model mapping applies: the interface shows a Claude model name, but a GLM model is actually used.

You can adjust the model mapping manually (not recommended); see the "How to Switch the Model in Use" section in the FAQ for details.

## Step 1: Installing Claude Code

Prerequisites:

* [Node.js 18 or newer](https://nodejs.org/en/download/)
* For macOS, use [nvm](https://nodejs.org/en/download/) to install Node.js; installing the package directly may cause permission issues
* For Windows, additionally install [Git for Windows](https://git-scm.com/download/win)

```
# Install Claude Code
npm install -g @anthropic-ai/claude-code

# Navigate to your project
cd your-awesome-project

# Start Claude Code
claude
```

If you are not familiar with npm but have Cursor, you can enter the following prompt in Cursor, and Cursor will guide you through the installation of Claude Code.

```text
https://docs.anthropic.com/en/docs/claude-code/overview Help me install Claude Code
```

**Note**: If macOS users encounter permission issues during installation, use [nvm](https://nodejs.org/en/download/) to install Node.js.

## Step 2: Configure the GLM Coding Plan

* Access the [Z.AI Open Platform](https://z.ai/model-api), then register or log in.
* Create an API Key on the [API Keys](https://z.ai/manage-apikey/apikey-list) management page.
* Copy your API Key for use.
Set up the environment variables **using one of the following methods** on **macOS/Linux** or **Windows**:

**Note**: Some commands show no output when setting environment variables; that is normal as long as no errors appear.

Coding Tool Helper is a coding-tool companion that quickly loads the **GLM Coding Plan** into your favorite **Coding Tools**. Install and run it, then follow the on-screen guidance to automatically install tools, configure the plan, and manage MCP servers.

```bash
# Run Coding Tool Helper directly in the terminal
npx @z_ai/coding-helper
```

For more details, refer to the [Coding Tool Helper](/devpack/extension/coding-tool-helper) documentation.

![Description](https://cdn.bigmodel.cn/markdown/1764749390483image.png?attname=image.png)

Alternatively, run the following command in your terminal.

Note: only macOS and Linux are supported; this method does not work on Windows.

```bash
curl -O "https://cdn.bigmodel.cn/install/claude_code_zai_env.sh" && bash ./claude_code_zai_env.sh
```

The script automatically modifies `~/.claude/settings.json` to configure the following environment variables (you don't need to edit the file manually):

```json
{
  "env": {
    "ANTHROPIC_AUTH_TOKEN": "your_zai_api_key",
    "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
    "API_TIMEOUT_MS": "3000000"
  }
}
```

If you have previously configured environment variables for Claude Code, you can configure them manually as follows. Open a new terminal window for the changes to take effect.
```bash MacOS & Linux
# Edit the Claude Code configuration file `~/.claude/settings.json`
# Add or modify the env fields ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN
# Replace `your_zai_api_key` with the API Key you obtained in the previous step

{
  "env": {
    "ANTHROPIC_AUTH_TOKEN": "your_zai_api_key",
    "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
    "API_TIMEOUT_MS": "3000000"
  }
}
```

```cmd Windows Cmd
REM Run the following commands in Cmd
REM Replace `your_zai_api_key` with the API Key you obtained in the previous step

setx ANTHROPIC_AUTH_TOKEN your_zai_api_key
setx ANTHROPIC_BASE_URL https://api.z.ai/api/anthropic
```

```powershell Windows PowerShell
# Run the following commands in PowerShell
# Replace `your_zai_api_key` with the API Key you obtained in the previous step

[System.Environment]::SetEnvironmentVariable('ANTHROPIC_AUTH_TOKEN', 'your_zai_api_key', 'User')
[System.Environment]::SetEnvironmentVariable('ANTHROPIC_BASE_URL', 'https://api.z.ai/api/anthropic', 'User')
```

## Step 3: Start with Claude Code

Once the configuration is complete, you can start using **Claude Code** in your terminal or cmd:

```
cd your-project-directory
claude
```

> If prompted with "Do you want to use this API key," select "Yes."

After launching, grant Claude Code permission to access files in your folder as shown below:

![Description](https://cdn.bigmodel.cn/markdown/1753631613096claude-2.png?attname=claude-2.png)

You can now use Claude Code for development!

***

## FAQ

### How to Switch the Model in Use

Claude Code's internal model environment variables map to GLM models. The default configuration is as follows:

* `ANTHROPIC_DEFAULT_OPUS_MODEL`: `GLM-4.7`
* `ANTHROPIC_DEFAULT_SONNET_MODEL`: `GLM-4.7`
* `ANTHROPIC_DEFAULT_HAIKU_MODEL`: `GLM-4.5-Air`

If adjustments are needed, you can directly modify the configuration file (for example, `~/.claude/settings.json` in Claude Code) to switch to other models.
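As a minimal sketch of such an edit, the Python helper below merges model-mapping keys into the `env` object of a settings file and leaves every other key untouched. The function name `update_model_mapping` and the convention that a value of `None` removes a key are illustrative assumptions, not part of Claude Code or any Z.AI tooling; the environment variable names are the ones listed above, and `glm-4.5-air` is a model ID as written in this guide. The demo writes to a temporary directory rather than to the real `~/.claude/settings.json`.

```python
import json
import tempfile
from pathlib import Path

# Environment variable names taken from the default mapping listed above.
MODEL_KEYS = (
    "ANTHROPIC_DEFAULT_OPUS_MODEL",
    "ANTHROPIC_DEFAULT_SONNET_MODEL",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL",
)


def update_model_mapping(settings_path, mapping):
    """Merge `mapping` into the "env" object of a settings file.

    Hypothetical helper: a value of None removes the key, any other value
    sets it. Keys outside MODEL_KEYS are rejected; existing keys that are
    not mentioned in `mapping` are preserved.
    """
    path = Path(settings_path)
    # json.loads raises json.JSONDecodeError if the file is malformed,
    # so nothing is overwritten when the existing file is broken.
    settings = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    env = settings.setdefault("env", {})
    for key, model in mapping.items():
        if key not in MODEL_KEYS:
            raise ValueError(f"unexpected key: {key}")
        if model is None:
            env.pop(key, None)
        else:
            env[key] = model
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings


if __name__ == "__main__":
    # Demo against a throwaway file, not the real configuration.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "settings.json"
        demo.write_text('{"env": {"ANTHROPIC_AUTH_TOKEN": "your_zai_api_key"}}', encoding="utf-8")
        updated = update_model_mapping(demo, {"ANTHROPIC_DEFAULT_HAIKU_MODEL": "glm-4.5-air"})
        print(json.dumps(updated, indent=2))
```

Because the file is parsed before anything is written, a syntax problem such as a stray comma surfaces as a `json.JSONDecodeError` instead of being silently carried forward.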
It is generally not recommended to adjust the model mapping manually, because a hardcoded mapping does not update automatically when the GLM Coding Plan models are updated.

If you want to use the latest default mappings (for existing users who have configured old model mappings), simply delete the model mapping configuration in `settings.json`, and Claude Code will automatically use the latest default models.

1. Configure `~/.claude/settings.json` with the following content:

```json
{
  "env": {
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "glm-4.5-air",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-4.7",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "glm-4.7"
  }
}
```

2. Open a new terminal window, run `claude` to start Claude Code, and enter `/status` to check the current model status.

![Description](https://cdn.bigmodel.cn/markdown/1759420390607image.png?attname=image.png)

### Vision Search Reader MCP

Refer to the [Vision MCP Server](/devpack/mcp/vision-mcp-server), [Search MCP Server](/devpack/mcp/search-mcp-server) and [Web Reader MCP Server](/devpack/mcp/reader-mcp-server) documentation; once configured, you can use them in Claude Code.

### Manual Configuration Does Not Work

If you manually modified the `~/.claude/settings.json` configuration file but the changes did not take effect, try the following troubleshooting steps.

* Close all Claude Code windows, open a new command-line window, and run `claude` again.
* If the issue persists, try deleting the `~/.claude/settings.json` file and then reconfigure the environment variables; Claude Code will automatically generate a new configuration file.
* Confirm that the JSON format of the configuration file is correct: check the variable names and make sure there are no missing or extra commas. You can use an online JSON validator to check.

### Recommended Claude Code Version

We recommend using the latest version of Claude Code.
You can check the current version and upgrade with the following commands:

> We have verified compatibility with Claude Code 2.0.14 and other versions.

```bash
# Check the current version
claude --version
# Example output: 2.0.14 (Claude Code)

# Upgrade to the latest version
claude update
```

# Claude Code IDE Plugin

Source: https://docs.z.ai/scenario-example/develop-tools/claude-for-ide

How to use the Claude Code plugin in VS Code and JetBrains with GLM Coding Plan

Claude Code is an intelligent coding tool that runs in the terminal; it can also be used through plugins in IDEs such as VS Code and JetBrains.

When paired with the [**GLM Coding Plan**](https://zhipuaishengchan.datasink.sensorsdata.cn/t/Nd), Claude Code's capabilities are further enhanced: enjoy **3× usage** at a lower price, making coding, debugging, and workflow management more efficient and stable.

## Step 1: Install Claude Code & Configure GLM Coding Plan

The Claude Code IDE plugin depends on the Claude Code CLI. First complete the installation and GLM Coding Plan configuration by following the [**Claude Code Guide with GLM Coding Plan**](./claude).

## Step 2: Install and Use the IDE Plugin

The Claude Code VS Code plugin supports VS Code and VS Code-based IDEs (such as Cursor and Trae).

Open VS Code, go to the Extensions Marketplace, then search for and install the “Claude Code” plugin.

![Description](https://cdn.bigmodel.cn/markdown/1763349262442image.png?attname=image.png)

After installation, click the Claude Code icon at the top right as shown below to open the Claude Code panel. Wait a few seconds for initialization, then start using it.

![Description](https://cdn.bigmodel.cn/markdown/1763349672819image.png?attname=image.png)

The Claude Code JetBrains plugin supports JetBrains IDEs such as IntelliJ IDEA, PyCharm, and WebStorm.

Open a JetBrains IDE (e.g., IntelliJ IDEA or PyCharm), go to the Plugins Marketplace, search for “Claude Code,” and install it.
![Description](https://cdn.bigmodel.cn/markdown/1763349847970image.png?attname=image.png) After installation, restart the IDE to use Claude Code features. ![Description](https://cdn.bigmodel.cn/markdown/1763350722159image.png?attname=image.png) All set! You can now use Claude Code in your IDE for development. *** # Cline Source: https://docs.z.ai/scenario-example/develop-tools/cline Methods for Using the GLM Coding Plan in Cline Plugin Cline is a powerful VS Code plugin that lets you use AI models directly in your editor for code generation, file operations, and more. It not only accelerates your workflow but also provides contextual understanding and intelligent suggestions for complex tasks. Cline is even more capable with the [GLM Coding Plan](https://z.ai/subscribe?utm_source=zai\&utm_medium=link\&utm_term=devpack-integration\&utm_campaign=Platform_Ops&_channel_track_key=w3mNdY8g), giving you more usage at a lower cost, so you can handle code generation, file management, and multimodal interactions with ease. Using the GLM Coding Plan, you need to configure the dedicated Coding API [https://api.z.ai/api/coding/paas/v4](https://api.z.ai/api/coding/paas/v4) instead of the General API [https://api.z.ai/api/paas/v4](https://api.z.ai/api/paas/v4) For users who have used the service before 2025-12-22: \ The default model has been upgraded to GLM-4.7. Please update your config accordingly. ## Step 1: Installing the Cline Plugin ### 1. Open the Extensions Marketplace a. Open VS Code b. Click the Extensions Marketplace icon on the left c. Enter `cline` in the search box d. Locate the `Cline` extension ![Description](https://cdn.bigmodel.cn/markdown/1753688113562c1.png?attname=c1.png) ### 2. Install the Plugin a. Click the `Install` button to install b. After installation, choose to trust the developer ![Description](https://cdn.bigmodel.cn/markdown/1753688124582c2.jpg?attname=c2.jpg) ## Step 2: Configuring API Settings ### 1. 
Select API Key Method Choose `Use your own API Key` ![Description](https://cdn.bigmodel.cn/markdown/1753688131403c3.png?attname=c3.png) ### 2. Enter Configuration Information Fill in the relevant information according to the following settings: * **API Provider**: Select `OpenAI Compatible` * **Base URL**: Enter `https://api.z.ai/api/coding/paas/v4` * **API Key**: Enter your Z.AI API Key * **Model**: Select "Use custom" and enter the model name (e.g., `glm-4.7`) * **Other Configurations**: * Unchecking **Support Images** * Adjust **Context Window Size** to `200000` * Adjust `temperature` and other params according to your task requirements. ![Description](https://cdn.bigmodel.cn/markdown/1769394544820image.png) ## Step 3: Getting Started Once configured, you can enter your requirements in the input box to let the model assist you with various tasks, such as: * Creating and editing files * Generating code * Refactoring code * Explaining code logic * Debugging issues ![Description](https://cdn.bigmodel.cn/markdown/1753688145687c5.png?attname=c5.png) ## Step 4: Vision Search Reader MCP Refer to the [Vision MCP Server](/devpack/mcp/vision-mcp-server) , [Search MCP Server](/devpack/mcp/search-mcp-server) and [Web Reader MCP Server](/devpack/mcp/reader-mcp-server) documentation; once configured, you can use them in Cline. # Crush Source: https://docs.z.ai/scenario-example/develop-tools/crush Methods for Using the GLM Coding Plan in Crush Crush is a powerful AI coding agent for the terminal (CLI + TUI). It supports multiple models to handle code generation, debugging, file operations, and more — all inside your command line. Crush is supercharged with the [GLM Coding Plan](https://z.ai/subscribe?utm_source=zai\&utm_medium=link\&utm_term=devpack-integration\&utm_campaign=Platform_Ops&_channel_track_key=w3mNdY8g), making your terminal workflow smarter and more efficient. 
Using the GLM Coding Plan, you need to configure the dedicated Coding API [https://api.z.ai/api/coding/paas/v4](https://api.z.ai/api/coding/paas/v4) instead of the General API [https://api.z.ai/api/paas/v4](https://api.z.ai/api/paas/v4).

For users who used the service before 2025-12-22:

The default model has been upgraded to GLM-4.7. Please update your config accordingly.

## Step 1: Installing Crush

Select the appropriate installation method based on your system:

```
brew install charmbracelet/tap/crush
```

```
npm install -g @charmland/crush
```

```
yay -S crush-bin
```

```
nix run github:numtide/nix-ai-tools#crush
```

## Step 2: Configuring the GLM Model

### 1. Obtain Your Z.AI API Key

Visit Z.AI to get your [API Key](https://z.ai/manage-apikey/apikey-list).

### 2. Launch Crush and Select Model

Run the `crush` command to start the application:

```
crush
```

In the model selection interface, choose one of the following models:

* glm-4.7: Highest performance, strong coding version, suitable for complex tasks
* glm-4.5-air: Lightweight version, faster response

### 3. Enter your Z.AI API key

Enter the API Key obtained from Z.AI at the prompt.

![Description](https://cdn.bigmodel.cn/markdown/1759228565353crush.png?attname=crush.png)

## Step 3: Modify Crush Configuration

### 1. Locate the Configuration File

Depending on your OS, the configuration file can be found at:

```bash MacOS/Linux
~/.config/crush/crush.json
```

```powershell Windows
%USERPROFILE%\.config\crush\crush.json
```

### 2. Switch to the GLM Coding Plan Endpoint

Open the `crush.json` file and configure it as follows, replacing `your_api_key` with your API key:

```
{
  "providers": {
    "zai": {
      "id": "zai",
      "name": "ZAI Provider",
      "base_url": "https://api.z.ai/api/coding/paas/v4",
      "api_key": "your_api_key"
    }
  }
}
```

## Step 4: Complete Configuration and Pick Model

Press `ctrl+p` and choose "Switch Model".

After configuration, you can now:

* Generate and optimize code using GLM-4.7
* Conduct technical Q\&A and debugging
* Execute complex programming tasks
* Experience the powerful capabilities of Z.AI

## Step 5: Vision Search Reader MCP

Refer to the [Vision MCP Server](/devpack/mcp/vision-mcp-server), [Search MCP Server](/devpack/mcp/search-mcp-server) and [Web Reader MCP Server](/devpack/mcp/reader-mcp-server) documentation; once configured, you can use them in Crush.

# Cursor

Source: https://docs.z.ai/scenario-example/develop-tools/cursor

Methods for using the GLM Coding Plan in Cursor

You can easily integrate the **GLM-4.7** model into **Cursor**, which supports the **OpenAI API protocol**.

Note: Custom configuration is only supported in Cursor Pro and higher versions.

Using the GLM Coding Plan, you need to configure the dedicated Coding API [https://api.z.ai/api/coding/paas/v4](https://api.z.ai/api/coding/paas/v4) instead of the General API [https://api.z.ai/api/paas/v4](https://api.z.ai/api/paas/v4).

### 1. Install Cursor

Download and install Cursor from the official website.

### 2. Create a New Provider/Model

In Cursor, navigate to the "**Models**" section and click "**Add Custom Model**".

![Description](https://cdn.bigmodel.cn/markdown/176032216013820251013-100735.jpeg?attname=20251013-100735.jpeg)

* Select the **OpenAI Protocol**.
* Configure the **OpenAI API Key** (obtained from Z.AI).
* In **Override OpenAI Base URL**, replace the default URL with `https://api.z.ai/api/coding/paas/v4`.
* Enter the model you wish to use, such as `GLM-4.7` or `GLM-4.5-air`.
* Note: In Cursor, the model name must be entered in uppercase, such as `GLM-4.7`. ![Description](https://cdn.bigmodel.cn/markdown/176032218295020251013-100740.jpeg?attname=20251013-100740.jpeg) ### 3. Save and Switch Models After configuration, save your settings and select the newly created **GLM-4.7 Provider** on the homepage. ### 4. Get Started With this setup, you can begin using the **GLM-4.7** model for code generation, debugging, task analysis, and more. ![Description](https://cdn.bigmodel.cn/markdown/176032221518820251013-100745.jpeg?attname=20251013-100745.jpeg) # Factory Droid Source: https://docs.z.ai/scenario-example/develop-tools/droid Methods for Using the GLM Coding Plan in Factory Droid Factory Droid is an enterprise-grade AI coding agent that lives in your terminal and handles end-to-end development workflows. Works seamlessly with [Z.AI's GLM Coding Plan](https://z.ai/subscribe) for high-performance models at exceptional value. ## Step 1: Installing Factory Droid **macOS / Linux:** ```bash theme={null} curl -fsSL https://app.factory.ai/cli | sh ``` **Windows:** ```powershell theme={null} irm https://app.factory.ai/cli/windows | iex ``` ## Step 2: Configuring Z.AI GLM Models ### 1. Get Your Z.AI API Key 1. Visit the [Z.AI API Console](https://z.ai/manage-apikey/apikey-list) 2. Create an API key if you don't have one ### 2. Configure Custom Models Factory Droid uses BYOK (Bring Your Own Key) to connect with Z.AI's GLM models. **Configuration file location** * macOS/Linux: `~/.factory/settings.json` * Windows: `%USERPROFILE%\.factory\settings.json` Use either method below:\ Remember to replace `your_api_key` with the API Key you obtained in the previous step. 
Method A: Anthropic Protocol ```json theme={null} { "customModels": [ { "displayName": "GLM-4.7 [Z.AI Coding Plan] - Anthropic", "model": "glm-4.7", "baseUrl": "https://api.z.ai/api/anthropic", "apiKey": "your_api_key", "provider": "anthropic", "maxOutputTokens": 131072 } ] } ``` Method B: OpenAi Chat Completion Protocol ```json theme={null} { "customModels": [ { "displayName": "GLM-4.7 [Z.AI Coding Plan] - Openai", "model": "glm-4.7", "baseUrl": "https://api.z.ai/api/coding/paas/v4", "apiKey": "your_api_key", "provider": "generic-chat-completion-api", "maxOutputTokens": 131072 } ] } ``` **Important notes** * GLM Coding Plan users must use the Coding API endpoint: `https://api.z.ai/api/coding/paas/v4` * Standard plan users use the general API endpoint: `https://api.z.ai/api/paas/v4` * Replace `your_api_key` with your actual API key * API keys are stored locally and never uploaded to Factory servers ## Step 3: Start Using Factory Droid ### 1. Launch Droid Navigate to your project directory and start droid: ```bash theme={null} cd /path/to/your/project droid ``` On first launch, you'll be prompted to sign in via your browser to connect to Factory's services. ### 2. Select Your Z.AI Model Once droid is running, use the `/model` command to select your Z.AI GLM model: ``` /model ``` Your custom Z.AI models will appear in a separate "Custom models" section. Select the GLM model you configured. ### 3. Start Coding Use droid for tasks like analyzing code, implementing features, fixing bugs, reviewing changes, and more. ## Key Features **Specification Mode** * Press **Shift+Tab** to activate * Describe features in plain language * Get automatic planning before implementation * Approve plans before any code changes **Auto-Run Mode** * **Low**: Edits and read-only commands * **Medium**: Reversible commands (package installs, builds, local git, etc.) 
* **High**: All commands except explicitly dangerous ones * Cycle modes with **Shift+Tab** **IDE Integration** * **VS Code/Cursor/Windsurf**: Auto-installs when you run `droid` * **JetBrains**: Install plugin from marketplace * Features: Interactive diffs, auto-shares current file/selection, quick launch **AGENTS.md — Project Conventions** Document your workflow at repo root: ```markdown theme={null} # Build & Test - Test: `npm test` - Build: `npm run build` # Conventions - TypeScript strict mode - 100-char line limit - Tests required for features ``` Droid automatically follows your team's practices. **Additional Features** * Cost tracking with `/cost` command * SOC-2 compliant with enterprise deployment options * Integrations: Jira, Notion, Slack, GitHub * MCP (Model Context Protocol) support * Transparent review workflow for every change ## Resources * **Documentation**: [docs.factory.ai](https://docs.factory.ai/cli/getting-started/overview) * **BYOK Configuration**: [docs.factory.ai/cli/byok/overview](https://docs.factory.ai/cli/byok/overview) * **Support**: [support@factory.ai](mailto:support@factory.ai) # Eigent Source: https://docs.z.ai/scenario-example/develop-tools/eigent Methods for Using the GLM Coding Plan in Eigent Eigent is an open-source cowork agent that runs on your desktop. It is built with a multi-agent workforce architecture, supported by general abilities such as browser automation, terminal automation and MCPs. This design enables agents in Eigent to perform tasks much like human workers operating in real desktop environments, without the need for deep API integrations or constant workflow reconfiguration. Eigent works seamlessly with the [GLM Coding Plan](https://z.ai/subscribe), delivering high-performance AI capabilities at exceptional value. 
When using the GLM Coding Plan, configure the dedicated Coding API [https://api.z.ai/api/coding/paas/v4](https://api.z.ai/api/coding/paas/v4) instead of the General API [https://api.z.ai/api/paas/v4](https://api.z.ai/api/paas/v4).

## Step 1: Installing Eigent

Choose the installation method that best suits your needs:

### 1. Download Eigent

Visit [eigent.ai](https://eigent.ai/) and download the latest version for your platform (macOS 11+ or Windows).

### 2. Install the Application

* **macOS**: Open the downloaded `.dmg` file and drag Eigent into your Applications folder
* **Windows**: Run the downloaded `.exe` installer and follow the on-screen instructions

### 3. Launch Eigent

Open the application and log in to get started.

Alternatively, developers who want to run Eigent locally from source code can follow the steps below.

**Prerequisites:**

* Node.js >= 18.0.0, \< 23.0.0
* Python >= 3.12, \< 3.13
* Docker (recommended) or PostgreSQL 15

### 1. Clone the Repository

```bash
git clone https://github.com/eigent-ai/eigent.git
cd eigent
```

### 2. Start Backend Services

```bash
cd server
cp .env.example .env
docker compose up -d
```

### 3. Configure Frontend Environment

In the project root directory, modify `.env.development`:

```bash
VITE_BASE_URL=/api
VITE_USE_LOCAL_PROXY=true
VITE_PROXY_URL=http://localhost:3001
```

### 4. Start Frontend Service

```bash
npm install
npm run dev
```

## Step 2: Configuring the GLM Model

### 1. Access Application Settings

Launch Eigent and navigate to the **Home Page**, then click on the **Settings** tab.

![Description](https://cdn.bigmodel.cn/markdown/1768546337853image.png)

### 2. Locate Model Configuration

In the Settings menu, find and select the **Models** section, then scroll down to the **Custom Model** area and look for the **Z.ai Config** card.

![Description](https://cdn.bigmodel.cn/markdown/176861556077320260117.jpg)

### 3. Enter Configuration Details

Click on the Z.ai Config card and fill in the following information:

* **API Key**: Enter your Z.AI API Key (obtain from [Z.AI API Console](https://z.ai/manage-apikey/apikey-list))
* **API Host**: Enter `https://api.z.ai/api/coding/paas/v4/`
* **Model Type**: Enter the model name (e.g., `glm-4.7` or `glm-4.5-air`)

Click **Save** to apply your changes.

### 4. Set as Default

Once saved, click the **"Set as Default"** button on the Z.ai Config card to make GLM your default model.

## Step 3: Getting Started

With the configuration complete, you can now use Eigent with GLM models for:

* Autonomous code generation and refactoring
* Complex debugging and problem-solving
* Multi-file project management
* Technical documentation generation
* Code review and optimization

Your Eigent agents are now powered by Z.AI's GLM models and ready to assist with your development tasks.

# Gemini CLI

Source: https://docs.z.ai/scenario-example/develop-tools/gemini

Complete Guide to Accessing Z.AI GLM Models Using a Customized Gemini CLI

Gemini CLI is a command-line interface tool that can be made compatible with Z.AI's GLM models by using a customized fork.

## Step 1: Obtaining the Custom Version

### 1. Cloning the Custom Repository

Since the official Gemini CLI repository only supports Google's Gemini models, use a customized fork that adds OpenRouter compatibility:

```bash
git clone https://github.com/heartyguy/gemini-cli
cd gemini-cli
```

### 2. Switching to the Compatible Branch

```bash
git checkout feature/openrouter-support
```

## Step 2: Environment Configuration

### 1. Setting Environment Variables

![Description](https://cdn.bigmodel.cn/markdown/1753631661971gemini-1.png?attname=gemini-1.png)

Configure the API base URL:

```bash
export OPENROUTER_BASE_URL="https://api.z.ai/api/coding/paas/v4"
```

Configure the API Key:

```bash
export OPENROUTER_API_KEY="your_api_key"
```

## Step 3: Installation and Launch

### 1. System Requirements

Ensure your Node.js version is >= 18.

### 2. Installing Dependencies

```bash
npm install
```

### 3. Launch Process

![Description](https://cdn.bigmodel.cn/markdown/1753631666323gemini-2.png?attname=gemini-2.png)

After launching, complete the following steps:

1. Select a background color theme.
2. Log in (a Google account is recommended; you will be redirected to a webpage for verification).

## Step 4: Usage Results

![Description](https://cdn.bigmodel.cn/markdown/1753631670672gemini-3.png?attname=gemini-3.png)

Once configured, you can use Z.AI's GLM models in the command line for conversations and code generation.

# Goose

Source: https://docs.z.ai/scenario-example/develop-tools/goose

Methods for Using the GLM Coding Plan in Goose

Goose is an AI agent tool that supports local or desktop environments and offers a CLI interface. It integrates with multiple models and connects to external tools and APIs via the MCP protocol to automate engineering tasks such as code generation, debugging, testing, and deployment.

Goose delivers a more stable and efficient experience with the [GLM Coding Plan](https://z.ai/subscribe?utm_source=zai\&utm_medium=link\&utm_term=devpack-integration\&utm_campaign=Platform_Ops&_channel_track_key=w3mNdY8g).

For users who used the service before 2025-12-22: the default model has been upgraded to GLM-4.7. Please update your config accordingly.

## Step 1: Installing Goose Desktop

1. Visit the official Goose documentation: [Goose Quickstart](https://block.github.io/goose/docs/quickstart/).
2. Choose the installer for your operating system and complete the setup for Goose Desktop.

## Step 2: Creating a New Provider

1. Open the Goose Desktop application and navigate to the main interface.
2. Locate and click **“Create New Provider”** in the left-hand menu (as shown in the figure).
3. Follow the prompts to enter the required information and complete the new Provider creation.

![Description](https://cdn.bigmodel.cn/markdown/1758091325715goose-1.jpeg?attname=goose-1.jpeg)

## Step 3: Select Anthropic Compatible and Configure

1. During Provider setup, select **Anthropic Compatible**.
2. Complete the following required configurations:
   * **Base URL**: `https://api.z.ai/api/anthropic`
   * **API Key**: Your Z.ai API key
   * **Model**: Select `GLM-4.7` (standard, complex tasks) or `GLM-4.5-air` (lightweight, faster response) based on your requirements.
3. Save your settings to complete the configuration.

![Description](https://cdn.bigmodel.cn/markdown/1759307955720image.png?attname=image.png)

## Step 4: Switching Models

1. After configuration, return to the Goose desktop main interface.
2. Locate and click "**Switch Models**" at the bottom of the main interface.
3. Select the newly created Provider from the dropdown list.
4. Verify that the new Provider is now the active model.

![Description](https://cdn.bigmodel.cn/markdown/1758091346221goose-3.jpeg?attname=goose-3.jpeg)

## Step 5: Start Using Goose with GLM

1. Once the provider is active, you can start interacting with Goose powered by GLM-4.7.
2. Enter your request, and Goose will automatically invoke the GLM-4.7 model based on your configuration to generate a response.
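The Anthropic Compatible settings from Step 3 (Base URL, API key, model) can also be pictured as a raw HTTP request. The Python sketch below only builds such a request and does not send it. It assumes the endpoint follows the Anthropic Messages API convention (a `/v1/messages` path, an `x-api-key` header, an `anthropic-version` header, and a required `max_tokens` field); none of this is confirmed by the guide above, and `build_messages_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Base URL taken from the Goose provider configuration above.
ANTHROPIC_COMPATIBLE_BASE_URL = "https://api.z.ai/api/anthropic"


def build_messages_request(api_key, prompt, model="GLM-4.7", max_tokens=1024):
    """Build (but do not send) an Anthropic Messages-style request.

    Assumption: the path and header names follow Anthropic's public
    Messages API; the guide itself only documents the base URL.
    """
    payload = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{ANTHROPIC_COMPATIBLE_BASE_URL}/v1/messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; if the request is rejected, the header scheme is the first assumption to revisit.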
![Description](https://cdn.bigmodel.cn/markdown/1758091350444goose-4.jpeg?attname=goose-4.jpeg) ## Step 6: Vision Search Reader MCP Refer to the [Vision MCP Server](/devpack/mcp/vision-mcp-server) , [Search MCP Server](/devpack/mcp/search-mcp-server) and [Web Reader MCP Server](/devpack/mcp/reader-mcp-server) documentation; once configured, you can use them in Goose. # Grok CLI Source: https://docs.z.ai/scenario-example/develop-tools/gork Quick Start Guide for Connecting to Z.AI GLM Models Using Grok CLI Grok CLI is a streamlined command-line AI assistant that enables quick access to Z.AI's GLM models for conversation and code generation. ## Step 1: Installing Grok CLI Install Grok CLI globally via npm: ```bash theme={null} npm install -g @vibe-kit/grok-cli ``` ## Step 2: Environment Configuration Set the API base URL and API Key: ```bash theme={null} export GROK_BASE_URL="https://api.z.ai/api/coding/paas/v4" export GROK_API_KEY="your_api_key" ``` ## Step 3: Getting Started Launch Grok CLI with a specified model: ```bash theme={null} grok --model glm-4.7 ``` ![Description](https://cdn.bigmodel.cn/markdown/1753631674840gemini-4.png?attname=gemini-4.png) ## Notes > **Important Note**: Grok CLI currently has limited compatibility with thinking models, and thinking content will be displayed in full. Recommendations: > > * Wait for Grok CLI to improve compatibility with thinking models > * Or use non-thinking versions of the models # Kilo Code Source: https://docs.z.ai/scenario-example/develop-tools/kilo Methods for Using the GLM Coding Plan in Kilo Code plugin Kilo Code is a powerful VS Code plugin with support for MCP (Model Context Protocol), enabling you to generate code, debug, and manage projects directly within the editor — delivering a more efficient and intelligent development experience. 
Kilo Code’s performance is further enhanced with the [GLM Coding Plan](https://z.ai/subscribe?utm_source=zai\&utm_medium=link\&utm_term=devpack-integration\&utm_campaign=Platform_Ops&_channel_track_key=w3mNdY8g), helping you achieve greater efficiency and stability in both code creation and project collaboration.

When using the GLM Coding Plan, configure the dedicated Coding API [https://api.z.ai/api/coding/paas/v4](https://api.z.ai/api/coding/paas/v4) instead of the General API [https://api.z.ai/api/paas/v4](https://api.z.ai/api/paas/v4).

For users who used the service before 2025-12-22: the default model has been upgraded to GLM-4.7. Please update your config accordingly.

## Step 1: Installing the Kilo Code Plugin

### 1. Open the Extensions Marketplace

a. Launch VS Code
b. Click the Extensions Marketplace icon on the left sidebar
c. Search for `Kilo Code` in the search bar
d. Locate the `Kilo Code` plugin

![Description](https://cdn.bigmodel.cn/markdown/1753687809443k1.jpg?attname=k1.jpg)

### 2. Install the Plugin

a. Click the `Install` button to begin installation
b. After installation, choose to trust the developer

![Description](https://cdn.bigmodel.cn/markdown/1753687816703k2.jpg?attname=k2.jpg)

## Step 2: Configuring API Settings

### 1. Select API Key Authentication

Choose `Use your own API key`.

![Description](https://cdn.bigmodel.cn/markdown/1753687824352k3.jpg?attname=k3.jpg)

### 2. Enter Configuration Details

Fill in the following information as specified:

> If your Kilo Code version is outdated and lacks the `International Coding Plan` option, please update the plugin to the latest version.

* **API Provider**: Select `Z AI`
* **Z AI Entrypoint**: Select `International Coding Plan (https://api.z.ai/api/coding/paas/v4/)`
* **Z AI API Key**: Input your Z.AI API Key
* **Model**: Select `glm-4.7` or any other model from the list

![Description](https://cdn.bigmodel.cn/markdown/1760943118846image.png?attname=image.png)

## Step 3: Getting Started

Once configured, you can enter prompts in the input box to use the AI model for tasks such as:

* Analyzing database table structures
* Calculating statistics and averages
* Generating and optimizing SQL queries
* Code generation and refactoring
* Project analysis and documentation writing

## Step 4: Vision Search Reader MCP

Refer to the [Vision MCP Server](/devpack/mcp/vision-mcp-server), [Search MCP Server](/devpack/mcp/search-mcp-server) and [Web Reader MCP Server](/devpack/mcp/reader-mcp-server) documentation; once configured, you can use them in Kilo Code.

# n8n Workflow

Source: https://docs.z.ai/scenario-example/develop-tools/n8n

Build n8n automations that call Z.AI GLM models

n8n lets you orchestrate API workflows without code. This guide shows how to call Z.AI's GLM models inside any n8n workflow using the OpenAI node.

When you are subscribed to the **GLM Coding Plan**, switch the base URL to `https://api.z.ai/api/coding/paas/v4` instead of the general API endpoint used below.

## Prerequisites

* An n8n deployment (desktop, self-hosted, or n8n Cloud)
* Basic familiarity with n8n nodes

## QuickStart

* Go to the [Z.AI Open Platform](https://z.ai/model-api) and register or log in.
* Create an API Key in the [API Keys](https://z.ai/manage-apikey/apikey-list) management page.
* Copy your API Key for use.
* In the n8n UI, go to **Credentials → + New**.
* Select **OpenAI** (the OpenAI node is used to call Z.AI's Inference API).

![Description](https://cdn.bigmodel.cn/markdown/1764840441012image.png?attname=image.png)

* Configure the credential:
  * **Name**: `ZAI Account`
  * **API KEY**: `YOUR_ZAI_API_KEY`
  * **Base URL**: `https://api.z.ai/api/paas/v4`
  * If you use the GLM Coding Plan, use `https://api.z.ai/api/coding/paas/v4` as the base URL.
* Save the credential; n8n automatically tests it to make sure it works.
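To check the credential values outside n8n, the same base URL and API key can be exercised with a plain OpenAI-style request. The Python sketch below only builds the request and leaves sending as a separate step. It assumes the service exposes the OpenAI-compatible `/chat/completions` path under the base URL and accepts a bearer token; `build_chat_request` and `send` are hypothetical helper names, not part of n8n or any Z.AI SDK.

```python
import json
import urllib.request

# Base URLs as given in the credential settings above.
GENERAL_BASE_URL = "https://api.z.ai/api/paas/v4"
CODING_BASE_URL = "https://api.z.ai/api/coding/paas/v4"


def build_chat_request(api_key, prompt, model="glm-4.7", coding_plan=False):
    """Build (but do not send) an OpenAI-style chat completion request."""
    # GLM Coding Plan subscribers use the dedicated coding endpoint.
    base_url = CODING_BASE_URL if coding_plan else GENERAL_BASE_URL
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",  # assumed OpenAI-compatible path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send(build_chat_request("YOUR_ZAI_API_KEY", "Say hello"))` would perform the actual call. An HTTP 401 from that call would point at the key rather than at the n8n setup.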
![Description](https://cdn.bigmodel.cn/markdown/1764840826825image.png?attname=image.png)

* In the n8n UI, go to **Workflows → + New**.
* Add an OpenAI chat node to your workflow and select `ZAI Account` as the credential.
* Configure the OpenAI chat node:
  * **Model**: `glm-4.7`

![Description](https://cdn.bigmodel.cn/markdown/1764841114665image.png?attname=image.png)

* Click **Execute Workflow** to run the workflow.
* The response will be available in the **Output** tab.

![Description](https://cdn.bigmodel.cn/markdown/1764841208844image.png?attname=image.png)

## Troubleshooting

**Issue:** Receiving an invalid API key error

**Solutions:**

1. Confirm the API key was copied correctly
2. Confirm the account behind the API key has sufficient balance

**Issue:** Connection timeout

**Solutions:**

1. Check network connection
2. Confirm firewall settings
3. Verify the base URL is correct
4. Increase timeout settings

# Open Code

Source: https://docs.z.ai/scenario-example/develop-tools/opencode

> Complete guide to integrating Z.AI GLM models with OpenCode CLI

OpenCode is a powerful AI coding agent that can be configured to use Z.AI's GLM models.

For users who used the service before 2025-12-22: the default model has been upgraded to GLM-4.7. Please update your config accordingly.

## Step 1: Installing OpenCode

The easiest way to install OpenCode is through the install script.

```bash
curl -fsSL https://opencode.ai/install | bash
```

You can also install it with npm:

```bash
npm install -g opencode-ai
```

## Step 2: Getting Started

1. Head over to the [Z.AI API Console](https://z.ai/manage-apikey/apikey-list) to get your API key.
2. Run `opencode auth login` and select **Z.AI**.

   ```bash
   $ opencode auth login

   ┌ Add credential
   │
   ◆ Select provider
   │ ● Z.AI
   │ ...
   └
   ```

   If you are subscribed to the **GLM Coding Plan**, select **Z.AI Coding Plan**.
```bash theme={null} $ opencode auth login ┌ Add credential │ ◆ Select provider │ ● Z.AI Coding Plan │ ... └ ``` 3. Enter your Z.AI API key. ```bash theme={null} $ opencode auth login ┌ Add credential │ ◇ Select provider │ Z.AI Coding Plan │ ◇ Enter your API key │ _ └ ``` 4. Run `opencode` to launch OpenCode. ```bash theme={null} $ opencode ``` Use the `/models` command to select a model like *GLM-4.7*. ``` /models ``` 5. Vision Search Reader MCP Refer to the [Vision MCP Server](/devpack/mcp/vision-mcp-server) , [Search MCP Server](/devpack/mcp/search-mcp-server) and [Web Reader MCP Server](/devpack/mcp/reader-mcp-server) documentation; once configured, you can use them in OpenCode. ## Share OpenCode’s share feature allows you to create public links to your OpenCode conversations, so you can collaborate with teammates or get help from others. #### How it works When you share a conversation, OpenCode: 1. Creates a unique public URL for your session 2. Syncs your conversation history to our servers 3. Makes the conversation accessible via the shareable link — `opencode.ai/s/` #### Share By default, conversations are not shared automatically. You can manually share them using the `/share` command: ``` /share ``` #### Un-share To stop sharing a conversation and remove it from public access: ``` /unshare ``` This will remove the share link and delete the data related to the conversation. Learn more about [sharing conversations](https://opencode.ai/docs/share/). ## IDE Extensions OpenCode integrates with VS Code, Cursor, or any IDE that supports a terminal. #### Installation To install OpenCode on VS Code and popular forks like Cursor, Windsurf, VSCodium: 1. Open VS Code 2. Open the integrated terminal 3. Run `opencode` - the extension installs automatically #### Usage * **Quick Launch**: Use `Cmd+Esc` (Mac) or `Ctrl+Esc` (Windows/Linux) to open OpenCode in a split terminal view, or focus an existing terminal session if one is already running. 
* **New Session**: Use `Cmd+Shift+Esc` (Mac) or `Ctrl+Shift+Esc` (Windows/Linux) to start a new OpenCode terminal session, even if one is already open. You can also click the OpenCode button in the UI. * **Context Awareness**: Automatically share your current selection or tab with OpenCode. * **File Reference Shortcuts**: Use `Cmd+Option+K` (Mac) or `Alt+Ctrl+K` (Linux/Windows) to insert file references. For example, `@File#L37-42`. Learn more about [IDE integrations](https://opencode.ai/docs/ide/). ## GitHub Workflow OpenCode integrates with your GitHub workflow. Mention `/opencode` or `/oc` in your comment, and OpenCode will execute tasks within your GitHub Actions runner. #### Features * **Triage issues**: Ask OpenCode to look into an issue and explain it to you. * **Fix and implement**: Ask OpenCode to fix an issue or implement a feature. And it will work in a new branch and submits a PR with all the changes. * **Secure**: OpenCode runs inside your GitHub's runners. #### Installation Run the following command in a project that is in a GitHub repo: ```bash theme={null} opencode github install ``` This will walk you through installing the GitHub app, creating the workflow, and setting up secrets. #### Examples Here are some examples of how you can use OpenCode in GitHub. * **Explain an issue** Add this comment in a GitHub issue. ``` /opencode explain this issue ``` OpenCode will read the entire thread, including all comments, and reply with a clear explanation. * **Fix an issue** In a GitHub issue, say: ``` /opencode fix this ``` And OpenCode will create a new branch, implement the changes, and open a PR with the changes. * **Review PRs and make changes** Leave the following comment on a GitHub PR. ``` Delete the attachment from S3 when the note is removed /oc ``` OpenCode will implement the requested change and commit it to the same PR. Learn more about [GitHub workflow](https://opencode.ai/docs/github/). 
## Resources

* **Documentation**: [opencode.ai/docs](https://opencode.ai/docs)
* **GitHub Issues**: [github.com/sst/opencode/issues](https://github.com/sst/opencode/issues)
* **Discord**: [opencode.ai/discord](https://opencode.ai/discord)

# Other Tools

Source: https://docs.z.ai/scenario-example/develop-tools/others

Methods for using the GLM Coding Plan in other tools

You can integrate the GLM Coding Plan with coding tools covered by the plan that follow the OpenAI-compatible API protocol. Replace the API endpoint and provide the corresponding API key to connect to GLM models.

When using the GLM Coding Plan, configure the dedicated Coding API `https://api.z.ai/api/coding/paas/v4` instead of the General API `https://api.z.ai/api/paas/v4`.

## Step 1: Supported Tools

Click on the tool documentation below that you wish to use, and follow the corresponding integration guide to set it up.

* The Claude Code IDE plugin supports VSCode and JetBrains.
* An AI-first code editor that supports custom model configurations.
* An AI programming extension for VS Code that supports code generation and file operations.
* An AI editor capable of independently completing various development tasks.
* Enterprise-grade AI coding agent that runs in the terminal to handle end-to-end workflows.
* A powerful VS Code extension for code generation and project management.
* A smart VS Code extension for code writing and refactoring.
* A terminal-based AI programming tool that supports both CLI and TUI interfaces.
* AI Agent tool, supporting local execution and automated engineering tasks.
* A desktop AI agent built on a multi-agent architecture, capable of automating browser, terminal, and MCP-powered workflows.

## Step 2: Install and Config

> Core Steps:
>
> 1. Select an OpenAI-compatible provider.
> 2. **`Add/Replace the OpenAI Base URL with https://api.z.ai/api/coding/paas/v4.`**
> 3. **`Enter your Z.AI API Key and select GLM-5.1, GLM-5, GLM-5-Turbo, GLM-4.7 or GLM-4.5-air.`**

Using **Cursor** as an example (Note: Custom configuration is only supported in Cursor Pro and higher versions), the following steps demonstrate how to integrate the `GLM-4.7` model via the OpenAI protocol. Other tools supporting the OpenAI protocol can adopt the same configuration approach.

### 1. Install Cursor

Download and install Cursor from the official website.

### 2. Create a New Provider/Model

In Cursor, navigate to the "**Models**" section and click "**Add Custom Model**".

![Description](https://cdn.bigmodel.cn/markdown/176032216013820251013-100735.jpeg?attname=20251013-100735.jpeg)

* Select the **OpenAI Protocol**.
* Configure the **OpenAI API Key** (obtained from Z.AI).
* In **Override OpenAI Base URL**, replace the default URL with `https://api.z.ai/api/coding/paas/v4`.
* Enter the model you wish to use, such as `GLM-5.1`, `GLM-5` or `GLM-4.5-air`.
* Note: In Cursor, the model name must be entered in uppercase, such as `GLM-4.7`.

![Description](https://cdn.bigmodel.cn/markdown/176032218295020251013-100740.jpeg?attname=20251013-100740.jpeg)

### 3. Save and Switch Models

After configuration, save your settings and select the newly created **GLM-4.7 Provider** on the homepage.

### 4. Get Started

With this setup, you can begin using the GLM model for code generation, debugging, task analysis, and more.

![Description](https://cdn.bigmodel.cn/markdown/176032221518820251013-100745.jpeg?attname=20251013-100745.jpeg)

## Step 3: Replacing the API URL

1. **Locate the API configuration section in your tool**: For example, in **Goose**, this is typically where you set the API address in the configuration file; in **VS Code** plugins or **IntelliJ IDEA** plugins, configuration is usually done through the plugin's settings interface.
2. **Replace the OpenAI Base URL**: Replace the default OpenAI API URL with `https://api.z.ai/api/coding/paas/v4`.
3. **Enter API Key and Select Model**:
   * Enter your **Z.ai API Key**.
   * Select `GLM-5.1` (standard, complex tasks) or `GLM-4.5-air` (lightweight, faster response) based on your requirements.

# Qoder

Source: https://docs.z.ai/scenario-example/develop-tools/qoder

Methods for using the GLM Coding Plan in Qoder

Qoder ( /ˈkoʊdər/ ) is an agentic coding platform designed for real software development. It integrates enhanced context engineering with intelligent agents to gain a comprehensive understanding of your codebase and systematically tackle software development tasks.

Qoder is available in three forms: the **Qoder IDE**, the **Qoder JetBrains plugin**, and the **Qoder CLI**, all of which support the GLM Coding Plan. This guide uses the **Qoder IDE** as an example. For setup in the JetBrains plugin or CLI, see the official docs linked under "Resources".

Custom models in Qoder support the **Coding Plan** only; the general API billing mode is not supported. This feature is available for **Pro Trial**, **Pro**, **Pro+**, and **Ultra** users; Free and Teams plans are not supported.

## Step 1: Install Qoder

* Go to the [Qoder website](https://qoder.com/download?utm_source=content\&utm_medium=doc\&utm_campaign=glm) and download the form that fits your workflow: **Qoder IDE**, **Qoder JetBrains plugin**, or **Qoder CLI**.
* In the top-right corner of the Qoder IDE, click the user icon (or use the keyboard shortcut `⌘ ⇧ ,` on macOS / `Ctrl Shift ,` on Windows) and choose **Sign In**.

![Qoder sign-in entry point](https://img.alicdn.com/imgextra/i3/O1CN01D9xH0z27ScrUgebgn_!!6000000007796-2-tps-2820-1836.png)

* In the page that opens, click **Sign Up** at the bottom and complete registration, or sign up directly with your Google or GitHub account.
* Return to the Qoder IDE. You can now use all features.

## Step 2: Configure GLM in Qoder

Use your **Z.AI API Key** to add the GLM Coding Plan model in Qoder IDE.
* Click the **Qoder Settings** icon at the top-right of the Qoder window, then choose **Models** from the sidebar. ![Qoder IDE Settings → Models](https://img.alicdn.com/imgextra/i4/O1CN01Ouhmfz28cX2nGchJb_!!6000000007953-0-tps-2820-1836.jpg) * Click **+ Add** to open the model creation dialog. * **Provider**: Select **Z.ai - International** or **Z.ai - China** * **Type**: Select **Coding plan** * **Model**: Choose the model you want to use from the dropdown * **API Key**: Enter your **Z.AI API Key** Click **Get API Key** to open the Z.AI API Key management page in your browser. ![Add Model dialog](https://img.alicdn.com/imgextra/i4/O1CN01o6WUmC1yQUTPDWH7P_!!6000000006573-0-tps-1130-916.jpg) * Click **Add**. Qoder will automatically verify the API key and connection. * On success, the GLM model appears in your custom models list and is ready to use in chat. * On failure, Qoder shows the error returned by Z.AI for troubleshooting. ## Step 3: Get Started with Qoder + GLM Once configured, switch to your newly added **GLM** model from the model selector at the bottom of the Chat panel — or in Quest mode — in Qoder IDE, and start chatting. You can then use GLM for feature implementation, code generation, refactoring, debugging, and multi-step agentic coding tasks. ![Using GLM-5 in Quest mode](https://img.alicdn.com/imgextra/i4/O1CN01pmhAbq1xQf0L5j2zh_!!6000000006438-0-tps-2042-1150.jpg) ## FAQ ### Connection Fails When Adding Model * Verify that your **Z.AI API Key** is correct and active. * Confirm that you have subscribed to the **GLM Coding Plan**. * Check network connectivity and try again. If errors persist, review the error message shown in the **Add Model** dialog. ### Coding Plan worked in Qoder before, but after a while I can no longer switch to a Coding Plan model and the configuration is not editable. Why? * **Cause**: New Qoder users get a free two-week **Qoder Pro Trial**. 
  After it expires, the account automatically reverts to the free version, which no longer supports the Coding Plan.
* **Solution**: Upgrade to **Qoder Pro Trial**, **Pro**, **Pro+**, or **Ultra**.

## Resources

* **Qoder Website**: [qoder.com](https://qoder.com/?utm_source=content\&utm_medium=doc\&utm_campaign=glm)
* **Using GLM in Qoder JetBrains plugin and Qoder CLI**: [Doc](https://help.aliyun.com/zh/model-studio/qoder-coding-plan)

# Roo Code

Source: https://docs.z.ai/scenario-example/develop-tools/roo

Methods for Using the GLM Coding Plan in Roo Code Plugin

Roo Code is an intelligent VS Code plugin that assists with project analysis, code generation, and refactoring, making the development process smoother and more efficient.

Roo Code becomes even more powerful with the [GLM Coding Plan](https://z.ai/subscribe?utm_source=zai\&utm_medium=link\&utm_term=devpack-integration\&utm_campaign=Platform_Ops&_channel_track_key=w3mNdY8g), giving you greater efficiency and stability in project management and code optimization.

When using the GLM Coding Plan, configure the dedicated Coding API [https://api.z.ai/api/coding/paas/v4](https://api.z.ai/api/coding/paas/v4) instead of the General API [https://api.z.ai/api/paas/v4](https://api.z.ai/api/paas/v4).

For users who used the service before 2025-12-22: the default model has been upgraded to GLM-4.7. Please update your config accordingly.

## Step 1: Installing the Roo Code Plugin

### 1. Open the Extensions Marketplace

a. Launch VS Code
b. Click the Extensions Marketplace icon on the left sidebar
c. Enter `Roo Code` in the search bar
d. Locate the `Roo Code` plugin

![Description](https://cdn.bigmodel.cn/markdown/1753687765281r1.png?attname=r1.png)

### 2. Install the Plugin

a. Click the `Install` button to begin installation
b. After installation, select "Trust the Author"

## Step 2: Configuring API Settings

### Configuration Details

Fill in the following information as specified:

> If your Roo Code version is outdated and lacks the `International Coding Plan` option, please update the plugin to the latest version first.

* **API Provider**: Select `Z AI`
* **Z AI Entrypoint**: Select `International Coding Plan (https://api.z.ai/api/coding/paas/v4/)`
* **Z AI API Key**: Input your Z.AI API Key
* **Model**: Select `glm-4.7` or another model from the list

![Description](https://cdn.bigmodel.cn/markdown/1760942980972image.png?attname=image.png)

## Step 3: Permission Setup and Usage

### 1. Configure Permissions

Select the permissions you wish to enable based on your needs:

* File read/write operations
* Auto-approve execution
* Project access permissions

![Description](https://cdn.bigmodel.cn/markdown/1753687800340r4.png?attname=r4.png)

### 2. Start Coding

Enter your requirements in the input box, and Roo Code will assist with:

* Summarizing the current project structure
* Analyzing key modules and functionalities
* Code refactoring and optimization
* Generating documentation and comments
* Issue diagnosis and repair suggestions

## Step 4: Vision Search Reader MCP

Refer to the [Vision MCP Server](/devpack/mcp/vision-mcp-server), [Search MCP Server](/devpack/mcp/search-mcp-server) and [Web Reader MCP Server](/devpack/mcp/reader-mcp-server) documentation; once configured, you can use them in Roo Code.

### Demo