CogView-4

Overview

CogView-4 is Z.AI’s first open-source text-to-image model. It improves on semantic understanding, image generation quality, and the rendering of both English and Chinese text in images. It accepts Chinese and English prompts of any length and can generate images at any resolution within a specified range.

Price

$0.01 / image

Input Modality

Text

Output Modality

Image

Usage

Food & Beverage Promotion

Generates visually appealing, detailed, and realistic food images based on dish names, ingredient characteristics, and style requirements, incorporating creative text elements. Suitable for menu design, food delivery platform displays, and offline posters.

E-commerce Product Images

Quickly generates high-resolution product display images based on product features and selling points, adding bilingual promotional text as needed. Fits the image requirements for different product pages and campaign visuals on e-commerce platforms.

Game Asset Creation

Produces high-resolution, detailed character illustrations and concept art based on game worldviews and character settings, meeting the needs of multi-resolution production.

Educational Material Illustrations

Analyzes teaching text content and automatically generates matching illustrations and scene images, adapted to the layout and resolution requirements of various educational materials, enhancing the visualization of knowledge.

Cultural & Tourism Promotion

Generates promotional images in different sizes based on cultural and tourism themes, skillfully combining text with region-specific visual elements to increase the appeal of cultural and tourism marketing.

Resources

API Documentation: Learn how to call the API.

Introducing CogView-4

Achieved SOTA Performance at Release

DPG-Bench (Dense Prompt Graph Benchmark) is a benchmark for evaluating text-to-image generation models, focusing on complex semantic alignment and instruction following. At the time of release, CogView-4 ranked first overall on DPG-Bench, achieving state-of-the-art (SOTA) performance among open-source text-to-image models.

Better Chinese Understanding and Generation

Technically, CogView-4 replaces the English-only T5 encoder with the bilingual GLM-4 encoder and is trained on bilingual image-text data, which enables it to handle both Chinese and English prompts. It is especially good at understanding and following Chinese prompts, which lowers the barrier to writing effective prompts. It is the first open-source text-to-image model capable of generating Chinese characters in images, making it particularly suitable for creative work in advertising, short videos, and other fields.

Any Resolution and Any-Length Prompts

CogView-4 implements a mixed training paradigm of text descriptions (captions) of any length and images of any resolution. The model supports input prompts of any length and can generate images at any resolution within the supported range. This not only provides users with more creative freedom but also improves training efficiency.
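Because the size is passed as a "WIDTHxHEIGHT" string (for example "1024x1024" in the Quick Start request), it can help to validate it on the client before calling the API. The following is a minimal sketch only: the helper names parse_size and check_size are hypothetical, and the bounds are supplied by the caller because this page does not state the supported range. Check the API documentation for the actual limits.

```python
import re


def parse_size(size: str) -> tuple[int, int]:
    """Parse a "WIDTHxHEIGHT" string such as "1024x1024" into two integers."""
    match = re.fullmatch(r"(\d+)x(\d+)", size)
    if match is None:
        raise ValueError(f"size must look like '1024x1024', got {size!r}")
    return int(match.group(1)), int(match.group(2))


def check_size(size: str, min_side: int, max_side: int, multiple_of: int = 1) -> bool:
    """Return True if both sides fall within caller-supplied bounds.

    min_side, max_side, and multiple_of are assumptions provided by the
    caller; they are not limits taken from this page.
    """
    width, height = parse_size(size)
    for side in (width, height):
        if not min_side <= side <= max_side:
            return False
        if side % multiple_of != 0:
            return False
    return True
```

For example, check_size("1024x1024", min_side=512, max_side=2048, multiple_of=16) uses placeholder bounds and returns True for them; substitute the limits published in the API documentation.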

Examples

Food & Beverage Promotion

Prompt

Close-up, commercial food photography, intense indoor lighting, extreme detail. A Christmas dinner table, a corner of the table where a long-haired orange tabby cat leans its head close to a plate, greedily sniffing the festive feast with an expression of pure delight. The table features roast chicken, plants, salad, champagne, and gold-rimmed porcelain tea sets. Afternoon sunlight bathes the cat’s profile in golden light, casting a soft glow over both the food and its fur. A Christmas tree adorns the background. The image emphasizes the texture of the food and the cat’s coat, featuring strong lighting and a warm, festive Christmas atmosphere.

Quick Start

cURL

curl --request POST \
--url https://api.z.ai/api/paas/v4/images/generations \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: application/json' \
--data '{
    "model": "cogView-4-250304",
    "prompt": "A cute little kitten sitting on a sunny windowsill, with the background of blue sky and white clouds.",
    "size": "1024x1024"
}'

Python

Install SDK

# Install latest version
pip install zai-sdk

# Or specify version
pip install zai-sdk==0.1.0

Verify Installation

import zai
print(zai.__version__)

Call Example

from zai import ZaiClient

client = ZaiClient(api_key="your-api-key")
response = client.images.generations(
    model="cogView-4-250304",
    prompt="A cute little kitten sitting on a sunny windowsill, with the background of blue sky and white clouds.",
)
print(response.data[0].url)

Java

Install SDK

Maven

<dependency>
    <groupId>ai.z.openapi</groupId>
    <artifactId>zai-sdk</artifactId>
    <version>0.3.0</version>
</dependency>

Gradle (Groovy)

implementation 'ai.z.openapi:zai-sdk:0.3.0'

Call Example

import ai.z.openapi.ZaiClient;
import ai.z.openapi.core.Constants;
import ai.z.openapi.service.image.CreateImageRequest;
import ai.z.openapi.service.image.ImageResponse;

public class CogView4Example {
    public static void main(String[] args) {
        ZaiClient client = ZaiClient.builder().ofZAI().apiKey("YOUR_API_KEY").build();
        // Create image generation request
        CreateImageRequest request = CreateImageRequest.builder()
            .model(Constants.ModelCogView4250304)
            .prompt("A cute little kitten sitting on a sunny windowsill, with the background of blue sky and white clouds.")
            .size("1024x1024")
            .build();
        ImageResponse response = client.images().createImage(request);
        System.out.println(response.getData());
    }
}

Please note that CogView-4 returns an image URL as its output. To obtain the image itself, download it from this URL.
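One way to handle the download step is with Python's standard library alone. This is a minimal sketch, not part of the SDK: the function name download_image, the destination path, and the 30-second timeout are all assumptions chosen for illustration.

```python
import shutil
import urllib.request
from pathlib import Path


def download_image(url: str, dest: str, timeout: float = 30.0) -> Path:
    """Stream the bytes at `url` into the file `dest` and return its path.

    Hypothetical helper: the name and the default timeout are illustrative.
    """
    target = Path(dest)
    # Copy in chunks so the whole image is never held in memory at once.
    with urllib.request.urlopen(url, timeout=timeout) as response, target.open("wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

With the Python call example above, this could be used as download_image(response.data[0].url, "kitten.png"). The file name and extension here are assumptions; this page does not state the format of the returned image.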
