CogVideoX-2

Price	Input Modality	Output Modality
$0.1 / video	Image/Text	Video
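
For batch jobs, the per-video price above translates directly into a budget. The sketch below assumes a flat $0.1 per generated video with no volume discounts, taxes, or charges for failed generations; none of those details are stated on this page. `estimate_cost` is a hypothetical helper name, and it works in integer cents to avoid floating-point arithmetic in the shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate batch cost at a flat 10 cents per video.
# Assumption: the listed price applies uniformly to every video.
PRICE_CENTS=10

estimate_cost() {
  local n="$1" cents
  # Accept only non-negative whole numbers.
  case "$n" in
    ''|*[!0-9]*) echo "video count must be a non-negative integer" >&2; return 1 ;;
  esac
  cents=$((n * PRICE_CENTS))
  printf '$%d.%02d\n' $((cents / 100)) $((cents % 100))
}

estimate_cost 25   # 25 videos at 10 cents each
```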

Recommended Use Cases

Short Video Creative Content Generation: Automatically expands input image-text scripts or single-frame images into coherent short videos, accurately following style instructions. Especially suitable for mass production of micro-drama content.
Anime Production: Turns static character images and storyboards into smooth animation, rendering both large-scale character movement and subtle facial expressions. Outputs short videos in Chinese, American, and Japanese animation styles, meeting the high-volume production needs of animation studios and fan creators.
Dynamic E-Commerce Product Advertising: Generates multi-angle video demonstrations of products based on product images and selling point descriptions. Highlights product details with stable camera movement and lighting effects. Supports quick adaptation to various video ad formats on different platforms.

Resources

API Documentation: Learn how to call the API.

Detailed Description

Supports Large-Scale Subject Movement

CogVideoX-2 improves frame stability and motion continuity, with noticeably more nuanced character performance and richer camera movement. Characters and props are no longer limited to slight movement within the original frame; they can perform large-scale actions as directed by the prompt.

Industry-Leading Instruction Compliance

CogVideoX-2 maintains excellent instruction-following ability, understanding and faithfully executing complex prompts to better serve creators’ storytelling needs. At the same time, it maintains consistency in character representation, style, and atmosphere within the video, ensuring that newly generated content aligns closely with the original art style and enhances narrative completeness.

Mastery of Diverse Artistic Styles

CogVideoX-2 handles a wide range of artistic styles, including realistic, 3D animation, and 2D animation, among others.

Text-to-Video

Prompt	Video
Peter Rabbit (main subject) drives a small car (subject action), wandering along the road (environment description), with a joyful and delighted expression on his face (atmosphere setting).
A journey across the desert, a caravan of camels walks over golden sand dunes, the setting sun paints the sky red, creating a magnificent and tranquil scene.
Close-up shot (camera description), bathed in the soft light of dusk (lighting), a parrot stands on the balcony railing, with purple feathers and a pink beak (subject description), set against a backdrop of city skyscrapers (environment description).
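
The annotated prompts above combine labeled components (subject, action, environment, atmosphere, camera, lighting) into a single sentence. One way to keep prompts consistent across a batch is to assemble them from those parts. The following is a minimal sketch: `build_prompt` is a hypothetical helper, and joining the parts with commas is a convention chosen for this example, not a requirement of the model.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join non-empty prompt components with ", " and
# end the result with a period. Empty components are skipped.
build_prompt() {
  local out="" part
  for part in "$@"; do
    if [ -n "$part" ]; then
      if [ -z "$out" ]; then
        out="$part"
      else
        out="$out, $part"
      fi
    fi
  done
  printf '%s.\n' "$out"
}

# Components taken from the first example prompt in the table above.
build_prompt \
  "Peter Rabbit drives a small car" \
  "wandering along the road" \
  "with a joyful and delighted expression on his face"
```

Keeping each component in its own argument makes it easy to swap one part, such as the lighting or the camera description, while leaving the rest of the prompt unchanged.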

Image-to-Video

CogVideoX can convert user-provided static images into dynamic videos. For best results, use PNG or JPEG images no larger than 5 MB. Prompts should follow the structure: “subject (background) + motion description”.
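
The format and size recommendations above can be checked locally before an image is submitted. This is a sketch under stated assumptions: `check_image` is a hypothetical helper, the format is inferred from the file extension only (the file contents are not inspected), and "5MB" is read as 5 × 1024 × 1024 bytes, which the page does not specify.

```shell
#!/usr/bin/env bash
# Assumption: "5MB" is interpreted as 5 MiB.
MAX_BYTES=$((5 * 1024 * 1024))

# Hypothetical helper: returns 0 if the file has a PNG/JPEG extension
# and is no larger than MAX_BYTES, otherwise prints a reason and returns 1.
check_image() {
  local path="$1" ext size
  if [ ! -f "$path" ]; then
    echo "not a file: $path" >&2
    return 1
  fi
  # Lowercase the extension so .PNG and .png are treated alike.
  ext=$(printf '%s' "${path##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    png|jpg|jpeg) ;;
    *) echo "unsupported format: $ext" >&2; return 1 ;;
  esac
  # Arithmetic expansion strips any padding that wc may emit.
  size=$(( $(wc -c < "$path") ))
  if [ "$size" -gt "$MAX_BYTES" ]; then
    echo "file too large: $size bytes" >&2
    return 1
  fi
  return 0
}

# Usage: pass an image path as the first script argument.
if [ "$#" -gt 0 ]; then
  check_image "$1" && echo "ok: $1"
fi
```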

Prompt	Video
The little girl in the scene smiled happily.
Make the entire scene move.
Under a macro lens, a slice of pork curls up into a massive wave. A tiny figure bravely surfs on this “wave,” with the surfboard kicking up delicate splashes.

Example

curl --request POST \
  --url https://api.z.ai/api/paas/v4/videos/generations \
  --header 'Accept-Language: en-US,en' \
  --header 'Authorization: Bearer {your apikey}' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "cogvideox-2",
  "quality": "quality",
  "with_audio": true,
  "size": "1920x1080",
  "fps": "30",
  "prompt": "Peter Rabbit is driving a small car, wandering along the road, with a joyful and happy expression on his face."
}'
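
For repeated calls, the request above can be wrapped in a small function so that only the prompt changes between invocations. The sketch below reuses the endpoint, headers, and payload fields exactly as they appear in the example. The names `generate_video`, `ZAI_API_KEY`, and `DRY_RUN` are assumptions introduced for illustration, and the escaping covers only backslashes and double quotes, so prompts containing newlines or control characters would need a real JSON encoder.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the request shown above.
# ZAI_API_KEY and DRY_RUN are illustrative variable names, not part of the API.
generate_video() {
  local prompt="$1" escaped payload
  # Minimal JSON string escaping: backslashes first, then double quotes.
  escaped=$(printf '%s' "$prompt" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  # Field values mirror the example request.
  payload=$(printf '{"model":"cogvideox-2","quality":"quality","with_audio":true,"size":"1920x1080","fps":"30","prompt":"%s"}' "$escaped")

  # With DRY_RUN=1, print the payload instead of sending it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$payload"
    return 0
  fi

  curl --silent --show-error --request POST \
    --url https://api.z.ai/api/paas/v4/videos/generations \
    --header 'Accept-Language: en-US,en' \
    --header "Authorization: Bearer ${ZAI_API_KEY:?set ZAI_API_KEY first}" \
    --header 'Content-Type: application/json' \
    --data "$payload"
}

# Inspect the payload without making a network call.
DRY_RUN=1 generate_video "Peter Rabbit is driving a small car, wandering along the road."
```

Reading the key from an environment variable keeps it out of shell history and scripts checked into version control.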
