| Model Version | Capabilities | Duration | Resolution | Price |
| --- | --- | --- | --- | --- |
| vidu2-image | Image-to-Video Generation | 4s | 720p | $0.2 / video |
| vidu2-start-end | Start and End Frame | 4s | 720p | $0.2 / video |
| vidu2-reference | Reference-based Video Generation | 4s | 720p | $0.4 / video |

Capability Description

  • Image-to-Video Generation: Generate a video from a starting frame (or both a starting and an ending frame) together with a text description.
  • Start and End Frame: Accepts two input images: the first uploaded image is treated as the starting frame and the second as the ending frame; the model generates the video between them.
  • Reference-based Video Generation: Generate a video from reference images and a text prompt; currently supports a general style and an anime style optimized for animation.
The URL of a generated video is valid for one day; download and save the file promptly if you need to keep it.
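Because the link expires after a day, it is worth saving the file as soon as the request completes. A minimal sketch (the URL below is a placeholder standing in for the value the API returns):

```shell
# Placeholder for the video URL returned in the API response
VIDEO_URL="https://example.com/generated/video.mp4"

# Name the local copy after the file in the URL
OUTFILE="$(basename "$VIDEO_URL")"

# Download before the one-day link expires
# (uncomment once VIDEO_URL holds a real link)
# curl -L "$VIDEO_URL" -o "$OUTFILE"

echo "$OUTFILE"
```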
Application Scenarios

General Entertainment Content Generation
  - Input a single frame or IP elements to quickly generate short videos with coherent storylines and interactive special effects
  - Supports diverse visual styles, from anime-inspired to realistic
  - Tailored for mass production of UGC creative content on short-video platforms

Anime Short Drama Production
  - Input static character images or keyframes to generate smooth animated sequences and micro-dramas
  - Accurately reproduces detailed character movements (e.g., facial expressions)
  - Supports mass production in various styles, such as Chinese and Japanese anime
  - Designed to meet animation studios' needs for IP-based content expansion

Advertising & E-commerce Marketing
  - Input real product images to intelligently generate dynamic advertising videos
  - Clearly showcases product features such as 3C details and beauty-product textures
  - Automatically adapts to various platform formats, such as vertical videos for TikTok and horizontal layouts for social feeds

Resources

API Documentation: Learn how to call the API.

Detailed Description

  1. Efficient Video Generation Speed
With an optimized model computing architecture, video rendering efficiency is significantly improved. Teams producing daily content can respond quickly to trending topics, and e-commerce sellers can mass-produce product display videos on demand, greatly reducing content delivery time and helping creators seize traffic windows.
  2. Cost-effective 720P Output
The cost of generating 720P-resolution videos has dropped to 40% of what it was in the Q1 version. Small and medium-sized brands can now produce videos in batches across multiple SKUs, while advertising teams can test creative concepts like “product close-ups + scenario storytelling” at lower cost, meeting full-platform marketing needs without breaking the content budget.
  3. Stable and Controllable Image-to-Video Generation
The model addresses the “texture color shift” issue—accurately restoring details like the silky glow of satin or the matte finish of leather in clothing videos. In e-commerce scenarios, product colors are displayed more realistically. Dynamic frame compensation is optimized, ensuring smooth, shake-free motion for rotating 3C products or hand demonstrations in beauty tutorials. Multiple visual styles are supported, enabling eye-catching content like “product close-up + stylized camera movement,” ideal for e-commerce main images and short-form promotional videos.
  4. Semantically Enhanced Keyframe Transition
The model strikes a balance between creativity and stability, delivering significantly improved performance and semantic understanding—making it the optimal solution for keyframe-based video generation. By accurately analyzing scene logic and action continuity, transitions between frames are smooth and natural, enhancing narrative coherence throughout the content.
Image-to-Video Generation
curl --location --request POST 'https://api.z.ai/api/paas/v4/videos/generations' \
--header 'Authorization: Bearer {your apikey}' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model":"vidu2-image",
    "image_url":"https://example.com/path/to/your/image.jpg",
    "prompt":"Peter Rabbit drives a small car along the road, his face filled with joy and happiness.",
    "duration":4,
    "size":"720x480",
    "movement_amplitude":"auto"
}'
Start and End Frame
curl --location --request POST 'https://api.z.ai/api/paas/v4/videos/generations' \
--header 'Authorization: Bearer {your apikey}' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model":"vidu2-start-end",
    "image_url":["https://example.com/path/to/your/image1.jpg","https://example.com/path/to/your/image2.jpg"],
    "prompt":"Peter Rabbit drives a small car along the road, his face filled with joy and happiness.",
    "duration":4,
    "size":"720x480",
    "movement_amplitude":"auto"
}'
Reference-based Video Generation
curl --location --request POST 'https://api.z.ai/api/paas/v4/videos/generations' \
--header 'Authorization: Bearer {your apikey}' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model":"vidu2-reference",
    "image_url":["https://example.com/path/to/your/image1.jpg","https://example.com/path/to/your/image2.jpg","https://example.com/path/to/your/image3.jpg"],
    "prompt":"Peter Rabbit drives a small car along the road, his face filled with joy and happiness.",
    "duration":4,
    "aspect_ratio":"16:9",
    "size":"720x480",
    "movement_amplitude":"auto",
    "with_audio":true
}'
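The requests above submit a generation job; consult the API Documentation for the exact response schema and how to retrieve the finished video. As a hedged sketch of pulling a task identifier out of a JSON response body, assuming the body contains an "id" field (the field name and response shape are assumptions, not confirmed by this page):

```shell
# Example response body; the "id" field is an assumed shape --
# check the API Documentation for the real schema
RESPONSE='{"id":"task-123","model":"vidu2-reference"}'

# Extract the id with sed so the snippet has no jq dependency
TASK_ID=$(printf '%s' "$RESPONSE" | sed -n 's/.*"id":"\([^"]*\)".*/\1/p')

echo "$TASK_ID"
```

Once the job finishes, the returned video URL can be downloaded with `curl -L -o` before the one-day link expires.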