All API endpoints require Bearer Token authentication.
Get your API Key: Visit the API Key Management Page to get your API Key.
Add it to the request header:
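A minimal sketch of building that header, assuming the standard Bearer scheme (`Authorization: Bearer <key>`). The helper name `build_auth_headers` and the `Content-Type` header are illustrative and do not come from this page.

```python
def build_auth_headers(api_key: str) -> dict[str, str]:
    """Return HTTP headers that carry the API key as a Bearer token."""
    key = api_key.strip()
    if not key:
        raise ValueError("API key must not be empty")
    return {
        # Standard Bearer token form: the word "Bearer", a space, then the key.
        "Authorization": f"Bearer {key}",
        # Assumption: request bodies are JSON, as in the examples on this page.
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed as the headers argument of whichever HTTP client you use.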
Video content description.
Required for text-to-video; optional for image-to-video or video-reference-to-video.
For better generation results, clearly specify the subject, action, camera movement, and style.
The prompt is limited to 4000 characters; keeping it to about 500 characters is recommended.
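The limits stated above can be checked on the client before submitting a request. This is a rough sketch only: it counts Python string characters (code points), which may differ from how the service counts, and the helper name `check_prompt` is hypothetical.

```python
MAX_PROMPT_CHARS = 4000          # hard limit stated in this documentation
RECOMMENDED_PROMPT_CHARS = 500   # recommended length stated in this documentation


def check_prompt(prompt: str) -> list[str]:
    """Raise if the prompt exceeds the hard limit; return advisory warnings otherwise."""
    length = len(prompt)
    if length > MAX_PROMPT_CHARS:
        raise ValueError(
            f"prompt has {length} characters; the limit is {MAX_PROMPT_CHARS}"
        )
    warnings: list[str] = []
    if length > RECOMMENDED_PROMPT_CHARS:
        warnings.append(
            f"prompt has {length} characters; about "
            f"{RECOMMENDED_PROMPT_CHARS} are recommended"
        )
    return warnings
```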
Whether to return the last frame image.
When set to true, the task result additionally returns the URL of the video's last frame image, which can be used for continuous video generation.
Default: false
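One way continuous generation could work is to feed the returned last-frame URL into the next request as a reference image. The sketch below assumes you have already read that URL from a finished task's result (the result field name is not given here, so the URL is taken as a plain argument). It also assumes that a single entry in `image_urls` serves as the starting image of the next clip. The helper name `build_continuation_request` is hypothetical.

```python
def build_continuation_request(
    last_frame_url: str,
    prompt: str,
    model: str = "doubao-seedance-2.0",
    duration: int = 5,
) -> dict:
    """Build a request body for the next clip, starting from the previous last frame."""
    if not last_frame_url:
        raise ValueError("last_frame_url is required to continue a video")
    return {
        "model": model,
        "prompt": prompt,
        # The previous clip's last frame becomes the reference image.
        "image_urls": [last_frame_url],
        "duration": duration,
    }
```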
{ "model": "doubao-seedance-2.0", "prompt": "The kitten stands up and walks toward the camera", "image_urls": ["https://example.com/cat.jpg"], "duration": 5}
{ "model": "doubao-seedance-2.0", "prompt": "A scene of a person speaking", "video_urls": ["https://example.com/reference.mp4"], "audio_urls": ["https://example.com/speech.wav"], "size": "16:9", "duration": 11}
{ "model": "doubao-seedance-2.0", "prompt": "A man stops a woman and says: \"Remember, you must never point your finger at the moon.\"", "generate_audio": true}
Case 9: Reference Images + Reference Video + Reference Audio (Multi-Modal Video)
Combine reference images, reference video, and reference audio to generate an immersive first-person perspective advertisement video. Ideal for product promotions, brand ads, and other scenarios requiring multi-source material fusion.
{ "model": "doubao-seedance-2.0", "prompt": "Use video 1's first-person perspective throughout, and use audio 1 as the background music throughout. First-person POV fruit tea advertisement for seedance brand 'Peace Apple' apple fruit tea limited edition. First frame is image 1: your hand picks a dewy Aksu red apple with a crisp apple collision sound. 2-4s: quick cut, your hand drops apple chunks into a shaker cup, adds ice and tea base, shakes vigorously, ice collision and shaking sounds sync with upbeat drum beats, background voice: 'Fresh-cut, fresh-shaken'. 4-6s: first-person close-up of the finished product, layered fruit tea poured into a clear cup, your hand gently squeezes cream cap spreading on top, sticks a pink label on the cup, camera zooms in on the layered texture of cream cap and fruit tea. 6-8s: first-person handheld cup raise, you lift the fruit tea from image 2 toward the camera (simulating handing it to the viewer), cup label clearly visible, background voice 'Take a sip of freshness', final frame freezes on image 2. Background voice consistently uses a female tone.", "image_urls": [ "https://example.com/tea_pic1.jpg", "https://example.com/tea_pic2.jpg" ], "video_urls": ["https://example.com/tea_video1.mp4"], "audio_urls": ["https://example.com/tea_audio1.mp3"], "generate_audio": true, "size": "16:9", "duration": 11}
Approved virtual avatar assets can be passed directly as reference images without re-uploading or re-reviewing.
{ "model": "doubao-seedance-2.0", "prompt": "The character walks naturally on a city street under bright sunshine", "image_urls": ["asset://asset_a"], "duration": 5, "resolution": "720p"}
{ "model": "doubao-seedance-2.0-fast", "prompt": "The character strolls in a park with a gentle breeze", "image_urls": ["asset://asset_a"], "duration": 5, "resolution": "720p"}
Case 13: Asset URL Image + Reference Video (Motion Transfer)
Combine an approved portrait asset with a reference video to drive the character to perform specified movements.
{ "model": "doubao-seedance-2.0", "prompt": "The character dances to the rhythm of the reference video with smooth and natural movements", "image_urls": ["https://example.com/dance_reference.jpg", "asset://asset_a"], "video_urls": ["https://example.com/dance_reference.mp4", "asset://asset_a"], "duration": 8, "resolution": "720p"}
Query Task Results
Video generation is an async task that returns a task_id upon submission. Use the Get Task Status endpoint to query generation progress and results.
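Because the task is asynchronous, a client typically polls until the task reaches a final state. The sketch below keeps the HTTP call outside the loop: `fetch_status` is a caller-supplied function that queries the Get Task Status endpoint and returns the parsed response. The `status` field name and the `"completed"` / `"failed"` values are assumptions for illustration; check the Get Task Status reference for the real ones.

```python
import time
from typing import Callable


def wait_for_task(
    fetch_status: Callable[[str], dict],
    task_id: str,
    *,
    interval: float = 5.0,
    timeout: float = 600.0,
    done_states: tuple[str, ...] = ("completed", "failed"),  # hypothetical values
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_status(task_id) until a final state is reported or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_status(task_id)
        if result.get("status") in done_states:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout} seconds")
        sleep(interval)
```

Passing the fetch function in as an argument keeps the loop independent of any particular HTTP client and makes it easy to test with a stub.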