All API endpoints require Bearer Token authentication.
Get your API Key: Visit the API Key Management Page to get your API Key.
Add it to the request header:
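The header value itself is not shown in this section. A minimal sketch in Python, assuming the conventional `Authorization: Bearer <key>` form of Bearer Token authentication; the helper name `build_auth_headers` and the `Content-Type` entry are illustrative assumptions, not part of the documented API:

```python
def build_auth_headers(api_key: str) -> dict:
    """Return request headers that carry the API key as a Bearer token.

    Assumption: the service accepts the standard 'Authorization: Bearer <key>'
    header; the JSON content type is also assumed, since requests are JSON.
    """
    key = (api_key or "").strip()
    if not key:
        raise ValueError("API key is empty")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }


# Usage: pass the result as the headers of any HTTP client call.
headers = build_auth_headers("YOUR_API_KEY")
```

Keeping the key out of source code (for example, reading it from an environment variable) is usually the safer choice.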
SkyReels V4 auto-routes to the correct mode based on request fields — no mode field needed:
Mode | Trigger | Capability
T2V (Text-to-Video) | Only prompt + general fields | Pure text-driven generation
I2V (Image-to-Video) | Any of first_frame_image / end_frame_image / mid_frame_images | First/end/key frame control
Omni (Multimodal Reference) | Any of ref_images / ref_videos | Subject reference, grid collage, motion reference, video extension, audio sync
Strict mutual exclusion: I2V fields (first_frame_image / end_frame_image / mid_frame_images) and Omni fields (ref_images / ref_videos) cannot be used together; a request that mixes them returns 422.
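The routing and exclusion rules can be mirrored client-side to catch a 422 before sending. The following is a sketch only: `detect_mode` is a hypothetical helper, and it assumes that a field set to an empty value counts as absent, which the server may treat differently.

```python
I2V_FIELDS = ("first_frame_image", "end_frame_image", "mid_frame_images")
OMNI_FIELDS = ("ref_images", "ref_videos")


def detect_mode(payload: dict) -> str:
    """Guess which mode a request body would be routed to.

    Assumption: empty strings, empty lists, and None are treated as
    'field not provided'.
    """
    has_i2v = any(payload.get(field) for field in I2V_FIELDS)
    has_omni = any(payload.get(field) for field in OMNI_FIELDS)
    if has_i2v and has_omni:
        # The API rejects this combination with HTTP 422.
        raise ValueError("I2V fields and Omni fields cannot be used together")
    if has_omni:
        return "Omni"
    if has_i2v:
        return "I2V"
    return "T2V"
```

For example, a body with only `model`, `prompt`, and `duration` yields `"T2V"`, while adding `first_frame_image` yields `"I2V"`.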
@tag mechanism: When using mid_frame_images / ref_images / ref_videos, each element must declare a tag starting with @ (e.g., @image1, @Actor-1, @video1), and the tag must appear in the prompt. Think of the prompt as the “script” and each tag as a “character pointer” to a specific asset (image or video). For example, a prompt like "@Actor-1 walks into the scene of @video1" instructs the system to inject the reference image subject tied to @Actor-1 and the motion reference tied to @video1 into the generation process.
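The tag rule lends itself to a pre-flight check. The sketch below is one possible version, not the service's own validation. It assumes every element of mid_frame_images / ref_images / ref_videos is an object with a `tag` key, and that a tag ends at the first character that is not a letter, digit, underscore, or hyphen. `find_tag_problems` is a hypothetical helper name.

```python
import re

TAGGED_FIELDS = ("mid_frame_images", "ref_images", "ref_videos")


def find_tag_problems(payload: dict) -> list:
    """Return human-readable problems with tags; an empty list means none found."""
    prompt = payload.get("prompt") or ""
    problems = []
    for field in TAGGED_FIELDS:
        for index, item in enumerate(payload.get(field) or []):
            tag = item.get("tag") or ""
            if len(tag) < 2 or not tag.startswith("@"):
                problems.append(f"{field}[{index}]: tag {tag!r} must start with '@'")
                continue
            # Assumed tag boundary: '@image1' must not match inside '@image10'.
            pattern = re.escape(tag) + r"(?![\w-])"
            if not re.search(pattern, prompt):
                problems.append(f"{field}[{index}]: tag {tag!r} does not appear in the prompt")
    return problems
```

The boundary check is a conservative guess; a plain substring test would accept `@image1` when the prompt only mentions `@image10`.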
The model field must be explicitly provided — no default value.
Pricing is strongly tied to resolution and to whether ref_videos is used: 1080p is significantly more expensive than 480p / 720p, and tiers with ref_videos (video input) cost roughly 1.5 to 2× as much as those without. See the Pricing Page for exact rates.
Text prompt, max 1280 tokens.
Describe scenes, subjects, actions, and styles in detail for better generation results.
When using ref_images / ref_videos / mid_frame_images, the prompt must contain the corresponding @tag (e.g., @Actor-1, @video1, @image1).
Example: "@Actor-1 walks through a neon-lit street at night."
End frame image URL (jpg / jpeg / png / gif / bmp).
When provided, this image is used as the ending frame of the video. Can be combined with first_frame_image for first-and-last-frame control.
reference - Motion / subject reference, overrides duration (follows the reference video length, max 10 seconds), carries input video audio by default; can be combined with ref_images.type=image
extend - Video extension, billed by the requested duration; cannot be combined with ref_images
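The restriction on extend can be checked before submission. This is a sketch under the assumption that the rule applies whenever any ref_videos element has type extend and ref_images is non-empty; `extend_conflicts` is a hypothetical helper, and other type values are left unchecked because this section does not list them all.

```python
def extend_conflicts(payload: dict) -> list:
    """Report ref_videos entries of type 'extend' that are sent with ref_images."""
    ref_videos = payload.get("ref_videos") or []
    ref_images = payload.get("ref_images") or []
    problems = []
    for index, video in enumerate(ref_videos):
        # Only 'extend' is restricted; 'reference' may be combined with
        # ref_images of type 'image' according to the text above.
        if video.get("type") == "extend" and ref_images:
            problems.append(
                f"ref_videos[{index}]: type 'extend' cannot be combined with ref_images"
            )
    return problems
```

An empty list means the payload does not violate this particular rule; it says nothing about other validation the server performs.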
{
  "model": "skyreels-v4-fast",
  "prompt": "Slowly pull the camera back to reveal the entire scene.",
  "first_frame_image": "https://example.com/start.png",
  "duration": 5
}
Case 6: Omni - Multi-Subject + Video Motion Reference
{
  "model": "skyreels-v4-fast",
  "prompt": "The man from @image_1 imitates the move on the left in @video_1. The woman from @image_2 imitates the right side.",
  "duration": 5,
  "ref_images": [
    { "tag": "@image_1", "type": "image", "image_urls": ["https://example.com/a.png"] },
    { "tag": "@image_2", "type": "image", "image_urls": ["https://example.com/b.png"] }
  ],
  "ref_videos": [
    { "tag": "@video_1", "type": "reference", "video_url": "https://example.com/motion.mp4" }
  ]
}
This case uses ref_videos.type=reference, so the requested duration will be overridden by the actual reference video length (max 10 seconds). Even though "duration": 5 is passed here, the final video length follows the reference video.
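To make the override concrete, the sketch below estimates the output length for a payload. It reads "max 10 seconds" as a cap on the result, which is an interpretation rather than something this section states; the service might instead reject longer reference videos. `expected_length` and its `reference_seconds` argument (the length of the reference video, measured by the caller) are hypothetical.

```python
from typing import Optional

MAX_REFERENCE_SECONDS = 10.0  # limit quoted for ref_videos.type=reference


def expected_length(payload: dict, reference_seconds: float) -> Optional[float]:
    """Estimate the final video length in seconds.

    Returns None when no duration was requested and no reference video
    overrides it (the server default is not stated in this section).
    """
    ref_videos = payload.get("ref_videos") or []
    if any(video.get("type") == "reference" for video in ref_videos):
        # Assumed behavior: follow the reference length, capped at 10 seconds.
        return min(float(reference_seconds), MAX_REFERENCE_SECONDS)
    duration = payload.get("duration")
    return float(duration) if duration is not None else None
```

With the Case 6 payload and an 8-second reference video, this returns 8.0 even though "duration": 5 was requested.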
{
  "model": "skyreels-v4-fast",
  "prompt": "Create a video showing how to make tomato and egg noodles based on @image1.",
  "ref_images": [
    { "tag": "@image1", "type": "grid", "image_urls": ["https://example.com/recipe_grid.png"] }
  ]
}
Query Task Results
Video generation is an async task that returns a task_id upon submission. Use the Get Task Status endpoint to query generation progress and results.
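Because the task is asynchronous, a client typically polls until the task finishes. The sketch below shows one generic polling loop in Python. It does not call the real endpoint: `fetch_status` stands in for whatever function queries Get Task Status, and `is_finished` decides when to stop, since the response fields are not described in this section. All names, the interval, and the timeout are assumptions.

```python
import time
from typing import Callable


def wait_for_task(
    task_id: str,
    fetch_status: Callable[[str], dict],
    is_finished: Callable[[dict], bool],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_status(task_id) until is_finished(result) is true.

    Raises TimeoutError if the next poll would fall after the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_status(task_id)
        if is_finished(result):
            return result
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout} seconds")
        sleep(interval)
```

A caller would supply a `fetch_status` that performs the authenticated HTTP request and an `is_finished` that inspects the returned status value, whatever its actual name turns out to be.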