POST /v1/videos/generations
curl --request POST \
  --url https://api.apimart.ai/v1/videos/generations \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "happyhorse-1.0",
    "prompt": "A little girl walking down the road, cinematic feel",
    "resolution": "1080P",
    "size": "16:9",
    "duration": 5,
    "seed": 42
  }'
{
  "code": 200,
  "data": [
    {
      "status": "submitted",
      "task_id": "task_01J9HA7JPQ9A0Z6JZ3V8M9W6PZ"
    }
  ]
}

Authorization

Authorization
string
required
All API endpoints require Bearer Token authentication.

Get your API Key:
  1. Visit the API Key Management Page to get your API Key
  2. Add it to the request header:
Authorization: Bearer YOUR_API_KEY
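In code, the header above can be attached like this. A minimal sketch using only the Python standard library; `YOUR_API_KEY` is a placeholder and `submit_generation` is a hypothetical helper, not part of any official SDK:

```python
import json
import urllib.request

API_URL = "https://api.apimart.ai/v1/videos/generations"

def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build an authenticated POST request for the generations endpoint."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def submit_generation(payload: dict, api_key: str) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(payload, api_key)) as resp:
        return json.loads(resp.read())
```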

Mode Routing

happyhorse-1.0 is the unified entry for Text-to-Video / Image-to-Video / Reference-Image-to-Video / Video Edit. The backend automatically determines the mode based on incoming parameters. All modes are billed by the same rule (resolution × seconds only):
Fields you pass | Routes To | Mode Description
prompt only | Text-to-Video (T2V) | Generate video purely from text
prompt + first_frame_image | Image-to-Video (I2V) | Animate from a first-frame image
prompt + image_urls (1–9 images) | Reference-Image-to-Video (R2V) | Generate a new scene from reference images
prompt + video_url (optional image_urls 0–5 as style refs / audio_setting) | Video Edit (EDIT) | Rewrite / restylize a source video
Routing priority (high to low): video_url > first_frame_image > image_urls > prompt only. Mutual exclusion rules: the three media fields (first_frame_image / image_urls / video_url) are mutually exclusive in pairs. The only valid combination is video_url + image_urls (EDIT mode + reference images). Passing two mutually exclusive fields returns 400 mixed_media_not_allowed.
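The routing priority and mutual-exclusion rules above can be sketched as a client-side pre-check. `route_mode` is a hypothetical helper for illustration, not part of the API; the actual validation happens on the gateway:

```python
def route_mode(payload: dict) -> str:
    """Return the mode happyhorse-1.0 would route to, or raise ValueError
    mirroring the gateway's 400 errors. Client-side sketch only."""
    has_video = "video_url" in payload
    has_first = "first_frame_image" in payload
    has_refs = bool(payload.get("image_urls"))

    # first_frame_image is pairwise exclusive with both other media fields;
    # the only valid combination is video_url + image_urls (EDIT + refs)
    if has_first and (has_video or has_refs):
        raise ValueError("400 mixed_media_not_allowed")
    # audio_setting is only valid in EDIT mode (video_url present)
    if "audio_setting" in payload and not has_video:
        raise ValueError("400 audio_setting_only_for_edit")

    # Routing priority: video_url > first_frame_image > image_urls > prompt only
    if has_video:
        return "EDIT"   # image_urls (up to 5) allowed as style refs
    if has_first:
        return "I2V"
    if has_refs:
        return "R2V"    # image_urls must contain 1-9 images
    return "T2V"
```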

Request Parameters

model
string
required
Video generation model name, fixed as happyhorse-1.0
prompt
string
Video content description, up to 2500 characters; cannot contain special tokens
  • T2V / R2V / EDIT modes: required
  • I2V mode: optional, but recommended to guide camera movement and actions
Example: "A little girl walking down the road, cinematic feel"
first_frame_image
string
First-frame image, triggers I2V (Image-to-Video). Supports URL or base64 (data:image/<mime>;base64,<payload>; the gateway uploads it to OSS automatically). Mutually exclusive with image_urls / video_url.
First-frame image requirements:
  • Format: JPEG / JPG / PNG / BMP / WEBP
  • Short side: ≥ 300px
  • Aspect ratio: 1:2.5 to 2.5:1
  • File size: ≤ 10MB
image_urls
array<string>
Image array:
  • R2V mode (only image_urls provided): 1–9 images, used as subject/style references to generate a new scene
  • EDIT mode (provided together with video_url): 0–5 images, used as style reference
Supports URL or base64. Mutually exclusive with first_frame_image; can be combined with video_url.
Reference image requirements:
  • Format: JPEG / JPG / PNG / BMP / WEBP
  • Short side: ≥ 720 px recommended
  • Aspect ratio: short / long ≥ 0.4
  • File size: ≤ 10MB
  • Count: R2V must be 1–9; EDIT up to 5
video_url
string
Source video URL, triggers EDIT (Video Edit). Base64 is not supported — provide an HTTP/HTTPS direct link. Mutually exclusive with first_frame_image; can be combined with image_urls (≤ 5).
Source video requirements:
  • Duration: 3–60 seconds (videos longer than 15s are auto-truncated upstream to the first 15 seconds)
  • Resolution: minimum 480p, short side ≥ 360
  • Aspect ratio: 1:8 to 8:1
  • Format: MP4 / MOV (H.264 recommended)
  • Frame rate: > 8 fps
  • File size: ≤ 100MB
In EDIT mode, the generated video’s duration matches the source video (capped at the truncated 15s when the source is longer). The duration parameter has no effect here. To control the output length, trim the source video to the target duration before uploading.
audio_setting
string
default:"auto"
Audio setting, only effective in EDIT mode (requires video_url). Options:
  • auto - Auto-generate audio (default)
  • origin - Keep the source video’s audio track
Passing this field outside EDIT mode returns 400 audio_setting_only_for_edit
resolution
string
default:"1080P"
Video resolution (affects billing). Options:
  • 720P - Standard
  • 1080P - High definition (default)
duration
integer
default:"5"
Video duration in seconds (affects billing). Supported range: any integer from 3 to 15. Default: 5.
Has no effect in EDIT mode (when video_url is provided): the generated video’s duration matches the source video (billed by the truncated 15s when the source is longer than 15s). To control the output length, trim the source video first.
size
string
default:"16:9"
Aspect ratio. Supported values:
  • 16:9 - Landscape widescreen (default)
  • 9:16 - Portrait
  • 1:1 - Square
  • 4:3 - Landscape
  • 3:4 - Portrait
Ignored in I2V / EDIT modes — the output aspect ratio is determined automatically by the input media (first-frame image / source video)
watermark
boolean
default:"true"
Whether to add a watermark to the generated video
  • true: add watermark (default)
  • false: no watermark
seed
integer
Random seed used to control the randomness of generated content. Value range: [0, 2147483647]. If omitted, a random seed is used.
  • Identical requests with different seed values (including when seed is omitted) produce different results
  • Identical requests with the same seed value produce similar results, but exact reproducibility is not guaranteed

Response

code
integer
Response status code, 200 on success
data
array
Response data array
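A minimal sketch of pulling the task_id out of a successful submission response; `extract_task_ids` is a hypothetical helper, and the field names come from the sample response shown above:

```python
def extract_task_ids(response: dict) -> list:
    """Collect task_ids from a successful submission response."""
    if response.get("code") != 200:
        raise RuntimeError(f"submission failed: code={response.get('code')}")
    return [item["task_id"] for item in response.get("data", [])]

# Sample response taken from the documentation above
sample = {
    "code": 200,
    "data": [
        {"status": "submitted", "task_id": "task_01J9HA7JPQ9A0Z6JZ3V8M9W6PZ"}
    ],
}
```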

Use Cases

Case 1: Text-to-Video T2V (Simplest Request)

{
  "model": "happyhorse-1.0",
  "prompt": "A little girl walking down the road, cinematic feel"
}

Case 2: Text-to-Video T2V (Full Parameters)

{
  "model": "happyhorse-1.0",
  "prompt": "A coastal road at sunset, slow-motion camera push-in, cinematic feel",
  "resolution": "1080P",
  "size": "16:9",
  "duration": 8,
  "watermark": false,
  "seed": 42
}

Case 3: Image-to-Video I2V (first_frame_image)

{
  "model": "happyhorse-1.0",
  "prompt": "Bring the scene in the image to life",
  "first_frame_image": "https://example.com/first_frame.png",
  "resolution": "1080P",
  "duration": 5
}

Case 4: Reference-Image-to-Video R2V (multiple references)

{
  "model": "happyhorse-1.0",
  "prompt": "The protagonist from image 1 runs through the scene from image 2, then picks up the prop from image 3. Keep a 3D cartoon style with smooth motion.",
  "image_urls": [
    "https://example.com/img_01.jpg",
    "https://example.com/img_02.png",
    "https://example.com/img_03.jpeg"
  ],
  "resolution": "1080P",
  "size": "16:9",
  "duration": 5,
  "watermark": false
}

Case 5: Video Edit EDIT (keep original audio + style reference)

{
  "model": "happyhorse-1.0",
  "prompt": "Convert the character in the video to a cartoon style, preserving the original motion",
  "video_url": "https://example.com/source.mp4",
  "image_urls": [
    "https://example.com/style_ref.jpg"
  ],
  "resolution": "1080P",
  "audio_setting": "origin",
  "seed": 42
}

Case 6: 720P to Save Cost

{
  "model": "happyhorse-1.0",
  "prompt": "Waves crashing on the beach at sunset",
  "resolution": "720P",
  "size": "16:9",
  "duration": 5
}

Mode Selection Guide

Requirement | Recommended Approach
Generate video from text only | Pass only prompt (T2V)
Make an image “come alive” (use it as the first frame) | Pass first_frame_image (I2V)
Generate a new scene from a set of reference images | Pass image_urls (1–9, R2V)
Rewrite / restylize an existing video | Pass video_url (EDIT), optionally combine with image_urls (0–5) as style refs
Save cost | Use resolution: "720P"

Usage Tips

  1. Unified entry logic: input fields decide the mode. Note that the three media fields (first_frame_image / image_urls / video_url) are mutually exclusive in pairs
  2. size only effective in T2V/R2V: in I2V / EDIT modes size is ignored — the output aspect ratio is determined by the input media
  3. Duration: 5–10 seconds is the sweet spot. Too short causes choppy motion; too long significantly increases upstream processing time
  4. First-frame image quality: clear, well-composed, subject centered — significantly improves I2V output
  5. Prompt writing: describe motion / camera / atmosphere (e.g., “slow push-in, cinematic, warm tones”) for better results than purely static scene descriptions
  6. EDIT input video: sources longer than 15 seconds are auto-truncated upstream to the first 15 seconds. If you need a different segment, trim the video yourself first
Query Task Results

Video generation is an async task that returns a task_id upon submission. Use the Get Task Status endpoint to query generation progress and results.
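The async flow can be sketched end to end. Note the status-query path below (/v1/videos/generations/{task_id}) and the terminal status names are assumptions for illustration — check the Get Task Status reference for the exact endpoint and response fields:

```python
import json
import time
import urllib.request

API_BASE = "https://api.apimart.ai"

def get_task_status(task_id: str, api_key: str) -> dict:
    # NOTE: the path below is an assumption; see the Get Task Status docs
    req = urllib.request.Request(
        f"{API_BASE}/v1/videos/generations/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def is_terminal(status: str) -> bool:
    """Whether polling can stop (status names are assumptions)."""
    return status in {"succeeded", "failed"}

def wait_for_task(task_id: str, api_key: str,
                  interval: float = 5.0, timeout: float = 600.0) -> dict:
    """Poll until the task reaches a terminal status or timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = get_task_status(task_id, api_key)
        status = result.get("data", [{}])[0].get("status", "")
        if is_terminal(status):
            return result
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} still running after {timeout}s")
```

A polling interval of a few seconds is usually enough; video generation at 1080P can take minutes, so size the timeout accordingly.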