POST /v1/videos/generations
curl --request POST \
  --url https://api.apimart.ai/v1/videos/generations \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "happyhorse-1.1",
    "prompt": "A little girl walking down the road, cinematic feel",
    "resolution": "1080P",
    "size": "16:9",
    "duration": 5,
    "seed": 42
  }'
{
  "code": 200,
  "data": [
    {
      "status": "submitted",
      "task_id": "task_01J9HA7JPQ9A0Z6JZ3V8M9W6PZ"
    }
  ]
}

Authorization

Authorization
string
required
All API endpoints require Bearer Token authentication.
Get your API Key: visit the API Key Management Page to get your API Key.
Add it to the request header:
Authorization: Bearer YOUR_API_KEY
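
As an illustration of the header above, here is a minimal Python sketch that builds the POST request with the standard library only. It does not send anything. The function name build_generation_request is hypothetical (not part of the API), and YOUR_API_KEY is the same placeholder used above.

```python
import json
import urllib.request

API_URL = "https://api.apimart.ai/v1/videos/generations"


def build_generation_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the Bearer token.

    Hypothetical helper for illustration; not part of the API itself.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_generation_request(
    "YOUR_API_KEY",
    {
        "model": "happyhorse-1.1",
        "prompt": "A little girl walking down the road, cinematic feel",
    },
)
# To actually submit the task, pass `req` to urllib.request.urlopen(req).
```

Building the request separately from sending it keeps the token handling easy to inspect before any network call is made.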

Mode Routing

happyhorse-1.1 is the unified entry for Text-to-Video / Image-to-Video / Reference-Image-to-Video. The backend automatically determines the mode based on the incoming parameters. All modes are billed by the same rule (resolution × seconds only):
Fields you pass | Routes To | Mode Description
prompt only | Text-to-Video (T2V) | Generate video purely from text
prompt + first_frame_image | Image-to-Video (I2V) | Animate from a first-frame image
prompt + image_urls (1–9 images) | Reference-Image-to-Video (R2V) | Generate a new scene from reference images
Routing priority (high to low): first_frame_image > image_urls > prompt only.
Mutual exclusion: the two media fields (first_frame_image / image_urls) are mutually exclusive. Passing both returns 400 mixed_media_not_allowed.
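
The routing rules above can be mirrored client-side to predict which mode a payload will trigger before it is sent. This is a sketch of the documented rules only, not the backend's actual implementation. The function name resolve_mode is hypothetical, and a local ValueError stands in for the 400 mixed_media_not_allowed response.

```python
def resolve_mode(payload: dict) -> str:
    """Return "T2V", "I2V", or "R2V" for a request payload.

    Hypothetical client-side mirror of the documented routing table.
    """
    has_first_frame = bool(payload.get("first_frame_image"))
    has_references = bool(payload.get("image_urls"))

    # The two media fields are mutually exclusive; the API is documented
    # to answer 400 mixed_media_not_allowed in this case.
    if has_first_frame and has_references:
        raise ValueError("mixed_media_not_allowed")

    if has_first_frame:
        return "I2V"
    if has_references:
        return "R2V"
    return "T2V"
```

For example, a payload with only model and prompt resolves to "T2V", and adding first_frame_image changes the result to "I2V".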

Request Parameters

model
string
required
Video generation model name, fixed as happyhorse-1.1
prompt
string
Video content description, up to 2500 characters; cannot contain special tokens.
Example: "A little girl walking down the road, cinematic feel"
first_frame_image
string
First-frame image; triggers I2V (Image-to-Video). Supports a URL or base64 (data:image/<mime>;base64,<payload>; the gateway uploads it to OSS automatically).
Mutually exclusive with image_urls.
First-frame image requirements:
  • Format: JPEG / JPG / PNG / BMP / WEBP
  • Short side: ≥ 300px
  • Aspect ratio: 1:2.5 to 2.5:1
  • File size: ≤ 10MB
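
The first-frame requirements above can be checked locally before uploading. The sketch below assumes the caller already knows the image's pixel dimensions, byte size, and MIME type (reading them from a file needs an image library, which is out of scope here). The names check_first_frame and to_data_uri are hypothetical, and reading "10MB" as 10 × 1024 × 1024 bytes is an assumption.

```python
import base64

ALLOWED_MIME = {"image/jpeg", "image/png", "image/bmp", "image/webp"}
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10MB" means 10 MiB


def check_first_frame(width: int, height: int, size_bytes: int, mime: str) -> list[str]:
    """Return a list of violated requirements (empty if the image looks fine)."""
    problems = []
    if mime not in ALLOWED_MIME:
        problems.append(f"unsupported format: {mime}")
    if min(width, height) < 300:
        problems.append("short side must be at least 300px")
    # 1:2.5 to 2.5:1 means width/height must lie between 0.4 and 2.5.
    if width <= 0 or height <= 0 or not (0.4 <= width / height <= 2.5):
        problems.append("aspect ratio must be between 1:2.5 and 2.5:1")
    if size_bytes > MAX_BYTES:
        problems.append("file must be at most 10MB")
    return problems


def to_data_uri(raw: bytes, mime: str) -> str:
    """Encode raw image bytes in the documented data:image/<mime>;base64,<payload> form."""
    return f"data:{mime};base64,{base64.b64encode(raw).decode('ascii')}"
```

The string returned by to_data_uri can be passed as first_frame_image in place of a URL.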
image_urls
array<string>
Image array (R2V mode): 1–9 images, used as subject/style references to generate a new scene.
Supports URL or base64.
Mutually exclusive with first_frame_image.
Reference image requirements:
  • Format: JPEG / JPG / PNG / BMP / WEBP
  • Short side: ≥ 720p recommended
  • Aspect ratio: short / long ≥ 0.4
  • File size: ≤ 10MB
  • Count: 1–9 images
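
A similar pre-flight check can cover the hard limits for reference images (count and short/long ratio). The function name check_reference_images is hypothetical, and the caller supplies each image's (width, height) in pixels. The "≥ 720p recommended" short-side guidance is left out on purpose because it is a recommendation rather than a limit.

```python
def check_reference_images(dimensions: list[tuple[int, int]]) -> list[str]:
    """Return violated requirements for a list of (width, height) pairs."""
    problems = []
    if not 1 <= len(dimensions) <= 9:
        problems.append(f"expected 1-9 images, got {len(dimensions)}")
    for index, (width, height) in enumerate(dimensions, start=1):
        short_side, long_side = sorted((width, height))
        if short_side <= 0:
            problems.append(f"image {index}: dimensions must be positive")
        elif short_side / long_side < 0.4:
            problems.append(f"image {index}: short/long ratio is below 0.4")
    return problems
```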
resolution
string
default:"1080P"
Video resolution (affects billing).
Options:
  • 720P - Standard
  • 1080P - High definition (default)
duration
integer
default:"5"
Video duration in seconds (affects billing).
Supported range: any integer from 3 to 15.
Default: 5
size
string
default:"16:9"
Aspect ratio.
Supported formats:
  • 16:9 - Landscape widescreen (default)
  • 9:16 - Portrait
  • 1:1 - Square
  • 4:3 - Landscape
  • 3:4 - Portrait
Ignored in I2V mode — the output aspect ratio is determined automatically by the input media (first-frame image)
watermark
boolean
default:"false"
Whether to add a watermark to the generated video
  • true: Add watermark
  • false: Do not add watermark (default)
seed
integer
Random seed used to control the randomness of generated content.
Value range: [0, 2147483647]. If omitted, a random seed is used.
  • With different seed values (including an omitted seed, which is randomized), otherwise identical requests produce different results
  • With the same seed value, otherwise identical requests produce similar results, but exact consistency is not guaranteed
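
The documented ranges and defaults can be checked client-side before submitting. The following is a minimal sketch under the limits listed above. The function name validate_params is hypothetical, and the server remains the authority on what it accepts.

```python
VALID_RESOLUTIONS = {"720P", "1080P"}
VALID_SIZES = {"16:9", "9:16", "1:1", "4:3", "3:4"}
MAX_SEED = 2147483647
MAX_PROMPT_CHARS = 2500


def _is_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_params(payload: dict) -> list[str]:
    """Return a list of problems found in a request payload (empty if none)."""
    errors = []
    if payload.get("model") != "happyhorse-1.1":
        errors.append("model must be happyhorse-1.1")
    if len(payload.get("prompt", "")) > MAX_PROMPT_CHARS:
        errors.append("prompt exceeds 2500 characters")
    # Omitted fields fall back to the documented defaults.
    if payload.get("resolution", "1080P") not in VALID_RESOLUTIONS:
        errors.append("resolution must be 720P or 1080P")
    duration = payload.get("duration", 5)
    if not (_is_int(duration) and 3 <= duration <= 15):
        errors.append("duration must be an integer from 3 to 15")
    if payload.get("size", "16:9") not in VALID_SIZES:
        errors.append("size must be one of 16:9, 9:16, 1:1, 4:3, 3:4")
    seed = payload.get("seed")
    if seed is not None and not (_is_int(seed) and 0 <= seed <= MAX_SEED):
        errors.append("seed must be an integer in [0, 2147483647]")
    return errors
```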

Response

code
integer
Response status code, 200 on success
data
array
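Response data array. In the sample response, each element describes one submitted task with its status (e.g., "submitted") and task_id.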
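
A short sketch of reading the submission response: it parses the JSON body and collects the task_id values. The function name extract_task_ids is hypothetical, SAMPLE copies the sample response shown with the request example, and the shape of error responses is not documented here, so the failure branch is an assumption.

```python
import json

# Copied from the sample response shown with the request example.
SAMPLE = """
{
  "code": 200,
  "data": [
    {
      "status": "submitted",
      "task_id": "task_01J9HA7JPQ9A0Z6JZ3V8M9W6PZ"
    }
  ]
}
"""


def extract_task_ids(response_text: str) -> list[str]:
    """Return the task_id of every entry in the response's data array."""
    body = json.loads(response_text)
    if body.get("code") != 200:
        # Assumption: any non-200 code means the submission failed.
        raise RuntimeError(f"request failed with code {body.get('code')}")
    return [item["task_id"] for item in body.get("data", [])]


task_ids = extract_task_ids(SAMPLE)
```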

Use Cases

Case 1: Text-to-Video T2V (Simplest Request)

{
  "model": "happyhorse-1.1",
  "prompt": "A little girl walking down the road, cinematic feel"
}

Case 2: Text-to-Video T2V (Full Parameters)

{
  "model": "happyhorse-1.1",
  "prompt": "A coastal road at sunset, slow-motion camera push-in, cinematic feel",
  "resolution": "1080P",
  "size": "16:9",
  "duration": 8,
  "seed": 42
}

Case 3: Image-to-Video I2V (first_frame_image)

{
  "model": "happyhorse-1.1",
  "prompt": "Bring the scene in the image to life",
  "first_frame_image": "https://example.com/first_frame.png",
  "resolution": "1080P",
  "duration": 5
}

Case 4: Reference-Image-to-Video R2V (multiple references)

{
  "model": "happyhorse-1.1",
  "prompt": "The protagonist from image 1 runs through the scene from image 2, then picks up the prop from image 3. Keep a 3D cartoon style with smooth motion.",
  "image_urls": [
    "https://example.com/img_01.jpg",
    "https://example.com/img_02.png",
    "https://example.com/img_03.jpeg"
  ],
  "resolution": "1080P",
  "size": "16:9",
  "duration": 5
}

Case 5: 720P to Save Cost

{
  "model": "happyhorse-1.1",
  "prompt": "Waves crashing on the beach at sunset",
  "resolution": "720P",
  "size": "16:9",
  "duration": 5
}

Mode Selection Guide

Requirement | Recommended Approach
Generate video from text only | Pass only prompt (T2V)
Make an image “come alive” (use it as the first frame) | Pass first_frame_image (I2V)
Generate a new scene from a set of reference images | Pass image_urls (1–9, R2V)
Save cost | Use resolution: "720P"

Usage Tips

  1. Unified entry logic: input fields decide the mode. Note that the two media fields (first_frame_image / image_urls) are mutually exclusive
  2. size is only effective in T2V/R2V: in I2V mode size is ignored, and the output aspect ratio is determined by the input media
  3. Duration: 5–10 seconds is the sweet spot. Too short causes choppy motion; too long significantly increases upstream processing time
  4. First-frame image quality: clear, well-composed, subject centered — significantly improves I2V output
  5. Prompt writing: describe motion / camera / atmosphere (e.g., “slow push-in, cinematic, warm tones”) for better results than purely static scene descriptions

Query Task Results

Video generation is an async task that returns a task_id upon submission. Use the Get Task Status endpoint to query generation progress and results.
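
Because the task is asynchronous, a client typically polls until the task reaches a final state. The sketch below shows only the polling loop. The Get Task Status endpoint's URL, response shape, and status names are not described on this page, so the caller supplies both the fetch function and the set of terminal statuses. The name wait_for_task and all of its parameters are hypothetical.

```python
import time
from typing import Callable


def wait_for_task(
    task_id: str,
    fetch_status: Callable[[str], dict],
    terminal_statuses: set[str],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_status(task_id) until its "status" is in terminal_statuses.

    fetch_status is assumed to wrap the Get Task Status endpoint and to
    return a dict with a "status" key; neither is confirmed by this page.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch_status(task_id)
        if result.get("status") in terminal_statuses:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout_s}s")
        sleep(interval_s)
```

Injecting fetch_status and sleep keeps the loop independent of any particular HTTP client and makes it easy to exercise without network access.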