POST /v1/videos/generations
curl --request POST \
  --url https://api.apimart.ai/v1/videos/generations \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "happyhorse-1.0",
    "prompt": "A little girl walking down the road, cinematic feel",
    "resolution": "1080P",
    "size": "16:9",
    "duration": 5,
    "seed": 42
  }'
{
  "code": 200,
  "data": [
    {
      "status": "submitted",
      "task_id": "task_01J9HA7JPQ9A0Z6JZ3V8M9W6PZ"
    }
  ]
}

Authorization

Authorization
string
required
All API endpoints require Bearer Token authentication.

Get your API Key:
  1. Visit the API Key Management Page to get your API Key
  2. Add it to the request header:
Authorization: Bearer YOUR_API_KEY
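In code, the header above can be attached like this. A minimal sketch using only the Python standard library; `YOUR_API_KEY` is a placeholder and `submit_generation` is a hypothetical helper, not part of any official SDK:

```python
import json
import urllib.request

API_URL = "https://api.apimart.ai/v1/videos/generations"

def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build an authenticated POST request for the generations endpoint."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def submit_generation(payload: dict, api_key: str) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(payload, api_key)) as resp:
        return json.loads(resp.read())
```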

Mode Routing

happyhorse-1.0 is the unified entry for Text-to-Video / Image-to-Video / Reference-Image-to-Video / Video Edit. The backend automatically determines the mode based on incoming parameters. All modes are billed by the same rule (resolution × seconds only):
Fields you pass | Routes To | Mode Description
prompt only | Text-to-Video (T2V) | Generate video purely from text
prompt + first_frame_image | Image-to-Video (I2V) | Animate from a first-frame image
prompt + image_urls (1–9 images) | Reference-Image-to-Video (R2V) | Generate a new scene from reference images
prompt + video_url (optional image_urls 0–5 as style refs / audio_setting) | Video Edit (EDIT) | Rewrite / restylize a source video
Routing priority (high to low): video_url > first_frame_image > image_urls > prompt only. Mutual exclusion rules: the three media fields (first_frame_image / image_urls / video_url) are mutually exclusive in pairs. The only valid combination is video_url + image_urls (EDIT mode + reference images). Passing two mutually exclusive fields returns 400 mixed_media_not_allowed.
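The routing priority and mutual-exclusion rules above can be sketched as a client-side pre-check. `route_mode` is a hypothetical helper for illustration, not part of the API; the actual validation happens on the gateway:

```python
def route_mode(payload: dict) -> str:
    """Return the mode happyhorse-1.0 would route to, or raise ValueError
    mirroring the gateway's 400 errors. Client-side sketch only."""
    has_video = "video_url" in payload
    has_first = "first_frame_image" in payload
    has_refs = bool(payload.get("image_urls"))

    # first_frame_image is pairwise exclusive with both other media fields;
    # the only valid combination is video_url + image_urls (EDIT + refs)
    if has_first and (has_video or has_refs):
        raise ValueError("400 mixed_media_not_allowed")
    # audio_setting is only valid in EDIT mode (video_url present)
    if "audio_setting" in payload and not has_video:
        raise ValueError("400 audio_setting_only_for_edit")

    # Routing priority: video_url > first_frame_image > image_urls > prompt only
    if has_video:
        return "EDIT"   # image_urls (up to 5) allowed as style refs
    if has_first:
        return "I2V"
    if has_refs:
        return "R2V"    # image_urls must contain 1-9 images
    return "T2V"
```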

Request Parameters

model
string
required
Video generation model name, fixed as happyhorse-1.0
prompt
string
Video content description, up to 2500 characters; cannot contain special tokens
  • T2V / R2V / EDIT modes: required
  • I2V mode: optional, but recommended to guide camera movement and actions
Example: "A little girl walking down the road, cinematic feel"
first_frame_image
string
First-frame image, triggers I2V (Image-to-Video). Supports URL or base64 (data:image/<mime>;base64,<payload>; the gateway uploads it to OSS automatically). Mutually exclusive with image_urls / video_url.
First-frame image requirements:
  • Format: JPEG / JPG / PNG / BMP / WEBP
  • Short side: ≥ 300px
  • Aspect ratio: 1:2.5 to 2.5:1
  • File size: ≤ 10MB
image_urls
array<string>
Image array:
  • R2V mode (only image_urls provided): 1–9 images, used as subject/style references to generate a new scene
  • EDIT mode (provided together with video_url): 0–5 images, used as style reference
Supports URL or base64. Mutually exclusive with first_frame_image; can be combined with video_url.
Reference image requirements:
  • Format: JPEG / JPG / PNG / BMP / WEBP
  • Short side: ≥ 720 px recommended
  • Aspect ratio: short / long ≥ 0.4
  • File size: ≤ 10MB
  • Count: R2V must be 1–9; EDIT up to 5
video_url
string
Source video URL, triggers EDIT (Video Edit). Base64 is not supported — provide an HTTP/HTTPS direct link. Mutually exclusive with first_frame_image; can be combined with image_urls (≤ 5).
Source video requirements:
  • Duration: 3–60 seconds (videos longer than 15s are auto-truncated upstream to the first 15 seconds)
  • Resolution: minimum 480p, short side ≥ 360
  • Aspect ratio: 1:8 to 8:1
  • Format: MP4 / MOV (H.264 recommended)
  • Frame rate: > 8 fps
  • File size: ≤ 100MB
In EDIT mode, the generated video’s duration matches the source video (capped at the truncated 15s when the source is longer). The duration parameter has no effect here. To control the output length, trim the source video to the target duration before uploading.
audio_setting
string
default:"auto"
Audio setting, only effective in EDIT mode (requires video_url). Options:
  • auto - Auto-generate audio (default)
  • origin - Keep the source video’s audio track
Passing this field outside EDIT mode returns 400 audio_setting_only_for_edit
resolution
string
default:"1080P"
Video resolution (affects billing). Options:
  • 720P - Standard
  • 1080P - High definition (default)
duration
integer
default:"5"
Video duration in seconds (affects billing). Supported range: any integer from 3 to 15. Default: 5.
Has no effect in EDIT mode (when video_url is provided): the generated video’s duration matches the source video (billed by the truncated 15s when the source is longer than 15s). To control the output length, trim the source video first.
size
string
default:"16:9"
Aspect ratio. Supported values:
  • 16:9 - Landscape widescreen (default)
  • 9:16 - Portrait
  • 1:1 - Square
  • 4:3 - Landscape
  • 3:4 - Portrait
Ignored in I2V / EDIT modes — the output aspect ratio is determined automatically by the input media (first-frame image / source video)
watermark
boolean
default:"true"
Whether to add a watermark to the generated video
  • true: add watermark (default)
  • false: no watermark
seed
integer
Random seed used to control the randomness of generated content. Value range: [0, 2147483647]. If omitted, a random seed is used.
  • Identical requests with different seed values (including when seed is omitted) produce different results
  • Identical requests with the same seed value produce similar results, but exact reproducibility is not guaranteed

Response

code
integer
Response status code, 200 on success
data
array
Response data array
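A minimal sketch of pulling the task_id out of a successful submission response; `extract_task_ids` is a hypothetical helper, and the field names come from the sample response shown above:

```python
def extract_task_ids(response: dict) -> list:
    """Collect task_ids from a successful submission response."""
    if response.get("code") != 200:
        raise RuntimeError(f"submission failed: code={response.get('code')}")
    return [item["task_id"] for item in response.get("data", [])]

# Sample response taken from the documentation above
sample = {
    "code": 200,
    "data": [
        {"status": "submitted", "task_id": "task_01J9HA7JPQ9A0Z6JZ3V8M9W6PZ"}
    ],
}
```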

Use Cases

Case 1: Text-to-Video T2V (Simplest Request)

{
  "model": "happyhorse-1.0",
  "prompt": "A little girl walking down the road, cinematic feel"
}

Case 2: Text-to-Video T2V (Full Parameters)

{
  "model": "happyhorse-1.0",
  "prompt": "A coastal road at sunset, slow-motion camera push-in, cinematic feel",
  "resolution": "1080P",
  "size": "16:9",
  "duration": 8,
  "watermark": false,
  "seed": 42
}

Case 3: Image-to-Video I2V (first_frame_image)

{
  "model": "happyhorse-1.0",
  "prompt": "Bring the scene in the image to life",
  "first_frame_image": "https://example.com/first_frame.png",
  "resolution": "1080P",
  "duration": 5
}

Case 4: Reference-Image-to-Video R2V (multiple references)

{
  "model": "happyhorse-1.0",
  "prompt": "The protagonist from image 1 runs through the scene from image 2, then picks up the prop from image 3. Keep a 3D cartoon style with smooth motion.",
  "image_urls": [
    "https://example.com/img_01.jpg",
    "https://example.com/img_02.png",
    "https://example.com/img_03.jpeg"
  ],
  "resolution": "1080P",
  "size": "16:9",
  "duration": 5,
  "watermark": false
}

Case 5: Video Edit EDIT (keep original audio + style reference)

{
  "model": "happyhorse-1.0",
  "prompt": "Convert the character in the video to a cartoon style, preserving the original motion",
  "video_url": "https://example.com/source.mp4",
  "image_urls": [
    "https://example.com/style_ref.jpg"
  ],
  "resolution": "1080P",
  "audio_setting": "origin",
  "seed": 42
}

Case 6: 720P to Save Cost

{
  "model": "happyhorse-1.0",
  "prompt": "Waves crashing on the beach at sunset",
  "resolution": "720P",
  "size": "16:9",
  "duration": 5
}

Mode Selection Guide

Requirement | Recommended Approach
Generate video from text only | Pass only prompt (T2V)
Make an image “come alive” (use it as the first frame) | Pass first_frame_image (I2V)
Generate a new scene from a set of reference images | Pass image_urls (1–9, R2V)
Rewrite / restylize an existing video | Pass video_url (EDIT), optionally combine with image_urls (0–5) as style refs
Save cost | Use resolution: "720P"

Usage Tips

  1. Unified entry logic: input fields decide the mode. Note that the three media fields (first_frame_image / image_urls / video_url) are mutually exclusive in pairs
  2. size only effective in T2V/R2V: in I2V / EDIT modes size is ignored — the output aspect ratio is determined by the input media
  3. Duration: 5–10 seconds is the sweet spot. Too short causes choppy motion; too long significantly increases upstream processing time
  4. First-frame image quality: clear, well-composed, subject centered — significantly improves I2V output
  5. Prompt writing: describe motion / camera / atmosphere (e.g., “slow push-in, cinematic, warm tones”) for better results than purely static scene descriptions
  6. EDIT input video: sources longer than 15 seconds are auto-truncated upstream to the first 15 seconds. If you need a different segment, trim the video yourself first
Query Task Results

Video generation is an async task that returns a task_id upon submission. Use the Get Task Status endpoint to query generation progress and results.
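The async flow can be sketched end to end. Note the status-query path below (/v1/videos/generations/{task_id}) and the terminal status names are assumptions for illustration — check the Get Task Status reference for the exact endpoint and response fields:

```python
import json
import time
import urllib.request

API_BASE = "https://api.apimart.ai"

def get_task_status(task_id: str, api_key: str) -> dict:
    # NOTE: the path below is an assumption; see the Get Task Status docs
    req = urllib.request.Request(
        f"{API_BASE}/v1/videos/generations/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def is_terminal(status: str) -> bool:
    """Whether polling can stop (status names are assumptions)."""
    return status in {"succeeded", "failed"}

def wait_for_task(task_id: str, api_key: str,
                  interval: float = 5.0, timeout: float = 600.0) -> dict:
    """Poll until the task reaches a terminal status or timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = get_task_status(task_id, api_key)
        status = result.get("data", [{}])[0].get("status", "")
        if is_terminal(status):
            return result
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} still running after {timeout}s")
```

A polling interval of a few seconds is usually enough; video generation at 1080P can take minutes, so size the timeout accordingly.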