POST /v1/audio/transcriptions
curl --request POST \
  --url https://api.apimart.ai/v1/audio/transcriptions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form 'file=@/path/to/audio.mp3' \
  --form 'model=whisper-1' \
  --form 'language=en' \
  --form 'response_format=json'
{
  "text": "This is a transcribed text from the test audio."
}

Authorizations

Authorization
string
required
All endpoints require Bearer token authentication.
Get an API key: visit the API Key Management page to get your API key.
Add it to the request header:
Authorization: Bearer YOUR_API_KEY

Body

file
file
required
Audio file to transcribe.
Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm.
Maximum file size: 25 MB.
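
The format and size limits above can be checked locally before uploading. The following is a minimal sketch, not part of the API: `check_audio_file` and `MAX_BYTES` are hypothetical names, and reading "25 MB" as 25 × 1024 × 1024 bytes is an assumption, since the documentation does not say whether the limit is decimal or binary.

```shell
# Pre-flight check against the documented upload limits.
# check_audio_file is a hypothetical helper, not part of the API.

MAX_BYTES=$((25 * 1024 * 1024))  # assumption: "25 MB" means 25 MiB

check_audio_file() {
  local path="$1"
  [ -f "$path" ] || { echo "not a file: $path" >&2; return 1; }

  # Lower-case the extension so "Clip.MP3" is accepted too.
  local ext
  ext=$(printf '%s' "${path##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|mp4|mpeg|mpga|m4a|wav|webm) ;;
    *) echo "unsupported format: $ext" >&2; return 1 ;;
  esac

  # wc -c works on both GNU and BSD userlands.
  local size
  size=$(wc -c < "$path" | tr -d ' ')
  if [ "$size" -gt "$MAX_BYTES" ]; then
    echo "file too large: $size bytes" >&2
    return 1
  fi
}
```

A typical use is `check_audio_file /path/to/audio.mp3 && curl ...`, so the request is only sent when the file passes. The check looks at the file extension only; it does not inspect the actual container or codec.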
model
string
default:"whisper-1"
required
Speech recognition model name.
Example: "whisper-1"
language
string
Language code of the audio (ISO 639-1 format).
Specifying the language can improve accuracy and speed.
Supported languages include zh (Chinese), en (English), ja (Japanese), ko (Korean), and 99 other languages.
Example: "en"
prompt
string
Optional text prompt to guide the transcription style or continue from previous audio.
Maximum: 224 tokens.
response_format
string
default:"json"
Output format.
Supported formats:
  • json - JSON format (text only)
  • text - Plain text
  • srt - SRT subtitle format
  • verbose_json - Verbose JSON format (includes timestamps and metadata)
  • vtt - WebVTT subtitle format
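
Because the response body changes shape with `response_format`, it can be convenient to save it under a matching file extension. The sketch below is illustrative only: `format_extension`, `transcribe_to_file`, and the `APIMART_API_KEY` environment variable are hypothetical names, and the extension mapping is a local convention rather than something the API prescribes.

```shell
# Map a response_format value to a file extension for the saved output.
# Both functions are hypothetical helpers, not part of the API.
format_extension() {
  case "$1" in
    json|verbose_json) echo "json" ;;
    text)              echo "txt" ;;
    srt)               echo "srt" ;;
    vtt)               echo "vtt" ;;
    *) echo "unknown response_format: $1" >&2; return 1 ;;
  esac
}

# Transcribe audio file $1 in format $2 (default: json) and write the
# response body next to the input as <basename>.<ext>.
# Assumes the API key is exported as APIMART_API_KEY.
transcribe_to_file() {
  local audio="$1" format="${2:-json}" ext out
  ext=$(format_extension "$format") || return 1
  out="${audio%.*}.$ext"
  curl --fail --silent --show-error --request POST \
    --url https://api.apimart.ai/v1/audio/transcriptions \
    --header "Authorization: Bearer $APIMART_API_KEY" \
    --form "file=@$audio" \
    --form 'model=whisper-1' \
    --form "response_format=$format" \
    --output "$out" && echo "$out"
}
```

For example, `transcribe_to_file /path/to/audio.mp3 srt` would write `/path/to/audio.srt` and print that path on success. `--fail` makes curl return a non-zero exit status on HTTP errors, so a failed request does not report success.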
temperature
number
default:"0"
Sampling temperature, range 0 to 1.
Higher values (like 0.8) make output more random; lower values (like 0.2) make it more deterministic and consistent.

Response

text
string
Transcribed text content
task
string
Task type, fixed as transcribe.
Only returned in verbose_json format.
language
string
Detected or specified language code.
Only returned in verbose_json format.
duration
number
Audio duration (seconds).
Only returned in verbose_json format.
segments
array
Array of text segments.
Only returned in verbose_json format.
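
As a rough illustration of reading the fields above, the sketch below summarizes a verbose_json response with jq. `summarize_transcription` is a hypothetical helper, and jq is assumed to be installed. It only counts the entries in segments, because the structure of each segment is not described here.

```shell
# Summarize a verbose_json response read from stdin. Requires jq.
# summarize_transcription is a hypothetical helper, not part of the API.
summarize_transcription() {
  jq -r '"language: \(.language)",
         "duration: \(.duration)s",
         "segments: \(.segments | length)",
         "text: \(.text)"'
}
```

It can be fed directly from the request, for example by appending `| summarize_transcription` to a curl call that sets `response_format=verbose_json`. With the default json format only text is returned, so the other lines would print null or 0.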