Authorizations
All interfaces require Bearer Token authentication
  Get API Key: Visit the API Key Management Page to get your API Key
  Add the API Key to the request header as a Bearer token
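As a minimal sketch of the header step, the helper below builds the authentication header. It assumes the service follows the standard "Authorization: Bearer <token>" scheme; the function name auth_headers and the key value are hypothetical.

```python
def auth_headers(api_key: str) -> dict[str, str]:
    """Build the request header carrying the API key as a Bearer token."""
    if not api_key:
        raise ValueError("API key must not be empty")
    # Assumption: the standard "Authorization: Bearer <token>" scheme.
    return {"Authorization": f"Bearer {api_key}"}


# Hypothetical key, for illustration only.
headers = auth_headers("YOUR_API_KEY")
```

The returned dictionary can be passed as the headers of whichever HTTP client is used to call the API.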
Body
Audio file to transcribe
  Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm
  Maximum file size: 25 MB
Speech recognition model name
  Example: "whisper-1"
Language code of the audio (ISO-639-1 format)
  Specifying the language can improve accuracy and speed
  Supported languages include: zh (Chinese), en (English), ja (Japanese), ko (Korean), and 99 other languages
  Example: "en"
Optional text prompt to guide the transcription style or continue from previous audio
  Maximum 224 tokens
Output format
  Supported formats:
    json - JSON format (text only)
    text - Plain text
    srt - SRT subtitle format
    verbose_json - Verbose JSON format (includes timestamps and metadata)
    vtt - WebVTT subtitle format
Sampling temperature, range 0 to 1
  Higher values (like 0.8) make output more random; lower values (like 0.2) make it more deterministic and consistent
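The body constraints above can be checked on the client before a request is sent. The sketch below is one way to do that. The field names (model, language, prompt, response_format, temperature) are assumptions, because this section gives the descriptions without the parameter names. Reading "25 MB" as 25 * 1024 * 1024 bytes is also an assumption. Both function names are hypothetical.

```python
import os

SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}
MAX_FILE_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" read as 25 MiB


def check_audio_file(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems


def build_form_fields(model, language=None, prompt=None,
                      response_format=None, temperature=None) -> dict[str, str]:
    """Validate the optional parameters and return them as form fields.

    Field names are assumed, not taken from this reference.
    """
    fields = {"model": model}
    if language is not None:
        # ISO-639-1 codes are two letters, e.g. "en" or "zh".
        if len(language) != 2 or not language.isalpha():
            raise ValueError(f"not an ISO-639-1 code: {language!r}")
        fields["language"] = language.lower()
    if prompt is not None:
        # The 224-token limit is not checked here; that needs the model's tokenizer.
        fields["prompt"] = prompt
    if response_format is not None:
        if response_format not in RESPONSE_FORMATS:
            raise ValueError(f"unsupported output format: {response_format!r}")
        fields["response_format"] = response_format
    if temperature is not None:
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be between 0 and 1")
        fields["temperature"] = str(temperature)
    return fields
```

For example, build_form_fields("whisper-1", language="en", response_format="srt") returns the fields to send alongside the audio file, and check_audio_file("talk.mp3", 1000) returns an empty list.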
Response
Transcribed text content
Task type, fixed as transcribe
  Only returned in verbose_json format
Detected or specified language code
  Only returned in verbose_json format
Audio duration (seconds)
  Only returned in verbose_json format
Array of text segments
  Only returned in verbose_json format
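Because most response fields appear only in verbose_json format, client code should treat them as optional. The sketch below reads a response body under that rule. The key names (text, task, language, duration, segments) are assumptions, since this section lists the field descriptions without their names, and the sample payload is invented for illustration.

```python
import json


def summarize_transcription(body: str) -> dict:
    """Extract the transcript and any verbose_json metadata that is present."""
    data = json.loads(body)
    summary = {"text": data["text"]}
    # These keys are assumed names; they are expected only in verbose_json output.
    for key in ("task", "language", "duration"):
        if key in data:
            summary[key] = data[key]
    summary["segment_count"] = len(data.get("segments", []))
    return summary


# Hypothetical verbose_json payload, not a real server response.
sample = json.dumps({
    "text": "Hello world.",
    "task": "transcribe",
    "language": "en",
    "duration": 1.5,
    "segments": [{"text": "Hello world."}],
})
result = summarize_transcription(sample)
```

The same function accepts a plain json response, in which case only the text and a segment count of zero are returned.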