cURL
curl --request POST \
--url https://api.apimart.ai/v1/audio/transcriptions \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: multipart/form-data' \
--form 'file=@/path/to/audio.mp3' \
--form 'model=whisper-1' \
--form 'language=en' \
--form 'response_format=json'
Response codes: 200, 200 (Verbose Format), 200 (SRT Subtitle Format), 400, 401, 402, 413, 429, 500, 502
Example response (200):
{
  "text": "This is a transcribed text from the test audio."
}
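The reference lists the status codes for this endpoint but does not say which ones are worth retrying. The sketch below assumes they follow standard HTTP semantics (429 Too Many Requests, 500 Internal Server Error, 502 Bad Gateway are usually transient, while 400, 401, 402, and 413 need a change to the request or account). The helper name retry_delay, the backoff base, and the cap are hypothetical choices for illustration only.

```python
# Assumption: only these codes are treated as transient for this endpoint.
RETRYABLE_STATUS = {429, 500, 502}


def retry_delay(status, attempt, base=1.0, cap=30.0):
    """Return seconds to wait before retrying, or None if retrying will not help.

    attempt is the zero-based count of retries already made.
    """
    if status not in RETRYABLE_STATUS:
        return None
    # Exponential backoff: base, 2*base, 4*base, ... limited by cap.
    return min(cap, base * (2 ** attempt))
```

A caller would sleep for the returned number of seconds and resend the same request, stopping once the helper returns None or a retry budget is exhausted.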
Authorizations
All interfaces require Bearer Token authentication.
Get API Key: Visit the API Key Management Page to get your API Key.
Add to request header: Authorization: Bearer YOUR_API_KEY
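As a rough illustration of the header described above, the following Python sketch builds an authenticated multipart request using only the standard library. The URL and form field names come from the cURL example; the function name build_transcription_request and the application/octet-stream part type are assumptions made for this example, not part of the API reference.

```python
import uuid
import urllib.request

API_URL = "https://api.apimart.ai/v1/audio/transcriptions"


def build_transcription_request(api_key, filename, audio_bytes, fields):
    """Return a urllib Request carrying the audio file and the form fields."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields such as model, language, response_format.
    for name, value in fields.items():
        parts.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode("utf-8"))
    # The audio file part; the part content type is an assumption.
    parts.append((
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8"))
    parts.append(audio_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(
        API_URL, data=b"".join(parts), headers=headers, method="POST"
    )
```

The request could then be sent with urllib.request.urlopen(request), after replacing the placeholder key with a real API Key.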
Body
⚠️ Online testing (Try it) is not supported for this endpoint. Due to file upload limitations, please test using:
Apifox / Postman - Manually change the file parameter to file type after importing
cURL - Refer to the cURL example above
SDK - Use SDK examples in various languages
file
Audio file to transcribe (File type).
⚠️ Note: When testing with Apifox or similar tools:
After importing, manually change this parameter type to file
Ensure the request Content-Type is multipart/form-data
Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm
Maximum file size: 25 MB
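The format and size limits above can be checked locally before uploading, which avoids a wasted round trip. This is a minimal sketch: the helper name check_audio_file is hypothetical, the check looks only at the file extension rather than the actual contents, and it assumes 1 MB means 1024 * 1024 bytes.

```python
import os

SUPPORTED_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
MAX_FILE_BYTES = 25 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def check_audio_file(filename, size_bytes):
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    # Compare the extension case-insensitively, without the leading dot.
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_FILE_BYTES}")
    return problems
```

In practice size_bytes would come from os.path.getsize(path) on the file about to be uploaded.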
model
string
default: "whisper-1"
required
Speech recognition model name.
Example: "whisper-1"
language
Language code of the audio (ISO-639-1 format). Specifying the language can improve accuracy and speed.
Supported languages include: zh (Chinese), en (English), ja (Japanese), ko (Korean), and 99 other languages.
Example: "en"
Optional text prompt to guide the transcription style or continue from previous audio. Maximum 224 tokens.
response_format
Output format. Supported formats:
json - JSON format (text only)
text - Plain text
srt - SRT subtitle format
verbose_json - Verbose JSON format (includes timestamps and metadata)
vtt - WebVTT subtitle format
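Because the response body changes shape with the chosen output format, client code typically branches on it. The sketch below is one way to do that, under two assumptions not confirmed by this page: that verbose_json carries the same "text" field shown in the 200 example, and that text, srt, and vtt arrive as UTF-8 plain-text bodies. The function name transcript_text is hypothetical.

```python
import json

JSON_FORMATS = {"json", "verbose_json"}
TEXT_FORMATS = {"text", "srt", "vtt"}


def transcript_text(response_format, body):
    """Return the transcript (or subtitle text) from a raw response body."""
    decoded = body.decode("utf-8")
    if response_format in JSON_FORMATS:
        # Assumption: both JSON variants expose the transcript under "text".
        return json.loads(decoded)["text"]
    if response_format in TEXT_FORMATS:
        # Assumption: these formats are returned as plain text, not JSON.
        return decoded
    raise ValueError(f"unknown response_format: {response_format}")
```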
Sampling temperature, range 0 to 1. Higher values (like 0.8) make output more random, lower values (like 0.2) make it more deterministic and consistent.
Response
Task type, fixed as transcribe. Only returned in verbose_json format.
Detected or specified language code. Only returned in verbose_json format.
Audio duration (seconds). Only returned in verbose_json format.
Array of text segments. Only returned in verbose_json format.
Segment start time (seconds)
Segment end time (seconds)
Sampling temperature used
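To show how the per-segment start and end times might be used, here is a small sketch that turns a parsed verbose_json response into SRT-style subtitle blocks. The field names segments, start, end, and text are assumptions based on the descriptions above, since this page does not spell out the keys, and both function names are hypothetical.

```python
def format_timestamp(seconds):
    """Render seconds as HH:MM:SS,mmm, the timestamp layout used by SRT."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(verbose):
    """Build SRT text from a dict parsed from a verbose_json response."""
    blocks = []
    # Assumed keys: "segments" list with "start", "end", "text" per item.
    for index, seg in enumerate(verbose.get("segments", []), start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```

When subtitles are the only goal, requesting response_format=srt directly is simpler; a conversion like this is mainly useful when the same response also feeds other processing.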