Authorizations
##All APIs require Bearer Token authentication##Get API Key:Visit the API Key Management Page to get your API KeyAdd to request header:
Body
Model nameSupported models include:
gpt-5- OpenAI latest multimodal modelGPT-4o-image- GPT-4 optimized multimodal modelgpt-4-vision- GPT-4 vision understanding model- More models coming soon…
Input content listEach input item contains:
role: Role type (user,assistant,system)content: Content array, supports multiple types:input_text: Text inputinput_image: Image input
Controls output randomness, range 0-2
- Lower values (e.g. 0.2) make output more deterministic
- Higher values (e.g. 1.8) make output more random
Maximum number of tokens to generateDifferent models have different maximum limits, please refer to specific model documentation
Whether to use streaming output
true: Stream response (SSE format)false: Return complete response at once
Nucleus sampling parameter, range 0-1Controls diversity of generated text, recommended to use with temperature alternativelyDefault: 1.0
Tools list for extending model capabilitiesSupported tool types:
- Web Search (
web_search): Real-time internet information search - File Search (
file_search): Search uploaded file content - Function Calling (
function): Call custom functions - Remote MCP (
remote_mcp): Connect to remote Model Context Protocol services
[{"type": "web_search"}]Response
Unique identifier for the response
Object type, fixed as
responseCreation timestamp
Actual model name used
List of generated replies
Token usage statistics
Usage Examples
Text-Only Input
Using Web Search Tool
cURL Example
Image Understanding
Multi-Image Analysis
Base64 Encoded Image
Using File Search Tool
Using Function Calling
Using Remote MCP
Combining Multiple Tools
Content Type Specifications
input_text
Text input type Properties:type: Fixed as"input_text"text: Text content (string)
input_image
Image input type Properties:type: Fixed as"input_image"image_url: Image URL or Base64 encoded data URI
- JPEG
- PNG
- GIF
- WebP
- Maximum file size: 20MB
- Recommended aspect_ratio: No more than 2048x2048 pixels
Tool Usage Details
Web Search
The web search tool allows the model to access real-time internet information. Configuration example:- Query latest news and current events
- Get real-time data (stocks, weather, exchange rates, etc.)
- Search for latest technical documentation
- Verify factual information
File Search
The file search tool allows the model to search for relevant information in uploaded documents. Configuration example:- Analyze internal corporate documents
- Search technical specifications and manuals
- Query contracts and legal documents
- Knowledge base Q&A systems
Function Calling
Define custom functions to enable the model to call external APIs or perform specific operations. Complete configuration example:name: Function name (required)description: Function description (required)parameters: Parameter definition using JSON Schema formattype: Parameter typeproperties: Parameter property definitionsrequired: List of required parameters
- Call third-party APIs
- Execute database queries
- Trigger business processes
- Integrate with internal systems
Remote MCP
Connect to remote Model Context Protocol (MCP) services to extend model capabilities. Configuration example:url: MCP server address (required)auth_token: Authentication token (optional)timeout: Timeout in seconds, default 30 seconds
- Connect to enterprise-level AI services
- Use domain-specific models
- Access protected data sources
- Distributed AI system integration
Tool Response Format
When the model uses tools, the response format will include tool call information:- Model receives user input
- Analyzes whether tools are needed
- If needed, returns tool call request
- Client executes tool call
- Returns tool results to model
- Model generates final response
Important Notes
-
Image URL requirements:
- Must be a publicly accessible URL
- Or use Base64 encoded Data URI format
-
Token billing:
- Images consume tokens based on their aspect_ratio
- High-aspect_ratio images are automatically resized to optimize costs
- Tool calls also consume additional tokens
-
Content order:
- Order of elements in content array affects model understanding
- Recommended to place text instructions first, then images
-
Multimodal combinations:
- Can mix multiple texts and images in one request
- Supports multi-turn conversations with context coherence
-
Tool usage limitations:
- When using multiple tools simultaneously, the model intelligently selects the most appropriate tool
- Function calling requires clear function definitions and parameter descriptions
- Web search results may be limited by region and time
-
API compatibility:
- Fully compatible with OpenAI Responses API format
- Seamlessly migrate existing OpenAI code
- Supports all OpenAI tool extension features