OpenAI Multimodal Responses API

curl https://api.apimart.ai/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "model": "gpt-5",
    "input": [
      {
        "role": "user",
        "content": [
          {
            "type": "input_text",
            "text": "What is in this image?"
          },
          {
            "type": "input_image",
            "image_url": "https://openai-documentation.vercel.app/images/cat_and_otter.png"
          }
        ]
      }
    ]
  }'

{
  "code": 200,
  "data": {
    "id": "resp-9876543210",
    "object": "response",
    "created": 1677652288,
    "model": "gpt-5",
    "choices": [
      {
        "index": 0,
        "message": {
          "role": "assistant",
          "content": "This image shows a cat and an otter. They appear to be interacting with each other in a very cute and heartwarming scene. The cat and otter seem to be getting along well."
        },
        "finish_reason": "stop"
      }
    ],
    "usage": {
      "prompt_tokens": 156,
      "completion_tokens": 45,
      "total_tokens": 201
    }
  }
}

Authorizations

Authorization

string

required

All APIs require Bearer token authentication. To get an API key, visit the API Key Management Page, then add the key to the request header:

Authorization: Bearer YOUR_API_KEY
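A minimal Python sketch of building these headers. It assumes the key is kept in an environment variable; the name APIMART_API_KEY is hypothetical and not something the API requires.

```python
import os


def build_headers(api_key=None):
    """Return the headers every request needs.

    If api_key is omitted, it is read from the APIMART_API_KEY environment
    variable (a hypothetical name chosen for this sketch).
    """
    key = api_key or os.environ.get("APIMART_API_KEY")
    if not key:
        raise ValueError("no API key: pass api_key or set APIMART_API_KEY")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }
```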

Body

model

string

default:"gpt-5"

required

Model name. Supported models include:

gpt-5 - OpenAI's latest multimodal model
GPT-4o-image - GPT-4o-based multimodal model
gpt-4-vision - GPT-4 model with vision (image understanding)
More models coming soon…

input

array

required

Input content list. An array in which each item contains role and content fields:

role: user (user message), assistant (AI response), or system (system prompt)
content: one or more content blocks (text and/or images)

A plain string is also accepted for simple text-only prompts, as in the tool examples under Usage Examples.

role

string

default:"user"

required

Role type. Options: user (user message), assistant (AI response, for multi-turn conversations), system (system prompt, to set the AI's behavior)

content

array

required

Content array. Supports multiple types of content blocks and can include both text and images.

type

string

required

Content type. Options:

input_text: Text input
input_image: Image input

text

string

Text content. Used when type is input_text.

image_url

string

Image URL. Used when type is input_image: either a publicly accessible image URL or a Base64-encoded data URI.
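The nesting of input items and content blocks can be sketched in Python as a few small builder functions. The helper names (text_block, image_block, user_message, build_request) are hypothetical; only the field names and type values come from this page.

```python
def text_block(text):
    """Content block of type input_text."""
    return {"type": "input_text", "text": text}


def image_block(image_url):
    """Content block of type input_image (an https URL or a base64 data URI)."""
    return {"type": "input_image", "image_url": image_url}


def user_message(text, image_urls=()):
    """Build one input item with role "user".

    The text block comes first and the images follow, matching the
    recommendation to place text instructions before images.
    """
    content = [text_block(text)]
    content.extend(image_block(url) for url in image_urls)
    return {"role": "user", "content": content}


def build_request(model, messages, **options):
    """Assemble a request body; options holds optional fields such as temperature or stream."""
    body = {"model": model, "input": list(messages)}
    body.update(options)
    return body
```

For example, build_request("gpt-5", [user_message("Compare these two images", [url1, url2])]) produces the same shape as the Multi-Image Analysis example.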

temperature

number

Controls output randomness, range 0-2

Lower values (e.g. 0.2) make output more deterministic
Higher values (e.g. 1.8) make output more random

Default: 1.0

max_tokens

integer

Maximum number of tokens to generate. Different models have different maximum limits; refer to the specific model's documentation.

stream

boolean

Whether to use streaming output

true: Stream response (SSE format)
false: Return complete response at once

Default: false
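The event payloads of the streaming response are not documented on this page. The Python sketch below only handles the generic SSE framing: it collects data: lines, treats a blank line as the end of an event, and decodes each event as JSON. Stopping at a data: [DONE] line is an assumption borrowed from OpenAI-style streams, and the payload fields used in the test are hypothetical.

```python
import json


def iter_sse_events(lines):
    """Yield the JSON-decoded data of each Server-Sent Event in lines.

    lines can be any iterable of text lines, e.g. a decoded HTTP response
    body read line by line.
    """
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        if line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips a single leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        elif line == "":
            # A blank line ends the current event.
            if not data_lines:
                continue
            payload = "\n".join(data_lines)
            data_lines = []
            if payload == "[DONE]":  # assumed end-of-stream sentinel
                return
            if payload:
                yield json.loads(payload)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
    # Per the SSE format, an event not terminated by a blank line is discarded.
```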

top_p

number

Nucleus sampling parameter, range 0-1. Controls the diversity of generated text. It is recommended to adjust either this or temperature, but not both.

Default: 1.0
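A small Python check of the optional sampling parameters against the ranges documented above. The function name is hypothetical, and the per-model upper limit on max_tokens is not checked because it depends on the model.

```python
def sampling_options(temperature=None, top_p=None, max_tokens=None):
    """Validate optional generation parameters and return them as request fields.

    Returns (options, warnings): options contains only the parameters that
    were set, ready to merge into the request body.
    """
    options, warnings = {}, []
    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be in the range 0-2")
        options["temperature"] = temperature
    if top_p is not None:
        if not 0 <= top_p <= 1:
            raise ValueError("top_p must be in the range 0-1")
        options["top_p"] = top_p
    if max_tokens is not None:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(max_tokens, bool) or not isinstance(max_tokens, int) or max_tokens < 1:
            raise ValueError("max_tokens must be a positive integer")
        # The model-specific upper limit is not checked here.
        options["max_tokens"] = max_tokens
    if "temperature" in options and "top_p" in options:
        warnings.append("temperature and top_p are both set; adjusting only one of them is recommended")
    return options, warnings
```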

tools

array

Tools list for extending the model's capabilities. Supported tool types:

Web Search (web_search): Real-time internet information search
File Search (file_search): Search uploaded file content
Function Calling (function): Call custom functions
Remote MCP (remote_mcp): Connect to remote Model Context Protocol services

Example: [{"type": "web_search"}]

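Response

In the example response at the top of this page, the fields below are wrapped in a data object alongside a code status field.

id

string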

Unique identifier for the response

object

string

Object type, fixed as response

created

integer

Creation time as a Unix timestamp (in seconds)

model

string

Actual model name used

choices

array

List of generated replies

index

integer

Choice index

message

object

Message content

role

string

Role type (assistant)

content

string

Generated text content

finish_reason

string

Finish reason. Possible values:

stop - Natural completion
length - Maximum length (max_tokens) reached
content_filter - Output was stopped by content filtering
tool_calls - The model requested one or more tool calls

usage

object

Token usage statistics

prompt_tokens

integer

Number of tokens in input

completion_tokens

integer

Number of tokens in output

total_tokens

integer

Total number of tokens
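A sketch of reading these fields in Python. Because the example response at the top of this page wraps them in a data object, the function unwraps data when it is present and otherwise reads the top level; that fallback is an assumption, not documented behavior. The truncated flag simply tests whether finish_reason is length.

```python
import json


def parse_response(body):
    """Extract the reply text, finish reason, tool calls and token usage.

    body may be the raw JSON text or an already-decoded dict.
    """
    if isinstance(body, (str, bytes)):
        body = json.loads(body)
    # Unwrap the {"code": ..., "data": {...}} envelope when present.
    payload = body.get("data", body)
    choice = payload["choices"][0]
    message = choice["message"]
    usage = payload.get("usage", {})
    return {
        "text": message.get("content"),
        "finish_reason": choice.get("finish_reason"),
        # A "length" finish reason means the reply was cut off by max_tokens.
        "truncated": choice.get("finish_reason") == "length",
        "tool_calls": message.get("tool_calls") or [],
        "total_tokens": usage.get("total_tokens"),
    }
```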

Usage Examples

Text-Only Input

{
  "model": "gpt-5",
  "input": [
    {
      "role": "user",
      "content": [
        {
          "type": "input_text",
          "text": "Hello, introduce artificial intelligence"
        }
      ]
    }
  ]
}

Using Web Search Tool

{
  "model": "gpt-5",
  "tools": [{"type": "web_search"}],
  "input": "What positive news is there today?"
}

cURL Example

curl "https://api.apimart.ai/v1/responses" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer <token>" \
    -d '{
        "model": "gpt-5",
        "tools": [{"type": "web_search"}],
        "input": "What positive news is there today?"
    }'
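The same request can be sent from Python with only the standard library. This is a sketch using urllib.request; the helper names and the 60-second timeout are arbitrary choices, not API requirements.

```python
import json
import urllib.request

API_URL = "https://api.apimart.ai/v1/responses"


def make_request(body, api_key):
    """Build a POST request equivalent to the cURL command above."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example: send(make_request({"model": "gpt-5", "tools": [{"type": "web_search"}], "input": "What positive news is there today?"}, api_key)).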

Image Understanding

{
  "model": "gpt-5",
  "input": [
    {
      "role": "user",
      "content": [
        {
          "type": "input_text",
          "text": "Describe this image"
        },
        {
          "type": "input_image",
          "image_url": "https://example.com/image.jpg"
        }
      ]
    }
  ]
}

Multi-Image Analysis

{
  "model": "gpt-5",
  "input": [
    {
      "role": "user",
      "content": [
        {
          "type": "input_text",
          "text": "Compare the similarities and differences of these two images"
        },
        {
          "type": "input_image",
          "image_url": "https://example.com/image1.jpg"
        },
        {
          "type": "input_image",
          "image_url": "https://example.com/image2.jpg"
        }
      ]
    }
  ]
}

Base64 Encoded Image

{
  "model": "gpt-5",
  "input": [
    {
      "role": "user",
      "content": [
        {
          "type": "input_text",
          "text": "Analyze this image"
        },
        {
          "type": "input_image",
          "image_url": "data:image/jpeg;base64,/9j/4AAQSkZJRg..."
        }
      ]
    }
  ]
}
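To produce a data URI like the one above from a local file, Base64-encode the file's bytes and prefix them with data:<MIME type>;base64,. A minimal Python sketch, assuming the MIME type can be inferred from the file extension; only the four formats listed as supported under input_image are accepted.

```python
import base64
import os

# Extensions for the four supported image formats.
_MIME_BY_EXT = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def to_data_uri(data, mime_type):
    """Encode raw image bytes as a data URI for the image_url field."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def file_to_data_uri(path):
    """Read a local image file and encode it, inferring the MIME type from its extension."""
    ext = os.path.splitext(path)[1].lower()
    mime_type = _MIME_BY_EXT.get(ext)
    if mime_type is None:
        raise ValueError(f"unsupported image extension: {ext!r}")
    with open(path, "rb") as f:
        return to_data_uri(f.read(), mime_type)
```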

Using File Search Tool

{
  "model": "gpt-5",
  "tools": [{"type": "file_search"}],
  "input": "Based on uploaded documents, summarize the company's quarterly performance"
}

Using Function Calling

{
  "model": "gpt-5",
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather information for a specified city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {
              "type": "string",
              "description": "City name, e.g.: Beijing"
            },
            "unit": {
              "type": "string",
              "enum": ["celsius", "fahrenheit"],
              "description": "Temperature unit"
            }
          },
          "required": ["city"]
        }
      }
    }
  ],
  "input": "What's the weather like in Beijing today?"
}

Using Remote MCP

{
  "model": "gpt-5",
  "tools": [
    {
      "type": "remote_mcp",
      "remote_mcp": {
        "url": "https://mcp.example.com/api",
        "auth_token": "your_mcp_token"
      }
    }
  ],
  "input": "Query user information in the database"
}

Combining Multiple Tools

{
  "model": "gpt-5",
  "tools": [
    {"type": "web_search"},
    {"type": "file_search"},
    {
      "type": "function",
      "function": {
        "name": "calculate",
        "description": "Perform mathematical calculations",
        "parameters": {
          "type": "object",
          "properties": {
            "expression": {
              "type": "string",
              "description": "Mathematical expression"
            }
          },
          "required": ["expression"]
        }
      }
    }
  ],
  "input": "Search for the latest Bitcoin price and calculate the total value of 100 Bitcoins"
}
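The calculate function above is only declared to the model; the client has to implement and run it. Below is a hypothetical client-side implementation in Python. It evaluates basic arithmetic by walking the expression's syntax tree with the ast module instead of calling eval() on model-supplied text. Exponentiation is deliberately left out, since an expression such as 9**9**9 could take a very long time to evaluate.

```python
import ast
import operator

# Arithmetic operators this sketch supports. Exponentiation is excluded on purpose.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def calculate(expression):
    """Evaluate a basic arithmetic expression such as "100 * 2.5" without eval()."""

    def evaluate(node):
        if isinstance(node, ast.Expression):
            return evaluate(node.body)
        # type() check excludes bool, which is a subclass of int.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](evaluate(node.operand))
        # Names, calls, attribute access, powers, etc. are rejected.
        raise ValueError(f"unsupported element in expression: {type(node).__name__}")

    return evaluate(ast.parse(expression, mode="eval"))
```

Malformed input still raises SyntaxError from ast.parse, and division by zero raises ZeroDivisionError, so a caller may want to catch both and return an error result to the model.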

Content Type Specifications

input_text

Text input type. Properties:

type: Fixed as "input_text"
text: Text content (string)

input_image

Image input type. Properties:

type: Fixed as "input_image"
image_url: Image URL or Base64 encoded data URI

Supported image formats:

JPEG
PNG
GIF
WebP

Image size limits:

Maximum file size: 20MB
Recommended resolution: no more than 2048x2048 pixels
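A minimal pre-upload check in Python for the format and file-size limits above. It identifies the format from the file's leading bytes (magic numbers) rather than trusting the extension, and it interprets 20MB as 20 × 1024 × 1024 bytes, which is an assumption. Checking pixel dimensions would need an image-decoding library and is not attempted here.

```python
import os

# 20MB limit; treating MB as 1024 * 1024 bytes is an assumption.
MAX_IMAGE_BYTES = 20 * 1024 * 1024


def detect_image_format(header):
    """Identify JPEG, PNG, GIF or WebP from the file's first 12 bytes."""
    if header.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if header[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "webp"
    return None


def check_image_file(path):
    """Return the detected format, or raise ValueError if the file is too large or unsupported."""
    size = os.path.getsize(path)
    if size > MAX_IMAGE_BYTES:
        raise ValueError(f"{path} is {size} bytes; the limit is {MAX_IMAGE_BYTES} bytes")
    with open(path, "rb") as f:
        fmt = detect_image_format(f.read(12))
    if fmt is None:
        raise ValueError(f"{path} is not a JPEG, PNG, GIF or WebP file")
    return fmt
```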

Tool Usage Details

Web Search

The web search tool allows the model to access real-time internet information. Configuration example:

{
  "tools": [{"type": "web_search"}]
}

Use cases:

Query latest news and current events
Get real-time data (stocks, weather, exchange rates, etc.)
Search for latest technical documentation
Verify factual information

File Search

The file search tool allows the model to search for relevant information in uploaded documents. Configuration example:

{
  "tools": [{"type": "file_search"}]
}

Use cases:

Analyze internal corporate documents
Search technical specifications and manuals
Query contracts and legal documents
Knowledge base Q&A systems

Function Calling

Define custom functions to enable the model to call external APIs or perform specific operations. Complete configuration example:

{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_stock_price",
        "description": "Get real-time stock price",
        "parameters": {
          "type": "object",
          "properties": {
            "symbol": {
              "type": "string",
              "description": "Stock symbol, e.g.: AAPL"
            },
            "currency": {
              "type": "string",
              "enum": ["USD", "CNY"],
              "description": "Currency unit",
              "default": "USD"
            }
          },
          "required": ["symbol"]
        }
      }
    }
  ]
}

Parameter descriptions:

name: Function name (required)
description: Function description (required)
parameters: Parameter definition using JSON Schema format
- type: Parameter type
- properties: Parameter property definitions
- required: List of required parameters
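When the model calls a function, its arguments arrive as a JSON-encoded string (see Tool Response Format). A minimal Python sketch that decodes them and checks the required list and any enum constraints from the parameters schema; it is not a full JSON Schema validator. Filling in declared default values is a client-side choice made in this sketch, since default in JSON Schema is only an annotation.

```python
import json


def parse_arguments(arguments_json, parameters_schema):
    """Decode a tool call's arguments string and check it against the function's schema."""
    args = json.loads(arguments_json)
    if not isinstance(args, dict):
        raise ValueError("arguments must decode to a JSON object")
    properties = parameters_schema.get("properties", {})
    missing = [name for name in parameters_schema.get("required", []) if name not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    for name, value in args.items():
        spec = properties.get(name, {})
        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"{name}={value!r} is not one of {spec['enum']}")
    # Fill declared defaults (e.g. currency "USD") for omitted optional arguments.
    for name, spec in properties.items():
        if name not in args and "default" in spec:
            args[name] = spec["default"]
    return args
```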

Use cases:

Call third-party APIs
Execute database queries
Trigger business processes
Integrate with internal systems

Remote MCP

Connect to remote Model Context Protocol (MCP) services to extend model capabilities. Configuration example:

{
  "tools": [
    {
      "type": "remote_mcp",
      "remote_mcp": {
        "url": "https://your-mcp-server.com/api",
        "auth_token": "your_auth_token",
        "timeout": 30
      }
    }
  ]
}

Parameter descriptions:

url: MCP server address (required)
auth_token: Authentication token (optional)
timeout: Timeout in seconds, default 30 seconds

Use cases:

Connect to enterprise-level AI services
Use domain-specific models
Access protected data sources
Distributed AI system integration

Tool Response Format

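When the model calls a tool, the response includes tool-call information: message.tool_calls lists the requested calls and finish_reason is tool_calls (message.content may be null). The function.arguments field is a JSON-encoded string, not an object: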

{
  "id": "resp-123456",
  "object": "response",
  "created": 1677652288,
  "model": "gpt-5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\": \"Beijing\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}

Tool call workflow:

The model receives the user input
It decides whether any tools are needed
If so, it returns a tool call request
The client executes the tool call
The client sends the tool results back to the model
The model generates the final response
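The client-side part of this workflow can be sketched in Python as a dispatcher that maps function names to local callables. get_weather here is a hypothetical stand-in. Because this page does not show the request format for returning tool results to the model, the sketch stops at collecting the results.

```python
import json


def get_weather(city, unit="celsius"):
    """Hypothetical local implementation; a real one would query a weather service."""
    return {"city": city, "unit": unit, "forecast": "unknown"}


# Maps function names declared in "tools" to local Python callables.
TOOL_HANDLERS = {"get_weather": get_weather}


def run_tool_calls(message):
    """Execute every tool call in an assistant message.

    Returns a list of {"tool_call_id", "name", "output"} dicts, where output
    is a JSON string ready to be sent back to the model.
    """
    results = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        handler = TOOL_HANDLERS.get(fn["name"])
        if handler is None:
            output = {"error": f"unknown function {fn['name']}"}
        else:
            # arguments is a JSON-encoded string, so decode it before calling.
            output = handler(**json.loads(fn["arguments"]))
        results.append({"tool_call_id": call["id"], "name": fn["name"], "output": json.dumps(output)})
    return results
```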

Important Notes

Image URL requirements:
- Must be a publicly accessible URL
- Or use Base64 encoded Data URI format
Token billing:
- Images consume tokens based on their resolution
- High-resolution images are automatically resized to optimize costs
- Tool calls also consume additional tokens
Content order:
- Order of elements in content array affects model understanding
- Recommended to place text instructions first, then images
Multimodal combinations:
- Can mix multiple texts and images in one request
- Supports multi-turn conversations with context coherence
Tool usage limitations:
- When multiple tools are provided, the model decides which tool (if any) is most appropriate to call
- Function calling requires clear function definitions and parameter descriptions
- Web search results may be limited by region and time
API compatibility:
- Fully compatible with OpenAI Responses API format
- Seamlessly migrate existing OpenAI code
- Supports all OpenAI tool extension features
