Supported model IDs include `openai/gpt-4`, `openai/gpt-4o`, and even `neuroa/m1-preview`. The endpoint mirrors the behavior of OpenAI's chat completions API.
## Request Body
Send a list of messages and specify the model you'd like to use. A minimal request example follows the table.

| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | ✅ Yes | Model ID to use (e.g. `openai/o1-pro`, `anthropic/claude-4-opus`) |
| `messages` | array | ✅ Yes | Ordered history of messages in the chat |
| `temperature` | number | ❌ No | Controls randomness: 0 (deterministic) to 2 (more random) |
| `stream` | boolean | ❌ No | Enables streaming if true |
| `max_tokens` | number | ❌ No | Max number of tokens to generate |
| `thinking` | ThinkingObject | ❌ No | Only supported by M1 Series models |
| `tools` | object | ❌ No | Tool-calling payload (supported by OpenAI, Gemini, and Anthropic models) |
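A minimal request sketch in Python using `requests`. The base URL `https://api.example.com/v1/chat/completions` and the `API_KEY` environment variable are assumptions for illustration, not part of this spec:

```python
import os
import requests

# Hypothetical endpoint -- substitute your deployment's base URL.
API_URL = "https://api.example.com/v1/chat/completions"

payload = {
    "model": "openai/gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize SSE in one sentence."},
    ],
    "temperature": 0.7,
    "max_tokens": 256,
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {os.environ['API_KEY']}"},
    json=payload,
    timeout=30,
)
response.raise_for_status()
# Print the assistant's reply from the first choice.
print(response.json()["choices"][0]["message"]["content"])
```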
## `thinking` Object

An optional parameter supported only by M1 Series models; see the sketch after the table.
| Field | Type | Required | Description |
|---|---|---|---|
| `thinking` | boolean | ✅ Yes | Enables internal thinking-like behavior |
| `effort` | string | ✅ Yes | One of `"low"`, `"medium"`, or `"high"` |
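A sketch of a request payload that enables thinking on an M1 Series model. The shape is inferred from the table above; send it via the same POST shown earlier:

```python
payload = {
    "model": "neuroa/m1-preview",
    "messages": [{"role": "user", "content": "Prove that 17 is prime."}],
    "thinking": {
        "thinking": True,   # enables internal thinking-like behavior
        "effort": "high",   # one of "low", "medium", "high"
    },
}
```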
## Example Response
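A representative response shape, assuming it mirrors OpenAI's `chat.completion` object; all field values below are illustrative:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "openai/gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21
  }
}
```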
## Streaming Responses
When `stream: true` is set, the API responds with a Server-Sent Events (SSE) stream. Each chunk includes a delta until completion.
Sample stream response:
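The chunk shape below is illustrative and assumed to mirror OpenAI's `chat.completion.chunk` format:

```text
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" there!"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```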
Each `data:` message carries a part of the response. The client should collect the `delta.content` pieces to reconstruct the final message, as in the sketch below.
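A minimal client sketch that reassembles the message from the stream. It uses `requests`; the endpoint URL and `API_KEY` environment variable are the same assumptions as in the earlier example:

```python
import json
import os
import requests

API_URL = "https://api.example.com/v1/chat/completions"  # hypothetical endpoint

with requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {os.environ['API_KEY']}"},
    json={
        "model": "openai/gpt-4o",
        "messages": [{"role": "user", "content": "Say hello."}],
        "stream": True,
    },
    stream=True,
    timeout=60,
) as resp:
    resp.raise_for_status()
    parts = []
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue  # skip blank separators between SSE events
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content") or "")
    # Join the collected delta.content pieces into the final message.
    print("".join(parts))
```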
## Authentication
Use the Bearer token method in your `Authorization` header, shown here with a placeholder key:
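```text
Authorization: Bearer YOUR_API_KEY
```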
