POST /v1/chat/completions

Create a chat completion
Example request:

curl --request POST \
  --url https://{host}/v2/workspaces/slug:llm-gateway/webhooks/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "<string>",
  "messages": [
    {
      "role": "system",
      "content": "<string>",
      "name": "<string>",
      "tool_call_id": "<string>",
      "tool_calls": [
        {
          "id": "<string>",
          "type": "function",
          "function": {
            "name": "<string>",
            "arguments": "<string>"
          }
        }
      ]
    }
  ]
}
'
Example response:

{
  "id": "<string>",
  "object": "chat.completion",
  "created": 123,
  "model": "<string>",
  "choices": [
    {
      "index": 123,
      "message": {
        "role": "system",
        "content": "<string>",
        "name": "<string>",
        "tool_call_id": "<string>",
        "tool_calls": [
          {
            "id": "<string>",
            "type": "function",
            "function": {
              "name": "<string>",
              "arguments": "<string>"
            }
          }
        ]
      },
      "finish_reason": "<string>"
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 123,
    "total_tokens": 123,
    "cost": 123,
    "duration_ms": 123,
    "carbon": {}
  }
}


Authorizations

Authorization (string, header, required)

User session JWT or instance API key (iak_*). Send as Authorization: Bearer <token>.
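As a sketch, building this header in Python; `auth_header` is a hypothetical helper, not part of any SDK, and the token value is a placeholder:

```python
def auth_header(token: str) -> dict:
    """Return the Authorization header expected by the gateway.

    Accepts either a user-session JWT or an instance API key (iak_*);
    both are sent the same way, as a Bearer token.
    """
    return {"Authorization": f"Bearer {token}"}

headers = auth_header("iak_example_key")  # placeholder key
# headers == {"Authorization": "Bearer iak_example_key"}
```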

Body

application/json

OpenAI-compatible chat completion request. Only the fields actually accepted by the gateway are documented here.

model (string, required)

Model id from the catalogue (e.g. gpt-4o, eu.anthropic.claude-sonnet-4-20250514-v1:0, vertex-gemini-2.5-flash). Maximum string length: 256.

messages (object[], required)

Conversation history (system + user/assistant/tool turns).

temperature (any)

Sampling temperature (provider-dependent range, typically 0–2).

max_tokens (any)

Max tokens to generate.

top_p (any)

Nucleus sampling parameter.

frequency_penalty (any)

OpenAI-style frequency penalty.

presence_penalty (any)

OpenAI-style presence penalty.

stop (any)

One or more stop sequences (string or array of strings).

stream (boolean)

When true, the response is a text/event-stream of ChatCompletionChunk deltas terminating with data: [DONE].
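A minimal sketch of consuming such a stream in Python. The chunk shapes below are illustrative, not captured from the gateway, and `parse_sse_chunks` is a hypothetical helper; only the `data:` framing and the `data: [DONE]` sentinel come from the documentation above:

```python
import json

def parse_sse_chunks(lines):
    """Yield ChatCompletionChunk payloads from a text/event-stream body.

    `lines` is any iterable of decoded lines (e.g. iter_lines() from an
    HTTP client). Stops at the `data: [DONE]` sentinel.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)

# Illustrative stream (chunk shapes assumed to follow OpenAI-style deltas):
stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(c["choices"][0]["delta"]["content"] for c in parse_sse_chunks(stream))
# text == "Hello"
```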

tools (object[])

Tool/function definitions made available to the model. Forwarded to providers that support tool calling.

tool_choice (any)

Tool selection hint: "auto", "none", "required", or { type: "function", function: { name } }.
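As a sketch, a request body that declares one tool and forces its selection might look like the following; the get_weather function, its schema, and the model id are invented for illustration:

```python
# Illustrative request body combining tools and tool_choice.
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    # Force the model to call the declared function rather than answer freely:
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
}
```

With `"tool_choice": "auto"` the model decides on its own whether to call a tool; `"none"` disables tool calls entirely.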

response_format (object)

OpenAI-style structured output hint (e.g. { "type": "json_object" }).

seed (any)

Provider seed for reproducible sampling (where supported).

task_id (string)

Prisme.ai extension. Opaque correlation id propagated to A2A (agent-to-agent) flows. Maximum string length: 128.

analytics_context (object)

Prisme.ai extension. Caller-supplied analytics context merged into the analytics.llm.completion event.
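Putting the body fields together, a sketch of a complete non-streaming request payload including the two Prisme.ai extension fields; all values are placeholders:

```python
# Illustrative request body; field names come from the documentation above,
# values are invented for the example.
body = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarise this ticket."},
    ],
    "temperature": 0.2,
    "max_tokens": 512,
    "stream": False,
    "task_id": "task-123",                     # opaque correlation id, max 128 chars
    "analytics_context": {"team": "support"},  # merged into the analytics event
}
assert len(body["task_id"]) <= 128
```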

Response

Successful completion. Content type depends on request.stream:

  • application/json: non-streaming ChatCompletionResponse.
  • text/event-stream: SSE stream of ChatCompletionChunk payloads terminated by data: [DONE].

Non-streaming chat completion response. Mirrors OpenAI's shape with Prisme.ai extensions on usage (cost, duration_ms, carbon).

id (string, required)

Generated id (chatcmpl-<correlationId>).

object (enum<string>, required)

Available options: chat.completion

created (integer, required)

Unix timestamp (seconds).

model (string, required)

Resolved model id used to serve the request.

choices (object[], required)

usage (object)

Token, cost, and carbon accounting.
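A caller reading the non-streaming response might extract the message and the extended usage block like this; the response values below are invented, following the shape documented above:

```python
# Illustrative response (values invented, structure per the schema above).
response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1700000000,
    "model": "gpt-4o",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {
        "prompt_tokens": 12,
        "completion_tokens": 3,
        "total_tokens": 15,
        "cost": 0.0004,       # Prisme.ai extension
        "duration_ms": 420,   # Prisme.ai extension
        "carbon": {},         # Prisme.ai extension
    },
}

answer = response["choices"][0]["message"]["content"]
usage = response["usage"]
# total_tokens is the sum of prompt and completion tokens:
assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
```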