# Inference

Run your custom model through Carrot's OpenAI-compatible API.
Once Carrot Labs has built a custom model for you, use it through the Carrot API — an OpenAI-compatible endpoint that works with any OpenAI SDK.
## Base URL

```
https://api.carrotlabs.ai/v1
```

## Getting started

Point your OpenAI client at the Carrot API and use your custom model name:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.carrotlabs.ai/v1",
    api_key="sk-...",
)

response = client.chat.completions.create(
    model="my-custom-model",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

```bash
curl https://api.carrotlabs.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-custom-model",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

## Model names
The `model` field should be your Carrot model name — the name shown in the Models page of the dashboard.
If you see a 404 model not found error, check that the model name matches exactly what's shown in the Models page. Names are case-sensitive.
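Because a 404 is often just a casing mismatch, a small client-side check can surface a clearer error before the request is ever sent. A minimal sketch — `find_model` and the model names here are hypothetical, and the list of available names would come from your Models page:

```python
def find_model(requested: str, available: list[str]) -> str:
    """Return the model name if it matches exactly; otherwise raise a
    descriptive error, flagging case-only mismatches specially."""
    if requested in available:
        return requested
    for name in available:
        # A case-insensitive hit means the name exists but the casing is off.
        if name.lower() == requested.lower():
            raise ValueError(
                f"Model names are case-sensitive: did you mean {name!r}?"
            )
    raise ValueError(f"Unknown model {requested!r}")

# Example with placeholder names: an exact match passes through unchanged.
find_model("my-custom-model", ["my-custom-model"])
```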
## Streaming

Set `stream: true` to receive the response as it's generated:
```python
stream = client.chat.completions.create(
    model="my-custom-model",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

## Supported endpoints
| Endpoint | Description |
|---|---|
| `POST /v1/chat/completions` | Chat completions |
| `POST /v1/completions` | Text completions |
| `POST /v1/embeddings` | Embeddings |
All endpoints accept the standard OpenAI request format. See the API Reference for full details.
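To make "standard OpenAI request format" concrete, here are minimal request bodies for each of the three endpoints. The model name and input values are placeholders:

```python
import json

# /v1/chat/completions — a messages array
chat_body = {
    "model": "my-custom-model",
    "messages": [{"role": "user", "content": "Hello!"}],
}

# /v1/completions — a plain prompt string
completions_body = {
    "model": "my-custom-model",
    "prompt": "Hello!",
}

# /v1/embeddings — a string, or list of strings, to embed
embeddings_body = {
    "model": "my-custom-model",
    "input": ["Hello!", "A second string to embed."],
}

# Each body is POSTed as JSON to its endpoint with the same
# Authorization: Bearer header shown earlier.
print(json.dumps(embeddings_body, indent=2))
```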
## Automatic tracing

Add the `X-Carrot-Trace: true` header to automatically capture traces from inference requests — no SDK needed:
```python
client = OpenAI(
    base_url="https://api.carrotlabs.ai/v1",
    api_key="sk-...",
    default_headers={"X-Carrot-Trace": "true"},
)
```

See Tracing for more on how traces work.
## Usage tracking
Every inference request is tracked with token counts, latency, and status. View your usage in the Dashboard.
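The same token counts are also returned on each response via the standard OpenAI `usage` object, so you can log them client-side as well. A minimal sketch, assuming the standard chat completion response shape:

```python
def summarize_usage(response):
    """Pull token counts from an OpenAI-style chat completion response."""
    u = response.usage
    return {
        "prompt_tokens": u.prompt_tokens,
        "completion_tokens": u.completion_tokens,
        "total_tokens": u.total_tokens,
    }
```

You might call this after each request — for example, `summarize_usage(response)` on the result of `client.chat.completions.create(...)` — and feed the result to your own metrics pipeline.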
## Error responses
| Status | Meaning |
|---|---|
| 401 | Invalid or missing API key |
| 404 | Model not found for your account |
| 502 | Inference provider temporarily unavailable |
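Since a 502 is transient, it's usually worth retrying with backoff. A minimal sketch — this helper is not part of the Carrot API, and it assumes the raised exception carries a `status_code` attribute (as the OpenAI SDK's `APIStatusError` does):

```python
import time

def with_retries(call, max_attempts=3, base_delay=1.0):
    """Invoke `call`, retrying on 502 responses with exponential backoff.
    Any other error (401, 404, ...) is raised immediately."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            # Only retry transient 502s, and only while attempts remain.
            if status != 502 or attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Usage (client and arguments as in the examples above):
# with_retries(lambda: client.chat.completions.create(
#     model="my-custom-model",
#     messages=[{"role": "user", "content": "Hello!"}],
# ))
```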