Inference

Run your custom model through Carrot's OpenAI-compatible API.

Once Carrot Labs has built a custom model for you, use it through the Carrot API — an OpenAI-compatible endpoint that works with any OpenAI SDK.

Base URL

https://api.carrotlabs.ai/v1

Getting started

Point your OpenAI client at the Carrot API and use your custom model name:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.carrotlabs.ai/v1",
    api_key="sk-...",
)

response = client.chat.completions.create(
    model="my-custom-model",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

The same request with curl:

curl https://api.carrotlabs.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-custom-model",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Model names

Set the model field to your Carrot model name, as shown on the Models page of the dashboard.

If you see a 404 model not found error, check that the model name matches exactly what's shown in the Models page. Names are case-sensitive.

Streaming

Set stream: true to receive the response as it's generated:

stream = client.chat.completions.create(
    model="my-custom-model",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
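If you also want the complete text once streaming finishes, accumulate the deltas as they arrive. A minimal sketch (the collect helper is our own, not part of the OpenAI SDK):

```python
def collect(stream):
    """Concatenate streamed content deltas into the full response text."""
    parts = []
    for chunk in stream:
        # Skip keep-alive chunks with no choices and chunks whose delta
        # carries no content (e.g. role-only or final chunks).
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)
```

Note that a stream can only be consumed once, so collect the deltas in the same pass where you print them if you need both.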

Supported endpoints

Endpoint                       Description
POST /v1/chat/completions      Chat completions
POST /v1/completions           Text completions
POST /v1/embeddings            Embeddings

All endpoints accept the standard OpenAI request format. See the API Reference for full details.
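Because the request format is standard OpenAI JSON, any HTTP client works, not just the OpenAI SDKs. A sketch that builds (but does not send) a raw chat-completions request with Python's standard library; "sk-..." stands in for your real API key:

```python
import json
import urllib.request

# Standard OpenAI-format request body for POST /v1/chat/completions.
body = json.dumps({
    "model": "my-custom-model",
    "messages": [{"role": "user", "content": "Hello!"}],
}).encode()

req = urllib.request.Request(
    "https://api.carrotlabs.ai/v1/chat/completions",
    data=body,
    headers={
        "Authorization": "Bearer sk-...",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send it; the response body is
# standard OpenAI-format JSON.
```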

Automatic tracing

Add the X-Carrot-Trace: true header to automatically capture traces from inference requests — no SDK needed:

client = OpenAI(
    base_url="https://api.carrotlabs.ai/v1",
    api_key="sk-...",
    default_headers={"X-Carrot-Trace": "true"},
)

See Tracing for more on how traces work.

Usage tracking

Every inference request is tracked with token counts, latency, and status. View your usage in the Dashboard.

Error responses

Status   Meaning
401      Invalid or missing API key
404      Model not found for your account
502      Inference provider temporarily unavailable
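A 502 is transient, so retrying with exponential backoff is usually appropriate. A generic sketch, where TransientError stands in for whatever exception your client raises on a 502 (e.g. openai.InternalServerError in the OpenAI Python SDK):

```python
import time

class TransientError(Exception):
    """Stand-in for your client's 502 / provider-unavailable error type."""

def with_retries(call, attempts=3, backoff=0.5, transient=(TransientError,)):
    # Retry transient failures, doubling the wait between attempts.
    # Non-transient errors (401, 404) propagate immediately.
    for i in range(attempts):
        try:
            return call()
        except transient:
            if i == attempts - 1:
                raise
            time.sleep(backoff * (2 ** i))
```

For example, with_retries(lambda: client.chat.completions.create(...), transient=(openai.InternalServerError,)) would retry a flaky request up to three times.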
