# Inference

Run your custom model through Carrot's OpenAI-compatible API.
Once Carrot Labs has built a custom model for you, use it through the Carrot API — an OpenAI-compatible endpoint that works with any OpenAI SDK.
## Base URL

```
https://api.carrotlabs.ai/v1
```

## Getting started

Point your OpenAI client at the Carrot API and use your custom model name:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.carrotlabs.ai/v1",
    api_key="sk-...",
)

response = client.chat.completions.create(
    model="my-custom-model",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

```bash
curl https://api.carrotlabs.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-custom-model",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

## Model names
The `model` field should be your Carrot model name — the name shown in the Models page of the dashboard.
If you see a 404 model not found error, check that the model name matches exactly what's shown in the Models page. Names are case-sensitive.
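Because a 404 is often just a casing mismatch, a small client-side check can surface a clearer error before the request is ever sent. A minimal sketch — `find_model` and the model names here are hypothetical, and the list of available names would come from your Models page:

```python
def find_model(requested: str, available: list[str]) -> str:
    """Return the model name if it matches exactly; otherwise raise a
    descriptive error, flagging case-only mismatches specially."""
    if requested in available:
        return requested
    for name in available:
        # A case-insensitive hit means the name exists but the casing is off.
        if name.lower() == requested.lower():
            raise ValueError(
                f"Model names are case-sensitive: did you mean {name!r}?"
            )
    raise ValueError(f"Unknown model {requested!r}")

# Example with placeholder names: an exact match passes through unchanged.
find_model("my-custom-model", ["my-custom-model"])
```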
## Streaming

Set `stream: true` to receive the response as it's generated:
```python
stream = client.chat.completions.create(
    model="my-custom-model",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

## Supported endpoints
| Endpoint | Description |
|---|---|
| `POST /v1/chat/completions` | Chat completions |
| `POST /v1/completions` | Text completions |
| `POST /v1/embeddings` | Embeddings |
All endpoints accept the standard OpenAI request format. See the API Reference for full details.
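To make "standard OpenAI request format" concrete, here are minimal request bodies for each of the three endpoints. The model name and input values are placeholders:

```python
import json

# /v1/chat/completions — a messages array
chat_body = {
    "model": "my-custom-model",
    "messages": [{"role": "user", "content": "Hello!"}],
}

# /v1/completions — a plain prompt string
completions_body = {
    "model": "my-custom-model",
    "prompt": "Hello!",
}

# /v1/embeddings — a string, or list of strings, to embed
embeddings_body = {
    "model": "my-custom-model",
    "input": ["Hello!", "A second string to embed."],
}

# Each body is POSTed as JSON to its endpoint with the same
# Authorization: Bearer header shown earlier.
print(json.dumps(embeddings_body, indent=2))
```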
## Automatic tracing

Add the `X-Carrot-Trace: true` header to automatically capture traces from inference requests — no SDK needed:
```python
client = OpenAI(
    base_url="https://api.carrotlabs.ai/v1",
    api_key="sk-...",
    default_headers={"X-Carrot-Trace": "true"},
)
```

See Tracing for more on how traces work.
## Usage tracking
Every inference request is tracked with token counts, latency, and status. View your usage in the Dashboard.
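The same token counts are also returned on each response via the standard OpenAI `usage` object, so you can log them client-side as well. A minimal sketch, assuming the standard chat completion response shape:

```python
def summarize_usage(response):
    """Pull token counts from an OpenAI-style chat completion response."""
    u = response.usage
    return {
        "prompt_tokens": u.prompt_tokens,
        "completion_tokens": u.completion_tokens,
        "total_tokens": u.total_tokens,
    }
```

You might call this after each request — for example, `summarize_usage(response)` on the result of `client.chat.completions.create(...)` — and feed the result to your own metrics pipeline.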
## Error responses
| Status | Meaning |
|---|---|
| 401 | Invalid or missing API key |
| 404 | Model not found for your account |
| 502 | Inference provider temporarily unavailable |
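Since a 502 is transient, it's usually worth retrying with backoff. A minimal sketch — this helper is not part of the Carrot API, and it assumes the raised exception carries a `status_code` attribute (as the OpenAI SDK's `APIStatusError` does):

```python
import time

def with_retries(call, max_attempts=3, base_delay=1.0):
    """Invoke `call`, retrying on 502 responses with exponential backoff.
    Any other error (401, 404, ...) is raised immediately."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            # Only retry transient 502s, and only while attempts remain.
            if status != 502 or attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Usage (client and arguments as in the examples above):
# with_retries(lambda: client.chat.completions.create(
#     model="my-custom-model",
#     messages=[{"role": "user", "content": "Hello!"}],
# ))
```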