Quick Start

API Documentation

Get started with the CLEX unified API in under 60 seconds. Access models through a single endpoint with provider routing handled by CLEX.

1 Set your base URL and API key

Environment Setup

# Base URL for all CLEX API calls
CLEX_BASE_URL="https://api.ai.clex.in/v1"

# Your CLEX API key (set as environment variable)
CLEX_API_KEY="clex_xxxxxxxxxxxxxxxxxxxx"

Note: https://api.ai.clex.in/v1 is the API base URL for your SDK or HTTP client. It is not a browser page. Open the dashboard to create keys and manage access.
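
As a minimal sketch of how an application might pick up these two settings, the snippet below reads them from an environment mapping and joins the base URL with an endpoint path. The helper names `load_clex_config` and `endpoint` are hypothetical and not part of any CLEX SDK. Falling back to the documented base URL when `CLEX_BASE_URL` is unset is also an assumption.

```python
import os

# Documented base URL, used here as a fallback (the fallback itself is an assumption).
DEFAULT_BASE_URL = "https://api.ai.clex.in/v1"


def load_clex_config(env=None):
    """Return (base_url, api_key) read from an environment-style mapping."""
    env = os.environ if env is None else env
    base_url = env.get("CLEX_BASE_URL", DEFAULT_BASE_URL).rstrip("/")
    api_key = env.get("CLEX_API_KEY", "")
    if not api_key:
        raise RuntimeError("CLEX_API_KEY is not set")
    return base_url, api_key


def endpoint(base_url, path):
    """Join the base URL and an API path such as 'chat/completions'."""
    return f"{base_url}/{path.lstrip('/')}"
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to exercise in tests without touching the real environment.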

2 Make your first API call

cURL

curl https://api.ai.clex.in/v1/chat/completions \
  -H "Authorization: Bearer $CLEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta/llama-3.3-70b-instruct",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Important: /v1/chat/completions expects a POST request with JSON. If you open it directly in a browser tab, it will not behave like a web page.

3 Integrate into your app

Terminal

# Install the OpenAI SDK if it is not already installed
npm install openai

# Run your script
CLEX_API_KEY="clex_xxx" node app.js

# Your app now sends its requests through the CLEX proxy

Authentication

CLEX uses CLEX API keys for authentication. The CLEX routing layer handles upstream provider credentials and normalization.

How it works

1. Create a CLEX API key from api.ai.clex.in
2. Set it as the CLEX_API_KEY environment variable
3. CLEX validates your key and routes requests to the selected provider

API authentication

// Send your CLEX API key as a Bearer token
Authorization: Bearer clex_xxxxxxxxxxxxxxxxxxxx
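
The header above could be assembled in client code roughly as follows. This is a sketch, not a CLEX-provided helper: `auth_headers` and `mask_key` are hypothetical names, and masking the key before logging is a general precaution rather than something the CLEX documentation prescribes.

```python
def auth_headers(api_key):
    """Build the request headers that carry the CLEX API key as a Bearer token."""
    key = api_key.strip()
    if not key:
        raise ValueError("empty CLEX API key")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }


def mask_key(api_key, visible=4):
    """Return a log-safe form of the key: only the last `visible` characters shown."""
    if len(api_key) <= visible:
        return "*" * len(api_key)
    return "*" * (len(api_key) - visible) + api_key[-visible:]
```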

Chat Completions

The primary endpoint for generating AI responses. Compatible with the OpenAI chat completions format.

POST /v1/chat/completions

On this documentation page, the API Explorer sends requests to the local /api/chat route, which proxies https://api.ai.clex.in/v1/chat/completions.

Use https://api.ai.clex.in/v1/chat/completions from code, cURL, Postman, or the OpenAI SDK. For browser navigation and API keys, go to api.ai.clex.in.

Request Body

Parameter	Type	Required	Description
model	string	Yes	CLEX model ID (e.g. `meta/llama-3.3-70b-instruct`)
messages	array	Yes	Array of message objects with `role` and `content`
temperature	number	No	Sampling temperature, 0–2. Default: 0.7
max_tokens	integer	No	Maximum tokens to generate. Default is model-specific when omitted.
stream	boolean	No	Enable streaming. Default: true
top_p	number	No	Nucleus sampling threshold, 0–1. Default: 0.9
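
To illustrate the table, here is a small sketch that builds a request body and checks the documented ranges before anything is sent. `build_chat_request` is a hypothetical helper. Rejecting an empty `messages` list and a non-positive `max_tokens` are assumptions about sensible input, not documented server rules. Optional parameters that are left out are omitted from the body so the server-side defaults apply.

```python
def build_chat_request(model, messages, temperature=None, max_tokens=None,
                       stream=None, top_p=None):
    """Return a JSON-serializable body for POST /v1/chat/completions."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages must be a non-empty list")
    for message in messages:
        if "role" not in message or "content" not in message:
            raise ValueError("each message needs 'role' and 'content'")

    body = {"model": model, "messages": list(messages)}
    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        body["temperature"] = temperature
    if top_p is not None:
        if not 0 <= top_p <= 1:
            raise ValueError("top_p must be between 0 and 1")
        body["top_p"] = top_p
    if max_tokens is not None:
        if max_tokens < 1:
            raise ValueError("max_tokens must be a positive integer")
        body["max_tokens"] = max_tokens
    if stream is not None:
        body["stream"] = bool(stream)
    return body
```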

Response Format

Responses follow the OpenAI-compatible format. When streaming is enabled (default), responses arrive as Server-Sent Events.

Non-streaming response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21
  }
}
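
A response of this shape can be unpacked with a few dictionary lookups. The sketch below assumes the body has already been parsed from JSON into a Python dict. `extract_reply` is a hypothetical helper name, and returning only the first choice is a simplification.

```python
def extract_reply(response):
    """Pull the assistant text, finish reason, and token total from a parsed response."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response has no choices")
    first = choices[0]
    usage = response.get("usage") or {}
    return {
        "text": first.get("message", {}).get("content", ""),
        "finish_reason": first.get("finish_reason"),
        "total_tokens": usage.get("total_tokens"),
    }
```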

Streaming

Streaming uses Server-Sent Events (SSE). Each event contains a JSON object with the incremental token. The stream is terminated by a data: [DONE] event.

Streaming event format

// Each SSE line:
data: {"choices":[{"delta":{"content":"Hello"}}]}

data: {"choices":[{"delta":{"content":" world"}}]}

data: [DONE]
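
The event format above can be decoded independently of any HTTP client. As a sketch, the generator below takes an iterable of lines (bytes or text) and yields each content fragment until it sees `[DONE]`. `iter_stream_text` is a hypothetical name. Skipping lines that do not start with `data:` is an assumption about how comments or keep-alive lines might appear.

```python
import json


def iter_stream_text(lines):
    """Yield content fragments from SSE lines, stopping at the [DONE] event."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators and non-data lines carry no tokens
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

For example, `"".join(iter_stream_text(response.iter_lines()))` would collect the full reply when `response` is a streaming `requests` response.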

Available Models

CLEX provides access to a catalog of models. Browse the full catalog on the Models page. Popular picks:

Model ID	Publisher	Context	Max Output	Pricing	Use Case

Code Examples

Python — requests

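import json
import os

import requests

url = "https://api.ai.clex.in/v1/chat/completions"

payload = {
    "model": "meta/llama-3.3-70b-instruct",
    "messages": [
        {"role": "user", "content": "What is quantum computing?"}
    ],
    "temperature": 0.7,
    "max_tokens": 1024,
    "stream": True,  # explicit, although streaming is the documented default
}

headers = {"Authorization": f"Bearer {os.environ['CLEX_API_KEY']}"}

# Streaming response; requests sets Content-Type: application/json when json= is used
with requests.post(url, json=payload, headers=headers, stream=True, timeout=60) as response:
    response.raise_for_status()  # surface 4xx/5xx errors before parsing the stream
    for line in response.iter_lines():
        if not line:
            continue  # skip the blank lines that separate SSE events
        line = line.decode("utf-8")
        if line.startswith("data: "):
            data = line[6:]
            if data == "[DONE]":
                break
            chunk = json.loads(data)
            choices = chunk.get("choices") or [{}]
            content = choices[0].get("delta", {}).get("content") or ""
            print(content, end="", flush=True)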

JavaScript — fetch

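// Requires Node.js 18+ (global fetch) and an ES module (top-level await)
const response = await fetch('https://api.ai.clex.in/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer ' + process.env.CLEX_API_KEY,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'meta/llama-3.3-70b-instruct',
    messages: [
      { role: 'user', content: 'What is quantum computing?' }
    ],
    stream: true
  })
});

if (!response.ok) {
  throw new Error(`CLEX request failed: ${response.status} ${await response.text()}`);
}

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = ''; // holds a partial line until the rest of it arrives

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // { stream: true } keeps multi-byte characters intact across chunk boundaries
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // the last element may be an incomplete line

  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const data = line.slice(6).trim();
    if (data === '[DONE]') continue;
    const parsed = JSON.parse(data);
    const token = parsed.choices?.[0]?.delta?.content;
    if (token) process.stdout.write(token);
  }
}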

cURL

# -N disables curl's output buffering so streamed tokens print as they arrive
curl https://api.ai.clex.in/v1/chat/completions \
  -N \
  -X POST \
  -H "Authorization: Bearer $CLEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta/llama-3.3-70b-instruct",
    "messages": [
      {
        "role": "user",
        "content": "Explain quantum computing in simple terms"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 1024,
    "stream": true
  }'

Error Handling

Errors return a JSON object with a single error field. That object carries a human-readable message, an error type, an error code, and the HTTP status.

JSON Error Format

{
  "error": {
    "message": "Model 'meta/llama-nonexistent' is not available. Check /v1/models for supported models.",
    "type": "upstream_error",
    "code": "model_not_found",
    "status": 404
  }
}

HTTP Status	Error Code	Cause & Resolution
400 Bad Request	`invalid_request_error`	Cause: Malformed JSON or invalid parameter. Fix: Validate the `messages` array structure and parameter types.
401 Unauthorized	`authentication_error`	Cause: Missing or invalid CLEX API key. Fix: Ensure the `CLEX_API_KEY` environment variable is set on the server.
404 Not Found	`model_not_found`	Cause: Requested model ID is incorrect or deprecated. Fix: Verify the exact model ID string against the catalog.
429 Rate Limit	`rate_limit_exceeded`	Cause: Exceeded the allowed quota or request rate. Fix: Pace your requests using exponential backoff. Wait a few seconds before retrying.
500 Server Error	`internal_server_error`	Cause: Unhandled backend exception or upstream provider failure. Fix: Retry the request after a short delay. If it persists, contact Support.
503 Unavailable	`service_unavailable`	Cause: Model is overloaded or under maintenance. Fix: Wait and retry, or gracefully fall back to a smaller model (e.g. Llama 8B).
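
One way to act on this table in client code is to normalize the error body and decide whether a retry makes sense. The sketch below treats 429, 500, and 503 as retryable because the table recommends waiting and retrying for them. `parse_error` and `RETRYABLE_STATUSES` are hypothetical names, and the `"unknown"` fallback code is an assumption for bodies that lack an `error` object.

```python
# Statuses for which the table above recommends waiting and retrying.
RETRYABLE_STATUSES = {429, 500, 503}


def parse_error(status, body):
    """Normalize an error response into a small dict with a retry hint."""
    err = body.get("error") if isinstance(body, dict) else None
    err = err if isinstance(err, dict) else {}
    return {
        "status": err.get("status", status),
        "code": err.get("code", "unknown"),
        "message": err.get("message", ""),
        "retryable": status in RETRYABLE_STATUSES,
    }
```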

Credits & quotas

Every /v1/chat/completions call charges a number of credits based on the model you call. Daily credit budgets reset at 00:00 UTC. The per-minute counter is a separate burst-protection limit and counts requests, not credits.

Plan	Credits / day	Requests / minute	API keys
Free	100	40	1
Starter	500	80	5
Pro	2,000	200	20
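
The plan limits above translate into two numbers a client can compute locally: how long until the daily budget resets at 00:00 UTC, and how far apart requests need to be spaced to stay under the per-minute limit. The following is a sketch with hypothetical names (`PLANS`, `seconds_until_reset`, `min_request_interval`). Spacing requests evenly is one possible pacing strategy, not a documented requirement.

```python
from datetime import datetime, timedelta, timezone

# Plan limits copied from the figures above.
PLANS = {
    "free": {"credits_per_day": 100, "requests_per_minute": 40},
    "starter": {"credits_per_day": 500, "requests_per_minute": 80},
    "pro": {"credits_per_day": 2000, "requests_per_minute": 200},
}


def seconds_until_reset(now=None):
    """Seconds from `now` (a timezone-aware datetime) to the next 00:00 UTC."""
    now = now or datetime.now(timezone.utc)
    now = now.astimezone(timezone.utc)
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now).total_seconds()


def min_request_interval(plan):
    """Even spacing, in seconds, that stays within the plan's per-minute limit."""
    return 60.0 / PLANS[plan]["requests_per_minute"]
```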

Per-model credit cost

Tier	Credits / call	Typical models
Cheap	1	≤ ~9B params, embedding, safety, rerank — Gemma-3N, Phi-4-Mini, Mistral-Small, Granite-3.3-8B
Standard	2	~10–35B + small MoE-A3B — Nemotron-Nano-30B-A3B, DeepSeek-R1-Distill-Qwen-32B, Mistral-Medium
Large	3	49–123B + 80B-A3B thinking — Qwen3.5-122B, GLM5, Kimi-K2-Thinking, Llama-3.3-Nemotron-Super-49B
Premium	5	200B+ MoE / 250B+ dense / multimodal — Kimi-K2.5, Qwen3-Coder-480B, DeepSeek-V3.2, Llama-3.1-Nemotron-Ultra-253B

The full machine-readable map is at /api/credits/pricing. Models not listed in the explicit map default to 1 credit per call.
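
As a worked example of the tier arithmetic, the sketch below estimates the credit cost of a mix of calls and the number of calls a daily budget allows. It works at tier level only. `TIER_COST`, `credits_for`, and `max_calls` are hypothetical names, and a real client would presumably look up per-model costs from /api/credits/pricing, whose response format is not described here.

```python
# Credits per call for each tier, from the table above.
TIER_COST = {"cheap": 1, "standard": 2, "large": 3, "premium": 5}
DEFAULT_COST = 1  # models outside the explicit map cost 1 credit per call


def credits_for(calls_by_tier):
    """Total credits for a mapping of tier name to number of calls."""
    return sum(
        TIER_COST.get(tier, DEFAULT_COST) * calls
        for tier, calls in calls_by_tier.items()
    )


def max_calls(daily_credits, tier):
    """How many calls to one tier fit into a daily credit budget."""
    return daily_credits // TIER_COST.get(tier, DEFAULT_COST)
```

Under these figures, a Free plan budget of 100 credits covers 20 Premium calls or 100 Cheap calls per day.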

💡 Best Practices

• Implement exponential backoff on 429 errors
• Read the X-Clex-Credits-Remaining header on every response to budget your daily quota
• Use streaming for long responses to improve perceived latency
• Pick the smallest model that does the job: the Cheap and Standard tiers cost 1–2 credits per call, versus 5 for Premium
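
The first two practices can be sketched in a few lines. The backoff below uses the "full jitter" variant (a random delay between zero and an exponentially growing ceiling), which is one common choice rather than anything CLEX mandates. The header reader assumes `X-Clex-Credits-Remaining` carries a plain integer, which is not confirmed by the text above. Both function names are hypothetical.

```python
import random


def backoff_delays(retries, base=1.0, cap=30.0, rng=None):
    """Full-jitter exponential backoff: one random delay (seconds) per retry."""
    rng = rng or random.Random()
    return [
        rng.uniform(0, min(cap, base * 2 ** attempt))
        for attempt in range(retries)
    ]


def credits_remaining(headers):
    """Read X-Clex-Credits-Remaining (case-insensitive); None if absent or not an integer."""
    for name, value in headers.items():
        if name.lower() == "x-clex-credits-remaining":
            try:
                return int(value)
            except (TypeError, ValueError):
                return None
    return None
```

A retry loop would sleep for each delay in turn after a 429 response and stop as soon as a request succeeds.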

Tutorials & Guides

Kickstart your production workflow with these step-by-step onboarding guides.

Build a Chatbot UI

Learn to wire up vanilla JS and standard HTML to stream tokens back to the client.


Agentic Workflows

Use DeepSeek or Llama 3.3 for multi-step reasoning and tool-calling flows.


RAG Setup

Build retrieval pipelines using CLEX embedding endpoints alongside conversational LLMs.


API Changelog & Versioning

CLEX endpoints follow semantic versioning. Breaking changes will always be introduced under a new version prefix (e.g. /v2/chat/completions). Below are recent platform updates.

March 2026

Interactive Documentation

Released the new API Explorer inside our documentation, a standard metadata view in our catalog, and an expanded Error Reference.

February 2026

Extended Model Catalog

Added support for Mistral Large 3, DeepSeek R1 distillation variants, and Meta Llama 4 Scout. All endpoints maintained full backward compatibility.