Quick Start

API Documentation

Get started with the CLEX unified API in under 60 seconds. Access models through a single endpoint with provider routing handled by CLEX.

1. Set your base URL and API key

Environment Setup
# Base URL for all CLEX API calls
CLEX_BASE_URL="https://api.ai.clex.in/v1"

# Your CLEX API key (set as environment variable)
CLEX_API_KEY="clex_xxxxxxxxxxxxxxxxxxxx"

Note: https://api.ai.clex.in/v1 is the API base URL for your SDK or HTTP client. It is not a browser page. Open the dashboard to create keys and manage access.
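As a minimal sketch, the two settings above could be read at start-up like this. The helper name `load_clex_config` is hypothetical (it is not part of any CLEX SDK), and falling back to the documented base URL when `CLEX_BASE_URL` is unset is an assumption of this example.

```python
import os

DEFAULT_BASE_URL = "https://api.ai.clex.in/v1"  # base URL documented above


def load_clex_config(env=None):
    """Return (base_url, api_key) from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumption: use the documented base URL when CLEX_BASE_URL is not set.
    base_url = env.get("CLEX_BASE_URL", DEFAULT_BASE_URL).rstrip("/")
    api_key = env.get("CLEX_API_KEY", "")
    if not api_key:
        # Fail early instead of sending an unauthenticated request.
        raise RuntimeError("CLEX_API_KEY is not set")
    return base_url, api_key
```

Calling `load_clex_config()` with no arguments reads from the real process environment; passing a dict makes the helper easy to exercise in tests.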

2. Make your first API call

cURL
curl https://api.ai.clex.in/v1/chat/completions \
  -H "Authorization: Bearer $CLEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta/llama-3.3-70b-instruct",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Important: /v1/chat/completions expects a POST request with JSON. If you open it directly in a browser tab, it will not behave like a web page.
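For readers who prefer Python over cURL, the same request might look roughly like the sketch below, using only the standard library. The function names `build_chat_request` and `send` are illustrative. Setting `"stream": False` is a choice made here so that the reply arrives as one JSON body rather than as a stream.

```python
import json
import urllib.request

CHAT_URL = "https://api.ai.clex.in/v1/chat/completions"


def build_chat_request(api_key, prompt, model="meta/llama-3.3-70b-instruct"):
    """Build (but do not send) the POST request from the cURL example."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Ask for a single JSON response instead of a Server-Sent Events stream.
        "stream": False,
    }
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A typical call would be `send(build_chat_request(os.environ["CLEX_API_KEY"], "Hello!"))`, after importing `os`.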

3. Integrate into your app

Terminal
# Install the OpenAI SDK if it is not already installed
npm install openai

# Run your script
CLEX_API_KEY="clex_xxx" node app.js

# Your app now sends its requests through the CLEX proxy

Authentication

CLEX authenticates every request with a CLEX API key. The CLEX routing layer handles upstream provider credentials and normalization, so you never send provider keys yourself.

How it works

1. Create a CLEX API key from api.ai.clex.in
2. Set it as the CLEX_API_KEY environment variable
3. CLEX validates your key and routes requests to the selected provider

API authentication
// Send your CLEX API key as a Bearer token
Authorization: Bearer clex_xxxxxxxxxxxxxxxxxxxx

Chat Completions

The primary endpoint for generating AI responses. Compatible with the OpenAI chat completions format.

POST /v1/chat/completions

On this docs page, the API Explorer calls the local /api/chat route, which proxies to https://api.ai.clex.in/v1/chat/completions.

Use https://api.ai.clex.in/v1/chat/completions from code, cURL, Postman, or the OpenAI SDK. For browser navigation and API keys, go to api.ai.clex.in.

Request Body

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | CLEX model ID (e.g. meta/llama-3.3-70b-instruct) |
| messages | array | Yes | Array of message objects with role and content |
| temperature | number | No | Sampling temperature, 0–2. Default: 0.7 |
| max_tokens | integer | No | Maximum tokens to generate. Default is model-specific when omitted. |
| stream | boolean | No | Enable streaming. Default: true |
| top_p | number | No | Nucleus sampling threshold, 0–1. Default: 0.9 |
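The parameter list above can be turned into a small client-side check before a request is sent. This is only a sketch: `build_payload` is a hypothetical helper, the range checks mirror the documented bounds, and the rules that `messages` must be non-empty and `max_tokens` positive are assumptions rather than documented behavior.

```python
def build_payload(model, messages, temperature=None, max_tokens=None,
                  stream=None, top_p=None):
    """Assemble a request body, checking the documented parameter ranges."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        # Assumption: an empty messages array is treated as invalid.
        raise ValueError("messages must be a non-empty list")
    for msg in messages:
        if "role" not in msg or "content" not in msg:
            raise ValueError("each message needs 'role' and 'content'")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    if max_tokens is not None and max_tokens < 1:
        # Assumption: max_tokens must be a positive integer.
        raise ValueError("max_tokens must be a positive integer")

    payload = {"model": model, "messages": list(messages)}
    # Leave optional fields out when unset so the server-side defaults apply.
    optional = {"temperature": temperature, "max_tokens": max_tokens,
                "stream": stream, "top_p": top_p}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

Omitting unset fields keeps the request small and lets the documented defaults (for example `temperature` 0.7) take effect on the server.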

Response Format

Responses follow the OpenAI-compatible format. When streaming is enabled (the default), responses arrive as Server-Sent Events; with stream set to false, the response is a single JSON object.

Non-streaming response
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21
  }
}
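
A short sketch of pulling the useful fields out of a non-streaming response like the one above. The helper name `extract_reply` is illustrative; it relies only on the fields shown in the sample.

```python
import json


def extract_reply(response):
    """Return (content, finish_reason, total_tokens) from a chat completion.

    Accepts either the raw JSON text or an already-decoded dict.
    """
    data = json.loads(response) if isinstance(response, (str, bytes)) else response
    choice = data["choices"][0]          # first (and here only) completion
    usage = data.get("usage", {})        # token accounting, if present
    return (
        choice["message"]["content"],
        choice.get("finish_reason"),
        usage.get("total_tokens"),
    )
```

Applied to the sample response above, this returns the assistant text, `"stop"`, and `21`.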

Streaming

Streaming uses Server-Sent Events (SSE). Each event contains a JSON object with the incremental token. The stream is terminated by a data: [DONE] event.

Streaming event format
// Each SSE line:
data: {"choices":[{"delta":{"content":"Hello"}}]}

data: {"choices":[{"delta":{"content":" world"}}]}

data: [DONE]
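
The event format above can be decoded with a small generator, sketched here with a hypothetical helper named `iter_sse_content`. It accepts any iterable of lines (text or bytes), so it works with the output of an HTTP client's line iterator as well as with a plain list.

```python
import json


def iter_sse_content(lines):
    """Yield the incremental text from each SSE data line until [DONE]."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not data
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream marker
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if choices:
            text = choices[0].get("delta", {}).get("content")
            if text:
                yield text
```

Joining the yielded pieces reconstructs the full reply; for the three events shown above that is `Hello world`.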

Available Models

CLEX provides access to a catalog of models. Browse the full catalog on the Models page, where each entry lists the model ID, publisher, context length, maximum output, pricing, and use case.

Code Examples

Python — requests
import json
import os

import requests

url = "https://api.ai.clex.in/v1/chat/completions"

payload = {
    "model": "meta/llama-3.3-70b-instruct",
    "messages": [
        {"role": "user", "content": "What is quantum computing?"}
    ],
    "temperature": 0.7,
    "max_tokens": 1024,
    "stream": True,  # explicit, although streaming is the documented default
}

headers = {"Authorization": f"Bearer {os.environ['CLEX_API_KEY']}"}

# Stream the response and print tokens as they arrive
response = requests.post(url, json=payload, headers=headers, stream=True)
response.raise_for_status()  # surface 4xx/5xx errors before parsing the stream

for line in response.iter_lines():
    if line:
        line = line.decode("utf-8")
        if line.startswith("data: "):
            data = line[6:]
            if data == "[DONE]":
                break
            chunk = json.loads(data)
            # A delta may omit content, so fall back to an empty string
            content = chunk["choices"][0]["delta"].get("content") or ""
            print(content, end="", flush=True)

Error Handling

Errors return a standardized JSON object whose error field carries a human-readable message, an error type, an error code, and the HTTP status.

JSON Error Format
{
  "error": {
    "message": "Model 'meta/llama-nonexistent' is not available. Check /v1/models for supported models.",
    "type": "upstream_error",
    "code": "provider_error",
    "status": 404
  }
}

| HTTP Status | Error Code | Cause & Resolution |
|---|---|---|
| 400 Bad Request | invalid_request_error | Cause: Malformed JSON or invalid parameter. Fix: Validate the messages array structure and parameter types. |
| 401 Unauthorized | authentication_error | Cause: Missing or invalid CLEX API key. Fix: Ensure the CLEX_API_KEY environment variable is set on the server. |
| 404 Not Found | model_not_found | Cause: Requested model ID is incorrect or deprecated. Fix: Verify the exact model ID string against the catalog. |
| 429 Rate Limit | rate_limit_exceeded | Cause: Exceeded allowed quota or request velocity. Fix: Retry with exponential backoff, waiting a few seconds before the first retry. |
| 500 Server Error | internal_server_error | Cause: Unhandled backend exception or upstream provider failure. Fix: Retry the request after a short delay. If it persists, check Support. |
| 503 Unavailable | service_unavailable | Cause: Model is overloaded or under maintenance. Fix: Wait and retry, or gracefully fall back to a smaller model (e.g. Llama 8B). |
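
The retry advice above can be sketched as a few small helpers. Everything here is illustrative: the names, the choice of 429, 500, and 503 as retryable (taken from the "retry" fixes listed above), and the base delay, cap, and jitter values are assumptions, not CLEX recommendations.

```python
import json
import random

# Statuses for which the error reference above suggests retrying.
RETRYABLE_STATUSES = {429, 500, 503}


def parse_error(body):
    """Return (message, code, status) from the documented error object."""
    err = json.loads(body).get("error", {})
    return err.get("message"), err.get("code"), err.get("status")


def should_retry(status):
    """True when a failed request is worth retrying after a delay."""
    return status in RETRYABLE_STATUSES


def backoff_delays(max_retries=5, base=1.0, cap=30.0, jitter=True):
    """Yield exponentially growing delays in seconds, capped at `cap`."""
    for attempt in range(max_retries):
        delay = min(cap, base * 2 ** attempt)
        # Full jitter spreads retries out so clients do not retry in lockstep.
        yield random.uniform(0, delay) if jitter else delay
```

A caller would loop over `backoff_delays()`, sleep for each delay with `time.sleep`, and stop as soon as a request succeeds or `should_retry` returns false.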

Credits & quotas

Every /v1/chat/completions call charges a number of credits based on the model you call. Daily credit budgets reset at 00:00 UTC. The per-minute counter is a separate burst-protection limit and counts requests, not credits.

| Plan | Credits / day | Requests / minute | API keys |
|---|---|---|---|
| Free | 100 | 40 | 1 |
| Starter | 500 | 80 | 5 |
| Pro | 2,000 | 200 | 20 |

Per-model credit cost

| Tier | Credits / call | Typical models |
|---|---|---|
| Cheap | 1 | ≤ ~9B params, embedding, safety, rerank: Gemma-3N, Phi-4-Mini, Mistral-Small, Granite-3.3-8B |
| Standard | 2 | ~10–35B + small MoE-A3B: Nemotron-Nano-30B-A3B, DeepSeek-R1-Distill-Qwen-32B, Mistral-Medium |
| Large | 3 | 49–123B + 80B-A3B thinking: Qwen3.5-122B, GLM5, Kimi-K2-Thinking, Llama-3.3-Nemotron-Super-49B |
| Premium | 5 | 200B+ MoE / 250B+ dense / multimodal: Kimi-K2.5, Qwen3-Coder-480B, DeepSeek-V3.2, Llama-3.1-Nemotron-Ultra-253B |

The full machine-readable map is at /api/credits/pricing. Models not listed in the explicit map default to 1 credit per call.
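
To make the arithmetic concrete: a daily budget divided by the per-call cost gives the number of calls available per day, so 100 Free credits at 5 credits per Premium call allow 20 calls. The sketch below hard-codes the plan and tier figures listed above; the dictionary and function names are illustrative, and the per-minute request limit still applies separately.

```python
# Daily credit budgets and per-call costs as listed in this section.
DAILY_CREDITS = {"free": 100, "starter": 500, "pro": 2000}
CREDITS_PER_CALL = {"cheap": 1, "standard": 2, "large": 3, "premium": 5}


def calls_per_day(plan, tier):
    """Whole number of calls a plan's daily budget covers for a model tier."""
    return DAILY_CREDITS[plan] // CREDITS_PER_CALL[tier]


if __name__ == "__main__":
    for plan in DAILY_CREDITS:
        row = {tier: calls_per_day(plan, tier) for tier in CREDITS_PER_CALL}
        print(plan, row)
```

For a live integration, the machine-readable map at /api/credits/pricing would be a better source than hard-coded values.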

💡 Best Practices

  • Implement exponential backoff on 429 errors
  • Read the X-Clex-Credits-Remaining header on every response to budget your daily quota
  • Use streaming for long responses to improve perceived latency
  • Pick the smallest model that does the job: the Cheap and Standard tiers cost 1–2 credits per call, versus 5 for Premium
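
As a rough illustration of the header-based budgeting suggested above, the sketch below reads X-Clex-Credits-Remaining from a response's headers. The header name comes from this page; that its value is a plain integer is an assumption, and both helper names are hypothetical.

```python
def credits_remaining(headers):
    """Return the remaining daily credits, or None if unavailable.

    `headers` is any mapping of header names to values. Names are compared
    case-insensitively, as HTTP header names are not case-sensitive.
    """
    for name, value in headers.items():
        if name.lower() == "x-clex-credits-remaining":
            try:
                return int(value)  # assumption: the value is an integer
            except (TypeError, ValueError):
                return None
    return None


def can_afford(headers, cost):
    """True if the last known balance covers a call costing `cost` credits."""
    remaining = credits_remaining(headers)
    # With no usable header, allow the call and let the API decide.
    return remaining is None or remaining >= cost
```

Checking `can_afford(last_headers, 5)` before a Premium call is one way to fall back to a cheaper tier before the daily budget runs out.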

Tutorials & Guides

Kickstart your production workflow with these step-by-step onboarding guides.

API Changelog & Versioning

CLEX endpoints follow semantic versioning. Breaking changes will always be introduced under a new version prefix (e.g. /v2/chat/completions). Below are recent platform updates.

Interactive Documentation

Released the new API Explorer inside our documentation, a standard metadata view in our catalog, and an expanded Error Reference.

Extended Model Catalog

Added support for Mistral Large 3, DeepSeek R1 distillation variants, and Meta Llama 4 Scout. All endpoints maintained full backward compatibility.