# RouteLLM Documentation
A complete guide to integrating with RouteLLM, the intelligent AI model routing platform that cuts costs and improves performance.
## Overview
RouteLLM is an intelligent AI model routing service that automatically selects the optimal AI model for each request based on complexity, cost, and performance requirements. By using RouteLLM, you can reduce AI costs by up to 85% while maintaining or improving quality.
### Key Features
- **Smart Model Selection**: Advanced prompt analysis routes requests to the most suitable model
- **Cost Optimization**: Automatically uses cheaper models for simple tasks
- **OpenAI Compatible**: Drop-in replacement for the OpenAI API
- **Real-time Analytics**: Detailed insights into usage and performance
- **Enterprise Security**: SOC 2 compliant with end-to-end encryption
### Supported Models
- OpenAI (GPT-4, GPT-3.5-turbo)
- Anthropic Claude (Claude-3 Opus, Sonnet, Haiku)
- Google Gemini (Gemini Pro, Ultra)
- Cohere (Command, Command Light)
- And more...
## Quick Start
### Get Started in 3 Steps
Replace your existing OpenAI integration with RouteLLM in under 5 minutes:
1. Sign up at routellm.dev and get your API key.
2. Replace your API endpoint.
3. Start saving costs immediately.
### Installation
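RouteLLM is a drop-in replacement for the OpenAI API, so no dedicated SDK is required. If you work in Python, the standard OpenAI client is enough: `pip install openai` (any HTTP client, such as `requests`, also works).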
### Basic Usage
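A minimal sketch using the OpenAI Python SDK. The base URL `https://api.routellm.dev/v1` and the model name `auto` are illustrative assumptions, not confirmed values; use the endpoint and model identifiers from your dashboard.

```python
import os
from openai import OpenAI

# Point the OpenAI client at RouteLLM instead of api.openai.com.
# The base URL and model name below are assumptions for illustration.
client = OpenAI(
    api_key=os.environ["ROUTELLM_API_KEY"],  # never hard-code keys
    base_url="https://api.routellm.dev/v1",  # hypothetical RouteLLM endpoint
)

response = client.chat.completions.create(
    model="auto",  # hypothetical alias: let RouteLLM pick the optimal model
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet."}],
)
print(response.choices[0].message.content)
```

Because the API is OpenAI-compatible, swapping the base URL and API key is the only change an existing integration needs.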
## Authentication
RouteLLM uses API keys for authentication. You can manage your API keys from the dashboard.
### Getting Your API Key
1. Sign up at routellm.dev.
2. Go to your dashboard.
3. Click "Generate New Key" in the API Keys section.
4. Copy and securely store your API key.
### Using API Keys
Include your API key in the `Authorization` header with every request:
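For example, with the `requests` library (the endpoint path is a placeholder; see the Smart Routing API section):

```python
import os
import requests

api_key = os.environ["ROUTELLM_API_KEY"]  # stored as an environment variable

resp = requests.post(
    "https://api.routellm.dev/v1/route",  # hypothetical endpoint path
    headers={
        "Authorization": f"Bearer {api_key}",  # API key goes in this header
        "Content-Type": "application/json",
    },
    json={"prompt": "Hello, RouteLLM!"},
)
print(resp.status_code)
```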
Security Best Practices:
- Never expose API keys in client-side code
- Store keys as environment variables
- Rotate keys regularly
- Use different keys for development and production
- Set appropriate expiration dates
## Rate Limits
RouteLLM implements rate limiting to ensure fair usage and system stability. Limits are applied per API key.
| Limit Type | Free Tier | Pro Tier | Enterprise |
|---|---|---|---|
| Requests per minute | 100 | 1,000 | Custom |
| Requests per day | 1,000 | 10,000 | Unlimited |
| Monthly spend limit | $10 | $100 | Custom |
### Rate Limit Headers
Every API response includes rate limit information in the headers:
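The exact header names aren't reproduced here; this sketch assumes the conventional `X-RateLimit-*` naming that many APIs use:

```python
import os
import requests

resp = requests.get(
    "https://api.routellm.dev/v1/models",  # any authenticated endpoint works
    headers={"Authorization": f"Bearer {os.environ['ROUTELLM_API_KEY']}"},
)

# Header names follow the common X-RateLimit-* convention (assumed, not confirmed).
print(resp.headers.get("X-RateLimit-Limit"))      # requests allowed per window
print(resp.headers.get("X-RateLimit-Remaining"))  # requests left in the window
print(resp.headers.get("X-RateLimit-Reset"))      # when the window resets (Unix time)
```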
### Handling Rate Limits
When you exceed rate limits, you'll receive a `429 Too Many Requests` response. Implement exponential backoff in your applications:
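A minimal sketch; the `Retry-After` header is a common convention and is treated as an assumption here:

```python
import time
import requests

def post_with_backoff(url, headers, payload, max_retries=5):
    """Retry on HTTP 429 with exponential backoff (sketch, not production-hardened)."""
    for attempt in range(max_retries):
        resp = requests.post(url, headers=headers, json=payload)
        if resp.status_code != 429:
            return resp
        # Honor Retry-After if the server sends it (assumed), else back off 1s, 2s, 4s, ...
        delay = float(resp.headers.get("Retry-After", 2 ** attempt))
        time.sleep(delay)
    raise RuntimeError("Rate limited: retries exhausted")
```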
## Smart Routing API
The smart routing endpoint analyzes your prompt and automatically selects the optimal AI model based on complexity, cost, and performance requirements.
### Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| `prompt` | string | Required | The text prompt to analyze and route |
| `max_tokens` | integer | Optional | Maximum tokens in the response (default: 150) |
| `temperature` | number | Optional | Sampling temperature 0-1 (default: 0.7) |
| `user_preferences` | object | Optional | User preferences for cost vs quality balance |
### Example Request
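A sketch of a smart-routing call using the parameters documented above (the `/v1/route` path is an assumption):

```python
import os
import requests

resp = requests.post(
    "https://api.routellm.dev/v1/route",  # hypothetical endpoint path
    headers={"Authorization": f"Bearer {os.environ['ROUTELLM_API_KEY']}"},
    json={
        "prompt": "Explain quantum entanglement to a high-school student.",
        "max_tokens": 300,
        "temperature": 0.7,
        "user_preferences": {"priority": "cost"},  # illustrative structure
    },
)
print(resp.json())
```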
### Example Response
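The real response schema isn't reproduced here; the dictionary below only illustrates the kind of fields to expect (every field name is an assumption):

```python
# Illustrative only -- field names and values are assumptions, not the real schema.
example_response = {
    "request_id": "req_abc123",  # worth logging (see Error Handling)
    "model": "claude-3-haiku",   # the model RouteLLM selected
    "routing_reason": "simple explanatory prompt; cheaper model sufficient",
    "completion": "Quantum entanglement is...",
    "usage": {"prompt_tokens": 12, "completion_tokens": 214},
}
```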
## Model Discovery
Get information about all available AI models, including pricing, capabilities, and performance metrics.
### Example Response
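A sketch of querying the model catalog, with an assumed response shape in the comments (the endpoint path and field names are not confirmed):

```python
import os
import requests

resp = requests.get(
    "https://api.routellm.dev/v1/models",  # hypothetical endpoint path
    headers={"Authorization": f"Bearer {os.environ['ROUTELLM_API_KEY']}"},
)

# Assumed response shape, for illustration only:
# {
#   "models": [
#     {"id": "gpt-4", "provider": "openai",
#      "pricing": {"input_per_1k": 0.03, "output_per_1k": 0.06},
#      "capabilities": ["reasoning", "code"], "avg_latency_ms": 1200},
#     ...
#   ]
# }
print(resp.json())
```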
## Error Handling
RouteLLM uses standard HTTP status codes and provides detailed error information to help you debug issues.
### Common Error Codes
| Status Code | Error Type | Description |
|---|---|---|
| 400 | Bad Request | Invalid request format or missing required parameters |
| 401 | Unauthorized | Invalid or missing API key |
| 429 | Rate Limited | Too many requests - rate limit exceeded |
| 500 | Server Error | Internal server error - try again later |
### Error Response Format
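The exact format isn't shown here; based on the fields this section references (status codes, error types, and `request_id`), an error body plausibly looks like this:

```python
# Illustrative only -- structure assumed from the fields mentioned above.
example_error = {
    "error": {
        "type": "rate_limited",
        "code": 429,
        "message": "Too many requests - rate limit exceeded",
        "request_id": "req_abc123",  # include this in support requests
    }
}
```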
### Best Practices
- Always check response status codes
- Implement proper error handling and retries
- Log the `request_id` from each response to speed up debugging and support requests
- Use exponential backoff for 429 errors
## Monitoring & Analytics
RouteLLM provides comprehensive monitoring and analytics through the dashboard and API endpoints.
### Dashboard Analytics
Visit your dashboard to monitor:
- Real-time cost savings
- Request volume and success rates
- Model usage distribution
- Performance metrics and latency
- Rate limit consumption
### Programmatic Monitoring
Get your usage statistics and performance metrics programmatically.
### Example Response
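A sketch of fetching usage statistics (the `/v1/usage` path and all response fields are assumptions):

```python
import os
import requests

resp = requests.get(
    "https://api.routellm.dev/v1/usage",  # hypothetical endpoint path
    headers={"Authorization": f"Bearer {os.environ['ROUTELLM_API_KEY']}"},
)

# Assumed response shape, for illustration only:
# {
#   "period": "2024-06",
#   "requests": 8421,
#   "success_rate": 0.998,
#   "total_cost_usd": 41.37,
#   "estimated_savings_usd": 96.15,
#   "model_distribution": {"gpt-3.5-turbo": 0.62, "claude-3-haiku": 0.31, "gpt-4": 0.07}
# }
print(resp.json())
```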
### Webhooks
Configure webhooks to receive real-time notifications about your usage (a minimal receiver sketch follows this list):
- Rate limit warnings (80% threshold reached)
- Monthly spend alerts
- API key expiration notices
- Service status updates
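The receiver below uses Flask. The framework choice, route, and payload fields (`type`, `data`) are assumptions; RouteLLM's actual webhook format may differ, and a production receiver should also verify any signature header the service provides.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/routellm-webhook", methods=["POST"])
def handle_webhook():
    event = request.get_json(force=True)
    # "type" and "data" are assumed field names, for illustration only.
    if event.get("type") == "rate_limit.warning":
        print("80% of a rate limit consumed:", event.get("data"))
    elif event.get("type") == "spend.alert":
        print("Monthly spend alert:", event.get("data"))
    return jsonify({"received": True}), 200

if __name__ == "__main__":
    app.run(port=8080)
```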