Overview #

RouteLLM is an intelligent AI model routing service that automatically selects the optimal AI model for each request based on complexity, cost, and performance requirements. By using RouteLLM, you can reduce AI costs by up to 85% while maintaining or improving quality.

Key Features

  • Smart Model Selection: Advanced prompt analysis routes requests to the most suitable model
  • Cost Optimization: Automatically uses cheaper models for simple tasks
  • OpenAI Compatible: Drop-in replacement for OpenAI API
  • Real-time Analytics: Detailed insights into usage and performance
  • Enterprise Security: SOC 2 compliant with end-to-end encryption

Supported Models

  • OpenAI (GPT-4, GPT-3.5-turbo)
  • Anthropic Claude (Claude-3 Opus, Sonnet, Haiku)
  • Google Gemini (Gemini Pro, Ultra)
  • Cohere (Command, Command Light)
  • And more...

Quick Start #

Get Started in 3 Steps

Replace your existing OpenAI integration with RouteLLM in under 5 minutes.

  1. Sign up at routellm.dev and get your API key
  2. Replace your API endpoint
  3. Start cutting costs immediately!

Installation

Python (pip):

# Install the OpenAI library (RouteLLM is compatible)
pip install openai

# Or use our enhanced Python SDK
pip install routellm

JavaScript (npm):

// Install via npm
npm install openai

// Or use our enhanced JavaScript SDK
npm install @routellm/client

cURL:

# No installation needed - just use cURL
curl -X POST https://api.routellm.dev/api/v2/recommend/smart \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json"

Basic Usage

Python:

import openai

# Simply change the base URL and API key
openai.api_base = "https://api.routellm.dev/v1"
openai.api_key = "rllm_your_api_key_here"

# Use it exactly like the OpenAI API
response = openai.ChatCompletion.create(
    model="smart-route",  # Let RouteLLM choose the best model
    messages=[
        {"role": "user", "content": "Write a Python function to calculate fibonacci numbers"}
    ]
)

print(response.choices[0].message.content)

JavaScript:

import OpenAI from 'openai';

// Initialize with the RouteLLM endpoint
const openai = new OpenAI({
  baseURL: 'https://api.routellm.dev/v1',
  apiKey: 'rllm_your_api_key_here'
});

// Use it exactly like the OpenAI API
const completion = await openai.chat.completions.create({
  model: 'smart-route',  // Let RouteLLM choose
  messages: [
    { role: 'user', content: 'Explain quantum computing' }
  ]
});

console.log(completion.choices[0].message.content);

cURL:

curl -X POST https://api.routellm.dev/v1/chat/completions \
  -H "Authorization: Bearer rllm_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "smart-route",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'

Authentication #

RouteLLM uses API keys for authentication. You can manage your API keys from the dashboard.

Getting Your API Key

  1. Sign up at routellm.dev
  2. Go to your dashboard
  3. Click "Generate New Key" in the API Keys section
  4. Copy and securely store your API key

Using API Keys

Include your API key in the Authorization header with every request:

curl -X POST https://api.routellm.dev/api/v2/recommend/smart \
  -H "Authorization: Bearer rllm_your_api_key_here" \
  -H "Content-Type: application/json"

Security Best Practices:

  • Never expose API keys in client-side code
  • Store keys as environment variables (see the sketch after this list)
  • Rotate keys regularly
  • Use different keys for development and production
  • Set appropriate expiration dates
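
For example, a minimal sketch of loading the key from an environment variable in Python; the ROUTELLM_API_KEY variable name is illustrative, not an official convention:

import os
import openai

# Read the key from the environment instead of hard-coding it.
# ROUTELLM_API_KEY is an illustrative name, not an official convention.
api_key = os.environ.get("ROUTELLM_API_KEY")
if api_key is None:
    raise RuntimeError("Set the ROUTELLM_API_KEY environment variable first")

openai.api_base = "https://api.routellm.dev/v1"
openai.api_key = api_key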

Rate Limits #

RouteLLM implements rate limiting to ensure fair usage and system stability. Limits are applied per API key.

Limit Type           Free Tier  Pro Tier  Enterprise
Requests per minute  100        1,000     Custom
Requests per day     1,000      10,000    Unlimited
Monthly spend limit  $10        $100      Custom

Rate Limit Headers

Every API response includes rate limit information in the headers:

X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 999
X-RateLimit-Reset: 1640995200
X-RateLimit-Retry-After: 60
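
For example, a short sketch that reads these headers with the requests library and pauses once the quota is exhausted (the endpoint used here is just for illustration):

import time
import requests

# Any authenticated request returns the headers listed above.
resp = requests.get(
    "https://api.routellm.dev/api/v2/models",
    headers={"Authorization": "Bearer rllm_your_api_key_here"},
    timeout=30,
)

remaining = int(resp.headers.get("X-RateLimit-Remaining", "0"))
reset_at = int(resp.headers.get("X-RateLimit-Reset", "0"))  # Unix epoch seconds

# Sleep until the window resets once no requests remain.
if remaining == 0:
    time.sleep(max(0, reset_at - time.time()))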

Handling Rate Limits

When you exceed rate limits, you'll receive a 429 Too Many Requests response. Implement exponential backoff in your applications:

import time
import random

from openai.error import RateLimitError  # for openai<1.0; newer SDKs expose openai.RateLimitError

def api_call_with_retry(func, max_retries=3):
    for attempt in range(max_retries):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Exponential backoff with jitter
            delay = (2 ** attempt) + random.uniform(0, 1)
            time.sleep(delay)
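
For example, wrapping the chat call from Basic Usage (this assumes the openai client is configured as shown in the Quick Start):

response = api_call_with_retry(
    lambda: openai.ChatCompletion.create(
        model="smart-route",
        messages=[{"role": "user", "content": "What is the capital of France?"}],
    )
)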

Smart Routing API #

POST /api/v2/recommend/smart (Stable)

The smart routing endpoint analyzes your prompt and automatically selects the optimal AI model based on complexity, cost, and performance requirements.

Request Body

Parameter         Type     Required  Description
prompt            string   Required  The text prompt to analyze and route
max_tokens        integer  Optional  Maximum tokens in the response (default: 150)
temperature       number   Optional  Sampling temperature 0-1 (default: 0.7)
user_preferences  object   Optional  User preferences for cost vs quality balance

Example Request

{ "prompt": "Write a Python function to implement a binary search algorithm", "max_tokens": 500, "temperature": 0.3, "user_preferences": { "cost_priority": "balanced", "quality_threshold": "high" } }

Example Response

{ "message": "Smart routing completed successfully", "status": "success", "recommended_model": "gpt-3.5-turbo", "estimated_cost": 0.002, "routing_reason": "Code generation task with medium complexity", "confidence_score": 0.92, "alternatives": [ { "model": "gpt-4", "cost": 0.06, "reason": "Higher quality option for complex code" } ], "classification": { "category": "code_generation", "complexity": "medium", "domain": "programming" } }

Model Discovery #

GET /api/v2/models (Stable)

Get information about all available AI models, including pricing, capabilities, and performance metrics.

Example Response

{ "models": [ { "name": "gpt-4", "provider": "openai", "cost_per_token": 0.00003, "context_length": 8192, "capabilities": ["text", "code", "reasoning"], "performance_score": 0.95 }, { "name": "claude-3-opus", "provider": "anthropic", "cost_per_token": 0.000015, "context_length": 200000, "capabilities": ["text", "reasoning", "analysis"], "performance_score": 0.93 } ], "count": 12, "last_updated": "2024-01-15T10:30:00Z" }

Error Handling #

RouteLLM uses standard HTTP status codes and provides detailed error information to help you debug issues.

Common Error Codes

Status Code  Error Type    Description
400          Bad Request   Invalid request format or missing required parameters
401          Unauthorized  Invalid or missing API key
429          Rate Limited  Too many requests - rate limit exceeded
500          Server Error  Internal server error - try again later

Error Response Format

{ "error": { "type": "invalid_request", "message": "The prompt parameter is required", "code": "missing_parameter", "param": "prompt" }, "request_id": "req_12345" }

Best Practices

  • Always check response status codes
  • Implement proper error handling and retries
  • Log the request_id from error responses for debugging and support requests (see the sketch after this list)
  • Use exponential backoff for 429 errors
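
As a sketch of these points combined, the handle_response helper below is illustrative, and the error body shape matches the format above:

import logging
import requests

logger = logging.getLogger("routellm")

def handle_response(resp: requests.Response) -> dict:
    # Return the parsed body on success; log request_id and raise on errors.
    if resp.ok:
        return resp.json()
    body = resp.json()
    logger.error(
        "RouteLLM error %s (%s), request_id=%s",
        resp.status_code,
        body.get("error", {}).get("code"),
        body.get("request_id"),
    )
    resp.raise_for_status()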

Monitoring & Analytics #

RouteLLM provides comprehensive monitoring and analytics through the dashboard and API endpoints.

Dashboard Analytics

Visit your dashboard to monitor:

  • Real-time cost savings
  • Request volume and success rates
  • Model usage distribution
  • Performance metrics and latency
  • Rate limit consumption

Programmatic Monitoring

GET /api/v1/dashboard/stats

Get your usage statistics and performance metrics programmatically.

Example Response

{ "totalSavings": 1250.50, "totalRequests": 15000, "avgLatency": 45, "successRate": 99.9, "rateLimits": { "hourlyUsed": 150, "hourlyLimit": 1000, "monthlySpent": 25.00, "monthlyLimit": 100.00 } }

Webhooks

Configure webhooks to receive real-time notifications about your usage (a sketch of a receiver follows the list):

  • Rate limit warnings (80% threshold reached)
  • Monthly spend alerts
  • API key expiration notices
  • Service status updates
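
The webhook payload schema is not documented here, so treat the following as an illustrative sketch only: a minimal Flask receiver in which the /routellm-webhook path and the event type field are assumptions, not part of the RouteLLM API.

from flask import Flask, request

app = Flask(__name__)

# Illustrative receiver; the payload shape is an assumption, not a documented schema.
@app.route("/routellm-webhook", methods=["POST"])
def routellm_webhook():
    event = request.get_json(silent=True) or {}
    if event.get("type") == "rate_limit_warning":  # hypothetical event type
        print("Approaching rate limit:", event)
    return "", 204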