# RouteLLM Documentation
A complete guide to integrating with RouteLLM, the intelligent AI model routing platform that cuts costs and improves performance.
## Overview
RouteLLM is an intelligent AI model routing service that automatically selects the optimal AI model for each request based on complexity, cost, and performance requirements. By using RouteLLM, you can reduce AI costs by up to 85% while maintaining or improving quality.
### Key Features
- **Smart Model Selection**: Advanced prompt analysis routes requests to the most suitable model
- **Cost Optimization**: Automatically uses cheaper models for simple tasks
- **OpenAI Compatible**: Drop-in replacement for the OpenAI API
- **Real-time Analytics**: Detailed insights into usage and performance
- **Enterprise Security**: SOC 2 compliant with end-to-end encryption
### Supported Models
- OpenAI (GPT-4, GPT-3.5-turbo)
- Anthropic Claude (Claude-3 Opus, Sonnet, Haiku)
- Google Gemini (Gemini Pro, Ultra)
- Cohere (Command, Command Light)
- And more...
## Quick Start
### Get Started in 3 Steps
Replace your existing OpenAI integration with RouteLLM in under 5 minutes:
1. Sign up at routellm.dev and get your API key.
2. Replace your API endpoint.
3. Start saving costs immediately.
### Installation
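RouteLLM is a drop-in replacement for the OpenAI API, so no dedicated SDK is required. If you work in Python, the standard OpenAI client is enough: `pip install openai` (any HTTP client, such as `requests`, also works).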
### Basic Usage
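A minimal sketch using the OpenAI Python SDK. The base URL `https://api.routellm.dev/v1` and the model name `auto` are illustrative assumptions, not confirmed values; use the endpoint and model identifiers from your dashboard.

```python
import os
from openai import OpenAI

# Point the OpenAI client at RouteLLM instead of api.openai.com.
# The base URL and model name below are assumptions for illustration.
client = OpenAI(
    api_key=os.environ["ROUTELLM_API_KEY"],  # never hard-code keys
    base_url="https://api.routellm.dev/v1",  # hypothetical RouteLLM endpoint
)

response = client.chat.completions.create(
    model="auto",  # hypothetical alias: let RouteLLM pick the optimal model
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet."}],
)
print(response.choices[0].message.content)
```

Because the API is OpenAI-compatible, swapping the base URL and API key is the only change an existing integration needs.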
## Authentication
RouteLLM uses API keys for authentication. You can manage your API keys from the dashboard.
### Getting Your API Key
1. Sign up at routellm.dev.
2. Go to your dashboard.
3. Click "Generate New Key" in the API Keys section.
4. Copy and securely store your API key.
### Using API Keys
Include your API key in the `Authorization` header with every request:
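For example, with the `requests` library (the endpoint path is a placeholder; see the Smart Routing API section):

```python
import os
import requests

api_key = os.environ["ROUTELLM_API_KEY"]  # stored as an environment variable

resp = requests.post(
    "https://api.routellm.dev/v1/route",  # hypothetical endpoint path
    headers={
        "Authorization": f"Bearer {api_key}",  # API key goes in this header
        "Content-Type": "application/json",
    },
    json={"prompt": "Hello, RouteLLM!"},
)
print(resp.status_code)
```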
Security Best Practices:
- Never expose API keys in client-side code
- Store keys as environment variables
- Rotate keys regularly
- Use different keys for development and production
- Set appropriate expiration dates
## Rate Limits
RouteLLM implements rate limiting to ensure fair usage and system stability. Limits are applied per API key.
| Limit Type | Free Tier | Pro Tier | Enterprise |
|---|---|---|---|
| Requests per minute | 100 | 1,000 | Custom |
| Requests per day | 1,000 | 10,000 | Unlimited |
| Monthly spend limit | $10 | $100 | Custom |
### Rate Limit Headers
Every API response includes rate limit information in the headers:
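The exact header names aren't reproduced here; this sketch assumes the conventional `X-RateLimit-*` naming that many APIs use:

```python
import os
import requests

resp = requests.get(
    "https://api.routellm.dev/v1/models",  # any authenticated endpoint works
    headers={"Authorization": f"Bearer {os.environ['ROUTELLM_API_KEY']}"},
)

# Header names follow the common X-RateLimit-* convention (assumed, not confirmed).
print(resp.headers.get("X-RateLimit-Limit"))      # requests allowed per window
print(resp.headers.get("X-RateLimit-Remaining"))  # requests left in the window
print(resp.headers.get("X-RateLimit-Reset"))      # when the window resets (Unix time)
```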
### Handling Rate Limits
When you exceed rate limits, you'll receive a `429 Too Many Requests` response. Implement exponential backoff in your applications:
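A minimal sketch; the `Retry-After` header is a common convention and is treated as an assumption here:

```python
import time
import requests

def post_with_backoff(url, headers, payload, max_retries=5):
    """Retry on HTTP 429 with exponential backoff (sketch, not production-hardened)."""
    for attempt in range(max_retries):
        resp = requests.post(url, headers=headers, json=payload)
        if resp.status_code != 429:
            return resp
        # Honor Retry-After if the server sends it (assumed), else back off 1s, 2s, 4s, ...
        delay = float(resp.headers.get("Retry-After", 2 ** attempt))
        time.sleep(delay)
    raise RuntimeError("Rate limited: retries exhausted")
```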
## Smart Routing API
The smart routing endpoint analyzes your prompt and automatically selects the optimal AI model based on complexity, cost, and performance requirements.
### Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| `prompt` | string | Required | The text prompt to analyze and route |
| `max_tokens` | integer | Optional | Maximum tokens in the response (default: 150) |
| `temperature` | number | Optional | Sampling temperature 0-1 (default: 0.7) |
| `user_preferences` | object | Optional | User preferences for cost vs quality balance |
### Example Request
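A sketch of a smart-routing call using the parameters documented above (the `/v1/route` path is an assumption):

```python
import os
import requests

resp = requests.post(
    "https://api.routellm.dev/v1/route",  # hypothetical endpoint path
    headers={"Authorization": f"Bearer {os.environ['ROUTELLM_API_KEY']}"},
    json={
        "prompt": "Explain quantum entanglement to a high-school student.",
        "max_tokens": 300,
        "temperature": 0.7,
        "user_preferences": {"priority": "cost"},  # illustrative structure
    },
)
print(resp.json())
```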
### Example Response
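The real response schema isn't reproduced here; the dictionary below only illustrates the kind of fields to expect (every field name is an assumption):

```python
# Illustrative only -- field names and values are assumptions, not the real schema.
example_response = {
    "request_id": "req_abc123",  # worth logging (see Error Handling)
    "model": "claude-3-haiku",   # the model RouteLLM selected
    "routing_reason": "simple explanatory prompt; cheaper model sufficient",
    "completion": "Quantum entanglement is...",
    "usage": {"prompt_tokens": 12, "completion_tokens": 214},
}
```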
## Model Discovery
Get information about all available AI models, including pricing, capabilities, and performance metrics.
### Example Response
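A sketch of querying the model catalog, with an assumed response shape in the comments (the endpoint path and field names are not confirmed):

```python
import os
import requests

resp = requests.get(
    "https://api.routellm.dev/v1/models",  # hypothetical endpoint path
    headers={"Authorization": f"Bearer {os.environ['ROUTELLM_API_KEY']}"},
)

# Assumed response shape, for illustration only:
# {
#   "models": [
#     {"id": "gpt-4", "provider": "openai",
#      "pricing": {"input_per_1k": 0.03, "output_per_1k": 0.06},
#      "capabilities": ["reasoning", "code"], "avg_latency_ms": 1200},
#     ...
#   ]
# }
print(resp.json())
```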
## Error Handling
RouteLLM uses standard HTTP status codes and provides detailed error information to help you debug issues.
### Common Error Codes
| Status Code | Error Type | Description |
|---|---|---|
| 400 | Bad Request | Invalid request format or missing required parameters |
| 401 | Unauthorized | Invalid or missing API key |
| 429 | Rate Limited | Too many requests - rate limit exceeded |
| 500 | Server Error | Internal server error - try again later |
### Error Response Format
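The exact format isn't shown here; based on the fields this section references (status codes, error types, and `request_id`), an error body plausibly looks like this:

```python
# Illustrative only -- structure assumed from the fields mentioned above.
example_error = {
    "error": {
        "type": "rate_limited",
        "code": 429,
        "message": "Too many requests - rate limit exceeded",
        "request_id": "req_abc123",  # include this in support requests
    }
}
```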
### Best Practices
- Always check response status codes
- Implement proper error handling and retries
- Log the `request_id` from each response to speed up debugging and support requests
- Use exponential backoff for 429 errors
## Monitoring & Analytics
RouteLLM provides comprehensive monitoring and analytics through the dashboard and API endpoints.
### Dashboard Analytics
Visit your dashboard to monitor:
- Real-time cost savings
- Request volume and success rates
- Model usage distribution
- Performance metrics and latency
- Rate limit consumption
### Programmatic Monitoring
Get your usage statistics and performance metrics programmatically.
### Example Response
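A sketch of fetching usage statistics (the `/v1/usage` path and all response fields are assumptions):

```python
import os
import requests

resp = requests.get(
    "https://api.routellm.dev/v1/usage",  # hypothetical endpoint path
    headers={"Authorization": f"Bearer {os.environ['ROUTELLM_API_KEY']}"},
)

# Assumed response shape, for illustration only:
# {
#   "period": "2024-06",
#   "requests": 8421,
#   "success_rate": 0.998,
#   "total_cost_usd": 41.37,
#   "estimated_savings_usd": 96.15,
#   "model_distribution": {"gpt-3.5-turbo": 0.62, "claude-3-haiku": 0.31, "gpt-4": 0.07}
# }
print(resp.json())
```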
### Webhooks
Configure webhooks to receive real-time notifications about your usage (a minimal receiver sketch follows this list):
- Rate limit warnings (80% threshold reached)
- Monthly spend alerts
- API key expiration notices
- Service status updates
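The receiver below uses Flask. The framework choice, route, and payload fields (`type`, `data`) are assumptions; RouteLLM's actual webhook format may differ, and a production receiver should also verify any signature header the service provides.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/routellm-webhook", methods=["POST"])
def handle_webhook():
    event = request.get_json(force=True)
    # "type" and "data" are assumed field names, for illustration only.
    if event.get("type") == "rate_limit.warning":
        print("80% of a rate limit consumed:", event.get("data"))
    elif event.get("type") == "spend.alert":
        print("Monthly spend alert:", event.get("data"))
    return jsonify({"received": True}), 200

if __name__ == "__main__":
    app.run(port=8080)
```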