LiteLLM proxy connects Alex Sidebar to Amazon Bedrock, Google Vertex AI, and other enterprise AI providers. Teams can use their existing cloud infrastructure without changing their security setup.

For iOS Developers: If your company already pays for AWS Bedrock or Google Cloud AI, this guide shows how to use those models in Xcode with Alex Sidebar instead of buying separate API keys.

What is LiteLLM?

LiteLLM is an open-source proxy that translates between the OpenAI API format and 100+ different AI providers. Through it, Alex Sidebar can work with enterprise AI services that don't natively support the OpenAI API format.

LiteLLM is what Alex Sidebar uses internally for model connections, making it a well-tested solution for enterprise deployments. Current stable version: v1.73.6-stable (June 2025).
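
For example, any OpenAI-compatible client can send a standard chat request to the proxy, and LiteLLM routes it to whichever provider backs the model alias. A minimal sketch, assuming a claude-4-sonnet alias and a proxy key that you define later in config.yaml:

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-litellm-key" \
  -d '{
    "model": "claude-4-sonnet",
    "messages": [{"role": "user", "content": "Hello from Alex Sidebar"}]
  }'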

Why Use LiteLLM?

Use Your Company's AI

If your company uses AWS Bedrock or Google Cloud AI, LiteLLM lets you access those models through Alex Sidebar.

Data Never Leaves Your Infrastructure

Your code stays within your company's cloud. No data goes to Alex Sidebar servers.

Track Costs by Project

See exactly how much each project costs. Set budgets and get alerts.

One Interface for All Models

Switch between Claude 4 on Bedrock, Gemini 2.5 on Vertex AI, or GPT-4 on Azure without changing code.

Quick Start

1. Install LiteLLM

Choose your deployment method:

Option 1: pip install (simplest)

pip install 'litellm[proxy]'
litellm --model bedrock/anthropic.claude-sonnet-4-20250514-v1:0 --port 4000

Option 2: Docker (recommended for production)

docker run -p 4000:4000 ghcr.io/berriai/litellm:v1.73.6-stable
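
For production you will typically mount your config.yaml (created in the next step) into the container and point the proxy at it; a minimal sketch of that pattern:

docker run -p 4000:4000 \
  -v $(pwd)/config.yaml:/app/config.yaml \
  ghcr.io/berriai/litellm:v1.73.6-stable \
  --config /app/config.yaml
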
2. Configure Your Providers

Create a config.yaml file in your LiteLLM directory:

model_list:
  # Amazon Bedrock - Latest Claude 4 Models
  - model_name: "claude-4-sonnet"
    litellm_params:
      model: "bedrock/anthropic.claude-4-sonnet-20250514-v1:0"
      aws_region_name: "us-east-1"
  
  # Google Vertex AI - Latest Gemini 2.5 Models
  - model_name: "gemini-2.5-pro"
    litellm_params:
      model: "vertex_ai/gemini-2.5-pro"
      vertex_project: "your-gcp-project"
      vertex_location: "us-central1"
  
  # OpenAI O-Series with Reasoning
  - model_name: "o4-mini"
    litellm_params:
      model: "o4-mini-2025-04-16"
      api_key: "your-openai-key"

  - model_name: "o3-pro"
    litellm_params:
      model: "o3-pro-2025-06-10"
      api_key: "your-openai-key"

Start the proxy with your config:

litellm --config config.yaml --port 4000
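
Once the proxy is running, you can confirm it sees your configured models before connecting any client:

curl http://localhost:4000/v1/models \
  -H "Authorization: Bearer sk-your-litellm-key"
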
3. Connect Alex Sidebar

In Alex Sidebar, add a custom model pointing to your LiteLLM proxy:

  1. Open Settings → Models → Custom Models
  2. Click “Add New Model”
  3. Configure:
    • Model ID: Your model name from config.yaml (e.g., claude-4-sonnet)
    • Base URL: Your LiteLLM URL + /v1 (e.g., https://litellm.company.com/v1)
    • API Key: Your LiteLLM proxy key (if configured)
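
For a local test deployment, the values would look like this (placeholder key):

Model ID: claude-4-sonnet
Base URL: http://localhost:4000/v1
API Key: sk-your-litellm-key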

Provider-Specific Setup

Amazon Bedrock

  1. Ensure your AWS credentials are configured on the LiteLLM server
  2. Enable the models you need in the AWS Bedrock console
  3. Add to your LiteLLM config:
model_list:
  # Latest Claude 4 Models
  - model_name: "claude-4-opus"
    litellm_params:
      model: "bedrock/anthropic.claude-opus-4-20250514-v1:0"
      aws_region_name: "us-east-1"

  - model_name: "claude-4-sonnet"
    litellm_params:
      model: "bedrock/anthropic.claude-sonnet-4-20250514-v1:0"
      aws_region_name: "us-east-1"

  # Reasoning and Thinking Support
  - model_name: "claude-4-sonnet-reasoning"
    litellm_params:
      model: "bedrock/anthropic.claude-sonnet-4-20250514-v1:0"
      aws_region_name: "us-east-1"
      thinking: {"type": "enabled", "budget_tokens": 1024}

  # Latest Llama 4 Models
  - model_name: "llama4-maverick"
    litellm_params:
      model: "bedrock/meta.llama4-maverick-17b-instruct-v1:0"
      aws_region_name: "us-east-1"

  # DeepSeek R1
  - model_name: "deepseek-r1"
    litellm_params:
      model: "bedrock/deepseek.r1-v1:0"
      aws_region_name: "us-east-1"
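
LiteLLM resolves AWS credentials the same way the AWS SDK does, so an attached IAM role needs no extra setup. For key-based auth, export credentials on the proxy host before starting LiteLLM (values are placeholders):

export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_REGION_NAME="us-east-1"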

Google Vertex AI

  1. Enable the Vertex AI API in your GCP project
  2. Set up authentication (service account recommended)
  3. Add to your LiteLLM config:
model_list:
  # Latest Gemini 2.5 Models
  - model_name: "gemini-2.5-pro"
    litellm_params:
      model: "vertex_ai/gemini-2.5-pro"
      vertex_project: "your-project-id"
      vertex_location: "us-central1"
      
  - model_name: "gemini-2.5-flash"
    litellm_params:
      model: "vertex_ai/gemini-2.5-flash"
      vertex_project: "your-project-id"
      vertex_location: "us-central1"
      
  # Claude 4 on Vertex AI (Model Garden)
  - model_name: "claude-4-vertex"
    litellm_params:
      model: "vertex_ai/claude-opus-4@20250514"
      vertex_project: "your-project-id"
      vertex_location: "us-central1"
  
  # Multimodal Image Generation
  - model_name: "imagen-4"
    litellm_params:
      model: "vertex_ai/imagen-4"
      vertex_project: "your-project-id"
      vertex_location: "us-central1"
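
For the service-account route from step 2, point the proxy host at your credentials file before starting LiteLLM (path is a placeholder):

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"

# For local development, gcloud application-default credentials also work:
gcloud auth application-default login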

Azure OpenAI

Configure Azure OpenAI with the latest models:

model_list:
  # Latest O-Series Models
  - model_name: "o4-mini"
    litellm_params:
      model: "azure/o4-mini-2025-04-16"
      api_base: "https://your-resource.openai.azure.com"
      api_key: "your-azure-key"
      api_version: "2025-01-01-preview"

  - model_name: "o3-pro"
    litellm_params:
      model: "azure/o3-pro-2025-06-10"
      api_base: "https://your-resource.openai.azure.com"
      api_key: "your-azure-key"
      api_version: "2025-01-01-preview"

  # GPT-4o Audio Preview
  - model_name: "gpt-4o-audio"
    litellm_params:
      model: "azure/gpt-4o-audio-preview-2025-06-03"
      api_base: "https://your-resource.openai.azure.com"
      api_key: "your-azure-key"
      api_version: "2025-01-01-preview"

Advanced Features

Reasoning and Thinking Capabilities

Enable advanced reasoning for supported models:

model_list:
  - model_name: "claude-4-reasoning"
    litellm_params:
      model: "anthropic/claude-4-sonnet-20250514"
      thinking: true
      
  - model_name: "o3-pro-reasoning"
    litellm_params:
      model: "o3-pro-2025-06-10"
      reasoning_effort: "high"
      
  - model_name: "o4-mini-reasoning"
    litellm_params:
      model: "o4-mini-2025-04-16"
      reasoning_effort: "medium"

Multimodal Support

Configure models for text, image, audio, and video:

model_list:
  # Vision + Audio Models (capability flags live under model_info)
  - model_name: "gpt-4o-multimodal"
    litellm_params:
      model: "gpt-4o"
    model_info:
      supports_vision: true
      supports_audio_input: true

  # Gemini with Enhanced Multimodal
  - model_name: "gemini-2.5-multimodal"
    litellm_params:
      model: "vertex_ai/gemini-2.5-pro"
    model_info:
      supports_vision: true
      supports_pdf_input: true
      supports_audio_input: true
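
Multimodal requests use the standard OpenAI content-array format, which LiteLLM translates for Gemini or GPT-4o; a sketch with a placeholder image URL:

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-litellm-key" \
  -d '{
    "model": "gemini-2.5-multimodal",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this screenshot"},
        {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}}
      ]
    }]
  }'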

MCP Gateway Integration

Enable the Model Context Protocol (MCP) gateway for enhanced tool use:

mcp_servers:
  # Local MCP server over stdio
  filesystem:
    transport: "stdio"
    command: "uvx"
    args: ["mcp-server-filesystem", "/path/to/allowed/files"]

  # Local MCP server for Jira, with its API key passed via env
  jira:
    transport: "stdio"
    command: "node"
    args: ["/path/to/jira-mcp-server"]
    env:
      JIRA_API_KEY: "your-jira-api-key"

Team Configuration

For team accounts, you can override all Alex Sidebar model endpoints:

  1. Go to Alex Sidebar Admin Portal
  2. Navigate to Models tab
  3. Add your LiteLLM proxy URL as Base URL for each model type
  4. All team members automatically use your proxy

All AI requests from your team go through your infrastructure. You control the data and costs.

See the Team Configuration Guide for detailed instructions on managing team models.

Advanced Configuration

Load Balancing with Fallbacks

Distribute requests across multiple model deployments with intelligent routing:

model_list:
  - model_name: "claude-4-primary"
    litellm_params:
      model: "bedrock/anthropic.claude-4-sonnet-20250514-v1:0"
      aws_region_name: "us-east-1"
  
  - model_name: "claude-4-fallback"
    litellm_params:
      model: "anthropic/claude-4-sonnet-20250514"
      api_key: "fallback-key"

router_settings:
  routing_strategy: "least-busy"  # or "usage-based-routing", "latency-based-routing"
  model_group_alias: {"claude-4": "claude-4-primary"}  # clients request "claude-4"
  fallbacks: [{"claude-4-primary": ["claude-4-fallback"]}]
  cooldown_time: 60  # seconds before retrying a failed deployment
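
With the alias in place, clients simply request claude-4; the router picks a healthy deployment and falls back automatically on errors. A quick smoke test (placeholder key):

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-litellm-key" \
  -d '{"model": "claude-4", "messages": [{"role": "user", "content": "ping"}]}'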

Cost Tracking and Budget Management

Enable comprehensive cost tracking:

general_settings:
  master_key: "your-secret-key"
  database_url: "postgresql://user:pass@localhost:5432/litellm"

# Budget controls
litellm_settings:
  max_budget: 1000  # proxy-wide budget in USD
  budget_duration: "30d"  # reset window, e.g. "30d" or "1mo"
  success_callback: ["langfuse", "prometheus"]

  # Default budget applied to each internal user
  max_internal_user_budget: 100
  internal_user_budget_duration: "1mo"
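
Per-developer budgets are typically enforced through virtual keys issued by the proxy. A minimal sketch using the /key/generate endpoint, authorized with the master key from general_settings:

curl http://localhost:4000/key/generate \
  -H "Authorization: Bearer your-secret-key" \
  -H "Content-Type: application/json" \
  -d '{"key_alias": "ios-team-alex", "max_budget": 100, "budget_duration": "30d"}'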

Security and Rate Limiting

Secure your LiteLLM deployment with advanced controls:

general_settings:
  master_key: "sk-your-secret-key"

  # Enhanced security
  allowed_ips: ["10.0.0.0/8", "172.16.0.0/12"]
  disable_spend_logs: false

  # Rate limiting (v1.73 adds accurate multi-instance rate limiting)
  max_parallel_requests: 1000

# Guardrails (PII masking, content moderation)
guardrails:
  - guardrail_name: "presidio-pii"
    litellm_params:
      guardrail: presidio
      mode: "pre_call"

# SCIM for enterprise SSO: LiteLLM exposes a SCIM v2 endpoint at
# https://your-litellm.com/scim/v2 for your identity provider

Vector Store Integration

Connect to knowledge bases and vector stores:

vector_store_registry:
  - vector_store_name: "company_docs"
    litellm_params:
      vector_store_id: "your-kb-id"
      custom_llm_provider: "bedrock"
      aws_region_name: "us-east-1"

# Auto-attach the knowledge base to specific models
model_list:
  - model_name: "claude-4-with-kb"
    litellm_params:
      model: "bedrock/anthropic.claude-sonnet-4-20250514-v1:0"
      vector_store_ids: ["company_docs"]

Monitoring & Observability

LiteLLM v1.73.6 provides enhanced monitoring capabilities:

Performance Metrics

  • 2x Higher RPS: Enhanced aiohttp transport for improved performance
  • 50ms Median Latency: Optimized for high-throughput applications
  • Multi-instance Rate Limiting: Accurate rate limiting across deployments

Dashboard Features

general_settings:
  database_url: "postgresql://user:pass@localhost:5432/litellm"
  ui_access_mode: "admin_only"  # or "all"
  
# Enhanced logging
litellm_settings:
  success_callback: ["langfuse", "prometheus", "datadog"]
  failure_callback: ["slack", "pagerduty"]
  
# Session tracking
session_config:
  enable_session_logs: true
  session_retention_days: 30

Real-time Monitoring

# Prometheus metrics
litellm_settings:
  success_callback: ["prometheus"]
  # End-user tracking is opt-in to prevent large metric sets
  enable_end_user_cost_tracking_prometheus_only: false
  
# Background health checks (run against every model in model_list)
general_settings:
  background_health_checks: true
  health_check_interval: 300  # seconds
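
You can also query health on demand; the /health endpoint reports per-deployment status (authorized with the master key):

curl http://localhost:4000/health \
  -H "Authorization: Bearer your-secret-key"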


Common Use Cases for iOS Teams

Scenario 1: Company Uses AWS with Latest Models

Your company has AWS Bedrock with Claude 4 models. Instead of buying Anthropic API keys:

  1. Deploy LiteLLM v1.73.6-stable on an EC2 instance
  2. Configure it to use your Bedrock Claude 4 models with reasoning capabilities
  3. Developers connect Alex Sidebar to your LiteLLM endpoint
  4. All costs go to your AWS bill with detailed tracking
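
A minimal sketch of steps 1 and 2 on the EC2 instance, assuming its IAM role already grants Bedrock invoke permissions so no keys need to be stored:

pip install 'litellm[proxy]'
litellm --config config.yaml --port 4000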

Scenario 2: Multi-Cloud Model Testing

Test latest models across providers without changing code:

model_list:
  - model_name: "best-reasoning"
    litellm_params:
      model: "o3-pro-2025-06-10"  # OpenAI's latest reasoning model
      reasoning_effort: "high"
      
  - model_name: "best-multimodal"
    litellm_params:
      model: "vertex_ai/gemini-2.5-pro"  # Google's latest multimodal
      
  - model_name: "best-coding"
    litellm_params:
      model: "bedrock/anthropic.claude-4-sonnet-20250514-v1:0"  # Claude 4 for coding

Scenario 3: Development vs Production with New Models

# Dev environment - use efficient models
- model_name: "dev-model"
  litellm_params:
    model: "bedrock/anthropic.claude-3-haiku-20240307-v1:0"
    max_tokens: 1000

# Staging - test new capabilities
- model_name: "staging-model"
  litellm_params:
    model: "vertex_ai/gemini-2.5-flash"

# Production - use most capable models  
- model_name: "prod-model"
  litellm_params:
    model: "bedrock/anthropic.claude-4-opus-20250514-v1:0"
    thinking: true

Scenario 4: Enterprise Security and Compliance

general_settings:
  master_key: "sk-your-secure-key"

# PII masking and content filtering
guardrails:
  - guardrail_name: "presidio-pii"
    litellm_params:
      guardrail: presidio
      mode: "pre_call"
      pii_entities_config:
        PERSON: "MASK"
        EMAIL_ADDRESS: "MASK"
        PHONE_NUMBER: "MASK"
        MEDICAL_LICENSE: "BLOCK"

  - guardrail_name: "bedrock-guard"
    litellm_params:
      guardrail: bedrock
      mode: "during_call"
      guardrailIdentifier: "your-guardrail-id"
      guardrailVersion: "1"

Enterprise Features (LiteLLM v1.73.6+)

SCIM Integration

Automatic user provisioning from your identity provider:

  • Okta, Azure AD, OneLogin support
  • Automatic team creation and user assignment
  • Deprovisioning when users are removed

Advanced Analytics

  • Team and tag-based usage tracking
  • Daily spend analysis by model and user
  • Session grouping and analysis
  • Audit logs for compliance

Enhanced Security

  • Vector store permissions by team/user
  • MCP server access controls
  • IP allowlisting and rate limiting
  • End-to-end encryption options

Next Steps

LiteLLM v1.73.6-stable gives you control over your AI infrastructure with the latest models and enterprise-grade features, working seamlessly with all Alex Sidebar capabilities.