LiteLLM proxy connects Alex Sidebar to Amazon Bedrock, Google Vertex AI, and other enterprise AI providers. Teams can use their existing cloud infrastructure without changing their security setup.

For iOS Developers: If your company already pays for AWS Bedrock or Google Cloud AI, this guide shows how to use those models in Xcode with Alex Sidebar instead of buying separate API keys.

What is LiteLLM?

LiteLLM is an open-source proxy that translates between the OpenAI API format and 100+ different AI providers. Through it, Alex Sidebar can work with enterprise AI services that don't natively support the OpenAI API format.

LiteLLM is what Alex Sidebar uses internally for model connections, making it a well-tested solution for enterprise deployments. Current stable version: v1.73.6-stable (June 2025).
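
For example, any OpenAI-compatible client can send a standard chat request to the proxy, and LiteLLM routes it to whichever provider backs the model alias. A minimal sketch, assuming a claude-4-sonnet alias and a proxy key that you define later in config.yaml:

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-litellm-key" \
  -d '{
    "model": "claude-4-sonnet",
    "messages": [{"role": "user", "content": "Hello from Alex Sidebar"}]
  }'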

Why Use LiteLLM?

Use Your Company's AI

If your company uses AWS Bedrock or Google Cloud AI, LiteLLM lets you access those models through Alex Sidebar.

Data Never Leaves Your Infrastructure

Your code stays within your company's cloud. No data goes to Alex Sidebar servers.

Track Costs by Project

See exactly how much each project costs. Set budgets and get alerts.

One Interface for All Models

Switch between Claude 4 on Bedrock, Gemini 2.5 on Vertex AI, or GPT-4 on Azure without changing code.

Quick Start

1. Install LiteLLM

Choose your deployment method:

Option 1: pip install (simplest)

pip install 'litellm[proxy]'
litellm --model bedrock/anthropic.claude-sonnet-4-20250514-v1:0 --port 4000

Option 2: Docker (recommended for production)

docker run -p 4000:4000 ghcr.io/berriai/litellm:v1.73.6-stable
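
For production you will typically mount your config.yaml (created in the next step) into the container and point the proxy at it; a minimal sketch of that pattern:

docker run -p 4000:4000 \
  -v $(pwd)/config.yaml:/app/config.yaml \
  ghcr.io/berriai/litellm:v1.73.6-stable \
  --config /app/config.yaml
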
2. Configure Your Providers

Create a config.yaml file in your LiteLLM directory:

model_list:
  # Amazon Bedrock - Latest Claude 4 Models
  - model_name: "claude-4-sonnet"
    litellm_params:
      model: "bedrock/anthropic.claude-4-sonnet-20250514-v1:0"
      aws_region_name: "us-east-1"
  
  # Google Vertex AI - Latest Gemini 2.5 Models
  - model_name: "gemini-2.5-pro"
    litellm_params:
      model: "vertex_ai/gemini-2.5-pro"
      vertex_project: "your-gcp-project"
      vertex_location: "us-central1"
  
  # OpenAI O-Series with Reasoning
  - model_name: "o4-mini"
    litellm_params:
      model: "o4-mini-2025-04-16"
      api_key: "your-openai-key"

  - model_name: "o3-pro"
    litellm_params:
      model: "o3-pro-2025-06-10"
      api_key: "your-openai-key"

Start the proxy with your config:

litellm --config config.yaml --port 4000
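
Once the proxy is running, you can confirm it sees your configured models before connecting any client:

curl http://localhost:4000/v1/models \
  -H "Authorization: Bearer sk-your-litellm-key"
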
3. Connect Alex Sidebar

In Alex Sidebar, add a custom model pointing to your LiteLLM proxy:

  1. Open Settings → Models → Custom Models
  2. Click “Add New Model”
  3. Configure:
    • Model ID: Your model name from config.yaml (e.g., claude-4-sonnet)
    • Base URL: Your LiteLLM URL + /v1 (e.g., https://litellm.company.com/v1)
    • API Key: Your LiteLLM proxy key (if configured)
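
For a local test deployment, the values would look like this (placeholder key):

Model ID: claude-4-sonnet
Base URL: http://localhost:4000/v1
API Key: sk-your-litellm-key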

Provider-Specific Setup

Amazon Bedrock

  1. Ensure your AWS credentials are configured on the LiteLLM server
  2. Enable the models you need in the AWS Bedrock console
  3. Add to your LiteLLM config:
model_list:
  # Latest Claude 4 Models
  - model_name: "claude-4-opus"
    litellm_params:
      model: "bedrock/anthropic.claude-opus-4-20250514-v1:0"
      aws_region_name: "us-east-1"

  - model_name: "claude-4-sonnet"
    litellm_params:
      model: "bedrock/anthropic.claude-sonnet-4-20250514-v1:0"
      aws_region_name: "us-east-1"

  # Reasoning and Thinking Support
  - model_name: "claude-4-sonnet-reasoning"
    litellm_params:
      model: "bedrock/anthropic.claude-sonnet-4-20250514-v1:0"
      aws_region_name: "us-east-1"
      thinking: {"type": "enabled", "budget_tokens": 1024}

  # Latest Llama 4 Models
  - model_name: "llama4-maverick"
    litellm_params:
      model: "bedrock/meta.llama4-maverick-17b-instruct-v1:0"
      aws_region_name: "us-east-1"

  # DeepSeek R1
  - model_name: "deepseek-r1"
    litellm_params:
      model: "bedrock/deepseek.r1-v1:0"
      aws_region_name: "us-east-1"
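
LiteLLM resolves AWS credentials the same way the AWS SDK does, so an attached IAM role needs no extra setup. For key-based auth, export credentials on the proxy host before starting LiteLLM (values are placeholders):

export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_REGION_NAME="us-east-1"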

Google Vertex AI

  1. Enable the Vertex AI API in your GCP project
  2. Set up authentication (service account recommended)
  3. Add to your LiteLLM config:
model_list:
  # Latest Gemini 2.5 Models
  - model_name: "gemini-2.5-pro"
    litellm_params:
      model: "vertex_ai/gemini-2.5-pro"
      vertex_project: "your-project-id"
      vertex_location: "us-central1"
      
  - model_name: "gemini-2.5-flash"
    litellm_params:
      model: "vertex_ai/gemini-2.5-flash"
      vertex_project: "your-project-id"
      vertex_location: "us-central1"
      
  # Claude 4 on Vertex AI (Model Garden)
  - model_name: "claude-4-vertex"
    litellm_params:
      model: "vertex_ai/claude-opus-4@20250514"
      vertex_project: "your-project-id"
      vertex_location: "us-central1"
  
  # Multimodal Image Generation
  - model_name: "imagen-4"
    litellm_params:
      model: "vertex_ai/imagen-4"
      vertex_project: "your-project-id"
      vertex_location: "us-central1"
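
For the service-account route from step 2, point the proxy host at your credentials file before starting LiteLLM (path is a placeholder):

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"

# For local development, gcloud application-default credentials also work:
gcloud auth application-default login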

Azure OpenAI

Configure Azure OpenAI with the latest models:

model_list:
  # Latest O-Series Models
  - model_name: "o4-mini"
    litellm_params:
      model: "azure/o4-mini-2025-04-16"
      api_base: "https://your-resource.openai.azure.com"
      api_key: "your-azure-key"
      api_version: "2025-01-01-preview"

  - model_name: "o3-pro"
    litellm_params:
      model: "azure/o3-pro-2025-06-10"
      api_base: "https://your-resource.openai.azure.com"
      api_key: "your-azure-key"
      api_version: "2025-01-01-preview"

  # GPT-4o Audio Preview
  - model_name: "gpt-4o-audio"
    litellm_params:
      model: "azure/gpt-4o-audio-preview-2025-06-03"
      api_base: "https://your-resource.openai.azure.com"
      api_key: "your-azure-key"
      api_version: "2025-01-01-preview"

Advanced Features

Reasoning and Thinking Capabilities

Enable advanced reasoning for supported models:

model_list:
  - model_name: "claude-4-reasoning"
    litellm_params:
      model: "anthropic/claude-4-sonnet-20250514"
      thinking: true
      
  - model_name: "o3-pro-reasoning"
    litellm_params:
      model: "o3-pro-2025-06-10"
      reasoning_effort: "high"
      
  - model_name: "o4-mini-reasoning"
    litellm_params:
      model: "o4-mini-2025-04-16"
      reasoning_effort: "medium"

Multimodal Support

Configure models for text, image, audio, and video:

model_list:
  # Vision + Audio Models (capability flags live under model_info)
  - model_name: "gpt-4o-multimodal"
    litellm_params:
      model: "gpt-4o"
    model_info:
      supports_vision: true
      supports_audio_input: true

  # Gemini with Enhanced Multimodal
  - model_name: "gemini-2.5-multimodal"
    litellm_params:
      model: "vertex_ai/gemini-2.5-pro"
    model_info:
      supports_vision: true
      supports_pdf_input: true
      supports_audio_input: true
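
Multimodal requests use the standard OpenAI content-array format, which LiteLLM translates for Gemini or GPT-4o; a sketch with a placeholder image URL:

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-litellm-key" \
  -d '{
    "model": "gemini-2.5-multimodal",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this screenshot"},
        {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}}
      ]
    }]
  }'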

MCP Gateway Integration

Enable the Model Context Protocol (MCP) gateway for enhanced tool use:

mcp_servers:
  # Local MCP server over stdio
  filesystem:
    transport: "stdio"
    command: "uvx"
    args: ["mcp-server-filesystem", "/path/to/allowed/files"]

  # Local MCP server for Jira, with its API key passed via env
  jira:
    transport: "stdio"
    command: "node"
    args: ["/path/to/jira-mcp-server"]
    env:
      JIRA_API_KEY: "your-jira-api-key"

Team Configuration

For team accounts, you can override all Alex Sidebar model endpoints:

  1. Go to Alex Sidebar Admin Portal
  2. Navigate to Models tab
  3. Add your LiteLLM proxy URL as Base URL for each model type
  4. All team members automatically use your proxy

All AI requests from your team go through your infrastructure. You control the data and costs.

See the Team Configuration Guide for detailed instructions on managing team models.

Advanced Configuration

Load Balancing with Fallbacks

Distribute requests across multiple model deployments with intelligent routing:

model_list:
  - model_name: "claude-4-primary"
    litellm_params:
      model: "bedrock/anthropic.claude-4-sonnet-20250514-v1:0"
      aws_region_name: "us-east-1"
  
  - model_name: "claude-4-fallback"
    litellm_params:
      model: "anthropic/claude-4-sonnet-20250514"
      api_key: "fallback-key"

router_settings:
  routing_strategy: "least-busy"  # or "usage-based-routing", "latency-based-routing"
  model_group_alias: {"claude-4": "claude-4-primary"}  # clients request "claude-4"
  fallbacks: [{"claude-4-primary": ["claude-4-fallback"]}]
  cooldown_time: 60  # seconds before retrying a failed deployment
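
With the alias in place, clients simply request claude-4; the router picks a healthy deployment and falls back automatically on errors. A quick smoke test (placeholder key):

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-litellm-key" \
  -d '{"model": "claude-4", "messages": [{"role": "user", "content": "ping"}]}'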

Cost Tracking and Budget Management

Enable comprehensive cost tracking:

general_settings:
  master_key: "your-secret-key"
  database_url: "postgresql://user:pass@localhost:5432/litellm"

# Budget controls
litellm_settings:
  max_budget: 1000  # proxy-wide budget in USD
  budget_duration: "30d"  # reset window, e.g. "30d" or "1mo"
  success_callback: ["langfuse", "prometheus"]

  # Default budget applied to each internal user
  max_internal_user_budget: 100
  internal_user_budget_duration: "1mo"
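
Per-developer budgets are typically enforced through virtual keys issued by the proxy. A minimal sketch using the /key/generate endpoint, authorized with the master key from general_settings:

curl http://localhost:4000/key/generate \
  -H "Authorization: Bearer your-secret-key" \
  -H "Content-Type: application/json" \
  -d '{"key_alias": "ios-team-alex", "max_budget": 100, "budget_duration": "30d"}'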

Security and Rate Limiting

Secure your LiteLLM deployment with advanced controls:

general_settings:
  master_key: "sk-your-secret-key"

  # Enhanced security
  allowed_ips: ["10.0.0.0/8", "172.16.0.0/12"]
  disable_spend_logs: false

  # Rate limiting (v1.73 adds accurate multi-instance rate limiting)
  max_parallel_requests: 1000

# Guardrails (PII masking, content moderation)
guardrails:
  - guardrail_name: "presidio-pii"
    litellm_params:
      guardrail: presidio
      mode: "pre_call"

# SCIM for enterprise SSO: LiteLLM exposes a SCIM v2 endpoint at
# https://your-litellm.com/scim/v2 for your identity provider

Vector Store Integration

Connect to knowledge bases and vector stores:

vector_store_registry:
  - vector_store_name: "company_docs"
    litellm_params:
      vector_store_id: "your-kb-id"
      custom_llm_provider: "bedrock"
      aws_region_name: "us-east-1"

# Auto-attach the knowledge base to specific models
model_list:
  - model_name: "claude-4-with-kb"
    litellm_params:
      model: "bedrock/anthropic.claude-sonnet-4-20250514-v1:0"
      vector_store_ids: ["company_docs"]

Monitoring & Observability

LiteLLM v1.73.6 provides enhanced monitoring capabilities:

Performance Metrics

  • 2x Higher RPS: Enhanced aiohttp transport for improved performance
  • 50ms Median Latency: Optimized for high-throughput applications
  • Multi-instance Rate Limiting: Accurate rate limiting across deployments

Dashboard Features

general_settings:
  database_url: "postgresql://user:pass@localhost:5432/litellm"
  ui_access_mode: "admin_only"  # or "all"
  
# Enhanced logging
litellm_settings:
  success_callback: ["langfuse", "prometheus", "datadog"]
  failure_callback: ["slack", "pagerduty"]
  
# Session tracking
session_config:
  enable_session_logs: true
  session_retention_days: 30

Real-time Monitoring

# Prometheus metrics
litellm_settings:
  success_callback: ["prometheus"]
  # End-user tracking is opt-in to prevent large metric sets
  enable_end_user_cost_tracking_prometheus_only: false
  
# Background health checks (run against every model in model_list)
general_settings:
  background_health_checks: true
  health_check_interval: 300  # seconds
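
You can also query health on demand; the /health endpoint reports per-deployment status (authorized with the master key):

curl http://localhost:4000/health \
  -H "Authorization: Bearer your-secret-key"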


Common Use Cases for iOS Teams

Scenario 1: Company Uses AWS with Latest Models

Your company has AWS Bedrock with Claude 4 models. Instead of buying Anthropic API keys:

  1. Deploy LiteLLM v1.73.6-stable on an EC2 instance
  2. Configure it to use your Bedrock Claude 4 models with reasoning capabilities
  3. Developers connect Alex Sidebar to your LiteLLM endpoint
  4. All costs go to your AWS bill with detailed tracking
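
A minimal sketch of steps 1 and 2 on the EC2 instance, assuming its IAM role already grants Bedrock invoke permissions so no keys need to be stored:

pip install 'litellm[proxy]'
litellm --config config.yaml --port 4000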

Scenario 2: Multi-Cloud Model Testing

Test latest models across providers without changing code:

model_list:
  - model_name: "best-reasoning"
    litellm_params:
      model: "o3-pro-2025-06-10"  # OpenAI's latest reasoning model
      reasoning_effort: "high"
      
  - model_name: "best-multimodal"
    litellm_params:
      model: "vertex_ai/gemini-2.5-pro"  # Google's latest multimodal
      
  - model_name: "best-coding"
    litellm_params:
      model: "bedrock/anthropic.claude-4-sonnet-20250514-v1:0"  # Claude 4 for coding

Scenario 3: Development vs Production with New Models

# Dev environment - use efficient models
- model_name: "dev-model"
  litellm_params:
    model: "bedrock/anthropic.claude-3-haiku-20240307-v1:0"
    max_tokens: 1000

# Staging - test new capabilities
- model_name: "staging-model"
  litellm_params:
    model: "vertex_ai/gemini-2.5-flash"

# Production - use most capable models  
- model_name: "prod-model"
  litellm_params:
    model: "bedrock/anthropic.claude-4-opus-20250514-v1:0"
    thinking: true

Scenario 4: Enterprise Security and Compliance

general_settings:
  master_key: "sk-your-secure-key"

# PII masking and content filtering
guardrails:
  - guardrail_name: "presidio-pii"
    litellm_params:
      guardrail: presidio
      mode: "pre_call"
      pii_entities_config:
        PERSON: "MASK"
        EMAIL_ADDRESS: "MASK"
        PHONE_NUMBER: "MASK"
        MEDICAL_LICENSE: "BLOCK"

  - guardrail_name: "bedrock-guard"
    litellm_params:
      guardrail: bedrock
      mode: "during_call"
      guardrailIdentifier: "your-guardrail-id"
      guardrailVersion: "1"

Enterprise Features (LiteLLM v1.73.6+)

SCIM Integration

Automatic user provisioning from your identity provider:

  • Okta, Azure AD, OneLogin support
  • Automatic team creation and user assignment
  • Deprovisioning when users are removed

Advanced Analytics

  • Team and tag-based usage tracking
  • Daily spend analysis by model and user
  • Session grouping and analysis
  • Audit logs for compliance

Enhanced Security

  • Vector store permissions by team/user
  • MCP server access controls
  • IP allowlisting and rate limiting
  • End-to-end encryption options

Next Steps

LiteLLM v1.73.6-stable gives you control over your AI infrastructure with the latest models and enterprise-grade features, working seamlessly with all Alex Sidebar capabilities.