Introduction
R2R’s `LLMProvider` supports multiple third-party Language Model (LLM) providers, offering flexibility in choosing and switching between different models based on your specific requirements. This guide provides an in-depth look at configuring and using various LLM providers within the R2R framework.
Architecture Overview
R2R’s LLM system is built on a flexible provider model:
- LLM Provider: An abstract base class that defines the common interface for all LLM providers.
- Specific LLM Providers: Concrete implementations for different LLM services (e.g., OpenAI, LiteLLM).
Providers
LiteLLM Provider (Default)
The default `LiteLLMProvider` offers a unified interface for multiple LLM services.
Key features:
- Support for OpenAI, Anthropic, Vertex AI, HuggingFace, Azure OpenAI, Ollama, Together AI, and OpenRouter
- Consistent API across different LLM providers
- Easy switching between models
OpenAI Provider
The `OpenAILLM` class provides direct integration with OpenAI’s models.
Key features:
- Direct access to OpenAI’s API
- Support for the latest OpenAI models
- Fine-grained control over model parameters
Local Models
R2R supports running models locally through LiteLLM, using Ollama or other local inference engines.
Key features:
- Privacy-preserving local inference
- Customizable model selection
- Reduced latency for certain use cases
Configuration
LLM Configuration
Update the `completions` section in your `r2r.toml` file:
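The exact keys can vary between R2R versions, so treat the snippet below as a representative sketch of a LiteLLM-backed configuration rather than a canonical file; the model name and parameter values are placeholders.

```toml
[completions]
provider = "litellm"

  [completions.generation_config]
  model = "openai/gpt-4o"           # any model identifier LiteLLM understands
  temperature = 0.1
  top_p = 1.0
  max_tokens_to_sample = 1024
  stream = false
```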
The provided `generation_config` is used to establish the default generation parameters for your deployment. These settings can be overridden at runtime, offering flexibility in your application. You can adjust parameters:
- At the application level, by modifying the R2R configuration
- For individual requests, by passing custom parameters to the `rag` or `get_completion` methods (see the sketch after this list)
- Through API calls, by including specific parameters in your request payload
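For instance, a per-request override through the Python client might look like the following; the method name and parameter shape (`rag`, `rag_generation_config`) should be checked against your installed R2R version, so treat this as an illustrative sketch.

```python
from r2r import R2RClient

# Assumes a local R2R server; the URL and parameter names are illustrative.
client = R2RClient("http://localhost:7272")

response = client.rag(
    query="Summarize the uploaded documents.",
    rag_generation_config={
        "model": "openai/gpt-4o-mini",   # overrides the deployment default
        "temperature": 0.2,
        "max_tokens_to_sample": 512,
    },
)
print(response)
```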
Security Best Practices
- API Key Management: Use environment variables or secure key management solutions for API keys.
- Rate Limiting: Implement rate limiting to prevent abuse of LLM endpoints.
- Input Validation: Sanitize and validate all inputs before passing them to LLMs (a minimal example follows this list).
- Output Filtering: Implement content filtering for LLM outputs to prevent inappropriate content.
- Monitoring: Regularly monitor LLM usage and outputs for anomalies or misuse.
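As a small, framework-agnostic illustration of the input-validation point above, the helper below truncates oversized prompts and strips non-printable characters; treat it as a starting point rather than a complete defense.

```python
MAX_PROMPT_CHARS = 8_000  # illustrative limit; tune to your model's context window


def sanitize_user_input(text: str) -> str:
    """Minimal input validation before passing user text to an LLM."""
    if not text.strip():
        raise ValueError("Empty prompt")
    # Drop control characters, keep normal whitespace, and cap the length.
    cleaned = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
    return cleaned[:MAX_PROMPT_CHARS]
```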
Custom LLM Providers in R2R
LLM Provider Structure
The LLM system in R2R is built on two main components:
- `LLMConfig`: A configuration class for LLM providers.
- `LLMProvider`: An abstract base class that defines the interface for all LLM providers.
LLMConfig
The `LLMConfig` class is used to configure LLM providers:
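A simplified sketch of what this configuration class can look like is shown below; only the `provider` and `generation_config` fields reflect the documentation above, and everything else (including the `supported_providers` set) is illustrative.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class LLMConfig:
    """Simplified sketch of an LLM provider configuration."""

    # Providers the framework knows about; extended when adding a custom one.
    supported_providers = {"litellm", "openai"}

    provider: Optional[str] = None                       # e.g. "litellm"
    generation_config: dict[str, Any] = field(default_factory=dict)

    def validate(self) -> None:
        if self.provider not in self.supported_providers:
            raise ValueError(f"Unsupported LLM provider: {self.provider}")
```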
LLMProvider
The `LLMProvider` class is an abstract base class that defines the common interface for all LLM providers:
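Building on the `LLMConfig` sketch above, one plausible shape for this interface is the following; the message format is an assumption.

```python
from abc import ABC, abstractmethod
from typing import Any


class LLMProvider(ABC):
    """Simplified sketch of the abstract base class for LLM providers."""

    def __init__(self, config: LLMConfig) -> None:
        config.validate()
        self.config = config

    @abstractmethod
    def get_completion(self, messages: list[dict], **kwargs: Any) -> Any:
        """Return a single completion for the given chat messages."""

    @abstractmethod
    def get_completion_stream(self, messages: list[dict], **kwargs: Any):
        """Yield completion chunks as they are produced."""
```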
Creating a Custom LLM Provider
To create a custom LLM provider, follow these steps (a sketch follows this list):
- Create a new class that inherits from `LLMProvider`.
- Implement the required methods: `get_completion` and `get_completion_stream`.
- (Optional) Add any additional methods or attributes specific to your provider.
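Putting these steps together, a custom provider might look like the sketch below; the HTTP endpoint, environment variable, response shape, and message format are placeholders, not a real API.

```python
import os

import requests


class MyCustomLLM(LLMProvider):
    """Example provider calling a hypothetical HTTP completion service."""

    def __init__(self, config: LLMConfig) -> None:
        super().__init__(config)
        self.api_key = os.environ["MY_LLM_API_KEY"]           # placeholder key
        self.endpoint = "https://api.my-llm.example/v1/chat"  # placeholder URL

    def get_completion(self, messages, **kwargs):
        response = requests.post(
            self.endpoint,
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={"messages": messages, **self.config.generation_config, **kwargs},
            timeout=60,
        )
        response.raise_for_status()
        return response.json()

    def get_completion_stream(self, messages, **kwargs):
        # Simplified fallback: yield the full completion text as a single chunk.
        result = self.get_completion(messages, **kwargs)
        yield result.get("text", "")
```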
Registering and Using the Custom Provider
To use your custom LLM provider in R2R (a combined sketch follows this list):
- Update the `LLMConfig` class to include your custom provider.
- Update your R2R configuration to use the custom provider.
- In your R2R application, register the custom provider.
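The exact wiring differs between R2R versions, so the following combined sketch is illustrative rather than canonical: it extends the `supported_providers` set from the `LLMConfig` sketch above, points the `completions` section of `r2r.toml` at the custom provider, and constructs the provider at startup.

```toml
[completions]
provider = "my_custom_llm"
```

```python
# Names below (supported_providers, MyCustomLLM) come from the sketches
# above, not from the R2R codebase itself.
LLMConfig.supported_providers = LLMConfig.supported_providers | {"my_custom_llm"}

custom_config = LLMConfig(
    provider="my_custom_llm",
    generation_config={"temperature": 0.2, "max_tokens_to_sample": 512},
)
llm_provider = MyCustomLLM(custom_config)
```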
Prompt Engineering
R2R supports advanced prompt engineering techniques (a small template sketch follows this list):
- Template Management: Create and manage reusable prompt templates.
- Dynamic Prompts: Generate prompts dynamically based on context or user input.
- Few-shot Learning: Incorporate examples in your prompts for better results.
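As a small illustration of template management and few-shot prompting, the following framework-agnostic helper renders a reusable template with a dynamic question; R2R’s own prompt-management APIs may differ.

```python
# Reusable few-shot template; the examples are embedded directly in the prompt.
FEW_SHOT_TEMPLATE = """You are a concise assistant.

Examples:
Q: What is the capital of France?
A: Paris.

Q: What is 2 + 2?
A: 4.

Q: {question}
A:"""


def build_prompt(question: str) -> str:
    """Render the reusable template with a dynamic user question."""
    return FEW_SHOT_TEMPLATE.format(question=question)
```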
Troubleshooting
Common issues and solutions:
- API Key Errors: Ensure your API keys are correctly set and have the necessary permissions.
- Rate Limiting: Implement exponential backoff for retries on rate limit errors (a retry sketch follows this list).
- Context Length Errors: Be mindful of the maximum context length for your chosen model.
- Model Availability: Ensure the requested model is available and properly configured.
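A minimal retry helper with exponential backoff and jitter might look like this; in practice, catch your provider’s specific rate-limit exception rather than the broad `Exception` used here.

```python
import random
import time


def with_retries(call, max_attempts: int = 5):
    """Retry an LLM call with exponential backoff on (rate-limit) errors."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:  # narrow this to your provider's rate-limit error
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter: ~1s, ~2s, ~4s, ~8s, ...
            time.sleep(2 ** attempt + random.random())
```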
Performance Considerations
- Batching: Use batching for multiple, similar requests to improve throughput.
- Streaming: Utilize streaming for long-form content generation to improve perceived latency (see the sketch after this list).
- Model Selection: Balance between model capability and inference speed based on your use case.
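As an illustration of the streaming point, consuming chunks as they arrive lets users see output immediately instead of waiting for the full response; `llm_provider` here is the hypothetical `MyCustomLLM` instance from the registration sketch above.

```python
for chunk in llm_provider.get_completion_stream(
    [{"role": "user", "content": "Write a long summary of the report."}]
):
    # Each chunk is assumed to be a text fragment; print it as soon as it arrives.
    print(chunk, end="", flush=True)
```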
Server Configuration
The `R2RConfig` class handles the configuration of various components, including LLMs. Here’s a simplified version:
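The sketch below is deliberately minimal and only shows how the `completions` section of `r2r.toml` could map onto the `LLMConfig` sketch from earlier; the real class covers many more components (embeddings, databases, ingestion, and so on).

```python
import tomllib  # standard library in Python 3.11+


class R2RConfig:
    """Highly simplified sketch of the top-level configuration object."""

    def __init__(self, config_data: dict):
        # Each top-level section of r2r.toml maps to one component config.
        self.completions = LLMConfig(**config_data.get("completions", {}))
        # ... other component configs would be built here ...

    @classmethod
    def from_toml(cls, path: str = "r2r.toml") -> "R2RConfig":
        with open(path, "rb") as f:
            return cls(tomllib.load(f))
```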