Language Models (LLMs)
Configure and use multiple Language Model providers in R2R
Introduction
R2R’s LLMProvider supports multiple third-party Language Model (LLM) providers, offering flexibility in choosing and switching between different models based on your specific requirements. This guide provides an in-depth look at configuring and using various LLM providers within the R2R framework.
Architecture Overview
R2R’s LLM system is built on a flexible provider model:
- LLM Provider: An abstract base class that defines the common interface for all LLM providers.
- Specific LLM Providers: Concrete implementations for different LLM services (e.g., OpenAI, LiteLLM).
These providers work in tandem to ensure flexible and efficient language model integration.
Providers
LiteLLM Provider (Default)
The default LiteLLMProvider offers a unified interface for multiple LLM services.
Key features:
- Support for OpenAI, Anthropic, Vertex AI, HuggingFace, Azure OpenAI, Ollama, Together AI, and OpenRouter
- Consistent API across different LLM providers
- Easy switching between models
OpenAI Provider
The OpenAILLM class provides direct integration with OpenAI’s models.
Key features:
- Direct access to OpenAI’s API
- Support for the latest OpenAI models
- Fine-grained control over model parameters
Local Models
R2R supports running models locally with Ollama or other local inference engines, routed through LiteLLM.
Key features:
- Privacy-preserving local inference
- Customizable model selection
- Reduced latency for certain use cases
Configuration
LLM Configuration
Update the completions section in your r2r.toml file:
[completions]
provider = "litellm"
[completions.generation_config]
model = "gpt-4"
temperature = 0.7
max_tokens = 150
The provided generation_config establishes the default generation parameters for your deployment. These settings can be overridden at runtime, offering flexibility in your application. You can adjust parameters:
- At the application level, by modifying the R2R configuration
- For individual requests, by passing custom parameters to the rag or get_completion methods
- Through API calls, by including specific parameters in your request payload
This allows you to fine-tune the behavior of your language model interactions on a per-use basis while maintaining a consistent baseline configuration.
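The layering described above can be pictured as a simple merge: deployment defaults first, per-request overrides on top. The sketch below is a minimal illustration under that assumption. `DEFAULTS` and `resolve_generation_config` are hypothetical names that are not part of R2R, and the values mirror the r2r.toml example.

```python
# Illustrative sketch only: `DEFAULTS` and `resolve_generation_config`
# are hypothetical names, not part of the R2R API.
from typing import Any, Optional

# Baseline values, mirroring the [completions.generation_config] example.
DEFAULTS: dict[str, Any] = {
    "model": "gpt-4",
    "temperature": 0.7,
    "max_tokens": 150,
}


def resolve_generation_config(
    overrides: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Return the defaults with any per-request overrides applied on top."""
    resolved = dict(DEFAULTS)  # copy so the baseline is never mutated
    resolved.update(overrides or {})
    return resolved


# A single request lowers the temperature; other settings fall through.
request_config = resolve_generation_config({"temperature": 0.2})
print(request_config)
```

Copying the defaults before merging keeps one request's overrides from leaking into the next.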
Security Best Practices
- API Key Management: Use environment variables or secure key management solutions for API keys.
- Rate Limiting: Implement rate limiting to prevent abuse of LLM endpoints.
- Input Validation: Sanitize and validate all inputs before passing them to LLMs.
- Output Filtering: Implement content filtering for LLM outputs to prevent inappropriate content.
- Monitoring: Regularly monitor LLM usage and outputs for anomalies or misuse.
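Two of the practices above, key management and input validation, can be sketched with the standard library alone. `load_api_key` and `validate_prompt` are hypothetical helpers, and the 4000-character limit is an arbitrary example value rather than an R2R setting.

```python
# Illustrative sketch: `load_api_key` and `validate_prompt` are hypothetical
# helpers, and the 4000-character limit is an arbitrary example value.
import os


def load_api_key(env_var: str) -> str:
    """Read an API key from the environment, failing loudly if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {env_var} is not set.")
    return key


def validate_prompt(text: str, max_chars: int = 4000) -> str:
    """Reject empty or oversized input and drop non-printable characters."""
    cleaned = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
    cleaned = cleaned.strip()
    if not cleaned:
        raise ValueError("Prompt is empty.")
    if len(cleaned) > max_chars:
        raise ValueError(f"Prompt exceeds {max_chars} characters.")
    return cleaned
```

Failing at startup when a key is missing is usually easier to diagnose than an authentication error on the first request.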
Custom LLM Providers in R2R
LLM Provider Structure
The LLM system in R2R is built on two main components:
- LLMConfig: A configuration class for LLM providers.
- LLMProvider: An abstract base class that defines the interface for all LLM providers.
LLMConfig
The LLMConfig class is used to configure LLM providers:
from typing import Optional

from r2r.base import ProviderConfig
from r2r.base.abstractions.llm import GenerationConfig


class LLMConfig(ProviderConfig):
    provider: Optional[str] = None
    generation_config: Optional[GenerationConfig] = None

    def validate(self) -> None:
        if not self.provider:
            raise ValueError("Provider must be set.")
        if self.provider and self.provider not in self.supported_providers:
            raise ValueError(f"Provider '{self.provider}' is not supported.")

    @property
    def supported_providers(self) -> list[str]:
        return ["litellm", "openai"]
LLMProvider
The LLMProvider is an abstract base class that defines the common interface for all LLM providers:
from abc import abstractmethod

from r2r.base import Provider
from r2r.base.abstractions.llm import GenerationConfig, LLMChatCompletion, LLMChatCompletionChunk


class LLMProvider(Provider):
    def __init__(self, config: LLMConfig) -> None:
        if not isinstance(config, LLMConfig):
            raise ValueError("LLMProvider must be initialized with a `LLMConfig`.")
        super().__init__(config)

    @abstractmethod
    def get_completion(
        self,
        messages: list[dict],
        generation_config: GenerationConfig,
        **kwargs,
    ) -> LLMChatCompletion:
        pass

    @abstractmethod
    def get_completion_stream(
        self,
        messages: list[dict],
        generation_config: GenerationConfig,
        **kwargs,
    ) -> LLMChatCompletionChunk:
        pass
Creating a Custom LLM Provider
To create a custom LLM provider, follow these steps:
- Create a new class that inherits from LLMProvider.
- Implement the required methods: get_completion and get_completion_stream.
- (Optional) Add any additional methods or attributes specific to your provider.
Here’s an example of a custom LLM provider:
import logging
from typing import Generator

from r2r.base import LLMProvider, LLMConfig, LLMChatCompletion, LLMChatCompletionChunk
from r2r.base.abstractions.llm import GenerationConfig

logger = logging.getLogger(__name__)


class CustomLLMProvider(LLMProvider):
    def __init__(self, config: LLMConfig) -> None:
        super().__init__(config)
        # Initialize any custom attributes or connections here
        self.custom_client = self._initialize_custom_client()

    def _initialize_custom_client(self):
        # Initialize your custom LLM client here
        pass

    def get_completion(
        self,
        messages: list[dict],
        generation_config: GenerationConfig,
        **kwargs,
    ) -> LLMChatCompletion:
        # Implement the logic to get a completion from your custom LLM
        response = self.custom_client.generate(messages, **generation_config.dict(), **kwargs)
        # Convert the response to LLMChatCompletion format
        return LLMChatCompletion(
            id=response.id,
            choices=[
                {
                    "message": {
                        "role": "assistant",
                        "content": response.text
                    },
                    "finish_reason": response.finish_reason
                }
            ],
            usage={
                "prompt_tokens": response.usage.prompt_tokens,
                "completion_tokens": response.usage.completion_tokens,
                "total_tokens": response.usage.total_tokens
            }
        )

    def get_completion_stream(
        self,
        messages: list[dict],
        generation_config: GenerationConfig,
        **kwargs,
    ) -> Generator[LLMChatCompletionChunk, None, None]:
        # Implement the logic to get a streaming completion from your custom LLM
        stream = self.custom_client.generate_stream(messages, **generation_config.dict(), **kwargs)
        for chunk in stream:
            yield LLMChatCompletionChunk(
                id=chunk.id,
                choices=[
                    {
                        "delta": {
                            "role": "assistant",
                            "content": chunk.text
                        },
                        "finish_reason": chunk.finish_reason
                    }
                ]
            )

    # Add any additional methods specific to your custom provider
    def custom_method(self, *args, **kwargs):
        # Implement custom functionality
        pass
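To exercise a provider like the one above without a real backend, a stub client can stand in for custom_client. The sketch below is one possible test double. `StubClient` is a hypothetical name, and its return values expose only the attributes the example provider reads (id, text, finish_reason, usage).

```python
# Illustrative sketch: `StubClient` is a hypothetical test double whose
# return values expose the attributes the example provider reads.
from types import SimpleNamespace
from typing import Iterator


class StubClient:
    """Returns canned responses instead of calling a real LLM service."""

    def generate(self, messages: list[dict], **params) -> SimpleNamespace:
        text = f"echo: {messages[-1]['content']}"
        return SimpleNamespace(
            id="stub-1",
            text=text,
            finish_reason="stop",
            usage=SimpleNamespace(
                prompt_tokens=0, completion_tokens=0, total_tokens=0
            ),
        )

    def generate_stream(
        self, messages: list[dict], **params
    ) -> Iterator[SimpleNamespace]:
        words = messages[-1]["content"].split()
        for i, word in enumerate(words):
            last = i == len(words) - 1
            yield SimpleNamespace(
                id="stub-1",
                text=word,
                finish_reason="stop" if last else None,
            )


client = StubClient()
reply = client.generate([{"role": "user", "content": "ping"}], temperature=0.0)
print(reply.text)
```

Returning such a stub from _initialize_custom_client lets the response-conversion code run in unit tests without network access.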
Registering and Using the Custom Provider
To use your custom LLM provider in R2R:
- Update the LLMConfig class to include your custom provider:
class LLMConfig(ProviderConfig):
    # ...existing code...

    @property
    def supported_providers(self) -> list[str]:
        return ["litellm", "openai", "custom"]  # Add your custom provider here
- Update your R2R configuration to use the custom provider:
[completions]
provider = "custom"
[completions.generation_config]
model = "your-custom-model"
temperature = 0.7
max_tokens = 150
- In your R2R application, register the custom provider:
from r2r import R2R
from r2r.base import LLMConfig
from your_module import CustomLLMProvider


def get_llm_provider(config: LLMConfig):
    if config.provider == "custom":
        return CustomLLMProvider(config)
    # ... handle other providers ...


r2r = R2R(llm_provider_factory=get_llm_provider)
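A factory written as an if-chain grows with every provider. One alternative, sketched here as an assumption and not as R2R's own mechanism, is a dictionary that maps provider names to factories. `PROVIDER_REGISTRY`, `register_provider`, and `build_provider` are hypothetical names.

```python
# Illustrative sketch: `PROVIDER_REGISTRY`, `register_provider`, and
# `build_provider` are hypothetical names, not part of the R2R API.
from typing import Any, Callable

PROVIDER_REGISTRY: dict[str, Callable[[Any], Any]] = {}


def register_provider(name: str, factory: Callable[[Any], Any]) -> None:
    """Associate a provider name (as used in r2r.toml) with a factory."""
    PROVIDER_REGISTRY[name] = factory


def build_provider(name: str, config: Any) -> Any:
    """Look up the factory for `name` and build a provider from `config`."""
    try:
        factory = PROVIDER_REGISTRY[name]
    except KeyError:
        raise ValueError(f"Provider '{name}' is not supported.") from None
    return factory(config)


# Stand-in factory; a real one would return CustomLLMProvider(config).
register_provider("custom", lambda config: ("custom-provider", config))
print(build_provider("custom", {"model": "your-custom-model"}))
```

With this layout, adding a provider is a single register_provider call, and unknown names fail with the same message style that LLMConfig.validate uses.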
Now you can use your custom LLM provider seamlessly within your R2R application:
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
]
response = r2r.get_completion(messages)
print(response.choices[0].message.content)
By following this structure, you can integrate any LLM or service into R2R, maintaining consistency with the existing system while adding custom functionality as needed.
Prompt Engineering
R2R supports advanced prompt engineering techniques:
- Template Management: Create and manage reusable prompt templates.
- Dynamic Prompts: Generate prompts dynamically based on context or user input.
- Few-shot Learning: Incorporate examples in your prompts for better results.
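The three techniques above can be combined in a few lines. The sketch below uses only Python's standard string.Template and does not show R2R's own prompt management API. `RAG_TEMPLATE` and `build_prompt` are hypothetical names.

```python
# Illustrative sketch using only the standard library; R2R's own prompt
# management API is not shown here.
from string import Template

# A reusable template with placeholders filled in per request.
RAG_TEMPLATE = Template(
    "Answer the question using the context.\n"
    "$examples\n"
    "Context: $context\n"
    "Question: $question\n"
    "Answer:"
)


def build_prompt(
    question: str, context: str, examples: list[tuple[str, str]]
) -> str:
    """Fill the template, rendering few-shot examples as Q/A pairs."""
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return RAG_TEMPLATE.substitute(
        examples=shots, context=context, question=question
    )


prompt = build_prompt(
    question="What is the capital of France?",
    context="France is a country in Western Europe. Its capital is Paris.",
    examples=[("What is the capital of Italy?", "Rome")],
)
print(prompt)
```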
Troubleshooting
Common issues and solutions:
- API Key Errors: Ensure your API keys are correctly set and have the necessary permissions.
- Rate Limiting: Implement exponential backoff for retries on rate limit errors.
- Context Length Errors: Be mindful of the maximum context length for your chosen model.
- Model Availability: Ensure the requested model is available and properly configured.
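The exponential backoff suggested for rate limit errors could look like the sketch below. `RateLimitError` is a hypothetical stand-in for whatever exception your provider raises, and `with_backoff` is an illustrative helper, not an R2R function. The sleep function is injectable so the retry logic can be tested without waiting.

```python
# Illustrative sketch: `RateLimitError` is a hypothetical stand-in for
# whatever exception your provider raises on HTTP 429 responses.
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Placeholder for a provider-specific rate limit exception."""


def with_backoff(
    call: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on rate limit errors, doubling the delay each time."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                raise  # retries exhausted; surface the error
            # Delay grows as base_delay * 2**attempt, plus up to 10% jitter
            # so that concurrent clients do not retry in lockstep.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    # Only reachable when max_retries is negative and the loop never runs.
    raise ValueError("max_retries must be non-negative")
```

A call would then be wrapped as with_backoff(lambda: provider.get_completion(messages, generation_config)), where provider is any LLMProvider instance.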
Performance Considerations
- Batching: Use batching for multiple, similar requests to improve throughput.
- Streaming: Utilize streaming for long-form content generation to improve perceived latency.
- Model Selection: Balance between model capability and inference speed based on your use case.
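Streaming improves perceived latency because each chunk can be shown as soon as it arrives while the full text is still being assembled. The sketch below illustrates that consumption pattern. `fake_stream` is a hypothetical stand-in for a provider's streaming call, and real chunks would carry their text in a delta field instead of being plain strings.

```python
# Illustrative sketch: `fake_stream` stands in for a provider's streaming
# call; real chunks would carry the text in a delta field instead.
from typing import Callable, Iterable, Iterator


def fake_stream(text: str) -> Iterator[str]:
    """Yield a response one word at a time, as a streaming API might."""
    for word in text.split():
        yield word + " "


def consume_stream(
    chunks: Iterable[str], on_chunk: Callable[[str], None]
) -> str:
    """Forward each chunk as it arrives and return the assembled text."""
    parts: list[str] = []
    for chunk in chunks:
        on_chunk(chunk)  # e.g. write to a websocket or the terminal
        parts.append(chunk)
    return "".join(parts).strip()


full_text = consume_stream(
    fake_stream("Paris is the capital of France."),
    on_chunk=lambda c: print(c, end="", flush=True),
)
```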
Server Configuration
The R2RConfig class handles the configuration of various components, including LLMs. Here’s a simplified version:
class R2RConfig:
    REQUIRED_KEYS: dict[str, list] = {
        # ... other keys ...
        "completions": ["provider"],
        # ... other keys ...
    }

    def __init__(self, config_data: dict[str, Any]):
        # Load and validate configuration
        # ...
        # Set LLM configuration
        self.completions = LLMConfig.create(**self.completions)
        # Override GenerationConfig defaults
        GenerationConfig.set_default(**self.completions.get("generation_config", {}))
        # ... other initialization ...
This configuration system allows for flexible setup of LLM providers and their default parameters.
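The REQUIRED_KEYS mapping suggests a validation pass over the parsed configuration. The sketch below shows what such a check might look like. `check_required_keys` is a hypothetical helper and not R2R's actual implementation, and the sample data stands in for a parsed r2r.toml.

```python
# Illustrative sketch: `check_required_keys` is a hypothetical helper that
# mimics the kind of check REQUIRED_KEYS suggests; it is not R2R's code.
from typing import Any

REQUIRED_KEYS: dict[str, list[str]] = {
    "completions": ["provider"],
}


def check_required_keys(config_data: dict[str, Any]) -> None:
    """Raise ValueError if a required section or key is missing."""
    for section, keys in REQUIRED_KEYS.items():
        if section not in config_data:
            raise ValueError(f"Missing required section: '{section}'")
        for key in keys:
            if key not in config_data[section]:
                raise ValueError(
                    f"Missing required key '{key}' in section '{section}'"
                )


# Equivalent to the parsed [completions] table from r2r.toml.
config_data = {
    "completions": {
        "provider": "litellm",
        "generation_config": {"model": "gpt-4", "temperature": 0.7},
    }
}
check_required_keys(config_data)  # passes silently
```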
Conclusion
R2R’s LLM system provides a flexible and powerful foundation for integrating various language models into your applications. By understanding the available providers, configuration options, and best practices, you can effectively leverage LLMs to enhance your R2R-based projects.
For further customization and advanced use cases, refer to the R2R API Documentation and configuration guide.