LLMs
Learn how to configure LLMs in your R2R deployment
R2R uses language models to generate responses based on retrieved context. You can configure R2R's server-side LLM generation settings in the `r2r.toml` file.
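The sketch below shows one common layout for these settings; the section names and default values are illustrative rather than definitive, so check them against the `r2r.toml` shipped with your R2R version:

```toml
# Representative sketch of server-side LLM generation settings in r2r.toml.
# Section names and defaults are illustrative and may vary between R2R versions.
[completion]
provider = "litellm"            # LiteLLM routes requests to the provider named in `model`
concurrent_request_limit = 16   # maximum number of concurrent LLM requests

  [completion.generation_config]
  model = "openai/gpt-4o"       # any LiteLLM-compatible model identifier
  temperature = 0.1             # randomness of the output (0.0 to 1.0)
  top_p = 1.0                   # nucleus sampling parameter (0.0 to 1.0)
  max_tokens_to_sample = 1024   # maximum number of tokens to generate
  stream = false                # enable/disable streaming of generated text
  # api_base = "https://api.openai.com/v1"  # base URL for remote communication (optional)
```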
Key generation configuration options:
- `provider`: The LLM provider (defaults to "litellm" for maximum flexibility).
- `concurrent_request_limit`: Maximum number of concurrent LLM requests.
- `model`: The language model to use for generation.
- `temperature`: Controls the randomness of the output (0.0 to 1.0).
- `top_p`: Nucleus sampling parameter (0.0 to 1.0).
- `max_tokens_to_sample`: Maximum number of tokens to generate.
- `stream`: Enable/disable streaming of generated text.
- `api_base`: The base URL for remote communication, e.g. `https://api.openai.com/v1`.
Serving Select LLM Providers
Supported OpenAI models include:
- openai/gpt-4o
- openai/gpt-4-turbo
- openai/gpt-4
- openai/gpt-4o-mini
For a complete list of supported OpenAI models and detailed usage instructions, please refer to the LiteLLM OpenAI documentation.
Supported Azure models include:
- azure/gpt-4o
- azure/gpt-4-turbo
- azure/gpt-4
- azure/gpt-4o-mini
For a complete list of supported Azure models and detailed usage instructions, please refer to the LiteLLM Azure documentation.
Supported Anthropic models include:
- anthropic/claude-3-5-sonnet-20240620
- anthropic/claude-3-opus-20240229
- anthropic/claude-3-sonnet-20240229
- anthropic/claude-3-haiku-20240307
- anthropic/claude-2.1
For a complete list of supported Anthropic models and detailed usage instructions, please refer to the LiteLLM Anthropic documentation.
Supported Vertex AI models include:
- vertex_ai/gemini-pro
- vertex_ai/gemini-pro-vision
- vertex_ai/claude-3-opus@20240229
- vertex_ai/claude-3-sonnet@20240229
- vertex_ai/mistral-large@2407
For a complete list of supported Vertex AI models and detailed usage instructions, please refer to the LiteLLM Vertex AI documentation.
Supported AWS Bedrock models include:
- bedrock/anthropic.claude-3-sonnet-20240229-v1:0
- bedrock/anthropic.claude-v2
- bedrock/anthropic.claude-instant-v1
- bedrock/amazon.titan-text-express-v1
- bedrock/meta.llama2-70b-chat-v1
- bedrock/mistral.mixtral-8x7b-instruct-v0:1
For a complete list of supported AWS Bedrock models and detailed usage instructions, please refer to the LiteLLM AWS Bedrock documentation.
AWS Bedrock requires the boto3 package (`pip install boto3>=1.28.57`). Make sure to set up your AWS credentials properly before using Bedrock models.

Supported Groq models include:
- llama-3.1-8b-instant
- llama-3.1-70b-versatile
- llama-3.1-405b-reasoning
- llama3-8b-8192
- llama3-70b-8192
- mixtral-8x7b-32768
- gemma-7b-it
For a complete list of supported Groq models and detailed usage instructions, please refer to the LiteLLM Groq documentation.
Note: Groq supports all models available on their platform. Use the `groq/` prefix when specifying the model name; a short configuration sketch follows the feature list below.
Additional features:
- Supports streaming responses
- Function/Tool calling available for compatible models
- Speech-to-Text capabilities with Whisper model
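As an illustration of the `groq/` prefix, the snippet below shows a hypothetical override of the generation model. The section name mirrors the layout sketched earlier, and LiteLLM reads the API key from the GROQ_API_KEY environment variable, so verify both against your deployment:

```toml
# Hypothetical sketch: route generation through Groq via LiteLLM.
# The groq/ prefix selects the provider; llama-3.1-8b-instant is one of the models listed above.
# Assumes GROQ_API_KEY is set in the server environment and that your r2r.toml uses this section name.
[completion.generation_config]
model = "groq/llama-3.1-8b-instant"
```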
Supported Ollama models include:
- llama2
- mistral
- mistral-7B-Instruct-v0.1
- mixtral-8x7B-Instruct-v0.1
- codellama
- llava (vision model)
For a complete list of supported Ollama models and detailed usage instructions, please refer to the LiteLLM Ollama documentation.
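Because Ollama serves models locally, the `api_base` option described earlier usually needs to point at your local Ollama server. The sketch below is illustrative only: the `ollama/` prefix and the `http://localhost:11434` address follow LiteLLM's conventions and should be checked against your setup:

```toml
# Illustrative sketch for generation against a locally served Ollama model via LiteLLM.
# Section and field names follow the options described above and may differ between R2R versions.
[completion]
provider = "litellm"

  [completion.generation_config]
  model = "ollama/llama2"              # LiteLLM's ollama/ model prefix (assumed)
  api_base = "http://localhost:11434"  # default local Ollama server address (assumed)
```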
Supported Cohere models include:
- command-r
- command-light
- command-r-plus
- command-medium
For a complete list of supported Cohere models and detailed usage instructions, please refer to the LiteLLM Cohere documentation.
Supported Anyscale models include:
- anyscale/meta-llama/Llama-2-7b-chat-hf
- anyscale/meta-llama/Llama-2-13b-chat-hf
- anyscale/meta-llama/Llama-2-70b-chat-hf
- anyscale/mistralai/Mistral-7B-Instruct-v0.1
- anyscale/codellama/CodeLlama-34b-Instruct-hf
For a complete list of supported Anyscale models and detailed usage instructions, please refer to the Anyscale Endpoints documentation.
Runtime Configuration of LLM Provider
R2R supports runtime configuration of the LLM provider, allowing you to dynamically change the model or provider for each request. This flexibility enables you to use different models or providers based on specific requirements or use cases.
Combining Search and Generation
When performing a RAG query, you can dynamically set the LLM generation settings:
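A minimal sketch using the Python client is shown below; the client constructor, the `rag` method, and the `rag_generation_config` parameter follow the R2R Python SDK, but confirm the exact names and the server address against the SDK version you have installed:

```python
from r2r import R2RClient

# Connect to a running R2R server (address assumed; adjust to your deployment).
client = R2RClient("http://localhost:7272")

# Override the server-side generation settings for this single RAG request.
response = client.rag(
    query="Who is the greatest philosopher of all time?",
    rag_generation_config={
        "model": "anthropic/claude-3-haiku-20240307",  # any LiteLLM-compatible model
        "temperature": 0.7,
        "max_tokens_to_sample": 512,
        "stream": False,
    },
)
print(response)
```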
For more detailed information on configuring other search and RAG settings, please refer to the RAG Configuration documentation.
Next Steps
For more detailed information on configuring specific components of R2R, please refer to the corresponding configuration pages.