Learn how to configure LLMs in your R2R deployment
R2R uses language models to generate responses based on retrieved context. You can configure R2R’s server-side LLM generation settings in the r2r.toml file.
Key generation configuration options:
provider: The LLM provider (defaults to “LiteLLM” for maximum flexibility).
concurrent_request_limit: Maximum number of concurrent LLM requests.
model: The language model to use for generation.
temperature: Controls the randomness of the output (0.0 to 1.0).
top_p: Nucleus sampling parameter (0.0 to 1.0).
max_tokens_to_sample: Maximum number of tokens to generate.
stream: Enable/disable streaming of generated text.
api_base: The base URL for remote communication, e.g. https://api.openai.com/v1
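As a rough illustration of how the options above might fit together, the sketch below embeds a sample r2r.toml fragment in a Python string and parses it to confirm that it is well-formed. The table names ([completion] and [completion.generation_config]), the model name, and every value are assumptions made for this example, not settings confirmed by this page; check them against your own r2r.toml.

```python
# Sketch only: table names and values below are illustrative assumptions.
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:  # older interpreters need: pip install tomli
    import tomli as tomllib

R2R_TOML = """
[completion]
provider = "litellm"
concurrent_request_limit = 16

[completion.generation_config]
model = "openai/gpt-4o"          # placeholder model name
temperature = 0.1
top_p = 1.0
max_tokens_to_sample = 1024
stream = false
api_base = "https://api.openai.com/v1"
"""

config = tomllib.loads(R2R_TOML)
generation = config["completion"]["generation_config"]

# Check the documented ranges before handing the file to a server.
if not 0.0 <= generation["temperature"] <= 1.0:
    raise ValueError("temperature must be between 0.0 and 1.0")
if not 0.0 <= generation["top_p"] <= 1.0:
    raise ValueError("top_p must be between 0.0 and 1.0")

print(config["completion"]["provider"], generation["model"])
```

Parsing the fragment before deploying is a cheap way to catch quoting or typing mistakes, such as writing stream = "false" as a string instead of a boolean.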
For a complete list of supported OpenAI models and detailed usage instructions, refer to the LiteLLM OpenAI documentation.
For a complete list of supported Anthropic models and detailed usage instructions, refer to the LiteLLM Anthropic documentation.
For a complete list of supported Vertex AI models and detailed usage instructions, refer to the LiteLLM Vertex AI documentation.
For a complete list of supported AWS Bedrock models and detailed usage instructions, refer to the LiteLLM AWS Bedrock documentation.
AWS Bedrock models require the boto3 package (pip install "boto3>=1.28.57"; the quotes stop the shell from treating >= as a redirect). Make sure to set up your AWS credentials properly before using Bedrock models.
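One way to check credentials before sending a Bedrock request is to look for the relevant environment variables. The sketch below assumes credentials are supplied through environment variables and that the variable names are AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION_NAME; both points are assumptions to verify against the LiteLLM AWS Bedrock documentation. Credentials supplied another way (shared profiles, IAM roles) are not detected by this check.

```python
import os
from typing import List, Mapping

# Assumed variable names; confirm them in the LiteLLM AWS Bedrock documentation.
REQUIRED_AWS_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION_NAME",
)


def missing_aws_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty in `env`."""
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


missing = missing_aws_vars()
if missing:
    print("Missing AWS settings:", ", ".join(missing))
else:
    print("All assumed AWS settings are present.")
```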
For a complete list of supported Groq models and detailed usage instructions, please refer to the LiteLLM Groq documentation.
Note: All models available on the Groq platform are supported. Use the prefix groq/ when specifying the model name.
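The prefix rule for Groq can be sketched as a small helper. The function name with_provider_prefix and the model name used below are hypothetical and exist only for this example.

```python
def with_provider_prefix(provider: str, model: str) -> str:
    """Return `model` with `provider/` prepended, unless it is already there."""
    prefix = f"{provider}/"
    return model if model.startswith(prefix) else prefix + model


# "llama3-8b-8192" is a placeholder; substitute any model Groq currently serves.
print(with_provider_prefix("groq", "llama3-8b-8192"))
```

Making the helper idempotent means a name that already carries the prefix passes through unchanged.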
For a complete list of supported Ollama models and detailed usage instructions, refer to the LiteLLM Ollama documentation.
For a complete list of supported Cohere models and detailed usage instructions, refer to the LiteLLM Cohere documentation.
For a complete list of supported Anyscale models and detailed usage instructions, refer to the Anyscale Endpoints documentation.
R2R supports runtime configuration of the LLM provider, so you can change the model or provider for each request to suit a specific use case.
When performing a RAG query, you can set the LLM generation settings dynamically.
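A minimal sketch of a per-request override follows. It only builds and validates the dictionary of generation settings; the helper build_generation_config is hypothetical, and the commented client call at the end, including the rag_generation_config argument name, is an assumption to check against the R2R client documentation rather than a confirmed signature.

```python
from typing import Any, Dict, Optional


def build_generation_config(
    model: str,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    max_tokens_to_sample: Optional[int] = None,
    stream: Optional[bool] = None,
) -> Dict[str, Any]:
    """Collect per-request overrides; unset options fall back to server defaults."""
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0.0 and 1.0")
    overrides = {
        "model": model,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens_to_sample": max_tokens_to_sample,
        "stream": stream,
    }
    # Drop anything the caller did not set explicitly.
    return {key: value for key, value in overrides.items() if value is not None}


# Placeholder model name in LiteLLM's provider/model form.
rag_generation_config = build_generation_config(
    "anthropic/claude-3-haiku-20240307", temperature=0.7
)
print(rag_generation_config)

# Assumed usage, not verified here:
# client.rag("your question", rag_generation_config=rag_generation_config)
```

Sending only the keys that differ from the server defaults keeps each request small and leaves r2r.toml as the single source for everything else.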
For more detailed information on configuring other search and RAG settings, please refer to the RAG Configuration documentation.