Configure your R2R ingestion pipeline
Many of the settings in r2r.toml relate to the ingestion process. The most relevant sections are described below.
The [database] section configures the Postgres database used for semantic search and document management. During retrieval, this database is queried to find the most relevant document chunks based on vector similarity.
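To make "most relevant chunks based on vector similarity" concrete, here is a minimal sketch in plain Python. It assumes cosine similarity as the metric and uses made-up three-dimensional vectors; the function names and chunk IDs are hypothetical, and a real deployment would run this comparison inside Postgres rather than in application code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vector, stored_chunks, k=2):
    """Return the k (chunk_id, score) pairs most similar to the query."""
    scored = [
        (chunk_id, cosine_similarity(query_vector, vector))
        for chunk_id, vector in stored_chunks.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical stored embeddings (illustrative values, not real model output).
stored_chunks = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.0, 1.0, 0.0],
    "chunk-c": [0.7, 0.7, 0.0],
}

results = top_k_chunks([1.0, 0.1, 0.0], stored_chunks, k=2)
print([chunk_id for chunk_id, _ in results])  # → ['chunk-a', 'chunk-c']
```

The query vector points almost entirely along the first axis, so chunk-a ranks first and chunk-c, which shares part of that direction, ranks second.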
The [ingestion] section determines how different file types are parsed and converted into text. It also sets how that text is split into smaller chunks, which determines the granularity at which information is stored and retrieved.
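The effect of chunking on granularity can be illustrated with a small sketch. It assumes simple fixed-size, character-based splitting with overlap, which is only one possible strategy and not necessarily the one R2R applies; the function name and the chunk_size and chunk_overlap parameters are hypothetical names chosen for this example.

```python
def split_text(text, chunk_size=20, chunk_overlap=5):
    """Split text into fixed-size character chunks that overlap their neighbours."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghij"
chunks = split_text(sample, chunk_size=4, chunk_overlap=1)
print(chunks)  # → ['abcd', 'defg', 'ghij']
```

Smaller chunks give more precise matches but less surrounding context per result, and the overlap keeps a sentence that straddles a boundary from being cut off in both neighbouring chunks.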
The [embedding] section defines the model and parameters used to convert text into vector embeddings. During retrieval, the same settings are used to embed the user’s query so that it can be compared against the stored document embeddings.
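The sketch below shows why the query has to be embedded with the same settings as the stored chunks. It uses a toy hashing-based embedding as a stand-in for a real model; toy_embed and its dimension parameter are hypothetical and exist only to make the example runnable without any external service.

```python
import hashlib
import math


def toy_embed(text, dimension=8):
    """Hash each lowercase word into one of `dimension` buckets, then L2-normalize."""
    vector = [0.0] * dimension
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dimension
        vector[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


DIMENSION = 8  # stand-in for the dimension fixed by the embedding settings

chunk_vector = toy_embed("postgres stores document chunks", DIMENSION)
query_vector = toy_embed("document chunks", DIMENSION)

# Same settings on both sides: the vectors have equal length and can be compared.
score = sum(q * c for q, c in zip(query_vector, chunk_vector))

# A query embedded with a different dimension cannot be compared meaningfully.
mismatched_query = toy_embed("document chunks", 16)
print(len(query_vector) == len(chunk_vector))      # → True
print(len(mismatched_query) == len(chunk_vector))  # → False
```

With a real model the same constraint holds: changing the embedding model or its dimension after ingestion generally means the stored chunks have to be embedded again before queries can be matched against them.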