> ## Documentation Index
> Fetch the complete documentation index at: https://r2r-patch-decruft-openapi-json.mintlify.site/llms.txt
> Use this file to discover all available pages before exploring further.

# Hybrid Search

> Learn how to implement and use hybrid search with R2R

## Introduction

R2R's hybrid search combines traditional keyword-based searching with modern semantic understanding, providing more accurate and contextually relevant results. This approach is particularly effective for complex queries where both specific terms and overall meaning are crucial.

## How R2R Hybrid Search Works

1. **Full-Text Search**: Utilizes PostgreSQL's full-text search with `ts_rank_cd` and `websearch_to_tsquery`.
2. **Semantic Search**: Performs vector similarity search using the query's embedded representation.
3. **Reciprocal Rank Fusion (RRF)**: Merges results from full-text and semantic searches using the formula:
   ```
   COALESCE(1.0 / (rrf_k + full_text.rank_ix), 0.0) * full_text_weight +
   COALESCE(1.0 / (rrf_k + semantic.rank_ix), 0.0) * semantic_weight
   ```
4. **Result Ranking**: Orders final results based on the combined RRF score.

## Key Features

### Full-Text Search

The full-text search component incorporates:

* PostgreSQL's `tsvector` for efficient text searching
* `websearch_to_tsquery` for parsing user queries
* `ts_rank_cd` for ranking full-text search results

### Semantic Search

The semantic search component uses:

* Vector embeddings for storing and querying semantic representations
* Cosine similarity for measuring the relevance of documents to the query

## Configuration

### VectorSearchSettings

Key settings for vector search configuration:

```python
class VectorSearchSettings(BaseModel):
    use_hybrid_search: bool
    search_limit: int
    filters: dict[str, Any]
    hybrid_search_settings: Optional[HybridSearchSettings]
    # ... other settings
```

### HybridSearchSettings

Specific parameters for hybrid search:

```python
class HybridSearchSettings(BaseModel):
    full_text_weight: float
    semantic_weight: float
    full_text_limit: int
    rrf_k: int
```

## Usage Example

```python
from r2r import R2RClient

client = R2RClient()

vector_settings = {
    "use_hybrid_search": True,
    "search_limit": 20,
    # Can add logical filters, as shown:
    # "filters":{"category": {"$eq": "technology"}},
    "hybrid_search_settings" : {
        "full_text_weight": 1.0,
        "semantic_weight": 5.0,
        "full_text_limit": 200,
        "rrf_k": 50
    }
}

results = client.search(
    query="Who was Aristotle?",
    vector_search_settings=vector_settings
)
print(results)
```

## Results Comparison

### Basic Vector Search

```json
{
  "results": {
    "vector_search_results": [
      {
        "score": 0.780314067545999,
        "text": "Aristotle[A] (Greek: Ἀριστοτέλης Aristotélēs, pronounced [aristotélɛːs]; 384–322 BC) was an Ancient Greek philosopher and polymath...",
        "metadata": {
          "title": "aristotle.txt",
          "version": "v0",
          "chunk_order": 0
        }
      }, ...
    ]
  }
}
```

### Hybrid Search with RRF

```json
{
  "results": {
    "vector_search_results": [
      {
        "score": 0.0185185185185185,
        "text": "Aristotle[A] (Greek: Ἀριστοτέλης Aristotélēs, pronounced [aristotélɛːs]; 384–322 BC) was an Ancient Greek philosopher and polymath...",
        "metadata": {
          "title": "aristotle.txt",
          "version": "v0",
          "chunk_order": 0,
          "semantic_rank": 1,
          "full_text_rank": 3
        }, ...
      }
    ]
  }
}
```

## Best Practices

1. Optimize PostgreSQL indexing for both full-text and vector searches
2. Regularly update search indices
3. Monitor performance and adjust weights as needed
4. Use appropriate vector dimensions and embedding models for your use case

## Conclusion

R2R's hybrid search offers a powerful solution for complex information retrieval needs, combining the strengths of keyword matching and semantic understanding. Its flexible configuration and use of Reciprocal Rank Fusion make it adaptable to a wide range of use cases, from technical documentation to broad, context-dependent queries.
