
Document Ingestion and Management

Ingest Files

Ingest files or directories into your R2R system:
const files = [
  { path: 'path/to/file1.txt', name: 'file1.txt' },
  { path: 'path/to/file2.txt', name: 'file2.txt' }
];
const metadatas = [{ key1: 'value1' }, { key2: 'value2' }];

const ingestResponse = await client.ingestFiles(files, {
  metadatas,
  user_ids: ['user-id-1', 'user-id-2'],
});
response
object
The response from the R2R system after ingesting the files.
[{'message': 'Ingestion task queued successfully.', 'task_id': '6e27dfca-606d-422d-b73f-2d9e138661b4', 'document_id': 'c3291abf-8a4e-5d9d-80fd-232ef6fd8526'}, ...]
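The sample response above contains one entry per queued file. Here is a minimal sketch that collects each entry's `task_id` and `document_id`, for example so the IDs can be passed to `updateFiles` or `documentChunks` later. It assumes the client returns an array shaped like the sample. `summarizeIngestResponse` is a hypothetical helper and is not part of the R2R client.

```javascript
// Hypothetical helper: assumes ingestResponse is an array of
// { message, task_id, document_id } objects, as in the sample above.
function summarizeIngestResponse(ingestResponse) {
  return ingestResponse.map(({ task_id, document_id }) => ({
    taskId: task_id,
    documentId: document_id,
  }));
}

// Example input copied from the sample response above.
const sample = [
  {
    message: "Ingestion task queued successfully.",
    task_id: "6e27dfca-606d-422d-b73f-2d9e138661b4",
    document_id: "c3291abf-8a4e-5d9d-80fd-232ef6fd8526",
  },
];
console.log(summarizeIngestResponse(sample));
```

The message says the task was queued, so a document may not be available for chunk retrieval until its ingestion task finishes.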
files
Array<string | File | { path: string; name: string }>
required
An array of file paths, File objects, or objects with path and name properties to ingest.
options
object
metadatas
Array<Record<string, any>>
An optional array of metadata objects corresponding to each file.
document_ids
Array<string>
An optional array of document IDs to assign to the ingested files.
user_ids
Array<string | null>
An optional array of user IDs associated with the ingested files.
ingestion_config
Record<string, any>
The ingestion config override parameter enables developers to customize their R2R chunking strategy at runtime.
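The descriptions above say `metadatas` and `user_ids` correspond to the ingested files, which presumably means they are matched to `files` by position. The sketch below builds all three arrays from one record per file so they cannot fall out of step. `buildIngestArgs` and its record shape are hypothetical. Defaulting a missing metadata entry to `{}` is also an assumption, not documented behavior.

```javascript
// Hypothetical helper: each record describes one file plus its optional
// metadata and user ID, so the parallel arrays always stay index-aligned.
function buildIngestArgs(records) {
  const files = records.map(({ path, name }) => ({ path, name }));
  const metadatas = records.map((r) => r.metadata ?? {}); // assumed default
  const user_ids = records.map((r) => r.userId ?? null); // null is allowed by the type
  return { files, options: { metadatas, user_ids } };
}

const { files, options } = buildIngestArgs([
  { path: "path/to/file1.txt", name: "file1.txt", metadata: { key1: "value1" }, userId: "user-id-1" },
  { path: "path/to/file2.txt", name: "file2.txt", metadata: { key2: "value2" } },
]);
// With a configured client: await client.ingestFiles(files, options);
console.log(files.length, options.metadatas, options.user_ids);
```
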

Update Files

Update existing documents:
const files = [
  { path: '/path/to/updated_file1.txt', name: 'updated_file1.txt' }
];
const document_ids = ['document-id-1'];
const updateResponse = await client.updateFiles(files, {
  document_ids,
  metadatas: [{ key: 'updated_value' }] // to overwrite the existing metadata
});
response
object
The response from the R2R system after updating the files.
  {'results': {'processed_documents': [{'id': '9f375ce9-efe9-5b57-8bf2-a63dee5f3621', 'title': 'aristotle_v2.txt'}], 'failed_documents': [], 'skipped_documents': []}}
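An update can succeed for some documents and fail for others, so callers may want to check the response rather than assume success. The sketch below assumes the `{ results: { processed_documents, failed_documents, skipped_documents } }` shape from the sample above. `checkUpdateResponse` is a hypothetical helper. The shape of an individual `failed_documents` entry is not documented here, so the helper only counts those entries.

```javascript
// Hypothetical helper: assumes the response shape shown in the sample above.
// Throwing on any failure is a design choice; callers could log instead.
function checkUpdateResponse(updateResponse) {
  const { processed_documents, failed_documents, skipped_documents } =
    updateResponse.results;
  if (failed_documents.length > 0) {
    throw new Error(`Update failed for ${failed_documents.length} document(s)`);
  }
  return {
    processed: processed_documents.map((doc) => doc.id),
    skipped: skipped_documents.length,
  };
}

// Example input copied from the sample response above.
const sampleResponse = {
  results: {
    processed_documents: [
      { id: "9f375ce9-efe9-5b57-8bf2-a63dee5f3621", title: "aristotle_v2.txt" },
    ],
    failed_documents: [],
    skipped_documents: [],
  },
};
console.log(checkUpdateResponse(sampleResponse));
```
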
files
Array<File | { path: string; name: string }>
required
An array of File objects or objects with path and name properties to update.
options
object
required
document_ids
Array<string>
required
An array of document IDs corresponding to the files being updated.
metadatas
Array<Record<string, any>>
An optional array of metadata objects for the updated files.
ingestion_config
Record<string, any>
The ingestion config override parameter enables developers to customize their R2R chunking strategy at runtime.

Documents Overview

Retrieve high-level document information. Results are restricted to the calling user's files, unless the caller is a superuser, in which case results cover all users:
const documentsOverview = await client.documentsOverview();
response
Array<object>
An array of objects containing document information.
[
  {
    'document_id': '9fbe403b-c11c-5aae-8ade-ef22980c3ad1',
    'version': 'v1',
    'size_in_bytes': 73353,
    'status': 'success',
    'user_id': '2acb499e-8428-543b-bd85-0d9098718220',
    'title': 'aristotle.txt',
    'created_at': '2024-07-21T20:09:14.218741Z',
    'updated_at': '2024-07-21T20:09:14.218741Z',
    'metadata': {'x': 'y'}
  },
  ...
]
document_ids
Array<string>
An optional array of document IDs to filter the overview.
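A superuser's overview can include documents from many users, so grouping the entries by owner is often a useful first step. This is a minimal sketch that assumes entries shaped like the sample above. `groupByUser` is a hypothetical helper. The second document in the example input is made up for illustration.

```javascript
// Hypothetical helper: groups overview entries by their user_id field.
function groupByUser(overview) {
  const byUser = new Map();
  for (const doc of overview) {
    const docs = byUser.get(doc.user_id) ?? [];
    docs.push(doc);
    byUser.set(doc.user_id, docs);
  }
  return byUser;
}

const sampleOverview = [
  {
    document_id: "9fbe403b-c11c-5aae-8ade-ef22980c3ad1",
    user_id: "2acb499e-8428-543b-bd85-0d9098718220",
    title: "aristotle.txt",
    status: "success",
  },
  // Hypothetical second document owned by a made-up user.
  {
    document_id: "11111111-1111-1111-1111-111111111111",
    user_id: "22222222-2222-2222-2222-222222222222",
    title: "plato.txt",
    status: "success",
  },
];

for (const [userId, docs] of groupByUser(sampleOverview)) {
  console.log(userId, docs.map((d) => d.title));
}
```
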

Document Chunks

Fetch chunks for a particular document:
const documentId = '9fbe403b-c11c-5aae-8ade-ef22980c3ad1';
const chunks = await client.documentChunks(documentId);
response
Array<object>
An array of objects containing chunk information.
[{
  'text': 'Aristotle[A] (Greek: Ἀριστοτέλης Aristotélēs, pronounced [aristotélɛːs]; 384–322 BC) was an Ancient Greek philosopher and polymath...',
  'user_id': '2acb499e-8428-543b-bd85-0d9098718220',
  'document_id': '9fbe403b-c11c-5aae-8ade-ef22980c3ad1',
  'extraction_id': 'aeba6400-1bd0-5ee9-8925-04732d675434',
  'fragment_id': 'f48bcdad-4155-52a4-8c9d-8ba06e996ba3',
  'metadata': {'title': 'aristotle.txt', 'version': 'v0', 'chunk_order': 0}
},
...]
document_id
string
required
The ID of the document to retrieve chunks for.
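Each chunk in the sample carries `metadata.chunk_order`, so the original text can be approximately rebuilt by sorting on that field. The sketch below does not assume the returned array is already in order, and it copies the array before sorting so the caller's input is left untouched. `reassembleDocument` is a hypothetical helper. If the chunking strategy produces overlapping chunks, the joined text may contain repeated passages.

```javascript
// Hypothetical helper: rebuilds a document's text from its chunks, assuming
// each chunk has metadata.chunk_order as in the sample above.
function reassembleDocument(chunks, separator = "\n") {
  return [...chunks] // copy so the caller's array is not reordered
    .sort((a, b) => a.metadata.chunk_order - b.metadata.chunk_order)
    .map((chunk) => chunk.text)
    .join(separator);
}

// Hypothetical chunks, deliberately out of order.
const sampleChunks = [
  { text: "second chunk", metadata: { chunk_order: 1 } },
  { text: "first chunk", metadata: { chunk_order: 0 } },
];
// prints "first chunk" and "second chunk" on separate lines
console.log(reassembleDocument(sampleChunks));
```
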

Delete Documents

Delete a document by its ID:
const deleteResponse = await client.delete({ document_id: "91662726-7271-51a5-a0ae-34818509e1fd" });
response
object
The response from the R2R system after successfully deleting the documents.
{'results': {}}
filters
{ [key: string]: string | string[] }
required
Logical filters over document fields that identify the unique set of documents to delete (e.g., {"document_id": {"$eq": "9fbe403b-c11c-5aae-8ade-ef22980c3ad1"}}). Filters can reference fields such as "user_id" or "title" and use operators such as neq, gte, etc.
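Deletion is irreversible, so it can help to preview which documents a filter would match before calling `delete`. The following is a minimal sketch built on several assumptions:

- Each field carries one condition, and all conditions must hold (AND).
- A bare value, as in the `client.delete` example above, means equality.
- The operators use the `$` prefix, following the `$eq` example.

`matchesFilters` is a hypothetical local helper that approximates these operators. It is not R2R's server-side implementation, and it does not handle array-valued filters.

```javascript
// Hypothetical local evaluator approximating the operators named above.
const OPERATORS = {
  $eq: (actual, expected) => actual === expected,
  $neq: (actual, expected) => actual !== expected,
  $gt: (actual, expected) => actual > expected,
  $gte: (actual, expected) => actual >= expected,
  $lt: (actual, expected) => actual < expected,
  $lte: (actual, expected) => actual <= expected,
};

// Every field condition must hold (AND semantics are an assumption).
function matchesFilters(doc, filters) {
  return Object.entries(filters).every(([field, condition]) => {
    if (Array.isArray(condition)) {
      throw new Error("Array-valued filters are not handled by this sketch");
    }
    // A bare value is treated as { $eq: value }.
    const ops =
      typeof condition === "object" && condition !== null
        ? condition
        : { $eq: condition };
    return Object.entries(ops).every(([op, expected]) => {
      const test = OPERATORS[op];
      if (!test) throw new Error(`Unsupported operator: ${op}`);
      return test(doc[field], expected);
    });
  });
}

// Hypothetical overview entries; the second title is made up.
const docs = [
  { document_id: "9fbe403b-c11c-5aae-8ade-ef22980c3ad1", title: "aristotle.txt" },
  { document_id: "91662726-7271-51a5-a0ae-34818509e1fd", title: "hypothetical.txt" },
];
const filters = { document_id: { $eq: "9fbe403b-c11c-5aae-8ade-ef22980c3ad1" } };
console.log(docs.filter((d) => matchesFilters(d, filters)).map((d) => d.title));
```

Running this over the output of `documentsOverview()` gives a rough dry run. The server remains the authority on which documents are actually deleted.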