RAG (Retrieval-Augmented Generation) settings control how your documents are processed and retrieved. Tuning these settings can significantly improve the quality of agent responses.

Quick Start: Presets

When creating a knowledge base, choose a preset that matches your needs:
| Preset | Best For | Trade-off |
| --- | --- | --- |
| Fast | Quick setup, general content | Speed over precision |
| Balanced | Most use cases | Good balance of speed and quality |
| Quality | Complex documents, high accuracy needs | Slower processing, better results |
Each preset configures the parser, chunking strategy, and chunk sizes automatically. You can customize settings after creation if needed.
Start with Balanced for most use cases. Switch to Quality if you notice retrieval issues with complex documents like PDFs with tables or multi-column layouts.

Document Parsing

Before chunking, documents are parsed to extract text. The parser affects how well structure is preserved.
| Parser | Identifier | Speed | OCR | Best For |
| --- | --- | --- | --- | --- |
| Tika | tika | Fast | No | Plain text, simple documents |
| Tika + OCR | tika-ocr | Slow | Yes | Scanned documents, images with text |
| Unstructured | unstructured | Medium | No | Documents with headings, lists, tables |
| Unstructured + OCR | unstructured-ocr | Slowest | Yes | Complex scanned layouts, multi-column |
The preset you choose selects an appropriate parser, but you can override it at any time in the knowledge base settings — see Configuring the Document Parser.
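The parser table can be read as a two-question decision: is the document scanned, and does it have a complex layout? The helper below is a hypothetical illustration of that reading. The function name and its parameters are assumptions and not part of the product; only the four parser identifiers come from the table.

```python
def suggest_parser(scanned: bool, complex_layout: bool) -> str:
    """Map two document traits to a parser identifier from the table.

    Hypothetical helper for illustration only; the product does not
    necessarily choose parsers this way.
    """
    if scanned:
        # Scanned input needs OCR; complex layouts favour Unstructured.
        return "unstructured-ocr" if complex_layout else "tika-ocr"
    # Text-based input: Tika is fastest, Unstructured preserves structure.
    return "unstructured" if complex_layout else "tika"


print(suggest_parser(scanned=True, complex_layout=False))   # tika-ocr
print(suggest_parser(scanned=False, complex_layout=True))   # unstructured
```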

Understanding RAG

When an agent uses a knowledge base:
  1. Query - The user’s question is converted to an embedding
  2. Search - Similar document chunks are retrieved
  3. Context - Retrieved chunks are sent to the AI model
  4. Response - The model generates an answer using the context
Each step can be configured to optimize for your use case.
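The four steps can be sketched in a few lines of Python. This is a toy illustration, not how the product is implemented: the embed function below is a bag-of-words stand-in for a real embedding model, and all function names are assumptions made for the example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 5) -> list[str]:
    q = embed(query)  # 1. Query: convert the question to a vector
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]  # 2. Search: keep the most similar chunks


def build_prompt(query: str, context: list[str]) -> str:
    joined = "\n".join(f"- {c}" for c in context)  # 3. Context
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


chunks = [
    "Refunds are issued within 14 days of purchase.",
    "Our office is closed on public holidays.",
    "Shipping takes 3 to 5 business days.",
]
question = "how long do refunds take"
prompt = build_prompt(question, retrieve(question, chunks, top_k=1))
print(prompt)  # 4. Response: this prompt would be sent to the AI model
```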

Chunking Settings

Chunking splits documents into smaller pieces for retrieval.

Chunk Size

How many tokens per chunk (default: 512).
| Size | Pros | Cons |
| --- | --- | --- |
| Small (256) | Precise retrieval | May split context |
| Medium (512) | Good balance | Default choice |
| Large (1024) | More context per chunk | Less precise matching |
When to adjust:
  • Decrease for Q&A with short, specific answers
  • Increase for documents where context spans paragraphs

Chunk Overlap

Tokens shared between consecutive chunks (default: 50). Overlap ensures that information at chunk boundaries isn’t lost. A sentence at the end of one chunk appears at the start of the next.
| Overlap | Pros | Cons |
| --- | --- | --- |
| Small (0-25) | Less redundancy | May lose boundary context |
| Medium (50) | Balanced | Default choice |
| Large (100+) | Better continuity | More storage, slower search |
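To make the interaction between chunk size and overlap concrete, here is a minimal sketch of fixed-size chunking with overlap. It assumes the input is already a list of tokens (whitespace-separated words stand in for real tokenizer output), and the function name is illustrative rather than anything the product exposes.

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 512, overlap: int = 50) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


tokens = [str(i) for i in range(10)]
for chunk in chunk_tokens(tokens, chunk_size=4, overlap=1):
    print(chunk)
# ['0', '1', '2', '3']
# ['3', '4', '5', '6']
# ['6', '7', '8', '9']
```

Each chunk begins with the last token of the previous one, which is the boundary protection that overlap provides.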

Chunking Strategy

How text is split:
| Strategy | Description |
| --- | --- |
| Fixed | Split at token count (default) |
| Paragraph | Split at paragraph boundaries |
| Sentence | Split at sentence boundaries |
| Semantic | Split by meaning changes |
Start with the default fixed strategy. Switch to semantic chunking if you notice important concepts being split awkwardly.
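For contrast with fixed splitting, the sketch below shows one plausible way a paragraph strategy could work: whole paragraphs are packed into a chunk until the size budget would be exceeded. This is an assumption about the general technique, not a description of the product's implementation; words stand in for tokens, and a paragraph is taken to be text separated by a blank line.

```python
def chunk_by_paragraph(text: str, max_tokens: int = 512) -> list[str]:
    """Pack whole paragraphs into chunks without exceeding max_tokens.

    A single paragraph longer than max_tokens is kept whole in this
    sketch; a real splitter would need a fallback for that case.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for para in paragraphs:
        n = len(para.split())  # word count as a rough token count
        if current and current_len + n > max_tokens:
            chunks.append("\n\n".join(current))  # budget reached: flush
            current, current_len = [], 0
        current.append(para)
        current_len += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


doc = "a b c\n\nd e\n\nf g h i"
print(chunk_by_paragraph(doc, max_tokens=5))
# ['a b c\n\nd e', 'f g h i']
```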

Embedding Settings

Embedding Model

The model that converts text to vectors. Pick it carefully — it cannot be changed in place later. Consider:
  • Language - Some models specialize in specific languages
  • Domain - Specialized models for code, legal, medical, etc.
  • Size - Larger models are more accurate but slower
The embedding model and its dimensions are frozen at the moment a knowledge base is created. The Reindex button reapplies chunking and parsing, but it cannot swap the embedding model — the physical vector index is allocated for the original model’s dimensions and cannot be resized. To move to a different embedding model, follow the side-by-side migration in Changing the embedding model.

Embedding Dimensions

Higher dimensions capture more nuance but use more storage. Most models have a fixed dimension (e.g., 1536 for OpenAI’s text-embedding-ada-002). Like the model itself, dimensions are frozen at creation.
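The storage cost of dimensions is simple arithmetic. The estimate below assumes each vector value is stored as a 4-byte float and counts raw vector data only; index structures and metadata add overhead that depends on the vector store, so treat the result as a lower bound.

```python
def index_size_bytes(num_chunks: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw vector storage: one value per dimension per chunk (float32 assumed)."""
    return num_chunks * dimensions * bytes_per_value


size = index_size_bytes(num_chunks=100_000, dimensions=1536)
print(f"{size / 1024**2:.0f} MiB")  # 586 MiB
```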

Retrieval Settings

Top K

How many chunks to retrieve (default: 5).
| K Value | Pros | Cons |
| --- | --- | --- |
| Small (3-5) | Focused, fast | May miss relevant info |
| Medium (5-10) | Good coverage | Default choice |
| Large (10-20) | Comprehensive | May include irrelevant chunks |

Similarity Threshold

Minimum similarity score to include a chunk (0-1 scale).
| Threshold | Effect |
| --- | --- |
| Low (0.3) | More results, lower relevance |
| Medium (0.5) | Balanced |
| High (0.7) | Fewer results, higher relevance |
Set higher thresholds when precision matters more than recall.
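Top K and the similarity threshold act together as two filters on the scored candidates. The sketch below assumes the threshold is applied first and Top K second; the order the product uses is not stated here, and the function name and the 0.5 value are illustrative.

```python
def select_chunks(
    scored: list[tuple[str, float]], top_k: int = 5, threshold: float = 0.5
) -> list[tuple[str, float]]:
    """Keep chunks scoring at least `threshold`, then the best `top_k` of those."""
    kept = [(chunk, score) for chunk, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


scored = [("a", 0.82), ("b", 0.41), ("c", 0.67), ("d", 0.55)]
print(select_chunks(scored, top_k=2, threshold=0.5))
# [('a', 0.82), ('c', 0.67)]
```

Raising the threshold to 0.7 in this example would leave only chunk a, even though Top K allows two results.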

Reranking

Optional second pass to improve retrieval quality:
  1. Initial search retrieves more candidates (e.g., 20)
  2. A reranking model scores each candidate
  3. Only top results are used
Reranking improves quality but adds latency. Enable for use cases where accuracy is critical.
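The two-pass flow can be sketched as follows. The rerank_score callable stands in for a real reranking model, and keyword_overlap is a deliberately crude placeholder scorer; both names, along with fetch_k, are assumptions made for the example.

```python
from typing import Callable


def retrieve_with_rerank(
    query: str,
    candidates: list[tuple[str, float]],
    rerank_score: Callable[[str, str], float],
    fetch_k: int = 20,
    top_k: int = 5,
) -> list[str]:
    """Take fetch_k candidates by vector score, rescore them, return top_k texts."""
    # Pass 1: wide net using the cheap vector similarity score.
    first_pass = sorted(candidates, key=lambda p: p[1], reverse=True)[:fetch_k]
    # Pass 2: reorder by the (more expensive) reranker score.
    rescored = sorted(first_pass, key=lambda p: rerank_score(query, p[0]), reverse=True)
    return [text for text, _ in rescored[:top_k]]


def keyword_overlap(query: str, text: str) -> float:
    """Placeholder reranker: number of words shared by query and text."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))


candidates = [
    ("reset your password in settings", 0.62),
    ("password policy overview", 0.70),
    ("billing and invoices", 0.65),
]
print(retrieve_with_rerank("how do I reset my password", candidates,
                           keyword_overlap, fetch_k=3, top_k=2))
# ['reset your password in settings', 'password policy overview']
```

The chunk with the lowest vector score ends up first after reranking, which is the kind of correction the second pass is meant to provide.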

Knowledge Base Settings

Access these in the knowledge base’s Settings tab.

Configuring Chunking

  1. Open the knowledge base
  2. Go to Settings > Processing
  3. Adjust chunk size and overlap
  4. Click Save
Changing chunking settings only affects new documents. Click Reindex All to apply changes to existing documents.

Configuring Retrieval

  1. Open the knowledge base
  2. Go to Settings > Retrieval
  3. Adjust Top K, threshold, and reranking
  4. Click Save
Retrieval settings take effect immediately - no reindexing needed.

Configuring the Document Parser

The parser can be changed after the knowledge base is created — it is no longer locked to the preset chosen at creation time.
  1. Open the knowledge base
  2. Go to Settings > Processing
  3. Select the parser — use Tika + OCR (tika-ocr) for scanned documents or images that contain text
  4. Click Save, then Reindex All
Switching the parser only affects new documents until you click Reindex All to reprocess existing ones. OCR parsers (tika-ocr, unstructured-ocr) extract text from scans and images but are noticeably slower to index.
For files uploaded directly into an agent’s chat, the parser lives on the agent’s conversation store rather than a regular knowledge base. See Enabling OCR for chat-uploaded files.

Testing Retrieval

After changing settings, test the impact:
  1. Go to the knowledge base
  2. Use the Test Search feature (if available)
  3. Enter a query
  4. Review which chunks are retrieved
  5. Check if relevant content is included
Or test via an agent:
  1. Open an agent that uses this knowledge base
  2. Go to Playground
  3. Ask questions and observe tool calls
  4. Check if the retrieved context is relevant
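If you test the same queries after every settings change, a small script keeps the comparison consistent. The sketch below computes a hit rate: the share of test queries for which at least one retrieved chunk contains an expected phrase. The search callable is a placeholder for however you query your knowledge base; no specific product API is assumed.

```python
from typing import Callable


def hit_rate(cases: list[tuple[str, str]], search: Callable[[str], list[str]]) -> float:
    """Fraction of (query, expected_phrase) cases found in the retrieved chunks."""
    if not cases:
        return 0.0
    hits = sum(
        1 for query, expected in cases
        if any(expected in chunk for chunk in search(query))
    )
    return hits / len(cases)


# Stand-in for a real search call, so the example runs on its own.
fake_index = {
    "refund window": ["Refunds are issued within 14 days."],
    "office hours": ["Shipping takes 3 to 5 days."],
}
cases = [("refund window", "14 days"), ("office hours", "9am")]
print(hit_rate(cases, lambda q: fake_index.get(q, [])))  # 0.5
```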

Common Scenarios

FAQ-Style Content

Short questions with specific answers:
  • Chunk size: 256-512
  • Top K: 3-5
  • High similarity threshold (0.6+)

Long-Form Documents

Research papers, manuals, reports:
  • Chunk size: 512-1024
  • Overlap: 100+
  • Top K: 5-10

Technical Documentation

Code examples, API references:
  • Consider code-aware chunking
  • Higher overlap to preserve examples
  • Semantic chunking if available

Mixed Content

Various document types:
  • Start with defaults
  • Test with representative queries
  • Adjust based on results

Performance Considerations

Index Size

Smaller chunks and larger overlap both increase the number of chunks per document, so the index grows:
  • More storage used
  • Potentially slower search
  • Better for accuracy-critical use cases
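The effect on index size can be estimated before reindexing. The formula below assumes fixed-size chunking in which each chunk advances by chunk size minus overlap tokens; other strategies will produce different counts.

```python
import math


def estimate_chunks(doc_tokens: int, chunk_size: int = 512, overlap: int = 50) -> int:
    """Estimated chunk count for one document under fixed-size chunking."""
    if doc_tokens <= 0:
        return 0
    if doc_tokens <= chunk_size:
        return 1
    step = chunk_size - overlap  # tokens of new text per additional chunk
    return 1 + math.ceil((doc_tokens - chunk_size) / step)


print(estimate_chunks(100_000, chunk_size=512, overlap=50))   # 217
print(estimate_chunks(100_000, chunk_size=256, overlap=50))   # 486
print(estimate_chunks(100_000, chunk_size=256, overlap=100))  # 641
```

Halving the chunk size roughly doubles the chunk count, and raising the overlap increases it further.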

Query Latency

Factors affecting search speed:
  • Number of documents
  • Chunk count
  • Top K value
  • Reranking enabled
For large knowledge bases, weigh retrieval quality against speed: a higher Top K and reranking improve results but add latency to every query.

Cost

Consider:
  • Embedding API costs for indexing
  • Reranking costs if enabled
  • Token costs from larger context
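Two of these costs are easy to estimate up front. In the sketch below, the price argument is a placeholder you would replace with your provider's actual rate, and the context estimate assumes every retrieved chunk is full size; both function names are illustrative.

```python
def context_tokens_per_query(top_k: int = 5, chunk_size: int = 512) -> int:
    """Upper bound on retrieved-context tokens added to each model call."""
    return top_k * chunk_size


def embedding_cost(total_tokens: int, price_per_million_tokens: float) -> float:
    """One-time indexing cost; the price is an input, not a known rate."""
    return total_tokens / 1_000_000 * price_per_million_tokens


print(context_tokens_per_query(top_k=5, chunk_size=512))    # 2560
print(context_tokens_per_query(top_k=10, chunk_size=1024))  # 10240
```

Moving from the defaults to Top K 10 with 1024-token chunks quadruples the context sent with each query.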

Troubleshooting

“Agent doesn’t find relevant content”

  • Decrease similarity threshold
  • Increase Top K
  • Check if content is actually indexed
  • Verify the query matches document language/terminology

“Retrieved context seems irrelevant”

  • Increase similarity threshold
  • Enable reranking
  • Review chunk boundaries
  • Consider a different embedding model (this requires a migration; see Changing the embedding model)

“Important context is split across chunks”

  • Increase chunk size
  • Increase overlap
  • Try semantic chunking

“Too much irrelevant content in responses”

  • Decrease Top K
  • Increase similarity threshold
  • Enable reranking
  • Add more specific instructions to the agent prompt

Next Steps

Test with agents

See how settings affect real queries in the Playground

Advanced RAG

Learn about multi-query, hierarchical retrieval, and more