RAG Settings - Prisme.ai

RAG (Retrieval Augmented Generation) settings control how your documents are processed and retrieved. Fine-tuning these settings can significantly improve the quality of agent responses.

Quick Start: Presets

When creating a knowledge base, choose a preset that matches your needs:

Preset	Best For	Trade-off
Fast	Quick setup, general content	Speed over precision
Balanced	Most use cases	Good balance of speed and quality
Quality	Complex documents, high accuracy needs	Slower processing, better results

Each preset configures the parser, chunking strategy, and chunk sizes automatically. You can customize settings after creation if needed.

Start with Balanced for most use cases. Switch to Quality if you notice retrieval issues with complex documents like PDFs with tables or multi-column layouts.

Document Parsing

Before chunking, documents are parsed to extract text. The parser affects how well structure is preserved.

Parser	Speed	Structure	Best For
Standard	Fast	Basic	Plain text, simple documents
Structured	Medium	Good	Documents with headings, lists
OCR-enabled	Slow	Good	Scanned documents, images with text
AI-powered	Slowest	Best	Complex layouts, tables, multi-column

The preset you choose selects an appropriate parser, but you can override it in advanced settings.
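The table can be read as a simple decision rule. The sketch below illustrates it under stated assumptions: `DocumentTraits` and `choose_parser` are hypothetical names, not a Prisme.ai API, and the returned strings are just the parser labels from the table. Scanned input is checked first because no other parser can extract text from images.

```python
from dataclasses import dataclass

@dataclass
class DocumentTraits:
    is_scanned: bool = False             # scanned pages or images containing text
    has_complex_layout: bool = False     # tables, multi-column pages
    has_headings_or_lists: bool = False  # structure worth preserving

def choose_parser(doc: DocumentTraits) -> str:
    """Hypothetical rule mapping document traits to the 'Best For' column above."""
    if doc.is_scanned:
        return "OCR-enabled"   # text must be recognized from images first
    if doc.has_complex_layout:
        return "AI-powered"    # slowest, best structure preservation
    if doc.has_headings_or_lists:
        return "Structured"
    return "Standard"          # plain text, simple documents
```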

Understanding RAG

When an agent uses a knowledge base:

Query - The user’s question is converted to an embedding
Search - Similar document chunks are retrieved
Context - Retrieved chunks are sent to the AI model
Response - The model generates an answer using the context

Each step can be configured to optimize for your use case.
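The four steps can be sketched end to end in a few lines. This is a toy: it assumes a word-count “embedding” and cosine similarity in place of a real embedding model and vector store, and `embed`, `retrieve`, and `build_prompt` are hypothetical helpers, not part of Prisme.ai.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real system calls an embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, chunks: list[str], top_k: int = 5) -> list[str]:
    q = embed(question)                                     # 1. Query -> embedding
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]                                   # 2. Search -> most similar chunks

def build_prompt(question: str, context: list[str]) -> str:
    joined = "\n---\n".join(context)                        # 3. Context for the model
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"

# 4. Response: the prompt would be sent to the AI model (not shown).
chunks = ["Refunds are issued within 14 days.", "Our office is in Paris.", "Shipping takes 3 days."]
print(retrieve("How many days for refunds?", chunks, top_k=1))
```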

Chunking Settings

Chunking splits documents into smaller pieces for retrieval.

Chunk Size

How many tokens per chunk (default: 512).

Size	Pros	Cons
Small (256)	Precise retrieval	May split context
Medium (512)	Good balance (default)	Compromise between precision and context
Large (1024)	More context per chunk	Less precise matching

When to adjust:

Decrease for Q&A with short, specific answers
Increase for documents where context spans paragraphs

Chunk Overlap

Tokens shared between consecutive chunks (default: 50). Overlap ensures that information at chunk boundaries isn’t lost: a sentence at the end of one chunk also appears at the start of the next.

Overlap	Pros	Cons
Small (0-25)	Less redundancy	May lose boundary context
Medium (50)	Balanced (default)	Compromise between storage and continuity
Large (100+)	Better continuity	More storage, slower search
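Chunk size and overlap interact as a sliding window. A minimal sketch, assuming the document is already split into tokens (plain words stand in for model tokens here); `chunk_tokens` is a hypothetical helper, since Prisme.ai performs chunking for you.

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 512, overlap: int = 50) -> list[list[str]]:
    """Fixed-size chunking: each chunk starts `chunk_size - overlap` tokens after the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this chunk already reaches the end of the document
    return chunks

words = [f"w{i}" for i in range(10)]
for chunk in chunk_tokens(words, chunk_size=4, overlap=1):
    print(chunk)
```

With `chunk_size=4` and `overlap=1`, the 10-word document yields three chunks, and the last word of each chunk repeats as the first word of the next.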

Chunking Strategy

How text is split:

Strategy	Description
Fixed	Split every N tokens, where N is the chunk size (default)
Paragraph	Split at paragraph boundaries
Sentence	Split at sentence boundaries
Semantic	Split where the topic or meaning shifts

Start with the default fixed strategy. Switch to semantic chunking if you notice important concepts being split awkwardly.
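The paragraph and sentence strategies can be approximated by splitting on natural boundaries and then packing whole units into chunks. A rough sketch, assuming word counts as a stand-in for tokens and a naive sentence regex; the helper names are hypothetical. Semantic chunking typically also needs embeddings to detect where meaning shifts, so it is not shown.

```python
import re

def split_paragraphs(text: str) -> list[str]:
    """Paragraph strategy: split on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def split_sentences(text: str) -> list[str]:
    """Sentence strategy: naive split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def pack(units: list[str], max_words: int) -> list[str]:
    """Group whole units into chunks of at most max_words words.
    A single unit longer than max_words becomes its own oversized chunk."""
    chunks, current, count = [], [], 0
    for unit in units:
        n = len(unit.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(unit)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

text = "First sentence here. Second one! Third?\n\nNew paragraph."
print(split_paragraphs(text))
print(pack(split_sentences(split_paragraphs(text)[0]), max_words=4))
```

Unlike the fixed strategy, a boundary never falls mid-sentence; the trade-off is that chunk sizes vary.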

Embedding Settings

Embedding Model

The model that converts text to vectors. Changing models requires reindexing all documents. Consider:

Language - Some models specialize in specific languages
Domain - Specialized models for code, legal, medical, etc.
Size - Larger models are usually more accurate but slower

Embedding Dimensions

Higher dimensions capture more nuance but use more storage. Most models output a fixed number of dimensions (e.g., 1536 for OpenAI’s text-embedding-ada-002 and text-embedding-3-small).
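To see how dimensions translate into storage, here is a back-of-the-envelope calculation. It assumes each dimension is stored as a 4-byte float32 and ignores index structures and metadata, so real usage will be higher; `vector_storage_bytes` is a hypothetical helper.

```python
def vector_storage_bytes(num_chunks: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw vector size: one float32 (4 bytes) per dimension, no index overhead or metadata."""
    return num_chunks * dimensions * bytes_per_value

print(vector_storage_bytes(1, 1536))       # 6144 bytes per vector
print(vector_storage_bytes(10_000, 1536))  # 61440000 bytes, about 61 MB
print(vector_storage_bytes(10_000, 3072))  # 122880000 bytes, about 123 MB
```

Doubling the dimensions doubles raw vector storage for the same number of chunks.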

Retrieval Settings

Top K

How many chunks to retrieve (default: 5).

K Value	Pros	Cons
Small (3-5)	Focused, fast	May miss relevant info
Medium (5-10)	Good coverage (default: 5)	Compromise between focus and coverage
Large (10-20)	Comprehensive	May include irrelevant chunks

Similarity Threshold

Minimum similarity score to include a chunk (0-1 scale).

Threshold	Effect
Low (0.3)	More results, lower relevance
Medium (0.5)	Balanced
High (0.7)	Fewer results, higher relevance

Set higher thresholds when precision matters more than recall.
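Top K and the similarity threshold combine as a filter followed by a cutoff. A minimal sketch, assuming each chunk already has a similarity score; `select_chunks` is hypothetical and the scores are made up for illustration.

```python
def select_chunks(scored: list[tuple[str, float]], threshold: float, top_k: int = 5) -> list[tuple[str, float]]:
    """Drop chunks scoring below the threshold, then keep the top_k best."""
    kept = [(chunk, score) for chunk, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]

scored = [("a", 0.82), ("b", 0.55), ("c", 0.41), ("d", 0.74)]
print(select_chunks(scored, threshold=0.3))           # [('a', 0.82), ('d', 0.74), ('b', 0.55), ('c', 0.41)]
print(select_chunks(scored, threshold=0.7))           # [('a', 0.82), ('d', 0.74)]
print(select_chunks(scored, threshold=0.3, top_k=2))  # [('a', 0.82), ('d', 0.74)]
```

A high threshold can return fewer than Top K chunks, or none at all, which is why raising it trades recall for precision.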

Reranking

Optional second pass to improve retrieval quality:

Initial search retrieves more candidates (e.g., 20)
A reranking model scores each candidate
Only top results are used

Reranking improves quality but adds latency. Enable for use cases where accuracy is critical.
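The two-pass flow can be sketched with two scoring functions passed in as parameters. This assumes `first_pass` stands for the fast vector similarity and `reranker` for the slower reranking model; both parameter names and `retrieve_with_rerank` itself are hypothetical.

```python
from typing import Callable

Scorer = Callable[[str, str], float]  # (query, chunk) -> relevance score

def retrieve_with_rerank(query: str, chunks: list[str], first_pass: Scorer,
                         reranker: Scorer, candidates: int = 20, top_n: int = 5) -> list[str]:
    # 1. Initial search: take a wider pool of candidates using the fast score.
    pool = sorted(chunks, key=lambda c: first_pass(query, c), reverse=True)[:candidates]
    # 2. Rerank: rescore only the pool with the slower, more accurate model.
    reranked = sorted(pool, key=lambda c: reranker(query, c), reverse=True)
    # 3. Keep only the top results.
    return reranked[:top_n]
```

Because the reranker only scores the `candidates` pool, its extra latency grows with that pool size rather than with the size of the knowledge base.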

Knowledge Base Settings

Access these in the knowledge base’s Settings tab.

Configuring Chunking

Open the knowledge base
Go to Settings > Processing
Adjust chunk size and overlap
Click Save

Changing chunking settings only affects new documents. Click Reindex All to apply changes to existing documents.

Configuring Retrieval

Open the knowledge base
Go to Settings > Retrieval
Adjust Top K, threshold, and reranking
Click Save

Retrieval settings take effect immediately; no reindexing is needed.
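One way to picture the difference: chunking and embedding settings are baked into the index when documents are processed, while retrieval settings are read on every query. A hypothetical sketch; these dataclasses are not Prisme.ai types, `embedding-model-a` is a placeholder name, and the 0.5 threshold is the "Medium" value from the table above, not a documented default.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IndexSettings:
    """Applied when documents are processed; baked into stored chunks and vectors."""
    chunk_size: int = 512
    chunk_overlap: int = 50
    strategy: str = "fixed"
    embedding_model: str = "embedding-model-a"  # hypothetical placeholder name

@dataclass(frozen=True)
class RetrievalSettings:
    """Read on every query; changing these never touches the index."""
    top_k: int = 5
    similarity_threshold: float = 0.5  # 'Medium' value from the table, not a documented default
    rerank: bool = False

def needs_reindex(indexed_with: IndexSettings, current: IndexSettings) -> bool:
    """Documents processed with different index-time settings are stale until reindexed."""
    return indexed_with != current
```

Note that `needs_reindex` never looks at `RetrievalSettings`: that is the whole distinction.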

Testing Retrieval

After changing settings, test the impact:

Go to the knowledge base
Use the Test Search feature (if available)
Enter a query
Review which chunks are retrieved
Check if relevant content is included

Or test via an agent:

Open an agent that uses this knowledge base
Go to Playground
Ask questions and observe tool calls
Check if the retrieved context is relevant
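If you keep a short list of test queries and the chunks you expect each one to find, you can track a simple hit rate as you change settings. A minimal sketch, assuming you record retrieved chunk IDs by hand from Test Search or the Playground tool calls; `hit_rate` and the IDs below are hypothetical.

```python
def hit_rate(retrieved: dict[str, list[str]], expected: dict[str, set[str]]) -> float:
    """Fraction of test queries where at least one expected chunk ID was retrieved."""
    if not expected:
        raise ValueError("need at least one test query")
    hits = sum(1 for query, wanted in expected.items() if wanted & set(retrieved.get(query, [])))
    return hits / len(expected)

# Chunk IDs noted from Test Search or Playground tool calls (made-up examples).
retrieved = {"refund window": ["c12", "c40"], "office address": ["c7"]}
expected = {"refund window": {"c40"}, "office address": {"c3"}}
print(hit_rate(retrieved, expected))  # 0.5
```

Rerun the same queries after each settings change so the numbers stay comparable.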

Common Scenarios

FAQ-Style Content

Short questions with specific answers:

Chunk size: 256-512
Top K: 3-5
Similarity threshold: 0.6+

Long-Form Documents

Research papers, manuals, reports:

Chunk size: 512-1024
Overlap: 100+
Top K: 5-10

Technical Documentation

Code examples, API references:

Consider code-aware chunking
Higher overlap to preserve examples
Semantic chunking if available

Mixed Content

Various document types:

Start with defaults
Test with representative queries
Adjust based on results
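The scenarios can be written down as starting-point overrides on top of the defaults. The specific numbers are picks from within the ranges listed; for technical documentation, an overlap of 100 is an assumption standing in for "higher overlap". The dictionary keys and `settings_for` are hypothetical, not Prisme.ai configuration keys.

```python
# Defaults stated on this page: chunk size 512, overlap 50, fixed strategy, Top K 5.
BASE = {"chunk_size": 512, "chunk_overlap": 50, "strategy": "fixed", "top_k": 5}

SCENARIOS = {
    "faq": {"chunk_size": 256, "top_k": 3, "similarity_threshold": 0.6},
    "long_form": {"chunk_size": 1024, "chunk_overlap": 100, "top_k": 8},
    "technical": {"strategy": "semantic", "chunk_overlap": 100},  # overlap value is an assumption
    "mixed": {},  # start with defaults, then adjust from test results
}

def settings_for(scenario: str, base: dict = BASE) -> dict:
    """Overlay a scenario's overrides on the base defaults (base is not modified)."""
    return {**base, **SCENARIOS[scenario]}

print(settings_for("faq"))
```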

Performance Considerations

Index Size

Smaller chunks and more overlap produce a larger index:

More storage used
Potentially slower search
Better for accuracy-critical use cases
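The effect on index size can be estimated by counting chunks. With fixed-size chunking, each new chunk starts `chunk_size - overlap` tokens after the previous one. A sketch, assuming a single 100,000-token document; `chunk_count` is a hypothetical helper.

```python
import math

def chunk_count(doc_tokens: int, chunk_size: int, overlap: int) -> int:
    """Number of chunks from fixed-size chunking with overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    if doc_tokens <= 0:
        return 0
    if doc_tokens <= chunk_size:
        return 1
    step = chunk_size - overlap  # each chunk starts this many tokens after the previous one
    return math.ceil((doc_tokens - overlap) / step)

for size, overlap in [(1024, 100), (512, 50), (256, 100)]:
    print(size, overlap, chunk_count(100_000, size, overlap))
# 1024 100 109
# 512 50 217
# 256 100 641
```

Going from 512/50 to 256/100 roughly triples the chunk count, and with it the number of stored vectors.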

Query Latency

Factors affecting search speed:

Number of documents
Chunk count
Top K value
Reranking enabled

For large knowledge bases, balance quality vs. speed.

Cost

Consider:

Embedding API costs for indexing
Reranking costs if enabled
Token costs from larger context
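A rough way to compare configurations on cost is to count tokens. This sketch assumes every chunk is full-length, so the results are upper bounds, and takes the price per million tokens as a parameter because pricing depends on your provider; all function names are hypothetical.

```python
def context_tokens_per_query(top_k: int, chunk_size: int) -> int:
    """Upper bound on retrieved-context tokens added to each prompt."""
    return top_k * chunk_size

def embedding_tokens(num_chunks: int, chunk_size: int) -> int:
    """Upper bound on tokens sent for embedding; overlapping tokens are embedded more than once."""
    return num_chunks * chunk_size

def cost(tokens: int, price_per_million_tokens: float) -> float:
    """Cost for a token count at a given price (the price is an input, not a real rate)."""
    return tokens / 1_000_000 * price_per_million_tokens

print(context_tokens_per_query(5, 512))    # 2560
print(context_tokens_per_query(10, 1024))  # 10240
print(embedding_tokens(217, 512))          # 111104 (217 chunks of 512 tokens)
```

Moving from Top K 5 with 512-token chunks to Top K 10 with 1024-token chunks quadruples the context sent with every query.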

Troubleshooting

“Agent doesn’t find relevant content”

Decrease similarity threshold
Increase Top K
Check if content is actually indexed
Verify the query matches document language/terminology

“Retrieved context seems irrelevant”

Increase similarity threshold
Enable reranking
Review chunk boundaries
Consider a different embedding model

“Important context is split across chunks”

Increase chunk size
Increase overlap
Try semantic chunking

“Too much irrelevant content in responses”

Decrease Top K
Increase similarity threshold
Enable reranking
Add more specific instructions to the agent prompt

Next Steps

Test with agents

See how settings affect real queries in the Playground

Advanced RAG

Learn about multi-query, hierarchical retrieval, and more
