RAG (Retrieval Augmented Generation) settings control how your documents are processed and retrieved. Fine-tuning these settings can significantly improve the quality of agent responses.
Quick Start: Presets
When creating a knowledge base, choose a preset that matches your needs:
| Preset | Best For | Trade-off |
| --- | --- | --- |
| Fast | Quick setup, general content | Speed over precision |
| Balanced | Most use cases | Good balance of speed and quality |
| Quality | Complex documents, high accuracy needs | Slower processing, better results |
Each preset configures the parser, chunking strategy, and chunk sizes automatically. You can customize settings after creation if needed.
Start with Balanced for most use cases. Switch to Quality if you notice retrieval issues with complex documents like PDFs with tables or multi-column layouts.
Document Parsing
Before chunking, documents are parsed to extract text. The parser affects how well structure is preserved.
| Parser | Speed | Structure | Best For |
| --- | --- | --- | --- |
| Standard | Fast | Basic | Plain text, simple documents |
| Structured | Medium | Good | Documents with headings, lists |
| OCR-enabled | Slow | Good | Scanned documents, images with text |
| AI-powered | Slowest | Best | Complex layouts, tables, multi-column |
The preset you choose selects an appropriate parser, but you can override it in advanced settings.
Understanding RAG
When an agent uses a knowledge base:
Query - The user’s question is converted to an embedding
Search - Similar document chunks are retrieved
Context - Retrieved chunks are sent to the AI model
Response - The model generates an answer using the context
Each step can be configured to optimize for your use case.
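The sketch below traces these four steps in plain Python. It is purely illustrative: `embed` is a toy stand-in for a real embedding model, and the final model call is omitted.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy stand-in for a real embedding model: a normalized character-frequency vector."""
    vec = np.zeros(128)
    for ch in text.lower():
        vec[ord(ch) % 128] += 1
    return vec / (np.linalg.norm(vec) or 1.0)

def retrieve(question: str, chunks: list[str], top_k: int = 5) -> list[str]:
    q = embed(question)                               # 1. Query: embed the user's question
    sims = [float(embed(c) @ q) for c in chunks]      # 2. Search: score chunks by cosine similarity
    best = np.argsort(sims)[::-1][:top_k]
    return [chunks[i] for i in best]                  # 3. Context: top-scoring chunks go into the prompt

# 4. Response: the retrieved chunks are placed in the prompt so the model can
#    generate a grounded answer (the model call itself is omitted here).
```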
Chunking Settings
Chunking splits documents into smaller pieces for retrieval.
Chunk Size
How many tokens per chunk (default: 512).
| Size | Pros | Cons |
| --- | --- | --- |
| Small (256) | Precise retrieval | May split context |
| Medium (512) | Good balance | Default choice |
| Large (1024) | More context per chunk | Less precise matching |
When to adjust:
Decrease for Q&A with short, specific answers
Increase for documents where context spans paragraphs
Chunk Overlap
Tokens shared between consecutive chunks (default: 50).
Overlap ensures that information at chunk boundaries isn’t lost. A sentence at the end of one chunk appears at the start of the next.
| Overlap | Pros | Cons |
| --- | --- | --- |
| Small (0-25) | Less redundancy | May lose boundary context |
| Medium (50) | Balanced | Default choice |
| Large (100+) | Better continuity | More storage, slower search |
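As a concrete picture of how size and overlap interact, here is a minimal fixed-size chunker. Whitespace-separated words stand in for real model tokens, so the counts are approximate; this illustrates the mechanism rather than the platform's actual implementation.

```python
def chunk_tokens(tokens: list[str], size: int = 512, overlap: int = 50) -> list[list[str]]:
    """Fixed-size chunking: consecutive chunks share `overlap` tokens,
    so content at a boundary appears in both neighbouring chunks."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, step = [], size - overlap
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks

words = ("lorem ipsum " * 1000).split()                  # 2,000 toy "tokens"
print(len(chunk_tokens(words, size=512, overlap=50)))    # 5 chunks
```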
Chunking Strategy
How text is split:
| Strategy | Description |
| --- | --- |
| Fixed | Split at token count (default) |
| Paragraph | Split at paragraph boundaries |
| Sentence | Split at sentence boundaries |
| Semantic | Split by meaning changes |
Start with the default fixed strategy. Switch to semantic chunking if you notice important concepts being split awkwardly.
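For a rough sense of the alternatives: paragraph and sentence splitting can be approximated with simple regular expressions (real parsers use more robust boundary detection), while semantic chunking typically compares embeddings of adjacent passages and starts a new chunk where similarity drops.

```python
import re

def split_paragraphs(text: str) -> list[str]:
    # Paragraph strategy: break on blank lines
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def split_sentences(text: str) -> list[str]:
    # Sentence strategy: naive break after ., ! or ? followed by whitespace
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

sample = "Chunking matters. It affects retrieval!\n\nOverlap helps too."
print(split_paragraphs(sample))  # ['Chunking matters. It affects retrieval!', 'Overlap helps too.']
print(split_sentences(sample))   # ['Chunking matters.', 'It affects retrieval!', 'Overlap helps too.']
```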
Embedding Settings
Embedding Model
The model that converts text to vectors. Changing models requires reindexing all documents.
Consider:
Language - Some models specialize in specific languages
Domain - Specialized models for code, legal, medical, etc.
Size - Larger models are more accurate but slower
Embedding Dimensions
Higher dimensions capture more nuance but use more storage. Most models have a fixed dimension (e.g., 1536 for OpenAI embeddings).
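The reindexing requirement follows from how vectors are compared: similarity is computed between a query vector and the stored document vectors, so both must come from the same model. Vectors produced by different models, often with different dimensions, are not comparable. A small illustration with random vectors standing in for real embeddings:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

stored_doc_vec = np.random.rand(1536)   # indexed with model A (1536 dimensions)
new_query_vec = np.random.rand(768)     # queried with model B (768 dimensions)

try:
    cosine(stored_doc_vec, new_query_vec)
except ValueError:
    print("Incompatible vectors: re-embed (reindex) every document with the new model.")
```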
Retrieval Settings
Top K
How many chunks to retrieve (default: 5).
| K Value | Pros | Cons |
| --- | --- | --- |
| Small (3-5) | Focused, fast | May miss relevant info |
| Medium (5-10) | Good coverage | Default choice |
| Large (10-20) | Comprehensive | May include irrelevant chunks |
Similarity Threshold
Minimum similarity score to include a chunk (0-1 scale).
| Threshold | Effect |
| --- | --- |
| Low (0.3) | More results, lower relevance |
| Medium (0.5) | Balanced |
| High (0.7) | Fewer results, higher relevance |
Set higher thresholds when precision matters more than recall.
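Top K and the similarity threshold combine as a simple filter-then-truncate step over the scored chunks. A minimal sketch, with made-up chunk names and scores:

```python
def select_chunks(scored, top_k=5, threshold=0.5):
    """Keep chunks scoring at or above the threshold, then return the best top_k."""
    kept = [(chunk, score) for chunk, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in kept[:top_k]]

results = [("refund policy", 0.82), ("office hours", 0.44), ("return window", 0.61)]
print(select_chunks(results, top_k=2, threshold=0.5))
# ['refund policy', 'return window'] - 'office hours' falls below the threshold
```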
Reranking
Optional second pass to improve retrieval quality:
Initial search retrieves more candidates (e.g., 20)
A reranking model scores each candidate
Only top results are used
Reranking improves quality but adds latency. Enable for use cases where accuracy is critical.
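In code, reranking is a second scoring pass over a wider candidate set. The `score` argument below stands for a hypothetical reranking model (typically a cross-encoder that reads the query and the chunk together); the initial candidates can come from any vector search.

```python
def rerank(question: str, candidates: list[str], score, final_k: int = 5) -> list[str]:
    """Re-order candidates by a second, more expensive relevance score
    and keep only the best final_k."""
    return sorted(candidates, key=lambda chunk: score(question, chunk), reverse=True)[:final_k]

# Typical flow: cast a wide net first, then let the reranker pick the best few.
# candidates = vector_search(question, top_k=20)                           # initial search, 20 candidates
# context = rerank(question, candidates, score=reranker_score, final_k=5)  # hypothetical scorer
```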
Knowledge Base Settings
Access these in the knowledge base’s Settings tab.
Configuring Chunking
Open the knowledge base
Go to Settings > Processing
Adjust chunk size and overlap
Click Save
Changing chunking settings only affects new documents. Click Reindex All to apply changes to existing documents.
Configuring Retrieval
Open the knowledge base
Go to Settings > Retrieval
Adjust Top K, threshold, and reranking
Click Save
Retrieval settings take effect immediately - no reindexing needed.
Testing Retrieval
After changing settings, test the impact:
Go to the knowledge base
Use the Test Search feature (if available)
Enter a query
Review which chunks are retrieved
Check if relevant content is included
Or test via an agent:
Open an agent that uses this knowledge base
Go to Playground
Ask questions and observe tool calls
Check if the retrieved context is relevant
Common Scenarios
FAQ-Style Content
Short questions with specific answers:
Chunk size: 256-512
Top K: 3-5
High similarity threshold (0.6+)
Long-Form Documents
Research papers, manuals, reports:
Chunk size: 512-1024
Overlap: 100+
Top K: 5-10
Technical Documentation
Code examples, API references:
Consider code-aware chunking
Higher overlap to preserve examples
Semantic chunking if available
Mixed Content
Various document types:
Start with defaults
Test with representative queries
Adjust based on results
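For quick comparison, these scenarios can be written down as starting points. The values are picked from the ranges suggested above, and the dictionaries are purely illustrative, not Prisme.ai's configuration schema:

```python
# Illustrative starting points only - tune against your own test queries.
SCENARIO_STARTING_POINTS = {
    "faq":       {"chunk_size": 384, "overlap": 50,  "top_k": 4, "threshold": 0.6},
    "long_form": {"chunk_size": 768, "overlap": 100, "top_k": 8, "threshold": 0.5},
    "technical": {"chunk_size": 512, "overlap": 100, "top_k": 6, "threshold": 0.5},
    "mixed":     {"chunk_size": 512, "overlap": 50,  "top_k": 5, "threshold": 0.5},  # defaults
}
```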
Index Size
Smaller chunks with more overlap = larger index:
More storage used
Potentially slower search
Better for accuracy-critical use cases
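The effect is easy to estimate: with fixed-size chunking, the chunk count is roughly (total tokens - overlap) / (chunk size - overlap), rounded up. For example:

```python
import math

def chunk_count(total_tokens: int, size: int = 512, overlap: int = 50) -> int:
    """Approximate number of chunks produced by fixed-size chunking."""
    return max(1, math.ceil((total_tokens - overlap) / (size - overlap)))

corpus = 100_000  # tokens across all documents
print(chunk_count(corpus, size=512, overlap=50))    # ~217 chunks
print(chunk_count(corpus, size=256, overlap=100))   # ~641 chunks: smaller chunks + more overlap = larger index
```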
Query Latency
Factors affecting search speed:
Number of documents
Chunk count
Top K value
Reranking enabled
For large knowledge bases, balance quality vs. speed.
Cost
Consider:
Embedding API costs for indexing
Reranking costs if enabled
Token costs from larger context
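A back-of-the-envelope estimate for indexing: tokens embedded is roughly corpus tokens x (1 + overlap / chunk size), multiplied by the provider's price per token. The price below is a placeholder; check your embedding provider's actual rates.

```python
corpus_tokens = 100_000
chunk_size, overlap = 512, 50
price_per_million = 0.02  # placeholder USD price per 1M embedded tokens - not a real quote

embedded_tokens = corpus_tokens * (1 + overlap / chunk_size)  # overlap re-embeds ~10% of tokens
print(f"Indexing cost: ~${embedded_tokens / 1_000_000 * price_per_million:.4f}")
```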
Troubleshooting
“Agent doesn’t find relevant content”
Decrease similarity threshold
Increase Top K
Check if content is actually indexed
Verify the query matches document language/terminology
“Retrieved context seems irrelevant”
Increase similarity threshold
Enable reranking
Review chunk boundaries
Consider different embedding model
“Important context is split across chunks”
Increase chunk size
Increase overlap
Try semantic chunking
“Too much irrelevant content in responses”
Decrease Top K
Increase similarity threshold
Enable reranking
Add more specific instructions to the agent prompt
Next Steps
Test with agents: See how settings affect real queries in the Playground
Advanced RAG: Learn about multi-query, hierarchical retrieval, and more