RAG (Retrieval Augmented Generation) settings control how your documents are processed and retrieved. Fine-tuning these settings can significantly improve the quality of agent responses.

Quick Start: Presets

When creating a knowledge base, choose a preset that matches your needs:
| Preset | Best For | Trade-off |
| --- | --- | --- |
| Fast | Quick setup, general content | Speed over precision |
| Balanced | Most use cases | Good balance of speed and quality |
| Quality | Complex documents, high accuracy needs | Slower processing, better results |
Each preset configures the parser, chunking strategy, and chunk sizes automatically. You can customize settings after creation if needed.
Start with Balanced for most use cases. Switch to Quality if you notice retrieval issues with complex documents like PDFs with tables or multi-column layouts.

Document Parsing

Before chunking, documents are parsed to extract text. The parser affects how well structure is preserved.
| Parser | Speed | Structure | Best For |
| --- | --- | --- | --- |
| Standard | Fast | Basic | Plain text, simple documents |
| Structured | Medium | Good | Documents with headings, lists |
| OCR-enabled | Slow | Good | Scanned documents, images with text |
| AI-powered | Slowest | Best | Complex layouts, tables, multi-column |
The preset you choose selects an appropriate parser, but you can override it in advanced settings.

Understanding RAG

When an agent uses a knowledge base:
  1. Query - The user’s question is converted to an embedding
  2. Search - Similar document chunks are retrieved
  3. Context - Retrieved chunks are sent to the AI model
  4. Response - The model generates an answer using the context
Each step can be configured to optimize for your use case.
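The four steps above can be sketched in a few lines of Python. The bag-of-words `embed` and dot-product `similarity` below are toy stand-ins for a real embedding model and vector index, just to make the flow concrete:

```python
def embed(text: str) -> dict[str, int]:
    # Step 1: Query -> embedding (toy: word counts stand in for a vector)
    words = text.lower().split()
    return {w: words.count(w) for w in words}

def similarity(a: dict[str, int], b: dict[str, int]) -> float:
    # Dot product of the two sparse "vectors"
    return float(sum(a[w] * b.get(w, 0) for w in a))

def retrieve(query: str, chunks: list[str], top_k: int = 5) -> list[str]:
    # Steps 2-3: search the chunks and return the most similar as context
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: similarity(q, embed(c)), reverse=True)
    return ranked[:top_k]

chunks = ["billing is monthly", "support is available 24/7", "the sky is blue"]
context = retrieve("when is billing", chunks, top_k=1)
# Step 4 would send `context` plus the user's question to the model.
```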

Chunking Settings

Chunking splits documents into smaller pieces for retrieval.

Chunk Size

How many tokens per chunk (default: 512).
| Size | Pros | Cons |
| --- | --- | --- |
| Small (256) | Precise retrieval | May split context |
| Medium (512) | Good balance | Default choice |
| Large (1024) | More context per chunk | Less precise matching |
When to adjust:
  • Decrease for Q&A with short, specific answers
  • Increase for documents where context spans paragraphs

Chunk Overlap

Tokens shared between consecutive chunks (default: 50). Overlap ensures that information at chunk boundaries isn’t lost. A sentence at the end of one chunk appears at the start of the next.
| Overlap | Pros | Cons |
| --- | --- | --- |
| Small (0-25) | Less redundancy | May lose boundary context |
| Medium (50) | Balanced | Default choice |
| Large (100+) | Better continuity | More storage, slower search |
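The interaction of chunk size and overlap can be sketched as a sliding window: each chunk starts `size - overlap` tokens after the previous one, so consecutive chunks share their boundary tokens. This is a minimal sketch, not the platform's actual chunker:

```python
def chunk(tokens: list[str], size: int = 512, overlap: int = 50) -> list[list[str]]:
    # Slide a window of `size` tokens, stepping by `size - overlap` so each
    # chunk shares its first `overlap` tokens with the end of the previous one.
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, len(tokens), step)]

tokens = [f"t{i}" for i in range(1000)]
chunks = chunk(tokens, size=512, overlap=50)
# The last window may be shorter than `size`; fine for a sketch.
```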

Chunking Strategy

How text is split:
| Strategy | Description |
| --- | --- |
| Fixed | Split at token count (default) |
| Paragraph | Split at paragraph boundaries |
| Sentence | Split at sentence boundaries |
| Semantic | Split by meaning changes |
Start with the default fixed strategy. Switch to semantic chunking if you notice important concepts being split awkwardly.
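To see how a boundary-aware strategy differs from fixed splitting, here is a rough sketch of sentence-based chunking: whole sentences are packed into chunks without crossing a token budget (words stand in for tokens), so a sentence is never cut in half:

```python
import re

def sentence_chunks(text: str, max_tokens: int = 512) -> list[str]:
    # Split on sentence-ending punctuation, then pack whole sentences
    # into chunks without exceeding the budget.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for s in sentences:
        n = len(s.split())
        if current and count + n > max_tokens:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(s)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```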

Embedding Settings

Embedding Model

The model that converts text to vectors. Changing models requires reindexing all documents. Consider:
  • Language - Some models specialize in specific languages
  • Domain - Specialized models for code, legal, medical, etc.
  • Size - Larger models are more accurate but slower

Embedding Dimensions

Higher dimensions capture more nuance but use more storage. Most models have a fixed dimension (e.g., 1536 for OpenAI embeddings).
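Regardless of dimension, retrieval typically compares embeddings by cosine similarity, which measures the angle between vectors rather than raw distance. A minimal sketch (3 dimensions for readability; real models output fixed-size vectors such as 1536 floats):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine similarity = dot product of the vectors divided by the
    # product of their lengths; 1.0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 1.0, 0.0]))  # ≈ 0.707
```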

Retrieval Settings

Top K

How many chunks to retrieve (default: 5).
| K Value | Pros | Cons |
| --- | --- | --- |
| Small (3-5) | Focused, fast | May miss relevant info |
| Medium (5-10) | Good coverage | Default choice |
| Large (10-20) | Comprehensive | May include irrelevant chunks |

Similarity Threshold

Minimum similarity score to include a chunk (0-1 scale).
| Threshold | Effect |
| --- | --- |
| Low (0.3) | More results, lower relevance |
| Medium (0.5) | Balanced |
| High (0.7) | Fewer results, higher relevance |
Set higher thresholds when precision matters more than recall.
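Top K and the similarity threshold combine as a filter-then-truncate step: chunks below the threshold are dropped first, then the best K of the remainder are kept. A sketch of that logic (scores assumed to already come from vector search):

```python
def select_chunks(scored: list[tuple[str, float]], top_k: int = 5,
                  threshold: float = 0.5) -> list[str]:
    # Keep only chunks at or above the threshold, then take the best K.
    passing = [(c, s) for c, s in scored if s >= threshold]
    passing.sort(key=lambda pair: pair[1], reverse=True)
    return [c for c, _ in passing[:top_k]]

scored = [("a", 0.82), ("b", 0.41), ("c", 0.66), ("d", 0.58)]
select_chunks(scored, top_k=2, threshold=0.5)  # -> ["a", "c"]
```

Note that a high threshold can return fewer than K chunks; that is the precision-over-recall trade described above.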

Reranking

Optional second pass to improve retrieval quality:
  1. Initial search retrieves more candidates (e.g., 20)
  2. A reranking model scores each candidate
  3. Only top results are used
Reranking improves quality but adds latency. Enable for use cases where accuracy is critical.
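The two-pass shape can be sketched as follows; `word_overlap` is a toy stand-in for a real reranking model (e.g. a cross-encoder), which is slower per candidate but more accurate than the initial vector search:

```python
def rerank(query: str, candidates: list[str], score, final_k: int = 5) -> list[str]:
    # Second pass: score every candidate with the (expensive) reranker,
    # then keep only the top results for the model's context.
    rescored = sorted(candidates, key=lambda c: score(query, c), reverse=True)
    return rescored[:final_k]

def word_overlap(query: str, chunk: str) -> int:
    # Toy scorer: how many query words appear in the candidate chunk.
    return len(set(query.lower().split()) & set(chunk.lower().split()))

candidates = ["refund policy details", "shipping times", "refund request form"]
rerank("how do I request a refund", candidates, word_overlap, final_k=2)
# -> ["refund request form", "refund policy details"]
```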

Knowledge Base Settings

Access these in the knowledge base’s Settings tab.

Configuring Chunking

  1. Open the knowledge base
  2. Go to Settings > Processing
  3. Adjust chunk size and overlap
  4. Click Save
Changing chunking settings only affects new documents. Click Reindex All to apply changes to existing documents.

Configuring Retrieval

  1. Open the knowledge base
  2. Go to Settings > Retrieval
  3. Adjust Top K, threshold, and reranking
  4. Click Save
Retrieval settings take effect immediately; no reindexing is needed.

Testing Retrieval

After changing settings, test the impact:
  1. Go to the knowledge base
  2. Use the Test Search feature (if available)
  3. Enter a query
  4. Review which chunks are retrieved
  5. Check if relevant content is included
Or test via an agent:
  1. Open an agent that uses this knowledge base
  2. Go to Playground
  3. Ask questions and observe tool calls
  4. Check if the retrieved context is relevant

Common Scenarios

FAQ-Style Content

Short questions with specific answers:
  • Chunk size: 256-512
  • Top K: 3-5
  • High similarity threshold (0.6+)

Long-Form Documents

Research papers, manuals, reports:
  • Chunk size: 512-1024
  • Overlap: 100+
  • Top K: 5-10

Technical Documentation

Code examples, API references:
  • Consider code-aware chunking
  • Higher overlap to preserve examples
  • Semantic chunking if available

Mixed Content

Various document types:
  • Start with defaults
  • Test with representative queries
  • Adjust based on results
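The four scenarios above can be summarized as starting-point configurations. The dictionary below is illustrative only: the key names are not Prisme.ai's actual API, and the values are picked from the ranges recommended above:

```python
# Hypothetical starting points per scenario; tune against real queries.
SCENARIO_PRESETS = {
    "faq":       {"chunk_size": 384,  "overlap": 50,  "top_k": 4, "threshold": 0.6},
    "long_form": {"chunk_size": 1024, "overlap": 100, "top_k": 8, "threshold": 0.5},
    "technical": {"chunk_size": 512,  "overlap": 100, "top_k": 5, "threshold": 0.5},
    "mixed":     {"chunk_size": 512,  "overlap": 50,  "top_k": 5, "threshold": 0.5},
}
```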

Performance Considerations

Index Size

Smaller chunks with more overlap = larger index:
  • More storage used
  • Potentially slower search
  • Better for accuracy-critical use cases
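The effect is easy to quantify: with sliding-window chunking, each chunk advances by `chunk_size - overlap` tokens, so shrinking chunks or growing overlap both multiply the chunk count. A quick back-of-the-envelope estimator:

```python
def estimated_chunks(total_tokens: int, chunk_size: int, overlap: int) -> int:
    # Each new chunk advances by (chunk_size - overlap) tokens, so smaller
    # chunks or more overlap both increase chunk count (and index size).
    stride = chunk_size - overlap
    return -(-total_tokens // stride)  # ceiling division

estimated_chunks(100_000, 512, 50)   # 217 chunks
estimated_chunks(100_000, 256, 100)  # 642 chunks
```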

Query Latency

Factors affecting search speed:
  • Number of documents
  • Chunk count
  • Top K value
  • Reranking enabled
For large knowledge bases, balance quality vs. speed.

Cost

Consider:
  • Embedding API costs for indexing
  • Reranking costs if enabled
  • Token costs from larger context

Troubleshooting

“Agent doesn’t find relevant content”

  • Decrease similarity threshold
  • Increase Top K
  • Check if content is actually indexed
  • Verify the query matches document language/terminology

“Retrieved context seems irrelevant”

  • Increase similarity threshold
  • Enable reranking
  • Review chunk boundaries
  • Consider different embedding model

“Important context is split across chunks”

  • Increase chunk size
  • Increase overlap
  • Try semantic chunking

“Too much irrelevant content in responses”

  • Decrease Top K
  • Increase similarity threshold
  • Enable reranking
  • More specific instructions in agent prompt

Next Steps

Test with agents

See how settings affect real queries in the Playground

Advanced RAG

Learn about multi-query, hierarchical retrieval, and more