Documents are the foundation of your knowledge bases. This guide covers how to add, organize, and maintain your document collections.

Uploading Files

Drag and Drop

The simplest way to add files:
  1. Open a knowledge base
  2. Drag files from your computer onto the page
  3. Drop them in the upload zone
  4. Wait for processing

File Picker

Alternatively:
  1. Click Upload Files
  2. Select files from your computer
  3. Click Open

Bulk Upload

For many files:
  • Drag a folder (supported in Chrome and Edge)
  • Select multiple files in the picker
  • Files are processed in parallel

Supported File Types

Category | Formats | Notes
Documents | PDF, DOCX, DOC, TXT, RTF | Text is extracted automatically
Presentations | PPTX, PPT | Slide content and notes
Spreadsheets | XLSX, XLS, CSV | Cell values and headers
Web | HTML, Markdown | Rendered content
Code | Most languages | Syntax-aware chunking
Maximum file size depends on your organization’s configuration. Typical limits are 50-100 MB per file.
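The format and size rules above can be checked locally before uploading. The following is a minimal sketch, not part of the product: the extension list is an assumed mapping of the table's formats to common file extensions (the Code row is left out because the guide does not enumerate languages), and `MAX_BYTES` is a hypothetical limit set to the low end of the typical range.

```python
from pathlib import Path

# Assumed mapping of the table's formats to file extensions.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".txt", ".rtf",  # Documents
    ".pptx", ".ppt",                          # Presentations
    ".xlsx", ".xls", ".csv",                  # Spreadsheets
    ".html", ".md",                           # Web
}

# Hypothetical limit: 50 MB. Check your organization's actual configuration.
MAX_BYTES = 50 * 1024 * 1024


def precheck(name: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    if Path(name).suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append("unsupported format")
    if size_bytes > MAX_BYTES:
        problems.append("too large")
    return problems
```

Running a check like this over a folder before a bulk upload surfaces likely "Unsupported format" and "Too large" errors early.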

Images and Scans

For documents that are scanned images or contain images with text:
  • OCR is applied automatically to extract text
  • Quality depends on image clarity
  • Consider re-scanning poor quality documents

Adding Web Content

Single URLs

To add an individual web page:
  1. Click Add URL
  2. Enter the full URL (including https://)
  3. Click Add
The page is fetched immediately and its content indexed.

Web Crawling

For multiple pages from a website:
  1. Click Add Web Source
  2. Enter the starting URL
  3. Configure basic settings such as path filters, blacklisted patterns, sitemap mode, and XPath filtering
  4. Click Start Crawling
The crawler discovers pages by following links and indexes their content. For the detailed workflow and current crawl settings, see Crawl a Website.

Advanced Crawler Settings

For more control, expand Hostname settings to configure:
  • Path filters - Only crawl specific sections (e.g., /docs/)
  • Blacklisted patterns - Skip low-value or duplicate URL patterns
  • Robots.txt - Respect the site’s crawling rules (enabled by default)
  • Sitemap mode - Only follow links from the sitemap
  • XPath filter - Extract only the useful page content
  • HTTP headers - Send custom headers to controlled internal sites
These settings can be configured per hostname if your source spans multiple domains.
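To illustrate how path filters and blacklisted patterns interact, here is a minimal sketch. It assumes path filters are matched as path prefixes and blacklisted patterns as regular expressions against the full URL. The guide does not specify the crawler's actual matching rules, so `should_crawl` and its parameters are hypothetical.

```python
import re
from urllib.parse import urlparse


def should_crawl(url: str, path_prefixes: list[str], blacklist_patterns: list[str]) -> bool:
    """Decide whether a discovered URL passes the filters (assumed semantics)."""
    path = urlparse(url).path or "/"
    # Path filters: if any are set, the URL path must start with one of them.
    if path_prefixes and not any(path.startswith(prefix) for prefix in path_prefixes):
        return False
    # Blacklisted patterns: reject the URL if any pattern matches it.
    return not any(re.search(pattern, url) for pattern in blacklist_patterns)
```

For example, with the path filter `/docs/` and the blacklisted pattern `\?page=`, a blog post and a paginated docs listing are both skipped while a regular docs page is kept.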

Crawl Status

While crawling, a status banner shows:
  • Pages discovered, indexed, and skipped
  • Any errors encountered
  • Estimated completion
You can pause a crawl in progress and resume later.

Automatic Recrawling

Keep content fresh with scheduled recrawls:
  1. Open the web source settings
  2. Set Recrawl Schedule:
    • Manual only
    • Every 12 hours
    • Daily
    • Weekly
    • Monthly
  3. Save
The crawler checks for new and updated pages on schedule. Unchanged pages are skipped to save processing time.
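The guide does not say how unchanged pages are detected. One common approach, shown here purely as an assumption, is to compare a hash of each page's content with the hash stored at the previous crawl. The function names are hypothetical.

```python
import hashlib


def content_fingerprint(text: str) -> str:
    """Hash the page text so two versions can be compared cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def pages_to_reindex(previous: dict[str, str], fetched: dict[str, str]) -> list[str]:
    """Return URLs that are new or whose content changed.

    previous maps URL -> fingerprint from the last crawl;
    fetched maps URL -> page text from the current crawl.
    """
    changed = []
    for url, text in fetched.items():
        if previous.get(url) != content_fingerprint(text):
            changed.append(url)
    return changed
```

Pages whose fingerprint matches the stored one are left alone, which is what saves processing time on large sites.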

Document Processing

When you add a document, several things happen:

1. Text Extraction

Content is extracted from the file according to its format. This includes:
  • Body text
  • Headers and titles
  • Table content
  • Image captions (if available)
  • Metadata (author, date, etc.)

2. Chunking

Text is split into smaller pieces called chunks. This is necessary because:
  • Search works better with focused passages
  • AI models have context limits
  • Relevant information can be isolated
Default settings:
  • Chunk size: 512 tokens
  • Overlap: 50 tokens (consecutive chunks share context)
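With the default settings, each new chunk starts 512 - 50 = 462 tokens after the previous one. The sketch below shows this sliding-window idea on an already tokenized list. It is a simplified illustration, not the product's chunker: real tokenization depends on the model's tokenizer, and the function name is hypothetical.

```python
def chunk_tokens(tokens: list, chunk_size: int = 512, overlap: int = 50) -> list[list]:
    """Split a token list into fixed-size chunks that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # distance between consecutive chunk starts
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this chunk already reaches the end of the document
    return chunks


# Stand-in for a 1,200-token document: integers instead of real tokens.
chunks = chunk_tokens(list(range(1200)))
```

Here the 1,200 tokens produce three chunks starting at positions 0, 462, and 924, and the last 50 tokens of each chunk reappear at the start of the next.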

3. Embedding

Each chunk is converted to a vector (a list of numbers) using the embedding model. This enables semantic search - finding content by meaning, not just keywords.
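To make "finding content by meaning" concrete, the sketch below ranks chunks by cosine similarity between vectors. The three-dimensional vectors are made up for illustration (real embeddings have hundreds or thousands of dimensions), and cosine similarity is a common choice rather than a confirmed detail of this product.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as unrelated
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec: list[float], chunk_vecs: dict[str, list[float]]) -> list[str]:
    """Return chunk ids ordered from most to least similar to the query."""
    return sorted(
        chunk_vecs,
        key=lambda cid: cosine_similarity(query_vec, chunk_vecs[cid]),
        reverse=True,
    )


# Toy vectors, invented for this example.
query = [1.0, 0.0, 0.2]
chunks = {
    "office-hours": [0.0, 1.0, 0.1],
    "refund-policy": [0.9, 0.1, 0.3],
}
ranking = rank_chunks(query, chunks)
```

In this toy data the query vector points in nearly the same direction as the "refund-policy" chunk, so that chunk ranks first even though no keywords are compared.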

4. Indexing

Chunks and their embeddings are stored in a vector database, ready for search.

Filtering Documents

Use the source filter to narrow the document list:
Filter | Shows
All | Everything in the knowledge base
Files | Uploaded documents only
Web | Crawled web pages only
Combine with search to quickly find specific documents.

Document Status

Each document shows a status that updates in real time:
Status | Meaning
Queued | Waiting to be processed
Processing | Currently being extracted and indexed
Ready | Successfully processed and searchable
Error | Something went wrong during processing
Status changes appear automatically - no need to refresh the page. After uploading, watch as documents move from queued to processing to ready.
Click an error status to see details. Common issues:
  • Unsupported format - File type not recognized
  • Password protected - Document is encrypted
  • Extraction failed - Content couldn’t be read
  • Too large - File exceeds size limit
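The status flow described in this section can be modeled as a small state machine. This sketch covers a single processing pass only and assumes that Ready and Error are end states until the document is reindexed. The names are illustrative and not taken from any product API.

```python
from enum import Enum


class Status(Enum):
    QUEUED = "Queued"
    PROCESSING = "Processing"
    READY = "Ready"
    ERROR = "Error"


# Assumed transitions for one processing pass: queued -> processing -> ready or error.
ALLOWED_TRANSITIONS = {
    Status.QUEUED: {Status.PROCESSING},
    Status.PROCESSING: {Status.READY, Status.ERROR},
    Status.READY: set(),
    Status.ERROR: set(),
}


def can_transition(current: Status, new: Status) -> bool:
    """Check whether a status change fits the flow described above."""
    return new in ALLOWED_TRANSITIONS[current]
```

A model like this is handy when writing scripts that monitor uploads, because it makes an unexpected jump (for example, Queued straight to Ready) easy to flag.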

Viewing Document Details

Click any document to see:
  • File information - Name, type, size, dates
  • Processing details - Chunk count, tokens, parser used
  • Chunks viewer - See exactly how the document was split

The Chunks Viewer

Understanding how documents are chunked helps debug retrieval issues:
  1. Click View Chunks on any document
  2. Browse through chunks (paginated for large documents)
  3. Expand any chunk to see its full text
  4. Search within the chunks to find specific content
  5. Copy chunk text for testing or debugging
Each chunk shows:
  • Text content (expandable)
  • Page number (for PDFs)
  • Token count
  • Position in document
If important information is split awkwardly across chunks, consider adjusting the chunk size or using a different chunking strategy in RAG Settings.

Document Tags

Organize documents with tags:
  1. Select a document
  2. Click Edit Tags
  3. Add or remove tags
  4. Save
Tags help with:
  • Filtering the document list
  • Finding specific content types
  • Organizing large collections

Updating Documents

To replace a document with a new version:
  1. Delete the old document
  2. Upload the new version
To re-process the existing file without replacing it, click Reindex on the document.
For frequently updated content, consider using connectors that sync automatically rather than manual uploads.

Deleting Documents

To remove a document:
  1. Find it in the documents list
  2. Click the delete icon (trash)
  3. Confirm deletion
The document and all its chunks are removed. This affects search results immediately.

Bulk Deletion

To delete multiple documents:
  1. Use filters to narrow the list
  2. Select documents using checkboxes
  3. Click Delete Selected
  4. Confirm

Reindexing

When you change RAG settings (chunk size, embedding model, etc.), existing documents keep their old chunks. To apply the new settings, reindex a single document or the entire knowledge base.

Single Document

Click Reindex on any document to reprocess it with current settings.

All Documents

To reindex the entire knowledge base:
  1. Go to Settings
  2. Scroll to Danger Zone
  3. Click Reindex All Documents
  4. Confirm
Reindexing large knowledge bases takes time and consumes processing resources. Documents remain searchable during reindexing, but results may be inconsistent until complete.

Best Practices

  • Well-formatted documents with clear headings produce better chunks and retrieval. Clean up messy documents before uploading.
  • After adding documents, test search in the Playground. Verify that relevant content is retrieved for typical questions.
  • Duplicate content hurts retrieval quality. If the same information appears in multiple documents, keep the most authoritative version.
  • Many focused documents are better than few giant documents. Split large documents by topic if they cover multiple subjects.
  • Filenames become part of the metadata and can help with retrieval. Use descriptive names, not “Document1.pdf”.
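For the duplicate-content tip, exact copies can be found before upload by grouping files on a content hash. This is a minimal local sketch with a hypothetical helper name. It only catches byte-identical files, not near-duplicates or the same text saved in different formats.

```python
import hashlib
from collections import defaultdict


def find_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group filenames whose contents are byte-for-byte identical.

    files maps filename -> raw file content.
    Returns one sorted list of names per group of duplicates.
    """
    groups = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

Each returned group is a set of candidates from which to keep only the most authoritative version.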

Next Steps

  • Connect external sources - Set up automatic syncing with SharePoint, Google Drive, and more
  • Configure RAG settings - Fine-tune chunking and retrieval for better results