Documents are the foundation of your knowledge bases. This guide covers how to add, organize, and maintain your document collections.
Uploading Files
Drag and Drop
The simplest way to add files:
- Open a knowledge base
- Drag files from your computer onto the page
- Drop them in the upload zone
- Wait for processing
File Picker
Alternatively:
- Click Upload Files
- Select files from your computer
- Click Open
Bulk Upload
For many files:
- Drag a folder (supported in Chrome and Edge)
- Select multiple files in the picker
- Files are processed in parallel
Supported File Types
| Category | Formats | Notes |
|---|---|---|
| Documents | PDF, DOCX, DOC, TXT, RTF | Text is extracted automatically |
| Presentations | PPTX, PPT | Slide content and notes |
| Spreadsheets | XLSX, XLS, CSV | Cell values and headers |
| Web | HTML, Markdown | Rendered content |
| Code | Most languages | Syntax-aware chunking |
Maximum file size depends on your organization’s configuration. Typical limits are 50 to 100 MB per file.
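You can screen files locally before a large upload. The sketch below is a hypothetical pre-check and is not part of Prisme.ai. It compares each file's extension against the Supported File Types table and its size against an assumed 50 MB limit, which you should replace with your organization's actual maximum. The code extensions are only an illustrative subset, because the table does not list every language.

```python
from pathlib import Path

# Extensions taken from the Supported File Types table. "Code: Most languages"
# is not enumerated there, so the code entries are an illustrative subset only.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".txt", ".rtf",  # Documents
    ".pptx", ".ppt",                          # Presentations
    ".xlsx", ".xls", ".csv",                  # Spreadsheets
    ".html", ".md",                           # Web
    ".py", ".ts", ".java",                    # Code (illustrative)
}

# Assumed limit: set this to your organization's configured maximum.
MAX_FILE_SIZE_MB = 50


def check_upload(name: str, size_bytes: int, max_mb: int = MAX_FILE_SIZE_MB) -> list[str]:
    """Return the problems that would likely block this upload (empty list if none)."""
    problems = []
    ext = Path(name).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(no extension)'}")
    if size_bytes > max_mb * 1024 * 1024:
        problems.append(f"too large: {size_bytes / (1024 * 1024):.1f} MB > {max_mb} MB")
    return problems


print(check_upload("report.pdf", 2_000_000))  # []
print(check_upload("scan.heic", 1_000))       # ['unsupported format: .heic']
```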
Images and Scans
For documents that are scanned images or contain images with text:
- OCR is applied automatically to extract text
- Quality depends on image clarity
- Consider re-scanning poor-quality documents
Adding Web Content
Single URLs
To add an individual web page:
- Click Add URL
- Enter the full URL (including https://)
- Click Add
Web Crawling
For multiple pages from a website:
- Click Add Web Source
- Enter the starting URL
- Configure basic settings such as path filters, blacklisted patterns, sitemap mode, and XPath filtering
- Click Start Crawling
Advanced Crawler Settings
For more control, expand Hostname settings to configure:
- Path filters - Only crawl specific sections (e.g., /docs/)
- Blacklisted patterns - Skip low-value or duplicate URL patterns
- Robots.txt - Respect the site’s crawling rules (enabled by default)
- Sitemap mode - Only follow links from the sitemap
- XPath filter - Extract only the useful page content
- HTTP headers - Send custom headers to access-controlled internal sites
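The following minimal sketch shows how path filters, blacklisted patterns, and robots.txt might combine into a single crawl decision. The function name, the prefix-based path filter, and the glob-style blacklist syntax are assumptions made for illustration, and the crawler's own pattern syntax may differ. Robots.txt handling uses Python's standard `urllib.robotparser`.

```python
import fnmatch
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def should_crawl(url, path_prefixes, blacklist_patterns, robots=None, user_agent="*"):
    """Decide whether a crawler would follow `url` under these hypothetical settings."""
    path = urlparse(url).path or "/"
    # Path filters: keep only URLs under one of the allowed sections (empty = allow all).
    if path_prefixes and not any(path.startswith(p) for p in path_prefixes):
        return False
    # Blacklisted patterns: skip URLs matching any glob-style pattern (assumed syntax).
    if any(fnmatch.fnmatchcase(url, pat) for pat in blacklist_patterns):
        return False
    # Robots.txt: honor the site's rules when a parsed robots file is supplied.
    if robots is not None and not robots.can_fetch(user_agent, url):
        return False
    return True


robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /docs/private/"])

settings = {"path_prefixes": ["/docs/"], "blacklist_patterns": ["*page=*"]}
print(should_crawl("https://example.com/docs/intro", robots=robots, **settings))         # True
print(should_crawl("https://example.com/blog/post", robots=robots, **settings))          # False: outside /docs/
print(should_crawl("https://example.com/docs/list?page=2", robots=robots, **settings))   # False: blacklisted
print(should_crawl("https://example.com/docs/private/keys", robots=robots, **settings))  # False: robots.txt
```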
Crawl Status
While crawling, a status banner shows:
- Pages discovered, indexed, and skipped
- Any errors encountered
- Estimated completion
Automatic Recrawling
Keep content fresh with scheduled recrawls:
- Open the web source settings
- Set Recrawl Schedule:
  - Manual only
  - Every 12 hours
  - Daily
  - Weekly
  - Monthly
- Save
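If you track recrawls yourself, for example to estimate when fresh content should appear, you can compute the next run from the last crawl time plus the schedule interval. This is a hypothetical helper. In particular, treating Monthly as 30 days is an assumption, because the platform may use calendar months instead.

```python
from datetime import datetime, timedelta

# Intervals for the Recrawl Schedule options. "Monthly" as 30 days is an
# assumption for illustration, not documented platform behavior.
RECRAWL_INTERVALS = {
    "Manual only": None,
    "Every 12 hours": timedelta(hours=12),
    "Daily": timedelta(days=1),
    "Weekly": timedelta(weeks=1),
    "Monthly": timedelta(days=30),
}


def next_recrawl(last_crawl: datetime, schedule: str):
    """Return when the next automatic recrawl would be due, or None for manual-only sources."""
    interval = RECRAWL_INTERVALS[schedule]
    return None if interval is None else last_crawl + interval


last = datetime(2024, 1, 1, 8, 0)
print(next_recrawl(last, "Daily"))  # 2024-01-02 08:00:00
```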
Document Processing
When you add a document, several things happen:
1. Text Extraction
Content is extracted from the file, whatever its format. This includes:
- Body text
- Headers and titles
- Table content
- Image captions (if available)
- Metadata (author, date, etc.)
2. Chunking
Text is split into smaller pieces called chunks. This is necessary because:
- Search works better with focused passages
- AI models have context limits
- Relevant information can be isolated
Default chunking settings:
- Chunk size: 512 tokens
- Overlap: 50 tokens (consecutive chunks share context)
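The sliding-window idea behind chunk size and overlap can be sketched as follows. This is a simplified illustration and not the platform's chunker. It assumes a list of tokens is already available, for example from `text.split()`. Real chunking counts tokens with a model tokenizer and may also respect headings or sentence boundaries.

```python
def chunk_tokens(tokens, chunk_size=512, overlap=50):
    """Split tokens into windows of `chunk_size`; consecutive windows share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


chunks = chunk_tokens([f"t{i}" for i in range(1000)])
print([len(c) for c in chunks])  # [512, 512, 76]
```

With the defaults, a 1,000-token document yields three chunks of 512, 512, and 76 tokens. Each chunk after the first begins with the last 50 tokens of the previous one.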
3. Embedding
Each chunk is converted to a vector (a list of numbers) using the embedding model. This enables semantic search: finding content by meaning, not just keywords.
4. Indexing
Chunks and their embeddings are stored in a vector database, ready for search.
Filtering Documents
Use the source filter to narrow the document list:
| Filter | Shows |
|---|---|
| All | Everything in the knowledge base |
| Files | Uploaded documents only |
| Web | Crawled web pages only |
Document Status
Each document shows a status that updates in real time:
| Status | Meaning |
|---|---|
| Queued | Waiting to be processed |
| Processing | Currently being extracted and indexed |
| Ready | Successfully processed and searchable |
| Error | Something went wrong during processing |
Common causes of errors:
- Unsupported format - File type not recognized
- Password protected - Document is encrypted
- Extraction failed - Content couldn’t be read
- Too large - File exceeds size limit
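The statuses form a simple lifecycle, which the sketch below models as a state machine. The Queued, Processing, and Ready or Error path follows the table above. The transitions out of Ready and Error, back to Queued on reindex, are an assumption and not documented behavior.

```python
from enum import Enum


class Status(Enum):
    QUEUED = "Queued"
    PROCESSING = "Processing"
    READY = "Ready"
    ERROR = "Error"


# Allowed transitions. Ready/Error -> Queued (on reindex) is an assumption.
TRANSITIONS = {
    Status.QUEUED: {Status.PROCESSING},
    Status.PROCESSING: {Status.READY, Status.ERROR},
    Status.READY: {Status.QUEUED},
    Status.ERROR: {Status.QUEUED},
}


def advance(current: Status, new: Status) -> Status:
    """Move to `new` if the lifecycle allows it, otherwise raise ValueError."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"invalid transition {current.value} -> {new.value}")
    return new


def is_searchable(status: Status) -> bool:
    """Only Ready documents are searchable."""
    return status is Status.READY
```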
Viewing Document Details
Click any document to see:
- File information - Name, type, size, dates
- Processing details - Chunk count, tokens, parser used
- Chunks viewer - See exactly how the document was split
The Chunks Viewer
Understanding how documents are chunked helps debug retrieval issues:
- Click View Chunks on any document
- Browse through chunks (paginated for large documents)
- Expand any chunk to see its full text
- Search within the chunks to find specific content
- Copy chunk text for testing or debugging
Each chunk shows:
- Text content (expandable)
- Page number (for PDFs)
- Token count
- Position in document
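The chunk fields listed above map naturally onto a small record type. In this hypothetical sketch, `Chunk`, `search_chunks`, and `paginate` are illustrative names. The case-insensitive substring match only approximates what the viewer's search does, and the page size of 20 is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    position: int               # order of the chunk in the document
    text: str
    token_count: int
    page: Optional[int] = None  # only meaningful for PDFs


def search_chunks(chunks, query):
    """Case-insensitive substring search over chunk text."""
    q = query.lower()
    return [c for c in chunks if q in c.text.lower()]


def paginate(chunks, page_number, page_size=20):
    """Return one page of chunks for display; page_number starts at 1."""
    start = (page_number - 1) * page_size
    return chunks[start:start + page_size]


chunks = [
    Chunk(0, "Refund policy: 30 days", 6, page=1),
    Chunk(1, "Shipping times vary", 3, page=2),
]
print([c.position for c in search_chunks(chunks, "refund")])  # [0]
```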
Document Tags
Organize documents with tags:
- Select a document
- Click Edit Tags
- Add or remove tags
- Save
Tags are useful for:
- Filtering the document list
- Finding specific content types
- Organizing large collections
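Tag-based filtering comes down to set operations. This sketch is hypothetical. It assumes each document carries a list of tags, and it shows both match-all and match-any behavior, because the exact semantics of the document list filter are not specified here.

```python
def filter_by_tags(docs, required_tags, match_all=True):
    """Return names of documents carrying all (or, with match_all=False, any) of the tags."""
    required = set(required_tags)
    result = []
    for name, tags in docs.items():
        tags = set(tags)
        # Subset test for match-all; non-empty intersection for match-any.
        if (required <= tags) if match_all else (required & tags):
            result.append(name)
    return result


docs = {
    "hr-policy.pdf": ["hr", "policy"],
    "pricing.xlsx": ["sales"],
    "leave.docx": ["hr"],
}
print(filter_by_tags(docs, ["hr"]))  # ['hr-policy.pdf', 'leave.docx']
```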
Updating Documents
To replace a document with a new version:
- Delete the old document
- Upload the new version
To reprocess the existing file without uploading a new version, click Reindex on the document.
Deleting Documents
To remove a document:
- Find it in the documents list
- Click the delete icon (trash)
- Confirm deletion
Bulk Deletion
To delete multiple documents:
- Use filters to narrow the list
- Select documents using checkboxes
- Click Delete Selected
- Confirm
Reindexing
When you change RAG settings (chunk size, embedding model, etc.), existing documents keep their old chunks. To apply new settings:
Single Document
Click Reindex on any document to reprocess it with current settings.
All Documents
To reindex the entire knowledge base:
- Go to Settings
- Scroll to Danger Zone
- Click Reindex All Documents
- Confirm
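Existing documents keep their old chunks, so it can help to record which settings each document was indexed with and flag the ones that no longer match. This is a hypothetical bookkeeping sketch, not a platform feature. The setting names and the `model-a` value are placeholders.

```python
import hashlib
import json


def settings_fingerprint(settings: dict) -> str:
    """Stable hash of the RAG settings; key order does not affect the result."""
    canonical = json.dumps(settings, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def stale_documents(indexed_with: dict, current_settings: dict) -> list:
    """Names of documents whose chunks were built with different settings."""
    current = settings_fingerprint(current_settings)
    return [name for name, fp in indexed_with.items() if fp != current]


old = {"chunk_size": 512, "overlap": 50, "embedding_model": "model-a"}  # placeholder values
new = {**old, "chunk_size": 256}
indexed_with = {"a.pdf": settings_fingerprint(old), "b.pdf": settings_fingerprint(new)}
print(stale_documents(indexed_with, new))  # ['a.pdf']
```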
Best Practices
Use clean source documents
Well-formatted documents with clear headings produce better chunks and retrieval. Clean up messy documents before uploading.
Test with representative queries
After adding documents, test search in the Playground. Verify that relevant content is retrieved for typical questions.
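You can complement Playground checks with a small retrieval test set made of query and expected-chunk pairs. The sketch ranks chunks by cosine similarity, a common measure for semantic search over embeddings. The toy three-dimensional vectors stand in for real embedding-model output, and `hit_rate` is a hypothetical name.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k=3):
    """IDs of the k chunks most similar to the query, best first."""
    ranked = sorted(chunk_vecs, key=lambda cid: cosine(query_vec, chunk_vecs[cid]), reverse=True)
    return ranked[:k]


def hit_rate(test_cases, chunk_vecs, k=3):
    """Fraction of (query_vec, expected_chunk_id) cases whose expected chunk is in the top k."""
    hits = sum(expected in top_k(q, chunk_vecs, k) for q, expected in test_cases)
    return hits / len(test_cases)


chunk_vecs = {
    "refunds": [1.0, 0.0, 0.0],
    "shipping": [0.0, 1.0, 0.0],
    "returns": [0.9, 0.1, 0.0],
}
print(top_k([1.0, 0.0, 0.0], chunk_vecs, k=2))  # ['refunds', 'returns']
```

If the hit rate drops after a settings change, inspect the affected chunks in the Chunks Viewer.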
Remove duplicates
Duplicate content hurts retrieval quality. If the same information appears in multiple documents, keep the most authoritative version.
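You can find exact duplicates before upload by hashing normalized text. This sketch only catches documents whose extracted text is identical after lowercasing and collapsing whitespace. Near-duplicates need fuzzy techniques such as MinHash, which are not shown here.

```python
import hashlib
import re
from collections import defaultdict


def content_key(text: str) -> str:
    """Hash of the text after collapsing whitespace and lowercasing."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(docs: dict) -> list:
    """Groups of document names whose text is identical after normalization."""
    groups = defaultdict(list)
    for name, text in docs.items():
        groups[content_key(text)].append(name)
    return [names for names in groups.values() if len(names) > 1]


docs = {
    "a.pdf": "Refund policy\n30 days",
    "a-copy.pdf": "refund  policy 30 days",
    "b.pdf": "Shipping",
}
print(find_duplicates(docs))  # [['a.pdf', 'a-copy.pdf']]
```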
Keep documents focused
Many focused documents work better than a few giant ones. Split large documents by topic if they cover multiple subjects.
Use meaningful filenames
Filenames become part of the metadata and can help with retrieval. Use descriptive names, not “Document1.pdf”.
Next Steps
Connect external sources
Set up automatic syncing with SharePoint, Google Drive, and more
Configure RAG settings
Fine-tune chunking and retrieval for better results