Deploy and configure the Prisme.ai web crawling and search engine capabilities for knowledge base creation and content discovery.
Common Environment Variables
Variable Name | Description | Default Value | Affected Services |
---|---|---|---|
REDIS_URL | Redis connection URL for communication between services | redis://localhost:6379 | Both |
ELASTIC_SEARCH_URL | ElasticSearch connection URL for document storage | localhost | Both |
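As a rough illustration of how these two variables and their defaults fit together, the sketch below reads them from the environment and falls back to the values in the table. The helper name `load_common_settings` and the scheme check are assumptions made for this example, not part of the Prisme.ai services.

```python
import os
from urllib.parse import urlparse

# Defaults copied from the table above.
COMMON_DEFAULTS = {
    "REDIS_URL": "redis://localhost:6379",
    "ELASTIC_SEARCH_URL": "localhost",
}


def load_common_settings(env=None):
    """Return the common settings, using the documented defaults when unset.

    `env` is any mapping of variable names to strings; it defaults to
    os.environ. This helper is hypothetical and only illustrates the table.
    """
    env = os.environ if env is None else env
    settings = {
        name: env.get(name) or default
        for name, default in COMMON_DEFAULTS.items()
    }
    # Illustrative sanity check only: adjust it if your deployment uses
    # another scheme for Redis.
    scheme = urlparse(settings["REDIS_URL"]).scheme
    if scheme not in ("redis", "rediss"):
        raise ValueError(f"REDIS_URL has unexpected scheme: {scheme!r}")
    return settings
```

Calling `load_common_settings()` with no argument reads the current process environment, which makes it easy to confirm what a service would actually see.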
Crawler-Specific Environment Variables
Variable Name | Description | Default Value | Affected Services |
---|---|---|---|
MAX_CONTENT_LEN | Maximum length (in characters) of documents crawled | 150000 | prismeai-crawler |
CONCURRENT_REQUESTS | The maximum number of concurrent (i.e. simultaneous) requests that will be performed by the Scrapy downloader | 16 | prismeai-crawler |
CONCURRENT_REQUESTS_PER_DOMAIN | The maximum number of concurrent (i.e. simultaneous) requests that will be performed to any single domain. | 16 | prismeai-crawler |
DOWNLOAD_DELAY | Minimum seconds to wait between 2 consecutive requests to the same domain. | 0 | prismeai-crawler |
REQUEST_QUEUES_POLLING_INTERVAL | Interval in seconds between two consecutive polls of the request queue | 5 | prismeai-crawler |
REQUEST_QUEUES_POLLING_SIZE | Number of requests to start from the queue in a single poll | 1 | prismeai-crawler |
USER_AGENT | Crawler HTTP user agent | Prisme.ai (https://prisme.ai) | prismeai-crawler |
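To make the interaction between these settings more concrete, here is a minimal sketch that parses them with the defaults from the table and derives two upper bounds from their definitions. The function names and the chosen numeric types are assumptions for illustration; the crawler itself may parse and apply these values differently.

```python
import os

# (default, type) pairs copied from the table above; the types are assumed.
CRAWLER_DEFAULTS = {
    "MAX_CONTENT_LEN": (150000, int),
    "CONCURRENT_REQUESTS": (16, int),
    "CONCURRENT_REQUESTS_PER_DOMAIN": (16, int),
    "DOWNLOAD_DELAY": (0.0, float),
    "REQUEST_QUEUES_POLLING_INTERVAL": (5.0, float),
    "REQUEST_QUEUES_POLLING_SIZE": (1, int),
    "USER_AGENT": ("Prisme.ai (https://prisme.ai)", str),
}


def load_crawler_settings(env=None):
    """Parse crawler settings from a mapping, falling back to the defaults."""
    env = os.environ if env is None else env
    settings = {}
    for name, (default, cast) in CRAWLER_DEFAULTS.items():
        raw = env.get(name)
        settings[name] = default if raw in (None, "") else cast(raw)
    return settings


def max_queue_pickups_per_minute(settings):
    """Upper bound on queued requests started per minute by polling alone."""
    size = settings["REQUEST_QUEUES_POLLING_SIZE"]
    interval = settings["REQUEST_QUEUES_POLLING_INTERVAL"]
    return size * 60 / interval


def max_requests_per_minute_per_domain(settings):
    """Upper bound implied by DOWNLOAD_DELAY, or None when the delay is 0."""
    delay = settings["DOWNLOAD_DELAY"]
    return None if delay == 0 else 60 / delay
```

With the default values, polling picks up at most 12 queued requests per minute (1 request every 5 seconds), and `DOWNLOAD_DELAY` imposes no per-domain cap because it is 0. Setting `DOWNLOAD_DELAY` to 0.5 would cap a single domain at 120 requests per minute, regardless of `CONCURRENT_REQUESTS_PER_DOMAIN`.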
Configure Dependencies
Deploy Microservices
Deploy the microservices through your `values.yaml` configuration.
Verify Deployment
The deployed services should show a `Running` status and be ready (e.g., `1/1`).
Configure Network Access
Create a Test SearchEngine
Note the `id` field.
Check Crawl Progress
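The checks in this step (indexed pages above zero, pending requests, and a populated crawl history) could be automated with a small helper like the one below. It assumes the SearchEngine status is available as a JSON object whose keys mirror the field names used in this step; the exact response shape and the helper name are assumptions.

```python
def check_crawl_progress(status):
    """Evaluate the crawl-progress checks on a status payload.

    `status` is assumed to be a dict shaped like:
    {"metrics": {"indexed_pages": int, "pending_requests": int},
     "crawl_history": [...]}
    """
    metrics = status.get("metrics", {})
    return {
        # At least one page has been indexed.
        "has_indexed_pages": metrics.get("indexed_pages", 0) > 0,
        # Pending requests suggest the crawl is still running.
        "is_actively_crawling": metrics.get("pending_requests", 0) > 0,
        # Some pages have already been processed.
        "has_crawl_history": bool(status.get("crawl_history")),
    }


# Hypothetical payload, for illustration only.
sample_status = {
    "metrics": {"indexed_pages": 3, "pending_requests": 12},
    "crawl_history": [{"url": "https://example.com/"}],
}
progress = check_crawl_progress(sample_status)
```

A finished crawl would typically report `is_actively_crawling` as false while the other two checks stay true.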
- `metrics.indexed_pages` field is greater than 0
- `metrics.pending_requests` field indicates active crawling
- `crawl_history` section shows pages that have been processed
Test Search Functionality
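Once a search response has been retrieved, the expectation for this step can be verified with a short helper. This is only a sketch: it assumes the response is a JSON object with a top-level `results` key, and it does not inspect the relevance or snippet fields because their names are not specified here.

```python
def search_returned_results(response):
    """Return True when the response carries a non-empty `results` list."""
    results = response.get("results")
    return isinstance(results, list) and len(results) > 0


# Hypothetical responses, for illustration only.
found = search_returned_results({"results": [{"url": "https://example.com/docs"}]})
missing = search_returned_results({"results": []})
```

An empty `results` list right after creating the SearchEngine may simply mean that crawling has not indexed any matching page yet.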
The response should include a `results` array containing pages from the crawled website that match your search term. Each result should include relevance information and content snippets.
Web Crawling
Search Capabilities
Crawl Configuration Options
ElasticSearch Index Management
Performance Tuning
Adjust `MAX_CONTENT_LEN` to balance comprehensiveness with resource usage
Crawling Issues
Search Problems
Check the `MAX_CONTENT_LEN` setting
Performance Issues