Example of Platform Sizing
Resource recommendations for self-hosted deployments
This sizing depends on your use case, in particular whether the platform runs in full automation mode or chat mode. We strongly recommend running your own load tests tailored to your infrastructure and use cases.
Target Performance Metrics
User Interactions: 4-10 per user (the average number of interactions each user makes with the platform)
First Token Response: 478ms at P95 (time to first token from the LLM API, using OpenAI as the reference)
Concurrent Users: 100 new users per second (an arrival rate: the platform should absorb 100 new users arriving each second under peak load)
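The target metrics can be turned into a rough request rate. The Python sketch below assumes steady state: if users arrive at λ = 100 per second and each makes k interactions, the platform serves λ·k interactions per second, and by Little's law the number of users active at once is λ times the average session length. The 300-second session length is a hypothetical placeholder, not a figure from this page.

```python
# Rough load model for the target metrics (a sketch, not a benchmark).
ARRIVALS_PER_SECOND = 100        # peak rate of new users, from the target metrics
INTERACTIONS_PER_USER = (4, 10)  # range of interactions per user, from the target metrics
SESSION_SECONDS = 300            # HYPOTHETICAL average session length; measure your own


def interaction_rate(arrivals_per_s: float, interactions_per_user: float) -> float:
    """Steady-state throughput: each arriving user eventually issues
    `interactions_per_user` requests, so rate = lambda * k."""
    return arrivals_per_s * interactions_per_user


def users_in_system(arrivals_per_s: float, session_seconds: float) -> float:
    """Little's law, L = lambda * W: average number of users active at once."""
    return arrivals_per_s * session_seconds


if __name__ == "__main__":
    low, high = (interaction_rate(ARRIVALS_PER_SECOND, k) for k in INTERACTIONS_PER_USER)
    print(f"interactions per second at peak: {low:.0f} to {high:.0f}")
    print(f"users active at once: {users_in_system(ARRIVALS_PER_SECOND, SESSION_SECONDS):.0f}")
```

With 4-10 interactions per user this gives 400 to 1,000 interactions per second at peak. The same law applies to in-flight LLM calls: multiplying the interaction rate by the full response duration (which is longer than the 478ms time to first token) estimates how many requests are open at once.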
Infrastructure Components
Kubernetes Cluster
Node Configuration: 5 nodes with 8GB RAM and 4 vCPU each
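To see what the node configuration adds up to, and how much remains after losing a node during a drain or failure, here is a small Python sketch. It only multiplies raw node resources; Kubernetes reserves part of each node for the kubelet and system daemons, so the allocatable capacity in your cluster will be lower.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePool:
    nodes: int
    ram_gb: float
    vcpu: int

    def total(self, failed_nodes: int = 0) -> tuple[float, int]:
        """Raw (RAM in GB, vCPU) left on the surviving nodes after `failed_nodes` are lost."""
        alive = self.nodes - failed_nodes
        if alive < 0:
            raise ValueError("more failed nodes than nodes in the pool")
        return alive * self.ram_gb, alive * self.vcpu


pool = NodePool(nodes=5, ram_gb=8, vcpu=4)  # values from the node configuration above
print(pool.total())                 # (40, 20)
print(pool.total(failed_nodes=1))   # (32, 16)
```

Losing one of the five nodes leaves 32GB RAM and 16 vCPU, so one way to plan for a single node outage is to keep total workload requests within that figure.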
Storage Systems
Capacity: 50GB Elastic File System for shared storage
Configuration: Can be shared between different environments or isolated for each
Object Storage
Bucket Structure: Environment-based separation
- 1 “models” bucket per environment
- 1 “uploads” bucket per environment
- 1 “uploads-public” bucket per environment (behind CDN)
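One way to keep the environment-based bucket separation consistent is to derive every bucket name from the environment name. The `<prefix>-<environment>-<kind>` scheme and the `myplatform` prefix below are hypothetical; the prefix is there because S3 bucket names must be globally unique, if S3 is your object store.

```python
BUCKET_KINDS = ("models", "uploads", "uploads-public")


def buckets_for(environment: str, prefix: str = "myplatform") -> dict[str, str]:
    """Return HYPOTHETICAL bucket names for one environment.

    Scheme: <prefix>-<environment>-<kind>, lowercased because S3 bucket
    names must be lowercase. "uploads-public" is the bucket served behind a CDN.
    """
    env = environment.lower()
    return {kind: f"{prefix}-{env}-{kind}" for kind in BUCKET_KINDS}


for env in ("dev", "staging", "prod"):
    print(buckets_for(env))
```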
Databases
MongoDB
Data Types: RBAC permissions, users, application data
Configuration:
- 3 nodes in replica set
- 2GB RAM and 2 vCPU per node
- 1,000 IOPS
Disk Space: 10GB total storage requirement
Environment Separation:
- 1 “permissions” database per environment
- 1 “users” database per environment
- 1 “collections” database per environment
Version: MongoDB version 6 with path to version 7
The cluster can be shared across environments with proper database separation.
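On a shared replica set, environments are separated by database name rather than by cluster. This Python sketch builds a standard MongoDB replica-set connection string and a set of per-environment database names; the host names `mongo-0` to `mongo-2`, the replica set name `rs0`, and the `<environment>_<kind>` naming scheme are assumptions for illustration.

```python
from urllib.parse import urlencode

DATABASE_KINDS = ("permissions", "users", "collections")


def replica_set_uri(hosts: list[str], replica_set: str, port: int = 27017) -> str:
    """Build a mongodb:// connection string listing every replica set member."""
    seed_list = ",".join(f"{host}:{port}" for host in hosts)
    return f"mongodb://{seed_list}/?{urlencode({'replicaSet': replica_set})}"


def database_names(environment: str) -> dict[str, str]:
    """HYPOTHETICAL per-environment database names on a shared cluster."""
    return {kind: f"{environment}_{kind}" for kind in DATABASE_KINDS}


uri = replica_set_uri(["mongo-0", "mongo-1", "mongo-2"], "rs0")  # hypothetical hosts
print(uri)  # mongodb://mongo-0:27017,mongo-1:27017,mongo-2:27017/?replicaSet=rs0
print(database_names("staging"))
```

With the PyMongo driver, for example, `MongoClient(uri)["staging_users"]` would then select the staging users database.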
Redis (Cache and Real-Time Streams)
Data Types: Real-time EDA streams, permission cache, OIDC sessions, rate limits, application cache
Configuration:
- 1 master and 2 replicas
- 3GB RAM and 2 vCPU per node
Environment Separation: 1 cluster per environment recommended
Version: Redis version 5 or higher
Redis (Crawler)
Data Types: Crawl queue, metadata of known documents, search engine configurations
Configuration:
- 1 master and 2 replicas
- 2GB RAM and 2 vCPU per node
Environment Separation: 1 database per environment
Version: Redis version 5 or higher
Scaling Example: 100,000 documents → 600MB RAM
The cluster can be shared across environments with proper database separation.
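The crawler scaling example implies about 6KB of Redis memory per known document (600MB / 100,000). Assuming memory grows linearly with document count, and reading MB as 10^6 bytes, the Python sketch below extrapolates from that ratio and checks how many documents fit on one 2GB node. The 25% headroom (for replication buffers, fragmentation and background saves) is an assumption, not a figure from this page.

```python
REFERENCE_DOCS = 100_000
REFERENCE_BYTES = 600 * 10**6                      # 600MB from the scaling example (decimal MB assumed)
BYTES_PER_DOC = REFERENCE_BYTES / REFERENCE_DOCS   # 6,000 bytes per known document


def crawler_ram_mb(documents: int) -> float:
    """Linear extrapolation of crawler Redis memory from the scaling example."""
    return documents * BYTES_PER_DOC / 10**6


def max_documents(node_ram_gb: float, headroom: float = 0.25) -> int:
    """Documents that fit on one node while keeping `headroom` (ASSUMED 25%) of RAM free."""
    usable_bytes = node_ram_gb * 10**9 * (1 - headroom)
    return int(usable_bytes // BYTES_PER_DOC)


print(crawler_ram_mb(250_000))  # 1500.0
print(max_documents(2))         # 250000
```

Under these assumptions a 2GB node holds about 250,000 documents before reaching the headroom limit. Because each replica keeps a full copy of the data, this limit applies per node, not to the cluster as a whole.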
Elasticsearch/OpenSearch
Data Types: Persisted EDA events for traceability and statistics calculations, text content of crawled documents
Configuration:
- 3 nodes
- 8GB RAM and 4 vCPU per node
Disk Space: 400GB per node (NVMe or SSD storage recommended)
Version: Elasticsearch version 8 or higher, or OpenSearch (OpenSearch has its own version numbering, starting from 1.0)
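Disk per node is not the same as indexable data. This rough Python estimate assumes one replica per shard (the Elasticsearch default for `number_of_replicas`) and keeps usage below the default 85% low disk watermark, above which Elasticsearch restricts shard allocation to a node. It ignores segment merges, snapshots and other overhead, so treat the result as an upper bound.

```python
def usable_index_gb(nodes: int, disk_gb_per_node: float,
                    replicas: int = 1, watermark: float = 0.85) -> float:
    """Rough upper bound on primary data the cluster can hold.

    Each primary shard is stored (1 + replicas) times, and usage is capped at
    the low disk watermark (85% by default in Elasticsearch).
    """
    raw_gb = nodes * disk_gb_per_node
    return raw_gb * watermark / (1 + replicas)


# 3 nodes x 400GB, as in the configuration above
print(f"{usable_index_gb(3, 400):.0f} GB")  # 510 GB
```

For three 400GB nodes this gives roughly 510GB of primary data, the figure to compare against the expected volume of persisted EDA events and crawled document text.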
Vector Database
Implementation: Redis Stack (Redis with vector search capabilities)
Data Types: Text chunks accompanied by their vector embeddings
Configuration:
- 1 master and 2 replicas
- 5GB RAM and 2 vCPU per node
Environment Separation: 1 cluster per environment recommended
Version: Redis version 5 or higher with SEARCH and JSON modules
Scaling Example: 100,000 chunks from 20,000 documents requires approximately 2.6GB RAM
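The vector store example works out to about 26KB per chunk (2.6GB / 100,000 chunks) and 5 chunks per document. The Python sketch below extrapolates linearly from those ratios, reading GB as 10^9 bytes, and checks the result against a 5GB node with an assumed 25% headroom. The per-chunk cost depends on the embedding dimension and chunk size, so the ratio should be re-derived if either differs from the setup behind the example.

```python
REFERENCE_CHUNKS = 100_000
REFERENCE_DOCUMENTS = 20_000
REFERENCE_BYTES = 2_600_000_000  # ~2.6GB from the scaling example (decimal GB assumed)

BYTES_PER_CHUNK = REFERENCE_BYTES / REFERENCE_CHUNKS          # ~26KB per chunk
CHUNKS_PER_DOCUMENT = REFERENCE_CHUNKS / REFERENCE_DOCUMENTS  # 5 chunks per document


def vector_store_ram_gb(documents: int, chunks_per_doc: float = CHUNKS_PER_DOCUMENT) -> float:
    """Linear RAM estimate for the vector store, extrapolated from the example."""
    return documents * chunks_per_doc * BYTES_PER_CHUNK / 10**9


def fits_on_node(documents: int, node_ram_gb: float = 5, headroom: float = 0.25) -> bool:
    """True if the estimate stays below (1 - headroom) of one node's RAM (headroom ASSUMED)."""
    return vector_store_ram_gb(documents) <= node_ram_gb * (1 - headroom)


print(round(vector_store_ram_gb(20_000), 2))  # 2.6
print(fits_on_node(20_000), fits_on_node(30_000))
```

Under these assumptions 20,000 documents fit comfortably, while 30,000 documents (about 3.9GB) would exceed the headroom on a 5GB node. Since the master and both replicas each hold the full dataset, adding replicas does not add capacity.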
Scaling Considerations
Monitoring Recommendations
We recommend monitoring the following metrics to ensure optimal performance:
System Metrics
- CPU utilization
- Memory usage
- Disk I/O and latency
- Network throughput
Application Metrics
- Request latency
- Error rates
- Concurrent users
- Queue lengths
Database Metrics
- Query performance
- Connection pool usage
- Index efficiency
- Replication lag
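During load tests, the request latency metric can be compared directly with the 478ms P95 time-to-first-token target. This Python sketch computes a nearest-rank percentile over collected samples and checks an error budget; the 1% error-rate threshold is a hypothetical value, since this page does not specify one.

```python
import math
from typing import Sequence

TTFT_P95_TARGET_MS = 478  # target from "Target Performance Metrics"


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of values <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def check_run(ttft_ms: Sequence[float], errors: int, requests: int,
              max_error_rate: float = 0.01) -> dict[str, bool]:
    """Compare one load-test run with the latency target and a HYPOTHETICAL error budget."""
    return {
        "ttft_p95_ok": percentile(ttft_ms, 95) <= TTFT_P95_TARGET_MS,
        "error_rate_ok": errors / requests <= max_error_rate,
    }


print(check_run([320, 410, 455, 470, 530], errors=2, requests=500))
```

Other tools may compute interpolated percentiles instead (NumPy's default method is linear interpolation, for example), which can differ slightly from nearest-rank on small sample sets.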
These sizing recommendations provide a starting point, but real-world performance may vary. Always conduct load testing with scenarios that reflect your actual usage patterns.