This sizing depends on your use case, for example full automation or chat mode. We strongly recommend running your own load tests against your own infrastructure and workloads.
Target Performance Metrics
User Interactions
4-10 per user
Average number of interactions each user makes with the platform
First Token Response
478ms (P95)
Time to first token from the LLM API (using OpenAI as reference)
Concurrent Users
100 new users/second
Platform should handle 100 new users each second under peak load
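To turn these targets into a request rate for a load test, the arithmetic can be sketched as follows. This is a rough illustration, not a sizing formula from the platform: it assumes arrivals are sustained at the peak rate, and the session length used for the concurrency estimate (`ASSUMED_SESSION_SECONDS`) is a made-up figure that you should replace with your own measurement.

```python
# Targets quoted above.
NEW_USERS_PER_SECOND = 100
INTERACTIONS_PER_USER = (4, 10)  # low and high ends of the stated range


def steady_state_interaction_rate(arrivals_per_second: float,
                                  interactions_per_user: float) -> float:
    """Long-run interactions per second when arrivals are sustained at this rate."""
    return arrivals_per_second * interactions_per_user


def concurrent_users(arrivals_per_second: float, session_seconds: float) -> float:
    """Little's law: average users in the system = arrival rate x time in system."""
    return arrivals_per_second * session_seconds


low, high = (steady_state_interaction_rate(NEW_USERS_PER_SECOND, n)
             for n in INTERACTIONS_PER_USER)
print(f"Interactions per second at sustained peak: {low:.0f} to {high:.0f}")

# ASSUMPTION: a 120-second session is an illustrative value, not a documented one.
ASSUMED_SESSION_SECONDS = 120
active = concurrent_users(NEW_USERS_PER_SECOND, ASSUMED_SESSION_SECONDS)
print(f"Average concurrent users: {active:.0f}")
```

With the stated targets, a sustained peak corresponds to roughly 400 to 1,000 interactions per second. The concurrency figure scales linearly with whatever session length you measure.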
Infrastructure Components
Kubernetes Cluster
Node Configuration: 5 nodes with 8GB RAM and 4 vCPU each
Storage Systems
Capacity: 50GB Elastic File System for shared storage
Configuration: Can be shared between different environments or isolated for each
Databases
Data Types: RBAC permissions, users, application data
Configuration:
- 3 nodes in replica set
- 2GB RAM and 2 vCPU per node
- 1,000 IOPS
- 1 “permissions” database per environment
- 1 “users” database per environment
- 1 “collections” database per environment
The cluster can be shared across environments with proper database separation.
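For budgeting, it can help to total the baseline above. The sketch below simply multiplies out the stated node counts. It assumes the database nodes are provisioned separately from the Kubernetes nodes, which this guide does not state explicitly, and the `NodePool` class is a hypothetical helper used only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePool:
    """A group of identically sized nodes (hypothetical helper for this example)."""
    name: str
    nodes: int
    ram_gb: int
    vcpu: int

    @property
    def total_ram_gb(self) -> int:
        return self.nodes * self.ram_gb

    @property
    def total_vcpu(self) -> int:
        return self.nodes * self.vcpu


# Figures copied from the baseline above.
BASELINE = [
    NodePool("kubernetes", nodes=5, ram_gb=8, vcpu=4),
    NodePool("database replica set", nodes=3, ram_gb=2, vcpu=2),
]

for pool in BASELINE:
    print(f"{pool.name}: {pool.total_ram_gb} GB RAM, {pool.total_vcpu} vCPU")

# ASSUMPTION: the two pools are separate machines, so their totals add up.
total_ram = sum(pool.total_ram_gb for pool in BASELINE)
total_vcpu = sum(pool.total_vcpu for pool in BASELINE)
print(f"total: {total_ram} GB RAM, {total_vcpu} vCPU")
```

The 50GB shared file system and the 1,000 IOPS database storage are provisioned in addition to these compute totals.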
Scaling Considerations
Horizontal vs. Vertical Scaling
When planning your infrastructure, consider these scaling approaches:
- Horizontal Scaling: Add more nodes to distribute load. Recommended for Kubernetes nodes and database replicas.
- Vertical Scaling: Increase resources (RAM, CPU) on existing nodes. Useful for temporary peaks or when hitting connection limits.
Resource Allocation Strategy
When allocating resources across components:
- Start with our baseline recommendations
- Monitor resource utilization during testing
- Identify bottlenecks (usually memory or disk I/O)
- Scale the constrained resources first before adding more nodes
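The "identify bottlenecks" step above can be sketched as a small script that ranks resources by average utilization during a test run. This is an illustration under stated assumptions: the 80% threshold, the `find_bottlenecks` helper, and the sample values are all hypothetical, and in practice the samples would come from your own monitoring system.

```python
from statistics import mean

# ASSUMPTION: 80% is an illustrative threshold, not a documented limit.
UTILIZATION_THRESHOLD = 0.80


def find_bottlenecks(samples: dict[str, list[float]],
                     threshold: float = UTILIZATION_THRESHOLD) -> list[tuple[str, float]]:
    """Return (resource, average utilization) pairs above the threshold,
    ordered from most to least constrained."""
    averages = {name: mean(values) for name, values in samples.items() if values}
    constrained = [(name, avg) for name, avg in averages.items() if avg > threshold]
    return sorted(constrained, key=lambda item: item[1], reverse=True)


# Hypothetical utilization fractions (0.0 to 1.0) collected during a load test.
load_test_samples = {
    "cpu": [0.55, 0.60, 0.58],
    "memory": [0.88, 0.91, 0.93],
    "disk_io": [0.82, 0.85, 0.81],
    "network": [0.30, 0.35, 0.33],
}

for resource, utilization in find_bottlenecks(load_test_samples):
    print(f"{resource}: {utilization:.0%} average utilization, candidate to scale")
```

With these sample values, memory ranks first and disk I/O second, so those would be scaled before adding nodes. Averages hide short spikes, so a peak or percentile value may be a better input for bursty workloads.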
Environment Isolation
For multi-environment deployments:
- Development: Can share infrastructure with minimal isolation
- Staging: Should mimic production but can use smaller resources
- Production: Requires dedicated resources and stricter isolation
Monitoring Recommendations
We recommend monitoring the following metrics to ensure optimal performance:
System Metrics
- CPU utilization
- Memory usage
- Disk I/O and latency
- Network throughput
Application Metrics
- Request latency
- Error rates
- Concurrent users
- Queue lengths
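Request latency is most useful when checked against a target, such as the 478ms P95 first-token figure given earlier. A minimal sketch of that check follows, using the nearest-rank percentile method. The function name and the sample latencies are hypothetical, and your monitoring stack may compute percentiles with a different interpolation rule.

```python
import math

TARGET_P95_MS = 478  # first-token target stated earlier in this document


def percentile_nearest_rank(values: list[float], percentile: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    `percentile` percent of all samples are at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(percentile * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical first-token latencies in milliseconds from a load test.
latencies_ms = [210, 250, 265, 280, 300, 310, 330, 350, 360, 380,
                390, 400, 410, 420, 430, 440, 450, 460, 470, 900]

p95 = percentile_nearest_rank(latencies_ms, 95)
status = "within" if p95 <= TARGET_P95_MS else "above"
print(f"P95 = {p95} ms, {status} the {TARGET_P95_MS} ms target")
```

Note that the single 900 ms outlier does not affect the P95 here, which is why a percentile target is usually paired with error-rate and maximum-latency alerts.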
Database Metrics
- Query performance
- Connection pool usage
- Index efficiency
- Replication lag
These sizing recommendations provide a starting point, but real-world performance may vary. Always conduct load testing with scenarios that reflect your actual usage patterns.