Resource recommendations for self-hosted deployments
Sizing depends on your specific use case, such as whether you run full automation or chat mode. We strongly recommend conducting your own load testing, tailored to your infrastructure and usage patterns.
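If you want a starting point for such a test, the sketch below fires concurrent HTTP requests at a placeholder endpoint and reports latency percentiles. The URL, concurrency, and request count are illustrative assumptions; replace them with values that match your own deployment and traffic profile.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from urllib import request

# Placeholder values -- substitute your own endpoint and expected traffic.
TARGET_URL = "https://your-deployment.example.com/health"
CONCURRENCY = 20
TOTAL_REQUESTS = 500

def timed_request(_: int) -> float:
    """Issue one request and return its latency in seconds."""
    start = time.perf_counter()
    with request.urlopen(TARGET_URL, timeout=10) as resp:
        resp.read()
    return time.perf_counter() - start

# Drive the endpoint with a fixed level of concurrency and collect latencies.
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    latencies = list(pool.map(timed_request, range(TOTAL_REQUESTS)))

# statistics.quantiles with n=100 yields percentile cut points (index 49 = p50).
p50, p95, p99 = (statistics.quantiles(latencies, n=100)[i] for i in (49, 94, 98))
print(f"p50={p50:.3f}s  p95={p95:.3f}s  p99={p99:.3f}s")
```

Dedicated load-testing tools such as k6 or Locust give you far richer scenarios; the point here is only the shape of the measurement.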
When planning your infrastructure, consider these scaling approaches:
Horizontal Scaling: Add more nodes to distribute load. Recommended for Kubernetes nodes and database replicas.
Vertical Scaling: Increase resources (RAM, CPU) on existing nodes. Useful for temporary peaks or when hitting connection limits.
For most production deployments, we recommend a combination of both approaches with an emphasis on horizontal scaling for better resilience.
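As a rough illustration of the horizontal approach, the back-of-the-envelope calculation below estimates the node count needed for a given peak load. The throughput and headroom figures are assumptions; substitute numbers measured in your own load tests.

```python
import math

# Hypothetical figures -- replace with results from your own load testing.
REQUESTS_PER_NODE_PER_SEC = 50   # sustained throughput a single node handled in testing
PEAK_REQUESTS_PER_SEC = 320      # expected peak load for your workload
HEADROOM = 1.3                   # 30% buffer for spikes and node failures

# Horizontal scaling: number of nodes needed to absorb peak load with headroom.
nodes_needed = math.ceil(PEAK_REQUESTS_PER_SEC * HEADROOM / REQUESTS_PER_NODE_PER_SEC)
print(f"Provision at least {nodes_needed} nodes for peak traffic")
```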
Resource Allocation Strategy
When allocating resources across components:
Start with our baseline recommendations
Monitor resource utilization during testing
Identify bottlenecks (usually memory or disk I/O)
Scale the constrained resource before adding more nodes (see the sketch below)
Vector databases and Elasticsearch typically benefit most from additional memory, while Redis is often constrained by CPU.
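As a minimal sketch of the bottleneck-identification step, the snippet below samples CPU and memory utilization with the psutil library (an assumption about your tooling; any metrics source works) and flags whichever resource is closest to saturation. Disk I/O and network figures usually come from tools such as iostat or a node exporter rather than a one-off script.

```python
import psutil  # assumed available: pip install psutil

# Sample utilization over a short window. In practice you would aggregate
# readings across a full load test rather than rely on a single snapshot.
cpu_pct = psutil.cpu_percent(interval=5)   # average CPU use over 5 seconds
mem_pct = psutil.virtual_memory().percent  # current memory use

readings = {"cpu": cpu_pct, "memory": mem_pct}
bottleneck = max(readings, key=readings.get)
print(f"Utilization: {readings} -> consider scaling {bottleneck} first")
```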
Environment Isolation
For multi-environment deployments:
Development: Can share infrastructure with minimal isolation
Staging: Should mimic production but can use smaller resources
Production: Requires dedicated resources and stricter isolation
We recommend full cluster-level isolation for production, while development and staging can share some database clusters with proper database-level separation.
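One way to keep these tiers consistent is to express them as data that your provisioning scripts read. The structure below is purely illustrative; the field names and scaling factors are assumptions, not a required schema.

```python
from dataclasses import dataclass

@dataclass
class EnvironmentPolicy:
    dedicated_cluster: bool   # environment gets its own cluster
    shared_databases: bool    # may share database clusters with other environments
    resource_scale: float     # fraction of production sizing to provision

# Illustrative mapping that mirrors the recommendations above.
POLICIES = {
    "development": EnvironmentPolicy(dedicated_cluster=False, shared_databases=True, resource_scale=0.25),
    "staging": EnvironmentPolicy(dedicated_cluster=False, shared_databases=True, resource_scale=0.5),
    "production": EnvironmentPolicy(dedicated_cluster=True, shared_databases=False, resource_scale=1.0),
}
```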
Monitoring
We recommend monitoring the following metrics to ensure optimal performance:
System Metrics
CPU utilization
Memory usage
Disk I/O and latency
Network throughput
Application Metrics
Request latency
Error rates
Concurrent users
Queue lengths
Database Metrics
Query performance
Connection pool usage
Index efficiency
Replication lag
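If your application is not yet instrumented, the sketch below shows one common pattern for the application metrics above, using the prometheus_client library (an assumption about your monitoring stack). The metric names, port, and simulated workload are placeholders.

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server  # assumed: pip install prometheus-client

# Request latency and error count, exposed for scraping by your monitoring stack.
REQUEST_LATENCY = Histogram("app_request_latency_seconds", "Request latency in seconds")
REQUEST_ERRORS = Counter("app_request_errors_total", "Total failed requests")

def handle_request() -> None:
    """Stand-in for real request handling; the instrumentation is the point."""
    with REQUEST_LATENCY.time():                # records duration into the histogram
        time.sleep(random.uniform(0.01, 0.2))   # simulated work
        if random.random() < 0.05:              # simulated 5% error rate
            REQUEST_ERRORS.inc()

if __name__ == "__main__":
    start_http_server(9100)  # metrics served at http://localhost:9100/metrics
    while True:
        handle_request()
```

System and database metrics are usually better collected by existing exporters (a node exporter, database exporters) than by application code.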
These sizing recommendations provide a starting point, but real-world performance may vary. Always conduct load testing with scenarios that reflect your actual usage patterns.