This page tells you how much CPU and memory to give each Prisme.ai service, when to scale, and how to debug resource problems. Database resources are covered in Databases.

Two reference variants

We ship two sets of resource defaults. Treat them as the recommended lower and upper bounds for production — your environment should land somewhere between the two, tuned to your actual load and use case.
| Variant | Use when |
| --- | --- |
| Balanced (chart default) | Up to a few hundred concurrent users, budget-sensitive. Each pod is sized for typical load, with HPA absorbing spikes. |
| High-throughput | Thousands of concurrent users, max throughput per pod. Requests are sized at the maximum each service can actually use. |
Start at Balanced. Balanced is now the chart default and the right starting point for a fresh install — high enough to run a real workload comfortably, low enough to avoid over-paying before you know your traffic shape. Move toward High-throughput (or somewhere in between) once you’ve observed your actual usage.

Trade-off in plain English

  • High requests → fewer pods per node → idle CPU/RAM is wasted when pods aren’t busy.
  • Low requests → more pods per node → risk of a node being unable to provide CPU/RAM under load spikes.
Pick Balanced if your traffic is mostly steady and budget matters. Pick High-throughput if you need each pod to absorb large bursts on its own. Most production deployments end up at an intermediate point — there is no reason to copy either column verbatim once you have real data.
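To make the packing trade-off concrete, here is a rough worked example using the api-gateway CPU requests from the variant tables further down (300m for Balanced, 2000m for High-throughput) and a 4 vCPU node. It assumes the full 4000m is allocatable and looks at CPU only; real nodes reserve part of their capacity for system components, so treat the results as upper bounds.

```latex
\begin{align*}
\text{Balanced (300m):}\quad & \left\lfloor \frac{4000}{300} \right\rfloor = 13 \ \text{pods per node} \\
\text{High-throughput (2000m):}\quad & \left\lfloor \frac{4000}{2000} \right\rfloor = 2 \ \text{pods per node}
\end{align*}
```

The larger request reserves CPU whether or not the pod is busy, which is where the idle capacity in the first bullet comes from.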

Right-sizing from observed usage

Once the platform is live, monitor per-pod CPU and memory and resize from data, not from these tables:
  • CPU request → set to the P95 of the pod’s normal usage. CPU is compressible — the occasional P99 spike is absorbed by the node without breaking the workload, so paying for it as a permanent request wastes capacity.
  • Memory request → set to the P99 of the pod’s normal usage. Memory is not compressible — exceeding the request risks eviction or OOMKill, so the request must comfortably cover the worst legitimate spike.
  • Memory limit → keep close to the request (the chart defaults already do this) so a runaway pod is capped before it OOMs the node.
  • CPU limit → don’t set one (see Why no CPU limit? below).
Iterate: re-measure after a few weeks of steady traffic and after any major usage shift (new product enabled, big workspace migration, traffic doubled).
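As a sketch of how these rules translate into Helm values, suppose monitoring showed a pod's CPU P95 at about 450m and its memory P99 at about 700Mi. Both figures are hypothetical and chosen only for illustration. The service key prismeai-api-gateway and the nesting of resources under it are assumptions modeled on the prismeai-runtime key used elsewhere on this page, so check them against your chart's values file.

```yaml
# Hypothetical observed usage: CPU P95 ≈ 450m, memory P99 ≈ 700Mi
prismeai-api-gateway:        # assumed service key, modeled on prismeai-runtime
  resources:                 # assumed nesting; verify against your chart values
    requests:
      cpu: 450m              # P95 of normal CPU usage
      memory: 700Mi          # P99 of normal memory usage
    limits:
      memory: 800Mi          # kept close to the request to cap a runaway pod
      # no cpu limit, on purpose
```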

Cluster size baseline

The requests / limits below add up to the following node count when scheduled on 16 GB RAM, 4 vCPU worker nodes (a common cloud baseline like AWS m6i.xlarge, GCP n2-standard-4, Azure D4s_v5):
| Variant | Starting nodes | Notes |
| --- | --- | --- |
| Balanced | 5 nodes (16 GB / 4 vCPU each) | Covers ~8 vCPU / ~14 GB of steady-state requests for core + apps, with HPA headroom up to ~20 vCPU / ~40 GB at max replicas. |
| High-throughput | 8 nodes (16 GB / 4 vCPU each) | Larger per-pod requests + higher HPA ceilings. Enable the cluster autoscaler so the pool grows past 8 under sustained spikes. |
These are starting points — the actual node count depends on your HPA targets, the database pods you co-locate, and any extra workloads. Rely on the cluster autoscaler in both cases.
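The ~8 vCPU figure for Balanced can be checked against the tables below. Summing the CPU requests at minimum replicas (2 for each core service, 1 for each apps service):

```latex
\begin{align*}
\text{core} &= 2 \times (300 + 300 + 2000 + 500 + 100)\,\text{m} = 6400\,\text{m} \\
\text{apps} &= 1 \times (400 + 1500 + 100)\,\text{m} = 2000\,\text{m} \\
\text{total} &= 8400\,\text{m} \approx 8.4\ \text{vCPU}
\end{align*}
```

The same sum at maximum replicas gives 4 × 3200m + (6 × 400m + 4 × 1500m + 4 × 100m) = 21600m, or about 21.6 vCPU, in the region of the ~20 vCPU headroom figure. Database pods and system daemons are not part of this sum.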

Core namespace

Variant A — Balanced

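| Service | CPU request | Memory request | Memory limit | HPA min / max |
| --- | --- | --- | --- | --- |
| api-gateway | 300m | 200Mi | 300Mi | 2 / 4 |
| workspaces | 300m | 200Mi | 400Mi | 2 / 4 |
| runtime (3 workers) | 2000m | 2500Mi | 2500Mi | 2 / 4 |
| events | 500m | 300Mi | 500Mi | 2 / 4 |
| console | 100m | 300Mi | 300Mi | 2 / 4 |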

Variant B — High-throughput

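| Service | CPU request | Memory request | Memory limit | HPA min / max |
| --- | --- | --- | --- | --- |
| api-gateway | 2000m | 600Mi | 600Mi | 2 / 5 |
| workspaces | 1250m | 500Mi | 500Mi | 2 / 5 |
| runtime (4 workers) | 3800m | 4000Mi | 4Gi | 2 / 5 |
| events | 1500m | 600Mi | 600Mi | 2 / 5 |
| console | 1000m | 300Mi | 300Mi | 2 / 5 |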

Apps namespace

Variant A — Balanced

| Service | CPU request | Memory request | Memory limit | HPA min / max |
| --- | --- | --- | --- | --- |
| functions | 400m | 1.5Gi | 2Gi | 1 / 6 |
| crawler | 1500m | 3Gi | 3Gi | 1 / 4 |
| searchengine | 100m | 200Mi | 400Mi | 1 / 4 |

Variant B — High-throughput

| Service | CPU request | Memory request | Memory limit | HPA min / max |
| --- | --- | --- | --- | --- |
| functions | 1500m | 1500Mi | 3Gi | 2 / 5 |
| crawler | 3000m | 5Gi | 5Gi | 2 / 5 |
| searchengine | 1500m | 500Mi | 500Mi | 2 / 5 |

Why no CPU limit?

You’ll notice the tables above have no CPU limit. Don’t set one on Prisme.ai services.
  • The Linux CFS scheduler already throttles pods that exceed their request when another pod on the same node needs that CPU.
  • A CPU limit on top of that wastes free CPU: when the node has spare cycles and a pod could use them, the limit blocks it for no reason.
  • In practice, CPU limits on Node.js workloads cause unnecessary tail-latency spikes and add zero protection.
Memory limits stay — memory is not compressible, so a runaway pod must be capped before it triggers OOM on the whole node.

Horizontal Pod Autoscaling (HPA)

Golden rule for production: at least 2 replicas of every core service, and of functions and crawler in apps. Single-replica means a single bad pod, restart or node drain takes the service down.
The HPA max values in the variant tables above are starting points, not ceilings. They were chosen to cover the typical load profile of each variant. Review them against your own observed traffic and raise them when you see HPA sitting at max replicas during peaks, or latency degrading. The right maxReplicas is the one that absorbs your highest expected spike with headroom to spare.

Example HPA config

prismeai-runtime:
  hpa:
    enabled: true
    minReplicas: 2
    maxReplicas: 5
    targetCPUUtilizationPercentage: 70
| Service group | Target utilization | Notes |
| --- | --- | --- |
| api-gateway, workspaces, runtime, console | 70% CPU | Standard CPU-bound services |
| events | 70% CPU | Pin clients via a consistent-hash header (or Istio DestinationRule on user-agent); events benefit from sticky sessions |
| functions | 70% CPU | Memory is the real constraint; over-provision memory before scaling |
| crawler | 80% CPU + 80% memory | Heavy memory pressure during indexing |
| searchengine | 70% CPU | |
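The Helm example above sets only a CPU target. For the crawler row, which combines CPU and memory targets, the following is a sketch of what an equivalent plain Kubernetes autoscaling/v2 manifest could look like. The Deployment name prismeai-crawler and the namespace prismeai-apps are assumptions, and this page does not say whether the chart exposes a dedicated memory-target value, so check the chart before applying anything like this by hand.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: prismeai-crawler          # assumed name
  namespace: prismeai-apps        # assumed namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: prismeai-crawler        # assumed Deployment name
  minReplicas: 2                  # High-throughput values from the table above
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80  # percent of the CPU request
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80  # percent of the memory request
```

With several metrics listed, the HPA computes a desired replica count for each one and uses the highest, so either CPU or memory pressure can trigger a scale-up.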

Scaling prismeai-runtime workers

prismeai-runtime is the only platform service that runs multiple workers inside the same pod. The pod runs a main thread that receives HTTP calls from api-gateway and events from the broker, then dispatches them to N worker threads, which execute the automations. Adding workers raises per-pod parallelism without spinning up new pods. The worker count is set with the Helm value prismeai-runtime.maxWorkers (which populates the RUNNER_MAX_THREADS env var inside the container) and defaults to 1.
The runtime resource numbers in the variant tables above are sized for a specific worker count: 3 workers for Balanced, 4 workers for High-throughput. If you change maxWorkers, re-derive the pod resources with the math below — keeping the defaults while bumping workers will starve each one.

Golden rule: scale workers and resources together

A worker is a Node.js thread: it competes for the pod’s CPU and memory budget alongside the main thread. Doubling maxWorkers without raising resources.requests / resources.limits halves what each worker gets, and you’ll see latency degrade or workers get OOMKilled.
The chart slices 75% of resources.limits.memory across the workers (the other 25% goes to V8 metadata, native modules and the main thread). So sizing the pod is a two-step decision:
  1. Pick how much heap each worker needs based on the workload it will execute. For production workloads, plan 500 MiB to 1 GiB per worker depending on the automations they run (heavier templating, large payloads or heavy in-flight contexts push it toward 1 GiB).
  2. Set limits.memory and maxWorkers together so the chart’s auto-slice gives each worker that heap.
In practice you have two equivalent ways to think about it:
  • Top-down — keep limits.memory fixed and lower maxWorkers: each worker’s heap rises proportionally.
  • Bottom-up — target a per-worker heap and raise limits.memory to fit maxWorkers of them, plus the 25% main-thread overhead.
Example: aiming for 4 workers at ~750 MiB heap each → limits.memory ≈ 4 GiB, maxWorkers: 4 (this is the High-throughput default). For CPU, budget at least one core per worker plus a small share for the main thread (e.g. requests.cpu: "4.5" for 4 workers).
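Expressed as Helm values, the example above might look like the sketch below. prismeai-runtime.maxWorkers is the value named on this page; placing resources directly under the same key is an assumption about the chart layout, and the memory request simply reuses the High-throughput table figure.

```yaml
prismeai-runtime:
  maxWorkers: 4              # populates RUNNER_MAX_THREADS in the container
  resources:                 # assumed nesting; verify against your chart values
    requests:
      cpu: "4.5"             # about one core per worker plus a share for the main thread
      memory: 4000Mi         # High-throughput table value
    limits:
      memory: 4Gi            # 75% of this is sliced across the 4 workers
      # no cpu limit, on purpose
```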

Node.js worker heap is auto-sized

Each worker thread has its own heap memory hard limit, configured via Node.js’s --max-old-space-size option. You don’t need to set NODE_OPTIONS manually — the chart auto-derives each worker’s heap from resources.limits.memory and maxWorkers:
maxOldSpaceSize = floor( memory_limit_in_Mi / maxWorkers × 0.75 )
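Applying the formula to the two reference variants (memory limits of 2500 Mi with 3 workers and 4 Gi = 4096 Mi with 4 workers) gives the per-worker heap below. The last line inverts the formula for bottom-up sizing, using a 1 GiB target heap as an illustrative input.

```latex
\begin{align*}
\text{Balanced:}\quad & \left\lfloor \frac{2500}{3} \times 0.75 \right\rfloor = 625\ \text{MiB per worker} \\
\text{High-throughput:}\quad & \left\lfloor \frac{4096}{4} \times 0.75 \right\rfloor = 768\ \text{MiB per worker} \\
\text{Bottom-up (1 GiB heap, 4 workers):}\quad & \text{memory limit} \ge \frac{1024 \times 4}{0.75} \approx 5462\ \text{MiB}
\end{align*}
```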

Resource Quotas and Limit Ranges

Apply quotas at the namespace level to prevent runaway consumption.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: prismeai-core-quota
  namespace: prismeai-core
spec:
  hard:
    requests.cpu: "16"
    requests.memory: "32Gi"
    limits.memory: "64Gi"
    pods: "50"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: prismeai-core-limits
  namespace: prismeai-core
spec:
  limits:
    - type: Container
      defaultRequest:
        cpu: "250m"
        memory: "512Mi"
      default:
        memory: "1Gi"

Troubleshooting

Pods stuck in Pending
kubectl describe pod <pod> -n <namespace> | tail -30
kubectl describe nodes | grep -A 5 "Allocated resources"
  • “0/N nodes available: Insufficient cpu/memory” → reduce requests, increase node size, or scale the node pool.
  • “node(s) didn’t match Pod’s node affinity/selector” → check tolerations and nodeSelector against your node pool labels.

HPA not scaling or pinned at max replicas
kubectl get hpa -n <namespace>
kubectl describe hpa/<hpa> -n <namespace>
  • Sustained 100% CPU at max replicas → raise maxReplicas, or move from the Balanced to the High-throughput variant for the affected service.
  • Memory-bound service (e.g. functions) → raise memory request rather than replica count.

OOMKilled pods
kubectl logs <pod> -n <namespace> --previous
kubectl describe pod <pod> -n <namespace> | grep -i oom
  • For runtime, check BROKER_EMIT_MAXLEN: very large events can spike memory.
  • For functions, check that NODE_OPTIONS=--max-old-space-size=... is consistent with the container memory limit and leaves headroom below it.
  • Raise the memory request and limit together.

High CPU usage or growing latency
kubectl top pods -n <namespace>
  • High CPU usage approaching the request, with growing latency → raise the CPU request (not a limit).
  • Verify no CPU limit is set on Prisme.ai services.

Unbalanced or saturated nodes
kubectl top nodes
kubectl describe nodes/<node>
  • One node carrying disproportionate load → check pod anti-affinity on critical services, or rebalance with a rolling restart.
  • Persistent saturation → scale the node pool or move to larger instances.

Failed rollout or crash loop
kubectl get events -n <namespace> --sort-by=.lastTimestamp | tail -30
  • Failing readiness probe right after rollout → check service dependencies (DB reachable, secrets mounted).
  • Crash loop → kubectl logs <pod> -n <namespace> --previous.
  • If the regression is in a new chart version, roll back: helm rollback <release> <revision> -n <namespace> — see Helm install — Upgrade, rollback, uninstall.

Volume node affinity conflict
Symptom in kubectl get events:
FailedScheduling   0/3 nodes are available: 1 Insufficient memory, 2 node(s) had volume node affinity conflict.
A pod with a ReadWriteOnce PVC can only be scheduled on the node where the volume lives. If that node is full and the others can’t mount the volume, scheduling fails.
Fixes:
  • Free CPU/memory on the node that hosts the volume (delete idle pods, scale a noisy neighbor down).
  • Move to a ReadWriteMany storage class (EFS / Azure Files / Filestore / CephFS) for workloads that can be re-scheduled freely — see Requirements.
  • Add a node to the pool.

Persistent volume filling up
Alerts: KubePersistentVolumeFillingUp, KubePersistentVolumeFull.
kubectl get pvc -A
kubectl describe pvc <pvc> -n <namespace>
  • For Elasticsearch or OpenSearch, the most common cause is unbounded event growth — enable the prismeai-events cleanup job and tune retention. See Elasticsearch — Events automated cleanup.
  • Expand the PVC if the storage class supports it (kubectl edit pvc <pvc> → bump spec.resources.requests.storage).
  • Otherwise, snapshot + restore into a larger volume.

Next Steps

Helm install

Configure values and deploy core + apps namespaces.

Databases

PostgreSQL or MongoDB, Redis, Elasticsearch or OpenSearch.

Install products

Configure your Prisme.ai AI products.

Operations

Scaling, updates and backups.