Skip to main content
To operate Prisme.ai efficiently in production, it’s essential to monitor service health, resource usage, and error rates. This guide explains how to install and configure Prometheus and Grafana using Operators in a Kubernetes environment.

Why Use Operators?

Using Kubernetes Operators simplifies lifecycle management of complex systems like Prometheus and Grafana:
  • Automated installation and upgrades
  • Simplified configuration
  • Native CRDs for monitoring targets, dashboards, alerts

Step-by-Step Installation

1

Install Prometheus Operator

You can install the Prometheus Operator via Helm:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install kube-prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace
2

Expose Grafana Dashboard

Expose Grafana using an Ingress or port-forward:
kubectl port-forward svc/kube-prometheus-grafana 3000:80 -n monitoring
Then access it at http://localhost:3000Default credentials:
  • Username: admin
  • Password: admin (or see adminPassword in the values file)
3

Configure Prometheus Scrape Targets

Prisme.ai services expose Prometheus-compatible metrics endpoints (e.g. /metrics). To scrape them, define a ServiceMonitor:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: prisme-api
  labels:
    release: kube-prometheus
spec:
  selector:
    matchLabels:
      app: api-gateway
  namespaceSelector:
    matchNames:
    - prisme-ai
  endpoints:
  - port: http
    path: /metrics
    interval: 30s
4

Import Dashboards

Grafana supports importing dashboards via the UI or ConfigMaps.Use community dashboards for:
  • Kubernetes cluster monitoring
  • Pod resource usage
  • API Gateway latency & error rates
  • Redis, MongoDB, and Elasticsearch health

Alerts and Notifications

Set up alert rules and connect them to notification channels:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: high-cpu
  labels:
    release: kube-prometheus
spec:
  groups:
  - name: prisme-rules
    rules:
    - alert: HighCpuUsage
      expr: sum(rate(container_cpu_usage_seconds_total[5m])) by (pod) > 0.9
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High CPU usage detected"
        description: "Pod {{ $labels.pod }} is using over 90% CPU."
You can use the Alertmanager bundled with the Prometheus Operator.Configure alertmanager.yaml to define receivers:
receivers:
- name: 'slack-notifications'
  slack_configs:
  - channel: '#alerts'
    send_resolved: true
    username: 'alertmanager'
    api_url: 'https://hooks.slack.com/services/...'

Best Practices

Namespace Separation

  • Run monitoring stack in a dedicated namespace (monitoring)
  • Use RBAC to isolate metrics access

Retention & Storage

  • Configure Prometheus retention (--storage.tsdb.retention.time=15d)
  • Mount persistent volumes for metric storage

Service Discovery

  • Use ServiceMonitor and PodMonitor for automatic discovery
  • Label all Prisme.ai services consistently (e.g., app: api-gateway)

Grafana Security

  • Change default admin password
  • Enable SSO integration (e.g., OAuth, LDAP) if required

Next Steps