Monitoring Prisme.ai with Prometheus & Grafana

To operate Prisme.ai efficiently in production, it’s essential to monitor service health, resource usage, and error rates. This guide explains how to install and configure Prometheus and Grafana using Operators in a Kubernetes environment.

Why Use Operators?

Using Kubernetes Operators simplifies lifecycle management of complex systems like Prometheus and Grafana:

Automated installation and upgrades
Simplified configuration
Native CRDs for monitoring targets, dashboards, alerts

Step-by-Step Installation

Install Prometheus Operator

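You can install the Prometheus Operator via Helm. The kube-prometheus-stack chart used here bundles the Operator with Prometheus, Alertmanager, and Grafana: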

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install kube-prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace

Expose Grafana Dashboard

Expose Grafana using an Ingress or port-forward:

kubectl port-forward svc/kube-prometheus-grafana 3000:80 -n monitoring

Then access it at http://localhost:3000.

Default credentials:

Username: admin
Password: admin (if that is rejected, check the adminPassword setting in the values file, which overrides it)
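
For access that outlives a local port-forward, the Ingress option mentioned above could be enabled through the chart's values. This is a minimal sketch, assuming the grafana.ingress settings of the kube-prometheus-stack chart, an ingress controller whose class is named nginx, and a hypothetical hostname grafana.example.com; none of these values come from the Prisme.ai documentation.

```yaml
# values-grafana.yaml (hypothetical file name)
grafana:
  ingress:
    enabled: true
    ingressClassName: nginx      # assumption: adjust to your ingress controller
    hosts:
      - grafana.example.com      # hypothetical hostname
    path: /
```

Such a file would typically be applied with helm upgrade kube-prometheus prometheus-community/kube-prometheus-stack -n monitoring -f values-grafana.yaml.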

Configure Prometheus Scrape Targets

Prisme.ai services expose Prometheus-compatible metrics endpoints (e.g. /metrics). To scrape them, define a ServiceMonitor:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: prisme-api
  labels:
    release: kube-prometheus
spec:
  selector:
    matchLabels:
      app: api-gateway
  namespaceSelector:
    matchNames:
    - prisme-ai
  endpoints:
  - port: http
    path: /metrics
    interval: 30s
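
A ServiceMonitor selects Kubernetes Services, not pods, and its endpoints refer to a port by name. The sketch below shows what a matching Service might look like. The Service name, pod selector, and port numbers are assumptions for illustration; check the manifests of your actual Prisme.ai deployment.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api-gateway          # hypothetical Service name
  namespace: prisme-ai       # must be listed in namespaceSelector.matchNames
  labels:
    app: api-gateway         # matched by the ServiceMonitor's selector.matchLabels
spec:
  selector:
    app: api-gateway         # assumption: label carried by the api-gateway pods
  ports:
  - name: http               # matched by "port: http" (a port name, not a number)
    port: 80                 # hypothetical port numbers
    targetPort: 3001
```

If the Service's port has no name, or a different one, the target will not appear in Prometheus even though the labels match.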

Import Dashboards

Grafana supports importing dashboards via the UI or ConfigMaps. Use community dashboards for:

Kubernetes cluster monitoring
Pod resource usage
API Gateway latency & error rates
Redis, MongoDB, and Elasticsearch health
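
The ConfigMap route can be sketched as follows. It assumes the Grafana dashboard sidecar of the kube-prometheus-stack chart is enabled and watches for its default label grafana_dashboard; the ConfigMap name and the placeholder dashboard JSON are hypothetical and would be replaced by a dashboard exported from Grafana or downloaded from the community catalog.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prisme-api-dashboard        # hypothetical name
  namespace: monitoring
  labels:
    grafana_dashboard: "1"          # assumption: the sidecar's default discovery label
data:
  # Placeholder dashboard; replace with real exported dashboard JSON.
  prisme-api.json: |
    {
      "uid": "prisme-api",
      "title": "Prisme.ai API Gateway (placeholder)",
      "panels": []
    }
```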

Alerts and Notifications

Set up alert rules and connect them to notification channels such as Slack, Teams, or email.

Basic Alert Rule Example

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: high-cpu
  labels:
    release: kube-prometheus
spec:
  groups:
  - name: prisme-rules
    rules:
    - alert: HighCpuUsage
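      # rate() of container_cpu_usage_seconds_total is measured in CPU cores,
      # so the threshold 0.9 means 0.9 cores, not 90% of a limit.
      # container!="" skips the pod-level aggregate series to avoid double counting.
      expr: sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod) > 0.9
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High CPU usage detected"
        description: "Pod {{ $labels.pod }} has used more than 0.9 CPU cores for 5 minutes."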
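
To alert on a percentage rather than an absolute number of cores, usage can be divided by the pod's CPU limit. The rule below is a sketch under several assumptions: kube-state-metrics is running (the kube-prometheus-stack chart normally deploys it), the Prisme.ai pods live in the prisme-ai namespace, and they declare CPU limits. The rule and alert names are hypothetical.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: high-cpu-vs-limit          # hypothetical name
  labels:
    release: kube-prometheus
spec:
  groups:
  - name: prisme-rules-relative
    rules:
    - alert: HighCpuUsageVsLimit
      # CPU cores used per pod, divided by the sum of the CPU limits of that pod.
      expr: |
        sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{namespace="prisme-ai", container!=""}[5m]))
          /
        sum by (namespace, pod) (kube_pod_container_resource_limits{namespace="prisme-ai", resource="cpu"})
          > 0.9
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "CPU usage close to limit"
        description: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is using over 90% of its CPU limit."
```

Pods without a CPU limit have no series on the right-hand side of the division and therefore never trigger this rule.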

Integrate with Slack, Teams or Email

You can use the Alertmanager bundled with the Prometheus Operator. Configure alertmanager.yaml to define receivers:

receivers:
- name: 'slack-notifications'
  slack_configs:
  - channel: '#alerts'
    send_resolved: true
    username: 'alertmanager'
    api_url: 'https://hooks.slack.com/services/...'
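
A receiver only gets alerts once a route points at it. With the kube-prometheus-stack chart, the Alertmanager configuration can be supplied under alertmanager.config in the Helm values; the sketch below shows one possible shape. The grouping and timing values are illustrative assumptions. The 'null' receiver and Watchdog route are kept on the assumption that the chart's default configuration references them, since Helm replaces lists wholesale when merging values. The webhook URL is left elided as in the receiver example.

```yaml
alertmanager:
  config:
    route:
      receiver: slack-notifications     # default receiver for all alerts
      group_by: ['alertname', 'namespace']
      group_wait: 30s                   # illustrative timings
      group_interval: 5m
      repeat_interval: 4h
      routes:
      - receiver: 'null'                # silence the always-firing Watchdog alert
        matchers:
        - alertname = "Watchdog"
    receivers:
    - name: 'null'
    - name: slack-notifications
      slack_configs:
      - channel: '#alerts'
        send_resolved: true
        username: 'alertmanager'
        api_url: 'https://hooks.slack.com/services/...'
```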

Best Practices

Namespace Separation

Run the monitoring stack in a dedicated namespace (monitoring)
Use RBAC to isolate metrics access

Retention & Storage

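Configure Prometheus retention (--storage.tsdb.retention.time=15d; with the Operator, set this through the retention field of the Prometheus resource rather than passing the flag directly)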
Mount persistent volumes for metric storage
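
With the kube-prometheus-stack chart, both points are usually set in the Helm values rather than on the command line. A minimal sketch, assuming the chart's prometheus.prometheusSpec settings; the 50Gi size and the access mode are placeholder assumptions to be sized for your metric volume.

```yaml
prometheus:
  prometheusSpec:
    retention: 15d                     # becomes --storage.tsdb.retention.time=15d
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi            # placeholder size
```

Without a storageSpec, Prometheus data typically lives in an emptyDir volume and is lost when the pod is rescheduled.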

Service Discovery

Use ServiceMonitor and PodMonitor for automatic discovery
Label all Prisme.ai services consistently (e.g., app: api-gateway)
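
For workloads that expose metrics but have no Service in front of them, a PodMonitor selects pods directly. The sketch below mirrors the ServiceMonitor shown earlier; the resource name, the app: runtime label, and the port name http are hypothetical and must match what your pods actually declare.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: prisme-runtime             # hypothetical name
  labels:
    release: kube-prometheus       # lets the chart's Prometheus pick it up
spec:
  selector:
    matchLabels:
      app: runtime                 # hypothetical pod label
  namespaceSelector:
    matchNames:
    - prisme-ai
  podMetricsEndpoints:
  - port: http                     # name of the container port, not its number
    path: /metrics
    interval: 30s
```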

Grafana Security

Change the default admin password
Enable SSO integration (e.g., OAuth, LDAP) if required
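
Both recommendations can be expressed in the Helm values. The fragment below is a sketch only: it assumes the Grafana subchart's admin.existingSecret, grafana.ini, and envValueFrom settings, a generic OAuth provider, and two pre-created Secrets. Every name and URL in it (grafana-admin, grafana-oauth, grafana.example.com, idp.example.com) is hypothetical.

```yaml
grafana:
  admin:
    existingSecret: grafana-admin        # hypothetical Secret holding the admin login
    userKey: admin-user
    passwordKey: admin-password
  grafana.ini:
    server:
      root_url: https://grafana.example.com      # hypothetical public URL
    auth.generic_oauth:
      enabled: true
      name: SSO
      client_id: grafana                          # placeholder client ID
      scopes: openid profile email
      auth_url: https://idp.example.com/authorize    # hypothetical provider endpoints
      token_url: https://idp.example.com/token
      api_url: https://idp.example.com/userinfo
  # Keep the client secret out of the values file by injecting it as an
  # environment variable, which Grafana maps to auth.generic_oauth.client_secret.
  envValueFrom:
    GF_AUTH_GENERIC_OAUTH_CLIENT_SECRET:
      secretKeyRef:
        name: grafana-oauth              # hypothetical Secret
        key: client_secret
```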

Next Steps

Products Configuration

Configure your Prisme.ai AI products

Operations Management

Learn about scaling operations efficiently

Backup and Maintenance

Learn about backup strategies
