Configuring AI Knowledge
Set up the AI Knowledge product for RAG (Retrieval-Augmented Generation) with model orchestration, rate limits, and vector stores.
AI Knowledge is Prisme.ai’s product for agentic assistants powered by tools and retrieval-augmented generation (RAG). It enables teams to build agents that leverage internal knowledge across various formats, interact with APIs via tools, and collaborate with other agents through context sharing — enabling true multi-agent workflows with robust LLM support and enterprise-grade configuration options.
This guide explains how to configure AI Knowledge in a self-hosted environment.
Core Capabilities
- Configure multi-model support with failover and fine-tuned prompts
- Automate agent provisioning via AI Builder
- Enforce limits, security, and monitoring
- Enable built-in tools such as summarization, search, code interpreter, and web browsing
- Integrate with OpenSearch, Redis, or other vector stores
LLM Providers
OpenAI
Configure the llm.openai.openai.models field:
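For example (model names are illustrative):

```yaml
llm:
  openai:
    openai:
      models:
        - gpt-4o
        - gpt-4o-mini
        - text-embedding-3-small
```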
OpenAI Azure
Configure the llm.openai.azure.resources.*.deployments field.
Multiple resources can be added by appending additional entries to the llm.openai.azure.resources array:
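A minimal sketch; only the resources and deployments keys come from this guide, the per-resource field and its values are illustrative assumptions:

```yaml
llm:
  openai:
    azure:
      resources:
        - resource: my-azure-resource-1      # assumed key: the Azure OpenAI resource name
          deployments:
            - gpt-4o
            - text-embedding-3-large
        - resource: my-azure-resource-2      # a second entry appended to the same array
          deployments:
            - gpt-4o-mini
```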
Bedrock
Configure the llm.bedrock.*.models and llm.bedrock.*.region fields.
Multiple regions can be used by appending additional entries to the llm.bedrock array:
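For example, with illustrative model IDs and regions:

```yaml
llm:
  bedrock:
    - region: us-east-1
      models:
        - anthropic.claude-3-5-sonnet-20240620-v1:0
        - amazon.titan-embed-text-v2:0
    - region: eu-west-1                      # an additional entry appended to the llm.bedrock array
      models:
        - anthropic.claude-3-haiku-20240307-v1:0
```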
Vertex
Configure the llm.openai.vertex field:
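A minimal sketch; the exact schema under llm.openai.vertex may differ in your platform version, and the endpoint name and service account shown here are illustrative:

```yaml
llm:
  openai:
    vertex:
      models:
        # full endpoint name of the deployed model (illustrative)
        - projects/my-gcp-project/locations/europe-west1/endpoints/1234567890
      service_account: '{{secret.vertexServiceAccount}}'   # omit when relying on GCP IAM, see below
```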
When deploying a model through Vertex, the model name must be the full endpoint name, as in the example above.
The modelAliases feature is especially handy for this provider. To give your users more readable model names, the configuration above can be transformed into:
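A sketch of such an alias, mapping a readable name to the full endpoint name (the exact placement of modelAliases may differ):

```yaml
llm:
  openai:
    vertex:
      models:
        - my-vertex-model                    # readable name shown to users
      modelAliases:
        # readable name -> full endpoint name expected by Vertex (illustrative)
        my-vertex-model: projects/my-gcp-project/locations/europe-west1/endpoints/1234567890
      service_account: '{{secret.vertexServiceAccount}}'
```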
Note that the service_account credentials should be omitted if you deployed your platform on GCP and rely on IAM authentication.
The service_account value should be either:
- a JSON object
- a stringified JSON object (handy if you store it in a secret)
OpenAI-Compatible Providers
Configure the llm.openailike field:
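A minimal sketch assuming an OpenAI-compatible HTTP endpoint; apart from llm.openailike, provider and options.excludeParameters, the key names (endpoint, api_key, models) are assumptions:

```yaml
llm:
  openailike:
    - provider: mistral                      # name used in analytics metrics and dashboards
      endpoint: https://api.mistral.ai/v1    # assumed key: OpenAI-compatible base URL
      api_key: '{{secret.mistralApiKey}}'    # assumed key: credentials
      models:
        - mistral-large-latest
      options:
        excludeParameters:
          - presence_penalty                 # illustrative: an OpenAI parameter the model does not support
```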
Optional Parameters:
- provider: The provider name used in analytics metrics and dashboards.
- options.excludeParameters: Allows exclusion of certain OpenAI generic parameters not supported by the given model.
Gemini integration:
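For example, Gemini exposed through Google's OpenAI-compatible endpoint (same key-name assumptions as above):

```yaml
llm:
  openailike:
    - provider: gemini
      endpoint: https://generativelanguage.googleapis.com/v1beta/openai/
      api_key: '{{secret.geminiApiKey}}'
      models:
        - gemini-1.5-pro
        - gemini-1.5-flash
```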
Global Configuration
Default models
Set base models for completions, embeddings, and query enhancement.
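A sketch of what this could look like; the defaultModels key and its sub-keys are assumptions, and model names are illustrative:

```yaml
defaultModels:                       # assumed key
  completions: gpt-4o                # base model for chat completions
  embeddings: text-embedding-3-small # base model for document vectorization
  queryEnhancement: gpt-4o-mini      # model used to rewrite/enhance user queries
```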
Rate Limits
Rate limits can currently be applied at two stages of message processing:
- When a message is received (requestsPerMinute limits for projects or users).
- After RAG stages and before the LLM call (tokensPerMinute limits for projects, users, models, or requestsPerMinute limits for models).
Embedding model rate limits are applied before vectorizing a document, per project or model.
This is how to configure token and request limits globally or per user/project:
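A sketch using the keys described below; the requestsPerMinute/tokensPerMinute sub-keys mirror the terms used in this section, and the values are illustrative:

```yaml
limits:
  llm:
    users:                      # per-user limits across all projects
      requestsPerMinute: 20
      tokensPerMinute: 20000
    projects:                   # default per-project limits, overridable from the /admin page
      requestsPerMinute: 100
      tokensPerMinute: 100000
  files_count: 100              # maximum number of documents per project
```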
- limits.llm.users: Defines per-user message/token limits across all projects.
- limits.llm.projects: Defines default message/token limits per project. These limits can be overridden per project via the /admin page in AI Knowledge.
- limits.files_count: Specifies the maximum number of documents allowed in AI Knowledge projects. This number can also be overridden per project via the /admin page.
See Models specifications for rate limits per model.
Model Aliases
If you have multiple LLM Providers or regions with the same model names (for example gpt-4), you can use custom names:
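For example, giving each provider's gpt-4 a distinct custom name (a sketch; the per-resource key follows the assumptions of the Azure section above):

```yaml
llm:
  openai:
    openai:
      models:
        - openai-gpt-4                 # custom name for OpenAI's gpt-4
    azure:
      resources:
        - resource: my-azure-resource  # assumed key
          deployments:
            - azure-gpt-4              # custom name for the Azure deployment
```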
And you can map them to the name expected by the provider with the following:
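A sketch of the corresponding mapping, with the custom name on the left and the provider-expected name on the right (the exact placement of modelAliases may differ):

```yaml
llm:
  openai:
    openai:
      modelAliases:
        openai-gpt-4: gpt-4            # custom name -> name expected by OpenAI
    azure:
      resources:
        - resource: my-azure-resource
          deployments:
            - azure-gpt-4
          modelAliases:
            azure-gpt-4: gpt-4         # custom name -> Azure deployment name
```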
As a reminder, here is how modelsSpecifications could look:
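A sketch reusing the custom names above, with fields documented in the Models Configuration section:

```yaml
modelsSpecifications:
  openai-gpt-4:
    displayName: GPT-4 (OpenAI)
    maxContext: 8192
    maxResponseTokens: 2048
    rateLimits:
      requestsPerMinute: 100
      tokensPerMinute: 100000
  azure-gpt-4:
    displayName: GPT-4 (Azure)
    maxContext: 8192
    maxResponseTokens: 2048
    failoverModel: openai-gpt-4
```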
SSO Access
If you have your own SSO configured, you need to explicitly allow SSO-authenticated users to access AI Knowledge pages:
- Open the AI Knowledge workspace
- Open Settings > Advanced
- Manage roles
- Add your SSO provider's technical name after prismeai: {} at the very beginning:
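A sketch of what the roles configuration could look like; the surrounding structure is an assumption, and yourSsoProvider stands for your SSO provider's technical name:

```yaml
roles:
  default:
    auth:
      prismeai: {}
      yourSsoProvider: {}    # add your SSO provider technical name here
```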
Account Management
By default, sharing an agent with an external email address automatically sends an invitation email so the external user can create an account and access the agent.
You can disable this behavior to enforce stricter user control:
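A sketch of what this toggle could look like; the key name is hypothetical, so check the exact setting available in your platform version:

```yaml
accountManagement:
  disableExternalInvitations: true   # hypothetical key: stop auto-inviting unknown external emails
```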
Only existing users will be able to access shared agents.
Onboarding, Toasts & Statuses
AI Knowledge supports onboarding flows, multilingual statuses, and customizable notifications:
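A sketch of the kind of structure involved; only toasts.i18n.*.rateLimit is referenced elsewhere in this guide, the other keys are illustrative:

```yaml
onboarding:
  enabled: true                    # illustrative
toasts:
  i18n:
    en:
      rateLimit: You have reached the rate limit, please retry in a few minutes.
    fr:
      rateLimit: Vous avez atteint la limite de requêtes, veuillez réessayer dans quelques minutes.
statuses:
  i18n:
    en:
      published: Published         # illustrative status label
    fr:
      published: Publié
```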
Models Configuration
Configure all available models with descriptions, rate limits, and failover:
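A consolidated sketch using the fields documented in the subsections below; model names and values are illustrative, and the multilingual description map is an assumption:

```yaml
modelsSpecifications:
  gpt-4o:
    displayName: GPT-4o
    description:
      en: General-purpose assistant model      # assumed i18n structure
      fr: Modèle d'assistant généraliste
    maxContext: 128000
    maxResponseTokens: 4096
    rateLimits:
      requestsPerMinute: 300
      tokensPerMinute: 200000
    failoverModel: gpt-4o-mini
  gpt-4o-mini:
    displayName: GPT-4o mini
    isHiddenFromEndUser: true
  text-embedding-3-small:
    type: embeddings
    maxContext: 8191
    batchSize: 96
```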
Customize descriptions
- All LLM models (excluding those with type: embeddings) will automatically appear in the AI Store menu unless disabled at the agent level, with the configured titles and descriptions.
- displayName specifies the user-facing name of the model, replacing the technical or original model name to ensure a more intuitive and user-friendly experience.
- isHiddenFromEndUser specifies that the model will be hidden from end users. This feature allows administrators to temporarily disable a model or conceal it from the end-user interface without permanently removing it from the configuration.
Context & response tokens
- maxContext specifies the maximum token size of the context that can be passed to the model, applicable to both LLMs (full prompt size) and embedding models (maximum chunk size for vectorization).
- maxResponseTokens defines the maximum completion size requested from the LLM, which can be overridden in individual agent settings.
Provider specific parameters
additionalRequestBody.completions and additionalRequestBody.embeddings specify custom parameters that will be sent within all HTTP request bodies for the given model; they can be used, for example, to enable AWS Guardrails:
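A sketch for a Bedrock-hosted model; the guardrail parameter names are an assumption, so use whatever the target model's invocation API expects:

```yaml
modelsSpecifications:
  "anthropic.claude-3-5-sonnet-20240620-v1:0":
    additionalRequestBody:
      completions:
        guardrailIdentifier: my-guardrail-id   # assumed AWS Guardrails parameters
        guardrailVersion: '1'
```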
Embeddings batch size
By default, document paragraphs are vectorized in batches of 96.
You can customize this batchSize per model:
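For example, with an illustrative embedding model:

```yaml
modelsSpecifications:
  text-embedding-3-small:
    type: embeddings
    batchSize: 48
```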
Or globally:
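A sketch of a global default; the top-level key is an assumption:

```yaml
embeddings:
  batchSize: 48    # assumed global key applied to all embedding models
```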
Rate Limits
When modelsSpecifications.*.rateLimits.requestsPerMinute or modelsSpecifications.*.rateLimits.tokensPerMinute are defined, an error (customizable via toasts.i18n.*.rateLimit) is returned to any user attempting to exceed the configured limits. These limits are shared across all projects/users using the models.
If these limits are reached and modelsSpecifications.*.failoverModel is specified, projects with failover.enabled activated (disabled by default) will automatically switch to the failover model.
Notes:
- tokensPerMinute limits apply to the entire prompt sent to the LLM, including the user question, system prompt, project prompt, and RAG context.
- Failover and tokensPerMinute limits also apply to intermediate queries during response construction (e.g., question suggestions, self-query, enhanced query, source filtering).
Environmental metrics
Environmental metrics can be calculated when using models by setting the region where the model is hosted, the energy consumed per token (in kWh), and the PUE (Power Usage Effectiveness) profile:
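A sketch of such a profile per model; the field names are assumptions and the values are illustrative:

```yaml
modelsSpecifications:
  gpt-4o:
    environmentalMetrics:        # assumed key
      region: europe-west1       # region where the model is hosted
      energyPerToken: 0.000001   # kWh consumed per token (illustrative)
      pue: 1.2                   # Power Usage Effectiveness of the hosting datacenter
```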
Vector Store Configuration
To enable retrieval-based answers, configure a vector store:
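A sketch assuming a Redis-backed vector store; the key names are assumptions:

```yaml
vectorStore:
  provider: redis                          # assumed key
  url: '{{secret.redisVectorStoreUrl}}'    # e.g. rediss://user:password@host:6379
```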
Or with OpenSearch:
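Sketched for OpenSearch, with the same key-name assumptions:

```yaml
vectorStore:
  provider: opensearch
  url: https://opensearch.internal.example.com:9200
  user: '{{secret.opensearchUser}}'        # assumed credential keys
  password: '{{secret.opensearchPassword}}'
```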
Tools and Capabilities
AI Knowledge enables advanced agents via tools.
file_search
RAG tool for semantic search within indexed documents.
file_summary
Summarize entire files when explicitly requested.
documents_rag
Used to extract context from project knowledge collections.
web_search
Optional tool enabled via Serper API key:
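A sketch; the exact key holding the Serper API key is an assumption:

```yaml
tools:
  web_search:
    provider: serper                    # assumed key
    apiKey: '{{secret.serperApiKey}}'   # assumed key
```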
code_interpreter
Python tool for data manipulation and document-based computation.
image_generation
Uses DALL-E or an equivalent model if enabled in the LLM configuration.
Advanced Features
AI Builder Automation
AI Knowledge projects and agents can be provisioned programmatically via AI Builder workflows.
Failover Models
Specify a backup model to switch to if the main one is overloaded:
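For example, using the documented failoverModel field (model names are illustrative):

```yaml
modelsSpecifications:
  gpt-4o:
    failoverModel: gpt-4o-mini   # used when gpt-4o hits its rate limits or is unavailable
```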
Make sure to enable failover in your workspace.
Token Management & Billing
Assign costs per million tokens to track model usage:
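A sketch of per-model pricing; the pricing key and its sub-fields are assumptions, and the values are illustrative costs per million tokens:

```yaml
modelsSpecifications:
  gpt-4o:
    pricing:            # assumed key
      input: 2.5        # cost per 1M prompt tokens
      output: 10        # cost per 1M completion tokens
      currency: USD
```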
This can be used with usage-based dashboards in AI Insights.