
Configuring AI Knowledge

AI Knowledge is Prisme.ai’s product for agentic assistants powered by tools and retrieval-augmented generation (RAG). It enables teams to build agents that leverage internal knowledge across various formats, interact with APIs via tools, and collaborate with other agents through context sharing — enabling true multi-agent workflows with robust LLM support and enterprise-grade configuration options. This guide explains how to configure AI Knowledge in a self-hosted environment.

Core Capabilities

  • Configure multi-model support with failover and fine-tuned prompts
  • Automate agent provisioning via AI Builder
  • Enforce limits, security, and monitoring
  • Enable built-in tools such as summarization, search, code interpreter, and web browsing
  • Integrate with OpenSearch, Redis, or other vector stores

LLM Providers

For OpenAI, configure the llm.openai.openai.models field:
llm:
  openai:
    ...
    openai:
      api_key: '{{secret.openaiApiKey}}'
      models:
        - gpt-4
        - gpt-4o
        - o1-preview
        - o1-mini
For Azure OpenAI, configure the llm.openai.azure.resources.*.deployments field.
Multiple resources can be added by appending additional entries to the llm.openai.azure.resources array:
llm:
  openai:
    azure:
      resources:
        - resource: "resource name"
          api_key: '{{secret.azureOpenaiApiKey}}'
          api_version: '2023-05-15'
          deployments:
            - gpt-4
            - embedding-ada
For AWS Bedrock, configure the llm.bedrock.*.models and llm.bedrock.*.region fields.
Multiple regions can be used by appending additional entries to the llm.bedrock array:
llm:
  ...
  bedrock:
    - credentials:
        aws_access_key_id: '{{secret.awsBedrockAccessKey}}'
        aws_secret_access_key: '{{secret.awsBedrockSecretAccessKey}}'
      models:
        - mistral.mistral-large-2402-v1:0
        - amazon.titan-embed-image-v1
      region: eu-west-3
    - credentials:
        aws_access_key_id: '{{secret.awsBedrockAccessKey}}'
        aws_secret_access_key: '{{secret.awsBedrockSecretAccessKey}}'
      models:
        - amazon.titan-embed-text-v1
      region: us-east-1
For Google Vertex AI, configure the llm.vertex field:
llm:
  vertex:
    credentials:
      service_account: '{{secret.vertexServiceAccount}}'
    host: us-central1-aiplatform.googleapis.com
    models:
      - projects/my-project-id/locations/us-central1/publishers/google/models/gemini-2.5-flash-preview-05-20
When deploying a model through Vertex, the model name must be the full endpoint name, as in the example above. The modelAliases feature is especially handy for this provider: to give your users more readable names, the configuration above can be rewritten as:
llm:
  vertex:
    credentials:
      service_account: '{{secret.vertexServiceAccount}}'
    host: us-central1-aiplatform.googleapis.com
    models:
      - vertex-gemini-2.5-flash-preview-05-20
...
modelAliases:
  vertex-gemini-2.5-flash-preview-05-20: projects/my-project-id/locations/us-central1/publishers/google/models/gemini-2.5-flash-preview-05-20
Note that the service_account credentials should be omitted if you deployed your platform on GCP and rely on IAM authentication. The service_account value should be either of the following (both forms are sketched below):
  • a JSON object
  • stringified JSON (handy if you store it in a secret)
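For illustration only, here is a minimal sketch of both forms; it assumes the inline form uses the standard GCP service-account key fields, and all names and values below are placeholders:
llm:
  vertex:
    credentials:
      # Option A: inline JSON object (standard GCP service-account key fields, placeholder values)
      service_account:
        type: service_account
        project_id: my-project-id
        private_key: '{{secret.vertexPrivateKey}}'
        client_email: aik-sa@my-project-id.iam.gserviceaccount.com
      # Option B: the same JSON, stringified and stored in a secret
      # service_account: '{{secret.vertexServiceAccount}}'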
To configure third-party models on Vertex (for example, Claude), specify the format field:

llm:
  vertex:
    - credentials:
        service_account: '{{secret.vertexServiceAccount}}'
      host: us-east5-aiplatform.googleapis.com
      models:
        - gemini-2.0-flash-us-east5
        - imagen-3.0-generate-002-us-east5
        - gemini-embedding-exp-03-07-us-east5
    - credentials:
        service_account: '{{secret.vertexServiceAccount}}'
      host: us-east5-aiplatform.googleapis.com
      format: anthropic
      models:
        - vertex-claude-sonnet-3.5-v2-20241022
For OpenAI-compatible providers, configure the llm.openailike field:
llm:
  ...
  openailike:
    - api_key: "{{config.apiKey1}}"
      endpoint: "endpoint 1"
      models:
        - mistral-large
    - api_key: "{{secret.apiKey2}}"
      endpoint: "endpoint 2"
      provider: Mistral
      models:
        - mistral-small
        - mistral-mini
      options:
        excludeParameters:
          - presence_penalty
          - frequency_penalty
          - seed
Optional Parameters:
  • provider: The provider name used in analytics metrics and dashboards.
  • options.excludeParameters: Allows exclusion of certain OpenAI generic parameters not supported by the given model.
Gemini can also be integrated through its OpenAI-compatible endpoint:
llm:
  openailike:
    - api_key: '{{secret.geminiApiKey}}'
      endpoint: https://generativelanguage.googleapis.com/v1beta/openai/
      models:
        - gemini-2.0-flash
        - gemini-2.0-flash-lite
        - gemini-1.5-pro
        - gemini-2.5-pro-preview-03-25
      options:
        excludeParameters:
          - presence_penalty
          - frequency_penalty

Global Configuration

Set the default models for completions, embeddings, query enhancement, and tool routing:
defaultModels:
  completions: gpt-4o
  embeddings: az-embedding-ada
  enhanceQuery: gpt-3.5-turbo
  toolRouting: gpt-4o
Configure the following fields at the root of the configuration:
openai:
  model-embeddings: '{{config.defaultModels.embeddings}}' # The default model to use for embeddings
  model: '{{config.defaultModels.completions}}' # The default model to use for LLM completions
  temperature: 0.1 
  max_tokens: 2000 # Max LLM response size, INCLUDING reasoning
  top_p: 1
  frequency_penalty: 0
  presence_penalty: 0.6
textSplitter:
  chunkSize: 1000 # Chunk size used when splitting documents
  chunkOverlap: 200 # Overlap between consecutive chunks
indexInitialized: true
embeddings:
  numberOfSearchResults: 10 # Number of chunks returned by the vector search
  maxContext: 2048 # Maximum context size for embeddings
Rate limits can currently be applied at two stages of message processing:
  1. When a message is received (requestsPerMinute limits for projects or users).
  2. After RAG stages and before the LLM call (tokensPerMinute limits for projects, users, models, or requestsPerMinute limits for models).
Embedding model rate limits are applied before vectorizing a document, per project or per model. Token and request limits can be configured globally or per user/project as follows:
limits:
  files_count: 20
  llm:
    users:
      tokensPerMinute: 100000
      requestsPerMinute: 20
    projects:
      tokensPerMinute: 30000
      requestsPerMinute: 300
  embeddings:
    projects:
      tokensPerMinute: 1000000
  • limits.llm.users: Defines per-user message/token limits across all projects.
  • limits.llm.projects: Defines default message/token limits per project. These limits can be overridden per project via the /admin page in AI Knowledge.
  • limits.files_count: Specifies the maximum number of documents allowed in AI Knowledge projects. This number can also be overridden per project via the /admin page.
See the Models Configuration section for per-model rate limits.
If you have multiple LLM Providers or regions with the same model names (for example gpt-4), you can use custom names:
llm:
  openai:
    azure:
      resources:
        - resource: "resource name"
          api_key: '{{secret.azureOpenaiApiKey}}'
          api_version: '2023-05-15'
          deployments:
            - gpt-4-azure
    openai:
      api_key: '{{secret.openaiApiKey}}'
      models:
        - gpt-4-openai
And you can map them to the name expected by the provider with the following:
modelAliases:
  gpt-4-openai: gpt-4
  gpt-4-azure: gpt-4
As a reminder, here is what modelsSpecifications could look like:
modelsSpecifications:
  gpt-4-openai:
    displayName: GPT 4 OpenAi
    maxContext: 8192
    ...
  gpt-4-azure:
    displayName: GPT 4 Azure
    maxContext: 8192
    ...
If you have your own SSO configured, you need to explicitly allow SSO-authenticated users to access the AI Knowledge pages:
  1. Open AI Knowledge workspace
  2. Open Settings > Advanced
  3. Manage roles
  4. Add your SSO provider's technical name after prismeai: {} at the very beginning:
authorizations:
  roles:
    editor: {}
    free:
      auth:
        prismeai: {}
        yourOwnSso: {}
By default, sharing an agent with an external email automatically sends an invitation email so that the external user can create an account and access the agent. You can disable this to enforce stricter user control:
disableAccountCreation: true
Only existing users will be able to access shared agents.
AI Knowledge supports onboarding flows, multilingual statuses, and customizable notifications:
status:
  colors:
    published: '#5CA44A'
    pending: '#FF9261'
    draft: '#E5E5E5'
prompt:
  default: |
    You will only answer based on this context:
    ${context}
toasts:
  i18n:
    fr:
      documentCrawled: a été indexé par votre IA 🤖
    en:
      documentCrawled: was indexed by your AI 🤖

Models Configuration

Configure all available models with descriptions, rate limits, and failover:
modelsSpecifications:
  gpt-4o:
    displayName: GPT-4o
    maxContext: 128000
    maxResponseTokens: 2000
    isHiddenFromEndUser: true
    subtitle:
      fr: Modèle hébergé aux USA.
      en: Model hosted in the USA.
    description:
      fr: Le modèle GPT-4o sur OpenAI. Vous pouvez utiliser des documents C1 et C2.
      en: The GPT-4o model on OpenAI. You can use documents C1 and C2.
    rateLimits:
      requestsPerMinute: 1000
      tokensPerMinute: 100000
    failoverModel: 'gpt-4o'
    region: eu-west
    environmentalMetrics:
      energyPerToken: 4.35e-7
      pueProfile: efficient
    display:
      brand: Open AI
      name: GPT-4o
      icon: https://staging-uploads.prisme.ai/cAZmE4C/r3G28CGTInbQSkVXXfDw4.open-ai.svg
      ecoScore: high
      trainingDate: '2024-11-20T00:00:00.000Z'
      cost: low
    capabilities:
      text:
        enabled: true
      vision:
        enabled: true
      image:
        enabled: true
      file:
        enabled: true
        maxSize: 20000000
  text-embedding-ada-002:
    type: embeddings
    maxContext: 2048
    batchSize: 96
    subtitle: {}
    description: {}
  mistral.mistral-large-2402-v1:0:
    maxContext: 120000
    additionalRequestBody:
      completions:
        guardrailConfig:
          guardrailIdentifier: "..."
          guardrailVersion: '1'
      embeddings: {}
  • All LLM models (excluding those with type: embeddings) will automatically appear in the AI Store menu unless disabled at the agent level, with the configured titles and descriptions.
  • displayName specifies the user-facing name of the model, replacing the technical or original model name to ensure a more intuitive and user-friendly experience.
  • isHiddenFromEndUser specifies that a model in the configuration will be hidden from end users. This feature allows administrators to temporarily disable a model or conceal it from the end-user interface without permanently removing it from the configuration.
  • maxContext specifies the maximum token size of the context that can be passed to the model, applicable to both LLMs (full prompt size) and embedding models (maximum chunk size for vectorization).
  • maxResponseTokens defines the maximum completion size requested from the LLM, which can be overridden in individual agent settings.
  • additionalRequestBody.completions and additionalRequestBody.embeddings specify custom parameters that will be sent in all HTTP request bodies for the given model; in the example above, this is used to enable AWS Bedrock Guardrails.
By default, document paragraphs are vectorized in batches of 96.
You can customize this batchSize per model:
modelsSpecifications:
  text-embedding-ada-002:
    type: embeddings
    maxContext: 2048
    batchSize: 50
Or globally:
embeddings:
  batchSize: 50
When modelsSpecifications.*.rateLimits.requestsPerMinute or modelsSpecifications.*.rateLimits.tokensPerMinute are defined, an error (customizable via toasts.i18n.*.rateLimit) is returned to any user attempting to exceed the configured limits. These limits are shared across all projects/users using the models. If these limits are reached and modelsSpecifications.*.failoverModel is specified, projects with failover.enabled activated (disabled by default) will automatically switch to the failover model. A configuration sketch follows the notes below.
Notes:
  • tokensPerMinute limits apply to the entire prompt sent to the LLM, including the user question, system prompt, project prompt, and RAG context.
  • Failover and tokensPerMinute limits also apply to intermediate queries during response construction (e.g., question suggestions, self-query, enhanced query, source filtering).
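For illustration, here is a sketch combining per-model rate limits, a failover model, and a customized rate-limit message; the limit values, failover target, and message wordings below are examples, not defaults:
modelsSpecifications:
  gpt-4o:
    rateLimits:
      requestsPerMinute: 1000
      tokensPerMinute: 100000
    failoverModel: gpt-4o-mini # example failover target, another configured model
toasts:
  i18n:
    en:
      rateLimit: This model is currently overloaded, please try again in a few moments.
    fr:
      rateLimit: Ce modèle est actuellement surchargé, veuillez réessayer dans quelques instants.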
Environmental metrics can be calculated for a model by setting the region where it is hosted:
region: world | eu-west | eu-north | us-east | us-west | asia
as well as the energy consumed per token (in kWh) and the PUE (Power Usage Effectiveness) profile:
environmentalMetrics:
  energyPerToken: 4.35e-7
  pueProfile: efficient | average | inefficient
The display section defines the model’s brand, name, icon, eco-score, and cost information:
display:
  brand: Open AI
  name: GPT-4o
  icon: https://staging-uploads.prisme.ai/cAZmE4C/r3G28CGTInbQSkVXXfDw4.open-ai.svg
  ecoScore: high
  trainingDate: '2024-11-20T00:00:00.000Z'
  cost: low

The capabilities section lists which modalities the model supports:
capabilities:
  text:
    enabled: true
  vision:
    enabled: true
  image:
    enabled: true
  file:
    enabled: true
    maxSize: 20000000
Notes:
  • Models from OpenAI, Gemini, and Bedrock currently support the file capability.
  • Azure OpenAI does not yet provide this feature, although a community feature request is in progress.
  • When file.enabled is set to true and the file size is within supported limits, the file is sent directly in Base64 to the LLM.
  • Upcoming releases will introduce support for passing a file ID instead of raw Base64 data, using the storageProvider parameter (prisme, gcs, or s3). This will enable seamless handling of larger documents by referencing files stored in connected cloud storage rather than embedding their content directly.
  • The file.maxSize parameter is expressed in bytes (octets).

Vector Store Configuration

To enable retrieval-based answers, configure a vector store:
vectorStore:
  provider: redisSearch
  url: '{{secret.redisUrl}}'
  vectorIndexPrefix: 'aik_rag_'
Or with OpenSearch:
vectorStore:
  provider: openSearch
  url: '{{secret.opensearchUrl}}'
  user: '{{secret.opensearchUser}}'
  password: '{{secret.opensearchPassword}}'
  vectorIndexPrefix: 'aik_rag_'
Or with ElasticSearch:
vectorStore:
  provider: elasticSearch
  url: '{{secret.elasticUrl}}'
  user: '{{secret.elasticUser}}'
  password: '{{secret.elasticPassword}}'
  vectorIndexPrefix: 'aik_rag_'

Tools and Capabilities

AI Knowledge enables advanced agents via tools.

file_search

RAG tool for semantic search within indexed documents.

file_summary

Summarize entire files when explicitly requested.

documents_rag

Used to extract context from project knowledge collections.

web_search

Optional tool enabled via Serper API key:
tools:
  webSearch:
    apiKey: '{{secret.serperApiKey}}'

code_interpreter

Python tool for data manipulation and document-based computation.

image_generation

Uses DALL-E or an equivalent model if enabled in the LLM configuration.
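As an illustration, a minimal sketch of what enabling an image-generation model could look like; the dall-e-3 declaration and capability flags below are assumptions, not a guaranteed configuration:
llm:
  openai:
    openai:
      api_key: '{{secret.openaiApiKey}}'
      models:
        - gpt-4o
        - dall-e-3 # hypothetical image-generation model entry
modelsSpecifications:
  dall-e-3:
    displayName: DALL-E 3
    capabilities:
      image:
        enabled: true # assumption: flags the model as image-capable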

Advanced Features

AI Knowledge projects and agents can be provisioned programmatically via AI Builder workflows.
Specify a backup model to switch to if the main one is overloaded:
failoverModel: gpt-3.5-turbo
Make sure to enable failover in your workspace.
Assign costs per million tokens to track model usage:
pricing:
  input: 2.5
  output: 10
This can be used with usage-based dashboards in AI Insights.
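Pricing is typically declared per model; here is a sketch assuming it nests under the corresponding modelsSpecifications entry, with placeholder prices:
modelsSpecifications:
  gpt-4o:
    pricing:
      input: 2.5 # cost per million input tokens
      output: 10 # cost per million output tokens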

Next Steps
