Configuring AI Knowledge
Set up the AI Knowledge product for RAG (Retrieval-Augmented Generation) with model orchestration, rate limits, and vector stores.
AI Knowledge is Prisme.ai’s product for agentic assistants powered by tools and retrieval-augmented generation (RAG). It enables teams to build agents that leverage internal knowledge across various formats, interact with APIs via tools, and collaborate with other agents through context sharing — enabling true multi-agent workflows with robust LLM support and enterprise-grade configuration options.
This guide explains how to configure AI Knowledge in a self-hosted environment.
Core Capabilities
- Configure multi-model support with failover and fine-tuned prompts
- Automate agent provisioning via AI Builder
- Enforce limits, security, and monitoring
- Enable built-in tools such as summarization, search, code interpreter, and web browsing
- Integrate with OpenSearch, Redis, or other vector stores
LLM Providers
OpenAI
Configure the llm.openai.openai.models field:
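For example (model names are illustrative):

```yaml
llm:
  openai:
    openai:
      models:
        - gpt-4o
        - gpt-4o-mini
        - text-embedding-3-small
```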
OpenAI Azure
Configure the llm.openai.azure.resources.*.deployments field.
Multiple resources can be added by appending additional entries to the llm.openai.azure.resources array:
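A minimal sketch; only the resources and deployments keys come from this guide, the per-resource field and its values are illustrative assumptions:

```yaml
llm:
  openai:
    azure:
      resources:
        - resource: my-azure-resource-1      # assumed key: the Azure OpenAI resource name
          deployments:
            - gpt-4o
            - text-embedding-3-large
        - resource: my-azure-resource-2      # a second entry appended to the same array
          deployments:
            - gpt-4o-mini
```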
Bedrock
Configure the llm.bedrock.*.models and llm.bedrock.*.region fields.
Multiple regions can be used by appending additional entries to the llm.bedrock array:
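For example, with illustrative model IDs and regions:

```yaml
llm:
  bedrock:
    - region: us-east-1
      models:
        - anthropic.claude-3-5-sonnet-20240620-v1:0
        - amazon.titan-embed-text-v2:0
    - region: eu-west-1                      # an additional entry appended to the llm.bedrock array
      models:
        - anthropic.claude-3-haiku-20240307-v1:0
```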
Vertex
Configure the llm.openai.vertex field:
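A minimal sketch; the exact schema under llm.openai.vertex may differ in your platform version, and the endpoint name and service account shown here are illustrative:

```yaml
llm:
  openai:
    vertex:
      models:
        # full endpoint name of the deployed model (illustrative)
        - projects/my-gcp-project/locations/europe-west1/endpoints/1234567890
      service_account: '{{secret.vertexServiceAccount}}'   # omit when relying on GCP IAM, see below
```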
When deploying a model through Vertex, the model name must be the full endpoint name, as in the example above.
The modelAliases feature is especially handy for this provider. To give your users more readable model names, the configuration above can be transformed into:
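A sketch of such an alias, mapping a readable name to the full endpoint name (the exact placement of modelAliases may differ):

```yaml
llm:
  openai:
    vertex:
      models:
        - my-vertex-model                    # readable name shown to users
      modelAliases:
        # readable name -> full endpoint name expected by Vertex (illustrative)
        my-vertex-model: projects/my-gcp-project/locations/europe-west1/endpoints/1234567890
      service_account: '{{secret.vertexServiceAccount}}'
```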
Note that the service_account credentials should be omitted if you deployed your platform on GCP and rely on IAM authentication.
The service_account value should be either:
- a JSON object
- a stringified JSON object (handy if you store it in a secret)
OpenAI-Compatible Providers
Configure the llm.openailike field:
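A minimal sketch assuming an OpenAI-compatible HTTP endpoint; apart from llm.openailike, provider and options.excludeParameters, the key names (endpoint, api_key, models) are assumptions:

```yaml
llm:
  openailike:
    - provider: mistral                      # name used in analytics metrics and dashboards
      endpoint: https://api.mistral.ai/v1    # assumed key: OpenAI-compatible base URL
      api_key: '{{secret.mistralApiKey}}'    # assumed key: credentials
      models:
        - mistral-large-latest
      options:
        excludeParameters:
          - presence_penalty                 # illustrative: an OpenAI parameter the model does not support
```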
Optional Parameters:
- provider: The provider name used in analytics metrics and dashboards.
- options.excludeParameters: Allows exclusion of certain OpenAI generic parameters not supported by the given model.
Gemini integration:
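For example, Gemini exposed through Google's OpenAI-compatible endpoint (same key-name assumptions as above):

```yaml
llm:
  openailike:
    - provider: gemini
      endpoint: https://generativelanguage.googleapis.com/v1beta/openai/
      api_key: '{{secret.geminiApiKey}}'
      models:
        - gemini-1.5-pro
        - gemini-1.5-flash
```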
Global Configuration
Default models
Set base models for completions, embeddings, and query enhancement.
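A sketch of what this could look like; the defaultModels key and its sub-keys are assumptions, and model names are illustrative:

```yaml
defaultModels:                       # assumed key
  completions: gpt-4o                # base model for chat completions
  embeddings: text-embedding-3-small # base model for document vectorization
  queryEnhancement: gpt-4o-mini      # model used to rewrite/enhance user queries
```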
Rate Limits
Rate limits can currently be applied at two stages of message processing:
- When a message is received (requestsPerMinute limits for projects or users).
- After RAG stages and before the LLM call (tokensPerMinute limits for projects, users, models, or requestsPerMinute limits for models).
Embedding model rate limits are applied before vectorizing a document, per project or model.
This is how to configure token and request limits globally or per user/project:
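A sketch using the keys described below; the requestsPerMinute/tokensPerMinute sub-keys mirror the terms used in this section, and the values are illustrative:

```yaml
limits:
  llm:
    users:                      # per-user limits across all projects
      requestsPerMinute: 20
      tokensPerMinute: 20000
    projects:                   # default per-project limits, overridable from the /admin page
      requestsPerMinute: 100
      tokensPerMinute: 100000
  files_count: 100              # maximum number of documents per project
```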
- limits.llm.users: Defines per-user message/token limits across all projects.
- limits.llm.projects: Defines default message/token limits per project. These limits can be overridden per project via the /admin page in AI Knowledge.
- limits.files_count: Specifies the maximum number of documents allowed in AI Knowledge projects. This number can also be overridden per project via the /admin page.
See Models specifications for rate limits per model.
Model Aliases
If you have multiple LLM Providers or regions with the same model names (for example gpt-4), you can use custom names:
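For example, giving each provider's gpt-4 a distinct custom name (a sketch; the per-resource key follows the assumptions of the Azure section above):

```yaml
llm:
  openai:
    openai:
      models:
        - openai-gpt-4                 # custom name for OpenAI's gpt-4
    azure:
      resources:
        - resource: my-azure-resource  # assumed key
          deployments:
            - azure-gpt-4              # custom name for the Azure deployment
```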
And you can map them to the name expected by the provider with the following:
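A sketch of the corresponding mapping, with the custom name on the left and the provider-expected name on the right (the exact placement of modelAliases may differ):

```yaml
llm:
  openai:
    openai:
      modelAliases:
        openai-gpt-4: gpt-4            # custom name -> name expected by OpenAI
    azure:
      resources:
        - resource: my-azure-resource
          deployments:
            - azure-gpt-4
          modelAliases:
            azure-gpt-4: gpt-4         # custom name -> Azure deployment name
```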
As a reminder, here is how modelsSpecifications could look:
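A sketch reusing the custom names above, with fields documented in the Models Configuration section:

```yaml
modelsSpecifications:
  openai-gpt-4:
    displayName: GPT-4 (OpenAI)
    maxContext: 8192
    maxResponseTokens: 2048
    rateLimits:
      requestsPerMinute: 100
      tokensPerMinute: 100000
  azure-gpt-4:
    displayName: GPT-4 (Azure)
    maxContext: 8192
    maxResponseTokens: 2048
    failoverModel: openai-gpt-4
```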
SSO Access
If you have your own SSO configured, you need to explicitly allow SSO-authenticated users to access AI Knowledge pages:
- Open the AI Knowledge workspace
- Open Settings > Advanced
- Manage roles
- Add your SSO provider's technical name after prismeai: {} at the very beginning:
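A sketch of what the roles configuration could look like; the surrounding structure is an assumption, and yourSsoProvider stands for your SSO provider's technical name:

```yaml
roles:
  default:
    auth:
      prismeai: {}
      yourSsoProvider: {}    # add your SSO provider technical name here
```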
Account Management
By default, sharing an agent with an external email address automatically sends an invitation email so the external user can create an account and access the agent.
You can disable this behavior to enforce stricter user control:
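A sketch of what this toggle could look like; the key name is hypothetical, so check the exact setting available in your platform version:

```yaml
accountManagement:
  disableExternalInvitations: true   # hypothetical key: stop auto-inviting unknown external emails
```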
Only existing users will be able to access shared agents.
Onboarding, Toasts & Statuses
AI Knowledge supports onboarding flows, multilingual statuses, and customizable notifications:
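A sketch of the kind of structure involved; only toasts.i18n.*.rateLimit is referenced elsewhere in this guide, the other keys are illustrative:

```yaml
onboarding:
  enabled: true                    # illustrative
toasts:
  i18n:
    en:
      rateLimit: You have reached the rate limit, please retry in a few minutes.
    fr:
      rateLimit: Vous avez atteint la limite de requêtes, veuillez réessayer dans quelques minutes.
statuses:
  i18n:
    en:
      published: Published         # illustrative status label
    fr:
      published: Publié
```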
Models Configuration
Configure all available models with descriptions, rate limits, and failover:
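A consolidated sketch using the fields documented in the subsections below; model names and values are illustrative, and the multilingual description map is an assumption:

```yaml
modelsSpecifications:
  gpt-4o:
    displayName: GPT-4o
    description:
      en: General-purpose assistant model      # assumed i18n structure
      fr: Modèle d'assistant généraliste
    maxContext: 128000
    maxResponseTokens: 4096
    rateLimits:
      requestsPerMinute: 300
      tokensPerMinute: 200000
    failoverModel: gpt-4o-mini
  gpt-4o-mini:
    displayName: GPT-4o mini
    isHiddenFromEndUser: true
  text-embedding-3-small:
    type: embeddings
    maxContext: 8191
    batchSize: 96
```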
Customize descriptions
- All LLM models (excluding those with type: embeddings) will automatically appear in the AI Store menu unless disabled at the agent level, with the configured titles and descriptions.
- displayName specifies the user-facing name of the model, replacing the technical or original model name to ensure a more intuitive and user-friendly experience.
- isHiddenFromEndUser specifies that the model will be hidden from end users. This feature allows administrators to temporarily disable a model or conceal it from the end-user interface without permanently removing it from the configuration.
Context & response tokens
- maxContext specifies the maximum token size of the context that can be passed to the model, applicable to both LLMs (full prompt size) and embedding models (maximum chunk size for vectorization).
- maxResponseTokens defines the maximum completion size requested from the LLM, which can be overridden in individual agent settings.
Provider specific parameters
additionalRequestBody.completions and additionalRequestBody.embeddings specify custom parameters that will be sent within all HTTP request bodies for the given model; they can be used, for example, to enable AWS Guardrails:
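A sketch for a Bedrock-hosted model; the guardrail parameter names are an assumption, so use whatever the target model's invocation API expects:

```yaml
modelsSpecifications:
  "anthropic.claude-3-5-sonnet-20240620-v1:0":
    additionalRequestBody:
      completions:
        guardrailIdentifier: my-guardrail-id   # assumed AWS Guardrails parameters
        guardrailVersion: '1'
```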
Embeddings batch size
By default, document paragraphs are vectorized in batches of 96.
You can customize this batchSize per model:
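For example, with an illustrative embedding model:

```yaml
modelsSpecifications:
  text-embedding-3-small:
    type: embeddings
    batchSize: 48
```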
Or globally:
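A sketch of a global default; the top-level key is an assumption:

```yaml
embeddings:
  batchSize: 48    # assumed global key applied to all embedding models
```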
Rate Limits
When modelsSpecifications.*.rateLimits.requestsPerMinute or modelsSpecifications.*.rateLimits.tokensPerMinute are defined, an error (customizable via toasts.i18n.*.rateLimit) is returned to any user attempting to exceed the configured limits. These limits are shared across all projects/users using the models.
If these limits are reached and modelsSpecifications.*.failoverModel is specified, projects with failover.enabled activated (disabled by default) will automatically switch to the failover model.
Notes:
- tokensPerMinute limits apply to the entire prompt sent to the LLM, including the user question, system prompt, project prompt, and RAG context.
- Failover and tokensPerMinute limits also apply to intermediate queries during response construction (e.g., question suggestions, self-query, enhanced query, source filtering).
Environmental metrics
Environmental metrics can be calculated when using models by setting the region where the model is hosted, the energy consumed per token (in kWh), and the PUE (Power Usage Effectiveness) profile:
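A sketch of such a profile per model; the field names are assumptions and the values are illustrative:

```yaml
modelsSpecifications:
  gpt-4o:
    environmentalMetrics:        # assumed key
      region: europe-west1       # region where the model is hosted
      energyPerToken: 0.000001   # kWh consumed per token (illustrative)
      pue: 1.2                   # Power Usage Effectiveness of the hosting datacenter
```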
Vector Store Configuration
To enable retrieval-based answers, configure a vector store:
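A sketch assuming a Redis-backed vector store; the key names are assumptions:

```yaml
vectorStore:
  provider: redis                          # assumed key
  url: '{{secret.redisVectorStoreUrl}}'    # e.g. rediss://user:password@host:6379
```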
Or with OpenSearch:
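Sketched for OpenSearch, with the same key-name assumptions:

```yaml
vectorStore:
  provider: opensearch
  url: https://opensearch.internal.example.com:9200
  user: '{{secret.opensearchUser}}'        # assumed credential keys
  password: '{{secret.opensearchPassword}}'
```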
Tools and Capabilities
AI Knowledge enables advanced agents via tools.
file_search
RAG tool for semantic search within indexed documents.
file_summary
Summarize entire files when explicitly requested.
documents_rag
Used to extract context from project knowledge collections.
web_search
Optional tool enabled via Serper API key:
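A sketch; the exact key holding the Serper API key is an assumption:

```yaml
tools:
  web_search:
    provider: serper                    # assumed key
    apiKey: '{{secret.serperApiKey}}'   # assumed key
```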
code_interpreter
Python tool for data manipulation and document-based computation.
image_generation
Uses DALL-E or an equivalent model if enabled in the LLM configuration.
Advanced Features
AI Builder Automation
AI Knowledge projects and agents can be provisioned programmatically via AI Builder workflows.
Failover Models
Specify a backup model to switch to if the main one is overloaded:
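For example, using the documented failoverModel field (model names are illustrative):

```yaml
modelsSpecifications:
  gpt-4o:
    failoverModel: gpt-4o-mini   # used when gpt-4o hits its rate limits or is unavailable
```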
Make sure to enable failover in your workspace.
Token Management & Billing
Assign costs per million tokens to track model usage:
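A sketch of per-model pricing; the pricing key and its sub-fields are assumptions, and the values are illustrative costs per million tokens:

```yaml
modelsSpecifications:
  gpt-4o:
    pricing:            # assumed key
      input: 2.5        # cost per 1M prompt tokens
      output: 10        # cost per 1M completion tokens
      currency: USD
```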
This can be used with usage-based dashboards in AI Insights.