Set up the AI Knowledge product for RAG (Retrieval-Augmented Generation) with model orchestration, rate limits, and vector stores.
## OpenAI

Available models are configured in the `llm.openai.openai.models` field.
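For example, a minimal `llm.openai` block might look like this (the API-key field name, secret reference, and model identifiers are illustrative assumptions, not taken from this page):

```yaml
llm:
  openai:
    openai:
      api_key: '{{secret.openaiApiKey}}' # illustrative key name and secret reference
      models:
        - gpt-4o
        - gpt-4o-mini
        - text-embedding-3-small
```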
## OpenAI Azure
Each Azure resource is declared in the `llm.openai.azure.resources` array, and its available models in the `llm.openai.azure.resources.*.deployments` field.
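A sketch of this layout (the `resource` key name, resource name, and deployment names are illustrative):

```yaml
llm:
  openai:
    azure:
      resources:
        - resource: my-azure-resource # illustrative key and value
          deployments:
            - gpt-4o-deployment
            - embeddings-deployment
```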
## Bedrock
Available models and their region are configured with the `llm.bedrock.*.models` and `llm.bedrock.*.region` fields of the `llm.bedrock` array.
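For instance (region and model identifiers are illustrative):

```yaml
llm:
  bedrock:
    - region: us-east-1
      models:
        - 'anthropic.claude-3-sonnet-20240229-v1:0'
        - 'amazon.titan-embed-text-v2:0'
```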
## Vertex
Available models are configured in the `llm.openai.vertex` field. The `modelAliases` feature comes in really handy for this provider! To provide better readability to your users, the configuration above can be transformed using model aliases.

The `service_account` credentials should be omitted if you deployed your platform on GCP and rely on IAM authentication.
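Assuming `modelAliases` maps a friendly alias to the full technical model name (the direction of the mapping and all values here are illustrative):

```yaml
modelAliases:
  gemini-pro: projects/my-project/locations/europe-west1/publishers/google/models/gemini-1.5-pro
```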
Also, the `service_account` value should either be …

## OpenAI-Compatible Providers
These providers are configured in the `llm.openailike` field.
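A sketch of an `llm.openailike` entry (every key name here is an assumption, as are the endpoint URL and model name):

```yaml
llm:
  openailike:
    - apiUrl: 'https://my-llm.example.com/v1' # illustrative key and endpoint
      api_key: '{{secret.myLlmKey}}'
      models:
        - mistral-7b-instruct
```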
## Default models
## Default agent parameters

## Rate Limits

## Model Aliases

## SSO Access
Add `prismeai: {}` at the very beginning.

## Account Management
## Onboarding, Toasts & Statuses
## Customize descriptions
Configured models (except those with `type: embeddings`) will automatically appear in the AI Store menu unless disabled at the agent level, with the configured titles and descriptions.

`displayName` specifies the user-facing name of the model, replacing the technical or original model name to ensure a more intuitive and user-friendly experience.
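A hypothetical `modelsSpecifications` entry combining these display options (model names and values are illustrative):

```yaml
modelsSpecifications:
  gpt-4o:
    displayName: GPT-4o (general purpose) # user-facing name shown instead of the technical one
  legacy-model:
    isHiddenFromEndUser: true # hidden from end users without removing the configuration
```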
`isHiddenFromEndUser` specifies that a model in the configuration will be hidden from end users. This feature allows administrators to temporarily disable a model or conceal it from the end-user interface without permanently removing it from the configuration.

## Context & response tokens
`maxContext` specifies the maximum token size of the context that can be passed to the model, applicable to both LLMs (full prompt size) and embedding models (maximum chunk size for vectorization).
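Both limits could be sketched as follows (all numbers and model names are illustrative):

```yaml
modelsSpecifications:
  gpt-4o:
    maxContext: 128000 # maximum full prompt size, in tokens
    maxResponseTokens: 4096 # maximum completion size, can be overridden per agent
  text-embedding-3-small:
    maxContext: 8191 # maximum chunk size for vectorization
```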
`maxResponseTokens` defines the maximum completion size requested from the LLM, which can be overridden in individual agent settings.

## Provider specific parameters
`additionalRequestBody.completions` and `additionalRequestBody.embeddings` specify custom parameters that will be sent within all HTTP request bodies for the given model; this can be used, for example, to enable AWS Guardrails.
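For example, enabling an AWS Bedrock guardrail might look like this (the `guardrailConfig` parameters shown are an assumption about the provider's request body, and the model name and identifiers are placeholders):

```yaml
modelsSpecifications:
  my-bedrock-model: # illustrative model name
    additionalRequestBody:
      completions:
        guardrailConfig: # hypothetical provider-specific parameters
          guardrailIdentifier: my-guardrail-id
          guardrailVersion: '1'
```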
## Embeddings batch size

The embeddings `batchSize` can be configured per model.
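For instance (the model name and value are illustrative, and the comment reflects an assumed meaning of `batchSize`):

```yaml
modelsSpecifications:
  text-embedding-3-small: # illustrative model name
    batchSize: 96 # assumed: number of chunks sent per embeddings request
```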
## Rate Limits
If `modelsSpecifications.*.rateLimits.requestsPerMinute` or `modelsSpecifications.*.rateLimits.tokensPerMinute` are defined, an error (customizable via `toasts.i18n.*.rateLimit`) is returned to any user attempting to exceed the configured limits. These limits are shared across all projects/users using the models.

If these limits are reached and `modelsSpecifications.*.failoverModel` is specified, projects with `failover.enabled` activated (disabled by default) will automatically switch to the failover model.
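The rate-limit and failover fields above could be combined as follows (all values and the model names are illustrative):

```yaml
modelsSpecifications:
  gpt-4o: # illustrative model name
    rateLimits:
      requestsPerMinute: 300 # shared across all projects/users
      tokensPerMinute: 100000
    failoverModel: gpt-4o-mini # used by projects with failover.enabled
```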
Notes:

- Environmental metrics
- AI Builder Automation
- Failover Models
- Token Management & Billing