Model Governance lets you control which AI models are available to your organization, set usage policies, configure routing strategies, and monitor consumption. Access these features from Agents Controls in the sidebar.
Overview
The Agents Controls section has four tabs:
| Tab | Description |
|---|---|
| Models | Configure allowed models and policies |
| Usage | Monitor consumption against quotas |
| Service Accounts | Machine-to-machine authentication |
| Agents | Per-agent model restrictions |
Models Configuration
Allowed Models
By default, organizations can use all models enabled by the platform administrator. You can restrict this to a specific list:
- Go to Agents Controls > Models
- Toggle Restrict Models
- Select which models to allow
- Click Save
Default Models
Set the default models used when agents don’t specify one:
| Setting | Description |
|---|---|
| Default Completion Model | Used for chat and text generation |
| Default Embedding Model | Used for vector embeddings |
Quota Policy
Configure what happens when quota limits are reached:
| Policy | Behavior |
|---|---|
| Hard Block | Requests fail with quota exceeded error |
| Soft Downgrade | Fall back to a cheaper model |
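The two policies could be enforced with logic along these lines. This is a minimal sketch, not the platform's actual code: the function and exception names are illustrative, and the mapping entries come from the downgrade examples later in this page.

```python
# Hypothetical sketch of quota-policy enforcement (names are illustrative).
DOWNGRADE_MAP = {
    "claude-3-opus": "claude-3-sonnet",
    "gpt-4": "gpt-3.5-turbo",
}

class QuotaExceededError(Exception):
    """Raised under the Hard Block policy when a quota is exhausted."""

def resolve_model(requested: str, quota_exceeded: bool, policy: str) -> str:
    """Apply the organization's quota policy when a limit has been reached."""
    if not quota_exceeded:
        return requested
    if policy == "hard_block":
        # Hard Block: the request fails with a quota-exceeded error.
        raise QuotaExceededError(f"quota exceeded for {requested}")
    if policy == "soft_downgrade":
        # Soft Downgrade: fall back to a cheaper mapped model, if one exists.
        return DOWNGRADE_MAP.get(requested, requested)
    return requested
```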
Downgrade Mapping
When using soft downgrade, configure which models to substitute.
Failover Mapping
Configure automatic failover when a model is unavailable:
- 5xx errors: switches to the failover model from the mapping, or falls back to the default completion model (with linear backoff 1s, 2s, 3s…)
- 429 rate limits: retries the same model after a 5s backoff
- Other 4xx errors: returned immediately without retry
- Up to 3 attempts (configurable, hard cap of 10)
- Failover models are validated against governance access controls before use
- Failover only applies to non-streaming requests
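The retry rules above can be sketched as follows. This is an illustrative model of the behavior, not the gateway's implementation; `call_model` is a hypothetical callable returning an HTTP status and a result.

```python
import time

MAX_ATTEMPTS = 3  # configurable, with a hard cap of 10

def call_with_failover(call_model, model, failover_map, default_model,
                       sleep=time.sleep):
    """Sketch of the failover rules: 5xx switches to the mapped failover
    model (or the default) with linear backoff 1s, 2s, 3s...; 429 retries
    the same model after 5s; other 4xx errors are raised immediately."""
    current = model
    for attempt in range(1, MAX_ATTEMPTS + 1):
        status, result = call_model(current)
        if status < 400:
            return result
        if status == 429:
            sleep(5)        # rate limited: retry the same model after 5s
        elif status >= 500:
            sleep(attempt)  # linear backoff: 1s, 2s, 3s...
            # Switch to the mapped failover model, else the default.
            current = failover_map.get(current, default_model)
        else:
            raise RuntimeError(f"client error {status}")  # other 4xx: no retry
    raise RuntimeError("all attempts failed")
```

In a real deployment the chosen failover model would also be validated against the governance access controls described below before use.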
Model Routing
The LLM Gateway supports intelligent model routing — selecting the best model for a request based on configurable strategies. Use model: "auto" in API calls to trigger routing.
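Assuming the gateway exposes an OpenAI-compatible chat completions endpoint (the URL below is a placeholder, not a documented address), a request that triggers routing could look like this:

```python
import json
import urllib.request

# Hypothetical gateway endpoint; substitute your deployment's real URL.
GATEWAY_URL = "https://gateway.example.com/v1/chat/completions"

def build_auto_payload(prompt: str) -> dict:
    """model: "auto" asks the gateway to pick a model using the
    configured routing strategy instead of a fixed model name."""
    return {
        "model": "auto",
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt: str, api_key: str) -> dict:
    req = urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(build_auto_payload(prompt)).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # network call; sketch only
        return json.load(resp)
```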
Routing Strategies
| Strategy | Description |
|---|---|
| Disabled | No automatic routing; use specified model |
| Rules | Rule-based routing — first matching rule wins |
| LLM Classifier | Use a cheap LLM to classify the request and map category to model |
| Capabilities | Query model catalog for enabled models matching required tags |
| Cost Optimized | Same as capabilities but iterates cost tiers (low → medium → high) to pick cheapest match |
| Hybrid | Try rules first, fall back to LLM classifier if no rule matches |
Rule-Based Routing
Define conditions to route requests to different models. Rules support the messages_count condition with operators <, >, <=, >=, =. If the selected model is blocked by governance, routing falls back to the default.
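A first-match-wins evaluator for messages_count rules could be sketched as follows. The rule schema here is hypothetical; only the operators and the governance fallback come from the description above.

```python
import operator

# The five operators supported by messages_count conditions.
OPS = {"<": operator.lt, ">": operator.gt,
       "<=": operator.le, ">=": operator.ge, "=": operator.eq}

def route(messages, rules, allowed_models, default_model):
    """First matching rule wins. If the rule's model is blocked by
    governance (not in allowed_models), fall back to the default.
    Each rule is assumed to look like {"op": "<", "value": 5, "model": "..."}."""
    count = len(messages)
    for rule in rules:
        if OPS[rule["op"]](count, rule["value"]):
            model = rule["model"]
            return model if model in allowed_models else default_model
    return default_model
```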
Usage Monitoring
The Usage tab shows consumption against your subscription quotas.
Tracked Metrics
| Metric | Type | Description |
|---|---|---|
| llm.requests.rpm | Rate | Requests per minute |
| llm.requests.daily | Rate | Requests per day |
| llm.tokens.monthly | Cumulative | Total tokens this month |
| llm.cost.monthly | Cumulative | Total cost this month |
Understanding Quotas
Quotas are defined in your subscription:
- Rate limits reset after the time window (minute, hour, day)
- Cumulative limits accumulate until the billing period resets
Usage Display
Each metric shows:
- Current value vs. limit
- Percentage consumed
- Visual progress bar (yellow at 80%, red at 95%)
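The color thresholds above amount to a simple rule. A trivial sketch (the "green" default below the thresholds is an assumption):

```python
def usage_color(current: float, limit: float) -> str:
    """Progress-bar color per the stated thresholds:
    yellow at 80% of quota, red at 95%."""
    pct = 100 * current / limit
    if pct >= 95:
        return "red"
    if pct >= 80:
        return "yellow"
    return "green"  # assumed default color below the thresholds
```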
Per-Agent Model Restrictions
The Agents tab lets you restrict which models specific agents can use.
Why Restrict Agents?
- Cost control: Limit expensive model usage to specific agents
- Compliance: Ensure sensitive agents only use approved models
- Testing: Restrict test agents to cheaper models
Configuring Agent Models
- Go to Agents Controls > Agents
- Find the agent to configure
- Click Configure Models
- Select allowed models (or leave empty for org defaults)
- Save changes
Service Accounts
Service accounts provide machine-to-machine authentication. See Identity & Access for details. In the context of Model Governance:
- Link service accounts to specific agents
- Track which service accounts consume LLM resources
- Control model access per service account
Model Access Control
When a request arrives, the LLM Gateway validates the requested model against three sequential allowlists:
- Org allowlist — Is the model in the organization’s allowed models? (if the list exists and has entries)
- Agent allowlist — Is the model in the agent’s allowed models? (passed by agent-factory)
- API key scopes — Is the model allowed by the API key’s scopes?
When a model fails any of these checks, the configured policy determines the outcome:
- Soft Downgrade: silently swaps to the default completion model
- Hard Block (default): returns a 403 error with MODEL_NOT_ALLOWED
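The three-step validation could be sketched as follows. The function name, the empty-list-means-unrestricted convention, and the use of a plain exception for the 403 are all illustrative assumptions.

```python
def validate_model(model, org_allowed, agent_allowed, key_scopes,
                   policy, default_model):
    """Check the three allowlists in order (org, agent, API key scopes).
    An empty or missing allowlist is assumed to mean 'no restriction'.
    On failure, apply the configured policy."""
    for allowlist in (org_allowed, agent_allowed, key_scopes):
        if allowlist and model not in allowlist:
            if policy == "soft_downgrade":
                return default_model  # silent swap to the default model
            # Hard Block (default): HTTP 403 with MODEL_NOT_ALLOWED
            raise PermissionError("403 MODEL_NOT_ALLOWED")
    return model
```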
Carbon Footprint Tracking
Every LLM call includes an estimated environmental impact in the response:
| PUE Profile | Multiplier |
|---|---|
| Efficient | 1.1 |
| Average (default) | 1.58 |
| Inefficient | 2.0 |
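A PUE (Power Usage Effectiveness) multiplier scales a model's estimated compute energy up to total data-center energy. A minimal sketch of that arithmetic, using the multipliers from the table (the input energy figure is a made-up example, not a real per-call coefficient):

```python
# PUE multipliers from the table above.
PUE = {"efficient": 1.1, "average": 1.58, "inefficient": 2.0}

def datacenter_energy_wh(model_energy_wh: float, profile: str = "average") -> float:
    """Total energy = estimated compute energy x PUE multiplier."""
    return model_energy_wh * PUE[profile]
```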
Supported Providers
The LLM Gateway abstracts multiple providers behind a unified API:
| Provider | Models | Notes |
|---|---|---|
| OpenAI | GPT-5, GPT-4o, o3-mini, embeddings, DALL-E 3 | Direct API |
| Azure OpenAI | GPT-5, GPT-4o, embeddings, Claude (via Azure AI) | Multiple resource configs |
| OpenAI-compatible | Gemini, DeepSeek, Mistral, Cerebras, OVH, Linagora | Via openailike provider type |
| Anthropic | Claude Sonnet 4.5, Claude 3.7 Sonnet, Claude 3.5 Sonnet | Native API |
| Google Vertex AI | Gemini 2.5/3, Imagen 4.0, text-embedding-005 | Via model aliases + service account |
| AWS Bedrock | Claude, Titan, Cohere, Nova, Llama | Multiple region/credential sets |
Best Practices
Start Restrictive
Begin with a limited model list and expand based on need
Use Soft Downgrade
Prefer soft downgrade to maintain service during quota limits
Monitor Usage
Set alerts before hitting quota limits
Configure Failover
Ensure critical workflows have failover models
Common Scenarios
Typical scenarios include cost control, compliance, and high availability. To minimize costs while maintaining quality:
- Enable Soft Downgrade policy
- Configure downgrade mapping:
  - claude-3-opus → claude-3-sonnet
  - gpt-4 → gpt-3.5-turbo
- Set conservative monthly token limits
- Use rule-based routing to prefer cheaper models for simple requests
Next Steps
Capabilities
Manage tools, MCP servers, and guardrails
Observability
Monitor model costs and performance