Testing Approaches
Prisme.ai supports multiple testing methodologies to ensure your agents meet your organization’s standards.
Evaluation Framework
Prisme.ai uses a straightforward evaluation system that makes it easy to assess agent performance across three criteria (a minimal scoring sketch follows the rubric below):
Response Quality
Score: 0 (Poor), 1 (Adequate), 2 (Excellent)
Context Quality
Score: 0 (Poor), 1 (Adequate), 2 (Excellent)
Hallucination Check
Score: 0 (Significant), 1 (Minor), 2 (None)
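The rubric maps naturally onto a small data structure. The Python sketch below is illustrative only; the field names and the unweighted average are assumptions, not the platform’s actual schema.

```python
from dataclasses import dataclass

# Hypothetical representation of a single evaluation result using the
# 0/1/2 rubric described above. Field names are illustrative, not the
# actual Prisme.ai schema.
@dataclass
class EvaluationScore:
    question: str
    response_quality: int   # 0 = Poor, 1 = Adequate, 2 = Excellent
    context_quality: int    # 0 = Poor, 1 = Adequate, 2 = Excellent
    hallucination: int      # 0 = Significant, 1 = Minor, 2 = None

    def overall(self) -> float:
        """Simple unweighted average across the three criteria."""
        return (self.response_quality + self.context_quality + self.hallucination) / 3


score = EvaluationScore(
    question="What is our company's return policy?",
    response_quality=2,
    context_quality=1,
    hallucination=2,
)
print(f"Overall: {score.overall():.2f}")  # Overall: 1.67
```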
Automated Evaluation Process
The automated evaluation process uses LLMs as judges to assess agent performance (a short LLM-as-judge sketch follows these steps):
Create Test Questions
Configure Evaluation Parameters
- Which LLM will serve as the evaluator
- Evaluation frequency (daily, weekly, on-demand)
- Evaluation criteria weighting
Run Evaluations
Review Results
- Overall performance scores
- Performance trends over time
- Breakdowns by question type
- Detailed analysis of retrieved contexts
Export and Share
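The steps above rely on an evaluator LLM acting as a judge. The sketch below illustrates that pattern under stated assumptions: `call_evaluator_llm` is a placeholder for whichever model you configure as the judge, and the JSON prompt format is hypothetical rather than the platform’s built-in one.

```python
import json

JUDGE_PROMPT = """You are an evaluation judge. Score the answer to the question
below using this rubric, returning JSON with integer fields
"response_quality", "context_quality" and "hallucination" (each 0, 1 or 2).

Question: {question}
Retrieved context: {context}
Agent answer: {answer}
"""

def call_evaluator_llm(prompt: str) -> str:
    """Placeholder for a call to whichever LLM is configured as the judge."""
    raise NotImplementedError("Wire this to your evaluator model's API.")

def evaluate(question: str, context: str, answer: str) -> dict:
    # Ask the judge model to score the answer, then parse its JSON reply.
    raw = call_evaluator_llm(JUDGE_PROMPT.format(
        question=question, context=context, answer=answer))
    scores = json.loads(raw)
    # Keep only the expected keys and clamp each score to the 0-2 range.
    return {k: max(0, min(2, int(scores[k])))
            for k in ("response_quality", "context_quality", "hallucination")}
```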
Human-in-the-Loop Evaluation
Combine automated testing with human expertise for comprehensive quality control (a sketch of merging the two kinds of scores follows this list). Human reviewers can:
- Review and override automated evaluation scores
- Provide qualitative feedback on responses
- Identify subtle issues that automated systems miss
- Add new test questions based on emerging needs
- Validate context quality and relevance
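One common way to combine the two signals is to let a human verdict override the automated one while keeping the rest of the automated scores. The helper below is a simple illustration of that idea, not a built-in Prisme.ai API.

```python
from typing import Optional

def merged_score(automated: dict, human_override: Optional[dict] = None) -> dict:
    """Return the scores to report: human-provided values win where present."""
    result = dict(automated)
    if human_override:
        result.update({k: v for k, v in human_override.items() if v is not None})
    return result

# Example: a reviewer downgrades context quality after spotting an
# irrelevant passage that the automated judge accepted.
auto = {"response_quality": 2, "context_quality": 2, "hallucination": 2}
print(merged_score(auto, {"context_quality": 1}))
# {'response_quality': 2, 'context_quality': 1, 'hallucination': 2}
```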
Custom Evaluation with Webhooks
For specialized evaluation needs, you can implement custom processes using Webhooks and AI Builder (a sketch of such an endpoint follows these steps):
Configure Webhook Endpoint
Implement Custom Evaluation Logic
- Domain-specific quality metrics
- Compliance and regulatory checks
- Industry terminology validation
- Integration with existing quality systems
Return Standardized Results
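As an illustration of what such an endpoint might look like, the Flask handler below receives an evaluation payload, applies a hypothetical terminology check, and returns a score in the same 0–2 format used elsewhere. The payload fields, the endpoint path, and the forbidden-terms rule are assumptions; adapt them to the payload your AI Builder webhook actually sends.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical list of terms your compliance team does not allow in answers.
FORBIDDEN_TERMS = {"guaranteed return", "risk-free"}

@app.post("/evaluate")
def evaluate():
    payload = request.get_json(force=True)
    answer = payload.get("answer", "")

    # Example domain-specific check: flag non-compliant terminology.
    violations = [t for t in FORBIDDEN_TERMS if t in answer.lower()]
    compliance = 0 if violations else 2

    # Return results in the same 0-2 format so they can sit alongside
    # the platform's own scores.
    return jsonify({
        "compliance": compliance,
        "violations": violations,
    })

if __name__ == "__main__":
    app.run(port=8080)
```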
Strategic Benefits of Testing
Comprehensive testing delivers significant benefits beyond simple quality control:
Monitor Data Source Changes
Detect when changes to underlying data sources affect response quality (a simple regression check is sketched after this list).
This allows you to:
- Prevent regressions when content is updated
- Identify when knowledge gaps emerge
- Maintain consistency across content updates
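A lightweight way to catch such regressions is to compare each evaluation run against a stored baseline and flag questions whose scores drop sharply. The sketch below assumes you export per-question overall scores; the 0.5 threshold is an arbitrary example.

```python
def detect_regressions(baseline: dict[str, float], current: dict[str, float],
                       threshold: float = 0.5) -> list[str]:
    """Return the questions whose overall score dropped by more than `threshold`."""
    return [q for q, score in current.items()
            if q in baseline and baseline[q] - score > threshold]

baseline = {"What is our return policy?": 1.8, "Who handles support?": 2.0}
current = {"What is our return policy?": 1.0, "Who handles support?": 2.0}
print(detect_regressions(baseline, current))
# ['What is our return policy?']
```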
Optimize LLM Selection
Evaluate performance across different LLM providers and models (a simple ranking sketch follows this list).
This enables you to:
- Select more cost-efficient models
- Reduce energy consumption
- Use specialized or self-hosted models when appropriate
- Make data-driven model migration decisions
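Once the same test set has been run against several candidates, selection can be as simple as ranking models by quality per unit of cost above a minimum quality bar. The model names and figures below are placeholders, not benchmark results.

```python
# Hypothetical results from running the same test set against three models.
results = [
    {"model": "model-a", "avg_score": 1.9, "cost_per_1k_questions": 4.00},
    {"model": "model-b", "avg_score": 1.8, "cost_per_1k_questions": 1.20},
    {"model": "model-c", "avg_score": 1.4, "cost_per_1k_questions": 0.40},
]

# Rank by quality per dollar, discarding models below a minimum quality bar.
MIN_SCORE = 1.6
eligible = [r for r in results if r["avg_score"] >= MIN_SCORE]
best = max(eligible, key=lambda r: r["avg_score"] / r["cost_per_1k_questions"])
print(best["model"])  # model-b
```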
Engage Business Stakeholders
Foster ownership of content quality among domain experts.
This helps to:
- Demonstrate the impact of quality source material
- Create accountability for knowledge accuracy
- Build trust in AI system outputs
- Drive continuous content improvement
Establish Tech-Business Alignment
Create a shared understanding of performance metrics and goals.
This leads to:
- Clear performance contracts between teams
- Shared optimization targets
- Better resource allocation
- Transparent communication about capabilities
Testing Methodology: Start Simple
We recommend an iterative testing approach that builds from foundational tests to more complex scenarios:
Initial Test Set (15 Questions)
Start with a manageable set of diverse test cases, for example (a sketch of a structured test set follows this list):
- “What is our company’s return policy?”
- “Who is the contact person for technical support?”
- “What are the operating hours for customer service?”
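A starter test set like this can live in a simple structured file. The sketch below shows one possible format pairing each question with the facts a good answer should contain; the field names and expected facts are illustrative placeholders.

```python
# Hypothetical starter test set: each entry pairs a question with the facts
# a correct answer should contain.
TEST_SET = [
    {
        "question": "What is our company's return policy?",
        "expected_facts": ["return window in days", "receipt requirement"],
    },
    {
        "question": "Who is the contact person for technical support?",
        "expected_facts": ["support contact name or team"],
    },
    {
        "question": "What are the operating hours for customer service?",
        "expected_facts": ["weekday hours", "weekend availability"],
    },
]
```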
Iterative Optimization
After initial testing, systematically adjust and retest to improve performance (a parameter-sweep sketch follows the first set of adjustments):
Adjust LLM Parameters
- Prompt engineering adjustments
- Temperature and creativity settings
- Different models or model versions
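One way to make these adjustments systematic is to re-run the same test set across a small grid of variants and compare average scores; the same approach applies to the RAG configuration options in the next step. In the sketch below, `evaluate_test_set` is a placeholder for a real evaluation run (for example the LLM-as-judge pattern sketched earlier), and the model names and temperatures are arbitrary.

```python
import random
from itertools import product

def evaluate_test_set(model: str, temperature: float) -> float:
    """Placeholder: run the full test set with these settings and return the
    average overall score (0-2). Replace the body with a real evaluation run."""
    return random.uniform(0, 2)  # stand-in value so the sketch runs

# Hypothetical grid of variants to compare.
models = ["model-a", "model-b"]
temperatures = [0.0, 0.3, 0.7]

results = {
    (model, temp): evaluate_test_set(model, temp)
    for model, temp in product(models, temperatures)
}
best_model, best_temp = max(results, key=results.get)
print(f"Best configuration: {best_model} at temperature {best_temp}")
```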
Refine RAG Configuration
- Chunking strategies
- Indexing methods
- Retrieval mechanisms
- Context handling
Integrate Tools
- Calculators for numerical questions
- Structured data tools for comparisons
- Visualization tools for complex data
Expand Test Set
- Add more edge cases
- Include newly discovered user questions
- Create tests for specific user personas
Best Practices
Test Creation
- Base test questions on actual user queries when possible
- Include a mix of simple, moderate, and complex questions
- Create test cases that cover all key knowledge domains
- Update test sets as user needs and content evolve
- Include edge cases and potential failure scenarios
Evaluation Approach
- Use automated evaluation for regular monitoring
- Incorporate human review for high-stakes applications
- Test both positive scenarios (what the agent should do) and negative scenarios (what it shouldn’t do)
- Establish clear evaluation criteria before testing
- Compare performance across different agent configurations
Continuous Improvement
- Schedule regular re-evaluation of agent performance
- Analyze patterns in low-scoring responses
- Document configuration changes and their impact
- Establish feedback loops with end users
- Create a prioritization framework for addressing issues
Team Collaboration
- Include both technical and business stakeholders in test creation
- Share testing results transparently across teams
- Establish clear ownership for different aspects of quality
- Create shared performance goals and targets
- Celebrate improvements in agent quality