Evaluating Conversations in AI Insights

Use AI Insights’ conversation evaluation feature to collect structured feedback on your assistant’s interactions. You can define custom evaluation criteria, fill them in automatically or manually, and enable the evaluation pipeline for new conversations.


1. Overview

The evaluation form lets you specify exactly what to evaluate for each conversation, and how:

  • Purpose: Gather consistent, structured assessments of assistant replies.
  • Use cases: Quality control, performance tracking, fine‐tuning data collection.
  • Modes:
    • Automatic: Every new conversation is evaluated by a large language model (LLM) against your schema.
    • Manual: Reviewers fill in the form via the UI, optionally assisted by the LLM.

2. Configuring the Evaluation Form

AI Insights uses a custom JSON Schema definition for your form. You can describe each field’s:

  • Key (JSON property name)
  • Type (boolean, string, array, etc.)
  • Title and Description (guidance for the evaluator)
  • Enum or Choices for list selections
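
The pieces above can be combined into a schema like the following. This is a minimal sketch; the field names (`resolved`, `tone`, `issues`) and enum values are illustrative, not a required layout:

```json
{
  "type": "object",
  "properties": {
    "resolved": {
      "type": "boolean",
      "title": "Issue resolved",
      "description": "Did the assistant fully resolve the user's request?"
    },
    "tone": {
      "type": "string",
      "title": "Tone",
      "description": "Overall tone of the assistant's replies.",
      "enum": ["friendly", "neutral", "curt"]
    },
    "issues": {
      "type": "array",
      "title": "Observed issues",
      "description": "Common problems seen in the conversation.",
      "items": {
        "type": "string",
        "enum": ["hallucination", "off-topic", "incomplete answer"]
      }
    }
  },
  "required": ["resolved"]
}
```

Each property key becomes a form field; `title` and `description` appear as guidance for the evaluator, and `enum` renders as a list selection.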

3. Example: Defining Your Form

Below is a screenshot of the form editor in AI Insights:

You can add fields, adjust types, and write clear descriptions:

Validate your schema to ensure the form renders correctly within the Inbox:
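
The form editor validates your schema for you. If you also want to catch obvious mistakes locally before pasting a schema into the editor, a small pre-check like the one below can help. This is a convenience sketch using Python's standard library only; `precheck_schema` is a hypothetical helper, not part of AI Insights:

```python
import json

def precheck_schema(raw: str) -> dict:
    """Parse a form schema string and run a few basic sanity checks.

    A local convenience check only; it does not replace the validation
    built into the AI Insights form editor.
    """
    schema = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if schema.get("type") != "object":
        raise ValueError("top-level schema type should be 'object'")
    for key, field in schema.get("properties", {}).items():
        if "type" not in field:
            raise ValueError(f"field '{key}' is missing a 'type'")
    return schema

raw = '{"type": "object", "properties": {"resolved": {"type": "boolean"}}}'
schema = precheck_schema(raw)
print(sorted(schema["properties"]))  # -> ['resolved']
```

A full JSON Schema validation (e.g. checking `enum` contents against `items`) would need a dedicated validator library; the editor's built-in check remains the source of truth for whether the form renders.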


4. Using the Evaluation Form

4.1 Automatic Mode

To auto‐evaluate every new conversation:

  1. Toggle Automatically fill in new conversations below your evaluation form.
  2. Save changes.

Once enabled, each new conversation is passed through the LLM, and the form is auto‐populated.
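
For a form with, say, a boolean "resolved" field, a "tone" enum, and an "issues" array (illustrative names, not a fixed layout), the auto-populated result stored for a conversation might look like:

```json
{
  "resolved": true,
  "tone": "friendly",
  "issues": []
}
```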

4.2 Manual Mode

To review and evaluate on demand:

  1. Open Inbox in the sidebar.
  2. Select a conversation thread.
  3. Click Evaluate in the top-right toolbar (the handshake icon).
  4. The evaluation form appears.
  5. Optional: click the “Autofill the form” button to have the LLM prefill the fields.
  6. Adjust fields as needed and Submit.


5. Best Practices

  • Keep descriptions concise but informative.
  • Limit required fields to key metrics.
  • Use enums for common issues to standardize feedback.
  • Regularly review your schema based on evaluator feedback.
  • Leverage LLM assistance for faster manual reviews.

Happy evaluating! 🎉