Evaluation

This feature is useful if your Assistant deals with natural language.

There is no single "best" NLU. On Prisme.ai you can compare all the NLU engines available on the platform to determine which one is the most efficient and appropriate for your knowledge base.

  • The Prisme.ai platform therefore offers a complete set of tools to measure and compare NLU engines and determine which one best fits your use case.
  • For now, the analysis is based on the non-regression tests, so it is worth adding some before running an evaluation.

As your Assistant evolves over time, the most relevant NLU engine may change, so launch a new evaluation after each major evolution of your Assistant.

Execution time by NLU

NLU Engine

Assessment scores by NLU

  • Precision assesses how often an intent detected by the NLU is the correct one (it penalizes wrong matches), while recall assesses how often the NLU returns the expected intent instead of falling into Fallback (it penalizes missed intents).
  • The f_score is the harmonic mean of the two; precision and recall pull in opposite directions, which is why both are needed to judge the quality of an NLU.
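As a sketch, these scores can be derived from the non-regression test results. The function below is illustrative, not the platform's implementation; the function name and the `Fallback` label are assumptions.

```python
def nlu_scores(expected, predicted, fallback="Fallback"):
    """Score an NLU engine against a set of test utterances.

    `expected` and `predicted` are parallel lists of intent names;
    `fallback` is the label the NLU returns when no intent matches.
    """
    # Pairs where the NLU actually detected an intent (did not fall back).
    detected = [(e, p) for e, p in zip(expected, predicted) if p != fallback]
    correct = sum(1 for e, p in detected if e == p)
    # Precision: of the intents the NLU detected, how many were the right ones.
    precision = correct / len(detected) if detected else 0.0
    # Recall: of all test utterances, how many received their expected intent
    # (falling into Fallback counts as a miss).
    recall = correct / len(expected) if expected else 0.0
    # f_score: harmonic mean of precision and recall.
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score
```

Using the harmonic mean rather than the arithmetic mean means a single very low score drags the f_score down, so an engine cannot hide weak recall behind strong precision.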

Intent scores

Evaluate specific intents

Confusion matrix

  • It allows the assistant to be evaluated with a given NLU, intent by intent.
  • The columns indicate the intent actually expected; the rows indicate the intent detected by the NLU.
  • A perfect diagonal means the expected intents always match the intents found by the NLU engine: there are no errors and the engine is working correctly.
  • Any cell outside the diagonal means the NLU engine misinterpreted an intent.
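To make that layout concrete, here is a minimal sketch of how such a matrix could be built from test results. The function names and intent labels are hypothetical, not part of the platform.

```python
def confusion_matrix(expected, predicted, intents):
    """Build an intents confusion matrix.

    Rows = intent detected by the NLU, columns = intent actually expected,
    matching the layout described above.
    """
    matrix = {det: {exp: 0 for exp in intents} for det in intents}
    for exp, det in zip(expected, predicted):
        matrix[det][exp] += 1
    return matrix

def off_diagonal_errors(matrix):
    # Any non-zero cell outside the diagonal is a misinterpreted intent.
    return sum(count
               for det, row in matrix.items()
               for exp, count in row.items()
               if det != exp)
```

A perfect run leaves `off_diagonal_errors` at zero; each non-zero off-diagonal cell pinpoints which expected intent is being confused with which detected one.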

Intents confusion matrix