One of the most common questions, and also one of the most difficult to answer! The accuracy of data directly impacts how confident we are in sharing insights and reports. At SentiSum, our in-house QA and Data Science team is always looking at how your model can be made more accurate. Here, we go into detail on how we approach accuracy and why it is a complex subject.

Before we start:

If you feel that there are consistent inaccuracies which mean you are not confident in your insights, please reach out to me and we will investigate asap.

Below is an explanation of accuracy. There are two kinds:

  • Accuracy of AI tags

  • Accuracy of sentiment (identifying positivity or negativity)

Accuracy of AI tags

Each tag has its own level of accuracy. How accurate a specific tag is depends on a few things:

  • How often it occurs in conversations

  • How specific it is

  • How distinct it is from other tags

If a tag occurs a lot in conversations and has distinct patterns of language, e.g. "item not arrived", it should perform well. But if the definition of a tag is vague, e.g. "delivery quality", accuracy can suffer even when the tag occurs a lot, because even a human would not be able to reliably say when it should apply.

We look after all of these considerations when we build your model, so you don't need to worry about this.

Accuracy of sentiment

Figuring out if someone is talking about a tag positively or negatively is one of the hardest challenges in NLP (the language side of AI). This is because positive language in one setting is not deemed positive in another. Or, a customer may use sarcasm in their response. Also, someone may not give the whole context so even a human will find it difficult to understand the right sentiment.

At SentiSum, we have tested many different approaches to sentiment and use the best-performing one.

When looking at sentiment, we encourage users to look at the general trend of sentiment, rather than focussing on the specific responses. But if you see consistent errors, please do flag this with me or anyone in the team and we will investigate.
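To illustrate why the overall trend can still be informative when individual responses are sometimes mislabelled, here is a minimal sketch. It rests on a strong simplifying assumption that is ours, not a description of how SentiSum's model behaves: every response has the same chance of having its sentiment flipped, in either direction, and that chance is below 50%. The function name and all numbers are hypothetical.

```python
def observed_negative_share(true_share, error_rate):
    """Share of responses labelled negative when each label is
    flipped with probability `error_rate` (assumed symmetric)."""
    correctly_negative = true_share * (1 - error_rate)
    wrongly_negative = (1 - true_share) * error_rate
    return correctly_negative + wrongly_negative


# Hypothetical true share of negative responses over four weeks.
true_weekly_shares = [0.10, 0.15, 0.25, 0.40]
error_rate = 0.15  # assumed: 15% of individual labels are wrong

observed = [observed_negative_share(s, error_rate) for s in true_weekly_shares]

for week, (true, seen) in enumerate(zip(true_weekly_shares, observed), start=1):
    print(f"week {week}: true {true:.2f}, observed {seen:.3f}")
```

Under these assumptions the observed numbers are compressed towards 50%, so no single value should be read as exact, but a rise in the true share still shows up as a rise in the observed share. Real errors are unlikely to be perfectly symmetric, which is one reason consistent errors are worth flagging.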

Reporting on accuracy

We currently don't report on accuracy in a consistent, regular manner because it is honestly a difficult thing to automate well. The usual way of automatically generating accuracy numbers is full of bias. For example, I have seen results say an AI tag is 97% accurate when that was definitely not the case (!!), because the test data was inherently biased. We want to avoid any misleading data, which is why we do not provide it readily.
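As one illustration of how a headline accuracy number can mislead, the sketch below uses a made-up test set in which a tag is rare. This is only one possible source of bias and not necessarily the one behind the 97% figure mentioned above; the data and the `evaluate_tag` helper are hypothetical.

```python
def evaluate_tag(y_true, y_pred):
    """Return accuracy, precision and recall for a single yes/no tag."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # tag applied correctly
    fp = sum(1 for t, p in pairs if not t and p)      # tag applied wrongly
    fn = sum(1 for t, p in pairs if t and not p)      # tag missed
    tn = sum(1 for t, p in pairs if not t and not p)  # correctly left off

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical test set: 100 conversations, only 3 truly carry the tag.
y_true = [True] * 3 + [False] * 97
# A "model" that never applies the tag at all.
y_pred = [False] * 100

scores = evaluate_tag(y_true, y_pred)
print(scores)  # accuracy is 0.97, precision and recall are both 0.0
```

Here a model that never finds the tag still scores 97% accuracy, simply because the tag is absent from most conversations. Precision and recall expose the problem, which is why a single accuracy figure is easy to misread.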

If you do have any concerns about accuracy though, we will investigate thoroughly and send a report with our findings. Our QA team evaluates your data on a weekly basis and constantly works on improvements to accuracy regardless of whether you contact us or not.

For anyone who wants to dig deeper into the topic of accuracy in AI models, I always find this article clarifying:

https://towardsdatascience.com/beyond-accuracy-precision-and-recall-3da06bea9f6c

It will never be 100%

Accuracy in NLP has come a long way in the last couple of years. Seeing how far it has come surprises me! But it will never be 100%. There is so much variety in how people talk that AI will always be giving its best guess. There are ways to help it guess better and better, but there will always be some inaccuracies. The aim instead should be to confidently report on trends in your data, and make decisions off the back of that data.

If you have any further questions, please don't hesitate to reach out.
