The Cruciality of Due Diligence and How to Train the Machine

Last week we learnt that Facebook has over twenty thousand employees who annotate the comments that Facebook collects. This information then becomes the training data for the Artificial Intelligence (AI), Natural Language Processing (NLP) and Machine Learning (ML) that Facebook employs. There are, however, issues with this process, especially when it comes to due diligence.

Data Sets

Two people can read the same comment and come away with very different interpretations, because comments are highly subjective. Many have tried to solve this problem, and a lot of work has gone into finding a solution, but very few companies can get higher than 85% accuracy because people rarely agree entirely when analysing comments. And if it is difficult for people to agree on what a comment signifies, it is even more difficult for the machine.
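One way to put a number on how often two readers agree is to compare raw percent agreement with Cohen's kappa, which discounts the agreement you would expect by chance. The sketch below is only an illustration of that idea, not Pansensic's or Facebook's method; the labels and the two readers' annotations are made up for the example.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Share of comments on which both readers chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both readers must label the same non-empty set of comments")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Agreement between two readers, corrected for chance agreement."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both readers pick the same label independently.
    expected = sum(
        counts_a[label] * counts_b[label] for label in set(labels_a) | set(labels_b)
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # Both readers used a single identical label throughout.
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of the same six comments by two readers.
reader_1 = ["positive", "negative", "negative", "neutral", "positive", "negative"]
reader_2 = ["positive", "negative", "neutral", "neutral", "negative", "negative"]

print(f"Percent agreement: {percent_agreement(reader_1, reader_2):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(reader_1, reader_2):.2f}")
```

In this made-up case the readers agree on four comments out of six, yet the kappa is noticeably lower, which is a reminder that a headline accuracy figure can flatter the real level of agreement.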

But before the machine can even attempt to use the training data provided, we have to ask: who decides what to look for in a specific data set? This is what we, at Pansensic, refer to as the taxonomy. Typically, when presented with data, people look for themes in that data. But someone has to decide what themes to look for in the first place.

Say, for instance, you are analysing comments on employee experience. This data is bound to have a very different set of themes from data on patient or customer experience. And, again, the themes in customer experience for an airline will differ vastly from those for a mobile phone company, food and drink company or sports shop. Even once you get to this level, you still have all the sub-themes to decipher.



Whoever creates these taxonomies ought to have a holistic understanding of the subject or domain they are analysing. Pansensic believe that the most important thing is that the actual data dictates the themes. Most preconceived models, and even academic models, miss themes that are present in the data, so they should not decide in advance what you look for. It takes due diligence to find those themes.

To create a taxonomy, the chief taxonomist needs to read large volumes of comments. Without doing so, they will not be able to identify the themes. They also have to be humble enough to accept that they have to read all the comments and set their own subjectivity to one side.

The most difficult aspect of building a taxonomy is knowing how many themes you want in the first place, and how big these themes should be. If you have too few themes, you get a high-level metric that you can do very little with. Yet if you have too many themes, you can be swamped and lose sight of where the priority is.


In the end, what the majority of taxonomists are trying to do is identify actionable insights. It is, however, exceedingly important that the taxonomist identifies all actionable insights, and then prioritises them. To identify all the actionable insights, and not just the ones you want, is a tricky task. The danger is that some actionable insights are easier to identify than others, and the easiest one to find is not necessarily the one that should be worked on.

Crucial Due Diligence

It’s all about doing the right thing, not just about doing something. People often get distracted by what they find interesting. In order to use the data correctly, you can’t pick out what you find most interesting and focus on that. You have to do what is most important. Equally, sometimes people will focus on the most urgent element. This may not necessarily be the most important element either.

Not only does the quantity of themes matter, but so does their structure. Say, for instance, you end up with 70 themes for a particular domain. A flat structure of this size does not work very well, as you do not get a clustering effect or even a priority effect. We then have to ask whether the structure could be tiered. If so, the taxonomist has to contend with knowledge management principles: they have to be able to identify which themes are parent themes, which are child themes, and what the dependencies are between one tier and another.
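As a rough illustration of why a tiered structure helps, the sketch below rolls comment counts for child themes up to their parent themes, so that priorities become visible at the top tier. The airline theme names, the two-tier layout and the counts are all hypothetical, chosen only for the example; a real taxonomy would come from reading the data.

```python
from collections import Counter

# Hypothetical two-tier taxonomy: each parent theme lists its child themes.
TAXONOMY = {
    "Booking": ["Website", "Pricing"],
    "In-flight": ["Seating", "Food", "Cabin crew"],
    "Baggage": ["Lost luggage", "Fees"],
}

# Reverse lookup so each child theme points back to its parent.
CHILD_TO_PARENT = {
    child: parent for parent, children in TAXONOMY.items() for child in children
}


def roll_up(child_counts):
    """Sum comment counts for child themes into their parent themes."""
    totals = Counter()
    for child, count in child_counts.items():
        totals[CHILD_TO_PARENT[child]] += count
    return totals


# Made-up number of comments tagged with each child theme.
child_counts = {
    "Seating": 40,
    "Food": 25,
    "Lost luggage": 30,
    "Pricing": 10,
    "Website": 5,
}

parent_totals = roll_up(child_counts)
ranked = parent_totals.most_common()  # Parent themes, largest first.

for parent, total in ranked:
    print(f"{parent}: {total} comments")
```

Read as a flat list, "Seating" and "Lost luggage" look similar in size; rolled up, the in-flight experience as a whole clearly outweighs baggage, which is the kind of clustering and priority effect a flat list of 70 themes cannot show.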

All of a sudden you have gone from someone who is reading a Facebook comment to someone who has to understand the principles of knowledge management, and understand them well. This is no easy task.


So, when you are looking for a provider to help you understand your unstructured text, you need to go through the due diligence. Ask how many comments a statement is based on, how many comments the taxonomists are reading themselves, and what the capabilities are of the people providing the taxonomy. If the answers are vague, you can be sure of one or more of the following:

A) The comment and data analysis may not be very accurate or sensitive.

B) The full potential of your data may be lost.

C) You may action the wrong thing.

Due diligence is crucial to a quality purchase: a purchase you can be confident will give you the best ROI (Return on Information).

Contact Pansensic for a chat, or for a demo, and compare us like for like with other providers.
