When I tell people about my job (Data Analyst), nine times out of ten they look confused, even though most of them know, or at least have some understanding of, machine learning. The confusion makes sense, because you don’t come across data annotation in everyday life. Hopefully this blog post will clear some of it up.
But why is annotation so important? I always use the example of children learning a language. They are perfectly capable of doing so, but if nobody ever speaks to them, they will never learn one. Machine learning works in a similar way. You want a system to make the right predictions or decisions, just as you want your children to gain a proper knowledge of language. For the system to make correct predictions, you need to train it with examples (data), just as you need to speak to children for them to learn a language.
So we know that training data is very important, but where do we get it? This is where data annotation comes in. You first have to create some correct data examples yourself, so that the system has something to learn from. In our comparison with children, that means speaking to them in proper sentences. Of course, grammatical mistakes are inevitable. That doesn’t have to be a problem, because the goal is for children to learn the general principle. The same holds for systems: one wrong data example doesn’t mean the whole system stops working, because the other, correct examples make up for it.
Another common misunderstanding concerns how machine learning, and with it data annotation, differs from classical programming. In classical programming, the computer simply carries out a set of explicit instructions. In machine learning, the computer makes predictions without being explicitly programmed to do so. Instead, it learns from data.
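To make the contrast concrete, here is a minimal sketch in Python. It is a toy illustration, not our actual model: the example sentences, the labels, and the function names `rule_based_label`, `train`, and `predict` are all made up for this post, and the "learning" is nothing more than counting which words appear under which label.

```python
from collections import Counter, defaultdict

# Classical programming: a person writes the rule by hand.
def rule_based_label(sentence):
    return "exclusion ground" if "excluded" in sentence.lower() else "other"

# Machine learning (toy version): the program derives word counts
# per label from annotated examples instead of following a fixed rule.
def train(examples):
    counts = defaultdict(Counter)
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def predict(counts, sentence):
    words = sentence.lower().split()
    # Pick the label whose annotated examples share the most words with the sentence.
    return max(counts, key=lambda label: sum(counts[label][w] for w in words))

# Hypothetical annotated examples (invented for illustration).
annotated = [
    ("bidders convicted of fraud are excluded", "exclusion ground"),
    ("bankrupt companies shall be rejected", "exclusion ground"),
    ("the contract is awarded on price and quality", "award criterion"),
    ("quality counts for sixty percent of the score", "award criterion"),
]

model = train(annotated)
new_sentence = "a company convicted of bribery shall be rejected"
rule_result = rule_based_label(new_sentence)
learned_result = predict(model, new_sentence)
print(rule_result, "|", learned_result)
```

The hand-written rule only knows the word "excluded", so it misses the new sentence. The counting approach has never seen that exact sentence either, but it recognises words it met in the annotated examples. A real model is far more sophisticated, yet the dependence on annotated data is the same.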
So what’s in it for you? The end result of correct training (and thus annotating) is, for example, that our model can distinguish between award criteria, exclusion grounds, and eligibility requirements, even though to a computer each of those is just a piece of text. When many examples are combined, the underlying structure of each category can be learned. This is where it comes in handy for tender and bid managers. They can simply choose a category and see every example of it that appears in the tender in one simple overview, which in the end can save a lot of time.
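The category overview described above could be sketched roughly like this. Again, this is only an illustration under assumptions: the `predicted` list stands in for whatever a trained model would return for one tender, and the sentences and the `overview` function are invented for this post.

```python
from collections import defaultdict

# Hypothetical model output: (sentence, predicted category) pairs for one tender.
predicted = [
    ("Bidders convicted of fraud are excluded.", "exclusion ground"),
    ("Price counts for 40% of the score.", "award criterion"),
    ("Bidders must show three comparable projects.", "eligibility requirement"),
    ("Quality counts for 60% of the score.", "award criterion"),
]

def overview(predictions):
    """Group sentences by their predicted category, keeping document order."""
    grouped = defaultdict(list)
    for sentence, category in predictions:
        grouped[category].append(sentence)
    return dict(grouped)

by_category = overview(predicted)

# A bid manager picks one category and sees all of its sentences together.
for sentence in by_category["award criterion"]:
    print("-", sentence)
```

Instead of reading the whole tender to find every award criterion, the manager gets them in one list.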