When we start an ML project, we often assume that all training data is labelled with 100% accuracy. However, such a scenario rarely presents itself. In this talk I'll show how we faced problems such as missing or unreliable targets, incomplete model feedback, and poor training data quality, and how we used product representations to evaluate label confidence and improve data quality.
When we talk about an end-to-end workflow in a production Machine Learning system, we tend to overlook a stage that is critical for success: data labeling. In academic or learning environments, all training data is labelled and the labels are assumed to be 100% correct, so no further in-depth analysis is performed. In online production systems, however, such a scenario rarely presents itself. Sometimes at the beginning of a project we do not have the target values at all; sometimes we have only a fraction of them, or some are incorrect. At the other end of the pipeline, on the monitoring side, we face a closely related problem: incomplete feedback. Often we have no visibility of the errors the model makes, and if these go undetected they are perpetuated in the system when its outputs are reintroduced as retraining data. All these scenarios lead us to look for methodologies to evaluate the confidence of the labels, so that we can maximize the impact of the most reliable ones and exclude or correct those that generate noise. In this talk I want to show how we tackled these problems within the prohibited-items moderation initiative in MercadoLibre's marketplace, and the lessons we learned.
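To make the idea of label confidence concrete, here is a minimal sketch of one common approach: score each label by how strongly its nearest neighbours in an embedding space agree with it. This is an illustrative assumption, not MercadoLibre's actual method; the toy embeddings, the cosine-similarity k-NN vote, and the 0.5 threshold are all hypothetical choices.

```python
import numpy as np

def label_confidence(embeddings, labels, k=3):
    """Score each label by agreement among its k nearest neighbours
    in embedding space (cosine similarity)."""
    # L2-normalise so the dot product equals cosine similarity
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = X @ X.T
    np.fill_diagonal(sims, -np.inf)  # an item is not its own neighbour
    conf = np.empty(len(labels))
    for i in range(len(labels)):
        nn = np.argsort(sims[i])[-k:]               # k most similar items
        conf[i] = np.mean(labels[nn] == labels[i])  # fraction that agree
    return conf

# Toy data: two clusters of product embeddings; the last item is mislabelled.
emb = np.array([[1.0, 0.1], [0.9, 0.2], [1.1, 0.0],
                [0.1, 1.0], [0.2, 0.9], [0.0, 1.1]])
labels = np.array([0, 0, 0, 1, 1, 0])  # last label disagrees with its cluster
conf = label_confidence(emb, labels, k=2)
# Low confidence flags suspect labels for review, correction, or exclusion
suspect = np.where(conf < 0.5)[0]
```

Items whose neighbours mostly carry a different label get a low score and can be routed to manual review or dropped from the retraining set, which is exactly the "maximize the reliable labels, exclude the noisy ones" trade-off described above.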