TIKI is a trusted ecommerce platform where thousands of sellers offer millions of products. We are committed to providing the best quality of service and only authentic products. But we grow faster every day, and we cannot scale without the support of machines. We have to build a process that combines humans and machines to verify our products. Detecting the brand/logo of a product is one step that helps us keep fake products off the platform. It is also the first time we apply AI/Machine Learning to solve the scaling problem.
1. Problem introduction
Brand/logo detection is an automatic verification method that can partially relieve our staff of manual verification. Whenever a seller posts a request, the associated images are forwarded to our verification system (hereafter called the bot), and the bot should return the predicted brand.
The main role of the bot is to provide early information for rejecting requests that violate our rules. According to our rejection reasons, a logo violation is defined as: “The image contains a brand logo that does not match the brand of the product”.
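The violation rule above can be sketched as a small check. This is a minimal illustration, not the production logic: the function name `is_logo_violation`, the use of `None` for "no known logo found", and the case-insensitive comparison are all assumptions made for the example.

```python
from typing import Iterable, Optional


def is_logo_violation(declared_brand: str,
                      predicted_brands: Iterable[Optional[str]]) -> bool:
    """Return True if any image shows a known logo that differs from the declared brand.

    `predicted_brands` holds one entry per image; None means the bot found
    no known logo in that image (an assumed convention for this sketch).
    """
    declared = declared_brand.strip().lower()
    for predicted in predicted_brands:
        if predicted is None:
            continue  # no known logo detected, nothing to compare
        if predicted.strip().lower() != declared:
            return True  # a logo of another brand appears in the request
    return False


# A request declared as Nike whose second image shows an Adidas logo.
print(is_logo_violation("Nike", ["Nike", "Adidas", None]))
```

Under these assumptions, a request with no detected logo at all is not flagged; only a detected logo that disagrees with the declared brand counts as a violation.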
2. Technical approach
Figure 1. Detection schema.
This problem can be approached in several ways. At first glance, we could cast it as a classification problem where the input is the entire image and the output is the corresponding brand. Unfortunately, this solution does not work well in practice because products of different brands, e.g. glasses or shoes, are hard to tell apart.
What actually distinguishes products of different brands? The brand logo! The problem can therefore be cast neatly as logo detection and prediction. Our detection scheme includes two phases:
- Localization phase. We train Darknet, a well-known implementation of the YOLO v3 architecture. Darknet can simultaneously localize the potential logo area and predict its brand. While Darknet is good at the localization task, its performance on the prediction task is unsatisfactory. Therefore, we reuse its localization result but do not rely on its predicted brand.
- Prediction phase. We train Resnet (residual network) on the localization results of Darknet to obtain more accurate classification results.
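The two phases above can be sketched as follows. This is a simplified illustration under stated assumptions: `localize` stands in for the trained detector and `classify` for the trained Resnet classifier, both passed in as plain callables; the image is a toy nested list; and the `min_confidence` cut-off is a hypothetical parameter that the article does not mention.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max) in pixels
Image = Sequence[Sequence[int]]   # toy stand-in: rows of pixel values


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the region described by `box` out of the image."""
    x_min, y_min, x_max, y_max = box
    return [list(row[x_min:x_max]) for row in image[y_min:y_max]]


def predict_brands(image: Image,
                   localize: Callable[[Image], List[Box]],
                   classify: Callable[[List[List[int]]], Tuple[str, float]],
                   min_confidence: float = 0.5) -> List[str]:
    """Phase 1: take boxes from the detector. Phase 2: classify each crop."""
    brands = []
    for box in localize(image):              # detector's own label is ignored
        label, confidence = classify(crop(image, box))
        if confidence >= min_confidence:     # hypothetical cut-off
            brands.append(label)
    return brands


# Stub models so the sketch runs without any trained weights.
def fake_localize(image: Image) -> List[Box]:
    return [(1, 0, 3, 2), (0, 1, 2, 3)]


def fake_classify(patch: List[List[int]]) -> Tuple[str, float]:
    return ("Nike", 0.9) if patch[0][0] == 1 else ("Adidas", 0.2)


toy_image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
print(predict_brands(toy_image, fake_localize, fake_classify))
```

Keeping the detector and the classifier behind separate callables mirrors the design choice described above: the localization output is reused, while the brand decision is delegated entirely to the second model.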
3. Dataset descriptions
The seven brands considered are Apple, Adidas, Lego, Nike, Kingston, Calvin Klein, and Bosch. The datasets used for training Darknet and Resnet are described in the following:
To monitor the training process, we split each training set into training and validation subsets. For Darknet, 20 images per brand (140 images in total) serve as the validation set. For Resnet, 10% of the dataset (1,403 images in total) serves as the validation set.
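A fixed per-brand holdout like the one used for Darknet could be produced roughly as below. This is only a sketch: the function name `holdout_per_brand`, the `(path, brand)` sample format, the fixed seed, and the 50 synthetic images per brand are assumptions for illustration, not details from our actual data pipeline.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, str]  # (image_path, brand)


def holdout_per_brand(samples: List[Sample], per_brand: int,
                      seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Hold out `per_brand` randomly chosen images of every brand for validation."""
    by_brand: Dict[str, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_brand[sample[1]].append(sample)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train: List[Sample] = []
    validation: List[Sample] = []
    for brand in sorted(by_brand):
        items = list(by_brand[brand])
        if len(items) < per_brand:
            raise ValueError(f"brand {brand!r} has fewer than {per_brand} images")
        rng.shuffle(items)
        validation.extend(items[:per_brand])
        train.extend(items[per_brand:])
    return train, validation


brands = ["Apple", "Adidas", "Lego", "Nike", "Kingston", "Calvin Klein", "Bosch"]
# Synthetic file names; 50 images per brand is an arbitrary choice for the demo.
samples = [(f"{brand}/{i}.jpg", brand) for brand in brands for i in range(50)]
train, validation = holdout_per_brand(samples, per_brand=20)
print(len(train), len(validation))
```

With seven brands and 20 held-out images each, the validation set has 7 × 20 = 140 images, matching the figure quoted above.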
4. Evaluation methods
Since Darknet and Resnet perform different tasks, evaluating them requires slightly different metrics:
- Evaluating Darknet
- Evaluating Darknet + Resnet
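The detector results in the next section are reported at an IoU (intersection over union) threshold of 0.5. As a reminder of what that criterion measures, here is a minimal sketch of IoU for axis-aligned boxes. The helper names and the inclusive `>=` comparison are assumptions for illustration; evaluation tools differ on whether the boundary value counts as a match.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def is_true_positive(pred_box: Box, pred_brand: str,
                     true_box: Box, true_brand: str,
                     threshold: float = 0.5) -> bool:
    """A detection counts only if the brand matches and the boxes overlap enough."""
    return pred_brand == true_brand and iou(pred_box, true_box) >= threshold


truth = (0, 0, 10, 10)
print(iou(truth, (2, 0, 12, 10)))  # shifted by 2: intersection 80, union 120
print(iou(truth, (5, 0, 15, 10)))  # shifted by 5: intersection 50, union 150
```

Note that the second box covers half of the true box yet only reaches an IoU of one third, so covering 50% of the true area is not enough to pass a 0.5 IoU threshold.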
5. Performance of Darknet
We validate Darknet on the holdout validation set of 20 images per brand (140 images in total). Darknet achieves 91.73% mAP at an IoU threshold of 0.5. Intuitively speaking, about 91.73% of logo areas are localized and predicted correctly, where "correctly" means that the predicted box meets the 0.5 IoU (intersection over union) threshold against the true bounding box and is assigned to the correct brand. This is a loose reading, since mAP averages precision over recall levels and classes rather than counting correct boxes directly. Some good examples are shown in the following:
While we currently target only seven brands, we expect Darknet to be robust in an open-set scenario where unknown brands appear. For this reason, we further test Darknet on 10,000 unknown images (containing no logo, or a logo outside the set of seven brands) from the production environment. On this set, Darknet has a 2.18% error rate: it assigns 218 unknown images to one of the seven brands. Some bad examples are shown in the following.
The performance of Darknet on 11,313 unknown images and 1,844 images belonging to the seven brands is shown in the following.
Average precision, recall, and F1-score are 88.70%, 93.02%, and 90.62%, respectively.
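For readers who want to reproduce this kind of summary, the sketch below computes per-class precision, recall, and F1 from image-level labels and then averages them. It assumes the reported averages are unweighted (macro) averages over classes, which this article does not state explicitly; the function names and the toy labels are illustrative only.

```python
from typing import Dict, Sequence, Tuple

Metrics = Tuple[float, float, float]  # (precision, recall, f1)


def per_class_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, Metrics]:
    """Precision, recall, and F1 for every label seen in truth or predictions."""
    pairs = list(zip(y_true, y_pred))
    results: Dict[str, Metrics] = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        results[label] = (precision, recall, f1)
    return results


def macro_average(metrics: Dict[str, Metrics]) -> Metrics:
    """Unweighted mean of each metric over classes (assumed averaging scheme)."""
    n = len(metrics)
    return (sum(m[0] for m in metrics.values()) / n,
            sum(m[1] for m in metrics.values()) / n,
            sum(m[2] for m in metrics.values()) / n)


# Toy labels, not our evaluation data.
y_true = ["Nike", "Nike", "unknown", "unknown"]
y_pred = ["Nike", "unknown", "unknown", "unknown"]
print(macro_average(per_class_metrics(y_true, y_pred)))
```

Under macro averaging, the average F1 is the mean of the per-class F1 scores, so it generally differs from the harmonic mean of the averaged precision and recall.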
6. Performance of Darknet + Resnet (the entire system)
Repeating the same evaluation on the 10,000 unknown images from production, the entire system (Darknet + Resnet) reduces the error rate of Darknet from 2.18% to 0.26%. This means that only 26 images out of 10,000 are wrongly assigned to the seven brands.
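The arithmetic behind these two rates is simple enough to check in a few lines. The sketch below rebuilds prediction lists from the counts quoted above; the `"unknown"` label and the `"some_brand"` placeholder are assumptions, since the text does not say which brands the misassigned images received.

```python
from typing import Sequence

UNKNOWN = "unknown"  # assumed label for images outside the seven brands


def false_assignment_rate(predictions: Sequence[str],
                          unknown_label: str = UNKNOWN) -> float:
    """Fraction of unknown images that were assigned to a known brand."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    wrong = sum(1 for p in predictions if p != unknown_label)
    return wrong / len(predictions)


# Counts from the text; "some_brand" is a placeholder, not a real prediction.
darknet_only = ["some_brand"] * 218 + [UNKNOWN] * 9782
full_system = ["some_brand"] * 26 + [UNKNOWN] * 9974

darknet_rate = false_assignment_rate(darknet_only)
system_rate = false_assignment_rate(full_system)
relative_reduction = 1 - system_rate / darknet_rate
print(f"{darknet_rate:.2%} -> {system_rate:.2%}, relative reduction {relative_reduction:.1%}")
```

In relative terms, going from 218 to 26 false assignments removes about 88% of the detector's mistakes on unknown images.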
Finally, we compute the performance of the entire system on the mixture of 11,313 unknown images and 1,844 images belonging to the seven brands, shown in the following table.
Average precision, recall, and F1-score are 95.68%, 92.27%, and 94.01%, respectively. A close comparison of the two tables shows that the combination of Darknet + Resnet improves the recall of unknown brands and the precision of the seven brands.
The following confusion matrix shows how models are confused among classes.
7. Effectiveness in production
We further investigate the effectiveness of the entire system at the request level rather than the image level. Specifically, we count how many predictions coincide with human decisions. We use the following two basic rules to count the correct predictions made by the bot:
We crawled the requests from the last three months and report the number of correct bot predictions for the two cases, `approved` and `rejected`, related to the seven brands in Figure 3.
For approvals, the bot made very good decisions compared to humans. For rejections, the bot's ability is rather limited. The main reason is that famous brand logos are often remade in different styles or materials, and there are several replicas among the images that the bot is unable to detect.
- Our brand recognition system, based on logo detection and classification, is highly accurate in the classification task. More specifically, we observe a very small misclassification error (0.26% of unknown production images were wrongly assigned to one of the seven brands).
- We can deploy this system as a recommender for content QC. It will flag violating images immediately so that content QC can reject them quickly.
- The logo detection rate, according to our test, is still modest. Some products do not belong to our seven brands but contain their logos, and some images are manipulated by sellers mainly to remove logos. In such cases, content QC rejected the request but our system did not, because the logos are corrupted.
Tín Phan — PhD in Machine Learning/Senior AI Engineer
Bằng Võ — AI Engineer