What is not taught in object detection tutorials and books?

False Positives/Negatives, True Positives/Negatives! How to annotate background images? Power of background images.

Jelal Sultanov
5 min readDec 18, 2020

I spent days searching for material on background images: how to annotate them and what effect they have on a trained model. I decided to sum up all my findings in one article so that other practitioners can understand the power of background images.

Recently I did object detection with several algorithms (YOLOv3, YOLOv4, MobileNet SSD) for a freelance project. I collected the image dataset myself, taking photos with my phone, and annotated them using the LabelImg annotation tool. Before starting this project, I assumed it would be an easy job, but once I began annotating 6,000+ images manually, I realized that object detection with a custom dataset is not easy at all. There are plenty of automated annotation tools online, but I wanted to go through the process myself to learn how hard it is and what hidden pitfalls image annotation involves.

As this was my first serious, large machine learning project, I regularly consulted books and tutorials covering similar projects. What I noticed is that almost none of them taught, or even mentioned, False Positives/Negatives and True Positives/Negatives. The issue surfaced when I started testing my trained object detection weights on my PC: the model kept producing False Positive detections in the video frames. I was surprised by this result, because I had never seen this kind of problem in the tutorials and books. So I started researching the issue and found some useful tips for solving it.
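To make the terminology concrete, here is a minimal sketch (plain Python, with made-up counts) of how TP, FP, and FN feed into the usual detection metrics. A False Positive is a box drawn where there is no real object; a False Negative is a real object the model missed.

```python
# Minimal sketch: how TP/FP/FN counts translate into precision and recall.
# The counts below are illustrative values, not results from my project.

def precision(tp, fp):
    # Of all detections the model produced, how many were real objects?
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    # Of all real objects in the images, how many did the model find?
    return tp / (tp + fn) if (tp + fn) else 0.0

# Example: 90 correct boxes, 30 spurious boxes (FPs), 10 missed objects (FNs)
tp, fp, fn = 90, 30, 10
print(f"precision = {precision(tp, fp):.2f}")  # 90 / 120 = 0.75
print(f"recall    = {recall(tp, fn):.2f}")     # 90 / 100 = 0.90
```

Reducing FPs raises precision; this is exactly the metric that background images improve.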

More Background Images

Yes! You will need to collect more background images. When I first found this solution, I did not understand what background images are or why we need them. After further research, I understood that they “teach” the model what IS the target object (the object you want to detect/classify) and what IS NOT — as I understand it, this helps the model generalize. Background images must not contain your target object in any form. You can collect random images that contain none of your target objects, and when annotating them you simply skip them without drawing any boxes, so that their annotation files (e.g. XML) contain no bounding-box X, Y coordinates at all. After my research on this topic, I applied every helpful step I found online to my object detection model. The result was as hoped: the False Positives were gone for good! At the bottom of this article I will share some helpful links on background images and on reducing the FP rate in an object detection project. For now, let me give you some background-image annotation hints I learned during my project.

Annotating background images

Make sure you have collected enough images for each object class; the required amount varies with the object detection algorithm. In my case, for the MobileNet SSD v2 algorithm the minimum was roughly 300–400 images per class, although in my project I collected 6,000+ images and annotated them all manually with LabelImg. When I found the hint that background images can help decrease FPs, I was not sure how to annotate them. After some additional research, I found that it can be done easily within LabelImg: just press the “Space” key when you want to mark an image as a background image.
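To check whether each class meets a minimum like 300–400 examples, you can count the labeled boxes across your annotation files. Here is a small sketch, assuming LabelImg’s Pascal VOC style XML output; the `annotations` folder name is a placeholder for wherever your XML files live.

```python
# Sketch: count annotated objects per class across a folder of
# LabelImg (Pascal VOC style) XML files, to verify you meet a
# per-class minimum (e.g. 300-400). The "annotations" path is a placeholder.
import glob
import xml.etree.ElementTree as ET
from collections import Counter

def count_objects_per_class(xml_dir):
    counts = Counter()
    for xml_path in glob.glob(f"{xml_dir}/*.xml"):
        root = ET.parse(xml_path).getroot()
        # Each labeled box is an <object> element with a <name> child;
        # background images have no <object> elements at all.
        for obj in root.findall("object"):
            counts[obj.findtext("name")] += 1
    return counts

if __name__ == "__main__":
    for cls, n in count_objects_per_class("annotations").items():
        print(f"{cls}: {n} boxes")
```

Background images contribute no counts here, which is expected — they matter by being present in training, not by adding boxes.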

Note: You do not have to save the annotation files yourself; they are saved automatically when you press the “Space” key.

Your annotation file (XML) for a background image is going to look like this:

<annotation>
<folder>My Folder</folder>
<filename>image.jpg</filename>
<path>/home/usuario/Escritorio/PR/My Folder/image.jpg</path>
<source>
<database>Unknown</database>
</source>
<size>
<width>1812</width>
<height>2416</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
</annotation>
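If you already have a folder of background photos, you do not even need to open each one in LabelImg: an empty annotation like the one above can be generated in a few lines. This is a sketch, not a LabelImg feature — the `write_background_annotation` helper and the paths are my own illustrative names.

```python
# Sketch: write an empty (object-free) Pascal VOC annotation for a
# background image, matching the structure shown above. The function name
# and paths are illustrative; this is not part of LabelImg itself.
import os
import xml.etree.ElementTree as ET

def write_background_annotation(image_path, width, height, out_path):
    ann = ET.Element("annotation")
    ET.SubElement(ann, "folder").text = os.path.basename(os.path.dirname(image_path))
    ET.SubElement(ann, "filename").text = os.path.basename(image_path)
    ET.SubElement(ann, "path").text = image_path
    src = ET.SubElement(ann, "source")
    ET.SubElement(src, "database").text = "Unknown"
    size = ET.SubElement(ann, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"
    ET.SubElement(ann, "segmented").text = "0"
    # Crucially: no <object> elements -- that is what marks a background image.
    ET.ElementTree(ann).write(out_path)

write_background_annotation("My Folder/image.jpg", 1812, 2416, "image.xml")
```

The only thing that distinguishes this file from a normal annotation is the absence of `<object>` elements, so training pipelines treat the whole image as background.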

After you are done collecting and labeling your image dataset, you can start the next step: converting it to TFRecord files (if you train with TensorFlow). For that, you can have a look at the examples in the links I share below.

Sometimes you learn a lot from the comments on a repo’s GitHub issue pages, and some of the most valuable comments I found there were about background images.

Background images can decrease both FPs and FNs in your object detection/classification project. If you are a newbie in deep learning, computer vision, or machine learning, do not limit yourself to tutorials, books, and articles. If you encounter a problem or an issue in your project, google it and try to find a solution on the issue pages of projects on GitHub. Most likely you are not the first one to hit this kind of issue; people have probably already posted issues about it or asked questions on Stack Overflow, and it is very likely that you will find the answer.

Conclusion:

If you find an answer or a solution to some rare issue or problem, you can help others find it much faster by writing a single article that explains the answer you found.

Thanks for reading! If you enjoyed this article, please hit the clap button 👏 as many times as you can. It would mean a lot and encourage me to keep writing stories like this. Let’s connect on Twitter!🐦 Cheers!
