Real-time semantic image segmentation with DeepLab in TensorFlow
A couple of hours ago, I came across a new post on the Google Research blog. This time the topic was semantic segmentation of images, a computer vision task that consists of assigning a semantic label to every pixel in an image. You can refer to the paper for an in-depth explanation of the new version of the algorithm they used (DeepLab-v3+).
Semantic segmentation is a more demanding task than image classification, where an image contains a single object that must be assigned to a category, and than object detection and recognition, where an arbitrary number of objects can be present in an image and the goal is to locate each one (with a bounding box) and classify it.
The problem of semantic segmentation can be thought of as a much harder object detection and classification task, where the bounding box is no longer a box but an irregular shape that should match the real outline of the object being detected. Labeling every pixel that belongs to an object is fundamental for many applications, such as autonomous cars.
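To make the difference between a bounding box and a per-pixel label concrete, here is a minimal sketch using NumPy. The 6x6 label map and the bounding_box helper are made up for illustration and are not part of DeepLab; the point is only that the mask covers fewer pixels than the box that encloses it.

```python
import numpy as np

# Hypothetical 6x6 label map: 0 = background, 1 = a single object class.
label_map = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
])


def bounding_box(mask):
    """Return (top, left, bottom, right) of the smallest box enclosing the mask."""
    rows = np.where(np.any(mask, axis=1))[0]
    cols = np.where(np.any(mask, axis=0))[0]
    return int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1])


# Segmentation output: one boolean per pixel.
object_mask = label_map == 1

# Detection-style output: a single rectangle around the same object.
top, left, bottom, right = bounding_box(object_mask)
box_area = (bottom - top + 1) * (right - left + 1)
mask_area = int(object_mask.sum())

print("box:", (top, left, bottom, right), "box pixels:", box_area)
print("mask pixels:", mask_area)
```

The box claims every pixel inside the rectangle, while the mask only claims the pixels that actually belong to the object.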
In this post, I will share some code so you can play around with the latest version of DeepLab (DeepLab-v3+) using your webcam in real time. All my code is based on the excellent code published by the authors of the paper. I will also share the authors’ notebook ported to Python 3 (the original is for Python 2), to save you time in case you don’t have TensorFlow and all its dependencies installed for Python 2.
But first, a quick example of what I’m talking about:
P.S. Don’t worry, I’m not choking, I just forgot to change the sneaky BGR in OpenCV to RGB.
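For reference, the channel swap I forgot can be sketched as below. OpenCV delivers frames in BGR order, so reversing the last axis gives RGB. This version uses plain NumPy so it runs without a camera; the bgr_to_rgb name is my own, and cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) should give the same result if you prefer the OpenCV call.

```python
import numpy as np


def bgr_to_rgb(frame):
    """Reverse the channel axis of an H x W x 3 frame (BGR -> RGB)."""
    # The copy makes the result contiguous, which some libraries expect.
    return frame[..., ::-1].copy()


# A single pure-blue pixel as OpenCV would store it (B=255, G=0, R=0).
bgr_pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)
rgb_pixel = bgr_to_rgb(bgr_pixel)

print(rgb_pixel.tolist())
```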
To run my code, you just need to follow the instructions on the GitHub page of the project, where the authors have already prepared an off-the-shelf Jupyter notebook that runs the algorithm on images. The only extra dependency I use is OpenCV and, optionally, scikit-video in case you also want to save the video.
Copy the following snippet into a Jupyter notebook cell inside the deeplab directory (which you should have cloned beforehand) and just run it! Now you can see yourself and a real-time segmentation of everything captured by your webcam (of course, only the object classes that the network was trained on will be segmented).
If you get an error, you probably need to change the line final = np.zeros((1, 384, 1026, 3)) to match your camera resolution. The shape to use there is that of color_and_mask.
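One way to avoid hard-coding the resolution is to derive the buffer shape from the array itself. The sketch below assumes that color_and_mask is the resized frame and the colored mask stacked side by side, which is consistent with the 384 x 1026 shape above (two 384 x 513 images) but is my reading rather than something stated in the script. The allocate_video_buffer name is hypothetical.

```python
import numpy as np


def allocate_video_buffer(color_and_mask):
    """Create an empty 1-frame buffer matching the shape of color_and_mask."""
    return np.zeros((1,) + color_and_mask.shape, dtype=color_and_mask.dtype)


# Stand-ins for the resized webcam frame and the colored segmentation map.
# Assumption: both are 384 x 513 x 3, as implied by the hard-coded shape.
frame = np.zeros((384, 513, 3), dtype=np.uint8)
seg_colors = np.zeros((384, 513, 3), dtype=np.uint8)

# Side-by-side image, then a buffer with a leading frame axis.
color_and_mask = np.hstack((frame, seg_colors))
final = allocate_video_buffer(color_and_mask)

print(final.shape)
```

With this approach the buffer follows whatever resolution your camera produces, so the line no longer needs editing by hand.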
Every time you run the code, a new copy of the model (approximately 350 MB) will be downloaded. If you want to avoid that, change the line model = DeepLabModel(download_path) so that it points to a local path where you stored the downloaded model.
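If you would rather not edit the path by hand, a download-once helper is one option. This is only a sketch: ensure_local_copy is a hypothetical name, and the URL and target path you pass in are whatever your copy of the script uses. The fetch parameter exists so that the function can be exercised without network access.

```python
import os
import urllib.request


def ensure_local_copy(url, local_path, fetch=urllib.request.urlretrieve):
    """Download url to local_path only if the file is not already there."""
    if not os.path.exists(local_path):
        parent = os.path.dirname(os.path.abspath(local_path))
        os.makedirs(parent, exist_ok=True)
        # Assumption: fetch(url, path) writes the file to path.
        fetch(url, local_path)
    return local_path
```

You could then pass the returned path to the model, for example model = DeepLabModel(ensure_local_copy(model_url, "models/deeplab_model.tar.gz")), where model_url and the file name are placeholders.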
This is the code to run DeepLab-v3+ on your webcam:
And this is the code to run DeepLab-v3+ on images using Python 3:
Have fun segmenting!
EDIT (May 14, 2020): I uploaded a new gist called deeplab_demo_webcam_v2.py that allows you to run the script as a regular Python module (without copy-pasting the code into a Jupyter notebook). Since the script still uses some helper functions to handle the colors, you can either save deeplab_demo_webcam_v2.py into tensorflow/models/research/deeplab and run it from there or, even better, run it from anywhere, as long as the file get_dataset_colormap.py is located in the same directory as deeplab_demo_webcam_v2.py. That file can be found at tensorflow/models/research/deeplab/utils/get_dataset_colormap.py.
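If you want to double-check the layout before launching, a small sanity check along these lines can help. It is only a sketch: helper_is_next_to is a hypothetical function of mine, not something shipped with the gist, and it does nothing more than look for the helper file beside the script.

```python
import os

HELPER_NAME = "get_dataset_colormap.py"


def helper_is_next_to(script_path, helper_name=HELPER_NAME):
    """Return True if helper_name sits in the same directory as script_path."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.isfile(os.path.join(script_dir, helper_name))
```

For example, helper_is_next_to("deeplab_demo_webcam_v2.py") should return True when both files are in your current directory.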
Once you have that set up, simply open a terminal and run the following command:
python deeplab_demo_webcam_v2.py
References:
@article{deeplabv3plus2018, title={Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation}, author={Liang-Chieh Chen and Yukun Zhu and George Papandreou and Florian Schroff and Hartwig Adam}, journal={arXiv:1802.02611}, year={2018} }