The idea behind using semantic segmentation is to recognize and understand what's in the image at pixel-level granularity. This is currently at topic of extensive deep-learning and Computer Vision research and has shown state-of-the-art results in various image-analytics problems. This talk will explore Fully Convolutional Networks (FCNs) for purpose of semantic segmentation.
Semantic Segmentation networks (SSNs) output an entire image with pixel-level predictions which immensely improve the Decision-Making capabilities of the Computer Vision system at hand. The goal is to output an image that is the same size as the original input image, and roughly resembles the original input image, but in which each pixel in the image is colored one of C colors, where C is the number of classes we are segmenting.
For a brain image scan, this could be as simple as C=2 (“tumor”, or “not tumor”). Or C could capture a much richer class set.
Standard Image Classifiers based on Convolutional Neural Networks output only a single labelled prediction. They also include a few fully connected layers after the convolutional layers which help them in detecting global shapes and structures. Fully Convolutional Networks on the other hand output pixel level predictions and they don't have any fully connected layers. They only involve convolutional layers , and hence the name.
So the final output layer will be the same height and width as the input image, but the number of channels will be equal to the number of classes. If we’re classifying each pixel as one of fifteen different classes, then the final output layer will be height x width x 10 classes. Using a softmax probability function, we can find the most likely class for each pixel.
Most FCNs involve some kind of encoder-decoder architecture and talk will cover most of them like SegNet, Unet etc. We'll go over how SSNs solve the loss of spatial information during the initial downsampling in the network.
We'll go over an implementation of FCNs in Python and Tensorflow and apply them to the complex problem of Urban Scene Understanding.