In recent years, several meta-algorithms in deep learning have made it possible to use computer vision in a wide range of applications. In object detection, semantic segmentation and instance segmentation in particular, algorithms such as Faster R-CNN, SSD, YOLO and Mask R-CNN have improved steadily. When should each be applied, and how can they be leveraged in real-world applications?
Object detection is about identifying instances of objects of a certain class (such as person, traffic light or car) and localizing each one with a bounding box. Image segmentation is the process of assigning a label to every pixel in an image so that the boundaries of objects are delineated. Object detection and segmentation have a wide range of applications, including medical imaging, smart video surveillance, satellite imagery, logo detection, product placement, self-driving cars and personalized experiences, to name a few.
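To make the difference between the two outputs concrete, here is a minimal sketch in plain Python. It assumes a segmentation result is stored as a 2D grid of integer class labels, one per pixel, and derives the tight bounding box that a detector would report for one label. The function name `mask_to_box` and the toy mask are hypothetical and used only for illustration.

```python
def mask_to_box(mask, label):
    """Return the tight box (x_min, y_min, x_max, y_max) around all pixels
    equal to `label`, or None if the label does not occur in the mask."""
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value == label:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


# Toy 3x4 segmentation mask: 0 = background, 1 = object (assumed labels).
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]

print(mask_to_box(mask, 1))  # (1, 1, 2, 2)
print(mask_to_box(mask, 2))  # None
```

The mask keeps the exact shape of the object, while the box keeps only its extent. This is why segmentation is the more detailed, and usually the more expensive, of the two tasks.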
Selective Search is a hierarchical grouping algorithm for finding region proposals. Earlier methods like R-CNN use Selective Search to find the proposals for object detection. Faster R-CNN uses a convolutional network such as VGGNet, or a deeper residual network such as ResNet, as the feature extractor. It also uses an RPN (Region Proposal Network), which outputs the object proposals. The RPN uses anchors to produce an "objectness score" for binary foreground/background classification, together with a bounding box regression. Quantized ROI Pooling downsamples the features before they are fed into R-CNN for multi-class classification and bounding box regression. Mask R-CNN extends Faster R-CNN to provide instance segmentation. Instead of ROI Pooling it uses ROI Align, which preserves pixel alignment and avoids information loss. YOLO and SSD (Single Shot Detector) are built on the idea of treating object detection as a regression problem, and they operate without the region proposals that Faster R-CNN relies on.
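The difference between quantized ROI Pooling and ROI Align can be illustrated at a single sampling point. The sketch below is a simplified illustration in plain Python, not a full implementation of either layer: it assumes a tiny 2D feature map stored as a list of lists, and the helper names `quantized_sample` and `bilinear_sample` are hypothetical. Quantization snaps a fractional coordinate to an integer cell, while ROI Align style sampling interpolates bilinearly between the four neighbouring cells.

```python
import math


def quantized_sample(feature, y, x):
    """Mimic the coordinate rounding of ROI Pooling: snap to the cell at floor(y), floor(x)."""
    return feature[int(math.floor(y))][int(math.floor(x))]


def bilinear_sample(feature, y, x):
    """Bilinearly interpolate the feature map at a fractional (y, x), as ROI Align does."""
    h, w = len(feature), len(feature[0])
    # Clamp the coordinate to the valid range of the feature map.
    y = min(max(y, 0.0), h - 1)
    x = min(max(x, 0.0), w - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


# Toy 4x4 feature map whose value grows linearly: feature[y][x] = x + 10 * y.
feature = [[x + 10 * y for x in range(4)] for y in range(4)]

print(quantized_sample(feature, 1.5, 2.5))  # 12
print(bilinear_sample(feature, 1.5, 2.5))   # 17.5
```

On this linear toy map the interpolated value matches the underlying function at (1.5, 2.5), whereas the quantized value is shifted by half a cell in each direction. A misalignment of this kind matters little for box classification but is significant when predicting pixel-level masks.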
In this talk, I will give you an intuition for how deep learning is applied to object detection and segmentation. We will apply different models, such as Faster R-CNN, Mask R-CNN, SSD and YOLO, to a new dataset and evaluate the outcome. The focus will be on how to put it all together: getting the right data, identifying pre-trained models for transfer learning, picking the right meta-algorithms for object detection and segmentation, and seeing it all in action. This talk will not cover the basics of how Convolutional Neural Networks work. It will provide a high-level overview of object detection algorithms, with the emphasis on how to apply them.