Saturday 4:00 PM–4:45 PM in Speakeasy

Transfer Learning and Finetuning Deep Convolution Neural Network on different domain-specific images

Anusua Trivedi

Audience level:
Experienced

Description

We propose a method to apply a pre-trained deep convolutional neural network (DCNN) to images to improve prediction accuracy. We use a pre-trained DCNN on two very different domain-specific datasets and apply fine-tuning to transfer the learned features to the prediction task. Our approach improves prediction accuracy on both domain-specific datasets compared to state-of-the-art approaches.

Abstract

ABSTRACT: In this talk, we propose prediction techniques using deep learning on two different types of image datasets: medical images and fashion images. We show how to build a generic deep learning model that can be used with:
1. A fluorescein angiographic eye image, to predict diabetic retinopathy
2. A fashion image, to predict the clothing type in that image
We propose a method to apply a pre-trained deep convolutional neural network (DCNN) to images to improve prediction accuracy. We use an ImageNet pre-trained DCNN and apply fine-tuning to transfer the learned features to the prediction task. We use this fine-tuned model on two very different domain-specific datasets. Our approach improves prediction accuracy on both domain-specific datasets compared to state-of-the-art machine learning approaches.

TALK OUTLINE:

1. Brief introduction to Deep Learning

2. Motivation behind using Deep Learning models for images: Much work has gone into developing state-of-the-art machine learning algorithms and morphological image processing techniques that explicitly extract features prevalent in images. The generic workflow of a standard image classification technique is as follows:
• Image preprocessing for noise removal and contrast enhancement
• Feature extraction
• Classification/prediction
However, explicit feature extraction is complex, time-consuming, and labor-intensive, and further improvements in prediction accuracy require large quantities of labeled data. We therefore automate the image processing and feature extraction steps by using DCNNs.

3. Transfer Learning & Fine-tuning DCNNs: Recent research has demonstrated that DCNNs are very effective at automatically analyzing large collections of images and identifying features that categorize images with minimal error. DCNNs are rarely trained from scratch, because it is relatively uncommon to have a domain-specific dataset of sufficient size. Since modern DCNNs take 2-3 weeks to train across GPUs, the Berkeley Vision and Learning Center (BVLC) has released some final DCNN checkpoints. In this work, we use an ImageNet pre-trained DCNN, GoogLeNet. GoogLeNet, which was developed at Google, won the ImageNet challenge in 2014, setting the record for the best contemporaneous results. The motivation for this model was an architecture that is both deeper and computationally inexpensive.

4. Deep Learning models for Image Classification: Diabetic retinopathy (DR) is an eye disease and a common cause of vision loss. Screening diabetic patients for DR symptoms in fluorescein angiography (FA) images can potentially reduce the risk of blindness.
Data was drawn from a dataset maintained by EyePACS and provided via Kaggle. The dataset is composed of multiple smaller datasets of fundus photographs drawn from various sources. We fine-tune the pre-trained generic DCNN to recognize fluorescein angiography images of eyes and improve DR prediction. Our approach is an end-to-end learning strategy with minimal assumptions about the contents of images. We show that our approach improves DR prediction accuracy over the results produced by a support vector machine (SVM) approach.

5. Deep Learning model for Image Tag Prediction: Given the role of clothing apparel in society, fashion classification has many applications. We focus on optimizing fashion classification for the purposes of annotating fashion images and predicting their clothing tags. In this work, we apply a pre-trained convolutional neural network (CNN) and a long short-term memory (LSTM) recurrent neural network (RNN) to Apparel Classification with Style (ACS) images to improve Fashion Image Tag (FIT) prediction. We combine a DCNN for fashion image classification with an RNN for sequence modeling to create a single network that generates clothing tags for images. We fine-tune the same ImageNet pre-trained GoogLeNet model to extract ACS image features. These CNN features are in turn used as input to the LSTM RNN model to generate tags for images. The RNN is trained in the context of this single "end-to-end" network. Our approach improves FIT prediction accuracy compared to state-of-the-art approaches.

6. Software: All of the software used in this work is open source.
I. Theano/Lasagne: Theano is a Python library that lets us write symbolic code and compiles it for different architectures (in particular, CPU and GPU). It allows us to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.
It was developed by machine learning researchers at the University of Montreal, and it is efficient for deep learning workloads. Lasagne is a lightweight library for building and training neural networks in Theano.
II. Pre-trained model weights: Many researchers and engineers have made Caffe models for different tasks with various DCNN architectures and data. These models have been trained and applied to problems ranging from simple regression, to large-scale visual classification, to speech and robotics applications. Several pre-trained model weights are shared by the Berkeley Vision and Learning Center (BVLC) via the model zoo framework. Lasagne has implementations of both the ImageNet-trained 16-layer VGG and the ImageNet-trained GoogLeNet. The weights for both models are stored as pickle files. We leverage these pre-trained models for DR prediction, using their weights as the starting point for fine-tuning.
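
As a rough illustration of using pickled pre-trained weights as the starting point for fine-tuning, the sketch below uses only the Python standard library. It is not the Lasagne code used in this work: the dictionary layout, the layer names ("conv1", "classifier"), the toy sizes, and the helper init_from_pretrained are all hypothetical. The idea it shows is the usual one: reuse every pre-trained layer and re-initialise only the final classifier for the new label set.

```python
import io
import pickle
import random


def init_from_pretrained(pretrained, new_num_classes, head="classifier", seed=0):
    """Reuse every pre-trained layer except the classifier head, which is
    re-initialised with small random weights sized for the new label set."""
    rng = random.Random(seed)
    # Keep all layers other than the head exactly as loaded.
    params = {name: value for name, value in pretrained.items() if name != head}
    n_features = len(pretrained[head]["W"])  # one row per input feature
    params[head] = {
        "W": [[rng.gauss(0.0, 0.01) for _ in range(new_num_classes)]
              for _ in range(n_features)],
        "b": [0.0] * new_num_classes,
    }
    return params


# Stand-in for a downloaded weights pickle (hypothetical names and toy sizes).
toy_weights = {
    "conv1": {"W": [[0.5, -0.2], [0.1, 0.3]], "b": [0.0, 0.0]},
    "classifier": {"W": [[0.1] * 1000, [0.2] * 1000], "b": [0.0] * 1000},
}
buffer = io.BytesIO()
pickle.dump(toy_weights, buffer)
buffer.seek(0)

pretrained = pickle.load(buffer)  # in practice: pickle.load(open(path, "rb"))
# new_num_classes=5 is an arbitrary illustrative value.
finetune_init = init_from_pretrained(pretrained, new_num_classes=5)
```

After this step, training would continue on the domain-specific images, typically with a small learning rate so that the reused layers change only slightly.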

III. GPU, CUDA and cuDNN: Traditional machine learning uses hand-crafted feature extraction and modality-specific machine learning algorithms to label images or recognize voices. However, this method is computationally very expensive. Advanced deep neural networks use algorithms, big data, and the computational power of graphics processing units (GPUs) to change this dynamic. CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA and implemented by the GPUs that it produces. The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations of standard routines such as forward and backward convolution, pooling, normalization, and activation layers. Here, we use one NVIDIA Quadro K1200 GPU, CUDA 7.5, and cuDNN 5.

7. Outcome: We show how deep learning outperforms state-of-the-art image prediction techniques. Our prediction accuracy for each model is listed below:
1. Medical-Image-Classification-Model
a. Feature-based SVM accuracy: 0.66
b. Our fine-tuned GoogLeNet DCNN accuracy: 0.79
2. Fashion-Image-Classification-Model
a. Feature-based SVM accuracy: 0.72
b. Our fine-tuned GoogLeNet DCNN accuracy: 0.93

AUDIENCE: [Advanced Talk], [Machine Learning], [Deep Learning], [Data Science], [Image Classification], [Image Tag prediction], [Healthcare], [Fashion]
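
To put the accuracy figures listed under Outcome side by side, the short sketch below computes the absolute and relative gain of the fine-tuned DCNN over the SVM baseline. The four accuracies are the ones reported above; the dictionary keys and the helper name improvement are illustrative only.

```python
# Accuracies as reported in the outcome list.
results = {
    "medical": {"svm": 0.66, "dcnn": 0.79},
    "fashion": {"svm": 0.72, "dcnn": 0.93},
}


def improvement(svm, dcnn):
    """Return (absolute accuracy gain, gain relative to the SVM baseline)."""
    absolute = dcnn - svm
    return absolute, absolute / svm


gains = {task: improvement(r["svm"], r["dcnn"]) for task, r in results.items()}
for task, (absolute, relative) in gains.items():
    print(f"{task}: +{absolute:.2f} absolute, +{relative:.1%} relative")
# medical: +0.13 absolute, +19.7% relative
# fashion: +0.21 absolute, +29.2% relative
```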