Thursday, August 4, 2016

Caffe Deep Learning: Dog Breed AlexNet Model

This post provides a Caffe [8] deep learning [5] model for identifying dog breeds. The model is a convolutional neural network [4].

Data and Previous Work:

Collecting image data is often difficult. Dog breed images, collected from ImageNet [2], have been made available by the Stanford Vision Lab [1]. There has been another effort [3] to create a Caffe model for dog breed identification; I searched for that model but could not find it. In addition, the following model uses transfer learning [9] from an AlexNet [6] model.

Transformations were performed on each image to create multiple images. Transformations included:
two rotations of each image, negative, blur, edge detection, bit rate reduction, etc. Convolution layers can perform some of these operations themselves, but I created the transformed images explicitly to augment the input data with as many transformations as possible.
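As a rough illustration of the transformations listed above, here is a minimal sketch in plain Python that works on a tiny grayscale image stored as a list of rows. The function names, the rotation angles (90 and 270 degrees), the 3x3 blur, and the 4-bit reduction are all my assumptions for the example; the post does not state the exact parameters, and edge detection is left out for brevity.

```python
# Illustrative augmentation sketch; names and parameters are assumptions,
# not the pipeline that produced the data set in this post.

def negative(img):
    """Invert intensities: 0 becomes 255 and 255 becomes 0."""
    return [[255 - p for p in row] for row in img]

def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def reduce_bits(img, bits):
    """Keep only the top `bits` bits of each 8-bit pixel."""
    shift = 8 - bits
    return [[(p >> shift) << shift for p in row] for row in img]

def box_blur(img):
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(vals) // len(vals))
        out.append(row)
    return out

def augment(img):
    """Return the original image plus one variant per transformation."""
    return {
        "original": img,
        "rot90": rotate90(img),
        "rot270": rotate90(rotate90(rotate90(img))),
        "negative": negative(img),
        "blur": box_blur(img),
        "bits4": reduce_bits(img, 4),
    }

tiny = [[0, 64], [128, 255]]
variants = augment(tiny)
for name, image in variants.items():
    print(name, image)
```

Each input image yields several output images, which is how a data set can grow by a large factor without collecting new photos.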

Data Statistics:

Images were annotated, in color, and stretched to 256 x 256. There were 12,345 images in the training dataset, 4,111 images in the validation dataset, and 4,124 images in the testing dataset.

The transformed data set was also color 256 x 256, with 123,429 images for training, 41,147 images for validation, and 41,144 images for testing.
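The counts above can be checked with a few lines of arithmetic. The sketch below uses only the numbers reported in this post; the variable names are mine. It shows that the split is roughly 60/20/20 and that the transformations grew each split by about a factor of ten.

```python
# Image counts as reported in this post.
original = {"train": 12345, "val": 4111, "test": 4124}
transformed = {"train": 123429, "val": 41147, "test": 41144}

total_original = sum(original.values())
total_transformed = sum(transformed.values())

for split in original:
    share = original[split] / total_original          # fraction of the data set
    growth = transformed[split] / original[split]     # size increase per split
    print(f"{split}: {share:.1%} of original data, "
          f"{growth:.2f}x after transformations")

print(f"total: {total_original} -> {total_transformed} "
      f"({total_transformed / total_original:.2f}x)")
```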

Training Mechanism:

The models were trained using Nvidia Digits [7]. Transfer learning [9] from the AlexNet model was used, training was performed with Nesterov's momentum [10], and exponential decay was used for the learning rate. The model's best accuracy without overfitting was around 60%. This is a ~7-8% improvement over the previous work [3], mostly due to transfer learning.
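To make the two optimizer settings concrete, here is a small sketch of Nesterov momentum combined with an exponentially decaying learning rate, applied to a toy one-dimensional problem. It follows the look-ahead formulation described in [10]. The hyperparameter values (base_lr, gamma, momentum, steps) are made up for the example and are not the values used to train the dog breed models.

```python
def exp_decay(base_lr, gamma, step):
    """Exponentially decayed learning rate: base_lr * gamma**step."""
    return base_lr * gamma ** step

def nesterov_minimize(grad, w0, base_lr=0.1, gamma=0.99, momentum=0.9, steps=200):
    """Minimise a 1-D function using Nesterov momentum.

    The gradient is evaluated at the look-ahead point w + momentum * v
    instead of at w, which is what separates it from classical momentum.
    """
    w, v = w0, 0.0
    for t in range(steps):
        lr = exp_decay(base_lr, gamma, t)
        lookahead = w + momentum * v
        v = momentum * v - lr * grad(lookahead)
        w = w + v
    return w

# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3); the minimum is at w = 3.
w_final = nesterov_minimize(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_final)
```

The decay makes early steps large and later steps small, so the weights move quickly at first and settle down as training proceeds.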

Increasing the data set size through transformations produced a much larger data set for training. The training time increased 10-fold, but the total accuracy of the model increased to 83%.

Training without transformations

Training with transformations


I am using the 11th iteration of the model generated without transformed images; beyond that point the model starts to overfit the data. For checking the results, I used a notebook that I created and made available at:

Instructions for working with an IPython Notebook are available in the post:

An image of a golden retriever is identified correctly with 99% confidence by the model generated without transformations. The model with transformations, however, identifies it as golden retriever with 60% confidence and English setter with 40%.

Another image, of a German Shepherd from Wikipedia (available in the public domain), was used as input, and the results from different layers are displayed here.

Output at Convolution 1
Output at Convolution 2
Output at Convolution 3
Output at Convolution 4
Output at Convolution 5
Output Probabilities
Top 3 Predicted with
n02106662-German shepherd: 100.0%
n02105162-malinois: 5.09461983711e-06%
n02091467-Norwegian elkhound: 1.37200339978e-07%
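The near-zero percentages in the listing above are typical of a softmax output layer: when one class score is much larger than the rest, almost all of the probability mass goes to that class. The following sketch shows how a top-3 listing like this can be produced. The labels are taken from this post, but the raw scores are invented for illustration, and the helper names are mine.

```python
import math

def softmax(scores):
    """Convert raw class scores to probabilities that sum to 1."""
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(labels, probs, k=3):
    """Return the k most probable (label, percent) pairs, highest first."""
    ranked = sorted(zip(labels, probs), key=lambda lp: lp[1], reverse=True)
    return [(label, 100.0 * p) for label, p in ranked[:k]]

# Labels come from this post; the raw scores are hypothetical.
labels = ["n02106662-German shepherd", "n02105162-malinois",
          "n02091467-Norwegian elkhound", "n02091134-whippet"]
scores = [25.0, 8.0, 4.5, 1.0]

probs = softmax(scores)
top = top_k(labels, probs)
for label, pct in top:
    print(f"{label}: {pct}%")
```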

Some more samples of dogs from Wikipedia.

Correct Breed: Affenpinscher.
n02110627-affenpinscher: 99.7052073479%
n02113712-miniature poodle: 0.199705199338%
n02091635-otterhound: 0.0273911107797%

Rampur Greyhound was not in the dataset.
n02091134-whippet: 58.7792038918%
n02092002-Scottish deerhound: 22.4151656032%
n02113978-Mexican hairless: 9.97444465756%

Correct Breed: Borzoi.
n02090622-borzoi: 100.0%
n02091635-otterhound: 2.82948864339e-06%
n02100735-English setter: 2.56475923832e-08%

Correct Breed: Bloodhound. Misclassified as Rhodesian ridgeback; the correct breed is ranked second with only about 2% confidence.
n02087394-Rhodesian ridgeback: 97.7928042412%
n02088466-bloodhound: 2.07519363612%
n02100583-vizsla: 0.105287262704%
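The samples above can be summarized by counting how often the correct breed is the first prediction (top-1) and how often it appears anywhere in the top three (top-3). The sketch below uses only the predictions listed in this post; the helper name is mine. Four images are far too few to say anything about overall accuracy, so treat this as a worked example of the two measures rather than an evaluation.

```python
# Top-3 predictions copied from the samples in this post, keyed by true breed.
# Rampur Greyhound is left out because the breed is not in the data set.
samples = {
    "German shepherd": ["German shepherd", "malinois", "Norwegian elkhound"],
    "affenpinscher": ["affenpinscher", "miniature poodle", "otterhound"],
    "borzoi": ["borzoi", "otterhound", "English setter"],
    "bloodhound": ["Rhodesian ridgeback", "bloodhound", "vizsla"],
}

def top_k_hits(samples, k):
    """Count samples whose true breed is among the first k predictions."""
    return sum(truth in preds[:k] for truth, preds in samples.items())

top1 = top_k_hits(samples, 1)
top3 = top_k_hits(samples, 3)
print(f"top-1: {top1}/{len(samples)}, top-3: {top3}/{len(samples)}")
# prints "top-1: 3/4, top-3: 4/4"
```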

Download the model:

Download and play with the models. When using these models, cite the author:

Model without transformed images:

Model with transformed images:

Future Work:

A possible direction for future work is to check whether other algorithms improve the classification accuracy.


[1] Aditya Khosla, Nityananda Jayadevaprakash, Bangpeng Yao and Li Fei-Fei. Novel dataset for Fine-Grained Image Categorization. First Workshop on Fine-Grained Visual Categorization (FGVC), IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
[2] Olga Russakovsky*, Jia Deng*, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg and Li Fei-Fei. (* = equal contribution) ImageNet Large Scale Visual Recognition Challenge. IJCV, 2015.

[3] Hsu, David. Using Convolutional Neural Networks to Classify Dog Breeds. Stanford University.

[4] Convolutional Neural Networks.

[5] Deep Learning.

[6] Krizhevsky, A., Sutskever, I. and Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012: Neural Information Processing Systems, Lake Tahoe, Nevada.

[7] Nvidia Digits Framework.

[8] Caffe Framework.

[9] Jason Yosinski, Jeff Clune, Yoshua Bengio, Hod Lipson. How transferable are features in deep neural networks? NIPS 2014: 3320-3328.
[10] I Sutskever, J Martens, GE Dahl, GE Hinton. On the importance of initialization and momentum in deep learning. ICML (3) 28, 1139-1147.