This post provides a Caffe [8] deep learning [5] model for identifying dog breeds. The model is a convolutional neural network [4].
Collecting image data is often difficult. The Stanford Vision Lab has collected dog breed images from ImageNet [2] and made them available [1]. There has been another effort [3] to create a Caffe model for dog breed identification; I searched for that model but could not find it. The model presented here also uses transfer learning [9] from an AlexNet [6] model.
[1] Aditya Khosla, Nityananda Jayadevaprakash, Bangpeng Yao and Li Fei-Fei. Novel dataset for Fine-Grained Image Categorization. First Workshop on Fine-Grained Visual Categorization (FGVC), IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
Data and Previous Work:
Each image was transformed to create multiple images. The transformations included two rotations of the image, negative, blur, edge detection, bit-rate reduction, etc. The network performs some of these operations itself in its convolution stages, but I generated them explicitly to augment the input data with as many transformations as possible.
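The transformations listed above can be sketched without any dependencies. This is only an illustration under stated assumptions, not the script used for this post: it works on a grayscale image stored as a list of rows, and the rotation angles, the 3x3 window, the max-minus-min edge measure and the 4-bit depth are all choices made here for the example. A real pipeline would more likely use an imaging library on the color images.

```python
# Dependency-free sketch of the augmentations named in the post, applied to a
# grayscale image stored as a list of rows with pixel values in 0..255.
# Angles, window size and bit depth are assumptions made for illustration.

def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def rotate180(img):
    """Rotate the image 180 degrees."""
    return [row[::-1] for row in img[::-1]]

def negative(img):
    """Invert every pixel."""
    return [[255 - p for p in row] for row in img]

def reduce_bits(img, bits=4):
    """Keep only the top `bits` bits of each 8-bit pixel."""
    shift = 8 - bits
    return [[(p >> shift) << shift for p in row] for row in img]

def _window(img, y, x):
    """Pixel values in the 3x3 window around (y, x), clipped at the borders."""
    h, w = len(img), len(img[0])
    return [img[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]

def _map_windows(img, fn):
    """Apply `fn` to the 3x3 window of every pixel."""
    return [[fn(_window(img, y, x)) for x in range(len(img[0]))]
            for y in range(len(img))]

def blur(img):
    """3x3 box blur: integer mean of each window."""
    return _map_windows(img, lambda w: sum(w) // len(w))

def edges(img):
    """Crude edge map: max minus min of each window (a simple stand-in
    for a proper edge detector)."""
    return _map_windows(img, lambda w: max(w) - min(w))

def augment(img):
    """Return every transformed variant of one input image."""
    return {
        "rotate90": rotate90(img),
        "rotate180": rotate180(img),
        "negative": negative(img),
        "blur": blur(img),
        "edges": edges(img),
        "reduced_bits": reduce_bits(img),
    }

if __name__ == "__main__":
    sample = [[0, 0, 255], [0, 0, 255]]
    for name, variant in augment(sample).items():
        print(name, variant)
```

Each input image yields one output per transformation, which is how a data set can grow several-fold from the same source images.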
Data Statistics:
The images were annotated, in color, and stretched to 256 x 256. The original data set had 12,345 training images, 4,111 validation images and 4,124 test images. The transformed data set was also 256 x 256 color, with 123,429 training images, 41,147 validation images and 41,144 test images.
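As a quick check on these counts, the short sketch below computes the totals and the share of each split. It uses only the numbers quoted above; the variable and function names are made up for the example. Both data sets come out at roughly a 60/20/20 split, and the transformed set is about ten times the size of the original.

```python
# Image counts quoted in the post.
original = {"train": 12345, "val": 4111, "test": 4124}
transformed = {"train": 123429, "val": 41147, "test": 41144}

def summarize(counts):
    """Return the total image count and each split's share in percent."""
    total = sum(counts.values())
    shares = {name: round(100.0 * n / total, 1) for name, n in counts.items()}
    return total, shares

orig_total, orig_shares = summarize(original)
aug_total, aug_shares = summarize(transformed)
growth = aug_total / orig_total  # how many times larger the transformed set is

print("original:   ", orig_total, orig_shares)
print("transformed:", aug_total, aug_shares)
print("growth factor:", round(growth, 2))
```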
Training Mechanism:
The model was trained using NVIDIA DIGITS [7]. Transfer learning [9] was applied starting from an AlexNet model, and training used Nesterov's momentum [10] with an exponentially decaying learning rate. Without the transformed images, the model's accuracy reached at best around 60% without overfitting. This is a ~7-8% improvement over the previous work [3], mostly due to transfer learning.
Augmenting the data set with transformations produced a much larger training set. Training time increased tenfold, but the model's accuracy increased to 83%.
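For readers unfamiliar with the two optimizer settings mentioned above, here is a minimal sketch of what they do, run on a toy one-dimensional objective rather than on the network. DIGITS and Caffe handle this internally; the base learning rate, decay factor and momentum below are assumptions chosen for the example, since the post does not state the values used. The decay follows the form of Caffe's "exp" learning rate policy, base_lr * gamma^iter.

```python
def exp_decay_lr(base_lr, gamma, iteration):
    """Exponentially decayed learning rate: base_lr * gamma ** iteration."""
    return base_lr * gamma ** iteration

def nesterov_step(theta, velocity, grad_fn, lr, momentum):
    """One Nesterov momentum update.

    The gradient is evaluated at the look-ahead point theta + momentum * velocity
    rather than at theta itself, which is what distinguishes it from plain momentum.
    """
    lookahead = theta + momentum * velocity
    velocity = momentum * velocity - lr * grad_fn(lookahead)
    return theta + velocity, velocity

def grad(theta):
    """Gradient of the toy objective f(theta) = (theta - 3)^2."""
    return 2.0 * (theta - 3.0)

# Hypothetical hyperparameters, not the ones used to train the model.
theta, velocity = 0.0, 0.0
for it in range(200):
    lr = exp_decay_lr(base_lr=0.1, gamma=0.99, iteration=it)
    theta, velocity = nesterov_step(theta, velocity, grad, lr, momentum=0.9)

print("theta after 200 steps:", theta)  # approaches the minimum at 3
```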
Figures: training without transformations; training with transformations.
Results:
I am using the 11th iteration of the model generated without transformed images; after that point the model starts to overfit the data. To check the results, I used a notebook that I created and made available at: https://github.com/singarajus/caffepythonnotebook.
Instructions for working with the IPython Notebook are available in the post: http://gautamsingaraju.blogspot.com/2016/08/caffe-deep-learning-ipython-notebook.html.
An image of a golden retriever is identified correctly with 99% confidence by the model generated without transformations. The model generated with transformations, however, classifies it as golden retriever with 60% and English setter with 40%.
Another image, of a German shepherd from Wikipedia and in the public domain, was given as input; the outputs of the different layers are displayed here.
Figures: input image; outputs at Convolution 1, Convolution 2, Convolution 3, Convolution 4 and Convolution 5; output probabilities.
Top 3 predictions with probabilities:
n02106662-German shepherd: 100.0%
n02105162-malinois: 5.09461983711e-06%
n02091467-Norwegian elkhound: 1.37200339978e-07%
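A top-3 listing like the one above can be produced from the network's output probabilities with a few lines of Python. The sketch below is an assumption about how such a listing might be built, not the notebook's actual code: the `top_k` helper is hypothetical, and the probabilities are made-up values for four of the class labels rather than real model output.

```python
def top_k(probabilities, labels, k=3):
    """Return the k most probable (label, percent) pairs, highest first."""
    ranked = sorted(zip(labels, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return [(label, 100.0 * p) for label, p in ranked[:k]]

# Made-up probabilities for illustration only.
labels = [
    "n02106662-German shepherd",
    "n02105162-malinois",
    "n02091467-Norwegian elkhound",
    "n02088466-bloodhound",
]
probs = [0.5, 0.25, 0.1875, 0.0625]

for label, percent in top_k(probs, labels):
    print("{}: {}%".format(label, percent))
```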
Some more samples of dogs from Wikipedia.
Rampur Greyhound was not in the dataset.
n02091134-whippet: 58.7792038918%
n02092002-Scottish deerhound: 22.4151656032%
n02113978-Mexican hairless: 9.97444465756%
Correct Breed: Borzoi.
n02090622-borzoi: 100.0%
n02091635-otterhound: 2.82948864339e-06%
n02100735-English setter: 2.56475923832e-08%
Correct Breed: Bloodhound. Not high confidence.
n02087394-Rhodesian ridgeback: 97.7928042412%
n02088466-bloodhound: 2.07519363612%
n02100583-vizsla: 0.105287262704%
Download the model:
Download and play with the models. When using these models, cite the author.
Model without transformed images:
https://drive.google.com/file/d/0B0XmoZu7-fipMzFzb1hzSGJaV3c/view?usp=sharing
Model with transformed images:
https://drive.google.com/file/d/0B0XmoZu7-fipcDdBQ3M4RXRxRFk/view?usp=sharing
Future Work:
Possible future work is to check whether other algorithms improve the classification accuracy.