My Pomeranian yawning

Udacity Data Scientist nano-degree Capstone project writeup — Dog breed classifier

Ronald Leung

--

After a few months of wrangling and cleaning data, building machine learning pipelines, and implementing recommendation engines, we’ve now arrived at the finale. As I progressed through this data science degree, the exercises and projects were invaluable in polishing up my Python skills, and I got plenty of hands-on practice with very cool libraries such as Pandas, NumPy, and many more. In the final Capstone project, we get to build a Convolutional Neural Network for identifying dog breeds.

Project Overview

In this project we will train a Convolutional Neural Network to process dog images and identify their breed. We will first train a model from scratch using Keras and look at its performance. Next, we will use Transfer Learning and leverage pre-trained models (VGG16 and VGG19) to see how they dramatically improve accuracy without requiring significant resources to retrain the entire network. Finally, leveraging the models we’ve built, we will provide a function that takes an image as input and tries to identify whether there is a dog or a human face in the picture; if one is found, it predicts which dog breed it most closely resembles.

A large set of dog and human pictures was provided as part of the project for training and testing purposes.

The Jupyter notebook can be found here in this github repo. A note of caution, though: to run the notebook, it’s highly recommended that you use a virtual machine with a GPU rather than a laptop. To complete this project, I used a Google Cloud AI Platform Notebook with 1 GPU, 4 CPUs, and 15 GB of RAM, as shown below.

GCP AI Platform notebooks

CRISP-DM

As always, we will follow the CRISP-DM method in this project, step by step.

  1. Business Understanding
    For this project, we’d like a model that can identify a dog’s breed as accurately as possible. As you can imagine, this is not a trivial task. From a picture that shows only one particular angle of the dog, it’s difficult even for humans. Achieving 100% accuracy is therefore highly unlikely, and the project requirement sets the model’s minimum accuracy at 60%.
  2. Data Understanding
    We will use multiple sources of data in this project. For training the model, a large set of dog images was provided by Udacity. From the data folders, we can see there are a total of 133 dog breeds in the training data set. Let’s also take a look at how many pictures we have per breed for training; a short snippet for tallying them is sketched below the histogram. Using this notebook here in the repo, we can see that the breed with the most pictures has around 60–70, and the ones with the fewest have about 20–30. From the histogram below we can see that most breeds have 40–60 training pictures. In addition to the provided data set used to train, test, and validate the model, I have also included a bunch of random pictures of humans, dogs, and things in between as a final test after the breed-identification function is built.
Breeds with the most and least training pictures
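For reference, here is a minimal sketch of how those per-breed counts can be tallied, assuming the dataset is laid out as dogImages/train/&lt;breed&gt;/*.jpg (the layout used by the Udacity-provided data):

```python
from glob import glob
import os

# Assumed layout: dogImages/train/<breed_name>/<image>.jpg
train_dirs = sorted(glob('dogImages/train/*/'))

# Count the number of training images available for each breed
counts = {os.path.basename(d.rstrip('/')): len(glob(os.path.join(d, '*.jpg')))
          for d in train_dirs}

print(f'Number of breeds: {len(counts)}')
print('Most images:', max(counts, key=counts.get), max(counts.values()))
print('Fewest images:', min(counts, key=counts.get), min(counts.values()))
```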

3. Data preparation
This data set is already pre-cleaned and ready to use, as mentioned above. The main preparation required for this project is converting the jpg images into the proper format and size. The pre-processing step was also provided as part of the project template; see the “Pre-process the Data” sections of Step 2 and Step 3 in the Jupyter notebook.
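As a rough sketch of what that pre-processing looks like, the template’s approach loads each jpg, resizes it to 224x224, and turns it into a 4D tensor that Keras can consume (the exact helper names in the notebook may differ):

```python
from keras.preprocessing import image
import numpy as np

def path_to_tensor(img_path):
    # Load the RGB image and resize it to 224x224, the input size the models expect
    img = image.load_img(img_path, target_size=(224, 224))
    # Convert to a 3D array with shape (224, 224, 3)
    x = image.img_to_array(img)
    # Add a batch dimension -> (1, 224, 224, 3); pixel values are rescaled before training
    return np.expand_dims(x, axis=0)
```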

4. Modeling
There are various parts to the modeling step of this project. For the first Convolutional Neural Network model, we trained the network from scratch. I used the provided architecture hint as a starting point.

Convolutional Neural Network from scratch

With 3 convolutional layers, each followed by a pooling layer, and a final dense layer, the architecture keeps the parameter count under 7,000. On a regular laptop this still takes a long time to run, but on a GPU-backed VM it runs relatively quickly. As we will soon see in the evaluation section, this model doesn’t do well at all, but the goal was just to beat random guessing, which it did. In the next two attempts, we leverage Transfer Learning and add dense layers on top of the pre-trained VGG16 and VGG19 models. To compare the performance of VGG16 and VGG19, I kept the architecture for these two tests the same; a sketch of that setup follows the figures below.
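As a concrete illustration, here is a minimal Keras sketch of a from-scratch architecture along these lines. The filter counts are illustrative rather than the exact values used in the notebook, so the parameter total may differ:

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, Dense

model = Sequential([
    # Three convolutional stages, each followed by max pooling
    Conv2D(16, (2, 2), activation='relu', input_shape=(224, 224, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(32, (2, 2), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(64, (2, 2), activation='relu'),
    MaxPooling2D((2, 2)),
    # Global average pooling keeps the parameter count small
    GlobalAveragePooling2D(),
    # One output unit per breed
    Dense(133, activation='softmax'),
])

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
```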

Transfer learning using VGG16
Transfer Learning using VGG16
Transfer Learning using VGG19
Same architecture as VGG16 using VGG19
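One common way to wire up this kind of transfer learning with keras.applications is sketched below; the notebook itself may instead use the project template’s pre-computed bottleneck features, but the idea is the same: freeze the VGG convolutional base and train only a small classification head (swap in VGG19 for the second experiment).

```python
from keras.applications.vgg16 import VGG16
from keras.models import Sequential
from keras.layers import GlobalAveragePooling2D, Dense

# Pre-trained VGG16 convolutional base with ImageNet weights, top classifier removed
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pre-trained layers

# Small head trained on the dog data: pooling plus a 133-way softmax
model = Sequential([
    base,
    GlobalAveragePooling2D(),
    Dense(133, activation='softmax'),
])

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
```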

5. Evaluation
Let’s now compare the results of the various models. We will use the test images from the provided dog data set and check whether each model predicts the correct breed. With the first convolutional neural network built from scratch, the goal was to beat random guessing (~1%), and it achieved ~4% accuracy. While it did achieve that goal, it’s not a very useful model.

Next, the first transfer learning model, using VGG16, achieved an accuracy of ~72%. With VGG19 and the same transfer learning architecture, the accuracy increased slightly to ~73%.
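Accuracy here is simply the fraction of test images whose predicted breed matches the true label. A minimal sketch of that comparison, using hypothetical names (model, test_tensors, test_targets) for the trained model and the pre-processed test data:

```python
import numpy as np

# Hypothetical names: `model` is a trained breed classifier,
# `test_tensors` are the pre-processed test images (N, 224, 224, 3),
# and `test_targets` are their one-hot encoded breed labels (N, 133).
predictions = [np.argmax(model.predict(np.expand_dims(t, axis=0)))
               for t in test_tensors]
accuracy = 100 * np.mean(np.array(predictions) == np.argmax(test_targets, axis=1))
print(f'Test accuracy: {accuracy:.1f}%')
```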

Testing VGG16 Transfer Learning model
Testing VGG19 Transfer Learning model

6. Deployment

And now comes the fun part, where we use the model to test any image we want. In training and testing the model, we only used images of dogs of known breeds. Now we will try it with a variety of images, including normal dog pictures, random human images, and things in neither category. For the dog pictures, we expect it to do fairly well at identifying the breeds. I’m also going to include a few really interesting examples to see what the model does. The full list of tests is in the Jupyter notebook; here I will highlight a few interesting ones. In these tests, the function first prints the test file’s name, which also includes the actual breed if it’s a dog picture. Then, after showing the picture, it prints whether it found a dog or a human face along with its predicted breed, or a message saying it found neither.
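The overall shape of that function is roughly the sketch below; dog_detector, face_detector, and predict_breed are stand-ins for the detectors and breed model built earlier in the notebook (the actual names there may differ):

```python
def identify_breed(img_path):
    # Print the file name first (for dog pictures it includes the actual breed)
    print(img_path)
    # (the notebook also displays the image at this point)
    if dog_detector(img_path):
        print(f'Dog detected! Predicted breed: {predict_breed(img_path)}')
    elif face_detector(img_path):
        print(f'Human face detected. It most resembles a: {predict_breed(img_path)}')
    else:
        print('No dog or human face found in this image.')
```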

Let’s start with a few simple tests. We will try some pictures that are similar to those in the training and test sets. These pictures have a decent view of the dog’s face, and the model correctly identified the Pomeranian and the American Eskimo.

The following are some examples where the model didn’t get the exact breed but picked something very similar. This is not surprising, as some dog breeds do indeed look very similar.

Now let’s try a few more interesting examples. Here I have a picture of a wolf. Wolves and dogs look quite similar to me, so it was interesting to see that the model correctly responded with no dogs found in the picture.

What about a cat? This worked properly too; the model did not find any dog faces.

Then I decided to try a few animated pictures of a human and a dog. The model did not find any dog or human faces in them.

Lastly, this is probably the most interesting example, from a recent popular movie in which humans are dressed up as cats. Interestingly, the CNN model predicted this to be a dog.

Conclusion

In this project we had the opportunity to build our own Convolutional Neural Network and also leverage Transfer Learning to build on top of another existing model. There are a few key learnings.

  • Training a model from scratch is significantly resource-intensive. As we saw in our example, it’s impractical to train on a laptop, and the result is still not good.
  • There are many potential ways to change a CNN model, including the number of layers, number of parameters, optimization algorithms, and so on. In this project we mostly followed the suggested architectures, and it will take a lot more practice and hands-on experience to get a better sense of which architectures work well in different scenarios.
  • Transfer Learning is an excellent way to leverage pre-built models to achieve great accuracy.

A few potential future improvements:

  • The model uses 224x224 images, which is fairly small considering all the detail in a dog’s facial features. We might get better results with bigger, higher-resolution pictures.
  • When we test the final algorithm, the answer is either dog, human, or neither. We could also attempt to match an image to the name of the person. And with the face detection algorithm, we could try to locate and identify all the faces found in the image.
  • The Jupyter notebook only explored a small subset of the many possible variations and parameters of the CNN model. It would take some resources and time to train and test, but it would be interesting to compare results across different parameters, models, layers, etc.
