Udacity Dog Breed Classification — Capstone Project

hoangnym
11 min read · Sep 20, 2020

Introduction

The dog breed classification capstone project is one of the electives at the end of Udacity’s Data Scientist Nanodegree program. It provides a great introduction to deep learning and convolutional neural networks. For students like myself who are looking for a more guided deep learning project to get the hang of it, this might be just it. Stay tuned and finish reading if you are interested in how the project went and whether it is something for you to try out. Find the accompanying GitHub repository here.

Example Output from the Classifier

The Goal of the Project

The goal is to classify images of dogs according to their breed. Given an image, the algorithm detects whether it shows a dog, a human, or neither. If it is a dog, it classifies the most likely breed. If it is a human, it returns the dog breed that most resembles the person.

Strategy

Since this was my first image classification project, I was quite grateful that Udacity provided some guidance on how to tackle such a problem. Here is a rough outline of the steps I followed to develop the classifier (feel free to follow the repository):

  • Step 1: Import Datasets
  • Step 2: Detect Humans
  • Step 3: Detect Dogs
  • Step 4: Create a CNN to Classify Dog Breeds (from Scratch)
  • Step 5: Use a CNN to Classify Dog Breeds (using Transfer Learning)
  • Step 6: Create a CNN to Classify Dog Breeds (using Transfer Learning)
  • Step 7: Write your Algorithm
  • Step 8: Test Your Algorithm

I used Keras to build the Convolutional Neural Network (CNN) that makes the dog predictions. Another library that could have been used for image classification is PyTorch.

My mantra is to achieve continuous improvement and learn from setbacks; therefore, the goal was to improve the accuracy of the classification with each iteration (scratch → transfer learning → improved transfer learning). I used the loss on the validation dataset to measure the performance of my models.

To follow and make sense of my thinking, feel free to download or clone the GitHub repository here.

Step 1: Import Datasets

The following are provided by Udacity:

  • Dog images for model training,
  • Human images for face detection,
  • additional files in the repository, and
  • a Jupyter notebook to allow for fast coding.

We populate a few variables with the load_files function from the scikit-learn library and run the following cell:

# import libraries
from sklearn.datasets import load_files
from keras.utils import np_utils
import numpy as np
from glob import glob

# define function to load train, test, and validation datasets
def load_dataset(path):
    data = load_files(path)
    dog_files = np.array(data['filenames'])
    dog_targets = np_utils.to_categorical(np.array(data['target']), 133)
    return dog_files, dog_targets

# load train, test, and validation datasets
train_files, train_targets = load_dataset('../../../data/dog_images/train')
valid_files, valid_targets = load_dataset('../../../data/dog_images/valid')
test_files, test_targets = load_dataset('../../../data/dog_images/test')

# load list of dog names
dog_names = [item[20:-1] for item in sorted(glob("../../../data/dog_images/train/*/"))]

# print statistics about the dataset
print('There are %d total dog categories.' % len(dog_names))
print('There are %s total dog images.\n' % len(np.hstack([train_files, valid_files, test_files])))
print('There are %d training dog images.' % len(train_files))
print('There are %d validation dog images.' % len(valid_files))
print('There are %d test dog images.' % len(test_files))

After importing the libraries, I used the load_dataset function (which wraps scikit-learn's load_files) to import the datasets. The dog_names variable stores a list of the class names, which I will use in my final prediction model. If everything works out well, you will see 133 different dog breeds and 8,351 dog images.

import random
random.seed(8675309)
# load filenames in shuffled human dataset
human_files = np.array(glob("../../../data/lfw/*/*"))
random.shuffle(human_files)
# print statistics about the dataset
print('There are %d total human images.' % len(human_files))

Running a comparable process for the human images, the printed statistics should show around 13.2k human images.

Step 2: Detect Humans

We use OpenCV’s implementation of Haar feature-based cascade classifiers to detect human faces in images. OpenCV provides many pre-trained face detectors, stored as XML files on GitHub. One of these detectors is downloaded and stored in the haarcascades directory.

In the next code cell, you can see how to use this detector to find human faces and how the predictor performed on the first 100 dog and human images:

import cv2
import matplotlib.pyplot as plt
%matplotlib inline

# extract pre-trained face detector
face_cascade = cv2.CascadeClassifier('haarcascades/haarcascade_frontalface_alt.xml')

# load color (BGR) image
img = cv2.imread(human_files[3])
# convert BGR image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# find faces in image
faces = face_cascade.detectMultiScale(gray)
print(faces)
# print number of faces detected in the image
print('Number of faces detected:', len(faces))

# get bounding box for each detected face
for (x, y, w, h) in faces:
    # add bounding box to color image
    cv2.rectangle(img, (x, y), (x+w, y+h), (255, 0, 0), 2)

# convert BGR image to RGB for plotting
cv_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# display the image, along with bounding box
plt.imshow(cv_rgb)
plt.show()

# returns "True" if a face is detected in the image stored at img_path
def face_detector(img_path):
    img = cv2.imread(img_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray)
    return len(faces) > 0

human_files_short = human_files[:100]
dog_files_short = train_files[:100]

# function to count human faces in the provided files
def count_human_faces(files_paths):
    detected = 0
    total_files = len(files_paths)
    for image_path in files_paths:
        detected += face_detector(image_path)
    return detected, total_files

  • Percentage of humans correctly identified: 100%
  • Percentage of dogs recognized as humans: 11%
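
For reference, a minimal sketch of how these percentages can be computed with the count_human_faces helper above (the format strings are my own):

# evaluate the face detector on the first 100 human and dog images
human_detected, human_total = count_human_faces(human_files_short)
dog_detected, dog_total = count_human_faces(dog_files_short)
print('Percentage of humans correctly identified: {:.0%}'.format(human_detected / human_total))
print('Percentage of dogs recognized as humans: {:.0%}'.format(dog_detected / dog_total))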

Step 3: Detect Dogs

I used the pre-trained ResNet-50 model to detect dogs in images. The first line of code downloads the ResNet-50 model, along with weights that have been trained on ImageNet, a very large, very popular dataset used for image classification and other vision tasks. ImageNet contains over 10 million URLs, each linking to an image containing an object from one of 1000 categories. Given an image, this pre-trained ResNet-50 model returns a prediction (derived from the available categories in ImageNet) for the object that is contained in the image.

# define ResNet50 model
from keras.applications.resnet50 import ResNet50
ResNet50_model = ResNet50(weights='imagenet')

To use this model with our images, we need to process them into the correct tensor shape for the model. When using TensorFlow as the backend, Keras CNNs require a 4D array (which we’ll also refer to as a 4D tensor) as input, with shape (nb_samples, rows, columns, channels), where nb_samples corresponds to the total number of images (or samples), and rows, columns, and channels correspond to the number of rows, columns, and channels for each image, respectively.

The path_to_tensor function takes a string-valued file path to a color image as input and returns a 4D tensor suitable for supplying to a Keras CNN. The function first loads the image and resizes it to a square image that is 224×224 pixels. Next, the image is converted to an array, which is then resized to a 4D tensor. In this case, since we are working with color images, each image has three channels. Likewise, since we are processing a single image (or sample), the returned tensor will always have shape (1,224,224,3).

from keras.preprocessing import image
from tqdm import tqdm

def path_to_tensor(img_path):
    # loads RGB image as PIL.Image.Image type
    img = image.load_img(img_path, target_size=(224, 224))
    # convert PIL.Image.Image type to 3D tensor with shape (224, 224, 3)
    x = image.img_to_array(img)
    # convert 3D tensor to 4D tensor with shape (1, 224, 224, 3) and return it
    return np.expand_dims(x, axis=0)

def paths_to_tensor(img_paths):
    list_of_tensors = [path_to_tensor(img_path) for img_path in tqdm(img_paths)]
    return np.vstack(list_of_tensors)

In the final step, ResNet-50 is used to make predictions. Getting the 4D tensor ready for ResNet-50, and for any other pre-trained model in Keras, requires some additional processing. First, the RGB image is converted to BGR by reordering the channels. All pre-trained models also have a normalization step: the mean pixel (expressed in RGB as [103.939, 116.779, 123.68] and calculated from all pixels in all images in ImageNet) must be subtracted from every pixel in each image. This is implemented in the imported function preprocess_input. By taking the argmax of the predicted probability vector, we obtain an integer corresponding to the model’s predicted object class, which we can map to an object category through ImageNet's class dictionary.

from keras.applications.resnet50 import preprocess_input, decode_predictions

def ResNet50_predict_labels(img_path):
    # returns prediction vector for image located at img_path
    img = preprocess_input(path_to_tensor(img_path))
    return np.argmax(ResNet50_model.predict(img))
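
The dog detector used later in Step 7 builds directly on this: in the ImageNet dictionary, the dog categories occupy the uninterrupted index range 151 to 268, so it suffices to check whether the predicted class falls in that range. A minimal sketch:

# returns "True" if a dog is detected in the image stored at img_path
def dog_detector(img_path):
    prediction = ResNet50_predict_labels(img_path)
    return (prediction >= 151) and (prediction <= 268)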

Step 4: Create a CNN to Classify Dog Breeds (from Scratch)

Now that we have functions for detecting humans and dogs in images, we need a way to predict a breed from an image. In this step, you create a CNN that classifies dog breeds. The CNN must be created from scratch (so, no transfer learning yet!) and attain a test accuracy of at least 1%. In Step 5, you will have the opportunity to use transfer learning to create a CNN with greatly improved accuracy.

The full code is in my GitHub notebook, so you can follow along, experiment, and create your own model from scratch.

The network I chose fulfilled the following requirements:

  • a CNN with the proposed architecture (see below) in order to identify complex patterns,
  • filter counts that increase with each layer (8, 16, 32),
  • a max pooling layer after each convolutional layer to reduce dimensionality and increase the training efficiency of the model, and
  • the ReLU activation function in all layers.

CNN used for Dog Breed Detection from Scratch

The target was to achieve a CNN with >1% accuracy. The network described above achieved 1.2% accuracy, which was sufficient to proceed.
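
For illustration, here is a minimal Keras sketch matching these requirements; the kernel sizes and the classification head are my assumptions, and the actual model is in the notebook:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, Dense

scratch_model = Sequential()
# three convolutional blocks with increasing filter counts (8, 16, 32), ReLU activations,
# and a max pooling layer after each convolution to reduce dimensionality
scratch_model.add(Conv2D(filters=8, kernel_size=2, activation='relu', input_shape=(224, 224, 3)))
scratch_model.add(MaxPooling2D(pool_size=2))
scratch_model.add(Conv2D(filters=16, kernel_size=2, activation='relu'))
scratch_model.add(MaxPooling2D(pool_size=2))
scratch_model.add(Conv2D(filters=32, kernel_size=2, activation='relu'))
scratch_model.add(MaxPooling2D(pool_size=2))
# collapse to a feature vector and classify into the 133 breeds
scratch_model.add(GlobalAveragePooling2D())
scratch_model.add(Dense(133, activation='softmax'))
scratch_model.summary()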

Step 5: Use a CNN to Classify Dog Breeds (using Transfer Learning)

In this section of the Jupyter notebook, we go through using one of the pre-trained networks available in Keras.

The model uses the pre-trained VGG-16 model as a fixed feature extractor, where the last convolutional output of VGG-16 is fed as input to our model. We only add a global average pooling layer and a fully connected layer, where the latter contains one node for each dog category and is equipped with a softmax. Check out the code in the Jupyter notebook to find out more.
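
As a minimal sketch of this head (assuming the pre-computed VGG-16 bottleneck features are loaded into train_VGG16, analogous to the Inception features in Step 6):

from keras.models import Sequential
from keras.layers import GlobalAveragePooling2D, Dense

VGG16_model = Sequential()
# pool the VGG-16 bottleneck features into a single feature vector
VGG16_model.add(GlobalAveragePooling2D(input_shape=train_VGG16.shape[1:]))
# one node per dog category, with a softmax over the 133 breeds
VGG16_model.add(Dense(133, activation='softmax'))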

VGG-16 Transfer Learning CNN

This network achieved 44.4% accuracy, a great improvement over the one built from scratch.

Step 6: Create a CNN to Classify Dog Breeds (using Transfer Learning)

You will now use transfer learning to create a CNN that can identify dog breeds from images. The CNN must attain at least 60% accuracy on the test set to be deemed sufficient.

To make things easier, Udacity has pre-computed the bottleneck features for all of the networks that are currently available in Keras (the download links are provided in the notebook).

I decided to use the Inception model, as its bottleneck features are well suited for image classification.

bottleneck_features = np.load('bottleneck_features/DogInceptionV3Data.npz')
print(bottleneck_features)
train_inception = bottleneck_features['train']
valid_inception = bottleneck_features['valid']
test_inception = bottleneck_features['test']

The above variables contain images that have already been put through the bottleneck extractor. This speeds up training considerably: the main features needed for identification have already been isolated, so only a small number of parameters (weights) remain to backpropagate through.

The architecture looks as follows:

Inception model for Transfer Learning CNN
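
As a sketch, the model amounts to a global average pooling layer followed by a 133-way softmax; this is consistent with the 272,517 trainable parameters reported below (2048 × 133 weights + 133 biases):

from keras.models import Sequential
from keras.layers import GlobalAveragePooling2D, Dense

inception_model = Sequential()
# pool the (5, 5, 2048) InceptionV3 bottleneck features into a 2048-dim vector
inception_model.add(GlobalAveragePooling2D(input_shape=train_inception.shape[1:]))
# one output node per dog breed
inception_model.add(Dense(133, activation='softmax'))
inception_model.summary()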

After testing many different architectures, I settled on the one above. My findings were that

  • increasing the network’s complexity by adding additional layers is unnecessary.
  • the inception model increases accuracy to 79%, which is more than enough to pass the threshold of 60%.
  • the total trainable parameters or weights for this model are 272,517.
  • for compiling the model, I selected rmsprop as the optimizer and categorical_crossentropy as the loss function, with accuracy as the evaluation metric.
  • for model training, I set the number of epochs (20) and the batch size (20, i.e. how many images to train on at a time). Finally, I used a ModelCheckpoint callback [checkpointer] to save the weights with the lowest validation loss during training.

# compile the model
from keras.callbacks import ModelCheckpoint

inception_model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

# train the model, checkpointing the best weights by validation loss
checkpointer = ModelCheckpoint(filepath='saved_models/weights.best.inception.hdf5',
                               verbose=1, save_best_only=True)
inception_model.fit(train_inception, train_targets,
                    validation_data=(valid_inception, valid_targets),
                    epochs=20, batch_size=20, callbacks=[checkpointer], verbose=1)
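
After training, the checkpointed weights can be loaded back so that predictions come from the model with the lowest validation loss (a standard step when using ModelCheckpoint):

# load the weights that achieved the lowest validation loss
inception_model.load_weights('saved_models/weights.best.inception.hdf5')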

The next step is wrapping the model in a function that can be used in a web application. Therefore, you write a function that takes an image path as input and returns the dog breed (Affenpinscher, Afghan_hound, etc.) that is predicted by your model.

def predict_breed(img_path):
    # obtain 4D tensor from image path
    tensor_path = path_to_tensor(img_path)

    # extract bottleneck features
    bottleneck_feature = extract_InceptionV3(tensor_path)

    # obtain predicted vector
    pred_vector = inception_model.predict(bottleneck_feature)

    # return predicted dog breed
    return dog_names[np.argmax(pred_vector)]
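
The extract_InceptionV3 helper is not defined in this snippet; as a sketch (this mirrors the bottleneck-extraction helper shipped with the Udacity project files), it runs the tensor through InceptionV3 without its classification head:

from keras.applications.inception_v3 import InceptionV3, preprocess_input

def extract_InceptionV3(tensor):
    # run the preprocessed 4D tensor through InceptionV3 up to the bottleneck layer
    return InceptionV3(weights='imagenet', include_top=False).predict(preprocess_input(tensor))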

Step 7: Write your Algorithm

Write an algorithm that accepts a file path to an image and first determines whether the image contains a human, dog, or neither. Then,

  • if a dog is detected in the image, return the predicted breed.
  • if a human is detected in the image, return the resembling dog breed.
  • if neither is detected in the image, provide an output that indicates an error.

The algorithm brings together the functions used previously to create a final output and displays the image:

from IPython.core.display import Image, display

def breed_algorithm(img_path):
    # display image in 200x200 format
    display(Image(img_path, width=200, height=200))

    if dog_detector(img_path) == 1:
        print("This is predicted to be a dog, and its breed is: ")
        return predict_breed(img_path).partition('.')[-1]

    elif face_detector(img_path) == 1:
        print("This is predicted to be a human, and its spirit animal is a: ")
        return predict_breed(img_path).partition('.')[-1]

    else:
        print("It wasn't possible to identify a human or dog in the image. Please try again.")
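
A quick usage example (the image path here is a hypothetical placeholder):

# run the full pipeline on a sample image (hypothetical path)
breed_algorithm('images/sample_dog.jpg')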

The output is better than expected. The algorithm identified dog pictures as dogs and pictures of humans as humans. It predicted the correct dog breed 5 out of 6 times, and the breed that was wrongly predicted (Irish Water Spaniel) is very similar to the correct breed, the American Water Spaniel. For the human faces, the classifier provided some fun spirit dogs that resemble them.

Three points of improvement for the algorithm:

  • The model needs to pick up subtle differences between similar breeds. Some breeds have similar colors and shapes but differ in size and posture, as is the case with the Irish and American Water Spaniels.
  • More variety is needed in the breeds the model predicts for humans. For instance, J.Bezos and B.Gates are predicted the same breeds, even though they clearly look different.
  • The model could use some improvement in its ability to classify pictures with noise (blurry backgrounds, white backgrounds, etc.). It needs clear images, otherwise it always gives you similar predictions.

Conclusion

The goal of this project was to create a CNN with at least 60% test accuracy. Our final model achieved 79% test accuracy, which is quite satisfying for a first image classification project.

I loved the challenge and reading up on the material provided by Udacity. Given more time, I believe it is possible to achieve more than 90% accuracy, but I think I would like to pursue other data science projects. Maybe a sports prediction model based on images? Who knows?

The model could be improved further by fine-tuning the hyperparameters, adding more layers to the CNN, or changing the number of epochs and the batch size.

A neat next step for scaling could be to build a web application in Flask for other students to play with. Given the tight deadlines, I will leave that for my future self or colleagues to pursue.

Stay tuned and healthy. Thanks for reading.
