Creating a CNN in PyTorch for the CIFAR10 Dataset

Karan Chahal
8 min read · Apr 3, 2022

Many new phones in the modern era are capable of performing facial recognition. Due to this capability, you are able to conveniently unlock your phone without having to type the password every time. However, have you ever wondered how a computer can perform this task?

If you think about it… 🤨 Coding a program that can do facial recognition would be very complex, and it would not be possible with simple if-else statements.

So how is it done? Well, these types of technologies all take advantage of deep learning.

What is deep learning?

Deep learning is a subset of machine learning that uses artificial neural networks (ANNs). Artificial neural networks are modeled on the brain’s arrangement of neurons and are highly effective at carrying out complex tasks like image classification.

What is the architecture of an ANN?

In an ANN there are three types of layers:

  1. Input Layer → Containing the input data (only one)
  2. Hidden Layer(s) → Responsible for conducting the computation (can be multiple)
  3. Output Layer → Gives the final output (only one)

Each layer has a specific number of nodes. Every node is connected to the nodes in the previous and next layers. The connections between nodes in different layers have specific weights and biases, and these weights and biases are used to determine the value of each node. For instance, if we have two nodes in one layer connected to one node in the next layer, the value of the node in the next layer would be calculated as follows:

Perceptron
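
The weighted sum shown above can be written out in a few lines of plain Python. All the values here (inputs, weights, bias) are hypothetical, chosen just to make the arithmetic easy to follow:

```python
# Net input of one node fed by two nodes in the previous layer
x1, x2 = 0.5, -1.0          # outputs of the two previous-layer nodes
w1, w2 = 0.8, 0.3           # weights on the two connections
b = 0.1                     # bias of the receiving node

z = x1 * w1 + x2 * w2 + b   # weighted sum plus bias = net input
print(round(z, 2))          # -> 0.2
```

In a real network this value would then be passed through an activation function, which is covered next.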

Now with this type of arrangement, the neural network is a linear model that can solve problems involving linear patterns. However, this makes the network incapable of performing higher-level tasks like classifying images. Thus, to give the network the ability to perform such higher-order tasks, the neural network also applies an activation function at each node. Activation functions are loosely analogous to how some neurons in the brain fire while others don’t. The most commonly used activation function is the rectified linear unit (ReLU).

ReLU Activation Function

As you can see in the diagram, the ReLU function takes the value generated by the node: if the value is positive, ReLU outputs that same value, while if the value is below 0, ReLU outputs 0.
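
This behaviour is simple enough to state in one line of Python:

```python
def relu(x):
    """Rectified linear unit: pass positive values through, clip negatives to 0."""
    return max(0.0, x)

print(relu(2.5))    # -> 2.5 (positive input passes through unchanged)
print(relu(-3.0))   # -> 0.0 (negative input is clipped to zero)
```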

Activation functions like ReLU are applied to the value calculated by the net input function. The activation function is the last step and it gives the output of the node. This output of each of the nodes will then act as input for the next layer and the process continues until the output layer is reached. In the following diagram, it can be clearly seen where the activation function is placed for every single node in an ANN:

Single node in an ANN

How does training work?

You might have heard the word “learning” a lot in the machine learning industry and wondered how an ANN “learns”. Technically, this process is called training, and it works by adjusting the weights and biases present in the ANN.

To adjust the weights and biases appropriately, there needs to be a training dataset with input data and the corresponding labels (the correct outputs for the input data). The first step in the training process is forward propagation through the network with an input from the training dataset. The prediction made by the network is then compared to the actual label of the input to derive a loss, which tells the network how inaccurate it is. This comparison is done using a loss function such as cross entropy. Once the loss has been derived, the gradient of the loss with respect to the weights and biases in the network is calculated (step: backward propagation). The negative of this gradient is multiplied by an appropriate learning rate, and the product is added to the weights and biases to adjust them (step: adjustment). The ANN trains this way continuously for each of the inputs and may go over the dataset multiple times. Training in this way results in an ANN that can provide predictions on input data it hasn’t seen before.
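
The whole cycle can be sketched with a single weight and a squared-error loss standing in for cross entropy. Every value here is hypothetical; the point is to show the three steps (forward, backward, adjustment) in order:

```python
# One gradient-descent step on a single weight w, where the prediction is p = w * x
w, x, y, lr = 1.0, 2.0, 8.0, 0.1

pred = w * x                 # forward propagation
loss = (pred - y) ** 2       # loss: how far the prediction is from the label y
grad = 2 * (pred - y) * x    # backward propagation: d(loss)/dw
w += -lr * grad              # adjustment: add negative gradient times learning rate

print(round(w, 6))           # -> 3.4, moved toward 4.0 (the value that makes pred == y)
```

Repeating this step over every input in the dataset, for several epochs, is exactly what the training loop of a real network does.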

Specific deep learning neural networks for image classification: CNNs

Now the basic neural network in deep learning can perform a multitude of complex tasks. However, for specific situations like image classification, the neural network architecture can be adjusted and modified to maximize accuracy. In the world today, there are more than 25 different types of neural networks where each is designed to conduct a specific task.

For image classification, the specific neural network that has been created is the convolutional neural network (CNN). In terms of architecture, a convolutional neural network has additional layers, called convolutional layers, between the input and hidden layers of the network. These convolutional layers act as filters for images and are able to capture important parts of the image. A convolutional layer works by using a kernel, an n-by-n matrix of weights (where n is chosen by the programmer), that slides over the image. As it slides over the image, the pixels under the filter are multiplied by the corresponding weights, and all those products are added to give one pixel of the new image being created. The following GIF demonstrates how a kernel filters an image:
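
The same sliding computation can be written out by hand for a tiny 3×3 “image” and a 2×2 kernel (all values hypothetical), with no padding and a stride of 1:

```python
# Slide a 2x2 kernel over a 3x3 image: each output pixel is the sum of
# element-wise products between the kernel and the patch of image beneath it
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]   # the weights of this filter

out = [[sum(image[i + a][j + b] * kernel[a][b]
            for a in range(2) for b in range(2))
        for j in range(2)]
       for i in range(2)]

print(out)  # -> [[6, 8], [12, 14]]  (a 3x3 input shrinks to 2x2)
```

In a CNN these kernel weights are not fixed like this; they are learned during training.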

Convolutional layer visualized

In addition to the convolutional layers, CNNs also contain pooling layers, which reduce the number of features of the image and make the training process faster since less computing power is required. Pooling layers work similarly to kernels, filtering the image down to a smaller image.
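
A minimal sketch of one common kind, 2×2 max pooling with stride 2, on a hypothetical 4×4 feature map: each 2×2 block is replaced by its largest value, halving both dimensions.

```python
fmap = [[1, 3, 2, 0],
        [4, 6, 5, 1],
        [7, 2, 9, 8],
        [0, 1, 3, 4]]

# Keep only the maximum of each non-overlapping 2x2 block
pooled = [[max(fmap[2 * i][2 * j],     fmap[2 * i][2 * j + 1],
               fmap[2 * i + 1][2 * j], fmap[2 * i + 1][2 * j + 1])
           for j in range(2)]
          for i in range(2)]

print(pooled)  # -> [[6, 5], [7, 9]]  (4x4 shrinks to 2x2)
```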

As the CNN is trained, the weights of the filters are adjusted. Once trained, the convolutional layers extract important features from the images, which makes the network more accurate.

Implementation of CNNs through PyTorch for the CIFAR10 dataset:

As a teenager that is very curious about deep learning and how neural networks are implemented in the real world, I have started to create networks through PyTorch. One of my newest networks is a CNN that can classify images into one of 10 classes: plane, car, bird, cat, deer, dog, frog, horse, ship and truck. After training for 11 epochs with a batch size of 10, this CNN had an accuracy of 75.14% on the train dataset. This network was trained on the CIFAR10 dataset.

Walk-through of the PyTorch code:

STEP 1: The first step was importing all the necessary libraries for the project, as well as configuring the device on which the computation of the network would be performed. Preferably, the computations would run faster on a GPU, but my MacBook does not have one available, so the CPU was used. Next, the hyper-parameters like the number of epochs, batch size and learning rate are set.

STEP 2: The second step is creating the appropriate transformations for the train and test datasets. Next, both datasets are loaded in and their data loaders are created. After this, the classes that the network will classify images into are defined.

STEP 3: The convolutional neural network architecture is created with 3 convolutional layers and 1 hidden layer.
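
A sketch of such a network, matching the article's final shape (3 convolutional layers with kernel size 3, 1 hidden layer). The channel counts, the placement of the pooling layers, and the hidden-layer width are my assumptions, not taken from the article:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvNet(nn.Module):
    """3 convolutional layers followed by 1 hidden (fully connected) layer."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3)      # 3 RGB channels in, kernel size 3
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.conv3 = nn.Conv2d(64, 64, 3)
        self.pool = nn.MaxPool2d(2, 2)        # halves the height and width
        self.fc1 = nn.Linear(64 * 5 * 5, 64)  # the single hidden layer
        self.fc2 = nn.Linear(64, 10)          # one output per CIFAR10 class

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # 32x32 -> 30x30 -> 15x15
        x = F.relu(self.conv2(x))             # 15x15 -> 13x13
        x = self.pool(F.relu(self.conv3(x)))  # 13x13 -> 11x11 -> 5x5
        x = x.view(-1, 64 * 5 * 5)            # flatten for the linear layers
        x = F.relu(self.fc1(x))
        return self.fc2(x)                    # raw scores; CrossEntropyLoss applies softmax

logits = ConvNet()(torch.randn(1, 3, 32, 32))  # one fake 32x32 RGB image
```

The shape comments in `forward` track how each layer shrinks the image, which is how the `64 * 5 * 5` input size of the hidden layer is derived.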

STEP 4: The model is created by instantiating the ConvNet() class. Next, the criterion (loss function) and optimizer are created; these are crucial for backward propagation. After this, the training loop is created, where first a forward propagation and then a backward propagation is conducted.
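
The skeleton of one training step might look like this. To keep the snippet self-contained, a tiny linear model and a random batch stand in for the real ConvNet and data loader, and the choice of SGD as the optimizer is an assumption:

```python
import torch
import torch.nn as nn

# Stand-in model so this runs on its own; in the project this is ConvNet()
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
criterion = nn.CrossEntropyLoss()                          # the loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)  # adjusts weights and biases

# One training step; the real loop repeats this for every batch in every epoch
images = torch.randn(10, 3, 32, 32)   # a fake batch standing in for the data loader
labels = torch.randint(0, 10, (10,))

outputs = model(images)               # forward propagation
loss = criterion(outputs, labels)     # compare predictions to labels -> loss

optimizer.zero_grad()                 # clear gradients from the previous step
loss.backward()                       # backward propagation
optimizer.step()                      # adjustment of weights and biases
```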

STEP 5: Then the accuracy of the network on the test dataset is calculated.
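
The usual way to compute this is to count correct predictions over the test loader with gradients disabled. Again, a stand-in model and a fake one-batch loader replace the trained ConvNet and the real test loader so the sketch is self-contained:

```python
import torch
import torch.nn as nn

# Stand-ins; in the project these are the trained ConvNet and the real test_loader
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
test_loader = [(torch.randn(10, 3, 32, 32), torch.randint(0, 10, (10,)))]

n_correct = 0
n_samples = 0
with torch.no_grad():                         # no gradients needed for evaluation
    for images, labels in test_loader:
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)  # class with the highest score
        n_samples += labels.size(0)
        n_correct += (predicted == labels).sum().item()

accuracy = 100.0 * n_correct / n_samples
print(f'Accuracy of the network: {accuracy} %')
```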

STEP 6: The following console output is obtained after running the CNN.

Challenges and Accuracy:

Initially, the CNN had an accuracy of 45% with 2 convolutional layers, 2 pooling layers and 2 hidden layers. To improve this accuracy, I tried adding more hidden layers. At one point, I had added 6 hidden layers, causing the accuracy to fall to 27%. After realizing that adding more hidden layers made the network less effective, I decided to reduce the hidden layers to just one.

After this adjustment, the accuracy rose to 50%. To increase the accuracy more, I added more convolutional layers and made the kernel size 3 rather than 5. By reducing the filter size and by having 3 convolutional layers present, the accuracy of the network rose to 70%. Finally, to further increase accuracy, the hyper-parameters were adjusted giving the model a final accuracy of 75.14%.

Shoutout to Python Engineer!

Before I finish this article, I really wanted to thank Python Engineer as he is the real reason that I have been able to learn so much about programming neural networks in PyTorch. Make sure to go check him out!

To conclude, deep learning is a very powerful technology and personally, I look forward to learning about how to implement it through PyTorch.

A video going over the project


Karan Chahal

I am an 18-year-old who is very interested in the integration of technology in the medical field