# TP1: Introduction to PyTorch and Neural Networks

The goal of this TP is to introduce you to PyTorch and Neural Networks. We will start by learning the basics of PyTorch, then we will implement a simple neural network to classify the MNIST dataset. First, we will introduce the data loader and the dataset, then we will implement the neural network and train it. Finally, we will evaluate the model and visualize the results.

The objective of this TP is to get you familiar with PyTorch and the basic concepts of:
- Neural Networks
- Data Loaders
- Stochastic Gradient Descent
- Backpropagation
- Training and Testing


The TP is divided into the following sections:
1. PyTorch Basics
   - Tensors
   - Reshape
   - Operations
2. Data Loader and Dataset
   - Pre-processing
   - Mini-batch
   - Data Loader
3. Neural Network
   - Initialization
   - Forward
4. Training
   - Loss
   - Backward
   - Optimization
5. Evaluation
   - Accuracy
   - Confusion Matrix

This TP is not graded, but it is important to understand the concepts and be able to implement them. The TP consists of code snippets that you need to complete and questions to answer. You can run the code snippets by pressing `Shift + Enter`. You can also modify the code snippets and experiment with different configurations.


## 1. PyTorch Basics

In this section, we will learn the basics of PyTorch. We will start by importing the necessary libraries and then we will learn about Tensors, Reshape, and the different operations that can be performed on Tensors.

First, a Tensor is a multi-dimensional matrix containing elements of a single data type. Tensors are similar to NumPy arrays, but they can be used on a GPU to accelerate computing. In PyTorch, Tensors are used to encode the inputs, the latent variables, and the outputs of a neural network.
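To make the relationship with NumPy concrete, here is a minimal sketch on a small made-up array (the values are illustrative and unrelated to the image used later). It shows that `torch.from_numpy` shares memory with the NumPy array while `torch.tensor` makes a copy:

```python
import numpy as np
import torch

# A small NumPy array standing in for real data (illustrative values only).
array = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.uint8)

# torch.from_numpy shares memory with the NumPy array and keeps its dtype.
shared = torch.from_numpy(array)

# torch.tensor copies the data, so later changes to `array` do not affect it.
copied = torch.tensor(array)

array[0, 0] = 100
print(shared[0, 0].item())  # reflects the change made through NumPy
print(copied[0, 0].item())  # still the original value
print(shared.dtype, shared.shape)
```

Sharing memory avoids a copy, which matters for large arrays, but it also means that in-place changes on one side are visible on the other.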

In [1]:
import torch                    # PyTorch
import imageio                  # Imageio
import numpy as np              # NumPy
import matplotlib.pyplot as plt # Matplotlib


In [None]:
# Download an image
!wget https://lamsade.dauphine.fr/~averine/DL3AIISO/dauphine.png

### From Numpy to PyTorch

In [None]:
image_png = imageio.v3.imread('dauphine.png')
print("Image class is loaded as a: ", type(image_png))

<font color='red'>**Question 1.1 What size is the array? What do the dimensions represent?**</font>


Your Answer:


Let's now focus on the different operations that can be performed on Tensors. We will start by creating a Tensor and then we will perform some basic operations such as addition, subtraction, multiplication, and division.

<font color='blue'>TODO:</font> Use the function [torch.tensor](https://pytorch.org/docs/stable/generated/torch.tensor.html#torch.tensor) to convert the image array to a Torch tensor. You can also use the function [torch.from_numpy](https://pytorch.org/docs/stable/generated/torch.from_numpy.html#torch.from_numpy), which shares memory with the NumPy array instead of copying it.

In [None]:
### Your code here ###

<font color='blue'>TODO:</font> Use the method [.size()](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.size) to get the size of the tensor. You can also use the attribute [.shape](https://pytorch.org/docs/stable/generated/torch.Tensor.shape.html).

In [None]:
### Your code here ###

<font color='blue'> TODO:</font> The dimension for RGB is typically the first one in Torch. Use the function ['torch.permute'](https://pytorch.org/docs/stable/generated/torch.permute.html#torch.permute) to permute the dimensions of the tensor. You can also use the method [.permute](https://pytorch.org/docs/stable/generated/torch.Tensor.permute.html#torch.Tensor.permute) of the tensor.

In [None]:
### Your code here ###

<font color='blue'> TODO:</font> Use the method ['.numpy()'](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.numpy) to convert the tensor to a NumPy array, then plot the image using the function ['plt.imshow'](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.imshow.html).

In [None]:
### Your code here ###

<font color='blue'> TODO: </font> Use the attribute [.dtype](https://pytorch.org/docs/stable/tensor_attributes.html#torch.dtype) to get the data type of the tensor.

In [None]:
### Your code here ###

<font color='blue'> TODO: </font> Use the method [.min()](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.min) and [.max()](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.max) to get the minimum and maximum values of the tensor.

In [None]:
### Your code here ###

<font color='red'>**Question 1.2 What happens if we use [.mean()](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.mean) on the tensor?**</font>

Your Answer:

<font color='blue'> TODO: </font> Use the method [.float()](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.float) to change the type of the tensor to float. You can also change to other types such as double using the method [.double()](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.double) or by passing the dtype [torch.float64](https://pytorch.org/docs/stable/tensors.html#torch.float64) to the method `.to()`.
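As an illustration on a toy tensor (hypothetical pixel values, not the image itself), the sketch below shows the three conversion routes side by side. Note that `torch.float64` is a dtype object passed to `.to()`, not a function to call:

```python
import torch

# Toy uint8 tensor (hypothetical pixel values).
pixels = torch.tensor([0, 128, 255], dtype=torch.uint8)

as_float = pixels.float()                   # torch.float32
as_double = pixels.double()                 # torch.float64
as_double_too = pixels.to(torch.float64)    # same result via .to()

print(pixels.dtype, as_float.dtype, as_double.dtype)
```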

In [None]:
### Your code here ###

<font color='blue'> TODO: </font> Use the method [.mean()](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.mean) and [.std()](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.std) to get the mean and standard deviation of the tensor. Also print the minimum and maximum values of the tensor.

In [None]:
### Your code here ###


<font color='blue'> TODO: </font> Use the method [.item()](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.item) to get the value of a tensor with a single element. Print the mean value as a float and not a tensor.

In [None]:
### Your code here ###

<font color='blue'> TODO: </font> Use the method [.mean()](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.mean) to get the mean values of the tensor dimension 0. You can also use the function [torch.mean()](https://pytorch.org/docs/stable/generated/torch.mean.html#torch.mean) of the torch module.

In [None]:
### Your code here ###

<font color='blue'> TODO: </font> Use the method [.view()](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.view) to reshape the tensor as a 2D tensor with 3 columns and 148*149 rows. You can also use the method [.reshape()](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.reshape) to reshape the tensor.

In [None]:
### Your code here ###

<font color='blue'> TODO: </font> Use the method [.reshape()](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.reshape) to reshape the tensor as a 1D tensor. You can use the argument [-1] to infer the number of elements in the tensor. You can also use the method [.flatten()](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.flatten) to flatten the tensor.
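The difference between `.view()` and `.reshape()` is easiest to see on a small example. The sketch below uses a toy tensor of 24 elements (an assumption for illustration, not the image) and shows how `-1` is inferred, and why `.reshape()` is the safer choice after `.permute()`:

```python
import torch

t = torch.arange(24).reshape(2, 3, 4)   # toy tensor with 24 elements

two_d = t.view(-1, 4)      # -1 is inferred as 24 / 4 = 6 rows
flat = t.reshape(-1)       # 1D tensor with 24 elements
also_flat = t.flatten()    # same result as reshape(-1)

# permute returns a non-contiguous view, so .view(-1) would raise an error here;
# .reshape() copies the data when needed and therefore works.
p = t.permute(2, 0, 1)
flat_p = p.reshape(-1)

print(two_d.shape, flat.shape, p.is_contiguous(), flat_p.shape)
```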

In [None]:
### Your code here ###

### Random Tensors 
We will now use random tensors to introduce the basic operations. In PyTorch, we typically work with float tensors whose values lie between 0 and 1. Instead of using the method [float()](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.float) to convert the tensor to float, we can directly divide the tensor by 255.0, which also rescales the values to [0, 1]:

In [16]:
image_tensor = torch.Tensor(image_png).permute(2, 0, 1)/255.0

<font color='blue'> TODO:</font> Use the function [torch.randn](https://pytorch.org/docs/stable/generated/torch.randn.html#torch.randn) to create a tensor with the same size as the tensor 'image_tensor'. You can also use the function [torch.randn_like](https://pytorch.org/docs/stable/generated/torch.randn_like.html#torch.randn_like) to create a tensor with the same size as another tensor.

In [None]:
random_tensor = ### Your code here ###

<font color='blue'> TODO:</font> Use the function [torch.add()](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.add) to add the tensor to 'image_tensor'. You can also use the operator '+' or the method [.add()](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.add) to add the tensors.

In [18]:
image_tensor_random = ### Your code here ###

<font color='blue'> TODO:</font> Use the function [torch.clamp()](https://pytorch.org/docs/stable/generated/torch.clamp.html#torch.clamp) to make sure the values of the tensor are between 0 and 1. You can also use the method [.clamp()](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.clamp) or the function [torch.clip()](https://pytorch.org/docs/stable/generated/torch.clip.html#torch.clip).  

In [None]:
image_tensor_random_clamped = ### Your code here ###


plt.figure(figsize=(6,3))
plt.subplot(1, 2, 1)
plt.imshow(image_tensor.permute(1, 2, 0).numpy())
plt.axis('off')
plt.title('Original image')
plt.subplot(1, 2, 2)
plt.imshow(image_tensor_random_clamped.permute(1, 2, 0).numpy())
plt.axis('off')
plt.title('Random noise added')
plt.show()

<font color='blue'> TODO:</font> Use the function [torch.rand](https://pytorch.org/docs/stable/generated/torch.rand.html#torch.rand) to create a random tensor with the same size as the tensor 'image_tensor'. This random tensor will have values between 0 and 1. You can also use the function [torch.rand_like](https://pytorch.org/docs/stable/generated/torch.rand_like.html#torch.rand_like) to create a random tensor with the same size as another tensor.

In [None]:
random_vector = ### Your code here ###

<font color='blue'> TODO:</font> Use the function [torch.mul()](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.mul) to multiply the tensor by 'image_tensor'. You can also use the operator '*' or the method [.mul()](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.mul) to multiply the tensors.

In [None]:
image_tensor_random = ### Your code here ###




image_tensor_random_clamped = torch.clamp(image_tensor_random, 0, 1)
plt.figure(figsize=(9,3))
plt.subplot(1, 3, 1)
plt.imshow(image_tensor.permute(1, 2, 0).numpy())
plt.axis('off')
plt.title('Original image')
plt.subplot(1, 3, 2)
plt.imshow(image_tensor_random.permute(1, 2, 0).numpy())
plt.axis('off')
plt.title('Random noise multiplied')
plt.subplot(1, 3, 3)
plt.imshow(image_tensor_random_clamped.permute(1, 2, 0).numpy())
plt.axis('off')
plt.title('Random noise clamped')
plt.show()

<font color='blue'> TODO:</font> Use the function [torch.sub()](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.sub) and [torch.div()](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.div) to normalize the tensor between 0 and 1. You can also use the operators '-' and '/'. NB: to normalize the tensor, you need to subtract the minimum value and divide by the maximum value minus the minimum value.
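The min-max formula from the note above, $x_{norm} = (x - \min x) / (\max x - \min x)$, can be checked on a toy vector. This is a sketch on made-up values, not on the image tensor:

```python
import torch

x = torch.tensor([-1.0, 0.0, 3.0])   # toy values outside [0, 1]

# Min-max normalization: (x - min) / (max - min)
x_norm = torch.div(torch.sub(x, x.min()), x.max() - x.min())
same = (x - x.min()) / (x.max() - x.min())   # operator form

print(x_norm)
```

Unlike clamping, this rescaling keeps the relative ordering and spacing of all values, at the cost of changing values that were already inside [0, 1].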

In [None]:
image_tensor_random_normalized = ### Your code here ###


plt.figure(figsize=(9,3))
plt.subplot(1, 3, 1)
plt.imshow(image_tensor.permute(1, 2, 0).numpy())
plt.axis('off')
plt.title('Original image')
plt.subplot(1, 3, 2)
plt.imshow(image_tensor_random_clamped.permute(1, 2, 0).numpy())
plt.axis('off')
plt.title('Random noise multiplied')
plt.subplot(1, 3, 3)
plt.imshow(image_tensor_random_normalized.permute(1, 2, 0).numpy())
plt.axis('off')
plt.title('Normalized image')
plt.show()


### Batch Operations
In Deep Learning, we typically work with mini-batches of data. We will now introduce the concept of mini-batch and perform operations on mini-batches. While in a classical example batches are composed of different samples, in this example we will use the same sample with different random noise to simulate a mini-batch. We will use different methods to build the mini-batch and perform operations on the mini-batch.

<font color='blue'> TODO:</font> Use the method [.repeat()](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.repeat) to repeat the tensor 'image_tensor' 4 times along a new first dimension, so that the result has size [4, 3, 148, 149]. Then add some random normal noise to the tensor.

In [None]:
batch_tensor = ### Your code here ###


print("Batch tensor shape is: ", batch_tensor.shape)
print("Any identical sample in the batch? ", torch.all(batch_tensor[0] == batch_tensor[1]).item())

<font color='blue'> TODO:</font> Use the function [torch.cat()](https://pytorch.org/docs/stable/torch.html#torch.cat) to concatenate 4 copies of the tensor 'image_tensor' along the first dimension. Since torch.cat joins tensors along an existing dimension, first add a batch dimension to 'image_tensor' (for instance with unsqueeze(0)) so that the result has size [4, 3, 148, 149]. Then add some random normal noise to the tensor.

In [None]:
batch_tensor = ### Your code here ###


print("Batch tensor shape is: ", batch_tensor.shape)
print("Any identical sample in the batch? ", torch.all(batch_tensor[0] == batch_tensor[1]).item())

<font color='blue'> TODO:</font> Use the function [torch.stack()](https://pytorch.org/docs/stable/generated/torch.stack.html#torch.stack) to stack 4 copies of the tensor 'image_tensor' along a new first dimension. Then add some random normal noise to the tensor.


In [None]:
batch_tensor = ### Your code here ###

print("Batch tensor shape is: ", batch_tensor.shape)
print("Any identical sample in the batch? ", torch.all(batch_tensor[0] == batch_tensor[1]).item())

In the three previous examples, we have created a mini-batch by repeating the same sample with different random noise. In practice, we would have different samples in the mini-batch of size [4, 3, 148, 149]. 
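To compare the three ways of building a batch without the noise step, the following sketch uses a small zero tensor as a stand-in for an image (the sizes are arbitrary assumptions). It only inspects the resulting shapes:

```python
import torch

x = torch.zeros(3, 5, 6)   # toy "image": 3 channels, 5 x 6 pixels

repeated = x.repeat(4, 1, 1, 1)                        # prepends a batch dimension
stacked = torch.stack([x, x, x, x], dim=0)             # stacks along a new dimension
concatenated = torch.cat([x.unsqueeze(0)] * 4, dim=0)  # cat needs an existing dimension

# Without unsqueeze, cat grows the channel dimension instead of creating a batch.
wrong = torch.cat([x, x, x, x], dim=0)

print(repeated.shape, stacked.shape, concatenated.shape, wrong.shape)
```

The first three results share the shape [4, 3, 5, 6], while the last one has shape [12, 5, 6], which is a common source of confusion between `torch.cat` and `torch.stack`.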

<font color='blue'> TODO:</font> Use the function [torch.mean()](https://pytorch.org/docs/stable/generated/torch.mean.html#torch.mean) to compute the mean of the mini-batch. Then print the mean value of each sample. To do so, use the argument *dim* of the function to average over every dimension except the first (batch) one.

In [None]:
### Your code here ###

<font color='blue'> TODO:</font> Redefine a batch composed of 4 copies of the same sample (without noise). Then create a random uniform tensor of size [4, 3] and add it to the batch, so that within each image all pixels of the same channel receive the same value. You can use the function [torch.unsqueeze()](https://pytorch.org/docs/stable/torch.html#torch.unsqueeze) to add dimensions to the tensor, index the vector as *random_vector[:, :, None, None]*, or use the method [.view()](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.view) to reshape it.
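The broadcasting rule behind this exercise can be sketched on a tiny all-zero batch (the sizes and the name `offsets` are assumptions made for illustration). A tensor of size [4, 3, 1, 1] is expanded over the two spatial dimensions when added to a tensor of size [4, 3, H, W]:

```python
import torch

batch = torch.zeros(4, 3, 2, 2)   # toy batch: 4 samples, 3 channels, 2 x 2 pixels
offsets = torch.rand(4, 3)        # one value per (sample, channel)

# Three equivalent ways of turning [4, 3] into [4, 3, 1, 1]
a = offsets[:, :, None, None]
b = offsets.unsqueeze(-1).unsqueeze(-1)
c = offsets.view(4, 3, 1, 1)

shifted = batch + a   # broadcast over the two spatial dimensions
print(a.shape, shifted.shape)
```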

In [None]:
batch_tensor = ### Your code here ###

batch_tensor = torch.clamp(batch_tensor, 0, 1)
plt.figure(figsize=(12,3))
for i in range(4):
    plt.subplot(1, 4, i+1)
    plt.imshow(batch_tensor[i].permute(1, 2, 0).numpy())
    plt.axis('off')
    plt.title('Sample %d' % i)
plt.show()


## 2. Data Loader and Dataset

In this section, we will introduce the data loader and the dataset. We will start by loading the MNIST dataset and then we will pre-process the data. We will also introduce the concept of mini-batch and the data loader.

In [28]:
# Load MNIST dataset
from torchvision import datasets, transforms


To do so, we use the datasets from the torchvision module and the data loaders from torch.utils.data. A dataset is used to load the data, and a data loader is used to create mini-batches from it: the data loader is an iterable that returns one mini-batch per iteration. The dataset consists of two parts: training samples, used to train the model, and testing samples, used to evaluate it. The dataset classes take transforms as input to pre-process the data. Transforms can normalize the data, convert it to a tensor, or augment it.

The simplest, and mandatory, transform is [ToTensor()](https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.ToTensor), which converts the data to a tensor. The images are loaded as PIL images, converted to tensors, and rescaled to have values between 0 and 1. The MNIST dataset is composed of grayscale images of size 28x28 pixels.

In [29]:
transform = transforms.Compose([transforms.ToTensor()])

train_dataset = datasets.MNIST(root='.', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='.', train=False, download=True, transform=transform)

<font color='blue'> TODO:</font> Use the Python function *len()* to get the length of the training and testing datasets. You can also directly print the dataset or look at the shape of the tensors train_dataset.data and train_dataset.targets.

In [None]:
### Your code here ###

We can use the function [torch.utils.data.DataLoader](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) to create a data loader. The data loader takes as input the dataset, the batch size, and the shuffle parameter. The batch size is the number of samples in a mini-batch and the shuffle parameter is used to shuffle the data at each epoch. The data loader is an iterable that returns a mini-batch of data at each iteration. Here we will create a data loader with a batch size of 32 and shuffle the data only for the training dataset.
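To see how a data loader cuts a dataset into mini-batches without downloading anything, here is a small sketch on a synthetic dataset built with `torch.utils.data.TensorDataset`. The dataset of 10 samples and the batch size of 4 are arbitrary choices for illustration:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset: 10 samples with 2 features each and an integer label.
features = torch.arange(20, dtype=torch.float32).reshape(10, 2)
labels = torch.arange(10)
toy_dataset = TensorDataset(features, labels)

toy_loader = DataLoader(toy_dataset, batch_size=4, shuffle=False)

# Iterating over the loader yields (inputs, targets) pairs, one per mini-batch.
batch_sizes = [x.shape[0] for x, y in toy_loader]
print(len(toy_dataset), len(toy_loader), batch_sizes)
```

Since 10 is not a multiple of 4, the last mini-batch is smaller than the others. The same happens with MNIST whenever the dataset size is not a multiple of the batch size.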

In [None]:
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=32, shuffle=False)

batch_train_images, batch_train_targets = next(iter(train_loader))
batch_test_images, batch_test_targets = next(iter(test_loader))
print("Batch train images shape is: ", batch_train_images.shape)
print("Batch test images shape is: ", batch_test_images.shape)
plt.figure(figsize=(6,3))
for i in range(4):
    plt.subplot(2, 4, i+1)
    plt.imshow(batch_train_images[i].squeeze().numpy(), cmap='gray')
    plt.axis('off')
    plt.title('Label: %d' % batch_train_targets[i].item())
    plt.subplot(2, 4, i+5)
    plt.imshow(batch_test_images[i].squeeze().numpy(), cmap='gray')
    plt.axis('off')
    plt.title('Label: %d' % batch_test_targets[i].item())
plt.tight_layout()
plt.show()

<font color='blue'> TODO:</font> Run the previous code snippet multiple times and look at the data.

<font color='blue'> TODO:</font> Use different transforms to pre-process the data. You can use [torchvision.transforms.Compose](https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.Compose) to combine several transforms. You can use [torchvision.transforms.Normalize](https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.Normalize) to normalize the data; it takes the mean and standard deviation of the data as input. You can also use [torchvision.transforms.RandomHorizontalFlip](https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.RandomHorizontalFlip) to flip the data horizontally, [torchvision.transforms.RandomRotation](https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.RandomRotation) to rotate the data, and [torchvision.transforms.RandomResizedCrop](https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.RandomResizedCrop) to crop the data.

In [None]:
transform = transforms.Compose([
    ### Your code here ###
                                ])



train_dataset = datasets.MNIST(root='.', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='.', train=False, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=32, shuffle=False)
batch_train_images, batch_train_targets = next(iter(train_loader))
batch_test_images, batch_test_targets = next(iter(test_loader))
print("Batch train images shape is: ", batch_train_images.shape)
print("Batch test images shape is: ", batch_test_images.shape)
plt.figure(figsize=(6,3))
for i in range(4):
    plt.subplot(2, 4, i+1)
    plt.imshow(batch_train_images[i].squeeze().numpy(), cmap='gray')
    plt.axis('off')
    plt.title('Label: %d' % batch_train_targets[i].item())
    plt.subplot(2, 4, i+5)
    plt.imshow(batch_test_images[i].squeeze().numpy(), cmap='gray')
    plt.axis('off')
    plt.title('Label: %d' % batch_test_targets[i].item())
plt.tight_layout()
plt.show()


For the rest of the TP, we will use a simple preprocessing composed of the ToTensor() transform with a larger batch size to speed up the training:

In [189]:
transform = transforms.Compose([transforms.ToTensor()])
train_dataset = datasets.MNIST(root='.', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='.', train=False, download=True, transform=transform)

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=256, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=256, shuffle=False)
batch_train_images, batch_train_targets = next(iter(train_loader))
batch_test_images, batch_test_targets = next(iter(test_loader))  

## 3. Neural Network

In this section, we will introduce the neural network. We will start by defining the neural network architecture and then we will implement the forward pass. A neural network is composed of layers and activation functions. The layers transform the input data and the activation functions introduce non-linearity in the model. The network as a whole approximates a function that maps the input data to the output data. In PyTorch, neural networks are defined as classes that inherit from the class [torch.nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module). The layers are defined in the initialization method `__init__` and the forward pass is defined in the method `forward`.

For instance, we can define a linear model with one layer as follows:

```python
class MyModel(torch.nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.layer1 = torch.nn.Linear(784, 10) 

    def forward(self, x):
        ### Define the forward pass
        x = x.view(-1, 784) # Flatten the input tensor
        x = self.layer1(x) # Apply the first linear layer
        return x
```
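As a quick usage sketch, the single-layer model can be instantiated and applied to a random batch with the same shape as an MNIST mini-batch. The class is renamed `LinearModel` here (a hypothetical name) so that it does not clash with the `MyModel` you will define, and the random batch is only a stand-in for real images:

```python
import torch

class LinearModel(torch.nn.Module):
    """Single-layer model, renamed to avoid clashing with MyModel."""
    def __init__(self):
        super().__init__()
        self.layer1 = torch.nn.Linear(784, 10)

    def forward(self, x):
        x = x.view(-1, 784)       # Flatten each 1 x 28 x 28 image
        return self.layer1(x)     # One score per class

linear_model = LinearModel()
fake_batch = torch.rand(32, 1, 28, 28)   # random stand-in for a batch of MNIST images
logits = linear_model(fake_batch)

# Weights (784 * 10) plus biases (10)
n_params = sum(p.numel() for p in linear_model.parameters())
print(logits.shape, n_params)
```

Calling the model as `linear_model(fake_batch)` runs `forward` under the hood, and the output has one row per sample and one column per class.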

<font color='blue'> TODO:</font> Define a neural network with two hidden layers. The first hidden layer is composed of 128 neurons and the second hidden layer is composed of 64 neurons. The input layer is composed of 28x28 neurons and the output layer is composed of 10 neurons. The activation function is the ReLU function. You can use the function [torch.nn.ReLU](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html#torch.nn.ReLU) to define the ReLU function. You can also use the function [torch.nn.Linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear) to define a linear layer. The linear layer takes as input the number of input neurons and the number of output neurons. 

In [159]:
class MyModel(torch.nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
            ### Your code here ###

    def forward(self, x):
            ### Your code here ###
        return x
    
model = MyModel()

<font color='blue'> TODO:</font> Verify that your model is correctly defined by printing the model. Compute the output of the model on the first mini-batch of the training dataset.

In [None]:
### Your code here ###

<font color='blue'> TODO:</font> Print the output of the model for the first few samples of the mini-batch. Use the method [.argmax()](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.argmax) to get the index of the maximum value. You can also use the method [.max()](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.max) to get the maximum value and the index of the maximum value. Compare the output of the model with the target of the mini-batch. 

In [None]:
### Your code here ###

Since the model is not trained yet, it is normal that the output of the model is not correct. The accuracy of the model should be around 10%. 

<font color='blue'> TODO:</font> Compute the number of parameters to learn in the model. You can use the method [torch.nn.Module.parameters](https://pytorch.org/docs/stable/generated/torch.nn.Module.parameters.html#torch.nn.Module.parameters) to get the parameters of the model. You can also use the method [torch.nn.Module.named_parameters](https://pytorch.org/docs/stable/generated/torch.nn.Module.named_parameters.html#torch.nn.Module.named_parameters) to get the names of the parameters.

In [None]:
### Your code here ###

## 4. Training

In this section, we will see how to train a neural network using the Stochastic Gradient Descent (SGD) algorithm. We will start by defining the loss function and then we will implement the backward pass. The loss function measures the error of the model and the backward pass computes the gradients of the loss with respect to the parameters of the model. The gradients are used to update the parameters. Training is composed of multiple epochs, and at each epoch we iterate over the mini-batches of the training dataset. For each mini-batch, we compute the output of the model, the loss, and the gradients, then update the parameters of the model using the gradients.

<font color='blue'> TODO:</font> Define the loss function and store it in a variable named `criterion`, which is used in the cells below. You can use [torch.nn.CrossEntropyLoss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss) to define the Cross Entropy Loss if your model outputs raw scores (no Softmax output activation layer). If your model instead ends with a LogSoftmax layer, you can use [torch.nn.NLLLoss](https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html#torch.nn.NLLLoss) to define the Negative Log Likelihood Loss, which expects log-probabilities as input.
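The relationship between the two losses can be checked numerically. The sketch below uses random scores and made-up targets (both are assumptions for illustration) and compares `CrossEntropyLoss` on raw scores with `NLLLoss` on log-softmax outputs:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 10)               # raw model outputs: 5 samples, 10 classes
targets = torch.tensor([3, 0, 9, 1, 4])   # hypothetical class indices

ce = torch.nn.CrossEntropyLoss()(logits, targets)
nll = torch.nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

print(ce.item(), nll.item())
```

The two values agree up to floating-point error, which is why a model trained with `CrossEntropyLoss` should not apply a Softmax layer to its output.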

In [163]:
### Your code here ###

To perform the gradient step we need to define an optimizer. The optimizer is used to update the parameters of the model. The optimizer takes as input the parameters of the model and the learning rate. The learning rate is a hyperparameter that controls the step size of the gradient descent. Each optimizer has a different update rule. The most common optimizer is the Stochastic Gradient Descent (SGD) optimizer. 

Here, we chose to use SGD with a learning rate of 0.01. You can also use the function [torch.optim.Adam](https://pytorch.org/docs/stable/optim.html#torch.optim.Adam) to define the Adam optimizer. The Adam optimizer is an adaptive learning rate optimizer that is commonly used in practice.
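To make the SGD update rule concrete, here is a minimal sketch on a toy parameter vector, unrelated to the model above. The names `w` and `toy_optimizer` and the quadratic loss are assumptions chosen so that the gradient is easy to verify by hand:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)   # toy parameter vector
toy_optimizer = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()     # the gradient of this loss is 2 * w
toy_optimizer.zero_grad()
loss.backward()
grad_before_step = w.grad.clone()

toy_optimizer.step()      # w <- w - lr * grad
print(grad_before_step, w.detach())
```

With a learning rate of 0.1 and a gradient of [2, 4], the parameters move from [1, 2] to [0.8, 1.6].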

In [164]:
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

<font color='blue'> TODO:</font> Print the gradients of the model. You can use the attribute [.grad](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.grad) of each parameter and the method [model.parameters()](https://pytorch.org/docs/stable/generated/torch.nn.Module.parameters.html#torch.nn.Module.parameters) to get the parameters of the model.


In [None]:
### Your code here ###

The backpropagation has not been performed yet and therefore the gradients are not computed. To compute the gradients, we need to perform the backward pass. And to perform the backward pass, we need to compute the loss. The loss is computed using the output of the model and the target. The target is the true label of the data. We will do one iteration of the training loop to compute the loss and the gradients:

In [166]:
optimizer.zero_grad()                           # Reset gradients
images, labels = next(iter(train_loader))       # Load a batch
output = model(images)                          # Forward pass
loss = criterion(output, labels)                # Compute loss
loss.backward()                                 # Backward pass

<font color='blue'> TODO:</font> Print the gradients of the model again. This time, print only the mean of the gradients for each layer.

In [None]:
### Your code here ###

<font color='blue'> TODO:</font> Print the bias of the second layer of the model. You can use the attribute [.bias](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear.bias) of the layer.


In [None]:
### Your code here ###

Now that the gradient for every weight has been calculated, we can update the weights of the model:

In [169]:
optimizer.step()

<font color='blue'> TODO:</font> Print again the bias of the second layer of the model and observe the difference.

In [None]:
### Your code here ###

The model's weights have been updated. We can now train the model for multiple epochs. We will iterate over the mini-batches of the training dataset and update the parameters of the model.

<font color='blue'> TODO:</font> Write a function to train the model for any number of epochs. At each epoch, iterate over the mini-batches of the training dataset and update the parameters of the model. Print the average loss over the dataset after each epoch. Run your function for 5 epochs.

In [175]:
def training_function(n_epoch, model, train_loader, criterion, optimizer):
    for epoch in range(n_epoch):
        ### Your code here ###

model = MyModel()
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
training_function(5, model, train_loader, criterion, optimizer)

We can observe the data, the target, and the output of the model for a few samples:

In [None]:
plt.figure(figsize=(12,6))
for i, (image, label) in enumerate(zip(batch_test_images[:8], batch_test_targets[:8])):
    with torch.no_grad():               # No gradients are needed for inference
        output = model(image)           # A single image of shape [1, 28, 28]
    plt.subplot(2, 4, i+1)
    plt.imshow(image.squeeze().numpy(), cmap='gray')
    plt.axis('off')
    plt.title('True: %d - Predicted: %d' % (label.item(), output.argmax().item()))
plt.show()

## 5. Evaluating the model

While training the model is crucial to learn the parameters of the model, evaluating the model is important to assess the performance of the model. We will evaluate the model on the testing dataset. We will compute the accuracy of the model and the confusion matrix. The accuracy is the number of correct predictions divided by the total number of predictions. The confusion matrix is a matrix that shows the number of correct and incorrect predictions for each class.
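Both metrics can be illustrated on a handful of made-up predictions before applying them to the model. In this sketch the labels, the predictions, and the use of `torch.bincount` to build the matrix are assumptions chosen for illustration:

```python
import torch

targets = torch.tensor([0, 1, 2, 2, 1, 0])   # hypothetical true labels
preds = torch.tensor([0, 2, 2, 2, 1, 1])     # hypothetical predictions
n_classes = 3

# Accuracy: fraction of predictions equal to the target
accuracy = torch.eq(preds, targets).float().mean().item()

# Confusion matrix: rows index the true class, columns the predicted class.
cm = torch.bincount(targets * n_classes + preds, minlength=n_classes ** 2)
cm = cm.reshape(n_classes, n_classes)

print(accuracy)
print(cm)
```

The diagonal of the matrix counts the correct predictions, so its sum divided by the total number of samples gives the accuracy again.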

<font color='blue'> TODO:</font> Write a function to evaluate the model. Compute the accuracy of the model on the testing dataset. You can use the function [torch.argmax](https://pytorch.org/docs/stable/generated/torch.argmax.html#torch.argmax) to get the index of the maximum value, the function [torch.eq](https://pytorch.org/docs/stable/generated/torch.eq.html#torch.eq) to compare the predictions of the model with the targets, and the function [torch.sum](https://pytorch.org/docs/stable/generated/torch.sum.html#torch.sum) to count the correct predictions. To do so, you will have to loop over the mini-batches of the testing dataset.

In [None]:
def testing_function(model, test_loader):
    ### Your code here ###

testing_function(model, test_loader)

<font color='blue'> TODO:</font> Run your training function for one extra epoch then evaluate the model. Compare the accuracy of the model before and after training.

In [None]:
### Your code here ###

<font color='blue'> TODO:</font> Write a function that computes and returns the confusion matrix of the model. You can use the Scikit-learn function [sklearn.metrics.confusion_matrix](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html) to compute the confusion matrix.

In [None]:
import sklearn.metrics

def testing_function(model, test_loader):
        ### Your code here ###
    return confusion_matrix


confusion_matrix = testing_function(model, test_loader)
plt.figure(figsize=(6,6))
plt.imshow(confusion_matrix, cmap='Blues')
plt.colorbar()
plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.show()
