PyTorch, Tensors, and deep-learning frameworks (A complete tutorial)

AI Maverick
13 min read · Nov 28, 2022

Introduction

PyTorch has received a great deal of interest in recent years because of its flexibility for building large-scale deep-learning frameworks. As computer science moves forward, we categorize the fields of artificial intelligence into two groups: machine learning and deep learning. There are different libraries for building deep models, including TensorFlow, Keras, and PyTorch. PyTorch is an optimized and flexible tensor library for deep-model computation on GPU and CPU.

In this article, we will learn about the basic concepts of tensors, NumPy arrays, torch, and deep models in PyTorch, and we will build a deep model with a practical Jupyter Notebook that you can download or work on online.

When we apply more than one layer of learning to an algorithm, we have a deep-learning framework.

The advantages of PyTorch for building deep-learning frameworks come from the following modules (a minimal sketch combining them follows this list):

  • Autograd: Neural-network models need to compute gradients during the backward pass. Autograd records the operations performed on tensors and uses that record to feed the gradients back to the network.
  • Optim: In training a neural network, we need to minimize the loss function with an optimization technique. This class provides various optimization methods such as SGD and LBFGS.
  • NN: With this class, we can define the layers and functions of the network and connect them to run the tensor operations.
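
As an illustration of how these three modules fit together, here is a minimal, hypothetical sketch; the layer sizes and the random data are made up for demonstration:

import torch
import torch.nn as nn

# NN: define a tiny one-layer network
model = nn.Linear(3, 1)

# Optim: stochastic gradient descent over the model parameters
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Made-up input and target data
x = torch.rand(8, 3)
y = torch.rand(8, 1)

pred = model(x)                # forward pass
loss = nn.MSELoss()(pred, y)   # compute the loss

optimizer.zero_grad()          # clear old gradients
loss.backward()                # Autograd computes the gradients
optimizer.step()               # Optim updates the weights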

To learn from huge datasets, we have to apply deeper models.

About PyTorch

PyTorch was developed by Facebook to build machine-learning and deep-learning tools, including the processing of large-scale images. It is built upon C++ and Python and can be installed on Windows, macOS, and Linux. You can install PyTorch with pip as well, but it is recommended to use Anaconda for this purpose.
To do so, you can install a recent version of PyTorch as follows:

conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=10.2 -c pytorch

Note that the version of CUDA, which is a GPU-acceleration toolkit developed by Nvidia, should be compatible with the version of PyTorch.

To check that the installation is correct, import torch in a Python environment and check its version:

import torch
print(torch.__version__)
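
You can also check which CUDA build your installation was compiled against, and whether a GPU is actually visible; for example:

print(torch.version.cuda)        # the CUDA version PyTorch was built with (None for CPU-only builds)
print(torch.cuda.is_available()) # True if a compatible GPU and driver are present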

To access the list of the PyTorch versions and their related CUDA version, please refer here.

Start Working With PyTorch

Tensors

The tensor is the main concept in PyTorch; therefore, we need to know its features and how to work with it.

x = torch.rand(1, 2, 3)
print(x.shape)

>> torch.Size([1, 2, 3])

We can see that we have a random tensor with three dimensions here, and we can confirm that it is indeed a tensor:

torch.is_tensor(x)
>> True

Another example is a 2D tensor, this time filled with zeros:

torch.zeros((2, 2))

Linear and logarithmic spacing:

linear = torch.linspace(1, 2, 10)
log = torch.logspace(1, 2, 10)

We can see that building tensors is very similar to NumPy, for example with arange:

torch.arange(10)

>> tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

We can also build an array with NumPy, or import one, and convert it to a torch tensor:

import numpy as np

x = np.array(x)
print(type(x))

>> <class 'numpy.ndarray'>


x = torch.from_numpy(x)
print(type(x))

>> <class 'torch.Tensor'>

Random number

In many data-science studies, we need to generate random data points in a space. We can use statistical distributions to build them. One example is the uniform distribution, where every outcome has the same probability as the others.

torch.rand(1, 2, 3)

>> tensor([[[0.1812, 0.5253, 0.5734],
[0.7365, 0.4622, 0.0479]]])

Max and Min

One important task in machine learning is finding the minimum and maximum values in an array. We usually use it in classification to find the class with the highest probability in a tensor.

x = torch.rand(10, 5)
torch.argmin(x, axis=1)

>> tensor([0, 0, 2, 1, 1, 0, 1, 4, 3, 0])
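
For classification we typically use argmax in the same way, picking the class with the highest score. A small sketch with made-up scores:

scores = torch.rand(4, 3)                 # 4 samples, 3 classes (random, illustrative values)
predicted = torch.argmax(scores, dim=1)   # index of the highest score per sample
print(predicted)                          # e.g. tensor([2, 0, 1, 1]); the values depend on the draw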

Building different chunks from a tensor

A tensor can be concatenated with another tensor or even split into various chunks along different axes.

torch.chunk(x, 2)

>> (tensor([[0.0602, 0.8056, 0.9265, 0.3025, 0.6211],
[0.1000, 0.8482, 0.2863, 0.5860, 0.2219],
[0.8906, 0.5601, 0.5190, 0.9763, 0.7135],
[0.9637, 0.2302, 0.8827, 0.6149, 0.6066],
[0.8034, 0.3152, 0.9617, 0.9281, 0.6613]]),
tensor([[0.0990, 0.0997, 0.9118, 0.1010, 0.8384],
[0.3959, 0.2461, 0.5256, 0.5889, 0.6299],
[0.1652, 0.1588, 0.1475, 0.7929, 0.1248],
[0.4717, 0.4351, 0.8555, 0.1771, 0.3130],
[0.0801, 0.3683, 0.7401, 0.6129, 0.3549]]))

torch.chunk(x, 2, dim=1)

>> (tensor([[0.0602, 0.8056, 0.9265],
[0.1000, 0.8482, 0.2863],
[0.8906, 0.5601, 0.5190],
[0.9637, 0.2302, 0.8827],
[0.8034, 0.3152, 0.9617],
[0.0990, 0.0997, 0.9118],
[0.3959, 0.2461, 0.5256],
[0.1652, 0.1588, 0.1475],
[0.4717, 0.4351, 0.8555],
[0.0801, 0.3683, 0.7401]]),
tensor([[0.3025, 0.6211],
[0.5860, 0.2219],
[0.9763, 0.7135],
[0.6149, 0.6066],
[0.9281, 0.6613],
[0.1010, 0.8384],
[0.5889, 0.6299],
[0.7929, 0.1248],
[0.1771, 0.3130],
[0.6129, 0.3549]]))

Split by indices

A common approach to splitting a tensor is to use indices. For this, we first need our list of indices and the tensor to be split. In the following, I select only the rows at indices 0, 5, and 4 (the first, sixth, and fifth rows, axis = 0) of the tensor x. First, we print the x tensor and then select the specific rows.

print(x)

>> tensor([[0.0602, 0.8056, 0.9265, 0.3025, 0.6211],
[0.1000, 0.8482, 0.2863, 0.5860, 0.2219],
[0.8906, 0.5601, 0.5190, 0.9763, 0.7135],
[0.9637, 0.2302, 0.8827, 0.6149, 0.6066],
[0.8034, 0.3152, 0.9617, 0.9281, 0.6613],
[0.0990, 0.0997, 0.9118, 0.1010, 0.8384],
[0.3959, 0.2461, 0.5256, 0.5889, 0.6299],
[0.1652, 0.1588, 0.1475, 0.7929, 0.1248],
[0.4717, 0.4351, 0.8555, 0.1771, 0.3130],
[0.0801, 0.3683, 0.7401, 0.6129, 0.3549]])

indices = torch.LongTensor([0, 5, 4])
# Split on the rows
split = torch.index_select(x, 0, indices)
print(split)

>> tensor([[0.0602, 0.8056, 0.9265, 0.3025, 0.6211],
[0.0990, 0.0997, 0.9118, 0.1010, 0.8384],
[0.8034, 0.3152, 0.9617, 0.9281, 0.6613]])

The methods we worked through were just a sample of all the operations we can apply to tensors with PyTorch. There are other functions, including sigmoid, log, pow, add, frac, exp, unbind, and transpose, that one can use for different purposes.
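
For instance, a few of these element-wise and shape operations applied to the tensor x from above might look like this:

print(torch.sigmoid(x))           # element-wise sigmoid
print(torch.log(x))               # element-wise natural logarithm
print(x.pow(2))                   # element-wise square
print(torch.transpose(x, 0, 1))   # swap the two dimensions (5 x 10 instead of 10 x 5)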

Probability Distributions

Probabilities are one of the main concepts behind PyTorch and deep models. Therefore, it is essential to know the distributions and their functionality. Moreover, interpreting the results requires this knowledge.

There are different types of distributions in statistics with which we can build random or stochastic variables. The distributions are categorized into the following groups:

  • Continuous distributions
  • Multivariate distributions
  • Discrete distributions

We mostly work with continuous distributions such as uniform and normal. You can access different distributions through torch.distributions.

One place where these distributions are useful is in determining the initial values of the weights in different neural-network structures. Since in the early epochs we have not yet trained the weights, we need random values for them. This is where we rely on random statistical distributions.
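
As a small, hedged sketch of this idea, we could draw a block of initial weights from a normal distribution with a deliberately small standard deviation (the shape and the value 0.1 are assumptions for illustration):

from torch.distributions import Normal

init = Normal(loc=0.0, scale=0.1)   # N(0, 0.1)
weights = init.sample((3, 3))       # a 3 x 3 block of initial weights
print(weights)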

Bernoulli distribution

The Bernoulli distribution is a discrete distribution whose outcome is either zero or one, indicating whether an event happens. We calculate its probability mass function as in Equation 1.

Equation 1. Bernoulli distribution (probability mass function): P(X = k) = p^k * (1 - p)^(1 - k), for k in {0, 1}

Therefore, we can generate a random variable from a Bernoulli distribution applied to a random tensor with values between zero and one.

events = torch.Tensor(2, 2).uniform_(0, 1)
torch.bernoulli(events)

>> tensor([[0., 1.],
[1., 0.]])

Normal distribution

To calculate the outcome, we use the normal probability density function (Equation 2).

Equation 2. The normal probability density function: f(x) = 1 / (σ√(2π)) * exp(-(x - μ)² / (2σ²))

And to generate the random variable we have

w_normal = torch.normal(mean=torch.arange(1., 11.), std=0.1)
print(w_normal)

>> tensor([0.9515, 1.9699, 2.9411, 3.9949, 4.9641, 5.8932, 7.1156, 8.1668, 8.9156,
9.9695])

As practice, try to generate random variables with the multinomial and uniform distributions.
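
One possible solution sketch (the shapes and weights here are made up for illustration):

# Uniform: fill a tensor with draws from U(0, 1)
u = torch.empty(2, 3).uniform_(0, 1)

# Multinomial: draw 4 indices (without replacement) according to the given weights
weights = torch.tensor([0.1, 0.2, 0.3, 0.4])
m = torch.multinomial(weights, 4)

print(u)
print(m)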

Variable

A Variable in PyTorch can store gradients and a reference to its source function (to learn more about the Variable and its functionality, please refer to the computational graph).
We use Variables in PyTorch mainly to store the partial derivatives of the loss function with respect to the weights and biases of the neural network.

Please note that in PyTorch, the Variable is built as a wrapper around the tensor, and its values remain unchanged during backpropagation. It holds data, grad, and grad_fn. (Since PyTorch 0.4, Variable has been merged into Tensor, so a tensor created with requires_grad=True behaves the same way.)

from torch.autograd import Variable

x = torch.normal(mean=torch.arange(1., 11.), std=0.1)
variable = Variable(x, requires_grad=True)

print(variable)

>> tensor([ 1.1980, 1.9739, 2.9814, 4.0895, 4.9490,
6.1402, 7.0451, 7.9064, 8.8287, 10.0382], requires_grad=True)

Compute gradients

As you already know, and as we saw earlier in this article, the gradient, or slope, is a central part of neural-network computation. Having learned about the Variable, we know that we can store the gradient in it. Hence, in the following, we compute basic gradients and store them in a Variable to get an idea of how gradient calculations work in a neural-network structure.

First, we define a class that returns a loss value.

class model():

    def __init__(self, w):
        self.w = w

    def _forward(self, x, w):
        return x * w

    def loss_(self, x, y):
        pred = self._forward(x, self.w)
        return (pred - y)**2

Now we update the loss over two epochs and print the gradients of the weight with respect to the different losses.

X = [12., 13., 14.]
Y = [15., 16., 17.]

w = Variable(torch.FloatTensor([1.]), requires_grad=True)

example = model(w)

for epoch in range(2):
    for x, y in zip(X, Y):

        loss = example.loss_(x, y)
        loss.backward()

        print(w.grad.data[0])
        w.data = w.data - 0.01 * w.grad.data[0]

        w.grad.data.zero_()

    print("epoch:", epoch, "loss value:", loss.data[0])
    print("-------")

>> tensor(-72.)
tensor(165.3600)
tensor(-449.9712)
epoch: 0 loss value: tensor(258.2578)
-------
tensor(955.0402)
tensor(-2100.6899)
tensor(5804.8628)
epoch: 1 loss value: tensor(42980.1445)
-------

We can see that the gradient of the weight changes at every step as the loss changes. Note that the loss actually grows here: with inputs of this magnitude, a learning rate of 0.01 is too large, so the updates overshoot and the weight diverges; a smaller learning rate (for example, 0.001) would make the loss decrease instead.

Basically, backward is used when we want to compute the gradients of the current tensor with respect to the leaves of the computational graph.
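
As a minimal illustration of backward on a scalar built from a leaf tensor:

t = torch.tensor(3.0, requires_grad=True)   # a leaf tensor
f = t ** 2 + 2 * t                          # a scalar function of t
f.backward()                                # computes df/dt and stores it in t.grad
print(t.grad)                               # tensor(8.), since df/dt = 2t + 2 = 8 at t = 3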

CNN with PyTorch

Now that we have learned about statistical distributions, tensors, and random variables in the computational graph, it is a good time to build our deep-model structure.

Build and work with the loss function

We want to minimize a loss function through an optimization process in the network. Therefore, choosing a proper loss function based on the nature of the problem is essential. In the following, we will learn about the MSE loss function with an example.

If you need to have the full Python code as a Jupyter NoteBook, please refer here.

First, we define two tensors that could be the input and output of the model.

tensor_a = torch.normal(mean = torch.arange(1., 11.), std=torch.arange(1, 0, -0.1))
tensor_b = torch.normal(mean = torch.arange(5., 15.), std=torch.arange(1, 0, -0.1))

print(tensor_a, '\n')
print(tensor_b)

>> tensor([ 0.9949, 2.2558, 4.5544, 5.3334, 5.1273, 6.5911, 7.5059, 7.8742,
9.1304, 10.0986])

tensor([ 6.0614, 6.9249, 7.1897, 7.2517, 8.5498, 9.9838, 11.3588, 12.0659,
13.3526, 13.9605])

Later, we compute the output of the model with the fit method and return the loss with the loss method of the class.

class model():

    def __init__(self, x, y, b, w, eta):
        self.x = x
        self.y = y
        self.b = b
        self.w = w
        self.eta = eta

    def fit(self, w, x, b):
        return w * x + b

    def loss(self, w):
        pred = self.fit(w, self.x, self.b)
        sq = (pred - self.y) ** 2
        return sq.mean()

    def out(self, lr):
        w1, w2 = self.update()
        loss_1 = self.loss(self.fit(w1, self.x, self.b))
        loss_2 = self.loss(self.fit(w2, self.x, self.b))
        rate = loss_1 - loss_2
        return self.w - lr * rate

    def update(self):
        return self.w + self.eta, self.w - self.eta


x = torch.normal(mean=torch.arange(1., 11.), std=torch.arange(1, 0, -0.1))
y = torch.normal(mean=torch.arange(5., 15.), std=torch.arange(1, 0, -0.1))

w = torch.ones([1])
b = torch.ones([1])
out = model(x=x, y=y, b=b, w=w, eta=0.2)
print(out.loss(w))

>> tensor(12.2774)

As I promised in the introduction, I will interpret each result. Here we can see that the error is fairly small and reasonable; what do you think the reason is? It is because the input and output of the model were drawn from normal distributions with moderate standard deviations, so even the initialized weights are not far off. If we update the weights, we will get a smaller error as well. Now let's try updating the model:

print(out.out(1e-3))

>> tensor([-1.0396])

We saw how the loss function works and how we can update the weights to reduce the error. Of course, this was just a simple example to demonstrate how loss functions behave in PyTorch. You can use loss functions directly from torch.nn:

loss_ = torch.nn.MSELoss()
loss = loss_(x, y)

Gradient

To find the slope of the loss function, we need to approximate its derivative. In the following, we approximate the gradient with a grad method of the model class and use it to update the parameters over several epochs:

import torch

x = torch.randn(5, 2, requires_grad=True)
y = torch.randn(5, 2)

params = torch.tensor([1.0, 0.0])
epochs = 10
lr = 1e-2


class model():

    def __init__(self, x, y, b, w):
        self.x = x
        self.y = y
        self.b = b
        self.w = w

    def fit(self):
        return self.w * self.x + self.b

    def loss(self, pred):
        sq = (pred - self.y) ** 2
        return sq.mean()

    def grad(self, pred):
        pred = self.fit()
        loss_1 = self.loss(pred) * self.x
        loss_2 = self.loss(pred) * 1.0
        return torch.stack([loss_1.mean(), loss_2.mean()])


for epoch in range(epochs):

    w, b = params
    out = model(x, y, b, w)
    pred = out.fit()

    loss = out.loss(pred)
    print("loss value: % f " % (float(loss)))

    grad = out.grad(pred)

    print("Grad:", grad)
    params = params - lr * grad


>> loss value: 3.018057
Grad: tensor([-0.4668, 3.0181], grad_fn=<StackBackward0>)
loss value: 3.001167
Grad: tensor([-0.4642, 3.0012], grad_fn=<StackBackward0>)
loss value: 2.986313
Grad: tensor([-0.4619, 2.9863], grad_fn=<StackBackward0>)
loss value: 2.973456
Grad: tensor([-0.4599, 2.9735], grad_fn=<StackBackward0>)
loss value: 2.962558
Grad: tensor([-0.4582, 2.9626], grad_fn=<StackBackward0>)
loss value: 2.953590
Grad: tensor([-0.4569, 2.9536], grad_fn=<StackBackward0>)
loss value: 2.946528
Grad: tensor([-0.4558, 2.9465], grad_fn=<StackBackward0>)
loss value: 2.941352
Grad: tensor([-0.4550, 2.9414], grad_fn=<StackBackward0>)
loss value: 2.938047
Grad: tensor([-0.4544, 2.9380], grad_fn=<StackBackward0>)
loss value: 2.936602
Grad: tensor([-0.4542, 2.9366], grad_fn=<StackBackward0>)

We can see the loss decreasing over the epochs. The results also demonstrate the influence of the number of epochs and of the learning rate; by increasing the number of epochs and tuning the learning rate, we can obtain better convergence.

Note that when tuning these parameters, we should also be careful to avoid overfitting the model.

TORCH.OPTIM

As the PyTorch documentation says, torch.optim is a package implementing various optimization algorithms.

The optimizer in a neural network updates the weights (and, in adaptive methods, the effective learning rate) to reduce the loss. Therefore, everything we did manually in the previous steps is handled by the selected optimizer in the neural network.

There are several optimizers implemented in PyTorch, each with its own approach, parameters, and formulation for solving the learning task. The optimizers include:

['ASGD',
'Adadelta',
'Adagrad',
'Adam',
'AdamW',
'Adamax',
'LBFGS',
'NAdam',
'Optimizer',
'RAdam',
'RMSprop',
'Rprop',
'SGD',
'SparseAdam',]

The primary optimizer is SGD; further solvers have been introduced in recent years. Adam is the one I usually use in my work because it adapts the learning rate for each parameter.

params = [torch.tensor([1e-3, 1e-2], requires_grad=True)]
optimizer = torch.optim.Adam(params=params)
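
In practice, the optimizer is used in a loop together with backward; here is a minimal, hypothetical sketch with a made-up linear model and random data:

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(16, 4)
y = torch.rand(16, 1)

for epoch in range(5):
    optimizer.zero_grad()                    # reset the gradients from the previous step
    loss = torch.nn.MSELoss()(model(x), y)   # forward pass and loss
    loss.backward()                          # compute the gradients
    optimizer.step()                         # let Adam update the parameters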

Convolutional Neural Network (CNN)

Here we will learn to implement the Convolutional Neural Network via PyTorch.

MNIST Dataset

The dataset we use to train the CNN model is the well-known MNIST (Modified National Institute of Standards and Technology database) dataset. It contains handwritten digits in many different handwriting styles. It has ten labels, the digits from zero to nine, and 60,000 training and 10,000 test instances. You can see an instance of the first label below, in Figure 1.

Figure 1. The first image of the MNIST Dataset

To use this dataset as input, we need to normalize it; the ToTensor transform from torchvision will scale the input PIL image to the range 0.0 to 1.0.

import torchvision

train_data = torchvision.datasets.MNIST(root='./mnist',
                                        train=True,
                                        transform=torchvision.transforms.ToTensor(),
                                        download=True)

Implementing CNN

hyper-parameters

To implement the CNN via PyTorch, we have to set up hyper-parameters. Hence, in the following, I define the hyper-parameters to give a clear idea before training the model (an example sketch with assumed values follows this list):

  • Batch_size: the number of samples used to train the model in each step of an epoch.
  • Learning_rate: the step size that scales the loss gradient when updating the weights.
  • Epochs: the number of full passes over the training data.
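
As an example, the values below are assumptions chosen for illustration (the notebook linked above may use different ones); the DataLoader wraps the train_data defined earlier into shuffled batches:

batch_size = 100       # assumed value
learning_rate = 1e-3   # assumed value
num_epochs = 5         # assumed value

train_loader = torch.utils.data.DataLoader(dataset=train_data,
                                           batch_size=batch_size,
                                           shuffle=True)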

Model Structure

We build the model's layers in a subclass called CNN, inherited from the PyTorch nn.Module. The class has only two methods: __init__ and forward.

You can find the model structure in the following;

CNN(
  (layer1): Sequential(
    (0): Conv2d(1, 16, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (layer2): Sequential(
    (0): Conv2d(16, 32, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (layer3): Sequential(
    (0): Conv2d(32, 64, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
)

And the model code, here;

import torch.nn as nn


class CNN(nn.Module):

    CHANNELS = [1, 16, 32, 64]

    def __init__(self):
        super(CNN, self).__init__()
        CHANNELS = self.CHANNELS

        self.layer1 = nn.Sequential(nn.Conv2d(
            in_channels=CHANNELS[0], out_channels=CHANNELS[1], kernel_size=4, padding=1),
            nn.BatchNorm2d(CHANNELS[1]),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))

        self.layer2 = nn.Sequential(nn.Conv2d(
            in_channels=CHANNELS[1], out_channels=CHANNELS[2], kernel_size=4, padding=1),
            nn.BatchNorm2d(CHANNELS[2]),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))

        self.layer3 = nn.Sequential(nn.Conv2d(
            in_channels=CHANNELS[2], out_channels=CHANNELS[3], kernel_size=4, padding=1),
            nn.BatchNorm2d(CHANNELS[3]),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = self.layer3(out)

        # The fully connected head is sized from the convolutional output. Because these
        # layers are built inside forward, they are re-created on every call and are not
        # registered as model parameters; in practice they are usually defined in __init__.
        fc1 = nn.Linear(out.shape[1] * out.shape[2] * out.shape[3], 100)
        dropout1 = nn.Dropout(0.25)
        fc2 = nn.Linear(100, 10)

        out = out.view(out.size(0), -1)
        out = fc1(out)
        out = dropout1(out)
        out = fc2(out)

        return out
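
A quick way to sanity-check the forward pass is to feed a dummy MNIST-sized batch through the network and inspect the output shape (the dummy input below is an illustrative assumption):

model = CNN()
dummy = torch.randn(1, 1, 28, 28)   # one fake grayscale image the size of an MNIST digit
print(model(dummy).shape)           # expected: torch.Size([1, 10]), one score per digit class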

Model Training

To train the model, we pass batches of images from our dataset to the model and compute the network's loss with the nn.CrossEntropyLoss() criterion (stored in error), which we have already imported. This is followed by computing the gradients in backpropagation and updating the weights with the optimizer.
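
The loop below assumes that the device, model, loss criterion (error), optimizer, and number of epochs have already been created; a plausible setup, with the learning rate and epoch count as assumptions, would be:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = CNN().to(device)
error = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # assumed learning rate
num_epochs = 5                                              # assumed number of epochs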

for epoch in range(num_epochs):
    for images, labels in train_loader:

        # Transferring images and labels to the available device
        images, labels = images.to(device), labels.to(device)

        # Convert train data to Variable to store the gradients
        train = Variable(images.view(images.shape))
        labels = Variable(labels)

        # Forward pass
        outputs = model(train)
        loss = error(outputs, labels)

        # Initializing the gradients to zero to avoid mixing gradients between batches
        optimizer.zero_grad()

        # Propagating backward
        loss.backward()

        # Optimizing the parameters
        optimizer.step()

To have the main code for training the model, please refer here.

Conclusion

In this article, we introduced the well-known deep-learning framework for the Python environment developed by Facebook, called PyTorch. We began by installing PyTorch on different operating systems and importing the library.
We then started with PyTorch from scratch and introduced tensors and Variables in PyTorch.

To have a better idea about deep neural networks, we need to know about the loss function, statistical distributions, weights, optimizers, and gradients of the loss. Hence, we introduced each of these elements with a proper example and interpretation.

Finally, we built a deep convolutional neural network and trained it on the well-known image-classification dataset called MNIST.

You can also get the code of this article as a Jupyter Notebook here.
