Ever wondered how your phone’s facial recognition system works? Or how Facebook automatically tags your friends in photos? Or how medical systems identify potential health issues just from X-rays and scans?
In this series of articles, we will build an understanding, from scratch, of the mechanism behind all these visual recognition processes.
To comprehend the intricate workings of visual recognition, it’s essential to begin with the foundation on which the actual mechanism is built.
Neural Networks
Neural Networks are AI algorithms designed to mimic the way a human brain works. Their primary purpose is to enable computer programs to recognize patterns and draw conclusions from data. But how exactly do these algorithms replicate the working of a human brain?
Here is an image of what a neural network looks like. We will come back to this image later, and by then you will understand what’s going on in it.
Let’s build a simple neural network of our own to understand how it functions.
Building our own Neural Network
Suppose we have 3 shapes (circle, square, triangle) and a box with 3 sections. Each section can contain 1 shape.
Since each of the 3 sections can hold any of the 3 shapes, there are 3 × 3 × 3 = 27 possible combinations. Let’s number each combination so it can be uniquely identified. Here they are:
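To make the count concrete, here is a minimal Python sketch that lists every combination and gives each one a number. The numbering order is an assumption (1 to 27 in the order that itertools.product generates them); the numbering in the figure may differ.

```python
from itertools import product

SHAPES = ["circle", "square", "triangle"]

# Each box has 3 sections, and each section holds one of 3 shapes,
# so there are 3 * 3 * 3 = 27 combinations.
combinations = list(product(SHAPES, repeat=3))

# Assumed numbering: 1..27 in the order product() yields the combinations.
box_numbers = {combo: number for number, combo in enumerate(combinations, start=1)}

print(len(combinations))  # 27
print(box_numbers[("circle", "circle", "circle")])  # 1
```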
Now we will build a simple algorithm that returns the number of a box when we provide it with just the image of that box.
Since we are trying to learn how a neural network works, we assume that the computer already knows a square is a square, a circle is a circle, and a triangle is a triangle.
Suppose this is the input we provide to our neural network. We want our algorithm to provide the number of this box.
Step 1: Identifying the shape in the first section of the box
Here there are 3 possibilities. Let’s label this layer 1.
In our case, the first possibility is true, so we proceed with that one. Hence, the line pointing to it is thicker than the others.
Step 2: Identifying the shape in the second section of the box
Now that we know the shape in the first section, let’s move on to the possibilities for the shape in the second section, labelling this layer 2.
Step 3: Identifying the shape in the third section of the box
We repeat the same process to build layer 3.
Now that we have identified all 3 shapes in the box, we need to look up its number using a 4th layer.
Step 4: Identify the number of the box
Layer 4 contains the numbers of all the boxes. It is the final step: it identifies the box and sends its number as the final output.
Now let’s zoom out a little and understand the system more clearly.
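The four steps can be sketched in Python as a process of narrowing down candidates, one section at a time. This is only an illustration of the walk-through, not how a real neural network is coded. The function name identify_box, the example input, and the numbering order are all assumptions.

```python
from itertools import product

SHAPES = ["circle", "square", "triangle"]
# Assumed numbering: box 1..27 in the order product() yields the combinations.
ALL_BOXES = list(product(SHAPES, repeat=3))

def identify_box(box):
    """Return the number of a box given as a tuple of 3 shape names."""
    candidates = ALL_BOXES
    for section, shape in enumerate(box):
        # Layer (section + 1): keep only the paths that match this section.
        candidates = [c for c in candidates if c[section] == shape]
    # Layer 4: exactly one candidate should remain; look up its number.
    (match,) = candidates
    return ALL_BOXES.index(match) + 1

# Hypothetical input box: circle, then square, then triangle.
print(identify_box(("circle", "square", "triangle")))  # 6
```

After layer 1 there are 9 candidates left, after layer 2 there are 3, and after layer 3 only the matching box remains.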
Possibilities in each layer
We just looked at the path that our input follows to reach the expected output. Let’s calculate the number of possibilities in each layer.
Layer 1: 3 possibilities
Layer 2: 9 possibilities
Layer 3: 27 possibilities
Layer 4: 27 possibilities
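These counts follow from the fact that every path so far can branch into 3 shapes at the next section. A small Python sketch of the arithmetic (the variable names are just for illustration):

```python
SHAPES_PER_SECTION = 3
SECTIONS = 3

# Layers 1 to 3: each layer multiplies the number of paths by 3.
layer_sizes = [SHAPES_PER_SECTION ** layer for layer in range(1, SECTIONS + 1)]

# Layer 4: one node per numbered box, so the same count as layer 3.
layer_sizes.append(SHAPES_PER_SECTION ** SECTIONS)

print(layer_sizes)  # [3, 9, 27, 27]
```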
Learning technical terms from the neural network we built
We have been using our own terminology so far. Let’s now look at the technical terms that are actually used in the field.
The possibilities we determined are called nodes.
All the layers we build between the input layer and the output layer are collectively known as hidden layers.
The thickness of an arrow, which indicates how strongly it points to a node in the next layer, is known as its weight.
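In code, nodes and weights for our toy layer 1 could be written down as below. This is a rough sketch with made-up weight values, where a thicker arrow is simply a larger number; it assumes, purely for illustration, that the first section holds a circle.

```python
# Nodes in layer 1: one per possible shape in the first section.
layer1_nodes = ["circle", "square", "triangle"]

# Weights on the arrows from the input to each layer 1 node.
# Hypothetical values: the thick arrow gets a large weight, the thin ones small.
weights = {"circle": 0.9, "square": 0.05, "triangle": 0.05}

# The path follows the arrow with the largest weight.
chosen = max(layer1_nodes, key=lambda node: weights[node])
print(chosen)  # circle
```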
This is how our neural network can be represented in a technical manner.
Each node (drawn as a circle) in layer 3 represents one of the 27 combinations we saw earlier. Now go back to the first image of this article; it shouldn’t scare you anymore.
So we built our very own neural network from scratch, though it is about as simple as a neural network can get.
In the next article, we will see how to actually code a neural network.