Exploring the World of Multi-Layer Perceptrons
Welcome back to the secret world of introtoArtificialIntelligence! Do you remember the laundry-folding robot we've been working on? Today, get ready to be amazed as we explore how robots are being taught to do the impossible: fold laundry! But how do they do it, you ask? It's all thanks to a powerful technology called the Multilayer Perceptron (MLP), the superhero brain that helps robots recognize patterns in data. With an MLP, the laundry-folding robot can analyze clothes based on size, shape, and color, and decide the best way to fold them. It learns and improves with every piece of clothing it folds, and before you know it, it's folding like a pro! Pretty cool, right?
So, come join Yue and Yedhant on a thrilling journey as they uncover more about the amazing world of MLPs!
Let's take a look at what Yue and Yedhant are up to on this beautiful morning, when the world seems to come alive with the rising sun. The tranquil ambiance is filled with the sweet melody of chirping birds, echoing through the lush green landscape bathed in dew. The refreshing scent of the morning dew brings a sense of renewal to the air, as if Mother Nature herself is waking up from her slumber. It's a perfect time to start the day with a positive attitude and a renewed sense of energy.
In this beautiful weather, Yue wakes up with a bright smile, eager to meet Yedhant and learn more about AI. Yedhant is equally excited to see Yue. He checks himself in the mirror to make sure he looks sharp, but he also wants to impress Yue with his knowledge of machine learning and neural networks.
As they head towards their meeting point, they're excited to Meet & Learn. Yue strolls along, taking in the beautiful scenery, while Yedhant pedals his bike towards the secret land of Intro to Artificial Intelligence. Yedhant arrives first and waits for her at the meeting point, eagerly anticipating their time together. When Yue arrives, she snaps her fingers in front of his eyes and adjusts her hair with the same hand. Yedhant comes back to his senses, yet at the same time he is mesmerized by Yue's beauty. They greet each other with a hello and ask how the other is doing.
Yedhant: Hey there, how's life treating you?
Yue: It's treating me well. What's up with you? Hope I didn't keep you waiting for long.
Yedhant: Nope, I just got here. But, hey, have you eaten anything yet?
Yue: Nah, I'm famished. What about you?
Yedhant: I'm starving too! Let's hit up that nearby cafe. They say their cookies are amazing.
Yue: Sounds like a plan! (high-fives)
(They head to the cafe and start scanning the menu.)
Yedhant: By the way, what's your favorite ice cream flavor?
Yue: I love vanilla with chocolate syrup. It's simple yet tasty.
Yedhant: Interesting choice! You know what else is called "vanilla"? There's this neural network called Multi-Layer Perceptron (MLP) that's known as the "vanilla" ANN model.
Yue: No way, that's hilarious! Tell me more about it.
Yedhant: Well, MLPs are basic neural networks that were one of the earliest models used in machine learning. They're like the plain vanilla of the ANN world, without any fancy features or tricks.
Yue (with sparkling eyes): That's awesome. I'm excited to learn more about this "vanilla" stuff.
Yedhant: Sounds good! Let's grab our grub and geek out over some neural networks.
They grab their brunch and head to the vanilla alley.
Yue: The cookie was really good!
Yedhant: I'm glad it lived up to the hype! 😇 So, are you ready for a new adventure?
Yue: Yes! I'm especially excited to learn more about this vanilla network.
Perceptron
Yedhant: Cool! Picking up from last time, do you remember the perceptron? (Yedhant snaps his fingers and a single perceptron appears in the sky.)
Single Perceptron
(image by author)
Yue: Yes! How could I forget those interesting topics we learned last time? The perceptron was designed by taking inspiration from neurons in the human brain, and it was developed in the 1950s by Frank Rosenblatt. It is a simple algorithm that can learn to classify inputs into two categories that are linearly separable, which is usually referred to as binary classification.
A perceptron is like a simple puppy that we want to train to recognize two different things, or categories. We give the puppy some inputs, such as the color or shape of the object it sees. The puppy then combines these inputs and makes a decision based on them. In this case, it barks if it sees a ball, but not if it sees a toy car.
During training, we give the puppy a bunch of examples of balls and toy cars, and it learns to adjust its "weights" or preferences based on whether it barked at the right things or not. Eventually, it gets pretty good at telling the difference between the two objects.
Yedhant: That's an interesting way to describe a perceptron. Here's a fun fact: the perceptron got its name from "perception", which reflects its goal of mimicking the perceptual abilities of the brain, which can recognize complex patterns in visual stimuli. That's why Frank Rosenblatt developed the perceptron to classify patterns based on their features, as you mentioned.
Yue: Oh, that's nice to know! So that's how it got its name.
Perceptron Mathematics
Yedhant: Yup! Now let's go one level deeper and get our hands dirty with some of its math. The perceptron can be represented by the following equation:
y = f(W · X + b)
(Yedhant can see that Yue is confused and a little afraid of math.) Don't worry, we will break this equation down into an easy magic formula.
Think of the perceptron as a brain that takes in information and uses it to make a decision. The information is called the input features: in the equation, X is the vector of input features, that is, all the inputs collected together. The decision is called the output: in the equation, y is the output of the perceptron.
Now, imagine that the brain has preferences for certain types of information. For example, it might care more about the temperature than the wind speed when deciding whether it's a good day for a picnic. These preferences are called weights, and they say how important each input is for the decision: W is the vector of weights, one for each input feature.
But the brain also has its own built-in leaning that sets how much evidence it needs before saying "yes". This is the bias: b is the bias term, and it shifts the threshold that the weighted sum of the inputs has to cross.
Finally, the brain uses an activation function to make the decision. This is like a filter that lets the decision through if the weighted sum of the inputs plus the bias is above a certain threshold, and holds it back if it is below that threshold. In the equation, f is the activation function.
So the perceptron equation y = f(W · X + b) combines all these elements. It takes the input features (X), multiplies them by the brain's preferences (W), adds the bias (b), and then applies the activation function (f) to make a decision (y).
Overall, the perceptron is a simple but powerful concept that can learn to make decisions based on input features.
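To make the equation concrete, here is a minimal Python sketch of a single perceptron's forward pass with a step activation (outputting 1 when its input is non-negative). The picnic features, weights, and bias are made-up values for illustration, not taken from a trained model.

```python
def step(z):
    """Step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0


def perceptron(x, w, b, f=step):
    """Compute y = f(W . X + b) for one input vector x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # weighted sum plus bias
    return f(z)


# Hypothetical picnic example: x = [temperature is pleasant, wind is calm]
# (1 = yes, 0 = no). Temperature matters more, so it gets a larger weight.
w = [0.8, 0.3]
b = -0.5  # the bias shifts how much "evidence" is needed before saying yes

print(perceptron([1, 0], w, b))  # 0.8 - 0.5 = 0.3 >= 0, so 1 (picnic)
print(perceptron([0, 1], w, b))  # 0.3 - 0.5 = -0.2 < 0, so 0 (no picnic)
```

Changing the bias moves the threshold: with b = -0.9, even a pleasant temperature alone (0.8 - 0.9 < 0) would no longer be enough for a picnic.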
Yue: I was so afraid at first, but now I don't think I will ever forget this equation. So, if we put everything together, a perceptron consists of three main components, as we see in the diagram:
Input (the input layer receives data from external sources, which the perceptron then processes to generate an output). Example: inputs = 1, x1, x2, x3; output = y
Weights (weights represent the strength of the connections between the inputs and the perceptron, and they are adjusted during the learning process to improve the accuracy of the model). Example: w0, w1, w2, w3, w4
*W0 is the bias term. The inputs are multiplied by their corresponding weights, and the results are summed together along with the bias term before being passed to the activation function.
Activation Function (after the inputs are multiplied by their respective weights and summed along with the bias, the result is passed to the activation function, which determines the output of the perceptron. The type of activation function can vary depending on the specific use case and the desired behavior of the model. Common examples include the step function, softmax, sigmoid, ReLU, and tanh.)
Yedhant: That's correct! Do you want to see how we can implement this with a good example? 😉
Perceptron Example
Yue: That would be great, but can we take an easy example?
Yedhant: That's a nice idea. Let's take a simple example: an OR gate. You might already be familiar with it.
In simple terms, suppose we both want to go out for lunch. If either one of us has enough money to cover the meal, we can go. And if both of us have enough money, we can go too, and maybe even order some dessert! Only if neither of us has enough money do we have to skip lunch. This is a lot like an OR gate, because the gate outputs "1" if either or both of its inputs are "1", and "0" only when both inputs are "0".
Check this out! (Yedhant snaps his fingers and conjures up an OR gate.) Here, A represents whether the money I have can cover the cost of the meal, and B represents whether the money you have can cover it.
Yue: That's cool! Let's see if we can grab lunch or not 😛
OR gate
image by author
(A and B are inputs, Y is the output)
Yedhant: Have you ever played connect-the-dots? Well, this is kind of like that, but with math! See this graph here? The dots on the graph show what happens with the OR function. The white dots mean we get a 1 as the output, which means we can have lunch; the pink dot means we get a 0, i.e. we can't have lunch. The x and y axes are the inputs. Our job is to draw a line that splits the graph into two parts, so that the white dots are on one side and the pink dot is on the other. The good news is that these inputs are linearly separable, meaning they can be separated by a single straight line, so we can use a perceptron for this example.
Yue: That looks like fun!
OR Function using Perceptron
image by author
Yedhant: Do you remember the perceptron equation?
Yue: Yes, it's y = f(W · X + b).
Yedhant: Now, since the output is either 0 or 1, we can use the step function as the activation function, right?
Yue: The step function gives 1 when the value is above 0 and 0 when it's below 0?
Yedhant: You are absolutely correct! So in our case, whenever the weighted sum (the value going into the activation function) is greater than 0, we can eat lunch ☺️, and when it's less than 0, we can't eat lunch 😔. You can see this in the image above. So the conditions are: whenever the expected output is 0, the value going into the activation function should be below 0, and whenever the expected output is 1, it should be greater than 0.
Deriving equation for perceptron
image by author
Yedhant: Now let's substitute each of the four input combinations into the weighted sum w0 + w1·x1 + w2·x2 (with x1 = A and x2 = B) and apply these conditions. Using these equations, we will try to find weights that satisfy all the derived conditions. The conditions we get are:
w0 < 0 .......(1)  from A = 0, B = 0 (output 0)
w2 > -w0 .......(2)  from A = 0, B = 1 (output 1), i.e. w0 + w2 > 0
w1 > -w0 .......(3)  from A = 1, B = 0 (output 1), i.e. w0 + w1 > 0
w1 + w2 > -w0 .......(4)  from A = 1, B = 1 (output 1), i.e. w0 + w1 + w2 > 0
From equation (1) we know w0 < 0, so let's assume w0 = -1.
Substituting w0 = -1 into equations (2) to (4) gives w2 > 1, w1 > 1, and w1 + w2 > 1. Now we need values of w1 and w2 that satisfy all three conditions. If we choose w1 = w2 = 2, all the conditions are satisfied.
So one possible solution is w0 = -1, w1 = 2, w2 = 2.
These weights define a line that perfectly separates the inputs with output 1 from the input with output 0.
Applying the conditions, we obtain the line -1 + 2x1 + 2x2 = 0. In other words, in this perceptron the weight for each of the two inputs is 2 and the bias is -1.
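As a quick check, this small Python sketch plugs the hand-derived weights (w0 = -1, w1 = 2, w2 = 2) into the perceptron and runs all four rows of the OR truth table through it, assuming the step function outputs 1 when its input is non-negative.

```python
def step(z):
    # 1 when the weighted sum is non-negative, 0 otherwise
    return 1 if z >= 0 else 0


# Weights found by hand: w0 is the bias, w1 and w2 weight inputs A and B
w0, w1, w2 = -1, 2, 2

or_table = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

predictions = []
for (x1, x2), expected in or_table:
    z = w0 + w1 * x1 + w2 * x2  # which side of the line -1 + 2x1 + 2x2 = 0
    y = step(z)
    predictions.append(y)
    print(f"A={x1} B={x2} z={z:+d} -> y={y} (expected {expected})")
```

Only the (0, 0) row lands on the negative side of the line (z = -1); the other three give z = 1, 1 and 3, so every row matches the OR gate.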
Yue: That's very easy! But do we need to solve these equations manually to get the weights and bias?
Perceptron Learning Algorithm
Yedhant: No, I just wanted to show how we can calculate the line that classifies the points, because once we get our hands dirty, we know what's happening behind the scenes. There are various supervised algorithms to determine the weights and bias, for example the Perceptron Learning Algorithm. We discussed it in our last meeting, if you remember.
Yue: How can I forget? Suppose we want the robot to learn how to distinguish between ripe red tomatoes and unripe green tomatoes. We can give the robot different examples of tomatoes, some of which are ripe red and some of which are unripe green.
For example, we can show the robot a ripe red tomato and say "this is a ripe tomato" and then show it an unripe green tomato and say "this is an unripe tomato." The robot will make a guess based on the tomato's color, but it will likely make some mistakes at first.
When the robot makes a mistake, we correct it by telling it what the correct answer is. For instance, if the robot guesses that a ripe red tomato is unripe, we would correct it by saying "no, that's a ripe tomato."
With enough examples and corrections, the robot will gradually learn to distinguish between ripe red and unripe green tomatoes based on their color. Eventually, it will be able to accurately identify a ripe red tomato as a ripe tomato and an unripe green tomato as an unripe tomato.
Yedhant: That's correct. It seems like you really love tomatoes.
Yue: Yes, I do! Especially the ones I grow in my garden.
Yedhant: Oh, I forgot about those. You still owe me a tomato treat!
Yue: I will bring some next time, for sure.
Yedhant: Let's see 😛
Now, back to the perceptron learning algorithm. As you know, it's a supervised learning algorithm used to train a perceptron: it uses labeled training data to learn the relationship between the inputs and the outputs.
Now let's use it to train our perceptron to decide whether we can have lunch or not 😛. Unlike last time, we don't have to derive conditions by hand; it's straightforward.
To use the perceptron learning algorithm for our OR function, we first need to assign random initial weights to the inputs and a bias value. Let's say we start with
w = [0.5, 0.5] and b = -0.5.
Now it's time to use the perceptron equation, which as we know is y = f(W · X + b).
First, let's focus on the part inside the brackets. We pass the input data (X) through it and call the result z:
z = W · X + b
where · is the dot product. The output of the perceptron (y_hat) is then determined by the step function, which is our activation function f in the equation y = f(W · X + b).
So, after applying the activation function, the output of the perceptron for the assumed weights, bias, and a given input is
y_hat = step(z)
Step Function
where the step function is the same one we used before: it returns 1 if z is greater than or equal to 0, and 0 otherwise.
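Here is that computation for all four OR inputs, sketched in Python with the starting values w = [0.5, 0.5] and b = -0.5. Interestingly, under the "z ≥ 0 gives 1" convention these particular starting values already classify every input correctly: (0, 1) and (1, 0) land exactly on z = 0, which the step function maps to 1. So with these exact starting values, the weight and bias updates described next would never be triggered.

```python
def step(z):
    # returns 1 if z >= 0, else 0
    return 1 if z >= 0 else 0


w = [0.5, 0.5]  # starting weights from the text
b = -0.5        # starting bias from the text

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 1, 1, 1]  # OR truth table

zs, y_hats = [], []
for x, y in zip(inputs, targets):
    z = w[0] * x[0] + w[1] * x[1] + b  # z = W . X + b
    y_hat = step(z)                    # y_hat = step(z)
    zs.append(z)
    y_hats.append(y_hat)
    print(f"x={x} z={z:+.1f} y_hat={y_hat} y={y}")
```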
Now we have two values:
The output predicted by the perceptron equation, y_hat
The actual output given in the training data, y
By comparing the two outputs, we update the weights and bias. If the predicted output (y_hat) equals the actual output (y), the current weights and bias already handle this example correctly, so we don't need to update anything. But if the predicted output (y_hat) is not equal to the actual output (y), we adjust the weights and bias.
To update the weights, we add a correction proportional to the difference between the actual output and the predicted output, multiplied by the input. To further control the size of each update, we use a learning rate (alpha):
w = w + alpha * (y - y_hat) * x
Similarly, we update the bias with the following formula:
b = b + alpha * (y - y_hat)
Selecting a good value for the learning rate is important. In general, when the learning rate is too high, the weights can oscillate or diverge, leading to poor performance. On the other hand, if the learning rate is too low, training may converge very slowly. (For a single perceptron on linearly separable data like OR, this rule still converges for any positive learning rate; the choice matters much more for the gradient-based training of MLPs.)
We keep repeating this process, passing the input data through the perceptron again and again and adjusting the weights and bias as needed, until the perceptron accurately classifies all of the input data for the OR function.
Those interested in exploring and experimenting with the Python code can find its implementation in our code store: OR gate Perceptron
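Putting the steps together, here is a minimal Python sketch of the perceptron learning rule for OR. The function name train_perceptron is hypothetical, and the starting point (all-zero weights and bias, learning rate 0.5) is an assumption chosen so that some updates actually happen; the code in the author's code store may differ.

```python
def step(z):
    return 1 if z >= 0 else 0


def train_perceptron(inputs, targets, w, b, alpha, max_epochs=100):
    """Perceptron learning rule; returns final w, b and mistakes per epoch."""
    w = list(w)
    errors_per_epoch = []
    for _ in range(max_epochs):
        errors = 0
        for x, y in zip(inputs, targets):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b  # z = W . X + b
            y_hat = step(z)                                # prediction
            error = y - y_hat                              # 0, +1 or -1
            if error != 0:
                errors += 1
                # w = w + alpha * (y - y_hat) * x
                w = [wi + alpha * error * xi for wi, xi in zip(w, x)]
                # b = b + alpha * (y - y_hat)
                b = b + alpha * error
        errors_per_epoch.append(errors)
        if errors == 0:  # every input classified correctly: stop
            break
    return w, b, errors_per_epoch


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 1, 1, 1]  # OR

# Assumed starting point: all-zero weights and bias, learning rate 0.5
w, b, history = train_perceptron(inputs, targets, [0.0, 0.0], 0.0, alpha=0.5)
print("weights:", w, "bias:", b, "errors per epoch:", history)
# weights: [0.5, 0.5] bias: -0.5 errors per epoch: [2, 2, 1, 0]
```

With these settings the run makes 2, 2, 1 and then 0 mistakes per pass and stops. The weights it ends with happen to equal w = [0.5, 0.5] and b = -0.5, the starting values used in the text.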
Limitations of Perceptron
Yue: That was such an easy implementation! Just 5 steps and it's done:
{Initialize the weights & bias > Make a prediction > Calculate the error > Update the weights & bias} >>>> Rinse and repeat
Yedhant: You are absolutely correct. Just repeat these steps until the predictions equal the actual outputs, and your perceptron is ready.
Yue: But nothing is this perfect. I can sense there are some limitations to the perceptron algorithm?
Yedhant: Indeed there are, Yue! One of the main limitations of the perceptron algorithm is that it can only classify data that can be separated by a straight line (or a hyperplane in higher dimensions). If the data is not linearly separable, as with the classic XOR function, the algorithm never settles on weights that classify every example correctly.
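To see this limitation in action, the Python sketch below runs the same perceptron learning rule on XOR (output 1 only when exactly one input is 1), assuming zero initial weights, a learning rate of 0.5, and 50 passes over the data. Because no single line separates XOR's outputs, no pass can ever finish without a mistake: a mistake-free pass would mean the weights had found such a line.

```python
def step(z):
    return 1 if z >= 0 else 0


def count_errors_per_epoch(inputs, targets, alpha=0.5, epochs=50):
    """Run the perceptron learning rule and record mistakes in each epoch."""
    w, b = [0.0, 0.0], 0.0
    history = []
    for _ in range(epochs):
        errors = 0
        for x, y in zip(inputs, targets):
            y_hat = step(w[0] * x[0] + w[1] * x[1] + b)
            if y_hat != y:
                errors += 1
                w = [w[0] + alpha * (y - y_hat) * x[0],
                     w[1] + alpha * (y - y_hat) * x[1]]
                b += alpha * (y - y_hat)
        history.append(errors)
    return history


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_targets = [0, 1, 1, 0]  # XOR: 1 only when exactly one input is 1

history = count_errors_per_epoch(inputs, xor_targets)
print("fewest mistakes in any epoch:", min(history))  # never reaches 0
```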
Yue: Oh, I get it now. What else should I know?
Yedhant: Another limitation is that the algorithm requires labeled data, which means that each training example must have a label indicating its correct class. This can be a bit of a hassle and time-consuming to obtain, especially for large datasets.
Yue: I see. Anything else?
Yedhant: Yes. The initial weights and bias values can have a big impact on the result. Also, the algorithm stops at the first separating line it finds, which may not be the best one. Other algorithms, like support vector machines, instead look for the separating line with the largest margin between the classes.
Yue: So what can we do to overcome some of these limitations?
Yedhant: There are various options. However, one of the most effective techniques is to use a multi-layer perceptron, also referred to as an MLP. This is a commonly used and uncomplicated type of neural network, sometimes even called a vanilla neural network because of its simplicity compared to more advanced neural networks, just like your favorite ice cream.
Yue: Yes, yes, you mentioned this earlier! I'm so excited to learn about it.
Yedhant: Are you ready to take a trip back in time to 1986, the year that sparked a revolution in the field of artificial intelligence? That's when Rumelhart, Hinton, and Williams published their groundbreaking paper "Learning representations by back-propagating errors", which breathed new life into the Multilayer Perceptrons (MLPs) we know and love today. This was even though the first MLPs had already been introduced in the 1960s as simple feedforward neural networks.
But what exactly are MLPs? Think of them as a team of superhero neurons, each with their own special powers, working together to tackle complex problems. They're like the Avengers of the neural network world, ready to take on any challenge that comes their way. So, are you ready to join forces with the MLPs and dive deep into the exciting world of deep learning?
Yue: Aye aye, sir!
Yedhant: From the name itself, you can guess what an MLP (multi-layer perceptron) is. To learn more, let's go back to the perceptron, which we already know about. Now let's take that perceptron and put it in a ball. (He snaps his fingers.)
Single Perceptron
image by author
Abracadabra!
Single condensed perceptron
image by author
Yue: Ok
Yedhant: Now imagine multiple layers of these perceptron balls (Yedhant clicks his fingers) stacked together, as shown below. You see, the multi-layer perceptron is just a bunch of perceptrons arranged in layers. Each layer takes the output of the previous layer as its input, allowing the network to learn more complex features and patterns. It's like a delicious layer cake, but instead of frosting and sprinkles, we have neurons and activation functions!
Yue: Haha, I love that analogy. So with an MLP, we can overcome the limitations of the perceptron algorithm and handle more complex tasks thanks to these interconnected layers of neurons?
Yedhant: You are correct. In an MLP, these multiple layers of interconnected neurons can learn to recognize complex patterns in the data, even when they're not obvious. This makes MLPs useful for lots of complex tasks, like classifying images or predicting prices.
And the great thing about MLP is that it can find different solutions to a problem. This gives us more options and flexibility when we're building our models.
Yue: Oh wow, MLP sounds like an amazing tool! I'm really excited to learn more about how it can handle complex data and find different solutions to problems.
Single condensed perceptron
image by author
Multi Layer Perceptron
image by author
MLP layers
Yedhant: Then let's try to analyze these stacked balls.
If you look at the image, an MLP consists of three or more layers of neurons: the input layer, one or more hidden layers, and the output layer. Each neuron in the input layer corresponds to one input feature, such as a pixel value in an image or a word in a sentence. The neurons in the hidden layers and the output layer perform computations on the input data and produce intermediate and final outputs, respectively.
In detail, we can describe the MLP layers as follows:
Input layer: The input layer of an MLP is the first layer in the network, responsible for receiving the raw input data and passing it on to the next layer for processing.
Hidden layer: Hidden layers are the intermediate layers between the input and output layers. They perform complex transformations on the input data to extract useful features and create more abstract representations of the data.
Output layer: The output layer is the final layer in the network and produces the final output of the network. The number of nodes in the output layer depends on the type of task the network is being trained to perform. For example, in a binary classification task, the output layer would have one node whose output (typically a value between 0 and 1) is turned into a binary decision (0 or 1), while in a multi-class classification task, the output layer would have multiple nodes, each corresponding to a different class label.
Each layer in an MLP is fully connected to the next layer, meaning that each neuron in a layer is connected to every neuron in the next layer. The connections between the neurons have weights that determine the strength of the signal transmitted through them. For example, if one layer has 3 neurons and the next layer has 4, there are 3 × 4 = 12 connections, and therefore 12 weights, between them.
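Here is a small NumPy sketch of what "fully connected" means in practice, assuming a layer of 3 neurons feeding a layer of 4, with randomly chosen weights and a made-up input. Storing the weights as a 4 x 3 matrix gives exactly one weight per connection, and a single matrix-vector product computes every receiving neuron's weighted sum at once.

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_out = 3, 4                    # 3 neurons feeding into 4 neurons
W = rng.normal(size=(n_out, n_in))    # one weight per connection: 4 x 3 = 12
b = np.zeros(n_out)                   # one bias per receiving neuron

x = np.array([0.2, -1.0, 0.5])        # outputs of the previous layer (made up)
z = W @ x + b                         # each of the 4 neurons sees all 3 inputs

print("number of weights:", W.size)   # 12
print("weighted sums shape:", z.shape)  # (4,)
```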
Yue: Interesting! So these layers are an essential component of an MLP. They enable the network to perform complex transformations on the input data and create more abstract representations of it. But how does the information flow among these layers?
Yedhant clicked his fingers.
Multi Layer Perceptron Feed Forward
image by author
Yedhant: That's a good question! Let's visualize it, since there's no better way to learn. In this set of images (Multi Layer Perceptron Feed Forward, images 1-3), you can see that the information flows from the input layer, through one or more hidden layers, to the output layer. The output of each neuron in one layer is the input to the neurons in the next layer, and there are no feedback loops between the layers. This type of network is also called a feedforward neural network.
Here are the steps involved in the feedforward process of a MLP:
Input layer: The input layer receives the input data. Each neuron in the input layer represents a feature or an attribute of the input data.
Hidden layers: The input data is passed through one or more hidden layers, where each neuron in a hidden layer receives input from all the neurons in the previous layer. The hidden layers use activation functions to transform the input into a form that is more useful for the output layer.
Output layer: The output layer produces the final output of the MLP. Each neuron in the output layer represents a class or a value that the network is trying to predict. The activation function used in the output layer depends on the problem being solved. For example, for a classification problem, a softmax function may be used to produce probabilities for each class, while for a regression problem, a linear activation function may be used to produce a continuous value.
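The feedforward steps above can be sketched in NumPy as follows. The layer sizes, random weights, ReLU hidden activations, and softmax output are assumptions for illustration, and mlp_forward is a hypothetical helper rather than a library function.

```python
import numpy as np


def relu(z):
    return np.maximum(0.0, z)


def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()


def mlp_forward(x, layers):
    """Feed x through (W, b) pairs: ReLU on hidden layers, softmax at the end."""
    a = x
    for W, b in layers[:-1]:
        a = relu(W @ a + b)          # hidden layer: weighted sum, then activation
    W_out, b_out = layers[-1]
    return softmax(W_out @ a + b_out)  # output layer: class probabilities


rng = np.random.default_rng(42)
sizes = [4, 8, 8, 3]  # 4 input features, two hidden layers of 8, 3 classes
layers = [(rng.normal(scale=0.5, size=(n_out, n_in)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

x = np.array([0.5, -1.2, 3.0, 0.7])  # one made-up input example
probs = mlp_forward(x, layers)
print(probs, probs.sum())  # three probabilities that sum to 1
```

Because there are no feedback loops, the whole forward pass is just one loop over the layers, each consuming the previous layer's output.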
Yue: Wow, that's a great way to explain it! It's like the input data goes through a journey through the different layers, getting transformed and refined along the way, until it finally reaches the output layer and produces the final prediction. It's like a little adventure for the data!
Yedhant: Yes, exactly! And just like in an adventure, each step along the way is important and can affect the final outcome. That's why it's so important to choose the right activation functions and set up the layers properly. Each neuron in an MLP applies an activation function to the weighted sum of its inputs to produce its output. You might recall activation functions from our last meeting; here is an overview of a few commonly used ones:
Sigmoid: A common activation function that produces an S-shaped curve with outputs between 0 and 1. It is often used in the output layer for binary classification tasks.
ReLU (Rectified Linear Unit): A popular activation function that returns the input if it is positive and zero otherwise. It is used in most cases where non-linearity is required.
Tanh (Hyperbolic Tangent): An activation function that produces an S-shaped curve similar to the sigmoid, but with outputs between -1 and 1, centered at zero. It is often used in hidden layers.
Softmax: An activation function used in the output layer of MLP for multi-class classification tasks. It produces a probability distribution over the output classes.
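For reference, here are minimal NumPy versions of these four activation functions, evaluated on a few sample values. The softmax subtracts the maximum before exponentiating, a common trick that avoids overflow without changing the result.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # squashes into (0, 1)


def relu(z):
    return np.maximum(0.0, z)        # keeps positives, zeroes out negatives


def tanh(z):
    return np.tanh(z)                # squashes into (-1, 1), centered at 0


def softmax(z):
    e = np.exp(z - np.max(z))        # shift for numerical stability
    return e / e.sum()               # non-negative values that sum to 1


z = np.array([-2.0, 0.0, 2.0])
for name, f in [("sigmoid", sigmoid), ("relu", relu), ("tanh", tanh), ("softmax", softmax)]:
    print(f"{name:8s}", np.round(f(z), 3))
```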
With enough practice and training, these MLP ninjas can become some of the best predictors out there.
Yue: That's so exciting! It's like the MLP is a little genius-in-the-making, learning and improving with every piece of input data it receives.
Yedhant: Yes, exactly! It's like we're training a little neural ninja to be the best predictor it can be. To train these MLP ninjas, we use a supervised learning algorithm such as backpropagation. During training, the weights of the connections between the neurons are adjusted to minimize the difference between the predicted output and the actual output. The MLP goes through the input-output pairs again and again, adjusting its weights and biases each time, until it becomes a master at making accurate predictions. And just like in ninja training, repetition is key!
Yue: Haha, a neural ninja! I love that analogy. It's like we're teaching it martial arts, but instead of punches and kicks, it's learning how to recognize patterns and make predictions. And the more it trains, the more accurate it becomes. To train these MLP ninjas, we update the weights of the connections between the neurons, using backpropagation to work out how each weight should change, with the goal of minimizing the difference between the predicted output and the actual output. I guess practice really does make perfect, even for little MLP ninjas.
Yedhant: Exactly! Isn't it amazing how much our little MLP ninjas can learn through the backpropagation algorithm? It's like they're absorbing knowledge and getting better with each iteration of training. And just like in martial arts, it's all about mastering the fundamentals first before moving on to more advanced techniques. That's why the backpropagation algorithm is so important: it allows the MLP to learn from its mistakes and continually refine its predictions. Backpropagation trains the MLP ninjas by adjusting the weights of the connections between neurons so that the network produces the desired output for a given input. This is done by minimizing a loss function that measures the difference between the predicted output and the actual output.
The most commonly used training method for MLPs is backpropagation together with gradient descent, a gradient-based optimization approach. Backpropagation computes the gradient of the loss function with respect to the weights of the network, and the weights are then updated in the direction of the negative gradient. This process is repeated for a fixed number of epochs, or until the validation error stops improving.
Backpropagation involves two main steps: forward propagation and backward propagation.
In forward propagation, the input data is fed through the network, and the output is calculated.
In backward propagation, the error between the predicted output and the actual output is propagated backwards through the network, and the gradients of the loss function with respect to the weights are calculated using the chain rule.
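To make the chain rule concrete, here is a minimal Python sketch for the smallest possible case: a single sigmoid neuron with squared-error loss L = 0.5 · (a - y)², on one made-up training example. The gradient is assembled from one local derivative per step of the forward pass, then compared against a finite-difference estimate as a sanity check.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, y):
    a = sigmoid(w * x + b)      # forward propagation
    return 0.5 * (a - y) ** 2   # squared error


# Made-up single training example and current parameters
x, y = 1.5, 1.0
w, b = 0.4, -0.2

# Forward pass, keeping the intermediate values
z = w * x + b
a = sigmoid(z)

# Backward pass: chain rule, one factor per step of the forward pass
dL_da = a - y               # derivative of 0.5 * (a - y)^2 with respect to a
da_dz = a * (1.0 - a)       # derivative of the sigmoid
dL_dz = dL_da * da_dz
dL_dw = dL_dz * x           # dz/dw = x
dL_db = dL_dz * 1.0         # dz/db = 1

# Sanity check against a central finite-difference estimate
eps = 1e-6
num_dw = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
num_db = (loss(w, b + eps, x, y) - loss(w, b - eps, x, y)) / (2 * eps)
print(f"dL/dw: chain rule {dL_dw:.6f}, numerical {num_dw:.6f}")
print(f"dL/db: chain rule {dL_db:.6f}, numerical {num_db:.6f}")
```

Both gradients come out negative here (the prediction a is below the target y = 1), which tells us to increase w and b to reduce the error.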
Yue: Whoa, that was quick! You make it sound so easy. But I'm still a bit confused about how exactly backpropagation works. Can you break it down for me?
Yedhant: Sure thing, Yue! Backpropagation is essentially an algorithm for adjusting the weights of the connections between the neurons in the network, based on how much they contribute to the error in the output.
Yue: Okay, I think I get it. So, we first forward propagate the input data through the network to get the predicted output, and then we compare it to the actual output to compute the error. Is that right?
Yedhant: Absolutely! And once we have the error, we can use the chain rule to calculate the gradient of the error with respect to each weight in the network. This tells us how much each weight contributes to the error, and in which direction we should adjust it to reduce the error.
Yue: That makes sense. But how do we actually update the weights based on the gradient?
Yedhant: Good question! We update the weights using a technique called gradient descent, which involves taking a small step in the direction of the negative gradient. This helps us move towards the minimum of the error surface, where the error is lowest.
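Here is a minimal sketch of gradient descent in Python, training a single sigmoid neuron on the OR data from earlier. It assumes binary cross-entropy loss (for which the gradient with respect to the weighted sum simplifies to a - y), zero initial weights, a learning rate of 1.0, and 2000 full passes over the data; these are illustrative choices, not values from the text.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 1, 1, 1]  # OR again

w = [0.0, 0.0]
b = 0.0
learning_rate = 1.0
losses = []

for epoch in range(2000):
    grad_w = [0.0, 0.0]
    grad_b = 0.0
    total_loss = 0.0
    for (x1, x2), y in zip(inputs, targets):
        a = sigmoid(w[0] * x1 + w[1] * x2 + b)
        # binary cross-entropy; its gradient w.r.t. z simplifies to (a - y)
        total_loss += (-math.log(a) if y == 1 else -math.log(1 - a))
        grad_w[0] += (a - y) * x1
        grad_w[1] += (a - y) * x2
        grad_b += (a - y)
    n = len(inputs)
    losses.append(total_loss / n)
    # take a small step against the averaged gradient
    w = [w[0] - learning_rate * grad_w[0] / n, w[1] - learning_rate * grad_w[1] / n]
    b -= learning_rate * grad_b / n

predictions = [1 if sigmoid(w[0] * x1 + w[1] * x2 + b) >= 0.5 else 0 for x1, x2 in inputs]
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.4f}, predictions: {predictions}")
```

The loss starts at ln 2 (about 0.693, since every prediction is 0.5) and shrinks with each step, while the predictions settle on the OR truth table.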
Yue: Wow, this is starting to sound like a ninja training routine! So we keep doing this over and over until we reach the minimum, or until we reach a certain number of epochs, right?
Yedhant: That's right, Yue! And the number of epochs can vary depending on the complexity of the problem and the size of the dataset. Once the MLP is trained, it can be used to make predictions on new input data, ideally with high accuracy, by passing it through the feedforward process. We can say our ninja is ready to protect the innocent.
Yue: This is a pretty cool algorithm. Can we train a network and see how it works?
MLP Robotic Mathematical Example
Yedhant: That's an awesome idea! Let's see how these algorithms are trained in detail with a fun example. Let's move from ninjas to robots, which are my second love; for this example, we will be training a robot. Imagine you have a little robot named Robbie who loves to move around but is very clumsy and keeps bumping into obstacles. We want to help poor Robbie navigate his way through a room without hitting anything.
Robbie has a special ultrasonic sensor that can detect obstacles within a certain range. The sensor tells Robbie how far away the nearest obstacle is in front of him.
Yue (with shiny eyes): So are we designing an MLP to help Robbie navigate? 😍
Yedhant: Yes! To help Robbie navigate, we have decided to build a little brain for him called a Multi-Layer Perceptron (MLP). This brain will take the ultrasonic sensor reading as input and give Robbie a decision on which direction to move in as output.
The MLP brain is made up of three parts:
Input layer - This is where Robbie's ultrasonic sensor reading is fed into the brain.
Hidden layers - this is where the brain does some calculations based on the input to figure out which direction Robbie should move in. There are 2 hidden layers, each with 5 little neurons that work together to make this decision.
Output layer - this is where the brain gives Robbie his final decision on which way to move. There is only one neuron here that outputs a value between 0 and 1, where 0 means "move left" and 1 means "move right".
To train the MLP brain, we need to give it some examples of what it should do in different situations. We start by showing Robbie a few different scenarios where he needs to move either left or right to avoid obstacles.
For each scenario, we record the ultrasonic sensor reading and the correct output decision (i.e., left or right). We use these examples to teach the MLP brain how to make the correct decision based on the ultrasonic sensor reading.
The brain learns by adjusting the weights and biases between its neurons to minimize its mistakes. It uses a technique called gradient descent to do this, which is like a little brain exercise that helps it get better over time.
When Robbie encounters a new obstacle, his ultrasonic sensor sends a reading to the MLP brain. The brain does some calculations and decides whether he should move left or right to avoid the obstacle. Robbie then follows the brain's decision and moves in the chosen direction.
With enough training and practice, Robbie becomes an expert at avoiding obstacles and can navigate his way around the room without any mishaps!
Yue: How does it do it?
Yedhant: (snaps his fingers) The MLP we designed can be visualized like this:
MLP Neural Network for robot navigation
Yedhant:
Earlier we saw that the equation of a perceptron is
y = f(W · X + b)
where f is the activation function. Stacking three such layers, the output of the MLP for robot navigation is given by the following equation:
y = σ(W3 · σ(W2 · σ(W1 · x + b1) + b2) + b3)
where:
x is the ultrasonic sensor reading (the single input).
σ is the sigmoid activation function (which we saw with ANNs), σ(z) = 1 / (1 + exp(-z)), applied element-wise to a vector z; it maps any real value to the range (0, 1).
W1 is the weight matrix connecting the input layer to the first hidden layer, with dimensions (5, 1).
b1 is the bias vector for the first hidden layer, with dimensions (5, 1).
W2 is the weight matrix connecting the first hidden layer to the second hidden layer, with dimensions (5, 5).
b2 is the bias vector for the second hidden layer, with dimensions (5, 1).
W3 is the weight matrix connecting the second hidden layer to the output layer, with dimensions (1, 5).
b3 is the bias scalar for the output layer.
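To make the nested equation concrete, here is a minimal pure-Python sketch of the forward pass, assuming each weight matrix is stored as a list of rows (one row per neuron, one weight per input), so W1 is 5x1, W2 is 5x5 and W3 is 1x5. The helper names sigmoid, layer and forward, the random weights, and the "output >= 0.5 means move right" threshold are illustrative assumptions, not the code from our code store.

```python
import math
import random

def sigmoid(z):
    """Logistic sigmoid: maps any real z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(weights, inputs, biases):
    """One dense layer: sigmoid(W . inputs + b), applied element-wise.
    weights has one row per neuron; each row has one weight per input."""
    return [sigmoid(sum(w * a for w, a in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, W1, b1, W2, b2, W3, b3):
    """y = sigmoid(W3 . sigmoid(W2 . sigmoid(W1 . x + b1) + b2) + b3)."""
    h1 = layer(W1, [x], b1)      # W1: 5 rows x 1 column  -> 5 outputs
    h2 = layer(W2, h1, b2)       # W2: 5 rows x 5 columns -> 5 outputs
    y = layer(W3, h2, [b3])[0]   # W3: 1 row x 5 columns  -> 1 output
    return h1, h2, y

# Hypothetical random parameters, used only to show the shapes at work.
random.seed(0)
W1 = [[random.uniform(-1, 1)] for _ in range(5)]
b1 = [random.uniform(-1, 1) for _ in range(5)]
W2 = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(5)]
b2 = [random.uniform(-1, 1) for _ in range(5)]
W3 = [[random.uniform(-1, 1) for _ in range(5)]]
b3 = random.uniform(-1, 1)

h1, h2, y = forward(0.7, W1, b1, W2, b2, W3, b3)
# Assumed decision rule: values closer to 1 mean "move right".
print("move right" if y >= 0.5 else "move left", round(y, 3))
```

Writing each layer as the same helper mirrors how the equation nests one σ(W · a + b) inside the next.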
Yue: We use backpropagation to update the weights and biases, right?
Yedhant: Yes. As we know, backpropagation is an algorithm used to calculate the gradient of the error function with respect to the weights and biases in a neural network. The gradient is used to update the weights and biases so that the network can learn from the training data and make better predictions.
We apply backpropagation in the following manner, repeating the steps until the error is minimal:
Random initialization: We assign random weights and biases when we create the MLP neural network.
Forward pass: We start by feeding a training example through the network, one data point at a time, and compute the output of each neuron layer by layer, using the current values of the weights and biases.
Compute the predicted output: the output neuron applies the sigmoid function to its weighted sum, giving a value between 0 and 1.
Compute the loss: We compute the error between the predicted output and the actual output, which we know from the training data. Since we have only one output neuron, we can use the squared error as our loss function (averaged over many training examples it becomes the mean squared error, MSE; the factor 1/2 just makes the derivative simpler):
loss = (predicted_output - actual_output)^2 / 2
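As a small sketch of this loss step (the function names are hypothetical), the snippet below computes the half squared error, its derivative with respect to the prediction, and the average over a dataset. The derivative shows why the 1/2 is convenient: it cancels the 2 from the square, leaving just predicted_output - actual_output.

```python
def squared_error(predicted, actual):
    """Half squared error for a single output: (predicted - actual)^2 / 2."""
    return (predicted - actual) ** 2 / 2

def squared_error_grad(predicted, actual):
    """Derivative of the loss w.r.t. the prediction; the 1/2 cancels the 2."""
    return predicted - actual

def mean_squared_error(predictions, targets):
    """Average per-example loss over a dataset (MSE with the same 1/2 factor)."""
    losses = [squared_error(p, t) for p, t in zip(predictions, targets)]
    return sum(losses) / len(losses)

# A prediction of 0.62 when the correct answer is 1 ("move right").
print(squared_error(0.62, 1.0))
print(squared_error_grad(0.62, 1.0))
```

The negative gradient here tells us the prediction should move up, toward 1.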
Backward pass: We propagate the error back through the network, layer by layer, to calculate the gradients of the loss function with respect to the weights and biases. We use the chain rule of calculus to do this. For example, to compute the gradient of the loss function with respect to a weight in the output layer, we first compute the gradient of the loss function with respect to the output of the neuron, then the gradient of the output with respect to the weighted sum of inputs to the neuron, and finally the gradient of the weighted sum with respect to the weight.
Update the weights and biases: We use the gradients we just computed to update the values of the weights and biases, moving them in the direction of steepest descent of the loss function. This update is performed by multiplying the gradient by a learning rate and subtracting the result from the current value of the weight or bias. The learning rate determines how big a step we take in the direction of the gradient.
W(new) = W(old) - α · ∂L/∂W
b(new) = b(old) - α · ∂L/∂b
where α is the learning rate and ∂L/∂W, ∂L/∂b are the gradients computed in the backward pass.
Repeat until the error is minimal: We repeat these steps for each training example in our dataset, and we do this for multiple epochs until the network converges to a set of weights and biases that minimize the loss function on the training data.
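The steps above can be sketched end to end in plain Python for the 1-5-5-1 network. This is a minimal illustration, not the implementation from our code store: the toy dataset is made up, and the learning rate (0.5), epoch count (1000) and random initialization are arbitrary assumptions. Here the backward pass computes gradients as (output - target), so the update subtracts them, exactly as in W(new) = W(old) - α · ∂L/∂W.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_params(seed=0):
    """Random weights for a 1-5-5-1 MLP; each W has one row per neuron.
    b3 is kept as a one-element list so every layer is handled the same way."""
    rng = random.Random(seed)
    u = lambda: rng.uniform(-1.0, 1.0)
    return {"W1": [[u()] for _ in range(5)], "b1": [u() for _ in range(5)],
            "W2": [[u() for _ in range(5)] for _ in range(5)], "b2": [u() for _ in range(5)],
            "W3": [[u() for _ in range(5)]], "b3": [u()]}

def forward(p, x):
    """Forward pass: returns the activations [[x], h1, h2, [y]]."""
    acts = [[x]]
    for W, b in (("W1", "b1"), ("W2", "b2"), ("W3", "b3")):
        a = acts[-1]
        acts.append([sigmoid(sum(w * ai for w, ai in zip(row, a)) + bj)
                     for row, bj in zip(p[W], p[b])])
    return acts

def backprop(p, x, target):
    """Backward pass for loss = (y - target)^2 / 2, using the chain rule."""
    acts = forward(p, x)
    y = acts[3][0]
    deltas = [(y - target) * y * (1 - y)]    # dL/dz at the output neuron
    grads = {}
    for W, b, i in (("W3", "b3", 2), ("W2", "b2", 1), ("W1", "b1", 0)):
        a_in = acts[i]                       # activations feeding this layer
        grads[W] = [[d * a for a in a_in] for d in deltas]
        grads[b] = list(deltas)
        if i > 0:
            # A neuron's error sums over every neuron it feeds in the next layer.
            deltas = [sum(p[W][k][j] * deltas[k] for k in range(len(deltas)))
                      * a_in[j] * (1 - a_in[j]) for j in range(len(a_in))]
    return grads

def loss(p, data):
    """Total squared-error loss over a dataset."""
    return sum((forward(p, x)[3][0] - t) ** 2 / 2 for x, t in data)

def train(p, data, lr=0.5, epochs=1000):
    """Stochastic gradient descent: update after every training example."""
    for _ in range(epochs):
        for x, t in data:
            g = backprop(p, x, t)
            for name in ("W1", "W2", "W3"):
                for r, row in enumerate(g[name]):
                    for c, gw in enumerate(row):
                        p[name][r][c] -= lr * gw   # W(new) = W(old) - lr * dL/dW
            for name in ("b1", "b2", "b3"):
                for r, gb in enumerate(g[name]):
                    p[name][r] -= lr * gb          # b(new) = b(old) - lr * dL/db
    return p

# Hypothetical toy data: (sensor reading, 0 = move left / 1 = move right).
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
params = init_params(seed=0)
initial_loss = loss(params, data)
train(params, data)
final_loss = loss(params, data)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

A useful habit with hand-written backpropagation is a gradient check: nudge one weight up and down by a tiny amount and confirm the change in loss matches the gradient that backprop reports.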
Yue: Can we work through a numerical example?
Yedhant: That's a great idea!
Let's assume we have the following values for the neural network:
Ultrasonic sensor reading: 0.7
Expected output: move right (1)
Output value (assumed for this illustration): 0.62
Learning rate: 0.1
Bias values: b1 = [0.1, -0.3, 0.4, -0.2, 0.3], b2 = [0.2, -0.1, 0.2, -0.3, 0.1], b3=-0.1
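Weights: w1 = [[0.2, 0.3, 0.1, 0.5, 0.6]]
w2 = [[0.7, 0.8, 0.3, 0.1, 0.9],[0.4, 0.2, 0.6, 0.8, 0.3],[0.9, 0.1, 0.4, 0.5, 0.6],[0.5, 0.7, 0.8, 0.2, 0.1],[0.2, 0.9, 0.1, 0.6, 0.7]]
w3 = [[0.6],[0.8],[0.3],[0.9],[0.2]]
Here w1 and w3 are written as the transposes of W1 (5, 1) and W3 (1, 5), so w1[0][j] and w3[j][0] are the weights of neuron j; row i of w2 holds the weights feeding neuron i of the 2nd hidden layer.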
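As a quick sanity check on these numbers, this sketch runs the full forward pass with exactly the listed inputs, weights and biases (the variable names are our own). It gives first-layer outputs of about [0.55971, 0.47752, 0.61538, 0.53743, 0.67261] and a final output of about 0.89 rather than 0.62, so the 0.62 above is best read as an illustrative value for the walkthrough.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x = 0.7
b1 = [0.1, -0.3, 0.4, -0.2, 0.3]
b2 = [0.2, -0.1, 0.2, -0.3, 0.1]
b3 = -0.1
w1 = [[0.2, 0.3, 0.1, 0.5, 0.6]]          # w1[0][j]: input -> neuron j of hidden layer 1
w2 = [[0.7, 0.8, 0.3, 0.1, 0.9], [0.4, 0.2, 0.6, 0.8, 0.3], [0.9, 0.1, 0.4, 0.5, 0.6],
      [0.5, 0.7, 0.8, 0.2, 0.1], [0.2, 0.9, 0.1, 0.6, 0.7]]  # row i: into neuron i of hidden layer 2
w3 = [[0.6], [0.8], [0.3], [0.9], [0.2]]  # w3[j][0]: neuron j of hidden layer 2 -> output

hidden1_outputs = [sigmoid(w1[0][j] * x + b1[j]) for j in range(5)]
hidden2_outputs = [sigmoid(sum(w * h for w, h in zip(w2[i], hidden1_outputs)) + b2[i])
                   for i in range(5)]
y = sigmoid(sum(w3[j][0] * hidden2_outputs[j] for j in range(5)) + b3)

print([round(h, 5) for h in hidden1_outputs])
print([round(h, 5) for h in hidden2_outputs])
print(round(y, 4))
```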
Let's consider the single neuron in the output layer and backpropagate its error through the network to understand how it works. We will update the weights and biases marked in the MLP figure.
Yedhant:
To perform backpropagation, we need to first calculate the error at the output neuron:
error = (expected output - actual output) * actual output * (1 - actual output)
= (1 - 0.62) * 0.62 * (1 - 0.62)
= 0.08953
Next, we can calculate the error at the neuron in the 2nd hidden layer that is connected to the output neuron:
hidden2_error = error * w3[0][0] * hidden2_output * (1 - hidden2_output)
= 0.08953 * 0.6 * 0.86027 * (1 - 0.86027)
= 0.00646
where hidden2_output is the output value of the first neuron in the 2nd hidden layer. It depends on all five outputs of the 1st hidden layer, hidden1_outputs = [0.55971, 0.47752, 0.61538, 0.53743, 0.67261] (each one is sigmoid(w1[0][j] * input + b1[j])), and it is calculated as:
hidden2_output = sigmoid(np.dot(w2[0], hidden1_outputs) + b2[0])
= sigmoid(0.7*0.55971 + 0.8*0.47752 + 0.3*0.61538 + 0.1*0.53743 + 0.9*0.67261 + 0.2)
= sigmoid(1.81752)
= 0.86027
where sigmoid is the sigmoid activation function, and np.dot is the dot product operation.
Similarly, we can calculate the error at the neuron in the 1st hidden layer that is connected to this neuron in the 2nd hidden layer:
hidden1_error = hidden2_error * w2[0][0] * hidden1_output * (1 - hidden1_output)
= 0.00646 * 0.7 * 0.55971 * (1 - 0.55971)
= 0.00111
where hidden1_output is the output value of the first neuron in the 1st hidden layer, and it is calculated as:
hidden1_output = sigmoid(w1[0][0] * input + b1[0])
= sigmoid(0.2*0.7 + 0.1)
= 0.55971
Here we follow only the marked path; in the full algorithm, a neuron in the 1st hidden layer sums the error contributions from all five neurons of the 2nd hidden layer that it feeds.
Now that we have calculated the errors at the neurons, we can update the weights and biases using the learning rate and the error values:
w3[0][0] += learning_rate * error * hidden2_output
= 0.6 + 0.1 * 0.08953 * 0.86027
= 0.60770
w2[0][0] += learning_rate * hidden2_error * hidden1_output
= 0.7 + 0.1 * 0.00646 * 0.55971
= 0.70036
w1[0][0] += learning_rate * hidden1_error * input
= 0.2 + 0.1 * 0.00111 * 0.7
= 0.20008
b3 += learning_rate * error
= -0.1 + 0.1 * 0.08953
= -0.09105
b2[0] += learning_rate * hidden2_error
= 0.2 + 0.1 * 0.00646
= 0.20065
b1[0] += learning_rate * hidden1_error
= 0.1 + 0.1 * 0.00111
= 0.10011
All the other weights and biases are updated in the same way, and the updated values are then used to calculate the output of the neural network for the next input.
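To double-check the single-path arithmetic, this sketch recomputes it step by step: the hidden outputs come from the example weights, while the network output is taken as the given 0.62. The variable names (w3_00, w2_00 and so on) are our own shorthand for w3[0][0], w2[0][0] and so on. Because each error term uses (expected - actual), the updates add rather than subtract.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

learning_rate = 0.1
x, expected, actual = 0.7, 1.0, 0.62   # output value taken as given

# Forward values along the marked path (first neuron of each hidden layer).
b1 = [0.1, -0.3, 0.4, -0.2, 0.3]
w1_row = [0.2, 0.3, 0.1, 0.5, 0.6]      # w1[0]
hidden1_outputs = [sigmoid(w * x + b) for w, b in zip(w1_row, b1)]
hidden1_output = hidden1_outputs[0]
w2_row0, b2_0 = [0.7, 0.8, 0.3, 0.1, 0.9], 0.2   # w2[0], b2[0]
hidden2_output = sigmoid(sum(w * h for w, h in zip(w2_row0, hidden1_outputs)) + b2_0)

# Backward pass along the single path: output -> hidden2[0] -> hidden1[0].
w3_00, w2_00, w1_00 = 0.6, 0.7, 0.2
error = (expected - actual) * actual * (1 - actual)
hidden2_error = error * w3_00 * hidden2_output * (1 - hidden2_output)
# Single path only: the full algorithm would sum over all five hidden2 neurons.
hidden1_error = hidden2_error * w2_00 * hidden1_output * (1 - hidden1_output)

# Updates along the marked path.
w3_00 += learning_rate * error * hidden2_output
w2_00 += learning_rate * hidden2_error * hidden1_output
w1_00 += learning_rate * hidden1_error * x
b3 = -0.1 + learning_rate * error
b2_0 += learning_rate * hidden2_error
b1_0 = b1[0] + learning_rate * hidden1_error

print(round(error, 5), round(hidden2_error, 5), round(hidden1_error, 5))
print(round(w3_00, 5), round(w2_00, 5), round(w1_00, 5),
      round(b3, 5), round(b2_0, 5), round(b1_0, 5))
```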
Yue: Wow, that's pretty cool! Backpropagation looks like a very good guide for training an MLP. And I bet it's also important to have a good guide, like you, to explain all of this, even to a robot!
Yedhant: Haha, well I'm happy to be your guide on this journey through the world of MLPs. Who knows where our next adventure will take us? Maybe to the world of convolutional neural networks or recurrent neural networks!
Yue: Ooh, that sounds exciting! Let's keep exploring and learning together!
Yedhant: Yes, let's meet next week, same time, same place, for a new adventure.
Yue: I hope I can contain my excitement until next week and our new adventure!
Yedhant: Same here, but don't forget to bring those juicy tomatoes we talked about.
Yue: Absolutely, I'll make sure to bring them. Alright, have a wonderful day, Yedhant! See you next week!
Yedhant: You too, Yue! Take care!
Those interested in exploring and experimenting with the Python code can find its implementation in our code store: MLP Example: Robot navigation using ultrasonic sensor