All the code snippets discussed in this blog post are available in my GitHub repository. Try to work through the entire post and get your hands dirty with the code snippets.
Code compatibility: Python 2.7, tested on Ubuntu 16.04
In my blog post "Introduction to Neural Networks", we saw how an Artificial Neural Network (ANN) functions. In this blog post we will see how an ANN actually works by coding it in Python, step by step.
Figure 1. Artificial Neural Network to solve XOR gate
Let us first discuss the example to which we will apply the ANN. I take the simple & complex example of the XOR gate.
Figure 2. XOR gate truth table
Let us see why I use the contradictory description "simple & complex".
Why simple? The XOR gate takes two inputs and produces one output, so when we code it, it is very easy to understand.
Why complex? The XOR gate is non-linear compared to other logic gates like OR, AND, and NOR. Looking at Figure 3 we can easily infer that a single straight line can separate the outputs of the AND and OR gates, but it cannot separate the two output classes of the XOR gate. There is no single linear function that produces a hyperplane capable of separating the points of the XOR function. The curve in the image separates the points, but it is not a linear function.
Figure 3 (A). AND gate & OR gate are linearly separable whereas XOR is non-linear
Figure 3 (B). Non-linear character of the XOR gate
Figure 3 (C). A single plane cannot separate the classes of the XOR gate
To separate the points of XOR, you will have to use at least two lines (or other curved decision boundaries). This will require two separate perceptrons. Then, you can use a third perceptron to separate the intermediate results on the basis of sign, as sketched below.
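To make this concrete, here is a tiny illustrative sketch with hand-picked weights and a hard-threshold activation (these numbers are my own choices for illustration, not the weights our network will learn): one perceptron draws an OR line, a second draws a NAND line, and a third combines them with an AND, which reproduces XOR.

def step(x):
    # hard-threshold activation: fires only if the weighted sum is positive
    return 1 if x > 0 else 0

def xor_by_hand(x1, x2):
    or_out = step(x1 + x2 - 0.5)           # first line: x1 OR x2
    nand_out = step(-x1 - x2 + 1.5)        # second line: NOT (x1 AND x2)
    return step(or_out + nand_out - 1.5)   # third perceptron: AND of the two

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print("%d XOR %d = %d" % (x1, x2, xor_by_hand(x1, x2)))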
It is very clear that a single perceptron cannot solve the XOR function, so we will design the smallest network that solves this function.
We will design our network as given in Figure 1, with two input perceptrons, two hidden perceptrons and one output perceptron. A typical neural network has three components:
1) Perceptron - takes input, stores an intermediate result, and produces output.
2) Weight - connects two perceptrons; it can be compared with memory, as weights get updated through training and drive the learning.
3) Bias - can also be compared with memory; biases get updated through training and help in learning.
We will write Python code to solve the XOR gate, and we will follow these steps through this tutorial:
1) Training data description
2) Initialisation
3) Activation
4) Calculating Error
5) Calculating Updates / Backpropagation
5.1 Calculating changes in the hidden-to-output layer
5.2 Calculating changes in the input-to-hidden layer
6) Updating all parameters
Overall flow for one iteration can be represented as below:
Figure 4. Flowchart: Neural Network with back-propagation
Step 1. Training data description
XOR = [[0, 1, 1], [1, 1, 0], [1, 0, 1], [0, 0, 0]]
In the code, XOR[i][0] and XOR[i][1] give inputs 1 and 2, and XOR[i][2] gives the output, for any value of i in [0, 1, 2, 3]. In each sub-array, the first two elements represent inputs 1 and 2 and the third element represents the desired output.
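To make the indexing concrete, the short loop below (just an illustrative sketch) prints every training case:

XOR = [[0, 1, 1], [1, 1, 0], [1, 0, 1], [0, 0, 0]]

for i in range(4):
    # XOR[i][0] and XOR[i][1] are the inputs, XOR[i][2] is the desired output
    print("input1=%d input2=%d desired=%d" % (XOR[i][0], XOR[i][1], XOR[i][2]))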
For the Python snippets given below, the following symbols will be used:
dwij - change in weight between current-layer perceptron i and next-layer perceptron j, equivalent to Δwij
dti - change in bias of perceptron i, equivalent to Δθi
wij - actual weight between current-layer perceptron i and next-layer perceptron j, equivalent to wij
ti - actual bias of perceptron i, equivalent to θi
deli - partial error at perceptron i, equivalent to δi
yi - output at perceptron i, e.g. y3 is the output at perceptron 3
Step 2. Initialisation
Set all the weights and threshold levels of the network to random numbers uniformly distributed inside a small range (0-1):
In our Python solution I set them manually:
w13 = 0.5
w14 = 0.9
w23 = 0.4
w24 = 1.0
w35 = -1.2
w45 = 1.1
t3 = 0.8
t4 = -0.1
t5 = 0.3
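If you prefer not to hard-code the starting values, the same initialisation can be done with Python's random module. This is only a sketch: the exact starting numbers will then differ from the hand-picked ones used in the rest of this post (which, strictly speaking, also include a couple of negative values).

import random

random.seed(42)  # optional: fix the seed so that runs are reproducible
w13, w14, w23, w24, w35, w45 = [random.uniform(0, 1) for _ in range(6)]
t3, t4, t5 = [random.uniform(0, 1) for _ in range(3)]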
Step 3. Activation
Let us say we have two inputs, 1 and 2, and we have to calculate the forward pass at perceptron 3. We do it in the following way.
Figure 5. Forward pass calculation
Here j = 3 and i = 1, 2, so the forward-pass result at perceptron 3 is sigmoid(x1 * w13 + x2 * w23 - θ3). The sigmoid function can be represented by:
Figure 6. Sigmoid function
The overall process can be represented by the following equation:
Figure 7. Sigmoid function for forward pass calculation
and the Python code for the forward pass can be written as:
y3 = 1 / (1 + math.exp(-(XOR[i][0] * w13 + XOR[i][1] * w23 - t3)))
y4 = 1 / (1 + math.exp(-(XOR[i][0] * w14 + XOR[i][1] * w24 - t4)))
y5 = 1 / (1 + math.exp(-(y3 * w35 + y4 * w45 - t5)))
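The same forward pass reads a little cleaner if the sigmoid is factored out into a helper function. The sketch below is self-contained purely for illustration: it reuses the initial weights from Step 2 and the first training case (x1 = 0, x2 = 1).

import math

def sigmoid(x):
    # logistic activation: maps any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

# initial weights and biases from Step 2
w13, w23, t3 = 0.5, 0.4, 0.8
w14, w24, t4 = 0.9, 1.0, -0.1
w35, w45, t5 = -1.2, 1.1, 0.3

x1, x2 = 0, 1                            # first training case
y3 = sigmoid(x1 * w13 + x2 * w23 - t3)
y4 = sigmoid(x1 * w14 + x2 * w24 - t4)
y5 = sigmoid(y3 * w35 + y4 * w45 - t5)
print("y3=%.4f y4=%.4f y5=%.4f" % (y3, y4, y5))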
Step 4. Calculating Error
After calculating the outputs at perceptrons 3, 4 and 5, y5 is our final output. The error at the output can be calculated as error = actual output - y5. For our problem, in a Pythonic way, it is written as error = XOR[i][2] - y5.
Step 5. Calculating Updates / Backpropagation
5.1 Calculating changes in the hidden-to-output layer
The error we got at y5 is the total error of the network. To make an update we require the partial error at each output- and hidden-layer perceptron (perceptrons 3, 4 & 5). For the output perceptron it is given by δoutput = output * (1 - output) * error, and in Python the same is written as del5 = y5 * (1 - y5) * error.
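Why output * (1 - output)? Because that is the derivative of the sigmoid evaluated at the perceptron's output, which is what the chain rule asks for. A quick numerical sanity check (a small sketch at an arbitrary test point) shows the analytic factor and a finite-difference estimate agree:

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

x = 0.7                       # arbitrary test point
y = sigmoid(x)
analytic = y * (1 - y)        # the factor used in del5 = y5 * (1 - y5) * error
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)   # central difference
print("analytic=%.6f numeric=%.6f" % (analytic, numeric))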
Figure 8. Partial error / weight change calculation
Once we have the partial error for the output layer, we can change the weights between the hidden and output layer as:
Figure 9. change in weights
Where,
α is the learning rate
ybi is the output of perceptron i in layer b
δcj is the partial error of perceptron j in the next layer c
In Python the same is calculated as:
dw35 = alpha * y3 * del5
dw45 = alpha * y4 * del5
For the hidden layer, the partial errors are calculated as represented by the following:
Figure 10. Equation for partial error calculation
Where,
δbi is the partial error of perceptron i in the given hidden layer b; here δb1 = partial error at perceptron 3
ybi is the output of perceptron i in the given hidden layer b; here yb1 = output at perceptron 3
δcj is the partial error of perceptron j in the next layer c; here δc1 = partial error at perceptron 5
wij is the weight between current-layer perceptron i and next-layer perceptron j
In a Pythonic way, the partial errors can be calculated as:
del3 = y3 * (1 - y3) * del5 * w35
del4 = y4 * (1 - y4) * del5 * w45
Biases are represented by θ, and the change in bias is given by:
Δθbi = α * (-1) * δbi
where,
δbi is the partial error of perceptron i in layer b; here δb1 = partial error at perceptron 5
θbi is the bias of perceptron i in layer b; here θb1 = bias of perceptron 5
In a Pythonic way, the bias change is calculated as:
dt5 = alpha * (-1) * del5
5.2 Calculating changes in the input-to-hidden layer
Weight and bias changes are calculated in the same way as we did for the components between the hidden and output layers:
dw13 = alpha * XOR[i][0] * del3
dw23 = alpha * XOR[i][1] * del3
dt3 = alpha * (-1) * del3
dw14 = alpha * XOR[i][0] * del4
dw24 = alpha * XOR[i][1] * del4
dt4 = alpha * (-1) * del4
Step 6. Updating all parameters
wij(p+1) = wij(p) + Δwij(p+1)
θi(p+1) = θi(p) + Δθi(p+1)
Where,
p is the current iteration and p+1 is the next iteration; the changes are added to the weights and biases so that they become effective in the next iteration.
Pythonically it is written as:
w13 = w13 + dw13
w14 = w14 + dw14
w23 = w23 + dw23
w24 = w24 + dw24
w35 = w35 + dw35
w45 = w45 + dw45
t3 = t3 + dt3
t4 = t4 + dt4
t5 = t5 + dt5
The Python code given below is the overall implementation of the XOR gate using an ANN:
import math
"""
defining XOR gate, [x1, x2 , y]
"""
XOR = [[0, 1, 1], [1, 1, 0], [1, 0, 1], [0, 0, 0]]
# initializing weights and biases
w13 = 0.5
w14 = 0.9
w23 = 0.4
w24 = 1.0
w35 = -1.2
w45 = 1.1
t3 = 0.8
t4 = -0.1
t5 = 0.3
# defining learning rate
alpha = 0.5
# initializing squaredError
squaredError = 0
# initializing error per case
error = 0
# defining epochs
Epochs = 2000
count = 0
# run this repeatedly for number of Epochs
for j in range(Epochs):
    print "squaredError", squaredError
    # resetting squaredError at the start of every epoch
    squaredError = 0
    for i in range(4):  # iterating through each training case of this epoch
        """
        calculating output at each perceptron (forward pass)
        """
        y3 = 1 / (1 + math.exp(-(XOR[i][0] * w13 + XOR[i][1] * w23 - t3)))
        y4 = 1 / (1 + math.exp(-(XOR[i][0] * w14 + XOR[i][1] * w24 - t4)))
        y5 = 1 / (1 + math.exp(-(y3 * w35 + y4 * w45 - t5)))
        """
        calculating error
        """
        error = XOR[i][2] - y5
        """
        calculating partial error and changes in weights/bias between hidden and output perceptrons
        """
        del5 = y5 * (1 - y5) * error
        dw35 = alpha * y3 * del5
        dw45 = alpha * y4 * del5
        dt5 = alpha * (-1) * del5
        """
        calculating partial errors and changes in weights/biases between input and hidden perceptrons
        """
        del3 = y3 * (1 - y3) * del5 * w35
        del4 = y4 * (1 - y4) * del5 * w45
        dw13 = alpha * XOR[i][0] * del3
        dw23 = alpha * XOR[i][1] * del3
        dt3 = alpha * (-1) * del3
        dw14 = alpha * XOR[i][0] * del4
        dw24 = alpha * XOR[i][1] * del4
        dt4 = alpha * (-1) * del4
        """
        applying the weight and bias updates
        """
        w13 = w13 + dw13
        w14 = w14 + dw14
        w23 = w23 + dw23
        w24 = w24 + dw24
        w35 = w35 + dw35
        w45 = w45 + dw45
        t3 = t3 + dt3
        t4 = t4 + dt4
        t5 = t5 + dt5
        """
        Since y5 is a float between 0 and 1, we use 0.5 as the threshold:
        if the output is above 0.5 the predicted class is 1, else 0
        """
        if y5 < 0.5:
            class_ = 0
        else:
            class_ = 1
        """
        uncomment the line below to see the predicted and actual output
        """
        # print ("Predicted", class_, " actual ", XOR[i][2])
        """
        accumulating the squared error over the four cases of this epoch
        """
        squaredError = squaredError + (error * error)
    if squaredError < 0.001:
        # if the error is below 0.001, terminate training early (premature termination)
        break
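Once training has finished, the learned weights and biases can be checked against the XOR truth table. The snippet below is a sketch meant to be appended after the training loop above; it reuses the trained variables.

# forward pass with the trained parameters, one case at a time
for i in range(4):
    y3 = 1 / (1 + math.exp(-(XOR[i][0] * w13 + XOR[i][1] * w23 - t3)))
    y4 = 1 / (1 + math.exp(-(XOR[i][0] * w14 + XOR[i][1] * w24 - t4)))
    y5 = 1 / (1 + math.exp(-(y3 * w35 + y4 * w45 - t5)))
    predicted = 1 if y5 >= 0.5 else 0
    print("inputs %d,%d -> predicted %d (desired %d)" % (
        XOR[i][0], XOR[i][1], predicted, XOR[i][2]))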
If you print the squared error for all iterations, you will get the following plot:
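One way to produce such a plot yourself, assuming matplotlib is installed (pip install matplotlib), is to collect the squared error of every epoch in a list and plot it at the end. This is only a sketch; the append line has to be placed inside the outer epoch loop of the code above.

import matplotlib.pyplot as plt

error_history = []   # one entry per epoch

# inside the outer "for j in range(Epochs):" loop, after the inner loop
# over the four cases has finished, record the epoch's squared error:
#     error_history.append(squaredError)

# after training, plot the learning curve
plt.plot(error_history)
plt.xlabel("Epoch")
plt.ylabel("Sum of squared errors")
plt.title("Learning curve of the XOR network")
plt.show()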
Figure 11. As the epochs progress, the ANN decreases the error, which signifies learning
I know this tutorial is not digestible in one go; just work through it and get your hands dirty with the code. Ultimately you will win.
Here it took a pretty long time, 2000 epochs, to learn the XOR gate. In practice, with a large dataset, this much training time is not desirable.
In the next blog post we will see how to apply some well-known optimisation techniques to this raw ANN code to solve XOR in much less computational time.