All the code snippets discussed in this blog post are available in my GitHub repository. Try to work through the entire post and get your hands dirty with the code snippets.
Code compatibility: Python 2.7, tested on Ubuntu 16.04
In my blog post "Introduction to Neural Networks", we saw how an Artificial Neural Network (ANN) functions. In this blog post we will see how an ANN actually works by coding it in Python, step by step.
Figure 1. Artificial Neural Network to solve XOR gate
Let us first discuss the example to which we will apply the ANN. I take the simple & complex example of the XOR gate.
Figure 2. XOR gate truth table
Let us see why I use the contradictory description "simple & complex".
Why simple? The XOR gate takes two inputs and produces one output, so when we code it, it is very easy to understand.
Why complex? The XOR gate is non-linear compared to other logic gates like OR, AND, and NOR. Looking at Figure 3 we can easily infer that a single straight line can separate the outputs of the AND and OR gates, but it cannot separate the two output classes of the XOR gate. There is no single linear function that produces a hyperplane capable of separating the points of the XOR function. The curve in the image separates the points, but it is not a linear function.
Figure 3 (A). AND gate & OR gate are linearly separable whereas XOR is non-linear
Figure 3 (B). Non-linear character of the XOR gate
Figure 3 (C). A single plane cannot separate the classes of the XOR gate
To separate the points of XOR, you will have to use at least two lines (or other curved decision boundaries). This will require two separate perceptrons. Then, you can use a third perceptron to separate the intermediate results on the basis of sign, as sketched below.
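To make this concrete, here is a tiny illustrative sketch with hand-picked weights and a hard-threshold activation (these numbers are my own choices for illustration, not the weights our network will learn): one perceptron draws an OR line, a second draws a NAND line, and a third combines them with an AND, which reproduces XOR.

def step(x):
    # hard-threshold activation: fires only if the weighted sum is positive
    return 1 if x > 0 else 0

def xor_by_hand(x1, x2):
    or_out = step(x1 + x2 - 0.5)           # first line: x1 OR x2
    nand_out = step(-x1 - x2 + 1.5)        # second line: NOT (x1 AND x2)
    return step(or_out + nand_out - 1.5)   # third perceptron: AND of the two

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print("%d XOR %d = %d" % (x1, x2, xor_by_hand(x1, x2)))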
It is very clear that a single perceptron cannot solve the XOR function, so we will design the smallest network that solves this function.
We will design our network as given in Figure 1, with two input perceptrons, two hidden perceptrons and one output perceptron. A typical neural network has three components:
1) Perceptron - takes input, stores an intermediate result, and produces output.
2) Weight - connects two perceptrons; it can be compared with memory, as weights get updated through training and drive the learning.
3) Bias - can also be compared with memory; biases get updated through training and help in learning.
We will write Python code to solve the XOR gate, and we will follow these steps through this tutorial:
1) Training data description
2) Initialisation
3) Activation
4) Calculating Error
5) Calculating Updates / Backpropagation
5.1 Calculating changes in the hidden-to-output layer
5.2 Calculating changes in the input-to-hidden layer
6) Updating all parameters
Overall flow for one iteration can be represented as below:
Figure 4. Flowchart: Neural Network with back-propagation
Step 1. Training data description
XOR = [[0, 1, 1], [1, 1, 0], [1, 0, 1], [0, 0, 0]]
In the code, XOR[i][0] and XOR[i][1] give inputs 1 and 2, and XOR[i][2] gives the output, for any value of i in [0, 1, 2, 3]. In each sub-array, the first two elements represent inputs 1 and 2 and the third element represents the desired output.
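To make the indexing concrete, the short loop below (just an illustrative sketch) prints every training case:

XOR = [[0, 1, 1], [1, 1, 0], [1, 0, 1], [0, 0, 0]]

for i in range(4):
    # XOR[i][0] and XOR[i][1] are the inputs, XOR[i][2] is the desired output
    print("input1=%d input2=%d desired=%d" % (XOR[i][0], XOR[i][1], XOR[i][2]))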
For the Python snippets given below, the following symbols will be used:
dwij - change in weight between current-layer perceptron i and next-layer perceptron j, equivalent to Δwij
dti - change in bias of perceptron i, equivalent to Δθi
wij - actual weight between current-layer perceptron i and next-layer perceptron j, equivalent to wij
ti - actual bias of perceptron i, equivalent to θi
deli - partial error at perceptron i, equivalent to δi
yi - output at perceptron i, e.g. y3 is the output at perceptron 3
Step 2. Initialisation
Set all the weights and threshold levels of the network to random numbers uniformly distributed inside a small range (0-1):
In our Python solution I set them manually:
w13 = 0.5
w14 = 0.9
w23 = 0.4
w24 = 1.0
w35 = -1.2
w45 = 1.1
t3 = 0.8
t4 = -0.1
t5 = 0.3
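If you prefer not to hard-code the starting values, the same initialisation can be done with Python's random module. This is only a sketch: the exact starting numbers will then differ from the hand-picked ones used in the rest of this post (which, strictly speaking, also include a couple of negative values).

import random

random.seed(42)  # optional: fix the seed so that runs are reproducible
w13, w14, w23, w24, w35, w45 = [random.uniform(0, 1) for _ in range(6)]
t3, t4, t5 = [random.uniform(0, 1) for _ in range(3)]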
Step 3. Activation
Let us say we have two inputs, 1 and 2, and we have to calculate the forward pass at perceptron 3. We do it in the following way.
Figure 5. Forward pass calculation
Here j = 3 and i = 1, 2, so the forward-pass result at perceptron 3 is sigmoid(x1 * w13 + x2 * w23 - θ3). The sigmoid function can be represented by:
Figure 6. Sigmoid function
The overall process can be represented by the following equation:
Figure 7. Sigmoid function for forward pass calculation
and the Python code for the forward pass can be written as:
y3 = 1 / (1 + math.exp(-(XOR[i][0] * w13 + XOR[i][1] * w23 - t3)))
y4 = 1 / (1 + math.exp(-(XOR[i][0] * w14 + XOR[i][1] * w24 - t4)))
y5 = 1 / (1 + math.exp(-(y3 * w35 + y4 * w45 - t5)))
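The same forward pass reads a little cleaner if the sigmoid is factored out into a helper function. The sketch below is self-contained purely for illustration: it reuses the initial weights from Step 2 and the first training case (x1 = 0, x2 = 1).

import math

def sigmoid(x):
    # logistic activation: maps any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

# initial weights and biases from Step 2
w13, w23, t3 = 0.5, 0.4, 0.8
w14, w24, t4 = 0.9, 1.0, -0.1
w35, w45, t5 = -1.2, 1.1, 0.3

x1, x2 = 0, 1                            # first training case
y3 = sigmoid(x1 * w13 + x2 * w23 - t3)
y4 = sigmoid(x1 * w14 + x2 * w24 - t4)
y5 = sigmoid(y3 * w35 + y4 * w45 - t5)
print("y3=%.4f y4=%.4f y5=%.4f" % (y3, y4, y5))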
Step 4. Calculating Error
After calculating the outputs at perceptrons 3, 4 and 5, y5 is our final output. The error at the output can be calculated as error = actual output - y5. For our problem, in a Pythonic way, it is written as error = XOR[i][2] - y5.
Step 5. Calculating Updates / Backpropagation
5.1 Calculating changes in the hidden-to-output layer
The error we got at y5 is the total error of the network. To make an update we require the partial error at each output- and hidden-layer perceptron (perceptrons 3, 4 & 5). For the output perceptron it is given by δoutput = output * (1 - output) * error, and in Python the same is written as del5 = y5 * (1 - y5) * error.
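Why output * (1 - output)? Because that is the derivative of the sigmoid evaluated at the perceptron's output, which is what the chain rule asks for. A quick numerical sanity check (a small sketch at an arbitrary test point) shows the analytic factor and a finite-difference estimate agree:

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

x = 0.7                       # arbitrary test point
y = sigmoid(x)
analytic = y * (1 - y)        # the factor used in del5 = y5 * (1 - y5) * error
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)   # central difference
print("analytic=%.6f numeric=%.6f" % (analytic, numeric))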
Figure 8. Partial error / weight change calculation
Once we have the partial error for the output layer, we can change the weights between the hidden and output layer as:
Figure 9. change in weights
Where,
α is the learning rate
ybi is the output of perceptron i in layer b
δcj is the partial error of perceptron j in the next layer c
In Python the same is calculated as:
dw35 = alpha * y3 * del5
dw45 = alpha * y4 * del5
For the hidden layer, the partial errors are calculated as represented by the following:
Figure 10. Equation for partial error calculation
Where,
δbi is the partial error of perceptron i in the given hidden layer b; here δb1 = partial error at perceptron 3
ybi is the output of perceptron i in the given hidden layer b; here yb1 = output at perceptron 3
δcj is the partial error of perceptron j in the next layer c; here δc1 = partial error at perceptron 5
wij is the weight between current-layer perceptron i and next-layer perceptron j
In a Pythonic way, the partial errors can be calculated as:
del3 = y3 * (1 - y3) * del5 * w35
del4 = y4 * (1 - y4) * del5 * w45
Biases are represented by θ, and the change in bias is given by:
Δθbi = α * (-1) * δbi
where,
δbi is the partial error of perceptron i in layer b; here δb1 = partial error at perceptron 5
θbi is the bias of perceptron i in layer b; here θb1 = bias of perceptron 5
In a Pythonic way, the bias change is calculated as:
dt5 = alpha * (-1) * del5
5.2 Calculating changes in the input-to-hidden layer
Weight and bias changes are calculated in the same way as we did for the components between the hidden and output layers:
dw13 = alpha * XOR[i][0] * del3
dw23 = alpha * XOR[i][1] * del3
dt3 = alpha * (-1) * del3
dw14 = alpha * XOR[i][0] * del4
dw24 = alpha * XOR[i][1] * del4
dt4 = alpha * (-1) * del4
Step 6. Updating all parameters
wij(p+1) = wij(p) + Δwij(p+1)
θi(p+1) = θi(p) + Δθi(p+1)
Where,
p is the current iteration and p+1 is the next iteration; the changes are added to the weights and biases so that they become effective in the next iteration.
Pythonically it is written as:
w13 = w13 + dw13
w14 = w14 + dw14
w23 = w23 + dw23
w24 = w24 + dw24
w35 = w35 + dw35
w45 = w45 + dw45
t3 = t3 + dt3
t4 = t4 + dt4
t5 = t5 + dt5
The Python code given below is the overall implementation of the XOR gate using an ANN:
import math
"""
defining XOR gate, [x1, x2 , y]
"""
XOR = [[0, 1, 1], [1, 1, 0], [1, 0, 1], [0, 0, 0]]
# initializing weights and biases
w13 = 0.5
w14 = 0.9
w23 = 0.4
w24 = 1.0
w35 = -1.2
w45 = 1.1
t3 = 0.8
t4 = -0.1
t5 = 0.3
# defining learning rate
alpha = 0.5
# initializing squaredError
squaredError = 0
# initializing error per case
error = 0
# defining epochs
Epochs = 2000
count = 0
# run this repeatedly for number of Epochs
for j in range(Epochs):
    print "squaredError", squaredError
    # resetting squaredError at the start of every epoch
    squaredError = 0
    for i in range(4):  # iterating through each training case of this epoch
        """
        calculating output at each perceptron (forward pass)
        """
        y3 = 1 / (1 + math.exp(-(XOR[i][0] * w13 + XOR[i][1] * w23 - t3)))
        y4 = 1 / (1 + math.exp(-(XOR[i][0] * w14 + XOR[i][1] * w24 - t4)))
        y5 = 1 / (1 + math.exp(-(y3 * w35 + y4 * w45 - t5)))
        """
        calculating error
        """
        error = XOR[i][2] - y5
        """
        calculating partial error and changes in weights/bias between hidden and output perceptrons
        """
        del5 = y5 * (1 - y5) * error
        dw35 = alpha * y3 * del5
        dw45 = alpha * y4 * del5
        dt5 = alpha * (-1) * del5
        """
        calculating partial errors and changes in weights/biases between input and hidden perceptrons
        """
        del3 = y3 * (1 - y3) * del5 * w35
        del4 = y4 * (1 - y4) * del5 * w45
        dw13 = alpha * XOR[i][0] * del3
        dw23 = alpha * XOR[i][1] * del3
        dt3 = alpha * (-1) * del3
        dw14 = alpha * XOR[i][0] * del4
        dw24 = alpha * XOR[i][1] * del4
        dt4 = alpha * (-1) * del4
        """
        applying the weight and bias updates
        """
        w13 = w13 + dw13
        w14 = w14 + dw14
        w23 = w23 + dw23
        w24 = w24 + dw24
        w35 = w35 + dw35
        w45 = w45 + dw45
        t3 = t3 + dt3
        t4 = t4 + dt4
        t5 = t5 + dt5
        """
        Since y5 is a float between 0 and 1, we use 0.5 as the threshold:
        if the output is above 0.5 the predicted class is 1, else 0
        """
        if y5 < 0.5:
            class_ = 0
        else:
            class_ = 1
        """
        uncomment the line below to see the predicted and actual output
        """
        # print ("Predicted", class_, " actual ", XOR[i][2])
        """
        accumulating the squared error over the four cases of this epoch
        """
        squaredError = squaredError + (error * error)
    if squaredError < 0.001:
        # if the error is below 0.001, terminate training early (premature termination)
        break
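Once training has finished, the learned weights and biases can be checked against the XOR truth table. The snippet below is a sketch meant to be appended after the training loop above; it reuses the trained variables.

# forward pass with the trained parameters, one case at a time
for i in range(4):
    y3 = 1 / (1 + math.exp(-(XOR[i][0] * w13 + XOR[i][1] * w23 - t3)))
    y4 = 1 / (1 + math.exp(-(XOR[i][0] * w14 + XOR[i][1] * w24 - t4)))
    y5 = 1 / (1 + math.exp(-(y3 * w35 + y4 * w45 - t5)))
    predicted = 1 if y5 >= 0.5 else 0
    print("inputs %d,%d -> predicted %d (desired %d)" % (
        XOR[i][0], XOR[i][1], predicted, XOR[i][2]))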
If you print the squared error for all iterations, you will get the following plot:
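One way to produce such a plot yourself, assuming matplotlib is installed (pip install matplotlib), is to collect the squared error of every epoch in a list and plot it at the end. This is only a sketch; the append line has to be placed inside the outer epoch loop of the code above.

import matplotlib.pyplot as plt

error_history = []   # one entry per epoch

# inside the outer "for j in range(Epochs):" loop, after the inner loop
# over the four cases has finished, record the epoch's squared error:
#     error_history.append(squaredError)

# after training, plot the learning curve
plt.plot(error_history)
plt.xlabel("Epoch")
plt.ylabel("Sum of squared errors")
plt.title("Learning curve of the XOR network")
plt.show()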
Figure 11. As the epochs progress, the ANN decreases the error, which signifies learning
I know this tutorial is not digestible in one go; just work through it and get your hands dirty with the code. Ultimately you will win.
Here it took a pretty long time, 2000 epochs, to learn the XOR gate. In practice, with a large dataset, this much training time is not desirable.
In the next blog post we will see how to apply some well-known optimisation techniques to this raw ANN code to solve XOR in much less computational time.