
Neural Network Implementation

All the code snippets discussed in this blog post are present in my GitHub repository. Try to understand the entire blog post and get your hands dirty with the code snippets.

Code compatibility: Python 2.7, tested on Ubuntu 16.04

 

In my blog post “Introduction to Neural Network”, we saw how an Artificial Neural Network (ANN) functions. In this blog post we will see how an ANN actually works by coding it in Python, step by step.


Figure 1. Artificial Neural Network to solve XOR gate

Let's first discuss the example to which we will apply the ANN. I take the simple yet complex example of the XOR gate.


Figure 2. XOR gate truth table

Let's see why I use the contradictory description "simple & complex".

Why simple? It takes two inputs and produces one output, so when we code it, it is very easy to understand.

Why complex? The XOR gate is non-linear compared to other logic gates like OR, AND, and NOR. Looking at Figure 3 we can easily infer that a single straight line can separate the outputs of the AND and OR gates, but it cannot separate the two output classes of the XOR gate. There is no single linear function that produces a hyper-plane capable of separating the points of the XOR function. The curve in the image separates the points, but it is not a linear function.


Figure. 3 (A) AND gate & OR Gate are linearly separable whereas XOR is non-linear


Figure. 3 (B) Non linear character of xor gate


Figure. 3(C) One plane cannot separate classes of XOR gate

To separate the points of XOR, you need at least two lines (or other, more complex decision boundaries). This requires two separate perceptrons. A third perceptron can then separate the intermediate results on the basis of sign, as sketched below.
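As an aside, here is a minimal hand-written sketch of that idea using threshold perceptrons with hand-picked (not learned) weights: one perceptron behaves like OR, one like NAND, and a third combines their outputs to reproduce XOR.

def step(x):
    # simple threshold activation: fires (1) when the weighted sum is non-negative
    return 1 if x >= 0 else 0

def xor_by_three_perceptrons(x1, x2):
    h1 = step(x1 + x2 - 0.5)      # OR-like perceptron
    h2 = step(-x1 - x2 + 1.5)     # NAND-like perceptron
    return step(h1 + h2 - 1.5)    # third perceptron: AND of the two intermediate results

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print x1, x2, xor_by_three_perceptrons(x1, x2)   # prints 0, 1, 1, 0

The network we train below learns such a decision boundary by itself instead of us hand-picking the weights.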

It is now very clear that a single perceptron cannot solve the XOR function, so we will design the smallest network that solves it.

We will design our network as given in Figure 1, with two input perceptrons, two hidden perceptrons and one output perceptron. A typical neural network has three components:

1) Perceptron - takes inputs, stores intermediate results and produces output.

2) Weight - connects two perceptrons; can be compared with memory. Weights get updated through training and that is how the network learns.

3) Bias - can also be compared with memory; biases get updated through training and help in learning.

We will write Python code to solve the XOR gate, and we will use the following steps throughout this tutorial:

1) Training data description

2) Initialisation

3) Activation

4) Calculating Error

5) Calculating Updates / Backpropagation

5.1 Calculating changes from hidden to output layer

5.2 Calculating changes from input to hidden layer

6) Updating all parameters

Overall flow for one iteration can be represented as below:


Figure 4. Flowchart : Neural Network with back-propagation

Step 1. Training data description

XOR = [[0, 1, 1], [1, 1, 0], [1, 0, 1], [0, 0, 0]]

In the code, XOR[i][0] and XOR[i][1] give inputs 1 and 2, and XOR[i][2] gives the desired output, for any i in [0, 1, 2, 3]. In each sub-array the first two elements represent inputs 1 and 2 and the third element represents the desired output.
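As a quick sanity check of this indexing, the short snippet below simply walks through the table and prints each case (XOR here is the same list as defined above):

XOR = [[0, 1, 1], [1, 1, 0], [1, 0, 1], [0, 0, 0]]
for i in range(4):
    # first two elements are the inputs, third is the desired output
    print "input1 =", XOR[i][0], "input2 =", XOR[i][1], "desired =", XOR[i][2]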

For the Python snippets given below, the following symbols will be used:

dwij - change in weight between current layer perceptron i and next layer perceptron j, equivalent to Δwij

dti - change in bias of perceptron i, equivalent to Δθi

wij - actual weight between current layer perceptron i and next layer perceptron j, equivalent to wij

ti - actual bias of perceptron i, equivalent to θi

deli - partial error at perceptron i, equivalent to δi

yi - output at the current perceptron, e.g. y3 is the output at perceptron 3

Step 2. Initialisation

Set all the weights and threshold levels of the network to random numbers uniformly distributed inside a small range (0-1):

In our Python solution I set them manually:

w13 = 0.5
w14 = 0.9
w23 = 0.4
w24 = 1.0
w35 = -1.2
w45 = 1.1
t3 = 0.8
t4 = -0.1
t5 = 0.3
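If you prefer the random initialisation described above instead of manually chosen values, one possible sketch using Python's built-in random module is given below; the exact values will differ on every run, which is why the manual values are convenient for following this tutorial.

import random

# one possible random initialisation, uniformly distributed in (0, 1)
w13 = random.uniform(0, 1)
w14 = random.uniform(0, 1)
w23 = random.uniform(0, 1)
w24 = random.uniform(0, 1)
w35 = random.uniform(0, 1)
w45 = random.uniform(0, 1)
t3 = random.uniform(0, 1)
t4 = random.uniform(0, 1)
t5 = random.uniform(0, 1)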

Step 3. Activation

Let's say we have two inputs, 1 and 2, and we have to calculate the forward pass at perceptron 3; we do it in the following way.


Figure 5. Forward pass calculation

Here j = 3 and i = 1, 2, so the forward-pass result at perceptron 3 is sigmoid(X1 * W13 + X2 * W23 - θ3). The sigmoid function is given by:


Figure 6. Sigmoid function

The overall process can be represented by the following equation:


Figure 7. Sigmoid function for forward pass calculation

and the Python code for the forward pass can be written as:

y3 = 1 / (1 + math.exp(-(XOR[i][0] * w13 + XOR[i][1] * w23 - t3)))
y4 = 1 / (1 + math.exp(-(XOR[i][0] * w14 + XOR[i][1] * w24 - t4)))
y5 = 1 / (1 + math.exp(-(y3 * w35 + y4 * w45 - t5)))
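Since the same sigmoid expression is repeated three times, you may prefer a small helper function; this is purely a cosmetic refactor of the lines above and does not change the behaviour:

import math

def sigmoid(x):
    # logistic activation: maps any real number into the range (0, 1)
    return 1 / (1 + math.exp(-x))

y3 = sigmoid(XOR[i][0] * w13 + XOR[i][1] * w23 - t3)
y4 = sigmoid(XOR[i][0] * w14 + XOR[i][1] * w24 - t4)
y5 = sigmoid(y3 * w35 + y4 * w45 - t5)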

Step 4. Calculating Error

After calculating the outputs at perceptrons 3, 4 and 5, y5 is our final output. The error of the output is calculated as: error = ActualOutput - y5. For our problem, in Python it is written as:

error = XOR[i][2] - y5

Step 5. Calculating Updates / Backpropagation

5.1 Calculating changes from hidden to output layer

The error we got at y5 is the total error of the network. To make an update we need the partial error at each of the output and hidden layer perceptrons (perceptrons 3, 4 and 5). For the output perceptron the partial error is given by δOutput = output * (1 - output) * error, and in Python the same is written as:

del5 = y5 * (1 - y5) * error


Figure 8. Partial error / weight change calculation

Once we have the partial error for the output layer, we can compute the change in weights between the hidden and output layer as:

Figure 9. Change in weights

Where,

α is the learning rate, Ybi is the output of perceptron i in layer b, and δcj is the partial error of perceptron j in the next layer c.

In Python the same is calculated as:

dw35 = alpha * y3 * del5
dw45 = alpha * y4 * del5

For the hidden layer, the partial errors are calculated as represented by the following:

Figure 10. Equation for partial error calculation

Where,

δbi is the partial error of perceptron i in the given hidden layer b; here δb1 = partial error at perceptron 3

Ybi is the output of perceptron i in the given hidden layer b; here Yb1 = output at perceptron 3

δcj is the partial error of perceptron j in the next layer c; here δc1 = partial error at perceptron 5

Wij is the weight between current layer perceptron i and next layer perceptron j


In Python, the partial errors are calculated as:

del3 = y3 * (1 - y3) * del5 * w35
del4 = y4 * (1 - y4) * del5 * w45

Biases are represented by θ, and the change in bias is given by:

Δθbi = α * (-1) * δbi

where,

δbi is the partial error of perceptron i in the given layer b; here δb1 = partial error at perceptron 5

θbi is the bias of perceptron i in the given layer b; here θb1 = bias of perceptron 5

In Python, the bias change is calculated as:

dt5 = alpha * (-1) * del5

5.2 Calculating changes from input to hidden layer

Weight and bias changes are calculated in the same way as we did for the components between the hidden and output layer:

dw13 = alpha * XOR[i][0] * del3

dw23 = alpha * XOR[i][1] * del3

dt3 = alpha * (-1) * del3

dw14 = alpha * XOR[i][0] * del4

dw24 = alpha * XOR[i][1] * del4

dt4 = alpha * (-1) * del4

Step 6. Updating all parameters

Wij(p+1) = Wij(p) + ΔWij(p+1)

θi(p+1) = θi(p) + Δθi(p+1)

Where,

p is the current iteration and p+1 is the next iteration; the changes are added to the weights and biases so they take effect in the next iteration.

In Python it is written as:

w13 = w13 + dw13
w14 = w14 + dw14
w23 = w23 + dw23
w24 = w24 + dw24
w35 = w35 + dw35
w45 = w45 + dw45
t3 = t3 + dt3
t4 = t4 + dt4
t5 = t5 + dt5

The Python code given below is the overall implementation of the XOR gate using an ANN:

import math

"""
defining XOR gate, [x1, x2, y]
"""
XOR = [[0, 1, 1], [1, 1, 0], [1, 0, 1], [0, 0, 0]]

# initializing weights
w13 = 0.5
w14 = 0.9
w23 = 0.4
w24 = 1.0
w35 = -1.2
w45 = 1.1
t3 = 0.8
t4 = -0.1
t5 = 0.3

# defining learning rate
alpha = 0.5

# initializing squaredError
squaredError = 0

# initializing error per case
error = 0

# defining epochs
Epochs = 2000

# run this repeatedly for the number of Epochs
for j in range(Epochs):
    print "squaredError", squaredError
    # resetting squaredError per epoch
    squaredError = 0
    for i in range(4):  # iterating through each case in the given epoch
        """
        calculating output at each perceptron (forward pass, including bias)
        """
        y3 = 1 / (1 + math.exp(-(XOR[i][0] * w13 + XOR[i][1] * w23 - t3)))
        y4 = 1 / (1 + math.exp(-(XOR[i][0] * w14 + XOR[i][1] * w24 - t4)))
        y5 = 1 / (1 + math.exp(-(y3 * w35 + y4 * w45 - t5)))
        """
        calculating error
        """
        error = XOR[i][2] - y5
        """
        calculating partial error and change in weight for hidden to output layer
        """
        del5 = y5 * (1 - y5) * error
        dw35 = alpha * y3 * del5
        dw45 = alpha * y4 * del5
        dt5 = alpha * (-1) * del5
        """
        calculating partial error and change in weight for input to hidden layer
        """
        del3 = y3 * (1 - y3) * del5 * w35
        del4 = y4 * (1 - y4) * del5 * w45
        dw13 = alpha * XOR[i][0] * del3
        dw23 = alpha * XOR[i][1] * del3
        dt3 = alpha * (-1) * del3
        dw14 = alpha * XOR[i][0] * del4
        dw24 = alpha * XOR[i][1] * del4
        dt4 = alpha * (-1) * del4
        """
        applying weight and bias updates
        """
        w13 = w13 + dw13
        w14 = w14 + dw14
        w23 = w23 + dw23
        w24 = w24 + dw24
        w35 = w35 + dw35
        w45 = w45 + dw45
        t3 = t3 + dt3
        t4 = t4 + dt4
        t5 = t5 + dt5
        """
        Since y5 is a float between 0 and 1,
        we use 0.5 as threshold: if the output is above 0.5 the class is 1, else 0
        """
        if y5 < 0.5:
            class_ = 0
        else:
            class_ = 1
        """
        uncomment the line below to see predicted and actual output
        """
        # print "Predicted", class_, " actual ", XOR[i][2]
        """
        accumulating squared error
        """
        squaredError = squaredError + (error * error)
    if squaredError < 0.001:
        # if error is below 0.001, terminate training early (premature termination)
        break

If you print the squared error for all iterations, you will get the following plot:

Figure 11. As epochs progress the ANN decreases its error, which signifies learning
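To reproduce such a plot yourself, one possible sketch (assuming matplotlib is installed) is to record the squared error of every epoch in a list and plot it after training; the lines below show only what you would add to the training code above:

import matplotlib.pyplot as plt

# before the training loop
errorHistory = []

# inside the training loop, right after the inner for-loop over the four cases
errorHistory.append(squaredError)

# after training has finished
plt.plot(errorHistory)
plt.xlabel("epoch")
plt.ylabel("squared error")
plt.show()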


I know this tutorial is not digestible in one go; just go through it and get your hands dirty with the code. Ultimately you will win.

Here it took a pretty long time, 2000 epochs, to learn the XOR gate. In practice, with a large data-set, this much time is not desirable.

In the next blog post we will see how to apply some well-known optimisation techniques to this raw ANN code so that it solves XOR in much less computational time.

If you like this tutorial, please share it with your colleagues. Discuss doubts and ask for changes on GitHub. It's free; no charges for anything. Your responses inspire me to deliver even better content.
