
Artificial Neural Network - Effect of Adaptive Learning Rate

All code related to this tutorial can be found in my GitHub repository.

  1. For a better understanding, go through my previous tutorial on Artificial Neural Networks before starting this one.

 

In the previous implementation we kept the learning rate constant throughout training. Here we can apply a trick to make training faster.


Figure 1. Learning rate and convergence


As illustrated in Figure 1, in any machine learning technique our goal is to minimize the error function f(x) and reach its global minimum. The global minimum is the point in n-dimensional space where the error is lowest and our predictions are closest to the actual values.

In Figure 1 our goal is to reach the convergence point, and we have three choices:

1) With a constant learning rate we eventually reach convergence, but it takes a long time. [BLUE]

2) With a larger learning rate the updates overshoot and diverge, so there is no convergence at all. [RED]

3) A middle approach between the two: start with a high learning rate (bigger steps) and slowly decrease the step size. [GREEN]


To accelerate the convergence and yet avoid the danger of instability, we can apply two heuristics:

Heuristic 1 - If the change of the sum of squared errors has the same algebraic sign for several consecutive epochs, then the learning rate parameter, α, should be increased.

Heuristic 2 - If the algebraic sign of the change of the sum of squared errors alternates for several consecutive epochs, then the learning rate parameter, α, should be decreased.
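In code form, the two heuristics could look roughly like the sketch below. This is only a hypothetical illustration: squaredErrors is a placeholder for the per-epoch squared-error history, and the scaling factors are the ones used later in this tutorial's code.

# change in squared error between consecutive epochs
deltas = [squaredErrors[k] - squaredErrors[k - 1] for k in range(1, len(squaredErrors))]
recent = deltas[-5:]  # look at "several consecutive epochs"
if all(d < 0 for d in recent) or all(d > 0 for d in recent):
    alpha = alpha * 1.08   # Heuristic 1: same sign throughout, increase alpha
elif any(d < 0 for d in recent) and any(d > 0 for d in recent):
    alpha = alpha / 1.04   # Heuristic 2: sign alternates, decrease alpha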

Here we take the help of our old friends: the squared error and the trend-line function.


Figure 2. Change in Squared error w.r.t. Learning progress

In Figure 2 we measure the slope at different points throughout the progress of learning (epochs). When learning is fast the error decreases steeply and the slope grows from one step to the next (e.g. slope m2 is higher than m1); when the decrease in error slows down, the slope shrinks in consecutive steps (e.g. m3 is smaller than m2). By watching whether the slope increases or decreases, we can change the learning rate accordingly.
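For reference, the trend-line used here is an ordinary least-squares fit. Its slope m and intercept b over points (x_i, y_i) are

$$ m = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad b = \bar{y} - m\,\bar{x} $$

which is exactly what the findTrendline function further below computes.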


So the overall process can be summarized in two steps:

1) Increase the learning rate by a certain percentage when the slope at the current point is greater than the previous one.

2) Decrease the learning rate by a certain percentage when the slope at the current point is smaller than the previous one.
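In other words, at each check point the update is simply the rule sketched below, where currentSlope and previousSlope are placeholders for the trend-line slopes at the current and previous measurement (as in Figure 2):

if currentSlope > previousSlope:
    alpha = alpha * 1.08   # error still dropping steeply: take bigger steps
else:
    alpha = alpha / 1.04   # progress slowing down or unstable: take smaller steps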

Figure 3. Illustrating the process of change in learning rate (alpha)

Along with our previous Artificial Neural Network code, we will have the following pieces:

1) Initializing all network parameters

2) findTrendline - to find the trend-line

3) changeAdaptively - to measure the increase or decrease in slope w.r.t. the previous point and change the learning rate accordingly

4) Main function - which is very similar to the simple Artificial Neural Network discussed previously


1) Initializing all network parameters: defining the XOR gate, and initializing the weights and biases. slopArray is declared as global since it will store the slope for all epochs (iterations).

import math

"""
defining XOR gate, [x1, x2, y]
"""
XOR = [[0, 1, 1], [1, 1, 0], [1, 0, 1], [0, 0, 0]]

# initialising weights
w13 = 0.5
w14 = 0.9
w23 = 0.4
w24 = 1.0
w35 = -1.2
w45 = 1.1
# initialising biases (thresholds)
t3 = 0.8
t4 = -0.1
t5 = 0.3
# defining learning rate
alpha = 0.5
# initialising squaredError
squaredError = 0
# initialising error per case
error = 0
# defining epochs (the training loop runs this many times)
Epochs = 2000
count = 0
# slopArray is global; it stores the slope for all epochs (iterations)
global slopArray
slopArray = []

2) findTrendline - to find the trend-line we use the following function:

def findTrendline(xArray, yArray):
    """
    used to find the trend-line
    :param xArray: list with all elements in X
    :param yArray: list with all elements in Y
    :return: slope m and intercept b of the trend-line
    """
    xAvg = sum(xArray) / len(xArray)
    yAvg = sum(yArray) / len(yArray)
    upperPart = 0
    lowerPart = 0
    m = 0
    # implementing the mathematics behind the trend-line
    for i in range(0, len(xArray)):
        upperPart += (xArray[i] - xAvg) * (yArray[i] - yAvg)
        lowerPart += math.pow(xArray[i] - xAvg, 2)
    m = upperPart / lowerPart
    b = yAvg - m * xAvg
    return m, b
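As a quick sanity check (not part of the original tutorial), calling findTrendline on a perfectly linear series returns the expected slope and intercept:

xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]      # y = 2x + 1
m, b = findTrendline(xs, ys)
print(m, b)               # prints: 2.0 1.0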

3) changeAdaptively - to increase or decrease the learning rate after comparing the current slope with the previous one.

def changeAdaptively(squaredErrorArray, EpochArray, alpha):
    """
    to change the learning rate (alpha) adaptively
    :param squaredErrorArray: list containing the squared error for all completed epochs, e.g. [1.2340, 0.1, 0.45, 0.85, 0.4, 0.4430, 0.3244]
    :param EpochArray: list containing all completed epochs, e.g. [1, 2, 3, 4, 5, ..., 50, 51]
    :param alpha: current learning rate
    :return: new learning rate
    """
    # find the trend-line slope for the last 10 error values
    m, b = findTrendline(squaredErrorArray[-10:], EpochArray[-10:])
    try:
        if m > slopArray[-1]:  # slopArray[-1] is the previous slope, m is the current slope
            """
            If the present slope is greater than the previous one, the error is still decreasing steadily.
            This usually happens at the beginning and middle of the learning process,
            so increase the learning rate further to reduce the error faster.
            """
            slopArray.append(m)
            newAlpha = alpha * 1.08
        else:
            """
            If the present slope is less than the previous one, it indicates instability or very little change in error.
            This usually happens near the convergence point, so decrease the learning rate.
            """
            slopArray.append(m)
            newAlpha = alpha / 1.04
        return newAlpha
    except IndexError:
        # on the first call slopArray is still empty, so slopArray[-1] raises an
        # exception which is handled here: store the slope and keep alpha unchanged
        slopArray.append(m)
        return alpha
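As a quick, hypothetical check (not part of the original tutorial, and assuming the globals from step 1 above are already loaded), you can see how the function behaves on a toy error history:

errs = [1.2, 1.0, 0.8, 0.65, 0.5, 0.4, 0.32, 0.27, 0.23, 0.2]   # illustrative values only
eps = list(range(1, 11))
a = changeAdaptively(errs, eps, 0.5)   # first call: slopArray is empty, alpha is returned unchanged (0.5)
a = changeAdaptively(errs, eps, a)     # second call: slope equals the stored one, so alpha is divided by 1.04
print(a)                               # approximately 0.48

Note the asymmetry in the factors: the learning rate is increased by 8% when the slope keeps growing, but reduced by only about 4% otherwise, so it ramps up quickly and backs off gently.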

4) Main function

def Main():
    # the network parameters from step 1 are module-level globals and are updated here
    global w13, w14, w23, w24, w35, w45, t3, t4, t5, alpha, squaredError
    EpochArray = []
    squaredErrorArray = []
    for j in range(Epochs):
        # printing squaredError and alpha after each epoch
        print("squaredError", squaredError, alpha)
        # updating the learning rate after every 10 epochs
        if j % 10 == 0 and j != 0:
            alpha = changeAdaptively(squaredErrorArray, EpochArray, alpha)
        # appending the squared error of the completed epoch to squaredErrorArray
        squaredErrorArray.append(squaredError)
        # appending the number of completed epochs to EpochArray
        EpochArray.append(j)
        squaredError = 0
        for i in range(4):  # iterating through each case of the XOR gate
            """
            calculating output at each perceptron
            """
            y3 = 1 / (1 + math.exp(-(XOR[i][0] * w13 + XOR[i][1] * w23 - t3)))
            y4 = 1 / (1 + math.exp(-(XOR[i][0] * w14 + XOR[i][1] * w24 - t4)))
            y5 = 1 / (1 + math.exp(-(y3 * w35 + y4 * w45 - t5)))
            """
            calculating error
            """
            error = XOR[i][2] - y5
            """
            calculating partial error and change in weights for the output perceptron
            """
            del5 = y5 * (1 - y5) * error
            dw35 = alpha * y3 * del5
            dw45 = alpha * y4 * del5
            dt5 = alpha * (-1) * del5
            """
            calculating partial error and change in weights for the hidden perceptrons
            """
            del3 = y3 * (1 - y3) * del5 * w35
            del4 = y4 * (1 - y4) * del5 * w45
            dw13 = alpha * XOR[i][0] * del3
            dw23 = alpha * XOR[i][1] * del3
            dt3 = alpha * (-1) * del3
            dw14 = alpha * XOR[i][0] * del4
            dw24 = alpha * XOR[i][1] * del4
            dt4 = alpha * (-1) * del4
            """
            applying weight and bias updates
            """
            w13 = w13 + dw13
            w14 = w14 + dw14
            w23 = w23 + dw23
            w24 = w24 + dw24
            w35 = w35 + dw35
            w45 = w45 + dw45
            t3 = t3 + dt3
            t4 = t4 + dt4
            t5 = t5 + dt5
            """
            Since y5 is a float between 0 and 1,
            we use 0.5 as the threshold: if the output is above 0.5 the class is 1, else 0
            """
            if y5 < 0.5:
                class_ = 0
            else:
                class_ = 1
            """
            uncomment the line below to see predicted and actual output
            """
            # print("Predicted", class_, " actual ", XOR[i][2])
            """
            accumulating squared error
            """
            squaredError = squaredError + (error * error)
        if squaredError < 0.001:
            # if the error is below 0.001, terminate training early
            break
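The original listing does not show how Main() is invoked; a standard entry point at the bottom of the script would do (a minimal sketch):

if __name__ == "__main__":
    Main()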

After running the code, the print statement at the start of each epoch in Main() prints the squared error and alpha. I compared this output with the previous ANN code that uses a constant alpha (0.5); after plotting them together I got the following graph.

Figure 4. Accelerated learning with adaptive rate

In Figure 4, the learning rate initially remains very high, up to about 350 epochs, and then starts decreasing; this gives faster convergence (blue line) compared with the ordinary method without optimization (yellow line).
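To reproduce a comparison like Figure 4, one option is the minimal plotting sketch below. It assumes matplotlib is installed and that you have logged the per-epoch squared error from both an adaptive run and a constant-alpha run (the variable names adaptiveErrors and constantErrors are placeholders, e.g. copies of squaredErrorArray from each run).

import matplotlib.pyplot as plt

# adaptiveErrors and constantErrors are hypothetical lists of per-epoch squared error
plt.plot(adaptiveErrors, label="adaptive alpha")
plt.plot(constantErrors, label="constant alpha = 0.5")
plt.xlabel("epochs")
plt.ylabel("squared error")
plt.legend()
plt.show()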

If you like this tutorial, please share it with your colleagues. Discuss doubts and request changes on GitHub. It's free; there are no charges for anything. Your responses inspire me to deliver even better content.
