
Stepwise Regression

You may refer to this literature for a mathematical explanation of the algorithm implemented below:

1) http://statweb.stanford.edu/~ckirby/lai/pubs/2009_Stepwise.pdf

All the code discussed here can be found in my GitHub repository.


For effective learning I suggest you calmly go through the explanation given below, run the same code from GitHub, and then read the mathematical explanation from the links given above.


Code compatibility : Python 2.7 Only


Often, code fails to run because package requirements change across versions. I have added a file named requirement.txt to the GitHub repository, which lists all packages, with their versions, present on my system when I executed this code. Installing from it (pip install -r requirement.txt) will help in replicating the working environment and in debugging.


To get this code running, run the main.py file as given in the GitHub repository.

 

Figure 1. A few of the many factors affecting home prices in Boston

Stepwise regression is all about selecting the attributes that help best in prediction.

We can have a large number of attributes [x1, x2, x3, ..., x100], and out of these only a few actually contribute to the outcome (Y).

How do we choose these variables, then? This can be done with stepwise regression: we separately use each attribute column to predict Y, and then select the attribute columns having the minimum squared error, as the toy snippet below illustrates.
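As a toy illustration of the selection idea, with entirely made-up error numbers: suppose fitting Y separately on each of three columns gave the squared errors below; we would keep the two columns with the smallest errors.

    # Toy illustration (made-up numbers): one squared error per candidate column
    squaredErrors = {'x1': 19.8, 'x2': 8.5, 'x3': 16.2}
    # keep the 2 columns whose single-column fit had the smallest squared error
    selected = sorted(squaredErrors, key=squaredErrors.get)[:2]
    print selected   # ['x2', 'x3']: x2 and x3 predict Y best, x1 is dropped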

For the present tutorial I have selected the house price data-set [insert hyperlink]. This data is about the factors affecting housing prices in Boston. The data-set provides values for 13 factors (attributes), and we are supposed to predict MEDV (the median value of owner-occupied homes in $1000's). The attributes are:

1. CRIM: per capita crime rate by town
2. ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
3. INDUS: proportion of non-retail business acres per town
4. CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
5. NOX: nitric oxides concentration (parts per 10 million)
6. RM: average number of rooms per dwelling
7. AGE: proportion of owner-occupied units built prior to 1940
8. DIS: weighted distances to five Boston employment centres
9. RAD: index of accessibility to radial highways
10. TAX: full-value property-tax rate per $10,000
11. PTRATIO: pupil-teacher ratio by town
12. B: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
13. LSTAT: % lower status of the population
14. MEDV: median value of owner-occupied homes in $1000's
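The repository ships its own data-loading code, which is not reproduced in this tutorial. As an assumed sketch, if the data-set were stored as a plain CSV file (hypothetical name housing.csv), it could be read like this:

    import csv

    def loadCsv(filename):
        """Read a CSV file and convert every field to float (hypothetical loader)."""
        dataset = []
        with open(filename, 'rb') as f:   # 'rb' mode for the Python 2 csv module
            for row in csv.reader(f):
                if row:                   # skip blank lines
                    dataset.append([float(value) for value in row])
        return dataset

    dataset = loadCsv('housing.csv')      # hypothetical file name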

Workflow

Our job is to find which of the 13 attributes given above represent the data, i.e. affect the house price, the most.

Here is the workflow of the entire implementation.


Figure 2. Stepwise Regression implementation workflow

As shown in the figure, we will mainly go through the following steps:

1) Take a data-set; we already took one :)

2) Apply min-max normalization. For more information regarding min-max normalization, refer to this tutorial; a minimal sketch is also given right below this list.

3) Separate X and Y from the data-set with the getXandY function shown after the sketch.
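Min-max normalization rescales every column to the range [0, 1] via x' = (x - min) / (max - min). Here is a minimal sketch, not the repository's exact code, assuming no column is constant:

    def minMaxNormalize(dataset):
        """Rescale every column of a 2D list to [0, 1], in place.
        Minimal sketch; assumes every column has max > min."""
        for col in range(len(dataset[0])):
            values = [row[col] for row in dataset]
            minVal, maxVal = min(values), max(values)
            for row in dataset:
                row[col] = (row[col] - minVal) / float(maxVal - minVal)
        return dataset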

    def getXandY(self, dataset):
        """
        Separate the predictors and the predicate from the given dataset.
        :param dataset: 2D array
        :return: predictors X and predicate y
        """
        x = []
        y = []
        for row in dataset:
            x.append(row[:-1])  # all X
            y.append(row[-1])   # the last column is Y
        return x, y

4) The stepwise_Regression function will pass each column (attribute) to the giveErrorForColumn function and collect the error for each column separately. It takes the following parameters:

X : the attribute columns
Y : the predicate
learningRate (η) : a value greater than 0 and less than or equal to 1 [0 < η ≤ 1]
numberOfEpoches : the number of times the same data is given to the machine learning algorithm so that it can learn
maxPredictor : the number of most relevant columns to be selected; it should be more than 0 and no more than the number of attributes
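Note that the code below also calls loadDataInstance.getSpecificColumns, a helper from the data-loading class that is not shown in this tutorial. Judging purely from how it is called, a minimal sketch of it might look like this:

    def getSpecificColumns(self, dataset, columnNos):
        """
        Return a 2D array holding only the requested columns of dataset.
        (Sketch inferred from the call site; the repository version may differ.)
        """
        return [[row[c] for c in columnNos] for row in dataset]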

    def stepwise_Regression(self, X, Y, learningRate, numberOfEpoches, maxPredictor):
        """
        :param X: 2D array, e.g. [[-0.04315767605226367, 1.165157142272148, -0.45268840110500463, ...], [...], ...]
        :param Y: 1D array of predicate values, e.g. [0.45, 0.67, 0.46, 0.4432]
        :param learningRate: float between 0 and 1
        :param numberOfEpoches: any positive integer
        :param maxPredictor: number of columns to be selected for the final run, i.e. those correlating best with the predicate
        :return: acceptedColsNo, coefficientValue
        e.g. (7.5460202999842885, [-0.04315767605226367, 1.165157142272148, -0.45268840110500463, -0.43397502762968604, -0.7226568412142056])
        """
        # For each column in the training dataset we will have one coefficient;
        # if the training dataset has 5 columns per row, the coefficient array
        # will look like this: [0.0, 0.0, 0.0, 0.0, 0.0]
        allSquaredError = []
        coefficientValue = []
        for columnNo in range(0, len(X[0])):
            # getting the squared error for each column
            XColumn = loadDataInstance.getSpecificColumns(X, [columnNo])  # data for one column
            squaredError, coefficient = self.giveErrorForColumn(XColumn, Y, learningRate, numberOfEpoches)  # estimating the error for this column w.r.t. the output column
            allSquaredError.append(squaredError)  # storing the squared error for each column
            coefficientValue.append(coefficient)  # storing the coefficients for each column
        print "Squared Error : ", allSquaredError
        # keeping the maxPredictor columns with the lowest error
        acceptedColsNo = self.accepetedColumns(allSquaredError, maxPredictor)
        # returning the selected column numbers and the per-column coefficients
        return acceptedColsNo, coefficientValue
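main.py in the repository wires all of these pieces together. A rough sketch of such a driver is given below; the class names LoadData and StepwiseRegression and the hyper-parameter values are illustrative assumptions, not the repository's exact code.

    # Hypothetical driver; see main.py in the repository for the real wiring.
    loadDataInstance = LoadData()                  # assumed data-loading class
    regressionInstance = StepwiseRegression()      # assumed class holding the methods shown above

    dataset = loadCsv('housing.csv')               # hypothetical loader sketched earlier
    dataset = minMaxNormalize(dataset)             # step 2: min-max normalization
    X, Y = regressionInstance.getXandY(dataset)    # step 3: split predictors and predicate
    acceptedColsNo, coefficientValue = regressionInstance.stepwise_Regression(
        X, Y, learningRate=0.01, numberOfEpoches=100, maxPredictor=5)  # illustrative hyper-parameters
    print "Selected columns : ", acceptedColsNo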

5) giveErrorForColumn - This is the same stochastic gradient descent function we saw in the earlier tutorial. We use it here to minimise the error and return the squared error. It takes the following attributes:

Xcolumn : a single attribute column
Y : the predicate
learningRate (η) : a value greater than 0 and less than or equal to 1 [0 < η ≤ 1]
numberOfEpoches : the number of times the same data is given to the machine learning algorithm so that it can learn

    def giveErrorForColumn(self, Xcolumn, Y, learningRate, numberOfEpoches):
        """
        :param Xcolumn: 2D array holding the data of a single attribute column
        :param Y: 1D array of predicate values
        :param learningRate: float between 0 and 1
        :param numberOfEpoches: any positive integer
        :return: squaredError, coefficient
        """
        coefficient = [0.1 for i in range(len(Xcolumn[0]) + 1)]  # initializing the coefficients
        for epoch in range(numberOfEpoches):
            # repeat these operations for each epoch
            squaredError = 0
            for rowNo in range(len(Xcolumn)):
                # For each row, calculate the following. A row such as [3.55565]
                # holds only the value of the selected column; the actual value
                # (Yactual) comes from the separate Y array.
                Ypredicted = self.predict(Xcolumn[rowNo], coefficient)  # sending the row and coefficients for prediction
                # Y[rowNo] is Yactual; Yactual - Ypredicted gives the error
                error = Y[rowNo] - Ypredicted
                # updating the squared error on each iteration
                squaredError += error ** 2
                # In order to learn, we should learn from our error; here we use
                # stochastic gradient descent as the optimization function.
                # The stochastic gradient update for each coefficient [b0, b1, b2, ...] can be formalized as
                #   coef[i+1] = coef[i+1] + learningRate * error * Ypredicted * (1.0 - Ypredicted) * X[i]
                # For a row of elements [x1, x2, x3, x4, x5] with coefficients [b0, b1, b2, b3, b4, b5],
                # each coefficient belongs to one element of the row, e.g. b1 for x1, b2 for x2 and so on.
                # As b0 (coefficient[0]) is independent of the row elements, we update it separately.
                coefficient[0] = coefficient[0] + learningRate * error * Ypredicted * (1.0 - Ypredicted)  # updating b0
                # moving through the entire row
                for i in range(len(Xcolumn[rowNo])):
                    # updating bn, where n runs from 1 to the number of attributes (N)
                    coefficient[i + 1] = coefficient[i + 1] + learningRate * error * Ypredicted * (1.0 - Ypredicted) * Xcolumn[rowNo][i]
            # print (">>> Epoch : ", epoch, " | Error : ", squaredError)  # uncomment to check that the error is really decreasing
        return squaredError, coefficient
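To see the update rule in action, here is one hand-checked iteration with made-up numbers (a single attribute x = 0.5, target y = 1.0, coefficients [0.1, 0.1] and η = 0.3):

    from math import exp

    # One hand-checked SGD update (made-up numbers, not data from the tutorial)
    x, y = 0.5, 1.0        # single attribute value and its target
    b0, b1 = 0.1, 0.1      # current coefficients
    eta = 0.3              # learning rate

    Ypredicted = 1.0 / (1.0 + exp(-(b0 + b1 * x)))   # sigmoid(0.15) ~ 0.5374
    error = y - Ypredicted                            # ~ 0.4626
    b0 = b0 + eta * error * Ypredicted * (1.0 - Ypredicted)        # ~ 0.1345
    b1 = b1 + eta * error * Ypredicted * (1.0 - Ypredicted) * x    # ~ 0.1172
    print b0, b1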

6) predict - The giveErrorForColumn function calls the predict function with the following parameters:

Xrow : the given row of values from the column
coefficients : [B0 ... Bn], where n runs from 1 to the number of attributes (N)

    # Note: predict uses exp, so the file needs "from math import exp" at the top.
    def predict(self, Xrow, coefficients):
        """
        Make a prediction from the given row and coefficients.
        :param Xrow: one row of predictor values, e.g. [3.55565, 4.65656, 5.454654]
                     (Y has already been stripped off by getXandY)
        :param coefficients: e.g. [0.155, -0.2555, 0.5456], the current coefficient values
        :return: Ypredicted
        The coefficients are the real thing we obtain from training; they can be
        thought of as the memory from learning and applied for further predictions.
        """
        Ypredicted = coefficients[0]
        for i in range(len(Xrow)):
            Ypredicted += float(Xrow[i]) * float(coefficients[i + 1])
        return 1.0 / (1.0 + exp(-Ypredicted))  # squashing through the sigmoid
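As a quick made-up usage example, here is what predict([0.5], [0.1, 0.2]) computes, written out standalone:

    from math import exp

    Xrow = [0.5]               # made-up single-attribute row
    coefficients = [0.1, 0.2]  # [b0, b1]
    Ypredicted = coefficients[0] + float(Xrow[0]) * float(coefficients[1])  # linear part = 0.2
    print 1.0 / (1.0 + exp(-Ypredicted))  # sigmoid(0.2) ~ 0.549834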

7) accepetedColumns – This takes the squared errors for all the columns and returns the column numbers with the minimum error. It takes the following parameters:

squaredError – the squared errors for all attributes, e.g.

[19.870694243392535, 18.49064887825751, 16.173082492771197, 20.453729207679526, 17.349173281094615, 15.891363738470233, 18.141040302112437, 19.951988433235435, 18.044399089376622, 16.43915654444947, 15.898591777034301, 19.20710063941564, 10.42727645916291]

maxPredictor – the maximum number of minimum-squared-error columns to be returned

    def accepetedColumns(self, squaredError, maxPredictor):
        """
        Get the columns with the least squared error.
        :param squaredError: an array of squared-error values, e.g.
            [12.399790911951232, 11.14467573048574, 10.99939946366844,
             12.313118044897763, 11.812905343161896, 8.530664160073936,
             11.642709319002446, 12.377547637064676, 12.376152652375172,
             11.935468510009718, 10.221164713630898, 12.258299424118913,
             8.627925610329616]
        :param maxPredictor: an integer indicating how many columns to select, e.g. 4
        :return: the selected column numbers as a list, e.g. [5, 12, 10, 2]
        """
        errors = list(squaredError)  # work on a copy so the original column indices stay valid
        acceptedColsNo = []
        for i in range(maxPredictor):
            minValue = min(errors)
            minIndex = errors.index(minValue)
            acceptedColsNo.append(minIndex)
            errors[minIndex] = float('inf')  # mark as used instead of removing, so later indices do not shift
        return acceptedColsNo
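For a quick sanity check, the selection performed on the docstring's example input is equivalent to taking the indices of the four smallest errors; this standalone snippet reproduces the expected return value:

    errors = [12.399790911951232, 11.14467573048574, 10.99939946366844,
              12.313118044897763, 11.812905343161896, 8.530664160073936,
              11.642709319002446, 12.377547637064676, 12.376152652375172,
              11.935468510009718, 10.221164713630898, 12.258299424118913,
              8.627925610329616]
    print sorted(range(len(errors)), key=lambda i: errors[i])[:4]   # [5, 12, 10, 2]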

After running the main.py file you will get the following output: [12, 5, 10, 2, 9], the five columns with the lowest squared errors in the list above. I selected the data for these columns and plotted each against the median house price, which gives the plot below.


Figure 3. Columns selected by Stepwise Regression


Stepwise regression works very well for selecting the features (attributes) that affect our class the most. The one drawback of this method is that we cannot use it for categorical variables.

There are other kinds of stepwise regression in which the nature of the activation function is changed, and whichever activation function gives the best result is accepted. We will see an example of this in the future.

In the future we will also look at more methods, like Principal Component Analysis, that are used for reducing the feature space.

If you like this tutorial, please share it with your colleagues. Discuss doubts and ask for changes on GitHub; it's free, with no charges for anything. Your responses inspire me to deliver even better material.
