top of page

Multivariate Adaptive Regression Splines (MARS), PART - 2

You may refer this literature for mathematical explanation of below implemented algorithm 1) http://www.lans.ece.utexas.edu/courses/ee380l_ese/2013/mars.pdf

2) https://www.assct.com.au/media/pdfs/Ag%2022%20Everingham%20and%20Sexton.pdf


All codes discussed here can be found at my Github repository

For effective learning I suggest, you to calmly go through the explanation given below, run the same code from Github and then read mathematical explanation from above given links. Code compatibility : Python 2.7 Only Often it is found that due to changing requirement with change in package versions code fails to run. I have added file namedrequirement.txt file to GitHub, which shows all packages with version present in my system when I executed this code. This will help in replicating working environment and debugging. To get this code running execute swidish.py file as given in GitHub repository.

 
applying MARS to Swedish Fertility data

Figure 1. Showing year-wise Swedish fertility.

Here we will start with Part - 2 of Multivariate Adaptive Regression Splines (MARS). In this section we will apply previous code to a real data. we will see shortcomings of previous code and will improve it further.

applying MARS to Swedish Fertility data

Figure 2. All Separated fragments in Swedish fertility Data

In swidish.py I have written 5 functions with following functionality

  1. loadFromcsv - load data from csv file , Here "SwidishFertility" file to be loaded in string form

  2. convertDataToFloat - converting all string value to float

  3. findTrendline - will find trendline - simillar to as discussed in MARS PART - 1

  4. findBreakPoints - will find break points - simillar to as discussed in MARS PART - 1

Now we will enhance functionality of find breakpoint function so that instead of giving individual point as a fragment it will give group of related points

Now we will implement forward run part of MARS, basically to combine two related points we can have two approaches

1) Join individual points - Forward Run'

2) Controlled breaking of larger series in to smaller once - Backward Run

Now we are going to implement forward run algorithm - Refer forwardRun.py Now onwards

I have modified findBreakPoints function a little and named it as findBreakPointsAdvance.

I have changed it a little in "if condition". Earlier we used to mark any point as a break point where there is change in direction of difference. Now in "If" we have attached one more condition, if precentage difference between a point and surrounding two points is greater than 150% or lesser than 50% , then and then breakpoint will be considered.

def findBreakPointsAdvance(difference): """ find breakpoints for allocation of different slop and b0 to each fragment :param difference: is the array which shows what is the difference between current point and the same point on trend-line :return: subFragments, also referred as breakpoints """ subFragments = []

lastPoint = 0 interMediatePoints = [] for i in range(1, len(difference) - 1): interMediatePoints.append(i)

# percentageDifferenceBetweenThreePoints - it measures, what is percentage difference between i and i-1 and i+1 point percentageDifferenceBetweenThreePoints = difference[i] * 10 / (difference[i - 1] + difference[i + 1]) / 2

# application of some more conditions in below given function will avoid formation of segments containing only one point if (difference[i - 1] > difference[i] < difference[i + 1]) and percentageDifferenceBetweenThreePoints < 50: # if i lower than i-1 and i+1 it will be defined as breakpoint # with additional condition the difference must be lesser than 50% # this will be a crest in local region \/ lastPoint = i subFragments.append(interMediatePoints) interMediatePoints = [] if (difference[i - 1] < difference[i] > difference[i + 1]) and percentageDifferenceBetweenThreePoints > 150: # if i higher than i-1 and i+1 it will be defined as breakpoint # with additional condition the difference must be lesser than 150% # this will be a trough in local region /\ lastPoint = i subFragments.append(interMediatePoints) interMediatePoints = [] # processing last fragment subFragments.append([j for j in range(lastPoint + 1, len(difference) + 1)]) return subFragments

Weather or not a particular nearby fragments mearges or not also depends on the slop. I will explain to you how. there can be two type of slops 1) Positive slop; 2) Negative slop

Figure 3. Type of slop and their significance

Coefficient represents point on y axis where trend-line is meeting y axis (shown by Intercept in Fig. 3 and slop represents direction of trend-line. To combine two nearby fragment we need to consider these two factors. If two fragment's slop and coefficient lies in definite range then only we can combine them.

findBreakPointsAdvance = takes care of boundaries related to slop

fragmentCombiner - will take care of boundaries related to coefficient.

def fragmentCombiner(breakPoints, slopAndCoefficientArray): """ combine smaller fragments to bigger one based on regression coefficient of nearby fragments

:param breakPoints: all breakpoints [[1,2,3],[4,5,6,7],...[..][..]] :param slopAndCoefficientArray: slop and regression coefficient for all fragments e.g.[[.06,6.4],[.76,3.4]] where .06 is the slop & 6.4 is the regression coefficient :return: """ newForwardArray = [] # to store fragments after merger of one or many fragments for i in range(0, len(breakPoints) - 2, 2):

slopAndCoefficientCurrent = slopAndCoefficientArray[i] # coefficient of current fragment slopAndCoefficientNext = slopAndCoefficientArray[i + 1] # coefficient of Next fragment

# percentage difference between coefficient of current and next fragment percentageDiff = slopAndCoefficientNext[1] * 100 / slopAndCoefficientCurrent[1]

""" if percentage difference between coefficient is between 20 then combine. """ if (percentageDiff < 20 or percentageDiff > -20) : temp1 = [] # if this condition is satisfied then two fragments will be merged in to one temp1.extend(breakPoints[i]) temp1.extend(breakPoints[i + 1]) # new array with less number of fragments newForwardArray.append(temp1)

return newForwardArray

You may happily run the below given function in file forwardRun.py

def withForwardRun(): """ This function implements Multivariate adaptive regression splines with forward run In short all fragment wgich can be combined with out significant loss in accuracy can will be combined as bigger fragmnets :return: """ # loading data-set # https://datamarket.com/data/set/22s2/annual-swedish-fertility-rates-1000s-1750-1849-thomas-1940#!ds=22s2&display=line # Annual Swedish fertility rates (1000's) 1750-1849 Thomas (1940)

# loading dataset array = loadFromcsv("SwidishFertility") # loded dataset in previous step is in string form, converting to float array = convertDataToFloat(array) xArray = [] yArray = []

# Seperating data in to x and y for eachXYPAir in array: x = eachXYPAir[0] y = eachXYPAir[1] xArray.append(x) yArray.append(y)

# getting trend line for the entire data m, b = findTrendline(xArray, yArray)

difference = [] # will store difference between actual value of y and Ytrend (y derieved from trendline) yArray = []

for eachXYPAir in array: x = eachXYPAir[0] y = eachXYPAir[1] Ytrend = m * x + b # finding derived value of Y called as Ytrend (y derieved from trendline) from so found m and b in previous code block yArray.append(y) difference.append(abs(y - Ytrend)) """ Breakpoints here are the point which separates group of data having similar slop, locally called fragments """ breakPoints = findBreakPointsAdvance(difference) # print breakPoints

# below given code fragment will give slop and coefficient, for each fragment. slopAndCoefficientArray = [] # m = slop and b = regression coefficient for eachFragment in breakPoints: # print eachFragment, xFragment = xArray[eachFragment[0] - 1:eachFragment[-1]] yFragment = yArray[eachFragment[0] - 1:eachFragment[-1]] m, b = findTrendline(xFragment, yFragment) slopAndCoefficientArray.append([m, b])

# print slopAndCoefficientArray # having m = slop and b = regression coefficient for each fragment

# differenceArray = fragmentCombiner(breakPoints, slopAndCoefficientArray) # print differenceArray

# Fragment combiner is a function that combines relavant fragment on preset criteria # Will explore the function in detail soon # intially the fragment number was 35, now decreased to 17, after application of fragmentCombiner function. breakPoints = fragmentCombiner(breakPoints, slopAndCoefficientArray) # print breakPoints

for eachFragment in breakPoints: # print eachFragment,

# Getting x and y for the given Fragment range xFragment = xArray[eachFragment[0] - 1:eachFragment[-1]] yFragment = yArray[eachFragment[0] - 1:eachFragment[-1]]

# finding trendline for the given Fragment range m, b = findTrendline(xFragment, yFragment) # print m,b

# Getting x and y for the given Fragment range xValueForFragment = [xArray[j - 1] for j in range(eachFragment[0], eachFragment[-1] + 1)] yValueForFragment = [yArray[j - 1] for j in range(eachFragment[0], eachFragment[-1] + 1)] yDerived = [xArray[j - 1] * m + b for j in range(eachFragment[0], eachFragment[-1] + 1)]

# printing value of y and yTrend for printing for i in range(0, len(xValueForFragment)): print yValueForFragment[i], yDerived[i] print ""

After application of all these function to our data new trend-lines looks something like this:

applying MARS to Swedish Fertility data

Figure 4. Finally, all relevant fragment got merged; better result in comparison to Figure 2.

We can see that all related fragments got merged . you may also implement backward run similarly.

For n fragments we can have n number of slopes and n number of regression coefficient overall equation predict with MARS can be represented by γ =βn + MnΧ where n is any number from 1 to number of total fragments (n)

Now if I want to predict for any point in graph, local coefficient and slop has to be taken in equation.

Hope you understood the concept well. If not ask your queries. get your hands dirty run the code and debug.

If you like this tutorial please share with your colleague. Discuss doubts, ask for changes on GitHub. It's free, No charges for anything. Let me to get inspired from your responses and deliver even better.

Never Miss My Post!
bottom of page