Text Generation with LSTM: Trump's Speech Generation

The output produced in the previous blog post seems to be repetitive: words or short phrases are repeated over and over. Although it is very tough to pin down the reason behind such behavior, I can guess two things.
1) The training data was too small to produce reasonable output.
2) Max length was kept at 100, which defines the visual field used to predict the 101st character (see the sketch below). If this parameter is set longer, the model gets a longer visual field and hence less repetition.
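For context, here is a minimal sketch of what that visual field looks like when the training windows are built; text and maxlen are illustrative names, not necessarily the ones used in the previous post's code.

text = "make america great again. " * 20        # toy corpus, purely for illustration
maxlen = 100                                     # the "visual field" discussed above

sentences, next_chars = [], []
for i in range(0, len(text) - maxlen):
    sentences.append(text[i:i + maxlen])         # input: a window of maxlen characters
    next_chars.append(text[i + maxlen])          # target: the 101st character

print(len(sentences[0]), repr(next_chars[0]))    # 100, and the character that follows the window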
Amid these challenges, we have a somewhat short-sighted way out known as the "multinomial distribution". I found this good short definition of the multinomial distribution on the NumPy documentation page: "The multinomial distribution is a multivariate generalization of the binomial distribution. Take an experiment with one of p possible outcomes. An example of such an experiment is throwing a dice, where the outcome can be 1 through 6." There are two ways we usually want to use samples. The first is just to generate a random value to be used later: for example, randomly drawing cards in a computer game of poker. The second way that samples are used is for estimation. For example, if you suspected that your friend was playing with loaded dice, you might want to roll the dice many times to see if some numbers came up more often than you would expect. Similarly, in our case we have a softmax probability distribution over characters, and we want to draw the next character from it according to those probabilities. Throw a dice 20 times:
>>> np.random.multinomial(20, [1/6.]*6, size=1)
array([[4, 1, 7, 5, 2, 1]])

It landed 4 times on 1, once on 2, etc.
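As a small illustration of the estimation use mentioned above, here is a sketch, with a made-up bias, of rolling a loaded die many times and comparing the observed frequencies with the fair 1/6:

import numpy as np

loaded = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]        # hypothetical die, heavily biased towards 6

counts = np.random.multinomial(6000, loaded)   # 6000 rolls summarized in one call
print(counts)                                  # e.g. [ 596  612  581  607  589 3015]
print(counts / 6000.0)                         # observed frequencies, far from the fair 1/6 = 0.167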
To let multinomial sampling make our text generation better, we need to add one more function to our previous Trump speech generation code. The sample function is used to sample an index from a probability array. For example, given preds=[0.5, 0.2, 0.3] and the default temperature, the function returns index 0 with probability 0.5, index 1 with probability 0.2, and index 2 with probability 0.3. It is used to avoid generating the same sentence over and over again.
import numpy as np

def sample(softmax_probability, temperature=1.0):
    # helper function to sample an index from a probability array
    softmax_probability = np.asarray(softmax_probability).astype('float64')
    softmax_probability = np.log(softmax_probability) / temperature
    exp_preds = np.exp(softmax_probability)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
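A quick sanity check of sample, using the toy probabilities from the example above (this is just a demonstration, not part of the original script):

preds = [0.5, 0.2, 0.3]
draws = [sample(preds) for _ in range(10000)]   # draw many indices at temperature 1.0
print(np.bincount(draws) / 10000.0)             # roughly [0.5, 0.2, 0.3]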
This function "sample" takes probabilities output of soft max function and outputs index of the character to which is most probable. Lets us understand the whole function deeply and clearly line by line.
1) softmax_probability = np.log(softmax_probability) / temperature

To motivate this, consider that probabilities must range between 0 and 1 (inclusive). NumPy has a useful function, finfo, that will tell us the limits of floating point values for our system. For example, on a 64-bit machine, we see that the smallest usable positive number (given by tiny) is:
>>> import numpy as np
>>> np.finfo(float).tiny
2.2250738585072014e-308
While that may seem very small, it is not unusual to encounter probabilities of this magnitude, or even smaller. Moreover, it is a common operation to multiply probabilities, yet if we try to do this with very small probabilities, we encounter underflow problems:
>>> tiny = np.finfo(float).tiny
>>> # if we multiply numbers that are too small, we lose all precision
>>> tiny * tiny
0.0
However, taking the log can help alleviate this issue because we can represent a much wider range of numbers with logarithms than we can normally. Officially, log values range from −∞ to zero. In practice, they range from the min value returned by finfo (the most negative number that can be represented) to zero, and that min value is far more negative than the log of tiny.
>>> # this is our lower bound normally
>>> np.log(tiny)
-708.39641853226408
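For example, a product of two tiny probabilities that underflows to zero is still perfectly representable as a sum of logs:

>>> tiny * tiny                     # underflows to zero
0.0
>>> np.log(tiny) + np.log(tiny)     # the same product, computed in log space
-1416.7928370645282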
Regarding the division by temperature, here is the explanation: decreasing the temperature from 1 to some lower number (e.g. 0.5) makes the RNN more confident, but also more conservative in its samples (more repetitive predictions). Conversely, higher temperatures give more diversity, but at the cost of more mistakes (e.g. spelling mistakes, initial incoherence).
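Here is a minimal sketch of how the division by temperature reshapes a toy distribution; the reweight helper below is illustrative, built from the same three lines used inside sample:

import numpy as np

def reweight(preds, temperature):
    # same math as in sample(): log, divide by temperature, re-exponentiate, renormalize
    preds = np.log(np.asarray(preds, dtype='float64')) / temperature
    preds = np.exp(preds)
    return preds / np.sum(preds)

p = [0.5, 0.2, 0.3]
print(reweight(p, 0.5))   # sharper: roughly [0.66, 0.11, 0.24] -> more confident, more repetitive
print(reweight(p, 1.0))   # unchanged: [0.5, 0.2, 0.3]
print(reweight(p, 2.0))   # flatter: roughly [0.42, 0.26, 0.32] -> more diverse, more mistakes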
2) exp_preds = np.exp(softmax_probability)

The exponential function amplifies the given probability distribution; it creates a marked discrimination between two values that would otherwise be very close.
3) preds = exp_preds / np.sum(exp_preds)
This step is also called the scaling (normalization) step and has two purposes. 1) It rescales all predictions back into the range 0 to 1. 2) It makes sure that the sum of all predictions is 1, because if the sum is less than 1, then by convention np.random.multinomial adds the remaining amount to the last prediction, which would distort the predictions; and if the total is more than 1, np.random.multinomial throws an error.
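A quick check of this normalization with toy numbers:

>>> exp_preds = np.array([2.0, 0.5, 1.5])
>>> preds = exp_preds / np.sum(exp_preds)
>>> preds
array([0.5  , 0.125, 0.375])
>>> preds.sum()
1.0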
4) Lastly, np.argmax

np.random.multinomial(1, preds, 1) draws a single sample and returns a one-hot vector with a 1 at the sampled index; argmax simply recovers that index. Such controlled randomization prevents the network from repeating certain phrases over and over. To see the effect of our sample function, I have applied it to the outputs below, where predictions are made from the same model with and without the multinomial distribution function.
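Before looking at those outputs, here is a quick concrete check of the last two lines of sample (toy values again):

>>> preds = np.array([0.5, 0.2, 0.3])
>>> draw = np.random.multinomial(1, preds, 1)
>>> draw                     # a one-hot vector; the single 1 marks the sampled index
array([[1, 0, 0]])
>>> np.argmax(draw)          # recovers that index; here 0, drawn with probability 0.5
0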
Epoch 1:
WITH MULTINOMIAL DISTRIBUTION, SEED TEXT: When Mexico sends its people, they’re not sending the best. They’re not sending you, they’re s

follow up with: When Mexico sends its people, they’re not sending the best. They’re not sending you, they’re s you know he’s not ardean. in is is agezant. you want to brings, bean, i oa tillion menoly the dongentbid sare os sead, onr fonally conise the mayer. thes’ve and i’m ane, they wwthen very fonly dosrome wellier it bebleond ono he'rs doinght absucouns wers ic ald you, i thank i wouth da, – af cheragovering everatievsing all by afprene foll remimaruricll. we want thel of aty sucl tell wist thm ouysufite shat’sbon i cempobtht otr enach ssowe aid doun thay wi’re bodidn. and chaca arf amee

WITHOUT MULTINOMIAL DISTRIBUTION, SEED TEXT: When Mexico sends its people, they’re not sending the best. They’re not sending you, they’re s

follow up with: When Mexico sends its people, they’re not sending the best. They’re not sending you, they’re s i want to be a lot of the want the people and they want to be a lot of the want the people and they want to be a lot of the want the people and they want to be a lot of the want the people and they want to be a lot of the want the people and they want to be a lot of the want the people and they want to be a lot of the want the people and they want to be a lot of the want the people and they want to be a lot of the want the people and they want to be a lot of the want the people and they want to
Epoch 2:
WITH MULTINOMIAL DISTRIBUTION, SEED TEXT: When Mexico sends its people, they’re not sending the best. They’re not sending you, they’re s

follow up with: When Mexico sends its people, they’re not sending the best. They’re not sending you, they’re s we care for and people – i’m ond roff. inded we have — they ever incods. we ringlo maks isve houring. they said i just down the canuan awsy whore and if we’re going on the olkers someno has co curabee trane. wi’ll centrince, these, bun tremp'd me, butly. so pood. nebysy, that is bouncs trangm. they and just about thes. lemploment of govera-sex sumet gothad thoses me becouse dofen a -bast there my, many frever whing dompelse deat . the ward on, nixe one things are saych. we’re going

WITHOUT MULTINOMIAL DISTRIBUTION, SEED TEXT: When Mexico sends its people, they’re not sending the best. They’re not sending you, they’re s

follow up with: When Mexico sends its people, they’re not sending the best. They’re not sending you, they’re s they don’t know what they don’t know what they don’t know what they don’t know what they don’t know what they don’t know what they don’t know what they don’t know what they don’t know what they don’t know what they don’t know what they don’t know what they don’t know what they don’t know what they don’t know what they don’t know what they don’t know what they don’t know what they don’t know what they don’t know what they don’t know what they don’t kno
Epoch 3:
WITH MULTINOMIAL DISTRIBUTION, SEED TEXT: When Mexico sends its people, they’re not sending the best. They’re not sending you, they’re s

follow up with: When Mexico sends its people, they’re not sending the best. They’re not sending you, they’re s they’ol even went there" that’s chonies and we vote firs "i bad, aal utyers, me because you want to do.
a degoty. i’m obliaded out. but the teres, lapally going to susp correace that dow’ve never ever faitly bake it just about that. everyo merest of a read it have the enperses people and she speok. s, i don’t then this storget milet, but they because i’m filling why phin did im wine the mome act our expantty who gefinie. he medulatere intelesteve is now thiswidion homes lake loven

WITHOUT MULTINOMIAL DISTRIBUTION, SEED TEXT: When Mexico sends its people, they’re not sending the best. They’re not sending you, they’re s

follow up with: When Mexico sends its people, they’re not sending the best. They’re not sending you, they’re s i don’t want to be a lot of the people and the people are going to be a lot of the people and the people are going to be a lot of the people and the people are going to be a lot of the people and the people are going to be a lot of the people and the people are going to be a lot of the people and the people are going to be a lot of the people and the people are going to be a lot of the people and the people are going to be a lot of the people and the people are going to be a lot of the people
Two things are very clear after looking at the above output.
1) Applying the multinomial distribution prevents repetitive text generation.
2) Applying the multinomial distribution generates text with more frequent spelling mistakes compared to text generated without it.
I have given the output for only a few epochs; you may train for more epochs to see the improvement achieved by using multinomial sampling.
This explanation contains some content from the sources given below.
http://www.aosabook.org/en/500L/a-rejection-sampler.html
https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.multinomial.html
https://en.wikipedia.org/wiki/Multinomial_distribution
http://karpathy.github.io/2015/05/21/rnn-effectiveness/