
Introduction to Deep Neural Networks

Artificial neural networks were invented in 1943, but for a long time they did not become popular for application in wider areas. A few reasons behind the failure of ANNs are summarized below:

1) Training issues

A machine needs sufficient representative examples in order to capture the underlying structure that allows it to generalize to new cases. In the nineties, even though the information technology world was flourishing, there was a lack of sufficient data for training. Over the following years, major pioneering institutes took up the task of gathering data; ImageNet, WordNet and DBpedia are results of this effort. Major companies like Facebook, Google and Microsoft started crowd-sourcing data from users by providing free image (Google Images, OneDrive), video (YouTube) and text (Messenger, Allo, Gmail, Hotmail) services. With the penetration of mobile devices into daily life, data kept increasing day by day, and we reached a point where data was sufficient. With this data, algorithms could finally be trained to achieve a certain level of generalization.

2) Theoretical issue

A neural network can have more than one hidden layer; each additional hidden layer "builds" new abstractions on top of the previous layers. And, as we mentioned before, you can often learn better in practice with larger networks.
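As a concrete illustration (a minimal sketch, assuming the same Keras Sequential API used in the VGG16 snippet later in this post; the layer sizes are illustrative only), extra hidden layers are simply stacked, each building on the features produced by the layer before it:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(128, activation='relu', input_dim=784))  # first hidden layer: works on raw inputs
model.add(Dense(64, activation='relu'))                  # builds abstractions on layer 1's output
model.add(Dense(32, activation='relu'))                  # builds abstractions on layer 2's output
model.add(Dense(10, activation='softmax'))               # output layer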

However, increasing the number of hidden layers leads to two known issues:


A) Vanishing gradients: As more layers are added, back propagation fails to communicate errors from the output layer towards the input layer. As we have seen in the artificial neural network implementation, we calculate an error gradient at each perceptron in each layer; these error gradients become progressively smaller as we back-propagate through multiple hidden layers. Once the error gradient vanishes we no longer have effective gradients, updates to the weights cannot be made and, in turn, learning cannot take place with multiple hidden layers.
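A tiny numpy sketch of this effect (my own illustration with random weights, not code from the post): by the chain rule the gradient is multiplied by w * sigmoid'(z) at every layer, and since sigmoid'(z) <= 0.25 the product shrinks rapidly as it travels back through many layers.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

np.random.seed(0)
grad = 1.0                       # error gradient at the output layer
for layer in range(10):
    z = np.random.randn()        # pre-activation of a perceptron in this layer
    w = np.random.randn() * 0.5  # weight on the backward path
    grad *= w * sigmoid(z) * (1 - sigmoid(z))   # chain rule at this layer
    print("layer %2d  |gradient| = %.2e" % (layer + 1, abs(grad)))
# after ~10 layers the gradient is vanishingly small, so the weight updates
# near the input layer become negligible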


Figure 1. Illustration of the vanishing error gradient (δ) while back-propagating from the output layer towards the input layer.

B) Over-fitting: When a small data-set is repeatedly exposed to a network, the machine starts over-fitting to it. That is, the machine performs better on the same data but performs poorly on new data. It can be compared to the learning algorithm cramming the task, so that it does not generalize well to newer cases.

Figure 2. The green line represents an over-fitted model and the black line represents a regularized model. While the green line best follows the training data, it is too dependent on it and is likely to have a higher error rate on new, unseen data compared to the black line. (Source: Wikipedia)
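A small, self-contained sketch of the same idea (analogous to Figure 2, not the author's code): a high-degree polynomial follows a handful of noisy training points almost perfectly, yet performs much worse on unseen data than a simple low-degree fit.

import numpy as np

np.random.seed(1)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + 0.2 * np.random.randn(10)   # noisy training points
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)                                 # the true underlying curve

for degree in (3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print("degree %d: train MSE = %.4f, test MSE = %.4f" % (degree, train_err, test_err))
# the degree-9 fit has near-zero training error but a much higher test error:
# it has "crammed" the training points instead of generalizing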


3) Hardware issues

Artificial neural networks are costly in terms of computational requirements. I will give you one practical example:

I was working with the convolutional network given below, which is a sub-part of the VGG16 network. VGG16 is a huge network with 30+ stacked layers, and the full network has about 138 million parameters.

Figure 3. Architecture of VGG16 Network

# Sub-part of the VGG16 network, written with the old Keras 1 style API
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Convolution2D, MaxPooling2D, ZeroPadding2D

model = Sequential()

# Block 1: two 64-filter 3x3 convolutions followed by 2x2 max-pooling
model.add(ZeroPadding2D((1, 1), input_shape=(3, 256, 256)))
model.add(Convolution2D(64, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(64, 3, 3, activation='relu'))
model.add(MaxPooling2D((2, 2)))

# Block 2: two 128-filter convolutions followed by max-pooling
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(128, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(128, 3, 3, activation='relu'))
model.add(MaxPooling2D((2, 2)))

# Block 3: three 256-filter convolutions followed by max-pooling
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(MaxPooling2D((2, 2)))

# Classifier: flatten, two fully connected layers with dropout, 2-way softmax output
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))

# model.summary()


Training this network on a set of images took about 240 seconds/epoch on a GPU with ~3000 cores, while the same network with the same data took about 17,100 seconds/epoch on a CPU. For the same algorithm, the CPU takes roughly 70 times longer than the GPU.
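For reference, here is a rough sketch of how such an epoch timing can be measured, assuming the Keras 1 style model defined above, the Theano-style channels-first input shape (3, 256, 256) and randomly generated placeholder images (a toy benchmark, not the author's actual setup):

import time
import numpy as np

x = np.random.rand(64, 3, 256, 256).astype('float32')  # dummy batch of 64 "images"
y = np.eye(2)[np.random.randint(0, 2, 64)]              # dummy one-hot labels for the 2-way softmax

model.compile(optimizer='sgd', loss='categorical_crossentropy')

start = time.time()
model.fit(x, y, batch_size=16, nb_epoch=1, verbose=0)   # nb_epoch is the Keras 1 argument name
print("seconds per epoch: %.1f" % (time.time() - start))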

GPUs are a relatively recent development; in spite of having data and algorithms in place, we could not apply machine learning earlier due to hardware constraints.

Figure 4. Increase in computational power over time.

The above figure clearly depicts that computational power has increased exponentially in recent years. Nowadays even hand-held devices and laptops are equipped with enough power that one can apply machine learning to medium-sized data-sets.

The resurgence of neural networks in the twenty-first century is largely due to efficient GPU and GP-GPU computing.


In 2006, a publication by Geoffrey Hinton and Ruslan Salakhutdinov showed how to successfully train a multi-layered network with superior performance on the MNIST data-set, which led to the beginning of the deep-learning era. Deep learning is essentially a modified version of the ANN.


Figure 5. Architecture of an ANN to be trained with forward pass and back propagation.

As shown in Figure 5, an ANN has a few layers and is trained with a forward pass and back propagation. In deep learning this task is performed in a slightly different way.


Figure 6. Training steps in deep learning.

As shown in Figure 6, deep learning has two distinct steps in learning: 1) layer-wise training and 2) fine-tuning.

Layer-Wise Training - Each hidden layer is trained separately against an imaginary output (for example, a reconstruction of its own input). Each layer is trained until the error settles to a constant level and further training yields no improvement.

Fine-Tuning - Due to pre-training, each layer already has appropriate weights. In this step the entire network is trained again, updating the individual weights so that the final error of the network is much smaller than any individual layer's error, and the predictions are also improved compared to any individual layer's predictions.
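Below is a minimal sketch of these two steps. It is my own illustration, assuming Keras Dense layers, a hypothetical data-set (x_train, y_train) and auto-encoder style pre-training in which each layer learns to reconstruct its own input; the layer sizes and training settings are illustrative only.

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

x_train = np.random.rand(1000, 784)                 # placeholder inputs
y_train = np.eye(2)[np.random.randint(0, 2, 1000)]  # placeholder one-hot labels

layer_sizes = [784, 256, 64]
pretrained = []
current = x_train

# 1) Layer-wise training: each hidden layer is trained separately as an
#    auto-encoder that reconstructs its own input (the "imaginary" output)
for in_dim, out_dim in zip(layer_sizes[:-1], layer_sizes[1:]):
    ae = Sequential()
    ae.add(Dense(out_dim, activation='relu', input_dim=in_dim))  # encoder
    ae.add(Dense(in_dim, activation='linear'))                   # decoder
    ae.compile(optimizer='adadelta', loss='mse')
    ae.fit(current, current, verbose=0)
    W, b = ae.layers[0].get_weights()
    pretrained.append([W, b])
    current = np.maximum(0.0, current.dot(W) + b)   # relu output feeds the next layer's pre-training

# 2) Fine-tuning: stack the pre-trained layers, add the output layer and
#    train the whole network end-to-end on the real labels
model = Sequential()
model.add(Dense(256, activation='relu', input_dim=784, weights=pretrained[0]))
model.add(Dense(64, activation='relu', weights=pretrained[1]))
model.add(Dense(2, activation='softmax'))
model.compile(optimizer='adadelta', loss='categorical_crossentropy')
model.fit(x_train, y_train, verbose=0)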

4) Advanced optimization techniques

Now we have a good algorithm in hand, but it takes a great amount of time to train a deep neural network if it is not optimized well. It is also possible that the algorithm may not converge or generalize at all without optimization. Advanced optimization techniques have played a major role in decreasing training time and making networks generalize better.

1) Dropouts

Dropout is a simple way to prevent an algorithm from over-fitting and co-adaptation. In dropout, some connections between two layers are randomly dropped at each iteration. This creates a variable network graph and prevents the network from over-fitting. A variable graph means all connections get an equal opportunity to grow, which eventually makes the network generalize well.

I have given an illustration of dropout in the flow chart below, where I have cancelled certain connections between perceptrons of two layers. Such dropping of weights occurs at every iteration. Commonly, around 20% of input connections and 50% of hidden connections are dropped at random.

Dropouts algorithm can be implemented in two ways:

1) Memory-Friendly Approach - Change the matrix size at each iteration according to the number of connected perceptrons between the two layers.

Pros: As nearly 20% of the connections will be dropped, the matrix size will also decrease accordingly and becomes easier to deal with in limited memory.

Cons: 1) Frequent matrix re-initialization would be required, which in turn degrades speed. 2) Keeping track of alive and dropped connections is difficult when making updates in the next iteration. 3) Within the matrix it is difficult to keep track of which weight represents the connection between which two perceptrons of the two layers; to track this, a reference matrix of the size of the fully connected network is required, which resides on the hard drive.

2) Speed-Friendly Approach - Simply multiply dropped connections by zero in the matrix (a small numpy sketch of this approach is given after this list).

Pros: 1) No alteration in matrix size. 2) Favourable for GPU computations.

Cons: Memory utilization is the same as that of the fully connected network.
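Here is a minimal numpy sketch of the speed-friendly approach (my own illustration; it masks unit activations with a random binary matrix, which is how dropout is commonly implemented in practice, and the sizes and rates are illustrative only):

import numpy as np

np.random.seed(0)
activations = np.random.rand(4, 6)   # outputs of a hidden layer for a batch of 4 samples
keep_prob = 0.5                      # keep ~50% of hidden units, drop the rest

# a new random mask at every iteration => a different "network graph" each time
mask = (np.random.rand(*activations.shape) < keep_prob).astype(activations.dtype)
dropped = activations * mask / keep_prob   # "inverted dropout": scaling keeps the
                                           # expected activation unchanged
print(dropped)
# at test time no mask is applied; the full matrix is always kept in memory,
# which is exactly the memory cost mentioned in the cons above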


Deep Learning with Dropout

Figure 7. Training step to be followed for deep learning with dropout (some connections are dropped).

2) Adadelta

Adadelta means an adaptive learning rate. I have explained this topic with a working example in my earlier blog post.

Generally we keep the learning rate constant throughout training, which leads to longer training times. Alternatively, we can use a variable learning rate that increases when there is a steep decrease in error and decreases when the error improves only slightly. The diagram below, taken from my earlier post, clearly depicts the change in learning rate as training progresses: the learning rate was high initially, but as training approached convergence it decreased and became minimal.
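To make the idea concrete, here is a small sketch of an adaptive learning-rate heuristic on a toy quadratic loss. Note that this is not the exact Adadelta update (Adadelta maintains running averages of squared gradients and squared updates); it is a simplified "bold driver" style rule, with illustrative values, that grows the learning rate while the error keeps dropping and shrinks it when progress stalls:

def loss(x):
    return x ** 2          # toy quadratic loss

def grad(x):
    return 2.0 * x

x, lr = 5.0, 0.01
prev_loss = loss(x)

for step in range(1, 101):
    x -= lr * grad(x)
    cur_loss = loss(x)
    if cur_loss < prev_loss:      # steady improvement: grow the learning rate
        lr *= 1.1
    else:                         # little or no improvement: shrink the learning rate
        lr *= 0.5
    prev_loss = cur_loss
    if step % 20 == 0:
        print("step %3d  loss = %.6f  learning rate = %.4f" % (step, cur_loss, lr))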


3) Nesterov Accelerated Gradient

Nesterov's Accelerated Gradient descent (Nesterov Y, 2007) accelerates the optimization process, reaching the minimum with less computational effort over time. The plain gradient descent algorithm has a convergence rate of O(1/t), while Nesterov's Accelerated Gradient has a convergence rate of O(1/t**2).
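A small numpy sketch comparing plain gradient descent with Nesterov's Accelerated Gradient on a toy quadratic loss (my own illustration; the learning rate and momentum values are illustrative only):

def grad(x):
    return 2.0 * x           # gradient of f(x) = x**2

lr, mu, steps = 0.05, 0.9, 50

# plain gradient descent
x_gd = 5.0
for _ in range(steps):
    x_gd -= lr * grad(x_gd)

# Nesterov's Accelerated Gradient: the gradient is evaluated at the
# "look-ahead" point x + mu * v before the update is applied
x_nag, v = 5.0, 0.0
for _ in range(steps):
    v = mu * v - lr * grad(x_nag + mu * v)
    x_nag += v

print("plain GD : x = %.6f" % x_gd)
print("NAG      : x = %.6f" % x_nag)
# both approach the minimum at x = 0; on smooth convex problems plain gradient
# descent converges at rate O(1/t) while NAG achieves O(1/t**2)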



With these things in mind, I sum up this post. In the next post we will see how to apply deep neural networks practically to real-life problems.


If you like this tutorial, please share it with your colleagues. Discuss doubts and ask for changes on GitHub. It's free, no charges for anything. Let me get inspired by your responses and deliver even better content.
