
Convolution Networks - Image Input To Visualization

Everything discussed in this tutorial, including working code, is available on my GitHub repository.

It often happens that, as package versions change, code fails to run due to changed requirements. I have added a file named requirements.txt to the GitHub repository, which lists all packages, with the versions present on my system when I executed this code. This will help in replicating the working environment and in debugging.

Code compatibility: Python 2.7, tested on Ubuntu 16.04 with Theano as the backend.

 

In machine learning, a convolutional neural network (CNN, or ConvNet) is a type of feed-forward artificial neural network in which the connectivity pattern between its neurons is inspired by the organisation of the animal visual cortex, whose individual neurons are arranged in such a way that they respond to overlapping regions tiling the visual field. Convolutional networks were inspired by biological processes and are variations of multi-layer perceptrons designed to use minimal amounts of preprocessing. They have wide applications in image and video recognition, recommender systems and natural language processing. (Source: Wikipedia)

Figure 1. An image generated from convolution layers
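Before diving into Keras, it may help to see what "convolution" means numerically. The snippet below is a toy sketch in plain NumPy (not part of the tutorial code): a 2*2 kernel slides over a 3*3 image in "valid" mode, so every output pixel is computed from one overlapping region of the input.

# toy "valid"-mode convolution (cross-correlation, as CNN libraries compute it)
import numpy as np

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]], dtype=np.float32)
kernel = np.array([[1, 0],
                   [0, -1]], dtype=np.float32)

out = np.zeros((2, 2), dtype=np.float32)
for i in range(2):
    for j in range(2):
        # each output pixel looks at one overlapping 2*2 region of the input
        out[i, j] = np.sum(image[i:i + 2, j:j + 2] * kernel)

print (out)   # [[-4. -4.]
              #  [-4. -4.]]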

For the current illustration we will be using the MNIST dataset. The MNIST database (Mixed National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning. It was created by "re-mixing" the samples from NIST's original datasets. The creators felt that since NIST's training dataset was taken from American Census Bureau employees, while the testing dataset was taken from American high school students, it was not well-suited for machine learning experiments. Furthermore, the black and white images from NIST were normalised to fit into a 28x28 pixel bounding box and anti-aliased, which introduced gray-scale levels.

We will learn CNNs in 3 parts:

  • Importing data / data preparation

  • CNN implementation in KERAS

  • Visualising hidden layer data

STEP - 1 Importing data / data preparation

# loading requirements
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
import matplotlib.pyplot as plt
import theano
import numpy as np
from keras import backend as K
K.set_image_dim_ordering('th')
%matplotlib inline

Keras ships with the MNIST dataset for the purpose of illustration, so we will be using the same. The mnist.load_data() function loads the data and divides it into a train and a test set: the training set has 60,000 examples and the testing set has 10,000. By default each image has dimension 28*28.

(X_train, y_train), (X_test, y_test) = mnist.load_data()

The snippet given below will show the size of each partition (i.e. train and test).

# let's print the data size of all four sections
print ("X_train Size: ", len(X_train), " y_train Size :", len(y_train),
       " X_test Size : ", len(X_test), " y_testSize : ", len(y_test))

Output will be: ('X_train Size: ', 60000, ' y_train Size :', 60000, ' X_test Size : ', 10000, ' y_testSize : ', 10000). There are 70,000 images in total, of which 60,000 are for training and 10,000 for testing.
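Beyond the raw counts, it is worth printing the array shapes themselves; a quick check (the expected values are shown as comments):

print (X_train.shape)  # (60000, 28, 28) - 60000 images of 28*28 pixels
print (y_train.shape)  # (60000,) - one integer label (0-9) per image
print (X_test.shape)   # (10000, 28, 28)
print (y_test.shape)   # (10000,)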

Now let's reshape this NumPy array to the shape (60000, 1, 28, 28). The question is: why reshape? Can't we use it as it is? No, because the Convolution2D layer requires a 3D array (channel, rows, cols) for each image, while at present each image is a 2D array of dimension 28*28. To comply with the dimension ordering we add one more dimension, a single channel. Seen in a pythonic way, it just wraps each image's data in an extra bracket; the internal data does not change.

X_train = X_train.reshape(X_train.shape[0], 1, 28, 28)

X_test = X_test.reshape(X_test.shape[0], 1, 28, 28)

Now if you run X_train.shape it will give the dimensions as (60000, 1, 28, 28). There is no change in the pixel data; only the representation changes.

If you run X_train[0], it will show what is in the matrix. Each sub-array represents one row of pixels in the 28*28 image.

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 18, 18, 18, 126, 136, 175, 26, 166, 255, 247, 127, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 30, 36, 94, 154, 170, 253, 253, 253, 253, 253, 225, 172, 253, 242, 195, 64, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 49, 238, 253, 253, 253, 253, 253, 253, 253, 253, 251, 93, 82, 82, 56, 39, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 18, 219, 253, 253, 253, 253, 253, 198, 182, 247, 241, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 80, 156, 107, 253, 253, 205, 11, 0, 43, 154, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 14, 1, 154, 253, 90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 139, 253, 190, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 190, 253, 70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 35, 241, 225, 160, 108, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 81, 240, 253, 253, 119, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 45, 186, 253, 253, 150, 27, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 16, 93, 252, 253, 187, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 249, 253, 249, 64, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 46, 130, 183, 253, 253, 207, 2, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 39, 148, 229, 253, 253, 253, 250, 182, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24, 114, 221, 253, 253, 253, 253, 201, 78, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 23, 66, 213, 253, 253, 253, 253, 198, 81, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 18, 171, 219, 253, 253, 253, 253, 195, 80, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 55, 172, 226, 253, 253, 253, 253, 244, 133, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 136, 253, 253, 253, 212, 135, 132, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=uint8)

It's a 28*28 matrix. The numbers between 0 and 255 represent pixel intensity: 0 is white (background) and 255 is black (foreground), and all intermediate grey levels are represented by the integers in between.

X_train = X_train.astype('float32')

X_test = X_test.astype('float32')

Converting X_train and X_test to floating point numbers.

X_train /= 255

X_test /= 255

Normalization: since all pixel values lie between 0 and 255, dividing them by 255 brings them into the range 0 to 1.
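A quick sanity check that both conversions worked; the expected values are shown as comments:

print (X_train.dtype)                  # float32
print (X_train.min(), X_train.max())   # (0.0, 1.0)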

Y_train = np_utils.to_categorical(y_train, 10)

Y_test = np_utils.to_categorical(y_test, 10)

In the downloaded data the classes are numerical labels, e.g. 0, 1, 2, ..., 9. To make them easier for the network to learn, we convert each class to a categorical variable: for example, if the class is 5 it is converted to [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]. This is also known as a one-hot vector representation.
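To see what this does to a single label, here is a minimal pure-NumPy sketch of the same encoding (the first MNIST training label happens to be 5):

label = int(y_train[0])   # 5
one_hot = np.zeros(10)
one_hot[label] = 1        # put a 1 at the index matching the class
print (one_hot)           # [ 0.  0.  0.  0.  0.  1.  0.  0.  0.  0.]
print (Y_train[0])        # np_utils.to_categorical produced the same vector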

###########visualize##############

plt.imshow(X_train[1].squeeze())

plt.show()

Figure 2: Reconstructed image of "0" from the 2D matrix of pixel values ranging from 0 to 1.

STEP - 2 CNN implementation in KERAS

batch_size = 128   # number of samples to be processed each time
nb_classes = 10    # number of classes
nb_epoch = 1       # only one epoch - enough for this time

# input image dimensions
img_rows, img_cols = 28, 28
# number of convolutional filters to use
nb_filters = 32
# size of pooling area for max pooling
nb_pool = 2
# convolution kernel size
kernel_size = (3, 3)

Here we define all the variables required to construct the convolution network.

Now defining convolution network architecture.

model = Sequential()
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode='valid',
                        input_shape=(1, img_rows, img_cols)))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(nb_pool, nb_pool)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
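It helps to track how the tensor shape evolves through these layers. The arithmetic below follows from the 'valid' border mode and the pooling size, and model.summary() (a standard Keras call) prints the same information layer by layer:

# Shape bookkeeping (Theano ordering: channels first):
# input:                  (1, 28, 28)
# Convolution2D 'valid':  (32, 26, 26)   # 28 - 3 + 1 = 26
# Convolution2D 'valid':  (32, 24, 24)   # 26 - 3 + 1 = 24
# MaxPooling2D 2*2:       (32, 12, 12)
# Flatten:                32 * 12 * 12 = 4608
# Dense + relu:           128
# Dense + softmax:        10
model.summary()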

Compiling convolution network and building model - Since this is a classification problem, we will use categorical_crossentropy as a loss function. Information regarding why Adadelta and other parameters are used is provided in my other blog.

model.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])

Training the model on the train data while validating on the test data

model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=1, validation_data=(X_test, Y_test))

It will produce the following output - about 97% validation accuracy:

Train on 60000 samples, validate on 10000 samples

Epoch 1/1

60000/60000 [==============================] - 100s - loss: 0.3773 - acc: 0.8846 - val_loss: 0.0889 - val_acc: 0.9723
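If you want the final test score as plain numbers rather than reading them off the progress bar, Keras's model.evaluate returns them; a short sketch:

score = model.evaluate(X_test, Y_test, verbose=0)
# with metrics=['accuracy'] the score is [loss, accuracy]
print ("Test loss : ", score[0])       # should be close to val_loss above
print ("Test accuracy : ", score[1])   # should be close to val_acc above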

STEP - 3 Visualizing Convolution2D layer

A Keras model stores each layer's configuration and weights in the form of key-value pairs, and the layers themselves are available as the list model.layers. The snippet given below will print all the layers present in the model.

for layerNo in range(0, len(model.layers)):
    print layerNo, model.layers[layerNo]

It will give the following output - 12 layers in total:

0 <keras.layers.convolutional.Convolution2D object at 0x7fbd39692e10>

1 <keras.layers.core.Activation object at 0x7fbd3a6cf0d0>

2 <keras.layers.convolutional.Convolution2D object at 0x7fbd3a6cf110>

3 <keras.layers.core.Activation object at 0x7fbd3a6cfb10>

4 <keras.layers.pooling.MaxPooling2D object at 0x7fbd3a700c90>

5 <keras.layers.core.Dropout object at 0x7fbd3a6f5750>

6 <keras.layers.core.Flatten object at 0x7fbd3a6f5910>

7 <keras.layers.core.Dense object at 0x7fbd3a700d10>

8 <keras.layers.core.Activation object at 0x7fbd3a662650>

9 <keras.layers.core.Dropout object at 0x7fbd3a662d50>

10 <keras.layers.core.Dense object at 0x7fbd3a662490>

11 <keras.layers.core.Activation object at 0x7fbd3a662d10>

Now we will examine what is inside layer 2 (Convolution2D). I will be using very simple code to examine what is there, and the code given below is explained through the comments inside it.

"""

TO VISUALIZE ONE FILTER

""" big_array = [] # out of all above shown layers, I am showing you whats there in convolution layer 2 <keras.layers.convolutional.Convolution2D object at 0x7f8a766bd7d0> layer = model.layers[2]

g=layer.get_config() """ g is having configuration about current layer as shown: {'W_constraint': None, 'b_constraint': None, 'name': 'convolution2d_7', 'activity_regularizer': None, 'trainable': True, 'dim_ordering': 'th', 'nb_col': 3, 'subsample': (1, 1), 'init': 'glorot_uniform', 'bias': True, 'nb_filter': 32, 'b_regularizer': None, 'W_regularizer': None, 'nb_row': 3, 'activation': 'linear', 'border_mode': 'valid'} """ h=layer.get_weights() """ h is having weights of current layer [array([[[[ 0.07282107, -0.03278938, 0.11698628], [ 0.05283581, 0.05940171, -0.0462735 ], [-0.01122687, 0.05821611, 0.08387677]], .....................................]]]

""" # shape of first element in current layer h[0].shape #(32, 32, 3, 3) for outShapeNo in h[0]: for innerShapeNo in outShapeNo: # removing border from image fig = plt.figure(frameon=False) # setting image size fig.set_size_inches(3,3) #setting image axis to off (not to show axis) ax = plt.Axes(fig, [0., 0., 1., 1.]) ax.set_axis_off() fig.add_axes(ax) # creating columns inside html html.writelines("<td>") plt.axis('off') ax.imshow(innerShapeNo.squeeze(),interpolation='nearest', aspect='normal') #,interpolation='nearest' break break

The above snippet will give the following image as output:

Figure 3. Visualising one of the convolution filter

As the full set of filters forms a large 2D grid (32*32 images, each 3*3 pixels), printing it directly into the IPython notebook may hang your PC, so we will save each image separately and simultaneously generate an HTML page which shows these images in the form of a 2D matrix [in a borderless table].

"""

TO VISUALISE ALL FILTER

"""

big_array = [] # out of all above shown layers, I am showing you whats there in convolution layer 2 <keras.layers.convolutional.Convolution2D object at 0x7f8a766bd7d0> layer = model.layers[2]

g=layer.get_config() """ g is having configuration about current layer as shown: {'W_constraint': None, 'b_constraint': None, 'name': 'convolution2d_7', 'activity_regularizer': None, 'trainable': True, 'dim_ordering': 'th', 'nb_col': 3, 'subsample': (1, 1), 'init': 'glorot_uniform', 'bias': True, 'nb_filter': 32, 'b_regularizer': None, 'W_regularizer': None, 'nb_row': 3, 'activation': 'linear', 'border_mode': 'valid'} """ h=layer.get_weights() """ h is having weights of current layer [array([[[[ 0.07282107, -0.03278938, 0.11698628], [ 0.05283581, 0.05940171, -0.0462735 ], [-0.01122687, 0.05821611, 0.08387677]], .....................................]]]

""" # shape of first element in current layer h[0].shape #(32, 32, 3, 3) """ this dimention corelates with our networks setting (32, 32, 3, 3) says there are 32 images 3*3 size in one layer and such 32 filters exists 3*3 was our kernal size and 32 was out filter size """ # next we will create all these images in "images" folder and simultaneously create a html file "html.html"#which will show these image file combinely in form of row and cols

html = open('html.html','w') # creating table in html file to put image inside shell html.writelines("<table>") count=0 for outShapeNo in h[0]: # creating row inside html html.writelines("<tr>") for innerShapeNo in outShapeNo: # removing border from image fig = plt.figure(frameon=False) # setting image size fig.set_size_inches(3,3) #setting image axis to off (not to show axis) ax = plt.Axes(fig, [0., 0., 1., 1.]) ax.set_axis_off() fig.add_axes(ax) # creating columns inside html html.writelines("<td>") plt.axis('off') ax.imshow(innerShapeNo.squeeze(),interpolation='nearest', aspect='normal') #,interpolation='nearest' # writting image to file fig.savefig("images/"+str(count)+".png") # provideing the same image path to html table cell html.writelines("<img src='images/"+str(count)+".png"+"'") html.writelines("</td>") count = count+1 plt.close() html.writelines("</tr>") html.writelines("</table>")# next we will create all these images in "images" folder and simultaneously create a html file "html.html"#which will show these image file combinly in form of row and cols

Now open html.html in your browser and you will see an image similar to the one given below.
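The code above visualises the learned filter weights. To visualise the hidden-layer data itself, that is the feature maps a given input image produces, one option in this version of Keras is a backend function. The sketch below is an illustration under that assumption, reusing the model and imports from above:

# build a function from the model input to the output of layer 2
# (the second Convolution2D); learning_phase 0 means test mode, so the
# Dropout layers stay inactive
get_layer_output = K.function([model.layers[0].input, K.learning_phase()],
                              [model.layers[2].output])

# feature maps for the first test image: shape (1, 32, 24, 24)
feature_maps = get_layer_output([X_test[:1], 0])[0]

# show the first of the 32 feature maps
plt.imshow(feature_maps[0][0])
plt.show()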

This is a very basic tutorial for peeping inside the layers of a convolution network. In the next part of the tutorial we will go further into ConvNets.

If you like this tutorial, please share it with your colleagues. Discuss doubts and ask for changes on GitHub. It's free; no charges for anything. Your responses inspire me to deliver even better material.
