Object Detection with Keras in Python

Sunil Patel
Jul 8, 2017
All the code discussed in this blog post is available in the GitHub repository.

If you hit an error, check the requirements.txt file for Python package compatibility, or leave a comment below.

Code compatibility: Python 2.7, tested on Ubuntu 16.04 with Theano as the backend.

Figure 1. Object Detection with a Convolutional Neural Network.

Object detection is an application of computer vision concerned with identifying a defined set of objects (such as cars, cats or humans) in images and video within a given scope of study. It is a broad topic that also includes face detection, pedestrian detection, physical intrusion detection, and motion detection. Object detection has many applications, including image retrieval and video surveillance.

While searching for older object detection techniques, I came across this Wikipedia page describing the “Techniques and algorithms” of object detection.

“The advantage we are having is, an image is made of pixels. So in most cases we know the location of next point, it will be connected to our current pixel. Starting with circles, take an image, convert it to grey scale, and detect edges. Move along edges, draw normal, they will intersect at center. Do this for entire circle or find connected edges and calculate Euclidean distance between center and connected points. Another algorithm is move along connected edges rotation of tangent will be uniform, because of symmetry. So whenever there is an abrupt change in rotation, you are out of circle. For squares, move along edges. First of all check if they are straight lines or not (check if pixels are having either same x or y co-ordinates). After that look for a 90 degree change in angle(if you were moving along a horizontal line then at corner y co-ordinate will stop changing and x will start changing).”

With recent advances in object detection, we have moved a billion miles beyond the rudimentary techniques described above. Present-day techniques need almost no manual feature extraction: only raw data is passed to a powerful algorithm, and the algorithm takes care of extracting the features for us.

You can detect objects using a variety of methods, including:

1. Feature-based object detection

Random sample consensus, or RANSAC. RANSAC estimates the parameters of a mathematical model from data that contains outliers. It then ignores the outliers and predicts using the rest of the data.

RANSAC is accomplished with the following steps:

1. Randomly selecting a subset or subspace of the data set

2. Fitting a predefined model to the selected subset

3. Determining the number of outliers

4. Repeating steps 1-3 for a prescribed number of iterations

RANSAC is widely used in stereo vision, for example to match points between two views while discarding bad correspondences.
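The four steps above can be sketched in a few lines of NumPy. This is only a minimal illustration, assuming the "predefined model" is a straight line y = a*x + b. The function name `ransac_line`, the threshold and the toy data are all made up for this example and are not taken from the code in the repository.

```python
import numpy as np

def ransac_line(points, n_iters=200, threshold=0.5, seed=0):
    """Fit y = a*x + b to an (N, 2) float array with a basic RANSAC loop."""
    rng = np.random.RandomState(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):  # step 4: repeat for a fixed number of iterations
        # Step 1: randomly select a minimal subset (two points define a line).
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:
            continue  # a vertical pair cannot be written as y = a*x + b
        # Step 2: fit the predefined model to the selected subset.
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # Step 3: points far from the line are outliers, the rest are inliers.
        residuals = np.abs(points[:, 1] - (a * points[:, 0] + b))
        inliers = residuals < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on the largest inlier set, ignoring the outliers.
    a, b = np.polyfit(points[best_inliers, 0], points[best_inliers, 1], 1)
    return (a, b), best_inliers

# Toy data (hypothetical): 50 points on y = 2x + 1 plus 10 gross outliers.
x = np.linspace(0.0, 10.0, 50)
clean = np.column_stack([x, 2.0 * x + 1.0])
outliers = np.column_stack([np.linspace(0.0, 10.0, 10), np.full(10, 40.0)])
points = np.vstack([clean, outliers])

(a, b), inliers = ransac_line(points)
print(a, b, inliers.sum())
```

On this toy data the recovered slope and intercept should come out close to 2 and 1, with the ten points at y = 40 left out of the inlier set.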

2. Viola-Jones object detection

This method is widely used for face detection. All human faces share similar properties, and these regularities can be matched using Haar features. For more information on these features, refer to Wikipedia.
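To make the idea of a Haar feature concrete, here is a rough sketch of a single two-rectangle feature computed with an integral image. It is an illustration only, not the full Viola-Jones detector, and the helper names (`integral_image`, `rect_sum`, `two_rect_haar`) and the 4x4 toy patch are assumptions made for this example.

```python
import numpy as np

def integral_image(gray):
    """Summed-area table with a leading row and column of zeros."""
    ii = np.zeros((gray.shape[0] + 1, gray.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = gray.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def two_rect_haar(ii, top, left, height, width):
    """Two-rectangle feature: sum of the lower half minus sum of the upper half."""
    half = height // 2
    upper = rect_sum(ii, top, left, half, width)
    lower = rect_sum(ii, top + half, left, half, width)
    return lower - upper

# Toy 4x4 patch (hypothetical): a dark band above a bright band.
patch = np.vstack([np.full((2, 4), 10), np.full((2, 4), 200)]).astype(np.uint8)
ii = integral_image(patch)
feature = two_rect_haar(ii, top=0, left=0, height=4, width=4)
print(feature)  # 1520 = 8 * 200 - 8 * 10
```

A large response means the lower band is much brighter than the upper one. A uniform patch gives a response of zero, which is why such features pick up regular light and dark patterns.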

3. Image segmentation and blob analysis

Image segmentation is the process of dividing an image into multiple parts. It is usually used to identify objects or other relevant information in images. Segmentation can be performed using any of the algorithms listed below:

1. Otsu’s method

2. K-means clustering

3. Watershed segmentation

4. Texture filters
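As an example of the first algorithm, Otsu’s method picks the grey-level threshold that best separates an image into two classes by maximising the variance between them. The sketch below is a plain NumPy version written for illustration, assuming an 8-bit greyscale image. The function name `otsu_threshold` and the toy image are made up for this example.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold t (1..255) that maximises between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0 = prob[:t].sum()  # fraction of pixels darker than t
        w1 = 1.0 - w0        # fraction of pixels at or above t
        if w0 == 0.0 or w1 == 0.0:
            continue         # one class is empty, nothing to separate
        mu0 = (levels[:t] * prob[:t]).sum() / w0
        mu1 = (levels[t:] * prob[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

# Toy image (hypothetical): left half dark (50), right half bright (200).
gray = np.array([[50, 50, 200, 200]] * 4, dtype=np.uint8)
t = otsu_threshold(gray)
mask = gray >= t  # True marks the bright segment
print(t, mask.sum())
```

The resulting boolean mask is the segmentation. Blob analysis would then work on the connected regions of that mask.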

With the advent of convolutional neural networks (CNNs) and GPU computing, many of these techniques became a thing of the past. CNNs have proved to be the ace in the area of image processing, eliminating roughly 80% of manual feature selection and mathematical tuning.

The Visual Geometry Group (VGG) from the Department of Engineering Science, University of Oxford, applied CNNs to the ImageNet dataset and achieved state-of-the-art results [http://www.robots.ox.ac.uk/~vgg/research/very_deep/]. ImageNet is a huge dataset with 14,197,122 labelled images. As part of the ILSVRC-2014 competition, images from an ImageNet subset had to be classified into 1000 classes. The VGG team's very deep convolutional networks, of which VGG16 is the 16-layer variant, were among the top performers in that competition, taking first place in the localisation task and second place in the classification task.

Training the VGG16 network on the ImageNet dataset would require state-of-the-art GPUs. Unfortunately I cannot afford those, so I found a shortcut. When a model is trained, it stores what it has learned in the form of weights in its layers. Fortunately, the Visual Geometry Group at Oxford has provided the trained weights as a model on their official site.

So our task of identifying objects goes like this.

1) Get Keras compatible model of VGG16.

2) Construct VGG16 network in Keras

3) Arrange all weights in network so constructed

4) Take a random image

5) Resize and reshape the image to the dimensions required by the model

6) Get the objects in the image identified

Additional step:

7) Print the names of the top 5 objects on the image with their percentage confidence as provided by the network
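Steps 5 and 7 are mostly array bookkeeping, so here is a rough NumPy-only sketch of both. It assumes the image is already loaded as an (H, W, 3) array and that the network returns one probability per class. The helper names `to_model_input` and `top_k`, the nearest-neighbour resize and the placeholder labels are assumptions made for this illustration, not the code from the repository. Any colour-channel or mean adjustments the weights may need are left out.

```python
import numpy as np

def to_model_input(image, size=224):
    """Turn an (H, W, 3) image array into a (1, 3, size, size) float32 batch."""
    h, w = image.shape[:2]
    # Nearest-neighbour resize by index arithmetic (no image library needed).
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = image[rows][:, cols]
    # Channels-first ordering as used with Theano, then add the batch axis.
    chw = resized.transpose(2, 0, 1).astype(np.float32)
    return chw[np.newaxis, ...]

def top_k(probabilities, labels, k=5):
    """Return the k most probable (label, percent) pairs, best first."""
    order = np.argsort(probabilities)[::-1][:k]
    return [(labels[i], 100.0 * float(probabilities[i])) for i in order]

# Hypothetical stand-ins for a real photo and a real network output.
image = np.zeros((300, 400, 3), dtype=np.uint8)
batch = to_model_input(image)
print(batch.shape)  # (1, 3, 224, 224)

probs = np.array([0.05, 0.6, 0.1, 0.2, 0.03, 0.02])
labels = ["class_a", "class_b", "class_c", "class_d", "class_e", "class_f"]
for name, percent in top_k(probs, labels, k=3):
    print("%s: %.1f%%" % (name, percent))
```

The (1, 3, 224, 224) shape matches the `batch_input_shape` used when the network is constructed in Keras.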

It is clear from this discussion that we are not going to train a model; we will just predict using someone else’s trained model. Let’s go step by step.

1) Get Keras compatible model of VGG16.

I got the model from a known source and kept it in my personal drive. You may obtain the model from my Google Drive.

2) Construct VGG16 network in Keras

In Keras, VGG16 can be defined as follows.

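from keras.models import Sequential
from keras.layers.convolutional import Convolution2D, ZeroPadding2D

# Keras 1.x API with Theano ordering: input is (batch, channels, height, width)
model = Sequential()
model.add(ZeroPadding2D((1, 1), batch_input_shape=(1, 3, 224, 224)))
model.add(Convolution2D(64, 3, 3, activation='relu'))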