Handwritten digit recognition based on neural networks

During summer 2012 and summer 2013 I supervised two master's internships on handwritten digit recognition at Gestform, a digitizing company. The first covered the basics of reading a handwritten digit in a simple case, and the second focused on digit segmentation.

Here is an example of what we want to do:

Two things are done here: 1) identifying which parts of the text contain digits, and then 2) reading the digits. Almost the same tools are used for parts 1 and 2.

Which parts of the text contain digits?

First, we try to segment the text line by line. Several algorithms can be applied to do that; here a simple one is used: RLSA (Run-Length Smoothing Algorithm).
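As an illustration, here is a minimal sketch of the horizontal pass of RLSA on one row of a binary image (the 0/1 encoding with 1 = black, and the threshold value, are assumptions for this sketch):

```python
def rlsa_horizontal(row, threshold):
    """Run-Length Smoothing: fill runs of white (0) pixels shorter than
    `threshold` when they are enclosed between black (1) pixels."""
    out = list(row)
    run_start = None  # start index of the current white run
    for i, px in enumerate(row):
        if px == 0:
            if run_start is None:
                run_start = i
        else:
            # close a white run: fill it if it is short and not at the border
            if run_start is not None and run_start > 0 and i - run_start < threshold:
                for j in range(run_start, i):
                    out[j] = 1
            run_start = None
    return out

# Short gaps between black pixels are smeared, long gaps are kept.
print(rlsa_horizontal([1, 0, 0, 1, 0, 0, 0, 0, 0, 1], threshold=3))
# [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
```

Running the same smearing on every row (with a large threshold) merges the characters of a text line into one horizontal block, whose connected components then give the line segmentation.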

Then each connected component is extracted and analyzed in order to classify it as digit or non-digit.

As we can see, before extracting connected components some preprocessing should be done, such as mathematical morphology (dilation and erosion), because some parts of the digits are cut.
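For illustration, a minimal binary dilation with a 3×3 square structuring element (a sketch; in practice a library routine such as OpenCV's cv2.dilate would be used):

```python
def dilate(img):
    """Binary dilation with a 3x3 square structuring element:
    a pixel becomes 1 if any pixel in its 3x3 neighbourhood is 1."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx]:
                        out[y][x] = 1
    return out

# A one-pixel cut between two strokes is closed by dilation.
broken = [[1, 0, 1]]
print(dilate(broken))  # [[1, 1, 1]]
```

Dilation followed by erosion (a morphological closing) reconnects cut strokes without permanently thickening them; the erosion step is omitted here for brevity.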

In order to extract features homogeneously, some preprocessing is applied before feature extraction to normalize them: correcting the slope and the angle of the digit.

Then many features are extracted, such as:

  • Hu invariant moments
  • Projection histograms
  • Profile histograms
  • Intersection with horizontal lines
  • Position of holes
  • Extremities
  • Junction points
Profile histogram

Projection histogram

Intersection with horizontal and vertical lines
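As a sketch of two of the features above, here are the projection histograms (ink counts per row and per column) and the left profile (distance from the left edge to the first black pixel in each row) of a binary image (0/1 encoding assumed):

```python
def projections(img):
    """Horizontal and vertical projection histograms of a binary image."""
    rows = [sum(r) for r in img]        # ink per row
    cols = [sum(c) for c in zip(*img)]  # ink per column
    return rows, cols

def left_profile(img):
    """For each row, distance from the left edge to the first black pixel
    (row width if the row is empty)."""
    w = len(img[0])
    return [next((x for x, px in enumerate(r) if px), w) for r in img]

digit = [[0, 1, 1],
         [1, 0, 1],
         [0, 1, 0]]
print(projections(digit))   # ([2, 2, 1], [1, 2, 2])
print(left_profile(digit))  # [1, 0, 1]
```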

They are all concatenated into one vector of 124 dimensions. Another vector is built from the Freeman chain code (a histogram of 128 dimensions).

Freeman chain code
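The Freeman chain code encodes a contour as a sequence of direction codes (0–7) between successive boundary pixels; a minimal sketch (the 8-direction numbering below is one common convention, and the post's 128-dimension vector presumably histograms such codes over several image zones):

```python
# Map (dx, dy) steps to Freeman direction codes: 0 = east, increasing
# counter-clockwise, with y growing downwards (one common convention).
DIRS = {(1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
        (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7}

def freeman_chain(points):
    """Chain code of a contour given as successive pixel coordinates."""
    return [DIRS[(x2 - x1, y2 - y1)]
            for (x1, y1), (x2, y2) in zip(points, points[1:])]

def chain_histogram(code, bins=8):
    """Normalized direction histogram, usable as a feature vector."""
    h = [0.0] * bins
    for c in code:
        h[c] += 1.0 / len(code)
    return h

# A small square contour traversed clockwise (y grows downwards).
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(freeman_chain(square))  # [0, 6, 4, 2]
```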

After extracting features, two neural networks are used to classify each connected component as digit or non-digit. The first one has 124 inputs and the second one 128; each has 2 outputs: D (digit) or R (reject, i.e. non-digit). Many examples have to be used to train the classifiers (around 10,000 for each class).

NonDigits

Reading the digits

Here, the same features and neural network are used, but instead of 2 classes (digit / non-digit) 10 classes are used (0,1,2,3,4,5,6,7,8,9).

You can download training examples here. Some examples of the digit 6:

mnist6

 

Correcting some classification errors

Many digits touch each other and are classified as R, so we introduced a new class, “DD” (double digit). Furthermore, by using the sequences it is possible to correct some errors. For example, if you are looking for a 5-digit postal code, it is possible to change a result such as RRDDDRDRR into RRDDDDDRR, or to filter noise: RRRRRRDRRRR -> RRRRRRRRRRR. An HMM is used to do this. Here is an HMM designed for postal codes, with D (digit), DD (double digit) and R (reject / non-digit) classes:
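As an illustration of the idea (not the exact HMM of the post, which also has a DD state), here is a minimal Viterbi decoder over a left-to-right postal-code model: a run of R states, exactly five D states, then a run of R states. All transition and emission probabilities below are assumptions chosen for the sketch:

```python
import math

# Left-to-right postal-code HMM: rejects, exactly five digits, rejects.
STATES = ["pre", "d1", "d2", "d3", "d4", "d5", "post"]
LABEL = {"pre": "R", "d1": "D", "d2": "D", "d3": "D",
         "d4": "D", "d5": "D", "post": "R"}
TRANS = {"pre": {"pre": 0.4, "d1": 0.3, "post": 0.3},
         "d1": {"d2": 1.0}, "d2": {"d3": 1.0}, "d3": {"d4": 1.0},
         "d4": {"d5": 1.0}, "d5": {"post": 1.0}, "post": {"post": 1.0}}

def emit(state, obs):
    # Assumed: the classifier output matches the true class 80% of the time.
    return 0.8 if LABEL[state] == obs else 0.2

def correct(seq):
    """Viterbi decoding: the most likely R/D labelling of `seq`."""
    v = {s: (-math.inf, []) for s in STATES}
    v["pre"] = (math.log(emit("pre", seq[0])), ["pre"])
    v["d1"] = (math.log(emit("d1", seq[0])), ["d1"])  # code may start at once
    for obs in seq[1:]:
        nv = {s: (-math.inf, []) for s in STATES}
        for s, (lp, path) in v.items():
            if lp == -math.inf:
                continue
            for t, p in TRANS[s].items():
                score = lp + math.log(p) + math.log(emit(t, obs))
                if score > nv[t][0]:
                    nv[t] = (score, path + [t])
        v = nv
    # A valid path ends after its five digits (d5) or in the trailing rejects.
    best_score, best_path = max(v["d5"], v["post"])
    return "".join(LABEL[s] for s in best_path)

print(correct("RRDDDRDRR"))    # RRDDDDDRR  (the two examples from the text)
print(correct("RRRRRRDRRRR"))  # RRRRRRRRRRR
```

The pre -> post transition lets the model label a sequence with no postal code at all, which is what filters the isolated-D noise case.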

HMM

Double digits segmentation

In order to segment double digits, the “drop fall” algorithm is used.
The drop-fall algorithm can be seen as a drop of water sliding along the digits. Four drop falls can be performed, depending on whether the starting point is set top/left, top/right, bottom/right or bottom/left. Then, in order to choose the best segmentation, the candidate digits are recognized by the neural network and the pair of digits with the best recognition score is kept.
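A minimal sketch of a single top-left drop fall on a binary image: the drop falls straight down when it can, otherwise slides diagonally around the stroke, and cuts through when it is stuck. Real implementations handle more cases and the left/right priorities differ per variant; this is a simplified illustration:

```python
def drop_fall(img, start_x):
    """Trace a top-down drop-fall path starting above column `start_x`.
    img: binary image (1 = ink).  Returns the list of visited (x, y)."""
    h, w = len(img), len(img[0])
    x, y = start_x, 0
    path = [(x, y)]
    while y < h - 1:
        below = img[y + 1][x]
        left = img[y + 1][x - 1] if x > 0 else 1
        right = img[y + 1][x + 1] if x < w - 1 else 1
        if below == 0:
            y += 1               # fall straight down
        elif left == 0:
            x, y = x - 1, y + 1  # slide down-left (top-left variant priority)
        elif right == 0:
            x, y = x + 1, y + 1  # slide down-right
        else:
            y += 1               # stuck: cut through the stroke
        path.append((x, y))
    return path

# The drop slides around the small stroke instead of cutting it.
img = [[0, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 1, 1, 0],
       [0, 0, 0, 0]]
print(drop_fall(img, 2))  # [(2, 0), (2, 1), (3, 2), (3, 3)]
```

The traced path is used as the cut line between the two candidate digits, which are then scored by the classifier.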

Bibliography

Yi-Kai Chen and Jhing-Fa Wang. Segmentation of single- or multiple-touching handwritten numeral string using background and foreground analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11):1304–1317, 2000.

Britto de S, Robert Sabourin, Flavio Bortolozzi, Ching Y. Suen, et al. A string length predictor to control the level building of HMMs for handwritten numeral recognition. In Proceedings of the 16th International Conference on Pattern Recognition, volume 4, pages 31–34. IEEE, 2002.

R. V. Kulkarni and P. N. Vasambekar. An overview of segmentation techniques for handwritten connected digits. In 2010 International Conference on Signal and Image Processing (ICSIP), pages 479–482. IEEE, 2010.

Umapada Pal, Abdel Belaïd, and Christophe Choisy. Water reservoir based approach for touching numeral segmentation. In Proceedings of the Sixth International Conference on Document Analysis and Recognition, pages 892–896. IEEE, 2001.

Ma Rui, Du Jie, Gu Yunhua, and Yan Yunyang. An improved drop-fall algorithm based on background analysis for handwritten digits segmentation. In 2009 WRI Global Congress on Intelligent Systems (GCIS '09), volume 4, pages 374–378. IEEE, 2009.

Javad Sadri, Ching Y. Suen, and Tien D. Bui. Automatic segmentation of unconstrained handwritten numeral strings. In Ninth International Workshop on Frontiers in Handwriting Recognition (IWFHR-9), pages 317–322. IEEE, 2004.

Jie Zhou and Ching Y. Suen. Unconstrained numeral pair recognition using enhanced error correcting output coding: a holistic approach. In Proceedings of the Eighth International Conference on Document Analysis and Recognition, pages 484–488. IEEE, 2005.

Clément Chatelain, Guillaume Koch, Laurent Heutte, and Thierry Paquet. Une méthode dirigée par la syntaxe pour l'extraction de champs numériques dans les courriers entrants. 2006.

Document image recognition

Document image matching

Document image recognition can be complex because it requires robustness to translation, rotation and scaling. The documents may also be degraded (noise, spots, cuts, etc.).

Techniques based on interest points such as SIFT and SURF are commonly used on natural images (photographs). I worked on an extension of this approach to quickly recognize document templates given by a user, such as an identity card, a passport, a train ticket, etc.

The method is simple and extensible to many other document images; it is divided into four main steps:

  1. Extraction of interest points. (SURF)
  2. Description of points. (SURF)
  3. Matching the current image points with those of the query image. (FLANN)
  4. Estimation of a 4-parameter transformation. (RANSAC)

The technological choices in brackets may be replaced in the future by newer algorithms that are more efficient and better suited to the context.
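As an illustration of step 4, here is a minimal RANSAC estimation of a 4-parameter (similarity: scale, rotation, translation) transform from point matches, written with complex-number arithmetic. The iteration count, threshold and synthetic data are arbitrary assumptions for this sketch:

```python
import random

def estimate_similarity(p1, p2, q1, q2):
    """Similarity transform (scale+rotation s, translation t) mapping
    p1->q1 and p2->q2, seen as complex numbers: q = s*p + t."""
    p1, p2, q1, q2 = complex(*p1), complex(*p2), complex(*q1), complex(*q2)
    s = (q2 - q1) / (p2 - p1)
    return s, q1 - s * p1

def ransac_similarity(matches, n_iter=100, thresh=0.5, seed=0):
    """Return the similarity supported by the most matches, and its inliers.
    matches: list of ((x, y), (x', y')) correspondences."""
    rng = random.Random(seed)
    best, best_inliers = None, []
    for _ in range(n_iter):
        (p1, q1), (p2, q2) = rng.sample(matches, 2)
        if p1 == p2:
            continue  # degenerate sample
        s, t = estimate_similarity(p1, p2, q1, q2)
        inliers = [(p, q) for p, q in matches
                   if abs(s * complex(*p) + t - complex(*q)) < thresh]
        if len(inliers) > len(best_inliers):
            best, best_inliers = (s, t), inliers
    return best, best_inliers

# Synthetic example: rotation by 90 degrees plus translation (10, 0),
# i.e. (x, y) -> (10 - y, x), with one outlier match appended.
pts = [(0, 0), (1, 0), (0, 1), (2, 2)]
matches = [(p, (10 - p[1], p[0])) for p in pts] + [((5, 5), (0, 0))]
(s, t), inliers = ransac_similarity(matches)
print(len(inliers))  # 4: the outlier is rejected
```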

The details of the technique can be found in the publication in 2012 CIFED: Recognition and Extraction of identity documents (in French).

 

Face detection


Face detection is widely used in video analysis and natural images. It can also be used in the recognition of identity documents.

The most famous method is the one developed by Viola and Jones in 2001 to detect the presence of a face in an image in real time.

Viola et Jones

The technique uses pseudo-Haar (Haar-like) features. It consists in defining adjacent rectangular areas, then computing the sum of the pixel intensities within each area. The difference between the sums over the black and white rectangles gives the pseudo-Haar feature.

Examples of windows used for processing the pseudo-Haar features[1]
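A sketch of the integral image, which is what makes these features cheap: any rectangle sum costs four lookups, so a two-rectangle pseudo-Haar feature is computed in constant time:

```python
def integral_image(img):
    """ii[y][x] = sum of img over the rectangle [0, x) x [0, y)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the w x h rectangle at (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def haar_two_rect(ii, x, y, w, h):
    """Two-rectangle feature: left half minus right half."""
    return (rect_sum(ii, x, y, w // 2, h)
            - rect_sum(ii, x + w // 2, y, w // 2, h))

img = [[1, 1, 0, 0],
       [1, 1, 0, 0]]
ii = integral_image(img)
print(haar_two_rect(ii, 0, 0, 4, 2))  # 4 - 0 = 4
```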

To determine the presence of a face in an image, Viola and Jones use a supervised classification technique, which requires a large number of training images. The authors classify by boosting, which is also used to select the relevant features.

To go further

The algorithm is implemented in the OpenCV library and can be used easily.

This same method can be used for vehicle or pedestrian recognition. [2]

Bibliography

[1] Viola and Jones (Wikipedia).

[2] Paul Viola, Michael J. Jones, and Daniel Snow. Detecting pedestrians using patterns of motion and appearance. International Journal of Computer Vision, 63(2):153–161, 2005.

EMGU CV

Emgu CV is a wrapper for OpenCV in C#.

It contains almost all the features of OpenCV needed to perform the basic tasks of image analysis and processing. There are also machine learning tools such as SVM, neural networks, naive Bayes and decision trees.

Since version 2.3, the Tesseract OCR engine is also included.

Examples are provided with the library, such as: face detection, SVM, motion detection, pedestrian detection, traffic sign recognition, SURF, and number plate recognition.

http://www.emgu.com/wiki/index.php/Main_Page

 

SIFT and SURF

SIFT interest point matching – image from Wikipedia

The detection of interest points in images is used more and more for many tasks: object recognition, image stitching, 3D modeling, video tracking, etc. The key points extracted from an image are used to characterize it. By comparing the key points of one image with those of another, we can deduce whether common information is present in both images.

Creation of a panorama by stitching images two by two – Wikipedia

Released in 1999, the SIFT descriptor [1] is commonly used for the extraction and description of key points. What made this descriptor successful is its robustness to changes in intensity, scale and rotation. It is based on the difference of Gaussians. The Wikipedia article is very detailed from a scientific point of view.
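The difference of Gaussians can be sketched in 1D: blur the signal at two nearby scales and subtract; structures at the matching scale produce extrema of the response. The σ values and kernel radius below are arbitrary assumptions:

```python
import math

def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian kernel of half-width `radius`."""
    k = [math.exp(-i * i / (2 * sigma * sigma))
         for i in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def convolve(signal, kernel):
    """1D convolution with border clamping."""
    r, n = len(kernel) // 2, len(signal)
    return [sum(kernel[j + r] * signal[min(max(i + j, 0), n - 1)]
                for j in range(-r, r + 1)) for i in range(n)]

def dog(signal, sigma1=1.0, sigma2=1.6, radius=4):
    """Difference of Gaussians: blur at two scales and subtract."""
    g1 = convolve(signal, gaussian_kernel(sigma1, radius))
    g2 = convolve(signal, gaussian_kernel(sigma2, radius))
    return [a - b for a, b in zip(g1, g2)]

# An isolated spike gives the strongest DoG response at its own position.
signal = [0.0] * 5 + [1.0] + [0.0] * 5
response = dog(signal)
print(max(range(len(response)), key=lambda i: response[i]))  # 5
```

SIFT builds a whole pyramid of such differences across octaves and looks for extrema in both space and scale; this 1D version only shows the core operation.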

A little more recently (2006 and 2008), a variant called SURF [2] appeared. The main advantage of this technique is that the extraction of interest points is faster, thanks to the use of integral images. Interest points are neither extracted nor described in exactly the same way as in SIFT. A comparative study [3] highlights the advantages and disadvantages of SIFT, PCA-SIFT and SURF in the following table:

 

Method     Time    Scale   Rotation  Blur    Illumination  Affine
SIFT       normal  best    best      best    normal        good
PCA-SIFT   good    normal  good      normal  good          good
SURF       best    good    normal    good    best          good

SIFT and SURF are implemented in OpenCV and Emgu CV. I use them for logo detection, identity card detection and, more generally, for matching a sub-part of an image with another image.

References :

[1] David G. Lowe, "Object recognition from local scale-invariant features", Proceedings of the International Conference on Computer Vision, vol. 2, 1999, pp. 1150–1157.

[2] Herbert Bay, Tinne Tuytelaars and Luc Van Gool, "SURF: Speeded Up Robust Features", Proceedings of the European Conference on Computer Vision, 2006, pp. 404–417.

[3] Juan L. and Gwun O., "A comparison of SIFT, PCA-SIFT and SURF", International Journal of Image Processing (IJIP), 2010.