Document image recognition

Document image matching

Recognizing a document image can be difficult because the method must be robust to translation, rotation and changes of scale. The documents may also be degraded (noise, stains, cuts, etc.).

Techniques based on interest points, such as SIFT and SURF, are commonly used on natural images (photographs). I worked on an extension of this approach to quickly recognize document models supplied by a user, such as an identity card, a passport or a train ticket.

The method is simple and can be extended to many other kinds of document image. It is divided into four main steps:

  1. Extraction of interest points. (SURF)
  2. Description of points. (SURF)
  3. Matching the current image points with those of the query image. (FLANN)
  4. Estimation of a 4-parameter transformation. (RANSAC)
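
The last step can be illustrated with a minimal sketch, assuming the matches from step 3 are already available as two lists of points. It is not the implementation from the publication: the function names, the tolerance and the synthetic data are all hypothetical. Points are stored as complex numbers, so that a 4-parameter transformation (scale, rotation, two translations) is written q = a * p + b with complex a and b, and two matches are enough to propose a model.

```python
import cmath
import math
import random

def similarity_from_two_pairs(p0, p1, q0, q1):
    """Return complex (a, b) such that q = a * p + b, or None if degenerate."""
    if p0 == p1:
        return None
    a = (q1 - q0) / (p1 - p0)   # encodes scale and rotation
    b = q0 - a * p0             # encodes translation
    return a, b

def ransac_similarity(src, dst, n_iter=200, tol=2.0, seed=0):
    """Keep the 2-point model that agrees with the most matches."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iter):
        i, j = rng.sample(range(len(src)), 2)
        model = similarity_from_two_pairs(src[i], src[j], dst[i], dst[j])
        if model is None:
            continue
        a, b = model
        inliers = [k for k, (p, q) in enumerate(zip(src, dst))
                   if abs(a * p + b - q) <= tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers

# Synthetic matches (assumption): scale 1.5, rotation 30 degrees, shift (40, -10).
true_a = 1.5 * cmath.exp(1j * math.radians(30))
true_b = complex(40, -10)
src = [complex(x, y) for x in range(0, 100, 20) for y in range(0, 100, 25)]
dst = [true_a * p + true_b for p in src]
for k in (3, 8, 15):            # three deliberately wrong matches
    dst[k] += complex(50, 70)

model, inliers = ransac_similarity(src, dst)
scale = abs(model[0])
rotation_deg = math.degrees(cmath.phase(model[0]))
```

The wrong matches are simply left out of the inlier set, which is why RANSAC suits descriptor matching, where some correspondences are always false.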

The technological choices in brackets may be replaced in the future by newer algorithms that are more efficient and better suited to the context.

The details of the technique can be found in the CIFED 2012 publication: Recognition and Extraction of identity documents (in French).


Document image deskew

A useful preprocessing step for document image analysis is to detect the orientation of the document and then deskew it.

Straight document image and skewed document image (1)

Several methods exist for this. Be aware, however, that most techniques are effective on documents containing text and can be disrupted if photos or ruled lines are present. A simple remedy is to remove the main connected components, or to keep only the components likely to be text.

The two simplest and most commonly used methods are the horizontal projection profile and line detection with the Hough transform. Both are applied to a binarized (black and white) document.

Horizontal projection profile.

The method consists in counting, for each horizontal row of pixels, the number of black pixels. The result is a histogram.
The image is then rotated by a candidate angle and a new histogram is computed.
The histogram with the highest peaks corresponds to a horizontal sheet, from which the rotation angle can be deduced.
Of course, the more angles have to be tested, the longer the method takes.
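
The search over angles can be sketched as follows. This is a minimal illustration with two assumptions that are not in the text above: the coordinates of the black pixels are rotated rather than the image itself, and "highest peaks" is measured by the sum of squared row counts. The function names and the synthetic page are hypothetical.

```python
import math

def profile_score(black_pixels, angle_deg):
    """Sharpness of the horizontal projection profile for a candidate skew.

    Each black pixel (x, y) is rotated back by angle_deg and the row it
    lands in is counted. The sum of squared counts is large when the
    pixels gather in a few rows, i.e. when the histogram has high peaks.
    """
    t = math.radians(angle_deg)
    s, c = math.sin(t), math.cos(t)
    counts = {}
    for x, y in black_pixels:
        row = round(-x * s + y * c)
        counts[row] = counts.get(row, 0) + 1
    return sum(n * n for n in counts.values())

def estimate_skew(black_pixels, candidates):
    """Return the candidate angle (degrees) with the sharpest profile."""
    return max(candidates, key=lambda a: profile_score(black_pixels, a))

# Synthetic page (assumption): four one-pixel "text lines" skewed by 3 degrees,
# in image coordinates with y pointing down.
true_skew = 3
slope = math.tan(math.radians(true_skew))
pixels = [(x, round(y0 + x * slope))
          for y0 in (10, 30, 50, 70) for x in range(200)]

skew = estimate_skew(pixels, range(-10, 11))
```

The cost grows linearly with the number of candidate angles, so a common compromise is a coarse search followed by a finer one around the best coarse angle.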

Projection profile (1)


The Hough transform can be applied to the centers of connected components or to individual pixels. Usually not all image pixels are used, only the black pixels that have a white pixel directly below them, so that the votes come from the bottom edge of each row of characters. For more details on Hough we can refer to this article.
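
A minimal sketch of this variant is given below, assuming a binary image stored as a list of rows with 1 for black and 0 for white. The function names, the angle range and the synthetic image are hypothetical. Each bottom pixel votes for every candidate angle, using the usual normal form rho = x cos(theta) + y sin(theta) with theta = 90 degrees + skew, which simplifies to rho = -x sin(skew) + y cos(skew).

```python
import math

def bottom_pixels(image):
    """Black pixels (1) whose lower neighbour is white (0) or outside the image."""
    h = len(image)
    pts = []
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            if v == 1 and (y + 1 == h or image[y + 1][x] == 0):
                pts.append((x, y))
    return pts

def hough_skew(points, angles_deg):
    """Vote in an (angle, rho) accumulator and return the strongest cell."""
    acc = {}
    for a in angles_deg:
        t = math.radians(a)
        s, c = math.sin(t), math.cos(t)
        for x, y in points:
            key = (a, round(-x * s + y * c))
            acc[key] = acc.get(key, 0) + 1
    (best_angle, best_rho), votes = max(acc.items(), key=lambda kv: kv[1])
    return best_angle, best_rho, votes

# Synthetic image (assumption): one 3-pixel-thick stroke skewed by 2 degrees.
height, width = 40, 120
image = [[0] * width for _ in range(height)]
slope = math.tan(math.radians(2))
for x in range(width):
    base = round(20 + x * slope)
    for y in range(base - 2, base + 1):
        image[y][x] = 1

points = bottom_pixels(image)
angle, rho, votes = hough_skew(points, range(-5, 6))
```

Keeping only the bottom pixels reduces the number of votes to one per column of the stroke, which makes the accumulator both faster to fill and sharper.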

Other techniques

Boris Epshtein [2] from Google published a paper at the ICDAR conference in 2011. It is based on the spaces between text lines.


[1] Document image skew detection: Survey and annotated bibliography, Hull, J.J., Series in Machine Perception and Artificial Intelligence, volume 29, pages 40–66, 1998, World Scientific Publishing.

[2] Determining Document Skew Using Inter-Line Spaces, Epshtein, B., Document Analysis and Recognition (ICDAR), 2011 International Conference on, pages 27–31, 2011, IEEE.

Face detection

Face detection (1)

Face detection is widely used in the analysis of videos and natural images. It can also be used to recognize identity documents.

The best-known method is the one developed by Viola and Jones in 2001 to detect the presence of a face in an image in real time.

Viola and Jones

The technique uses pseudo-Haar (Haar-like) features. Adjacent rectangular areas are defined, and the sum of the pixel intensities in each area is computed. The difference between the sums of the black and white rectangles gives the pseudo-Haar feature.
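
As an illustration, here is a minimal sketch of a two-rectangle feature. It relies on an integral image (summed-area table), which the text above does not describe but which Viola and Jones use so that any rectangle sum costs four lookups. The function names and the tiny test image are hypothetical.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left corner is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_feature(ii, x, y, w, h):
    """Left half minus right half of a w x h window (w must be even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

# 4 x 4 image (assumption): dark left half (10), bright right half (50).
img = [[10, 10, 50, 50] for _ in range(4)]
ii = integral_image(img)
feature = two_rect_feature(ii, 0, 0, 4, 4)   # 8 * 10 - 8 * 50 = -320
```

A large absolute value indicates a strong vertical edge inside the window; a uniform window gives 0.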

Examples of windows used for processing the pseudo-Haar features[1]

To determine whether a face is present in an image, Viola and Jones use supervised classification, which requires a large number of training images. The authors use a boosting classifier, which also serves to select the relevant features.
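
The way boosting selects features can be sketched with plain discrete AdaBoost over threshold classifiers ("stumps"), each looking at a single feature. This is a simplified stand-in and not the exact training procedure of Viola and Jones. The function names and the toy data are hypothetical, and each row of X is assumed to hold feature values already computed for one training image.

```python
import math

def stump_predict(row, f, thr, pol):
    """Predict pol if feature f reaches the threshold, -pol otherwise."""
    return pol if row[f] >= thr else -pol

def best_stump(X, y, w):
    """Return (error, feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for pol in (1, -1):
                err = sum(wi for wi, row, yi in zip(w, X, y)
                          if stump_predict(row, f, thr, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best

def adaboost(X, y, rounds):
    """Train `rounds` stumps; each round re-weights the misclassified samples."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, f, thr, pol = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0) and division by 0
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, thr, pol))
        w = [wi * math.exp(-alpha * yi * stump_predict(row, f, thr, pol))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, row):
    """Weighted vote of all stumps: +1 for "face", -1 for "not a face"."""
    score = sum(a * stump_predict(row, f, thr, pol) for a, f, thr, pol in model)
    return 1 if score >= 0 else -1

# Toy data (assumption): feature 0 is uninformative, feature 1 separates the classes.
X = [[5, 1], [3, 2], [4, 8], [6, 9], [2, 1], [7, 9]]
y = [-1, -1, 1, 1, -1, 1]
model = adaboost(X, y, rounds=2)
selected_features = [f for _, f, _, _ in model]
predictions = [predict(model, row) for row in X]
```

The features that end up in the model are the ones that were picked as the best stump in some round, which is how training doubles as feature selection.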

To go further

The algorithm is implemented in the OpenCV library and can be used easily.

This same method can be used for vehicle or pedestrian recognition. [2]


[1] Viola and Jones (Wikipedia).

[2] Detecting pedestrians using patterns of motion and appearance, Viola, P. and Jones, M.J. and Snow, D., International Journal of Computer Vision, volume 63, number 2, pages 153–161, 2005.


Emgu CV is a C# wrapper for OpenCV.

It exposes almost all OpenCV features for the basic tasks of image analysis and processing. It also provides machine learning tools such as SVM, neural networks, naive Bayes and decision trees.

Since version 2.3, the Tesseract OCR engine is also included.

Examples are shipped with the library, including face detection, SVM, motion detection, pedestrian detection, traffic sign recognition, SURF and number plate recognition.


CIFED 2012

CORIA and CIFED are the meeting points of the French-speaking communities in information retrieval and in the analysis of written and scanned documents. While preserving the specificities of each conference, this edition will be an opportunity for both communities to gather around topics such as multimedia document retrieval, user interaction models, information retrieval systems, performance evaluation tools for information retrieval, etc. More than 120 participants will exchange knowledge during this workshop.

Date: March 21-23, 2012

Location: LABRI, University of Bordeaux 1

For more information: