During the summers of 2012 and 2013 I supervised two master's internships on handwritten digit recognition at Gestform, a digitizing company. The first one covered the basics of reading a handwritten digit in a simple case, and the second one dealt with the segmentation of digits.
Here is an example of what we want to do:
Which parts of the text contain digits?
First, we try to segment the text line by line. Several algorithms can be applied to do that; here a simple one is used: RLSA (Run Length Smoothing Algorithm).
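As a rough illustration of the idea behind RLSA, the sketch below smears a binary image horizontally (1 = ink, 0 = background): any background run no longer than a threshold that lies between two ink pixels is filled, so the characters of a line merge into one blob whose bounding box can then be read off. This is a minimal sketch and not the code used during the internships: the function names, the threshold value and the toy image are assumptions, and only the horizontal pass is shown (the full RLSA also has a vertical pass).

```python
from collections import deque


def rlsa_horizontal(image, threshold):
    """Fill background runs of length <= threshold lying between two ink pixels."""
    smeared = [row[:] for row in image]
    for row in smeared:
        last_ink = None
        for x, value in enumerate(row):
            if value == 1:
                if last_ink is not None and 0 < x - last_ink - 1 <= threshold:
                    for k in range(last_ink + 1, x):
                        row[k] = 1
                last_ink = x
    return smeared


def component_boxes(image):
    """Return (top, left, bottom, right) for each 4-connected ink component."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Toy page: two "text lines" made of isolated ink pixels (hypothetical data).
page = [
    [1, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
]
lines = component_boxes(rlsa_horizontal(page, threshold=2))
```

Without the smearing step the toy page has five separate components; after it, each text line becomes a single component.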
So that the features are extracted consistently from one sample to the next, the digits are normalized before feature extraction: the slope and the angle of each digit are corrected.
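The post does not say which correction method was used. One common way to straighten a slanted digit, shown here purely as an assumption, is a horizontal shear computed from second-order central moments: the shear factor mu11 / mu02 removes the correlation between the x and y coordinates of the ink pixels. The function name `deslant` and the pixel-list representation are hypothetical.

```python
def deslant(pixels):
    """Shear ink pixels horizontally to remove the x/y correlation (the slant).

    `pixels` is a list of (y, x) ink coordinates; the result uses the same format.
    """
    n = len(pixels)
    mean_y = sum(y for y, _ in pixels) / n
    mean_x = sum(x for _, x in pixels) / n
    # Second-order central moments of the ink distribution.
    mu11 = sum((x - mean_x) * (y - mean_y) for y, x in pixels)
    mu02 = sum((y - mean_y) ** 2 for y, _ in pixels)
    if mu02 == 0:
        # All ink lies on one row: there is no slant to measure.
        return list(pixels)
    shear = mu11 / mu02
    return [(y, round(x - shear * (y - mean_y))) for y, x in pixels]


# A stroke leaning at 45 degrees (hypothetical data) becomes a vertical bar.
stroke = [(y, y) for y in range(5)]
straight = deslant(stroke)
```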
Then many features are extracted, such as:
- Hu invariant moments
- Projection histograms
- Profile histograms
- Intersection with horizontal lines
- Position of holes
- Junction points
They are all concatenated into one vector of 124 dimensions. Another vector is built from the Freeman chain code (a histogram of 128 dimensions).
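To make a few of these features concrete, here is a small sketch on a binary glyph (1 = ink). It covers projection histograms, left and right profiles, intersections with horizontal lines, and a plain 8-bin Freeman chain code histogram computed from an already ordered contour. The function names, the toy glyph and its contour are assumptions, and the resulting vectors are far smaller than the 124 and 128 dimensions mentioned above, whose exact layout is not described here.

```python
def projection_histograms(glyph):
    """Ink count per row, followed by ink count per column."""
    rows = [sum(row) for row in glyph]
    cols = [sum(col) for col in zip(*glyph)]
    return rows + cols


def profiles(glyph):
    """Distance from the left edge, then from the right edge, to the first ink pixel."""
    width = len(glyph[0])
    left = [row.index(1) if 1 in row else width for row in glyph]
    right = [row[::-1].index(1) if 1 in row else width for row in glyph]
    return left + right


def horizontal_crossings(glyph):
    """Number of background-to-ink transitions met by a horizontal line on each row."""
    return [sum(1 for a, b in zip([0] + row, row) if a == 0 and b == 1)
            for row in glyph]


# Freeman codes for (dx, dy) steps, with y pointing down: 0 = east, counter-clockwise.
FREEMAN = {(1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
           (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7}


def chain_code_histogram(contour):
    """Histogram of the 8 Freeman directions along a closed contour of (x, y) points."""
    hist = [0] * 8
    for (x0, y0), (x1, y1) in zip(contour, contour[1:] + contour[:1]):
        hist[FREEMAN[(x1 - x0, y1 - y0)]] += 1
    return hist


# Toy ring-shaped glyph and its contour, listed in order (hypothetical data).
glyph = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
contour = [(1, 0), (2, 0), (3, 1), (2, 2), (1, 2), (0, 1)]

feature_vector = projection_histograms(glyph) + profiles(glyph) + horizontal_crossings(glyph)
chain_vector = chain_code_histogram(contour)
```

Concatenating the individual lists mirrors how the separate features end up in a single input vector for the classifier.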
After feature extraction, two neural networks are used to classify each connected component as digit or non-digit. The first one has 124 inputs and the second one 128; each has 2 outputs: D (digit) or R (reject, i.e. non-digit). Many examples are needed to train the classifiers (around 10,000 for each class).
Reading the digits
Here, the same features and neural networks are used, but instead of 2 classes (digit / non-digit) there are 10 classes (0, 1, 2, 3, 4, 5, 6, 7, 8, 9).
You can download training examples here. Some examples of 6 digits:
Correcting some classification errors
Many digits touch their neighbours and are classified as R, so we introduced a new class, “DD” (double digit). Furthermore, the sequence of labels can be used to correct some errors. For example, when looking for a 5-digit postal code, a result such as RRDDDRDRR can be changed to RRDDDDDRR, and noise can be filtered out: RRRRRRDRRRR -> RRRRRRRRRRR. An HMM (hidden Markov model) is used to do this. Here is an HMM designed for postal codes, with D (digit), DD (double digit) and R (reject / non-digit) classes:
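The correction step can be illustrated with a much simpler model than the one in the figure. The sketch below is a left-to-right HMM for a line that contains exactly five consecutive digits, decoded with the Viterbi algorithm. It is an assumption-laden toy: the state names, the 0.8 / 0.2 emission probabilities and the 0.5 transition probabilities are made up, the DD class is left out for brevity, and it only covers the postal code case, not the pure noise-filtering one.

```python
import math

# Leading noise, five digit positions, trailing noise.
STATES = ["R_pre", "D1", "D2", "D3", "D4", "D5", "R_post"]
P_CORRECT = 0.8  # hypothetical probability that the classifier label is right


def log(p):
    return math.log(p) if p > 0 else float("-inf")


def transition(i, j):
    """Left-to-right topology: rejects may repeat, digits must advance one by one."""
    last = len(STATES) - 1
    if i == 0:
        return 0.5 if j in (0, 1) else 0.0
    if i == last:
        return 1.0 if j == last else 0.0
    return 1.0 if j == i + 1 else 0.0


def emission(i, label):
    expected = "D" if STATES[i].startswith("D") else "R"
    return P_CORRECT if label == expected else 1.0 - P_CORRECT


def correct_sequence(labels):
    """Return the most likely D/R sequence containing exactly five consecutive digits."""
    n = len(STATES)
    # A line may start with noise or directly with the first digit.
    score = [log(0.5 if i in (0, 1) else 0.0) + log(emission(i, labels[0]))
             for i in range(n)]
    back = []
    for label in labels[1:]:
        new_score, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + log(transition(i, j)))
            new_score.append(score[best_i] + log(transition(best_i, j))
                             + log(emission(j, label)))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    # The code must be complete: end on the fifth digit or in trailing noise.
    state = max((n - 2, n - 1), key=lambda i: score[i])
    if score[state] == float("-inf"):
        raise ValueError("sequence too short to contain five digits")
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return "".join("D" if STATES[s].startswith("D") else "R" for s in path)


corrected = correct_sequence("RRDDDRDRR")
```

On the example from the text, the isolated R inside the run of digits is relabelled, because every other placement of five consecutive digits disagrees with more classifier outputs.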
Double digits segmentation
In order to do double digits segmentation, the “drop fall” algorithm is used.
The drop fall algorithm can be pictured as a drop of water sliding along the contours of the digits. Four drop falls are possible, depending on whether the starting point is at the top left, top right, bottom right or bottom left. Then, to choose the best segmentation, each resulting pair of digits is recognized by the neural network, and the pair with the best recognition score is kept.
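A simplified sketch of one of the four variants (a drop released at the top and falling downwards) is given below. At each step the drop tries to move down, then down-right, down-left, right and left through background pixels; when it is stuck in a well it cuts straight down through the ink. This is only an illustration under assumptions: the move priorities follow the usual textbook description, the `visited` set is an addition to prevent oscillation, and the starting column `start_x` is passed in by hand rather than detected. The other three variants could be obtained, for instance, by running the same function on a mirrored or flipped image.

```python
def drop_fall_top_left(image, start_x):
    """Return the cut column of each row for a drop released at row 0, column start_x."""
    height, width = len(image), len(image[0])

    def is_free(y, x):
        return 0 <= y < height and 0 <= x < width and image[y][x] == 0

    y, x = 0, start_x
    visited = {(y, x)}
    cut = [start_x] * height
    while y < height - 1:
        # Candidate moves in priority order: down, down-right, down-left, right, left.
        moves = [(y + 1, x), (y + 1, x + 1), (y + 1, x - 1), (y, x + 1), (y, x - 1)]
        for ny, nx in moves:
            if is_free(ny, nx) and (ny, nx) not in visited:
                break
        else:
            # The drop is stuck in a well: cut straight down through the ink.
            ny, nx = y + 1, x
        y, x = ny, nx
        visited.add((y, x))
        cut[y] = x  # remember the last column visited on this row
    return cut


def split(image, cut):
    """Split the image along the cut: ink left of the path, and ink on or right of it."""
    left = [[v if x < cut[y] else 0 for x, v in enumerate(row)]
            for y, row in enumerate(image)]
    right = [[v if x >= cut[y] else 0 for x, v in enumerate(row)]
             for y, row in enumerate(image)]
    return left, right


# Two touching shapes (hypothetical data, 1 = ink).
touching = [
    [1, 0, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [1, 1, 1, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1],
]
cut = drop_fall_top_left(touching, start_x=1)
left_part, right_part = split(touching, cut)
```

Each of the four variants yields one such pair of sub-images; both halves would then be passed to the digit classifier so that the pair with the best score can be kept.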