Using Flickr images for 3D reconstruction

At the 2013 Electronic Imaging symposium, Steve Seitz, from the University of Washington and Google, gave a very interesting keynote entitled “A Trillion Photos”.

The principle is to exploit the millions of images found in databases such as Flickr. The aim of the Building Rome in a Day project is to harvest images from Flickr by simply typing keywords such as “Rome” or “Venice”. Many images are unusable because they cannot be matched with any other image – pictures of a restaurant, of a family, etc. On the other hand, the most touristic spots, such as San Marco, are photographed from many different angles. By using a standard processing pipeline such as SIFT + RANSAC + FLANN, it is possible to match the images and then to perform the 3D reconstruction.
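As an illustration of the robust-matching step, here is a minimal pure-Python sketch of the RANSAC idea: estimating a simple 2D translation between two sets of matched keypoints despite outliers. Real pipelines estimate a homography or fundamental matrix from SIFT descriptors matched with FLANN; the function name and the toy translation model here are my own simplifications.

```python
import random

def ransac_translation(matches, iters=200, tol=2.0, seed=0):
    """Estimate a 2D translation (dx, dy) from point matches
    [( (px, py), (qx, qy) ), ...], tolerating outliers, RANSAC-style."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        # a translation needs only one match as the minimal sample
        (px, py), (qx, qy) = rng.choice(matches)
        dx, dy = qx - px, qy - py
        inliers = [m for m in matches
                   if abs((m[1][0] - m[0][0]) - dx) < tol
                   and abs((m[1][1] - m[0][1]) - dy) < tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (dx, dy), inliers
    return best_model, best_inliers
```

With ten matches translated by (10, 5) plus two gross outliers, the estimate (10, 5) is recovered and the outliers are rejected, which is exactly why RANSAC is used before reconstruction.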

In this video, the pyramids represent the estimated shooting positions. This reconstruction was made using 14,079 pictures. The reconstruction of Venice was made using 250,000 images and 496 computing cores: 27 hours were necessary for matching and 38 hours for reconstruction.

Understanding how bags of visual words work

For about ten years [1], bags of visual words (also called bags of features or bags of keypoints) have been widely used in the computer vision community for image classification and recognition.

Computing the similarity between two pictures is complicated because an image contains a very large number of pixels. Usually, scientists extract features such as color, shape or texture in order to compare images. One difficulty with the standard techniques is computing features that are robust to rotation, zoom, illumination changes, noise and occlusion. Another difficulty is that most techniques need to segment the object before describing it.

Interest points (or keypoints) such as SIFT, SURF, etc. solve most of these problems: they are robust and do not require any segmentation, so they are very easy to use. Extracting interest points to compare images is a good idea. After extracting the points, there are mainly two options: 1) matching the points of one image with the points of another image in order to do stitching, object recognition and localization, or 2) making a statistical description of the images by counting the different “kinds” of keypoints contained in each image; this is the Bags of Visual Words (BoVW) technique. BoVW is used for image classification.

How do bags of visual words work?

Here is the principle in 4 simple steps:

  1. Extract the keypoints of the images. You can use SURF to do this.
  2. Create a visual dictionary by clustering all the keypoints. You can use k-means and fix k between 200 and 2,000, for example 1,000.
  3. For each image, check which cluster each keypoint falls in. You then build a histogram with 1,000 bins, where each bin corresponds to a cluster; the value of a bin is the number of keypoints of the image that belong to the corresponding cluster.
  4. Each image is now described by a vector, so you can do supervised classification, for example with an SVM.
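Steps 2 and 3 above can be sketched in a few lines of plain Python, with a toy k-means on 2D points standing in for 64- or 128-dimensional SURF descriptors (function names are mine; a real system would use a library implementation):

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: returns k centroids, i.e. the visual words."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # assign each point to its nearest centroid
            i = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[i].append(p)
        for i, cl in enumerate(clusters):  # move centroids to cluster means
            if cl:
                centroids[i] = tuple(sum(x) / len(cl) for x in zip(*cl))
    return centroids

def bovw_histogram(descriptors, centroids):
    """Step 3: count how many of an image's keypoints fall in each cluster."""
    hist = [0] * len(centroids)
    for d in descriptors:
        hist[min(range(len(centroids)), key=lambda c: dist2(d, centroids[c]))] += 1
    return hist
```

The resulting histogram is the fixed-length vector of step 4, ready to be fed to an SVM.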


[1] Csurka, G., Dance, C., Fan, L., Willamowski, J., & Bray, C. (2004, May). Visual categorization with bags of keypoints. In Workshop on statistical learning in computer vision, ECCV (Vol. 1, No. 1-22, pp. 1-2).

Handwritten digits recognition based on neural networks

During the summers of 2012 and 2013 I supervised two master's internships about handwritten digit recognition at the Gestform digitizing company. The first one was about the basic notions for reading a handwritten digit in a simple case, and the second one was about the segmentation of digits.

Here is an example of what we want to do:

Two things are done here: 1) detecting which parts of the text contain digits, and then 2) reading the digits. Almost the same tools are used for parts 1 and 2.

Which parts of the text contain digits?

First, we try to segment the text line by line. Several algorithms can be applied to do that; here a simple one is used: RLSA (Run-Length Smoothing Algorithm).
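The idea of a horizontal RLSA pass is easy to sketch: background gaps shorter than a threshold between two ink pixels are filled, so the characters of a line merge into one long component. Here is a minimal one-row version, where a row is a list of 0/1 pixels (1 = ink):

```python
def rlsa_row(row, threshold):
    """Horizontal RLSA: fill background (0) runs shorter than `threshold`
    that lie between two ink pixels (1), merging nearby glyphs."""
    out = row[:]
    n = len(row)
    i = 0
    while i < n:
        if row[i] == 0:
            j = i
            while j < n and row[j] == 0:
                j += 1
            # fill only interior gaps (ink on both sides) that are short
            if 0 < i and j < n and (j - i) < threshold:
                out[i:j] = [1] * (j - i)
            i = j
        else:
            i += 1
    return out
```

Applied row by row (and then column by column with a second threshold, in the full algorithm), this produces the black bands visible in the illustration, one per text line.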

Then each connected component is extracted and analyzed in order to classify it as digit or non-digit.

As we can see, before extracting connected components some preprocessing should be done, such as mathematical morphology (dilation and erosion), because some parts of the digits are cut.

In order to extract the features homogeneously, some preprocessing is applied before feature extraction so as to normalize the digits: correcting the slope and the slant of each digit.

Then many features are extracted, such as:

  • Hu invariant moments
  • Projection histograms
  • Profile histograms
  • Intersection with horizontal lines
  • Position of holes
  • Extremities
  • Junction points

Profile histogram


Projection histogram


Intersection with horizontal and vertical lines
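For example, the projection histograms and profile features can be sketched as follows on a small binary image (1 = ink; the function names are mine):

```python
def projection_histograms(img):
    """Row and column projection histograms: ink-pixel count per row / column."""
    rows = [sum(r) for r in img]
    cols = [sum(c) for c in zip(*img)]
    return rows, cols

def left_profile(img):
    """Left profile: for each row, the distance from the left edge to the
    first ink pixel (the full row width if the row is empty)."""
    prof = []
    for r in img:
        prof.append(next((i for i, v in enumerate(r) if v), len(r)))
    return prof
```

The right, top and bottom profiles are obtained the same way after mirroring or transposing the image.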

They are all concatenated into one vector of 124 dimensions. Another vector is built from the Freeman chain code (a histogram of 128 dimensions).


Freeman chain code
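Here is a minimal sketch of the Freeman chain code, assuming an 8-connected boundary given as a list of pixel coordinates with y growing downwards (direction 0 = east, numbered counter-clockwise; this coordinate convention is an assumption of the sketch):

```python
# 8-connected moves, Freeman numbering: 0 = east, counter-clockwise,
# with the y axis pointing down (image convention).
DIRS = {(1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
        (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7}

def freeman_chain(boundary):
    """Chain code of a boundary given as successive 8-connected pixels."""
    return [DIRS[(x2 - x1, y2 - y1)]
            for (x1, y1), (x2, y2) in zip(boundary, boundary[1:])]

def chain_histogram(chain, bins=8):
    """Histogram of the chain code directions."""
    hist = [0] * bins
    for c in chain:
        hist[c] += 1
    return hist
```

The descriptor used above is a histogram of such codes, which is why its dimension is a multiple of 8.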

After extracting the features, two neural networks are used in order to classify each connected component as digit or non-digit. The first one has 124 inputs and the second one 128; each has 2 outputs: D (digit) or R (reject, i.e. non-digit). Many examples have to be used in order to train the classifiers (around 10,000 for each class).


Reading the digits

Here, the same features and neural networks are used, but instead of 2 classes (digit / non-digit), 10 classes are used (0, 1, 2, 3, 4, 5, 6, 7, 8, 9).

You can download training examples here. Some examples of 6 digits:



Correcting some classification errors

Many digits touch other digits and are classified as R, so we introduced a new class, “DD” (double digit). Furthermore, by using the sequences it is possible to correct some errors. For example, if you are looking for a 5-digit postal code, it is possible to change a result such as RRDDDRDRR into RRDDDDDRR, or to filter noise: RRRRRRDRRRR -> RRRRRRRRRRR. An HMM is used to do this. Here is an HMM designed for postal codes with the D (digit), DD (double digit) and R (reject / non-digit) classes:
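As a much simpler stand-in for the HMM (not the method used here), a sliding-window heuristic already corrects the two examples above: pick the 5-label window containing the most D's, or reject everything when there are too few D's overall (the minimum of 3 D's below is an arbitrary choice for this sketch, and DD labels are ignored for simplicity):

```python
def correct_postal(labels, length=5):
    """Keep exactly one run of `length` digit labels: choose the window that
    already holds the most D's, relabel it all-D and the rest R. Sequences
    with too few D's are treated as noise and fully rejected."""
    n = len(labels)
    if labels.count('D') < 3 or n < length:
        return ['R'] * n
    best = max(range(n - length + 1),
               key=lambda i: labels[i:i + length].count('D'))
    return ['R'] * best + ['D'] * length + ['R'] * (n - best - length)
```

The HMM does the same job more robustly, because it also weighs the classifier's confidence in each label instead of trusting the hard decisions.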


Double digits segmentation

In order to do double digit segmentation, the “drop fall” algorithm is used.

The drop fall algorithm can be seen as a drop of water sliding along the digits. Four drop falls can be computed, depending on whether the starting point is set top/left, top/right, bottom/right or bottom/left. Then, in order to choose the best segmentation, the resulting digits are recognized by the neural network and the pair of digits with the best recognition rate is kept.
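Here is a minimal sketch of a top-left drop fall on a binary image (1 = ink), using one common movement convention: prefer straight down, then down-left, then down-right, and cut through the stroke when all three are blocked. Actual drop-fall variants differ in their movement rules, so this is an illustration rather than the exact algorithm used:

```python
def drop_fall(img, start_col):
    """Trace a top-left drop-fall cut path from (0, start_col).
    img is a list of rows of 0/1 pixels (1 = ink)."""
    h, w = len(img), len(img[0])
    r, c = 0, start_col
    path = [(r, c)]
    while r < h - 1:
        # preference order: down, down-left, down-right
        for nr, nc in ((r + 1, c), (r + 1, c - 1), (r + 1, c + 1)):
            if 0 <= nc < w and img[nr][nc] == 0:
                r, c = nr, nc
                break
        else:
            r += 1  # blocked below on all sides: cut through the ink
        path.append((r, c))
    return path
```

Splitting the component along this path yields two candidate digits; running all four start corners and keeping the pair the neural network recognizes best implements the selection step described above.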



How to create a word cloud from text files with R

Here is the result of word cloud creation applied to 100 scientific papers from the ICDAR 2013 conference:

Text mining is a useful tool for getting an overview of the subjects or important words in a text collection such as websites, books, articles, etc.

Creating a word cloud from text files with R is easy. The first thing to do is to install R and two packages: “tm” and “wordcloud” (these packages may need other packages; just follow the R instructions). Then, put all the text files you want to analyze in the same directory, and run the following code in R:


# Loading libraries
library(tm)
library(wordcloud)

# Define the folder where the text files are
a <- Corpus(DirSource("C:/MyPath/FolderContaining/TxtFiles"), readerControl = list(language="lat"))

# Preprocessing text
a <- tm_map(a, removeNumbers) # Not necessary if numbers are important for you
a <- tm_map(a, removePunctuation)
a <- tm_map(a, stripWhitespace)
a <- tm_map(a, content_transformer(tolower)) # recent versions of tm need content_transformer() for base functions
# Stopwords are words such as "we", "the", "and", "so", etc. You can add your own words to the list
a <- tm_map(a, removeWords, c(stopwords("english"), "can", "also", "may"))
# a <- tm_map(a, stemDocument, language = "english") # You can also do stemming if you want

# Computing the term document matrix
tdm <- TermDocumentMatrix(a)

# Transforming data for wordcloud
m <- as.matrix(tdm)
v <- sort(rowSums(m), decreasing=TRUE)
myNames <- names(v)
d <- data.frame(word=myNames, freq=v)

# Making and displaying the cloud
wordcloud(d$word, d$freq, min.freq=150)

Printing electrical circuits at home

Researchers from the University of Tokyo, Georgia Tech and Microsoft Research have realized a dream of electronics engineers!

They published today [1], at the UbiComp conference, a new paper showing the possibility of printing electrical circuits using a home inkjet printer with a special ink developed by Mitsubishi.

Many applications, such as RFID tags, sensors and PCBs, can be derived from this technology.

[1] Yoshihiro Kawahara, Steve Hodges, Benjamin Cook, Cheng Zhang, and Gregory Abowd. Instant Inkjet Circuits: Lab-based Inkjet Printing to Support Rapid Prototyping of UbiComp Devices. In Proceedings of UbiComp 2013, ACM, September 2013.