Compact Discrete Representations for Scalable Similarity Search

Thursday November 19, 2015 at 1:00 p.m. Mohammad Norouzi, PhD candidate in computer science at the University of Toronto, will be presenting “Compact Discrete Representations for Scalable Similarity Search”.

Speaker: Mohammad Norouzi
PhD Candidate

Day & Time: Thursday, November 19, 2015
1:00 p.m. – 2:00 p.m.

Location: Room ENG 106
George Vari Engineering and Computing Centre
Ryerson University
245 Church Street

Organizer: IEEE Toronto Computer, Magnetics and Instrument-Measurement Chapters

Contact: Maryam Davoudpour,

Abstract: Scalable similarity search on images, documents, and user activities benefits generic search, data visualization, and recommendation systems. This talk concerns the design of algorithms and machine learning tools for faster and more accurate similarity search. The proposed techniques advocate the use of discrete codes for representing the similarity structure of data in a compact way. In particular, I will discuss how one can learn to map high-dimensional data onto binary codes with a metric learning approach. Then, I will describe a simple algorithm for fast exact nearest neighbour search in Hamming distance, which exhibits sub-linear query time performance. Going beyond binary codes, I will highlight a compositional generalization of k-means clustering which maps data points onto integer codes with storage and search costs that grow sub-linearly in the number of cluster centers. This representation improves upon binary codes, and provides an even more precise approximation of Euclidean distance. Experimental results are reported on multiple datasets including a dataset of SIFT descriptors with 1B entries.

Biography: Mohammad Norouzi is a PhD candidate in computer science at the University of Toronto. His research lies at the intersection of machine learning and computer vision. He is a recipient of a Google US/Canada PhD fellowship in machine learning. He is going to join Google as a research scientist in January 2016.