Seminar: Locality Sensitive Hashing for Big Data

                 Department of Systems Engineering and Engineering Management
                              The Chinese University of Hong Kong
Date: Friday, September 26, 2014, 16:30 - 17:30
Speaker: Prof. Wei Wang, School of Computer Science and Engineering, The University of New South Wales, Australia.
Title: Locality Sensitive Hashing for Big Data
Finding (approximate) nearest neighbor is a fundamental problem in computational
geometry, and plays an important role in many database, multimedia, and machine
learning problems, as real-world objects are usually modelled or represented as
one or several high-dimensional feature vectors. The problem is notoriously hard
due to the phenomenon named "curse of dimensionality", and it becomes even more
challenging in the big data era, calling for new ideas and methodologies.
While there are numerous heuristic methods to solve this problem, methods based
on Locality Sensitive Hashing (LSH) are arguble the most popular and promising
approach as they have both rigorous theoretical guarantees on space and time
complexities and excellent query performance in practice. However, to achieve
such performance, LSH-based methods require huge amount of index. Recent work
reduces the index size from O((nd)^{1.5}) to O(n log(n)) (n is the number of
data points and d is the dimensionality), but the index is still typically an
order of magnitude larger than the size of the data.
In this talk, we give a brief survey of existing approaches to attack the
problem and then focus on our recent breakthrough in this problem. We propose a
novel method to solve the problem with theoretical guarantees and with a tiny
index which is linear in n. Furthermore, our method supports a variety of
functionalities, such as finding the exact nearest neighbors with guaranteed
probabilities. In the experiment, our methods demonstrate superior performance
against the state-of-the-art LSH-based methods, and scale up well to a billion
128-dimensional points on a single commodity PC.
Short Biography:
Dr. Wei Wang is an Associate Professor in the School of Computer Science and
Engineering, The University of New South Wales, Australia. His current research
interests include keyword search on (semi-)structured data, similarity query
processing, high dimensional indexing, and spatial databases. He has published
over ninety research papers, with many in premier database journals (TODS, VLDB
J, and TKDE) and conferences (SIGMOD, VLDB, ICDE, and WWW). He is currently
serving on the editorial boards of IEEE Transactions on Knowledge and Data
Engineering (TKDE).
Everyone is welcome to attend the talk! All the year one MPhil/PhD students need to attend the talk.
Venue: Room 513,
       William M.W. Mong Engineering Building (ERB),
       (Engineering Building Complex Phase 2)
       The Chinese University of Hong Kong.
The talk will be hosted by:
Prof. Hong Cheng,
Department of Systems Engineering and Engineering Management,
The Chinese University of Hong Kong,
SEEM-5201 Website:
Friday, September 26, 2014 - 08:30 to 09:30