Title: Labeling Massive Data from Noisy, Incomplete and Crowdsourced Annotations
Seminar
Department of Systems Engineering and Engineering Management
The Chinese University of Hong Kong
----------------------------------------------------------------------------------------------------------
Speaker: Prof. Xiao Fu
School of Electrical Engineering and Computer Science
Oregon State University
Abstract: Labeled data has been in high demand in the era of deep learning. Crowdsourcing techniques aim to produce accurate labels by effectively integrating noisy, non-expert annotations from multiple annotators. The (arguably) most notable method from the statistical machine learning community is an expectation maximization (EM) algorithm proposed by Dawid and Skene in 1979. However, theoretical understanding of the Dawid-Skene approach is still limited. Recently, elegant tensor decomposition-based methods were proposed to establish identifiability of the Dawid-Skene model. One challenge is that tensor methods may suffer from high sample complexity, since they involve third-order statistics of the annotator responses, which are hard to estimate accurately with limited annotations. In addition, tensor decomposition-based methods are often associated with challenging computational problems.
In this talk, I will introduce a simple algebraic algorithm that can efficiently solve large-scale crowdsourcing problems under the Dawid-Skene model with provable guarantees. Our approach uses second-order statistics of the annotator responses, and thus naturally enjoys much lower sample complexity than the tensor methods. I will also introduce a coupled matrix decomposition-based algorithm that enhances the performance of the algebraic algorithm under more challenging scenarios. Experiments show that the proposed algorithms outperform state-of-the-art algorithms under a variety of scenarios. Extensions beyond the Dawid-Skene model will also be briefly touched upon.
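For readers unfamiliar with the setting, the following is a minimal sketch of what "second-order statistics of the annotator responses" can mean under the Dawid-Skene model. It is not the speaker's algorithm. It only simulates two annotators and checks that their empirical co-occurrence matrix approaches the model's prediction, sum over k of prior[k] * P(annotator m answers i | k) * P(annotator l answers j | k). All names (`sample_dawid_skene`, `empirical_cooccurrence`, `model_cooccurrence`) and the numeric values are illustrative assumptions.

```python
import random


def sample_dawid_skene(prior, confusions, n_items, rng):
    """Draw annotator responses under the Dawid-Skene model.

    prior[k]            : P(true label = k)
    confusions[m][k][i] : P(annotator m answers i | true label = k)
    Annotators respond independently given the true label.
    Returns one tuple of responses per item (one entry per annotator).
    """
    classes = list(range(len(prior)))
    responses = []
    for _ in range(n_items):
        k = rng.choices(classes, weights=prior)[0]  # hidden true label
        responses.append(tuple(
            rng.choices(classes, weights=conf[k])[0] for conf in confusions
        ))
    return responses


def empirical_cooccurrence(responses, m, l, n_classes):
    """Estimate P(annotator m answers i, annotator l answers j) from data."""
    counts = [[0.0] * n_classes for _ in range(n_classes)]
    for r in responses:
        counts[r[m]][r[l]] += 1.0
    n = len(responses)
    return [[c / n for c in row] for row in counts]


def model_cooccurrence(prior, conf_m, conf_l):
    """Second-order statistic implied by the model for annotators m and l."""
    n_classes = len(prior)
    return [
        [
            sum(prior[k] * conf_m[k][i] * conf_l[k][j] for k in range(n_classes))
            for j in range(n_classes)
        ]
        for i in range(n_classes)
    ]


# Illustrative (made-up) two-class example with two annotators.
prior = [0.6, 0.4]
confusions = [
    [[0.9, 0.1], [0.2, 0.8]],  # annotator 0
    [[0.7, 0.3], [0.3, 0.7]],  # annotator 1
]
rng = random.Random(0)
responses = sample_dawid_skene(prior, confusions, 20000, rng)
R_hat = empirical_cooccurrence(responses, 0, 1, 2)
R_model = model_cooccurrence(prior, confusions[0], confusions[1])
```

Each pairwise matrix has only K x K entries to estimate, compared with K x K x K for a third-order statistic over three annotators, which is the intuition behind the lower sample complexity mentioned in the abstract.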
Date: Friday, April 17, 2020, 11:30