Georgios Giannakis

University of Minnesota

[intermediate/advanced] Learning from Unreliable Labels via Crowdsourcing

Summary

Crowdsourcing has emerged as a powerful paradigm for tackling machine learning, data mining, and data science tasks by enlisting inexpensive crowds of human workers, or annotators, to accomplish a given learning or inference task. While conceptually similar to distributed data and decision fusion, crowdsourcing seeks not only to aggregate information from multiple human annotators or unreliable (a.k.a. weak) sources, but also to assess their reliabilities. Crowdsourcing can thus be readily adapted to information fusion tasks in unknown or contested environments, where data may be provided by unreliable and even adversarial agents. The overarching goal of this tutorial is to present a unifying framework for learning from unreliable (or “weak”) information sources while remaining resilient to adversarial attacks. Focusing on classification, the exposition will start with classical tools for crowdsourced label aggregation that simultaneously infer annotator reliabilities and true labels. Contemporary methods that leverage the statistical moments of annotator responses will be presented next. Building on these models, a host of approaches that account for data dependencies, including dynamic and networked data, will be presented, along with Gaussian process- and deep learning-based tools. Finally, approaches that can identify coalitions of colluding adversaries will be presented. The impact of the unified framework will be demonstrated through extensive tests on synthetic and real data.
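As a concrete illustration of the classical label-aggregation tools mentioned above, which jointly infer annotator reliabilities and true labels, here is a minimal sketch of expectation-maximization (EM) for the Dawid-Skene model (Dawid and Skene, 1979, in the references). It assumes that every annotator labels every item with a class index in {0, ..., K-1}. The function name `dawid_skene_em`, the majority-vote initialization, and the smoothing constant `eps` are illustrative choices, not code from the tutorial.

```python
import numpy as np


def dawid_skene_em(labels, num_classes, num_iters=50, eps=1e-6):
    """Minimal Dawid-Skene EM sketch (hypothetical helper, not the tutorial's code).

    labels: (num_items, num_annotators) int array with entries in
            {0, ..., num_classes - 1}; assumes every annotator labels every item.
    Returns (posteriors, priors, confusions), where confusions[m, k, l]
    estimates P(annotator m answers l | true class k).
    """
    labels = np.asarray(labels)
    # One-hot responses: onehot[i, m, l] = 1 if annotator m gave label l to item i.
    onehot = np.eye(num_classes)[labels]

    # Initialize item posteriors with soft majority voting.
    post = onehot.sum(axis=1)
    post /= post.sum(axis=1, keepdims=True)

    for _ in range(num_iters):
        # M-step: class priors and per-annotator confusion matrices.
        priors = post.mean(axis=0)
        conf = np.einsum("ik,iml->mkl", post, onehot) + eps
        conf /= conf.sum(axis=2, keepdims=True)

        # E-step: class log-posteriors given all responses, assuming annotators
        # err independently of each other given the true class.
        log_post = np.log(priors + eps) + np.einsum(
            "iml,mkl->ik", onehot, np.log(conf)
        )
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)

    return post, priors, conf
```

Initializing with majority voting typically keeps EM away from label-permuted solutions, while `eps` guards against taking the logarithm of zero when an annotator never uses some label. The aggregated labels are the row-wise argmax of the returned posteriors.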

Syllabus

Introduction: Context, motivation, and timeliness. (20 mins)
Crowdsourced classification: (45 mins)
- Annotator models for label aggregation
- Probabilistic and Bayesian algorithms for label aggregation
- Moment-matching approaches
Data-aware crowdsourcing: (30 mins)
- Models for sequential and networked data
- Gaussian process and deep learning algorithms
Adversarially robust crowdsourcing: (45 mins)
- Identifying spammer annotators
- Identifying arbitrary adversaries
Open issues: Challenging yet promising research directions. (15 mins)
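To convey the idea behind the moment-matching approaches in the syllabus, the sketch below uses the simplest binary setting. It relies on a standard fact: if annotators err independently given the true label y in {-1, +1}, then for annotators m ≠ n with responses f_m, f_n in {-1, +1}, Cov(f_m, f_n) = (1 - b^2) v_m v_n, where b = 2P(y = +1) - 1 and v_m is annotator m's sensitivity plus specificity minus one. The off-diagonal part of the response covariance is therefore rank one, and its leading eigenvector ranks annotators by reliability. The helper names `spectral_binary_weights` and `spectral_vote`, the iterative diagonal-filling step, and the assumption that most annotators beat chance (used to fix the eigenvector sign) are illustrative; this is not the multiclass method of the tutorial (e.g., Traganitis et al., 2018).

```python
import numpy as np


def spectral_binary_weights(responses, num_iters=100):
    """Hypothetical helper: annotator weights from second-order moments.

    responses: (num_annotators, num_items) array with entries in {-1, +1}.
    Assumes conditionally independent annotators, most better than chance.
    Returns weights proportional to the leading eigenvector of a rank-one
    fit to the off-diagonal entries of the response covariance.
    """
    R = np.cov(np.asarray(responses, dtype=float))  # rows are annotators
    off_diag = ~np.eye(R.shape[0], dtype=bool)
    Q = R.copy()
    for _ in range(max(1, num_iters)):
        # The diagonal of R is not rank one, so repeatedly replace it with the
        # diagonal of the current best rank-one approximation.
        eigvals, eigvecs = np.linalg.eigh(Q)  # eigenvalues in ascending order
        lam, u = eigvals[-1], eigvecs[:, -1]
        Q = np.where(off_diag, R, lam * np.outer(u, u))
    # Eigenvectors are sign-ambiguous: pick the sign making most weights positive.
    if np.sum(u > 0) < np.sum(u < 0):
        u = -u
    return u


def spectral_vote(responses, weights):
    """Weighted vote: sign of the weighted sum of annotator responses."""
    scores = weights @ np.asarray(responses, dtype=float)
    return np.where(scores >= 0, 1, -1)
```

Because only pairwise statistics enter the fit, no ground-truth labels are needed; annotators worse than chance receive negative weights, so their votes are effectively flipped. The eigenvector-weighted vote is a simple heuristic rather than the Bayes-optimal combination.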

References

S. Basu, I. Davidson, and K. Wagstaff, Constrained Clustering: Advances in Algorithms, Theory, and Applications. Chapman & Hall/CRC, 2008.

J. Besag, “On the statistical analysis of dirty pictures,” Journal of the Royal Statistical Society B, vol. 48, no. 3, pp. 259–302, 1986.

P. Ruiz, P. Morales-Álvarez, R. Molina, and A. K. Katsaggelos, “Learning from crowds with variational Gaussian processes,” Pattern Recognition, vol. 88, pp. 298–311, 2019.

A. P. Dawid and A. M. Skene, “Maximum likelihood estimation of observer error-rates using the EM algorithm,” Applied Statistics, pp. 20–28, 1979.

J. Deng et al., “ImageNet: A large-scale hierarchical image database,” Proc. of IEEE CVPR, pp. 248–255, 2009.

A. Jaffe, E. Fetaya, B. Nadler, T. Jiang, and Y. Kluger, “Unsupervised ensemble learning with dependent classifiers,” Proc. of Artificial Intelligence and Statistics (AISTATS), pp. 351–360, 2016.

P. A. Traganitis, A. Pagès-Zamora, and G. B. Giannakis, “Blind multiclass ensemble classification,” IEEE Trans. on Signal Processing, pp. 4737–4752, Sept. 2018.

P. A. Traganitis and G. B. Giannakis, “Unsupervised ensemble classification with sequential and networked data,” IEEE Transactions on Knowledge and Data Engineering, vol. 34, no. 10, pp. 5009–5022, 2022.

P. A. Traganitis and G. B. Giannakis, “Bayesian crowdsourcing with constraints,” in Machine Learning and Knowledge Discovery in Databases: Research Track, pp. 543–559, Cham: Springer International Publishing, 2021.

P. A. Traganitis and G. B. Giannakis, “Detecting adversaries in crowdsourcing,” in 2021 IEEE International Conference on Data Mining (ICDM), pp. 1373–1378, 2021.

P. Welinder et al., “The multidimensional wisdom of crowds,” Proc. of NIPS, pp. 2424–2432, 2010.

D. Zhou et al., “Learning from the wisdom of crowds by minimax entropy,” Proc. of NIPS, pp. 2195–2203, 2012.

Prerequisites

The target audience includes graduate students and researchers with a basic background in machine learning and statistical signal processing, and with interests in data science and data fusion for learning and decision-making problems. Attendees will become familiar with state-of-the-art crowdsourcing and decision fusion tools under various scenarios, including the presence of adversaries; will gain an in-depth understanding of their merits and of the key technical challenges involved; and will be able to leverage their potential across a spectrum of learning tasks.

Short bio

Georgios B. Giannakis received his Diploma in Electrical Engineering (EE) from the National Technical University of Athens, Greece, in 1981. From 1982 to 1986 he was with the University of Southern California, where he received his MSc in EE (1983), MSc in Mathematics (1986), and PhD in EE (1986). He was with the University of Virginia from 1987 to 1998, and since 1999 he has been with the University of Minnesota (UMN), where he held an Endowed Chair of Telecommunications, served as director of the Digital Technology Center from 2008 to 2021, and has been a UMN Presidential Chair in ECE since 2016. His interests span the areas of statistical learning, communications, and networking, subjects on which he has published more than 485 journal papers, 800 conference papers, 25 book chapters, 2 edited books, and 2 research monographs. His current research focuses on data science with applications to IoT and to power networks with renewables. He is the (co-)inventor of 36 issued patents and the (co-)recipient of 10 best journal paper awards from the IEEE Signal Processing (SP) and Communications Societies, including the G. Marconi Prize. He also received the IEEE-SPS ‘Norbert Wiener’ Society Award (2019); EURASIP’s ‘A. Papoulis’ Society Award (2020); Technical Achievement Awards from the IEEE-SPS (2000) and from EURASIP (2005); the IEEE ComSoc Education Award (2019); and the IEEE Fourier Technical Field Award (2015). He is a member of the Academia Europaea and the Academy of Athens, Greece, and a Fellow of the US National Academy of Inventors, the European Academy of Sciences, IEEE, and EURASIP. He has served the IEEE in a number of posts, including that of a Distinguished Lecturer for the IEEE-SPS.
