Georgios Giannakis
[intermediate/advanced] Learning from Unreliable Labels via Crowdsourcing
Summary
Crowdsourcing has emerged as a powerful paradigm for machine learning, data mining, and data science tasks: it enlists inexpensive crowds of human workers, or annotators, to carry out a given learning and inference task. While conceptually similar to distributed data and decision fusion, crowdsourcing seeks not only to aggregate information from multiple human annotators or unreliable (a.k.a. weak) sources, but also to assess their reliabilities. Crowdsourcing can therefore be readily adapted to information fusion tasks in unknown or contested environments, where data may be provided by unreliable and even adversarial agents. The overarching goal of this tutorial is a unifying framework for learning from unreliable (or “weak”) information sources that is resilient to adversarial attacks. Focusing on the classification task, the exposition starts with classical tools for crowdsourced label aggregation that simultaneously infer annotator reliabilities and true labels, and then turns to contemporary methods that leverage the statistical moments of annotator responses. Building on these models, the tutorial covers approaches that deal with data dependencies, including dynamic and networked data, along with Gaussian process- and deep learning-based tools. Finally, it presents approaches that can identify coalitions of colluding adversaries. The impact of the unified framework will be demonstrated through extensive synthetic and real-data tests.
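To make the idea of jointly inferring annotator reliabilities and true labels concrete, here is a minimal sketch of one classical scheme, the EM algorithm of Dawid and Skene (listed in the references). It is an illustration under stated assumptions, not the tutorial's own implementation: the function name `dawid_skene`, the `smoothing` constant, the majority-vote initialization, and the synthetic annotator accuracies are all hypothetical choices made for this example.

```python
import numpy as np

def dawid_skene(labels, n_classes, n_iter=50, smoothing=1e-2):
    """EM sketch for the Dawid-Skene annotator model (illustrative only).

    labels: (n_items, n_annotators) integer array; -1 marks a missing response.
    Returns (posteriors over true labels, confusion matrices, class priors).
    """
    labels = np.asarray(labels)
    n_items, n_annot = labels.shape

    # One-hot responses: resp[i, m, k] = 1 if annotator m gave label k to item i.
    resp = np.zeros((n_items, n_annot, n_classes))
    i_idx, m_idx = np.nonzero(labels >= 0)
    resp[i_idx, m_idx, labels[i_idx, m_idx]] = 1.0

    # Initialize the label posteriors with soft majority voting.
    post = resp.sum(axis=1) + smoothing
    post /= post.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class priors and per-annotator confusion matrices,
        # conf[m, c, k] ~ P(annotator m answers k | true class is c).
        priors = post.sum(axis=0) + smoothing
        priors /= priors.sum()
        conf = np.einsum("ic,imk->mck", post, resp) + smoothing
        conf /= conf.sum(axis=2, keepdims=True)

        # E-step: posterior of each true label given all annotator responses.
        log_post = np.log(priors)[None, :] + np.einsum(
            "imk,mck->ic", resp, np.log(conf)
        )
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)

    return post, conf, priors

# Synthetic binary task with hypothetical annotator accuracies (assumption).
rng = np.random.default_rng(0)
n_items, n_classes = 1000, 2
truth = rng.integers(n_classes, size=n_items)
accuracies = np.array([0.9, 0.9, 0.85, 0.5, 0.5])
correct = rng.random((n_items, accuracies.size)) < accuracies
labels = np.where(correct, truth[:, None], 1 - truth[:, None])

post, conf, priors = dawid_skene(labels, n_classes)
estimate = post.argmax(axis=1)
# Average diagonal of each confusion matrix as a simple reliability score.
reliability = conf[:, np.arange(n_classes), np.arange(n_classes)].mean(axis=1)
print("label accuracy:", (estimate == truth).mean())
print("estimated reliabilities:", reliability.round(2))
```

The estimated confusion matrices act as reliability scores: annotators whose matrices are close to uniform carry little weight in the E-step, which is what distinguishes this approach from plain majority voting.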
Syllabus
- Introduction: Context, motivation, and timeliness. (20 mins)
- Crowdsourced classification: (45 mins)
  - Annotator models for label aggregation
  - Probabilistic and Bayesian algorithms for label aggregation
  - Moment-matching approaches
- Data-aware crowdsourcing: (30 mins)
  - Models for sequential and networked data
  - Gaussian process and deep learning algorithms
- Adversarially-robust crowdsourcing: (45 mins)
  - Identifying spammer annotators
  - Identifying arbitrary adversaries
- Open issues: Challenging yet promising research directions. (15 mins)
References
S. Basu, I. Davidson, and K. Wagstaff, Constrained Clustering: Advances in Algorithms, Theory, and Applications. Chapman & Hall/CRC, 2008.
J. Besag, “On the statistical analysis of dirty pictures,” Journal of the Royal Statistical Society B, vol. 48, no. 3, pp. 259–302, 1986.
P. Ruiz, P. Morales-Álvarez, R. Molina, and A. K. Katsaggelos, “Learning from crowds with variational Gaussian processes,” Pattern Recognition, vol. 88, pp. 298–311, 2019.
A. P. Dawid and A. M. Skene, “Maximum likelihood estimation of observer error-rates using the EM algorithm,” Applied Statistics, pp. 20–28, 1979.
J. Deng et al., “ImageNet: A large-scale hierarchical image database,” Proc. of IEEE CVPR, pp. 248–255, 2009.
A. Jaffe, E. Fetaya, B. Nadler, T. Jiang, and Y. Kluger, “Unsupervised ensemble learning with dependent classifiers,” Artificial Intelligence and Statistics, 2016, pp. 351–360.
P. A. Traganitis, A. Pagès-Zamora, and G. B. Giannakis, “Blind multiclass ensemble classification,” IEEE Trans. on Signal Processing, pp. 4737–4752, Sept. 2018.
P. A. Traganitis and G. B. Giannakis, “Unsupervised ensemble classification with sequential and networked data,” IEEE Transactions on Knowledge and Data Engineering, vol. 34, no. 10, pp. 5009–5022, 2022.
P. A. Traganitis and G. B. Giannakis, “Bayesian crowdsourcing with constraints,” in Machine Learning and Knowledge Discovery in Databases. Research Track, Cham, pp. 543–559, Springer International Publishing, 2021.
P. A. Traganitis and G. B. Giannakis, “Detecting adversaries in crowdsourcing,” in 2021 IEEE International Conference on Data Mining (ICDM), pp. 1373–1378, 2021.
P. Welinder et al., “The multidimensional wisdom of crowds,” Proc. of NIPS, pp. 2424–2432, 2010.
D. Zhou et al., “Learning from the wisdom of crowds by minimax entropy,” Proc. of NIPS, pp. 2195–2203, 2012.
Pre-requisites
The target audience includes graduate students and researchers with a basic background in machine learning and statistical signal processing, and with interests in data science and data fusion for learning and decision-making problems. Attendees will become familiar with state-of-the-art crowdsourcing and decision fusion tools under various scenarios, including the presence of adversaries; will gain an in-depth understanding of their merits and the key technical challenges involved; and will be able to leverage their potential on a spectrum of learning tasks.
Short bio
Georgios B. Giannakis received his Diploma in Electrical Engineering (EE) from the National Technical University of Athens, Greece, in 1981. From 1982 to 1986 he was with the University of Southern California, where he received his M.Sc. in EE (1983), M.Sc. in Mathematics (1986), and Ph.D. in EE (1986). He was with the University of Virginia from 1987 to 1998, and since 1999 he has been with the University of Minnesota (UMN), where he held an Endowed Chair of Telecommunications and served as director of the Digital Technology Center from 2008 to 2021; since 2016 he has been a UMN Presidential Chair in ECE. His interests span the areas of statistical learning, communications, and networking, subjects on which he has published more than 485 journal papers, 800 conference papers, 25 book chapters, 2 edited books, and 2 research monographs. His current research focuses on data science with applications to IoT and to power networks with renewables. He is the (co-)inventor of 36 issued patents, and the (co-)recipient of 10 best journal paper awards from the IEEE Signal Processing (SP) and Communications Societies, including the G. Marconi Prize. He also received the IEEE-SPS ‘Norbert Wiener’ Society Award (2019); EURASIP’s ‘A. Papoulis’ Society Award (2020); Technical Achievement Awards from the IEEE-SPS (2000) and from EURASIP (2005); the IEEE ComSoc Education Award (2019); and the IEEE Fourier Technical Field Award (2015). He is a member of the Academia Europaea and the Academy of Athens, Greece, and a Fellow of the US National Academy of Inventors, the European Academy of Sciences, IEEE, and EURASIP. He has served the IEEE in a number of posts, including that of Distinguished Lecturer for the IEEE-SPS.