João Gama
[introductory] Learning from Data Streams: Challenges, Issues, and Opportunities
Summary
This tutorial discusses the problem of learning from data streams generated by evolving, non-stationary processes. It surveys advances in the techniques, methods, and tools dedicated to managing, exploiting, and interpreting data streams generated in time-evolving environments. In particular, the tutorial examines the problems of learning classification and regression models from high-speed streams of non-stationary data, and how to design the experimental setup and evaluate those models. We will also discuss issues related to concept drift, change detection, and novelty detection, as well as Auto-ML for data streams.
Syllabus
- Data Streams: Concepts and Methods
- Window models and Exponential Histograms
- Counting Algorithms and Frequent Items
- Clustering Data Streams
- Hoeffding Algorithms for Classification and Regression
- Concept Drift, Change Detection, and Novelty Detection
- Evaluation of Streaming Algorithms
- Auto-ML for Data Streams
References
- João Gama: Knowledge Discovery from Data Streams. CRC Press (2010)
- Albert Bifet, João Gama: IoT data stream analytics. Ann. des Télécommunications 75 (2020)
- João Gama, Indre Zliobaite, Albert Bifet, Mykola Pechenizkiy, Abdelhamid Bouchachia: A survey on concept drift adaptation. ACM Comput. Surv. 46(4) (2014)
- Gianmarco De Francisci Morales, Albert Bifet: SAMOA: scalable advanced massive online analysis. J. Mach. Learn. Res. 16: 149-153 (2015)
- Albert Bifet, Geoff Holmes, Richard Kirkby, Bernhard Pfahringer: MOA: Massive Online Analysis. J. Mach. Learn. Res. 11: 1601-1604 (2010)
Pre-requisites
Basic concepts of machine learning and data mining.
Short bio
João Gama is a Full Professor at the School of Economics, University of Porto, Portugal. He received his Ph.D. in Computer Science from the University of Porto in 2000. He is a EurAI Fellow, an IEEE Fellow, and a member of the board of directors of LIAAD, a group belonging to INESC Porto. His h-index on Google Scholar is 58. He is an editor of several top-level machine learning and data mining journals, and he has been an ACM Distinguished Speaker. He served as Program Chair of ECMLPKDD 2005, DS09, ADMA09, EPIA 2017, and DSAA 2017, and as Conference Chair of IDA 2011, ECMLPKDD 2015, DSAA 2021, and a series of workshops on KDDS and Knowledge Discovery from Sensor Data with ACM SIGKDD. His main research interests are knowledge discovery from data streams, evolving data, probabilistic reasoning, and causality. He has published more than 300 peer-reviewed papers in journals and major conferences, including an extensive list of publications on data stream learning.