Matias Carrasco Kind
[intermediate] Anomaly Detection
Summary
In the age of big data and high-volume information, anomaly detection has many areas of application, including network security, financial data, medical data analysis, and the discovery of celestial events in astronomical surveys, among many others. Reliable and efficient algorithms are in high demand, and many techniques have been developed over the years to meet that demand, covering multivariate data and, more recently, streaming data that requires ongoing updates and may have missing variables. Anomalous data can have as much scientific value as normal data, and in some cases even more, so robust, fast, and reliable algorithms to detect and flag such anomalies are vitally important. We will discuss different algorithms for identifying anomalies in many kinds of data, including multi-dimensional and time-series data, with a deep dive into the fundamentals and tips for choosing the algorithm best suited to each situation, including the ones we have developed. Code, data, and Python examples will be provided.
Syllabus
- Introduction to Anomalies and Outlier detection
- Deep dive into the statistical and probability framework for anomalies
- Machine learning algorithms for anomaly detection: supervised, unsupervised, and parameter-free
- Deep learning algorithms for anomaly detection, including VAEs, GANs, and others
- Outlier detection for time series data
References
- Zimek, Arthur; Schubert, Erich (2017), “Outlier Detection”, Encyclopedia of Database Systems, Springer New York, pp. 1–5.
- Hodge, V. J.; Austin, J. (2004). “A Survey of Outlier Detection Methodologies”. Artificial Intelligence Review. 22 (2): 85–126.
- S. Hariri, M. Carrasco Kind and R. J. Brunner, “Extended Isolation Forest,” in IEEE Transactions on Knowledge and Data Engineering, vol. 33, no. 4, pp. 1479-1489, 1 April 2021, doi: 10.1109/TKDE.2019.2947676.
- Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
Pre-requisites
General knowledge of statistical learning. Basic knowledge of probability and statistics. Basic knowledge of linear algebra.
Short bio
Matias Carrasco Kind is currently the Director of Data Science Research Services at the Gies College of Business at the University of Illinois at Urbana-Champaign in the U.S., where he is also a faculty member in Accountancy, in Astronomy, and at the National Center for Supercomputing Applications.
He is interested in challenging problems involving data-intensive science, machine learning and deep learning, data mining, data analysis and visualization, image processing, generative AI models, scientific platforms and cyberinfrastructure, data management, software engineering, and scientific cloud computing, among others. Most of his research has focused on astrophysics, but given the multidisciplinary nature of his work and the data needs and tools shared across many fields, he has also applied these techniques to earth sciences, bio-imaging, veterinary science, agricultural economics, finance research, and accounting.
Matias obtained his PhD in Astronomy, with a Computational Science and Engineering option, at the University of Illinois; his doctoral work focused on machine learning techniques applied to astronomy at large scales.