Nathalie Japkowicz

American University

[intermediate/advanced] Learning from Class Imbalances

Summary

The term "class imbalance problem" was coined in the mid-1990s, when machine learning algorithms became robust enough to be applied in real-world settings. At that point, a myriad of new problems came up, including the ubiquitous class imbalance problem. Since then, many remedies have been proposed, but the problem persists. This course will introduce the problem and relate it to other common problems in machine learning, including cost-sensitive learning, long-tailed distributions, data scarcity, fairness, anomaly detection, and evaluation. It will then show how the problem becomes more serious in the presence of various data characteristics, and how increasing neural network depth affects it. Finally, it will present the different kinds of solutions that have been proposed to deal with the problem, including cost-sensitive approaches, data resampling methods, and one-class learning. The discussion will span both classification and other learning paradigms.
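As a rough illustration of three of the ideas named above (evaluation, cost-sensitive learning, and data resampling), the Python sketch below uses a hypothetical 95:5 two-class label set. It shows why plain accuracy can be misleading, how inverse-frequency class costs can be derived, and how random oversampling rebalances a training set. The helper names are hypothetical and are not taken from the course material. The weighting rule n / (k * n_c) is one common heuristic (the same formula scikit-learn uses for `class_weight="balanced"`), not the only cost-sensitive option.

```python
import random
from collections import Counter

# Hypothetical helper names; an illustrative sketch only, not the course's method.

def majority_baseline_accuracy(labels):
    """Accuracy of a trivial classifier that always predicts the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

def inverse_frequency_weights(labels):
    """Per-class costs n / (k * n_c): rarer classes get proportionally larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}

def random_oversample(examples, labels, seed=0):
    """Duplicate randomly chosen examples of each smaller class until every class
    has as many examples as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Sample with replacement from the existing examples of this class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

if __name__ == "__main__":
    labels = [0] * 95 + [1] * 5          # hypothetical 95:5 imbalance
    examples = list(range(100))          # stand-in feature values
    print(majority_baseline_accuracy(labels))  # 0.95, while never detecting class 1
    print(inverse_frequency_weights(labels))   # class 1 weight is 10.0
    _, y_bal = random_oversample(examples, labels)
    print(Counter(y_bal))                      # Counter({0: 95, 1: 95})
```

Because random oversampling only duplicates existing minority examples, it can encourage overfitting to them, which is one motivation for the other remedies the course surveys.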

Syllabus

Lecture 1: Introduction to the class imbalance problem and its relation to other common problems

Lecture 2: Understanding the causes of the class imbalance problem and the effect of network depth on it

Lecture 3: Proposed solutions for dealing with class imbalances

References

Japkowicz, N. and Shaju Stephen. “The class imbalance problem: A systematic study.” Intelligent Data Analysis 6 (2002): 429-449.
He, Haibo and Edwardo A. Garcia. “Learning from Imbalanced Data.” IEEE Transactions on Knowledge and Data Engineering 21 (2009): 1263-1284.
Branco, Paula et al. “A Survey of Predictive Modeling on Imbalanced Domains.” ACM Computing Surveys (CSUR) 49 (2016): 1-50.
Krawczyk, B. “Learning from imbalanced data: open challenges and future directions.” Progress in Artificial Intelligence 5 (2016): 221-232.
Johnson, Justin M. and T. Khoshgoftaar. “Survey on deep learning with class imbalance.” Journal of Big Data 6 (2019): 1-54.
Ghosh, Kushankur et al. “On the combined effect of class imbalance and concept complexity in deep learning.” arXiv preprint arXiv:2107.14194 (2021).

Pre-requisites

Introductory course on Machine Learning or Data Mining.

Short bio

Nathalie Japkowicz is a Professor and Chair of the Computer Science Department at American University, Washington DC. She was previously with the School of Electrical Engineering and Computer Science at the University of Ottawa, where she led the Laboratory for Research on Machine Learning for Defense and Security. Her work has spanned different areas of machine learning but has focused primarily on the class imbalance problem, anomaly detection using one-class learning, and machine learning evaluation. She has worked in a number of application domains, including radiation protection, cybersecurity, medicine, and molecular biology. She has supervised over thirty graduate students, received funding from Canadian and American institutions, worked with governmental agencies as well as private companies, and published over 180 peer-reviewed journal articles and conference papers, together with special issues and books, including Evaluating Learning Algorithms: A Classification Perspective, with Mohak Shah (Cambridge University Press, 2011). She is a past president of the Canadian Artificial Intelligence Association and has received a number of best paper awards as well as the Canadian Artificial Intelligence Association’s Distinguished Service Award.