Samira Ebrahimi Kahou
[intermediate/advanced] Explainability in Machine Learning
Summary
This three-part lecture series explores key topics in explainable machine learning (XML). The first session introduces foundational concepts of XML, covering inherently explainable models, feature-attribution methods, and concept bottleneck models. The second session focuses on explainability methods for large language models, covering their unique challenges and recent advances. The final session provides an overview of state-of-the-art methods in explainable reinforcement learning (RL) and of efforts to make policies and decision-making processes more transparent.
Syllabus
Lecture 1 (explainable machine learning):
- Definition and importance of interpretability
- Categorization of interpretability methods
- Inherently interpretable models
- Feature-attribution methods (see the sketch after this list)
- Concept bottleneck models
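To make the feature-attribution topic concrete, here is a minimal gradient-saliency sketch in PyTorch, one common attribution technique; the tiny classifier, input, and feature count are illustrative assumptions, not material from the lecture.

    # Minimal gradient-based feature attribution (saliency) sketch.
    # The model and input are placeholders; any differentiable classifier works.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
    model.eval()

    x = torch.randn(1, 4, requires_grad=True)   # one input with 4 features
    logits = model(x)
    target = logits.argmax(dim=1).item()        # explain the predicted class

    # Gradient of the target logit w.r.t. the input: a large magnitude means
    # the prediction is locally sensitive to that feature.
    logits[0, target].backward()
    attribution = x.grad.abs().squeeze(0)
    print(attribution)                          # one importance score per feature

Plain gradients are only a first-order, local notion of importance; the lecture's broader coverage of attribution methods addresses their refinements and pitfalls.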
Lecture 2 (explainability in large language models):
- Probing-based explanations (see the probe sketch after this list)
- Neuron activation explanation
- Concept-based explanations
- Mechanistic interpretability
- Challenges
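As a concrete illustration of probing-based explanations, the sketch below trains a linear probe to test whether a property is linearly decodable from hidden representations. The synthetic "hidden states" and the planted property are assumptions standing in for real LLM activations.

    # Minimal linear-probing sketch: test whether a property is linearly
    # decodable from hidden states. The hidden states here are synthetic
    # stand-ins; in practice they would come from a chosen LLM layer.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    hidden = rng.normal(size=(1000, 64))        # stand-in hidden states
    labels = hidden[:, :8].sum(axis=1) > 0      # property planted in a subspace

    X_tr, X_te, y_tr, y_te = train_test_split(hidden, labels, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

    # High probe accuracy suggests the property is linearly represented;
    # a random-label control guards against the probe itself memorizing.
    print("probe accuracy:", probe.score(X_te, y_te))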
Lecture 3 (explainable reinforcement learning):
- Sequential decision making
- Markov decision processes
- Metrics for evaluating explainable RL methods
- Converting learned policies to decision trees (see the distillation sketch after this list)
- Clustering-based identification of behaviors
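To illustrate the policy-to-decision-tree topic, here is a minimal distillation sketch in the spirit of imitation-based methods such as VIPER: fit a shallow tree to (state, action) pairs gathered from a policy. The hand-written stand-in policy and the state dimensionality are assumptions, not the lecture's actual method.

    # Minimal policy-distillation sketch: fit a small decision tree to
    # (state, action) pairs from a policy, yielding an interpretable surrogate.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(0)

    def policy(state):
        # Stand-in expert; in practice this would be a trained neural policy.
        return int(state[0] + 0.5 * state[1] > 0)

    states = rng.normal(size=(2000, 4))              # states from rollouts
    actions = np.array([policy(s) for s in states])  # expert's chosen actions

    tree = DecisionTreeClassifier(max_depth=3).fit(states, actions)
    print("agreement with policy:", tree.score(states, actions))
    print(export_text(tree, feature_names=[f"s{i}" for i in range(4)]))

The tree depth trades off fidelity to the original policy against readability, a tension the evaluation metrics above are designed to quantify.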
References
Bereska, L. and Gavves, E. Mechanistic Interpretability for AI Safety — A Review. 2024.
Koh, P. W. et al. Concept Bottleneck Models. 2020.
Milani, S. et al. Explainable Reinforcement Learning: A Survey and Comparative Review. 2024.
Molnar, C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. 2020.
Sheth, I. and Ebrahimi Kahou, S. Auxiliary Losses for Learning Generalizable Concept-Based Models. 2024.
Zhao, H. et al. Explainability for Large Language Models: A Survey. 2024.
Prerequisites
Basics of machine learning, large language models, and reinforcement learning (preferred).
Short bio
Samira is an Assistant Professor at the University of Calgary and an Adjunct Professor at both École de technologie supérieure and McGill University. She is a member of the Québec AI Institute (Mila) and holds a Canada CIFAR AI Chair. Samira received her Ph.D. in Computer Engineering from Polytechnique Montréal/Mila, where her thesis won the department's best-thesis award. She has also worked as a Postdoctoral Fellow at McGill and as a Researcher at Microsoft Research Montréal.
Samira’s pioneering work in visual reasoning includes the two well-known datasets “Something Something” and “FigureQA”. Her current research centres on enhancing generalization and interpretability in machine learning, with a particular focus on large language models and sequential decision making.
Samira also works on diverse applications of machine learning, e.g., drug dosage recommendation, medical imaging, and environmental forecasting. Her work has been published in top-tier venues such as NeurIPS, ICLR, ICML, ICCV, CVPR, TMLR, and CoRL. She is a recipient of the Ten-Year Technical Impact Runner-Up Award at the 25th ACM International Conference on Multimodal Interaction.