Samira Ebrahimi Kahou
[intermediate/advanced] Explainability in Machine Learning
Summary
This three-part lecture series explores key topics in explainable machine learning (XML). The first session introduces foundational concepts of XML, covering inherently explainable models, feature-attribution methods, and concept bottleneck models. The second session focuses on explainability methods for large language models, covering their unique challenges and recent advances. The final session provides an overview of state-of-the-art methods in explainable reinforcement learning (RL) and of efforts to make policies and decision-making processes more transparent.
Syllabus
Lecture 1 (explainable machine learning):
- Definition and importance of interpretability
- Categorization of interpretability methods
- Inherently interpretable models
- Feature-attribution methods (see the sketch after this list)
- Concept bottleneck models
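To make the feature-attribution topic concrete, here is a minimal gradient-saliency sketch in PyTorch, one common attribution technique; the tiny classifier, input, and feature count are illustrative assumptions, not material from the lecture.

    # Minimal gradient-based feature attribution (saliency) sketch.
    # The model and input are placeholders; any differentiable classifier works.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
    model.eval()

    x = torch.randn(1, 4, requires_grad=True)   # one input with 4 features
    logits = model(x)
    target = logits.argmax(dim=1).item()        # explain the predicted class

    # Gradient of the target logit w.r.t. the input: a large magnitude means
    # the prediction is locally sensitive to that feature.
    logits[0, target].backward()
    attribution = x.grad.abs().squeeze(0)
    print(attribution)                          # one importance score per feature

Plain gradients are only a first-order, local notion of importance; the lecture's broader coverage of attribution methods addresses their refinements and pitfalls.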
Lecture 2 (explainability in large language models):
- Probing-based explanations (see the probe sketch after this list)
- Neuron activation explanation
- Concept-based explanations
- Mechanistic interpretability
- Challenges
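As a concrete illustration of probing-based explanations, the sketch below trains a linear probe to test whether a property is linearly decodable from hidden representations. The synthetic "hidden states" and the planted property are assumptions standing in for real LLM activations.

    # Minimal linear-probing sketch: test whether a property is linearly
    # decodable from hidden states. The hidden states here are synthetic
    # stand-ins; in practice they would come from a chosen LLM layer.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    hidden = rng.normal(size=(1000, 64))        # stand-in hidden states
    labels = hidden[:, :8].sum(axis=1) > 0      # property planted in a subspace

    X_tr, X_te, y_tr, y_te = train_test_split(hidden, labels, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

    # High probe accuracy suggests the property is linearly represented;
    # a random-label control guards against the probe itself memorizing.
    print("probe accuracy:", probe.score(X_te, y_te))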
Lecture 3 (explainable reinforcement learning):
- Sequential decision making
- Markov decision processes
- Metrics for evaluating explainable RL methods
- Converting learned policies to decision trees (see the distillation sketch after this list)
- Clustering-based identification of behaviors
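To illustrate the policy-to-decision-tree topic, here is a minimal distillation sketch in the spirit of imitation-based methods such as VIPER: fit a shallow tree to (state, action) pairs gathered from a policy. The hand-written stand-in policy and the state dimensionality are assumptions, not the lecture's actual method.

    # Minimal policy-distillation sketch: fit a small decision tree to
    # (state, action) pairs from a policy, yielding an interpretable surrogate.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(0)

    def policy(state):
        # Stand-in expert; in practice this would be a trained neural policy.
        return int(state[0] + 0.5 * state[1] > 0)

    states = rng.normal(size=(2000, 4))              # states from rollouts
    actions = np.array([policy(s) for s in states])  # expert's chosen actions

    tree = DecisionTreeClassifier(max_depth=3).fit(states, actions)
    print("agreement with policy:", tree.score(states, actions))
    print(export_text(tree, feature_names=[f"s{i}" for i in range(4)]))

The tree depth trades off fidelity to the original policy against readability, a tension the evaluation metrics above are designed to quantify.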
References
Bereska, L. and Gavves, E. Mechanistic Interpretability for AI Safety — A Review. 2024.
Koh, P. W. et al. Concept Bottleneck Models. 2020.
Milani, S. et al. Explainable Reinforcement Learning: A Survey and Comparative Review. 2024.
Molnar, C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. 2020.
Sheth, I. and Ebrahimi Kahou, S. Auxiliary Losses for Learning Generalizable Concept-Based Models. 2024.
Zhao, H. et al. Explainability for Large Language Models: A Survey. 2024.
Prerequisites
Basics of machine learning, large language models, and reinforcement learning (preferred).
Short bio
Samira is an Assistant Professor at the University of Calgary and an Adjunct Professor at both École de technologie supérieure and McGill University. She is a member of the Québec AI Institute (Mila) and holds a Canada CIFAR AI Chair. Samira received her Ph.D. in Computer Engineering from Polytechnique Montréal/Mila, where her thesis won the department's best-thesis award. She has also worked as a Postdoctoral Fellow at McGill and as a Researcher at Microsoft Research Montréal.
Samira’s pioneering work in visual reasoning includes the two well-known datasets “Something Something” and “FigureQA”. Her current research centres on enhancing generalization and interpretability in machine learning, with a particular focus on large language models and sequential decision making.
Samira also works on diverse applications of machine learning, e.g., drug dosage recommendation, medical imaging, and environmental forecasting. Her work has been published in top-tier venues such as NeurIPS, ICLR, ICML, ICCV, CVPR, TMLR, and CoRL. She is a recipient of the Ten-Year Technical Impact Runner-Up Award at the 25th ACM International Conference on Multimodal Interaction.