Wojciech Samek
[introductory/intermediate] From Feature Attributions to Next-Generation Explainable AI
Summary
The domain of Explainable Artificial Intelligence (XAI) has made significant strides in recent years. Various explanation techniques have been devised, each serving distinct purposes. Some of them explain individual predictions of AI models by highlighting influential input features, while others enhance comprehension of the model’s internal operations by visualizing the concepts encoded by individual neurons. These initial XAI techniques have proven valuable in scrutinizing models and detecting flawed prediction strategies (referred to as “Clever Hans” behaviors). This tutorial will give a structured overview of the prominent approaches in XAI and discuss next-generation techniques that provide more human-understandable and actionable explanations, thus delivering maximum usefulness in real-world scenarios. Additionally, the advancement of generative AI, notably the emergence of exceedingly large language models (LLMs), has underscored the necessity for next-generation explanation methodologies tailored to this fundamentally distinct category of models and challenges. This tutorial will address this necessity from various angles and discuss recent methodological breakthroughs that allow us to gain deeper insights into the mysterious world of LLMs.
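To make the notion of a feature attribution concrete, the following minimal sketch (not taken from the tutorial material; the toy model, feature dimensions and variable names are illustrative assumptions) computes Gradient x Input relevance scores for one prediction of a PyTorch classifier:

import torch
import torch.nn as nn

# Toy classifier standing in for any differentiable model (hypothetical placeholder).
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
model.eval()

x = torch.randn(1, 4, requires_grad=True)   # one input sample with 4 features
logits = model(x)
target = logits.argmax(dim=1).item()        # explain the class the model predicts

logits[0, target].backward()                # gradient of the target logit w.r.t. the input
attribution = (x.grad * x).detach()         # Gradient x Input: per-feature relevance scores
print(attribution)

The sign and magnitude of each score indicate how strongly, and in which direction, the corresponding input feature contributed to the predicted class; dedicated attribution methods such as Layer-wise Relevance Propagation refine this basic idea with more robust propagation rules.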
Syllabus
The first part of the tutorial will discuss “classical” XAI techniques, their applications and theoretical underpinnings, as well as challenges and misconceptions that were common during the first wave of explainable AI research. The second part will focus on more recent developments in the field. In particular, we will discuss next-generation XAI methods, which provide more complete, more human-understandable and more actionable explanations, thereby enabling the expert user to systematically understand, debug and improve their AI model. The last part will present recent developments around XAI for Foundation Models.
The topics covered are:
- Motivations: Black-box models and the “Clever Hans” effect
- Classical Explainable AI: Concepts, methods & applications
- Challenges and Common Misconceptions in XAI
- Next-generation XAI methods: From feature attribution to concept-level, human-understandable and actionable explanations
- XAI-based model debugging & improvement
- XAI and Foundation Models
References
W Samek, G Montavon, S Lapuschkin, C Anders, KR Müller. Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications, Proceedings of the IEEE, 109(3):247-278, 2021.
https://doi.org/10.1109/JPROC.2021.3060483
Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, Simone Stumpf:
Explainable Artificial Intelligence (XAI) 2.0: A Manifesto of Open Challenges and Interdisciplinary Research Directions
Information Fusion, 106:102301, 2024
https://arxiv.org/abs/2310.19775
Reduan Achtibat, Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer, Aakriti Jain, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek:
AttnLRP: Attention-Aware Layer-wise Relevance Propagation for Transformers
arXiv:2402.05602, 2024
http://arxiv.org/abs/2402.05602
Maximilian Dreyer, Reduan Achtibat, Wojciech Samek, Sebastian Lapuschkin:
Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations
arXiv:2311.16681, 2023
https://arxiv.org/abs/2311.16681
Johanna Vielhaben, Sebastian Lapuschkin, Grégoire Montavon, Wojciech Samek:
Explainable AI for Time Series via Virtual Inspection Layers
Pattern Recognition, 150:110309, 2024
https://doi.org/10.1016/j.patcog.2024.110309
Maximilian Dreyer, Frederik Pahde, Christopher J. Anders, Wojciech Samek, Sebastian Lapuschkin:
From Hope to Safety: Unlearning Biases of Deep Models via Gradient Penalization in Latent Space
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
https://arxiv.org/abs/2308.09437
Frederik Pahde, Maximilian Dreyer, Wojciech Samek, Sebastian Lapuschkin:
Reveal to Revise: An Explainable AI Life Cycle for Iterative Bias Correction of Deep Models
Medical Image Computing and Computer Assisted Intervention – MICCAI 2023. MICCAI 2023. LNCS, 14221:596-606, Springer, Cham, 2023
https://doi.org/10.1007/978-3-031-43895-0_56
C Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell 1, 206–215 (2019).
https://doi.org/10.1038/s42256-019-0048-x
S M Lundberg, G Erion, H Chen et al. From local explanations to global understanding with explainable AI for trees. Nat Mach Intell 2, 56–67 (2020).
https://doi.org/10.1038/s42256-019-0138-9
F Doshi-Velez, B Kim. Towards A Rigorous Science of Interpretable Machine Learning. arXiv:1702.08608, 2017.
https://arxiv.org/abs/1702.08608
P Schramowski, W Stammer, S Teso, et al. Making deep neural networks right for the right scientific reasons by interacting with their explanations. Nat Mach Intell 2, 476–486 (2020).
https://doi.org/10.1038/s42256-020-0212-3
Pre-requisites
Basic understanding of machine learning and deep learning.
Short bio
Wojciech Samek is a Professor in the EECS Department at TU Berlin and is jointly heading the AI Department at Fraunhofer HHI. He is a Fellow at BIFOLD – Berlin Institute for the Foundations of Learning and Data, the ELLIS Unit Berlin, and the DFG Research Unit DeSBi. Furthermore, he is a Senior Editor for IEEE TNNLS, an Associate Editor for Pattern Recognition, and an elected member of the IEEE MLSP Technical Committee and Germany’s Platform for AI. He has co-authored more than 200 papers, was the leading editor of the Springer book “Explainable AI: Interpreting, Explaining and Visualizing Deep Learning” (2019), and co-editor of the open access Springer book “xxAI – Beyond explainable AI” (2022). He has served as Program Co-Chair for IEEE MLSP’23 and as Area Chair for NAACL’21 and NeurIPS’23, and is a recipient of multiple best paper awards, including the 2020 Pattern Recognition Best Paper Award and the 2022 Digital Signal Processing Best Paper Prize.