Shih-Chieh Hsu
[intermediate/advanced] Real-Time Artificial Intelligence for Science and Engineering
Summary
Artificial Intelligence (AI) applications have exploded over the past decades across a wealth of research domains and industries. With edge computing, real-time inference of deep neural networks on custom hardware has become increasingly relevant to these applications. The Large Hadron Collider (LHC) experiments at CERN run AI algorithms on field-programmable gate arrays (FPGAs) to detect rare physics events among millions of proton-proton collisions every second. Smartphone companies are incorporating AI chips into their designs for on-device inference to improve user experience and tighten data security. The autonomous vehicle industry is turning to application-specific integrated circuits (ASICs) to achieve low latency. The latency typically acceptable for real-time inference in these applications ranges from O(1) microsecond down to nanoseconds, and resources are strictly limited. To address this challenge, software tools have been developed to exploit specialized hardware for inference acceleration. These tools can improve the overall latency and throughput of inference, reduce computing complexity, and significantly lower the cost for users to develop optimized workflows.
Syllabus
In this lecture, I will give an overview of the challenges the physics community faces across latency and throughput regimes, and of the tools and resources that address them. I will introduce state-of-the-art techniques for model compression, such as pruning and quantization. Tutorials will help you get familiar with these techniques using the hls4ml library. This library converts pre-trained machine learning models into FPGA firmware targeting ultra-low-latency inference to stay within the strict constraints imposed by the CERN particle detectors.
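As a toy illustration of the two compression techniques named above, the sketch below implements magnitude-based pruning and signed fixed-point quantization (the numeric format hls4ml's ap_fixed<W,I> types use, where W is the total bit width and I the integer bits including sign) in plain Python. The function names are hypothetical and for intuition only; hls4ml itself operates on whole models and generates FPGA firmware.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:int(len(weights) * sparsity)]:
        pruned[i] = 0.0  # pruned weights cost no multipliers on the FPGA
    return pruned

def quantize_fixed(x, total_bits=8, int_bits=1):
    """Round x onto a signed fixed-point grid, ap_fixed<W,I> style."""
    frac_bits = total_bits - int_bits
    scale = 2 ** frac_bits
    lo = -(2 ** (total_bits - 1)) / scale        # most negative code
    hi = (2 ** (total_bits - 1) - 1) / scale     # most positive code
    return min(max(round(x * scale) / scale, lo), hi)

weights = [0.5, -0.1, 0.9, 0.05]
print(prune_by_magnitude(weights, 0.5))   # [0.5, 0.0, 0.9, 0.0]
print(quantize_fixed(0.3, 8, 1))          # 0.296875 (nearest 8-bit code)
print(quantize_fixed(5.0, 8, 1))          # 0.9921875 (saturates at the top)
```

In practice, pruning and quantization are applied during training (e.g. quantization-aware training) so the network can recover the accuracy lost to the coarser representation; the tutorials cover these workflows.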
References
Lectures will be based on the following papers:
P. Harris et al., “Physics Community Needs, Tools, and Resources for Machine Learning,” https://arxiv.org/abs/2203.16255
E. E. Khoda et al., “Ultra-low latency recurrent neural network inference on FPGAs for physics applications with hls4ml,” https://arxiv.org/abs/2207.00559
A. Elabd et al., “Graph Neural Networks for Charged Particle Tracking on FPGAs,” Front. Big Data 5 (2022) 828666
F. Fahim et al., “hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices,” TinyML Research Symposium 2021 https://arxiv.org/abs/2103.05579
J. Duarte et al., “Low-latency machine learning inference on FPGAs,” NeurIPS ML4PS 2019 74
Tutorials will be based on the hls4ml library https://github.com/fastmachinelearning/hls4ml-tutorial
Pre-requisites
Basic knowledge of machine learning and neural networks.
Short bio
Shih-Chieh Hsu earned an MS degree in Physics from National Taiwan University and a PhD in Physics from the University of California San Diego. He is currently an Associate Professor in Physics and Adjunct Associate Professor in Electrical and Computer Engineering at the University of Washington, and Director of the NSF HDR Institute: Accelerated Artificial Intelligence Algorithms for Data-Driven Discovery (A3D3). He works on experimental particle physics using proton-proton collision data from the Large Hadron Collider. His research interests range from dark matter searches with the ATLAS experiment and neutrino cross-section measurements with the FASER experiment to innovative Artificial Intelligence algorithms for data-intensive discovery and accelerated machine learning with heterogeneous computing. He is a recipient of a DOE Early Career Award and an Undergraduate Research Mentor Award.