Dhabaleswar K. Panda

Ohio State University

[intermediate] Exploiting High-performance Computing for Deep Learning: Why and How?

Summary

The recent advances in Deep Learning (DL) have led to many exciting challenges and opportunities for CS and AI researchers alike. Modern DL frameworks like TensorFlow, PyTorch, and several others have emerged that offer ease of use and flexibility to train, and deploy various types of Deep Neural Networks (DNNs). In this tutorial, we will provide an overview of interesting trends in DNN design and how cutting-edge hardware architectures and high-performance interconnects are playing a key role in moving the field forward. We will also present an overview of different DNN architectures and DL frameworks. Most DL frameworks started with a single-node design. However, approaches to parallelize the process of DNN training are also being actively explored. The DL community has moved along different distributed training designs that exploit communication runtimes like gRPC, MPI, and NCCL. We highlight new challenges and opportunities for communication runtimes to exploit high-performance CPU and GPU architectures to efficiently support large-scale distributed DNN training. We also highlight some of our co-design efforts to utilize MPI for large-scale DNN training on cutting-edge CPU and GPU architectures available on modern HPC clusters. Finally, we include hands-on exercises to enable the attendees to gain first-hand experience of running distributed DNN training experiments on a modern GPU cluster.

Syllabus

Introduction of Deep Learning (DL) and Its Applications
Overview of Execution Environments
Parallel and Distributed DNN Training
Latest Trends in HPC Technologies
Challenges in Exploiting HPC Technologies for DL
Solutions and Case Studies in Distributed DNN Training
Hands-on Exercises
Open Issues and Challenges

References

[1] A. Jain, A. Awan, A. Aljuhani, J. Hashmi, Q. Anthony, H. Subramoni, D. Panda, R. Machiraju, and A. Parwani, “Gems: Gpu-enabled memory-aware model-parallelism system for distributed dnn training,” in 2020 SC20: International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pp. 621–635, IEEE ComputerSociety, 2020.

[2] Awan A.A., Jain A., Anthony Q., Subramoni H., Panda D.K. (2020) HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training with TensorFlow. In: Sadayappan P., Chamberlain B., Juckeland G., Ltaief H. (eds) High Performance Computing. ISC High Performance 2020. Lecture Notes in Computer Science, vol 12151. Springer, Cham. https://doi.org/10.1007/978-3-030-50743-5_5

[3] Ammar Ahmad Awan, Khaled Hamidouche, Jahanzeb Maqbool Hashmi, and Dhabaleswar K. Panda. 2017. S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters. SIGPLAN Not. 52, 8 (August 2017), 193–205. DOI:https://doi.org/10.1145/3155284.3018769

[4] A. Jain, A. A. Awan, Q. Anthony, H. Subramoni and D. K. D. Panda, “Performance Characterization of DNN Training using TensorFlow and PyTorch on Modern Clusters,” 2019 IEEE International Conference on Cluster Computing (CLUSTER), 2019, pp. 1-11, doi: 10.1109/CLUSTER.2019.8891042.

[5] Huang, Y., Cheng, Y., Bapna, A., Firat, O., Chen, D., Chen, M., … & Wu, Y. (2019). Gpipe: Efficient training of giant neural networks using pipeline parallelism. Advances in neural information processing systems, 32, 103-112.

[6] Lu, W., Yan, G., Li, J., Gong, S., Han, Y., & Li, X. (2017, February). Flexflow: A flexible dataflow accelerator architecture for convolutional neural networks. In 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA) (pp. 553-564). IEEE.

[7] Wang, H., Potluri, S., Luo, M., Singh, A. K., Sur, S., & Panda, D. K. (2011). MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters. Computer Science-Research and Development, 26(3-4), 257.

Pre-requisites

There is no fixed prerequisite. As long as the attendee has a general knowledge in HPC and Networking, he/she will be able to understand and appreciate it. The tutorial is designed in such a way that an attendee gets exposed to the topics in a smooth and progressive manner.

Short bios

Dr. Dhabaleswar K (DK) Panda is a Professor and University Distinguished Scholar of Computer Science and Engineering at the Ohio State University. He is also the Founder and CEO of X-ScaleSolutions, Inc. He has published over 500 papers in the area of high-end computing and networking. The MVAPICH2 (High Performance MPI and PGAS over InfiniBand, Omni-Path, iWARP and RoCE) libraries, designed and developed by his research group (http://mvapich.cse.ohio-state.edu), are currently being used by more than 3,200 organizations worldwide (in 89 countries). More than 1.45M downloads of this software have taken place from the project’s site. This software is empowering several InfiniBand clusters (including the 4th, 10th, 12th, 20th, and 31st ranked ones in the TOP500 list). Prof. Panda’s research group at OSU has been focusing on High-performance and scalable Distributed Training of popular Deep Learning Frameworks (TensorFlow and PyTorch) using MPI-driven libraries. These enhanced versions are available from https://hidl.cse.ohio-state.edu. Multiple software libraries for Big Data processing and management (Spark, Hadoop and Dask), designed and developed by the group under High-Performance Big Data Project (http://hibd.cse.ohio-state.edu) are available. Dr. Panda is a Fellow of IEEE and a member of ACM. More details about Prof. Panda are available at http://www.cse.ohio-state.edu/~panda

Dr. Hari Subramoni received the Ph.D. degree in Computer Science from The Ohio State University, Columbus, OH, in 2013. He has been a research scientist in the Department of Computer Science and Engineering at the Ohio State University, USA, since September 2015. His current research interests include high performance interconnects and protocols, parallel computer architecture, network-based computing, exascale computing, network topology aware computing, QoS, power-aware LAN-WAN communication, fault tolerance, virtualization, big data and cloud computing. He has published over 50 papers in international journals and conferences related to these research areas. Recently, Dr. Subramoni has been doing research and working on the design and development of MVAPICH2, MVAPICH2-GDR, and MVAPICH2-X software packages. He is a member of IEEE. More details about Dr. Subramoni are available from http://www.cse.ohio-state.edu/~subramon.

Arpan Jain received his B.Tech. and M.Tech. degrees in Information Technology from ABV-IIITM, India. Currently, Arpan is working towards his Ph.D. degree in Computer Science and Engineering at The Ohio State University. His current research focus lies at the intersection of High-Performance Computing (HPC) libraries and Deep Learning (DL) frameworks. He is working on parallelization and distribution strategies for large-scale Deep Neural Network (DNN) training. He previously worked on speech analysis, time series modeling, hyperparameter optimization, and object recognition. He actively contributes to projects like HiDL (high-performance deep learning), MVAPICH2-GDR software, and LBANN deep learning framework. He is a member of IEEE. More details about Arpan are available at https://u.osu.edu/jain.575.

Dr. Aamir Shafi is currently a Research Scientist in the Department of Computer Science & Engineering at the Ohio State University where he is involved in the High Performance Big Data project led by Dr. Dhabaleswar K. Panda. Dr. Shafi was a Fulbright Visiting Scholar at the Massachusetts Institute of Technology (MIT) in the 2010-2011 academic year where he worked with Prof. Charles Leiserson on the award-winning Cilk technology. Dr. Shafi received his PhD in Computer Science from the University of Portsmouth, UK in 2006. He got his Bachelors in Software Engineering degree from NUST, Pakistan in 2003. Dr. Shafi’s current research interests include architecting robust libraries and tools for Big Data computation with emphasis on Machine and Deep Learning applications. Dr. Shafi co-designed and co-developed a Java-based MPI-like library called MPJ Express. More details about Dr. Shafi are available from https://people.engineering.osu.edu/people/shafi.16.

Cookie	Duration	Description
cookielawinfo-checkbox-advertisement	1 year	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Advertisement".
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
PHPSESSID	session	This cookie is native to PHP applications. The cookie is used to store and identify a users' unique session ID for the purpose of managing user session on the website. The cookie is a session cookies and is deleted when all the browser windows are closed.
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Cookie	Duration	Description
_ga	2 years	This cookie is installed by Google Analytics. The cookie is used to calculate visitor, session, campaign data and keep track of site usage for the site's analytics report. The cookies store information anonymously and assign a randomly generated number to identify unique visitors.
_gat_gtag_UA_74880351_9	1 minute	This cookie is set by Google and is used to distinguish users.
_gid	1 day	This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the website is doing. The data collected including the number visitors, the source where they have come from, and the pages visted in an anonymous form.