DeepLearn 2023 Winter
8th International School
on Deep Learning
Bournemouth, UK · January 16-20, 2023
Registration
Downloads
  • Call DeepLearn 2023 Winter
  • Poster DeepLearn 2023 Winter
  • Lecture Materials
  • Home
  • Schedule
  • Lecturers
  • News
  • Accommodation
  • Info
    • Travel from London to Bournemouth
    • Sponsoring
    • Code of conduct
    • Visa
    • Testimonials

Dhabaleswar K. Panda

Ohio State University

[intermediate] Exploiting High-performance Computing for Deep Learning: Why and How?

Summary

Recent advances in Deep Learning (DL) have led to many exciting challenges and opportunities for CS and AI researchers alike. Modern DL frameworks such as TensorFlow and PyTorch offer ease of use and the flexibility to train and deploy various types of Deep Neural Networks (DNNs). In this tutorial, we provide an overview of interesting trends in DNN design and of how cutting-edge hardware architectures and high-performance interconnects are playing a key role in moving the field forward. We also present an overview of different DNN architectures and DL frameworks. Most DL frameworks started with a single-node design; however, approaches to parallelizing DNN training are being actively explored, and the DL community has pursued different distributed training designs that exploit communication runtimes such as gRPC, MPI, and NCCL. We highlight new challenges and opportunities for communication runtimes to exploit high-performance CPU and GPU architectures to efficiently support large-scale distributed DNN training. We also highlight some of our co-design efforts to utilize MPI for large-scale DNN training on cutting-edge CPU and GPU architectures available on modern HPC clusters. Finally, we include hands-on exercises to enable attendees to gain first-hand experience of running distributed DNN training experiments on a modern GPU cluster.
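The data-parallel designs the summary alludes to rest on an allreduce collective: each worker computes gradients on its own data shard, and the communication runtime (MPI, NCCL, gRPC) sums and averages them so that every replica applies an identical update. A minimal pure-Python sketch of that averaging step follows; it simulates ranks with plain lists, and the function name `allreduce_average` is illustrative, not an API of any of the libraries mentioned above:

```python
def allreduce_average(per_rank_grads):
    """Simulate an allreduce (sum) followed by division by the number
    of ranks: every worker ends up holding the same averaged gradient
    vector, which keeps data-parallel replicas in sync."""
    num_ranks = len(per_rank_grads)
    dim = len(per_rank_grads[0])
    # Sum corresponding gradient components across all ranks.
    summed = [sum(g[i] for g in per_rank_grads) for i in range(dim)]
    averaged = [s / num_ranks for s in summed]
    # After the allreduce, each rank holds an identical copy of the result.
    return [list(averaged) for _ in range(num_ranks)]

# Four simulated workers, each with a local gradient from its own shard.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
result = allreduce_average(grads)  # every rank now holds [4.0, 5.0]
```

In practice, frameworks delegate this step to a library such as NCCL or an MPI implementation like MVAPICH2, overlapping the communication with backpropagation rather than running it as a separate phase.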

Syllabus

  • Introduction to Deep Learning (DL) and Its Applications
  • Overview of Execution Environments
  • Parallel and Distributed DNN Training
  • Latest Trends in HPC Technologies
  • Challenges in Exploiting HPC Technologies for DL
  • Solutions and Case Studies in Distributed DNN Training
  • Hands-on Exercises
  • Open Issues and Challenges
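The pipeline-parallel designs covered under "Parallel and Distributed DNN Training" (e.g. GPipe in reference [5]) split a model's layers across devices and keep the devices busy by streaming micro-batches through the stages. The sketch below computes the resulting forward-pass schedule; it is purely illustrative, and `gpipe_schedule` is a hypothetical helper, not part of any cited framework:

```python
def gpipe_schedule(num_stages, num_microbatches):
    """Return, per clock tick, the (stage, micro-batch) pairs that are
    active during the forward pass: stage s processes micro-batch m at
    tick s + m, so different stages overlap on different micro-batches."""
    ticks = num_stages + num_microbatches - 1
    schedule = []
    for t in range(ticks):
        active = [(s, t - s) for s in range(num_stages)
                  if 0 <= t - s < num_microbatches]
        schedule.append(active)
    return schedule

# 3 stages, 4 micro-batches: the pipeline fills, streams, then drains.
sched = gpipe_schedule(3, 4)
# At tick 2 all three stages are busy: [(0, 2), (1, 1), (2, 0)]
```

The "bubble" at the start and end of the schedule (ticks where some stages are idle) is exactly the overhead that micro-batching shrinks: the more micro-batches per mini-batch, the larger the fraction of ticks in which all stages are occupied.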

References

[1] A. Jain, A. Awan, A. Aljuhani, J. Hashmi, Q. Anthony, H. Subramoni, D. Panda, R. Machiraju, and A. Parwani, “GEMS: GPU-Enabled Memory-Aware Model-Parallelism System for Distributed DNN Training,” in SC20: International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pp. 621–635, IEEE Computer Society, 2020.

[2] Awan A.A., Jain A., Anthony Q., Subramoni H., Panda D.K. (2020) HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training with TensorFlow. In: Sadayappan P., Chamberlain B., Juckeland G., Ltaief H. (eds) High Performance Computing. ISC High Performance 2020. Lecture Notes in Computer Science, vol 12151. Springer, Cham. https://doi.org/10.1007/978-3-030-50743-5_5

[3] Ammar Ahmad Awan, Khaled Hamidouche, Jahanzeb Maqbool Hashmi, and Dhabaleswar K. Panda. 2017. S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters. SIGPLAN Not. 52, 8 (August 2017), 193–205. DOI:https://doi.org/10.1145/3155284.3018769

[4] A. Jain, A. A. Awan, Q. Anthony, H. Subramoni and D. K. Panda, “Performance Characterization of DNN Training using TensorFlow and PyTorch on Modern Clusters,” 2019 IEEE International Conference on Cluster Computing (CLUSTER), 2019, pp. 1-11, doi: 10.1109/CLUSTER.2019.8891042.

[5] Huang, Y., Cheng, Y., Bapna, A., Firat, O., Chen, D., Chen, M., … & Wu, Y. (2019). GPipe: Efficient training of giant neural networks using pipeline parallelism. Advances in Neural Information Processing Systems, 32, 103-112.

[6] Lu, W., Yan, G., Li, J., Gong, S., Han, Y., & Li, X. (2017, February). Flexflow: A flexible dataflow accelerator architecture for convolutional neural networks. In 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA) (pp. 553-564). IEEE.

[7] Wang, H., Potluri, S., Luo, M., Singh, A. K., Sur, S., & Panda, D. K. (2011). MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters. Computer Science-Research and Development, 26(3-4), 257.

Pre-requisites

There are no fixed prerequisites. Attendees with a general knowledge of HPC and networking will be able to follow and appreciate the tutorial, which is designed to introduce the topics in a smooth, progressive manner.

Short bios

Dr. Dhabaleswar K (DK) Panda is a Professor and University Distinguished Scholar of Computer Science and Engineering at the Ohio State University. He is also the Founder and CEO of X-ScaleSolutions, Inc. He has published over 500 papers in the area of high-end computing and networking. The MVAPICH2 (High Performance MPI and PGAS over InfiniBand, Omni-Path, iWARP and RoCE) libraries, designed and developed by his research group (http://mvapich.cse.ohio-state.edu), are currently being used by more than 3,200 organizations worldwide (in 89 countries). More than 1.45M downloads of this software have taken place from the project’s site. This software is empowering several InfiniBand clusters (including the 4th, 10th, 12th, 20th, and 31st ranked ones in the TOP500 list). Prof. Panda’s research group at OSU has been focusing on high-performance and scalable distributed training of popular deep learning frameworks (TensorFlow and PyTorch) using MPI-driven libraries. These enhanced versions are available from https://hidl.cse.ohio-state.edu. Multiple software libraries for Big Data processing and management (Spark, Hadoop, and Dask), designed and developed by the group under the High-Performance Big Data project (http://hibd.cse.ohio-state.edu), are also available. Dr. Panda is a Fellow of IEEE and a member of ACM. More details about Prof. Panda are available at http://www.cse.ohio-state.edu/~panda

Dr. Hari Subramoni received the Ph.D. degree in Computer Science from The Ohio State University, Columbus, OH, in 2013. He has been a research scientist in the Department of Computer Science and Engineering at the Ohio State University, USA, since September 2015. His current research interests include high performance interconnects and protocols, parallel computer architecture, network-based computing, exascale computing, network topology aware computing, QoS, power-aware LAN-WAN communication, fault tolerance, virtualization, big data and cloud computing. He has published over 50 papers in international journals and conferences related to these research areas. Recently, Dr. Subramoni has been doing research and working on the design and development of MVAPICH2, MVAPICH2-GDR, and MVAPICH2-X software packages. He is a member of IEEE. More details about Dr. Subramoni are available from http://www.cse.ohio-state.edu/~subramon.

Arpan Jain received his B.Tech. and M.Tech. degrees in Information Technology from ABV-IIITM, India. Currently, Arpan is working towards his Ph.D. degree in Computer Science and Engineering at The Ohio State University. His current research focus lies at the intersection of High-Performance Computing (HPC) libraries and Deep Learning (DL) frameworks. He is working on parallelization and distribution strategies for large-scale Deep Neural Network (DNN) training. He previously worked on speech analysis, time series modeling, hyperparameter optimization, and object recognition. He actively contributes to projects like HiDL (high-performance deep learning), MVAPICH2-GDR software, and LBANN deep learning framework. He is a member of IEEE. More details about Arpan are available at https://u.osu.edu/jain.575.

Dr. Aamir Shafi is currently a Research Scientist in the Department of Computer Science & Engineering at the Ohio State University where he is involved in the High Performance Big Data project led by Dr. Dhabaleswar K. Panda. Dr. Shafi was a Fulbright Visiting Scholar at the Massachusetts Institute of Technology (MIT) in the 2010-2011 academic year where he worked with Prof. Charles Leiserson on the award-winning Cilk technology. Dr. Shafi received his PhD in Computer Science from the University of Portsmouth, UK in 2006. He got his Bachelors in Software Engineering degree from NUST, Pakistan in 2003. Dr. Shafi’s current research interests include architecting robust libraries and tools for Big Data computation with emphasis on Machine and Deep Learning applications. Dr. Shafi co-designed and co-developed a Java-based MPI-like library called MPJ Express. More details about Dr. Shafi are available from https://people.engineering.osu.edu/people/shafi.16.

Other Courses

  • Yi Ma
  • Daphna Weinshall
  • Eric P. Xing
  • Matias Carrasco Kind
  • Nitesh Chawla
  • Sumit Chopra
  • Luc De Raedt
  • Marco Duarte
  • João Gama
  • Claus Horn
  • Zhiting Hu & Eric P. Xing
  • Nathalie Japkowicz
  • Gregor Kasieczka
  • Karen Livescu
  • David McAllester
  • Fabio Roli
  • Bracha Shapira
  • Kunal Talwar
  • Tinne Tuytelaars
  • Lyle Ungar
  • Bram van Ginneken
  • Yu-Dong Zhang


CO-ORGANIZERS

Bournemouth University
Department of Computing and Informatics

Universitat Rovira i Virgili, Tarragona

Institute for Research Development, Training and Advice – IRDTA, Brussels/London

© IRDTA 2022. All Rights Reserved.