DeepLearn 2026
13th International School on Deep Learning
Orléans, France · July 20-24, 2026
Jianfei Chen

Tsinghua University

Jianfei Chen (Tsinghua University), [intermediate] Efficient Large Model Training and Inference

Summary

Large language models have become extraordinarily expensive to train and serve, putting frontier research seemingly out of reach for academic groups and small teams. Yet recent systems such as DeepSeek demonstrate that careful co-design of model architecture, learning algorithms, and GPU kernels — guided by an awareness of the underlying hardware — can deliver order-of-magnitude gains in efficiency without sacrificing capability. This tutorial uses DeepSeek as a running case study to unpack the principles and practice behind state-of-the-art efficient machine learning.

Participants will move from first principles — the GPU performance model and the arithmetic of a transformer forward pass — through the modern toolbox of efficient attention, mixture-of-experts, low-precision computation, and structured sparsity. Each topic is approached both as an idea (why it works, what it costs) and as an implementation (how to write, profile, and validate it in Triton or PyTorch). By the end of the three sessions, participants will be able to explain what makes DeepSeek-class models efficient and will have the conceptual and practical tools to apply the same recipes to their own research under limited academic compute budgets.

Syllabus

Lecture 1 — Foundations: Transformers, GPU Performance Models, and Triton

  • The economics of large model training and the academic compute gap
  • Transformer architecture revisited from a systems perspective: FLOPs, memory traffic, and activation footprint
  • The GPU performance model: roofline analysis, arithmetic intensity, memory hierarchy, tensor cores (a worked roofline example follows this list)
  • Profiling a real model: where the time and memory actually go
  • Custom kernels in Triton: a hands-on walkthrough from element-wise ops to fused matmul (a minimal kernel sketch appears below)
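
As a taste of the roofline analysis above, the arithmetic below estimates whether a single transformer matmul is compute- or memory-bound. The matrix shape and the hardware numbers are illustrative assumptions for this sketch, not a claim about any particular GPU:

    # FLOPs and memory traffic for one dense matmul C = A @ B,
    # with A: (M, K), B: (K, N), everything stored in BF16 (2 bytes/element).
    M, K, N = 4096, 4096, 4096
    flops = 2 * M * K * N                      # each output element costs K multiply-adds
    bytes_moved = 2 * (M * K + K * N + M * N)  # read A and B, write C
    intensity = flops / bytes_moved            # arithmetic intensity, FLOP per byte

    # Hypothetical accelerator: 1e15 BF16 FLOP/s peak, 2e12 B/s memory bandwidth.
    peak_flops, peak_bw = 1e15, 2e12
    ridge = peak_flops / peak_bw               # intensity where compute time == memory time
    attainable = min(peak_flops, intensity * peak_bw)
    print(f"intensity = {intensity:.0f} FLOP/B, ridge = {ridge:.0f} FLOP/B")
    print(f"attainable throughput = {attainable / 1e12:.0f} TFLOP/s")

With these numbers the matmul sits well to the right of the ridge point (about 1365 vs. 500 FLOP/B), i.e. it is compute-bound; the same arithmetic applied to attention during decoding typically lands far to the left, which is what motivates the IO-aware kernels of Lecture 2.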
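
As a preview of the Triton walkthrough, here is the canonical element-wise starting point, a vector add. This is a generic sketch assuming a CUDA-capable GPU with the triton package installed, not the lecture’s actual material:

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        pid = tl.program_id(axis=0)                    # one program instance per block
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements                    # guard the ragged final block
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        out = torch.empty_like(x)
        n = x.numel()
        grid = (triton.cdiv(n, 1024),)                 # enough programs to cover n elements
        add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
        return out

    x = torch.randn(1 << 20, device="cuda")
    y = torch.randn(1 << 20, device="cuda")
    assert torch.allclose(add(x, y), x + y)

The same load/compute/store pattern, extended with tiling and an online softmax, is essentially what fused attention kernels are built from.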

Lecture 2 — Efficient Attention and Mixture-of-Experts

  • Why attention is the bottleneck: quadratic cost, KV-cache, and the long-context regime (see the KV-cache arithmetic after this list)
  • IO-aware attention: FlashAttention and its descendants
  • Sparse and linear attention families; native sparse attention
  • Multi-head Latent Attention (MLA) as used in DeepSeek-V2/V3
  • Mixture-of-Experts: routing, load balancing, expert parallelism, and DeepSeek’s MoE design (a minimal router sketch follows below)
  • Putting it together: how attention + MoE choices reshape the training and serving budget
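
To see why the KV-cache dominates long-context serving, a back-of-the-envelope calculation helps. The shape below is a hypothetical Llama-style grouped-query configuration, chosen only to make the numbers concrete:

    # KV-cache bytes = 2 (K and V) x layers x seq_len x kv_heads x head_dim x bytes/elt
    layers, kv_heads, head_dim = 32, 8, 128   # hypothetical GQA model
    seq_len, bytes_per_elt = 128_000, 2       # 128k-token context, BF16
    kv_bytes = 2 * layers * seq_len * kv_heads * head_dim * bytes_per_elt
    print(f"KV cache: {kv_bytes / 2**30:.1f} GiB per sequence")  # ~15.6 GiB

At roughly 15.6 GiB per sequence, a handful of concurrent long-context requests exhausts an 80 GB accelerator. Multi-head Latent Attention attacks exactly this term by caching one small latent vector per token instead of full per-head keys and values.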
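
The routing idea itself fits in a few lines of PyTorch. The sketch below is a generic top-2 softmax router with a dense dispatch loop, intended only for understanding; it deliberately omits the load balancing, shared experts, and expert parallelism that DeepSeek’s MoE design layers on top:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TopKMoE(nn.Module):
        """Minimal mixture-of-experts layer: top-k softmax routing, no load balancing."""
        def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, k: int = 2):
            super().__init__()
            self.gate = nn.Linear(d_model, n_experts, bias=False)
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            )
            self.k = k

        def forward(self, x):                         # x: (tokens, d_model)
            scores = F.softmax(self.gate(x), dim=-1)  # routing probabilities
            weights, idx = scores.topk(self.k, dim=-1)
            weights = weights / weights.sum(-1, keepdim=True)  # renormalize over chosen experts
            out = torch.zeros_like(x)
            for e, expert in enumerate(self.experts):  # dense loop; real systems dispatch sparsely
                hits = (idx == e)                      # (tokens, k): where expert e was chosen
                if hits.any():
                    rows = hits.any(-1)
                    w = (weights * hits).sum(-1, keepdim=True)[rows]
                    out[rows] += w * expert(x[rows])
            return out

The point of the exercise: each token activates only k of n_experts expert FFNs, so per-token FLOPs scale as k / n_experts of an equally parameterized dense layer, which is the lever behind MoE’s training and serving savings.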

Lecture 3 — Quantization and Sparsity

  • Numerical formats for deep learning: FP16, BF16, FP8, INT8, INT4, microscaling formats
  • Post-training quantization vs. quantization-aware training; outlier handling (a PTQ sketch follows this list)
  • Low-precision training: FP8 training as deployed in DeepSeek-V3
  • Activation, weight, and gradient sparsity; structured vs. unstructured patterns (a 2:4 sketch appears below)
  • SageAttention and other low-precision attention kernels
  • Wrapping up: a checklist for “what would I co-design if I were building my own DeepSeek?”
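
A minimal post-training quantization scheme, symmetric per-tensor INT8 with absmax scaling, makes the quantization bullets concrete. This is a teaching sketch; practical PTQ uses per-channel or per-group scales and handles outliers explicitly (SmoothQuant, for instance, migrates activation outliers into the weights):

    import torch

    def quantize_int8(x: torch.Tensor):
        """Symmetric per-tensor absmax quantization to INT8."""
        scale = x.abs().max() / 127.0   # one scale for the whole tensor
        q = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
        return q, scale

    def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
        return q.float() * scale

    w = torch.randn(4096, 4096)
    q, s = quantize_int8(w)
    err = (dequantize(q, s) - w).abs().mean()
    print(f"mean abs rounding error: {err.item():.5f} (scale = {s.item():.5f})")

The experiment also shows why outlier handling matters: one unusually large entry inflates the absmax scale, so every other value is rounded more coarsely.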
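
For structured sparsity, the hardware-friendly 2:4 pattern (Mishra et al., 2021) keeps the two largest-magnitude weights in every contiguous group of four, which sparse tensor cores can then exploit. Below is a one-shot magnitude-pruning sketch, without the retraining a real pipeline would add:

    import torch

    def prune_2_4(w: torch.Tensor) -> torch.Tensor:
        """Zero the 2 smallest-magnitude weights in each contiguous group of 4."""
        flat = w.reshape(-1, 4)                       # assumes numel divisible by 4
        idx = flat.abs().topk(2, dim=-1).indices      # keep the top-2 per group
        mask = torch.zeros_like(flat, dtype=torch.bool).scatter_(1, idx, True)
        return (flat * mask).reshape(w.shape)

    w = torch.randn(8, 16)
    ws = prune_2_4(w)
    assert (ws.reshape(-1, 4) != 0).sum(-1).max() <= 2  # at most 2 nonzeros per group

On GPUs whose tensor cores support the 2:4 format, this pattern can roughly double matmul throughput while halving weight storage, at the cost of whatever accuracy the pruning removes.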

References

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (NeurIPS).

Grattafiori, A., et al. (2024). The Llama 3 herd of models. arXiv:2407.21783.

DeepSeek-AI. (2024). DeepSeek-V2: A strong, economical, and efficient mixture-of-experts language model. arXiv:2405.04434.

DeepSeek-AI. (2024). DeepSeek-V3 technical report. arXiv:2412.19437.

DeepSeek-AI. (2025). DeepSeek-V3.2-Exp: Boosting long-context efficiency with DeepSeek sparse attention. https://aarnphm.xyz/thoughts/papers/DeepSeek_V3_2.pdf.

Dao, T., Fu, D. Y., Ermon, S., Rudra, A., & Ré, C. (2022). FlashAttention: Fast and memory-efficient exact attention with IO-awareness. In Advances in Neural Information Processing Systems (NeurIPS).

Shah, J., Bikshandi, G., Zhang, Y., Thakkar, V., Ramani, P., & Dao, T. (2024). FlashAttention-3: Fast and accurate attention with asynchrony and low-precision. In Advances in Neural Information Processing Systems (NeurIPS).

Yuan, J., et al. (2025). Native sparse attention: Hardware-aligned and natively trainable sparse attention. arXiv:2502.11089.

Lu, E., et al. (2025). MoBA: Mixture of block attention for long-context LLMs. arXiv:2502.13189.

Xiao, G., Tian, Y., Chen, B., Han, S., & Lewis, M. (2024). Efficient streaming language models with attention sinks. In International Conference on Learning Representations (ICLR).

Xiao, G., Lin, J., Seznec, M., Wu, H., Demouth, J., & Han, S. (2023). SmoothQuant: Accurate and efficient post-training quantization for large language models. In International Conference on Machine Learning (ICML).

Mishra, A., Latorre, J. A., Pool, J., Stosic, D., Stosic, D., Venkatesh, G., Yu, C., & Micikevicius, P. (2021). Accelerating sparse deep neural networks. arXiv:2104.08378.

Sun, M., Liu, Z., Bair, A., & Kolter, J. Z. (2024). A simple and effective pruning approach for large language models. In International Conference on Learning Representations (ICLR).

Zhang, J., Huang, H., Zhang, P., Xu, J., & Chen, J. (2024). SageAttention: Accurate 8-bit attention for plug-and-play inference acceleration. In Advances in Neural Information Processing Systems (NeurIPS).

Pre-requisites

A working knowledge of deep learning and the transformer architecture (the level of a graduate ML course, or having trained or fine-tuned an LLM at least once). Familiarity with PyTorch is assumed. Prior exposure to GPU programming, CUDA, or Triton is helpful but not required; Lecture 1 introduces the necessary systems background from scratch.

Short bio

Jianfei Chen is an Associate Professor in the Department of Computer Science at Tsinghua University. His research focuses on efficient machine learning, with contributions across efficient training and inference algorithms, low-precision computation, and accelerated sampling for generative models. He has open-sourced several widely adopted projects — including DPM-Solver, SageAttention, and TurboDiffusion — which together have accumulated 10K+ GitHub stars and are deployed in many large-scale commercial generative models. His work has been recognized at top machine learning venues (NeurIPS, ICML, ICLR) and underpins production systems used by millions of users.

Other Courses

  • Yingbin Liang
  • Le Song
  • Nitesh Chawla
  • Yuejie Chi
  • Bo Han
  • Jiawei Han
  • Mingyi Hong
  • Cho-Jui Hsieh
  • Furong Huang
  • Tara Javidi
  • Yan Liu
  • Zhijin Qin
  • Aarti Singh
  • Suvrit Sra
  • Ivor Tsang
  • Ming-Hsuan Yang
  • Tong Zhang

Co-organizers

Université d’Orléans

Collège Doctoral Centre-Val de Loire

Institute for Research Development, Training and Advice – IRDTA, Luxembourg/London
