
Fenglong Ma & Cao (Danica) Xiao
[introductory/intermediate] Transforming Healthcare and Drug Development through Multimodal AI with LLMs and Generative AI Technologies
Summary
In this tutorial, we will examine how researchers and practitioners apply large language models (LLMs) and generative AI to diverse healthcare data modalities, including text, electronic health records, medical imaging, clinical trial protocols, and molecular structures, to drive breakthroughs in patient care, disease diagnosis, and drug development. The tutorial will cover recent advances in multimodal data synthesis, fusion, pretraining, summarization, and privacy-preserving methods tailored to healthcare applications. Finally, we will discuss open challenges and future research directions in the rapidly evolving landscape of AI in healthcare.
Syllabus
- Introduction to multimodal healthcare data and key tasks in healthcare and drug development.
- Applications of LLMs and GenAI in healthcare, including patient care, clinical decision support, and hospital operational efficiency.
- Applications of LLMs and GenAI in drug development, including clinical trial design, planning, and patient recruitment.
- Overcoming challenges in data availability, privacy, and model trustworthiness: leveraging multimodal data generation and privacy-preserving learning techniques.
- Open challenges and opportunities.
References
Pengcheng Jiang, Lang Cao, Cao Xiao, Parminder Bhatia, Jimeng Sun, Jiawei Han. Knowledge Graph Fine-Tuning Upon Open-World Knowledge from Large Language Models. NeurIPS 2024.
Yuan Zhong, Xiaochen Wang, Jiaqi Wang, Xiaokun Zhang, Yaqing Wang, Mengdi Huai, Cao Xiao, Fenglong Ma. Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models. KDD 2024.
Xiaochen Wang, Junyu Luo, Jiaqi Wang, Yuan Zhong, Xiaokun Zhang, Yaqing Wang, Parminder Bhatia, Cao Xiao, Fenglong Ma. Unity in Diversity: Collaborative Pre-training Across Multimodal Medical Sources. ACL 2024.
Pengcheng Jiang, Cao Xiao, Zifeng Wang, Parminder Bhatia, Jimeng Sun, Jiawei Han. TriSum: Learning Summarization Ability from Large Language Models. NAACL 2024.
Pengcheng Jiang, Cao Xiao, Adam Cross, Jimeng Sun. GraphCare: Enhancing Healthcare Predictions with Personalized Knowledge Graphs. ICLR 2024.
Zifeng Wang, Chufan Gao, Cao Xiao, Jimeng Sun. MediTab: Scaling Medical Tabular Data Predictors via Data Consolidation, Enrichment, and Refinement. IJCAI 2024.
Brandon Theodorou, Cao Xiao, Jimeng Sun. ConSequence: Synthesizing Sequences for Electronic Health Record Generation. AAAI 2024.
Zifeng Wang, Cao Xiao, Jimeng Sun. AutoTrial: Prompting Language Models for Clinical Trial Design. EMNLP 2023.
Brandon Theodorou, Cao Xiao, Jimeng Sun. Synthesize high-dimensional longitudinal electronic health records via hierarchical autoregressive language model. Nature Communications, 2023.
Zifeng Wang, Brandon Theodorou, Tianfan Fu, Cao Xiao, Jimeng Sun. PyTrial: Machine Learning Software and Benchmark for Clinical Trial Applications. arXiv preprint (https://arxiv.org/abs/2306.04018).
Jintai Chen, Yaojun Hu, Yue Wang, Yingzhou Lu, Xu Cao, Miao Lin, Hongxia Xu, Jian Wu, Cao Xiao, Jimeng Sun, Lucas Glass, Kexin Huang, Marinka Zitnik, Tianfan Fu. TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets. arXiv preprint (https://arxiv.org/abs/2407.00631).
Pre-requisites
A basic understanding of machine learning principles, including neural networks and language models, is recommended. Some familiarity with healthcare data is also beneficial.
Short bio
Dr. Fenglong Ma is an Assistant Professor in the College of Information Sciences and Technology and a faculty co-hire at the Institute for Computational and Data Sciences at Pennsylvania State University, where he has served since August 2019. He earned his Ph.D. in Computer Science and Engineering from the State University of New York at Buffalo in 2019. Dr. Ma’s research interests encompass data mining and machine learning, with a focus on healthcare data mining, safety in machine learning, natural language processing, and multimodal learning. He has published approximately 150 papers in top-tier venues, including NeurIPS, ICML, KDD, ACL, EMNLP, AAAI, and IJCAI. His awards include the NSF CAREER Award, the Sony Research Award, the PSU IST Junior Faculty Excellence in Research Award, the PSU IST George J. McMurtry Junior Faculty Excellence in Teaching and Learning Award, and the UB CSE 2019 Best Ph.D. Dissertation Award. He has also been recognized as an AI 2000 Most Influential Scholar Honorable Mention in Data Mining (2022–2024) and as one of the 2022 Global Top 50 Chinese Rising Stars in Data Mining.