Title
Cloud-Edge Computing for Machine Learning and Large Models
Abstract
The rise of large foundation models and generative AI has introduced unprecedented demands on computing power, particularly for model pre-training and real-time inference. Cloud infrastructure and cloud-edge services play a pivotal role in supporting machine learning development, enabling an optimal balance between training efficiency and inference performance. In this talk, we will explore key challenges in resource management and model compression for cloud-native data centers tailored to large-scale AI models. We will present recent research advances that enhance the efficiency of large model deployment and discuss their implications for real-world AI applications. Finally, we will share insights on the future evolution of large model operating systems and their role in advancing AI infrastructure.
Bio
Dr. Cheng-Zhong Xu is a Chair Professor of Computer Science and the Dean of the Faculty of Science and Technology at the University of Macau. He served as Chief Scientist for key national projects on “Internet of Things for Smart City” (Ministry of Science and Technology of China) and “Intelligent Driving” (Macau SAR, China). He was also Director of the Institute of Advanced Computing and Digital Engineering at the Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences. Before these roles, he spent over 18 years as a faculty member at Wayne State University, USA. Dr. Xu's research focuses on parallel and distributed systems, cloud computing, intelligent driving, and smart city applications. He has published over 600 papers and holds more than 150 patents. His work has garnered over 23,000 citations and has been cited in 340+ international patents, including 240 U.S. patents. Dr. Xu chaired the IEEE Technical Committee on Distributed Processing from 2014 to 2020. He earned his B.S. and M.S. in Computer Science from Nanjing University and his Ph.D. from the University of Hong Kong in 1993. He is an IEEE Fellow, recognized for contributions to resource management in parallel and distributed systems.