Hao Shao 邵昊

I am a third-year PhD student in the Multimedia Laboratory at the Chinese University of Hong Kong, supervised by Prof. Hongsheng Li and Prof. Xiaogang Wang. Before that, I received my Master's degree from Tsinghua University in 2022 and my Bachelor's degree from the University of Electronic Science and Technology of China in 2019.

My research interests lie in generative models and autonomous driving. In particular, I am interested in multi-modal large language models, video generation, and end-to-end autonomous driving. Please email me if you have any questions or would like to collaborate.

I will be on the job market in 2026. Please feel free to reach out if you have openings in industry or academia.

Email / Google Scholar / GitHub

Education

Aug. 2022 - present, Department of Electronic Engineering, the Chinese University of Hong Kong

PhD Student

Sept. 2019 - Jun. 2022, School of Software Engineering, Tsinghua University

Master

Sept. 2015 - Jun. 2019, School of Software Engineering, University of Electronic Science and Technology of China

Bachelor, GPA: 3.98/4

Preprints

VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping

Hao Shao, Shulun Wang, Yang Zhou, Guanglu Song, Dailan He, Shuo Qin, Zhuofan Zong, Bingqi Ma, Yu Liu, Hongsheng Li
[Project Page], [Paper], [Code]

We propose a diffusion-based framework for video face swapping, featuring hybrid training, an AIDT dataset, and 3D reconstruction for superior identity preservation and temporal consistency.

EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM

Zhuofan Zong, Dongzhi Jiang, Bingqi Ma, Guanglu Song, Hao Shao, Dazhong Shen, Yu Liu, Hongsheng Li
[Project Page], [Paper], [Code]

We introduce a novel plug-and-play adaptation method that enables diffusion models to be conditioned on multiple reference images and a text prompt.

Selected Publications

Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning

Hao Shao, Shengju Qian, Han Xiao, Guanglu Song, Zhuofan Zong, Letian Wang, Yu Liu, Hongsheng Li
NeurIPS, 2024 (Spotlight), [Project Page], [Paper], [Code]

We propose Visual CoT, comprising a new pipeline, dataset, and benchmark that enhance the interpretability of MLLMs by incorporating visual chain-of-thought reasoning, targeting complex visual inputs.

MoVA: Adapting Mixture of Vision Experts to Multimodal Context

Zhuofan Zong, Bingqi Ma, Dazhong Shen, Guanglu Song, Hao Shao, Dongzhi Jiang, Hongsheng Li, Yu Liu
NeurIPS, 2024, [Project Page], [Paper], [Code]

MoVA is a novel MLLM that adaptively routes and fuses multiple task-specific vision experts with a coarse-to-fine mechanism, alleviating the bias of the CLIP vision encoder.

SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction

Yang Zhou*, Hao Shao*, Letian Wang, Steven L. Waslander, Hongsheng Li, Yu Liu
ICLR, 2025, [Paper], [Code]

We propose SmartPretrain, a general and scalable self-supervised learning framework for motion prediction, designed to be both model-agnostic and dataset-agnostic.

LMDrive: Closed-Loop End-to-End Driving with Large Language Models

Hao Shao, Yuxuan Hu, Letian Wang, Guanglu Song, Steven L. Waslander, Yu Liu, Hongsheng Li
CVPR, 2024, [Project Page], [Paper], [Code]

We propose a novel end-to-end, closed-loop, language-based autonomous driving framework, LMDrive, which interacts with the dynamic environment via multi-modal multi-view sensor data and natural language instructions.

SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction

Yang Zhou*, Hao Shao*, Letian Wang, Steven L. Waslander, Hongsheng Li, Yu Liu
CVPR, 2024, [Paper], [Code]

We introduce a novel scenario-adaptive refinement strategy to refine trajectory prediction with minimal additional computation.

ReasonNet: End-to-End Driving with Temporal and Global Reasoning

Hao Shao, Letian Wang, Ruobing Chen, Steven L. Waslander, Hongsheng Li, Yu Liu
CVPR, 2023, [Paper], [Code]

We present ReasonNet, a novel end-to-end driving framework that extensively exploits both temporal and global information of the driving scene.

Efficient Reinforcement Learning for Autonomous Driving with Parameterized Skills and Priors

Letian Wang, Jie Liu, Hao Shao, Wenshuo Wang, Ruobing Chen, Yu Liu, Steven L. Waslander
RSS, 2023, [Paper], [Code]

We present an efficient reinforcement learning method (ASAP-RL) that simultaneously leverages parameterized motion skills and expert priors, enabling autonomous vehicles to navigate complex, dense traffic.

Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer

Hao Shao, Letian Wang, Ruobing Chen, Hongsheng Li, Yu Liu
CoRL, 2022, [Paper], [Code]

We propose a safety-enhanced autonomous driving framework to fully process and fuse information from multi-modal multi-view sensors for achieving comprehensive scene understanding and adversarial event detection.

Blending Anti-Aliasing into Vision Transformer

Shengju Qian, Hao Shao, Yi Zhu, Mu Li, Jiaya Jia
NeurIPS, 2021, [Paper]

We propose a plug-and-play Aliasing-Reduction Module (ARM) to alleviate the problem of aliasing in vision transformers.

Temporal Interlacing Network

Hao Shao, Shengju Qian, Yu Liu
AAAI, 2020, [Paper], [Code]

We present a simple yet powerful operator, the temporal interlacing network (TIN), which fuses spatial and temporal information by interlacing spatial representations from the past to the future, and vice versa.

Industry Experience

Apr. 2019 - present, XLab

Researcher (intern). Beijing, China

Sept. 2018 - Apr. 2019, Computer Vision

Research intern. Shenzhen, China

Jul. 2017 - May 2018, Recommender System

Research intern. Beijing, China

Honors and Awards

Postgraduate Scholarship, the Chinese University of Hong Kong, 2023 - present
The First Prize, CARLA Autonomous Driving Challenge (Sensor track), 2022
The First Prize, CVPR20 ActivityNet Challenge (Kinetics-700 track and AVA track), 2020
The First Prize, ICCV19 Multi-Moments in Time (MIT) Challenge, 2019
Outstanding Graduate of UESTC, 2019
National Scholarship, University of Electronic Science and Technology of China, 2017

Service

Conference Reviewer: CVPR, ICLR, NeurIPS, AISTATS, ICML, ICCV
Journal Reviewer: Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Transactions on Multimedia (TMM), Transactions on Circuits and Systems for Video Technology (TCSVT), Intelligent Vehicles (IV)
Teaching: ELEG5760: Machine Learning for Multimedia Applications, ENGG1130: Multivariable Calculus for Engineers, ELEG2310B: Principles of Communication Systems, ELEG5491: Introduction to Deep Learning

Projects

X-Temporal, a codebase for easily implementing state-of-the-art video understanding methods with PyTorch across multiple machines and GPUs
Awesome End-to-End Autonomous Driving, a paper list on end-to-end autonomous driving
DI-drive, a decision intelligence platform for autonomous driving simulation
Fast Jieba, a fast Chinese word segmentation library that rewrites Jieba's core functions (300K cumulative downloads)