Hao Shao 邵昊
I am a third-year PhD student in the Multimedia Laboratory at the
Chinese University of Hong Kong, supervised by Prof. Hongsheng Li and Prof. Xiaogang Wang.
Before that, I received my Master's degree from Tsinghua University in 2022,
and my Bachelor's degree from the University of Electronic Science and Technology of China
in 2019.
My research interests lie in generative models and autonomous driving.
Specifically, I'm interested in multi-modal large language models, end-to-end autonomous driving, video generation, and motion prediction.
Email /
Google Scholar /
GitHub
|
|
|
Aug. 2022 - present, Department of Electronic Engineering,
the Chinese University of Hong Kong
PhD Student
|
|
Sept. 2019 - Jun. 2022, School of Software Engineering,
Tsinghua University
Master
|
|
Sept. 2015 - Jun. 2019, School of Software Engineering,
University of Electronic Science and Technology of China
Bachelor, GPA: 3.98/4
|
|
VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping
Hao Shao, Shulun Wang, Yang Zhou, Guanglu Song, Dailan He, Shuo Qin, Zhuofan Zong, Bingqi Ma, Yu Liu, Hongsheng Li
[Project Page],
[Paper],
[Code]
We propose a diffusion-based framework for video face swapping, featuring hybrid training, an AIDT dataset, and 3D reconstruction for superior identity preservation and temporal consistency.
|
|
Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
Hao Shao, Shengju Qian, Han Xiao, Guanglu Song, Zhuofan Zong, Letian Wang, Yu Liu, Hongsheng Li
NeurIPS (Spotlight), 2024,
[Project Page],
[Paper],
[Code]
We propose Visual CoT, comprising a new pipeline, dataset, and benchmark that improve the interpretability of MLLMs by incorporating visual chain-of-thought reasoning for complex visual inputs.
|
|
MoVA: Adapting Mixture of Vision Experts to Multimodal Context
Zhuofan Zong, Bingqi Ma, Dazhong Shen, Guanglu Song, Hao Shao, Dongzhi Jiang, Hongsheng Li, Yu Liu
NeurIPS, 2024,
[Project Page],
[Paper],
[Code]
MoVA is a novel MLLM that adaptively routes and fuses multiple task-specific vision experts through a coarse-to-fine mechanism, alleviating the bias of the CLIP vision encoder. Without bells and whistles, MoVA achieves significant performance gains over current state-of-the-art methods.
|
|
SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction
Yang Zhou*, Hao Shao*, Letian Wang, Steven L. Waslander, Hongsheng Li, Yu Liu
ICLR, 2025,
[Paper],
[Code]
We propose SmartPretrain, a general and scalable self-supervised learning framework for motion prediction, designed to be both model-agnostic and dataset-agnostic.
|
|
LMDrive: Closed-Loop End-to-End Driving with Large Language Models
Hao Shao, Yuxuan Hu, Letian Wang, Guanglu Song, Steven L. Waslander, Yu Liu, Hongsheng Li
CVPR, 2024,
[Project Page],
[Paper],
[Code]
We propose a novel end-to-end, closed-loop, language-based autonomous driving framework, LMDrive, which interacts with the dynamic environment via multi-modal multi-view sensor data and natural language instructions.
|
|
Temporal Interlacing Network
Hao Shao, Shengju Qian, Yu Liu
AAAI, 2020,
[Paper],
[Code]
We present a simple yet powerful operator, the temporal interlacing network (TIN). TIN fuses spatial and temporal information by interlacing spatial representations from the past to the future, and vice versa.
|
|
Apr 2019 - Now,
XLab
Researcher (intern). Beijing, China
|
|
Sep 2018 - Apr 2019,
Computer Vision
Research intern. Shenzhen, China
|
|
Jul 2017 - May 2018,
Recommendation System
Research intern. Beijing, China
|
- Postgraduate Scholarship, the Chinese University of Hong Kong, 2023 - now
- The First Prize, CARLA Autonomous Driving Challenge (Sensor track), 2022
- The First Prize, CVPR20 ActivityNet Challenge (Kinetics700 track and AVA track), 2020
- The First Prize, ICCV19 Multi-Moments in Time (MIT) Challenge, 2019
- Outstanding Graduate of UESTC, 2019
- National Scholarship, University of Electronic Science and Technology of China, 2017
- Conference Reviewer: CVPR, ICLR, NeurIPS, AISTATS, ICML
- Journal Reviewer: Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Transactions on Multimedia (TMM), Transactions on Circuits and Systems for Video Technology (TCSVT), Intelligent Vehicles (IV)
- Teaching: ELEG5760: Machine Learning for Multimedia Applications, ENGG1130: Multivariable Calculus for Engineers, ELEG2310B: Principles of Communication Systems, ELEG5491: Introduction to Deep Learning
- X-Temporal, Easily implement SOTA video understanding methods with PyTorch on multiple machines and GPUs
- Awesome End-to-End Autonomous Driving, Paper list about end-to-end autonomous driving
- DI-drive, Decision Intelligence Platform for Autonomous Driving simulation
- Fast Jieba, Fast Chinese word segmentation library, rewriting Jieba core functions (accumulated 138K downloads)
|