Hao Shao 邵昊
I am a third-year PhD student in the Multimedia Laboratory at the
Chinese University of Hong Kong, supervised by Prof. Hongsheng Li and Prof. Xiaogang Wang.
Before that, I received my Master's degree from Tsinghua University in 2022,
and my Bachelor's degree from the University of Electronic Science and Technology of China
in 2019.
My research interests lie in generative models and autonomous driving.
Specifically, I'm interested in multi-modal large language models, end-to-end autonomous driving, video generation, and motion prediction.
Email / Google Scholar / GitHub
|
Mount Gongga, Sichuan
|
|
Aug. 2022 - present, Department of Electronic Engineering,
the Chinese University of Hong Kong
PhD Student
|
|
Sept. 2019 - Jun. 2022, School of Software Engineering,
Tsinghua University
Master
|
|
Sept. 2015 - Jun. 2019, School of Software Engineering,
University of Electronic Science and Technology of China
Bachelor, GPA: 3.98/4
|
|
VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping
Hao Shao, Shulun Wang, Yang Zhou, Guanglu Song, Dailan He, Shuo Qin, Zhuofan Zong, Bingqi Ma, Yu Liu, Hongsheng Li
[Project Page],
[Paper],
[Code]
We propose a diffusion-based framework for video face swapping, featuring hybrid training, an AIDT dataset, and 3D reconstruction for superior identity preservation and temporal consistency.
|
|
SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction
Yang Zhou*, Hao Shao*, Letian Wang, Steven L. Waslander, Hongsheng Li, Yu Liu
arXiv, 2024,
[Paper],
[Code]
We propose SmartPretrain, a general and scalable self-supervised learning framework for motion prediction, designed to be both model-agnostic and dataset-agnostic.
|
|
Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
Hao Shao, Shengju Qian, Han Xiao, Guanglu Song, Zhuofan Zong, Letian Wang, Yu Liu, Hongsheng Li
NeurIPS (Spotlight), 2024,
[Project Page],
[Paper],
[Code]
We propose Visual CoT, including a new pipeline/dataset/benchmark that enhances the interpretability of MLLMs by incorporating visual Chain-of-Thought reasoning, optimizing for complex visual inputs.
|
|
LMDrive: Closed-Loop End-to-End Driving with Large Language Models
Hao Shao, Yuxuan Hu, Letian Wang, Guanglu Song, Steven L. Waslander, Yu Liu, Hongsheng Li
CVPR, 2024,
[Project Page],
[Paper],
[Code]
We propose a novel end-to-end, closed-loop, language-based autonomous driving framework, LMDrive, which interacts with the dynamic environment via multi-modal multi-view sensor data and natural language instructions.
|
|
SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction
Yang Zhou*, Hao Shao*, Letian Wang, Steven L. Waslander, Hongsheng Li, Yu Liu
CVPR, 2024,
[Paper],
[Code]
We introduce a novel scenario-adaptive refinement strategy to refine trajectory prediction with minimal additional computation.
|
|
Efficient Reinforcement Learning for Autonomous Driving with Parameterized Skills and Priors
Letian Wang, Jie Liu, Hao Shao, Wenshuo Wang, Ruobing Chen, Yu Liu, Steven L Waslander
RSS, 2023,
[Paper],
[Code]
We present ASAP-RL, an efficient reinforcement learning method that simultaneously leverages parameterized motion skills and expert priors so that autonomous vehicles can navigate complex, dense traffic.
|
|
ReasonNet: End-to-End Driving with Temporal and Global Reasoning
Hao Shao, Letian Wang, Ruobing Chen, Steven L Waslander, Hongsheng Li, Yu Liu
CVPR, 2023,
[Paper],
[Code]
We present ReasonNet, a novel end-to-end driving framework that extensively exploits both temporal and global information of the driving scene.
|
|
Safety-enhanced autonomous driving using interpretable sensor fusion transformer
Hao Shao, Letian Wang, Ruobing Chen, Hongsheng Li, Yu Liu
CoRL, 2022,
[Paper],
[Code]
We propose a safety-enhanced autonomous driving framework to fully process and fuse information from multi-modal multi-view sensors for achieving comprehensive scene understanding and adversarial event detection.
|
|
Blending anti-aliasing into vision transformer
Shengju Qian, Hao Shao, Yi Zhu, Mu Li, Jiaya Jia
NeurIPS, 2021,
[Paper]
We propose a plug-and-play Aliasing-Reduction Module (ARM) to alleviate the problem of aliasing in vision transformers.
|
|
Temporal interlacing network
Hao Shao, Shengju Qian, Yu Liu
AAAI, 2020,
[Paper],
[Code]
We present a simple yet powerful operator, the temporal interlacing network (TIN). TIN fuses spatial and temporal information by interlacing spatial representations from the past to the future, and vice versa.
|
|
Apr 2019 - Now,
XLab
Researcher (intern). Beijing, China
|
|
Sep 2018 - Apr 2019,
Computer Vision
Research intern. Shenzhen, China
|
|
Jul 2017 - May 2018,
Recommender System
Research intern. Beijing, China
|
- Postgraduate Scholarship, the Chinese University of Hong Kong, 2023 - present
- The First Prize, CARLA Autonomous Driving Challenge (Sensor track), 2022
- The First Prize, CVPR20 ActivityNet Challenge (Kinetics700 track and AVA track), 2020
- The First Prize, ICCV19 Multi-Moments in Time (MIT) Challenge, 2019
- Outstanding Graduate of UESTC, 2019
- National Scholarship, University of Electronic Science and Technology of China, 2017
- Conference Reviewer: CVPR, ICLR, NeurIPS, AISTATS, ICML
- Journal Reviewer: Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Intelligent Vehicles (IV)
- Teaching: ELEG5760: Machine Learning for Multimedia Applications, ENGG1130: Multivariable Calculus for Engineers, ELEG2310B: Principles of Communication Systems
- X-Temporal, Easily implement SOTA video understanding methods with PyTorch on multiple machines and GPUs
- Awesome End-to-End Autonomous Driving, Paper list about end-to-end autonomous driving
- DI-drive, Decision Intelligence Platform for Autonomous Driving simulation
- Fast Jieba, Fast Chinese word segmentation library, rewriting Jieba core functions (accumulated 138K downloads)
|