Hao Shao     邵昊

I am a third-year PhD student in the Multimedia Laboratory at the Chinese University of Hong Kong, supervised by Prof. Hongsheng Li and Prof. Xiaogang Wang.

Before that, I received my Master's degree from Tsinghua University in 2022 and my Bachelor's degree from the University of Electronic Science and Technology of China in 2019.

My research interests lie in generative models and autonomous driving. Specifically, I am interested in multi-modal large language models, end-to-end autonomous driving, video generation, and motion prediction.

Email / Google Scholar / GitHub

Mount Gongga, Sichuan
Education
CUHK

Aug. 2022 - Present, Department of Electronic Engineering, the Chinese University of Hong Kong

PhD Student

THU

Sept. 2019 - Jun. 2022, School of Software Engineering, Tsinghua University

Master

UESTC

Sept. 2015 - Jun. 2019, School of Software Engineering, University of Electronic Science and Technology of China

Bachelor, GPA: 3.98/4

Preprint

VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping

Hao Shao, Shulun Wang, Yang Zhou, Guanglu Song, Dailan He, Shuo Qin, Zhuofan Zong, Bingqi Ma, Yu Liu, Hongsheng Li
[Project Page], [Paper], [Code]

We propose a diffusion-based framework for video face swapping, featuring hybrid training, an AIDT dataset, and 3D reconstruction for superior identity preservation and temporal consistency.


SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction

Yang Zhou*, Hao Shao*, Letian Wang, Steven L. Waslander, Hongsheng Li, Yu Liu
arXiv, 2024, [Paper], [Code]

We propose SmartPretrain, a general and scalable self-supervised learning framework for motion prediction, designed to be both model-agnostic and dataset-agnostic.

Selected Publications

Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning

Hao Shao, Shengju Qian, Han Xiao, Guanglu Song, Zhuofan Zong, Letian Wang, Yu Liu, Hongsheng Li
NeurIPS (Spotlight), 2024, [Project Page], [Paper], [Code]

We propose Visual CoT, including a new pipeline/dataset/benchmark that enhances the interpretability of MLLMs by incorporating visual Chain-of-Thought reasoning, optimizing for complex visual inputs.


LMDrive: Closed-Loop End-to-End Driving with Large Language Models

Hao Shao, Yuxuan Hu, Letian Wang, Guanglu Song, Steven L. Waslander, Yu Liu, Hongsheng Li
CVPR, 2024, [Project Page], [Paper], [Code]

We propose a novel end-to-end, closed-loop, language-based autonomous driving framework, LMDrive, which interacts with the dynamic environment via multi-modal multi-view sensor data and natural language instructions.


SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction

Yang Zhou*, Hao Shao*, Letian Wang, Steven L. Waslander, Hongsheng Li, Yu Liu
CVPR, 2024, [Paper], [Code]

We introduce a novel scenario-adaptive refinement strategy to refine trajectory prediction with minimal additional computation.


Efficient Reinforcement Learning for Autonomous Driving with Parameterized Skills and Priors

Letian Wang, Jie Liu, Hao Shao, Wenshuo Wang, Ruobing Chen, Yu Liu, Steven L Waslander
RSS, 2023, [Paper], [Code]

We present ASAP-RL, an efficient reinforcement learning algorithm that simultaneously leverages parameterized motion skills and expert priors, enabling autonomous vehicles to navigate complex, dense traffic.


ReasonNet: End-to-End Driving with Temporal and Global Reasoning

Hao Shao, Letian Wang, Ruobing Chen, Steven L Waslander, Hongsheng Li, Yu Liu
CVPR, 2023, [Paper], [Code]

We present ReasonNet, a novel end-to-end driving framework that extensively exploits both temporal and global information of the driving scene.


Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer

Hao Shao, Letian Wang, Ruobing Chen, Hongsheng Li, Yu Liu
CoRL, 2022, [Paper], [Code]

We propose a safety-enhanced autonomous driving framework that processes and fuses information from multi-modal, multi-view sensors to achieve comprehensive scene understanding and adversarial event detection.


Blending Anti-Aliasing into Vision Transformer

Shengju Qian, Hao Shao, Yi Zhu, Mu Li, Jiaya Jia
NeurIPS, 2021, [Paper]

We propose a plug-and-play Aliasing-Reduction Module (ARM) to alleviate aliasing in vision transformers.


Temporal Interlacing Network

Hao Shao, Shengju Qian, Yu Liu
AAAI, 2020, [Paper], [Code]

We present a simple yet powerful operator, the temporal interlacing network (TIN). TIN fuses spatial and temporal information by interlacing spatial representations from the past to the future, and vice versa.

Industry Experience
SenseTime

Apr 2019 - Present, XLab

Researcher (intern). Beijing, China

Tencent

Sep 2018 - Apr 2019, Computer Vision

Research intern. Shenzhen, China

ByteDance

Jul 2017 - May 2018, Recommendation System

Research intern. Beijing, China

Honors and Awards

Service
  • Conference Reviewer: CVPR, ICLR, NeurIPS, AISTATS, ICML

  • Journal Reviewer: Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Intelligent Vehicles (IV)

  • Teaching: ELEG5760: Machine Learning for Multimedia Applications, ENGG1130: Multivariable Calculus for Engineers, ELEG2310B: Principles of Communication Systems


Projects
  • X-Temporal: Easily implement SOTA video understanding methods with PyTorch on multiple machines and GPUs

  • Awesome End-to-End Autonomous Driving: Paper list on end-to-end autonomous driving

  • DI-drive: Decision Intelligence Platform for Autonomous Driving simulation

  • Fast Jieba: Fast Chinese word segmentation library that rewrites the core functions of Jieba (138K downloads in total)