Shuo He


E-mail: shuohe123@gmail.com
[Google Scholar] [Twitter] [GitHub]



About Me

I am currently a Postdoctoral Fellow at Nanyang Technological University (NTU), Singapore. I earned my PhD degree from the University of Electronic Science and Technology of China (UESTC) in 2024, and my Bachelor's and Master's degrees from Southwest University (SWU) in 2017 and 2020, respectively.


My current research focuses on agentic reinforcement learning techniques for training LLM-based agents deployed in real-world, long-horizon applications.

Publications and Preprints

(* indicates equal contribution; † indicates corresponding author)
Topics: Agentic RL / AI safety / Machine learning

HGPO

Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks

Shuo He, Lang Feng, Qi Wei, Xin Cheng, Lei Feng, Bo An

ICLR 2026   Paper   Code

We show that step-wise group-based RL for long-horizon LLM agents can suffer from historical-context inconsistency, which severely biases step-wise advantage estimates and harms optimization. We therefore propose HGPO, which forms multiple hierarchical groups per step based on historical-context consistency and adaptively aggregates their advantage estimates, achieving a better bias-variance tradeoff without extra models or rollouts.
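The aggregation idea can be illustrated with a toy sketch (this is my own minimal illustration, not the paper's implementation; the size-proportional weighting is an assumption):

```python
# Toy sketch: aggregate per-step advantage estimates from several
# hierarchy levels of groups. Each level groups rollouts by how strictly
# their historical contexts must match; stricter groups are smaller
# (less bias, more variance), looser groups are larger (more bias,
# less variance).
from statistics import mean

def group_advantage(rewards, i):
    """Advantage of rollout i within one group: reward minus group mean."""
    return rewards[i] - mean(rewards)

def hierarchical_advantage(reward, groups):
    """Aggregate advantage estimates across hierarchy levels.

    `groups` is a list of (rewards_in_group, index_of_this_rollout)
    pairs, one per hierarchy level. Here larger groups get
    proportionally more weight (an illustrative choice).
    """
    weighted, total = 0.0, 0
    for rewards, i in groups:
        assert rewards[i] == reward  # the rollout belongs to every level
        w = len(rewards)
        weighted += w * group_advantage(rewards, i)
        total += w
    return weighted / total
```

With a strict two-rollout group and a looser four-rollout group, the estimate interpolates between the two group baselines instead of trusting either alone.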

KPO

Online Causal Kalman Filtering for Stable and Effective Policy Optimization

Shuo He, Lang Feng, Xin Cheng, Lei Feng, Bo An

Preprint   Paper   Code

We stabilize LLM policy training by using an online causal Kalman filter to smooth noisy token-level importance ratios across a sequence, yielding steadier updates and better results on challenging math reasoning tasks.
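The core smoothing step can be sketched with a scalar causal Kalman filter (a minimal illustration under a random-walk state model; the noise parameters `q` and `r` are assumptions, not the paper's values):

```python
def kalman_smooth(ratios, q=1e-4, r=1e-2):
    """Causally smooth a sequence of token-level importance ratios.

    Scalar Kalman filter with a random-walk state model: each smoothed
    value uses only past and current observations (causal), so it can
    run online during policy optimization.
    """
    x, p = ratios[0], 1.0          # state estimate and its variance
    out = [x]
    for z in ratios[1:]:
        p += q                     # predict: variance grows by process noise
        k = p / (p + r)            # Kalman gain balances trust in z vs. x
        x = x + k * (z - x)        # update toward the new observation
        p = (1 - k) * p
        out.append(x)
    return out
```

Each smoothed ratio is a convex combination of past observations, so spikes in individual token ratios are damped rather than passed straight into the policy update.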

DR-MAS

Dr. MAS: Stable Reinforcement Learning for Multi-Agent LLM Systems

Lang Feng, Longtao Zheng, Shuo He, Fuxiang Zhang, Bo An

Preprint   Paper   Code

We show that multi-agent LLM RL becomes unstable under a single global normalization shared across all agents. Dr. MAS fixes this by normalizing each agent's advantages with its own reward statistics, which stabilizes training and improves results on multi-agent math and search tasks.
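The per-agent normalization can be sketched in a few lines (a minimal illustration of the idea, not the paper's implementation; the agent names are hypothetical):

```python
# Sketch: normalize each agent's rewards with its OWN mean/std instead
# of global statistics pooled across agents. When agents have very
# different reward scales, global normalization distorts per-agent
# advantages; per-agent statistics keep them comparable.
from statistics import mean, pstdev

def per_agent_advantages(rewards_by_agent, eps=1e-8):
    advantages = {}
    for agent, rewards in rewards_by_agent.items():
        mu, sigma = mean(rewards), pstdev(rewards)
        advantages[agent] = [(r - mu) / (sigma + eps) for r in rewards]
    return advantages
```

For example, a "planner" agent with rewards in [0, 1] and a "searcher" agent with rewards in [0, 10] end up with advantages on the same scale, so neither dominates the shared policy update.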

RVPT

Defending Multimodal Backdoored Models by Repulsive Visual Prompt Tuning

Zhifang Zhang, Shuo He, Haobo Wang, Bingquan Shen, Lei Feng

NeurIPS 2025   Paper   Code

We defend backdoored multimodal models like CLIP by using Repulsive Visual Prompt Tuning (RVPT), which tunes only small visual prompts on a few clean samples and uses a feature-repelling loss to make the model ignore trigger features, reducing attack success from 89.70% to 2.76% and generalizing across datasets.

BDetCLIP

BDetCLIP: Multimodal Prompting Contrastive Test-Time Backdoor Detection

Yuwei Niu*, Shuo He*, Qi Wei, Feng Liu, Lei Feng

ICML 2025   Paper   Code

We propose BDetCLIP, a fast test-time method that detects backdoored CLIP inputs by using contrastive prompts and checking that backdoored images stay almost unchanged when we change the class text.

Backdoor CLIP

A Closer Look at Backdoor Attacks on CLIP

Shuo He, Zhifang Zhang, Feng Liu, Roy Ka-Wei Lee, Bo An, Lei Feng

ICML 2025   Paper   Code

We study how backdoor attacks change CLIP by decomposing image features into patches, attention heads, and MLPs. We find that different attacks infect different parts of the model, and we use these findings to detect and repair infected components (or filter suspicious samples) at inference time.

Representation Surgery in Model Merging with Probabilistic Modeling

Qi Wei, Shuo He, Jiahan Zhang, Lei Feng, Bo An

ICML 2025   Paper   Code

Influence-Based Fair Selection

Influence-Based Fair Selection for Sample-Discriminative Backdoor Attack

Qi Wei, Shuo He, Enneng Yang, Tingcong Liu, Haobo Wang, Lei Feng, Bo An

AAAI 2025 (Oral)   Paper

We find that when triggers are very small, sample-discriminative backdoor attacks select poisoned samples unfairly across classes, yielding uneven attack success. We therefore propose IFS, which uses influence scores with efficient pruning and class-aware thresholds to select high-impact yet class-balanced poisoned samples, improving attack success on four datasets.

Candidate Label Set Pruning: A Data-centric Perspective for Deep Partial-label Learning

Shuo He, Chaojie Wang, Guowu Yang, Lei Feng

ICLR 2024 (Oral)   Paper   Code

Partial-label Learning with Mixed Closed-set and Open-set Out-of-candidate Examples

Shuo He, Lei Feng, Guowu Yang

SIGKDD 2023   Paper   Code

Candidate-aware Selective Disambiguation Based on Normalized Entropy for Instance-dependent Partial-label Learning

Shuo He, Guowu Yang, Lei Feng

ICCV 2023   Paper   Code

A Generalized Unbiased Risk Estimator for Learning with Augmented Classes

Senlin Shu, Shuo He, Haobo Wang, Hongxin Wei, Tao Xiang, Lei Feng

AAAI 2023   Paper   Code

Partial Label Learning with Semantic Label Representations

Shuo He, Lei Feng, Fengmao Lv, Wen Li, Guowu Yang

SIGKDD 2022   Paper   Code

Discriminatively Relabel for Partial Multi-label Learning

Shuo He, Ke Deng, Li Li, Senlin Shu, Li Liu

ICDM 2019 (Oral)   Paper   Code

Collaboration Based Multi-label Learning

Lei Feng, Bo An, Shuo He

AAAI 2019   Paper

Estimating Latent Relative Labeling Importances for Multi-label Learning

Shuo He, Lei Feng, Li Li

ICDM 2018   Paper   Code

Academic Services

Invited to be a Session Chair (Machine Learning) for AAAI 2026.
Conference reviewer:
Journal reviewer: