Topics: Agentic RL / AI safety / Machine learning
Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks
We show that step-wise group-based RL for long-horizon LLM agents can suffer from historical-context inconsistency, which severely biases step-wise advantage estimates and harms optimization. We therefore propose HGPO, which forms multiple hierarchical groups per step according to historical-context consistency and adaptively aggregates their advantage estimates, achieving a better bias-variance tradeoff without extra models or rollouts.
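A minimal sketch of the grouping-and-aggregation idea. Everything here (the function name, the GRPO-style within-group normalization, the fixed blend weights) is our illustrative assumption, not HGPO's actual implementation:

```python
import numpy as np

def hierarchical_group_advantages(rewards, level_ids, level_weights):
    """Blend group-normalized advantages across hierarchy levels.

    rewards:       (N,) rewards for N rolled-out steps
    level_ids:     list of (N,) int arrays; level_ids[k][i] is step i's group
                   at level k (finer levels demand stricter historical-context
                   consistency, so their groups are smaller)
    level_weights: length-K weights for blending the per-level estimates
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    blended = np.zeros_like(rewards)
    for w, ids in zip(level_weights, level_ids):
        ids = np.asarray(ids)
        adv = np.empty_like(rewards)
        for g in np.unique(ids):
            m = ids == g
            # GRPO-style normalization within each group at this level
            adv[m] = (rewards[m] - rewards[m].mean()) / (rewards[m].std() + 1e-8)
        blended += w * adv
    return blended / sum(level_weights)
```

With two levels, level 0 might group every rollout's step t together regardless of history (large groups, low variance, high bias), while level 1 keeps only steps whose histories match (small groups, low bias, high variance); the weights set the tradeoff.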
Online Causal Kalman Filtering for Stable and Effective Policy Optimization
We make LLM policy training more stable with an online causal Kalman filter that smooths noisy token-level importance ratios along the sequence, yielding steadier updates and better results on challenging math-reasoning tasks.
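A minimal sketch of the smoothing pass, assuming a scalar random-walk state model; the function name and the noise variances are placeholder assumptions, not values from the paper:

```python
import numpy as np

def kalman_smooth_ratios(ratios, process_var=1e-3, obs_var=1e-1):
    """Causal (online) scalar Kalman filter over token-level importance
    ratios: each filtered value depends only on current and past tokens."""
    x, p = float(ratios[0]), 1.0    # state estimate and its variance
    out = [x]
    for z in ratios[1:]:
        p = p + process_var         # predict: the true ratio drifts slowly
        k = p / (p + obs_var)       # Kalman gain
        x = x + k * (float(z) - x)  # correct with the observed noisy ratio
        p = (1.0 - k) * p
        out.append(x)
    return np.array(out)
```

The filtered ratios would then stand in for the raw per-token ratios in the policy-gradient objective; because the filter is causal, it can run online over the sequence with no second, backward smoothing pass.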
Dr. MAS: Stable Reinforcement Learning for Multi-Agent LLM Systems
We show that RL for multi-agent LLM systems becomes unstable when a single global normalization is shared across all agents, and we fix this with Dr. MAS, which normalizes each agent's advantages using that agent's own reward statistics, stabilizing training and improving results on multi-agent math and search tasks.
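A minimal sketch of the per-agent normalization with hypothetical names; the exact standardization form is our assumption:

```python
import numpy as np

def per_agent_advantages(rewards, agent_ids, eps=1e-8):
    """Standardize each agent's rewards with that agent's own mean/std,
    instead of one global mean/std pooled over all agents."""
    rewards = np.asarray(rewards, dtype=np.float64)
    agent_ids = np.asarray(agent_ids)
    adv = np.empty_like(rewards)
    for a in np.unique(agent_ids):
        m = agent_ids == a
        adv[m] = (rewards[m] - rewards[m].mean()) / (rewards[m].std() + eps)
    return adv
```

Pooling instead (one mean/std over all rewards) lets an agent with systematically higher rewards flip the sign of every other agent's advantages, which is the instability the summary points to.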
Defending Multimodal Backdoored Models by Repulsive Visual Prompt Tuning
We defend backdoored multimodal models such as CLIP with Repulsive Visual Prompt Tuning (RVPT), which tunes only a small number of visual prompts on a few clean samples and applies a feature-repelling loss so the model ignores trigger features, reducing attack success from 89.70% to 2.76% and generalizing across datasets.
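One plausible form of the feature-repelling objective, written as a hedged sketch rather than RVPT's exact loss; the function name, the choice of cosine similarity, and the lam weight are all our assumptions:

```python
import torch
import torch.nn.functional as F

def repulsive_prompt_loss(prompted_feats, frozen_feats, logits, labels, lam=1.0):
    """Push prompt-tuned visual features away from the frozen backbone's
    features (to suppress trigger-related components) while cross-entropy on
    a few clean samples keeps clean accuracy intact."""
    repel = F.cosine_similarity(prompted_feats, frozen_feats, dim=-1).mean()
    ce = F.cross_entropy(logits, labels)
    return ce + lam * repel
```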
BDetCLIP: Multimodal Prompting Contrastive Test-Time Backdoor Detection
We propose BDetCLIP, an efficient test-time method that detects backdoored inputs to CLIP via contrastive prompting: a backdoored image's alignment with the text barely changes when the class-describing text is perturbed, whereas a clean image's alignment changes substantially.
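A minimal sketch of the detection score under our assumptions (pre-extracted, L2-normalized features and two prompt sets per class, one class-descriptive and one class-perturbed); the names are hypothetical:

```python
import torch

@torch.no_grad()
def contrastive_detection_score(image_feats, benign_text_feats, malignant_text_feats):
    """Score each image by how much its image-text similarity shifts when the
    class text is changed. A backdoored image locks onto the trigger, so its
    similarities barely move: a LOW score flags a likely backdoored input.

    image_feats:          (N, D) normalized CLIP image features
    benign_text_feats:    (C, D) features of class-descriptive prompts
    malignant_text_feats: (C, D) features of class-perturbed prompts
    """
    shift = image_feats @ benign_text_feats.T - image_feats @ malignant_text_feats.T
    return shift.abs().mean(dim=1)  # (N,), thresholded to decide
```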
A Closer Look at Backdoor Attacks on CLIP
We study how backdoor attacks alter CLIP by decomposing image representations into contributions from patches, attention heads, and MLPs; we find that different attacks infect different parts of the model, and we use these findings to detect and repair infected components (or filter suspicious samples) at inference time.
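A toy sketch of the repair step, assuming the image feature decomposes (approximately) into a sum of per-component contributions from patches, heads, and MLPs; the interface is hypothetical:

```python
import torch

def repair_image_feature(contributions, infected):
    """Rebuild the image feature without the components flagged as infected.

    contributions: (K, D) per-component contributions whose sum approximates
                   the original image feature
    infected:      indices of components diagnosed as backdoor-infected
    """
    mask = torch.ones(contributions.shape[0], dtype=torch.bool)
    mask[list(infected)] = False
    return contributions[mask].sum(dim=0)
```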
Representation Surgery in Model Merging with Probabilistic Modeling
Influence-Based Fair Selection for Sample-Discriminative Backdoor Attack
We find that when triggers are very small, sample-discriminative backdoor attacks select poisoned samples unevenly across classes and achieve uneven attack success. We therefore propose IFS, which uses influence scores with efficient pruning and class-aware thresholds to select high-impact yet class-balanced poisoned samples, improving attack success on four datasets.
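A hedged simplification of the selection step: an even per-class budget stands in for IFS's class-aware thresholds, and the influence scores are assumed precomputed (the paper's efficient pruning is left out):

```python
import numpy as np

def class_balanced_influence_selection(influence, labels, budget):
    """Pick the highest-influence poisoning candidates under an even
    per-class budget, so attack success cannot concentrate in a few classes.

    influence: (N,) influence score of each candidate sample
    labels:    (N,) class label of each candidate
    budget:    total number of samples to poison
    """
    influence, labels = np.asarray(influence), np.asarray(labels)
    classes = np.unique(labels)
    per_class = max(1, budget // len(classes))
    chosen = []
    for c in classes:
        idx = np.flatnonzero(labels == c)
        top = idx[np.argsort(influence[idx])[::-1][:per_class]]
        chosen.extend(top.tolist())
    return np.asarray(chosen[:budget])
```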
Candidate Label Set Pruning: A Data-centric Perspective for Deep Partial-label Learning
Partial-label Learning with Mixed Closed-set and Open-set Out-of-candidate Examples
Candidate-aware Selective Disambiguation Based on Normalized Entropy for Instance-dependent Partial-label Learning
A Generalized Unbiased Risk Estimator for Learning with Augmented Classes
Partial Label Learning with Semantic Label Representations
Discriminatively Relabel for Partial Multi-label Learning
Collaboration Based Multi-label Learning
Estimating Latent Relative Labeling Importances for Multi-label Learning