About me
My goal is to create AI agents that can make decisions in complex environments using evaluative feedback, which is formalized under the reinforcement learning (RL) framework. Currently, I work on using RL algorithms to help large language models (LLMs) interact better with humans and their environments.
Previously, I worked on applications of bandits and RL in large-scale recommendation systems for a year at ByteDance. Before that, I obtained my Ph.D. in computer science from Stanford in 2021, advised by Emma Brunskill. With my collaborators, I proposed the first finite-sample error bound for batch RL without a full coverage assumption, as well as a convergent batch policy gradient method with function approximation. I also worked on real-world applications of batch RL: helping patients with chronic conditions, evaluating treatment policies on clinical data with practicing intensivists, and teaching kids math. I obtained my B.S. in machine intelligence from Peking University in 2016.
Experiences
- 2024.11 - Now: sr. applied scientist at Amazon, working on LLM alignment for shopping.
- 2022.08 - 2024.11: applied scientist, then sr. applied scientist, at AWS, working on RL research and LLM alignment (drove the RLHF fine-tuning of Amazon Titan models).
- 2021.08 - 2022.08: research scientist at ByteDance, working on applications of bandits and RL in TikTok and Douyin video recommendation.
- 2021: Ph.D. in computer science from Stanford, advised by Emma Brunskill, working on batch reinforcement learning.
- 2016: B.S. in machine intelligence from Peking University.
Preprints and Publications
- From Demonstrations to Rewards: Alignment Without Explicit Human Preferences
- Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens. TMLR
- AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents. ICLR 2025
- EXTRACT: Efficient Policy Learning by Extracting Transferrable Robot Skills from Offline Data. CoRL 2024
- Learning the Target Network in Function Space. ICML 2024
- TAIL: Task-specific Adapters for Imitation Learning with Large Pretrained Models. ICLR 2024
- Budgeting Counterfactual for Offline RL. NeurIPS 2023
- TD Convergence: An Optimization Perspective. NeurIPS 2023
- Reinforcement Learning Tutor Better Supported Lower Performers in a Math Task. Machine Learning Journal
- Provably Sample-Efficient RL with Side Information about Latent Dynamics. NeurIPS 2022
- Offline Policy Optimization with Eligible Actions. UAI 2022
- Provably Good Batch Reinforcement Learning Without Great Exploration. NeurIPS 2020
- Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling. ICML 2020
- Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions. ICML 2020
- All-Action Policy Gradient Methods: A Numerical Integration Approach
- Off-Policy Policy Gradient with State Distribution Correction. UAI 2019 (Oral)
- Combining Parametric and Nonparametric Models for Off-Policy Evaluation. ICML 2019 (Oral)
- Representation Balancing MDPs for Off-Policy Policy Evaluation. NeurIPS 2018
- When Simple Exploration is Sample Efficient: Identifying Sufficient Conditions for Random Exploration to Yield PAC RL Algorithms. EWRL 2018
- Behaviour Policy Estimation in Off-Policy Evaluation: Calibration Matters. ICML 2018 Workshops
- Switched Trajectories for Off-Policy Learning. ICML 2018 Workshops
- Model Selection for Off-Policy Policy Evaluation. RLDM 2017 (Extended Abstract)
- PAC Continuous State Online Multitask Reinforcement Learning with Identification. AAMAS 2016
- Local Orthogonality Preserving Alignment for Nonlinear Dimensionality Reduction. Journal of Computer Science and Technology, 31(3): 512-524, 2016
Professional Service
Journal Reviewing: JMLR, IEEE TPAMI, Machine Learning, Artificial Intelligence, Biometrika
Conference Reviewing: NeurIPS, ICLR, ICML, AISTATS, UAI, AAAI