About me
My goal is to create AI agents that can make decisions in complex environments using evaluative feedback, which is formalized under the reinforcement learning (RL) framework. Currently, I work on using RL algorithms to help large language models (LLMs) interact better with humans and their environments.
Experiences
- 2022.08 - Now: Applied Scientist, then Senior Applied Scientist at Amazon. I work on RL, LLMs, and agents. I built the RL(HF) fine-tuning of Amazon Titan models and Rufus.
- 2021.08 - 2022.08: Research Scientist at ByteDance. I worked on RL for recommendation problems, in what is arguably the largest-scale recommender system.
- 2021: Ph.D. in Computer Science at Stanford, advised by Emma Brunskill. I worked on the theory and algorithms of reinforcement learning, especially in the offline setting. Our work provided some of the early theoretical results on batch RL with function approximation and the principle of pessimistic value estimates, as well as real-world applications of batch RL in healthcare and education.
- 2016: B.S. in Machine Intelligence at Peking University.
Preprints and Publications
- Teaching Large Language Models to Reason through Learning and Forgetting. Preprint.
- From Demonstrations to Rewards: Alignment Without Explicit Human Preferences. Preprint.
- Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens. TMLR.
- AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents. ICLR 2025.
- EXTRACT: Efficient Policy Learning by Extracting Transferrable Robot Skills from Offline Data. CoRL 2024.
- Learning the Target Network in Function Space. ICML 2024.
- TAIL: Task-specific Adapters for Imitation Learning with Large Pretrained Models. ICLR 2024.
- Budgeting Counterfactual for Offline RL. NeurIPS 2023.
- TD Convergence: An Optimization Perspective. NeurIPS 2023.
- Reinforcement Learning Tutor Better Supported Lower Performers in a Math Task. Machine Learning Journal.
- Provably Sample-Efficient RL with Side Information about Latent Dynamics. NeurIPS 2022.
- Offline Policy Optimization with Eligible Actions. UAI 2022.
- Provably Good Batch Reinforcement Learning Without Great Exploration. NeurIPS 2020.
- Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling. ICML 2020.
- Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions. ICML 2020.
- All-Action Policy Gradient Methods: A Numerical Integration Approach. Preprint.
- Off-Policy Policy Gradient with State Distribution Correction. UAI 2019 (Oral).
- Combining Parametric and Nonparametric Models for Off-Policy Evaluation. ICML 2019 (Oral).
- Representation Balancing MDPs for Off-Policy Policy Evaluation. NeurIPS 2018.
- When Simple Exploration is Sample Efficient: Identifying Sufficient Conditions for Random Exploration to Yield PAC RL Algorithms. EWRL 2018.
- Behaviour Policy Estimation in Off-Policy Evaluation: Calibration Matters. ICML 2018 Workshops.
- Switched Trajectories for Off-Policy Learning. ICML 2018 Workshops.
- Model Selection for Off-Policy Policy Evaluation. RLDM 2017 (Extended Abstract).
- PAC Continuous State Online Multitask Reinforcement Learning with Identification. AAMAS 2016.
- Local Orthogonality Preserving Alignment for Nonlinear Dimensionality Reduction. Journal of Computer Science and Technology, 31(3): 512-524, 2016.
Professional Service
Journal Reviewing: JMLR, IEEE TPAMI, Machine Learning, Artificial Intelligence, Biometrika
Conference Reviewing: NeurIPS, ICLR, ICML, AISTATS, UAI, AAAI