About me

My goal is to create AI agents that can make decisions in complex environments using evaluative feedback, a problem formalized by the reinforcement learning (RL) framework. Currently, I work on using RL algorithms to help large language models (LLMs) interact better with humans and their environment.

Previously, I spent a year at ByteDance working on applications of bandits and RL in large-scale recommendation systems. Before that, I obtained my Ph.D. in computer science from Stanford in 2021, advised by Emma Brunskill. With my collaborators, I proposed the first finite-sample error bound for batch RL without a full-coverage assumption, as well as a convergent batch policy gradient method with function approximation. I also worked on real-world applications of batch RL: helping patients with chronic conditions, evaluating treatment policies on clinical data with a practicing intensivist, and teaching kids math. I obtained my B.S. in machine intelligence from Peking University in 2016.

Experiences

Preprints and Publications

Professional Service

Journal Reviewing: JMLR, IEEE TPAMI, Machine Learning, Artificial Intelligence, Biometrika

Conference Reviewing: NeurIPS, ICLR, ICML, AISTATS, UAI, AAAI