About me

I am an applied scientist at Amazon working on reinforcement learning. My research goal is to create AI agents that can make decisions in unknown and complex environments, going beyond perception and prediction. Such problems are often formalized under the reinforcement learning (RL) framework. I am interested in understanding and developing RL algorithms from first principles, with a focus on causality [paper], scalability [paper], and value function learning dynamics [paper] in RL. Besides fundamental RL research, I work on user alignment of foundation models (Amazon Bedrock).

Previously, I was a research scientist at ByteDance from 2021 to 2022, where I worked on applications of bandits and RL in TikTok and Douyin video recommendation. I obtained my Ph.D. in computer science from Stanford in 2021, advised by Emma Brunskill. My dissertation is on batch reinforcement learning. With my collaborators, I proposed the first finite-sample error bound for batch RL without a full-coverage assumption [paper] and a convergent batch policy gradient method with function approximation [paper]. I also worked on batch RL applications in the real world: helping patients with chronic conditions (at Livongo), evaluating treatment policies on clinical data with practicing intensivists [paper], and teaching kids math [paper]. I completed my B.S. in machine intelligence at Peking University in 2016.

Preprints and Publications

Professional Service

Journal Reviewing: JMLR, IEEE TPAMI, Machine Learning, Artificial Intelligence, Biometrika

Conference Reviewing: NeurIPS, ICLR, ICML, AISTATS, UAI, AAAI