I am a research scientist at TikTok. I received my PhD in Computer Science from Stanford University in 2021, advised by Emma Brunskill. I was also a member of the Stanford AI Lab and the Statistical Machine Learning Group there. During my PhD, I had the good fortune of collaborating with Finale Doshi-Velez from Harvard, and with Adith Swaminathan and Alekh Agarwal from Microsoft Research. I also spent time at the Simons Institute, Microsoft Research (the NYC lab and Redmond), and Livongo Health (a digital health start-up helping patients with chronic conditions). Before that, I obtained my B.S. from the Department of Machine Intelligence at Peking University in 2016, where I learned about machine learning from Liwei Wang.
I believe a key aspect of AI, beyond perceiving and predicting, is making decisions in interactive environments. I am interested in interactive machine learning (e.g., reinforcement learning, imitation learning, contextual bandits) under real-world constraints on sample efficiency and safety. I have worked on, or am interested in, the following topics:
- Off-policy reinforcement learning. My dissertation focuses on batch RL with limited exploration.
- Transferring abstract prior information to rich-observation interactive environments.
- Causality and the fundamental differences between interactive and batch/supervised learning.
- Applications in education, healthcare, recommendation, and personalization problems.
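To make the off-policy setting concrete, here is a minimal sketch of off-policy evaluation via per-trajectory importance sampling, the basic estimator underlying much of this line of work. All names and the toy data are illustrative, not taken from any specific paper above.

```python
import numpy as np

def is_estimate(trajectories, pi_e, pi_b, gamma=1.0):
    """Estimate the value of an evaluation policy pi_e from trajectories
    collected under a behavior policy pi_b.

    trajectories: list of trajectories, each a list of (state, action, reward)
    pi_e, pi_b:   functions (state, action) -> probability of that action
    gamma:        discount factor
    """
    values = []
    for traj in trajectories:
        weight, ret, discount = 1.0, 0.0, 1.0
        for s, a, r in traj:
            # Cumulative importance ratio; note it is a product over the
            # horizon, which is the source of the "curse of horizon".
            weight *= pi_e(s, a) / pi_b(s, a)
            ret += discount * r
            discount *= gamma
        values.append(weight * ret)
    return float(np.mean(values))

# Toy example: one state, two actions, behavior policy uniform.
data = [[(0, 0, 1.0)], [(0, 1, 0.0)]]
pi_b = lambda s, a: 0.5
pi_e = lambda s, a: 0.8 if a == 0 else 0.2
print(is_estimate(data, pi_e, pi_b))  # unbiased estimate of pi_e's value
```

The estimator is unbiased whenever pi_b covers pi_e's actions, but its variance grows multiplicatively with the horizon, which motivates the marginalized and model-based alternatives studied in the papers below.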
Preprints and Publications
Provably Sample-Efficient RL with Side Information about Latent Dynamics
Offline Policy Optimization with Eligible Actions
Provably Good Batch Reinforcement Learning Without Great Exploration
Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling
Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions
All-Action Policy Gradient Methods: A Numerical Integration Approach
Off-Policy Policy Gradient with State Distribution Correction
UAI 2019 (Oral)
Combining Parametric and Nonparametric Models for Off-Policy Evaluation
ICML 2019 (Oral)
Representation Balancing MDPs for Off-Policy Policy Evaluation
When Simple Exploration is Sample Efficient: Identifying Sufficient Conditions for Random Exploration to Yield PAC RL Algorithms
Behaviour Policy Estimation in Off-Policy Evaluation: Calibration Matters
ICML 2018 Workshops
Switched Trajectories for Off-Policy Learning
ICML 2018 Workshops
Model Selection for Off-Policy Policy Evaluation
RLDM 2017, Extended Abstract
PAC Continuous State Online Multitask Reinforcement Learning with Identification
Local Orthogonality Preserving Alignment for Nonlinear Dimensionality Reduction
Journal of Computer Science and Technology, 31(3): 512-524, 2016.
CS234: Reinforcement Learning, Teaching Assistant, Winter 2019-2020.
CS229: Machine Learning, Teaching Assistant, Spring 2020-2021.
Journal Reviewing: Biometrika, JMLR, IEEE TPAMI, Machine Learning, Artificial Intelligence
Conference Reviewing: NeurIPS (2019 - ), ICLR (2019 - ), ICML (2020 - ), AISTATS (2020 - ), UAI (2020)