About Me
I am a research scientist at TikTok. I received my PhD in Computer Science at Stanford University in 2021, advised by Emma Brunskill. I was also a member of the Stanford AI Lab and the Statistical Machine Learning Group there. During my PhD, I had the fortune of collaborating with Finale Doshi-Velez from Harvard, Adith Swaminathan and Alekh Agarwal from Microsoft Research. I also spent times at Simons Institute, Microsoft Research (NYC lab and Redmond), and Livongo Health (a digital health start-up helping chronic condition patients). Before that, I obtained my B.S. from the Department of Machine Intelligence at Peking University in 2016, and learned about machine learning from Liwei Wang.
I believe the key aspect of AI is to make decisions in an interactive environment, besides to perceive and to predict. I am interested in interactive machine learning (e.g. reinforcement learning, imitation learning, contextual bandit) under the real-world constraints about sample efficiency and safety. I have been working on or interested in the following topics:
- Off-policy reinforcement learning. My dissertation work focus on batch RL with limited exploration.
- Transfer abstract prior information to rich-observation interactive environments.
- Causality and fundamental difference between interactive and batch/supervised learning.
- Applications in education, healthcare, recommendation and personalization problems.
Preprints and Publications
-
Provably Sample-Efficient RL with Side Information about Latent Dynamics
-
Offline Policy Optimization with Eligible Actions
UAI 2022
-
Provably Good Batch Reinforcement Learning Without Great Exploration
NeurIPS 2020
-
Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling
ICML 2020
-
Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions
ICML 2020
-
All-Action Policy Gradient Methods: A Numerical Integration Approach
-
Off-Policy Policy Gradient with State Distribution Correction
UAI 2019 (Oral) -
Combining Parametric and Nonparametric Models for Off-Policy Evaluation
ICML 2019 (Oral) -
Representation Balancing MDPs for Off-Policy Policy Evaluation
NeurIPS 2018
-
When Simple Exploration is Sample Efficient: Identifying Sufficient Conditions for Random Exploration to Yield PAC RL Algorithms
EWRL 2018
-
Behaviour Policy Estimation in Off-Policy Evaluation: Calibration Matters
ICML 2018 Workshops
-
Switched Trajectories for Off-Policy Learning
ICML 2018 Workshops
-
Model Selection for Off-Policy Policy Evaluation
RLDM 2017, Extended Abstract
-
PAC Continuous State Online Multitask Reinforcement Learning with Identification
AAMAS 2016
-
Local Orthogonality Preserving Alignment for Nonlinear Dimensionality Reduction
Journal of Computer Science and Technology, 31(3): 512-524, 2016.
Teaching
CS234: Reinforcement Learning, Teaching Assistant, Winter 2019-2020.
CS229: Machine Learning, Teaching Assistant, Spring 2020-2021.
Professional Service
Journal Reviewing: Biometrika, JMLR, IEEE TPAMI, Machine Learning, Artificial Intelligence
Conference Reviewing: NeurIPS (2019 - ), ICLR (2019 - ), ICML(2020 - ), AISTATS (2020 - ), UAI(2020)