I am a research scientist at TikTok. I received my PhD in Computer Science from Stanford University in 2021, advised by Emma Brunskill. I was also a member of the Stanford AI Lab and the Statistical Machine Learning Group there. During my PhD, I had the good fortune of collaborating with Finale Doshi-Velez from Harvard, and with Adith Swaminathan and Alekh Agarwal from Microsoft Research. I also spent time at the Simons Institute for the Theory of Computing, Microsoft Research (NYC lab and Redmond), and Livongo Health (a digital health start-up helping patients with chronic conditions). Before that, I obtained my B.S. from the Department of Machine Intelligence at Peking University in 2016, where I learned about machine learning from Liwei Wang.
A key aspect of AI is making decisions, beyond perceiving and predicting, so it needs to interact with environments. I am interested in interactive machine learning (e.g. reinforcement learning, imitation learning, contextual bandits) under real-world constraints on sample efficiency and safety. I have been working on the following aspects of this problem:
- Batch interactive learning algorithms that leverage historical data and avoid risky online sampling. My dissertation work tries to answer the question of how to perform batch RL with limited exploration in the dataset.
- Transfer and adaptation from abstractions to high-dimensional, noisy observations.
- Applications of interactive learning in education (in real-world experiments) and healthcare (on real-world datasets).
- Applications of interactive learning in recommendation systems and personalization (improving recommendation systems and user growth at TikTok).
Preprints and Publications
Offline Policy Optimization with Eligible Actions
Provably Good Batch Reinforcement Learning Without Great Exploration
Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling
Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions
All-Action Policy Gradient Methods: A Numerical Integration Approach
Off-Policy Policy Gradient with State Distribution Correction
UAI 2019 (Oral)
Combining Parametric and Nonparametric Models for Off-Policy Evaluation
ICML 2019 (Oral)
Representation Balancing MDPs for Off-Policy Policy Evaluation
When Simple Exploration is Sample Efficient: Identifying Sufficient Conditions for Random Exploration to Yield PAC RL Algorithms
Behaviour Policy Estimation in Off-Policy Evaluation: Calibration Matters
ICML 2018 Workshops
Switched Trajectories for Off-Policy Learning
ICML 2018 Workshops
Model Selection for Off-Policy Policy Evaluation
RLDM 2017, Extended Abstract
PAC Continuous State Online Multitask Reinforcement Learning with Identification
Local Orthogonality Preserving Alignment for Nonlinear Dimensionality Reduction
Journal of Computer Science and Technology, 31(3): 512-524, 2016.
CS234: Reinforcement Learning, Teaching Assistant, Winter 2019-2020.
CS229: Machine Learning, Teaching Assistant, Spring 2020-2021.
Journal Reviewing: Biometrika, JMLR, IEEE TPAMI, Machine Learning, Artificial Intelligence
Conference Reviewing: NeurIPS (2019 - ), ICLR (2019 - ), ICML (2020 - ), AISTATS (2020 - ), UAI (2020)