Research
My research is motivated by a central problem in AI: making decisions in an interactive environment, beyond perception and prediction. I am broadly interested in interactive machine learning (e.g., reinforcement learning and bandits) under real-world constraints such as sample efficiency, safety, robustness, and alignment.
Recently, I have been working on reinforcement learning methods for training foundation models (project: Amazon Bedrock) and on foundation models for decision making.
Education and Experience
- Research Scientist at AWS, Aug 2022 - Present
- Research Scientist at ByteDance, Aug 2021 - Aug 2022
- Ph.D. in Computer Science, Stanford University, June 2021
- B.S. in Machine Intelligence, Peking University, June 2016
Preprints and Publications
- Budgeting Counterfactual for Offline RL. NeurIPS 2023.
- TD Convergence: An Optimization Perspective. NeurIPS 2023.
- Provably Sample-Efficient RL with Side Information about Latent Dynamics. NeurIPS 2022.
- Offline Policy Optimization with Eligible Actions. UAI 2022.
- Provably Good Batch Reinforcement Learning Without Great Exploration. NeurIPS 2020.
- Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling. ICML 2020.
- Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions. ICML 2020.
- All-Action Policy Gradient Methods: A Numerical Integration Approach.
- Off-Policy Policy Gradient with State Distribution Correction. UAI 2019 (Oral).
- Combining Parametric and Nonparametric Models for Off-Policy Evaluation. ICML 2019 (Oral).
- Representation Balancing MDPs for Off-Policy Policy Evaluation. NeurIPS 2018.
- When Simple Exploration is Sample Efficient: Identifying Sufficient Conditions for Random Exploration to Yield PAC RL Algorithms. EWRL 2018.
- Behaviour Policy Estimation in Off-Policy Evaluation: Calibration Matters. ICML 2018 Workshops.
- Switched Trajectories for Off-Policy Learning. ICML 2018 Workshops.
- Model Selection for Off-Policy Policy Evaluation. RLDM 2017 (Extended Abstract).
- PAC Continuous State Online Multitask Reinforcement Learning with Identification. AAMAS 2016.
- Local Orthogonality Preserving Alignment for Nonlinear Dimensionality Reduction. Journal of Computer Science and Technology, 31(3): 512-524, 2016.
Teaching
- CS234: Reinforcement Learning, Teaching Assistant, Winter 2019-2020.
- CS229: Machine Learning, Teaching Assistant, Spring 2020-2021.
Professional Service
Journal Reviewing: JMLR, IEEE TPAMI, MLJ, AIJ, Biometrika
Conference Reviewing: NeurIPS (2019 - 2021), ICLR (2019 - 2021, 2023), ICML (2020, 2021), AISTATS (2020 - 2022), UAI (2020), AAAI (2022, 2023)