Yi Su
I am a research scientist at Google DeepMind. Previously, I was a postdoctoral researcher in EECS at UC Berkeley, working with Professor Sergey Levine. I obtained my PhD in Statistics from Cornell University, advised by Professor Thorsten Joachims. Prior to that, I received my BSc (Honors) in Mathematics from Nanyang Technological University in beautiful Singapore.
I work on bandits, RL, and their intersections with recommender systems, LLMs, etc.
Training language models to self-correct via reinforcement learning
Aviral Kumar*, Vincent Zhuang*, Rishabh Agarwal*, Yi Su*, John D Co-Reyes, Avi Singh, Kate Baumli, Shariq Iqbal, Colton Bishop, Rebecca Roelofs, Lei M Zhang, Kay McKinney, Disha Shrivastava, Cosmin Paduraru, George Tucker, Doina Precup, Feryal Behbahani, Aleksandra Faust
[PDF] International Conference on Learning Representations (ICLR), 2025, Oral
EVOLvE: Evaluating and Optimizing LLMs For Exploration
Allen Nie*, Yi Su*, Bo Chang*, Jonathan N Lee, Ed H Chi, Quoc V Le, Minmin Chen
[PDF] arXiv, 2024
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Gemini Team, Google DeepMind
[PDF] arXiv, 2024
Online Feature Updates Improve Online (Generalized) Label Shift Adaptation
Ruihan Wu, Siddhartha Datta, Yi Su, Dheeraj Baby, Yu-Xiang Wang, Kilian Q Weinberger
[PDF] Neural Information Processing Systems (NeurIPS), 2024
Value of Exploration: Measurements, Findings and Algorithms
Yi Su, Xiangyu Wang, Elaine Ya Le, Liang Liu, Yuening Li, Haokai Lu, Benjamin Lipshitz, Sriraj Badam, Lukasz Heldt, Shuchao Bi, Ed Chi, Cristos Goodrow, Su-Lin Wu, Lexi Baugher, Minmin Chen
[PDF] Best Paper Award, ACM International Conference on Web Search and Data Mining (WSDM), 2024
Multi-Task Neural Linear Bandit for Exploration in Recommender Systems
Yi Su, Haokai Lu, Yuening Li, Liang Liu, Shuchao Bi, Ed H Chi, Minmin Chen
[PDF] ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2024
Nonlinear Bandits Exploration for Recommendations
Yi Su, Minmin Chen
[PDF] ACM Conference on Recommender Systems (RecSys), 2023
Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective
Zeyu Zhang, Yi Su, Hui Yuan, Yiran Wu, Rishab Balasubramanian, Qingyun Wu, Huazheng Wang, Mengdi Wang
[PDF] Neural Information Processing Systems (NeurIPS), 2023
Offline RL for Natural Language Generation with Implicit Language Q Learning
Charlie Snell, Ilya Kostrikov, Yi Su, Mengjiao Yang, Sergey Levine
[PDF] International Conference on Learning Representations (ICLR), 2023
Optimizing Rankings for Recommendation in Matching Markets
Yi Su, Magd Bayoumi, Thorsten Joachims
[PDF] World Wide Web Conference (WWW), 2022
Data-Driven Model-Based Optimization via Invariant Representation Learning
Han Qi, Yi Su, Aviral Kumar, Sergey Levine
Neural Information Processing Systems (NeurIPS), 2022
Context-Aware Language Modeling for Goal-Oriented Dialogue Systems
Charlie Snell, Sherry Yang, Justin Fu, Yi Su, Sergey Levine
[PDF] Proceedings of NAACL, 2022
Online Adaptation to Label Distribution Shift
Ruihan Wu, Chuan Guo, Yi Su, Kilian Q Weinberger
[PDF] Neural Information Processing Systems (NeurIPS), 2021
Recommendations as Treatments
Thorsten Joachims, Ben London, Yi Su, Adith Swaminathan, Lequn Wang
AI Magazine, 2021
Doubly robust off-policy evaluation with shrinkage
Yi Su, Maria Dimakopoulou, Akshay Krishnamurthy, Miroslav Dudik
[PDF] International Conference on Machine Learning (ICML), 2020
Adaptive estimator selection for off-policy evaluation
Yi Su, Pavithra Srinath, Akshay Krishnamurthy
[PDF] International Conference on Machine Learning (ICML), 2020
Off-policy Bandits with Deficient Support
Noveen Sachdeva*, Yi Su*, Thorsten Joachims
[PDF] ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2020
CAB: Continuous Adaptive Blending for Policy Evaluation and Learning
Yi Su*, Lequn Wang*, Michele Santacatterina, Thorsten Joachims
[PDF] International Conference on Machine Learning (ICML), 2019
Learning from Logged Bandit Feedback of Multiple Loggers
Yi Su, Aman Agarwal, Thorsten Joachims
[PDF] CausalML Workshop at the International Conference on Machine Learning (CausalML), 2018
Off-policy Evaluation and Learning for Interactive Systems
Invited talk at SIGIR’21 Workshop on Causality in Search and Recommendation, June 2021.
Invited talk at SIGIR’21 Workshop on Deep Reinforcement Learning for Information Retrieval, June 2021.
Adaptive Estimator Selection for Off-policy Evaluation
Netflix Research Seminar, June 2021.
RL Theory Virtual Seminar, March 2021.
Off-policy Bandits with Deficient Support
Bloomberg AI, August 2020.
Bloomberg Data Science Fellowship, 2019 - 2021
Rising Stars in EECS, 2020
Lee Kuan Yew Gold Medal, 2016
International Conference on Machine Learning (ICML), 2019, 2020 (top reviewer), 2021 (expert reviewer), 2022, 2023
Neural Information Processing Systems (NeurIPS), 2019 - 2023
Conference on Artificial Intelligence (AAAI), 2020, 2021, 2022
International Conference on Learning Representations (ICLR), 2021 - 2023
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022 - 2023
Conference on Causal Learning and Reasoning (CLeaR), 2022 - 2023
European Workshop on Reinforcement Learning (EWRL), 2022
NeurIPS Workshop: Offline Reinforcement Learning, 2020
ICML Workshop: Theoretical Foundations of Reinforcement Learning, 2020
ICML Workshop: Reinforcement Learning Theory, 2021
KDD Workshop: 2nd Workshop on Online and Adaptive Recommender Systems (OARS), 2022