Algorithmic Decision-making with Endogenous Rewards

Department of Systems Engineering and Engineering Management
The Chinese University of Hong Kong
Date: Friday, February 3, 2023, 4:30 pm – 6:00 pm

Venue: ERB 513, The Chinese University of Hong Kong

Title: Algorithmic Decision-making with Endogenous Rewards

Speaker: Prof. Xiaowei Zhang, University of Hong Kong



In the standard data analysis framework, data is collected once and for all, and analysis is then carried out. With the advancement of digital technology, however, decision-makers continually analyze past data and generate new data through their decisions. We model this process as a Markov decision process and show that the dynamic interaction between data generation and data analysis leads to a new type of bias, reinforcement bias, that exacerbates the endogeneity problem in standard data analysis. We propose a class of instrumental variable (IV)-based reinforcement learning (RL) algorithms to correct for this bias and establish their theoretical properties by casting them in a stochastic approximation (SA) framework. Our analysis accommodates iterate-dependent Markovian structures and can therefore be used to study RL algorithms with policy improvement. Furthermore, we derive a sharper trajectory concentration bound: with a polynomial rate, the entire future trajectory of the SA iterates, after a given finite time, falls within a ball centered at the true parameter and shrinking at another polynomial rate. We also provide formulas for inference on the optimal policies of the IV-RL algorithms; these formulas highlight how intertemporal dependencies in the Markovian environment affect the inference.
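The endogeneity problem that the talk's IV-based algorithms address can be illustrated in a simple static setting. The following sketch is a hypothetical simulation (not taken from the talk, and much simpler than its Markovian setting): an unobserved confounder drives both the decision variable and the reward, so naive least squares is biased, while a classical instrumental-variable estimator recovers the true coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process:
# z  -- instrument (affects the reward only through x)
# u  -- unobserved confounder
# x  -- endogenous decision variable, correlated with u
# y  -- reward, driven by both x and u
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + u + 0.1 * rng.normal(size=n)
beta_true = 2.0
y = beta_true * x + u + 0.1 * rng.normal(size=n)

# Naive least squares: biased upward because Cov(x, u) != 0.
beta_ols = np.dot(x, y) / np.dot(x, x)

# IV estimator: Cov(z, y) / Cov(z, x), valid since z is uncorrelated with u.
beta_iv = np.dot(z, y) / np.dot(z, x)

print(f"true: {beta_true:.2f}, OLS: {beta_ols:.3f}, IV: {beta_iv:.3f}")
```

Here the OLS estimate lands well above 2.0, while the IV estimate is close to the truth; the talk studies the dynamic analogue, where the decisions generating the data depend on past estimates.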


Xiaowei Zhang is an assistant professor in the Faculty of Business and Economics at the University of Hong Kong. He received his PhD in Management Science and Engineering from Stanford University and his BS in Mathematics from Nankai University. His recent research interests include high-dimensional simulation optimization and causal reinforcement learning. His research has been published in top journals such as Management Science, Operations Research, and Mathematics of Operations Research.

Everyone is welcome to attend the talk!

SEEM-5202 Website:

