Applications of Bayesian Reinforcement Learning in the Control Domain
Abstract: The main objective of reinforcement learning (RL) is to find the optimal parameters of a given parameterized policy. Policy search algorithms make RL suitable for complex dynamical systems, such as robotics and control applications, where the environment has high-dimensional state and action spaces. Although many policy search algorithms are based on policy gradient methods, their performance can suffer from slow convergence or convergence to local optima, because they require estimating the gradient components of the parameterized policy. To address this, a gradient-free Bayesian algorithm is tailored to the problem. The problem of interest is to control a discrete-time Markov decision process (MDP) with continuous state and action spaces. The algorithm can be viewed as a particle Markov chain Monte Carlo (P-MCMC) method that generates samples of the policy parameters from a posterior distribution instead of performing gradient approximations. To this end, a prior density is assigned over the policy parameters and the resulting posterior distribution is targeted, where the ‘likelihood’ is taken to be the expected total reward. In risk-sensitive scenarios, where a multiplicative expected total reward is used to measure the performance of the policy rather than its cumulative counterpart, the methodology is particularly well suited, because the multiplicative reward allows sequential Monte Carlo (SMC), also known as the particle filter, to be fully exploited within the iterations of the P-MCMC. Furthermore, to deal with the challenging problem of policy search in high-dimensional state spaces, an adaptive MCMC algorithm is proposed.
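To make the idea of sampling policy parameters from a posterior (rather than following a gradient) concrete, below is a minimal illustrative sketch, not the speaker's actual algorithm. It assumes a toy one-dimensional linear MDP, a linear policy u = θx, a Gaussian prior over θ, a multiplicative per-step reward, and a plain Monte Carlo rollout average standing in for the SMC/particle-filter likelihood estimator used in the full P-MCMC; the names `rollout_return`, `estimate_likelihood`, `policy_search`, and all numerical constants are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP: 1-D linear dynamics x' = 0.9*x + 0.5*u + noise,
# per-step reward exp(-x^2 - 0.1*u^2), so the total reward is multiplicative.
def rollout_return(theta, horizon=20, noise_std=0.1):
    """Multiplicative total reward of one rollout under the linear policy u = theta * x."""
    x, total = rng.normal(0.0, 1.0), 1.0
    for _ in range(horizon):
        u = theta * x
        total *= np.exp(-(x ** 2) - 0.1 * u ** 2)  # per-step reward in (0, 1]
        x = 0.9 * x + 0.5 * u + noise_std * rng.normal()
    return total

def estimate_likelihood(theta, n_rollouts=50):
    """Monte Carlo estimate of the expected multiplicative reward.

    In the full method this estimate would come from an SMC / particle filter;
    plain independent rollouts are used here only as a simple stand-in.
    """
    return np.mean([rollout_return(theta) for _ in range(n_rollouts)])

def log_prior(theta, scale=2.0):
    """Gaussian prior density (up to a constant) over the policy parameter."""
    return -0.5 * (theta / scale) ** 2

def policy_search(n_iters=2000, prop_std=0.3):
    """Metropolis-Hastings over theta, targeting prior x expected-reward 'likelihood'."""
    theta = 0.0
    log_post = log_prior(theta) + np.log(estimate_likelihood(theta) + 1e-300)
    samples = []
    for _ in range(n_iters):
        prop = theta + prop_std * rng.normal()  # random-walk proposal
        log_post_prop = log_prior(prop) + np.log(estimate_likelihood(prop) + 1e-300)
        if np.log(rng.uniform()) < log_post_prop - log_post:
            theta, log_post = prop, log_post_prop
        samples.append(theta)
    return np.array(samples)

if __name__ == "__main__":
    samples = policy_search()
    # Discard the first half as burn-in and report the posterior mean of the policy gain.
    print("posterior mean of policy gain:", samples[len(samples) // 2 :].mean())
```

Because the reward-based likelihood is only estimated, this sketch follows the pseudo-marginal pattern (the stored estimate for the current parameter is reused in each acceptance test), which is also the principle that lets P-MCMC plug an SMC estimate of the multiplicative reward directly into the Metropolis-Hastings ratio.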
Speaker Biography: After graduating from Tabriz Azad University’s Electrical and Electronics Engineering department in 2009, he pursued an MSc in Control Engineering at Istanbul Technical University. During his master’s studies, he specialized in the application of Fuzzy Logic and Reinforcement Learning in Control Engineering. In 2013, he began his doctoral studies in the Mechatronics Engineering Department at Sabanci University, where he also worked as a Research and Teaching Assistant. His PhD research focused on the applications of Reinforcement Learning and Bayesian Learning in Control Engineering. After completing a brief postdoctoral visit at Sabanci University in 2020, he joined the Mechatronics Engineering Department at Bahçeşehir University as an Assistant Professor. In 2022, he transitioned to Istinye University, where he continued his role as an Assistant Professor in the Electrical and Electronics Engineering Department. He contributed significantly to the university’s research and educational objectives during his tenure through high-quality research and teaching. In July 2023, he embarked on a new professional journey as a Battery Development Senior Engineer at AVL List Türkiye, participating in two European Horizon projects as a researcher. He also serves as a part-time lecturer in the Engineering Department at Istinye University. His primary research interests include Reinforcement Learning, Bayesian Learning, Fuzzy Logic, Control Systems, and Battery Management Systems.