A Moreau-Yosida regularization for Markov decision processes
Keywords:Discounted Markov decision processes, Uniqueness of optimal policies, Moreau-Yosida regularization
This paper addresses a class of sequential optimization problems known as Markov decision processes. These kinds of processes are considered on Euclidean state and action spaces with the total expected discounted cost as the objective function. The main goal of the paper is to provide conditions to guarantee an adequate Moreau-Yosida regularization for Markov decision processes (named the original process). In this way, a new Markov decision process that conforms to the Markov control model of the original process except for the cost function induced via the Moreau-Yosida regularization is established. Compared to the original process, this new discounted Markov decision process has richer properties, such as the differentiability of its optimal value function, strictly convexity of the value function, uniqueness of optimal policy, and the optimal value function and the optimal policy of both processes, are the same. To complement the theory presented, an example is provided.
C. D. Aliprantis and K. C. Border, Infinite dimensional analysis: a hitchhikers guide. Berlin: Springer, 2006, doi: 10.1007/3-540-29587-9
A. Belloni, Lecture notes for IAP 2005 course introduction to Bundle methods, 2005. [On line]. Available: https://bit.ly/3qbJfmb
D. P. Bertsekas, Dynamic programming and stochastic control. New York, NY: Academic Press, 1976.
J. M. Borwein and Q. J. Zhu, Techniques of variational analysis. New York: Springer, 2005.
P. G. Ciarlet, B. Miara, and J. M. Thomas, Introduction to Numerical Linear Algebra and Optimisation. New York, NY: Cambridge University Press, 1989.
D. Cruz-Suárez, R. Montes-de-Oca, and F. Salem-Silva, “Conditions for the uniqueness of optimal policies of discounted Markov decision processes”, Mathematical methods of operational research, vol. 60, no. 3, pp. 415–436, Dec. 2004, doi: 10.1007/s001860400372
J. Dugundji, Topology. Boston, MA: Allyn and bacon, 1966. [On line]. Available: https://bit.ly/398XKQA
I. Ɖuranović-Miličić and M. Gardašević-Filipović, “On an Algorithm in nondifferential convex optimization”, Yugoslav journal of operations research, vol. 23, no.1, pp. 59-71, 2013, doi: 10.2298/YJOR110501024D
S. H. Friedberg, A. Insel, and L. Spence, Linear algebra. Upper Saddle River, NJ: Pearson, 2003.
M. Geist, B. Scherrer, O. Pietquin, “Theory of Regularized Markov Decision Processes”, Proceedings of machine learning research , vol. 97, pp. 2160-2169, 2019. [On line]. Available: https://bit.ly/3bdfpcC
O. Hernández-Lerma, Adaptive Markov control processes. New York, NY: Springer, 1989.
O. Hernández-Lerma and J. B. Lasserre, Markov control processes: basic optimality criteria: discrete-time. New York, NY: Springer, 1996.
O. Hernández-Lerma and J. B. Lasserre, Discrete-time Markov control processes. New York, NY: Springer, 1999.
C. Lemaréchal and C. Sagastizábal, “Practical aspects of the Moreau-Yosida regularization: theoretical preliminaries”, SIAM journal on optimization, vol. 7, no. 2, pp. 367-385, 1997, doi: 10.1137/S1052623494267127
R. T. Rockafellar, Convex analysis. Princeton, NJ: Princeton University Press, 1970.
R. T. Rockafellar and R. J.-B. Wets, Variational analysis. Heidelberg: Springer, 2004.
H. L. Royden, Real analysis. New York, NY: Macmillan, 1988.
How to Cite
Copyright (c) 2021 Israel Ortega-Gutiérrez, Hugo Cruz-Suárez
This work is licensed under a Creative Commons Attribution 4.0 International License.