Printed from https://ideas.repec.org/a/spr/jglopt/v73y2019i2d10.1007_s10898-018-0698-y.html

A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning

Authors
  • Hoai An Le Thi

    (University of Lorraine)

  • Vinh Thanh Ho

    (University of Lorraine)

  • Tao Pham Dinh

    (University of Normandie)

Abstract

We investigate a powerful nonconvex optimization approach based on Difference of Convex functions (DC) programming and the DC Algorithm (DCA) for reinforcement learning, a general class of machine learning techniques that aims to estimate the optimal learning policy in a dynamic environment, typically formulated as a Markov decision process (with an incomplete model). The problem is tackled as finding a zero of the so-called optimal Bellman residual via linear value-function approximation, for which two optimization models are proposed: minimizing the $\ell_{p}$-norm of a vector-valued convex function, and minimizing a concave function under linear constraints. Both are formulated as DC programs for which attractive DCA schemes are developed. Numerical experiments on various examples of two benchmark Markov decision process problems, Garnet and Gridworld, show the efficiency of our approaches in comparison with two existing DCA based algorithms and two state-of-the-art reinforcement learning algorithms.
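The abstract combines two ideas: the optimal Bellman residual of a linear Q-function, and a DC decomposition that DCA can exploit. The Python sketch below illustrates both under our own assumptions. It is not the authors' DCA schemes. The 2-state, 2-action MDP (rewards `R`, transitions `P`, discount `GAMMA`) is invented for illustration. The hypothetical `phi` uses one-hot features, which is the simplest (tabular) case of linear value-function approximation, and the norm is fixed at $p = 1$. The hypothetical functions `bellman_residual_l1` and `bellman_residual_dc` compute $\sum_{s,a} |T^*Q_\theta(s,a) - Q_\theta(s,a)|$ directly and as $G - H$ through the identity $|c - l| = 2\max(c, l) - (c + l)$. Here $c$ is convex and $l$ is linear in $\theta$. The hypothetical `dca_toy` applies the generic DCA step (linearize $h$, then solve the convex subproblem) to a scalar toy function $f(x) = x^4/4 - x^2$, not to the Bellman residual itself.

```python
import math

# Hypothetical 2-state, 2-action MDP; all numbers are illustrative assumptions.
GAMMA = 0.9
R = [[1.0, 0.0], [0.0, 2.0]]                      # R[s][a]: expected reward
P = [[[0.8, 0.2], [0.1, 0.9]],                    # P[s][a][s']: transition probabilities
     [[0.5, 0.5], [0.3, 0.7]]]
N_S, N_A = 2, 2


def phi(s, a):
    """One-hot features: the tabular case of Q(s, a) = phi(s, a)^T theta."""
    v = [0.0] * (N_S * N_A)
    v[s * N_A + a] = 1.0
    return v


def q_value(theta, s, a):
    return sum(f * t for f, t in zip(phi(s, a), theta))


def residual_parts(theta, s, a):
    """Return (c, l) with residual = c - l; c is convex in theta, l is linear."""
    backup = sum(P[s][a][s2] * max(q_value(theta, s2, b) for b in range(N_A))
                 for s2 in range(N_S))
    c = R[s][a] + GAMMA * backup    # r + gamma * E[max_a' Q(s', a')]: convex
    l = q_value(theta, s, a)        # Q(s, a): linear
    return c, l


def all_parts(theta):
    return [residual_parts(theta, s, a) for s in range(N_S) for a in range(N_A)]


def bellman_residual_l1(theta):
    """l1-norm of the optimal Bellman residual T*Q_theta - Q_theta over all (s, a)."""
    return sum(abs(c - l) for c, l in all_parts(theta))


def bellman_residual_dc(theta):
    """Same value written as G - H via |c - l| = 2*max(c, l) - (c + l)."""
    parts = all_parts(theta)
    g = sum(2.0 * max(c, l) for c, l in parts)   # convex part G
    h = sum(c + l for c, l in parts)             # convex part H
    return g - h


def q_value_iteration(iters=500):
    """Plain Q value iteration, used only to get a reference zero of the residual."""
    q = [[0.0] * N_A for _ in range(N_S)]
    for _ in range(iters):
        q = [[R[s][a] + GAMMA * sum(P[s][a][s2] * max(q[s2]) for s2 in range(N_S))
              for a in range(N_A)] for s in range(N_S)]
    return [q[s][a] for s in range(N_S) for a in range(N_A)]


def dca_toy(x0, iters=50):
    """Generic DCA on f = g - h with g(x) = x**4 / 4 and h(x) = x**2.

    Each step takes y_k = h'(x_k) = 2 x_k and solves the convex subproblem
    argmin_x g(x) - y_k * x, whose minimizer satisfies x**3 = y_k.
    """
    def f(x):
        return x ** 4 / 4 - x ** 2

    x, history = x0, [f(x0)]
    for _ in range(iters):
        y = 2.0 * x                                   # gradient of h at x_k
        x = math.copysign(abs(y) ** (1.0 / 3.0), y)   # real cube root of y
        history.append(f(x))
    return x, history


if __name__ == "__main__":
    theta_star = q_value_iteration()
    print("l1 residual at Q*:", bellman_residual_l1(theta_star))
    print("l1 residual at 0: ", bellman_residual_l1([0.0] * (N_S * N_A)))
    x, hist = dca_toy(1.0)
    print("DCA limit:", x, "objective:", hist[-1])
```

With these numbers the residual at $\theta = 0$ is 3.0, the sum of $|R(s,a)|$, because every Q-value is zero there. At the value-iteration solution it is numerically zero. The DCA iterates $x_{k+1} = (2x_k)^{1/3}$ decrease $f$ monotonically toward the local minimizer $\sqrt{2}$, where $f = -1$.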

Suggested Citation

  • Hoai An Le Thi & Vinh Thanh Ho & Tao Pham Dinh, 2019. "A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning," Journal of Global Optimization, Springer, vol. 73(2), pages 279-310, February.
  • Handle: RePEc:spr:jglopt:v:73:y:2019:i:2:d:10.1007_s10898-018-0698-y
    DOI: 10.1007/s10898-018-0698-y

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10898-018-0698-y
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s10898-018-0698-y?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Hoai An Le Thi & Tao Pham Dinh, 2005. "The DC (Difference of Convex Functions) Programming and DCA Revisited with DC Models of Real World Nonconvex Optimization Problems," Annals of Operations Research, Springer, vol. 133(1), pages 23-46, January.
    2. Abhijit Gosavi, 2009. "Reinforcement Learning: A Tutorial Survey and Recent Advances," INFORMS Journal on Computing, INFORMS, vol. 21(2), pages 178-192, May.
    3. Manlio Gaudioso & Giovanni Giallombardo & Giovanna Miglionico & Adil M. Bagirov, 2018. "Minimizing nonsmooth DC functions via successive DC piecewise-affine approximations," Journal of Global Optimization, Springer, vol. 71(1), pages 37-55, May.
    4. Kaisa Joki & Adil M. Bagirov & Napsu Karmitsa & Marko M. Mäkelä, 2017. "A proximal bundle method for nonsmooth DC optimization utilizing nonconvex cutting planes," Journal of Global Optimization, Springer, vol. 68(3), pages 501-535, July.
    5. João Carlos O. Souza & Paulo Roberto Oliveira & Antoine Soubeyran, 2016. "Global convergence of a proximal linearized algorithm for difference of convex functions," Post-Print hal-01440298, HAL.
    6. R. Blanquero & E. Carrizosa, 2000. "Optimization of the Norm of a Vector-Valued DC Function and Applications," Journal of Optimization Theory and Applications, Springer, vol. 107(2), pages 245-260, November.
    7. Rafael Blanquero & Emilio Carrizosa, 2010. "On the norm of a dc function," Journal of Global Optimization, Springer, vol. 48(2), pages 209-213, October.
    8. H.A. Le Thi & T. Pham Dinh & H.M. Le & X.T. Vo, 2015. "DC approximation approaches for sparse optimization," European Journal of Operational Research, Elsevier, vol. 244(1), pages 26-46.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Welington Oliveira, 2020. "Sequential Difference-of-Convex Programming," Journal of Optimization Theory and Applications, Springer, vol. 186(3), pages 936-959, September.
    2. Welington Oliveira, 2019. "Proximal bundle methods for nonsmooth DC programming," Journal of Global Optimization, Springer, vol. 75(2), pages 523-563, October.
    3. Manlio Gaudioso & Giovanni Giallombardo & Giovanna Miglionico, 2023. "Sparse optimization via vector k-norm and DC programming with an application to feature selection for support vector machines," Computational Optimization and Applications, Springer, vol. 86(2), pages 745-766, November.
    4. Manlio Gaudioso & Giovanni Giallombardo & Giovanna Miglionico & Adil M. Bagirov, 2018. "Minimizing nonsmooth DC functions via successive DC piecewise-affine approximations," Journal of Global Optimization, Springer, vol. 71(1), pages 37-55, May.
    5. A. M. Bagirov & N. Hoseini Monjezi & S. Taheri, 2021. "An augmented subgradient method for minimizing nonsmooth DC functions," Computational Optimization and Applications, Springer, vol. 80(2), pages 411-438, November.
    6. Manlio Gaudioso & Giovanni Giallombardo & Giovanna Miglionico, 2020. "Essentials of numerical nonsmooth optimization," 4OR, Springer, vol. 18(1), pages 1-47, March.
    7. Hoai An Le Thi & Manh Cuong Nguyen, 2017. "DCA based algorithms for feature selection in multi-class support vector machine," Annals of Operations Research, Springer, vol. 249(1), pages 273-300, February.
    8. M. V. Dolgopolik, 2020. "New global optimality conditions for nonsmooth DC optimization problems," Journal of Global Optimization, Springer, vol. 76(1), pages 25-55, January.
    9. Najmeh Hoseini Monjezi & S. Nobakhtian, 2021. "A filter proximal bundle method for nonsmooth nonconvex constrained optimization," Journal of Global Optimization, Springer, vol. 79(1), pages 1-37, January.
    10. Manlio Gaudioso & Giovanni Giallombardo & Giovanna Miglionico, 2022. "Essentials of numerical nonsmooth optimization," Annals of Operations Research, Springer, vol. 314(1), pages 213-253, July.
    11. Chungen Shen & Xiao Liu, 2021. "Solving nonnegative sparsity-constrained optimization via DC quadratic-piecewise-linear approximations," Journal of Global Optimization, Springer, vol. 81(4), pages 1019-1055, December.
    12. R. Blanquero & E. Carrizosa & E.M.T. Hendrix, 2011. "Locating a competitive facility in the plane with a robustness criterion," European Journal of Operational Research, Elsevier, vol. 215(1), pages 21-24, November.
    13. Rafael Blanquero & Emilio Carrizosa & Pierre Hansen, 2009. "Locating Objects in the Plane Using Global Optimization Techniques," Mathematics of Operations Research, INFORMS, vol. 34(4), pages 837-858, November.
    14. Tao Pham Dinh & Van Ngai Huynh & Hoai An Le Thi & Vinh Thanh Ho, 2022. "Alternating DC algorithm for partial DC programming problems," Journal of Global Optimization, Springer, vol. 82(4), pages 897-928, April.
    15. Outi Montonen & Kaisa Joki, 2018. "Bundle-based descent method for nonsmooth multiobjective DC optimization with inequality constraints," Journal of Global Optimization, Springer, vol. 72(3), pages 403-429, November.
    16. Wim Ackooij & Welington Oliveira, 2019. "Nonsmooth and Nonconvex Optimization via Approximate Difference-of-Convex Decompositions," Journal of Optimization Theory and Applications, Springer, vol. 182(1), pages 49-80, July.
    17. M. V. Dolgopolik, 2022. "DC Semidefinite programming and cone constrained DC optimization I: theory," Computational Optimization and Applications, Springer, vol. 82(3), pages 649-671, July.
    18. Michael A. Voelkel & Anna-Lena Sachs & Ulrich W. Thonemann, 2020. "An aggregation-based approximate dynamic programming approach for the periodic review model with random yield," European Journal of Operational Research, Elsevier, vol. 281(2), pages 286-298.
    19. Min Tao & Jiang-Ning Li, 2023. "Error Bound and Isocost Imply Linear Convergence of DCA-Based Algorithms to D-Stationarity," Journal of Optimization Theory and Applications, Springer, vol. 197(1), pages 205-232, April.
    20. Hoai An Le Thi & Van Ngai Huynh & Tao Pham Dinh, 2018. "Convergence Analysis of Difference-of-Convex Algorithm with Subanalytic Data," Journal of Optimization Theory and Applications, Springer, vol. 179(1), pages 103-126, October.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:jglopt:v:73:y:2019:i:2:d:10.1007_s10898-018-0698-y. See general information about how to correct material in RePEc.


    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing. General contact details of provider: http://www.springer.com.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.