IDEAS home Printed from https://ideas.repec.org/a/plo/pone00/0283207.html
   My bibliography  Save this article

Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem

Author

Listed:
  • Jin Zhang
  • Qing Liu
  • XiaoHang Han

Abstract

In this paper, a dynamic sub-route-based self-adaptive beam search Q-learning (DSRABSQL) algorithm is proposed that provides a reinforcement learning (RL) framework combined with local search to solve the traveling salesman problem (TSP). DSRABSQL builds upon the Q-learning (QL) algorithm. Considering its problems of slow convergence and low accuracy, four strategies within the QL framework are designed first: the weighting function-based reward matrix, the power function-based initial Q-table, a self-adaptive ε-beam search strategy, and a new Q-value update formula. Then, a self-adaptive beam search Q-learning (ABSQL) algorithm is designed. To solve the problem that the sub-route is not fully optimized in the ABSQL algorithm, a dynamic sub-route optimization strategy is introduced outside the QL framework, and then the DSRABSQL algorithm is designed. Experiments are conducted to compare QL, ABSQL, DSRABSQL, our previously proposed variable neighborhood discrete whale optimization algorithm, and two advanced reinforcement learning algorithms. The experimental results show that DSRABSQL significantly outperforms the other algorithms. In addition, two groups of algorithms are designed based on the QL and DSRABSQL algorithms to test the effectiveness of the five strategies. From the experimental results, it can be found that the dynamic sub-route optimization strategy and self-adaptive ε-beam search strategy contribute the most for small-, medium-, and large-scale instances. At the same time, collaboration exists between the four strategies within the QL framework, which increases with the expansion of the instance scale.

Suggested Citation

  • Jin Zhang & Qing Liu & XiaoHang Han, 2023. "Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem," PLOS ONE, Public Library of Science, vol. 18(3), pages 1-31, March.
  • Handle: RePEc:plo:pone00:0283207
    DOI: 10.1371/journal.pone.0283207
    as

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0283207
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0283207&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0283207?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    References listed on IDEAS

    as
    1. Gilles Pesant & Michel Gendreau & Jean-Yves Potvin & Jean-Marc Rousseau, 1998. "An Exact Constraint Logic Programming Algorithm for the Traveling Salesman Problem with Time Windows," Transportation Science, INFORMS, vol. 32(1), pages 12-29, February.
    2. Ahmed Stohy & Heba-Tullah Abdelhakam & Sayed Ali & Mohammed Elhenawy & Abdallah A Hassan & Mahmoud Masoud & Sebastien Glaser & Andry Rakotonirainy, 2021. "Hybrid pointer networks for traveling salesman problems optimization," PLOS ONE, Public Library of Science, vol. 16(12), pages 1-17, December.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dieter, Peter & Caron, Matthew & Schryen, Guido, 2023. "Integrating driver behavior into last-mile delivery routing: Combining machine learning and optimization in a hybrid decision support framework," European Journal of Operational Research, Elsevier, vol. 311(1), pages 283-300.
    2. Filippo Focacci & Andrea Lodi & Michela Milano, 2002. "A Hybrid Exact Algorithm for the TSPTW," INFORMS Journal on Computing, INFORMS, vol. 14(4), pages 403-417, November.
    3. Roberto Baldacci & Aristide Mingozzi & Roberto Roberti, 2012. "New State-Space Relaxations for Solving the Traveling Salesman Problem with Time Windows," INFORMS Journal on Computing, INFORMS, vol. 24(3), pages 356-371, August.
    4. Sanjeeb Dash & Oktay Günlük & Andrea Lodi & Andrea Tramontani, 2012. "A Time Bucket Formulation for the Traveling Salesman Problem with Time Windows," INFORMS Journal on Computing, INFORMS, vol. 24(1), pages 132-147, February.
    5. Tarhan, İstenç & Oğuz, Ceyda, 2022. "A matheuristic for the generalized order acceptance and scheduling problem," European Journal of Operational Research, Elsevier, vol. 299(1), pages 87-103.
    6. Olli Bräysy & Michel Gendreau, 2005. "Vehicle Routing Problem with Time Windows, Part II: Metaheuristics," Transportation Science, INFORMS, vol. 39(1), pages 119-139, February.
    7. Ann M. Campbell & Barrett W. Thomas, 2008. "Probabilistic Traveling Salesman Problem with Deadlines," Transportation Science, INFORMS, vol. 42(1), pages 1-21, February.
    8. Majed G. Alharbi & Ahmed Stohy & Mohammed Elhenawy & Mahmoud Masoud & Hamiden Abd El-Wahed Khalifa, 2021. "Solving Traveling Salesman Problem with Time Windows Using Hybrid Pointer Networks with Time Features," Sustainability, MDPI, vol. 13(22), pages 1-12, November.
    9. Fontaine, Romain & Dibangoye, Jilles & Solnon, Christine, 2023. "Exact and anytime approach for solving the time dependent traveling salesman problem with time windows," European Journal of Operational Research, Elsevier, vol. 311(3), pages 833-844.
    10. Jeffrey W. Ohlmann & Barrett W. Thomas, 2007. "A Compressed-Annealing Heuristic for the Traveling Salesman Problem with Time Windows," INFORMS Journal on Computing, INFORMS, vol. 19(1), pages 80-90, February.
    11. Pesant, Gilles & Gendreau, Michel & Potvin, Jean-Yves & Rousseau, Jean-Marc, 1999. "On the flexibility of constraint programming models: From single to multiple time windows for the traveling salesman problem," European Journal of Operational Research, Elsevier, vol. 117(2), pages 253-263, September.
    12. Albiach, José & Sanchis, José Marí­a & Soler, David, 2008. "An asymmetric TSP with time windows and with time-dependent travel times and costs: An exact solution through a graph transformation," European Journal of Operational Research, Elsevier, vol. 189(3), pages 789-802, September.
    13. John S. F. Lyons & Peter C. Bell & Mehmet A. Begen, 2018. "Solving the Whistler-Blackcomb Mega Day Challenge," Interfaces, INFORMS, vol. 48(4), pages 323-339, August.
    14. Natashia L. Boland & Martin W. P. Savelsbergh, 2019. "Perspectives on integer programming for time-dependent models," TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 27(2), pages 147-173, July.
    15. Vianney Coppé & Xavier Gillard & Pierre Schaus, 2024. "Decision Diagram-Based Branch-and-Bound with Caching for Dominance and Suboptimality Detection," INFORMS Journal on Computing, INFORMS, vol. 36(6), pages 1522-1542, December.
    16. Gerardo Berbeglia & Gilles Pesant & Louis-Martin Rousseau, 2011. "Checking the Feasibility of Dial-a-Ride Instances Using Constraint Programming," Transportation Science, INFORMS, vol. 45(3), pages 399-412, August.
    17. Mor, A. & Speranza, M.G. & Viegas, J.M., 2020. "Efficient loading and unloading operations via a booking system," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 141(C).
    18. Anirudh Subramanyam & Chrysanthos E. Gounaris, 2018. "A Decomposition Algorithm for the Consistent Traveling Salesman Problem with Vehicle Idling," Transportation Science, INFORMS, vol. 52(2), pages 386-401, March.
    19. Russell, Robert, 2013. "A constraint programming approach to designing a newspaper distribution system," International Journal of Production Economics, Elsevier, vol. 145(1), pages 132-138.
    20. Andrea Lodi & Michela Milano & Louis-Martin Rousseau, 2006. "Discrepancy-Based Additive Bounding Procedures," INFORMS Journal on Computing, INFORMS, vol. 18(4), pages 480-493, November.

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pone00:0283207. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: plosone (email available below). General contact details of provider: https://journals.plos.org/plosone/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.