
Urban Subway Station Site Selection Prediction Based on Clustered Demand and Interpretable Machine Learning Models

Author

Listed:
  • Yun Liu

    (College of Earth Sciences, Yunnan University, Kunming 650500, China)

  • Xin Yao

    (College of Earth Sciences, Yunnan University, Kunming 650500, China)

  • Hang Lv

    (College of Earth Sciences, Yunnan University, Kunming 650500, China)

  • Dingjie Zhou

    (Yunnan Provincial Institute of Surveying and Mapping, Kunming 650500, China)

  • Zhiqiang Xie

    (College of Earth Sciences, Yunnan University, Kunming 650500, China)

  • Xiaoqing Zhao

    (College of Earth Sciences, Yunnan University, Kunming 650500, China)

  • Quan Zhu

    (Kunming Urban Transport Institute, Kunming 650500, China)

  • Cong Chai

    (Kunming Urban Transport Institute, Kunming 650500, China)

Abstract

With accelerating urbanization, the development of rail transit systems, particularly subways, has become a key strategy for alleviating urban traffic congestion. However, existing studies on subway station site selection often lack a spatially continuous evaluation of site suitability across the entire study area. This can disconnect planning from actual demand, resulting in problems such as “overbuilt infrastructure” or the “island effect.” To address this gap, this study takes Kunming City, China, as the study area. It uses the K-means++ algorithm to cluster existing subway stations by passenger flow, integrates multi-source spatial data, applies a random forest algorithm to select optimal positive samples and identify driving factors, and then uses a LightGBM-SHAP explainable machine learning framework to build a predictive model of station location. The main findings are as follows: (1) The random forest model identified 20 key drivers of site selection. SHAP analysis showed that the five largest contributors were connectivity, nighttime lighting, road network density, transportation service, and residence density. Three of these five are transportation-related, making transportation factors the primary determinants of subway station site selection. (2) The site selection prediction model performed well, achieving an R² of 0.95 on the test set and an average R² of 0.79 in spatial 5-fold cross-validation, indicating high model reliability. Predicted suitability was highest in the core urban area within the Second Ring Road and declined gradually toward the periphery. High-suitability areas in suburban regions outside the Third Ring Road were mainly aligned along existing subway lines. (3) The cumulative predicted probability within a 300 m buffer zone around each station was positively correlated with passenger flow levels. Overlaying the predictions on current station locations showed strong spatial consistency, indicating that the model outputs closely match the actual spatial layout and usage intensity of existing stations. These findings provide decision support for optimizing subway station layouts and planning future transportation infrastructure, and have both theoretical and practical significance for data-driven site selection.
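
As an illustration of the clustering step, the following is a minimal sketch of K-means++ applied to one-dimensional passenger flow values. It is not the authors' implementation: the abstract does not state the number of clusters, the exact flow measure, or any preprocessing, so the function names (`nearest`, `kmeans_pp_1d`), the choice of k = 3, and the flow figures are all hypothetical.

```python
import random


def nearest(value, centers):
    """Return the index of the centre closest to value."""
    return min(range(len(centers)), key=lambda j: (value - centers[j]) ** 2)


def kmeans_pp_1d(values, k, seed=0, max_iter=100):
    """K-means++ seeding followed by Lloyd iterations on 1-D data.

    Assumes values contains at least k distinct numbers.
    """
    rng = random.Random(seed)
    # K-means++ seeding: the first centre is drawn uniformly; each later
    # centre is drawn with probability proportional to its squared
    # distance from the nearest centre chosen so far.
    centers = [rng.choice(values)]
    while len(centers) < k:
        d2 = [min((v - c) ** 2 for c in centers) for v in values]
        centers.append(rng.choices(values, weights=d2, k=1)[0])
    # Lloyd iterations: assign each value to its nearest centre, then
    # move each centre to the mean of its members, until nothing changes.
    for _ in range(max_iter):
        labels = [nearest(v, centers) for v in values]
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # An empty cluster keeps its previous centre.
            updated.append(sum(members) / len(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    labels = [nearest(v, centers) for v in values]
    return labels, centers


# Hypothetical daily passenger flows for nine stations (not data from the paper).
flows = [1200, 1500, 1350, 8000, 8600, 7900, 21000, 22500, 20500]
labels, centers = kmeans_pp_1d(flows, k=3)
for flow, label in zip(flows, labels):
    print(flow, "-> cluster", label)
```

Because the seeding is random, a single run can settle in a poor local optimum; a common remedy is to repeat the procedure with several seeds and keep the run with the lowest within-cluster sum of squares.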

Suggested Citation

  • Yun Liu & Xin Yao & Hang Lv & Dingjie Zhou & Zhiqiang Xie & Xiaoqing Zhao & Quan Zhu & Cong Chai, 2025. "Urban Subway Station Site Selection Prediction Based on Clustered Demand and Interpretable Machine Learning Models," Land, MDPI, vol. 14(8), pages 1-29, August.
  • Handle: RePEc:gam:jlands:v:14:y:2025:i:8:p:1612-:d:1720508

    Download full text from publisher

    File URL: https://www.mdpi.com/2073-445X/14/8/1612/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2073-445X/14/8/1612/
    Download Restriction: no