Printed from https://ideas.repec.org/a/gam/jsusta/v8y2016i11p1100-d81590.html

Predicting Short-Term Subway Ridership and Prioritizing Its Influential Factors Using Gradient Boosting Decision Trees

Author

Listed:
  • Chuan Ding

    (School of Transportation Science and Engineering, Beijing Key Laboratory for Cooperative Vehicle Infrastructure System and Safety Control, Beihang University, Beijing 100191, China
    State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing 100044, China)

  • Donggen Wang

    (Department of Geography, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China)

  • Xiaolei Ma

    (School of Transportation Science and Engineering, Beijing Key Laboratory for Cooperative Vehicle Infrastructure System and Safety Control, Beihang University, Beijing 100191, China
    Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies, Si-Pai-Lou #2, Nanjing 210096, China)

  • Haiying Li

    (State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing 100044, China)

Abstract

Understanding the relationship between short-term subway ridership and its influential factors is crucial to improving the accuracy of short-term subway ridership prediction. Although a growing body of studies addresses short-term ridership prediction approaches, little effort has been made to investigate short-term subway ridership prediction that accounts for bus transfer activities and temporal features. To fill this gap, a relatively recent data mining approach called gradient boosting decision trees (GBDT) is applied to short-term subway ridership prediction and used to capture the associations with the independent variables. Taking three subway stations in Beijing as cases, the short-term subway ridership and the number of passengers alighting at adjacent bus stops are obtained from transit smart card data. To optimize model performance across different combinations of regularization parameters, a series of GBDT models is built with various learning rates and tree complexities, each fitted up to a maximum number of trees. The performance of the optimal model confirms that the gradient boosting approach can incorporate different types of predictors, fit complex nonlinear relationships, and automatically handle multicollinearity while maintaining high accuracy. In contrast to other machine learning methods, often regarded as “black-box” procedures, the GBDT model can identify and rank the relative influences of bus transfer activities and temporal features on short-term subway ridership. These findings suggest that the GBDT model has considerable advantages in improving short-term subway ridership prediction in a multimodal public transportation system.
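
The workflow summarized in the abstract (boosted trees fitted under several regularization settings, followed by a ranking of predictor influence) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a squared-error loss, depth-1 trees only (so tree complexity is fixed rather than varied), and synthetic data standing in for the Beijing smart card records. The feature names `hour`, `bus_alightings`, and `noise`, and all numeric settings, are hypothetical. Relative influence is approximated by summing each feature's squared-error reductions across splits and scaling the totals to 100.

```python
import math
import random

def best_stump(X, residuals):
    """Return (gain, feature, threshold, left_mean, right_mean) for the best single split."""
    n, total = len(X), sum(residuals)
    best = None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # no valid threshold between equal values
            right_sum = total - left_sum
            # Reduction in squared error achieved by splitting at x <= lo.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
            if best is None or gain > best[0]:
                best = (gain, j, lo, left_sum / k, right_sum / (n - k))
    return best

def fit_gbdt(X, y, n_trees, learning_rate):
    """Least-squares gradient boosting with depth-1 trees (stumps)."""
    init = sum(y) / len(y)
    pred = [init] * len(y)
    stumps, influence = [], [0.0] * len(X[0])
    for _ in range(n_trees):
        # Residuals are the negative gradient of the squared-error loss.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = best_stump(X, residuals)
        if stump is None:
            break
        gain, j, thr, left, right = stump
        influence[j] += gain  # accumulate split improvements per feature
        left, right = learning_rate * left, learning_rate * right  # shrinkage
        stumps.append((j, thr, left, right))
        pred = [p + (left if row[j] <= thr else right) for p, row in zip(pred, X)]
    total = sum(influence) or 1.0
    return init, stumps, [100.0 * v / total for v in influence]

def predict(model, row):
    init, stumps, _ = model
    return init + sum(left if row[j] <= thr else right for j, thr, left, right in stumps)

def rmse(model, X, y):
    return math.sqrt(sum((predict(model, r) - t) ** 2 for r, t in zip(X, y)) / len(y))

# Synthetic stand-in data (NOT the Beijing smart card data): hour of day,
# bus alightings in the previous interval, and an irrelevant noise column.
random.seed(0)

def make_row():
    hour, bus, noise = random.randint(5, 23), random.uniform(0, 200), random.random()
    peak = 300.0 if hour in (7, 8, 17, 18) else 0.0
    return [hour, bus, noise], peak + 2.0 * bus + random.gauss(0, 10)

rows = [make_row() for _ in range(400)]
X_train, y_train = [r[0] for r in rows[:300]], [r[1] for r in rows[:300]]
X_test, y_test = [r[0] for r in rows[300:]], [r[1] for r in rows[300:]]

# Small grid over the two regularization settings this sketch exposes.
results = {}
for lr in (0.01, 0.1):
    for n_trees in (100, 300):
        model = fit_gbdt(X_train, y_train, n_trees, lr)
        results[(lr, n_trees)] = (rmse(model, X_test, y_test), model)

best_key = min(results, key=lambda k: results[k][0])
best_rmse, best_model = results[best_key]
mean_y = sum(y_train) / len(y_train)
baseline_rmse = math.sqrt(sum((t - mean_y) ** 2 for t in y_test) / len(y_test))
rel_influence = dict(zip(["hour", "bus_alightings", "noise"], best_model[2]))
print(best_key, round(best_rmse, 1), round(baseline_rmse, 1), rel_influence)
```

The printed tuple shows the selected (learning rate, number of trees) pair, the holdout error against a predict-the-mean baseline, and the influence shares. A fuller treatment would also vary tree depth and add row subsampling, as in stochastic gradient boosting.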

Suggested Citation

  • Chuan Ding & Donggen Wang & Xiaolei Ma & Haiying Li, 2016. "Predicting Short-Term Subway Ridership and Prioritizing Its Influential Factors Using Gradient Boosting Decision Trees," Sustainability, MDPI, vol. 8(11), pages 1-16, October.
  • Handle: RePEc:gam:jsusta:v:8:y:2016:i:11:p:1100-:d:81590

    Download full text from publisher

    File URL: https://www.mdpi.com/2071-1050/8/11/1100/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2071-1050/8/11/1100/
    Download Restriction: no

    References listed on IDEAS

    1. Idris, Ahmed Osman & Nurul Habib, Khandker M. & Shalaby, Amer, 2015. "An investigation on the performances of mode shift models in transit ridership forecasting," Transportation Research Part A: Policy and Practice, Elsevier, vol. 78(C), pages 551-565.
    2. Zhang, Dapeng & Wang, Xiaokun (Cara), 2014. "Transit ridership estimation with network Kriging: a case study of Second Avenue Subway, NYC," Journal of Transport Geography, Elsevier, vol. 41(C), pages 107-115.
    3. Chen, Mu-Chen & Wei, Yu, 2011. "Exploring time variants for short-term passenger flow," Journal of Transport Geography, Elsevier, vol. 19(4), pages 488-498.
    4. Jinbao Zhao & Wei Deng & Yan Song & Yueran Zhu, 2014. "Analysis of Metro ridership at station level and station-to-station level in Nanjing: an approach based on direct demand models," Transportation, Springer, vol. 41(1), pages 133-155, January.
    5. Friedman, Jerome H., 2002. "Stochastic gradient boosting," Computational Statistics & Data Analysis, Elsevier, vol. 38(4), pages 367-378, February.
    6. Matthias Schonlau, 2005. "Boosted regression (boosting): An introductory tutorial and a Stata plugin," Stata Journal, StataCorp LP, vol. 5(3), pages 330-354, September.
    7. Jaeseok Her & Sungjin Park & Jae Seung Lee, 2016. "The Effects of Bus Ridership on Airborne Particulate Matter (PM10) Concentrations," Sustainability, MDPI, vol. 8(7), pages 1-14, July.

    Citations


    Cited by:

    1. Zhang, Qian & Liu, Xiaoxiao & Spurgeon, Sarah & Yu, Dingli, 2021. "A two-layer modelling framework for predicting passenger flow on trains: A case study of London underground trains," Transportation Research Part A: Policy and Practice, Elsevier, vol. 151(C), pages 119-139.
    2. Hongtai Yang & Jianjiang Yang & Lee D Han & Xiaohan Liu & Li Pu & Shih-miao Chin & Ho-ling Hwang, 2018. "A Kriging based spatiotemporal approach for traffic volume data imputation," PLOS ONE, Public Library of Science, vol. 13(4), pages 1-11, April.
    3. Mike Lindow & David DeFranza & Arul Mishra & Himanshu Mishra, 2021. "Scared into Action: How Partisanship and Fear are Associated with Reactions to Public Health Directives," Papers 2101.05365, arXiv.org.
    4. Egu, Oscar & Bonnel, Patrick, 2021. "Medium-term public transit route ridership forecasting: What, how and why? A case study in Lyon," Transport Policy, Elsevier, vol. 105(C), pages 124-133.
    5. Anupriya, & Graham, Daniel J. & Bansal, Prateek & Hörcher, Daniel & Anderson, Richard, 2023. "Optimal congestion control strategies for near-capacity urban metros: Informing intervention via fundamental diagrams," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 609(C).
    6. Shao, Qifan & Zhang, Wenjia & Cao, Xinyu & Yang, Jiawen & Yin, Jie, 2020. "Threshold and moderating effects of land use on metro ridership in Shenzhen: Implications for TOD planning," Journal of Transport Geography, Elsevier, vol. 89(C).
    7. Ximan Ling & Zhiren Huang & Chengcheng Wang & Fan Zhang & Pu Wang, 2018. "Predicting subway passenger flows under different traffic conditions," PLOS ONE, Public Library of Science, vol. 13(8), pages 1-23, August.
    8. Ma, Xiaolei & Miao, Ran & Wu, Xinkai & Liu, Xianglong, 2021. "Examining influential factors on the energy consumption of electric and diesel buses: A data-driven analysis of large-scale public transit network in Beijing," Energy, Elsevier, vol. 216(C).
    9. Lee, Yongsung & Lee, Bumsoo, 2022. "What’s eating public transit in the United States? Reasons for declining transit ridership in the 2010s," Transportation Research Part A: Policy and Practice, Elsevier, vol. 157(C), pages 126-143.
    10. Xuesong Feng & Zhibin Tao & Xuejun Niu & Zejing Ruan, 2021. "Multi-Objective Land Use Allocation Optimization in View of Overlapped Influences of Rail Transit Stations," Sustainability, MDPI, vol. 13(23), pages 1-14, November.
    11. Yi Cao & Xiaolei Hou & Nan Chen, 2022. "Short-Term Forecast of OD Passenger Flow Based on Ensemble Empirical Mode Decomposition," Sustainability, MDPI, vol. 14(14), pages 1-14, July.
    12. Zhuangbin Shi & Ning Zhang & Yang Liu & Wei Xu, 2018. "Exploring Spatiotemporal Variation in Hourly Metro Ridership at Station Level: The Influence of Built Environment and Topological Structure," Sustainability, MDPI, vol. 10(12), pages 1-16, December.
    13. Yap, Menno & Munizaga, Marcela, 2018. "Workshop 8 report: Big data in the digital age and how it can benefit public transport users," Research in Transportation Economics, Elsevier, vol. 69(C), pages 615-620.
    14. Pengfei Lin & Jiancheng Weng & Dimitrios Alivanistos & Siyong Ma & Baocai Yin, 2020. "Identifying and Segmenting Commuting Behavior Patterns Based on Smart Card Data and Travel Survey Data," Sustainability, MDPI, vol. 12(12), pages 1-18, June.
    15. Shruti Sachdeva & Tarunpreet Bhatia & A. K. Verma, 2018. "GIS-based evolutionary optimized Gradient Boosted Decision Trees for forest fire susceptibility mapping," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 92(3), pages 1399-1418, July.
    16. Weijia (Vivian) Li & Kara M. Kockelman, 2022. "How does machine learning compare to conventional econometrics for transport data sets? A test of ML versus MLE," Growth and Change, Wiley Blackwell, vol. 53(1), pages 342-376, March.
    17. Yiyi Chen & Ye Liu, 2021. "Which Risk Factors Matter More for Psychological Distress during the COVID-19 Pandemic? An Application Approach of Gradient Boosting Decision Trees," IJERPH, MDPI, vol. 18(11), pages 1-18, May.
    18. Yap, M.D. & Nijënstein, S. & van Oort, N., 2018. "Improving predictions of public transport usage during disturbances based on smart card data," Transport Policy, Elsevier, vol. 61(C), pages 84-95.
    19. Tu, Wei & Cao, Rui & Yue, Yang & Zhou, Baoding & Li, Qiuping & Li, Qingquan, 2018. "Spatial variations in urban public ridership derived from GPS trajectories and smart card data," Journal of Transport Geography, Elsevier, vol. 69(C), pages 45-57.
    20. Jeongwoo Lee & Marlon Boarnet & Douglas Houston & Hilary Nixon & Steven Spears, 2017. "Changes in Service and Associated Ridership Impacts near a New Light Rail Transit Line," Sustainability, MDPI, vol. 9(10), pages 1-27, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kepaptsoglou, Konstantinos & Stathopoulos, Antony & Karlaftis, Matthew G., 2017. "Ridership estimation of a new LRT system: Direct demand model approach," Journal of Transport Geography, Elsevier, vol. 58(C), pages 146-156.
    2. Tu, Wei & Cao, Rui & Yue, Yang & Zhou, Baoding & Li, Qiuping & Li, Qingquan, 2018. "Spatial variations in urban public ridership derived from GPS trajectories and smart card data," Journal of Transport Geography, Elsevier, vol. 69(C), pages 45-57.
    3. Wang, Jing & Wan, Feng & Dong, Chunjiao & Yin, Chaoying & Chen, Xiaoyu, 2023. "Spatiotemporal effects of built environment factors on varying rail transit station ridership patterns," Journal of Transport Geography, Elsevier, vol. 109(C).
    4. Mehmet Güney Celbiş & Pui-Hang Wong & Karima Kourtit & Peter Nijkamp, 2021. "Innovativeness, Work Flexibility, and Place Characteristics: A Spatial Econometric and Machine Learning Approach," Sustainability, MDPI, vol. 13(23), pages 1-29, December.
    5. Ding, Chuan & Cao, Xinyu & Liu, Chao, 2019. "How does the station-area built environment influence Metrorail ridership? Using gradient boosting decision trees to identify non-linear thresholds," Journal of Transport Geography, Elsevier, vol. 77(C), pages 70-78.
    6. Christoph Emanuel Mueller, 2016. "Accurate forecast of countries’ research output by macro-level indicators," Scientometrics, Springer;Akadémiai Kiadó, vol. 109(2), pages 1307-1328, November.
    7. Frenger, Monika & Emrich, Eike & Geber, Sebastian & Follert, Florian & Pierdzioch, Christian, 2019. "The influence of performance parameters on market value," Working Papers of the European Institute for Socioeconomics 30, European Institute for Socioeconomics (EIS), Saarbrücken.
    8. Li, Shaoying & Lyu, Dijiang & Huang, Guanping & Zhang, Xiaohu & Gao, Feng & Chen, Yuting & Liu, Xiaoping, 2020. "Spatially varying impacts of built environment factors on rail transit ridership at station level: A case study in Guangzhou, China," Journal of Transport Geography, Elsevier, vol. 82(C).
    9. Zhuangbin Shi & Ning Zhang & Yang Liu & Wei Xu, 2018. "Exploring Spatiotemporal Variation in Hourly Metro Ridership at Station Level: The Influence of Built Environment and Topological Structure," Sustainability, MDPI, vol. 10(12), pages 1-16, December.
    10. Mansoor, Umer & Jamal, Arshad & Su, Junbiao & Sze, N.N. & Chen, Anthony, 2023. "Investigating the risk factors of motorcycle crash injury severity in Pakistan: Insights and policy recommendations," Transport Policy, Elsevier, vol. 139(C), pages 21-38.
    11. Irene Mosca & Alan Barrett, 2016. "The impact of adult child emigration on the mental health of older parents," Journal of Population Economics, Springer;European Society for Population Economics, vol. 29(3), pages 687-719, July.
    12. Toşa, Cristian & Sato, Hitomi & Morikawa, Takayuki & Miwa, Tomio, 2018. "Commuting behavior in emerging urban areas: Findings of a revealed-preferences and stated-intentions survey in Cluj-Napoca, Romania," Journal of Transport Geography, Elsevier, vol. 68(C), pages 78-93.
    13. Bissan Ghaddar & Ignacio Gómez-Casares & Julio González-Díaz & Brais González-Rodríguez & Beatriz Pateiro-López & Sofía Rodríguez-Ballesteros, 2023. "Learning for Spatial Branching: An Algorithm Selection Approach," INFORMS Journal on Computing, INFORMS, vol. 35(5), pages 1024-1043, September.
    14. Akash Malhotra, 2018. "A hybrid econometric-machine learning approach for relative importance analysis: Prioritizing food policy," Papers 1806.04517, arXiv.org, revised Aug 2020.
    15. Zhang, Jie & Wang, David Z.W. & Meng, Meng, 2018. "Which service is better on a linear travel corridor: Park & ride or on-demand public bus?," Transportation Research Part A: Policy and Practice, Elsevier, vol. 118(C), pages 803-818.
    16. Nahushananda Chakravarthy H G & Karthik M Seenappa & Sujay Raghavendra Naganna & Dayananda Pruthviraja, 2023. "Machine Learning Models for the Prediction of the Compressive Strength of Self-Compacting Concrete Incorporating Incinerated Bio-Medical Waste Ash," Sustainability, MDPI, vol. 15(18), pages 1-22, September.
    17. Tim Voigt & Martin Kohlhase & Oliver Nelles, 2021. "Incremental DoE and Modeling Methodology with Gaussian Process Regression: An Industrially Applicable Approach to Incorporate Expert Knowledge," Mathematics, MDPI, vol. 9(19), pages 1-26, October.
    18. Wen, Shaoting & Buyukada, Musa & Evrendilek, Fatih & Liu, Jingyong, 2020. "Uncertainty and sensitivity analyses of co-combustion/pyrolysis of textile dyeing sludge and incense sticks: Regression and machine-learning models," Renewable Energy, Elsevier, vol. 151(C), pages 463-474.
    19. Zhu, Haibin & Bai, Lu & He, Lidan & Liu, Zhi, 2023. "Forecasting realized volatility with machine learning: Panel data perspective," Journal of Empirical Finance, Elsevier, vol. 73(C), pages 251-271.
    20. Spiliotis, Evangelos & Makridakis, Spyros & Kaltsounis, Anastasios & Assimakopoulos, Vassilios, 2021. "Product sales probabilistic forecasting: An empirical evaluation using the M5 competition data," International Journal of Production Economics, Elsevier, vol. 240(C).
