Printed from https://ideas.repec.org/a/eee/transa/v174y2023ics0965856423001635.html

Examining nonlinearity in population inflow estimation using big data: An empirical comparison of explainable machine learning models

Authors
  • Hu, Songhua
  • Xiong, Chenfeng
  • Chen, Peng
  • Schonfeld, Paul

Abstract

Mobile device location data (MDLD) contain population-representative, fine-grained travel demand information, facilitating opportunities to validate established relations between travel demand and underlying factors from a big data perspective. Using the nationwide census block group (CBG)-level population inflow derived from MDLD as a proxy for travel demand, this study examines its relations with various factors including socioeconomics, demographics, land use, and CBG attributes. A host of tree-based machine learning (ML) models and interpretation techniques (feature importance, partial dependence plot (PDP), accumulated local effect (ALE), SHapley Additive exPlanations (SHAP)) are extensively compared to determine the best model architecture and justify interpretation robustness. Empirical results show that: 1) Boosting trees perform the best among all models, followed by bagging trees, single trees, and linear regressions. 2) Feature importance holds consistently across different tree-based models but is influenced by measures of importance and hyperparameter settings. 3) Pronounced nonlinearities, threshold effects, and interaction effects are observed in relations between population inflow and most of its determinants. 4) Compared with PDP, ALE and SHAP plots are more reliable in the presence of outliers, feature dependency, and local heterogeneity. Taken together, techniques introduced in this study can either be integrated into customary travel demand models to enhance model accuracy or serve as interpretation tools that offer a comprehensive understanding of intricate relations.
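Finding 4) contrasts PDP with ALE when features are dependent. The Python sketch below is a minimal illustration of that contrast under stated assumptions, not the authors' code: `predict` is a hypothetical stand-in model (prediction = x1 * x2) rather than a fitted boosting tree, the data are synthetic with x2 strongly correlated with x1, and the fixed-width bins are an assumption (Apley & Zhu, 2020, listed among the references below, use quantile-based bins). The PDP forces every row to the same x1 value, so it averages over x1/x2 combinations that never occur in the data; ALE averages prediction differences only over rows that actually fall inside each bin of x1.

```python
import numpy as np


def partial_dependence(predict, X, feature, grid):
    """Partial dependence: mean prediction with `feature` set to each grid value
    for every row, other features left at their observed values."""
    values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature] = v  # may create off-manifold rows when features are correlated
        values.append(predict(X_mod).mean())
    return np.array(values)


def accumulated_local_effects(predict, X, feature, edges):
    """First-order ALE evaluated at the sorted bin edges `edges`.

    Within each bin, the prediction difference between the upper and lower edge
    is averaged only over rows whose feature value falls in that bin; these local
    effects are then accumulated and centered.
    """
    x = X[:, feature]
    n_bins = len(edges) - 1
    # Bin index 0..n_bins-1; right=True assigns a value equal to an edge to the lower bin.
    bin_idx = np.digitize(x, edges[1:-1], right=True)
    effects = np.zeros(n_bins)
    counts = np.zeros(n_bins)
    for k in range(n_bins):
        rows = X[bin_idx == k]
        counts[k] = len(rows)
        if counts[k] == 0:
            continue  # an empty bin contributes no local effect
        upper, lower = rows.copy(), rows.copy()
        upper[:, feature] = edges[k + 1]
        lower[:, feature] = edges[k]
        effects[k] = np.mean(predict(upper) - predict(lower))
    ale = np.concatenate([[0.0], np.cumsum(effects)])
    # Center so that the count-weighted average of the per-bin ALE is zero.
    bin_means = (ale[:-1] + ale[1:]) / 2
    return ale - np.sum(counts * bin_means) / counts.sum()


# Hypothetical "black-box" model used only for illustration: prediction = x1 * x2.
def predict(X):
    return X[:, 0] * X[:, 1]


rng = np.random.default_rng(0)
x1 = rng.uniform(0, 1, 2000)
x2 = x1 + rng.normal(0, 0.05, 2000)  # synthetic: x2 strongly correlated with x1
X = np.column_stack([x1, x2])

pdp = partial_dependence(predict, X, 0, np.array([0.0, 0.5, 1.0]))
ale = accumulated_local_effects(predict, X, 0, np.linspace(0, 1, 11))

print("PDP rise, first vs second half:", pdp[1] - pdp[0], pdp[2] - pdp[1])
print("ALE rise, first vs second half:", ale[5] - ale[0], ale[10] - ale[5])
```

In this toy setting the PDP rises by the same amount over both halves of x1, because it multiplies each grid value by the overall mean of x2, whereas the ALE rises noticeably faster over the upper half, following the roughly quadratic shape the model produces along the observed data. This is one simple mechanism behind the dependency issue named in finding 4); it says nothing about the study's actual data or models.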

Suggested Citation

  • Hu, Songhua & Xiong, Chenfeng & Chen, Peng & Schonfeld, Paul, 2023. "Examining nonlinearity in population inflow estimation using big data: An empirical comparison of explainable machine learning models," Transportation Research Part A: Policy and Practice, Elsevier, vol. 174(C).
  • Handle: RePEc:eee:transa:v:174:y:2023:i:c:s0965856423001635
    DOI: 10.1016/j.tra.2023.103743

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0965856423001635
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.tra.2023.103743?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item.


    References listed on IDEAS

    1. Apley, Daniel W. & Zhu, Jingyu, 2020. "Visualizing the effects of predictor variables in black box supervised learning models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 82(4), pages 1059-1086, September.
    2. Xu, Yiming & Yan, Xiang & Liu, Xinyu & Zhao, Xilei, 2021. "Identifying key factors associated with ridesplitting adoption rate and modeling their nonlinear relationships," Transportation Research Part A: Policy and Practice, Elsevier, vol. 144(C), pages 170-188.
    3. Allahviranloo, Mahdieh & Aissaoui, Leila, 2019. "A comparison of time-use behavior in metropolitan areas using pattern recognition techniques," Transportation Research Part A: Policy and Practice, Elsevier, vol. 129(C), pages 271-287.
    4. Ding, Chuan & Cao, Xinyu (Jason) & Næss, Petter, 2018. "Applying gradient boosting decision trees to examine non-linear effects of the built environment on driving distance in Oslo," Transportation Research Part A: Policy and Practice, Elsevier, vol. 110(C), pages 107-117.
    5. Allahviranloo, Mahdieh & Recker, Will, 2015. "Mining activity pattern trajectories and allocating activities in the network," Transportation, Springer, vol. 42(4), pages 561-579, July.
    6. Yang, Haoran & Zhang, Qinran & Helbich, Marco & Lu, Yi & He, Dongsheng & Ettema, Dick & Chen, Long, 2022. "Examining non-linear associations between built environments around workplace and adults’ walking behaviour in Shanghai, China," Transportation Research Part A: Policy and Practice, Elsevier, vol. 155(C), pages 234-246.
    7. Wang, Shenhao & Mo, Baichuan & Hess, Stephane & Zhao, Jinhua, 2021. "Comparing hundreds of machine learning classifiers and discrete choice models in predicting travel behavior: an empirical benchmark," Papers 2102.01130, arXiv.org.
    8. Hu, Songhua & Chen, Mingyang & Jiang, Yuan & Sun, Wei & Xiong, Chenfeng, 2022. "Examining factors associated with bike-and-ride (BnR) activities around metro stations in large-scale dockless bikesharing systems," Journal of Transport Geography, Elsevier, vol. 98(C).
    9. Shao, Qifan & Zhang, Wenjia & Cao, Xinyu & Yang, Jiawen & Yin, Jie, 2020. "Threshold and moderating effects of land use on metro ridership in Shenzhen: Implications for TOD planning," Journal of Transport Geography, Elsevier, vol. 89(C).
    10. Hafezi, Mohammad Hesam & Liu, Lei & Millward, Hugh, 2019. "A time-use activity-pattern recognition model for activity-based travel demand modeling," Transportation, Springer, vol. 46(4), pages 1369-1394, August.
    11. Yang, Jiawen & Cao, Jason & Zhou, Yufei, 2021. "Elaborating non-linear associations and synergies of subway access and land uses with urban vitality in Shenzhen," Transportation Research Part A: Policy and Practice, Elsevier, vol. 144(C), pages 74-88.

    Citations

    Citations are extracted by the CitEc Project; you can subscribe to its RSS feed for this item.


    Cited by:

    1. Yuhan Zhang & Youqi Wang & Yiru Bai & Ruiyuan Zhang & Xu Liu & Xian Ma, 2023. "Prediction of Spatial Distribution of Soil Organic Carbon in Helan Farmland Based on Different Prediction Models," Land, MDPI, vol. 12(11), pages 1-15, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shao, Qifan & Zhang, Wenjia & Cao, Xinyu (Jason) & Yang, Jiawen, 2023. "Built environment interventions for emission mitigation: A machine learning analysis of travel-related CO2 in a developing city," Journal of Transport Geography, Elsevier, vol. 110(C).
    2. Yang, Hongtai & Zheng, Rong & Li, Xuan & Huo, Jinghai & Yang, Linchuan & Zhu, Tong, 2022. "Nonlinear and threshold effects of the built environment on e-scooter sharing ridership," Journal of Transport Geography, Elsevier, vol. 104(C).
    3. Tao, Tao & Cao, Jason, 2022. "Examining motivations for owning autonomous vehicles: Implications for land use and transportation," Journal of Transport Geography, Elsevier, vol. 102(C).
    4. Ding, Chuan & Cao, Xinyu & Yu, Bin & Ju, Yang, 2021. "Non-linear associations between zonal built environment attributes and transit commuting mode choice accounting for spatial heterogeneity," Transportation Research Part A: Policy and Practice, Elsevier, vol. 148(C), pages 22-35.
    5. Tao, Tao & Cao, Jason, 2023. "Exploring nonlinear and collective influences of regional and local built environment characteristics on travel distances by mode," Journal of Transport Geography, Elsevier, vol. 109(C).
    6. Cheng, Long & Huang, Jie & Jin, Tanhua & Chen, Wendong & Li, Aoyong & Witlox, Frank, 2023. "Comparison of station-based and free-floating bikeshare systems as feeder modes to the metro," Journal of Transport Geography, Elsevier, vol. 107(C).
    7. Yang, Hongtai & Luo, Peng & Li, Chaojing & Zhai, Guocong & Yeh, Anthony G.O., 2023. "Nonlinear effects of fare discounts and built environment on ridesplitting adoption rates," Transportation Research Part A: Policy and Practice, Elsevier, vol. 169(C).
    8. Gao, Kun & Yang, Ying & Gil, Jorge & Qu, Xiaobo, 2023. "Data-driven interpretation on interactive and nonlinear effects of the correlated built environment on shared mobility," Journal of Transport Geography, Elsevier, vol. 110(C).
    9. Zhang, Xiaojian & Zhao, Xilei, 2022. "Machine learning approach for spatial modeling of ridesourcing demand," Journal of Transport Geography, Elsevier, vol. 100(C).
    10. Li, Zhitao & Tang, Jinjun & Zhao, Chuyun & Gao, Fan, 2023. "Improved centrality measure based on the adapted PageRank algorithm for urban transportation multiplex networks," Chaos, Solitons & Fractals, Elsevier, vol. 167(C).
    11. Laviolette, Jérôme & Morency, Catherine & Waygood, E.O.D., 2022. "A kilometer or a mile? Does buffer size matter when it comes to car ownership?," Journal of Transport Geography, Elsevier, vol. 104(C).
    12. Su, Shiliang & Wang, Zhuolun & Li, Bozhao & Kang, Mengjun, 2022. "Deciphering the influence of TOD on metro ridership: An integrated approach of extended node-place model and interpretable machine learning with planning implications," Journal of Transport Geography, Elsevier, vol. 104(C).
    13. Yin, Chun & Cao, Jason & Sun, Bindong & Liu, Jiahang, 2023. "Exploring built environment correlates of walking for different purposes: Evidence for substitution," Journal of Transport Geography, Elsevier, vol. 106(C).
    14. Tao, Sui & Cheng, Long & He, Sylvia & Witlox, Frank, 2023. "Examining the non-linear effects of transit accessibility on daily trip duration: A focus on the low-income population," Journal of Transport Geography, Elsevier, vol. 109(C).
    15. Hamed Naseri & Edward Owen Douglas Waygood & Bobin Wang & Zachary Patterson, 2022. "Application of Machine Learning to Child Mode Choice with a Novel Technique to Optimize Hyperparameters," IJERPH, MDPI, vol. 19(24), pages 1-19, December.
    16. Lixuan Zhao & Dewei Fang & Yang Cao & Shan Sun & Liu Han & Yang Xue & Qian Zheng, 2023. "Impact-Asymmetric Analysis of Bike-Sharing Residents’ Satisfaction: A Case Study of Harbin, China," Sustainability, MDPI, vol. 15(2), pages 1-19, January.
    17. Limon Barua & Bo Zou & Yan Zhou & Yulin Liu, 2023. "Modeling household online shopping demand in the U.S.: a machine learning approach and comparative investigation between 2009 and 2017," Transportation, Springer, vol. 50(2), pages 437-476, April.
    18. Cheng, Long & Wang, Kailai & De Vos, Jonas & Huang, Jie & Witlox, Frank, 2022. "Exploring non-linear built environment effects on the integration of free-floating bike-share and urban rail transport: A quantile regression approach," Transportation Research Part A: Policy and Practice, Elsevier, vol. 162(C), pages 175-187.
    19. Liu, Jixiang & Wang, Bo & Xiao, Longzhu, 2021. "Non-linear associations between built environment and active travel for working and shopping: An extreme gradient boosting approach," Journal of Transport Geography, Elsevier, vol. 92(C).
    20. Du, Qiang & Zhou, Yuqing & Huang, Youdan & Wang, Yalei & Bai, Libiao, 2022. "Spatiotemporal exploration of the non-linear impacts of accessibility on metro ridership," Journal of Transport Geography, Elsevier, vol. 102(C).

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:transa:v:174:y:2023:i:c:s0965856423001635. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu. General contact details of provider: http://www.elsevier.com/wps/find/journaldescription.cws_home/547/description#description.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.