Printed from https://ideas.repec.org/a/spr/pubtra/v15y2023i2d10.1007_s12469-022-00309-0.html

A supervised machine learning model for imputing missing boarding stops in smart card data

Author

Listed:
  • Nadav Shalit

    (Ben-Gurion University of the Negev)

  • Michael Fire

    (Ben-Gurion University of the Negev)

  • Eran Ben-Elia

    (Ben-Gurion University of the Negev)

Abstract

Public transport has become an essential part of urban life as population densities and environmental awareness have increased. Large quantities of data are now generated, allowing for more robust methods to understand travel behavior by harvesting smart card usage. However, public transport datasets suffer from data integrity problems; boarding stop information may be missing due to imperfect acquisition processes or inadequate reporting. This study introduces a supervised machine learning method to impute missing boarding stops based on ordinal classification using GTFS timetable, smart card, and geospatial datasets. A new metric, Pareto Accuracy, is suggested to evaluate algorithms where classes have an ordinal nature. The results are based on a case study in the city of Beer Sheva, Israel, consisting of one month of smart card data. We show that our proposed method is robust to irregular travelers and significantly outperforms well-known imputation methods without the need to mine any additional datasets. Validation on data from another Israeli city, using transfer learning, shows that the presented model is general and context-free. The implications for transportation planning and travel behavior research are further discussed.
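
The abstract does not define Pareto Accuracy, so the following is only a minimal sketch of the general idea of scoring ordinal predictions with a tolerance, not the authors' formula. It assumes boarding stops are encoded as integer positions along a line, and it counts a prediction as correct when it lands within `limit` positions of the true stop. The function name `tolerance_accuracy`, the `limit` parameter, and the sample values are all hypothetical.

```python
from typing import Sequence


def tolerance_accuracy(
    y_true: Sequence[int], y_pred: Sequence[int], limit: int = 0
) -> float:
    """Share of predictions whose ordinal distance from the truth is <= limit.

    With limit=0 this reduces to plain accuracy. This is an illustrative
    metric, not the paper's definition of Pareto Accuracy.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")
    if limit < 0:
        raise ValueError("limit must be non-negative")
    hits = sum(abs(t - p) <= limit for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Hypothetical stop positions along a single bus line (not real data).
true_stops = [3, 7, 1, 12, 5]
pred_stops = [3, 8, 4, 12, 4]

print(tolerance_accuracy(true_stops, pred_stops, limit=0))  # exact matches only
print(tolerance_accuracy(true_stops, pred_stops, limit=1))  # off-by-one allowed
```

A metric of this shape rewards a model for predicting a neighbouring stop rather than a distant one, which plain accuracy cannot distinguish.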

Suggested Citation

  • Nadav Shalit & Michael Fire & Eran Ben-Elia, 2023. "A supervised machine learning model for imputing missing boarding stops in smart card data," Public Transport, Springer, vol. 15(2), pages 287-319, June.
  • Handle: RePEc:spr:pubtra:v:15:y:2023:i:2:d:10.1007_s12469-022-00309-0
    DOI: 10.1007/s12469-022-00309-0

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s12469-022-00309-0
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s12469-022-00309-0?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Liping Ge & Malek Sarhani & Stefan Voß & Lin Xie, 2021. "Review of Transit Data Sources: Potentials, Challenges and Complementarity," Sustainability, MDPI, vol. 13(20), pages 1-37, October.
    2. Kevin Credit & Zander Arnao, 2023. "A method to derive small area estimates of linked commuting trips by mode from open source LODES and ACS data," Environment and Planning B, vol. 50(3), pages 709-722, March.
    3. Kandt, Jens & Leak, Alistair, 2019. "Examining inclusive mobility through smartcard data: What shall we make of senior citizens' declining bus patronage in the West Midlands?," Journal of Transport Geography, Elsevier, vol. 79(C), pages 1-1.
    4. Chen, Wendong & Cheng, Long & Chen, Xuewu & Chen, Jingxu & Cao, Mengqiu, 2021. "Measuring accessibility to health care services for older bus passengers: A finer spatial resolution," Journal of Transport Geography, Elsevier, vol. 93(C).
    5. Bantis, Thanos & Haworth, James, 2020. "Assessing transport related social exclusion using a capabilities approach to accessibility framework: A dynamic Bayesian network approach," Journal of Transport Geography, Elsevier, vol. 84(C).
    6. Zhou, Yang & Thill, Jean-Claude & Xu, Yang & Fang, Zhixiang, 2021. "Variability in individual home-work activity patterns," Journal of Transport Geography, Elsevier, vol. 90(C).
    7. Egu, Oscar & Bonnel, Patrick, 2020. "How comparable are origin-destination matrices estimated from automatic fare collection, origin-destination surveys and household travel survey? An empirical investigation in Lyon," Transportation Research Part A: Policy and Practice, Elsevier, vol. 138(C), pages 267-282.
    8. Amaya, Margarita & Cruzat, Ramón & Munizaga, Marcela A., 2018. "Estimating the residence zone of frequent public transport users to make travel pattern and time use analysis," Journal of Transport Geography, Elsevier, vol. 66(C), pages 330-339.
    9. Zijia Wang & Hao Tang & Wenjuan Wang & Yang Xi, 2020. "The Pattern of Non-Roundtrip Travel on Urban Rail and Its Application in Transit Improvement," Sustainability, MDPI, vol. 12(9), pages 1-16, April.
    10. Fangye Du & Jiaoe Wang & Yu Liu & Zihao Zhou & Haitao Jin, 2022. "Equity in Health-Seeking Behavior of Groups Using Different Transportations," IJERPH, MDPI, vol. 19(5), pages 1-16, February.
    11. Yu, Chang & He, Zhao-Cheng, 2017. "Analysing the spatial-temporal characteristics of bus travel demand using the heat map," Journal of Transport Geography, Elsevier, vol. 58(C), pages 247-255.
    12. Cong Liao & Teqi Dai, 2022. "Is “Attending Nearby School” Near? An Analysis of Travel-to-School Distances of Primary Students in Beijing Using Smart Card Data," Sustainability, MDPI, vol. 14(7), pages 1-12, April.
    13. Pieroni, Caio & Giannotti, Mariana & Alves, Bianca B. & Arbex, Renato, 2021. "Big data for big issues: Revealing travel patterns of low-income population based on smart card data mining in a global south unequal city," Journal of Transport Geography, Elsevier, vol. 96(C).
    14. Wang, Yihong & Correia, Gonçalo Homem de Almeida & de Romph, Erik & Timmermans, H.J.P., 2017. "Using metro smart card data to model location choice of after-work activities: An application to Shanghai," Journal of Transport Geography, Elsevier, vol. 63(C), pages 40-47.
    15. Masood Jafari Kang & Shervin Ataeian & S. M. Mahdi Amiripour, 2021. "A procedure for public transit OD matrix generation using smart card transaction data," Public Transport, Springer, vol. 13(1), pages 81-100, March.
    16. Benito Zaragozí & Sergio Trilles & Aaron Gutiérrez & Daniel Miravet, 2021. "Development of a Common Framework for Analysing Public Transport Smart Card Data," Energies, MDPI, vol. 14(19), pages 1-22, September.
    17. Weng, JianCheng & Yu, JiangBo & Di, XiaoJian & Lin, PengFei & Wang, Jing-Jing & Mao, Li-Zeng, 2023. "How does the state of bus operations influence passengers’ service satisfaction? A method considering the differences in passenger preferences," Transportation Research Part A: Policy and Practice, Elsevier, vol. 174(C).
    18. Xia Zhao & Mengying Cui & David Levinson, 2023. "Exploring temporal variability in travel patterns on public transit using big smart card data," Environment and Planning B, vol. 50(1), pages 198-217, January.
    19. Anwar, Muhammad Azfar & Dhir, Amandeep & Jabeen, Fauzia & Zhang, Qingyu & Siddiquei, Ahmad Nabeel, 2023. "Unconventional green transport innovations in the post-COVID-19 era. A trade-off between green actions and personal health protection," Journal of Business Research, Elsevier, vol. 155(PA).
    20. Kuo, Yong-Hong & Leung, Janny M.Y. & Yan, Yimo, 2023. "Public transport for smart cities: Recent innovations and future challenges," European Journal of Operational Research, Elsevier, vol. 306(3), pages 1001-1026.
