IDEAS home Printed from https://ideas.repec.org/a/plo/pone00/0244317.html

Applying machine learning and geolocation techniques to social media data (Twitter) to develop a resource for urban planning

Author

Listed:
  • Sveta Milusheva
  • Robert Marty
  • Guadalupe Bedoya
  • Sarah Williams
  • Elizabeth Resor
  • Arianna Legovini

Abstract

With all the recent attention focused on big data, it is easy to overlook that basic vital statistics remain difficult to obtain in most of the world. What makes this frustrating is that private companies hold potentially useful data, but it is not accessible to the people who can use it to track poverty, reduce disease, or build urban infrastructure. This project set out to test whether we can transform an openly available dataset (Twitter) into a resource for urban planning and development. We test our hypothesis by creating road traffic crash location data, which is scarce in most resource-poor environments but essential for addressing the number one cause of mortality for children over five and young adults. The research project scraped 874,588 traffic-related tweets in Nairobi, Kenya, applied a machine learning model to capture the occurrence of a crash, and developed an improved geoparsing algorithm to identify its location. We geolocate 32,991 crash reports in Twitter for 2012–2020 and cluster them into 22,872 unique crashes during this period. For a subset of crashes reported on Twitter, a motorcycle delivery service was dispatched in real time to verify the crash and its location; the results show 92% accuracy. To our knowledge, this is the first geolocated dataset of crashes for the city, and it allowed us to produce the first crash map for Nairobi. Using a spatial clustering algorithm, we are able to locate portions of the road network (
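
The abstract mentions clustering geolocated crash reports into unique crashes. The following is a minimal sketch of one way such a step could work; it is not the authors' algorithm. It assumes, purely for illustration, that two reports describe the same crash when they fall within 0.5 km and 4 hours of each other. The `Report` type, the function names, the thresholds, and the sample coordinates are all hypothetical.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Report:
    """A geolocated crash report (hypothetical structure)."""
    lat: float   # latitude in degrees
    lon: float   # longitude in degrees
    hour: float  # hours since an arbitrary reference time


def haversine_km(a: Report, b: Report) -> float:
    """Great-circle distance between two reports in kilometres."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))


def cluster_reports(reports, max_km=0.5, max_hours=4.0):
    """Group reports into crashes by single-linkage space-time proximity.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    parent = list(range(len(reports)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(reports)):
        for j in range(i + 1, len(reports)):
            close_in_time = abs(reports[i].hour - reports[j].hour) <= max_hours
            if close_in_time and haversine_km(reports[i], reports[j]) <= max_km:
                parent[find(i)] = find(j)  # merge the two groups

    clusters = {}
    for i, report in enumerate(reports):
        clusters.setdefault(find(i), []).append(report)
    return list(clusters.values())


# Made-up sample reports near central Nairobi.
reports = [
    Report(-1.2864, 36.8172, 8.0),
    Report(-1.2866, 36.8175, 8.5),   # tens of metres away, 30 minutes later
    Report(-1.3000, 36.7800, 8.2),   # several kilometres away
    Report(-1.2864, 36.8172, 30.0),  # same place, but 22 hours later
]
crashes = cluster_reports(reports)
```

With these sample inputs, the first two reports merge into one crash and the other two stay separate. Single-linkage grouping can chain distant reports together through intermediate ones, so a production approach would likely need a stricter rule.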

Suggested Citation

  • Sveta Milusheva & Robert Marty & Guadalupe Bedoya & Sarah Williams & Elizabeth Resor & Arianna Legovini, 2021. "Applying machine learning and geolocation techniques to social media data (Twitter) to develop a resource for urban planning," PLOS ONE, Public Library of Science, vol. 16(2), pages 1-12, February.
  • Handle: RePEc:plo:pone00:0244317
    DOI: 10.1371/journal.pone.0244317

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0244317
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0244317&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0244317?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Serajuddin, Umar & Uematsu, Hiroki & Wieser, Christina & Yoshida, Nobuo & Dabalen, Andrew L., 2015. "Data deprivation: another deprivation to end," Policy Research Working Paper Series 7252, The World Bank.
    2. Bernd Resch & Anja Summa & Peter Zeile & Michael Strube, 2016. "Citizen-Centric Urban Planning through Extracting Emotion Information from Twitter in an Interdisciplinary Space-Time-Linguistics Algorithm," Urban Planning, Cogitatio Press, vol. 1(2), pages 114-127.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Pilvi Nummi, 2018. "Crowdsourcing Local Knowledge with PPGIS and Social Media for Urban Planning to Reveal Intangible Cultural Heritage," Urban Planning, Cogitatio Press, vol. 3(1), pages 100-115.
    2. Dang, Hai-Anh H. & Serajuddin, Umar, 2020. "Tracking the sustainable development goals: Emerging measurement challenges and further reflections," World Development, Elsevier, vol. 127(C).
    3. Yong Gao & Yuanyuan Chen & Lan Mu & Shize Gong & Pengcheng Zhang & Yu Liu, 2022. "Measuring urban sentiments from social media data: a dual-polarity metric approach," Journal of Geographical Systems, Springer, vol. 24(2), pages 199-221, April.
    4. Dang, Hai-Anh H. & Kilic, Talip & Carletto, Calogero & Abanokova, Kseniya, 2021. "Poverty Imputation in Contexts without Consumption Data: A Revisit with Further Refinements," Policy Research Working Paper Series 9838, The World Bank.
    5. Dang, Hai-Anh & Carletto, Calogero, 2022. "Recall Bias Revisited: Measure Farm Labor Using Mixed-Mode Surveys and Multiple Imputation," IZA Discussion Papers 14997, Institute of Labor Economics (IZA).
    6. Hai‐Anh Dang & Dean Jolliffe & Calogero Carletto, 2019. "Data Gaps, Data Incomparability, And Data Imputation: A Review Of Poverty Measurement Methods For Data‐Scarce Environments," Journal of Economic Surveys, Wiley Blackwell, vol. 33(3), pages 757-797, July.
    7. Emily Aiken & Suzanne Bellue & Dean Karlan & Christopher R. Udry & Joshua Blumenstock, 2021. "Machine Learning and Mobile Phone Data Can Improve the Targeting of Humanitarian Assistance," NBER Working Papers 29070, National Bureau of Economic Research, Inc.
    8. Ruixue Liu & Jing Xiao, 2020. "Factors Affecting Users’ Satisfaction with Urban Parks through Online Comments Data: Evidence from Shenzhen, China," IJERPH, MDPI, vol. 18(1), pages 1-22, December.
    9. Paul Makdissi & Walid Marrouch & Myra Yazbeck, 2022. "Monitoring Poverty in a Data Deprived Environment: The Case of Lebanon," Working Papers 2022-014, Human Capital and Economic Opportunity Working Group.
    10. Guanghua Chi & Han Fang & Sourav Chatterjee & Joshua E. Blumenstock, 2022. "Microestimates of wealth for all low- and middle-income countries," Proceedings of the National Academy of Sciences, Proceedings of the National Academy of Sciences, vol. 119(3), pages 2113658119-, January.
    11. Kilic, Talip & Serajuddin, Umar & Uematsu, Hiroki & Yoshida, Nobuo, 2017. "Costing household surveys for monitoring progress toward ending extreme poverty and boosting shared prosperity," Policy Research Working Paper Series 7951, The World Bank.
    12. Arena, Marika & Azzone, Giovanni & Dell’Agostino, Laura & Scotti, Francesco, 2022. "Precision policies and local content targets in resource-rich developing countries: The case of the oil and gas sector in Mozambique," Resources Policy, Elsevier, vol. 76(C).
    13. Cuesta, Jose & Chagalj, Cristian, 2019. "Measuring poverty with administrative data in data deprived contexts: The case of Nicaragua," Economics Letters, Elsevier, vol. 183(C), pages 1-1.
    14. Francis Rathinam & Sayak Khatua & Zeba Siddiqui & Manya Malik & Pallavi Duggal & Samantha Watson & Xavier Vollenweider, 2021. "Using big data for evaluating development outcomes: A systematic map," Campbell Systematic Reviews, John Wiley & Sons, vol. 17(3), September.
    15. Espen Beer Prydz & Dean Jolliffe & Umar Serajuddin, 2022. "Disparities in Assessments of Living Standards Using National Accounts and Household Surveys," Review of Income and Wealth, International Association for Research in Income and Wealth, vol. 68(S2), pages 385-420, December.
    16. Zezza, Alberto & Carletto, Gero & Fiedler, John L & Gennari, Pietro & Jolliffe, Dean M, 2017. "Food Counts. Measuring Food Consumption And Expenditures In Household Consumption And Expenditure Surveys (HCES)," 2017 International Congress, August 28-September 1, 2017, Parma, Italy 260886, European Association of Agricultural Economists.
    17. Bernd Resch & Inga Puetz & Matthias Bluemke & Kalliopi Kyriakou & Jakob Miksch, 2020. "An Interdisciplinary Mixed-Methods Approach to Analyzing Urban Spaces: The Case of Urban Walkability and Bikeability," IJERPH, MDPI, vol. 17(19), pages 1-20, September.
    18. Lahoti Rahul & Jayadev Arjun & Reddy Sanjay, 2016. "The Global Consumption and Income Project (GCIP): An Overview," Journal of Globalization and Development, De Gruyter, vol. 7(1), pages 61-108, June.
    19. Francisco H. G. Ferreira & Shaohua Chen & Andrew Dabalen & Yuri Dikhanov & Nada Hamadeh & Dean Jolliffe & Ambar Narayan & Espen Beer Prydz & Ana Revenga & Prem Sangraula & Umar Serajuddin & Nobuo Yosh, 2016. "A global count of the extreme poor in 2012: data issues, methodology and initial results," The Journal of Economic Inequality, Springer;Society for the Study of Economic Inequality, vol. 14(2), pages 141-172, June.
    20. Raquel Pérez‐delHoyo & Higinio Mora & José Manuel Nolasco‐Vidal & Rubén Abad‐Ortiz & Rafael A. Mollá‐Sirvent, 2021. "Addressing new challenges in smart urban planning using Information and Communication Technologies," Systems Research and Behavioral Science, Wiley Blackwell, vol. 38(3), pages 342-354, May.
