
Enhanced geocoding precision for location inference of tweet text using spaCy, Nominatim and Google Maps. A comparative analysis of the influence of data selection

Author

Listed:
  • Helen Ngonidzashe Serere
  • Bernd Resch
  • Clemens Rudolf Havas

Abstract

Twitter location inference methods aim to increase the share of geotagged tweets by inferring locations for non-geotagged tweets. To validate a proposed approach, such methods are developed on a fully geotagged dataset, with the attached Global Navigation Satellite System (GNSS) coordinates serving as ground truth. Although a substantial number of location inference methods have been developed to date, questions remain about how well the resulting models generalize to non-geotagged datasets. This paper proposes a high-precision location inference method that infers a tweet’s point of origin from location mentions within the tweet text. We investigate the influence of data selection by comparing model performance on two datasets. The first dataset is a proportionate sample of the tweet sources of a geotagged dataset. The second follows a modelled distribution of tweet sources matching that of a non-geotagged dataset. Our results show that the distribution of tweet sources influences the performance of location inference models. On the first dataset, we outperformed state-of-the-art location extraction models, inferring 61.9%, 86.1% and 92.1% of the extracted locations within radii of 1 km, 10 km and 50 km, respectively. On the second dataset, however, precision dropped to 45.3%, 73.1% and 81.0% for the same radii.
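
The precision figures in the abstract are shares of inferred locations that fall within a given radius of the ground-truth coordinates. The following is a minimal sketch of how such a metric could be computed, not the authors' implementation. It assumes great-circle (haversine) distance, and the names `haversine_km`, `precision_within` and `EXAMPLE_PAIRS`, as well as the coordinates, are hypothetical and chosen only for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def precision_within(pairs, radii_km=(1, 10, 50)):
    """Share of inferred points lying within each radius of their ground truth.

    `pairs` is a sequence of ((inferred_lat, inferred_lon), (true_lat, true_lon)).
    """
    distances = [haversine_km(*inferred, *truth) for inferred, truth in pairs]
    if not distances:
        raise ValueError("at least one (inferred, truth) pair is required")
    return {r: sum(d <= r for d in distances) / len(distances) for r in radii_km}


# Hypothetical (inferred, ground truth) coordinate pairs, for illustration only.
EXAMPLE_PAIRS = [
    ((47.8095, 13.0550), (47.8095, 13.0550)),  # identical points
    ((47.8095, 13.0550), (47.8095, 13.1000)),  # a few kilometres apart
    ((47.8095, 13.0550), (48.0595, 13.0550)),  # a few tens of kilometres apart
    ((48.2082, 16.3738), (47.8095, 13.0550)),  # a few hundred kilometres apart
]

if __name__ == "__main__":
    for radius, share in precision_within(EXAMPLE_PAIRS).items():
        print(f"within {radius} km: {share:.1%}")
```

Because the thresholds are cumulative, the share can only grow with the radius, which matches the pattern of the reported values (for example 61.9%, 86.1% and 92.1%).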

Suggested Citation

  • Helen Ngonidzashe Serere & Bernd Resch & Clemens Rudolf Havas, 2023. "Enhanced geocoding precision for location inference of tweet text using spaCy, Nominatim and Google Maps. A comparative analysis of the influence of data selection," PLOS ONE, Public Library of Science, vol. 18(3), pages 1-19, March.
  • Handle: RePEc:plo:pone00:0282942
    DOI: 10.1371/journal.pone.0282942

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0282942
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0282942&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Helen Roberts & Bernd Resch & Jon Sadler & Lee Chapman & Andreas Petutschnig & Stefan Zimmer, 2018. "Investigating the Emotional Responses of Individuals to Urban Green Space Using Twitter Data: A Critical Comparison of Three Different Methods of Sentiment Analysis," Urban Planning, Cogitatio Press, vol. 3(1), pages 21-33.
    2. Anna Kovacs-Gyori & Alina Ristea & Clemens Havas & Bernd Resch & Pablo Cabrera-Barona, 2018. "#London2012: Towards Citizen-Contributed Urban Planning Through Sentiment Analysis of Twitter Data," Urban Planning, Cogitatio Press, vol. 3(1), pages 75-99.
    3. Pilvi Nummi, 2018. "Crowdsourcing Local Knowledge with PPGIS and Social Media for Urban Planning to Reveal Intangible Cultural Heritage," Urban Planning, Cogitatio Press, vol. 3(1), pages 100-115.
    4. Yong Gao & Yuanyuan Chen & Lan Mu & Shize Gong & Pengcheng Zhang & Yu Liu, 2022. "Measuring urban sentiments from social media data: a dual-polarity metric approach," Journal of Geographical Systems, Springer, vol. 24(2), pages 199-221, April.
    5. Ruixue Liu & Jing Xiao, 2020. "Factors Affecting Users’ Satisfaction with Urban Parks through Online Comments Data: Evidence from Shenzhen, China," IJERPH, MDPI, vol. 18(1), pages 1-22, December.
    6. Guizhe Song & Degen Huang, 2021. "A Sentiment-Aware Contextual Model for Real-Time Disaster Prediction Using Twitter Data," Future Internet, MDPI, vol. 13(7), pages 1-15, June.
    7. Abhinav Kumar & Jyoti Prakash Singh & Yogesh K. Dwivedi & Nripendra P. Rana, 2022. "A deep multi-modal neural network for informative Twitter content classification during emergencies," Annals of Operations Research, Springer, vol. 319(1), pages 791-822, December.
    8. Bernd Resch & Inga Puetz & Matthias Bluemke & Kalliopi Kyriakou & Jakob Miksch, 2020. "An Interdisciplinary Mixed-Methods Approach to Analyzing Urban Spaces: The Case of Urban Walkability and Bikeability," IJERPH, MDPI, vol. 17(19), pages 1-20, September.
    9. Mihalis Giannakis & Rameshwar Dubey & Shishi Yan & Konstantina Spanaki & Thanos Papadopoulos, 2022. "Social media and sensemaking patterns in new product development: demystifying the customer sentiment," Annals of Operations Research, Springer, vol. 308(1), pages 145-175, January.
    10. Duan, Huijue Kelly & Vasarhelyi, Miklos A. & Codesso, Mauricio & Alzamil, Zamil, 2023. "Enhancing the government accounting information systems using social media information: An application of text mining and machine learning," International Journal of Accounting Information Systems, Elsevier, vol. 48(C).
    11. Raquel Pérez‐delHoyo & Higinio Mora & José Manuel Nolasco‐Vidal & Rubén Abad‐Ortiz & Rafael A. Mollá‐Sirvent, 2021. "Addressing new challenges in smart urban planning using Information and Communication Technologies," Systems Research and Behavioral Science, Wiley Blackwell, vol. 38(3), pages 342-354, May.
    12. Abhinav Kumar & Jyoti Prakash Singh & Nripendra P. Rana & Yogesh K. Dwivedi, 2023. "Multi-Channel Convolutional Neural Network for the Identification of Eyewitness Tweets of Disaster," Information Systems Frontiers, Springer, vol. 25(4), pages 1589-1604, August.
    13. Zha, Wenbin & Ye, Qian & Li, Jian & Ozbay, Kaan, 2023. "A social media Data-Driven analysis for transport policy response to the COVID-19 pandemic outbreak in Wuhan, China," Transportation Research Part A: Policy and Practice, Elsevier, vol. 172(C).
    14. Prabhsimran Singh & Surleen Kaur & Abdullah M. Baabdullah & Yogesh K. Dwivedi & Sandeep Sharma & Ravinder Singh Sawhney & Ronnie Das, 2023. "Is #SDG13 Trending Online? Insights from Climate Change Discussions on Twitter," Information Systems Frontiers, Springer, vol. 25(1), pages 199-219, February.
    15. Jamal Al Qundus & Kosai Dabbour & Shivam Gupta & Régis Meissonier & Adrian Paschke, 2022. "Wireless sensor network for AI-based flood disaster detection," Annals of Operations Research, Springer, vol. 319(1), pages 697-719, December.
    16. Li, Xinwei & Xu, Mao & Zeng, Wenjuan & Tse, Ying Kei & Chan, Hing Kai, 2023. "Exploring customer concerns on service quality under the COVID-19 crisis: A social media analytics study from the retail industry," Journal of Retailing and Consumer Services, Elsevier, vol. 70(C).
    17. Higinio Mora & Raquel Pérez-delHoyo & José F. Paredes-Pérez & Rafael A. Mollá-Sirvent, 2018. "Analysis of Social Networking Service Data for Smart Urban Planning," Sustainability, MDPI, vol. 10(12), pages 1-19, December.
    18. Fernando Santa & Roberto Henriques & Joaquín Torres-Sospedra & Edzer Pebesma, 2019. "A Statistical Approach for Studying the Spatio-Temporal Distribution of Geolocated Tweets in Urban Environments," Sustainability, MDPI, vol. 11(3), pages 1-29, January.
    19. Choi, Tsan-Ming, 2020. "Innovative “Bring-Service-Near-Your-Home” operations under Corona-Virus (COVID-19/SARS-CoV-2) outbreak: Can logistics become the Messiah?," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 140(C).
    20. Serge Nyawa & Dieudonné Tchuente & Samuel Fosso-Wamba, 2024. "COVID-19 vaccine hesitancy: a social media analysis using deep learning," Annals of Operations Research, Springer, vol. 339(1), pages 477-515, August.
