
Abundant Topological Outliers in Social Media Data and Their Effect on Spatial Analysis

Author

Listed:
  • Rene Westerholt
  • Enrico Steiger
  • Bernd Resch
  • Alexander Zipf

Abstract

Twitter and related social media feeds have become valuable data sources for many fields of research. Because many posts contain explicit geographic locations, numerous researchers have used them for spatial analysis. However, despite their widespread use in applied research, a thorough understanding of the underlying spatial characteristics of these data is still lacking. In this paper, we investigate how topological outliers influence the outcomes of spatial analyses of social media data. These outliers appear when different users contribute heterogeneous information about different phenomena simultaneously from similar locations. As a consequence, messages representing different spatial phenomena are captured close to each other and risk being falsely related in a spatial analysis. Our results indicate corresponding spurious effects when analyzing Twitter data. Further, we show how the outliers distort the range of outcomes of spatial analysis methods. This significantly affects the power of spatial inferential techniques and, more generally, the validity and interpretability of spatial analysis results. We further investigate in detail how the issues caused by topological outliers are composed. We find that multiple disturbing effects act simultaneously and that these are related to the geographic scales of the overlapping patterns involved. Our results show that at some scale configurations, the disturbances added through overlap are more severe than at others. Further, their behavior turns into a volatile and almost chaotic fluctuation when the scales of the involved patterns become too different. Overall, our results highlight the critical importance of thoroughly considering the specific characteristics of social media data when analyzing them spatially.
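
The overlap effect described in the abstract can be illustrated with a small toy computation. The sketch below is not the authors' method or data: it assumes a synthetic 10 x 10 grid, uses global Moran's I with binary distance-band weights as one possible spatial autocorrelation statistic, and invents two phenomena (`val_a`, a smooth trend, and `val_b`, the reversed trend) observed at nearly identical locations. All names, the offset of 0.1, and the radius of 1.0 are hypothetical choices made for illustration.

```python
import math


def morans_i(points, values, radius):
    """Global Moran's I with binary distance-band weights.

    Two distinct observations are neighbors (w_ij = 1) when their
    Euclidean distance is at most `radius`; otherwise w_ij = 0.
    """
    n = len(points)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0   # sum of w_ij * dev_i * dev_j over ordered pairs
    s0 = 0        # sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            # Small tolerance guards against floating-point rounding.
            if math.hypot(dx, dy) <= radius + 1e-9:
                cross += dev[i] * dev[j]
                s0 += 1
    return (n / s0) * (cross / sum(d * d for d in dev))


grid = [(x, y) for x in range(10) for y in range(10)]

# Phenomenon A (assumed): a smooth spatial trend on the grid.
pts_a = [(float(x), float(y)) for x, y in grid]
val_a = [float(x + y) for x, y in grid]

# Phenomenon B (assumed): an unrelated, reversed trend reported from
# almost the same locations (shifted by 0.1 in x).
pts_b = [(x + 0.1, float(y)) for x, y in grid]
val_b = [float(18 - (x + y)) for x, y in grid]

i_a = morans_i(pts_a, val_a, radius=1.0)
i_mixed = morans_i(pts_a + pts_b, val_a + val_b, radius=1.0)

print(f"Moran's I, phenomenon A alone: {i_a:.3f}")
print(f"Moran's I, A and B mixed:      {i_mixed:.3f}")
```

In this toy setting, the statistic for the mixed point set is markedly lower than for phenomenon A alone, because nearby observations of the two phenomena are treated as neighbors even though they are unrelated. This only hints at the kind of distortion the abstract describes; it says nothing about the magnitudes found in real Twitter data.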

Suggested Citation

  • Rene Westerholt & Enrico Steiger & Bernd Resch & Alexander Zipf, 2016. "Abundant Topological Outliers in Social Media Data and Their Effect on Spatial Analysis," PLOS ONE, Public Library of Science, vol. 11(9), pages 1-31, September.
  • Handle: RePEc:plo:pone00:0162360
    DOI: 10.1371/journal.pone.0162360

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0162360
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0162360&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0162360?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Maxime Lenormand & Antònia Tugores & Pere Colet & José J Ramasco, 2014. "Tweets on the Road," PLOS ONE, Public Library of Science, vol. 9(8), pages 1-12, August.
    2. Manfred M. Fischer & Arthur Getis (ed.), 2010. "Handbook of Applied Spatial Analysis," Springer Books, Springer, number 978-3-642-03647-7, November.
    3. Stefanie Haustein & Timothy D. Bowman & Kim Holmberg & Andrew Tsou & Cassidy R. Sugimoto & Vincent Larivière, 2016. "Tweets as impact indicators: Examining the implications of automated “bot” accounts on Twitter," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 67(1), pages 232-238, January.
    4. Min Xu & Chang-Lin Mei & Na Yan, 2014. "A note on the null distribution of the local spatial heteroscedasticity (LOSH) statistic," The Annals of Regional Science, Springer;Western Regional Science Association, vol. 52(3), pages 697-710, May.
    5. Daniel Griffith, 2006. "Hidden negative spatial autocorrelation," Journal of Geographical Systems, Springer, vol. 8(4), pages 335-355, October.
    6. M Tiefelsdorf & D A Griffith & B Boots, 1999. "A Variance-Stabilizing Coding Scheme for Spatial Link Matrices," Environment and Planning A, vol. 31(1), pages 165-180, January.
    7. J. Ord & Arthur Getis, 2012. "Local spatial heteroscedasticity (LOSH)," The Annals of Regional Science, Springer;Western Regional Science Association, vol. 48(2), pages 529-539, April.
    8. J. Keith Ord & Arthur Getis, 2001. "Testing for Local Spatial Autocorrelation in the Presence of Global Autocorrelation," Journal of Regional Science, Wiley Blackwell, vol. 41(3), pages 411-432, August.

    Citations


    Cited by:

    1. Yanguang Chen, 2020. "New framework of Getis-Ord’s indexes associating spatial autocorrelation with interaction," PLOS ONE, Public Library of Science, vol. 15(7), pages 1-25, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Roger S. Bivand & David W. S. Wong, 2018. "Comparing implementations of global and local indicators of spatial association," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 27(3), pages 716-748, September.
    2. Yongwan Chun, 2008. "Modeling network autocorrelation within migration flows by eigenvector spatial filtering," Journal of Geographical Systems, Springer, vol. 10(4), pages 317-344, December.
    3. Daisuke Murakami & Daniel Griffith, 2015. "Random effects specifications in eigenvector spatial filtering: a simulation study," Journal of Geographical Systems, Springer, vol. 17(4), pages 311-331, October.
    4. Álvarez, Inmaculada C. & Gude, Alberto & Orea, Luis, 2019. "Effects of inter-industry and spatial spillovers on regional productivity: Evidence from Spanish panel data," Efficiency Series Papers 2019/01, University of Oviedo, Department of Economics, Oviedo Efficiency Group (OEG).
    5. Bivand, Roger & Müller, Werner G. & Reder, Markus, 2009. "Power calculations for global and local Moran's I," Computational Statistics & Data Analysis, Elsevier, vol. 53(8), pages 2859-2872, June.
    6. Valeria M. Toledo‐Gallegos & Jed Long & Danny Campbell & Tobias Börger & Nick Hanley, 2021. "Spatial clustering of willingness to pay for ecosystem services," Journal of Agricultural Economics, Wiley Blackwell, vol. 72(3), pages 673-697, September.
    7. Sergio Rey, 2014. "Rank-based Markov chains for regional income distribution dynamics," Journal of Geographical Systems, Springer, vol. 16(2), pages 115-137, April.
    8. Chocholatá Michaela & Furková Andrea, 2017. "Regional Disparities in Education Attainment Level in the European Union: A Spatial Approach," TalTech Journal of European Studies, Sciendo, vol. 7(2), pages 107-131, October.
    9. Motoyama, Yasuyuki & Cao, Cong & Appelbaum, Richard, 2014. "Observing regional divergence of Chinese nanotechnology centers," Technological Forecasting and Social Change, Elsevier, vol. 81(C), pages 11-21.
    10. Rey, Sergio, 2015. "Bells in Space: The Spatial Dynamics of US Interpersonal and Interregional Income Inequality," MPRA Paper 69482, University Library of Munich, Germany.
    11. Chih-Hao Wang & Na Chen, 2021. "A multi-objective optimization approach to balancing economic efficiency and equity in accessibility to multi-use paths," Transportation, Springer, vol. 48(4), pages 1967-1986, August.
    12. Padovano, Fabio & Petrarca, Ilaria, 2014. "Are the responsibility and yardstick competition hypotheses mutually consistent?," European Journal of Political Economy, Elsevier, vol. 34(C), pages 459-477.
    13. Atems, Bebonchu, 2013. "The spatial dynamics of growth and inequality: Evidence using U.S. county-level data," Economics Letters, Elsevier, vol. 118(1), pages 19-22.
    14. Giuseppe Espa & Giuseppe Arbia & Diego Giuliani, 2013. "Conditional versus unconditional industrial agglomeration: disentangling spatial dependence and spatial heterogeneity in the analysis of ICT firms’ distribution in Milan," Journal of Geographical Systems, Springer, vol. 15(1), pages 31-50, January.
    15. Manfred M. Fischer & Nico Pintar & Benedikt Sargant, 2016. "Austrian Outbound Foreign Direct Investment in Europe: A spatial econometric study," Romanian Journal of Regional Science, Romanian Regional Science Association, vol. 10(1), pages 1-22, June.
    16. Manfred Fischer, 2011. "A spatial Mankiw–Romer–Weil model: theory and evidence," The Annals of Regional Science, Springer;Western Regional Science Association, vol. 47(2), pages 419-436, October.
    17. Kaixing Huang & Wenshou Yan & Jikun Huang, 2020. "Agricultural subsidies retard urbanisation in China," Australian Journal of Agricultural and Resource Economics, Australian Agricultural and Resource Economics Society, vol. 64(4), pages 1308-1327, October.
    18. Samantha Leorato & Maura Mezzetti, 2015. "Spatial Panel Data Model with error dependence: a Bayesian Separable Covariance Approach," CEIS Research Paper 338, Tor Vergata University, CEIS, revised 09 Apr 2015.
    19. Mohl, Philipp & Hagen, Tobias, 2011. "Do EU structural funds promote regional employment? Evidence from dynamic panel data models," Working Paper Series 1403, European Central Bank.
    20. Daniel A. Griffith & Yongwan Chun & Jan Hauke, 2022. "A Moran eigenvector spatial filtering specification of entropy measures," Papers in Regional Science, Wiley Blackwell, vol. 101(1), pages 259-279, February.
