
Non-parametric bootstrap and small area estimation to mitigate bias in crowdsourced data. Simulation study and application to perceived safety

Author

Listed:
  • Buil-Gil, David

    (University of Manchester)

  • Solymosi, Reka
  • Moretti, Angelo

Abstract

Open and crowdsourced data are becoming prominent in social science research. Crowdsourcing projects harness information from large crowds of citizens who voluntarily participate in a collaborative project, and they allow new insights into people’s attitudes and perceptions. However, these data are usually affected by a series of biases that limit their representativeness (i.e. self-selection bias, unequal participation, underrepresentation of certain areas and times). In this chapter we present a two-step method designed to produce reliable small area estimates from crowdsourced data when no auxiliary information is available at the individual level. A non-parametric bootstrap, used to compute pseudo-sampling weights and bootstrap weighted estimates, is followed by an area-level model-based small area estimation approach, which borrows strength from related areas based on a set of covariates, to improve the small area estimates. To assess the method, we conduct a simulation study and an application to safety perceptions in Greater London. The simulation study shows that the area-level model-based small area estimator under the non-parametric bootstrap improves the small area estimates, in terms of bias and variability, in the majority of areas. The application produces estimates of safety perceptions at a small geographical level in Greater London from Place Pulse 2.0 data. In the application, estimates are validated externally by comparing them to reliable survey estimates. Further simulation experiments and applications are needed to examine whether this method also improves the small area estimates when the sample biases are larger, smaller or differently distributed. A measure of reliability also needs to be developed to estimate the error of the small area estimates under the non-parametric bootstrap.
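
The two-step idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a plain within-area resampling bootstrap (it does not reproduce the pseudo-sampling weights the abstract mentions) and a standard Fay-Herriot-type area-level model with a Prasad-Rao moment estimate of the model variance. The function names `bootstrap_area_estimates` and `fay_herriot_eblup`, and all of their parameters, are hypothetical.

```python
import numpy as np


def bootstrap_area_estimates(y, area, n_areas, n_boot=500, seed=0):
    """Step 1 (simplified assumption): resample responses with replacement
    within each area and return the bootstrap mean and bootstrap variance
    of the area mean."""
    rng = np.random.default_rng(seed)
    est = np.full(n_areas, np.nan)
    var = np.full(n_areas, np.nan)
    for d in range(n_areas):
        y_d = y[area == d]
        if y_d.size < 2:
            continue  # too few responses to resample; left as NaN
        idx = rng.integers(0, y_d.size, size=(n_boot, y_d.size))
        boot_means = y_d[idx].mean(axis=1)
        est[d] = boot_means.mean()
        var[d] = boot_means.var(ddof=1)
    return est, var


def fay_herriot_eblup(direct, psi, X):
    """Step 2 (assumed Fay-Herriot-type model): shrink each direct estimate
    towards a regression prediction built from area-level covariates X.
    psi holds the (estimated) sampling variances of the direct estimates."""
    m, p = X.shape
    # OLS residuals and leverages for a moment estimate of the model variance.
    beta_ols, *_ = np.linalg.lstsq(X, direct, rcond=None)
    resid = direct - X @ beta_ols
    hat = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    sigma2_u = max(0.0, (resid @ resid - np.sum(psi * (1.0 - hat))) / (m - p))
    # Weighted least squares using the total variance sigma2_u + psi_d.
    w = 1.0 / (sigma2_u + psi)
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * direct))
    # gamma_d near 1 trusts the direct estimate; near 0 trusts the model.
    gamma = sigma2_u / (sigma2_u + psi)
    return gamma * direct + (1.0 - gamma) * (X @ beta), gamma
```

In this sketch, areas with noisy bootstrap estimates (large `psi`) receive a small `gamma` and are pulled towards the covariate-based prediction, which is one common way of "borrowing strength" from related areas.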

Suggested Citation

  • Buil-Gil, David & Solymosi, Reka & Moretti, Angelo, 2019. "Non-parametric bootstrap and small area estimation to mitigate bias in crowdsourced data. Simulation study and application to perceived safety," SocArXiv 8hgjt, Center for Open Science.
  • Handle: RePEc:osf:socarx:8hgjt
    DOI: 10.31219/osf.io/8hgjt

    Download full text from publisher

    File URL: https://osf.io/download/5d94a6bbc8a75d00176a84f8/
    Download Restriction: no




