Printed from https://ideas.repec.org/a/bla/jorssc/v70y2021i1p147-173.html

Correcting misclassification errors in crowdsourced ecological data: A Bayesian perspective

Author

Listed:
  • Edgar Santos‐Fernandez
  • Erin E. Peterson
  • Julie Vercelloni
  • Em Rushworth
  • Kerrie Mengersen

Abstract

Many research domains use data elicited from ‘citizen scientists’ when a direct measure of a process is expensive or infeasible. However, participants may report incorrect estimates or classifications due to their lack of skill. We demonstrate how Bayesian hierarchical models can be used to learn about latent variables of interest, while accounting for the participants’ abilities. The model is described in the context of an ecological application that involves crowdsourced classifications of georeferenced coral‐reef images from the Great Barrier Reef, Australia. The latent variable of interest is the proportion of coral cover, which is a common indicator of coral reef health. The participants’ abilities are expressed in terms of sensitivity and specificity of a correctly classified set of points on the images. The model also incorporates a spatial component, which allows prediction of the latent variable in locations that have not been surveyed. We show that the model outperforms traditional weighted‐regression approaches used to account for uncertainty in citizen science data. Our approach produces more accurate regression coefficients and provides a better characterisation of the latent process of interest. This new method is implemented in the probabilistic programming language Stan and can be applied to a wide range of problems that rely on uncertain citizen science data.
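
As a rough illustration of why sensitivity and specificity matter when estimating coral cover from crowdsourced point classifications, the following Python sketch simulates noisy labels and applies a simple moment-based correction (often called the Rogan-Gladen estimator). This is not the authors' model, which is a Bayesian hierarchical model with a spatial component implemented in Stan. The function names, the single shared participant ability, and all numeric values are assumptions made only for this example.

```python
import random


def observed_rate(p, se, sp):
    """Probability that a point is labelled 'coral' when true cover is p.

    se: sensitivity, P(label coral | point is coral)
    sp: specificity, P(label not coral | point is not coral)
    """
    return se * p + (1.0 - sp) * (1.0 - p)


def corrected_cover(p_obs, se, sp):
    """Invert observed_rate to recover p; result is clipped to [0, 1]."""
    denom = se + sp - 1.0
    if denom <= 0.0:
        raise ValueError("correction requires se + sp > 1")
    estimate = (p_obs - (1.0 - sp)) / denom
    return min(1.0, max(0.0, estimate))


def simulate_labels(p, se, sp, n_points, rng):
    """Simulate participant labels (True = 'coral') for n_points points."""
    labels = []
    for _ in range(n_points):
        is_coral = rng.random() < p
        if is_coral:
            labels.append(rng.random() < se)   # true positive with prob. se
        else:
            labels.append(rng.random() >= sp)  # false positive with prob. 1 - sp
    return labels


if __name__ == "__main__":
    rng = random.Random(1)
    true_cover, se, sp = 0.30, 0.80, 0.90      # hypothetical values
    labels = simulate_labels(true_cover, se, sp, 20000, rng)
    naive = sum(labels) / len(labels)
    print("naive estimate:    ", round(naive, 3))
    print("corrected estimate:", round(corrected_cover(naive, se, sp), 3))
```

The naive proportion of points labelled as coral is biased whenever sensitivity and specificity differ from 1. The correction removes that bias only if the abilities are known, whereas the abstract describes a model that accounts for them while learning the latent cover.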

Suggested Citation

  • Edgar Santos‐Fernandez & Erin E. Peterson & Julie Vercelloni & Em Rushworth & Kerrie Mengersen, 2021. "Correcting misclassification errors in crowdsourced ecological data: A Bayesian perspective," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(1), pages 147-173, January.
  • Handle: RePEc:bla:jorssc:v:70:y:2021:i:1:p:147-173
    DOI: 10.1111/rssc.12453

    Download full text from publisher

    File URL: https://doi.org/10.1111/rssc.12453
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. D. Fouskakis & G. Petrakos & I. Rotous, 2020. "A Bayesian longitudinal model for quantifying students’ preferences regarding teaching quality indicators," METRON, Springer;Sapienza Università di Roma, vol. 78(2), pages 255-270, August.
    2. Anantha, K.H. & Garg, Kaushal K. & Barron, Jennie & Dixit, Sreenath & Venkataradha, A. & Singh, Ramesh & Whitbread, Anthony M., 2021. "Impact of best management practices on sustainable crop production and climate resilience in smallholder farming systems of South Asia," Agricultural Systems, Elsevier, vol. 194(C).
    3. Alvarado, Rafael & Murshed, Muntasir & Cifuentes-Faura, Javier & Işık, Cem & Razib Hossain, Mohammad & Tillaguango, Brayan, 2023. "Nexuses between rent of natural resources, economic complexity, and technological innovation: The roles of GDP, human capital and civil liberties," Resources Policy, Elsevier, vol. 85(PA).
    4. Dang,Hai-Anh H. & Pullinger,John James & Serajuddin,Umar & Stacy,Brian William, 2024. "Reviewing Assessment Tools for Measuring Country Statistical Capacity," Policy Research Working Paper Series 10717, The World Bank.
    5. Radka Jersakova & James Lomax & James Hetherington & Brieuc Lehmann & George Nicholson & Mark Briers & Chris Holmes, 2022. "Bayesian imputation of COVID‐19 positive test counts for nowcasting under reporting lag," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(4), pages 834-860, August.
    6. Santos-Fernandez, Edgar & Ver Hoef, Jay M. & Peterson, Erin E. & McGree, James & Isaak, Daniel J. & Mengersen, Kerrie, 2022. "Bayesian spatio-temporal models for stream networks," Computational Statistics & Data Analysis, Elsevier, vol. 170(C).
    7. Snapp, Sieglinde, 2022. "Embracing variability in soils on smallholder farms: New tools and better science," Agricultural Systems, Elsevier, vol. 195(C).
    8. Sachit Mahajan & Ming-Kuang Chung & Jenny Martinez & Yris Olaya & Dirk Helbing & Ling-Jyh Chen, 2022. "Translating citizen-generated air quality data into evidence for shaping policy," Palgrave Communications, Palgrave Macmillan, vol. 9(1), pages 1-18, December.
    9. Kubinec, Robert, 2020. "Ordered Beta Regression: A Parsimonious, Well-Fitting Model for Survey Sliders and Visual Analog Scales," SocArXiv 2sx6y, Center for Open Science.
    10. Andrea Ballatore & Teun Johannes Verhagen & Zhije Li & Stefano Cucurachi, 2022. "This city is not a bin: Crowdmapping the distribution of urban litter," Journal of Industrial Ecology, Yale University, vol. 26(1), pages 197-212, February.
    11. Peter A. Gao & Jonathan Wakefield, 2023. "A Spatial Variance‐Smoothing Area Level Model for Small Area Estimation of Demographic Rates," International Statistical Review, International Statistical Institute, vol. 91(3), pages 493-510, December.
    12. de Oliveira, Gisliany L.A. & Silva, Ivanovitch & Lima, Luciana & Costa, Daniel G., 2023. "A composite indicator of liveability based on sociodemographic and Uber quality service dimensions: A data-driven approach," Transport Policy, Elsevier, vol. 141(C), pages 97-115.
    13. Chen, Yewen & Chang, Xiaohui & Luo, Fangzhi & Huang, Hui, 2023. "Additive dynamic models for correcting numerical model outputs," Computational Statistics & Data Analysis, Elsevier, vol. 187(C).
    14. Muchen Zheng & Wenli Huang & Gang Xu & Xi Li & Limin Jiao, 2023. "Spatial gradients of urban land density and nighttime light intensity in 30 global megacities," Palgrave Communications, Palgrave Macmillan, vol. 10(1), pages 1-11, December.
    15. Connor Donegan & Yongwan Chun & Daniel A. Griffith, 2021. "Modeling Community Health with Areal Data: Bayesian Inference with Survey Standard Errors and Spatial Structure," IJERPH, MDPI, vol. 18(13), pages 1-27, June.
    16. Jinjie Chen & Joon Jin Song & James D. Stamey, 2022. "A Bayesian Hierarchical Spatial Model to Correct for Misreporting in Count Data: Application to State-Level COVID-19 Data in the United States," IJERPH, MDPI, vol. 19(6), pages 1-15, March.
    17. Spring, Daniel & Le, Thao P. & Bloom, Samuel Adam & Keith, Jonathan M. & Kompas, Tom, 2023. "Reconstructing the dynamics of managed populations to estimate the impact of citizen surveillance," Ecological Modelling, Elsevier, vol. 475(C).
    18. Wang, Craig & Furrer, Reinhard, 2021. "Combining heterogeneous spatial datasets with process-based spatial fusion models: A unifying framework," Computational Statistics & Data Analysis, Elsevier, vol. 161(C).
    19. Benjamin Herfort & Sven Lautenbach & João Porto de Albuquerque & Jennings Anderson & Alexander Zipf, 2023. "A spatio-temporal analysis investigating completeness and inequalities of global urban building data in OpenStreetMap," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    20. Rediet Girma & Awdenegest Moges & Christine Fürst, 2023. "Integrated Modeling of Land Degradation Dynamics and Insights on the Possible Future Management Alternatives in the Gidabo River Basin, Ethiopian Rift Valley," Land, MDPI, vol. 12(9), pages 1-23, September.
