
A Conceptual Probabilistic Framework for Annotation Aggregation of Citizen Science Data

Author

Listed:
  • Jesus Cerquides

    (Institut d’Investigació en Intel·ligència Artificial (IIIA), CSIC, 08193 Cerdanyola, Spain)

  • Mehmet Oğuz Mülâyim

    (Institut d’Investigació en Intel·ligència Artificial (IIIA), CSIC, 08193 Cerdanyola, Spain)

  • Jerónimo Hernández-González

    (Department de Matemàtiques, Universitat de Barcelona, 08007 Barcelona, Spain)

  • Amudha Ravi Shankar

    (Citizen Cyberlab, CUI, University of Geneva, CH-1227 Geneva, Switzerland)

  • Jose Luis Fernandez-Marquez

    (Citizen Cyberlab, CUI, University of Geneva, CH-1227 Geneva, Switzerland)

Abstract

Over the last decade, hundreds of thousands of volunteers have contributed to science by collecting or analyzing data. This public participation in science, also known as citizen science, has contributed to significant discoveries and led to publications in major scientific journals. However, little attention has been paid to data quality issues. In this work we argue that being able to determine the accuracy of data obtained by crowdsourcing is a fundamental question and we point out that, for many real-life scenarios, mathematical tools and processes for the evaluation of data quality are missing. We propose a probabilistic methodology for the evaluation of the accuracy of labeling data obtained by crowdsourcing in citizen science. The methodology builds on an abstract probabilistic graphical model formalism, which is shown to generalize some already existing label aggregation models. We show how to make practical use of the methodology through a comparison of data obtained from different citizen science communities analyzing the earthquake that took place in Albania in 2019.
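The abstract does not say which existing label aggregation models the framework generalizes. As an illustration of what such a model can look like, the following is a minimal sketch of EM-style aggregation in the spirit of Dawid and Skene (1979), where each annotator has a confusion matrix and each task has a hidden true label. The function name `dawid_skene`, the triple-based input format, and the `smoothing` parameter are assumptions made for this example and are not taken from the paper.

```python
def dawid_skene(annotations, n_classes, n_iter=50, smoothing=0.01):
    """Aggregate crowd labels with a Dawid-Skene style EM sketch.

    annotations: list of (task, annotator, label) triples,
                 with labels in range(n_classes).
    Returns a dict mapping each task to a list of posterior
    probabilities over the classes.
    """
    tasks = sorted({t for t, _, _ in annotations})
    annotators = sorted({a for _, a, _ in annotations})

    # Initialise posteriors with smoothed, normalised vote counts.
    post = {t: [smoothing] * n_classes for t in tasks}
    for t, _, label in annotations:
        post[t][label] += 1.0
    for t in tasks:
        z = sum(post[t])
        post[t] = [p / z for p in post[t]]

    for _ in range(n_iter):
        # M-step: class prior and one confusion matrix per annotator,
        # conf[a][k][l] ~ P(annotator a reports l | true class k).
        prior = [sum(post[t][k] for t in tasks) / len(tasks)
                 for k in range(n_classes)]
        conf = {a: [[smoothing] * n_classes for _ in range(n_classes)]
                for a in annotators}
        for t, a, label in annotations:
            for k in range(n_classes):
                conf[a][k][label] += post[t][k]
        for a in annotators:
            for k in range(n_classes):
                z = sum(conf[a][k])
                conf[a][k] = [c / z for c in conf[a][k]]

        # E-step: posterior over the true class of each task.
        new_post = {t: list(prior) for t in tasks}
        for t, a, label in annotations:
            for k in range(n_classes):
                new_post[t][k] *= conf[a][k][label]
        for t in tasks:
            z = sum(new_post[t])
            post[t] = [p / z for p in new_post[t]]
    return post


if __name__ == "__main__":
    # Hypothetical toy data: annotators "a" and "b" agree, "c" is noisy.
    votes = [
        ("t0", "a", 1), ("t0", "b", 1), ("t0", "c", 0),
        ("t1", "a", 0), ("t1", "b", 0), ("t1", "c", 1),
        ("t2", "a", 1), ("t2", "b", 1), ("t2", "c", 1),
    ]
    for task, probs in dawid_skene(votes, n_classes=2).items():
        print(task, [round(p, 3) for p in probs])
```

The posteriors, rather than hard majority votes, are what make it possible to reason about the accuracy of the aggregated labels. The sketch multiplies raw probabilities, so a version meant for many annotations per task would likely need to work in log space.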

Suggested Citation

  • Jesus Cerquides & Mehmet Oğuz Mülâyim & Jerónimo Hernández-González & Amudha Ravi Shankar & Jose Luis Fernandez-Marquez, 2021. "A Conceptual Probabilistic Framework for Annotation Aggregation of Citizen Science Data," Mathematics, MDPI, vol. 9(8), pages 1-15, April.
  • Handle: RePEc:gam:jmathe:v:9:y:2021:i:8:p:875-:d:536894

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/9/8/875/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/9/8/875/
    Download Restriction: no

    References listed on IDEAS

    1. Carpenter, Bob & Gelman, Andrew & Hoffman, Matthew D. & Lee, Daniel & Goodrich, Ben & Betancourt, Michael & Brubaker, Marcus & Guo, Jiqiang & Li, Peter & Riddell, Allen, 2017. "Stan: A Probabilistic Programming Language," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 76(i01).
    2. Andrei P. Kirilenko & Travis Desell & Hany Kim & Svetlana Stepchenkova, 2017. "Crowdsourcing Analysis of Twitter Data on Climate Change: Paid Workers vs. Volunteers," Sustainability, MDPI, vol. 9(11), pages 1-15, November.
    3. Trisha Gura, 2013. "Citizen science: Amateur experts," Nature, Nature, vol. 496(7444), pages 259-261, April.
    4. A. P. Dawid & A. M. Skene, 1979. "Maximum Likelihood Estimation of Observer Error‐Rates Using the EM Algorithm," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 28(1), pages 20-28, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Francis, David C. & Kubinec, Robert, 2022. "Beyond Political Connections: A Measurement Model Approach to Estimating Firm-level Political Influence in 41 Economies," Policy Research Working Paper Series 10119, The World Bank.
    2. Yongping Bao & Ludwig Danwitz & Fabian Dvorak & Sebastian Fehrler & Lars Hornuf & Hsuan Yu Lin & Bettina von Helversen, 2022. "Similarity and Consistency in Algorithm-Guided Exploration," CESifo Working Paper Series 10188, CESifo.
    3. Torsten Heinrich & Jangho Yang & Shuanping Dai, 2020. "Growth, development, and structural change at the firm-level: The example of the PR China," Papers 2012.14503, arXiv.org.
    4. Xin Xu & Yang Lu & Yupeng Zhou & Zhiguo Fu & Yanjie Fu & Minghao Yin, 2021. "An Information-Explainable Random Walk Based Unsupervised Network Representation Learning Framework on Node Classification Tasks," Mathematics, MDPI, vol. 9(15), pages 1-14, July.
    5. Spilker Finn & Ötting Marius, 2024. "No cheering in the background? Individual performance in professional darts during COVID-19," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 20(3), pages 219-234.
    6. Xiaoyue Xi & Simon E. F. Spencer & Matthew Hall & M. Kate Grabowski & Joseph Kagaayi & Oliver Ratmann & Rakai Health Sciences Program and PANGEA‐HIV, 2022. "Inferring the sources of HIV infection in Africa from deep‐sequence data with semi‐parametric Bayesian Poisson flow models," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(3), pages 517-540, June.
    7. Luo, Nanyu & Ji, Feng & Han, Yuting & He, Jinbo & Zhang, Xiaoya, 2024. "Fitting item response theory models using deep learning computational frameworks," OSF Preprints tjxab, Center for Open Science.
    8. Joseph B. Bak-Coleman & Ian Kennedy & Morgan Wack & Andrew Beers & Joseph S. Schafer & Emma S. Spiro & Kate Starbird & Jevin D. West, 2022. "Combining interventions to reduce the spread of viral misinformation," Nature Human Behaviour, Nature, vol. 6(10), pages 1372-1380, October.
    9. David M. Phillippo & Sofia Dias & A. E. Ades & Mark Belger & Alan Brnabic & Alexander Schacht & Daniel Saure & Zbigniew Kadziola & Nicky J. Welton, 2020. "Multilevel network meta‐regression for population‐adjusted treatment comparisons," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 183(3), pages 1189-1210, June.
    10. Alina Ferecatu & Arnaud Bruyn & Prithwiraj Mukherjee, 2024. "Silently killing your panelists one email at a time: The true cost of email solicitations," Journal of the Academy of Marketing Science, Springer, vol. 52(4), pages 1216-1239, July.
    11. Burbano, Vanessa & Padilla, Nicolas & Meier, Stephan, 2020. "Gender Differences in Preferences for Meaning at Work," IZA Discussion Papers 13053, Institute of Labor Economics (IZA).
    12. Michelle A. Mycoo, 2018. "Achieving SDG 6: water resources sustainability in Caribbean Small Island Developing States through improved water governance," Natural Resources Forum, Blackwell Publishing, vol. 42(1), pages 54-68, February.
    13. Robert Kubinec & Haillie Na‐Kyung Lee & Andrey Tomashevskiy, 2021. "Politically connected companies are less likely to shutdown due to COVID‐19 restrictions," Social Science Quarterly, Southwestern Social Science Association, vol. 102(5), pages 2155-2169, September.
    14. Barrington-Leigh, C.P., 2024. "The econometrics of happiness: Are we underestimating the returns to education and income?," Journal of Public Economics, Elsevier, vol. 230(C).
    15. Salvatore Nunnari & Massimiliano Pozzi, 2022. "Meta-Analysis of Inequality Aversion Estimates," CESifo Working Paper Series 9851, CESifo.
    16. Andreas Kryger Jensen & Claus Thorn Ekstrøm, 2021. "Quantifying the trendiness of trends," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(1), pages 98-121, January.
    17. Lauderdale, Benjamin E. & Bailey, Delia & Blumenau, Jack & Rivers, Douglas, 2020. "Model-based pre-election polling for national and sub-national outcomes in the US and UK," International Journal of Forecasting, Elsevier, vol. 36(2), pages 399-413.
    18. Tamara Broderick & Ryan Giordano & Rachael Meager, 2020. "An Automatic Finite-Sample Robustness Metric: When Can Dropping a Little Data Make a Big Difference?," Papers 2011.14999, arXiv.org, revised Jul 2023.
    19. Kenneth F. Kellner & Arielle W. Parsons & Roland Kays & Joshua J. Millspaugh & Christopher T. Rota, 2022. "A Two-Species Occupancy Model with a Continuous-Time Detection Process Reveals Spatial and Temporal Interactions," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 27(2), pages 321-338, June.
    20. Owen G. Ward & Jing Wu & Tian Zheng & Anna L. Smith & James P. Curley, 2022. "Network Hawkes process models for exploring latent hierarchy in social animal interactions," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(5), pages 1402-1426, November.
