Quality aspects of annotated data

Author

  • Jacob Beck (Ludwig-Maximilians-University Munich)

Abstract

The quality of Machine Learning (ML) applications is commonly assessed by quantifying how well an algorithm fits its respective training data. Yet a perfect model that learns from and reproduces erroneous data will always be flawed in its real-world application. Hence, a comprehensive assessment of ML quality must include an additional data perspective, especially for models trained on human-annotated data. For the collection of human-annotated training data, best practices often do not exist, which leaves researchers to make arbitrary decisions when collecting annotations. Decisions about the selection of annotators or label options may affect training data quality and model performance. In this paper, I will outline and summarize previous research and approaches to the collection of annotated training data. I look at data annotation and its quality confounders from two perspectives: the set of annotators and the strategy of data collection. The paper will highlight the various implementations of text and image annotation collection and stress the importance of careful task construction. I conclude by illustrating the consequences for future research and applications of data annotation. The paper is intended to give readers a starting point on annotated data quality research and to impress on researchers and practitioners the need to consider the annotation collection process carefully.

Suggested Citation

  • Jacob Beck, 2023. "Quality aspects of annotated data," AStA Wirtschafts- und Sozialstatistisches Archiv, Springer;Deutsche Statistische Gesellschaft - German Statistical Society, vol. 17(3), pages 331-353, December.
  • Handle: RePEc:spr:astaws:v:17:y:2023:i:3:d:10.1007_s11943-023-00332-y
    DOI: 10.1007/s11943-023-00332-y

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11943-023-00332-y
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Minjung Park & Sangmi Chai, 2024. "What Makes Fake News Appeal to You? Empirical Evidence from the Tweets Related to COVID-19 Vaccines," SAGE Open, , vol. 14(3), pages 21582440241, July.
    2. Sebastian Fest & Ola Kvaløy & Petra Nieken & Anja Schöttner, 2019. "Motivation and incentives in an online labor market," CESifo Working Paper Series 7526, CESifo.
    3. Mourelatos, Evangelos, 2021. "Personality and Ethics on Online Labor Markets: How mood influences ethical perceptions," EconStor Preprints 244735, ZBW - Leibniz Information Centre for Economics.
    4. R. Gordon Rinderknecht, 2019. "Effects of Participant Displeasure on the Social-Psychological Study of Power on Amazon’s Mechanical Turk," SAGE Open, , vol. 9(3), pages 21582440198, September.
    5. Vanessa C. Burbano, 2016. "Social Responsibility Messages and Worker Wage Requirements: Field Experimental Evidence from Online Labor Marketplaces," Organization Science, INFORMS, vol. 27(4), pages 1010-1028, August.
    6. Nana Adrian & Ann-Kathrin Crede & Jonas Gehrlein, 2019. "Market Interaction and the Focus on Consequences in Moral Decision Making," Diskussionsschriften dp1905, Universitaet Bern, Departement Volkswirtschaft.
    7. Florian Teschner & Henner Gimpel, 2018. "Crowd Labor Markets as Platform for Group Decision and Negotiation Research: A Comparison to Laboratory Experiments," Group Decision and Negotiation, Springer, vol. 27(2), pages 197-214, April.
    8. Robbett, Andrea & Matthews, Peter Hans, 2018. "Partisan bias and expressive voting," Journal of Public Economics, Elsevier, vol. 157(C), pages 107-120.
    9. Małgorzata W Kożusznik & José M Peiró & Aida Soriano, 2019. "Daily eudaimonic well-being as a predictor of daily performance: A dynamic lens," PLOS ONE, Public Library of Science, vol. 14(4), pages 1-24, April.
    10. Li-Dunn Chen & Michael A Caprio & Devin M Chen & Andrew J Kouba & Carrie K Kouba, 2024. "Enhancing predictive performance for spectroscopic studies in wildlife science through a multi-model approach: A case study for species classification of live amphibians," PLOS Computational Biology, Public Library of Science, vol. 20(2), pages 1-24, February.
    11. Ephrem Habyarimana & Faheem S Baloch, 2021. "Machine learning models based on remote and proximal sensing as potential methods for in-season biomass yields prediction in commercial sorghum fields," PLOS ONE, Public Library of Science, vol. 16(3), pages 1-23, March.
    12. Mertins, Vanessa & Jeworrek, Sabrina & Vlassopoulos, Michael, 2018. "'The Good News about Bad News': Feedback about Past Organisational Failure and its Impact on Worker Productivity," VfS Annual Conference 2018 (Freiburg, Breisgau): Digital Economy 181644, Verein für Socialpolitik / German Economic Association.
    13. Mattozzi, Andrea & Snowberg, Erik, 2018. "The right type of legislator: A theory of taxation and representation," Journal of Public Economics, Elsevier, vol. 159(C), pages 54-65.
    14. Jasper Grashuis & Theodoros Skevas & Michelle S. Segovia, 2020. "Grocery Shopping Preferences during the COVID-19 Pandemic," Sustainability, MDPI, vol. 12(13), pages 1-10, July.
    15. Jeanette A.M.J. Deetlefs & Mathew Chylinski & Andreas Ortmann, 2015. "MTurk ‘Unscrubbed’: Exploring the good, the ‘Super’, and the unreliable on Amazon’s Mechanical Turk," Discussion Papers 2015-20, School of Economics, The University of New South Wales.
    16. Stephan Breimann & Frits Kamp & Gabriele Basset & Claudia Abou-Ajram & Gökhan Güner & Kanta Yanagida & Masayasu Okochi & Stephan A. Müller & Stefan F. Lichtenthaler & Dieter Langosch & Dmitrij Frishma, 2025. "Charting γ-secretase substrates by explainable AI," Nature Communications, Nature, vol. 16(1), pages 1-20, December.
    17. Cantarella, Michele & Strozzi, Chiara, 2019. "Workers in the Crowd: The Labour Market Impact of the Online Platform Economy," IZA Discussion Papers 12327, Institute of Labor Economics (IZA).
    18. John Hulland & Jeff Miller, 2018. "“Keep on Turkin’”?," Journal of the Academy of Marketing Science, Springer, vol. 46(5), pages 789-794, September.
    19. Kyungsik Han, 2018. "How do you perceive this author? Understanding and modeling authors’ communication quality in social media," PLOS ONE, Public Library of Science, vol. 13(2), pages 1-25, February.
    20. Leandro C. Hermida & E. Michael Gertz & Eytan Ruppin, 2022. "Predicting cancer prognosis and drug response from the tumor microbiome," Nature Communications, Nature, vol. 13(1), pages 1-15, December.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:astaws:v:17:y:2023:i:3:d:10.1007_s11943-023-00332-y. See general information about how to correct material in RePEc.


    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing. General contact details of provider: http://www.springer.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.