IDEAS home Printed from https://ideas.repec.org/a/spr/astaws/v17y2023i3d10.1007_s11943-023-00332-y.html

Quality aspects of annotated data

Author

Listed:
  • Jacob Beck

    (Ludwig-Maximilians-University Munich)

Abstract

The quality of Machine Learning (ML) applications is commonly assessed by quantifying how well an algorithm fits its respective training data. Yet, a perfect model that learns from and reproduces erroneous data will always be flawed in its real-world application. Hence, a comprehensive assessment of ML quality must include an additional data perspective, especially for models trained on human-annotated data. For the collection of human-annotated training data, best practices often do not exist, leaving researchers to make arbitrary decisions when collecting annotations. Decisions about the selection of annotators or label options may affect training data quality and model performance. In this paper, I outline and summarize previous research and approaches to the collection of annotated training data. I look at data annotation and its quality confounders from two perspectives: the set of annotators and the strategy of data collection. The paper highlights the various implementations of text and image annotation collection and stresses the importance of careful task construction. I conclude by illustrating the consequences for future research and applications of data annotation. The paper is intended to give readers a starting point on annotated data quality research and to stress to researchers and practitioners the necessity of thoughtful consideration of the annotation collection process.
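One common way to quantify the annotator-side quality confounders the abstract describes is a chance-corrected agreement statistic such as Cohen's kappa. The sketch below is illustrative only and not taken from the paper; the annotator names and labels are hypothetical.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators' label lists."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items where the annotators match.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled independently
    # according to their own marginal label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Two hypothetical annotators labeling the same ten texts.
ann_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
ann_2 = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos", "pos", "neg"]
print(round(cohens_kappa(ann_1, ann_2), 2))  # → 0.6
```

Here the annotators agree on 8 of 10 items (0.8 raw agreement), but because both use the two labels equally often, half that agreement is expected by chance, yielding kappa = 0.6. This is exactly the kind of gap between apparent and chance-corrected quality that makes annotator selection consequential.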

Suggested Citation

  • Jacob Beck, 2023. "Quality aspects of annotated data," AStA Wirtschafts- und Sozialstatistisches Archiv, Springer;Deutsche Statistische Gesellschaft - German Statistical Society, vol. 17(3), pages 331-353, December.
  • Handle: RePEc:spr:astaws:v:17:y:2023:i:3:d:10.1007_s11943-023-00332-y
    DOI: 10.1007/s11943-023-00332-y

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11943-023-00332-y
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11943-023-00332-y?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Chandler, Dana & Kapelner, Adam, 2013. "Breaking monotony with meaning: Motivation in crowdsourcing markets," Journal of Economic Behavior & Organization, Elsevier, vol. 90(C), pages 123-133.
    2. Andrius Vabalas & Emma Gowen & Ellen Poliakoff & Alexander J Casson, 2019. "Machine learning algorithm validation with a limited sample size," PLOS ONE, Public Library of Science, vol. 14(11), pages 1-20, November.
    3. Berinsky, Adam J. & Huber, Gregory A. & Lenz, Gabriel S., 2012. "Evaluating Online Labor Markets for Experimental Research: Amazon.com's Mechanical Turk," Political Analysis, Cambridge University Press, vol. 20(3), pages 351-368, July.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Vanessa C. Burbano, 2016. "Social Responsibility Messages and Worker Wage Requirements: Field Experimental Evidence from Online Labor Marketplaces," Organization Science, INFORMS, vol. 27(4), pages 1010-1028, August.
    2. Minjung Park & Sangmi Chai, 2024. "What Makes Fake News Appeal to You? Empirical Evidence from the Tweets Related to COVID-19 Vaccines," SAGE Open, , vol. 14(3), pages 21582440241, July.
    3. Sebastian Fest & Ola Kvaløy & Petra Nieken & Anja Schöttner, 2019. "Motivation and incentives in an online labor market," CESifo Working Paper Series 7526, CESifo.
    4. Mourelatos, Evangelos, 2021. "Personality and Ethics on Online Labor Markets: How mood influences ethical perceptions," EconStor Preprints 244735, ZBW - Leibniz Information Centre for Economics.
    5. Nana Adrian & Ann-Kathrin Crede & Jonas Gehrlein, 2019. "Market Interaction and the Focus on Consequences in Moral Decision Making," Diskussionsschriften dp1905, Universitaet Bern, Departement Volkswirtschaft.
    6. Florian Teschner & Henner Gimpel, 2018. "Crowd Labor Markets as Platform for Group Decision and Negotiation Research: A Comparison to Laboratory Experiments," Group Decision and Negotiation, Springer, vol. 27(2), pages 197-214, April.
    7. R. Gordon Rinderknecht, 2019. "Effects of Participant Displeasure on the Social-Psychological Study of Power on Amazon’s Mechanical Turk," SAGE Open, , vol. 9(3), pages 21582440198, September.
    8. Pan, Jing Yu & Liu, Dahai, 2022. "Mask-wearing intentions on airplanes during COVID-19 – Application of theory of planned behavior model," Transport Policy, Elsevier, vol. 119(C), pages 32-44.
    10. Michele Cantarella & Chiara Strozzi, 2021. "Workers in the crowd: the labor market impact of the online platform economy [An evaluation of instrumental variable strategies for estimating the effects of catholic schooling]," Industrial and Corporate Change, Oxford University Press and the Associazione ICC, vol. 30(6), pages 1429-1458.
    11. Robbett, Andrea & Matthews, Peter Hans, 2018. "Partisan bias and expressive voting," Journal of Public Economics, Elsevier, vol. 157(C), pages 107-120.
    12. Małgorzata W Kożusznik & José M Peiró & Aida Soriano, 2019. "Daily eudaimonic well-being as a predictor of daily performance: A dynamic lens," PLOS ONE, Public Library of Science, vol. 14(4), pages 1-24, April.
    13. Li-Dunn Chen & Michael A Caprio & Devin M Chen & Andrew J Kouba & Carrie K Kouba, 2024. "Enhancing predictive performance for spectroscopic studies in wildlife science through a multi-model approach: A case study for species classification of live amphibians," PLOS Computational Biology, Public Library of Science, vol. 20(2), pages 1-24, February.
    14. Ephrem Habyarimana & Faheem S Baloch, 2021. "Machine learning models based on remote and proximal sensing as potential methods for in-season biomass yields prediction in commercial sorghum fields," PLOS ONE, Public Library of Science, vol. 16(3), pages 1-23, March.
    15. Non, Arjan & Rohde, Ingrid & de Grip, Andries & Dohmen, Thomas, 2022. "Mission of the company, prosocial attitudes and job preferences: A discrete choice experiment," Labour Economics, Elsevier, vol. 74(C).
    16. Park, JungKun & Ahn, Jiseon & Thavisay, Toulany & Ren, Tianbao, 2019. "Examining the role of anxiety and social influence in multi-benefits of mobile payment service," Journal of Retailing and Consumer Services, Elsevier, vol. 47(C), pages 140-149.
    17. Chunhao Wei & Han Chen & Yee Ming Lee, 2022. "COVID-19 preventive measures and restaurant customers’ intention to dine out: the role of brand trust and perceived risk," Service Business, Springer;Pan-Pacific Business Association, vol. 16(3), pages 581-600, September.
    18. Heinz, Matthias & Jeworrek, Sabrina & Mertins, Vanessa & Schumacher, Heiner & Sutter, Matthias, 2017. "Measuring Indirect Effects of Unfair Employer Behavior on Worker Productivity: A Field Experiment," IZA Discussion Papers 11128, Institute of Labor Economics (IZA).
    19. Masha Shunko & Julie Niederhoff & Yaroslav Rosokha, 2018. "Humans Are Not Machines: The Behavioral Impact of Queueing Design on Service Time," Management Science, INFORMS, vol. 64(1), pages 453-473, January.
    20. Mertins, Vanessa & Jeworrek, Sabrina & Vlassopoulos, Michael, 2018. ""The Good News about Bad News": Feedback about Past Organisational Failure and its Impact on Worker Productivity," VfS Annual Conference 2018 (Freiburg, Breisgau): Digital Economy 181644, Verein für Socialpolitik / German Economic Association.
    21. Yoram Halevy & Guy Mayraz, 2024. "Identifying Rule-Based Rationality," The Review of Economics and Statistics, MIT Press, vol. 106(5), pages 1369-1380, September.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:astaws:v:17:y:2023:i:3:d:10.1007_s11943-023-00332-y. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing (email available below). General contact details of provider: http://www.springer.com.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.