
Inferring gender from first names: Comparing the accuracy of Genderize, Gender API, and the gender R package on authors of diverse nationality

Author

Listed:
  • Alexander D VanHelene
  • Ishaani Khatri
  • C Beau Hilton
  • Sanjay Mishra
  • Ece D Gamsiz Uzun
  • Jeremy L Warner

Abstract

Meta-researchers commonly use tools that infer gender from first names, especially when studying gender disparities. However, these tools vary in accuracy, ease of use, and cost. The objective of this study was to compare the accuracy and cost of the commercial software Genderize and Gender API and the open-source gender R package. Differences in binary gender prediction accuracy between the three tools were evaluated on a multinational dataset of 32,968 gender-labeled clinical trial authors. Additionally, two datasets from previous studies, with 5779 and 6131 names, respectively, were re-evaluated with modern implementations of Genderize and Gender API. The accuracies of Genderize and Gender API were compared both with and without supplying trialists’ country of origin in the API call. The gender R package was evaluated only without countries of origin. Accuracy was defined as the percentage of correct gender predictions, and accuracy differences between methods were evaluated using McNemar’s test. Genderize and Gender API demonstrated 96.6% and 96.1% accuracy, respectively, when countries of origin were not supplied in the API calls. Both achieved their highest accuracy on German authors, with accuracies greater than 98%, and were least accurate on South Korean, Chinese, Singaporean, and Taiwanese authors, with accuracies below 82%. Genderize can provide similar accuracy to Gender API while being 4.85x less expensive. The gender R package achieved below 86% accuracy on the full dataset. In the replication studies, Genderize and Gender API performed better than in the original publications. Our results indicate that Genderize and Gender API achieve similar accuracy on a multinational dataset. The gender R package is uniformly less accurate than Genderize and Gender API.

Author summary: Gender disparities in academia have prompted researchers to investigate gender gaps in professorship roles and publication authorship. Of particular concern are the gender gaps in cancer clinical trial authorship. Methodologies that evaluate gender disparities in academia, such as those that determine the gender ratios of academic departments or of publishing authors in a discipline, often rely on tools that infer gender from first names. However, researchers must choose between gender prediction tools that vary in accuracy, ease of use, and cost. We evaluated the binary gender prediction accuracy of Genderize, Gender API, and the gender R package on a gold-standard dataset of 32,968 clinical trialists from around the world. Genderize and Gender API are commercially available, while the gender R package is free and open source. We found that Genderize and Gender API were more accurate than the gender R package. In addition, Genderize is cheaper than Gender API but is more sensitive to inconsistencies in name formatting and to the presence of diacritical marks. Both Genderize and Gender API were most accurate with non-Asian names.
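The evaluation described in the abstract (accuracy as the percentage of correct predictions, with paired differences assessed by McNemar’s test) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the toy labels, and the choice of the exact (binomial) form of McNemar’s test are assumptions made for illustration only, and the abstract does not state which variant of the test was used.

```python
from math import comb


def accuracy(predictions, truth):
    """Percentage of predictions that match the gold-standard labels."""
    correct = sum(p == t for p, t in zip(predictions, truth))
    return 100.0 * correct / len(truth)


def mcnemar_exact(pred_a, pred_b, truth):
    """Two-sided exact McNemar test on paired predictions.

    Only discordant pairs matter:
      b = names tool A got right and tool B got wrong
      c = names tool A got wrong and tool B got right
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    Returns (b, c, p_value).
    """
    b = sum(a == t and p != t for a, p, t in zip(pred_a, pred_b, truth))
    c = sum(a != t and p == t for a, p, t in zip(pred_a, pred_b, truth))
    n = b + c
    if n == 0:
        return b, c, 1.0  # the tools never disagree in correctness
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical toy labels, not data from the study.
truth = ["F", "M", "F", "M", "F", "M", "F", "M"]
tool_a = ["F", "M", "F", "M", "F", "M", "M", "M"]
tool_b = ["F", "M", "M", "F", "F", "M", "M", "F"]

print(accuracy(tool_a, truth))               # 87.5
print(accuracy(tool_b, truth))               # 50.0
print(mcnemar_exact(tool_a, tool_b, truth))  # (3, 0, 0.25)
```

McNemar’s test suits this design because each tool is scored on the same names, so the comparison is paired; names that both tools classify correctly, or both misclassify, carry no information about which tool is better.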

Suggested Citation

  • Alexander D VanHelene & Ishaani Khatri & C Beau Hilton & Sanjay Mishra & Ece D Gamsiz Uzun & Jeremy L Warner, 2024. "Inferring gender from first names: Comparing the accuracy of Genderize, Gender API, and the gender R package on authors of diverse nationality," PLOS Digital Health, Public Library of Science, vol. 3(10), pages 1-15, October.
  • Handle: RePEc:plo:pdig00:0000456
    DOI: 10.1371/journal.pdig.0000456

    Download full text from publisher

    File URL: https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000456
    Download Restriction: no

    File URL: https://journals.plos.org/digitalhealth/article/file?id=10.1371/journal.pdig.0000456&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Huang, Yana & Wang, Tianyu, 2022. "MULAN in the name: Causes and consequences of gendered Chinese names," China Economic Review, Elsevier, vol. 75(C).
    2. Mathias Wullum Nielsen & Jens Peter Andersen & Londa Schiebinger & Jesper W. Schneider, 2017. "One and a half million medical papers reveal a link between author gender and attention to gender and sex analysis," Nature Human Behaviour, Nature, vol. 1(11), pages 791-796, November.

    Citations

    Cited by:

    1. Marie-Laure Charpignon & Leo Anthony Celi & Marisa Cobanaj & Rene Eber & Amelia Fiske & Jack Gallifant & Chenyu Li & Gurucharan Lingamallu & Anton Petushkov & Robin Pierce, 2024. "Diversity and inclusion: A hidden additional benefit of Open Data," PLOS Digital Health, Public Library of Science, vol. 3(7), pages 1-17, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Baron, Justus & Ganglmair, Bernhard & Persico, Nicola & Simcoe, Timothy & Tarantino, Emanuele, 2024. "Representation is not sufficient for selecting gender diversity," Research Policy, Elsevier, vol. 53(6).
    2. Jennifer S. Williams & Jenna C. Stone & Stacey A. Ritz & Maureen J. MacDonald, 2023. "Letter to the editor: Laxdal (2023) “The sex gap in sports and exercise medicine research: who does research on females?”," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(7), pages 4155-4160, July.
    3. Ke Xu & Xianli Xia, 2023. "The Influence of Farmers’ Clan Networks on Their Participation in Living Environment Improvement during the Time of Transition in Traditional Rural China," Agriculture, MDPI, vol. 13(5), pages 1-22, May.
    4. Smith, Thomas Bryan & Vacca, Raffaele & Krenz, Till & McCarty, Christopher, 2021. "Great minds think alike, or do they often differ? Research topic overlap and the formation of scientific teams," Journal of Informetrics, Elsevier, vol. 15(1).
    5. Aron Laxdal, 2023. "The sex gap in sports and exercise medicine research: who does research on females?," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(3), pages 1987-1994, March.
    6. repec:osf:osfxxx:3agxf_v1 is not listed on IDEAS
    7. Wu, Huajun & He, Zongping & Yu, Ning Neil, 2025. "Names with rare Chinese characters and mental ill-being," Journal of Asian Economics, Elsevier, vol. 96(C).
    8. David Ardia & Keven Bluteau & Mohammad‐Abbas Meghani, 2024. "Thirty years of academic finance," Journal of Economic Surveys, Wiley Blackwell, vol. 38(3), pages 1008-1042, July.
    9. Marta Jiménez Carrillo & Unai Martín & Amaia Bacigalupe, 2023. "Gender Inequalities in Publications about COVID-19 in Spain: Authorship and Sex-Disaggregated Data," IJERPH, MDPI, vol. 20(3), pages 1-10, January.
    10. Shihao Wei & Christopher J. Boudreaux & Zhongfeng Su & Zhan Wu, 2024. "Natural disasters, personal attributes, and social entrepreneurship: an attention-based view," Small Business Economics, Springer, vol. 62(4), pages 1409-1427, April.
    11. Gita Ghiasi & Catherine Beaudry & Vincent Larivière & Carl St-Pierre & Andrea Schiffauerova & Matthew Harsh, 2021. "Who profits from the Canadian nanotechnology reward system? Implications for gender-responsible innovation," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(9), pages 7937-7991, September.
    12. Claudia T Riche & Lindsey K Reif & Natalie T Nguyen & G Rinu Alakiu & Grace Seo & Jyoti S Mathad & Margaret L McNairy & Alexandra A Cordeiro & Aarti Kinikar & Kathleen F Walsh & Marie Marcelle Descham, 2023. "“Mobilizing our leaders”: A multi-country qualitative study to increase the representation of women in global health leadership," PLOS Global Public Health, Public Library of Science, vol. 3(1), pages 1-16, January.
    13. Jens Peter Andersen & Serge P. J. M. Horbach & Tony Ross-Hellauer, 2024. "Through the secret gate: a study of member-contributed submissions in PNAS," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(9), pages 5673-5687, September.
    14. Antonio De Nicola & Gregorio D’Agostino, 2021. "Assessment of gender divide in scientific communities," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(5), pages 3807-3840, May.
    15. Anne Laure Humbert & Elisabeth Anna Guenther & Jörg Müller, 2021. "Not Simply ‘Counting Heads’: A Gender Diversity Index for the Team Level," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 157(2), pages 689-707, September.
    16. Rebecca K. Rechlin & Tallinn F. L. Splinter & Travis E. Hodges & Arianne Y. Albert & Liisa A. M. Galea, 2022. "An analysis of neuroscience and psychiatry papers published from 2009 and 2019 outlines opportunities for increasing discovery of sex differences," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    17. Sara Dada & Kim Robin van Daalen & Alanna Barrios-Ruiz & Kai-Ti Wu & Aidan Desjardins & Mayte Bryce-Alberti & Alejandra Castro-Varela & Parnian Khorsand & Ander Santamarta Zamorano & Laura Jung & Grac, 2022. "Challenging the “old boys club” in academia: Gender and geographic representation in editorial boards of journals publishing in environmental sciences and public health," PLOS Global Public Health, Public Library of Science, vol. 2(6), pages 1-23, June.
    18. Kim, Lanu & Smith, Daniel Scott & Hofstra, Bas & McFarland, Daniel A., 2022. "Gendered knowledge in fields and academic careers," Research Policy, Elsevier, vol. 51(1).
    19. Mancuso, Raffaele & Rossi-Lamastra, Cristina & Franzoni, Chiara, 2023. "Topic choice, gendered language, and the under-funding of female scholars in mission-oriented research," Research Policy, Elsevier, vol. 52(6).
    20. Lori van den Hurk & Sarah Hiltner & Sabine Oertelt-Prigione, 2022. "Operationalization and Reporting Practices in Manuscripts Addressing Gender Differences in Biomedical Research: A Cross-Sectional Bibliographical Study," IJERPH, MDPI, vol. 19(21), pages 1-13, November.
    21. Lauren A. Rivera & András Tilcsik, 2023. "Not in My Schoolyard: Disability Discrimination in Educational Access," American Sociological Review, vol. 88(2), pages 284-321, April.
