
Clinical trial registries as Scientometric data: A novel solution for linking and deduplicating clinical trials from multiple registries

Author

Listed:
  • Christian Thiele

    (University of Applied Sciences Bielefeld)

  • Gerrit Hirschfeld

    (University of Applied Sciences Bielefeld)

  • Ruth Brachel

    (Ruhr-University Bochum)

Abstract

Registries of clinical trials are a potential source for scientometric analysis of medical research and serve important functions for the research community and the public at large. Clinical trials that recruit patients in Germany are usually registered in the German Clinical Trials Register (DRKS) or in international registries such as ClinicalTrials.gov. Furthermore, the International Clinical Trials Registry Platform (ICTRP) aggregates trials from multiple primary registries. We queried the DRKS, ClinicalTrials.gov, and the ICTRP for trials with a recruiting location in Germany. Trials that were registered in multiple registries were linked using the primary and secondary identifiers and a Random Forest model based on various similarity metrics. We identified 35,912 trials that were conducted in Germany. The majority of the trials were registered in multiple databases. A total of 32,106 trials were linked using primary IDs, 26 were linked using a Random Forest model, and 10,537 internal duplicates on ICTRP were identified using the Random Forest model after finding pairs with matching primary or secondary IDs. In cross-validation on a manually labelled data set, the Random Forest increased the F1-score from 96.4% to 97.1% compared to a linkage based solely on secondary IDs. Of all trials, 28% were registered in the German DRKS. Pre-registration applied to 54% of the trials on ClinicalTrials.gov, 43% of the trials on the DRKS, and 56% of the trials on the ICTRP. Both the share of pre-registered studies and the share of studies registered in the DRKS increased over time.
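
The identifier-based part of the linkage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record fields (`primary_id`, `secondary_ids`, `title`), the function names, and the sample records are all hypothetical. The sketch groups records that share any identifier using union-find, computes a single title similarity as one example of a feature a classifier such as a Random Forest could consume, and evaluates predicted links with the F1-score. The Random Forest itself is not included.

```python
from difflib import SequenceMatcher


def link_by_ids(records):
    """Group records that share any identifier (union-find over indices)."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}  # identifier -> index of the first record carrying it
    for idx, rec in enumerate(records):
        for ident in {rec["primary_id"], *rec["secondary_ids"]}:
            if ident in first_seen:
                parent[find(idx)] = find(first_seen[ident])
            else:
                first_seen[ident] = idx

    groups = {}
    for idx in range(len(records)):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())


def title_similarity(a, b):
    """One example similarity feature in [0, 1] for a candidate pair."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def f1_score(predicted, actual):
    """F1 of predicted links against manually labelled links (sets of pairs)."""
    tp = len(predicted & actual)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(actual)
    return 2 * precision * recall / (precision + recall)


# Made-up records; the identifiers are illustrative, not real registrations.
records = [
    {"primary_id": "DRKS-A", "secondary_ids": ["NCT-X"], "title": "Aspirin in adults"},
    {"primary_id": "NCT-X", "secondary_ids": [], "title": "Aspirin in Adults"},
    {"primary_id": "NCT-Y", "secondary_ids": [], "title": "Yoga for back pain"},
]
groups = link_by_ids(records)
```

Here the first two records end up in one group because the secondary ID of one equals the primary ID of the other. Pairs that share no identifier would need similarity features such as `title_similarity` and a trained classifier to be linked.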

Suggested Citation

  • Christian Thiele & Gerrit Hirschfeld & Ruth Brachel, 2021. "Clinical trial registries as Scientometric data: A novel solution for linking and deduplicating clinical trials from multiple registries," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(12), pages 9733-9750, December.
  • Handle: RePEc:spr:scient:v:126:y:2021:i:12:d:10.1007_s11192-021-04111-w
    DOI: 10.1007/s11192-021-04111-w

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-021-04111-w
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Arthur Novaes de Amorim & Rob Deardon & Vineet Saini, 2021. "A stacked ensemble method for forecasting influenza-like illness visit volumes at emergency departments," PLOS ONE, Public Library of Science, vol. 16(3), pages 1-15, March.
    2. Kawalec Paweł, 2020. "The dynamics of theories of economic growth: An impact of Unified Growth Theory," Economics and Business Review, Sciendo, vol. 6(2), pages 19-44, June.
    3. Cathelijn J F Waaijer & Benoît Macaluso & Cassidy R Sugimoto & Vincent Larivière, 2016. "Stability and Longevity in the Publication Careers of U.S. Doctorate Recipients," PLOS ONE, Public Library of Science, vol. 11(4), pages 1-15, April.
    4. Prabal Das & D. A. Sachindra & Kironmala Chanda, 2022. "Machine Learning-Based Rainfall Forecasting with Multiple Non-Linear Feature Selection Algorithms," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 36(15), pages 6043-6071, December.
    5. Jie Zhao & Ji Chen & Damien Beillouin & Hans Lambers & Yadong Yang & Pete Smith & Zhaohai Zeng & Jørgen E. Olesen & Huadong Zang, 2022. "Global systematic review with meta-analysis reveals yield advantage of legume-based rotations and its drivers," Nature Communications, Nature, vol. 13(1), pages 1-9, December.
    6. Piaopiao Chen & Agnès H. Michel & Jianzhi Zhang, 2022. "Transposon insertional mutagenesis of diverse yeast strains suggests coordinated gene essentiality polymorphisms," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    7. Paulo Infante & Gonçalo Jacinto & Anabela Afonso & Leonor Rego & Pedro Nogueira & Marcelo Silva & Vitor Nogueira & José Saias & Paulo Quaresma & Daniel Santos & Patrícia Góis & Paulo Rebelo Manuel, 2023. "Factors That Influence the Type of Road Traffic Accidents: A Case Study in a District of Portugal," Sustainability, MDPI, vol. 15(3), pages 1-16, January.
    8. Ephrem Habyarimana & Faheem S Baloch, 2021. "Machine learning models based on remote and proximal sensing as potential methods for in-season biomass yields prediction in commercial sorghum fields," PLOS ONE, Public Library of Science, vol. 16(3), pages 1-23, March.
    9. Banks, Jonathan & Rabbani, Arif & Nadkarni, Kabir & Renaud, Evan, 2020. "Estimating parasitic loads related to brine production from a hot sedimentary aquifer geothermal project: A case study from the Clarke Lake gas field, British Columbia," Renewable Energy, Elsevier, vol. 153(C), pages 539-552.
    10. Laha, A. K. & Putatunda, Sayan, 2017. "Travel Time Prediction for Taxi-GPS Data Streams," IIMA Working Papers WP 2017-03-03, Indian Institute of Management Ahmedabad, Research and Publication Department.
    11. Fraccaroli, Nicolò & Giovannini, Alessandro & Jamet, Jean-François & Persson, Eric, 2022. "Ideology and monetary policy. The role of political parties’ stances in the European Central Bank’s parliamentary hearings," European Journal of Political Economy, Elsevier, vol. 74(C).
    12. Michael A Ruderman & Deirdra F Wilson & Savanna Reid, 2015. "Does Prison Crowding Predict Higher Rates of Substance Use Related Parole Violations? A Recurrent Events Multi-Level Survival Analysis," PLOS ONE, Public Library of Science, vol. 10(10), pages 1-19, October.
    13. Alexander Wettstein & Gabriel Jenni & Ida Schneider & Fabienne Kühne & Martin grosse Holtforth & Roberto La Marca, 2023. "Predictors of Psychological Strain and Allostatic Load in Teachers: Examining the Long-Term Effects of Biopsychosocial Risk and Protective Factors Using a LASSO Regression Approach," IJERPH, MDPI, vol. 20(10), pages 1-20, May.
    14. Tang, Kayu & Parsons, David J. & Jude, Simon, 2019. "Comparison of automatic and guided learning for Bayesian networks to analyse pipe failures in the water distribution system," Reliability Engineering and System Safety, Elsevier, vol. 186(C), pages 24-36.
    15. Daifeng Xiang & Gangsheng Wang & Jing Tian & Wanyu Li, 2023. "Global patterns and edaphic-climatic controls of soil carbon decomposition kinetics predicted from incubation experiments," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    16. Norma Salgado-Orellana & Emilio Berrocal de-Luna & Calixto Gutiérrez-Braojos, 2021. "A scientometric study of doctoral theses on the Roma in the Iberian Peninsula during the 1977–2018 period," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(1), pages 437-458, January.
    17. Kong, Ling & Wang, Dongbo, 2020. "Comparison of citations and attention of cover and non-cover papers," Journal of Informetrics, Elsevier, vol. 14(4).
    18. Bellotti, Anthony & Brigo, Damiano & Gambetti, Paolo & Vrins, Frédéric, 2021. "Forecasting recovery rates on non-performing loans with machine learning," International Journal of Forecasting, Elsevier, vol. 37(1), pages 428-444.
    19. Tranos, Emmanouil & Incera, Andre Carrascal & Willis, George, 2022. "Using the web to predict regional trade flows: data extraction, modelling, and validation," OSF Preprints 9bu5z, Center for Open Science.
    20. Mike Thelwall & Kayvan Kousha, 2016. "Are citations from clinical trials evidence of higher impact research? An analysis of ClinicalTrials.gov," Scientometrics, Springer;Akadémiai Kiadó, vol. 109(2), pages 1341-1351, November.
