Author
Listed:
- Yoonjung Yoonie Joo
- Jennifer A Pacheco
- William K Thompson
- Laura J Rasmussen-Torvik
- Luke V Rasmussen
- Frederick T J Lin
- Mariza de Andrade
- Kenneth M Borthwick
- Erwin Bottinger
- Andrew Cagan
- David S Carrell
- Joshua C Denny
- Stephen B Ellis
- Omri Gottesman
- James G Linneman
- Jyotishman Pathak
- Peggy L Peissig
- Ning Shang
- Gerard Tromp
- Annapoorani Veerappan
- Maureen E Smith
- Rex L Chisholm
- Andrew J Gawron
- M Geoffrey Hayes
- Abel N Kho
Abstract
Objective: Diverticular disease (DD) is one of the most prevalent conditions encountered by gastroenterologists, affecting ~50% of Americans before the age of 60. Our aim was to identify genetic risk variants and clinical phenotypes associated with DD, leveraging multiple electronic health record (EHR) data sources of 91,166 multi-ancestry participants with a natural language processing (NLP) technique.
Materials and methods: We developed an NLP-enriched phenotyping algorithm that incorporated colonoscopy or abdominal imaging reports to identify patients with diverticulosis and diverticulitis from multicenter EHRs. We performed genome-wide association studies (GWAS) of DD in European, African, and multi-ancestry participants, followed by phenome-wide association studies (PheWAS) of the risk variants to identify their potential comorbid/pleiotropic effects in clinical phenotypes.
Results: Our algorithm significantly improved patient classification performance for DD analysis (algorithm PPVs ≥ 0.94), with up to a 3.5-fold increase in the number of identified patients compared with the traditional method. Ancestry-stratified analyses of diverticulosis and diverticulitis in the identified subjects replicated the well-established associations between ARHGAP15 loci and DD, with overall stronger GWAS signals in diverticulitis patients than in diverticulosis patients. Our PheWAS analyses identified significant associations between the DD GWAS variants and circulatory system, genitourinary, and neoplastic EHR phenotypes.
Discussion: In this first multi-ancestry GWAS-PheWAS study, we showed that heterogeneous EHR data can be mapped through an integrative analytical pipeline to reveal significant genotype-phenotype associations with clinical interpretation.
Conclusion: A systematic framework for processing unstructured EHR data with NLP could advance deep and scalable phenotyping for better patient identification and facilitate etiological investigation of a disease with multilayered data.
Suggested Citation
Yoonjung Yoonie Joo & Jennifer A Pacheco & William K Thompson & Laura J Rasmussen-Torvik & Luke V Rasmussen & Frederick T J Lin & Mariza de Andrade & Kenneth M Borthwick & Erwin Bottinger & Andrew Cag, 2023.
"Multi-ancestry genome- and phenome-wide association studies of diverticular disease in electronic health records with natural language processing enriched phenotyping algorithm,"
PLOS ONE, Public Library of Science, vol. 18(5), pages 1-17, May.
Handle:
RePEc:plo:pone00:0283553
DOI: 10.1371/journal.pone.0283553