Novel feature selection methods for construction of accurate epigenetic clocks

Authors
  • Adam Li
  • Amber Mueller
  • Brad English
  • Anthony Arena
  • Daniel Vera
  • Alice E Kane
  • David A Sinclair

Abstract

Epigenetic clocks allow us to accurately predict the age and future health of individuals based on the methylation status of specific CpG sites in the genome and are a powerful tool to measure the effectiveness of longevity interventions. There is a growing need for methods to efficiently construct epigenetic clocks. The most common approach is to create clocks using elastic net regression modelling of all measured CpG sites, without first identifying specific features or CpGs of interest. The addition of feature selection approaches provides the opportunity to optimise the identification of predictive CpG sites. Here, we apply novel feature selection methods and combinatorial approaches including newly adapted neural networks, genetic algorithms, and ‘chained’ combinations. Human whole blood methylation data covering ~470,000 CpGs were used to develop clocks that predict age with R² correlation scores greater than 0.73, the most predictive of which uses 35 CpG sites for an R² correlation score of 0.87. The five most frequent sites across all clocks were modelled to build a clock with an R² correlation score of 0.83. These two clocks were validated on two external datasets, where they maintained excellent predictive accuracy. When compared with three published epigenetic clocks (Hannum, Horvath, Weidner) applied to the same validation datasets, our clocks outperformed all three models. We identified gene regulatory regions associated with selected CpGs as possible targets for future aging studies. Thus, our feature selection algorithms build accurate, generalizable clocks with a low number of CpG sites, providing important tools for the field.

Author summary

Epigenetic clocks accurately predict a person’s age by measuring the levels of a chemical mark (methylation) at specific sites of the DNA. More of these clocks are being built all the time, and there is a need for tools to best construct these clocks, and particularly to pick the specific DNA sites to include. We propose several novel machine-learning tools for the optimised selection of these DNA sites, known as feature selection approaches. We applied our approaches to a large human blood dataset to develop several clocks that predict age using 35 or fewer DNA sites, with more accuracy than previously published clocks when applied to other datasets for validation. Some of the DNA sites identified may be associated with interesting genes to explore further for their role in aging. These approaches should enable the building of more accurate, generalizable age prediction clocks from a low number of DNA sites.
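
As a rough illustration of the general idea (pick a small set of CpG sites first, then fit and validate a clock on them), the sketch below uses only the Python standard library. It is not the authors' method: it swaps in a simple correlation filter for the neural-network, genetic-algorithm and 'chained' selectors named in the abstract, and a one-variable least-squares fit on a composite score instead of elastic net regression. The data are synthetic, and every name in it (`select_top_k`, `fit_composite_clock`, the site indices, the noise levels) is a hypothetical choice made for the example.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def select_top_k(X, y, k):
    """Filter-style selection: keep the k sites most correlated with age."""
    scored = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        scored.append((abs(pearson(column, y)), j))
    scored.sort(reverse=True)
    return sorted(j for _, j in scored[:k])


def fit_composite_clock(X, y, sites):
    """Fit age = a + b * composite on the training data.

    The composite averages the selected sites after z-scoring each one and
    flipping its sign so that it increases with age.
    """
    params = []
    for j in sites:
        column = [row[j] for row in X]
        sign = 1.0 if pearson(column, y) >= 0 else -1.0
        params.append((j, statistics.fmean(column), statistics.pstdev(column), sign))

    def composite(row):
        return statistics.fmean([s * (row[j] - m) / sd for j, m, sd, s in params])

    z = [composite(row) for row in X]
    mz, my = statistics.fmean(z), statistics.fmean(y)
    slope = sum((a - mz) * (b - my) for a, b in zip(z, y)) / sum((a - mz) ** 2 for a in z)
    intercept = my - slope * mz
    return lambda row: intercept + slope * composite(row)


# Synthetic methylation data (an assumption for illustration, not real data):
# 200 samples, 50 sites with beta values in [0, 1]; only sites 0-2 track age.
rng = random.Random(0)
N_SAMPLES, N_SITES = 200, 50
ages = [rng.uniform(20, 80) for _ in range(N_SAMPLES)]
X = []
for age in ages:
    row = [rng.uniform(0.0, 1.0) for _ in range(N_SITES)]
    row[0] = 0.20 + 0.005 * age + rng.gauss(0, 0.02)  # gains methylation with age
    row[1] = 0.80 - 0.004 * age + rng.gauss(0, 0.02)  # loses methylation with age
    row[2] = 0.10 + 0.006 * age + rng.gauss(0, 0.02)
    X.append(row)

# Hold out the last 50 samples; selection and fitting see training data only.
X_train, y_train = X[:150], ages[:150]
X_test, y_test = X[150:], ages[150:]

selected = select_top_k(X_train, y_train, k=3)
clock = fit_composite_clock(X_train, y_train, selected)
r2 = r2_score(y_test, [clock(row) for row in X_test])

print("selected sites:", selected)
print("held-out R^2:", round(r2, 3))
```

Selecting sites and fitting the model on the training split only, then scoring on held-out samples, mirrors the role of the external validation datasets described in the abstract: a clock that looks accurate only on the data used to choose its sites would not count as generalizable.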

Suggested Citation

  • Adam Li & Amber Mueller & Brad English & Anthony Arena & Daniel Vera & Alice E Kane & David A Sinclair, 2022. "Novel feature selection methods for construction of accurate epigenetic clocks," PLOS Computational Biology, Public Library of Science, vol. 18(8), pages 1-18, August.
  • Handle: RePEc:plo:pcbi00:1009938
    DOI: 10.1371/journal.pcbi.1009938

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009938
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1009938&type=printable
    Download Restriction: no

