Representation learning of genomic sequence motifs with convolutional neural networks

Author

Listed:
  • Peter K Koo
  • Sean R Eddy

Abstract

Although convolutional neural networks (CNNs) have been applied to a variety of computational genomics problems, there remains a large gap in our understanding of how they build representations of regulatory genomic sequences. Here we perform systematic experiments on synthetic sequences to reveal how CNN architecture, specifically convolutional filter size and max-pooling, influences the extent to which sequence motif representations are learned by first layer filters. We find that CNNs designed to foster hierarchical representation learning of sequence motifs (assembling partial features into whole features in deeper layers) tend to learn distributed representations, i.e. partial motifs. On the other hand, CNNs that are designed to limit the ability to hierarchically build sequence motif representations in deeper layers tend to learn more interpretable localist representations, i.e. whole motifs. We then validate that this representation learning principle established from synthetic sequences generalizes to in vivo sequences.

Author summary: Although deep convolutional neural networks (CNNs) have demonstrated promise across many regulatory genomics prediction tasks, their inner workings largely remain a mystery. Here we empirically demonstrate how CNN architecture influences the extent to which representations of sequence motifs are captured by first layer filters. We find that max-pooling and convolutional filter size modulate information flow, controlling the extent to which deeper layers can build features hierarchically. CNNs designed to foster hierarchical representation learning tend to capture partial representations of motifs in first layer filters. On the other hand, CNNs that are designed to limit the ability of deeper layers to hierarchically build upon low-level features tend to learn whole representations of motifs in first layer filters. Together, this study enables the design of CNNs that intentionally learn interpretable representations in easier-to-access first layer filters (with a small tradeoff in performance), versus building harder-to-interpret distributed representations; both approaches have their strengths and limitations.
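
The role of max-pooling described above can be illustrated with a minimal sketch in plain Python. It is not the authors' code and does not reproduce their experiments: the motif "TGACT", the toy sequence, the exact-match filter, and the pool sizes 2 and 10 are all hypothetical choices made for illustration. The sketch only shows the mechanical effect that a larger pool size leaves fewer positions for deeper layers to combine, which is the kind of limit on hierarchical feature building that the abstract refers to.

```python
from typing import List

ALPHABET = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    """Encode a DNA string as a list of length-4 one-hot vectors (A, C, G, T)."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq]


def conv1d_relu(x: List[List[float]], filt: List[List[float]]) -> List[float]:
    """Scan one filter across the sequence (valid cross-correlation), then apply ReLU."""
    width = len(filt)
    out = []
    for start in range(len(x) - width + 1):
        score = sum(
            x[start + i][j] * filt[i][j]
            for i in range(width)
            for j in range(len(ALPHABET))
        )
        out.append(max(score, 0.0))
    return out


def max_pool(activations: List[float], pool_size: int) -> List[float]:
    """Non-overlapping max-pooling; a trailing partial window is dropped."""
    n_windows = len(activations) // pool_size
    return [
        max(activations[k * pool_size:(k + 1) * pool_size])
        for k in range(n_windows)
    ]


# Hypothetical first-layer filter that matches the toy motif exactly (assumption).
motif = "TGACT"
motif_filter = one_hot(motif)

# Toy sequence with the motif embedded at 0-based positions 4 and 17 (assumption).
sequence = "AACCTGACTGGTTACGATGACTAA"
acts = conv1d_relu(one_hot(sequence), motif_filter)

# Small pooling keeps many positions for deeper layers to combine;
# large pooling keeps only a coarse "motif present in this window" signal.
small_pool = max_pool(acts, pool_size=2)
large_pool = max_pool(acts, pool_size=10)

print(len(acts), len(small_pool), len(large_pool))
print(large_pool)
```

With the small pool, a deeper layer still sees ten positions and could in principle assemble partial features from neighbouring windows. With the large pool, it sees only two values, so any motif detection has to happen largely within the first-layer filter itself.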

Suggested Citation

  • Peter K Koo & Sean R Eddy, 2019. "Representation learning of genomic sequence motifs with convolutional neural networks," PLOS Computational Biology, Public Library of Science, vol. 15(12), pages 1-17, December.
  • Handle: RePEc:plo:pcbi00:1007560
    DOI: 10.1371/journal.pcbi.1007560

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007560
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1007560&type=printable
    Download Restriction: no


    Citations


    Cited by:

    1. Anna G. Green & Chang Ho Yoon & Michael L. Chen & Yasha Ektefaie & Mack Fina & Luca Freschi & Matthias I. Gröschel & Isaac Kohane & Andrew Beam & Maha Farhat, 2022. "A convolutional neural network highlights mutations relevant to antimicrobial resistance in Mycobacterium tuberculosis," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    2. Kevin B. Dsouza & Alexandra Maslova & Ediem Al-Jibury & Matthias Merkenschlager & Vijay K. Bhargava & Maxwell W. Libbrecht, 2022. "Learning representations of chromatin contacts using a recurrent neural network identifies genomic drivers of conformation," Nature Communications, Nature, vol. 13(1), pages 1-19, December.
