
Pervasive Transcription of the Human Genome Produces Thousands of Previously Unidentified Long Intergenic Noncoding RNAs

Author

Listed:
  • Matthew J Hangauer
  • Ian W Vaughn
  • Michael T McManus

Abstract

Known protein coding gene exons compose less than 3% of the human genome. The remaining 97% is largely uncharted territory, with only a small fraction characterized. The recent observation of transcription in this intergenic territory has stimulated debate about the extent of intergenic transcription and whether these intergenic RNAs are functional. Here we directly observed with a large set of RNA-seq data covering a wide array of human tissue types that the majority of the genome is indeed transcribed, corroborating recent observations by the ENCODE project. Furthermore, using de novo transcriptome assembly of this RNA-seq data, we found that intergenic regions encode far more long intergenic noncoding RNAs (lincRNAs) than previously described, helping to resolve the discrepancy between the vast amount of observed intergenic transcription and the limited number of previously known lincRNAs. In total, we identified tens of thousands of putative lincRNAs expressed at a minimum of one copy per cell, significantly expanding upon prior lincRNA annotation sets. These lincRNAs are specifically regulated and conserved rather than being the product of transcriptional noise. In addition, lincRNAs are strongly enriched for trait-associated SNPs suggesting a new mechanism by which intergenic trait-associated regions may function. These findings will enable the discovery and interrogation of novel intergenic functional elements.

Author Summary

Much of the human genome is composed of intergenic sequence, the regions between genes. Intergenic sequence was once thought to be transcriptionally silent “junk DNA,” but it has recently become apparent that intergenic regions can be transcribed. However, the scope, nature, and identity of this intergenic transcription remain unknown. Here, by analyzing a large set of RNA-seq data, we found that >85% of the genome is transcribed, allowing us to generate a comprehensive catalog of an important class of intergenic transcripts: long intergenic noncoding RNAs (lincRNAs). We found that the genome encodes far more lincRNAs than previously known. A key question in the field is whether these intergenic transcripts are functional or transcriptional noise. We found that the lincRNAs we identified have many characteristics that are inconsistent with noise, including specific regulation of their expression, the presence of conserved sequence and evidence for regulated processing. Furthermore, these lincRNAs are strongly enriched with intergenic sequences that were previously known to be functional in human traits and diseases. This study provides an essential framework from which the functional elements in intergenic regions can be identified and characterized, facilitating future efforts toward understanding the roles of intergenic transcription in human health and disease.

Suggested Citation

  • Matthew J Hangauer & Ian W Vaughn & Michael T McManus, 2013. "Pervasive Transcription of the Human Genome Produces Thousands of Previously Unidentified Long Intergenic Noncoding RNAs," PLOS Genetics, Public Library of Science, vol. 9(6), pages 1-13, June.
  • Handle: RePEc:plo:pgen00:1003569
    DOI: 10.1371/journal.pgen.1003569

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1003569
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1003569&type=printable
    Download Restriction: no


    Citations

    Cited by:

    1. Haiqing Xu & Chuan Li & Chuan Xu & Jianzhi Zhang, 2023. "Chance promoter activities illuminate the origins of eukaryotic intergenic transcriptions," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    2. Hrant Hovhannisyan & Toni Gabaldón, 2021. "The long non-coding RNA landscape of Candida yeast pathogens," Nature Communications, Nature, vol. 12(1), pages 1-13, December.
