
Systematic Clustering of Transcription Start Site Landscapes

Author

Listed:
  • Xiaobei Zhao
  • Eivind Valen
  • Brian J Parker
  • Albin Sandelin

Abstract

Genome-wide, high-throughput methods for transcription start site (TSS) detection have shown that most promoters have an array of neighboring TSSs, some of which are used more than others, forming a distribution of initiation propensities. TSS distributions (TSSDs) vary widely between promoters, and earlier studies have shown that TSSDs have biological implications for both regulation and function. However, no systematic study has explored how many types of TSSDs, and by extension core promoters, exist, or which biological features distinguish them. In this study, we developed a new non-parametric dissimilarity measure and clustering approach to explore the similarities and stabilities of clusters of TSSDs. Previous studies have used arbitrary thresholds to arrive at two general classes: broad and sharp. We demonstrated that, in addition to the previous broad/sharp dichotomy, an additional category of promoters exists. Unlike typical TATA-driven sharp TSSDs, where the TSS position can vary by a few nucleotides, in this category virtually all TSSs originate from the same genomic position. These promoters lack the epigenetic signatures of typical mRNA promoters, and a substantial subset of them map upstream of ribosomal protein pseudogenes. We present evidence that these are likely mapping errors, which have confounded earlier analyses, caused by the high similarity of ribosomal gene promoters in combination with the known G addition bias in CAGE libraries. Thus, previous two-class separations of promoters based on TSS distributions are justified, but the ultra-sharp TSS distributions will confound downstream analyses if not removed.
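
To make the idea of comparing TSSDs concrete, the following is a minimal sketch and not the authors' method. The abstract does not say which dissimilarity measure was developed, so the sketch substitutes a standard non-parametric one, the Kolmogorov-Smirnov statistic (the largest gap between two cumulative distributions). The function names, the toy tag counts, and the 9-nt window are all assumptions made for illustration.

```python
from itertools import accumulate


def normalize(counts):
    """Turn raw tag counts per position into a probability distribution."""
    total = sum(counts)
    if total == 0:
        raise ValueError("TSSD has no tags")
    return [c / total for c in counts]


def ks_dissimilarity(counts_a, counts_b):
    """Largest absolute gap between the two cumulative distributions.

    Illustrative stand-in only: the measure used in the paper is not
    given in the abstract. Returns a value between 0 and 1.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("TSSDs must cover windows of equal length")
    cdf_a = accumulate(normalize(counts_a))
    cdf_b = accumulate(normalize(counts_b))
    return max(abs(a - b) for a, b in zip(cdf_a, cdf_b))


def dominant_fraction(counts):
    """Share of all tags that fall on the single most used position."""
    return max(counts) / sum(counts)


# Made-up tag counts over a hypothetical 9-nt window (not real CAGE data).
broad = [3, 5, 8, 10, 12, 10, 8, 5, 3]
sharp = [0, 0, 2, 10, 40, 10, 2, 0, 0]
ultra_sharp = [0, 0, 0, 0, 64, 0, 0, 0, 0]

if __name__ == "__main__":
    profiles = {"broad": broad, "sharp": sharp, "ultra_sharp": ultra_sharp}
    for name, counts in profiles.items():
        print(name, "dominant fraction:", dominant_fraction(counts))
    print("broad vs sharp:", ks_dissimilarity(broad, sharp))
    print("sharp vs ultra_sharp:", ks_dissimilarity(sharp, ultra_sharp))
```

In this toy setup, a dominant fraction at or near 1.0 corresponds to the ultra-sharp category described above, where virtually all TSSs originate from one position. A pairwise matrix of such dissimilarities could then be passed to any clustering routine that accepts precomputed distances.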

Suggested Citation

  • Xiaobei Zhao & Eivind Valen & Brian J Parker & Albin Sandelin, 2011. "Systematic Clustering of Transcription Start Site Landscapes," PLOS ONE, Public Library of Science, vol. 6(8), pages 1-16, August.
  • Handle: RePEc:plo:pone00:0023409
    DOI: 10.1371/journal.pone.0023409

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0023409
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0023409&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Hennig, Christian, 2007. "Cluster-wise assessment of cluster stability," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 258-271, September.
    2. Vladimir B Bajic & Sin Lam Tan & Alan Christoffels & Christian Schönbach & Leonard Lipovich & Liang Yang & Oliver Hofmann & Adele Kruger & Winston Hide & Chikatoshi Kai & Jun Kawai & David A Hume & Pi, 2006. "Mice and Men: Their Promoter Properties," PLOS Genetics, Public Library of Science, vol. 2(4), pages 1-13, April.

    Citations

    Cited by:

    1. Holland, Robert A. & Beaumont, Nicola & Hooper, Tara & Austen, Melanie & Gross, Robert J.K. & Heptonstall, Philip J. & Ketsopoulou, Ioanna & Winskel, Mark & Watson, Jim & Taylor, Gail, 2018. "Incorporating ecosystem services into the design of future energy systems," Applied Energy, Elsevier, vol. 222(C), pages 812-822.
    2. Nikhil B. Ghate & Sungmin Kim & Yonghwan Shin & Jinman Kim & Michael Doche & Scott Valena & Alan Situ & Sangnam Kim & Suhn K. Rhie & Heinz-Josef Lenz & Tobias S. Ulmer & Shannon M. Mumenthaler & Wooji, 2023. "Phosphorylation and stabilization of EZH2 by DCAF1/VprBP trigger aberrant gene silencing in colon cancer," Nature Communications, Nature, vol. 14(1), pages 1-17, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ana Alina Tudoran, 2022. "A machine learning approach to identifying decision-making styles for managing customer relationships," Electronic Markets, Springer;IIM University of St. Gallen, vol. 32(1), pages 351-374, March.
    2. Wu, Han-Ming, 2011. "On biological validity indices for soft clustering algorithms for gene expression data," Computational Statistics & Data Analysis, Elsevier, vol. 55(5), pages 1969-1979, May.
    3. Alessandro Albano & José Luis García-Lapresta & Antonella Plaia & Mariangela Sciandra, 2023. "A family of distances for preference–approvals," Annals of Operations Research, Springer, vol. 323(1), pages 1-29, April.
    4. Aurora Torrente & Juan Romo, 2021. "Initializing k-means Clustering by Bootstrap and Data Depth," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 232-256, July.
    5. Han Yu & Brian Chapman & Arianna Di Florio & Ellen Eischen & David Gotz & Mathews Jacob & Rachael Hageman Blair, 2019. "Bootstrapping estimates of stability for clusters, observations and model selection," Computational Statistics, Springer, vol. 34(1), pages 349-372, March.
    6. Jonas M. B. Haslbeck & Dirk U. Wulff, 2020. "Estimating the number of clusters via a corrected clustering instability," Computational Statistics, Springer, vol. 35(4), pages 1879-1894, December.
    7. Matthew Whitaker & Joshua Elliott & Marc Chadeau-Hyam & Steven Riley & Ara Darzi & Graham Cooke & Helen Ward & Paul Elliott, 2022. "Persistent COVID-19 symptoms in a community study of 606,434 people in England," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    8. Stancu Stelian & Pernici Andreea, 2023. "Assessing the Evolution of the Energy Mix Worldwide, with a Focus on the Renewable Energy Transition," Management & Marketing, Sciendo, vol. 18(s1), pages 384-397, December.
    9. Sara Dolnicar & Friedrich Leisch, 2017. "Using segment level stability to select target segments in data-driven market segmentation studies," Marketing Letters, Springer, vol. 28(3), pages 423-436, September.
    10. Coraggio, Luca & Coretto, Pietro, 2023. "Selecting the number of clusters, clustering models, and algorithms. A unifying approach based on the quadratic discriminant score," Journal of Multivariate Analysis, Elsevier, vol. 196(C).
    11. Jyldyz Djumalieva & Antonio Lima & Cath Sleeman, 2018. "Classifying Occupations According to Their Skill Requirements in Job Advertisements," Economic Statistics Centre of Excellence (ESCoE) Discussion Papers ESCoE DP-2018-04, Economic Statistics Centre of Excellence (ESCoE).
    12. Mohr, Lukas & Burg, Vanessa & Thees, Oliver & Trutnevyte, Evelina, 2019. "Spatial hot spots and clusters of bioenergy combined with socio-economic analysis in Switzerland," Renewable Energy, Elsevier, vol. 140(C), pages 840-851.
    13. Jonatan A. González & Francisco J. Rodríguez-Cortés & Elvira Romano & Jorge Mateu, 2021. "Classification of Events Using Local Pair Correlation Functions for Spatial Point Patterns," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 26(4), pages 538-559, December.
    14. Obal, Thalita Monteiro & de Souza, Jovani Taveira & de Jesus, Rômulo Henrique Gomes & de Francisco, Antonio Carlos, 2023. "Biogascluster: A clustering algorithm to identify potential partnerships between agribusiness properties," Renewable Energy, Elsevier, vol. 206(C), pages 982-993.
    15. Ali Ferjani & Albert Zimmermann, 2013. "Modelling structural-change-related shifts in labour input in the agent-based sector model SWISSland," Journal of Socio-Economics in Agriculture (Until 2015: Yearbook of Socioeconomics in Agriculture), Swiss Society for Agricultural Economics and Rural Sociology, vol. 6(1), pages 177-200.
    16. Shaza B. Zaghlool & Anna Halama & Nisha Stephan & Valborg Gudmundsdottir & Vilmundur Gudnason & Lori L. Jennings & Manonanthini Thangam & Emma Ahlqvist & Rayaz A. Malik & Omar M. E. Albagha & Abdul Ba, 2022. "Metabolic and proteomic signatures of type 2 diabetes subtypes in an Arab population," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    17. Cabral, Laura & Kim, Amy M., 2020. "An empirical reappraisal of the four types of cyclists," Transportation Research Part A: Policy and Practice, Elsevier, vol. 137(C), pages 206-221.
    18. Liao, Tim F. & Bolano, Danilo & Brzinsky-Fay, Christian & Cornwell, Benjamin & Fasang, Anette Eva & Helske, Satu & Piccarreta, Raffaella & Raab, Marcel & Ritschard, Gilbert & Struffolino, Emanuela & S, 2022. "Sequence analysis: Its past, present, and future," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 107, pages 1-1.
    19. Joeri Hofmans & Eva Ceulemans & Douglas Steinley & Iven Mechelen, 2015. "On the Added Value of Bootstrap Analysis for K-Means Clustering," Journal of Classification, Springer;The Classification Society, vol. 32(2), pages 268-284, July.
    20. Pfenninger, Stefan, 2017. "Dealing with multiple decades of hourly wind and PV time series in energy models: A comparison of methods to reduce time resolution and the planning implications of inter-annual variability," Applied Energy, Elsevier, vol. 197(C), pages 1-13.
