
A graph-based algorithm for RNA-seq data normalization

Author

Listed:
  • Diem-Trang Tran
  • Aditya Bhaskara
  • Balagurunathan Kuberan
  • Matthew Might

Abstract

RNA sequencing (RNA-seq) has attracted much attention in recent years as a way to characterize and understand biological systems. However, gaining insights from a large number of RNA-seq experiments collectively remains a major challenge because of the normalization problem. Normalization is difficult because of an inherent circularity: RNA-seq data must be normalized before any pattern of differential (or non-differential) expression can be ascertained, yet prior knowledge of non-differential transcripts is crucial to the normalization process. Some methods overcome this problem by assuming that most transcripts are not differentially expressed. However, as RNA-seq profiles become more abundant and heterogeneous, this assumption fails to hold, leading to erroneous normalization. We present a normalization procedure that relies neither on this assumption nor on prior knowledge of the reference transcripts. The algorithm builds a graph from intrinsic correlations among RNA-seq transcripts and seeks a set of densely connected vertices to serve as references. Applied to our synthesized validation data, it recovered the reference transcripts with high precision, resulting in high-quality normalization. On a realistic data set from the ENCODE project, the algorithm gave good results and finished in a reasonable time. These preliminary results suggest that it may be possible to break the long-standing circularity problem in RNA-seq normalization.
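The idea of picking references from a correlation graph can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the correlation measure, the edge rule, or the dense-subgraph search. The sketch assumes Pearson correlation on log-transformed counts, a fixed threshold (`corr_threshold`, a hypothetical parameter), and a standard greedy peeling heuristic for the densest subgraph. The function names are likewise illustrative.

```python
import numpy as np


def find_reference_transcripts(counts, corr_threshold=0.9):
    """Return indices of a densely connected vertex set in a correlation graph.

    counts: array of shape (n_transcripts, n_samples) with raw abundances.
    Assumption (not from the paper): an edge joins two transcripts when the
    Pearson correlation of their log counts is at least corr_threshold.
    Rows that are constant across samples yield NaN correlations and
    therefore get no edges.
    """
    log_counts = np.log1p(counts)
    corr = np.corrcoef(log_counts)
    adj = corr >= corr_threshold
    np.fill_diagonal(adj, False)

    n = adj.shape[0]
    alive = np.ones(n, dtype=bool)
    degree = adj.sum(axis=1).astype(float)  # degree within the alive subgraph
    n_edges = adj.sum() / 2.0
    best_density, best_set = -1.0, alive.copy()

    # Greedy peeling: repeatedly drop the lowest-degree vertex and remember
    # the intermediate subgraph with the highest edges-per-vertex density.
    for remaining in range(n, 0, -1):
        density = n_edges / remaining
        if density > best_density:
            best_density, best_set = density, alive.copy()
        v = int(np.argmin(np.where(alive, degree, np.inf)))
        alive[v] = False
        n_edges -= degree[v]
        degree[adj[v] & alive] -= 1
        degree[v] = 0.0
    return np.flatnonzero(best_set)


def normalize_by_references(counts, ref_idx):
    """Scale each sample so the summed reference abundance is equal across samples."""
    ref_totals = counts[ref_idx].sum(axis=0)
    size_factors = ref_totals / ref_totals.mean()
    return counts / size_factors, size_factors
```

In this sketch, transcripts whose true expression is constant differ across samples only by the per-sample scaling factor, so they correlate strongly with one another and form a dense cluster in the graph. Calling `find_reference_transcripts(counts)` and then `normalize_by_references(counts, refs)` yields normalized abundances and the estimated per-sample size factors.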

Suggested Citation

  • Diem-Trang Tran & Aditya Bhaskara & Balagurunathan Kuberan & Matthew Might, 2020. "A graph-based algorithm for RNA-seq data normalization," PLOS ONE, Public Library of Science, vol. 15(1), pages 1-19, January.
  • Handle: RePEc:plo:pone00:0227760
    DOI: 10.1371/journal.pone.0227760

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0227760
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0227760&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0227760?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yixuan Qiu & Jing Lei & Kathryn Roeder, 2023. "Gradient-based sparse principal component analysis with extensions to online learning," Biometrika, Biometrika Trust, vol. 110(2), pages 339-360.
    2. Ruiz Vargas, E. & Mitchell, D.G.V. & Greening, S.G. & Wahl, L.M., 2014. "Topology of whole-brain functional MRI networks: Improving the truncated scale-free model," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 405(C), pages 151-158.
    3. Yan Guo & Hui Yu & Haocan Song & Jiapeng He & Olufunmilola Oyebamiji & Huining Kang & Jie Ping & Scott Ness & Yu Shyr & Fei Ye, 2021. "MetaGSCA: A tool for meta-analysis of gene set differential coexpression," PLOS Computational Biology, Public Library of Science, vol. 17(5), pages 1-15, May.
    4. Xue Jiang & Han Zhang & Xiongwen Quan & Zhandong Liu & Yanbin Yin, 2017. "Disease-related gene module detection based on a multi-label propagation clustering algorithm," PLOS ONE, Public Library of Science, vol. 12(5), pages 1-17, May.
    5. Mandel, Antoine & Landini, Simone & Gallegati, Mauro & Gintis, Herbert, 2015. "Price dynamics, financial fragility and aggregate volatility," Journal of Economic Dynamics and Control, Elsevier, vol. 51(C), pages 257-277.
    6. Bárbara Andrade Barbosa & Saskia D. Asten & Ji Won Oh & Arantza Farina-Sarasqueta & Joanne Verheij & Frederike Dijk & Hanneke W. M. Laarhoven & Bauke Ylstra & Juan J. Garcia Vallejo & Mark A. Wiel & Y, 2021. "Bayesian log-normal deconvolution for enhanced in silico microdissection of bulk gene expression data," Nature Communications, Nature, vol. 12(1), pages 1-13, December.
    7. Peter Langfelder & Rui Luo & Michael C Oldham & Steve Horvath, 2011. "Is My Network Module Preserved and Reproducible?," PLOS Computational Biology, Public Library of Science, vol. 7(1), pages 1-29, January.
    8. Elva María Novoa-del-Toro & Efrén Mezura-Montes & Matthieu Vignes & Morgane Térézol & Frédérique Magdinier & Laurent Tichit & Anaïs Baudot, 2021. "A multi-objective genetic algorithm to find active modules in multiplex biological networks," PLOS Computational Biology, Public Library of Science, vol. 17(8), pages 1-24, August.
    9. Matias Nehuen Iglesias, 2021. "The Overlooked Insights from Correlation Structures in Economic Geography," Papers in Evolutionary Economic Geography (PEEG) 2105, Utrecht University, Department of Human Geography and Spatial Planning, Group Economic Geography, revised Jan 2021.
    10. Lingxue Zhang & Seyoung Kim, 2014. "Learning Gene Networks under SNP Perturbations Using eQTL Datasets," PLOS Computational Biology, Public Library of Science, vol. 10(2), pages 1-20, February.
    11. Benjamin A Samuels & E David Leonardo & Alex Dranovsky & Amanda Williams & Erik Wong & Addie May I Nesbitt & Richard D McCurdy & Rene Hen & Mark Alter, 2014. "Global State Measures of the Dentate Gyrus Gene Expression System Predict Antidepressant-Sensitive Behaviors," PLOS ONE, Public Library of Science, vol. 9(1), pages 1-10, January.
    12. Tingting Bo & Jie Li & Ganlu Hu & Ge Zhang & Wei Wang & Qian Lv & Shaoling Zhao & Junjie Ma & Meng Qin & Xiaohui Yao & Meiyun Wang & Guang-Zhong Wang & Zheng Wang, 2023. "Brain-wide and cell-specific transcriptomic insights into MRI-derived cortical morphology in macaque monkeys," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    13. Chang Su & Zichun Xu & Xinning Shan & Biao Cai & Hongyu Zhao & Jingfei Zhang, 2023. "Cell-type-specific co-expression inference from single cell RNA-sequencing data," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    14. Sahra Uygun & Cheng Peng & Melissa D Lehti-Shiu & Robert L Last & Shin-Han Shiu, 2016. "Utility and Limitations of Using Gene Expression Data to Identify Functional Associations," PLOS Computational Biology, Public Library of Science, vol. 12(12), pages 1-27, December.
    15. Li, Jie & Wang, Lidan & Zhou, Zhong-Qiang & Zhang, Yongjie, 2021. "Monitoring or tunneling? Information interaction among large shareholders and the crash risk of the stock price," Pacific-Basin Finance Journal, Elsevier, vol. 65(C).
    16. Khang Tsung Fei & Yap Von Bing, 2010. "The Apportionment of Total Genetic Variation by Categorical Analysis of Variance," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-34, January.
    17. Shaoshuo Li & Baixing Chen & Hao Chen & Zhen Hua & Yang Shao & Heng Yin & Jianwei Wang, 2021. "Analysis of potential genetic biomarkers and molecular mechanism of smoking-related postmenopausal osteoporosis using weighted gene co-expression network analysis and machine learning," PLOS ONE, Public Library of Science, vol. 16(9), pages 1-18, September.
    18. Peter Langfelder & Fuying Gao & Nan Wang & David Howland & Seung Kwak & Thomas F Vogt & Jeffrey S Aaronson & Jim Rosinski & Giovanni Coppola & Steve Horvath & X William Yang, 2018. "MicroRNA signatures of endogenous Huntingtin CAG repeat expansion in mice," PLOS ONE, Public Library of Science, vol. 13(1), pages 1-20, January.
    19. Renaud Tissier & Jeanine Houwing-Duistermaat & Mar Rodríguez-Girondo, 2018. "Improving stability of prediction models based on correlated omics data by using network approaches," PLOS ONE, Public Library of Science, vol. 13(2), pages 1-23, February.
    20. Shujuan Zhao & Kedous Y. Mekbib & Martijn A. Ent & Garrett Allington & Andrew Prendergast & Jocelyn E. Chau & Hannah Smith & John Shohfi & Jack Ocken & Daniel Duran & Charuta G. Furey & Le Thi Hao & P, 2023. "Mutation of key signaling regulators of cerebrovascular development in vein of Galen malformations," Nature Communications, Nature, vol. 14(1), pages 1-23, December.
