IDEAS home Printed from https://ideas.repec.org/a/plo/pone00/0230594.html

A Zipf-plot based normalization method for high-throughput RNA-seq data

Author

Listed:
  • Bin Wang

Abstract

Normalization is crucial in RNA-seq data analyses. Because the data contain excessive zeros and a large number of small measurements, it is challenging to find reliable parameters for linear rescaling normalization. We propose a Zipf-plot-based normalization method (ZN), which assumes that all gene profiles have similar upper-tail behavior in their expression distributions. The new method uses global information from all genes in the same profile and does not alter gene-level expression. It does not require that the majority of genes be non-differentially expressed (non-DE), and it can be applied to data where the majority of genes are weakly expressed or not expressed at all. Two normalization schemes are implemented in ZN: a linear rescaling scheme and a non-linear transformation scheme. The linear rescaling scheme can be applied alone or together with the non-linear scheme. The performance of ZN is benchmarked against five popular linear normalization methods for RNA-seq data. Results show that the linear rescaling scheme by itself works well and is robust. The non-linear scheme can further improve the normalization outcomes, and it is optional if the Zipf plots show parallel patterns.
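
The abstract does not spell out how the upper tails are aligned, so the following is only a rough Python sketch of the general idea of a Zipf-plot-based linear rescaling, not the author's ZN implementation. It assumes that a Zipf plot is log(expression) against log(rank) for the positive values sorted in decreasing order, and that one rescaling factor per profile can be estimated from the mean vertical gap between the top-ranked points of a profile and a reference profile. The function names (zipf_points, tail_offset, linear_rescale) and the top_k parameter are hypothetical.

```python
import math

def zipf_points(profile, top_k):
    """Return (log rank, log expression) pairs for the top_k largest positive values."""
    positive = sorted((v for v in profile if v > 0), reverse=True)[:top_k]
    return [(math.log(rank), math.log(value))
            for rank, value in enumerate(positive, start=1)]

def tail_offset(profile, reference, top_k):
    """Mean vertical gap (log units) between two upper tails at matching ranks.

    Assumption: averaging the gap over matched ranks is one simple way to
    compare tails; the paper may estimate the shift differently.
    """
    p = zipf_points(profile, top_k)
    r = zipf_points(reference, top_k)
    n = min(len(p), len(r))
    if n == 0:
        raise ValueError("each profile needs at least one positive value")
    return sum(p[i][1] - r[i][1] for i in range(n)) / n

def linear_rescale(profile, reference, top_k=100):
    """Multiply the whole profile by one constant so its upper tail overlays the reference."""
    factor = math.exp(-tail_offset(profile, reference, top_k))
    return [v * factor for v in profile], factor

# Toy data (made up for illustration): the sample is the reference sequenced 3x deeper.
reference = [0, 0, 1, 2, 5, 40, 300, 1200]
sample = [3 * v for v in reference]
scaled, factor = linear_rescale(sample, reference, top_k=4)
print(factor)
```

Only the largest values enter the estimate, so zeros and small counts do not influence the factor, and a single constant is applied to every gene in the profile. If the two Zipf plots are parallel in the upper tail, a constant shift of this kind is enough to overlay them; non-parallel tails would call for something beyond a single factor.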

Suggested Citation

  • Bin Wang, 2020. "A Zipf-plot based normalization method for high-throughput RNA-seq data," PLOS ONE, Public Library of Science, vol. 15(4), pages 1-15, April.
  • Handle: RePEc:plo:pone00:0230594
    DOI: 10.1371/journal.pone.0230594

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0230594
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0230594&type=printable
    Download Restriction: no


    Citations

    Cited by:

    1. Chen Ge & Shu-Guang Zhang & Bin Wang, 2020. "Modeling the joint distribution of firm size and firm age based on grouped data," PLOS ONE, Public Library of Science, vol. 15(7), pages 1-19, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Pengfei Song & Wen Qin & YanGan Huang & Lei Wang & Zhenyuan Cai & Tongzuo Zhang, 2020. "Grazing Management Influences Gut Microbial Diversity of Livestock in the Same Area," Sustainability, MDPI, vol. 12(10), pages 1-12, May.
    2. Shilan Li & Jianxin Shi & Paul Albert & Hong-Bin Fang, 2022. "Dependence Structure Analysis and Its Application in Human Microbiome," Mathematics, MDPI, vol. 11(1), pages 1-14, December.
    3. Sora Yoon & Seon-Young Kim & Dougu Nam, 2016. "Improving Gene-Set Enrichment Analysis of RNA-Seq Data with Small Replicates," PLOS ONE, Public Library of Science, vol. 11(11), pages 1-16, November.
    4. Allison G. White & George S. Watts & Zhenqiang Lu & Maria M. Meza-Montenegro & Eric A. Lutz & Philip Harber & Jefferey L. Burgess, 2014. "Environmental Arsenic Exposure and Microbiota in Induced Sputum," IJERPH, MDPI, vol. 11(2), pages 1-15, February.
    5. Yong Li & Jiejie Zhang & Jianqiang Zhang & Wenlai Xu & Zishen Mou, 2019. "Microbial Community Structure in the Sediments and Its Relation to Environmental Factors in Eutrophicated Sancha Lake," IJERPH, MDPI, vol. 16(11), pages 1-15, May.
    6. Monica Vera-Lise Tulstrup & Ellen Gerd Christensen & Vera Carvalho & Caroline Linninge & Siv Ahrné & Ole Højberg & Tine Rask Licht & Martin Iain Bahl, 2015. "Antibiotic Treatment Affects Intestinal Permeability and Gut Microbial Composition in Wistar Rats Dependent on Antibiotic Class," PLOS ONE, Public Library of Science, vol. 10(12), pages 1-17, December.
    7. Zhenqiu Liu & Dechang Chen & Li Sheng & Amy Y Liu, 2013. "Class Prediction and Feature Selection with Linear Optimization for Metagenomic Count Data," PLOS ONE, Public Library of Science, vol. 8(3), pages 1-7, March.
    8. Pingting Ying & Can Chen & Zequn Lu & Shuoni Chen & Ming Zhang & Yimin Cai & Fuwei Zhang & Jinyu Huang & Linyun Fan & Caibo Ning & Yanmin Li & Wenzhuo Wang & Hui Geng & Yizhuo Liu & Wen Tian & Zhiyong, 2023. "Genome-wide enhancer-gene regulatory maps link causal variants to target genes underlying human cancer risk," Nature Communications, Nature, vol. 14(1), pages 1-20, December.
    9. Paul J McMurdie & Susan Holmes, 2014. "Waste Not, Want Not: Why Rarefying Microbiome Data Is Inadmissible," PLOS Computational Biology, Public Library of Science, vol. 10(4), pages 1-12, April.
    10. Xiaodong Cai & Juan Andrés Bazerque & Georgios B Giannakis, 2013. "Inference of Gene Regulatory Networks with Sparse Structural Equation Models Exploiting Genetic Perturbations," PLOS Computational Biology, Public Library of Science, vol. 9(5), pages 1-13, May.
    11. Edoardo Pasolli & Duy Tin Truong & Faizan Malik & Levi Waldron & Nicola Segata, 2016. "Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights," PLOS Computational Biology, Public Library of Science, vol. 12(7), pages 1-26, July.
    12. Nicoló Fusi & Oliver Stegle & Neil D Lawrence, 2012. "Joint Modelling of Confounding Factors and Prominent Genetic Regulators Provides Increased Accuracy in Genetical Genomics Studies," PLOS Computational Biology, Public Library of Science, vol. 8(1), pages 1-9, January.
    13. Hongjian Wei & Yongqi Wang & Juming Zhang & Liangfa Ge & Tianzeng Liu, 2022. "Changes in Soil Bacterial Community Structure in Bermudagrass Turf under Short-Term Traffic Stress," Agriculture, MDPI, vol. 12(5), pages 1-18, May.
    14. Jin Hyun Ju & Sushila A Shenoy & Ronald G Crystal & Jason G Mezey, 2017. "An independent component analysis confounding factor correction framework for identifying broad impact expression quantitative trait loci," PLOS Computational Biology, Public Library of Science, vol. 13(5), pages 1-26, May.
    15. Kai Qiu & Huiyi Cai & Xin Wang & Guohua Liu, 2023. "Effects of Peroral Microbiota Transplantation on the Establishment of Intestinal Microorganisms in a Newly-Hatched Chick Model," Agriculture, MDPI, vol. 13(5), pages 1-13, April.
    16. Qi Xu & Xiaoya Yuan & Tiantian Gu & Yang Li & Wangcheng Dai & Xiaokun Shen & Yadong Song & Yang Zhang & Wenming Zhao & Guobin Chang & Guohong Chen, 2017. "Comparative characterization of bacterial communities in geese fed all-grass or high-grain diets," PLOS ONE, Public Library of Science, vol. 12(10), pages 1-14, October.
    17. Faisal Shahla & Tutz Gerhard, 2017. "Missing value imputation for gene expression data by tailored nearest neighbors," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 16(2), pages 95-106, April.
    18. Amirhossein Shamsaddini & Kimia Dadkhah & Patrick M Gillevet, 2020. "BiomMiner: An advanced exploratory microbiome analysis and visualization pipeline," PLOS ONE, Public Library of Science, vol. 15(6), pages 1-13, June.
    19. Gregor Gorkiewicz & Gerhard G Thallinger & Slave Trajanoski & Stefan Lackner & Gernot Stocker & Thomas Hinterleitner & Christian Gülly & Christoph Högenauer, 2013. "Alterations in the Colonic Microbiota in Response to Osmotic Diarrhea," PLOS ONE, Public Library of Science, vol. 8(2), pages 1-17, February.
    20. Tang Clara S. & Ferreira Manuel A. R., 2012. "GENOVA: Gene Overlap Analysis of GWAS Results," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(3), pages 1-15, February.
