
RNA-Seq Count Data Modelling by Grey Relational Analysis and Nonparametric Gaussian Process

Author

Listed:
  • Thanh Nguyen
  • Asim Bhatti
  • Samuel Yang
  • Saeid Nahavandi

Abstract

This paper introduces an approach to classifying RNA-seq read counts using grey relational analysis (GRA) and Bayesian Gaussian process (GP) models. Read counts are transformed to microarray-like data so that normal-based statistical methods can be applied. GRA selects differentially expressed genes by integrating the outcomes of five individual feature selection methods: the two-sample t-test, the entropy test, the Bhattacharyya distance, the Wilcoxon test and the receiver operating characteristic curve. GRA acts as an aggregate filter method, combining the advantages of the individual methods to produce significant feature subsets that are then fed into a nonparametric GP model for classification. The proposed approach is verified on two benchmark real datasets using five-fold cross-validation. Experimental results show that the GRA-based feature selection method and the GP classifier outperform their competing methods. Moreover, the results demonstrate that GRA-GP considerably outperforms the sparse Poisson linear discriminant analysis classifiers, which were introduced specifically for read counts, across different numbers of features. The proposed approach can therefore be implemented effectively in practice for read count data analysis, which is useful in many applications including understanding disease pathogenesis, diagnosis and treatment monitoring at the molecular level.
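
The aggregation step described in the abstract can be illustrated with a minimal sketch of a textbook grey relational grade. This is not the authors' implementation: the abstract does not give their formulas, so the min-max normalisation, the ideal reference value of 1.0, the distinguishing coefficient `zeta = 0.5`, and the function names `grey_relational_grades` and `rank_genes` are all assumptions made for illustration. The input is assumed to be a table of per-gene scores, one column per feature selection method, where a larger score means a more discriminative gene.

```python
from typing import List, Sequence


def grey_relational_grades(
    scores: Sequence[Sequence[float]], zeta: float = 0.5
) -> List[float]:
    """Return one grey relational grade per gene.

    scores[i][j] is the score of gene i under method j (larger is better).
    zeta is the distinguishing coefficient (0.5 is a common default,
    assumed here; the source does not state a value).
    """
    n_genes = len(scores)
    n_methods = len(scores[0])

    # Min-max normalise each method's column to [0, 1] so that methods
    # with different scales become comparable.
    normalised = [[0.0] * n_methods for _ in range(n_genes)]
    for j in range(n_methods):
        column = [row[j] for row in scores]
        lo, hi = min(column), max(column)
        span = hi - lo
        for i in range(n_genes):
            normalised[i][j] = (column[i] - lo) / span if span > 0 else 1.0

    # Deviation of each gene from the ideal reference sequence (all ones).
    deltas = [[1.0 - v for v in row] for row in normalised]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every gene matches the reference exactly.
        return [1.0] * n_genes

    # Grey relational coefficient per (gene, method), averaged per gene.
    grades = []
    for row in deltas:
        coeffs = [(d_min + zeta * d_max) / (d + zeta * d_max) for d in row]
        grades.append(sum(coeffs) / n_methods)
    return grades


def rank_genes(grades: Sequence[float]) -> List[int]:
    """Return gene indices ordered from highest to lowest grade."""
    return sorted(range(len(grades)), key=lambda i: grades[i], reverse=True)


if __name__ == "__main__":
    # Hypothetical scores for three genes under three methods.
    toy_scores = [
        [5.0, 0.9, 3.0],
        [1.0, 0.5, 1.0],
        [3.0, 0.7, 2.0],
    ]
    toy_grades = grey_relational_grades(toy_scores)
    print(rank_genes(toy_grades))
```

In a pipeline like the one the abstract describes, the top-ranked genes from such an aggregate ranking would form the feature subset passed to the classifier. Averaging the coefficients with equal weights is a design choice of this sketch, and a weighted average would be an equally plausible variant.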

Suggested Citation

  • Thanh Nguyen & Asim Bhatti & Samuel Yang & Saeid Nahavandi, 2016. "RNA-Seq Count Data Modelling by Grey Relational Analysis and Nonparametric Gaussian Process," PLOS ONE, Public Library of Science, vol. 11(10), pages 1-18, October.
  • Handle: RePEc:plo:pone00:0164766
    DOI: 10.1371/journal.pone.0164766

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0164766
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0164766&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Faisal Shahla & Tutz Gerhard, 2017. "Missing value imputation for gene expression data by tailored nearest neighbors," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 16(2), pages 95-106, April.
    2. Kensuke Yamaguchi & Kazuyoshi Ishigaki & Akari Suzuki & Yumi Tsuchida & Haruka Tsuchiya & Shuji Sumitomo & Yasuo Nagafuchi & Fuyuki Miya & Tatsuhiko Tsunoda & Hirofumi Shoda & Keishi Fujio & Kazuhiko , 2022. "Splicing QTL analysis focusing on coding sequences reveals mechanisms for disease susceptibility loci," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    3. Alexandra C Nica & Leopold Parts & Daniel Glass & James Nisbet & Amy Barrett & Magdalena Sekowska & Mary Travers & Simon Potter & Elin Grundberg & Kerrin Small & Åsa K Hedman & Veronique Bataille & Jo, 2011. "The Architecture of Gene Regulatory Variation across Multiple Human Tissues: The MuTHER Study," PLOS Genetics, Public Library of Science, vol. 7(2), pages 1-9, February.
    4. Jean Francois Lefebvre & Emilio Vello & Bing Ge & Stephen B Montgomery & Emmanouil T Dermitzakis & Tomi Pastinen & Damian Labuda, 2012. "Genotype-Based Test in Mapping Cis-Regulatory Variants from Allele-Specific Expression Data," PLOS ONE, Public Library of Science, vol. 7(6), pages 1-15, June.
    5. Daria V Zhernakova & Eleonora de Klerk & Harm-Jan Westra & Anastasios Mastrokolias & Shoaib Amini & Yavuz Ariyurek & Rick Jansen & Brenda W Penninx & Jouke J Hottenga & Gonneke Willemsen & Eco J de Ge, 2013. "DeepSAGE Reveals Genetic Variants Associated with Alternative Polyadenylation and Expression of Coding and Non-coding Transcripts," PLOS Genetics, Public Library of Science, vol. 9(6), pages 1-15, June.
    6. Pounds Stanley B. & Gao Cuilan L. & Zhang Hui, 2012. "Empirical Bayesian Selection of Hypothesis Testing Procedures for Analysis of Sequence Count Expression Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(5), pages 1-32, October.
    7. Humberto Contreras-Trujillo & Jiya Eerdeng & Samir Akre & Du Jiang & Jorge Contreras & Basia Gala & Mary C. Vergel-Rodriguez & Yeachan Lee & Aparna Jorapur & Areen Andreasian & Lisa Harton & Charles S, 2021. "Deciphering intratumoral heterogeneity using integrated clonal tracking and single-cell transcriptome analyses," Nature Communications, Nature, vol. 12(1), pages 1-14, December.
    8. Sora Yoon & Seon-Young Kim & Dougu Nam, 2016. "Improving Gene-Set Enrichment Analysis of RNA-Seq Data with Small Replicates," PLOS ONE, Public Library of Science, vol. 11(11), pages 1-16, November.
    9. Cheng-Kai Shiau & Lina Lu & Rachel Kieser & Kazutaka Fukumura & Timothy Pan & Hsiao-Yun Lin & Jie Yang & Eric L. Tong & GaHyun Lee & Yuanqing Yan & Jason T. Huse & Ruli Gao, 2023. "High throughput single cell long-read sequencing analyses of same-cell genotypes and phenotypes in human tumors," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    10. Jinhyun Kim & Sungsik Kim & Huiran Yeom & Seo Woo Song & Kyoungseob Shin & Sangwook Bae & Han Suk Ryu & Ji Young Kim & Ahyoun Choi & Sumin Lee & Taehoon Ryu & Yeongjae Choi & Hamin Kim & Okju Kim & Yu, 2023. "Barcoded multiple displacement amplification for high coverage sequencing in spatial genomics," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    11. Nicolas Jouvin & Pierre Latouche & Charles Bouveyron & Guillaume Bataillon & Alain Livartowski, 2021. "Greedy clustering of count data through a mixture of multinomial PCA," Computational Statistics, Springer, vol. 36(1), pages 1-33, March.
    12. Dong, Kai & Pang, Herbert & Tong, Tiejun & Genton, Marc G., 2016. "Shrinkage-based diagonal Hotelling’s tests for high-dimensional small sample size data," Journal of Multivariate Analysis, Elsevier, vol. 143(C), pages 127-142.
    13. Pingting Ying & Can Chen & Zequn Lu & Shuoni Chen & Ming Zhang & Yimin Cai & Fuwei Zhang & Jinyu Huang & Linyun Fan & Caibo Ning & Yanmin Li & Wenzhuo Wang & Hui Geng & Yizhuo Liu & Wen Tian & Zhiyong, 2023. "Genome-wide enhancer-gene regulatory maps link causal variants to target genes underlying human cancer risk," Nature Communications, Nature, vol. 14(1), pages 1-20, December.
    14. Kyung-Won Hong & Seok Won Jeong & Myungguen Chung & Seong Beom Cho, 2014. "Association between Expression Quantitative Trait Loci and Metabolic Traits in Two Korean Populations," PLOS ONE, Public Library of Science, vol. 9(12), pages 1-13, December.
    15. Armin Rauschenberger & Iuliana Ciocănea-Teodorescu & Marianne A. Jonker & Renée X. Menezes & Mark A. Wiel, 2020. "Sparse classification with paired covariates," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(3), pages 571-588, September.
    16. Barbara E Stranger & Stephen B Montgomery & Antigone S Dimas & Leopold Parts & Oliver Stegle & Catherine E Ingle & Magda Sekowska & George Davey Smith & David Evans & Maria Gutierrez-Arcelus & Alkes P, 2012. "Patterns of Cis Regulatory Variation in Diverse Human Populations," PLOS Genetics, Public Library of Science, vol. 8(4), pages 1-13, April.
    17. Xiaodong Cai & Juan Andrés Bazerque & Georgios B Giannakis, 2013. "Inference of Gene Regulatory Networks with Sparse Structural Equation Models Exploiting Genetic Perturbations," PLOS Computational Biology, Public Library of Science, vol. 9(5), pages 1-13, May.
    18. Nicoló Fusi & Oliver Stegle & Neil D Lawrence, 2012. "Joint Modelling of Confounding Factors and Prominent Genetic Regulators Provides Increased Accuracy in Genetical Genomics Studies," PLOS Computational Biology, Public Library of Science, vol. 8(1), pages 1-9, January.
    19. Bin Wang, 2020. "A Zipf-plot based normalization method for high-throughput RNA-seq data," PLOS ONE, Public Library of Science, vol. 15(4), pages 1-15, April.
    20. Jin Hyun Ju & Sushila A Shenoy & Ronald G Crystal & Jason G Mezey, 2017. "An independent component analysis confounding factor correction framework for identifying broad impact expression quantitative trait loci," PLOS Computational Biology, Public Library of Science, vol. 13(5), pages 1-26, May.
