
Algebraic Comparison of Partial Lists in Bioinformatics

Authors

  • Giuseppe Jurman
  • Samantha Riccadonna
  • Roberto Visintainer
  • Cesare Furlanello

Abstract

The outcome of a functional genomics pipeline is usually a partial list of genomic features, ranked by their relevance in modelling a biological phenotype through a classification or regression model. Because of resampling protocols or meta-analysis comparisons, a set of alternative feature lists (possibly of different lengths) is often obtained instead of a single list. Here we introduce a method, based on permutations, for studying the variability between lists ("list stability") in the case of lists of unequal length. We provide algorithms evaluating stability for lists embedded in the full feature set or limited to the features occurring in the partial lists. The method is demonstrated by finding and comparing gene profiles on a large prostate cancer dataset, consisting of two cohorts of patients from different countries, for a total of 455 samples.
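The abstract does not state the stability formula, so the Python sketch below only illustrates the general idea of scoring the stability of partial lists of unequal length. It rests on assumptions that are not taken from the paper: each partial list is turned into a rank vector by giving every absent feature the rank just past the end of that list, two lists are compared with the Canberra distance, and stability is summarized as the mean pairwise distance. The `features` argument switches between embedding the lists in a full feature set and restricting them to the union of listed features. The names `embed` and `list_stability` are hypothetical.

```python
from itertools import combinations


def canberra(r1, r2):
    """Canberra distance between two equal-length vectors of positive ranks."""
    return sum(abs(a - b) / (a + b) for a, b in zip(r1, r2))


def embed(partial, features):
    """Turn a partial ranked list into a rank vector indexed by `features`."""
    position = {f: i + 1 for i, f in enumerate(partial)}  # 1-based ranks
    # Assumption: every feature absent from the list shares rank len(partial) + 1.
    absent_rank = len(partial) + 1
    return [position.get(f, absent_rank) for f in features]


def list_stability(lists, features=None):
    """Mean pairwise Canberra distance among partial lists (lower means more stable).

    features=None restricts the comparison to the union of listed features;
    passing a full feature set embeds every list in that larger set instead.
    """
    if len(lists) < 2:
        raise ValueError("need at least two lists")
    union = set().union(*lists)
    if features is None:
        features = sorted(union)
    elif not union <= set(features):
        raise ValueError("some listed features are missing from `features`")
    ranks = [embed(lst, features) for lst in lists]
    pairs = list(combinations(ranks, 2))
    return sum(canberra(a, b) for a, b in pairs) / len(pairs)


runs = [["a", "b", "c"], ["b", "a"]]
print(round(list_stability(runs), 4))                                 # 0.6667
print(round(list_stability(runs, features=["a", "b", "c", "d"]), 4))  # 0.8095
```

With the four-feature set the score rises from about 0.667 to about 0.810 solely because feature `d`, listed by neither run, receives rank 4 in one vector and rank 3 in the other. This sensitivity of a naive absent-rank convention to unequal list lengths is the kind of effect a dedicated treatment of partial lists must handle.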

Suggested Citation

  • Giuseppe Jurman & Samantha Riccadonna & Roberto Visintainer & Cesare Furlanello, 2012. "Algebraic Comparison of Partial Lists in Bioinformatics," PLOS ONE, Public Library of Science, vol. 7(5), pages 1-20, May.
  • Handle: RePEc:plo:pone00:0036540
    DOI: 10.1371/journal.pone.0036540

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0036540
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0036540&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Anne-Claire Haury & Pierre Gestraud & Jean-Philippe Vert, 2011. "The Influence of Feature Selection Methods on Accuracy, Stability and Interpretability of Molecular Signatures," PLOS ONE, Public Library of Science, vol. 6(12), pages 1-12, December.
    2. Peter Hall & Michael G. Schimek, 2012. "Moderate-Deviation-Based Inference for Random Degeneration in Paired Rank Lists," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(498), pages 661-672, June.
    3. Bradley Efron & Robert Tibshirani & John D. Storey & Virginia Tusher, 2001. "Empirical Bayes Analysis of a Microarray Experiment," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1151-1160, December.
    4. John D. Storey, 2002. "A direct approach to false discovery rates," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 64(3), pages 479-498, August.
    5. Shili Lin, 2010. "Space Oriented Rank-Based Data Integration," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-25, April.
    6. Sandrine Dudoit & Jane Fridlyand & Terence P. Speed, 2002. "Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data," Journal of the American Statistical Association, American Statistical Association, vol. 97, pages 77-87, March.
    7. Shili Lin & Jie Ding, 2009. "Integration of Ranked Lists via Cross Entropy Monte Carlo with Applications to mRNA and microRNA Studies," Biometrics, The International Biometric Society, vol. 65(1), pages 9-18, March.
    8. Barbara Di Camillo & Tiziana Sanavia & Matteo Martini & Giuseppe Jurman & Francesco Sambo & Annalisa Barla & Margherita Squillario & Cesare Furlanello & Gianna Toffolo & Claudio Cobelli, 2012. "Effect of Size and Heterogeneity of Samples on Biomarker Discovery: Synthetic and Real Data Assessment," PLOS ONE, Public Library of Science, vol. 7(3), pages 1-8, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Vendula Švendová & Michael G. Schimek, 2017. "A novel method for estimating the common signals for consensus across multiple ranked lists," Computational Statistics & Data Analysis, Elsevier, vol. 115(C), pages 122-135.
    2. Antonio D’Ambrosio & Carmela Iorio & Michele Staiano & Roberta Siciliano, 2019. "Median constrained bucket order rank aggregation," Computational Statistics, Springer, vol. 34(2), pages 787-802, June.
    3. Yongchao Ge & Sandrine Dudoit & Terence Speed, 2003. "Resampling-based multiple testing for microarray data analysis," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer; Sociedad de Estadística e Investigación Operativa, vol. 12(1), pages 1-77, June.
    4. Wen Shi & Xi Chen & Jennifer Shang, 2019. "An Efficient Morris Method-Based Framework for Simulation Factor Screening," INFORMS Journal on Computing, INFORMS, vol. 31(4), pages 745-770, October.
    5. Ahmed Hossain & Joseph Beyene & Andrew R. Willan & Pingzhao Hu, 2009. "A flexible approximate likelihood ratio test for detecting differential expression in microarray data," Computational Statistics & Data Analysis, Elsevier, vol. 53(10), pages 3685-3695, August.
    6. Guro Dørum & Lars Snipen & Margrete Solheim & Solve Saebo, 2011. "Smoothing Gene Expression Data with Network Information Improves Consistency of Regulated Genes," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-26, August.
    7. Joong-Ho Won & Johan Lim & Donghyeon Yu & Byung Soo Kim & Kyunga Kim, 2014. "Monotone false discovery rate," Statistics & Probability Letters, Elsevier, vol. 87(C), pages 86-93.
    8. Xiaoquan Wen, 2017. "Robust Bayesian FDR Control Using Bayes Factors, with Applications to Multi-tissue eQTL Discovery," Statistics in Biosciences, Springer; International Chinese Statistical Association, vol. 9(1), pages 28-49, June.
    9. Patrick Kline & Christopher Walters, 2019. "Audits as Evidence: Experiments, Ensembles, and Enforcement," Institute for Research on Labor and Employment, Working Paper Series qt3z72m9kn, Institute of Industrial Relations, UC Berkeley.
    10. Alejandro Ochoa & John D. Storey & Manuel Llinás & Mona Singh, 2015. "Beyond the E-Value: Stratified Statistics for Protein Domain Prediction," PLOS Computational Biology, Public Library of Science, vol. 11(11), pages 1-21, November.
    11. E. M. Conlon & B. L. Postier & B. A. Methe & K. P. Nevin & D. R. Lovley, 2009. "Hierarchical Bayesian meta-analysis models for cross-platform microarray studies," Journal of Applied Statistics, Taylor & Francis Journals, vol. 36(10), pages 1067-1085.
    12. Grant Izmirlian, 2020. "Strong consistency and asymptotic normality for quantities related to the Benjamini–Hochberg false discovery rate procedure," Statistics & Probability Letters, Elsevier, vol. 160(C).
    13. Xiongzhi Chen, 2019. "Uniformly consistently estimating the proportion of false null hypotheses via Lebesgue–Stieltjes integral equations," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 724-744.
    14. William Cipolli III & Timothy Hanson & Alexander C. McLain, 2016. "Bayesian nonparametric multiple testing," Computational Statistics & Data Analysis, Elsevier, vol. 101(C), pages 64-79.
    15. Wenge Guo & Shyamal Peddada, 2008. "Adaptive Choice of the Number of Bootstrap Samples in Large Scale Multiple Testing," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 7(1), pages 1-21, March.
    16. Alessio Farcomeni, 2006. "More Powerful Control of the False Discovery Rate Under Dependence," Statistical Methods & Applications, Springer; Società Italiana di Statistica, vol. 15(1), pages 43-73, May.
    17. M. Kathleen Kerr, 2003. "Design Considerations for Efficient and Effective Microarray Studies," Biometrics, The International Biometric Society, vol. 59(4), pages 822-828, December.
    18. Ang Li & Rina Foygel Barber, 2017. "Accumulation Tests for FDR Control in Ordered Hypothesis Testing," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(518), pages 837-849, April.
    19. Patrick Kline & Christopher Walters, 2021. "Reasonable Doubt: Experimental Detection of Job‐Level Employment Discrimination," Econometrica, Econometric Society, vol. 89(2), pages 765-792, March.
    20. Nik Tuzov & Frederi Viens, 2011. "Mutual fund performance: false discoveries, bias, and power," Annals of Finance, Springer, vol. 7(2), pages 137-169, May.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.