
Microarray Based Diagnosis Profits from Better Documentation of Gene Expression Signatures

Authors
  • Dennis Kostka
  • Rainer Spang

Abstract

Microarray gene expression signatures hold great promise for improving the diagnosis and prognosis of disease. However, current documentation standards for such signatures do not allow them to be applied unambiguously to study-external patients. This hinders independent evaluation, effectively delaying the use of signatures in clinical practice. Data from eight publicly available clinical microarray studies were analyzed, and the consistency of study-internal with study-external diagnoses was evaluated. Study-external classifications were based on documented information only. Documenting a signature is conceptually different from reporting a list of genes: we show that even an exact quantitative specification of the classification rule does not, on its own, define a signature unambiguously. Discrepancies between study-internal and study-external diagnoses occurred in up to 30% of diagnoses in the worst case, with a median of 18%. Using the proposed documentation by value strategy, which also documents quantitative preprocessing information, reduced the median discrepancy to 1%. Better documentation of signatures can substantially improve the process of evaluating microarray gene expression diagnostic signatures and bringing them to clinical practice, and make it more reliable.

Author Summary

It has been shown that microarray-based gene expression signatures have the potential to be powerful tools for patient stratification, diagnosis of disease, prognosis of survival, assessment of risk group, and selection of treatment. However, documentation standards in current publications do not allow a signature to be applied unambiguously to study-external patients. This hinders independent evaluation, effectively delaying the use of signatures in clinical practice.

Based on eight clinical microarray studies, we show that common documentation standards have the following shortcoming: when only the documented information is used, a patient may receive a diagnosis different from the one they would have received in the original study. To address the problem, we derive a documentation protocol that reduces the ambiguity of diagnoses to a minimum. The resulting gain in consistency between study-internal and study-external diagnoses is validated by statistical resampling analysis: with the proposed documentation by value strategy, the median inconsistency dropped from 18% to 1%. Software implementing the proposed method, as well as practical guidelines for using it, are provided. We conclude that better documentation can substantially improve the process of evaluating microarray gene expression diagnostic signatures and bringing them to clinical practice, and make it more reliable.
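The abstract's central point, that a fully specified classification rule still does not fix the numbers a new patient's array is fed into, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' software: quantile normalization stands in for a cohort-dependent preprocessing step, the linear signature (`diagnose`, `weights`, `threshold`) and the data are simulated, and "documentation by value" is modeled as shipping the stored reference distribution so that a single new array can be normalized against it (`addon_normalize` is a hypothetical name). Discrepancy is measured as in the abstract: the fraction of patients whose two diagnoses disagree.

```python
import numpy as np


def quantile_normalize(X):
    """Jointly quantile-normalize the columns (arrays) of a genes x arrays matrix.

    Returns the normalized matrix and the reference distribution, i.e. the
    mean of the column-wise sorted values.
    """
    reference = np.sort(X, axis=0).mean(axis=1)
    Xn = np.empty(X.shape, dtype=float)
    for j in range(X.shape[1]):
        # The gene with rank r in array j receives the r-th reference value.
        Xn[np.argsort(X[:, j]), j] = reference
    return Xn, reference


def addon_normalize(x, reference):
    """Hypothetical helper: normalize one new array against a stored reference."""
    xn = np.empty(x.shape, dtype=float)
    xn[np.argsort(x)] = reference
    return xn


def diagnose(xn, weights, threshold):
    """Hypothetical linear signature: 1 if the weighted score exceeds the threshold."""
    return int(np.dot(weights, xn) > threshold)


def discrepancy(a, b):
    """Fraction of patients whose two diagnoses disagree."""
    return float(np.mean(np.asarray(a) != np.asarray(b)))


# Simulated log-scale expression data: 200 genes x 40 study patients.
rng = np.random.default_rng(0)
genes, patients = 200, 40
study = rng.normal(loc=8.0, scale=2.0, size=(genes, patients))

# Hypothetical signature: weights on the first 10 genes, median-score threshold.
weights = np.zeros(genes)
weights[:10] = rng.normal(size=10)
study_n, reference = quantile_normalize(study)
threshold = float(np.median(weights @ study_n))

# Study-internal diagnoses, computed on the jointly normalized study data.
internal = [diagnose(study_n[:, j], weights, threshold) for j in range(patients)]

# Documentation by value: the stored reference is shipped with the rule.
by_value = [diagnose(addon_normalize(study[:, j], reference), weights, threshold)
            for j in range(patients)]

# Rule only: each patient is re-normalized together with a different,
# simulated external batch, so the reference distribution changes.
external = rng.normal(loc=9.0, scale=2.5, size=(genes, 10))
rule_only = []
for j in range(patients):
    batch_n, _ = quantile_normalize(np.column_stack([study[:, j], external]))
    rule_only.append(diagnose(batch_n[:, 0], weights, threshold))

print(discrepancy(internal, by_value))  # 0.0: in-study values are reproduced exactly
print(discrepancy(internal, rule_only))  # depends on the simulated external batch
```

Because each array's normalized values depend only on its own gene ranks and the reference distribution, re-applying the stored reference reproduces the in-study values for every study patient, whereas normalizing the same raw array alongside another batch changes them even though the rule itself is unchanged. Real pipelines also include steps such as background correction and probe-level summarization, which this sketch omits.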

Suggested Citation

  • Dennis Kostka & Rainer Spang, 2008. "Microarray Based Diagnosis Profits from Better Documentation of Gene Expression Signatures," PLOS Computational Biology, Public Library of Science, vol. 4(2), pages 1-6, February.
  • Handle: RePEc:plo:pcbi00:0040022
    DOI: 10.1371/journal.pcbi.0040022

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.0040022
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.0040022&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Lee, Jae Won & Lee, Jung Bok & Park, Mira & Song, Seuck Heun, 2005. "An extensive comparison of recent classification tools applied to microarray data," Computational Statistics & Data Analysis, Elsevier, vol. 48(4), pages 869-885, April.
    2. Tibshirani Robert J. & Efron Brad, 2002. "Pre-validation and inference in microarrays," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 1(1), pages 1-20, August.
    3. Andrea H. Bild & Guang Yao & Jeffrey T. Chang & Quanli Wang & Anil Potti & Dawn Chasse & Mary-Beth Joshi & David Harpole & Johnathan M. Lancaster & Andrew Berchuck & John A. Olson & Jeffrey R. Marks &, 2006. "Oncogenic pathway signatures in human cancers as a guide to targeted therapies," Nature, Nature, vol. 439(7074), pages 353-357, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Frénay, Benoît & Doquire, Gauthier & Verleysen, Michel, 2014. "Estimating mutual information for feature selection in the presence of label noise," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 832-848.
    2. Junjie Su & Byung-Jun Yoon & Edward R Dougherty, 2009. "Accurate and Reliable Cancer Classification Based on Probabilistic Inference of Pathway Activity," PLOS ONE, Public Library of Science, vol. 4(12), pages 1-10, December.
    3. Carey K Anders & Chaitanya R Acharya & David S Hsu & Gloria Broadwater & Katherine Garman & John A Foekens & Yi Zhang & Yixin Wang & Kelly Marcom & Jeffrey R Marks & Sayan Mukherjee & Joseph R Nevins , 2008. "Age-Specific Differences in Oncogenic Pathway Deregulation Seen in Human Breast Tumors," PLOS ONE, Public Library of Science, vol. 3(1), pages 1-8, January.
    4. Verena Jabs & Karolina Edlund & Helena König & Marianna Grinberg & Katrin Madjar & Jörg Rahnenführer & Simon Ekman & Michael Bergkvist & Lars Holmberg & Katja Ickstadt & Johan Botling & Jan G Hengstle, 2017. "Integrative analysis of genome-wide gene copy number changes and gene expression in non-small cell lung cancer," PLOS ONE, Public Library of Science, vol. 12(11), pages 1-23, November.
    5. Eun Sung Park & Ju-Seog Lee & Hyun Goo Woo & Fenghuang Zhan & Joanna H Shih & John D Shaughnessy Jr. & J Frederic Mushinski, 2007. "Heterologous Tissue Culture Expression Signature Predicts Human Breast Cancer Prognosis," PLOS ONE, Public Library of Science, vol. 2(1), pages 1-16, January.
    6. Alan R Dabney & John D Storey, 2007. "Optimality Driven Nearest Centroid Classification from Genomic Data," PLOS ONE, Public Library of Science, vol. 2(10), pages 1-7, October.
    7. Dong, Kai & Pang, Herbert & Tong, Tiejun & Genton, Marc G., 2016. "Shrinkage-based diagonal Hotelling’s tests for high-dimensional small sample size data," Journal of Multivariate Analysis, Elsevier, vol. 143(C), pages 127-142.
    8. Lida Qiu & Deyong Kang & Chuan Wang & Wenhui Guo & Fangmeng Fu & Qingxiang Wu & Gangqin Xi & Jiajia He & Liqin Zheng & Qingyuan Zhang & Xiaoxia Liao & Lianhuang Li & Jianxin Chen & Haohua Tu, 2022. "Intratumor graph neural network recovers hidden prognostic value of multi-biomarker spatial heterogeneity," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    9. David Lindgren & Gottfrid Sjödahl & Martin Lauss & Johan Staaf & Gunilla Chebil & Kristina Lövgren & Sigurdur Gudjonsson & Fredrik Liedberg & Oliver Patschan & Wiking Månsson & Mårten Fernö & Mattias , 2012. "Integrated Genomic and Gene Expression Profiling Identifies Two Major Genomic Circuits in Urothelial Carcinoma," PLOS ONE, Public Library of Science, vol. 7(6), pages 1-11, June.
    10. Matthias Weber & Martin Schumacher & Harald Binder, 2014. "Regularized Regression Incorporating Network Information: Simultaneous Estimation of Covariate Coefficients and Connection Signs," Tinbergen Institute Discussion Papers 14-089/I, Tinbergen Institute.
    11. Herbert Pang & Tiejun Tong & Hongyu Zhao, 2009. "Shrinkage-based Diagonal Discriminant Analysis and Its Applications in High-Dimensional Data," Biometrics, The International Biometric Society, vol. 65(4), pages 1021-1029, December.
    12. Shieh Albert D & Hung Yeung Sam, 2009. "Detecting Outlier Samples in Microarray Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-24, February.
    13. Valkenborg Dirk & Van Sanden Suzy & Lin Dan & Kasim Adetayo & Zhu Qi & Haldermans Philippe & Jansen Ivy & Shkedy Ziv & Burzykowski Tomasz, 2008. "A Cross-Validation Study to Select a Classification Procedure for Clinical Diagnosis Based on Proteomic Mass Spectrometry," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 7(2), pages 1-22, March.
    14. Lambert-Lacroix, Sophie & Peyre, Julie, 2006. "Local likelihood regression in generalized linear single-index models with applications to microarray data," Computational Statistics & Data Analysis, Elsevier, vol. 51(3), pages 2091-2113, December.
    15. Hu, Jianwei & Chai, Hao, 2013. "Adjusted regularized estimation in the accelerated failure time model with high dimensional covariates," Journal of Multivariate Analysis, Elsevier, vol. 122(C), pages 96-114.
    16. Jong Victor L. & Novianti Putri W. & Roes Kit C.B. & Eijkemans Marinus J.C., 2014. "Exploring homogeneity of correlation structures of gene expression datasets within and between etiological disease categories," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 13(6), pages 1-16, December.
    17. Xuan Bich Trinh & Wiebren A A Tjalma & Luc Y Dirix & Peter B Vermeulen & Dieter J Peeters & Dimcho Bachvarov & Marie Plante & Els M Berns & Jozien Helleman & Steven J Van Laere & Peter A van Dam, 2011. "Microarray-Based Oncogenic Pathway Profiling in Advanced Serous Papillary Ovarian Carcinoma," PLOS ONE, Public Library of Science, vol. 6(7), pages 1-9, July.
    18. Yang, Tae Young, 2009. "Efficient multi-class cancer diagnosis algorithm, using a global similarity pattern," Computational Statistics & Data Analysis, Elsevier, vol. 53(3), pages 756-765, January.
    19. Lucas Joseph & Carvalho Carlos & West Mike, 2009. "A Bayesian Analysis Strategy for Cross-Study Translation of Gene Expression Biomarkers," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-26, February.
    20. Peter Langfelder & Paul S Mischel & Steve Horvath, 2013. "When Is Hub Gene Selection Better than Standard Meta-Analysis?," PLOS ONE, Public Library of Science, vol. 8(4), pages 1-16, April.
