Evaluation of Multiple Models to Distinguish Closely Related Forms of Disease Using DNA Microarray Data: an Application to Multiple Myeloma

Author

Listed:
  • Hardin Johanna

    (Pomona College)

  • Waddell Michael

    (University of Wisconsin, Madison)

  • Page C. David

    (University of Wisconsin)

  • Zhan Fenghuang

    (University of Arkansas for Medical Sciences, Little Rock)

  • Barlogie Bart

    (University of Arkansas)

  • Shaughnessy John

    (University of Arkansas for Medical Sciences)

  • Crowley John J

    (Cancer Research And Biostatistics)

Abstract

Motivation: Standard laboratory classification of the plasma cell dyscrasia monoclonal gammopathy of undetermined significance (MGUS) and the overt plasma cell neoplasm multiple myeloma (MM) is quite accurate, yet, for the most part, biologically uninformative. Most, if not all, cancers are caused by inherited or acquired genetic mutations that manifest themselves in altered gene expression patterns in the clonally related cancer cells. Microarray technology allows for qualitative and quantitative measurements of the expression levels of thousands of genes simultaneously, and it has now been used both to classify cancers that are morphologically indistinguishable and to predict response to therapy. It is anticipated that this information can also be used to develop molecular diagnostic models and to provide insight into mechanisms of disease progression, e.g., transition from healthy to benign hyperplasia or conversion of a benign hyperplasia to overt malignancy. However, standard data analysis techniques are not trivial to employ on these large data sets. Methodology designed to handle large data sets (or modified to do so) is needed to access the vital information contained in the genetic samples, which in turn can be used to develop more robust and accurate methods of clinical diagnostics and prognostics.

Results: Here we report on the application of a panel of statistical and data mining methodologies to classify groups of samples based on expression of 12,000 genes derived from a high density oligonucleotide microarray analysis of highly purified plasma cells from newly diagnosed MM, MGUS, and normal healthy donors. The three groups of samples are each tested against each other. The methods are found to be similar in their ability to predict group membership; all do quite well at predicting MM vs. normal and MGUS vs. normal. However, no method appears to be able to distinguish explicitly the genetic mechanisms between MM and MGUS. We believe this might be due to the lack of genetic differences between these two conditions, and may not be due to the failure of the models. We report the prediction errors for each of the models and each of the methods. Additionally, we report ROC curves for the results on group prediction.

Availability: Logistic regression: standard software, available, for example in SAS. Decision trees and boosted trees: C5.0 from www.rulequest.com. SVM: SVM-light is publicly available from svmlight.joachims.org. Naïve Bayes and ensemble of voters are publicly available from www.biostat.wisc.edu/~mwaddell/eov.html. Nearest Shrunken Centroids is publicly available from http://www-stat.stanford.edu/~tibs/PAM.
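As a rough illustration of the evaluation described in the abstract (pairwise comparison of the three groups, summarized by ROC curves), the sketch below enumerates the three pairings and computes an ROC curve and its area from classifier scores. It is a minimal sketch, not the authors' code: the function names `roc_points` and `auc` are hypothetical, and the labels and scores are toy values invented for the example, not results from the paper.

```python
from itertools import combinations

def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points obtained by
    sweeping a decision threshold from the highest score to the lowest.
    labels: 1 for the positive class, 0 for the negative class."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# The three groups are each tested against each other.
groups = ["MM", "MGUS", "normal"]
pairs = list(combinations(groups, 2))

# Toy scores for one hypothetical pairing (assumed values, for illustration only).
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_auc = auc(roc_points(toy_labels, toy_scores))
```

An area near 1 indicates that the scores separate the two groups well, while an area near 0.5 indicates scores that are no better than chance at telling the groups apart.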

Suggested Citation

  • Hardin Johanna & Waddell Michael & Page C. David & Zhan Fenghuang & Barlogie Bart & Shaughnessy John & Crowley John J, 2004. "Evaluation of Multiple Models to Distinguish Closely Related Forms of Disease Using DNA Microarray Data: an Application to Multiple Myeloma," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 3(1), pages 1-24, June.
  • Handle: RePEc:bpj:sagmbi:v:3:y:2004:i:1:n:10
    DOI: 10.2202/1544-6115.1018

    Download full text from publisher

    File URL: https://doi.org/10.2202/1544-6115.1018
    Download Restriction: For access to full text, subscription to the journal or payment for the individual article is required.

