
Loss-Based Estimation with Cross-Validation: Applications to Microarray Data Analysis and Motif Finding

Authors

  • Sandrine Dudoit

    (Division of Biostatistics, School of Public Health, University of California, Berkeley)

  • Mark van der Laan

    (Division of Biostatistics, School of Public Health, University of California, Berkeley)

  • Sunduz Keles

    (Division of Biostatistics, School of Public Health, University of California, Berkeley)

  • Annette Molinaro

    (Division of Biostatistics, School of Public Health, University of California, Berkeley)

  • Sandra Sinisi

    (Division of Biostatistics, School of Public Health, University of California, Berkeley)

  • Siew Leng Teng

    (Division of Biostatistics, School of Public Health, University of California, Berkeley)

Abstract

Current statistical inference problems in genomic data analysis involve parameter estimation for high-dimensional multivariate distributions, with typically unknown and intricate correlation patterns among variables. Addressing these inference questions satisfactorily requires: (i) an intensive and thorough search of the parameter space to generate good candidate estimators, (ii) an approach for selecting an optimal estimator among these candidates, and (iii) a method for reliably assessing the performance of the resulting estimator. We propose a unified loss-based methodology for estimator construction, selection, and performance assessment with cross-validation. In this approach, the parameter of interest is defined as the risk minimizer for a suitable loss function and candidate estimators are generated using this (or possibly another) loss function. Cross-validation is applied to select an optimal estimator among the candidates and to assess the overall performance of the resulting estimator. This general estimation framework encompasses a number of problems which have traditionally been treated separately in the statistical literature, including multivariate outcome prediction and density estimation based on either uncensored or censored data. This article provides an overview of the methodology and describes its application to two problems in genomic data analysis: the prediction of biological and clinical outcomes (possibly censored) using microarray gene expression measures and the identification of regulatory motifs (i.e., transcription factor binding sites) in DNA sequences.
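The selection step described in the abstract can be sketched in a few lines of Python: among candidate estimators, choose the one with the smallest cross-validated risk for a chosen loss. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes squared-error loss (whose risk minimizer is the conditional mean E[Y | X]), a hypothetical candidate family of 1-D k-nearest-neighbour regressors, and V-fold cross-validation. The function names `knn_fitter`, `v_fold_indices`, `cv_risk` and `select_by_cv` are hypothetical, as are the simulated data.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# A fitted estimator maps a covariate x to a prediction psi(x).
Predictor = Callable[[float], float]
# A candidate estimator maps training data (xs, ys) to a fitted Predictor.
Fitter = Callable[[Sequence[float], Sequence[float]], Predictor]


def squared_error(y: float, y_hat: float) -> float:
    """Squared-error loss (y - psi(x))^2; its risk minimizer is E[Y | X]."""
    return (y - y_hat) ** 2


def knn_fitter(k: int) -> Fitter:
    """Hypothetical candidate family: 1-D k-nearest-neighbour regression."""
    def fit(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
        pairs = list(zip(xs, ys))

        def predict(x: float) -> float:
            # Average the outcomes of the k training points closest to x.
            nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
            return sum(y for _, y in nearest) / len(nearest)

        return predict
    return fit


def v_fold_indices(n: int, v: int, seed: int = 0) -> List[List[int]]:
    """Randomly partition indices 0..n-1 into v validation folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::v] for i in range(v)]


def cv_risk(fitter: Fitter, xs: Sequence[float], ys: Sequence[float],
            folds: List[List[int]],
            loss: Callable[[float, float], float] = squared_error) -> float:
    """Fit on each training set and average the loss on its validation fold."""
    total = 0.0
    for val in folds:
        val_set = set(val)
        train = [i for i in range(len(xs)) if i not in val_set]
        predictor = fitter([xs[i] for i in train], [ys[i] for i in train])
        total += sum(loss(ys[i], predictor(xs[i])) for i in val)
    # Each observation lies in exactly one validation fold, so divide by n.
    return total / len(xs)


def select_by_cv(candidates: Dict[str, Fitter], xs: Sequence[float],
                 ys: Sequence[float], v: int = 5,
                 seed: int = 0) -> Tuple[str, Dict[str, float], Predictor]:
    """Pick the candidate with the smallest CV risk, then refit it on all data."""
    folds = v_fold_indices(len(xs), v, seed)  # same folds for every candidate
    risks = {name: cv_risk(f, xs, ys, folds) for name, f in candidates.items()}
    best = min(risks, key=risks.get)
    return best, risks, candidates[best](xs, ys)


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [i / 50 for i in range(100)]
    ys = [x * x + rng.gauss(0, 0.1) for x in xs]  # simulated data, illustration only
    candidates = {f"knn_k={k}": knn_fitter(k) for k in (1, 5, 25, 100)}
    best, risks, final = select_by_cv(candidates, xs, ys)
    print("selected:", best)
    for name, r in risks.items():
        print(f"{name}: CV risk {r:.4f}")
```

Every candidate is scored on the same folds, so differences in cross-validated risk reflect the estimators rather than the data split. To assess the performance of the selected estimator, the third ingredient listed in the abstract, one common approach is to wrap `select_by_cv` in an outer cross-validation loop (nested cross-validation). That step is omitted here for brevity.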

Suggested Citation

  • Sandrine Dudoit & Mark van der Laan & Sunduz Keles & Annette Molinaro & Sandra Sinisi & Siew Leng Teng, 2004. "Loss-Based Estimation with Cross-Validation: Applications to Microarray Data Analysis and Motif Finding," U.C. Berkeley Division of Biostatistics Working Paper Series 1136, Berkeley Electronic Press.
  • Handle: RePEc:bep:ucbbio:1136
    Note: oai:bepress.com:ucbbiostat-1136

    Download full text from publisher

    File URL: http://www.bepress.com/cgi/viewcontent.cgi?article=1136&context=ucbbiostat
    Download Restriction: no

    References listed on IDEAS

    1. Keles Sunduz & van der Laan Mark J. & Dudoit Sandrine & Xing Biao & Eisen Michael B., 2003. "Supervised Detection of Regulatory Motifs in DNA Sequences," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 2(1), pages 1-40, August.

    Citations



    Cited by:

    1. Olivier Lopez & Xavier Milhaud & Pierre-Emmanuel Thérond, 2015. "Tree-based censored regression with applications to insurance," Working Papers hal-01141228, HAL.
    2. Laan Mark J. van der & Dudoit Sandrine & Vaart Aad W. van der, 2006. "The cross-validated adaptive epsilon-net estimator," Statistics & Risk Modeling, De Gruyter, vol. 24(3), pages 1-23, December.
    3. Molinaro, Annette M. & Dudoit, Sandrine & van der Laan, Mark J., 2004. "Tree-based multivariate regression and density estimation with right-censored data," Journal of Multivariate Analysis, Elsevier, vol. 90(1), pages 154-177, July.
    4. Olivier Lopez & Xavier Milhaud & Pierre-Emmanuel Thérond, 2016. "Tree-based censored regression with applications in insurance," Post-Print hal-01141228, HAL.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Molinaro, Annette M. & Dudoit, Sandrine & van der Laan, Mark J., 2004. "Tree-based multivariate regression and density estimation with right-censored data," Journal of Multivariate Analysis, Elsevier, vol. 90(1), pages 154-177, July.
