
Impacts of Predictive Genomic Classifier Performance on Subpopulation-Specific Treatment Effects Assessment


  • Sue-Jane Wang

    (OTS, CDER, US FDA; Center for Drug Evaluation and Research, U.S. Food and Drug Administration)

  • Ming-Chung Li

    (National Cancer Institute, National Institutes of Health)


Abstract We consider three predictive biomarker scenarios (strong, moderate and mild) with varying prevalence. In each scenario, there is no treatment effect in the biomarker-negative (g−) patient subpopulation. Relative to g−, the biomarker-positive (g+) patient subpopulation has a four-fold profound treatment effect in the strongly predictive scenario, a three-fold large treatment effect in the moderately predictive scenario, and a two-fold modest treatment effect in the mildly predictive scenario. In this paper, we focus on a binary endpoint in specifying the treatment effect. A predictive biomarker may be developed using Breiman's (Mach. Learn. 24:123–140, 1996) machine learning voting algorithm via the k-fold cross-validated approach applied by Freidlin et al. (Clin. Cancer Res. 16:691–698, 2010). We consider the development or discovery of a genomic biomarker using microarray gene expression data in randomized controlled trials, and we validate the biomarker's predictive performance in an independent data set. We investigate the classification performance characteristics of a binary genomic composite biomarker (expected to be predictive of treatment effects), including sensitivity, specificity, accuracy, positive predictive value and negative predictive value, as a function of the true prevalence of sensitive patients. In doing so, we report findings based on three representative tuning parameter sets whose choices range from highly rigorous through moderately rigorous to mildly rigorous. We articulate the rationale for the choice of tuning parameter sets. We also study the impact of misclassification by genomic biomarker classifiers on the assessment of treatment effects in the positive and negative patient subpopulations and in all-comer patients.
Through simulation studies, we elucidate approaches to improving sensitivity when a biomarker is highly specific but poorly sensitive, the scenario most likely to lead to an incorrect test conclusion of a significant treatment effect applicable to a specific patient subpopulation or to both the positive and negative subpopulations. We explore when it is beneficial to develop a binary predictive biomarker and conclude that hypothesis test inferences for the g+ subpopulation treatment effect in the dual hypotheses setting (all-comer and g+ alone) cannot be relied upon if the biomarker classifier is highly specific but poorly sensitive, or yields a poor negative predictive value. The converse dual hypotheses (all-comer and g− alone) raise the same concern when the classifier is highly sensitive but poorly specific, or yields a poor positive predictive value. In addition, we compare the predictive performance of a biomarker classifier under direct selection and under selection from a candidate pool; the comparison favors the direct selection approach where biological or mechanistic plausibility can be relied upon. Further research is needed if an accurate classifier is required irrespective of prevalence level.
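
The dependence of predictive values on prevalence, and the way misclassification can blur subpopulation-specific treatment effects, can be illustrated with a short calculation. The following is a minimal sketch and not the authors' simulation: it assumes that "k-fold effect" means the treated-arm response rate in true g+ patients is k times the control response rate, that true g− patients respond at the control rate on either arm, and that the classifier's errors do not depend on treatment arm. The function names and the numeric inputs (control response rate 0.10, sensitivity 0.30, specificity 0.95) are hypothetical, chosen only to show a highly specific but poorly sensitive classifier.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (ppv, npv, accuracy) from Bayes' rule.

    Assumes 0 < prevalence < 1 and a classifier that calls at least
    some patients positive and some negative (non-zero denominators).
    """
    tp = sensitivity * prevalence                 # true positives
    fn = (1 - sensitivity) * prevalence           # false negatives
    tn = specificity * (1 - prevalence)           # true negatives
    fp = (1 - specificity) * (1 - prevalence)     # false positives
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    accuracy = tp + tn
    return ppv, npv, accuracy


def observed_response_rates(sensitivity, specificity, prevalence,
                            p_control, fold):
    """Treated-arm response rates in classifier-defined subgroups.

    Illustrative assumption: true g- patients respond at p_control on
    treatment (no effect); true g+ patients respond at fold * p_control.
    Each classifier-defined subgroup is a mixture of true g+ and true g-.
    """
    ppv, npv, _ = predictive_values(sensitivity, specificity, prevalence)
    p_true_pos = fold * p_control
    rate_classified_pos = ppv * p_true_pos + (1 - ppv) * p_control
    rate_classified_neg = (1 - npv) * p_true_pos + npv * p_control
    return rate_classified_pos, rate_classified_neg


if __name__ == "__main__":
    # Hypothetical inputs: strongly predictive scenario (four-fold effect).
    for prevalence in (0.1, 0.2, 0.3, 0.5):
        ppv, npv, acc = predictive_values(0.30, 0.95, prevalence)
        pos, neg = observed_response_rates(0.30, 0.95, prevalence, 0.10, 4)
        print(f"prev={prevalence:.1f} PPV={ppv:.2f} NPV={npv:.2f} "
              f"acc={acc:.2f} treated rate: classified g+={pos:.3f}, "
              f"classified g-={neg:.3f}")
```

Under these assumptions, the treated-arm response rate in the classifier-negative group exceeds the control rate whenever the negative predictive value is below 1, because missed g+ patients are mixed into that group. This is one way to see why a highly specific but poorly sensitive classifier can suggest a treatment effect in both subpopulations.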

Suggested Citation

  • Sue-Jane Wang & Ming-Chung Li, 2016. "Impacts of Predictive Genomic Classifier Performance on Subpopulation-Specific Treatment Effects Assessment," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 8(1), pages 129-158, June.
  • Handle: RePEc:spr:stabio:v:8:y:2016:i:1:d:10.1007_s12561-013-9092-y
    DOI: 10.1007/s12561-013-9092-y


