
Variable and threshold selection to control predictive accuracy in logistic regression

Author

Listed:
  • Anthony Y. C. Kuk
  • Jialiang Li
  • A. John Rush

Abstract

Using data collected from the ‘Sequenced treatment alternatives to relieve depression’ study, we use logistic regression to predict whether a patient will respond to treatment on the basis of early symptom change and patient characteristics. Model selection criteria such as the Akaike information criterion (AIC) and mean-squared error of prediction (MSEP) may not be appropriate if the aim is to predict with a high degree of certainty who will respond or not respond to treatment. Towards this aim, we generalize the definition of the positive and negative predictive value curves to the case of multiple predictors. We point out that it is the ordering rather than the precise values of the response probabilities which is important, and we arrive at a unified approach to model selection via two-sample rank tests. To avoid overfitting, we define a cross-validated version of the positive and negative predictive value curves and compare these curves after smoothing for various models. When applied to the study data, we obtain a ranking of models that differs from those based on AIC and MSEP, as well as a tree-based method and regularized logistic regression using a lasso penalty. Our selected model performs consistently well for both 4-week-ahead and 7-week-ahead predictions.
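The idea of cross-validated predictive value curves can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the positive predictive value at quantile v is the response proportion among subjects whose out-of-fold score exceeds the v-th score quantile, and that the negative predictive value is the non-response proportion among the rest. The function names (`fit_logistic`, `cv_scores`, `predictive_value_curves`), the synthetic data, and the pooling of out-of-fold scores are all assumptions made for illustration; the smoothing step and the two-sample rank tests mentioned in the abstract are not reproduced.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Fit a logistic regression by Newton-Raphson (intercept included)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        eta = np.clip(Xd @ beta, -30, 30)           # guard against overflow
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)
        # Tiny ridge term keeps the Hessian invertible (an assumption for stability).
        hess = Xd.T @ (Xd * w[:, None]) + ridge * np.eye(Xd.shape[1])
        beta = beta + np.linalg.solve(hess, Xd.T @ (y - p))
    return beta

def cv_scores(X, y, k=5, seed=0):
    """Out-of-fold linear predictors; only their ordering is used downstream."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % k             # balanced random fold labels
    scores = np.empty(len(y))
    for j in range(k):
        test = folds == j
        beta = fit_logistic(X[~test], y[~test])
        scores[test] = np.column_stack([np.ones(test.sum()), X[test]]) @ beta
    return scores

def predictive_value_curves(scores, y, quantiles):
    """Empirical PPV and NPV at each score quantile v."""
    y_sorted = np.asarray(y)[np.argsort(scores)]
    n = len(y_sorted)
    ppv, npv = [], []
    for v in quantiles:
        cut = min(max(int(np.floor(v * n)), 1), n - 1)  # subjects at or below threshold
        npv.append(1.0 - y_sorted[:cut].mean())     # non-responders among low scores
        ppv.append(y_sorted[cut:].mean())           # responders among high scores
    return np.array(ppv), np.array(npv)

# Synthetic data (hypothetical, not the study data): two predictors.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 2))
prob = 1.0 / (1.0 + np.exp(-(1.5 * X[:, 0] - 1.0 * X[:, 1])))
y = (rng.random(400) < prob).astype(float)

scores = cv_scores(X, y, k=5)
ppv, npv = predictive_value_curves(scores, y, [0.2, 0.5, 0.8])
```

Because the curves depend on the scores only through their ranks, the sketch works with the linear predictor rather than the fitted probability. Competing models would be compared by computing these curves for each candidate set of predictors on the same folds.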

Suggested Citation

  • Anthony Y. C. Kuk & Jialiang Li & A. John Rush, 2014. "Variable and threshold selection to control predictive accuracy in logistic regression," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 63(4), pages 657-672, August.
  • Handle: RePEc:bla:jorssc:v:63:y:2014:i:4:p:657-672

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1111/rssc.2014.63.issue-4
    Download Restriction: Access to full text is restricted to subscribers.


    Citations


    Cited by:

    1. Alessandra Amendola & Francesco Giordano & Maria Lucia Parrella & Marialuisa Restaino, 2017. "Variable selection in high‐dimensional regression: a nonparametric procedure for business failure prediction," Applied Stochastic Models in Business and Industry, John Wiley & Sons, vol. 33(4), pages 355-368, August.
