
K-Fold Cross-Validation is Superior to Split Sample Validation for Risk Adjustment Models

Authors

  • Randall P. Ellis (Boston University)
  • Pooja G. Mookim

Abstract

This paper examines cross-validation techniques, with a particular focus on assessing the predictive validity of risk adjustment models as commonly estimated. We validate that K-Fold cross-validation is more efficient than a 50-50 split sample and illustrate that overfitting with rich risk adjustment models remains meaningful even in samples of a million observations. A new estimation algorithm is described that efficiently calculates K-Fold cross-validated R-squared and other measures of goodness of fit using only three (XXX verify) passes through the data, and hence can be applied easily on sample sizes in the millions without sorting or relying on repeated split-sample techniques. K-Fold cross-validation results from a large claims dataset are used to calculate the standard deviation and bias of fitted R-squares for different models and sample sizes; the bias in moderately large samples is larger than most researchers would realize. Programs that implement the algorithm in SAS and STATA are presented that can be easily used on any sample.
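As a rough illustration of the kind of calculation the abstract describes, the Python sketch below computes a K-Fold cross-validated R-squared for ordinary least squares from per-fold cross-product matrices, so each fold's coefficients come from subtracting that fold's sums from the totals rather than refitting on the raw data. This is not the authors' SAS or Stata program and makes no claim to match their pass count; the function names `kfold_cv_r2` and `in_sample_r2`, the random fold assignment, the default of 10 folds, and the use of NumPy are all assumptions made for illustration.

```python
import numpy as np


def kfold_cv_r2(X, y, k=10, seed=0):
    """K-Fold cross-validated R^2 for OLS (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    Z = np.column_stack([np.ones(n), X])   # design matrix with intercept
    fold = rng.integers(0, k, size=n)      # random fold labels, no sorting
    p = Z.shape[1]

    # Accumulate Z'Z, Z'y and y'y separately for each fold.
    ztz = np.zeros((k, p, p))
    zty = np.zeros((k, p))
    yty = np.zeros(k)
    for j in range(k):
        mask = fold == j
        Zj, yj = Z[mask], y[mask]
        ztz[j] = Zj.T @ Zj
        zty[j] = Zj.T @ yj
        yty[j] = yj @ yj

    ztz_all, zty_all = ztz.sum(axis=0), zty.sum(axis=0)

    # For each fold, fit on all other folds by subtracting its sums,
    # then get its out-of-fold squared error from the same sums.
    sse = 0.0
    for j in range(k):
        b = np.linalg.solve(ztz_all - ztz[j], zty_all - zty[j])
        sse += yty[j] - 2.0 * (b @ zty[j]) + b @ ztz[j] @ b

    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - sse / sst


def in_sample_r2(X, y):
    """Conventional fitted R^2 from one OLS fit on the full sample."""
    Z = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ b
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
```

The gap `in_sample_r2(X, y) - kfold_cv_r2(X, y)` gives one rough measure of the overfitting bias the abstract refers to; for OLS the cross-validated value cannot exceed the in-sample one, because each held-out residual is at least as large as its fitted counterpart.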

Suggested Citation

  • Randall P. Ellis & Pooja G. Mookim, 2013. "K-Fold Cross-Validation is Superior to Split Sample Validation for Risk Adjustment Models," Boston University - Department of Economics - Working Papers Series wp2013-026, Boston University - Department of Economics.
  • Handle: RePEc:bos:wpaper:wp2013-026

    Download full text from publisher

    File URL: http://www.bu.edu/econ/files/2016/01/Ellis_Mookim_R2paper_20130605.pdf
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jones, A.M, 2010. "Models For Health Care," Health, Econometrics and Data Group (HEDG) Working Papers 10/01, HEDG, c/o Department of Economics, University of York.
    2. Keane, Michael & Stavrunova, Olena, 2016. "Adverse selection, moral hazard and the demand for Medigap insurance," Journal of Econometrics, Elsevier, vol. 190(1), pages 62-78.
    3. Toni Mora & Joan Gil & Antoni Sicras-Mainar, 2015. "The influence of obesity and overweight on medical costs: a panel data perspective," The European Journal of Health Economics, Springer;Deutsche Gesellschaft für Gesundheitsökonomie (DGGÖ), vol. 16(2), pages 161-173, March.
    4. Dunn, Abe, 2016. "Health insurance and the demand for medical care: Instrumental variable estimates using health insurer claims data," Journal of Health Economics, Elsevier, vol. 48(C), pages 74-88.
    5. Kurt Lavetti & Thomas DeLeire & Nicolas R. Ziebarth, 2023. "How do low‐income enrollees in the Affordable Care Act marketplaces respond to cost‐sharing?," Journal of Risk & Insurance, The American Risk and Insurance Association, vol. 90(1), pages 155-183, March.
    6. Amir Marashi & Shima Ghassem Pour & Vincy Li & Chris Rissel & Federico Girosi, 2019. "The association between physical activity and hospital payments for acute admissions in the Australian population aged 45 and over," PLOS ONE, Public Library of Science, vol. 14(6), pages 1-16, June.
    7. Brilleman, Samuel L. & Gravelle, Hugh & Hollinghurst, Sandra & Purdy, Sarah & Salisbury, Chris & Windmeijer, Frank, 2014. "Keep it simple? Predicting primary health care costs with clinical morbidity measures," Journal of Health Economics, Elsevier, vol. 35(C), pages 109-122.
    8. Jones, A. & Lomas, J. & Rice, N., 2014. "Going Beyond the Mean in Healthcare Cost Regressions: a Comparison of Methods for Estimating the Full Conditional Distribution," Health, Econometrics and Data Group (HEDG) Working Papers 14/26, HEDG, c/o Department of Economics, University of York.
    9. Yi Yao & Joan Schmit & Julie Shi, 2019. "Promoting sustainability for micro health insurance: a risk-adjusted subsidy approach for maternal healthcare service," The Geneva Papers on Risk and Insurance - Issues and Practice, Palgrave Macmillan;The Geneva Association, vol. 44(3), pages 382-409, July.
    10. Samuel L Brilleman & Hugh Gravelle & Sandra Hollinghurst & Sarah Purdy & Chris Salisbury & Frank Windmeijer, 2011. "Keep it Simple? Predicting Primary Health Care Costs with Measures of Morbidity and Multimorbidity," Working Papers 072cherp, Centre for Health Economics, University of York.
    11. Borislava Mihaylova & Andrew Briggs & Anthony O'Hagan & Simon G. Thompson, 2011. "Review of statistical methods for analysing healthcare resources and costs," Health Economics, John Wiley & Sons, Ltd., vol. 20(8), pages 897-916, August.
    12. Julie Shi & Yi Yao & Gordon Liu, 2018. "Modeling individual health care expenditures in China: Evidence to assist payment reform in public insurance," Health Economics, John Wiley & Sons, Ltd., vol. 27(12), pages 1945-1962, December.
