Improving Reliability Estimation for Individual Numeric Predictions: A Machine Learning Approach

Author

Listed:
  • Gediminas Adomavicius

    (Department of Information and Decision Sciences, Carlson School of Management, University of Minnesota, Minneapolis, Minnesota 55455)

  • Yaqiong Wang

    (Information Systems & Analytics Department, Leavey School of Business, Santa Clara University, Santa Clara, California 95050)

Abstract

Numerical predictive modeling is widely used across application domains. Although many modeling techniques have been proposed, and a number of aggregate accuracy metrics exist for evaluating the overall performance of predictive models, other important aspects, such as the reliability (or confidence and uncertainty) of individual predictions, have been underexplored. We propose using the estimated absolute prediction error as the indicator of individual prediction reliability. This indicator is intuitive, gives decision makers highly interpretable information, and allows for more precise evaluation of reliability estimation quality. Equally important, it allows reliability estimation itself to be reframed as a canonical numeric prediction problem, which makes the proposed approach general-purpose (i.e., it can work in conjunction with any outcome prediction model), alleviates the need for distributional assumptions, and enables the use of advanced, state-of-the-art machine learning techniques to learn individual prediction reliability patterns directly from data. Extensive experimental results on multiple real-world data sets show that the proposed machine learning-based approach can significantly improve individual prediction reliability estimation as compared with a number of baselines from prior work, especially in more complex predictive scenarios.
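
The reframing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic data, the helper names (`fit_line`, `make_data`), and the use of simple least-squares lines for both models are assumptions made for illustration, whereas the paper applies more advanced machine learning techniques. The sketch fits an outcome model, computes absolute errors on held-out data, and then fits a second numeric predictor whose target is that absolute error.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a predictor function."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def make_data(n, rng):
    """Hypothetical synthetic data: y = 2x plus noise that grows with x."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 0.2 + 0.5 * x) for x in xs]
    return xs, ys


rng = random.Random(0)
x_fit, y_fit = make_data(500, rng)  # used to train the outcome model
x_rel, y_rel = make_data(500, rng)  # held out; used to train the reliability model

# Step 1: any outcome prediction model (a straight line is assumed here).
outcome_model = fit_line(x_fit, y_fit)

# Step 2: absolute prediction errors on data the outcome model has not seen.
abs_err = [abs(y - outcome_model(x)) for x, y in zip(x_rel, y_rel)]

# Step 3: reliability estimation as an ordinary numeric prediction problem,
# with the absolute error as the target.
reliability_model = fit_line(x_rel, abs_err)

for x in (1.0, 9.0):
    print(f"x={x}: prediction={outcome_model(x):.2f}, "
          f"estimated |error|={reliability_model(x):.2f}")
```

Because the reliability target is just another numeric column, either model in this sketch could be swapped for a different regressor without changing the other steps. The absolute errors are computed on held-out data so that they are not biased downward by the outcome model having already seen those points.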

Suggested Citation

  • Gediminas Adomavicius & Yaqiong Wang, 2022. "Improving Reliability Estimation for Individual Numeric Predictions: A Machine Learning Approach," INFORMS Journal on Computing, INFORMS, vol. 34(1), pages 503-521, January.
  • Handle: RePEc:inm:orijoc:v:34:y:2022:i:1:p:503-521
    DOI: 10.1287/ijoc.2020.1019

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/ijoc.2020.1019
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Theo Dijkstra, 2014. "Ridge regression and its degrees of freedom," Quality & Quantity: International Journal of Methodology, Springer, vol. 48(6), pages 3185-3193, November.
    2. Sieds, 2012. "Complete Volume LXVI n.1 2012," RIEDS - Rivista Italiana di Economia, Demografia e Statistica - The Italian Journal of Economic, Demographic and Statistical Studies, SIEDS Societa' Italiana di Economia Demografia e Statistica, vol. 66(1), pages 1-296.
    3. DE CNUDDE, Sofie & MARTENS, David & EVGENIOU, Theodoros & PROVOST, Foster, 2017. "A benchmarking study of classification techniques for behavioral data," Working Papers 2017005, University of Antwerp, Faculty of Business and Economics.
    4. Stefano Marchetti & Maciej Beręsewicz & Nicola Salvati & Marcin Szymkowiak & Łukasz Wawrowski, 2018. "The use of a three‐level M‐quantile model to map poverty at local administrative unit 1 in Poland," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 181(4), pages 1077-1104, October.
    5. Hettihewa, Samanthala & Saha, Shrabani & Zhang, Hanxiong, 2018. "Does an aging population influence stock markets? Evidence from New Zealand," Economic Modelling, Elsevier, vol. 75(C), pages 142-158.
    6. Sascha O. Becker & Luigi Pascali, 2019. "Religion, Division of Labor, and Conflict: Anti-semitism in Germany over 600 Years," American Economic Review, American Economic Association, vol. 109(5), pages 1764-1804, May.
    7. Rajeev D S Raizada & Yune-Sang Lee, 2013. "Smoothness without Smoothing: Why Gaussian Naive Bayes Is Not Naive for Multi-Subject Searchlight Studies," PLOS ONE, Public Library of Science, vol. 8(7), pages 1-10, July.
    8. Mendez, Guillermo & Lohr, Sharon, 2011. "Estimating residual variance in random forest regression," Computational Statistics & Data Analysis, Elsevier, vol. 55(11), pages 2937-2950, November.
    9. Yanagihara, Hirokazu & Satoh, Kenichi, 2010. "An unbiased Cp criterion for multivariate ridge regression," Journal of Multivariate Analysis, Elsevier, vol. 101(5), pages 1226-1238, May.
    10. Yongli Zhang & Xiaotong Shen, 2015. "Adaptive Modeling Procedure Selection by Data Perturbation," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 33(4), pages 541-551, October.
    11. Aletti, Giacomo, 2018. "Generation of discrete random variables in scalable frameworks," Statistics & Probability Letters, Elsevier, vol. 132(C), pages 99-106.
    12. Zhang, Xinyu & Yu, Jihai, 2018. "Spatial weights matrix selection and model averaging for spatial autoregressive models," Journal of Econometrics, Elsevier, vol. 203(1), pages 1-18.
    13. Brighton, Henry, 2020. "Statistical foundations of ecological rationality," Economics - The Open-Access, Open-Assessment E-Journal (2007-2020), Kiel Institute for the World Economy (IfW Kiel), vol. 14, pages 1-32.
    14. Chunming Zhang, 2008. "Prediction Error Estimation Under Bregman Divergence for Non‐Parametric Regression and Classification," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 35(3), pages 496-523, September.
    15. Jonathan Bradley & Noel Cressie & Tao Shi, 2015. "Comparing and selecting spatial predictors using local criteria," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 24(1), pages 1-28, March.
    16. Giessing, Alexander & He, Xuming, 2019. "On the predictive risk in misspecified quantile regression," Journal of Econometrics, Elsevier, vol. 213(1), pages 235-260.
    17. José Luis Preciado Arreola & Daisuke Yagi & Andrew L. Johnson, 2020. "Insights from machine learning for evaluating production function estimators on manufacturing survey data," Journal of Productivity Analysis, Springer, vol. 53(2), pages 181-225, April.
    18. James Younker, 2022. "Calculating Effective Degrees of Freedom for Forecast Combinations and Ensemble Models," Discussion Papers 2022-19, Bank of Canada.
    19. Wang, You-Gan & Hin, Lin-Yee, 2010. "Modeling strategies in longitudinal data analysis: Covariate, variance function and correlation structure selection," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3359-3370, December.
    20. Zhang, Bo & Shen, Xiaotong & Mumford, Sunni L., 2012. "Generalized degrees of freedom and adaptive model selection in linear mixed-effects models," Computational Statistics & Data Analysis, Elsevier, vol. 56(3), pages 574-586.
