Printed from https://ideas.repec.org/a/inm/orijds/v4y2025i1p85-99.html

A Reduced Modeling Approach for Making Predictions with Incomplete Data Having Blockwise Missing Patterns

Authors
  • Karthik Srinivasan

    (School of Business, University of Kansas, Lawrence, Kansas 66045)

  • Faiz Currim

    (Department of MIS, Eller College of Management, University of Arizona, Tucson, Arizona 85721)

  • Sudha Ram

    (Department of MIS, Eller College of Management, University of Arizona, Tucson, Arizona 85721)

Abstract

Incomplete data with blockwise missing patterns are commonly encountered in analytics, and solutions typically entail listwise deletion or imputation. However, as the proportion of missing values in input features increases, listwise or columnwise deletion leads to information loss, whereas imputation diminishes the integrity of the training data set. We present the blockwise reduced modeling (BRM) method for analyzing blockwise missing patterns, which adapts and improves on the notion of reduced modeling proposed by Friedman, Kohavi, and Yun in 1996 as lazy decision trees. Whereas the original reduced-modeling approach delays model induction until a prediction is required, our method is significantly faster because it exploits the blockwise missing patterns to pretrain ensemble models that require minimal imputation of data. Models are pretrained over overlapping subsets of an incomplete data set that contain only populated values. During prediction, each test instance is mapped to one of these models based on its feature-missing pattern. BRM can be applied to any supervised learning model for tabular data. We benchmark the predictive performance of BRM using simulations of blockwise missing patterns on three complete data sets from public repositories. Thereafter, we evaluate its utility on three data sets with actual blockwise missing patterns. We demonstrate that BRM outperforms most existing benchmarks in predictive performance for both linear and nonlinear models. It also scales well and is more reliable than existing benchmarks for making predictions with blockwise missing pattern data.
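
The pretrain-then-route idea in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation; it rests on assumptions of our own. One model is trained per distinct observed-feature pattern in the training data, each on every row whose populated features include that pattern. Each test row is then routed to the pretrained pattern with the most features the row can supply, with ties broken arbitrarily. The names `BlockwiseReducedModel`, `observed`, `fit_ols`, and `route` are hypothetical. A plain least-squares fit (with a tiny ridge term) stands in for the base learner, since the abstract says BRM can wrap any supervised model, and it replaces the paper's pretrained ensembles. The paper's exact subset construction and mapping rule may differ.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Sequence, Tuple

Row = Sequence[Optional[float]]  # None marks a missing feature value
Predictor = Callable[[Sequence[float]], float]
Fitter = Callable[[List[List[float]], List[float]], Predictor]


def observed(row: Row) -> FrozenSet[int]:
    """Missingness pattern: indices of the features that are populated."""
    return frozenset(i for i, v in enumerate(row) if v is not None)


def _solve(M: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    A = [list(r) + [bi] for r, bi in zip(M, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= f * A[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][k] * x[k] for k in range(r + 1, n))) / A[r][r]
    return x


def fit_ols(X: List[List[float]], y: List[float], ridge: float = 1e-9) -> Predictor:
    """Hypothetical base learner: least squares with an intercept.
    A tiny ridge on every coefficient keeps the normal equations solvable."""
    A = [[1.0] + list(r) for r in X]
    k = len(A[0])
    M = [[sum(a[i] * a[j] for a in A) + (ridge if i == j else 0.0) for j in range(k)]
         for i in range(k)]
    b = [sum(a[i] * t for a, t in zip(A, y)) for i in range(k)]
    w = _solve(M, b)
    return lambda x: w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


class BlockwiseReducedModel:
    """Hypothetical sketch: one pretrained model per observed-feature pattern."""

    def __init__(self, fit: Fitter = fit_ols) -> None:
        self.fit = fit
        self.models: Dict[FrozenSet[int], Tuple[List[int], Predictor]] = {}

    def train(self, X: List[Row], y: List[float]) -> None:
        for p in {observed(r) for r in X}:
            cols = sorted(p)
            # Overlapping subset: every row whose populated features include p,
            # restricted to the columns in p, so no value is imputed.
            idx = [i for i, r in enumerate(X) if p <= observed(r)]
            Xs = [[X[i][c] for c in cols] for i in idx]
            self.models[p] = (cols, self.fit(Xs, [y[i] for i in idx]))

    def route(self, row: Row) -> FrozenSet[int]:
        """Pick the pretrained pattern with the most features this row supplies."""
        candidates = [p for p in self.models if p <= observed(row)]
        if not candidates:
            raise ValueError("no pretrained model covers this missingness pattern")
        return max(candidates, key=len)

    def predict(self, row: Row) -> float:
        cols, model = self.models[self.route(row)]
        return model([row[c] for c in cols])


# Toy data generated from y = 1 + 2*x0 + 3*x1, with three missingness blocks.
X = [
    [1, 2, None], [2, 1, None], [3, 5, None], [4, 0, None],  # x2 missing
    [0, 1, 7], [5, 2, 1], [1, 3, 2], [2, 2, 5],              # complete rows
    [1, None, 3], [3, None, 0], [0, None, 6],                # x1 missing
]
y = [9, 8, 22, 9, 4, 17, 12, 11, 6, 13, 13]

brm = BlockwiseReducedModel()
brm.train(X, y)
print(brm.predict([2, 3, None]))  # routed to the {0, 1} model, ≈ 14.0
```

In this sketch, each model trains on every row that covers its pattern, so the training subsets overlap, and nothing is imputed at training or prediction time. A test row whose pattern contains no pretrained pattern (here, a row with only x0 observed) raises an error. A practical system would need a fallback for that case.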

Suggested Citation

  • Karthik Srinivasan & Faiz Currim & Sudha Ram, 2025. "A Reduced Modeling Approach for Making Predictions with Incomplete Data Having Blockwise Missing Patterns," INFORMS Journal on Data Science, INFORMS, vol. 4(1), pages 85-99, January.
  • Handle: RePEc:inm:orijds:v:4:y:2025:i:1:p:85-99
    DOI: 10.1287/ijds.2022.9016

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/ijds.2022.9016
    Download Restriction: no

    References listed on IDEAS

    1. Kowarik, Alexander & Templ, Matthias, 2016. "Imputation with the R Package VIM," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 74(i07).
    2. van Buuren, Stef & Groothuis-Oudshoorn, Karin, 2011. "mice: Multivariate Imputation by Chained Equations in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 45(i03).
    3. Daniel W. Apley & Jingyu Zhu, 2020. "Visualizing the effects of predictor variables in black box supervised learning models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 82(4), pages 1059-1086, September.
    4. Joshua A. Salomon & Alex Reinhart & Alyssa Bilinski & Eu Jing Chua & Wichada La Motte-Kerr & Minttu M. Rönn & Marissa B. Reitsma & Katherine A. Morris & Sarah LaRocca & Tamer H. Farag & Frauke Kreuter, 2021. "The US COVID-19 Trends and Impact Survey: Continuous real-time measurement of COVID-19 symptoms, risks, protective behaviors, testing, and vaccination," Proceedings of the National Academy of Sciences, Proceedings of the National Academy of Sciences, vol. 118(51), pages 2111454118-, December.
    5. Guan Yu & Quefeng Li & Dinggang Shen & Yufeng Liu, 2020. "Optimal Sparse Linear Prediction for Block-missing Multi-modality Data Without Imputation," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 115(531), pages 1406-1419, July.
    6. Hansen, Bruce E. & Racine, Jeffrey S., 2012. "Jackknife model averaging," Journal of Econometrics, Elsevier, vol. 167(1), pages 38-46.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Henry Webel & Lili Niu & Annelaura Bach Nielsen & Marie Locard-Paulet & Matthias Mann & Lars Juhl Jensen & Simon Rasmussen, 2024. "Imputation of label-free quantitative mass spectrometry-based proteomics data using self-supervised deep learning," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    2. Nicholas Tierney & Dianne Cook, 2018. "Expanding tidy data principles to facilitate missing data exploration, visualization and assessment of imputations," Monash Econometrics and Business Statistics Working Papers 14/18, Monash University, Department of Econometrics and Business Statistics.
    3. Nengsih Titin Agustin & Bertrand Frédéric & Maumy-Bertrand Myriam & Meyer Nicolas, 2019. "Determining the number of components in PLS regression on incomplete data set," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 18(6), pages 1-28, December.
    4. Adel Bosch & Steven F. Koch, 2021. "Individual and Household Debt: Does Imputation Choice Matter?," Working Papers 202141, University of Pretoria, Department of Economics.
    5. Matthias Templ, 2023. "Enhancing Precision in Large-Scale Data Analysis: An Innovative Robust Imputation Algorithm for Managing Outliers and Missing Values," Mathematics, MDPI, vol. 11(12), pages 1-22, June.
    6. Jacob D. Gardner & Joanna Baker & Chris Venditti & Chris L. Organ, 2025. "Phylogenetically informed predictions outperform predictive equations in real and simulated data," Nature Communications, Nature, vol. 16(1), pages 1-16, December.
    7. Maria Lucia Parrella & Giuseppina Albano & Michele La Rocca & Cira Perna, 2019. "Reconstructing missing data sequences in multivariate time series: an application to environmental data," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 28(2), pages 359-383, June.
    8. Downing, Nicholas Joseph, 2025. "Missing value imputation in environmental, social, and governance data: an impact on emissions scores," Finance Research Letters, Elsevier, vol. 85(PA).
    9. Wan-Lun Wang & Victor Hugo Lachos & Yu-Chien Chen & Tsung-I Lin, 2025. "Flexible clustering via Gaussian parsimonious mixture models with censored and missing values," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 34(2), pages 431-458, June.
    10. Steeven Ye & David Attali & Maria Ghazi & Arnaud Cachia & Mathieu Cassotti & Grégoire Borst, 2026. "Systematic review and meta-analysis of the evidence for an illusory truth effect and its determinants," Nature Communications, Nature, vol. 17(1), pages 1-16, December.
    11. Parashmoni Borah & Suhasini Hazarika & Amit Prakash, 2022. "Assessing the state of homogeneity, variability and trends in the rainfall time series from 1969 to 2017 and its significance for groundwater in north-east India," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 111(1), pages 585-617, March.
    12. Noémi Kreif & Richard Grieve & Iván Díaz & David Harrison, 2015. "Evaluation of the Effect of a Continuous Treatment: A Machine Learning Approach with an Application to Treatment for Traumatic Brain Injury," Health Economics, John Wiley & Sons, Ltd., vol. 24(9), pages 1213-1228, September.
    13. Abhilash Bandam & Eedris Busari & Chloi Syranidou & Jochen Linssen & Detlef Stolten, 2022. "Classification of Building Types in Germany: A Data-Driven Modeling Approach," Data, MDPI, vol. 7(4), pages 1-23, April.
    14. Peng, Qiao & McKillop, Donal & Quinn, Barry & Liu, Kailong, 2025. "Modeling and predicting failure in US credit unions," International Journal of Forecasting, Elsevier, vol. 41(3), pages 1237-1259.
    15. Kitagawa, Toru & Muris, Chris, 2016. "Model averaging in semiparametric estimation of treatment effects," Journal of Econometrics, Elsevier, vol. 193(1), pages 271-289.
    16. Jan R. Magnus & Wendun Wang & Xinyu Zhang, 2016. "Weighted-Average Least Squares Prediction," Econometric Reviews, Taylor & Francis Journals, vol. 35(6), pages 1040-1074, June.
    17. Boonstra Philip S. & Little Roderick J.A. & West Brady T. & Andridge Rebecca R. & Alvarado-Leiton Fernanda, 2021. "A Simulation Study of Diagnostics for Selection Bias," Journal of Official Statistics, Sciendo, vol. 37(3), pages 751-769, September.
    18. Lin Lin & Rachel L Spreng & Kelly E Seaton & S Moses Dennison & Lindsay C Dahora & Daniel J Schuster & Sheetal Sawant & Peter B Gilbert & Youyi Fong & Neville Kisalu & Andrew J Pollard & Georgia D Tom, 2024. "GeM-LR: Discovering predictive biomarkers for small datasets in vaccine studies," PLOS Computational Biology, Public Library of Science, vol. 20(11), pages 1-23, November.
    19. Shi, Pengfei & Zhang, Xinyu & Zhong, Wei, 2024. "Estimating conditional average treatment effects with heteroscedasticity by model averaging and matching," Economics Letters, Elsevier, vol. 238(C).
    20. Ruairi C. Robertson & Thaddeus J. Edens & Lynnea Carr & Kuda Mutasa & Ethan K. Gough & Ceri Evans & Hyun Min Geum & Iman Baharmand & Sandeep K. Gill & Robert Ntozini & Laura E. Smith & Bernard Chasekw, 2023. "The gut microbiome and early-life growth in a population with high prevalence of stunting," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
