Authors Listed:
- Veronika Labosova
(The Faculty of Operation and Economics of Transport and Communications, University of Zilina, Univerzitna 1, SK-01026 Zilina, Slovakia)
- Lucia Duricova
(The Faculty of Operation and Economics of Transport and Communications, University of Zilina, Univerzitna 1, SK-01026 Zilina, Slovakia)
- Katarina Kramarova
(The Faculty of Operation and Economics of Transport and Communications, University of Zilina, Univerzitna 1, SK-01026 Zilina, Slovakia)
- Marek Durica
(The Faculty of Operation and Economics of Transport and Communications, University of Zilina, Univerzitna 1, SK-01026 Zilina, Slovakia)
Abstract
Financial distress prediction remains a central topic in corporate finance and risk management, and extensive research has sought to improve classification accuracy through increasingly sophisticated statistical and machine-learning techniques. The influence of data preparation on predictive performance has received comparatively little systematic attention. This study examines how an economically grounded data-preparation process affects the predictive performance of selected statistical and machine-learning models for predicting corporate financial distress. Using selected financial ratios that are generally accepted indicators of corporate financial stability and economic performance, the models are estimated on both raw input data and pre-processed data, where pre-processing involves excluding economically implausible accounting values, treating missing observations, and balancing classes. The study assesses decision tree algorithms (CART, CHAID, and C5.0), artificial neural networks (ANNs), logistic regression (LR), and linear discriminant analysis (DA) using confusion-matrix-based evaluation and a comprehensive set of evaluation measures. The results suggest that input data preparation is a critical factor, significantly improving predictive performance across most of the modelling techniques employed. The most pronounced gains are observed in decision tree models. ANNs also improve markedly after data preparation, whereas LR benefits more moderately and linear DA remains limited despite pre-processing.
Averaged across all six modelling techniques, the accuracy gain (the difference between pre-processed and raw performance for each method) was approximately 15.6 percentage points, and specificity improved by approximately 26.9 percentage points. These gains amount to roughly half the performance variation attributable to algorithm choice, which underscores that data preparation is a primary determinant of model reliability alongside algorithm selection. A step-level analysis further shows that missing-value imputation is the dominant driver of improvement for tree-based models, while class balancing contributes most for ANNs and logistic regression. The findings highlight that reliable financial distress prediction depends not only on technique selection but also on the consistency and economic plausibility of the input data, underscoring the central role of structured data preparation in developing robust early-warning models.
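The averaging described in the abstract (per-method difference between pre-processed and raw performance, then the mean across methods) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `ConfusionMatrix` class and the `average_gain_pp` helper are hypothetical names, the confusion-matrix counts are invented and are not the paper's data, only two of the six methods are shown, and distressed firms are assumed to be the positive class, so specificity measures correctly classified non-distressed firms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier; 'positive' is assumed to mean distressed."""
    tp: int  # distressed firms correctly flagged
    fn: int  # distressed firms missed
    fp: int  # healthy firms wrongly flagged
    tn: int  # healthy firms correctly cleared

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def average_gain_pp(raw: dict, prepared: dict, metric: str) -> float:
    """Mean over methods of (prepared - raw) for one metric, in percentage points."""
    gains = [
        (getattr(prepared[name], metric) - getattr(raw[name], metric)) * 100
        for name in raw
    ]
    return sum(gains) / len(gains)


# Invented counts for illustration only; NOT results from the study.
raw = {
    "CART": ConfusionMatrix(tp=80, fn=20, fp=60, tn=40),
    "LR": ConfusionMatrix(tp=70, fn=30, fp=50, tn=50),
}
prepared = {
    "CART": ConfusionMatrix(tp=85, fn=15, fp=20, tn=80),
    "LR": ConfusionMatrix(tp=75, fn=25, fp=40, tn=60),
}

for metric in ("accuracy", "specificity"):
    gain = average_gain_pp(raw, prepared, metric)
    print(f"average {metric} gain: {gain:.1f} percentage points")
```

With these invented counts the per-method accuracy gains differ considerably, which is why the abstract reports both the cross-method average and the method-specific pattern.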
Suggested Citation
Veronika Labosova & Lucia Duricova & Katarina Kramarova & Marek Durica, 2026.
"Garbage In, Garbage Out? The Impact of Data Quality on the Performance of Financial Distress Prediction Models,"
Forecasting, MDPI, vol. 8(3), pages 1-42, April.
Handle:
RePEc:gam:jforec:v:8:y:2026:i:3:p:35-:d:1925893