Authors
- Chih‐Fong Tsai
- Wei‐Chao Lin
- Yi‐Hsien Chen
Abstract
In financial distress prediction (FDP), data quality is essential to developing effective prediction models. Related studies often apply feature selection to filter out unrepresentative features from a set of financial ratios, or data re‐sampling to re‐balance class‐imbalanced FDP training sets. Although both types of data pre‐processing methods have proven effective, they have rarely been applied together to develop FDP models. Moreover, the performance of various feature selection algorithms, which can be divided into filter, wrapper, and embedded methods, and of data re‐sampling algorithms, which can be divided into under‐sampling, over‐sampling, and hybrid sampling methods, has not been fully investigated in FDP. Therefore, this study compares several feature selection and data re‐sampling methods, employed alone and in combination in different orders. The experimental results based on nine FDP datasets show that executing data re‐sampling alone always outperforms executing feature selection alone when developing FDP models, with hybrid sampling being the better choice. In most cases, better prediction performance is obtained by performing feature selection first and data re‐sampling second. The best combination uses the decision tree method for feature selection and Synthetic Minority Over‐sampling Technique‐Edited Nearest Neighbors (SMOTE‐ENN) for hybrid sampling. This combination allows the random forest classifier to produce the highest prediction accuracy. For the Type I error, on the other hand, where crisis cases are misclassified into the non‐crisis class, the lowest error rate is produced by executing under‐sampling alone using the ClusterCentroids algorithm combined with the random forest classifier.
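The two evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the label encoding (1 for crisis, 0 for non‐crisis), the function names, and the toy labels are assumptions made for illustration, and the Type I error rate is assumed to be computed over the true crisis cases only.

```python
from typing import Sequence

# Assumed label encoding (not stated in the abstract).
CRISIS = 1
NON_CRISIS = 0


def type_i_error_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of true crisis cases that were predicted as non-crisis."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Keep only the predictions made for true crisis cases.
    crisis_preds = [p for t, p in zip(y_true, y_pred) if t == CRISIS]
    if not crisis_preds:
        raise ValueError("y_true contains no crisis cases")
    missed = sum(1 for p in crisis_preds if p == NON_CRISIS)
    return missed / len(crisis_preds)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of all cases whose predicted class matches the true class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("y_true is empty")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy, made-up labels: 4 crisis firms and 6 non-crisis firms.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 1]

print("Type I error rate:", type_i_error_rate(y_true, y_pred))
print("Accuracy:", accuracy(y_true, y_pred))
```

On class‐imbalanced data the two measures can diverge: a model that predicts non‐crisis for every firm scores high accuracy but a Type I error rate of 1, which is consistent with the abstract reporting a different best method for each measure.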
Suggested Citation
Chih‐Fong Tsai & Wei‐Chao Lin & Yi‐Hsien Chen, 2025.
"Data Quality Improvement for Financial Distress Prediction: Feature Selection, Data Re‐Sampling, and Their Combinations in Different Orders,"
Journal of Forecasting, John Wiley & Sons, Ltd., vol. 44(7), pages 2205-2229, November.
Handle: RePEc:wly:jforec:v:44:y:2025:i:7:p:2205-2229
DOI: 10.1002/for.70002