Training and assessing classification rules with unbalanced data
AbstractThe problem of modeling binary responses by using cross-sectional data has been addressed with a number of satisfying solutions that draw on both parametric and nonparametric methods. However, there exist many real situations where one of the two responses (usually the most interesting for the analysis) is rare. It has been largely reported that this class imbalance heavily compromises the process of learning, because the model tends to focus on the prevalent class and to ignore the rare events. However, not only the estimation of the classification model is affected by a skewed distribution of the classes, but also the evaluation of its accuracy is jeopardized, because the scarcity of data leads to poor estimates of the model's accuracy. In this work, the effects of class imbalance on model training and model assessing are discussed. Moreover, a unified and systematic framework for dealing with both the problems is proposed, based on a smoothed bootstrap re-sampling technique.
Download InfoIf you experience problems downloading a file, check if you have the proper application to view it first. In case of further problems read the IDEAS help page. Note that these files are not on the IDEAS site. Please be patient as the files may be large.
Bibliographic InfoPaper provided by DEAMS - Dipartimento di Scienze Economiche, Aziendali, Matematiche e Statistiche "Bruno de Finetti" in its series Working Papers DEAMS with number 2.
Date of creation: Jan 2010
Date of revision:
accuracy; binary classification; bootstrap; kernel density estimation; unbalanced learning;
You can help add them by filling out this form.
reading list or among the top items on IDEAS.Access and download statisticsgeneral information about how to correct material in RePEc.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: (Gianni Perini).
If references are entirely missing, you can add them using this form.