
Data representativeness problem in credit scoring

Author

  • Josef Ditrich

Abstract

When building models, it is common to split the whole dataset into a development sample and a validation sample. In some cases, using random sampling instead of stratified sampling can cause the resulting samples to lose representativeness. A model built on such data then gives different or unexpected results when its performance is measured on the validation sample. In business practice, a lack of representativeness can cause problems of interpretation and can have a large financial impact when a biased model is used in the credit-granting process. The aim of this paper is to examine and understand why representativeness should be checked before modelling begins. The paper deals with methods for identifying selection bias over time. It recommends using three tests as a routine part of the data preparation process.
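The abstract does not name the three recommended tests, so the following is only a hedged illustration of the underlying idea, not the paper's procedure. The Python sketch contrasts a stratified development/validation split with one simple representativeness check: a Pearson chi-square test (1 degree of freedom) that both samples share the same bad rate. The function names `stratified_split` and `bad_rate_chi2`, the 70/30 split, and the toy portfolio are all hypothetical.

```python
import math
import random
from collections import defaultdict


def stratified_split(records, key, dev_fraction=0.7, seed=0):
    """Split records into development and validation samples.

    Records are grouped by key(record) (e.g. the good/bad flag); each group
    is shuffled and cut separately, so every stratum keeps (almost exactly)
    the same share in both samples. Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    dev, val = [], []
    for label in sorted(strata):
        group = list(strata[label])  # copy so the caller's data is untouched
        rng.shuffle(group)
        cut = round(len(group) * dev_fraction)
        dev.extend(group[:cut])
        val.extend(group[cut:])
    return dev, val


def bad_rate_chi2(dev, val, is_bad):
    """Pearson chi-square test (1 df) of the hypothesis that the development
    and validation samples have the same bad rate. Returns (statistic, p)."""
    table = [
        [sum(1 for r in sample if is_bad(r)), sum(1 for r in sample if not is_bad(r))]
        for sample in (dev, val)
    ]
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("each sample and each class must be non-empty")
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def is_bad(record):
    """Toy records are (id, bad_flag) tuples."""
    return record[1]


if __name__ == "__main__":
    # Hypothetical portfolio: 1,000 applicants, the first 100 of them "bad".
    records = [(i, i < 100) for i in range(1000)]

    dev, val = stratified_split(records, key=is_bad)
    print("stratified:", bad_rate_chi2(dev, val, is_bad))

    # A naive, non-random cut that happens to put every bad case in dev.
    print("naive cut:", bad_rate_chi2(records[:700], records[700:], is_bad))
```

With the stratified split, both samples keep the 10% bad rate, so the statistic is zero and the test gives no evidence of bias. The naive cut produces a very small p-value. A comparable check could be repeated for individual predictors or for separate time periods, but which checks the paper itself recommends is not stated in the abstract.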

Suggested Citation

  • Josef Ditrich, 2015. "Data representativeness problem in credit scoring," Acta Oeconomica Pragensia, Prague University of Economics and Business, vol. 2015(3), pages 3-17.
  • Handle: RePEc:prg:jnlaop:v:2015:y:2015:i:3:id:472:p:3-17
    DOI: 10.18267/j.aop.472

    Download full text from publisher

    File URL: http://aop.vse.cz/doi/10.18267/j.aop.472.html
    Download Restriction: free of charge

    File URL: http://aop.vse.cz/doi/10.18267/j.aop.472.pdf
    Download Restriction: free of charge


    References listed on IDEAS

    1. Donald W. K. Andrews, 2000. "Inconsistency of the Bootstrap when a Parameter Is on the Boundary of the Parameter Space," Econometrica, Econometric Society, vol. 68(2), pages 399-406, March.
    2. Xiao-Li Meng & Xianchao Xie, 2014. "I Got More Data, My Model is More Refined, but My Estimator is Getting Worse! Am I Just Dumb?," Econometric Reviews, Taylor & Francis Journals, vol. 33(1-4), pages 218-250, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Young-Joo Kim & Myung Hwan Seo, 2017. "Is There a Jump in the Transition?," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 35(2), pages 241-249, April.
    2. Khalaf, Lynda & Saphores, Jean-Daniel & Bilodeau, Jean-Francois, 2003. "Simulation-based exact jump tests in models with conditional heteroskedasticity," Journal of Economic Dynamics and Control, Elsevier, vol. 28(3), pages 531-553, December.
    3. Dufour, Jean-Marie, 2006. "Monte Carlo tests with nuisance parameters: A general approach to finite-sample inference and nonstandard asymptotics," Journal of Econometrics, Elsevier, vol. 133(2), pages 443-477, August.
    4. Jean-Thomas Bernard & Ba Chu & Lynda Khalaf & Marcel Voia, 2019. "Non-Standard Confidence Sets for Ratios and Tipping Points with Applications to Dynamic Panel Data," Annals of Economics and Statistics, GENES, issue 134, pages 79-108.
    5. Iglesias Emma M., 2011. "Constrained k-class Estimators in the Presence of Weak Instruments," Studies in Nonlinear Dynamics & Econometrics, De Gruyter, vol. 15(4), pages 1-13, September.
    6. Jiang, Feiyu & Li, Dong & Zhu, Ke, 2020. "Non-standard inference for augmented double autoregressive models with null volatility coefficients," Journal of Econometrics, Elsevier, vol. 215(1), pages 165-183.
    7. Greg Hannsgen, 2011. "Infinite-variance, Alpha-stable Shocks in Monetary SVAR: Final Working Paper Version," Economics Working Paper Archive wp_682, Levy Economics Institute.
    8. Chunlin Wang & Paul Marriott & Pengfei Li, 2022. "A note on the coverage behaviour of bootstrap percentile confidence intervals for constrained parameters," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 85(7), pages 809-831, October.
    9. Ekaterina Oparina & Sorawoot Srisuma, 2022. "Analyzing Subjective Well-Being Data with Misclassification," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 40(2), pages 730-743, April.
    10. Matthew Reimherr & Xiao‐Li Meng & Dan L. Nicolae, 2021. "Prior sample size extensions for assessing prior impact and prior‐likelihood discordance," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(3), pages 413-437, July.
    11. Sauer, J., 2007. "Monotonicity and Curvature – A Bootstrapping Approach," Proceedings “Schriften der Gesellschaft für Wirtschafts- und Sozialwissenschaften des Landbaues e.V.”, German Association of Agricultural Economists (GEWISOLA), vol. 42, March.
    12. Michael Jansson & Demian Pouzo, 2017. "Towards a General Large Sample Theory for Regularized Estimators," Papers 1712.07248, arXiv.org, revised Jul 2020.
    13. Boswijk, H. Peter & Cavaliere, Giuseppe & Georgiev, Iliyan & Rahbek, Anders, 2021. "Bootstrapping non-stationary stochastic volatility," Journal of Econometrics, Elsevier, vol. 224(1), pages 161-180.
    14. Ian W. McKeague & Min Qian, 2015. "An Adaptive Resampling Test for Detecting the Presence of Significant Predictors," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(512), pages 1422-1433, December.
    15. Hounyo, Ulrich & Varneskov, Rasmus T., 2017. "A local stable bootstrap for power variations of pure-jump semimartingales and activity index estimation," Journal of Econometrics, Elsevier, vol. 198(1), pages 10-28.
    16. Andrews, Donald W.K. & Guggenberger, Patrik, 2009. "Validity Of Subsampling And “Plug-In Asymptotic” Inference For Parameters Defined By Moment Inequalities," Econometric Theory, Cambridge University Press, vol. 25(3), pages 669-709, June.
    17. Berkowitz, Daniel & Caner, Mehmet & Fang, Ying, 2008. "Are "Nearly Exogenous Instruments" reliable?," Economics Letters, Elsevier, vol. 101(1), pages 20-23, October.
    18. Frazis, Harley & Loewenstein, Mark A., 2003. "Estimating linear regressions with mismeasured, possibly endogenous, binary explanatory variables," Journal of Econometrics, Elsevier, vol. 117(1), pages 151-178, November.
    19. Cherchye, Laurens & Demuynck, Thomas & Rock, Bram De, 2019. "Bounding counterfactual demand with unobserved heterogeneity and endogenous expenditures," Journal of Econometrics, Elsevier, vol. 211(2), pages 483-506.
    20. Lionel Truquet, 2017. "Parameter stability and semiparametric inference in time varying auto-regressive conditional heteroscedasticity models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(5), pages 1391-1414, November.

    More about this item

    Keywords

    credit scoring; credit risk models; selection bias; random sampling; stratified sampling; data splitting;

    JEL classification:

    • C18 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Methodological Issues: General
    • C80 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - General
    • C83 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Survey Methods; Sampling Methods


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.