Printed from https://ideas.repec.org/p/arx/papers/2301.01212.html

Assessment of creditworthiness models privacy-preserving training with synthetic data

Author

Listed:
  • Ricardo Muñoz-Cancino
  • Cristián Bravo
  • Sebastián A. Ríos
  • Manuel Graña

Abstract

Credit scoring models are the primary instrument used by financial institutions to manage credit risk. Research on behavioral scoring is scarce because access to data is difficult. Financial institutions must maintain the privacy and security of borrowers' information, which keeps them from collaborating in research initiatives. In this work, we present a methodology that allows us to evaluate the performance of models trained with synthetic data when they are applied to real-world data. Our results show that synthetic data quality deteriorates as the number of attributes increases. However, creditworthiness assessment models trained with synthetic data show a reduction of 3% in AUC and 6% in KS when compared with models trained with real data. These results are significant because they encourage credit risk research based on synthetic data, making it possible to maintain borrowers' privacy and to address problems that until now have been hampered by the limited availability of information.
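
The comparison described in the abstract can be illustrated with a minimal sketch. It assumes two hypothetical sets of scores on the same real hold-out sample, one from a model trained on real data and one from a model trained on synthetic data, and computes AUC and KS for each. The function names, the toy scores, the label convention (1 = default) and the choice of a relative drop are illustrative assumptions; the abstract does not say whether the reported reductions are relative or absolute, and this is not the authors' implementation.

```python
from typing import Sequence


def _split(labels: Sequence[int], scores: Sequence[float]):
    """Separate scores by class; assumes label 1 = default, 0 = non-default."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    return pos, neg


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos, neg = _split(labels, scores)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(labels: Sequence[int], scores: Sequence[float]) -> float:
    """KS as the largest gap between the two classes' empirical score CDFs."""
    pos, neg = _split(labels, scores)
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


def relative_drop(real_metric: float, synthetic_metric: float) -> float:
    """Relative loss of the synthetic-trained model versus the real-trained one (assumed definition)."""
    return (real_metric - synthetic_metric) / real_metric


# Toy real hold-out labels and hypothetical scores from the two models.
y_real = [0, 0, 0, 0, 1, 1, 1, 1]
scores_trained_on_real = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
scores_trained_on_synth = [0.1, 0.2, 0.3, 0.7, 0.4, 0.6, 0.8, 0.9]

auc_real = auc(y_real, scores_trained_on_real)
auc_synth = auc(y_real, scores_trained_on_synth)
ks_real = ks_statistic(y_real, scores_trained_on_real)
ks_synth = ks_statistic(y_real, scores_trained_on_synth)

print(f"AUC drop: {relative_drop(auc_real, auc_synth):.1%}")
print(f"KS drop:  {relative_drop(ks_real, ks_synth):.1%}")
```

Both models are scored on the same real hold-out sample, so any difference in the metrics reflects the training data rather than the evaluation set.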

Suggested Citation

  • Ricardo Muñoz-Cancino & Cristián Bravo & Sebastián A. Ríos & Manuel Graña, 2022. "Assessment of creditworthiness models privacy-preserving training with synthetic data," Papers 2301.01212, arXiv.org.
  • Handle: RePEc:arx:papers:2301.01212

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2301.01212
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Kwanda Sydwell Ngwenduna & Rendani Mbuvha, 2021. "Alleviating Class Imbalance in Actuarial Applications Using Generative Adversarial Networks," Risks, MDPI, vol. 9(3), pages 1-33, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Szymon Kubiak & Tillman Weyde & Oleksandr Galkin & Dan Philps & Ram Gopal, 2023. "Improved Data Generation for Enhanced Asset Allocation: A Synthetic Dataset Approach for the Fixed Income Universe," Papers 2311.16004, arXiv.org.
    2. Solveig Flaig & Gero Junike, 2022. "Scenario Generation for Market Risk Models Using Generative Neural Networks," Risks, MDPI, vol. 10(11), pages 1-28, October.
