
How Magic a Bullet Is Machine Learning for Credit Analysis? An Exploration with FinTech Lending Data

Authors

  • Charles B. Perkins
  • J. Christina Wang

Abstract

FinTech online lending to consumers has grown rapidly in the post-crisis era. As argued by its advocates, one key advantage of FinTech lending is that lenders can predict loan outcomes more accurately by employing complex analytical tools, such as machine learning (ML) methods. This study applies ML methods, in particular random forests and stochastic gradient boosting, to loan-level data from the largest FinTech lender of personal loans to assess the extent to which those methods can produce more accurate out-of-sample predictions of default on future loans relative to standard regression models. To explain loan outcomes, this analysis accounts for the economic conditions faced by a borrower after origination, which are typically absent from other ML studies of default. For the given data, the ML methods indeed improve prediction accuracy, but more so over the near horizon than beyond a year. This study then shows that having more data up to, but not beyond, a certain quantity enhances the predictive accuracy of the ML methods relative to that of parametric models. The likely explanation is that there has been data or model drift over time, so that methods that fit more complex models with more data can in fact suffer greater out-of-sample misses. Prediction accuracy rises, but only marginally, with additional standard credit variables beyond the core set, suggesting that unconventional data need to be sufficiently informative as a whole to help consumers with little or no credit history. This study further explores whether the greater functional flexibility of ML methods yields unequal benefit to consumers with different attributes or who reside in locales with varying economic conditions. It finds that the ML methods produce more favorable ratings for different groups of consumers, although those already deemed less risky seem to benefit more on balance.

Suggested Citation

  • Charles B. Perkins & J. Christina Wang, 2019. "How Magic a Bullet Is Machine Learning for Credit Analysis? An Exploration with FinTech Lending Data," Working Papers 19-16, Federal Reserve Bank of Boston.
  • Handle: RePEc:fip:fedbwp:87410
    DOI: 10.29412/res.wp.2019.16

    Download full text from publisher

    File URL: https://www.bostonfed.org/-/media/Documents/Workingpapers/PDF/2019/wp1916.pdf
    Download Restriction: no



    Citations


    Cited by:

    1. Tom, Daniel M. Ph.D., 2023. "Shapley Values Infidelity," OSF Preprints 2p5d3, Center for Open Science.
    2. Ying Lei Toh, 2023. "Addressing Traditional Credit Scores as a Barrier to Accessing Affordable Credit," Economic Review, Federal Reserve Bank of Kansas City, vol. 0(no. 3), pages 1-22, June.
    3. Buckmann, Marcus & Haldane, Andy & Hüser, Anne-Caroline, 2021. "Comparing minds and machines: implications for financial stability," Bank of England working papers 937, Bank of England.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Susan Athey & Julie Tibshirani & Stefan Wager, 2016. "Generalized Random Forests," Papers 1610.01271, arXiv.org, revised Apr 2018.
    2. Borup, Daniel & Christensen, Bent Jesper & Mühlbach, Nicolaj Søndergaard & Nielsen, Mikkel Slot, 2023. "Targeting predictors in random forest regression," International Journal of Forecasting, Elsevier, vol. 39(2), pages 841-868.
    3. Mark Kattenberg & Bas Scheer & Jurre Thiel, 2023. "Causal forests with fixed effects for treatment effect heterogeneity in difference-in-differences," CPB Discussion Paper 452, CPB Netherlands Bureau for Economic Policy Analysis.
    4. Jean-Louis Combes & Rasmane Ouedraogo & Sampawende J.-A. Tapsoba, 2016. "Structural shifts in aid dependency and fiscal policy in developing countries," Applied Economics, Taylor & Francis Journals, vol. 48(46), pages 4426-4446, October.
    5. Jushan Bai & Sung Hoon Choi & Yuan Liao, 2021. "Feasible generalized least squares for panel data with cross-sectional and serial correlations," Empirical Economics, Springer, vol. 60(1), pages 309-326, January.
    6. Miguel Godinho de Matos & Pedro Ferreira & Michael D. Smith, 2018. "The Effect of Subscription Video-on-Demand on Piracy: Evidence from a Household-Level Randomized Experiment," Management Science, INFORMS, vol. 64(12), pages 5610-5630, December.
    7. Chang Cai & Sandy Dall’Erba, 2021. "On the evaluation of heterogeneous climate change impacts on US agriculture: does group membership matter?," Climatic Change, Springer, vol. 167(1), pages 1-23, July.
    8. Ekaterina Oparina & Caspar Kaiser & Niccolo Gentile & Alexandre Tkatchenko & Andrew E. Clark & Jan-Emmanuel De Neve & Conchita D'Ambrosio, 2022. "Human wellbeing and machine learning," CEP Discussion Papers dp1863, Centre for Economic Performance, LSE.
    9. Vaughan Daniel, 2013. "An Analysis of the Process of Disinflationary Structural Change: The Case of Mexico," Working Papers 2013-12, Banco de México.
    10. Augustine Denteh & Helge Liebert, 2022. "Who Increases Emergency Department Use? New Insights from the Oregon Health Insurance Experiment," Papers 2201.07072, arXiv.org, revised Apr 2023.
    11. Patrick Guillaumont & Laurent Wagner, 2012. "Aid and Growth Accelerations: Vulnerability Matters," Post-Print halshs-00692388, HAL.
    12. De Wachter, Stefan & Tzavalis, Elias, 2012. "Detection of structural breaks in linear dynamic panel data models," Computational Statistics & Data Analysis, Elsevier, vol. 56(11), pages 3020-3034.
    13. Olexiy Kyrychenko, 2021. "The Impact of the Crisis-inducted Reduction in Air Pollution on Infant Mortality in India: A Policy Perspective," CERGE-EI Working Papers wp702, The Center for Economic Research and Graduate Education - Economics Institute, Prague.
    14. Verhagen, Mark D., 2023. "Using machine learning to monitor the equity of large-scale policy interventions: The Dutch decentralisation of the Social Domain," SocArXiv qzm7y, Center for Open Science.
    15. Jorge José Luis Reynoso-González. & Adrián De León Arias., 2021. "Crecimiento económico y gasto público en salud según población objetivo en México. (Economic Growth and Public Spending on Health According to Target Population in Mexico)," Ensayos Revista de Economia, Universidad Autonoma de Nuevo Leon, Facultad de Economia, vol. 0(1), pages 89-114, May.
    16. Athey, Susan & Imbens, Guido W., 2019. "Machine Learning Methods Economists Should Know About," Research Papers 3776, Stanford University, Graduate School of Business.
    17. Richard Bluhm & Denis de Crombrugghe & Adam Szirmai, 0. "Do Weak Institutions Prolong Crises? On the Identification, Characteristics, and Duration of Declines during Economic Slumps," The World Bank Economic Review, World Bank, vol. 34(3), pages 810-832.
    18. Uguccioni, James, 2022. "The long-run effects of parental unemployment in childhood," CLEF Working Paper Series 45, Canadian Labour Economics Forum (CLEF), University of Waterloo.
    19. Cem Işık & Ercan Sirakaya-Turk & Serdar Ongan, 2020. "Testing the efficacy of the economic policy uncertainty index on tourism demand in USMCA: Theory and evidence," Tourism Economics, vol. 26(8), pages 1344-1357, December.
    20. Carlos Serrano-Cinca & Begoña Gutiérrez-Nieto & Luz López-Palacios, 2015. "Determinants of Default in P2P Lending," PLOS ONE, Public Library of Science, vol. 10(10), pages 1-22, October.

    More about this item

    Keywords

    FinTech/marketplace lending; supervised machine learning; default prediction;

    JEL classification:

    • C52 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Model Evaluation, Validation, and Selection
    • C53 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Forecasting and Prediction Models; Simulation Methods
    • C55 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Large Data Sets: Modeling and Analysis
    • G23 - Financial Economics - - Financial Institutions and Services - - - Non-bank Financial Institutions; Financial Instruments; Institutional Investors
