IDEAS home Printed from https://ideas.repec.org/a/kap/compec/v54y2019i3d10.1007_s10614-018-9864-z.html

Machine Learning and Sampling Scheme: An Empirical Study of Money Laundering Detection

Authors

Listed:
  • Yan Zhang

    (Office of the Comptroller of the Currency)

  • Peter Trubey

    (University of California Santa Cruz)

Abstract

This paper studies the interplay of machine learning and sampling scheme in an empirical analysis of money laundering detection algorithms. Using actual transaction data provided by a U.S. financial institution, we study five major machine learning algorithms including Bayes logistic regression, decision tree, random forest, support vector machine, and artificial neural network. As the incidence of money laundering events is rare, we apply and compare two sampling techniques that increase the relative presence of the events. Our analysis reveals potential advantages of machine learning algorithms in modeling money laundering events. This paper provides insights into the use of machine learning and sampling schemes in money laundering detection specifically, and classification of rare events in general.
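The abstract does not name the two sampling techniques it compares, so the following is only an illustrative sketch of two common ways to raise the relative presence of rare events: random undersampling of non-events and random oversampling of events with replacement. The function names, the `ratio` parameter, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
import random
from collections import Counter


def undersample_majority(rows, labels, ratio=1.0, seed=0):
    """Keep every event (label 1) and draw non-events (label 0) without
    replacement so that the number of non-events is about ratio * events.
    Hypothetical helper, not the authors' procedure."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), round(ratio * len(pos)))
    keep = pos + rng.sample(neg, n_keep)
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


def oversample_minority(rows, labels, ratio=1.0, seed=0):
    """Keep every non-event and draw events with replacement so that the
    number of events is about ratio * non-events.
    Hypothetical helper, not the authors' procedure."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos:
        raise ValueError("no events to resample")
    n_draw = max(len(pos), round(ratio * len(neg)))
    drawn = [rng.choice(pos) for _ in range(n_draw)]
    keep = neg + drawn
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Toy data (made up): 1,000 transactions, 10 of them flagged as events.
rows = list(range(1000))
labels = [1 if i % 100 == 0 else 0 for i in rows]

_, y_under = undersample_majority(rows, labels, ratio=1.0)
_, y_over = oversample_minority(rows, labels, ratio=1.0)
print("original:    ", Counter(labels))
print("undersampled:", Counter(y_under))
print("oversampled: ", Counter(y_over))
```

In a sketch like this, resampling would normally be applied to the training split only, so that evaluation still reflects the true event rate.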

Suggested Citation

  • Yan Zhang & Peter Trubey, 2019. "Machine Learning and Sampling Scheme: An Empirical Study of Money Laundering Detection," Computational Economics, Springer;Society for Computational Economics, vol. 54(3), pages 1043-1063, October.
  • Handle: RePEc:kap:compec:v:54:y:2019:i:3:d:10.1007_s10614-018-9864-z
    DOI: 10.1007/s10614-018-9864-z

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10614-018-9864-z
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Kar Yan Tam & Melody Y. Kiang, 1992. "Managerial Applications of Neural Networks: The Case of Bank Failure Predictions," Management Science, INFORMS, vol. 38(7), pages 926-947, July.
    2. G. V. Kass, 1980. "An Exploratory Technique for Investigating Large Quantities of Categorical Data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 29(2), pages 119-127, June.
    3. Mark Cecchini & Haldun Aytug & Gary J. Koehler & Praveen Pathak, 2010. "Detecting Management Fraud in Public Companies," Management Science, INFORMS, vol. 56(7), pages 1146-1160, July.
    4. Butaru, Florentin & Chen, Qingqing & Clark, Brian & Das, Sanmay & Lo, Andrew W. & Siddique, Akhtar, 2016. "Risk and risk management in the credit card industry," Journal of Banking & Finance, Elsevier, vol. 72(C), pages 218-239.
    5. Khandani, Amir E. & Kim, Adlar J. & Lo, Andrew W., 2010. "Consumer credit-risk models via machine-learning algorithms," Journal of Banking & Finance, Elsevier, vol. 34(11), pages 2767-2787, November.
    6. Altman, Edward I. & Marco, Giancarlo & Varetto, Franco, 1994. "Corporate distress diagnosis: Comparisons using linear discriminant analysis and neural networks (the Italian experience)," Journal of Banking & Finance, Elsevier, vol. 18(3), pages 505-529, May.

    Citations

    Cited by:

    1. Chen, Jian & Katchova, Ani L. & Zhou, Chenxi, 2021. "Agricultural loan delinquency prediction using machine learning methods," International Food and Agribusiness Management Review, International Food and Agribusiness Management Association, vol. 24(5), May.
    2. Königstorfer, Florian & Thalmann, Stefan, 2020. "Applications of Artificial Intelligence in commercial banks – A research agenda for behavioral finance," Journal of Behavioral and Experimental Finance, Elsevier, vol. 27(C).
    3. Petra Posedel Šimović & Davor Horvatic & Edward W. Sun, 2021. "Classifying variety of customer's online engagement for churn prediction with mixed-penalty logistic regression," Papers 2105.07671, arXiv.org, revised Jul 2021.
    4. Abbas Haider & Hui Wang & Bryan Scotney & Glenn Hawe, 2022. "Predictive Market Making via Machine Learning," SN Operations Research Forum, Springer, vol. 3(1), pages 1-21, March.
    5. Petra P. Šimović & Claire Y. T. Chen & Edward W. Sun, 2023. "Classifying the Variety of Customers’ Online Engagement for Churn Prediction with a Mixed-Penalty Logistic Regression," Computational Economics, Springer;Society for Computational Economics, vol. 61(1), pages 451-485, January.
    6. Alonso-Robisco, Andrés & Carbó, José Manuel, 2022. "Can machine learning models save capital for banks? Evidence from a Spanish credit portfolio," International Review of Financial Analysis, Elsevier, vol. 84(C).
    7. Zanin, Luca, 2020. "Combining multiple probability predictions in the presence of class imbalance to discriminate between potential bad and good borrowers in the peer-to-peer lending market," Journal of Behavioral and Experimental Finance, Elsevier, vol. 25(C).
    8. Ajitha Kumari Vijayappan Nair Biju & Ann Susan Thomas & J Thasneem, 2024. "Examining the research taxonomy of artificial intelligence, deep learning & machine learning in the financial sphere—a bibliometric analysis," Quality & Quantity: International Journal of Methodology, Springer, vol. 58(1), pages 849-878, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Arthur Charpentier & Emmanuel Flachaire & Antoine Ly, 2017. "Econométrie et Machine Learning," Papers 1708.06992, arXiv.org, revised Mar 2018.
    2. Wolfgang Karl Härdle & Dedy Dwi Prastyo, 2013. "Default Risk Calculation based on Predictor Selection for the Southeast Asian Industry," SFB 649 Discussion Papers SFB649DP2013-037, Sonderforschungsbereich 649, Humboldt University, Berlin, Germany.
    3. Mark T. Leung & An-Sing Chen, 2005. "Performance evaluation of neural network architectures: the case of predicting foreign exchange correlations," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 24(6), pages 403-420.
    4. Steven Heston & Nitish R. Sinha, 2016. "News versus Sentiment : Predicting Stock Returns from News Stories," Finance and Economics Discussion Series 2016-048, Board of Governors of the Federal Reserve System (U.S.).
    5. Wolfgang Härdle & Yuh-Jye Lee & Dorothea Schäfer & Yi-Ren Yeh, 2009. "Variable selection and oversampling in the use of smooth support vector machines for predicting the default risk of companies," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 28(6), pages 512-534.
    6. Jones, Stewart & Johnstone, David & Wilson, Roy, 2015. "An empirical evaluation of the performance of binary classifiers in the prediction of credit ratings changes," Journal of Banking & Finance, Elsevier, vol. 56(C), pages 72-85.
    7. Anastasios Petropoulos & Vasilis Siakoulis & Evaggelos Stavroulakis & Aristotelis Klamargias, 2019. "A robust machine learning approach for credit risk analysis of large loan level datasets using deep learning and extreme gradient boosting," IFC Bulletins chapters, in: Bank for International Settlements (ed.), Are post-crisis statistical initiatives completed?, volume 49, Bank for International Settlements.
    8. Greta Falavigna, 2008. "Nouveaux instruments d’évaluation pour le risque financier d’entreprise," CERIS Working Paper 200801, CNR-IRCrES Research Institute on Sustainable Economic Growth - Torino (TO) ITALY - former Institute for Economic Research on Firms and Growth - Moncalieri (TO) ITALY.
    9. Anastasios Petropoulos & Vasilis Siakoulis & Evaggelos Stavroulakis & Aristotelis Klamargias, 2019. "A robust machine learning approach for credit risk analysis of large loan-level datasets using deep learning and extreme gradient boosting," IFC Bulletins chapters, in: Bank for International Settlements (ed.), The use of big data analytics and artificial intelligence in central banking, volume 50, Bank for International Settlements.
    10. Ting Sun & Miklos A. Vasarhelyi, 2018. "Predicting credit card delinquencies: An application of deep neural networks," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 25(4), pages 174-189, October.
    11. Irving Fisher Committee, 2019. "The use of big data analytics and artificial intelligence in central banking," IFC Bulletins, Bank for International Settlements, number 50, July.
    12. Hussein A. Abdou & John Pointon, 2011. "Credit Scoring, Statistical Techniques And Evaluation Criteria: A Review Of The Literature," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 18(2-3), pages 59-88, April.
    13. du Jardin, Philippe, 2012. "The influence of variable selection methods on the accuracy of bankruptcy prediction models," MPRA Paper 44383, University Library of Munich, Germany.
    14. Salman Bahoo & Marco Cucculelli & Xhoana Goga & Jasmine Mondolo, 2024. "Artificial intelligence in Finance: a comprehensive review through bibliometric and content analysis," SN Business & Economics, Springer, vol. 4(2), pages 1-46, February.
    15. Carlos Serrano-Cinca, 1997. "Feedforward neural networks in the classification of financial information," The European Journal of Finance, Taylor & Francis Journals, vol. 3(3), pages 183-202.
    16. Pan, Shuiyang & Long, Suwan(Cheng) & Wang, Yiming & Xie, Ying, 2023. "Nonlinear asset pricing in Chinese stock market: A deep learning approach," International Review of Financial Analysis, Elsevier, vol. 87(C).
    17. Angelini, Eliana & di Tollo, Giacomo & Roli, Andrea, 2008. "A neural network approach for credit risk evaluation," The Quarterly Review of Economics and Finance, Elsevier, vol. 48(4), pages 733-755, November.
    18. Su-Han Woo & Min-Su Kwon & Kum Fai Yuen, 2021. "Financial determinants of credit risk in the logistics and shipping industries," Maritime Economics & Logistics, Palgrave Macmillan;International Association of Maritime Economists (IAME), vol. 23(2), pages 268-290, June.
    19. Zhichao Luo & Pingyu Hsu & Ni Xu, 2020. "SME Default Prediction Framework with the Effective Use of External Public Credit Data," Sustainability, MDPI, vol. 12(18), pages 1-18, September.
    20. García-Céspedes, Rubén & Moreno, Manuel, 2022. "The generalized Vasicek credit risk model: A Machine Learning approach," Finance Research Letters, Elsevier, vol. 47(PA).

    More about this item

    Keywords

    Bootstrap; Machine learning; Money laundering; Rare event; Sampling scheme;

    JEL classification:

    • G21 - Financial Economics - - Financial Institutions and Services - - - Banks; Other Depository Institutions; Micro Finance Institutions; Mortgages
    • G28 - Financial Economics - - Financial Institutions and Services - - - Government Policy and Regulation
