IDEAS home Printed from https://ideas.repec.org/a/eme/jmlcpp/jmlc-05-2021-0049.html
   My bibliography  Save this article

Data quality issues leading to sub optimal machine learning for money laundering models

Author

Listed:
  • Abhishek Gupta
  • Dwijendra Nath Dwivedi
  • Jigar Shah
  • Ashish Jain

Abstract

Purpose - Good quality input data is critical to developing a robust machine learning model for identifying possible money laundering transactions. McKinsey, during one of the conferences of ACAMS, attributed data quality as one of the reasons for struggling artificial intelligence use cases in compliance to data. There were often use concerns raised on data quality of predictors such as wrong transaction codes, industry classification, etc. However, there has not been much discussion on the most critical variable of machine learning, the definition of an event, i.e. the date on which the suspicious activity reports (SAR) is filed. Design/methodology/approach - The team analyzed the transaction behavior of four major banks spread across Asia and Europe. Based on the findings, the team created a synthetic database comprising 2,000 SAR customers mimicking the time of investigation and case closure. In this paper, the authors focused on one very specific area of data quality, the definition of an event, i.e. the SAR/suspicious transaction report. Findings - The analysis of few of the banks in Asia and Europe suggests that this itself can improve the effectiveness of model and reduce the prediction span, i.e. the time lag between money laundering transaction done and prediction of money laundering as an alert for investigation Research limitations/implications - The analysis was done with existing experience of all situations where the time duration between alert and case closure is high (anywhere between 15 days till 10 months). Team could not quantify the impact of this finding due to lack of such actual case observed so far. Originality/value - The key finding from paper suggests that the money launderers typically either increase their level of activity or reduce their activity in the recent quarter. This is not true in terms of real behavior. They typically show a spike in activity through various means during money laundering. This in turn impacts the quality of insights that the model should be trained on. The authors believe that once the financial institutions start speeding up investigations on high risk cases, the scatter plot of SAR behavior will change significantly and will lead to better capture of money laundering behavior and a faster and more precise “catch” rate.

Suggested Citation

  • Abhishek Gupta & Dwijendra Nath Dwivedi & Jigar Shah & Ashish Jain, 2021. "Data quality issues leading to sub optimal machine learning for money laundering models," Journal of Money Laundering Control, Emerald Group Publishing Limited, vol. 25(3), pages 551-555, July.
  • Handle: RePEc:eme:jmlcpp:jmlc-05-2021-0049
    DOI: 10.1108/JMLC-05-2021-0049
    as

    Download full text from publisher

    File URL: https://www.emerald.com/insight/content/doi/10.1108/JMLC-05-2021-0049/full/html?utm_source=repec&utm_medium=feed&utm_campaign=repec
    Download Restriction: no

    File URL: https://www.emerald.com/insight/content/doi/10.1108/JMLC-05-2021-0049/full/pdf?utm_source=repec&utm_medium=feed&utm_campaign=repec
    Download Restriction: no

    File URL: https://libkey.io/10.1108/JMLC-05-2021-0049?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eme:jmlcpp:jmlc-05-2021-0049. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help adding them by using this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Emerald Support (email available below). General contact details of provider: .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.