Stacked generalizations in imbalanced fraud data sets using resampling methods

Authors
  • Kathleen R Kerwin
  • Nathaniel D Bastian

Abstract

Predicting fraud is challenging because of inherent issues in the structure of fraud data: the crimes are committed through trickery or deceit, and the modus operandi is a constantly moving target that changes to circumvent human and system controls. Fraud is also a national security challenge, since criminals continually exploit the electronic financial system to defraud consumers and businesses by finding weaknesses in the system, including in audit controls. This study applies stacked generalization with meta (or super) learners in two steps. Step one improves the performance of the base algorithms by minimizing each algorithm's error rate to reduce its bias on the learning set. Step two feeds those results into the meta learner, which produces a stacked, blended output in which the weakest algorithms learn better. A fundamental property of fraud data is that it is inherently not systematic, and an optimal resampling methodology has not yet been identified. Building a test harness for all permutations of algorithm and sample-set pairs demonstrates that the complex, intrinsic data structures are all thoroughly tested. A comparative analysis of fraud data that applies stacked generalization provides useful insight into finding the optimal mathematical formula for imbalanced fraud data sets, which is necessary to improve fraud detection for national security.
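The two-step stacking and the algorithm/sample-set test harness described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic data from make_data, the one-feature threshold "stumps" used as base learners, the lookup-table meta learner, the random over- and undersamplers, and recall on the fraud class as the score.

```python
import random
from collections import Counter, defaultdict
from itertools import product

def make_data(n=400, fraud_rate=0.05, seed=0):
    """Synthetic, illustrative data: fraud rows (label 1) are shifted upward."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < fraud_rate else 0
        X.append([rng.gauss(2.0 * label, 1.0), rng.gauss(1.5 * label, 1.0)])
        y.append(label)
    return X, y

def split_classes(y):
    """Return (minority_indices, majority_indices)."""
    pos = [i for i, v in enumerate(y) if v == 1]
    neg = [i for i, v in enumerate(y) if v == 0]
    return (pos, neg) if len(pos) < len(neg) else (neg, pos)

def no_resample(X, y, rng):
    return X, y

def random_oversample(X, y, rng):
    """Duplicate random minority rows until both classes are the same size."""
    small, large = split_classes(y)
    if not small:
        return X, y
    idx = large + small + [rng.choice(small) for _ in range(len(large) - len(small))]
    return [X[i] for i in idx], [y[i] for i in idx]

def random_undersample(X, y, rng):
    """Keep all minority rows and an equally sized random majority sample."""
    small, large = split_classes(y)
    if not small:
        return X, y
    idx = small + rng.sample(large, len(small))
    return [X[i] for i in idx], [y[i] for i in idx]

def fit_stump(X, y, feature):
    """Base learner: threshold one feature midway between the class means."""
    pos = [row[feature] for row, label in zip(X, y) if label == 1]
    neg = [row[feature] for row, label in zip(X, y) if label == 0]
    if not pos or not neg:  # one class missing: fall back to a constant
        constant = 1 if pos else 0
        return lambda row: constant
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    threshold = (mean_pos + mean_neg) / 2
    direction = 1 if mean_pos >= mean_neg else -1
    return lambda row: int(direction * (row[feature] - threshold) > 0)

def fit_stack(X, y, n_folds=3):
    """Step one: out-of-fold base predictions. Step two: meta learner on them."""
    features = range(len(X[0]))
    oof = [None] * len(X)
    for k in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != k]
        stumps = [fit_stump([X[i] for i in train], [y[i] for i in train], f)
                  for f in features]
        for i in range(k, len(X), n_folds):
            oof[i] = tuple(s(X[i]) for s in stumps)
    # Meta learner: majority label observed for each pattern of base predictions.
    votes = defaultdict(Counter)
    for pattern, label in zip(oof, y):
        votes[pattern][label] += 1
    table = {p: c.most_common(1)[0][0] for p, c in votes.items()}
    final = [fit_stump(X, y, f) for f in features]  # refit bases on all rows
    return lambda row: table.get(tuple(s(row) for s in final), 0)

def recall(model, X, y):
    """Share of true fraud rows that the model flags."""
    hits = sum(1 for row, label in zip(X, y) if label == 1 and model(row) == 1)
    return hits / sum(y) if sum(y) else 0.0

def run_harness(seed=0):
    """Score every (algorithm, resampler) pair on an untouched hold-out set."""
    X, y = make_data(seed=seed)
    cut = len(X) * 3 // 4
    algorithms = {"stump_0": lambda A, b: fit_stump(A, b, 0),
                  "stump_1": lambda A, b: fit_stump(A, b, 1),
                  "stack": fit_stack}
    resamplers = {"none": no_resample, "over": random_oversample,
                  "under": random_undersample}
    results = {}
    for (a, fit), (r, resample) in product(algorithms.items(), resamplers.items()):
        Xr, yr = resample(X[:cut], y[:cut], random.Random(seed))
        results[(a, r)] = recall(fit(Xr, yr), X[cut:], y[cut:])
    return results
```

Calling run_harness() returns a dictionary keyed by (algorithm, resampler) pairs, so the nine combinations can be compared side by side. Only the training portion is resampled; the hold-out rows keep their original class imbalance. One caveat of this sketch is that oversampling before the fold split lets duplicated rows land on both sides of a fold, which can make the meta learner's training signal optimistic.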

Suggested Citation

  • Kathleen R Kerwin & Nathaniel D Bastian, 2021. "Stacked generalizations in imbalanced fraud data sets using resampling methods," The Journal of Defense Modeling and Simulation, vol. 18(3), pages 175-192, July.
  • Handle: RePEc:sae:joudef:v:18:y:2021:i:3:p:175-192
    DOI: 10.1177/1548512920962219

    Download full text from publisher

    File URL: https://journals.sagepub.com/doi/10.1177/1548512920962219
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Songul Cinaroglu, 2020. "Modelling unbalanced catastrophic health expenditure data by using machine‐learning methods," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 27(4), pages 168-181, October.
    2. Lutz Kretschmann, 2020. "Leading indicators and maritime safety: predicting future risk with a machine learning approach," Journal of Shipping and Trade, Springer, vol. 5(1), pages 1-22, December.
    3. Bart Baesens & Sebastiaan Höppner & Irene Ortner & Tim Verdonck, 2021. "robROSE: A robust approach for dealing with imbalanced data in fraud detection," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(3), pages 841-861, September.
    4. Höppner, Sebastiaan & Baesens, Bart & Verbeke, Wouter & Verdonck, Tim, 2022. "Instance-dependent cost-sensitive learning for detecting transfer fraud," European Journal of Operational Research, Elsevier, vol. 297(1), pages 291-300.
    5. Finlay, Steven, 2010. "Credit scoring for profitability objectives," European Journal of Operational Research, Elsevier, vol. 202(2), pages 528-537, April.
    6. Christoforos Anagnostopoulos & Dimitris Tasoulis & Niall Adams & David Hand, 2009. "Temporally adaptive estimation of logistic classifiers on data streams," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 3(3), pages 243-261, December.
    7. Xing, Jin & Chi, Guotai & Pan, Ancheng, 2024. "Instance-dependent misclassification cost-sensitive learning for default prediction," Research in International Business and Finance, Elsevier, vol. 69(C).
    8. Lessmann, Stefan & Voß, Stefan, 2009. "A reference model for customer-centric data mining with support vector machines," European Journal of Operational Research, Elsevier, vol. 199(2), pages 520-530, December.
    9. Hand, David J. & Crowder, Martin J., 2012. "Overcoming selectivity bias in evaluating new fraud detection systems for revolving credit operations," International Journal of Forecasting, Elsevier, vol. 28(1), pages 216-223.
    10. Sanjeev Jha & J. Christopher Westland, 2013. "A Descriptive Study of Credit Card Fraud Pattern," Global Business Review, International Management Institute, vol. 14(3), pages 373-384, September.
    11. Yang, Yi & Guo, Yuxuan & Chang, Xiangyu, 2021. "Angle-based cost-sensitive multicategory classification," Computational Statistics & Data Analysis, Elsevier, vol. 156(C).
