
Stacking machine-learning models for anomaly detection: comparing AnaCredit to other banking datasets

Author

Listed:
  • Pasquale Maddaloni

    (Bank of Italy)

  • Davide Nicola Continanza

    (Bank of Italy)

  • Andrea del Monaco

    (Bank of Italy)

  • Daniele Figoli

    (Bank of Italy)

  • Marco di Lucido

    (Bank of Italy)

  • Filippo Quarta

    (Bank of Italy)

  • Giuseppe Turturiello

    (Bank of Italy)

Abstract

This paper uses machine learning models to assess the quality of granular datasets reported by banks. In particular, it investigates how supervised and unsupervised learning algorithms can exploit patterns recognizable in other data sources that cover similar phenomena (although at a different level of aggregation), in order to detect potential outliers to be submitted to banks for their own checks. These algorithms are then stacked in a semi-supervised fashion to enhance their individual outlier detection ability. The methodology is applied to compare the granular AnaCredit dataset first with the Balance Sheet Items statistics (BSI) and then with the harmonised supervisory statistics of the Financial Reporting (FinRep), which are compiled for the Eurosystem and the Single Supervisory Mechanism, respectively. In both cases, we show that the performance of the stacking technique, in terms of F1-score, is higher than that of each algorithm alone.
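The stacking idea summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two base detectors (a robust z-score on the ratio between the two sources, and residuals from a Theil-Sen line standing in for robust regression), the nearest-centroid meta-learner, the thresholds, all function names and the toy figures are assumptions chosen only to keep the example self-contained.

```python
import statistics


def robust_z(values):
    """Absolute robust z-scores: distance from the median in MAD units."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values]) or 1e-9
    return [abs(v - med) / (1.4826 * mad) for v in values]


def theil_sen_residuals(x, y):
    """Residuals from a Theil-Sen line (median of pairwise slopes)."""
    n = len(x)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(n) for j in range(i + 1, n) if x[j] != x[i]]
    slope = statistics.median(slopes)
    intercept = statistics.median([b - slope * a for a, b in zip(x, y)])
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def pseudo_labels(s1, s2, low=1.5, high=4.0):
    """1 if both detectors are alarmed, 0 if both are calm, else None."""
    labels = []
    for a, b in zip(s1, s2):
        if a > high and b > high:
            labels.append(1)
        elif a < low and b < low:
            labels.append(0)
        else:
            labels.append(None)  # ambiguous: left to the meta-learner
    return labels


def centroid(points):
    """Component-wise mean of a list of score vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def dist2(p, c):
    """Squared Euclidean distance between two score vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def stack(s1, s2):
    """Meta-learner: nearest centroid fitted on pseudo-labelled base scores."""
    feats = list(zip(s1, s2))
    labels = pseudo_labels(s1, s2)
    pos = [f for f, lab in zip(feats, labels) if lab == 1]
    neg = [f for f, lab in zip(feats, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need pseudo-labels of both classes")
    c_pos, c_neg = centroid(pos), centroid(neg)
    return [1 if dist2(f, c_pos) < dist2(f, c_neg) else 0 for f in feats]


def f1_score(truth, pred):
    """F1 = 2*TP / (2*TP + FP + FN); 0.0 when there are no true positives."""
    tp = sum(t == 1 and p == 1 for t, p in zip(truth, pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(truth, pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(truth, pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Toy data invented for illustration: granular totals aggregated per reporter
# versus the figure from an aggregated source; two reporters are inconsistent.
granular = [100, 120, 140, 160, 180, 200, 220, 240, 260, 280]
reference = [101, 119, 142, 220, 181, 199, 220, 160, 258, 281]
truth = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]

s_ratio = robust_z([r / g for g, r in zip(granular, reference)])
s_resid = robust_z(theil_sen_residuals(granular, reference))
flags = stack(s_ratio, s_resid)
print(flags, f1_score(truth, flags))
```

The design choice worth noting is that the meta-learner never sees hand-made labels: it is trained only on observations where the base detectors agree, which is one simple reading of "pseudo labelling". A real application would replace the nearest-centroid rule and the toy detectors with the models actually described in the paper.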

Suggested Citation

  • Pasquale Maddaloni & Davide Nicola Continanza & Andrea del Monaco & Daniele Figoli & Marco di Lucido & Filippo Quarta & Giuseppe Turturiello, 2022. "Stacking machine-learning models for anomaly detection: comparing AnaCredit to other banking datasets," Questioni di Economia e Finanza (Occasional Papers) 689, Bank of Italy, Economic Research and International Relations Area.
  • Handle: RePEc:bdi:opques:qef_689_22

    Download full text from publisher

    File URL: https://www.bancaditalia.it/pubblicazioni/qef/2022-0689/QEF_689_22.pdf
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Francesco Cusano & Giuseppe Marinelli & Stefano Piermattei, 2022. "Learning from revisions: an algorithm to detect errors in banks’ balance sheet statistical reporting," Quality & Quantity: International Journal of Methodology, Springer, vol. 56(6), pages 4025-4059, December.
    2. Fabio Zambuto & Simona Arcuti & Roberto Sabatini & Daniele Zambuto, 2021. "Application of classification algorithms for the assessment of confirmation to quality remarks," Questioni di Economia e Finanza (Occasional Papers) 631, Bank of Italy, Economic Research and International Relations Area.
    3. Francesco Cusano & Giuseppe Marinelli & Stefano Piermattei, 2021. "Learning from revisions: a tool for detecting potential errors in banks' balance sheet statistical reporting," Questioni di Economia e Finanza (Occasional Papers) 611, Bank of Italy, Economic Research and International Relations Area.
    4. Fabio Zambuto, 2021. "Quality checks on granular banking data: an experimental approach based on machine learning," IFC Bulletins chapters, in: Bank for International Settlements (ed.), Micro data for the macro world, volume 53, Bank for International Settlements.
    5. Dangxing Chen & Weicheng Ye & Jiahui Ye, 2022. "Interpretable Selective Learning in Credit Risk," Papers 2209.10127, arXiv.org.
    6. Wesam Salah Alaloul & Muhammad Ali Musarat & Muhammad Babar Ali Rabbani & Qaiser Iqbal & Ahsen Maqsoom & Waqas Farooq, 2021. "Construction Sector Contribution to Economic Stability: Malaysian GDP Distribution," Sustainability, MDPI, vol. 13(9), pages 1-26, April.
    7. Luis Gil-Alana, 2004. "Forecasting the real output using fractionally integrated techniques," Applied Economics, Taylor & Francis Journals, vol. 36(14), pages 1583-1589.
    8. Biqing Cai & Jiti Gao & Dag Tjøstheim, 2017. "A New Class of Bivariate Threshold Cointegration Models," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 35(2), pages 288-305, April.
    9. repec:kap:iaecre:v:17:y:2011:i:2:p:157-168 is not listed on IDEAS
    10. Nielsen, Morten Orregaard & Shimotsu, Katsumi, 2007. "Determining the cointegrating rank in nonstationary fractional systems by the exact local Whittle approach," Journal of Econometrics, Elsevier, vol. 141(2), pages 574-596, December.
    11. Andreas Stephan, 1997. "The Impact of Road Infrastructure on Productivity and Growth: Some Preliminary Results for the German Manufacturing Sector," CIG Working Papers FS IV 97-47, Wissenschaftszentrum Berlin (WZB), Research Unit: Competition and Innovation (CIG).
    12. Guglielmo Maria Caporale & Luis A. Gil-Alana, 2015. "Infant mortality rates: time trends and fractional integration," Journal of Applied Statistics, Taylor & Francis Journals, vol. 42(3), pages 589-602, March.
    13. Mohamed, Hazik & Masih, Mansur, 2017. "Stock market comovement among the ASEAN-5 : a causality analysis," MPRA Paper 98781, University Library of Munich, Germany.
    14. Muhammad Shahbaz & Syed Jawad Hussain Shahzad & Mantu Kumar Mahalik & Perry Sadorsky, 2018. "How strong is the causal relationship between globalization and energy consumption in developed economies? A country-specific time-series and panel analysis," Applied Economics, Taylor & Francis Journals, vol. 50(13), pages 1479-1494, March.
    15. Erie Febrian & Aldrin Herwany, 2009. "Volatility Forecasting Models and Market Co-Integration: A Study on South-East Asian Markets," Working Papers in Economics and Development Studies (WoPEDS) 200911, Department of Economics, Padjadjaran University, revised Sep 2009.
    16. Yong Glasure & Aie-Rie Lee & James Norris, 1999. "Level of economic development and political democracy revisited," International Advances in Economic Research, Springer;International Atlantic Economic Society, vol. 5(4), pages 466-477, November.
    17. William Ginn, 2022. "Climate Disasters and the Macroeconomy: Does State-Dependence Matter? Evidence for the US," Economics of Disasters and Climate Change, Springer, vol. 6(1), pages 141-161, March.
    18. Carmen van der Merwe & Martin de Wit, 2021. "An In-Depth Investigation into the Relationship Between Municipal Solid Waste Generation and Economic Growth in the City of Cape Town," Working Papers 07/2021, Stellenbosch University, Department of Economics, revised 2021.
    19. Hassler, U. & Marmol, F. & Velasco, C., 2006. "Residual log-periodogram inference for long-run relationships," Journal of Econometrics, Elsevier, vol. 130(1), pages 165-207, January.
    20. Kenneth F. Wallis & Jan P. A. M. Jacobs, 2005. "Comparing SVARs and SEMs: two models of the UK economy," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 20(2), pages 209-228.
    21. Yin, Sihua & Yang, Haidong & Xu, Kangkang & Zhu, Chengjiu & Zhang, Shaqing & Liu, Guosheng, 2022. "Dynamic real–time abnormal energy consumption detection and energy efficiency optimization analysis considering uncertainty," Applied Energy, Elsevier, vol. 307(C).

    More about this item

    Keywords

    banking data; data quality management; outlier and anomaly detection; machine learning; auto-encoder; robust regression; pseudo labelling;

    JEL classification:

    • C18 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Methodological Issues: General
    • C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
    • G21 - Financial Economics - - Financial Institutions and Services - - - Banks; Other Depository Institutions; Micro Finance Institutions; Mortgages

