Printed from https://ideas.repec.org/p/mnb/opaper/2023-148.html

Error Spotting with Gradient Boosting: A Machine Learning-Based Application for Central Bank Data Quality

Author

Listed:
  • Csaba Burger

    (Magyar Nemzeti Bank (the Central Bank of Hungary))

  • Mihály Berndt

    (Clarity Consulting Kft)

Abstract

Supervised machine learning methods trained without error labels are increasingly popular for identifying potential data errors. Such algorithms rely on the assumption of a ‘ground truth’ in the data: the relationships between variables are assumed to be correct in the majority of cases. Points deviating from these relationships, that is, outliers, are flagged as potential data errors. This paper implements an outlier-based error-spotting algorithm using gradient boosting and presents a blueprint for the modelling pipeline. More specifically, it tests three modelling hypotheses empirically, relating to (1) missing value imputation, (2) the choice of loss function and (3) the location of the error. To do so, it uses a cross-sectional view of the loan-to-value ratio and its related columns in the Credit Registry (Hitelregiszter) of the Central Bank of Hungary (MNB), and introduces a set of synthetic error types. First, the paper shows that gradient boosting is not materially affected by the choice of imputation method; replacement with a constant, the computationally most efficient option, is therefore recommended. Second, the Huber loss function, which is quadratic for absolute residuals up to the Huber-slope parameter and linear beyond it, copes better with outlier values and is therefore better at capturing data errors. Finally, errors in the target variable are captured best, while errors in the predictors are hardly found at all. These empirical results may generalize to other cases, depending on the specifics of the data, and the modelling pipeline described highlights the significant modelling decisions.
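
The pipeline summarized in the abstract (constant imputation, gradient boosting with a Huber-type loss, flagging of large residuals) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not state the library, the parameters or the data layout, so the synthetic data, the fill constant `MISSING_FILL`, the injected error and all function names below are hypothetical. The boosting step is also simplified, using one-split stumps whose leaf values are the mean clipped residual, rather than a per-leaf line search.

```python
import random

MISSING_FILL = -1.0  # hypothetical constant, chosen outside the observed predictor range


def clipped_residual(residual, delta):
    """Negative gradient of the Huber loss: the residual clipped to [-delta, delta]."""
    return max(-delta, min(delta, residual))


def fit_stump(xs, order, targets):
    """Least-squares regression stump (a single split) on one predictor."""
    n = len(xs)
    total = sum(targets)
    best = None
    left_sum = 0.0
    for k in range(1, n):
        prev, cur = order[k - 1], order[k]
        left_sum += targets[prev]
        if xs[prev] == xs[cur]:
            continue  # no split between identical predictor values
        right_sum = total - left_sum
        # Maximising this score is equivalent to minimising the squared error.
        score = left_sum ** 2 / k + right_sum ** 2 / (n - k)
        if best is None or score > best[0]:
            best = (score, (xs[prev] + xs[cur]) / 2, left_sum / k, right_sum / (n - k))
    return best[1:]


def boosted_predictions(xs, ys, delta=0.1, rounds=100, learning_rate=0.1):
    """Simplified gradient boosting: each stump is fitted to the clipped residuals."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    start = sorted(ys)[len(ys) // 2]  # median as a robust starting value
    preds = [start] * len(ys)
    for _ in range(rounds):
        pseudo = [clipped_residual(y - p, delta) for y, p in zip(ys, preds)]
        threshold, left_value, right_value = fit_stump(xs, order, pseudo)
        preds = [p + learning_rate * (left_value if x <= threshold else right_value)
                 for x, p in zip(xs, preds)]
    return preds


# Synthetic stand-in data (not the Credit Registry): the target depends on one predictor.
random.seed(0)
raw_x = [random.uniform(0.0, 1.0) for _ in range(200)]
ys = [0.5 + 0.3 * x + random.gauss(0.0, 0.02) for x in raw_x]

# Mask every 20th predictor value, then impute it with the constant.
xs = [MISSING_FILL if i % 20 == 0 else x for i, x in enumerate(raw_x)]

# Inject one synthetic error into the target variable (hypothetical magnitude).
error_row = 57
ys[error_row] += 5.0

preds = boosted_predictions(xs, ys)
abs_residuals = [abs(y - p) for y, p in zip(ys, preds)]
flagged_row = max(range(len(ys)), key=lambda i: abs_residuals[i])
print(flagged_row, round(abs_residuals[flagged_row], 2))
```

Because the clipped residual caps the influence of any single row at `delta`, the injected target error barely moves the fit and keeps a large residual, which is what makes it easy to flag. An error injected into a predictor instead would only change which side of a split the row falls on, which gives some intuition for why such errors are much harder to find this way.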

Suggested Citation

  • Csaba Burger & Mihály Berndt, 2023. "Error Spotting with Gradient Boosting: A Machine Learning-Based Application for Central Bank Data Quality," MNB Occasional Papers 2023/148, Magyar Nemzeti Bank (Central Bank of Hungary).
  • Handle: RePEc:mnb:opaper:2023/148

    Download full text from publisher

    File URL: https://www.mnb.hu/en/publications/studies-publications-statistics/occasional-papers/op-148-csaba-burger-mihaly-berndt-error-spotting-with-gradient-boosting-a-machine-learning-based-application-for-central-bank-data-quality
    Download Restriction: no

    More about this item

    Keywords

    data quality; machine learning; gradient boosting; central banking; loss functions; missing values

    JEL classification:

    • C5 - Mathematical and Quantitative Methods - - Econometric Modeling
    • C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
    • E58 - Macroeconomics and Monetary Economics - - Monetary Policy, Central Banking, and the Supply of Money and Credit - - - Central Banks and Their Policies

