
2-step Gradient Boosting approach to selectivity bias correction in tax audit: an application to the VAT gap in Italy

Author

Listed:
  • Pierfrancesco Alaimo Di Loro

    (La Sapienza
    LUMSA)

  • Daria Scacciatelli

    (SOGEI)

  • Giovanna Tagliaferri

    (La Sapienza
    SOGEI)

Abstract

The revenue loss from tax avoidance can undermine the effectiveness and equity of government policies. A standard measure of its magnitude is the tax gap, defined as the difference between the total taxes theoretically collectable and the total taxes actually collected in a given period. Estimation from a micro perspective is usually tackled with bottom-up approaches, in which data regularly collected through fiscal audits are analyzed to provide inference on the general population. However, the sampling scheme of fiscal audits performed by revenue agencies is not random: it is characterized by a selection bias toward risky taxpayers. The current standard adopted by the Italian Revenue Agency (IRA) for overcoming this issue in the tax audit context is the Heckman model, which uses linear models for both the selection and the outcome mechanisms. Here we propose adopting CART-based Gradient Boosting in place of standard linear models to account for the complex patterns that often arise in the relationships between covariates and outcome. Selection bias is corrected by a re-weighting scheme based on propensity scores, implemented through the sequential application of a classifier and a regressor. In short, we refer to the method as 2-step Gradient Boosting. We argue that this scheme fits the sampling mechanism of the IRA fiscal audits, and we apply it to a sample of VAT declarations from Italian individual firms in the fiscal year 2011. Results show that the proposed method markedly outperforms the currently adopted Heckman model in terms of predictive performance.
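The classifier-then-regressor scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn's gradient boosting estimators, inverse-propensity weights as the re-weighting rule, a clipping of the estimated propensities, and purely synthetic data. The helper name `two_step_gb` and all variable names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor


def two_step_gb(X, audited, y_audited, random_state=0):
    """Hypothetical helper: propensity classifier, then a re-weighted regressor.

    X         : (n, p) covariates for the whole population of declarations
    audited   : (n,) boolean mask, True where an audit took place
    y_audited : outcomes observed on audited units, in the order of X[audited]
    """
    # Step 1: classifier estimating the probability of being selected for audit.
    clf = GradientBoostingClassifier(random_state=random_state)
    clf.fit(X, audited.astype(int))
    propensity = clf.predict_proba(X)[:, 1]
    # Clipping keeps the weights bounded (an assumption of this sketch).
    propensity = np.clip(propensity, 0.01, 0.99)

    # Step 2: regressor fitted on audited units only, each weighted by
    # 1 / propensity so that rarely audited profiles count for more.
    weights = 1.0 / propensity[audited]
    reg = GradientBoostingRegressor(random_state=random_state)
    reg.fit(X[audited], y_audited, sample_weight=weights)

    # Predict the outcome for every unit, audited or not.
    return propensity, reg.predict(X)


# Synthetic population: audits are biased toward "risky" units (high X[:, 0]),
# and the outcome also grows with that same covariate.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
risk = X[:, 0]
p_audit = 1.0 / (1.0 + np.exp(-(risk - 1.0)))
audited = rng.random(n) < p_audit
y = np.maximum(0.0, 2.0 + 1.5 * risk + X[:, 1] ** 2 + rng.normal(scale=0.5, size=n))

propensity, y_hat = two_step_gb(X, audited, y[audited])

naive = y[audited].mean()   # average over audited units only
corrected = y_hat.mean()    # average of predictions over the whole population
print(f"audited-only mean: {naive:.2f}, model-based population mean: {corrected:.2f}")
```

In this toy setting the audited-only average is inflated because risky units are over-represented among audits, while averaging the predictions over the whole population targets the population-level quantity. The real application involves choices (covariates, tuning, weight definition) that the abstract does not specify.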

Suggested Citation

  • Pierfrancesco Alaimo Di Loro & Daria Scacciatelli & Giovanna Tagliaferri, 2023. "2-step Gradient Boosting approach to selectivity bias correction in tax audit: an application to the VAT gap in Italy," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 32(1), pages 237-270, March.
  • Handle: RePEc:spr:stmapp:v:32:y:2023:i:1:d:10.1007_s10260-022-00643-4
    DOI: 10.1007/s10260-022-00643-4

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10260-022-00643-4
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Martin Werding, 2005. "Survivor Benefits and the Gender Tax Gap in Public Pension Schemes: Observations from Germany," CESifo Working Paper Series 1596, CESifo.
    2. Jiaming Liu & Chong Wu & Yongli Li, 2019. "Improving Financial Distress Prediction Using Financial Network-Based Information and GA-Based Gradient Boosting Method," Computational Economics, Springer;Society for Computational Economics, vol. 53(2), pages 851-872, February.
    3. James J. Heckman, 1976. "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models," NBER Chapters, in: Annals of Economic and Social Measurement, Volume 5, number 4, pages 475-492, National Bureau of Economic Research, Inc.
    4. Paul H. Lee, 2014. "Resampling Methods Improve the Predictive Power of Modeling in Class-Imbalanced Datasets," IJERPH, MDPI, vol. 11(9), pages 1-14, September.
    5. Keisuke Hirano & Guido W. Imbens & Geert Ridder, 2003. "Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score," Econometrica, Econometric Society, vol. 71(4), pages 1161-1189, July.
    6. Sudhanshu Kumar & Kavita Rao, 2015. "Minimising Selection Failure and Measuring Tax Gap: An Empirical Model," Working Papers id:7031, eSocialSciences.
    7. Whitney K. Newey, 2009. "Two-step series estimation of sample selection models," Econometrics Journal, Royal Economic Society, vol. 12(s1), pages 217-229, January.
    8. James J. Heckman, 1976. "Introduction to "Annals of Economic and Social Measurement, Volume 5, number 4"," NBER Chapters, in: Annals of Economic and Social Measurement, Volume 5, number 4, National Bureau of Economic Research, Inc.
    9. Marra, Giampiero & Wyszynski, Karol, 2016. "Semi-parametric copula sample selection models for count responses," Computational Statistics & Data Analysis, Elsevier, vol. 104(C), pages 110-129.
    10. Dangerfield, Byron J. & Morris, John S., 1992. "Top-down or bottom-up: Aggregate versus disaggregate extrapolations," International Journal of Forecasting, Elsevier, vol. 8(2), pages 233-241, October.
    11. Jelke Bethlehem, 2010. "Selection Bias in Web Surveys," International Statistical Review, International Statistical Institute, vol. 78(2), pages 161-188, August.
    12. Yang, Jui-Chung & Chuang, Hui-Ching & Kuan, Chung-Ming, 2020. "Double machine learning with gradient boosting and its application to the Big N audit quality effect," Journal of Econometrics, Elsevier, vol. 216(1), pages 268-283.
    13. Wojtyś, Małgorzata & Marra, Giampiero & Radice, Rosalba, 2016. "Copula Regression Spline Sample Selection Models: The R Package SemiParSampleSel," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 71(i06).
    14. Marra, Giampiero & Radice, Rosalba, 2013. "Estimation of a regression spline sample selection model," Computational Statistics & Data Analysis, Elsevier, vol. 61(C), pages 158-173.
    15. Guido W. Imbens, 2004. "Nonparametric Estimation of Average Treatment Effects Under Exogeneity: A Review," The Review of Economics and Statistics, MIT Press, vol. 86(1), pages 4-29, February.
    16. Patrick Puhani, 2000. "The Heckman Correction for Sample Selection and Its Critique," Journal of Economic Surveys, Wiley Blackwell, vol. 14(1), pages 53-68, February.
    17. Kumar, Sudhanshu & Rao, R. Kavita, 2015. "Minimising Selection Failure and Measuring Tax Gap: An Empirical Model," Working Papers 15/150, National Institute of Public Finance and Policy.
    18. FISCALIS Tax Gap Project Group, 2018. "The concept of tax gaps - Corporate Income Tax Gap Estimation Methodologies," Taxation Papers 73, Directorate General Taxation and Customs Union, European Commission.
    19. Heckman, James, 2013. "Sample selection bias as a specification error," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 31(3), pages 129-137.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wojtyś, Małgorzata & Marra, Giampiero & Radice, Rosalba, 2018. "Copula based generalized additive models for location, scale and shape with non-random sample selection," Computational Statistics & Data Analysis, Elsevier, vol. 127(C), pages 1-14.
    2. Giovanna Tagliaferri & Daria Scacciatelli & Pierfrancesco Alaimo Di Loro, 2019. "VAT tax gap prediction: a 2-steps Gradient Boosting approach," Papers 1912.03781, arXiv.org, revised Jun 2020.
    3. Marra, Giampiero & Wyszynski, Karol, 2016. "Semi-parametric copula sample selection models for count responses," Computational Statistics & Data Analysis, Elsevier, vol. 104(C), pages 110-129.
    4. Martin Huber, 2014. "Treatment Evaluation in the Presence of Sample Selection," Econometric Reviews, Taylor & Francis Journals, vol. 33(8), pages 869-905, November.
    5. Ruoyao Shi, 2021. "An Averaging Estimator for Two Step M Estimation in Semiparametric Models," Working Papers 202105, University of California at Riverside, Department of Economics.
    6. Karol Wyszynski & Giampiero Marra, 2018. "Sample selection models for count data in R," Computational Statistics, Springer, vol. 33(3), pages 1385-1412, September.
    7. Martin Huber, 2012. "Identification of Average Treatment Effects in Social Experiments Under Alternative Forms of Attrition," Journal of Educational and Behavioral Statistics, vol. 37(3), pages 443-474, June.
    8. Katrin Hussinger, 2008. "R&D and subsidies at the firm level: an application of parametric and semiparametric two-step selection models," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 23(6), pages 729-747.
    9. Emmanuel O. Ogundimu & Jane L. Hutton, 2016. "A Sample Selection Model with Skew-normal Distribution," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 43(1), pages 172-190, March.
    10. Eunji Choi & Jonghoon Park & Seongwoo Lee, 2020. "The Effect of the Comprehensive Rural Village Development Program on Farm Income in South Korea," Sustainability, MDPI, vol. 12(17), pages 1-23, August.
    11. Seonho Shin, 2022. "To work or not? Wages or subsidies?: Copula-based evidence of subsidized refugees’ negative selection into employment," Empirical Economics, Springer, vol. 63(4), pages 2209-2252, October.
    12. Martin Huber & Giovanni Mellace, 2015. "Sharp Bounds on Causal Effects under Sample Selection," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 77(1), pages 129-151, February.
    13. Wiemann, Paul F.V. & Klein, Nadja & Kneib, Thomas, 2022. "Correcting for sample selection bias in Bayesian distributional regression models," Computational Statistics & Data Analysis, Elsevier, vol. 168(C).
    14. Martin Huber & Anna Solovyeva, 2020. "Direct and Indirect Effects under Sample Selection and Outcome Attrition," Econometrics, MDPI, vol. 8(4), pages 1-25, December.
    15. Adelchi Azzalini & Hyoung-Moon Kim & Hea-Jung Kim, 2019. "Sample selection models for discrete and other non-Gaussian response variables," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 28(1), pages 27-56, March.
    16. Mikhail Zhelonkin & Marc G. Genton & Elvezio Ronchetti, 2016. "Robust inference in sample selection models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(4), pages 805-827, September.
    17. Rogelio A. Mancisidor & Michael Kampffmeyer & Kjersti Aas & Robert Jenssen, 2019. "Deep Generative Models for Reject Inference in Credit Scoring," Papers 1904.11376, arXiv.org, revised Sep 2021.
    18. Qi Li & Jeffrey Scott Racine, 2006. "Nonparametric Econometrics: Theory and Practice," Economics Books, Princeton University Press, edition 1, volume 1, number 8355.
    19. Töpfer, Marina, 2017. "Detailed RIF decomposition with selection: The gender pay gap in Italy," Hohenheim Discussion Papers in Business, Economics and Social Sciences 26-2017, University of Hohenheim, Faculty of Business, Economics and Social Sciences.
    20. Renuka Sane & Susan Thomas, 2020. "From Participation To Repurchase: Low Income Households And Micro‐insurance," Journal of Risk & Insurance, The American Risk and Insurance Association, vol. 87(3), pages 783-814, September.

