
Optimizing Tax Administration Policies with Machine Learning


  • Pietro Battiston
  • Simona Gamba
  • Alessandro Santoro


Tax authorities around the world increasingly employ data mining and machine learning algorithms to predict individual behaviour. Although the traditional literature on optimal tax administration provides useful tools for the ex-post evaluation of policies, it disregards the problem of which taxpayers to target. This study identifies and characterises a loss function that assigns a social cost to any prediction-based policy. We define this measure as the difference between the social welfare of a given policy and that of an ideal policy unaffected by prediction errors. We show how this loss function relates to the receiver operating characteristic (ROC) curve, a standard statistical tool used to evaluate prediction performance. We then apply our measure to the prediction of inaccurate tax returns filed by self-employed individuals and sole proprietorships in Italy. In our application, a random forest model provides the best prediction; we show how it can be interpreted using measures of variable importance developed in the machine learning literature.
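The link between a welfare-based loss and the ROC curve can be illustrated with a minimal sketch. This is not the paper's loss function, which the abstract does not spell out. It assumes a hypothetical linear welfare in which each audited inaccurate return yields a gain `benefit` and each audited accurate return imposes a cost `cost`. Under that assumption, the gap between the ideal policy (audit exactly the inaccurate returns) and a threshold policy depends on the predictions only through one point (FPR, TPR) of the ROC curve. The loss is written here as ideal minus actual, so it is non-negative. All names and the toy data are illustrative.

```python
from typing import List, Tuple


def roc_point(labels: List[int], scores: List[float],
              threshold: float) -> Tuple[float, float]:
    """Return (FPR, TPR) for the policy 'audit if score >= threshold'.

    labels: 1 = inaccurate return, 0 = accurate return.
    scores: predicted probability (or any ranking score) of inaccuracy.
    """
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    pos = sum(labels)
    neg = len(labels) - pos
    return fp / neg, tp / pos


def welfare_loss(labels: List[int], scores: List[float], threshold: float,
                 benefit: float, cost: float) -> float:
    """Welfare of the ideal policy minus welfare of the threshold policy.

    Assumed (hypothetical) welfare: benefit * TP - cost * FP.
    The ideal policy audits every inaccurate return and no accurate one,
    so the loss is benefit * FN + cost * FP, rewritten via the ROC point.
    """
    fpr, tpr = roc_point(labels, scores, threshold)
    pos = sum(labels)
    neg = len(labels) - pos
    return benefit * pos * (1.0 - tpr) + cost * neg * fpr


# Toy data: 4 inaccurate and 4 accurate returns with made-up scores.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1]

point = roc_point(labels, scores, threshold=0.5)   # (0.25, 0.75)
loss = welfare_loss(labels, scores, threshold=0.5, benefit=10.0, cost=2.0)
print(point, loss)  # (0.25, 0.75) 12.0
```

With these toy numbers, one inaccurate return is missed (a loss of 10) and one accurate return is audited (a loss of 2). Sweeping `threshold` traces the ROC curve and shows how the preferred operating point shifts with the assumed `benefit` and `cost`.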

Suggested Citation

  • Pietro Battiston & Simona Gamba & Alessandro Santoro, 2020. "Optimizing Tax Administration Policies with Machine Learning," Working Papers 436, University of Milano-Bicocca, Department of Economics, revised Mar 2020.
  • Handle: RePEc:mib:wpaper:436



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Monica Andini & Emanuele Ciani & Guido de Blasio & Alessio D'Ignazio & Viola Salvestrini, 2017. "Targeting policy-compliers with machine learning: an application to a tax rebate programme in Italy," Temi di discussione (Economic working papers) 1158, Bank of Italy, Economic Research and International Relations Area.
    2. Andini, Monica & Ciani, Emanuele & de Blasio, Guido & D'Ignazio, Alessio & Salvestrini, Viola, 2018. "Targeting with machine learning: An application to a tax rebate program in Italy," Journal of Economic Behavior & Organization, Elsevier, vol. 156(C), pages 86-102.
    3. Guido de Blasio & Alessio D'Ignazio & Marco Letta, 2020. "Predicting Corruption Crimes with Machine Learning. A Study for the Italian Municipalities," Working Papers 16/20, Sapienza University of Rome, DISS.
    4. Sophie-Charlotte Klose & Johannes Lederer, 2020. "A Pipeline for Variable Selection and False Discovery Rate Control With an Application in Labor Economics," Papers 2006.12296, revised Jun 2020.
    5. de Lucio, Juan, 2021. "Estimación adelantada del crecimiento regional mediante redes neuronales LSTM," INVESTIGACIONES REGIONALES - Journal of REGIONAL RESEARCH, Asociación Española de Ciencia Regional, issue 49, pages 45-64.
    6. Chakraborty, Chiranjit & Joseph, Andreas, 2017. "Machine learning at central banks," Bank of England working papers 674, Bank of England.
    7. McKenzie, David J. & Sansone, Dario, 2017. "Man vs. Machine in Predicting Successful Entrepreneurs: Evidence from a Business Plan Competition in Nigeria," CEPR Discussion Papers 12523, C.E.P.R. Discussion Papers.
    8. Francesco Decarolis & Cristina Giorgiantonio, 2020. "Corruption red flags in public procurement: new evidence from Italian calls for tenders," Questioni di Economia e Finanza (Occasional Papers) 544, Bank of Italy, Economic Research and International Relations Area.
    9. Monica Andini & Michela Boldrini & Emanuele Ciani & Guido de Blasio & Alessio D'Ignazio & Andrea Paladini, 2019. "Machine learning in the service of policy targeting: the case of public credit guarantees," Temi di discussione (Economic working papers) 1206, Bank of Italy, Economic Research and International Relations Area.
    10. Fabio Pammolli & Paolo Bonaretti & Massimo Riccaboni & Valentina Tortolini, 2019. "Quali Regole per la Spesa Farmaceutica? - Criticità, Impatti, Proposte," Working Papers CERM 01-2019, Competitività, Regole, Mercati (CERM).
    11. McKenzie, David & Sansone, Dario, 2019. "Predicting entrepreneurial success is hard: Evidence from a business plan competition in Nigeria," Journal of Development Economics, Elsevier, vol. 141(C).
    12. Böhme, Marcus H. & Gröger, André & Stöhr, Tobias, 2020. "Searching for a better life: Predicting international migration with online search keywords," Journal of Development Economics, Elsevier, vol. 142(C).
    13. Jorge Mejia & Shawn Mankad & Anandasivam Gopal, 2019. "A for Effort? Using the Crowd to Identify Moral Hazard in New York City Restaurant Hygiene Inspections," Information Systems Research, INFORMS, vol. 30(4), pages 1363-1386, December.
    14. Susan Athey, 2018. "The Impact of Machine Learning on Economics," NBER Chapters, in: The Economics of Artificial Intelligence: An Agenda, pages 507-547, National Bureau of Economic Research, Inc.
    15. Potnuru Kishen Suraj & Ankesh Gupta & Makkunda Sharma & Sourabh Bikas Paul & Subhashis Banerjee, 2017. "On monitoring development indicators using high resolution satellite images," Papers 1712.02282, revised Jun 2018.
    16. Naguib, Costanza, 2019. "Estimating the Heterogeneous Impact of the Free Movement of Persons on Relative Wage Mobility," Economics Working Paper Series 1903, University of St. Gallen, School of Economics and Political Science.
    17. Akash Malhotra, 2018. "A hybrid econometric-machine learning approach for relative importance analysis: Prioritizing food policy," Papers 1806.04517, revised Aug 2020.
    18. Arthur Charpentier & Emmanuel Flachaire & Antoine Ly, 2017. "Économétrie et Machine Learning," Papers 1708.06992, revised Mar 2018.
    19. Croux, Christophe & Jagtiani, Julapa & Korivi, Tarunsai & Vulanovic, Milos, 2020. "Important factors determining Fintech loan default: Evidence from a lendingclub consumer platform," Journal of Economic Behavior & Organization, Elsevier, vol. 173(C), pages 270-296.
    20. Bryan T. Kelly & Asaf Manela & Alan Moreira, 2019. "Text Selection," NBER Working Papers 26517, National Bureau of Economic Research, Inc.

    More about this item

    Keywords: policy prediction problems; tax behaviour; big data; machine learning

    JEL classification:

    • H26 - Public Economics - - Taxation, Subsidies, and Revenue - - - Tax Evasion and Avoidance
    • H32 - Public Economics - - Fiscal Policies and Behavior of Economic Agents - - - Firm
    • C53 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Forecasting and Prediction Models; Simulation Methods
