Printed from https://ideas.repec.org/p/tul/wpaper/2603.html

Optimal Audit Targeting with Machine Learning: Evidence from Pakistan

Authors

  • Nicholas Lacoste

    (Tulane University)

  • Zehra Farooq

    (Federal Board of Revenue, Pakistan)

Abstract

This paper bridges welfare economics and machine learning econometrics to develop empirically implementable algorithms for optimal audit targeting. We derive a sufficient-statistic-based targeting algorithm that depends on three individualized causal effects: the immediate revenue recovered from an audit, the causal effect of an audit on long-run tax revenue, and the marginal administrative cost of an audit. We estimate these effects with a variety of machine learners (causal forests, LASSO, gradient-boosted trees, and neural networks), using the universe of Pakistani income tax returns and exploiting years in which audits were assigned completely at random. We implement our targeting algorithms in out-of-bag years and compare them to the real-world policy in years when audits were partially or entirely targeted. We show that the real-world audit program in Pakistan lost almost 173,000 Rs ($1,700) in net revenue per audit, while our optimal policy generates 285,000 Rs ($2,800) in expected net revenue per audit. We also find that targeting audits on immediate revenue recovery performs worse than targeting on long-run deterrence in this setting. Our framework offers a general approach to empirical welfare maximization using machine learning in resource-constrained policy settings.
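
The targeting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the additive score (immediate revenue plus long-run revenue effect minus administrative cost), the fixed audit budget, the rule of skipping taxpayers with non-positive scores, and the function names `net_revenue_scores` and `select_audits` are all assumptions made for illustration. The per-taxpayer inputs stand in for estimates that would come from a fitted machine learner.

```python
from typing import List, Sequence


def net_revenue_scores(
    immediate: Sequence[float],
    long_run: Sequence[float],
    cost: Sequence[float],
) -> List[float]:
    """Combine three per-taxpayer estimates into one predicted net revenue.

    Assumption: the three effects enter additively with equal weight.
    The paper's sufficient-statistic formula may weight them differently.
    """
    return [r + d - c for r, d, c in zip(immediate, long_run, cost)]


def select_audits(scores: Sequence[float], budget: int) -> List[int]:
    """Pick up to `budget` taxpayers with the highest positive scores.

    Assumption: audits are capped by a fixed count, and an audit with a
    non-positive predicted net revenue is never worth running.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked[:budget] if scores[i] > 0]


# Hypothetical estimates for four taxpayers (illustrative numbers only).
immediate = [100.0, 50.0, 10.0, 0.0]   # revenue recovered by the audit itself
long_run = [-20.0, 200.0, 5.0, -50.0]  # effect of the audit on future revenue
cost = [30.0, 30.0, 30.0, 30.0]        # marginal administrative cost

scores = net_revenue_scores(immediate, long_run, cost)
chosen = select_audits(scores, budget=3)
```

In this toy input, ranking on immediate revenue alone would put taxpayer 0 first, while the combined score puts taxpayer 1 first because of the larger long-run effect. Only two of the three available audit slots are used, since the third-ranked taxpayer has a negative score.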

Suggested Citation

  • Nicholas Lacoste & Zehra Farooq, 2026. "Optimal Audit Targeting with Machine Learning: Evidence from Pakistan," Working Papers 2603, Tulane University, Department of Economics.
  • Handle: RePEc:tul:wpaper:2603

    Download full text from publisher

    File URL: http://repec.tulane.edu/RePEc/pdf/tul2603.pdf
    File Function: First Version, February 2026
    Download Restriction: no

    More about this item

    JEL classification:

    • H21 - Public Economics - - Taxation, Subsidies, and Revenue - - - Efficiency; Optimal Taxation
    • H26 - Public Economics - - Taxation, Subsidies, and Revenue - - - Tax Evasion and Avoidance
    • C14 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Semiparametric and Nonparametric Methods: General
    • C45 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods: Special Topics - - - Neural Networks and Related Topics
