Printed from https://ideas.repec.org/p/arx/papers/2512.09652.html

Measuring Corruption from Text Data

Author

Listed:
  • Arieda Muço

Abstract

Using Brazilian municipal audit reports, I construct an automated corruption index that combines a dictionary of audit irregularities with principal component analysis. The index validates strongly against independent human coders, explaining 71-73% of the variation in hand-coded corruption counts in samples where coders themselves exhibit high agreement, and the results are robust within these validation samples. The index behaves as theory predicts, correlating with municipal characteristics that prior research links to corruption. Supervised learning alternatives yield nearly identical municipal rankings ($R^{2}=0.98$), confirming that the dictionary approach captures the same underlying construct. The method scales to the full audit corpus and offers advantages over both manual coding and Large Language Models (LLMs) in transparency, cost, and long-run replicability.
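
As a rough illustration of the general idea in the abstract (dictionary term counts summarized by a first principal component, then compared with hand-coded counts), the following minimal sketch may help. It is not the author's implementation: the dictionary terms, the toy report texts, the hand-coded counts, and the function names term_counts, first_pc_index and r_squared are all hypothetical, and the paper's actual preprocessing and scaling choices are not given on this page.

```python
import re
import numpy as np

# Hypothetical dictionary of irregularity terms (illustrative; not the paper's dictionary).
DICTIONARY = ["fraud", "overinvoicing", "diversion", "irregular"]

# Toy audit-report snippets, invented for this example.
REPORTS = [
    "the audit found fraud and overinvoicing and diversion of funds in an irregular tender",
    "the audit found one irregular payment",
    "no problems were found in this audit of the municipality",
    "fraud was found and diversion of funds was irregular",
]

def term_counts(reports, dictionary):
    """Return an (n_reports, n_terms) matrix of exact-token counts per dictionary term."""
    rows = []
    for text in reports:
        tokens = re.findall(r"[a-z]+", text.lower())
        rows.append([tokens.count(term) for term in dictionary])
    return np.array(rows, dtype=float)

def first_pc_index(X):
    """Score each row on the first principal component of the standardized columns."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns stay at zero after centering
    Z = (X - X.mean(axis=0)) / std
    # Rows of vt are the principal axes of Z, ordered by explained variance.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    loadings = vt[0]
    if loadings.sum() < 0:  # orient the axis so more flagged terms gives a higher score
        loadings = -loadings
    return Z @ loadings

def r_squared(y, index):
    """R^2 of a univariate OLS of y on the index (the squared Pearson correlation)."""
    return float(np.corrcoef(y, index)[0, 1] ** 2)

counts = term_counts(REPORTS, DICTIONARY)
index = first_pc_index(counts)

# Hypothetical hand-coded irregularity counts for the same four reports.
hand_coded = np.array([4.0, 1.0, 0.0, 3.0])
r2 = r_squared(hand_coded, index)

print("index:", np.round(index, 3))
print("R^2 against hand-coded counts:", round(r2, 3))
```

In this sketch the sign of the principal component is fixed by hand, since the direction returned by a singular value decomposition is arbitrary. With real reports one would likely also normalize counts by document length before extracting the component.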

Suggested Citation

  • Arieda Muço, 2025. "Measuring Corruption from Text Data," Papers 2512.09652, arXiv.org.
  • Handle: RePEc:arx:papers:2512.09652

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2512.09652
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Claudio Ferraz & Frederico Finan, 2011. "Electoral Accountability and Corruption: Evidence from the Audits of Local Governments," American Economic Review, American Economic Association, vol. 101(4), pages 1274-1311, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Marcel Fafchamps & Julien Labonne, 2017. "Do Politicians’ Relatives Get Better Jobs? Evidence from Municipal Elections," The Journal of Law, Economics, and Organization, Oxford University Press, vol. 33(2), pages 268-300.
    2. Quinckhardt, Matthias, 2023. "The value of a party: Local politics and the allocation of intergovernmental transfers," European Journal of Political Economy, Elsevier, vol. 80(C).
    3. Laurent Bouton & Paola Conconi & Francisco Pino & Maurizio Zanardi, 2018. "Guns, Environment, and Abortion: How Single-Minded Voters Shape Politicians' Decisions," Working Papers gueconwpa~18-18-15, Georgetown University, Department of Economics.
    4. Ye-Feng Chen & Shu-Guang Jiang & Marie Claire Villeval, 2015. "The Tragedy of Corruption. Corruption as a social dilemma," Working Papers 1531, Groupe d'Analyse et de Théorie Economique Lyon St-Etienne (GATE Lyon St-Etienne), Université de Lyon.
    5. Kahsay, Goytom Abraha & Medhin, Haileselassie, 2020. "Leader turnover and forest management outcomes: Micro-level evidence from Ethiopia," World Development, Elsevier, vol. 127(C).
    6. Benjamin Marx & Vincent Pons & Vincent Rollet, 2022. "Electoral Turnovers," NBER Working Papers 29766, National Bureau of Economic Research, Inc.
    7. Britto, Diogo G.C. & Fiorin, Stefano, 2020. "Corruption and legislature size: Evidence from Brazil," European Journal of Political Economy, Elsevier, vol. 65(C).
    8. Bernecker, Andreas, 2014. "Do politicians shirk when reelection is certain? Evidence from the German parliament," European Journal of Political Economy, Elsevier, vol. 36(C), pages 55-70.
    9. Bergh, Andreas & Erlingsson, Gissur & Gustafsson, Anders & Wittberg, Emanuel, 2018. "Municipally owned enterprises: Nested principal-agent relations and conditions for accountability," Ratio Working Papers 306, The Ratio Institute, revised 18 Oct 2018.
    10. Monika Köppl-Turyna, 2016. "Opportunistic politicians and fiscal outcomes: the curious case of Vorarlberg," Public Choice, Springer, vol. 168(3), pages 177-216, September.
    11. Jean Beuve & Marian W. Moszoro & Stéphane Saussier, 2019. "Political contestability and public contract rigidity: An analysis of procurement contracts," Journal of Economics & Management Strategy, Wiley Blackwell, vol. 28(2), pages 316-335, April.
    12. Benjamin Marx, 2018. "Elections as Incentives: Project Completion and Visibility in African Politics," Sciences Po Economics Publications (main) hal-03873801, HAL.
    13. Nicola Mastrorocco & Arianna Ornaghi, 2025. "Who Watches the Watchmen? Local News and Police Behavior in the United States," American Economic Journal: Economic Policy, American Economic Association, vol. 17(2), pages 285-318, May.
    14. Cisneros, Elías & Kis-Katos, Krisztina, 2024. "Unintended environmental consequences of anti-corruption strategies," Journal of Environmental Economics and Management, Elsevier, vol. 128(C).
    15. Saka, O. & Ji, Y. & Minaudier, C., 2024. "Political Accountability during Crises: Evidence from 40 years of Financial Policies," Working Papers 24/01, Department of Economics, City St George's, University of London.
    16. Cavalcanti, Francisco & Daniele, Gianmarco & Galletta, Sergio, 2018. "Popularity shocks and political selection," Journal of Public Economics, Elsevier, vol. 165(C), pages 201-216.
    17. Chan, Jeff & Karim, Ridwan, 2023. "Oil royalties and the provision of public education in Brazil," Economics of Education Review, Elsevier, vol. 92(C).
    18. Prasenjit Banerjee & Vegard Iversen & Sandip Mitra & Antonio Nicolò & Kunal Sen, 2018. "Politicians and Their Promises in an Uncertain World: Evidence from a Lab-in-the-Field Experiment in India," Economics Discussion Paper Series 1806, Economics, The University of Manchester.
    19. Zamboni, Yves & Litschig, Stephan, 2018. "Audit risk and rent extraction: Evidence from a randomized evaluation in Brazil," Journal of Development Economics, Elsevier, vol. 134(C), pages 133-149.
    20. Gouvêa, Raphael & Girardi, Daniele, 2021. "Partisanship and local fiscal policy: Evidence from Brazilian cities," Journal of Development Economics, Elsevier, vol. 150(C).
