Printed from https://ideas.repec.org/p/osf/thesis/z7der_v1.html

Utilizing Big Administrative Data in Evaluation Research: Integrating Causal Modeling, Program Theory, and Machine Learning

Author

  • de Avila, Rogerio

Abstract

The increased availability of administrative data and big data, coupled with advances in causal modeling and data analytics, presents new opportunities to enhance program evaluation in public policy and social sciences. This thesis investigates how these modern theory-driven approaches can be integrated with traditional methodologies to address complex causal questions, enhancing evaluations' effectiveness, timeliness, and comprehensiveness. Guided by substantial theoretical frameworks such as those proposed by Funnell and Rogers (2011) and empirical studies like Pearl (2009), this research addresses gaps in data utilization, ethical standards, and the application of machine learning. Specific challenges include improving the precision and comprehensiveness of data analysis, ensuring ethical data use as advocated by frameworks like the Five Safes, and enhancing interdisciplinary collaboration and training. This thesis aims to demonstrate significant advancements in program evaluation by bridging these gaps, proposing a paradigm shift towards a more integrated and data-informed approach in public policy and social sciences.

Suggested Citation

  • de Avila, Rogerio, 2024. "Utilizing Big Administrative Data in Evaluation Research: Integrating Causal Modeling, Program Theory, and Machine Learning," Thesis Commons z7der_v1, Center for Open Science.
  • Handle: RePEc:osf:thesis:z7der_v1
    DOI: 10.31219/osf.io/z7der_v1

    Download full text from publisher

    File URL: https://osf.io/download/672d0c51ff2bca1fa64d6caa/
    Download Restriction: no


    References listed on IDEAS

    1. Tanvi Desai & Felix Ritchie & Richard Welpton, 2016. "Five Safes: designing data access for research," Working Papers 20161601, Department of Accounting, Economics and Finance, Bristol Business School, University of the West of England, Bristol.
    2. Michael Lechner, 2023. "Causal Machine Learning and its use for public policy," Swiss Journal of Economics and Statistics, Springer;Swiss Society of Economics and Statistics, vol. 159(1), pages 1-15, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. David H. Schiller & Johanna Eberle & Daniel Fuß & Jan Goebel & Jörg Heining & Tatjana Mika & Dana Müller & Frank Röder & Michael Stegmann & Karsten Stephan, 2017. "Standards des sicheren Datenzugangs in den Sozial- und Wirtschaftswissenschaften - Überblick über verschiedene Remote-Access-Verfahren," RatSWD Working Papers 261, German Data Forum (RatSWD).
    2. Li, Jiajia & Yang, Shiyu & Li, Jun & Li, Houjian, 2024. "Targeting SDG7: Identifying heterogeneous energy dilemmas for socially disadvantaged groups in India using machine learning," Energy Economics, Elsevier, vol. 138(C).
    3. Matthew J. Schneider & James Bailie & Dawn Iacobucci, 2025. "Why Data Anonymization Has Not Taken Off," Customer Needs and Solutions, Springer;Institute for Sustainable Innovation and Growth (iSIG), vol. 12(1), pages 1-8, December.
    4. Ian Foster, 2018. "Research Infrastructure for the Safe Analysis of Sensitive Data," The ANNALS of the American Academy of Political and Social Science, vol. 675(1), pages 102-120, January.
    5. Kalinda E. Griffiths & Jessica Blain & Claire M. Vajdic & Louisa Jorm, 2021. "Indigenous and Tribal Peoples Data Governance in Health Research: A Systematic Review," IJERPH, MDPI, vol. 18(19), pages 1-22, September.
    6. Peng Zhang & Maged N. Kamel Boulos, 2022. "Privacy-by-Design Environments for Large-Scale Health Research and Federated Learning from Data," IJERPH, MDPI, vol. 19(19), pages 1-13, September.
    7. Martin Huber, 2024. "An introduction to causal discovery," Swiss Journal of Economics and Statistics, Springer;Swiss Society of Economics and Statistics, vol. 160(1), pages 1-16, December.
    8. Boylan, Sally & Arsenault, Catherine & Barreto, Marcos & Bozza, Fernando A & Fonseca, Adalton & Forde, Eoghan & Hookham, Lauren & Humphreys, Georgina S & Ichihara, Maria Yury & Le doare, Kirsty & Liu,, 2024. "Data challenges for international health emergencies: lessons learned from ten international COVID-19 driver projects," LSE Research Online Documents on Economics 122811, London School of Economics and Political Science, LSE Library.
    9. Patrick Rehill & Nicholas Biddle, 2023. "Transparency challenges in policy evaluation with causal machine learning -- improving usability and accountability," Papers 2310.13240, arXiv.org, revised Mar 2024.
    10. Ricardo Arcos & Ana Esteban & Eugenia Koblents & Emma Perez, 2026. "Main outcomes of the INEXDA working group on Statistical Disclosure Control (SDC)," IFC Bulletins chapters, in: Bank for International Settlements (ed.), Statistics and beyond: new data for decision making in central banks, volume 66, Bank for International Settlements.
    11. Elisa Stumpf & Silke Uebelmesser, 2024. "Lifting the Veil of Ignorance – Survey Experiments on Preferences for Wealth Redistribution," CESifo Working Paper Series 11126, CESifo.
    12. Sallie-Anne Pearson & Nicole Pratt & Juliana de Oliveira Costa & Helga Zoega & Tracey-Lea Laba & Christopher Etherton-Beer & Frank M. Sanfilippo & Alice Morgan & Lisa Kalisch Ellett & Claudia Bruno & , 2021. "Generating Real-World Evidence on the Quality Use, Benefits and Safety of Medicines in Australia: History, Challenges and a Roadmap for the Future," IJERPH, MDPI, vol. 18(24), pages 1-20, December.
    13. Patrick Dylong & Silke Uebelmesser, 2023. "Intergroup Contact and Exposure to Information about Immigrants: Experimental Evidence," CESifo Working Paper Series 10808, CESifo.
    14. Patrick Rehill & Nicholas Biddle, 2023. "Fairness Implications of Heterogeneous Treatment Effect Estimation with Machine Learning Methods in Policy-making," Papers 2309.00805, arXiv.org.
    15. Drini Imami & Orjon Xhoxhi & Davit Babayan & Thomas Herzfeld & Rustam Rakhmetov, 2026. "Treatment Effect Heterogeneity of Farmers Participating in Cooperatives," Journal of Agricultural Economics, Wiley Blackwell, vol. 77(1), pages 94-106, February.
