Printed from https://ideas.repec.org/p/arx/papers/2209.06631.html

Sample Fit Reliability

Authors

  • Gabriel Okasa
  • Kenneth A. Younge

Abstract

Researchers frequently test and improve model fit by holding a sample constant and varying the model. We propose methods to test and improve sample fit by holding a model constant and varying the sample. Much as the bootstrap is a well-known method to re-sample data and estimate the uncertainty of the fit of parameters in a model, we develop Sample Fit Reliability (SFR) as a set of computational methods to re-sample data and estimate the reliability of the fit of observations in a sample. SFR uses Scoring to assess the reliability of each observation in a sample, Annealing to check the sensitivity of results to removing unreliable data, and Fitting to re-weight observations for more robust analysis. We provide simulation evidence to demonstrate the advantages of using SFR, and we replicate three empirical studies with treatment effects to illustrate how SFR reveals new insights about each study.
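
The abstract does not spell out the SFR algorithms, so the following is only a minimal sketch of the general idea it describes, not the authors' procedure. It contrasts the classic bootstrap (re-sample rows, look at the spread of the fitted parameters) with a hypothetical per-observation score (re-sample rows, look at how badly each held-out observation is predicted). The OLS model, the function names `bootstrap_se` and `observation_scores`, the 50% subsample fraction, the mean-absolute-residual score, and the 95% trimming cutoff are all assumptions made for illustration.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for y ~ X (X already holds an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def bootstrap_se(X, y, n_draws=300, seed=0):
    """Classic bootstrap: re-sample rows with replacement, refit the model,
    and report the standard deviation of each coefficient across draws."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_draws, X.shape[1]))
    for b in range(n_draws):
        idx = rng.integers(0, n, size=n)
        draws[b] = fit_ols(X[idx], y[idx])
    return draws.std(axis=0, ddof=1)


def observation_scores(X, y, n_draws=300, frac=0.5, seed=0):
    """Hypothetical per-observation score (an assumption, not the paper's
    definition): the mean absolute residual of each observation under models
    fit on random subsamples that exclude it. Larger means a worse fit."""
    rng = np.random.default_rng(seed)
    n = len(y)
    m = int(frac * n)
    total = np.zeros(n)
    count = np.zeros(n)
    for _ in range(n_draws):
        idx = rng.choice(n, size=m, replace=False)
        held_out = np.ones(n, dtype=bool)
        held_out[idx] = False
        beta = fit_ols(X[idx], y[idx])
        total[held_out] += np.abs(y[held_out] - X[held_out] @ beta)
        count[held_out] += 1
    return total / np.maximum(count, 1)


# Simulated data: y = 1 + 2x + noise, with row 0 deliberately contaminated.
rng = np.random.default_rng(42)
n = 100
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
y[0] += 25.0

se = bootstrap_se(X, y)            # uncertainty of the parameters
scores = observation_scores(X, y)  # fit of each observation
worst = int(np.argmax(scores))     # the contaminated row 0 scores worst

# Crude sensitivity check: refit after dropping the worst-scoring 5% of rows.
keep = scores < np.quantile(scores, 0.95)
beta_all = fit_ols(X, y)
beta_trimmed = fit_ols(X[keep], y[keep])
```

Comparing `beta_all` with `beta_trimmed` shows how much the estimates move once poorly fitting rows are set aside, which is the kind of question the abstract attributes to Annealing; re-weighting rows by a decreasing function of `scores` would be the analogous illustration for Fitting.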

Suggested Citation

  • Gabriel Okasa & Kenneth A. Younge, 2022. "Sample Fit Reliability," Papers 2209.06631, arXiv.org.
  • Handle: RePEc:arx:papers:2209.06631

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2209.06631
    File Function: Latest version
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. repec:oup:emjrnl:v:25:y:2022:i:3:p:602-627. is not listed on IDEAS
    2. Denis Fougère & Nicolas Jacquemet, 2020. "Policy Evaluation Using Causal Inference Methods," SciencePo Working papers Main hal-03455978, HAL.
    3. Knaus, Michael C., 2020. "Double Machine Learning based Program Evaluation under Unconfoundedness," Economics Working Paper Series 2004, University of St. Gallen, School of Economics and Political Science.
    4. Ganesh Karapakula, 2023. "Stable Probability Weighting: Large-Sample and Finite-Sample Estimation and Inference Methods for Heterogeneous Causal Effects of Multivalued Treatments Under Limited Overlap," Papers 2301.05703, arXiv.org, revised Jan 2023.
    5. Harsh Parikh & Carlos Varjao & Louise Xu & Eric Tchetgen Tchetgen, 2022. "Validating Causal Inference Methods," Papers 2202.04208, arXiv.org, revised Jul 2022.
    6. Nikolas Kuschnig & Gregor Zens & Jesús Crespo Cuaresma, 2021. "Hidden in Plain Sight: Influential Sets in Linear Models," CESifo Working Paper Series 8981, CESifo.
    7. Gabriel Okasa, 2022. "Meta-Learners for Estimation of Causal Effects: Finite Sample Cross-Fit Performance," Papers 2201.12692, arXiv.org.
    8. Michael Lechner, 2023. "Causal Machine Learning and its use for public policy," Swiss Journal of Economics and Statistics, Springer;Swiss Society of Economics and Statistics, vol. 159(1), pages 1-15, December.
    9. Goller, Daniel & Lechner, Michael & Moczall, Andreas & Wolff, Joachim, 2020. "Does the estimation of the propensity score by machine learning improve matching estimation? The case of Germany's programmes for long term unemployed," Labour Economics, Elsevier, vol. 65(C).
    10. Michael Lechner & Jana Mareckova, 2022. "Modified Causal Forest," Papers 2209.03744, arXiv.org.
    11. Brathwaite, Timothy & Walker, Joan L., 2018. "Causal inference in travel demand modeling (and the lack thereof)," Journal of choice modelling, Elsevier, vol. 26(C), pages 1-18.
    12. Daniel Goller, 2023. "Analysing a built-in advantage in asymmetric darts contests using causal machine learning," Annals of Operations Research, Springer, vol. 325(1), pages 649-679, June.
    13. Mark Kattenberg & Bas Scheer & Jurre Thiel, 2023. "Causal forests with fixed effects for treatment effect heterogeneity in difference-in-differences," CPB Discussion Paper 452, CPB Netherlands Bureau for Economic Policy Analysis.
    14. Cockx, Bart & Lechner, Michael & Bollens, Joost, 2023. "Priority to unemployed immigrants? A causal machine learning evaluation of training in Belgium," Labour Economics, Elsevier, vol. 80(C).
    15. Daniel Boller & Michael Lechner & Gabriel Okasa, 2021. "The Effect of Sport in Online Dating: Evidence from Causal Machine Learning," Papers 2104.04601, arXiv.org.
    16. Phillip Heiler & Michael C. Knaus, 2021. "Effect or Treatment Heterogeneity? Policy Evaluation with Aggregated and Disaggregated Treatments," Papers 2110.01427, arXiv.org, revised Aug 2023.
    17. Arun Advani & Toru Kitagawa & Tymon Słoczyński, 2019. "Mostly harmless simulations? Using Monte Carlo studies for estimator selection," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 34(6), pages 893-910, September.
    18. Valente, Marica, 2023. "Policy evaluation of waste pricing programs using heterogeneous causal effect estimation," Journal of Environmental Economics and Management, Elsevier, vol. 117(C).
    19. Advani, Arun & Sloczynski, Tymon, 2013. "Mostly Harmless Simulations? On the Internal Validity of Empirical Monte Carlo Studies," IZA Discussion Papers 7874, Institute of Labor Economics (IZA).
    20. Goller, Daniel & Harrer, Tamara & Lechner, Michael & Wolff, Joachim, 2021. "Active labour market policies for the long-term unemployed: New evidence from causal machine learning," Economics Working Paper Series 2108, University of St. Gallen, School of Economics and Political Science.
    21. Anna Baiardi & Andrea A. Naghi, 2021. "The Value Added of Machine Learning to Causal Inference: Evidence from Revisited Studies," Papers 2101.00878, arXiv.org.
