The Role of Sample Size to Attain Statistically Comparable Groups – A Required Data Preprocessing Step to Estimate Causal Effects With Observational Data



Author

Ana Kolar
Peter M. Steiner

Abstract

Background: Propensity score methods provide data preprocessing tools to remove selection bias and attain statistically comparable groups, the first requirement when attempting to estimate causal effects with observational data. Although guidelines exist on how to remove selection bias when the groups being compared are large, little is known about how to proceed when one of the groups, for example the treated group, is particularly small, or when the study also includes many observed covariates (relative to the treated group's sample size). Objectives: This article investigates whether propensity score methods can help remove selection bias in studies with small treated groups and a large number of observed covariates. Measures: We perform a series of simulation studies to examine factors such as the sample size ratio of control to treated units, the number of observed covariates, and the initial imbalances in observed covariates between the groups being compared, that is, selection bias. Results: The results demonstrate that selection bias can be removed with small treated samples, but under different conditions than in studies with large treated samples. For example, a study design with 10 observed covariates and eight treated units will require the control group to be at least 10 times larger than the treated group, whereas a study with 500 treated units will require a control group only at least twice as large. Conclusions: To confirm the usefulness of the simulation results for practice, we carry out an empirical evaluation with real data. The study provides insights for practice and directions for future research.
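
To make the quantities in the abstract concrete, here is a minimal Python sketch of one simulated data set with eight treated units, 10 covariates, and a 10:1 control-to-treated ratio. It is not the authors' simulation design. The mean shift used to create selection bias, the ridge penalty in the propensity score model, the greedy 1:1 matching rule, and all function names are assumptions chosen only for illustration. The sketch compares covariate balance, measured as standardized mean differences, before and after matching on the estimated propensity score.

```python
import numpy as np


def simulate_groups(n_treated=8, ratio=10, n_covariates=10, shift=0.3, seed=0):
    """Draw covariates for both groups; `shift` (assumed) creates the initial imbalance."""
    rng = np.random.default_rng(seed)
    x_treated = rng.normal(loc=shift, size=(n_treated, n_covariates))
    x_control = rng.normal(loc=0.0, size=(n_treated * ratio, n_covariates))
    return x_treated, x_control


def standardized_mean_differences(x_treated, x_control):
    """Per-covariate mean difference divided by the pooled standard deviation."""
    diff = x_treated.mean(axis=0) - x_control.mean(axis=0)
    pooled_var = (x_treated.var(axis=0, ddof=1) + x_control.var(axis=0, ddof=1)) / 2.0
    return diff / np.sqrt(pooled_var)


def fit_propensity_scores(x, z, ridge=1.0, n_iter=50):
    """Ridge-penalized logistic regression fitted with Newton steps.

    The penalty is an assumption: it only keeps the fit stable when there
    are very few treated units relative to the number of covariates.
    """
    design = np.column_stack([np.ones(len(x)), x])
    penalty = ridge * np.eye(design.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        gradient = design.T @ (z - p) - penalty @ beta
        hessian = design.T @ (design * (p * (1.0 - p))[:, None]) + penalty
        beta = beta + np.linalg.solve(hessian, gradient)
    return 1.0 / (1.0 + np.exp(-design @ beta))


def greedy_match(score_treated, score_control):
    """1:1 nearest-neighbor matching without replacement on a scalar score."""
    available = np.ones(len(score_control), dtype=bool)
    matches = []
    for s in score_treated:
        distance = np.abs(score_control - s)
        distance[~available] = np.inf  # already used controls cannot be reused
        j = int(np.argmin(distance))
        available[j] = False
        matches.append(j)
    return np.array(matches)


def balance_before_and_after(x_treated, x_control):
    """Return SMDs before matching, SMDs after matching, and matched control indices."""
    x = np.vstack([x_treated, x_control])
    z = np.concatenate([np.ones(len(x_treated)), np.zeros(len(x_control))])
    ps = np.clip(fit_propensity_scores(x, z), 1e-6, 1.0 - 1e-6)
    logit = np.log(ps / (1.0 - ps))
    matched = greedy_match(logit[: len(x_treated)], logit[len(x_treated):])
    before = standardized_mean_differences(x_treated, x_control)
    after = standardized_mean_differences(x_treated, x_control[matched])
    return before, after, matched


if __name__ == "__main__":
    x_t, x_c = simulate_groups()
    before, after, _ = balance_before_and_after(x_t, x_c)
    print("mean |SMD| before matching:", np.abs(before).mean())
    print("mean |SMD| after matching: ", np.abs(after).mean())
```

Changing `ratio` or `n_treated` and repeating the draw over many seeds would give a rough sense of how the available pool of controls affects the balance that matching can reach, which is the kind of question the simulation studies address.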

Suggested Citation

Ana Kolar & Peter M. Steiner, 2021. "The Role of Sample Size to Attain Statistically Comparable Groups – A Required Data Preprocessing Step to Estimate Causal Effects With Observational Data," Evaluation Review, vol. 45(5), pages 195-227, October.

Handle: RePEc:sae:evarev:v:45:y:2021:i:5:p:195-227
DOI: 10.1177/0193841X211053937




Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Harsh Parikh & Cynthia Rudin & Alexander Volfovsky, 2018. "MALTS: Matching After Learning to Stretch," Papers 1811.07415, arXiv.org, revised Jun 2023.
Katherine Baicker & Theodore Svoronos, 2019. "Testing the Validity of the Single Interrupted Time Series Design," NBER Working Papers 26080, National Bureau of Economic Research, Inc.
Guido W. Imbens & Jeffrey M. Wooldridge, 2009. "Recent Developments in the Econometrics of Program Evaluation," Journal of Economic Literature, American Economic Association, vol. 47(1), pages 5-86, March.
- Guido M. Imbens & Jeffrey M. Wooldridge, 2008. "Recent Developments in the Econometrics of Program Evaluation," NBER Working Papers 14251, National Bureau of Economic Research, Inc.
- Wooldridge, Jeffrey M. & Imbens, Guido, 2009. "Recent Developments in the Econometrics of Program Evaluation," Scholarly Articles 3043416, Harvard University Department of Economics.
- Guido Imbens & Jeffrey M. Wooldridge, 2008. "Recent developments in the econometrics of program evaluation," CeMMAP working papers CWP24/08, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
- Imbens, Guido W. & Wooldridge, Jeffrey M., 2008. "Recent Developments in the Econometrics of Program Evaluation," IZA Discussion Papers 3640, Institute of Labor Economics (IZA).
Timothy B. Armstrong & Michal Kolesár, 2021. "Finite‐Sample Optimal Estimation and Inference on Average Treatment Effects Under Unconfoundedness," Econometrica, Econometric Society, vol. 89(3), pages 1141-1177, May.
- Timothy B. Armstrong & Michal Kolesár, 2017. "Finite-Sample Optimal Estimation and Inference on Average Treatment Effects Under Unconfoundedness," Cowles Foundation Discussion Papers 2115R, Cowles Foundation for Research in Economics, Yale University, revised Dec 2018.
- Timothy B. Armstrong & Michal Kolesár, 2017. "Finite-Sample Optimal Estimation and Inference on Average Treatment Effects Under Unconfoundedness," Cowles Foundation Discussion Papers 2115, Cowles Foundation for Research in Economics, Yale University.
- Timothy B. Armstrong & Michal Kolesár, 2017. "Finite-Sample Optimal Estimation and Inference on Average Treatment Effects Under Unconfoundedness," Papers 1712.04594, arXiv.org, revised Jan 2021.
Arun Advani & Toru Kitagawa & Tymon Słoczyński, 2019. "Mostly harmless simulations? Using Monte Carlo studies for estimator selection," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 34(6), pages 893-910, September.
- Arun Advani & Toru Kitagawa & Tymon Słoczyński, 2018. "Mostly Harmless Simulations? Using Monte Carlo Studies for Estimator Selection," Papers 1809.09527, arXiv.org, revised Apr 2019.
- Advani, Arun & Kitagawa, Toru & Słoczyński, Tymon, 2019. "Mostly Harmless Simulations? Using Monte Carlo Studies for Estimator Selection," The Warwick Economics Research Paper Series (TWERPS) 1192, University of Warwick, Department of Economics.
- Advani, Arun & Kitagawa, Toru & Sloczynski, Tymon, 2019. "Mostly Harmless Simulations? Using Monte Carlo Studies for Estimator Selection," CAGE Online Working Paper Series 411, Competitive Advantage in the Global Economy (CAGE).
Advani, Arun & Sloczynski, Tymon, 2013. "Mostly Harmless Simulations? On the Internal Validity of Empirical Monte Carlo Studies," IZA Discussion Papers 7874, Institute of Labor Economics (IZA).
- Advani, Arun & Kitagawa, Toru & Sloczynski, Tymon, 2018. "Mostly Harmless Simulations? On the Internal Validity of Empirical Monte Carlo Studies," IZA Discussion Papers 11862, Institute of Labor Economics (IZA).
- Arun Advani & Toru Kitagawa & Tymon Sloczynski, 2018. "Mostly Harmless Simulations? On the Internal Validity of Empirical Monte Carlo Studies," Working Papers 124, Brandeis University, Department of Economics and International Business School.
- Arun Advani & Toru Kitagawa & Tymon Sloczynski, 2018. "Mostly harmless simulations? On the internal validity of empirical Monte Carlo studies," CeMMAP working papers CWP56/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
- Arun Advani & Tymon Sloczynski, 2013. "Mostly harmless simulations? On the internal validity of empirical Monte Carlo studies," CeMMAP working papers CWP64/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
- Arun Advani & Tymon Słoczyński, 2013. "Mostly harmless simulations? On the internal validity of empirical Monte Carlo studies," CeMMAP working papers 64/13, Institute for Fiscal Studies.
Ferman, Bruno, 2021. "Matching estimators with few treated and many control observations," Journal of Econometrics, Elsevier, vol. 225(2), pages 295-307.
- Ferman, Bruno, 2017. "Matching Estimators with Few Treated and Many Control Observations," MPRA Paper 78940, University Library of Munich, Germany.
- Bruno Ferman, 2019. "Matching Estimators with Few Treated and Many Control Observations," Papers 1909.05093, arXiv.org, revised Mar 2021.
Gary King & Christopher Lucas & Richard A. Nielsen, 2017. "The Balance‐Sample Size Frontier in Matching Methods for Causal Inference," American Journal of Political Science, John Wiley & Sons, vol. 61(2), pages 473-489, April.
Hainmueller, Jens, 2012. "Entropy Balancing for Causal Effects: A Multivariate Reweighting Method to Produce Balanced Samples in Observational Studies," Political Analysis, Cambridge University Press, vol. 20(1), pages 25-46, January.
Harsh Parikh & Carlos Varjao & Louise Xu & Eric Tchetgen Tchetgen, 2022. "Validating Causal Inference Methods," Papers 2202.04208, arXiv.org, revised Jul 2022.
Katherine Baicker & Theodore Svoronos, 2019. "Testing the Validity of the Single Interrupted Time Series Design," CID Working Papers 364, Center for International Development at Harvard University.
Vivian C. Wong & Peter M. Steiner & Kylie L. Anglin, 2018. "What Can Be Learned From Empirical Evaluations of Nonexperimental Methods?," Evaluation Review, vol. 42(2), pages 147-175, April.
Dettmann, E. & Becker, C. & Schmeißer, C., 2011. "Distance functions for matching in small samples," Computational Statistics & Data Analysis, Elsevier, vol. 55(5), pages 1942-1960, May.
Susan Athey & Guido W. Imbens & Stefan Wager, 2018. "Approximate residual balancing: debiased inference of average treatment effects in high dimensions," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 80(4), pages 597-623, September.
- Susan Athey & Guido W. Imbens & Stefan Wager, 2016. "Approximate Residual Balancing: De-Biased Inference of Average Treatment Effects in High Dimensions," Papers 1604.07125, arXiv.org, revised Jan 2018.
Augurzky, Boris & Kluve, Jochen, 2004. "Assessing the performance of matching algorithms when selection into treatment is strong," RWI Discussion Papers 21, RWI - Leibniz-Institut für Wirtschaftsforschung.
Tymon Słoczyński, 2015. "The Oaxaca–Blinder Unexplained Component as a Treatment Effects Estimator," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 77(4), pages 588-604, August.
- Tymon Sloczynski, 2012. "The Oaxaca-Blinder unexplained component as a treatment effects estimator," Working Papers 61, Department of Applied Econometrics, Warsaw School of Economics.
- Słoczyński, Tymon, 2013. "The Oaxaca–Blinder Unexplained Component as a Treatment Effects Estimator," MPRA Paper 50660, University Library of Munich, Germany.
Alberto Abadie & Guido W. Imbens, 2002. "Simple and Bias-Corrected Matching Estimators for Average Treatment Effects," NBER Technical Working Papers 0283, National Bureau of Economic Research, Inc.
Tymon Słoczyński, 2022. "Interpreting OLS Estimands When Treatment Effects Are Heterogeneous: Smaller Groups Get Larger Weights," The Review of Economics and Statistics, MIT Press, vol. 104(3), pages 501-509, May.
- Sloczynski, Tymon, 2020. "Interpreting OLS Estimands When Treatment Effects Are Heterogeneous: Smaller Groups Get Larger Weights," IZA Discussion Papers 13283, Institute of Labor Economics (IZA).
- Tymon Sloczynski & Tymon Słoczyński, 2020. "Interpreting OLS Estimands When Treatment Effects Are Heterogeneous: Smaller Groups Get Larger Weights," CESifo Working Paper Series 8331, CESifo.
Sloczynski, Tymon, 2018. "A General Weighted Average Representation of the Ordinary and Two-Stage Least Squares Estimands," IZA Discussion Papers 11866, Institute of Labor Economics (IZA).
- Tymon Sloczynski, 2018. "A General Weighted Average Representation of the Ordinary and Two-Stage Least Squares Estimands," Working Papers 125, Brandeis University, Department of Economics and International Business School.
Smith, Jeffrey A. & Todd, Petra E., 2005. "Does matching overcome LaLonde's critique of nonexperimental estimators?," Journal of Econometrics, Elsevier, vol. 125(1-2), pages 305-353.
- Jeffrey Smith & Petra Todd, 2003. "Does Matching Overcome Lalonde's Critique of Nonexperimental Estimators?," University of Western Ontario, Centre for Human Capital and Productivity (CHCP) Working Papers 20035, University of Western Ontario, Centre for Human Capital and Productivity (CHCP).
