IDEAS home Printed from https://ideas.repec.org/p/arx/papers/2509.01861.html

On the role of the design phase in a linear regression

Author

Listed:
  • Junho Choi

Abstract

The "design phase" refers to the stage of an observational study in which a researcher constructs a subsample that achieves better balance in covariate distributions between treated and untreated units. In this paper, we study the role of this preliminary phase in the context of linear regression and offer a justification for its utility. To that end, we first formalize the design phase as a process of estimand adjustment via subsample selection. We then show that the covariate balance of a subsample is indeed a justifiable criterion for guiding the selection: it indicates the maximum degree of model misspecification that can be allowed for a subsample when a researcher wishes to keep the bias of the estimand for the parameter of interest within a target level of precision. In this sense, the pursuit of a balanced subsample in the design phase can be interpreted as identifying an estimand that is less susceptible to bias in the presence of model misspecification. We also demonstrate that covariate imbalance can serve as a sensitivity measure in regression analysis, and illustrate how it can structure communication between a researcher and the readers of her report.
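As a rough illustration of what "selecting a more balanced subsample" can mean in practice, the sketch below computes a standardized mean difference (SMD) for each covariate before and after a simple trimming step. It is not the paper's procedure: the SMD as the balance measure, the range-overlap trimming rule, the function names, and the toy data are all assumptions made for this example.

```python
import numpy as np


def standardized_mean_difference(x, d):
    """Per-covariate SMD between treated (d == 1) and untreated (d == 0) rows.

    x: array of shape (n, k); d: array of shape (n,) with 0/1 entries.
    The SMD is one common balance measure, assumed here for illustration.
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d).astype(bool)
    treated, control = x[d], x[~d]
    pooled_sd = np.sqrt(
        (treated.var(axis=0, ddof=1) + control.var(axis=0, ddof=1)) / 2.0
    )
    return (treated.mean(axis=0) - control.mean(axis=0)) / pooled_sd


def overlap_mask(x, d):
    """Boolean mask keeping rows inside the covariate range shared by both groups.

    This is a deliberately simple, hypothetical trimming rule.
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d).astype(bool)
    lo = np.maximum(x[d].min(axis=0), x[~d].min(axis=0))
    hi = np.minimum(x[d].max(axis=0), x[~d].max(axis=0))
    return np.all((x >= lo) & (x <= hi), axis=1)


# Toy data: one covariate, six untreated units followed by six treated units.
x = np.array([0, 1, 2, 3, 4, 5, 2, 3, 4, 5, 6, 7], dtype=float).reshape(-1, 1)
d = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

smd_full = standardized_mean_difference(x, d)
mask = overlap_mask(x, d)
smd_sub = standardized_mean_difference(x[mask], d[mask])

print("SMD, full sample:", smd_full)
print("Units kept:", int(mask.sum()))
print("SMD, trimmed subsample:", smd_sub)
```

In this toy case the trimmed subsample is perfectly balanced on the single covariate, at the cost of dropping a third of the units. A regression fitted on the subsample targets a different estimand than one fitted on the full sample, which is the trade-off the abstract describes.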

Suggested Citation

  • Junho Choi, 2025. "On the role of the design phase in a linear regression," Papers 2509.01861, arXiv.org.
  • Handle: RePEc:arx:papers:2509.01861

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2509.01861
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Richard K. Crump & V. Joseph Hotz & Guido W. Imbens & Oscar A. Mitnik, 2009. "Dealing with limited overlap in estimation of average treatment effects," Biometrika, Biometrika Trust, vol. 96(1), pages 187-199.
    2. Stéphane Bonhomme & Martin Weidner, 2022. "Minimizing sensitivity to model misspecification," Quantitative Economics, Econometric Society, vol. 13(3), pages 907-954, July.
    3. Guido W. Imbens, 2015. "Matching Methods in Practice: Three Examples," Journal of Human Resources, University of Wisconsin Press, vol. 50(2), pages 373-419.
    4. Alberto Abadie & Susan Athey & Guido W. Imbens & Jeffrey M. Wooldridge, 2020. "Sampling‐Based versus Design‐Based Uncertainty in Regression Analysis," Econometrica, Econometric Society, vol. 88(1), pages 265-296, January.
    5. Timothy B. Armstrong & Michal Kolesár, 2021. "Finite‐Sample Optimal Estimation and Inference on Average Treatment Effects Under Unconfoundedness," Econometrica, Econometric Society, vol. 89(3), pages 1141-1177, May.
    6. Akanksha Negi & Jeffrey M. Wooldridge, 2021. "Revisiting regression adjustment in experiments with heterogeneous treatment effects," Econometric Reviews, Taylor & Francis Journals, vol. 40(5), pages 504-534, April.
    7. Munnell, Alicia H. & Geoffrey M. B. Tootell & Lynn E. Browne & James McEneaney, 1996. "Mortgage Lending in Boston: Interpreting HMDA Data," American Economic Review, American Economic Association, vol. 86(1), pages 25-53, March.
    8. Alberto Abadie & Jann Spiess, 2022. "Robust Post-Matching Inference," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 117(538), pages 983-995, April.
    9. Isaiah Andrews & Matthew Gentzkow & Jesse M. Shapiro, 2017. "Measuring the Sensitivity of Parameter Estimates to Estimation Moments," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 132(4), pages 1553-1592.
    10. Imbens, Guido W. & Rubin, Donald B., 2015. "Causal Inference for Statistics, Social, and Biomedical Sciences," Cambridge Books, Cambridge University Press, number 9780521885881, November.
    11. Ho, Daniel E. & Imai, Kosuke & King, Gary & Stuart, Elizabeth A., 2007. "Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference," Political Analysis, Cambridge University Press, vol. 15(3), pages 199-236, July.
    12. Alberto Abadie & Guido W. Imbens & Fanyin Zheng, 2014. "Inference for Misspecified Models With Fixed Regressors," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(508), pages 1601-1614, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tymon Słoczyński, 2022. "Interpreting OLS Estimands When Treatment Effects Are Heterogeneous: Smaller Groups Get Larger Weights," The Review of Economics and Statistics, MIT Press, vol. 104(3), pages 501-509, May.
    2. Jeffrey Smith & Arthur Sweetman, 2016. "Viewpoint: Estimating the causal effects of policies and programs," Canadian Journal of Economics, Canadian Economics Association, vol. 49(3), pages 871-905, August.
    3. Stanislao Maldonado, 2024. "Empowering women through multifaceted interventions: long-term evidence from a double matching design," Journal of Population Economics, Springer;European Society for Population Economics, vol. 37(1), pages 1-44, March.
    4. Ganesh Karapakula, 2023. "Stable Probability Weighting: Large-Sample and Finite-Sample Estimation and Inference Methods for Heterogeneous Causal Effects of Multivalued Treatments Under Limited Overlap," Papers 2301.05703, arXiv.org, revised Jan 2023.
    5. Alberto Abadie & Susan Athey & Guido W. Imbens & Jeffrey M. Wooldridge, 2020. "Sampling‐Based versus Design‐Based Uncertainty in Regression Analysis," Econometrica, Econometric Society, vol. 88(1), pages 265-296, January.
    6. Özler, Berk & Çelik, Çiğdem & Cunningham, Scott & Cuevas, P. Facundo & Parisotto, Luca, 2021. "Children on the move: Progressive redistribution of humanitarian cash transfers among refugees," Journal of Development Economics, Elsevier, vol. 153(C).
    7. Guido W. Imbens, 2022. "Causality in Econometrics: Choice vs Chance," Econometrica, Econometric Society, vol. 90(6), pages 2541-2566, November.
    8. Tingting Zhou & Michael R. Elliott & Roderick J. A. Little, 2022. "Addressing Disparities in the Propensity Score Distributions for Treatment Comparisons from Observational Studies," Stats, MDPI, vol. 5(4), pages 1-17, December.
    9. Siying Guo & Jianxuan Liu & Qiu Wang, 2022. "Effective Learning During COVID-19: Multilevel Covariates Matching and Propensity Score Matching," Annals of Data Science, Springer, vol. 9(5), pages 967-982, October.
    10. Zhexiao Lin & Peng Ding & Fang Han, 2023. "Estimation Based on Nearest Neighbor Matching: From Density Ratio to Average Treatment Effect," Econometrica, Econometric Society, vol. 91(6), pages 2187-2217, November.
    11. Donna Feir & Kelly Foley & Maggie E. C. Jones, 2021. "The Distributional Impacts of Active Labor Market Programs for Indigenous Populations," AEA Papers and Proceedings, American Economic Association, vol. 111, pages 216-220, May.
    12. Wagener, Andreas & Zenker, Juliane, 2015. "Stochastic Transfers, Risky Investment and Incomes: Evidence from an Income Guarantee Program in Thailand," Hannover Economic Papers (HEP) dp-562, Leibniz Universität Hannover, Wirtschaftswissenschaftliche Fakultät.
    13. Goller, Daniel & Lechner, Michael & Moczall, Andreas & Wolff, Joachim, 2020. "Does the estimation of the propensity score by machine learning improve matching estimation? The case of Germany's programmes for long term unemployed," Labour Economics, Elsevier, vol. 65(C).
    14. Koch, Nicolas & Basse Mama, Houdou, 2019. "Does the EU Emissions Trading System induce investment leakage? Evidence from German multinational firms," Energy Economics, Elsevier, vol. 81(C), pages 479-492.
    15. Gerhard Krug, 2017. "Augmenting propensity score equations to avoid misspecification bias – Evidence from a Monte Carlo simulation [Erweiterung der Propensity Score Gleichung zur Vermeidung von Fehlspezifikationen? Ein," AStA Wirtschafts- und Sozialstatistisches Archiv, Springer;Deutsche Statistische Gesellschaft - German Statistical Society, vol. 11(3), pages 205-231, December.
    16. Haoge Chang, 2023. "Design-based Estimation Theory for Complex Experiments," Papers 2311.06891, arXiv.org, revised May 2025.
    17. Yuta Kikuchi & Ryo Nakajima, 2016. "Evaluating Professor Value-added: Evidence from Professor and Student Matching in Physics," Keio-IES Discussion Paper Series 2016-002, Institute for Economics Studies, Keio University.
    18. Haltiwanger, John C. & Kutzbach, Mark J. & Palloni, Giordano & Pollakowski, Henry O. & Staiger, Matthew & Weinberg, Daniel H., 2024. "The children of HOPE VI demolitions: National evidence on labor market outcomes," Journal of Public Economics, Elsevier, vol. 239(C).
    19. Takasaki, Yoshito, 2020. "Impacts of disability on poverty: Quasi-experimental evidence from landmine amputees in Cambodia," Journal of Economic Behavior & Organization, Elsevier, vol. 180(C), pages 85-107.
    20. Sokbae Lee & Martin Weidner, 2021. "Bounding Treatment Effects by Pooling Limited Information across Observations," Papers 2111.05243, arXiv.org, revised May 2025.
