Partial Identification under Missing Data Using Weak Shadow Variables from Pretrained Models

Authors
  • Hongyu Chen
  • David Simchi-Levi
  • Ruoxuan Xiong

Abstract

Estimating population quantities such as mean outcomes from user feedback is fundamental to platform evaluation and social science, yet feedback is often missing not at random (MNAR): users with stronger opinions are more likely to respond, so standard estimators are biased and the estimand is not identified without additional assumptions. Existing approaches typically rely on strong parametric assumptions or bespoke auxiliary variables that may be unavailable in practice. In this paper, we develop a partial identification framework in which sharp bounds on the estimand are obtained by solving a pair of linear programs whose constraints encode the observed data structure. This formulation naturally incorporates outcome predictions from pretrained models, including large language models (LLMs), as additional linear constraints that tighten the feasible set. We call these predictions weak shadow variables: they satisfy a conditional independence assumption with respect to missingness but need not meet the completeness conditions required by classical shadow-variable methods. When predictions are sufficiently informative, the bounds collapse to a point, recovering standard identification as a special case. For finite samples, we propose a set-expansion estimator that provides valid coverage of the identified set; it converges at a slower-than-√n rate in the set-identified regime and at the standard √n rate under point identification. In simulations and semi-synthetic experiments on customer-service dialogues, we find that LLM predictions are often ill-conditioned for classical shadow-variable methods yet remain highly effective in our framework, shrinking identification intervals by 75-83% while maintaining valid coverage under realistic MNAR mechanisms.
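The linear-programming idea in the abstract (bound a mean by optimizing over the unobserved outcome distribution of nonrespondents, subject to linear constraints that a prediction W tightens) can be illustrated with a small sketch. It is a simplified illustration, not the authors' program. It assumes a discrete outcome Y, a discrete prediction W, the shadow-variable condition that W is independent of the response indicator R given Y (R = 1 when Y is observed), no covariates, and that every outcome level occurs among respondents. The unknowns u_y = P(Y = y, R = 0) must reproduce the observed distribution of W among nonrespondents, and E[Y] is minimized and maximized over that set. Population probabilities are plugged in directly, so there is no sampling error and no set-expansion step, and the two linear programs are solved by enumerating basic feasible solutions with exact fractions instead of calling an LP solver. The function names `shadow_bounds` and `_unique_solution` and the toy numbers are hypothetical; the numbers were constructed so that extreme outcomes respond more often and the true mean is 2.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical sketch (not the paper's code): sharp bounds on E[Y] for a
# discrete outcome Y missing not at random, assuming the prediction W is a
# shadow variable (W independent of R given Y) and P(Y=y, R=1) > 0 for all y.


def _unique_solution(cols, b):
    """Exactly solve sum_j x[j] * cols[j] == b. Return x if the columns are
    linearly independent and the system is consistent, otherwise None."""
    m, k = len(b), len(cols)
    M = [[cols[j][i] for j in range(k)] + [b[i]] for i in range(m)]  # [A_S | b]
    for c in range(k):  # Gauss-Jordan elimination; pivot of column c ends in row c
        piv = next((r for r in range(c, m) if M[r][c] != 0), None)
        if piv is None:
            return None  # dependent columns: not a basic solution
        M[c], M[piv] = M[piv], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(m):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [a - f * p for a, p in zip(M[r], M[c])]
    if any(M[r][k] != 0 for r in range(k, m)):
        return None  # a leftover row reads 0 == nonzero: inconsistent
    return [M[r][k] for r in range(k)]


def shadow_bounds(y_values, p1, p0):
    """Lower and upper bounds on E[Y].

    p1[i][w] = P(Y = y_values[i], W = w, R = 1)   (respondents)
    p0[w]    = P(W = w, R = 0)                     (nonrespondents, Y missing)
    Unknowns u[i] = P(Y = y_values[i], R = 0) >= 0 must satisfy, for every w,
    sum_i u[i] * P(W = w | Y = y_values[i], R = 1) == p0[w].
    """
    p1_y = [sum(row) for row in p1]  # P(Y = y_i, R = 1), assumed > 0
    cols = [[pw / p1_y[i] for pw in p1[i]] for i in range(len(y_values))]
    base = sum(y * p for y, p in zip(y_values, p1_y))  # E[Y * R]
    # Summing the constraints gives sum(u) = P(R = 0), so the feasible set is a
    # bounded polytope: both optima sit at a vertex (basic feasible solution).
    values = []
    for k in range(len(y_values) + 1):
        for S in combinations(range(len(y_values)), k):
            x = _unique_solution([cols[i] for i in S], p0)
            if x is not None and all(v >= 0 for v in x):
                values.append(base + sum(y_values[i] * v for i, v in zip(S, x)))
    if not values:
        raise ValueError("no feasible u: the data contradict the assumptions")
    return min(values), max(values)


F = Fraction
y_values = [1, 2, 3]
# Toy respondent table: rows are outcome levels, columns are W = 0, 1.
p1 = [[F(27, 125), F(3, 125)],
      [F(1, 10), F(1, 10)],
      [F(9, 125), F(21, 125)]]
p0 = [F(43, 250), F(37, 250)]
print(shadow_bounds(y_values, p1, p0))
# -> (Fraction(197, 100), Fraction(31, 15))
# Uninformative prediction: merge W into one level (worst-case bounds).
print(shadow_bounds(y_values, [[sum(r)] for r in p1], [sum(p0)]))
# -> (Fraction(42, 25), Fraction(58, 25))
```

In this toy table the prediction narrows the bounds from [1.68, 2.32] to roughly [1.97, 2.07], and both intervals contain the true mean of 2. If the columns P(W = · | Y = y, R = 1) are linearly independent (which needs at least as many prediction levels as outcome levels), the equality constraints pin down u and the two bounds coincide, which mirrors the point-identification case described in the abstract. Vertex enumeration grows exponentially with the number of outcome levels; for realistic problems an LP solver such as `scipy.optimize.linprog` would be the natural replacement.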

Suggested Citation

  • Hongyu Chen & David Simchi-Levi & Ruoxuan Xiong, 2026. "Partial Identification under Missing Data Using Weak Shadow Variables from Pretrained Models," Papers 2602.16061, arXiv.org.
  • Handle: RePEc:arx:papers:2602.16061

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2602.16061
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Guido W. Imbens & Charles F. Manski, 2004. "Confidence Intervals for Partially Identified Parameters," Econometrica, Econometric Society, vol. 72(6), pages 1845-1857, November.
    2. Magne Mogstad & Andres Santos & Alexander Torgovitsky, 2018. "Using Instrumental Variables for Inference About Policy Relevant Treatment Parameters," Econometrica, Econometric Society, vol. 86(5), pages 1589-1619, September.
    3. Victor Chernozhukov & Han Hong & Elie Tamer, 2007. "Estimation and Confidence Regions for Parameter Sets in Econometric Models," Econometrica, Econometric Society, vol. 75(5), pages 1243-1284, September.
    4. Hiroaki Kaido & Francesca Molinari & Jörg Stoye, 2019. "Confidence Intervals for Projections of Partially Identified Parameters," Econometrica, Econometric Society, vol. 87(4), pages 1397-1432, July.
    5. Arie Beresteanu & Francesca Molinari, 2008. "Asymptotic Properties for a Class of Partially Identified Models," Econometrica, Econometric Society, vol. 76(4), pages 763-814, July.
    6. Yuan Gao & Dokyun Lee & Gordon Burtch & Sina Fazelpour, 2024. "Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina," Papers 2410.19599, arXiv.org, revised Jan 2025.
    7. Jason Abrevaya & Stephen G. Donald, 2017. "A GMM Approach for Dealing with Missing Data on Regressors," The Review of Economics and Statistics, MIT Press, vol. 99(4), pages 657-662, July.
    8. Peiyao Li & Noah Castelo & Zsolt Katona & Miklos Sarvary, 2024. "Frontiers: Determining the Validity of Large Language Models for Automated Perceptual Analysis," Marketing Science, INFORMS, vol. 43(2), pages 254-266, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Francesca Molinari, 2020. "Microeconometrics with partial identification," Handbook of Econometrics, in: Steven N. Durlauf & Lars Peter Hansen & James J. Heckman & Rosa L. Matzkin (ed.), Handbook of Econometrics, edition 1, volume 7, chapter 0, pages 355-486, Elsevier.
    2. Xiaohong Chen & Timothy M. Christensen & Elie Tamer, 2018. "Monte Carlo Confidence Sets for Identified Sets," Econometrica, Econometric Society, vol. 86(6), pages 1965-2018, November.
    3. Xiaohong Chen & Timothy M. Christensen & Keith O'Hara & Elie Tamer, 2016. "MCMC confidence sets for identified sets," CeMMAP working papers 28/16, Institute for Fiscal Studies.
    4. Yuan Liao & Anna Simoni, 2019. "Bayesian inference for partially identified smooth convex models," Journal of Econometrics, Elsevier, vol. 211(2), pages 338-360.
    5. Raffaella Giacomini & Toru Kitagawa, 2021. "Robust Bayesian Inference for Set‐Identified Models," Econometrica, Econometric Society, vol. 89(4), pages 1519-1556, July.
    6. Xiaohong Chen & Timothy Christensen & Keith O'Hara & Elie Tamer, 2016. "MCMC Confidence sets for Identified Sets," Cowles Foundation Discussion Papers 2037R, Cowles Foundation for Research in Economics, Yale University, revised Jul 2016.
    7. Felix Chan & Laszlo Matyas & Agoston Reguly, 2024. "Modelling with Sensitive Variables," Papers 2403.15220, arXiv.org, revised Sep 2025.
    8. Isaiah Andrews & Jonathan Roth & Ariel Pakes, 2023. "Inference for Linear Conditional Moment Inequalities," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 90(6), pages 2763-2791.
    9. Xiaohong Chen & Timothy M. Christensen & Elie Tamer, 2017. "Monte Carlo confidence sets for identified sets," CeMMAP working papers 43/17, Institute for Fiscal Studies.
    10. Francesca Molinari, 2019. "Econometrics with Partial Identification," CeMMAP working papers CWP25/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    11. Donald S. Poskitt & Xueyan Zhao, 2023. "Bootstrap Hausdorff Confidence Regions for Average Treatment Effect Identified Sets," Monash Econometrics and Business Statistics Working Papers 9/23, Monash University, Department of Econometrics and Business Statistics.
    12. Hiroaki Kaido & Francesca Molinari & Jörg Stoye, 2019. "Confidence Intervals for Projections of Partially Identified Parameters," Econometrica, Econometric Society, vol. 87(4), pages 1397-1432, July.
    13. Kate Ho & Adam M. Rosen, 2015. "Partial Identification in Applied Research: Benefits and Challenges," CEPR Discussion Papers 10883, C.E.P.R. Discussion Papers.
    14. Arun G. Chandrasekhar & Victor Chernozhukov & Francesca Molinari & Paul Schrimpf, 2019. "Best Linear Approximations to Set Identified Functions: With an Application to the Gender Wage Gap," NBER Working Papers 25593, National Bureau of Economic Research, Inc.
    15. Yuan Liao & Anna Simoni, 2012. "Semi-parametric Bayesian Partially Identified Models based on Support Function," Papers 1212.3267, arXiv.org, revised Nov 2013.
    16. Vira Semenova, 2023. "Debiased machine learning of set-identified linear models," Journal of Econometrics, Elsevier, vol. 235(2), pages 1725-1746.
    17. Federico A. Bugni & Ivan A. Canay & Xiaoxia Shi, 2014. "Inference for functions of partially identified parameters in moment inequality models," CeMMAP working papers 22/14, Institute for Fiscal Studies.
    18. Donald W. K. Andrews & Xiaoxia Shi, 2013. "Inference Based on Conditional Moment Inequalities," Econometrica, Econometric Society, vol. 81(2), pages 609-666, March.
    19. Lukáš Lafférs, 2019. "Identification in Models with Discrete Variables," Computational Economics, Springer;Society for Computational Economics, vol. 53(2), pages 657-696, February.
    20. Sokbae Lee & Kyungchul Song & Yoon-Jae Whang, 2018. "Testing For A General Class Of Functional Inequalities," Econometric Theory, Cambridge University Press, vol. 34(5), pages 1018-1064, October.

    More about this item

    NEP fields

    This paper has been announced in the following NEP Reports:

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:2602.16061. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. Registration allows you to link your profile to this item and to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: arXiv administrators. General contact details of provider: http://arxiv.org/

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.