
Data collection, weighting, and modeling techniques to estimate consistent population parameters

Author

Listed:
  • Robbennolt, Dale
  • Pendyala, Ram M.
  • Bhat, Chandra R.

Abstract

Empirical research studies regularly encounter sampling-related challenges that can impact the validity and reliability of model estimation results. This paper presents a comprehensive examination of the implications of nonrandom sampling for estimator consistency and asymptotic efficiency. Through theoretical and simulation-backed support, we underscore the importance of adopting appropriate sampling and estimation methods in two broad scenarios. First, we demonstrate that achieving range variation in exogenous variables, rather than strict population representativeness, is crucial for estimating individual-level causal relationships when sampling is based only on observed exogenous variables. Second, we investigate the efficacy of weighting approaches when sampling is endogenous and use a joint modeling approach to accommodate unobserved self-selection effects where traditional weighting approaches prove inadequate. Our proposed approach accommodates unobserved correlations and successfully recovers true population parameters when the joint distribution of exogenous variables in the population is known. The methodology also shows improved performance compared to existing methods even when only the population marginal distribution of exogenous variables is available. Notably, our simulation experiments extend beyond the conventional linear regression framework to include binary outcomes, providing crucial insights for nonlinear choice modeling applications. The findings underscore the importance of carefully considering sampling mechanisms and their implications for model estimation, while offering practical guidance for researchers facing various sampling-related challenges in empirical studies.
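The two scenarios in the abstract can be illustrated with a small simulation. The sketch below is not the authors' joint modeling approach. It is a minimal linear-regression illustration, assuming a hypothetical data-generating process (intercept 1, slope 2, standard normal errors) and assuming the sampling probabilities are known exactly, so that simple inverse-probability weights are available. The names `ols` and `simulate`, the sampling rates, and the thresholds are all invented for illustration.

```python
import numpy as np

def ols(X, y, w=None):
    """(Weighted) least squares: solves (X'WX) b = X'Wy."""
    if w is None:
        w = np.ones(len(y))
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

def simulate(n_pop=200_000, seed=0):
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, 2.0])  # hypothetical true intercept and slope
    x = rng.normal(size=n_pop)
    y = beta[0] + beta[1] * x + rng.normal(size=n_pop)
    X = np.column_stack([np.ones(n_pop), x])

    # Scenario 1: sampling depends only on the exogenous variable x.
    # The sample is far from representative, but x still varies widely.
    p_exo = np.where(x > 0, 0.9, 0.1)
    keep = rng.random(n_pop) < p_exo
    b_exo = ols(X[keep], y[keep])  # unweighted

    # Scenario 2: sampling depends on the outcome y (endogenous).
    p_endo = np.where(y > 1.0, 0.9, 0.1)
    keep = rng.random(n_pop) < p_endo
    b_endo_unw = ols(X[keep], y[keep])                     # unweighted
    b_endo_ipw = ols(X[keep], y[keep], 1.0 / p_endo[keep])  # weighted
    return beta, b_exo, b_endo_unw, b_endo_ipw

if __name__ == "__main__":
    beta, b_exo, b_unw, b_ipw = simulate()
    print("true parameters:           ", beta)
    print("exogenous, unweighted:     ", b_exo)
    print("endogenous, unweighted:    ", b_unw)
    print("endogenous, inverse-prob.: ", b_ipw)
```

Under these assumptions, the unweighted estimates in the first scenario should land close to the true values despite the unrepresentative sample, while the unweighted estimates in the second scenario should be visibly biased and the weighted ones should not. The weighting step works here only because the selection rule is fully observed; when selection also operates through unobserved factors correlated with the outcome, as the abstract notes, weighting alone may be inadequate.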

Suggested Citation

  • Robbennolt, Dale & Pendyala, Ram M. & Bhat, Chandra R., 2026. "Data collection, weighting, and modeling techniques to estimate consistent population parameters," Transportation Research Part B: Methodological, Elsevier, vol. 203(C).
  • Handle: RePEc:eee:transb:v:203:y:2026:i:c:s0191261525001985
    DOI: 10.1016/j.trb.2025.103349

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0191261525001985
    Download Restriction: Full text for ScienceDirect subscribers only



    Citations


    Cited by:

    1. Robbennolt, Dale & Haddad, Angela J. & Bhat, Chandra R., 2026. "A rank-based model of residential location preferences before and during the COVID-19 pandemic," Transportation Research Part A: Policy and Practice, Elsevier, vol. 203(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Prokhorov, Artem & Schmidt, Peter, 2009. "GMM redundancy results for general missing data problems," Journal of Econometrics, Elsevier, vol. 151(1), pages 47-55, July.
    2. Dylan Brewer & Alyssa Carlson, 2024. "Addressing sample selection bias for machine learning methods," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 39(3), pages 383-400, April.
    3. Martin Huber, 2014. "Treatment Evaluation in the Presence of Sample Selection," Econometric Reviews, Taylor & Francis Journals, vol. 33(8), pages 869-905, November.
    4. Esmeralda Ramalho, 2004. "Covariate Measurement Error in Endogenous Stratified Samples," Economics Working Papers 2_2004, University of Évora, Department of Economics (Portugal).
    5. Erika Spissu & Abdul Pinjari & Ram Pendyala & Chandra Bhat, 2009. "A copula-based joint multinomial discrete–continuous model of vehicle type choice and miles of travel," Transportation, Springer, vol. 36(4), pages 403-422, July.
    6. William C. Horrace & Hyunseok Jung & Shane Sanders, 2022. "Network Competition and Team Chemistry in the NBA," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 40(1), pages 35-49, January.
    7. Kyungchul Song, 2009. "Efficient Estimation of Average Treatment Effects under Treatment-Based Sampling," PIER Working Paper Archive 09-011, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania.
    8. Wiemann, Paul F.V. & Klein, Nadja & Kneib, Thomas, 2022. "Correcting for sample selection bias in Bayesian distributional regression models," Computational Statistics & Data Analysis, Elsevier, vol. 168(C).
    9. Nevo, Aviv, 2003. "Using Weights to Adjust for Sample Selection When Auxiliary Information Is Available," Journal of Business & Economic Statistics, American Statistical Association, vol. 21(1), pages 43-52, January.
    10. Esmeralda A. Ramalho & Richard J. Smith, 2013. "Discrete Choice Non-Response," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 80(1), pages 343-364.
    11. Esmeralda A. Ramalho & Richard Smith, 2003. "Discrete choice non-response," CeMMAP working papers 07/03, Institute for Fiscal Studies.
    12. Wooldridge, Jeffrey M., 2007. "Inverse probability weighted estimation for general missing data problems," Journal of Econometrics, Elsevier, vol. 141(2), pages 1281-1301, December.
    13. Ramalho Esmeralda A., 2010. "Covariate Measurement Error: Bias Reduction under Response-Based Sampling," Studies in Nonlinear Dynamics & Econometrics, De Gruyter, vol. 14(4), pages 1-34, September.
    14. Emmanuel O. Ogundimu & Jane L. Hutton, 2016. "A Sample Selection Model with Skew-normal Distribution," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 43(1), pages 172-190, March.
    15. Gary Solon & Steven J. Haider & Jeffrey M. Wooldridge, 2015. "What Are We Weighting For?," Journal of Human Resources, University of Wisconsin Press, vol. 50(2), pages 301-316.
    16. Bhattacharya, Debopam, 2005. "Asymptotic inference from multi-stage samples," Journal of Econometrics, Elsevier, vol. 126(1), pages 145-171, May.
    17. Bryan S. Graham & Cristine Campos De Xavier Pinto & Daniel Egel, 2012. "Inverse Probability Tilting for Moment Condition Models with Missing Data," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 79(3), pages 1053-1079.
    18. Yulia V. Marchenko & Marc G. Genton, 2012. "A Heckman Selection-t Model," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(497), pages 304-317, March.
    19. Asma Hyder & Barry Reilly, 2005. "The Public and Private Sector Pay Gap in Pakistan: A Quantile Regression Analysis," The Pakistan Development Review, Pakistan Institute of Development Economics, vol. 44(3), pages 271-306.
    20. S. I. Dolgikh & B. S. Potanin, 2023. "The Impact of Public Administration on the Efficiency of Russian Firms," Studies on Russian Economic Development, Springer, vol. 34(1), pages 59-67, February.
