Printed from https://ideas.repec.org/a/inm/ormksc/v41y2022i2p336-360.html

Simplifying Bias Correction for Selective Sampling: A Unified Distribution-Free Approach to Handling Endogenously Selected Samples

Authors
  • Yi Qian

    (Marketing and Behavioral Science Division, Sauder School of Business, University of British Columbia, Vancouver, British Columbia V6T 1Z2, Canada)

  • Hui Xie

    (Faculty of Health Sciences, Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada)

Abstract

Unlike random sampling, selective sampling draws units based on the outcome values, such as oversampling rare events in choice outcomes and extreme activities on continuous and count outcomes. Despite high cost-effectiveness for marketing research, such endogenously selected samples must be carefully analyzed to avoid selection bias. We introduce a unified and efficient approach based on semiparametric odds ratio (SOR) models applicable to categorical, continuous, and count response data collected using selective sampling. Unlike extant sampling-adjusting methods and Heckman-type selection models, the proposed approach requires neither modeling selection mechanisms nor imposing parametric distributional assumptions on the response variables, eliminating both sources of mis-specification bias. Using this approach, one can quantify and test for the relationships among variables as if samples had been collected via random sampling, simplifying bias correction of endogenously selected samples. We evaluate and illustrate the method using extensive simulation studies and two real data examples: endogenously stratified sampling for linear/nonlinear regressions to identify drivers of the share-of-wallet outcome for cigarette smokers, and truncated and on-site samples for count data models of store shopping demand. The evaluation shows that selective sampling followed by the SOR approach reduces the required sample size by more than 70% compared with random sampling and that, in a wide range of selective sampling scenarios, SOR offers novel solutions that outperform extant methods for selective samples, with opportunities to make better managerial decisions.
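
A minimal sketch of why an odds-ratio-based analysis can sidestep modeling the selection mechanism. It is not the authors' SOR estimator. It only checks, on a hypothetical 2x2 population, that the odds ratio between a covariate x and an outcome y is unchanged when units are selected with probabilities that depend on y alone, while the conditional probability P(y=1 | x) is distorted. The population cell probabilities, the selection probabilities, and all function names are assumptions made for illustration.

```python
from fractions import Fraction as F

# Hypothetical population joint distribution P(x, y) for binary x and y.
population = {
    (0, 0): F(50, 100),
    (0, 1): F(5, 100),
    (1, 0): F(30, 100),
    (1, 1): F(15, 100),
}

# Hypothetical selection probabilities that depend only on the outcome y:
# the rare outcome y=1 is always kept, y=0 is kept 10% of the time.
select_prob = {0: F(1, 10), 1: F(1, 1)}


def odds_ratio(cells):
    """Odds ratio between x and y in a 2x2 table of cell probabilities."""
    return (cells[(1, 1)] * cells[(0, 0)]) / (cells[(1, 0)] * cells[(0, 1)])


def prob_y1_given_x(cells, x):
    """Conditional probability P(y=1 | x) implied by the cell probabilities."""
    return cells[(x, 1)] / (cells[(x, 0)] + cells[(x, 1)])


def selected_distribution(cells, probs):
    """Joint distribution of (x, y) among selected units, renormalized."""
    weighted = {(x, y): p * probs[y] for (x, y), p in cells.items()}
    total = sum(weighted.values())
    return {cell: w / total for cell, w in weighted.items()}


sample = selected_distribution(population, select_prob)

# The odds ratio is identical in the population and the selected sample,
# because the y-dependent selection factors cancel in the cross-product.
print("odds ratio:", odds_ratio(population), odds_ratio(sample))

# The naive conditional probability is not preserved by the selection.
print("P(y=1 | x=1):", prob_y1_given_x(population, 1), prob_y1_given_x(sample, 1))
```

In this toy setting the selection factors cancel in the cross-product ratio, which is the property that lets the association be studied without specifying how the sample was selected. Continuous and count outcomes, and estimation from finite samples, are beyond what this sketch covers.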

Suggested Citation

  • Yi Qian & Hui Xie, 2022. "Simplifying Bias Correction for Selective Sampling: A Unified Distribution-Free Approach to Handling Endogenously Selected Samples," Marketing Science, INFORMS, vol. 41(2), pages 336-360, March.
  • Handle: RePEc:inm:ormksc:v:41:y:2022:i:2:p:336-360
    DOI: 10.1287/mksc.2021.1330

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/mksc.2021.1330
    Download Restriction: no


    References listed on IDEAS

    1. Patrick Puhani, 2000. "The Heckman Correction for Sample Selection and Its Critique," Journal of Economic Surveys, Wiley Blackwell, vol. 14(1), pages 53-68, February.
    2. Fred M. Feinberg & Linda Court Salisbury & Yuanping Ying, 2016. "When Random Assignment Is Not Enough: Accounting for Item Selectivity in Experimental Research," Marketing Science, INFORMS, vol. 35(6), pages 976-994, November.
    3. Hui Xie & Yi Qian, 2012. "Measuring the impact of nonignorability in panel data with non‐monotone nonresponse," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 27(1), pages 129-159, January.
    4. Hua Yun Chen, 2007. "A Semiparametric Odds Ratio Model for Measuring Association," Biometrics, The International Biometric Society, vol. 63(2), pages 413-421, June.
    5. Shaw, Daigee, 1988. "On-site samples' regression : Problems of non-negative integers, truncation, and endogenous stratification," Journal of Econometrics, Elsevier, vol. 37(2), pages 211-223, February.
    6. Heckman, James, 2013. "Sample selection bias as a specification error," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 31(3), pages 129-137.
    7. Englin, Jeffrey & Shonkwiler, J S, 1995. "Estimating Social Welfare Using Count Data Models: An Application to Long-Run Recreation Demand under Conditions of Endogenous Stratification and Truncation," The Review of Economics and Statistics, MIT Press, vol. 77(1), pages 104-112, February.
    8. Wagner Kamakura & Carl Mela & Asim Ansari & Anand Bodapati & Pete Fader & Raghuram Iyengar & Prasad Naik & Scott Neslin & Baohong Sun & Peter Verhoef & Michel Wedel & Ron Wilcox, 2005. "Choice Models and Customer Relationship Management," Marketing Letters, Springer, vol. 16(3), pages 279-291, December.
    9. Stephan Wachtel & Thomas Otter, 2013. "Successive Sample Selection and Its Relevance for Management Decisions," Marketing Science, INFORMS, vol. 32(1), pages 170-185, September.
    10. Arora, Neeraj & Huber, Joel, 2001. "Improving Parameter Estimates and Model Prediction by Aggregate Customization in Choice Experiments," Journal of Consumer Research, Journal of Consumer Research Inc., vol. 28(2), pages 273-283, September.
    11. Yi Qian & Hui Xie, 2014. "Which Brand Purchasers Are Lost to Counterfeiters? An Application of New Data Fusion Approaches," Marketing Science, INFORMS, vol. 33(3), pages 437-448, May.
    12. Hua Yun Chen & Daniel E. Rader & Mingyao Li, 2015. "Likelihood Inferences on Semiparametric Odds Ratio Model," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(511), pages 1125-1135, September.
    13. Yi Qian & Hui Xie, 2011. "No Customer Left Behind: A Distribution-Free Bayesian Approach to Accounting for Missing Xs in Marketing Models," Marketing Science, INFORMS, vol. 30(4), pages 717-736, July.
    14. Longxiu Tian & Fred M. Feinberg, 2020. "Optimizing Price Menus for Duration Discounts: A Subscription Selectivity Field Experiment," Marketing Science, INFORMS, vol. 39(6), pages 1181-1198, November.
    15. Cosslett, Stephen R., 2013. "Efficient semiparametric estimation for endogenously stratified regression via smoothed likelihood," Journal of Econometrics, Elsevier, vol. 177(1), pages 116-129.
    16. Leonard Feldt, 1961. "The use of extreme groups to test for the presence of a relationship," Psychometrika, Springer;The Psychometric Society, vol. 26(3), pages 307-316, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Longxiu Tian & Fred M. Feinberg, 2020. "Optimizing Price Menus for Duration Discounts: A Subscription Selectivity Field Experiment," Marketing Science, INFORMS, vol. 39(6), pages 1181-1198, November.
    2. Yi Qian & Hui Xie, 2015. "Drive More Effective Data-Based Innovations: Enhancing the Utility of Secure Databases," Management Science, INFORMS, vol. 61(3), pages 520-541, March.
    3. Bowker, James Michael & Starbuck, C. Meghan & English, Donald B.K. & Bergstrom, John C. & Rosenberger, Randall S. & McCollum, Daniel W., 2009. "Estimating the Net Economic Value of National Forest Recreation: An Application of the National Visitor Use Monitoring Database," Faculty Series 59603, University of Georgia, Department of Agricultural and Applied Economics.
    4. Myck, Michal & Nicińska, Anna & Morawski, Leszek, 2009. "Count Your Hours: Returns to Education in Poland," IZA Discussion Papers 4332, Institute of Labor Economics (IZA).
    5. Patrick A. Puhani, 2000. "On the Identification of Relative Wage Rigidity Dynamics," William Davidson Institute Working Papers Series 343, William Davidson Institute at the University of Michigan.
    6. Egan, Kevin & Herriges, Joseph, 2006. "Multivariate count data regression models with individual panel data from an on-site sample," Journal of Environmental Economics and Management, Elsevier, vol. 52(2), pages 567-581, September.
    7. Cristian Castillo & Julimar Da Silva & Sandro Monsueto, 2020. "Objectives of Sustainable Development and Youth Employment in Colombia," Sustainability, MDPI, vol. 12(3), pages 1-18, January.
    8. Lienhoop, Nele & Ansmann, Till, 2011. "Valuing water level changes in reservoirs using two stated preference approaches: An exploration of validity," Ecological Economics, Elsevier, vol. 70(7), pages 1250-1258, May.
    9. Arndt Reichert & Harald Tauchmann, 2014. "When outcome heterogeneously matters for selection: a generalized selection correction estimator," Applied Economics, Taylor & Francis Journals, vol. 46(7), pages 762-768, March.
    10. Bolwig, Simon & Gibbon, Peter & Jones, Sam, 2009. "The Economics of Smallholder Organic Contract Farming in Tropical Africa," World Development, Elsevier, vol. 37(6), pages 1094-1104, June.
    11. Eric J. Tchetgen Tchetgen & Kathleen E. Wirth, 2017. "A general instrumental variable framework for regression analysis with outcome missing not at random," Biometrics, The International Biometric Society, vol. 73(4), pages 1123-1131, December.
    12. John A. Curtis, 2002. "Estimating the Demand for Salmon Angling in Ireland," The Economic and Social Review, Economic and Social Studies, vol. 33(3), pages 319-332.
    13. Pastwa, Anna M. & Shrestha, Prabal & Thewissen, James & Torsin, Wouter, 2021. "Unpacking the black box of ICO white papers: a topic modeling approach," LIDAM Discussion Papers LFIN 2021018, Université catholique de Louvain, Louvain Finance (LFIN).
    14. Fougère, Denis & Gautier, Erwan & Roux, Sébastien, 2018. "Wage floor rigidity in industry-level agreements: Evidence from France," Labour Economics, Elsevier, vol. 55(C), pages 72-97.
    15. Oren Gazal‐Ayal & Raanan Sulitzeanu‐Kenan, 2010. "Let My People Go: Ethnic In‐Group Bias in Judicial Decisions—Evidence from a Randomized Natural Experiment," Journal of Empirical Legal Studies, John Wiley & Sons, vol. 7(3), pages 403-428, September.
    16. Thapa, Samir & Morrison, Mark & Parton, Kevin A, 2021. "Willingness to pay for domestic biogas plants and distributing carbon revenues to influence their purchase: A case study in Nepal," Energy Policy, Elsevier, vol. 158(C).
    17. Ertan, Arhan & Fiszbein, Martin & Putterman, Louis, 2016. "Who was colonized and when? A cross-country analysis of determinants," European Economic Review, Elsevier, vol. 83(C), pages 165-184.
    18. Ibáñez, Ana María & Muñoz, Juan Carlos & Verwimp, Philip, 2013. "Abandoning Coffee under the Threat of Violence and the Presence of Illicit Crops. Evidence from Colombia," Documentos CEDE Series 161356, Universidad de Los Andes, Economics Department.
    19. Yi Qian & Hui Xie, 2014. "Which Brand Purchasers Are Lost to Counterfeiters? An Application of New Data Fusion Approaches," Marketing Science, INFORMS, vol. 33(3), pages 437-448, May.
    20. Neumayer, Eric, 2002. "Is Good Governance Rewarded? A Cross-national Analysis of Debt Forgiveness," World Development, Elsevier, vol. 30(6), pages 913-930, June.
