Printed from https://ideas.repec.org/a/gam/jmathe/v8y2020i6p879-d365757.html

Inference from Non-Probability Surveys with Statistical Matching and Propensity Score Adjustment Using Modern Prediction Techniques

Author

Listed:
  • Luis Castro-Martín

    (Department of Statistics and Operational Research, Faculty of Sciences, University of Granada, 18071 Granada, Spain)

  • Maria del Mar Rueda

    (Department of Statistics and Operational Research, Faculty of Sciences, University of Granada, 18071 Granada, Spain)

  • Ramón Ferri-García

    (Department of Statistics and Operational Research, Faculty of Sciences, University of Granada, 18071 Granada, Spain)

Abstract

Online surveys are increasingly common in social and health studies, as they provide fast and inexpensive results compared with traditional ones. However, these surveys often work with biased samples, because data collection is frequently non-probabilistic owing to the lack of internet coverage in certain population groups and the self-selection procedure that many online surveys rely on. Some procedures have been proposed to mitigate the bias, such as propensity score adjustment (PSA) and statistical matching. In PSA, the propensity to participate in a non-probability survey is estimated using a probability reference survey and then used to obtain weighted estimates. In statistical matching, the non-probability sample is used to train models that predict the values of the target variable, and the models' predictions for the probability sample are used to estimate population values. In this study, both methods are compared using three datasets to simulate pseudopopulations from which non-probability and probability samples are drawn and used to estimate population parameters. In addition, the study compares linear models and machine learning prediction algorithms for propensity estimation in PSA and for predictive modeling in statistical matching. The results show that statistical matching outperforms PSA in terms of bias reduction and root mean square error (RMSE), and that simpler prediction models, such as linear models and k-nearest neighbors, provide better outcomes than bagging algorithms.
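
The two adjustment methods described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the pseudopopulation, the selection mechanism, and the names `fit_logistic` and `simulate_estimates` are hypothetical, both methods use plain linear/logistic models, and the PSA weight (1 - p) / p is only one common choice that the paper may not share.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def fit_logistic(X, z, n_iter=25):
    """Logistic regression by Newton-Raphson; X must contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (z - p)
        # Tiny ridge term keeps the Hessian invertible (numerical safeguard only).
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + 1e-8 * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def simulate_estimates(seed=0, pop_size=20_000, ref_size=1_000):
    rng = np.random.default_rng(seed)

    # Hypothetical pseudopopulation: one covariate x, target y linear in x.
    x = rng.normal(size=pop_size)
    y = 2.0 + 3.0 * x + rng.normal(size=pop_size)

    # Non-probability sample: units with large x are more likely to self-select.
    volunteers = rng.random(pop_size) < sigmoid(-3.0 + x)
    x_np, y_np = x[volunteers], y[volunteers]

    # Probability reference sample: simple random sample; only x is used from it.
    ref_idx = rng.choice(pop_size, size=ref_size, replace=False)
    x_ref = x[ref_idx]

    # PSA: model membership in the non-probability sample (z = 1) against the
    # reference sample (z = 0), then weight volunteers by (1 - p) / p (assumed form).
    X_pool = np.column_stack(
        [np.ones(x_np.size + ref_size), np.concatenate([x_np, x_ref])]
    )
    z = np.concatenate([np.ones(x_np.size), np.zeros(ref_size)])
    beta = fit_logistic(X_pool, z)
    p_hat = sigmoid(np.column_stack([np.ones(x_np.size), x_np]) @ beta)
    w = (1.0 - p_hat) / p_hat
    psa = np.sum(w * y_np) / np.sum(w)

    # Statistical matching: train y ~ x on the non-probability sample, predict y
    # for the reference sample, and average the predictions (equal design weights).
    A = np.column_stack([np.ones(x_np.size), x_np])
    coef, *_ = np.linalg.lstsq(A, y_np, rcond=None)
    matching = np.mean(np.column_stack([np.ones(ref_size), x_ref]) @ coef)

    return {"truth": y.mean(), "naive": y_np.mean(), "psa": psa, "matching": matching}


print(simulate_estimates())
```

In this toy setting the unadjusted mean of the volunteer sample overestimates the population mean because selection depends on x, while both adjusted estimates land much closer to it. A single run like this says nothing about the relative ranking of the methods; the study's comparison rests on bias and RMSE over its own simulation design.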

Suggested Citation

  • Luis Castro-Martín & Maria del Mar Rueda & Ramón Ferri-García, 2020. "Inference from Non-Probability Surveys with Statistical Matching and Propensity Score Adjustment Using Modern Prediction Techniques," Mathematics, MDPI, vol. 8(6), pages 1-19, June.
  • Handle: RePEc:gam:jmathe:v:8:y:2020:i:6:p:879-:d:365757

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/8/6/879/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/8/6/879/
    Download Restriction: no

    References listed on IDEAS

    1. J. B. Copas, 1993. "The Shrinkage of Point Scoring Methods," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 42(2), pages 315-331, June.
    2. J. C. Van Houwelingen, 2001. "Shrinkage and Penalized Likelihood as Methods to Improve Predictive Accuracy," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 55(1), pages 17-34, March.
    3. Matthias Schonlau & Arthur Van Soest & Arie Kapteyn, 2007. "Are 'Webographic' or Attitudinal Questions Useful for Adjusting Estimates From Web Surveys Using Propensity Scoring?," Working Papers 506, RAND Corporation.
    4. Park, Trevor & Casella, George, 2008. "The Bayesian Lasso," Journal of the American Statistical Association, American Statistical Association, vol. 103, pages 681-686, June.
    5. Bart Buelens & Joep Burger & Jan A. van den Brakel, 2018. "Comparing Inference Methods for Non‐probability Samples," International Statistical Review, International Statistical Institute, vol. 86(2), pages 322-343, August.
    6. Matthias Schonlau & Arthur Van Soest & Arie Kapteyn, 2007. "Are 'Webographic' or Attitudinal Questions Useful for Adjusting Estimates From Web Surveys Using Propensity Scoring?," Working Papers WR-506, RAND Corporation.
    7. Jack Kuang Tsung Chen & Richard L. Valliant & Michael R. Elliott, 2019. "Calibrating non‐probability surveys to estimated control totals using LASSO, with an application to political polling," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 68(3), pages 657-681, April.
    8. Baesens, Bart & Viaene, Stijn & Van den Poel, Dirk & Vanthienen, Jan & Dedene, Guido, 2002. "Bayesian neural network learning for repeat purchase modelling in direct marketing," European Journal of Operational Research, Elsevier, vol. 138(1), pages 191-211, April.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ramón Ferri-García & María del Mar Rueda, 2022. "Variable selection in Propensity Score Adjustment to mitigate selection bias in online surveys," Statistical Papers, Springer, vol. 63(6), pages 1829-1881, December.
    2. Luis Castro-Martín & María del Mar Rueda & Ramón Ferri-García, 2020. "Estimating General Parameters from Non-Probability Surveys Using Propensity Score Adjustment," Mathematics, MDPI, vol. 8(11), pages 1-14, November.
    3. Stéphane Legleye & Géraldine Charrance & Nicolas Razafindratsima & Nathalie Bajos & Aline Bohet & Caroline Moreau, 2018. "The Use of a Nonprobability Internet Panel to Monitor Sexual and Reproductive Health in the General Population," Sociological Methods & Research, vol. 47(2), pages 314-348, March.
    4. Ferri-García, Ramón & Castro-Martín, Luis & Rueda, María del Mar, 2021. "Evaluating Machine Learning methods for estimation in online surveys with superpopulation modeling," Mathematics and Computers in Simulation (MATCOM), Elsevier, vol. 186(C), pages 19-28.
    5. Buil-Gil, David & Solymosi, Reka & Moretti, Angelo, 2019. "Non-parametric bootstrap and small area estimation to mitigate bias in crowdsourced data. Simulation study and application to perceived safety," SocArXiv 8hgjt, Center for Open Science.
    6. Maciej Beręsewicz & Greta Białkowska & Krzysztof Marcinkowski & Magdalena Maślak & Piotr Opiela & Robert Pater & Katarzyna Zadroga, 2019. "Enhancing the Demand for Labour survey by including skills from online job advertisements using model-assisted calibration," Papers 1908.06731, arXiv.org.
    7. Sunghee Lee & Richard Valliant, 2009. "Estimation for Volunteer Panel Web Surveys Using Propensity Score Adjustment and Calibration Adjustment," Sociological Methods & Research, vol. 37(3), pages 319-343, February.
    8. Richard Valliant & Jill A. Dever, 2011. "Estimating Propensity Adjustments for Volunteer Web Surveys," Sociological Methods & Research, vol. 40(1), pages 105-137, February.
    9. repec:aia:aiaswp:wp76 is not listed on IDEAS
    10. Li, Chunyu & Lou, Chenxin & Luo, Dan & Xing, Kai, 2021. "Chinese corporate distress prediction using LASSO: The role of earnings management," International Review of Financial Analysis, Elsevier, vol. 76(C).
    11. Anne Musson & Damien Rousselière, 2020. "Exploring the effect of crisis on cooperatives: a Bayesian performance analysis of French craftsmen cooperatives," Applied Economics, Taylor & Francis Journals, vol. 52(25), pages 2657-2678, May.
    12. Chou, Ping & Chuang, Howard Hao-Chun & Chou, Yen-Chun & Liang, Ting-Peng, 2022. "Predictive analytics for customer repurchase: Interdisciplinary integration of buy till you die modeling and machine learning," European Journal of Operational Research, Elsevier, vol. 296(2), pages 635-651.
    13. Prüser, Jan, 2017. "Forecasting US inflation using Markov dimension switching," Ruhr Economic Papers 710, RWI - Leibniz-Institut für Wirtschaftsforschung, Ruhr-University Bochum, TU Dortmund University, University of Duisburg-Essen.
    14. Armagan, Artin & Dunson, David, 2011. "Sparse variational analysis of linear mixed models for large data sets," Statistics & Probability Letters, Elsevier, vol. 81(8), pages 1056-1062, August.
    15. Wang, Hong & Forbes, Catherine S. & Fenech, Jean-Pierre & Vaz, John, 2020. "The determinants of bank loan recovery rates in good times and bad – New evidence," Journal of Economic Behavior & Organization, Elsevier, vol. 177(C), pages 875-897.
    16. Fan, Jianqing & Jiang, Bai & Sun, Qiang, 2022. "Bayesian factor-adjusted sparse regression," Journal of Econometrics, Elsevier, vol. 230(1), pages 3-19.
    17. Van den Poel, Dirk & Lariviere, Bart, 2004. "Customer attrition analysis for financial services using proportional hazard models," European Journal of Operational Research, Elsevier, vol. 157(1), pages 196-217, August.
    18. Kastner, Gregor, 2019. "Sparse Bayesian time-varying covariance estimation in many dimensions," Journal of Econometrics, Elsevier, vol. 210(1), pages 98-115.
    19. Bai, Jushan & Ando, Tomohiro, 2013. "Multifactor asset pricing with a large number of observable risk factors and unobservable common and group-specific factors," MPRA Paper 52785, University Library of Munich, Germany, revised Dec 2013.
    20. Martin Feldkircher & Florian Huber & Gary Koop & Michael Pfarrhofer, 2022. "Approximate Bayesian Inference and Forecasting in Huge‐Dimensional Multicountry VARs," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 63(4), pages 1625-1658, November.
    21. Eliaz, Kfir & Spiegler, Ran, 2022. "On incentive-compatible estimators," Games and Economic Behavior, Elsevier, vol. 132(C), pages 204-220.
