
Design-Unbiased Statistical Learning in Survey Sampling

Authors

  • Luis Sanguiao Sande (Instituto Nacional de Estadística)

  • Li-Chun Zhang (University of Southampton; Statistisk Sentralbyraa; Universitetet i Oslo)

Abstract

Design-consistent model-assisted estimation has become standard practice in survey sampling. However, design consistency remains to be established for many machine-learning techniques that could serve as very powerful assisting models. We propose a subsampling Rao-Blackwell method and develop a statistical learning theory for exactly design-unbiased estimation with the help of linear or non-linear prediction models. Our approach draws on classic ideas from statistical science as well as the rapidly growing field of machine learning. Provided rich auxiliary information is available, it can yield considerable efficiency gains over standard linear model-assisted methods while ensuring valid estimation for the given target population. The estimation is robust against potential misspecification of the assisting model, even when design consistency cannot be established for the standard plug-in model-assisted estimator.
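
The idea of subsampling followed by Rao-Blackwellization can be illustrated with a minimal sketch. It is not taken from the paper and should not be read as the authors' exact estimator. It assumes simple random sampling without replacement, an auxiliary variable `x` known for every population unit, and a hypothetical assisting model (`fit_predict`, a straight-line fit chosen only for illustration). The sample is split into a training part, used to fit the model, and a test part, whose residuals correct the model predictions. Averaging over all splits of the same sample gives the Rao-Blackwellized estimate.

```python
from itertools import combinations

import numpy as np


def fit_predict(x_train, y_train, x_new):
    """Hypothetical assisting model: straight-line least-squares fit."""
    slope, intercept = np.polyfit(x_train, y_train, 1)
    return intercept + slope * x_new


def split_estimate(x, y, train, test):
    """Estimate the population total of y from one train/test split.

    x holds auxiliary values for all N population units.
    y is indexed by population unit, but only sampled entries are read.
    """
    N = len(x)
    train = np.asarray(train)
    test = np.asarray(test)
    pred = fit_predict(x[train], y[train], x)
    outside = np.setdiff1d(np.arange(N), train)
    # Given the training units, the test units are a simple random sample
    # of the remaining N - len(train) units, so their residuals can be
    # expanded to estimate the total residual outside the training set.
    correction = (N - len(train)) / len(test) * (y[test] - pred[test]).sum()
    return y[train].sum() + pred[outside].sum() + correction


def rao_blackwell_estimate(x, y, sample, n_train):
    """Average the split estimates over every training subset of the sample."""
    sample = list(sample)
    estimates = []
    for train in combinations(sample, n_train):
        test = [i for i in sample if i not in train]
        estimates.append(split_estimate(x, y, train, test))
    return float(np.mean(estimates))
```

Under these assumptions, each split estimate is unbiased conditional on the training units, whatever model is fitted, so the average over splits is unbiased as well. For a tiny population this can be checked by enumerating every possible sample and confirming that the mean of `rao_blackwell_estimate` equals the true total. In practice the average over all splits would typically be approximated by a random subset of splits.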

Suggested Citation

  • Luis Sanguiao Sande & Li-Chun Zhang, 2021. "Design-Unbiased Statistical Learning in Survey Sampling," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 83(2), pages 714-744, August.
  • Handle: RePEc:spr:sankha:v:83:y:2021:i:2:d:10.1007_s13171-020-00224-1
    DOI: 10.1007/s13171-020-00224-1



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tomasz Bąk, 2021. "Spatial sampling methods modified by model use," Statistics in Transition New Series, Polish Statistical Association, vol. 22(2), pages 143-154, June.
    2. Lorenzo Fattorini & Timothy G. Gregoire & Sara Trentini, 2018. "The Use of Calibration Weighting for Variance Estimation Under Systematic Sampling: Applications to Forest Cover Assessment," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 23(3), pages 358-373, September.
    3. Lu Lin & Feng Li, 2008. "Stable and bias-corrected estimation for nonparametric regression models," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 20(4), pages 283-303.
    4. Biau, Gérard & Devroye, Luc & Dujmović, Vida & Krzyżak, Adam, 2012. "An affine invariant k-nearest neighbor regression estimate," Journal of Multivariate Analysis, Elsevier, vol. 112(C), pages 24-34.
    5. L.-C. Zhang & M. Patone, 2017. "Graph sampling," METRON, Springer;Sapienza Università di Roma, vol. 75(3), pages 277-299, December.
    6. Sara Franceschi & Rosa Maria Di Biase & Agnese Marcelli & Lorenzo Fattorini, 2022. "Some Empirical Results on Nearest-Neighbour Pseudo-populations for Resampling from Spatial Populations," Stats, MDPI, vol. 5(2), pages 1-16, April.
    7. Haoge Chang, 2023. "Design-based Estimation Theory for Complex Experiments," Papers 2311.06891, arXiv.org.
    8. Ganesh Karapakula, 2023. "Stable Probability Weighting: Large-Sample and Finite-Sample Estimation and Inference Methods for Heterogeneous Causal Effects of Multivalued Treatments Under Limited Overlap," Papers 2301.05703, arXiv.org, revised Jan 2023.
    9. Grafström Anton & Matei Alina, 2015. "Coordination of Conditional Poisson Samples," Journal of Official Statistics, Sciendo, vol. 31(4), pages 649-672, December.
    10. Lorenzo Fattorini, 2009. "An adaptive algorithm for estimating inclusion probabilities and performing the Horvitz–Thompson criterion in complex designs," Computational Statistics, Springer, vol. 24(4), pages 623-639, December.
    11. Zhang, Heping, 2004. "Recursive Partitioning and Tree-based Methods," Papers 2004,30, Humboldt University of Berlin, Center for Applied Statistics and Economics (CASE).
    12. Alessandro Chiarucci & Rosa Maria Di Biase & Lorenzo Fattorini & Marzia Marcheselli & Caterina Pisani, 2017. "Joining the Incompatible: Exploiting Floristic Lists for the Sample-based Estimation of Species Richness," Department of Economics University of Siena 753, Department of Economics, University of Siena.
    13. Roberto Benedetti & Federica Piersimoni & Paolo Postiglione, 2017. "Spatially Balanced Sampling: A Review and A Reappraisal," International Statistical Review, International Statistical Institute, vol. 85(3), pages 439-454, December.
    14. Lorenzo Fattorini & Alberto Meriggi & Enrico Merli & Paolo Varuzza, 2020. "Sampling Strategies to Estimate Deer Density by Drive Counts," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 25(2), pages 168-185, June.
    15. Tomasz Bąk, 2014. "Triangular Method of Spatial Sampling," Statistics in Transition new series, Główny Urząd Statystyczny (Polska), vol. 15(1), pages 9-22, January.
    16. Bardia Panahbehagh, 2020. "Estimation in Complex Sampling Designs Based on Resampling Methods," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 25(2), pages 206-228, June.
    17. Levon Demirdjian & Majid Mojirsheibani, 2019. "Kernel classification with missing data and the choice of smoothing parameters," Statistical Papers, Springer, vol. 60(5), pages 1487-1513, October.
