Estimating population size of heterogeneous populations with large data sets and a large number of parameters

Author

Listed:
  • Li, Haoqi
  • Lin, Huazhen
  • Yip, Paul S.F.
  • Li, Yuan

Abstract

A generalized partial linear regression model is proposed to estimate the population size at a specific time from multiple lists of a time-varying, heterogeneous population. The data comprise millions of records collected over a long period, and the model has hundreds of parameters. Fitting such a model directly is difficult, mainly because of limited computer memory and failure of the computation to converge, and can be infeasible. This paper proposes an analytical methodology for modeling a large data set with a large number of parameters. The basic idea is to apply the maximum likelihood estimator to the data observed at each time separately and then combine the results via weighted averages, so that the final estimator becomes the maximum likelihood estimator of the whole data set (full MLE). The asymptotic distribution of the proposed estimators is derived, together with the associated inference procedures. Simulation studies show that the proposed procedure performs exactly as the full MLE does. It remains computationally feasible in settings where the full MLE is not, and has a much lower computational cost when both methods work. The proposed method is applied to estimate the number of drug abusers in Hong Kong over the period 1977–2014.
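
The split-then-combine idea in the abstract can be illustrated with a toy sketch. It does not use the paper's generalized partial linear model. It assumes a much simpler one, a single shared slope b in y = b*x + Gaussian noise, for which the per-block MLE and its information weight have closed forms. The function names block_mle and combine, the five time blocks, and the simulated data are all hypothetical. In this simple model, weighting each block's estimate by its information reproduces the full-data MLE up to floating-point rounding.

```python
import random


def block_mle(xs, ys):
    """MLE of the slope b in y = b*x + Gaussian noise for one time block.

    Returns the estimate and its information weight sum(x^2), which is
    the Fisher information up to the common factor 1/sigma^2.
    """
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx, sxx


def combine(estimates):
    """Information-weighted average of per-block (estimate, weight) pairs."""
    total_weight = sum(w for _, w in estimates)
    return sum(b * w for b, w in estimates) / total_weight


# Simulated data (an assumption for illustration): five time blocks
# that share one true slope.
random.seed(0)
true_slope = 2.0
blocks = []
for _ in range(5):
    xs = [random.uniform(0.5, 3.0) for _ in range(200)]
    ys = [true_slope * x + random.gauss(0.0, 1.0) for x in xs]
    blocks.append((xs, ys))

# Fit each block separately, then combine with information weights.
combined = combine([block_mle(xs, ys) for xs, ys in blocks])

# Full-data MLE for comparison; feasible here only because the toy is small.
all_x = [x for xs, _ in blocks for x in xs]
all_y = [y for _, ys in blocks for y in ys]
full, _ = block_mle(all_x, all_y)

print(f"combined = {combined:.6f}, full = {full:.6f}")
```

Each block is fitted on its own, so only one block needs to be in memory at a time. The abstract describes weights chosen so that the combined estimator becomes the full MLE; the scalar weights above are the simplest case of that idea under the stated toy assumptions.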

Suggested Citation

  • Li, Haoqi & Lin, Huazhen & Yip, Paul S.F. & Li, Yuan, 2019. "Estimating population size of heterogeneous populations with large data sets and a large number of parameters," Computational Statistics & Data Analysis, Elsevier, vol. 139(C), pages 34-44.
  • Handle: RePEc:eee:csdana:v:139:y:2019:i:c:p:34-44
    DOI: 10.1016/j.csda.2019.04.016

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947319301069
    Download Restriction: Full text for ScienceDirect subscribers only.



