IDEAS home Printed from https://ideas.repec.org/a/eee/csdana/v56y2012i8p2562-2573.html

Sampling designs via a multivariate hypergeometric-Dirichlet process model for a multi-species assemblage with unknown heterogeneity

Authors

  • Zhang, Hongmei
  • Ghosh, Kaushik
  • Ghosh, Pulak

Abstract

In a sample of mRNA species counts, sequences without duplicates or with small numbers of copies are likely to carry information related to mutations or diseases and can be of great interest. However, in some situations, sequence abundance is unknown and sequencing the whole sample to find the rare sequences is not practically possible. To collect mRNA sequences of interest, or more generally, species of interest, we propose a two-phase Bayesian sampling method that addresses these concerns. The first phase of the design is used to infer sequence (species) abundance levels through a cluster analysis applied to a pilot data set. The clustering method is built upon a multivariate hypergeometric model with a Dirichlet process prior for species relative frequencies. The second phase, through Monte Carlo simulations, infers the sample size necessary to collect a certain number of species of particular interest. Efficient posterior computing schemes are proposed. The developed approach is demonstrated and evaluated via simulations. An mRNA segment data set is used to illustrate and motivate the proposed sampling method.
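The second phase described in the abstract can be illustrated with a minimal Monte Carlo sketch. This is not the authors' implementation: it treats the species abundances as known and fixed, whereas the paper infers them from pilot data through the hypergeometric-Dirichlet process model. The abundance vector, the rarity cutoff of 3 copies, and the function names `draws_until_k_rare` and `simulate_sample_sizes` are all hypothetical. Sampling is without replacement from a finite population, which matches the multivariate hypergeometric setting.

```python
import numpy as np


def draws_until_k_rare(abundances, rare_mask, k, rng):
    """One replicate: how many individuals must be drawn without
    replacement before k distinct rare species have been collected."""
    abundances = np.asarray(abundances)
    rare_mask = np.asarray(rare_mask, dtype=bool)
    # Expand the population: one species label per individual.
    labels = np.repeat(np.arange(abundances.size), abundances)
    # A random permutation is a complete without-replacement sampling order.
    order = rng.permutation(labels)
    # Position at which each species first appears in that order.
    species, first_pos = np.unique(order, return_index=True)
    rare_first = np.sort(first_pos[rare_mask[species]])
    if rare_first.size < k:
        raise ValueError("fewer than k rare species are present")
    # The k-th rare species is collected at this draw (1-based count).
    return int(rare_first[k - 1]) + 1


def simulate_sample_sizes(abundances, rare_mask, k, n_sims=2000, seed=0):
    """Monte Carlo distribution of the required sample size."""
    rng = np.random.default_rng(seed)
    return np.array(
        [draws_until_k_rare(abundances, rare_mask, k, rng) for _ in range(n_sims)]
    )


# Hypothetical assemblage: three abundant species and five rare ones.
abundances = np.array([200, 150, 100, 3, 2, 1, 1, 1])
rare_mask = abundances <= 3  # assumed rarity cutoff, for illustration only
sizes = simulate_sample_sizes(abundances, rare_mask, k=3)
print("median:", np.median(sizes), "95th percentile:", np.quantile(sizes, 0.95))
```

A high quantile of the simulated sizes gives a sample size that collects the target number of rare species with the corresponding probability. A version closer to the abstract would redraw the abundances and cluster labels from the phase-one posterior in each replicate, so that uncertainty about the assemblage carries over into the sample-size estimate.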

Suggested Citation

  • Zhang, Hongmei & Ghosh, Kaushik & Ghosh, Pulak, 2012. "Sampling designs via a multivariate hypergeometric-Dirichlet process model for a multi-species assemblage with unknown heterogeneity," Computational Statistics & Data Analysis, Elsevier, vol. 56(8), pages 2562-2573.
  • Handle: RePEc:eee:csdana:v:56:y:2012:i:8:p:2562-2573
    DOI: 10.1016/j.csda.2012.02.013

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947312000953
    Download Restriction: Full text for ScienceDirect subscribers only.


    References listed on IDEAS

    1. Robert M. Dorazio & Bhramar Mukherjee & Li Zhang & Malay Ghosh & Howard L. Jelks & Frank Jordan, 2008. "Modeling Unobserved Sources of Heterogeneity in Animal Abundance Using a Dirichlet Process Prior," Biometrics, The International Biometric Society, vol. 64(2), pages 635-644, June.
    2. Antonio Lijoi & Ramsés H. Mena & Igor Prünster, 2007. "A Bayesian Nonparametric Method for Prediction in EST Analysis," ICER Working Papers - Applied Mathematics Series 16-2007, ICER - International Centre for Economic Research.
    3. Mao, Chang Xuan, 2007. "Estimating population sizes for capture-recapture sampling with binomial mixtures," Computational Statistics & Data Analysis, Elsevier, vol. 51(11), pages 5211-5219, July.
    4. Anne Chao & John Bunge, 2002. "Estimating the Number of Species in a Stochastic Abundance Model," Biometrics, The International Biometric Society, vol. 58(3), pages 531-539, September.
    5. Shirley Pledger & Kenneth H. Pollock & James L. Norris, 2010. "Open Capture–Recapture Models with Heterogeneity: II. Jolly–Seber Model," Biometrics, The International Biometric Society, vol. 66(3), pages 883-890, September.
    6. Hongmei Zhang, 2007. "Inferences on the Number of Unseen Species and the Number of Abundant/Rare Species," Journal of Applied Statistics, Taylor & Francis Journals, vol. 34(6), pages 725-740.
    7. Ji-Ping Wang, 2010. "Estimating species richness by a Poisson-compound gamma model," Biometrika, Biometrika Trust, vol. 97(3), pages 727-740.
    8. Mary C. Christman & Feng Lan, 2001. "Inverse Adaptive Cluster Sampling," Biometrics, The International Biometric Society, vol. 57(4), pages 1096-1105, December.
    9. Shirley Pledger & Kenneth H. Pollock & James L. Norris, 2003. "Open Capture-Recapture Models with Heterogeneity: I. Cormack-Jolly-Seber Model," Biometrics, The International Biometric Society, vol. 59(4), pages 786-794, December.
    10. Jeffrey S. Morris & Keith A. Baggerly & Kevin R. Coombes, 2003. "Bayesian Shrinkage Estimation of the Relative Abundance of mRNA Transcripts Using SAGE," Biometrics, The International Biometric Society, vol. 59(3), pages 476-486, September.
    11. Mao, Chang Xuan, 2006. "Inference on the Number of Species Through Geometric Lower Bounds," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1663-1670, December.
    12. P. Besbeas & S. N. Freeman & B. J. T. Morgan & E. A. Catchpole, 2002. "Integrating Mark–Recapture–Recovery and Census Data to Estimate Animal Abundance and Demographic Parameters," Biometrics, The International Biometric Society, vol. 58(3), pages 540-547, September.

    Citations

    Cited by:

    1. Han, Shengtong & Zhang, Hongmei & Karmaus, Wilfried & Roberts, Graham & Arshad, Hasan, 2017. "Adjusting background noise in cluster analyses of longitudinal data," Computational Statistics & Data Analysis, Elsevier, vol. 109(C), pages 93-104.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chang Xuan Mao & Nan Yang & Jinhua Zhong, 2013. "On Population Size Estimators in the Poisson Mixture Model," Biometrics, The International Biometric Society, vol. 69(3), pages 758-765, September.
    2. repec:jss:jstsof:40:i09 is not listed on IDEAS
    3. Jakub Stoklosa & Wen-Han Hwang & Sheng-Hai Wu & Richard Huggins, 2011. "Heterogeneous Capture–Recapture Models with Covariates: A Partial Likelihood Approach for Closed Populations," Biometrics, The International Biometric Society, vol. 67(4), pages 1659-1665, December.
    4. Chang Xuan Mao & Cuiying Yang & Yitong Yang & Wei Zhuang, 2017. "Estimating population sizes with the Rasch model," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 69(3), pages 705-716, June.
    5. R. B. O'Hara & S. Lampila & M. Orell, 2009. "Estimation of Rates of Births, Deaths, and Immigration from Mark–Recapture Data," Biometrics, The International Biometric Society, vol. 65(1), pages 275-281, March.
    6. Chang Xuan Mao & Na You, 2009. "On Comparison of Mixture Models for Closed Population Capture–Recapture Studies," Biometrics, The International Biometric Society, vol. 65(2), pages 547-553, June.
    7. Steven Thompson, 2013. "Adaptive web sampling in ecology," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 22(1), pages 33-43, March.
    8. Dennis, Emily B. & Kéry, Marc & Morgan, Byron J.T. & Coray, Armin & Schaub, Michael & Baur, Bruno, 2021. "Integrated modelling of insect population dynamics at two temporal scales," Ecological Modelling, Elsevier, vol. 441(C).
    9. Gurutzeta Guillera-Arroita & José J. Lahoz-Monfort, 2017. "Species occupancy estimation and imperfect detection: shall surveys continue after the first detection?," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 101(4), pages 381-398, October.
    10. Nichols, J.M. & Spendelow, J.A. & Nichols, J.D., 2017. "Using Optimal Transport Theory to Estimate Transition Probabilities in Metapopulation Dynamics," Ecological Modelling, Elsevier, vol. 359(C), pages 311-319.
    11. Farcomeni, Alessio & Dotto, Francesco, 2021. "A correction to make Chao estimator conservative when the number of sampling occasions is finite," Statistics & Probability Letters, Elsevier, vol. 176(C).
    12. Leo Polansky & Ken B. Newman & Lara Mitchell, 2021. "Improving inference for nonlinear state‐space models of animal population dynamics given biased sequential life stage data," Biometrics, The International Biometric Society, vol. 77(1), pages 352-361, March.
    13. Steven K. Thompson, 2006. "Adaptive Web Sampling," Biometrics, The International Biometric Society, vol. 62(4), pages 1224-1234, December.
    14. Dankmar Böhning & Rattana Lerdsuwansri & Patarawan Sangnawakij, 2023. "Modeling COVID‐19 contact‐tracing using the ratio regression capture–recapture approach," Biometrics, The International Biometric Society, vol. 79(4), pages 3818-3830, December.
    15. Bertrand K. Hassani & Wei Yang, 2016. "The Lila distribution and its applications in risk modelling," Documents de travail du Centre d'Economie de la Sorbonne 16068, Université Panthéon-Sorbonne (Paris 1), Centre d'Economie de la Sorbonne.
    16. Seungchul Baek & Junyong Park, 2022. "A computationally efficient approach to estimating species richness and rarefaction curve," Computational Statistics, Springer, vol. 37(4), pages 1919-1941, September.
    17. Ann E. McKellar & Roland Langrock & Jeffrey R. Walters & Dylan C. Kesler, 2015. "Using mixed hidden Markov models to examine behavioral states in a cooperatively breeding bird," Behavioral Ecology, International Society for Behavioral Ecology, vol. 26(1), pages 148-157.
    18. Chang Xuan Mao & Sining Chen & Yitong Yang, 2016. "A Population-Size Model for Protein Spot Detection in Proteomic Studies," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 21(1), pages 170-180, March.
    19. Antonio Canale & Igor Prünster, 2017. "Robustifying Bayesian nonparametric mixtures for count data," Biometrics, The International Biometric Society, vol. 73(1), pages 174-184, March.
    20. Blanca Sarzo & Ruth King & David Conesa & Jonas Hentati-Sundberg, 2021. "Correcting Bias in Survival Probabilities for Partially Monitored Populations via Integrated Models," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 26(2), pages 200-219, June.
    21. Chun-Huo Chiu & Yi-Ting Wang & Bruno A. Walther & Anne Chao, 2014. "An improved nonparametric lower bound of species richness via a modified Good–Turing frequency formula," Biometrics, The International Biometric Society, vol. 70(3), pages 671-682, September.
