Printed from https://ideas.repec.org/a/eee/csdana/v145y2020ics0167947320300050.html

Bias reduction in the population size estimation of large data sets

Authors

  • Chu, Jeffrey
  • Zhang, Yuanyuan
  • Chan, Stephen
  • Nadarajah, Saralees

Abstract

Estimating the population size of large data sets and hard-to-reach populations can be a significant problem. For example, in the military, manpower is limited and the manual processing of large data sets can be time-consuming. In addition, access to the full population of data may be restricted by factors such as cost, time, and safety. Four new population size estimators are proposed as extensions of existing methods, and their performance is compared, in terms of bias, with two existing methods in the big data literature. The proposed estimators would be particularly beneficial in the context of time-critical decisions or actions. The comparison is based on a simulation study and an application to five real network data sets (Twitter, LiveJournal, Pokec, YouTube, Wikipedia Talk). Whilst no single estimator (out of the four proposed) generates the most accurate estimates overall, the proposed estimators are shown to produce more accurate population size estimates for small sample sizes, although in some cases they show more variability than existing estimators in the literature.

Suggested Citation

  • Chu, Jeffrey & Zhang, Yuanyuan & Chan, Stephen & Nadarajah, Saralees, 2020. "Bias reduction in the population size estimation of large data sets," Computational Statistics & Data Analysis, Elsevier, vol. 145(C).
  • Handle: RePEc:eee:csdana:v:145:y:2020:i:c:s0167947320300050
    DOI: 10.1016/j.csda.2020.106914

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947320300050
    Download Restriction: Full text for ScienceDirect subscribers only.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. D'Arcangelis, Anna Maria & Rotundo, Giulia, 2021. "Herding in mutual funds: A complex network approach," Journal of Business Research, Elsevier, vol. 129(C), pages 679-686.
    2. Ixandra Achitouv, 2025. "Dynamical analysis of financial stocks network: Improving forecasting using network properties," PLOS ONE, Public Library of Science, vol. 20(5), pages 1-23, May.
    3. Dimitris Andriosopoulos & Michalis Doumpos & Panos M. Pardalos & Constantin Zopounidis, 2019. "Computational approaches and data analytics in financial services: A literature review," Journal of the Operational Research Society, Taylor & Francis Journals, vol. 70(10), pages 1581-1599, October.
    4. Arash Sioofy Khoojine & Ziyun Feng & Mahboubeh Shadabfar & Negar Sioofy Khoojine, 2023. "Analyzing volatility patterns in the Chinese stock market using partial mutual information-based distances," The European Physical Journal B: Condensed Matter and Complex Systems, Springer;EDP Sciences, vol. 96(12), pages 1-21, December.
    5. Zeng, Zhi-Jian & Xie, Chi & Yan, Xin-Guo & Hu, Jue & Mao, Zhou, 2016. "Are stock market networks non-fractal? Evidence from New York Stock Exchange," Finance Research Letters, Elsevier, vol. 17(C), pages 97-102.
    6. Hosseini, Seyed Soheil & Wormald, Nick & Tian, Tianhai, 2021. "A Weight-based Information Filtration Algorithm for Stock-correlation Networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 563(C).
    7. Teh, Boon Kin & Goo, Yik Wen & Lian, Tong Wei & Ong, Wei Guang & Choi, Wen Ting & Damodaran, Mridula & Cheong, Siew Ann, 2015. "The Chinese Correction of February 2007: How financial hierarchies change in a market crash," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 424(C), pages 225-241.
    8. A. Vizgunov & B. Goldengorin & V. Kalyagin & A. Koldanov & P. Koldanov & P. Pardalos, 2014. "Network approach for the Russian stock market," Computational Management Science, Springer, vol. 11(1), pages 45-55, January.
    9. Zhu, Jia & Wei, Daijun, 2021. "Analysis of stock market based on visibility graph and structure entropy," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 576(C).
    10. Stosic, Darko & Stosic, Dusan & Ludermir, Teresa B. & Stosic, Tatijana, 2018. "Collective behavior of cryptocurrency price changes," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 507(C), pages 499-509.
    11. Petr Koldanov & Alexander Koldanov & Dmitry Semenov, 2024. "Finding Weakly Correlated Nodes in Random Variable Networks," SN Operations Research Forum, Springer, vol. 5(4), pages 1-14, December.
    12. Joerg Osterrieder & Stephen Chan & Jeffrey Chu & Yuanyuan Zhang & Branka Hadji Misheva & Codruta Mare, 2024. "Enhancing Security in Blockchain Networks: Anomalies, Frauds, and Advanced Detection Techniques," Papers 2402.11231, arXiv.org.
    13. Marton Gosztonyi, 2021. "A Snapshot of the Ownership Network of the Budapest Stock Exchange," Financial and Economic Review, Magyar Nemzeti Bank (Central Bank of Hungary), vol. 20(3), pages 31-58.
    14. Lillo, Felipe & Valdés, Rodrigo, 2016. "Dynamics of financial markets and transaction costs: A graph-based study," Research in International Business and Finance, Elsevier, vol. 38(C), pages 455-465.
    15. Stephan Bialonski & Martin Wendler & Klaus Lehnertz, 2011. "Unraveling Spurious Properties of Interaction Networks with Tailored Random Networks," PLOS ONE, Public Library of Science, vol. 6(8), pages 1-13, August.
    16. Xue Guo & Hu Zhang & Tianhai Tian, 2019. "Multi-Likelihood Methods for Developing Stock Relationship Networks Using Financial Big Data," Papers 1906.08088, arXiv.org.
    17. Wang, Gang-Jin & Chen, Yang-Yang & Si, Hui-Bin & Xie, Chi & Chevallier, Julien, 2021. "Multilayer information spillover networks analysis of China’s financial institutions based on variance decompositions," International Review of Economics & Finance, Elsevier, vol. 73(C), pages 325-347.
    18. Cameron Cornell & Lewis Mitchell & Matthew Roughan, 2024. "Enhancing Causal Discovery in Financial Networks with Piecewise Quantile Regression," Papers 2408.12210, arXiv.org.
    19. Neto, José de Paula Neves & Figueiredo, Daniel Ratton, 2023. "Ranking influential and influenced stocks over time using transfer entropy networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 630(C).
    20. Carson, Richard T. & Czajkowski, Mikołaj, 2019. "A new baseline model for estimating willingness to pay from discrete choice models," Journal of Environmental Economics and Management, Elsevier, vol. 95(C), pages 57-61.



    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.