
Random sampling of the Zipf–Mandelbrot distribution as a representation of vocabulary growth

Author

Listed:
  • Tunnicliffe, Martin
  • Hunter, Gordon

Abstract

We develop a discrete model of type-token dynamics based on random type selection from the Zipf–Mandelbrot probability distribution, with a view to examining the relationships between the constants of Zipf’s and Heaps’ laws. Analysis of items randomly selected from the Standardised Project Gutenberg Corpus (SPGC) reveals a significant low-frequency “droop” in the β-slope of the types vs. frequency distribution, which is inconsistent with the model when vocabulary is unlimited; when a finite vocabulary limit is imposed, optimal parameter selection allows the droop to be reproduced. We adjust the parameters of both the limited and unlimited vocabulary models to obtain optimal agreement with the vocabulary growth curves: the limited vocabulary model usually yields the best optimised agreement, but a sizeable minority of items are better represented by an unlimited vocabulary. While the optimised Zipf α indices correlate strongly with the corresponding values obtained directly from document statistics, the former are generally larger than the latter (though this is partially explained by the distorting effect of large values of the Mandelbrot parameter m). The β indices optimised from the limited vocabulary model are also compared with their directly measured equivalents, showing significant positive correlation. The relationship between optimised α and β agrees plausibly with the well-known continuum model, though the degree of agreement depends on how β is defined. The experiments yield repeatable results from each of three 100-item samples, demonstrating the statistical significance of the experiments.
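
The kind of simulation the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the common Zipf–Mandelbrot form in which the probability of the type at rank r is proportional to (r + m)^(-α), truncated at a finite vocabulary limit. The function names, the `vocab_size` parameter, and the log-log least-squares fit used to estimate a Heaps-type exponent are all illustrative assumptions; the paper's own parameter optimisation is not reproduced here.

```python
import bisect
import math
import random
from itertools import accumulate


def zipf_mandelbrot_cdf(alpha, m, vocab_size):
    """Cumulative probabilities for ranks 1..vocab_size.

    Assumes p(r) proportional to (r + m) ** (-alpha); vocab_size is the
    finite vocabulary limit (a hypothetical parameter name).
    """
    weights = [(r + m) ** -alpha for r in range(1, vocab_size + 1)]
    total = sum(weights)
    cdf = list(accumulate(w / total for w in weights))
    cdf[-1] = 1.0  # guard against floating-point rounding
    return cdf


def vocabulary_growth(alpha, m, vocab_size, n_tokens, seed=0):
    """Return V(n): the number of distinct types seen after each draw."""
    rng = random.Random(seed)
    cdf = zipf_mandelbrot_cdf(alpha, m, vocab_size)
    seen = set()
    growth = []
    for _ in range(n_tokens):
        # Inverse-CDF sampling: first rank whose cumulative probability
        # reaches the uniform random number.
        rank = bisect.bisect_left(cdf, rng.random()) + 1
        seen.add(rank)
        growth.append(len(seen))
    return growth


def heaps_exponent(growth):
    """Crude estimate of a Heaps-type exponent: the least-squares slope
    of log V(n) against log n (an illustrative choice, not the paper's)."""
    xs = [math.log(n) for n in range(1, len(growth) + 1)]
    ys = [math.log(v) for v in growth]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    curve = vocabulary_growth(alpha=1.2, m=2.0, vocab_size=50_000, n_tokens=20_000)
    print("types after 20,000 tokens:", curve[-1])
    print("fitted log-log slope:", round(heaps_exponent(curve), 3))
```

An unlimited vocabulary cannot be represented directly in this sketch; a very large `vocab_size` only approximates it, and the untruncated distribution can be normalised only when α > 1.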

Suggested Citation

  • Tunnicliffe, Martin & Hunter, Gordon, 2022. "Random sampling of the Zipf–Mandelbrot distribution as a representation of vocabulary growth," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 608(P1).
  • Handle: RePEc:eee:phsmap:v:608:y:2022:i:p1:s0378437122008172
    DOI: 10.1016/j.physa.2022.128259

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0378437122008172
    Download Restriction: Full text for ScienceDirect subscribers only. The journal offers the option of making the article available online on ScienceDirect for a fee of $3,000.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Blazquez-Soriano, Amparo & Ramos-Sandoval, Rosmery, 2022. "Information transfer as a tool to improve the resilience of farmers against the effects of climate change: The case of the Peruvian National Agrarian Innovation System," Agricultural Systems, Elsevier, vol. 200(C).
    2. Martin L. Weitzman, 2015. "A Voting Architecture for the Governance of Free-Driver Externalities, with Application to Geoengineering," Scandinavian Journal of Economics, Wiley Blackwell, vol. 117(4), pages 1049-1068, October.
    3. Wei Zhong, 2017. "Simulating influenza pandemic dynamics with public risk communication and individual responsive behavior," Computational and Mathematical Organization Theory, Springer, vol. 23(4), pages 475-495, December.
    4. Guo Weilong & Minca Andreea & Wang Li, 2016. "The topology of overlapping portfolio networks," Statistics & Risk Modeling, De Gruyter, vol. 33(3-4), pages 139-155, December.
    5. Kwame Boamah‐Addo & Tomasz J. Kozubowski & Anna K. Panorska, 2023. "A discrete truncated Zipf distribution," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 77(2), pages 156-187, May.
    6. Kobayashi, Teruyoshi & Takaguchi, Taro, 2018. "Identifying relationship lending in the interbank market: A network approach," Journal of Banking & Finance, Elsevier, vol. 97(C), pages 20-36.
    7. Konstantinos Antoniadis & Kostas Zafiropoulos & Vasiliki Vrana, 2016. "A Method for Assessing the Performance of e-Government Twitter Accounts," Future Internet, MDPI, vol. 8(2), pages 1-18, April.
    8. Maness, Michael & Cirillo, Cinzia, 2016. "An indirect latent informational conformity social influence choice model: Formulation and case study," Transportation Research Part B: Methodological, Elsevier, vol. 93(PA), pages 75-101.
    9. Lomi, Alessandro & Fonti, Fabio, 2012. "Networks in markets and the propensity of companies to collaborate: An empirical test of three mechanisms," Economics Letters, Elsevier, vol. 114(2), pages 216-220.
    10. Zhang, Xuxi & Liu, Xianping & Lewis, Frank L. & Wang, Xia, 2020. "Bipartite tracking consensus of nonlinear multi-agent systems," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 545(C).
    11. Bing Han & Liyan Yang, 2013. "Social Networks, Information Acquisition, and Asset Prices," Management Science, INFORMS, vol. 59(6), pages 1444-1457, June.
    12. Dimitrios Karamanis, 2022. "Defence partnerships, military expenditure, investment, and economic growth: an analysis in PESCO countries," GreeSE – Hellenic Observatory Papers on Greece and Southeast Europe 173, Hellenic Observatory, LSE.
    13. Levent V. Orman, 2016. "Information markets over trust networks," Electronic Commerce Research, Springer, vol. 16(4), pages 529-551, December.
    14. Zhu, Yu-Xiao & Cao, Yan-Yan & Chen, Ting & Qiu, Xiao-Yan & Wang, Wei & Hou, Rui, 2018. "Crossover phenomena in growth pattern of social contagions with restricted contact," Chaos, Solitons & Fractals, Elsevier, vol. 114(C), pages 408-414.
    15. Pablo Galaso & Adrián Rodríguez Miranda & Sebastian Goinheix, 2018. "Local development, social capital and social network analysis: evidence from Uruguay," Revista de Estudios Regionales, Universidades Públicas de Andalucía, vol. 3, pages 137-163.
    16. Takahiro Ezaki & Naoki Masuda, 2017. "Reinforcement learning account of network reciprocity," PLOS ONE, Public Library of Science, vol. 12(12), pages 1-8, December.
    17. Mariann Ollar & Marzena Rostek, 2011. "Information Aggregation and Innovation in Market Design," Working Papers 11-12, NET Institute.
    18. Mr. Jorge A Chan-Lau, 2017. "Variance Decomposition Networks: Potential Pitfalls and a Simple Solution," IMF Working Papers 2017/107, International Monetary Fund.
    19. Lillo, Felipe & Valdés, Rodrigo, 2016. "Dynamics of financial markets and transaction costs: A graph-based study," Research in International Business and Finance, Elsevier, vol. 38(C), pages 455-465.
    20. Usha Sridhar & Sridhar Mandyam, 2016. "Loan Allocation and Guarantee Structure for Group Borrower Networks in Microfinance," Studies in Microeconomics, vol. 4(2), pages 100-114, December.
