
The Sichel model and the mixing and truncation order

Authors
  • Xavier Puig
  • Josep Ginebra
  • Marti Font

Abstract

The analysis of word frequency count data can be very useful in authorship attribution problems. Zero-truncated generalized inverse Gaussian-Poisson mixture models are very helpful in the analysis of these kinds of data because their model-mixing density estimates can be used as estimates of the density of the word frequencies of the vocabulary. It is found that this model provides excellent fits for the word frequency counts of very long texts, where the truncated inverse Gaussian-Poisson special case fails because it does not allow for the large degree of over-dispersion in the data. The role played by the three parameters of this truncated GIG-Poisson model is also explored. Our second goal is to compare the fit of the truncated GIG-Poisson mixture model with the fit of the model that results from switching the order of the mixing and truncation stages. A heuristic interpretation of the mixing distribution estimates obtained under this alternative GIG-truncated Poisson mixture model is also provided.
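
The two orderings of mixing and truncation compared in the abstract can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it takes the standard GIG(p, a, b) density, proportional to lam^(p-1) * exp(-(a*lam + b/lam)/2), which need not match the parameterization used in the article. The function names and the parameter values are hypothetical, and the integrals are approximated by a simple midpoint rule rather than by Bessel-function formulas.

```python
import math


def gig_unnormalized(lam, p, a, b):
    """Unnormalized GIG(p, a, b) density (assumed parameterization)."""
    return lam ** (p - 1.0) * math.exp(-0.5 * (a * lam + b / lam))


def poisson_pmf(r, lam):
    """Poisson probability of count r at rate lam, computed on the log scale."""
    return math.exp(-lam + r * math.log(lam) - math.lgamma(r + 1))


def mix(h, p, a, b, n=2000, u_lo=-25.0, u_hi=10.0):
    """Approximate E[h(lam)] for lam ~ GIG(p, a, b).

    Midpoint rule in u = log(lam). The same weights are used in the
    numerator and the denominator, so the GIG normalizing constant
    never has to be evaluated.
    """
    du = (u_hi - u_lo) / n
    num = 0.0
    den = 0.0
    for i in range(n):
        lam = math.exp(u_lo + (i + 0.5) * du)
        w = gig_unnormalized(lam, p, a, b) * lam  # extra lam is the Jacobian
        num += h(lam) * w
        den += w
    return num / den


def truncated_mixture_pmf(r, p, a, b):
    """Mix first, then truncate: zero-truncated GIG-Poisson mixture, r >= 1."""
    p_r = mix(lambda lam: poisson_pmf(r, lam), p, a, b)
    p_0 = mix(lambda lam: math.exp(-lam), p, a, b)
    return p_r / (1.0 - p_0)


def mixture_of_truncated_pmf(r, p, a, b):
    """Truncate first, then mix: GIG mixture of zero-truncated Poissons, r >= 1."""
    return mix(lambda lam: poisson_pmf(r, lam) / -math.expm1(-lam), p, a, b)


if __name__ == "__main__":
    # Illustrative values only; p = -0.5 is the inverse Gaussian special case.
    p, a, b = -0.5, 1.0, 1.0
    for r in range(1, 6):
        print(r,
              truncated_mixture_pmf(r, p, a, b),
              mixture_of_truncated_pmf(r, p, a, b))
```

Both functions return proper probability functions on r = 1, 2, ..., but they generally differ: in the first the zero class is removed after averaging over the Poisson rates, while in the second each Poisson component is truncated before averaging, which gives more relative weight to components with small rates.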

Suggested Citation

  • Xavier Puig & Josep Ginebra & Marti Font, 2010. "The Sichel model and the mixing and truncation order," Journal of Applied Statistics, Taylor & Francis Journals, vol. 37(9), pages 1585-1603.
  • Handle: RePEc:taf:japsta:v:37:y:2010:i:9:p:1585-1603
    DOI: 10.1080/02664760903093617

    Download full text from publisher

    File URL: http://www.tandfonline.com/doi/abs/10.1080/02664760903093617
    Download Restriction: Access to full text is restricted to subscribers.


    References listed on IDEAS

    1. Alex Riba & Josep Ginebra, 2006. "Diversity of vocabulary and homogeneity of literary style," Journal of Applied Statistics, Taylor & Francis Journals, vol. 33(7), pages 729-741.
    2. Ginebra, Josep & Puig, Xavier, 2010. "On the measure and the estimation of evenness and diversity," Computational Statistics & Data Analysis, Elsevier, vol. 54(9), pages 2187-2201, September.
    3. H. S. Sichel, 1985. "A bibliometric distribution which really works," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 36(5), pages 314-321, September.
    4. Alex Riba & Josep Ginebra, 2005. "Change-point estimation in a multinomial sequence and homogeneity of literary style," Journal of Applied Statistics, Taylor & Francis Journals, vol. 32(1), pages 61-74.
    5. D. I. Holmes, 1992. "A Stylometric Analysis of Mormon Scripture and Related Texts," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 155(1), pages 91-120, January.

    Citations



    Cited by:

    1. Jordi Valero & Josep Ginebra & Marta Pérez-Casany, 2012. "Extended Truncated Tweedie-Poisson Model," Methodology and Computing in Applied Probability, Springer, vol. 14(3), pages 811-829, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jordi Valero & Josep Ginebra & Marta Pérez-Casany, 2012. "Extended Truncated Tweedie-Poisson Model," Methodology and Computing in Applied Probability, Springer, vol. 14(3), pages 811-829, September.
    2. Alex Riba & Josep Ginebra, 2006. "Diversity of vocabulary and homogeneity of literary style," Journal of Applied Statistics, Taylor & Francis Journals, vol. 33(7), pages 729-741.
    3. Ginebra, Josep & Puig, Xavier, 2010. "On the measure and the estimation of evenness and diversity," Computational Statistics & Data Analysis, Elsevier, vol. 54(9), pages 2187-2201, September.
    4. Quentin L. Burrell, 2001. "Stochastic modelling of the first-citation distribution," Scientometrics, Springer;Akadémiai Kiadó, vol. 52(1), pages 3-12, September.
    5. Quentin L. Burrell, 2002. "The nth-citation distribution and obsolescence," Scientometrics, Springer;Akadémiai Kiadó, vol. 53(3), pages 309-323, March.
    6. Shovan Chowdhury, 2014. "Compounded Generalized Weibull Distributions - A Unified Approach," Working papers 148, Indian Institute of Management Kozhikode.
    7. Oliver Johnson & Dino Sejdinovic & James Cruise & Robert Piechocki & Ayalvadi Ganesh, 2014. "Non-Parametric Change-Point Estimation using String Matching Algorithms," Methodology and Computing in Applied Probability, Springer, vol. 16(4), pages 987-1008, December.
    8. A. Baccini & L. Barabesi & M. Cioni & C. Pisani, 2014. "Crossing the hurdle: the determinants of individual scientific performance," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(3), pages 2035-2062, December.
    9. Mingers, John & Leydesdorff, Loet, 2015. "A review of theory and practice in scientometrics," European Journal of Operational Research, Elsevier, vol. 246(1), pages 1-19.
    10. Sarabia, José María & Gómez-Déniz, Emilio & Sarabia, María & Prieto, Faustino, 2010. "A general method for generating parametric Lorenz and Leimkuhler curves," Journal of Informetrics, Elsevier, vol. 4(4), pages 524-539.
    11. Saralees Nadarajah & Vicente Cancho & Edwin Ortega, 2013. "The geometric exponential Poisson distribution," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 22(3), pages 355-380, August.
    12. Aisling J. Daly & Jan M. Baetens & Bernard De Baets, 2018. "Ecological Diversity: Measuring the Unmeasurable," Mathematics, MDPI, vol. 6(7), pages 1-28, July.
    13. Martínez-Rodríguez, A.M. & Sáez-Castillo, A.J. & Conde-Sánchez, A., 2011. "Modelling using an extended Yule distribution," Computational Statistics & Data Analysis, Elsevier, vol. 55(1), pages 863-873, January.
    14. Quentin L. Burrell, 2007. "Time-dependent aspects of co-concentration in informetrics," Scientometrics, Springer;Akadémiai Kiadó, vol. 73(2), pages 161-174, November.
    15. Joshua Mitts, 2020. "Short and Distort," The Journal of Legal Studies, University of Chicago Press, vol. 49(2), pages 287-334.
    16. Quentin L. Burrell, 2014. "The individual author’s publication–citation process: theory and practice," Scientometrics, Springer;Akadémiai Kiadó, vol. 98(1), pages 725-742, January.
    17. Isola Ajiferuke & Dietmar Wolfram, 2004. "Modelling the characteristics of Web page outlinks," Scientometrics, Springer;Akadémiai Kiadó, vol. 59(1), pages 43-62, January.
    18. Michael Nelson & J. Stephen Downie, 2002. "Informetric analysis of a music database," Scientometrics, Springer;Akadémiai Kiadó, vol. 54(2), pages 243-255, June.
