
Vocabulary Richness Metric for Extracting Author’s Semantic Mark in English Written Literary Works


  • Madalina ZURINI
  • Alin ZAMFIROIU





This paper opens with a short introduction to the main debates around the stylometric measures used to extract the personal signature an author leaves on works written in English. These measures are used to identify an author from a small set of candidate authors, given either a set of documents or a set of indicator values that characterize the semantic way in which the author writes. The paper addresses the semantic level of a work as a function of the tokens the author uses, which are extracted in a preprocessing step. The tokens are defined using a lexical ontology (WordNet, for English words) and are extracted automatically from the words found in the processed documents. The main vocabulary richness metrics from the literature are reviewed, and their main steps are carried into a newly proposed metric that combines vocabulary richness with the semantic layer of a document. The concept of the author mark is described. The objective is a metric that does not depend on the main subject of the analyzed document, so that documents on different subjects can be combined into a single measure describing the vocabulary richness of a specific author across the works that author has written. The analysis also covers the evolution of this metric over time by extracting the trend of the author’s vocabulary richness indicator; results are presented for one author using 13 years of indicator values. Future work will integrate this metric into a general description of the author mark in English written works.
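The abstract does not give the formula of the proposed metric, so the following is only a minimal sketch of two classic vocabulary richness measures (type-token ratio and hapax legomena ratio) and of a least-squares trend over yearly indicator values. The function names, the simple regex tokenizer, and the choice of measures are assumptions made for illustration; the paper's own metric also uses WordNet-based tokens, which this sketch leaves out.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens.

    Assumption: a plain regex tokenizer stands in for the paper's
    WordNet-based token extraction, which is not specified here.
    """
    return re.findall(r"[a-z]+", text.lower())


def type_token_ratio(tokens):
    """Distinct words divided by total words (0.0 for empty input)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def hapax_ratio(tokens):
    """Share of tokens whose word occurs exactly once in the text."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(1 for c in counts.values() if c == 1) / len(tokens)


def linear_trend(values):
    """Least-squares slope of values against their index 0, 1, 2, ...

    A positive slope suggests the indicator grows over time.
    """
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


if __name__ == "__main__":
    # Hypothetical placeholder texts, one per year, for illustration only.
    texts_by_year = [
        "the cat saw the dog",
        "a quiet river runs past the old mill",
    ]
    yearly = [type_token_ratio(tokenize(t)) for t in texts_by_year]
    print(yearly, linear_trend(yearly))
```

Raw type-token ratio is sensitive to text length, so a real comparison across years would normally use samples of equal size or a length-corrected measure.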

Suggested Citation

  • Madalina ZURINI & Alin ZAMFIROIU, 2016. "Vocabulary Richness Metric for Extracting Author’s Semantic Mark in English Written Literary Works," Informatica Economica, Academy of Economic Studies - Bucharest, Romania, vol. 20(3), pages 37-45.
  • Handle: RePEc:aes:infoec:v:20:y:2016:i:3:p:37-45


