
Stylometry and Numerals Usage: Benford’s Law and Beyond

Author

Listed:
  • Andrei V. Zenkov

    (Department of Modelling of Controllable Systems, Ural Federal University, 620002 Ekaterinburg, Russia
    Department of Information Technologies and Statistics, Ural State University of Economics, 620144 Ekaterinburg, Russia)

Abstract

We suggest two approaches to the statistical analysis of texts, both based on the occurrence of numerals in literary texts. The first approach is related to Benford’s Law and analyzes the frequency distribution of the leading digits of the numerals contained in the text. In coherent literary texts, the share of the leading digit 1 is even larger than prescribed by Benford’s Law and can reach 50 percent. The frequencies of occurrence of the digit 1 and, to a lesser extent, of the digits 2 and 3 are usually a characteristic feature of the author’s style, manifested in all (sufficiently long) literary texts of a given author. This approach is convenient for testing whether a group of texts has common authorship: common authorship is doubtful if the frequency distributions are sufficiently different. The second approach extends the first and studies the frequency distribution of the numerals themselves (not their leading digits). It yields non-trivial information about the author and about the stylistic and genre peculiarities of the texts, and is suited to advanced stylometric analysis. The proposed approaches are illustrated by examples of computer analysis of literary texts in English and Russian.
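
The two approaches described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the author's procedure: the abstract does not say how numerals are extracted, so the sketch assumes that only digit-written integers are counted (numerals spelled out in words are ignored), and every function name here, as well as the sum-of-absolute-differences comparison, is a hypothetical choice made for illustration. The only fixed ingredient is Benford’s Law itself, P(d) = log10(1 + 1/d) for leading digit d = 1, ..., 9.

```python
import math
import re
from collections import Counter


def benford_expected():
    """Benford's Law: share of leading digit d is log10(1 + 1/d), d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def extract_numerals(text):
    """Hypothetical helper: return digit-written integers found in the text.

    Assumption: numerals written in words ("three") are not detected.
    """
    return re.findall(r"\b\d+\b", text)


def leading_digit_shares(text):
    """First approach: observed share of each leading digit 1..9."""
    digits = []
    for token in extract_numerals(text):
        stripped = token.lstrip("0")  # skip leading zeros; drop pure zeros
        if stripped:
            digits.append(int(stripped[0]))
    total = len(digits)
    counts = Counter(digits)
    return {d: (counts[d] / total if total else 0.0) for d in range(1, 10)}


def numeral_frequencies(text):
    """Second approach: frequency of the numerals themselves."""
    return Counter(int(token) for token in extract_numerals(text))


def distribution_distance(p, q):
    """Illustrative comparison only: sum of absolute differences of shares."""
    return sum(abs(p[d] - q[d]) for d in range(1, 10))


if __name__ == "__main__":
    sample = "In 1812 he had 2 horses, 15 men and 100 rifles; by 1813 only 1 horse."
    observed = leading_digit_shares(sample)
    print(observed[1], benford_expected()[1])
    print(numeral_frequencies(sample).most_common(3))
    print(distribution_distance(observed, benford_expected()))
```

Under these assumptions, two texts whose leading-digit shares give a large `distribution_distance` would be candidates for different authorship, while the `numeral_frequencies` counts supply the finer-grained data used by the second approach. A real study would need sufficiently long texts and a proper statistical test rather than this raw distance.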

Suggested Citation

  • Andrei V. Zenkov, 2021. "Stylometry and Numerals Usage: Benford’s Law and Beyond," Stats, MDPI, vol. 4(4), pages 1-18, December.
  • Handle: RePEc:gam:jstats:v:4:y:2021:i:4:p:60-1068:d:702592

    Download full text from publisher

    File URL: https://www.mdpi.com/2571-905X/4/4/60/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2571-905X/4/4/60/
    Download Restriction: no

    References listed on IDEAS

    1. Matthew A. Cole & David J. Maddison & Liyun Zhang, 2020. "Testing the emission reduction claims of CDM projects using the Benford’s Law," Climatic Change, Springer, vol. 160(3), pages 407-426, June.
    2. Fewster, R. M., 2009. "A Simple Explanation of Benford's Law," The American Statistician, American Statistical Association, vol. 63(1), pages 26-32.
    3. Theoharry Grammatikos & Nikolaos I. Papanikolaou, 2021. "Applying Benford’s Law to Detect Accounting Data Manipulation in the Banking Industry," Journal of Financial Services Research, Springer;Western Finance Association, vol. 59(1), pages 115-142, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Roy Cerqueti & Claudio Lupi, 2023. "Severe testing of Benford’s law," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 32(2), pages 677-694, June.
    2. Louie Rivers & Tamara Dempsey & Jade Mitchell & Carole Gibbs, 2015. "Environmental Regulation and Enforcement: Structures, Processes and the Use of Data for Fraud Detection," Journal of Environmental Assessment Policy and Management (JEAPM), World Scientific Publishing Co. Pte. Ltd., vol. 17(04), pages 1-29, December.
    3. Sitsofe Tsagbey & Miguel de Carvalho & Garritt L. Page, 2017. "All Data are Wrong, but Some are Useful? Advocating the Need for Data Auditing," The American Statistician, Taylor & Francis Journals, vol. 71(3), pages 231-235, July.
    4. Kauko, Karlo, 2019. "Benford's law and Chinese banks' non-performing loans," BOFIT Discussion Papers 25/2019, Bank of Finland Institute for Emerging Economies (BOFIT).
    5. Charumathi Balakrishnan & Beemamol M, 2023. "Testing CO2 Emissions Data During Covid-19 Pandemic Using Benford’s Law," Energy Research Letters, Asia-Pacific Applied Economics Association, vol. 4(2), pages 1-6.
    6. Matthew A. Cole & David J. Maddison & Liyun Zhang, 2020. "Testing the emission reduction claims of CDM projects using the Benford’s Law," Climatic Change, Springer, vol. 160(3), pages 407-426, June.
    7. Mikkel Bennedsen, 2021. "Designing a statistical procedure for monitoring global carbon dioxide emissions," Climatic Change, Springer, vol. 166(3), pages 1-19, June.
    8. Baumgartner, Tim & Güttler, André, 2022. "Bitcoin flash crash on May 19, 2021: What did really happen on Binance?," IWH Discussion Papers 25/2022, Halle Institute for Economic Research (IWH).
    9. Pier Giacomo Cardinali & Pietro De Giovanni, 2022. "Responsible digitalization through digital technologies and green practices," Corporate Social Responsibility and Environmental Management, John Wiley & Sons, vol. 29(4), pages 984-995, July.
    10. Holz, Carsten A., 2014. "The quality of China's GDP statistics," China Economic Review, Elsevier, vol. 30(C), pages 309-338.
    11. Montag, Josef, 2017. "Identifying odometer fraud in used car market data," Transport Policy, Elsevier, vol. 60(C), pages 10-23.
    12. Biau, Damien, 2015. "The first-digit frequencies in data of turbulent flows," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 440(C), pages 147-154.
    13. Katherine M. Anderson & Kevin Dayaratna & Drew Gonshorowski & Steven J. Miller, 2022. "A New Benford Test for Clustered Data with Applications to American Elections," Stats, MDPI, vol. 5(3), pages 1-15, August.
    14. Kauko, Karlo, 2019. "Benford’s law and Chinese banks’ non-performing loans," BOFIT Discussion Papers 25/2019, Bank of Finland, Institute for Economies in Transition.
    15. Diego Jara & Felipe Parra & Alvaro Riascos & Mauricio Romero, 2011. "Análisis digital y detección de elecciones atípicas," Documentos CEDE 9064, Universidad de los Andes, Facultad de Economía, CEDE.
    16. Yan, Xiaoyong & Yang, Seong-Gyu & Kim, Beom Jun & Minnhagen, Petter, 2018. "Benford’s law and first letter of words," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 512(C), pages 305-315.
    17. Carè, R. & Weber, O., 2023. "How much finance is in climate finance? A bibliometric review, critiques, and future research directions," Research in International Business and Finance, Elsevier, vol. 64(C).
    18. Wang, Delu & Chen, Fan & Mao, Jinqi & Liu, Nannan & Rong, Fangyu, 2022. "Are the official national data credible? Empirical evidence from statistics quality evaluation of China's coal and its downstream industries," Energy Economics, Elsevier, vol. 114(C).
    19. Ionela Munteanu, 2020. "Financial Reporting Quality and Operational Efficiency in the Coastal Region of Romania," Ovidius University Annals, Economic Sciences Series, Ovidius University of Constantza, Faculty of Economic Sciences, vol. 0(2), pages 978-984, December.
    20. de Araújo Silva, Archibald & Aparecida Gouvêa, Maria, 2023. "Study on the effect of sample size on type I error, in the first, second and first-two digits excessmad tests," International Journal of Accounting Information Systems, Elsevier, vol. 48(C).
