Printed from https://ideas.repec.org/a/eee/chsofr/v144y2021ics0960077921000321.html

Statistical metrics for languages classification: A case study of the Bible translations

Author

Listed:
  • Mehri, Ali
  • Jamaati, Maryam

Abstract

Automatic language classification is an important contribution to linguistic research. Four statistical features related to long-range correlations are applied to classify the syntactic properties of languages. We calculate Zipf’s exponent, Heaps’ exponent, fractal dimension and entropy for Bible translations into one hundred living languages from twenty-eight language families. The Bible conveys the same content regardless of its language, but differences in the grammatical rules of the languages lead to differences in the measures extracted from its various translations. The results show that geographical distance and cultural differences can lead to statistical discrepancies. All extracted features for the Bible translations are normally distributed around their average values. This fact divides the languages into two groups: a majority of normal languages and a minority of abnormal ones. There is also an evident (anti)correlation between each pair of the mentioned metrics, due to their respective mechanisms. The standard deviation of the considered statistical features over language families is affected by the geographical distance between the communities that speak those languages and by their cultural diversity.
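
As an illustration only, three of the four metrics named in the abstract can be sketched from their common textbook definitions. This is not the authors' procedure, which the abstract does not describe: the whitespace tokenizer, the log-log least-squares fits, and all function names below are assumptions made for this sketch. The fractal dimension is left out because the abstract does not say how it is defined.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive, assumed tokenizer)."""
    return text.lower().split()


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def zipf_exponent(tokens):
    """Fit frequency ~ rank^(-alpha) on log-log axes and return alpha.

    Needs at least two distinct word types.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    log_rank = [math.log(r) for r in range(1, len(freqs) + 1)]
    log_freq = [math.log(f) for f in freqs]
    return -least_squares_slope(log_rank, log_freq)


def heaps_exponent(tokens):
    """Fit vocabulary_size ~ text_length^beta on log-log axes and return beta.

    Needs at least two tokens.
    """
    seen = set()
    log_n, log_v = [], []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        log_n.append(math.log(n))
        log_v.append(math.log(len(seen)))
    return least_squares_slope(log_n, log_v)


def shannon_entropy(tokens):
    """Shannon entropy, in bits, of the word-frequency distribution."""
    total = len(tokens)
    counts = Counter(tokens).values()
    return -sum((c / total) * math.log2(c / total) for c in counts)
```

Applied to one translation, `tokens = tokenize(text)` followed by the three functions gives one feature vector per language. Fitting over all ranks with plain least squares is the simplest choice rather than the most robust one, and a careful study would probably restrict the fit range or use a maximum-likelihood estimator.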

Suggested Citation

  • Mehri, Ali & Jamaati, Maryam, 2021. "Statistical metrics for languages classification: A case study of the Bible translations," Chaos, Solitons & Fractals, Elsevier, vol. 144(C).
  • Handle: RePEc:eee:chsofr:v:144:y:2021:i:c:s0960077921000321
    DOI: 10.1016/j.chaos.2021.110679

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0960077921000321
    Download Restriction: Full text for ScienceDirect subscribers only



