
On the Accuracy of Language Trees

Authors

  • Simone Pompei
  • Vittorio Loreto
  • Francesca Tria

Abstract

Historical linguistics aims at inferring the most likely language phylogenetic tree starting from information concerning the evolutionary relatedness of languages. The available information typically consists of lists of homologous (lexical, phonological, syntactic) features or characters for many different languages: a set of parallel corpora whose compilation represents a paramount achievement in linguistics. From this perspective, the reconstruction of language trees is an example of an inverse problem: starting from present information, which is incomplete and often noisy, one aims at inferring the most likely past evolutionary history. A fundamental issue in inverse problems is the evaluation of the inference made. A standard way of dealing with this question is to generate data with artificial models in order to have full access to the evolutionary process one is going to infer. This procedure has an intrinsic limitation: when dealing with real data sets, one typically does not know which model of evolution is the most suitable for them. A possible way out is to compare algorithmic inference with expert classifications. This is the point of view we take here by conducting a thorough survey of the accuracy of reconstruction methods as compared with the Ethnologue expert classifications. We focus in particular on state-of-the-art distance-based methods for phylogeny reconstruction using worldwide linguistic databases. In order to assess the accuracy of the inferred trees, we introduce and characterize two generalizations of standard definitions of distances between trees. Based on these scores, we quantify the relative performance of the distance-based algorithms considered. Further, we quantify how the completeness and the coverage of the available databases affect the accuracy of the reconstruction. Finally, we draw some conclusions about where the accuracy of the reconstructions in historical linguistics stands and about the main directions for improving it.
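
As a rough illustration of what a "distance between trees" measures when an inferred tree is scored against an expert classification, the sketch below computes the standard Robinson-Foulds (symmetric-difference) distance for rooted trees. It is not the generalized scores introduced in the paper, which the abstract does not specify. The nested-tuple tree encoding, the function names `clades` and `rf_distance`, and the four-language toy trees are all assumptions made for this example.

```python
from typing import FrozenSet, Set, Tuple, Union

# Assumed encoding: a leaf is a language name, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the leaf set below every internal node, excluding the root."""
    found: Set[FrozenSet[str]] = set()

    def leaves(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree on these leaves
    return found


def rf_distance(a: Tree, b: Tree) -> int:
    """Standard rooted Robinson-Foulds distance: clades present in exactly one tree."""
    return len(clades(a) ^ clades(b))


# Toy example (hypothetical trees, not data from the paper).
reference = (("Italian", "Spanish"), ("German", "English"))
inferred_bad = (("Italian", "German"), ("Spanish", "English"))
inferred_partial = ((("Italian", "Spanish"), "German"), "English")

print(rf_distance(reference, reference))         # 0
print(rf_distance(reference, inferred_bad))      # 4
print(rf_distance(reference, inferred_partial))  # 2
```

A distance of 0 means the two trees contain exactly the same groupings; larger values mean more groupings appear in only one of the two trees. This plain version assumes both trees are defined on the same set of languages.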

Suggested Citation

  • Simone Pompei & Vittorio Loreto & Francesca Tria, 2011. "On the Accuracy of Language Trees," PLOS ONE, Public Library of Science, vol. 6(6), pages 1-11, June.
  • Handle: RePEc:plo:pone00:0020109
    DOI: 10.1371/journal.pone.0020109

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0020109
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0020109&type=printable
    Download Restriction: no



    Citations

    Cited by:

    1. Job Schepens & Ton Dijkstra & Franc Grootjen & Walter J B van Heuven, 2013. "Cross-Language Distributions of High Frequency and Phonetically Similar Cognates," PLOS ONE, Public Library of Science, vol. 8(5), pages 1-15, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Nico Neureiter & Peter Ranacher & Nour Efrat-Kowalsky & Gereon A. Kaiping & Robert Weibel & Paul Widmer & Remco R. Bouckaert, 2022. "Detecting contact in language trees: a Bayesian phylogenetic model with horizontal transfer," Palgrave Communications, Palgrave Macmillan, vol. 9(1), pages 1-14, December.
    2. Taraka Rama, 2013. "Phonotactic Diversity Predicts the Time Depth of the World’s Language Families," PLOS ONE, Public Library of Science, vol. 8(5), pages 1-9, May.
    3. Job Schepens & Ton Dijkstra & Franc Grootjen & Walter J B van Heuven, 2013. "Cross-Language Distributions of High Frequency and Phonetically Similar Cognates," PLOS ONE, Public Library of Science, vol. 8(5), pages 1-15, May.
    4. Klaus Desmet & Ignacio Ortuño-Ortín & Romain Wacziarg, 2009. "The political economy of ethnolinguistic cleavages," Working Papers 2009-17, Instituto Madrileño de Estudios Avanzados (IMDEA) Ciencias Sociales.
    5. Victor Ginsburgh & Shlomo Weber, 2020. "The Economics of Language," Journal of Economic Literature, American Economic Association, vol. 58(2), pages 348-404, June.
    6. Kristen, Cornelia & Mühlau, Peter & Schacht, Diana, 2016. "Language acquisition of recently arrived immigrants in England, Germany, Ireland, and the Netherlands," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 16(2), pages 180-212.
    7. Dustin S. Stoltz & Marshall A. Taylor, 2019. "Concept Mover’s Distance: measuring concept engagement via word embeddings in texts," Journal of Computational Social Science, Springer, vol. 2(2), pages 293-313, July.
    8. Aparicio Fenoll, Ainoa & Kuehn, Zoë, 2016. "Education Policies and Migration across European Countries," IZA Discussion Papers 9755, Institute of Labor Economics (IZA).
    9. Andrew Dickens, 2022. "Understanding Ethnolinguistic Differences: The Roles of Geography and Trade," The Economic Journal, Royal Economic Society, vol. 132(643), pages 953-980.
    10. Ainhoa Aparicio Fenoll & Zoë Kuehn, 2017. "Compulsory Schooling Laws and Migration Across European Countries," Demography, Springer;Population Association of America (PAA), vol. 54(6), pages 2181-2200, December.
    11. Eduardo G Altmann & Janet B Pierrehumbert & Adilson E Motter, 2011. "Niche as a Determinant of Word Fate in Online Groups," PLOS ONE, Public Library of Science, vol. 6(5), pages 1-12, May.
    12. Stanisz, Tomasz & Drożdż, Stanisław & Kwapień, Jarosław, 2023. "Universal versus system-specific features of punctuation usage patterns in major Western languages," Chaos, Solitons & Fractals, Elsevier, vol. 168(C).
    13. Matthew J. Baker, 2021. "Foundations of the Age-Area Hypothesis," Palgrave Communications, Palgrave Macmillan, vol. 8(1), pages 1-17, December.
    14. Stelios Michalopoulos, 2012. "The Origins of Ethnolinguistic Diversity," American Economic Review, American Economic Association, vol. 102(4), pages 1508-1539, June.
    15. Carl Müller-Crepon & Yannick Pengl & Nils-Christian Bormann, 2022. "Linking Ethnic Data from Africa (LEDA)," Journal of Peace Research, Peace Research Institute Oslo, vol. 59(3), pages 425-435, May.
    16. Black, Nicole & Kunz, Johannes S., 2024. "The intergenerational effects of language proficiency on child health outcomes: Evidence from survey- and Census-matched health care records," Journal of Economic Behavior & Organization, Elsevier, vol. 225(C), pages 136-152.
    17. Ingo Isphording, 2013. "Returns to Local and Foreign Language Skills – Causal Evidence from Spain," Ruhr Economic Papers 0398, Rheinisch-Westfälisches Institut für Wirtschaftsforschung, Ruhr-Universität Bochum, Universität Dortmund, Universität Duisburg-Essen.
    18. Isphording, Ingo E., 2014. "Disadvantages of linguistic origin—Evidence from immigrant literacy scores," Economics Letters, Elsevier, vol. 123(2), pages 236-239.
    19. Ackermann, Malte, 2013. "The communication of innovation: an empirical analysis of the advancement of innovation," Discussion Papers on Strategy and Innovation 13-02, Philipps-University Marburg, Department of Technology and Innovation Management (TIM).
    20. Damian Ruck & R. Alexander Bentley & Alberto Acerbi & Philip Garnett & Daniel J. Hruschka, 2017. "Role Of Neutral Evolution In Word Turnover During Centuries Of English Word Popularity," Advances in Complex Systems (ACS), World Scientific Publishing Co. Pte. Ltd., vol. 20(06n07), pages 1-16, September.


