The Relation Dimension in the Identification and Classification of Lexically Restricted Word Co-Occurrences in Text Corpora

Author

Listed:
  • Alexander Shvets

    (NLP Group, Pompeu Fabra University, 08018 Barcelona, Spain)

  • Leo Wanner

    (Catalan Institute for Research and Advanced Studies (ICREA) and NLP Group, Pompeu Fabra University, 08018 Barcelona, Spain)

Abstract

The speech of native speakers is full of idiosyncrasies. Especially prominent are lexically restricted binary word co-occurrences of the type high esteem, strong tea, run [an] experiment, war break(s) out, etc. In lexicography, such co-occurrences are referred to as collocations. Due to their semi-decompositional nature, collocations are highly relevant to a large number of natural language processing applications as well as to second language learning. A substantial body of work exists on the automatic recognition of collocations in textual material and, increasingly, also on their semantic classification, even if the latter is not yet part of mainstream research. In particular, classification with respect to the lexical function (LF) taxonomy, which is the most detailed semantically oriented taxonomy of collocations available to date, has proved to be of real use to human speakers and machines alike. The most recent approaches in the field are based on multilingual neural graph transformer models that use explicit syntactic dependencies. Our goal is to explore whether extending such a model with a semantic relation extraction network improves its classification performance, or whether the model already learns the corresponding semantic relations from the dependencies and the sentential contexts, in which case an additional relation extraction network would not improve the overall performance. The experiments show that the semantic relation extraction layer does improve the overall performance of a graph transformer. However, the improvement is small, from which we conclude that graph transformers already learn, to a certain extent, the semantics of the dependencies between the collocation elements.

Suggested Citation

  • Alexander Shvets & Leo Wanner, 2022. "The Relation Dimension in the Identification and Classification of Lexically Restricted Word Co-Occurrences in Text Corpora," Mathematics, MDPI, vol. 10(20), pages 1-21, October.
  • Handle: RePEc:gam:jmathe:v:10:y:2022:i:20:p:3831-:d:944388

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/10/20/3831/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/10/20/3831/
    Download Restriction: no