
A Method for Perception and Assessment of Semantic Textual Similarities in English

Author

Listed:
  • Omar Zatarain

    (Department of Computer Science and Engineering, CUValles, University of Guadalajara, Guadalajara 46600, Jalisco, Mexico)

  • Jesse Yoe Rumbo-Morales

    (Department of Computer Science and Engineering, CUValles, University of Guadalajara, Guadalajara 46600, Jalisco, Mexico)

  • Silvia Ramos-Cabral

    (Department of Computer Science and Engineering, CUValles, University of Guadalajara, Guadalajara 46600, Jalisco, Mexico)

  • Gerardo Ortíz-Torres

    (Department of Computer Science and Engineering, CUValles, University of Guadalajara, Guadalajara 46600, Jalisco, Mexico)

  • Felipe d. J. Sorcia-Vázquez

    (Department of Computer Science and Engineering, CUValles, University of Guadalajara, Guadalajara 46600, Jalisco, Mexico)

  • Iván Guillén-Escamilla

    (Department of Natural and Exact Sciences, CUValles, University of Guadalajara, Carr. Guadalajara-Ameca Km. 45.5 Ameca, Guadalajara 46600, Jalisco, Mexico)

  • Juan Carlos Mixteco-Sánchez

    (Department of Natural and Exact Sciences, CUValles, University of Guadalajara, Carr. Guadalajara-Ameca Km. 45.5 Ameca, Guadalajara 46600, Jalisco, Mexico)

Abstract

This research proposes a method for detecting semantic similarities in text snippets. The method performs unsupervised extraction and comparison of semantic information by mimicking the skills used to identify clauses and possible verb conjugations, to select the most accurate organization of the parts of speech, and to analyze similarity by directly comparing the parts of speech of a pair of text snippets. To extract the parts of speech in each text, the method exploits a knowledge base structured as a dictionary and a thesaurus to identify the possible labels of each word and its synonyms. The method consists of four processes: perception, debiasing, reasoning, and assessment. The perception module decomposes the text into blocks of information focused on eliciting the parts of speech. The debiasing module reorganizes the blocks of information to correct biases that the perception step may have introduced. The reasoning module finds the similarities between blocks from two texts by analyzing synonymy, morphological properties, and the relative position of similar concepts within the texts. The assessment module generates a judgement on the output of the reasoning module as the average of the part-of-speech similarities of the blocks. The proposed method is implemented in an English-language version that exploits an English knowledge base to extract the similarities and differences between texts. The system implements a set of syntactic and logical rules that enable autonomous reasoning over a knowledge base, regardless of the concepts and knowledge domains it contains. A system developed with the proposed method is tested on the “test” dataset used in the SemEval 2017 competition, with seven knowledge bases compiled from six dictionaries and two thesauruses. The results indicate that the performance of the method increases as the completeness of the concepts and their relations increases; the Pearson correlation for the most accurate knowledge base is 77%.
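
The assessment and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy thesaurus, the binary synonym score, the best-match averaging, and every name below (THESAURUS, word_similarity, block_similarity, assess, pearson) are assumptions made for illustration only. The sketch averages per-part-of-speech similarities for two texts that have already been split into labelled blocks, and computes a Pearson correlation of the kind used to compare system scores with human ratings.

```python
from math import sqrt

# Hypothetical toy thesaurus (assumption): each word maps to its synonyms.
THESAURUS = {
    "car": {"automobile", "vehicle"},
    "automobile": {"car", "vehicle"},
    "fast": {"quick", "rapid"},
    "quick": {"fast", "rapid"},
}


def word_similarity(a, b):
    """Return 1.0 for identical words or listed synonyms, otherwise 0.0."""
    a, b = a.lower(), b.lower()
    if a == b:
        return 1.0
    if b in THESAURUS.get(a, set()) or a in THESAURUS.get(b, set()):
        return 1.0
    return 0.0


def block_similarity(words_a, words_b):
    """Average best-match score of each word in the longer block."""
    if not words_a or not words_b:
        return 0.0
    if len(words_a) >= len(words_b):
        longer, shorter = words_a, words_b
    else:
        longer, shorter = words_b, words_a
    best = [max(word_similarity(w, v) for v in shorter) for w in longer]
    return sum(best) / len(longer)


def assess(blocks_a, blocks_b):
    """Average the block similarities over every part-of-speech label."""
    labels = set(blocks_a) | set(blocks_b)
    if not labels:
        return 0.0
    scores = [
        block_similarity(blocks_a.get(label, []), blocks_b.get(label, []))
        for label in labels
    ]
    return sum(scores) / len(labels)


def pearson(xs, ys):
    """Pearson correlation between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Blocks are given by hand here; the perception and debiasing steps
    # that would produce them are outside the scope of this sketch.
    text_a = {"noun": ["car"], "verb": ["runs"], "adjective": ["fast"]}
    text_b = {"noun": ["automobile"], "verb": ["runs"], "adjective": ["slow"]}
    print(assess(text_a, text_b))
```

In this toy example the noun and verb blocks match and the adjective block does not, so the averaged score is two thirds. A fuller version would also weigh morphological properties and the relative position of concepts, which the abstract names but does not specify.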

Suggested Citation

  • Omar Zatarain & Jesse Yoe Rumbo-Morales & Silvia Ramos-Cabral & Gerardo Ortíz-Torres & Felipe d. J. Sorcia-Vázquez & Iván Guillén-Escamilla & Juan Carlos Mixteco-Sánchez, 2023. "A Method for Perception and Assessment of Semantic Textual Similarities in English," Mathematics, MDPI, vol. 11(12), pages 1-20, June.
  • Handle: RePEc:gam:jmathe:v:11:y:2023:i:12:p:2700-:d:1171140

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/11/12/2700/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/11/12/2700/
    Download Restriction: no


