
Development of fuzzy search method for creating an efficient information search system in text data

Author

Listed:
  • Kyrylo Kleshch

    (National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute»)

Abstract

The object of research is the process of efficiently searching for information in a set of text data. The subject of the research is a fuzzy search method that effectively solves the problem of searching for information in such a set. The paper describes the development of a fuzzy search method that consists of 9 consecutive steps and is needed to quickly find matches in a large set of text data. Based on this method, it is proposed to create a fuzzy search system that finds the most relevant documents in a set of documents.

The proposed fuzzy search method combines the advantages of algorithms based on deterministic finite automata with those of dynamic-programming algorithms for calculating the Damerau-Levenshtein distance. This combination makes it possible to implement the symbol similarity table in an optimal way. As part of the work, an approach to creating a symbol similarity table is proposed, and an example of such a table is created for the symbols of the English alphabet. The table makes it possible to find the degree of similarity between two symbols in constant time and to convert a given symbol into its base counterpart. For document filtering, a metric was developed to evaluate how well text data corresponds to a search phrase; it simultaneously takes into account the numbers of found and not-found characters and the numbers of found and not-found words.

The Damerau-Levenshtein algorithm finds the edit distance between two words, taking into account the following types of errors: substitution, insertion, deletion, and transposition of adjacent characters. The work proposes a modification of this algorithm that uses the similarity table to estimate the edit distance between two words more accurately.

The developed method makes it possible to create a fuzzy search system that helps find the desired results faster and increases the relevance of the results by sorting them according to the values of the proposed text data similarity metric.
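
The idea of combining an edit distance with a symbol similarity table can be illustrated with a minimal sketch. It is not the paper's implementation: the similarity groups, the 0.5 cost for similar symbols, and the names `SIMILAR_GROUPS`, `base_char`, `substitution_cost`, and `weighted_distance` are all assumptions made for illustration. The distance computed here is the restricted (optimal string alignment) variant of the Damerau-Levenshtein distance, chosen because it is the shortest to write down.

```python
# Hypothetical similarity table: each string is a group of symbols that are
# treated as "similar"; the first symbol of a group is its base counterpart.
# These groups and the 0.5 cost are illustrative assumptions, not the paper's table.
SIMILAR_GROUPS = ["o0", "il1", "s5", "z2", "b8"]
SIMILAR_COST = 0.5

# Dictionary lookup gives average constant-time access to the base symbol.
BASE = {ch: group[0] for group in SIMILAR_GROUPS for ch in group}


def base_char(ch: str) -> str:
    """Convert a symbol to its base counterpart (lowercased if ungrouped)."""
    ch = ch.lower()
    return BASE.get(ch, ch)


def substitution_cost(a: str, b: str) -> float:
    """0 for identical symbols, a reduced cost for similar ones, 1 otherwise."""
    if a == b:
        return 0.0
    if base_char(a) == base_char(b):
        return SIMILAR_COST
    return 1.0


def weighted_distance(s: str, t: str) -> float:
    """Restricted Damerau-Levenshtein distance with table-based substitution costs."""
    rows, cols = len(s) + 1, len(t) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = float(i)  # delete all i symbols of s
    for j in range(cols):
        d[0][j] = float(j)  # insert all j symbols of t
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                d[i - 1][j] + 1.0,  # deletion
                d[i][j - 1] + 1.0,  # insertion
                d[i - 1][j - 1] + substitution_cost(s[i - 1], t[j - 1]),
            )
            # Transposition of two adjacent symbols
            if i > 1 and j > 1 and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1.0)
    return d[-1][-1]
```

Under these assumptions, `weighted_distance("hello", "hell0")` evaluates to 0.5 because `o` and `0` share a group, while `weighted_distance("hello", "hlelo")` evaluates to 1.0 for a single transposition. With a plain Damerau-Levenshtein distance both would cost 1, so the table lets visually similar typos rank closer to the query.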

Suggested Citation

  • Kyrylo Kleshch, 2024. "Development of fuzzy search method for creating an efficient information search system in text data," Technology audit and production reserves, PC TECHNOLOGY CENTER, vol. 1(2(75)), pages 20-24, February.
  • Handle: RePEc:baq:taprar:v:1:y:2024:i:2:p:20-24
    DOI: 10.15587/2706-5448.2024.298425

    Download full text from publisher

    File URL: https://journals.uran.ua/tarp/article/download/298425/291384
    Download Restriction: no

