IDEAS home Printed from https://ideas.repec.org/a/gam/jsusta/v14y2022i9p4909-d797323.html

A Hybrid Model for the Measurement of the Similarity between Twitter Profiles

Author

Listed:
  • Niloufar Shoeibi

    (BISITE Research Group, University of Salamanca, 37007 Salamanca, Spain)

  • Nastaran Shoeibi

    (Faculty of Science, University of Salamanca, 37008 Salamanca, Spain)

  • Pablo Chamoso

    (BISITE Research Group, University of Salamanca, 37007 Salamanca, Spain)

  • Zakieh Alizadehsani

    (BISITE Research Group, University of Salamanca, 37007 Salamanca, Spain)

  • Juan Manuel Corchado

    (BISITE Research Group, University of Salamanca, 37007 Salamanca, Spain)

Abstract

Social media platforms have been an undeniable part of our lifestyle for the past decade. Analyzing the information shared on them is a crucial step toward understanding human behavior. Social media analysis aims to guarantee a better experience for the user and to increase user satisfaction. Before any further conclusions can be drawn, it is first necessary to know how to compare users. In this paper, a hybrid model is proposed to measure the degree of similarity between Twitter profiles by calculating features related to the users’ behavioral habits. First, the timeline of each profile was extracted using the official Twitter API. Then, three aspects of a profile were examined in parallel. The first aspect is the behavioral ratios, time-series information that shows the consistency and habits of the user; dynamic time warping was used to compare the behavioral ratios of two profiles. Second, the audience network was extracted for each user, and the Jaccard similarity was used to estimate the similarity of the two audience sets. Finally, for the content similarity measurement, the tweets were preprocessed, features were extracted using both TF-IDF and DistilBERT, and the resulting representations were compared using cosine similarity. The results showed that TF-IDF performed slightly better; it was therefore selected for use in the model. To measure the similarity level of different profiles, a Random Forest classification model was trained on 19,900 users, achieving an accuracy of 0.97 in distinguishing similar profiles from different ones. As a further step, this combined similarity measurement can find users separated by very short distances, which are indicative of duplicate users.
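The three pairwise measures named in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation: the function names, the whitespace tokenizer, the smoothed IDF formula, the absolute-difference DTW cost, and the way the three values are packed into one feature list are all assumptions made for illustration. The abstract does not specify how the behavioral ratios are defined or how the features are fed to the Random Forest, so the classifier step is left out.

```python
import math
from collections import Counter


def dtw_distance(a, b):
    """Textbook dynamic time warping between two numeric sequences.

    Assumption: absolute difference as the local cost, no window constraint.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def jaccard_similarity(s, t):
    """|S intersect T| / |S union T| for two collections treated as sets."""
    s, t = set(s), set(t)
    if not s and not t:
        return 1.0  # convention chosen here for two empty audiences
    return len(s & t) / len(s | t)


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dicts) for a list of text documents.

    Assumptions: lowercase whitespace tokenization and a smoothed IDF,
    log((1 + N) / (1 + df)) + 1.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1.0
           for term, count in df.items()}
    return [{term: freq * idf[term] for term, freq in Counter(tokens).items()}
            for tokens in tokenized]


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def profile_pair_features(ratios_a, ratios_b, audience_a, audience_b,
                          tweets_a, tweets_b):
    """Hypothetical helper: one feature list per pair of profiles."""
    vec_a, vec_b = tfidf_vectors([" ".join(tweets_a), " ".join(tweets_b)])
    return [
        dtw_distance(ratios_a, ratios_b),            # behavioral ratios
        jaccard_similarity(audience_a, audience_b),  # audience network
        cosine_similarity(vec_a, vec_b),             # tweet content
    ]
```

Under these assumptions, a list such as the one returned by `profile_pair_features` could serve as one input row for a classifier that labels a pair as similar or different, which is the role the abstract assigns to the Random Forest model.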

Suggested Citation

  • Niloufar Shoeibi & Nastaran Shoeibi & Pablo Chamoso & Zakieh Alizadehsani & Juan Manuel Corchado, 2022. "A Hybrid Model for the Measurement of the Similarity between Twitter Profiles," Sustainability, MDPI, vol. 14(9), pages 1-19, April.
  • Handle: RePEc:gam:jsusta:v:14:y:2022:i:9:p:4909-:d:797323

    Download full text from publisher

    File URL: https://www.mdpi.com/2071-1050/14/9/4909/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2071-1050/14/9/4909/
    Download Restriction: no