Printed from https://ideas.repec.org/a/gam/jmathe/v11y2023i3p508-d1039468.html

It’s All in the Embedding! Fake News Detection Using Document Embeddings

Author

Listed:
  • Ciprian-Octavian Truică

    (Computer Science and Engineering Department, Faculty of Automatic Control and Computers, University Politehnica of Bucharest, RO-060042 Bucharest, Romania
    These authors contributed equally to this work.)

  • Elena-Simona Apostol

    (Computer Science and Engineering Department, Faculty of Automatic Control and Computers, University Politehnica of Bucharest, RO-060042 Bucharest, Romania
    These authors contributed equally to this work.)

Abstract

With the current shift in the mass media landscape from journalistic rigor to social media, personalized social media is becoming the new norm. Although the digitalization of the media brings many advantages, it also increases the risk of spreading disinformation, misinformation, and malinformation through the use of fake news. The emergence of this harmful phenomenon has polarized society and manipulated public opinion on particular topics, e.g., elections and vaccinations. Such information propagated on social media can distort public perceptions and generate social unrest while lacking the rigor of traditional journalism. Natural Language Processing and Machine Learning techniques are essential for developing efficient tools that can detect fake news. Models that use the context of textual data are essential for solving the fake news detection problem, as they encode linguistic features within the vector representation of words. In this paper, we propose a new approach that uses document embeddings to build multiple models that accurately label news articles as reliable or fake. We also present a benchmark on different architectures that detect fake news using binary or multi-labeled classification. We evaluated the models on five large news corpora using accuracy, precision, and recall. We obtained better results than more complex state-of-the-art Deep Neural Network models. We observe that the most important factor for obtaining high accuracy is the document encoding, not the classification model's complexity.
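The pipeline the abstract describes (encode each document as a vector, classify it, then score with accuracy, precision, and recall) can be illustrated with a minimal sketch. This is not the authors' method: the normalized bag-of-words vector merely stands in for a learned document embedding, the nearest-centroid rule stands in for their classifiers, and the function names and toy headlines are hypothetical.

```python
import math
from collections import Counter


def embed(text, vocab):
    """L2-normalized bag-of-words vector (stand-in for a learned document embedding)."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def fit(docs, labels):
    """Build the vocabulary and one mean vector (centroid) per label."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    grouped = {}
    for doc, label in zip(docs, labels):
        grouped.setdefault(label, []).append(embed(doc, vocab))
    centroids = {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }
    return vocab, centroids


def predict(model, doc):
    """Return the label whose centroid has the largest dot product with the document."""
    vocab, centroids = model
    vec = embed(doc, vocab)
    return max(centroids, key=lambda y: sum(a * b for a, b in zip(vec, centroids[y])))


def scores(y_true, y_pred, positive="fake"):
    """Accuracy, precision, and recall, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical toy data, for illustration only.
train_docs = [
    "shocking miracle cure doctors hate",
    "secret plot exposed shocking truth",
    "council approves budget for schools",
    "central bank holds interest rates",
]
train_labels = ["fake", "fake", "reliable", "reliable"]
model = fit(train_docs, train_labels)
```

Calling `predict(model, "shocking secret cure")` then classifies an unseen headline, and `scores(y_true, y_pred)` reports the three metrics named in the abstract. Keeping the encoder (`embed`) separate from the classifier mirrors the abstract's point that the two can be varied independently.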

Suggested Citation

  • Ciprian-Octavian Truică & Elena-Simona Apostol, 2023. "It’s All in the Embedding! Fake News Detection Using Document Embeddings," Mathematics, MDPI, vol. 11(3), pages 1-29, January.
  • Handle: RePEc:gam:jmathe:v:11:y:2023:i:3:p:508-:d:1039468

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/11/3/508/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/11/3/508/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Noha Alnazzawi & Najlaa Alsaedi & Fahad Alharbi & Najla Alaswad, 2022. "Using Social Media to Detect Fake News Information Related to Product Marketing: The FakeAds Corpus," Data, MDPI, vol. 7(4), pages 1-13, April.
    2. Ciprian-Octavian Truică & Elena-Simona Apostol, 2022. "MisRoBÆRTa: Transformers versus Misinformation," Mathematics, MDPI, vol. 10(4), pages 1-25, February.
    3. Uğur Baloğlu, 2021. "Trolls, Pressure and Agenda: The discursive fight on Twitter in Turkey," Media and Communication, Cogitatio Press, vol. 9(4), pages 39-51.
    4. Lee, Yoonjae & Ha, Byeongmin & Hwangbo, Soonho, 2022. "Generative model-based hybrid forecasting model for renewable electricity supply using long short-term memory networks: A case study of South Korea's energy transition policy," Renewable Energy, Elsevier, vol. 200(C), pages 69-87.
    5. Fang, Lei & He, Bin, 2023. "A deep learning framework using multi-feature fusion recurrent neural networks for energy consumption forecasting," Applied Energy, Elsevier, vol. 348(C).
    6. Mujtaba Ali Isani, 2021. "Methodological Problems of Using Arabic-Language Twitter as a Gauge for Arab Attitudes Toward Politics and Society," Contemporary Review of the Middle East, vol. 8(1), pages 22-35, March.
    7. Oscar Claveria & Enric Monte & Petar Soric & Salvador Torra, 2022. "An application of deep learning for exchange rate forecasting," IREA Working Papers 202201, University of Barcelona, Research Institute of Applied Economics, revised Jan 2022.
    8. Nwaibeh, E.A. & Chikwendu, C.R., 2023. "A deterministic model of the spread of scam rumor and its numerical simulations," Mathematics and Computers in Simulation (MATCOM), Elsevier, vol. 207(C), pages 111-129.
    9. Matthew R DeVerna & Rachith Aiyappa & Diogo Pacheco & John Bryden & Filippo Menczer, 2024. "Identifying and characterizing superspreaders of low-credibility content on Twitter," PLOS ONE, Public Library of Science, vol. 19(5), pages 1-17, May.
    10. Xipeng Liu & Xinmiao Li, 2024. "Unbiased evaluation of ranking algorithms applied to the Chinese green patents citation network," Scientometrics, Springer;Akadémiai Kiadó, vol. 129(6), pages 2999-3021, June.
    11. Natei Ermias Benti & Mesfin Diro Chaka & Addisu Gezahegn Semie, 2023. "Forecasting Renewable Energy Generation with Machine Learning and Deep Learning: Current Advances and Future Prospects," Sustainability, MDPI, vol. 15(9), pages 1-33, April.
    12. Paolo Libenzio Brignoli & Alessandro Varacca & Cornelis Gardebroek & Paolo Sckokai, 2024. "Machine learning to predict grains futures prices," Agricultural Economics, International Association of Agricultural Economists, vol. 55(3), pages 479-497, May.
    13. Wellens, Arnoud P. & Boute, Robert N. & Udenio, Maximiliano, 2024. "Simplifying tree-based methods for retail sales forecasting with explanatory variables," European Journal of Operational Research, Elsevier, vol. 314(2), pages 523-539.
    14. Saâdaoui, Foued & Rabbouch, Hana, 2024. "Financial forecasting improvement with LSTM-ARFIMA hybrid models and non-Gaussian distributions," Technological Forecasting and Social Change, Elsevier, vol. 206(C).
    15. Peter D. Lunn & Cameron A. Belton & Ciarán Lavin & Féidhlim P. McGowan & Shane Timmons & Deirdre A. Robertson, 2020. "Using behavioral science to help fight the Coronavirus," Journal of Behavioral Public Administration, Center for Experimental and Behavioral Public Administration, vol. 3(1).
    16. Jujie Wang & Zhenzhen Zhuang & Liu Feng, 2022. "Intelligent Optimization Based Multi-Factor Deep Learning Stock Selection Model and Quantitative Trading Strategy," Mathematics, MDPI, vol. 10(4), pages 1-19, February.
    17. James Flamino & Alessandro Galeazzi & Stuart Feldman & Michael W. Macy & Brendan Cross & Zhenkun Zhou & Matteo Serafino & Alexandre Bovet & Hernán A. Makse & Boleslaw K. Szymanski, 2023. "Political polarization of news media and influencers on Twitter in the 2016 and 2020 US presidential elections," Nature Human Behaviour, Nature, vol. 7(6), pages 904-916, June.
    18. Matthew Spradling & Jeremy Straub, 2022. "Evaluation of the Factors That Impact the Perception of Online Content Trustworthiness by Income, Political Affiliation and Online Usage Time," Future Internet, MDPI, vol. 14(11), pages 1-55, November.
    19. Lodh, Rishab & Dey, Oindrila, 2023. "“Fake news alert!”: A game of misinformation and news consumption behavior," MPRA Paper 118371, University Library of Munich, Germany.
    20. Xu, Shuqi & Mariani, Manuel Sebastian & Lü, Linyuan & Medo, Matúš, 2020. "Unbiased evaluation of ranking metrics reveals consistent performance in science and technology citation data," Journal of Informetrics, Elsevier, vol. 14(1).
