Enhanced Fake News Detection with Domain-Specific Word Embeddings: A TorchText-Based Method for News Semantics Representation.

Sikhumbuzo Ngwenya; Tinashe Crispen Garidzira

doi:10.33022/ijcs.v14i4.4831

Authors

Sikhumbuzo Ngwenya University of Fort Hare
Tinashe Crispen Garidzira University of Fort Hare

Abstract

The prevalence of misinformation in digital media highlights the need for effective fake news detection methods. This paper presents a novel approach that leverages domain-specific word embeddings, trained on news content, to improve the accuracy of fake news classification. Using TorchText, we generated 128-dimensional embeddings, optimized with Bi-LSTM and GRU models, achieving a test accuracy of 93.51% with a loss of 0.255. Two models were then developed to classify fake news from news headlines. The first model, which used the pre-trained embeddings, achieved a test accuracy of 96.51% with a loss of 0.102; the second, trained without pre-trained embeddings, achieved a slightly lower accuracy of 96.23% with a loss of 0.104. The comparison highlights the significant impact of domain-specific embeddings on model performance. This study demonstrates the value of custom embeddings for improving semantic representation and fake news detection accuracy, providing a powerful tool to combat misinformation.
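
The difference between the two headline classifiers comes down to how the embedding table is initialized before training. The following is a minimal, standard-library-only sketch of that idea, not the authors' implementation: the tokenizer, the helper names (`build_vocab`, `build_embedding_matrix`), the toy headlines, and the stand-in `news_vectors` are all hypothetical. Only the 128-dimensional embedding size comes from the abstract.

```python
import random

EMBED_DIM = 128  # embedding size reported in the abstract


def build_vocab(headlines):
    """Map each lowercase whitespace token to an integer id (0 = padding, 1 = unknown)."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for headline in headlines:
        for token in headline.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def build_embedding_matrix(vocab, pretrained=None, dim=EMBED_DIM, seed=0):
    """Return one row of `dim` floats per vocabulary entry.

    Rows are copied from `pretrained` when the token is present there;
    otherwise they are drawn uniformly at random. The padding row is all zeros.
    """
    rng = random.Random(seed)
    pretrained = pretrained or {}
    matrix = [None] * len(vocab)
    for token, index in vocab.items():
        if token == "<pad>":
            matrix[index] = [0.0] * dim
        elif token in pretrained:
            vector = list(pretrained[token])
            if len(vector) != dim:
                raise ValueError(f"vector for {token!r} has {len(vector)} dims, expected {dim}")
            matrix[index] = vector
        else:
            matrix[index] = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    return matrix


# Hypothetical toy data, not taken from the paper.
headlines = ["Senate passes budget bill", "Aliens endorse budget bill"]
vocab = build_vocab(headlines)

# Stand-in for domain-specific vectors learned from a news corpus.
news_vectors = {"budget": [0.5] * EMBED_DIM, "bill": [-0.5] * EMBED_DIM}

with_pretrained = build_embedding_matrix(vocab, news_vectors)  # first model's starting point
from_scratch = build_embedding_matrix(vocab)                   # second model's starting point
```

In a PyTorch pipeline, a matrix like `with_pretrained` would typically be loaded into the embedding layer that feeds the Bi-LSTM or GRU, while the second model would keep the layer's default random initialization. Whether the paper freezes or fine-tunes the pre-trained vectors is not stated in the abstract.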