Application of Oversampling to Hate Speech Classification Using Bidirectional Encoder Representations from Transformers
Keywords: Classification, SMOTE, BERT, Class Imbalance, Oversampling

Abstract
Class imbalance is a common challenge when building classification models, especially for hate speech. This study evaluates the effectiveness of the SMOTE oversampling technique in improving the performance of hate speech classification models built on BERT. The dataset used is significantly imbalanced: the Hate class has the most samples, followed by Offensive and then Neither. Two experiments were conducted, one without SMOTE and one with SMOTE applied. Applying SMOTE raised overall model accuracy from 85% to 88%. Precision for the minority Offensive class increased from 0.33 to 0.45, although its recall decreased from 0.45 to 0.28. For the Neither class, the F1-score increased, indicating a better balance between precision and recall. Performance on the majority Hate class remained stable, showing that SMOTE did not degrade the model's performance on the already dominant class. Overall, SMOTE provides clear benefits in handling class imbalance, particularly by improving precision for minority classes, yielding a more accurate classification model.
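To illustrate the oversampling step the abstract describes, the sketch below implements the core SMOTE idea in plain Python: each synthetic minority sample is created by interpolating between an existing minority sample and one of its k nearest minority-class neighbours. This is an illustrative toy on 2-D points, not the authors' pipeline; the study presumably applied a library implementation of SMOTE to feature vectors derived from the text, and all function names and data here are hypothetical.

```python
# Illustrative SMOTE sketch: interpolate between a minority sample
# and one of its k nearest minority-class neighbours.
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def smote(minority, n_synthetic, k=3, seed=0):
    """Generate n_synthetic points for the minority class only."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority points
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: euclidean(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b)
                               for b, n in zip(base, neighbour)))
    return synthetic

# Toy 2-D "feature vectors" standing in for the minority class
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=6)
print(len(new_points))  # 6 synthetic minority samples
```

Because each new point lies on the segment between two real minority samples, SMOTE enlarges the minority class without exact duplication, which is why it can raise minority-class precision as reported in the abstract.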
License
Copyright (c) 2024 Risal Syahwaluddin, Debby Alita
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.