Application of Oversampling to Hate Speech Classification Using Bidirectional Encoder Representations from Transformers
Keywords: Classification, SMOTE, BERT, Class Imbalance, Oversampling

Abstract
Class imbalance is a common challenge when building classification models, especially for hate speech. This study evaluates the effectiveness of the SMOTE oversampling technique in improving the performance of hate speech classification models built on BERT. The dataset used is significantly imbalanced: the Hate class has the most samples, followed by Offensive and then Neither. Two experiments were conducted, one without SMOTE and one with SMOTE applied. Applying SMOTE raised overall model accuracy from 85% to 88%. Precision for the minority Offensive class increased from 0.33 to 0.45, although its recall decreased from 0.45 to 0.28. For the Neither class, the F1-score increased, indicating a better balance between precision and recall. Performance on the majority Hate class remained stable, showing that SMOTE did not degrade the model's performance on the already dominant class. Overall, SMOTE provides clear benefits in handling class imbalance, particularly by improving precision for minority classes, yielding a more accurate classification model.
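To illustrate the oversampling step the abstract describes, the sketch below implements the core SMOTE idea in plain Python: each synthetic minority sample is created by interpolating between an existing minority sample and one of its k nearest minority-class neighbours. This is an illustrative toy on 2-D points, not the authors' pipeline; the study presumably applied a library implementation of SMOTE to feature vectors derived from the text, and all function names and data here are hypothetical.

```python
# Illustrative SMOTE sketch: interpolate between a minority sample
# and one of its k nearest minority-class neighbours.
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def smote(minority, n_synthetic, k=3, seed=0):
    """Generate n_synthetic points for the minority class only."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority points
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: euclidean(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b)
                               for b, n in zip(base, neighbour)))
    return synthetic

# Toy 2-D "feature vectors" standing in for the minority class
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=6)
print(len(new_points))  # 6 synthetic minority samples
```

Because each new point lies on the segment between two real minority samples, SMOTE enlarges the minority class without exact duplication, which is why it can raise minority-class precision as reported in the abstract.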
License
Copyright (c) 2024 Risal Syahwaluddin, Debby Alita
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.