Leveraging Large Language Models for Multi-Domain Malware and Vulnerability Detection
DOI: https://doi.org/10.33022/ijcs.v14i2.4769

Abstract
This study applies deep learning methodologies, particularly GPT-2, to several aspects of cybersecurity: source code vulnerability detection, malware detection, and mobile malware security. The first part introduces a method for identifying security vulnerabilities in C/C++ source code by fine-tuning a GPT-2 model on diverse open-source code datasets. The results show that the GPT-2 model, using its default tokenizer and encoder, performs comparably to other deep learning methods in vulnerability detection. The second part explores GPT-2 for malware detection, proposing a novel approach that classifies malware through opcode snippets and textual features. Fine-tuning GPT-2 on a diverse dataset of malware and benign software yields improved detection accuracy and fewer false positives. Lastly, the study investigates mobile malware detection, proposing a framework that combines static and dynamic analysis with deep learning to detect previously unseen malware variants. Evaluated on a comprehensive dataset, the framework achieves higher accuracy and fewer false positives than traditional methods. This integrated approach highlights the potential of deep learning, particularly GPT-2, to address the challenges of modern cybersecurity, offering robust solutions across multiple domains.
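The opcode-snippet idea in the second part can be illustrated with a small data-preparation sketch: disassembled opcode sequences are rendered as whitespace-separated text snippets that a GPT-2-style tokenizer can consume directly. The function name, window size, and stride below are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: turn a disassembled opcode sequence into
# overlapping textual snippets for a GPT-2-style tokenizer.
# The function name, window, and stride are illustrative assumptions.

from typing import List

def opcodes_to_snippets(opcodes: List[str], window: int = 6, stride: int = 2) -> List[str]:
    """Slide a fixed-size window over the opcode sequence and emit text snippets."""
    snippets = []
    for start in range(0, max(len(opcodes) - window + 1, 1), stride):
        snippets.append(" ".join(opcodes[start:start + window]))
    return snippets

# Example: a short synthetic opcode trace
trace = ["push", "mov", "xor", "call", "jmp", "pop", "ret", "nop", "mov", "add"]
print(opcodes_to_snippets(trace))
# → ['push mov xor call jmp pop', 'xor call jmp pop ret nop', 'jmp pop ret nop mov add']
```

Each snippet can then be tokenized and fed to a fine-tuned GPT-2 classifier, with per-snippet predictions aggregated into a file-level malware/benign verdict.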
License
Copyright (c) 2025 Attila Magyar

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.