Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/107701
Title: Use of subword tokenization for domain generation algorithm classification
Authors: Liew, SRC
Law, NF 
Issue Date: 2023
Source: Cybersecurity, 2023, v. 6, no. 1, 49
Abstract: Domain name generation algorithm (DGA) classification is an essential but challenging problem. Both feature-extracting machine learning (ML) methods and deep learning (DL) models such as convolutional neural networks and long short-term memory have been developed. However, the performance of these approaches varies with different types of DGAs. Most features in the ML methods can characterize random-looking DGAs better than word-looking DGAs. To improve the classification performance on word-looking DGAs, subword tokenization is employed for the DL models. Our experimental results proved that the subword tokenization can provide excellent classification performance on the word-looking DGAs. We then propose an integrated scheme that chooses an appropriate method for DGA classification depending on the nature of the DGAs. Results show that the integrated scheme outperformed existing ML and DL methods, and also the subword DL methods.
Keywords: Botnet detection
Domain names
Machine learning-based botnet detection
Network security
Publisher: Springer Singapore
Journal: Cybersecurity 
EISSN: 2523-3246
DOI: 10.1186/s42400-023-00183-8
Rights: © The Author(s) 2023. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The following publication Liew, S.R.C., Law, N.F. Use of subword tokenization for domain generation algorithm classification. Cybersecurity 6, 49 (2023) is available at https://doi.org/10.1186/s42400-023-00183-8.
Appears in Collections: Journal/Magazine Article

Files in This Item:
File: s42400-023-00183-8.pdf
Size: 1.56 MB
Format: Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record

Page views: 43 (as of Apr 14, 2025)
Downloads: 9 (as of Apr 14, 2025)
Scopus citations: 9 (as of Sep 12, 2025)
Web of Science citations: 2 (as of Nov 14, 2024)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.