Use of subword tokenization for domain generation algorithm classification

Liew, SRC; Law, NF

doi:10.1186/s42400-023-00183-8

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/107701

Title:	Use of subword tokenization for domain generation algorithm classification
Authors:	Liew, SRC; Law, NF
Issue Date:	2023
Source:	Cybersecurity, 2023, v. 6, no. 1, 49
Abstract:	Domain name generation algorithm (DGA) classification is an essential but challenging problem. Both feature-extracting machine learning (ML) methods and deep learning (DL) models, such as convolutional neural networks (CNNs) and long short-term memory (LSTM) networks, have been developed for it. However, the performance of these approaches varies with the type of DGA. Most features used by the ML methods characterize random-looking DGAs better than word-looking DGAs. To improve classification performance on word-looking DGAs, subword tokenization is employed for the DL models. Our experimental results show that subword tokenization provides excellent classification performance on word-looking DGAs. We then propose an integrated scheme that chooses an appropriate method for DGA classification depending on the nature of the DGA. Results show that the integrated scheme outperforms existing ML and DL methods, as well as the subword DL methods.
Keywords:	Botnet detection; Domain names; Machine learning-based botnet detection; Network security
Publisher:	Springer Singapore
Journal:	Cybersecurity
EISSN:	2523-3246
DOI:	10.1186/s42400-023-00183-8
Rights:	© The Author(s) 2023. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The following publication Liew, S.R.C., Law, N.F. Use of subword tokenization for domain generation algorithm classification. Cybersecurity 6, 49 (2023) is available at https://doi.org/10.1186/s42400-023-00183-8.
Appears in Collections:	Journal/Magazine Article
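
To make the idea in the abstract concrete, the following is a minimal sketch of how a word-looking domain label can be split into subword tokens while a random-looking label cannot. It uses a greedy longest-match split, similar in spirit to WordPiece, over a tiny hand-written vocabulary. The vocabulary, the function name `subword_tokenize`, the `##` continuation marker, and the example labels are all assumptions made for illustration; the record above does not state which tokenizer or vocabulary the paper uses.

```python
# Toy vocabulary (hypothetical). A real tokenizer would learn it from a corpus.
VOCAB = {"free", "##movie", "##s", "##online"}


def subword_tokenize(label, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of a domain label into subwords."""
    tokens = []
    start = 0
    while start < len(label):
        end = len(label)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = label[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a non-initial piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No vocabulary entry starts here: emit one unknown token
            # for this character and move on.
            tokens.append(unk)
            start += 1
        else:
            tokens.append(piece)
            start = end
    return tokens


# A word-looking label decomposes into meaningful pieces.
print(subword_tokenize("freemoviesonline", VOCAB))
# A random-looking label finds no pieces in this toy vocabulary.
print(subword_tokenize("xkqzv", VOCAB))
```

In a DL pipeline of the kind the abstract describes, such token sequences would be mapped to integer IDs and fed to the model in place of single characters; that step is omitted here.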

Files in This Item:

File	Description	Size	Format
s42400-023-00183-8.pdf		1.56 MB	Adobe PDF

Open Access Information

Status	open access
File Version	Version of Record

Metric	Count	As of
Page views	43	Apr 14, 2025
Downloads	9	Apr 14, 2025
Scopus citations	13	Apr 3, 2026
Web of Science citations	2	Nov 14, 2024
