Modeling of Android software behavior feature and its applications in malicious program analysis

Fan, Ming

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/86137

Title:	Modeling of Android software behavior feature and its applications in malicious program analysis
Authors:	Fan, Ming
Degree:	Ph.D.
Issue Date:	2019
Abstract:	Over the past ten years, Android has become the most popular mobile operating system, owing to three main advantages: the openness of its source code, the wide choice of hardware, and millions of applications (apps). Meanwhile, it has also become the major target of mobile malware. The rapid increase in Android malware poses serious threats to smartphone users, such as financial charges, information collection, and remote control. Thus, in-depth study of the security of mobile apps is of great significance to the sound development of the smartphone ecosystem. However, existing malware analysis approaches face three main challenges: the morphological diversity of malicious code, the lack of labeled datasets, and the labor-intensive manual feature engineering process. Therefore, it is important to propose effective and efficient malware analysis approaches. To study two sub-problems in mobile security, namely malware detection and familial identification, two kinds of behavior models and four different types of features are proposed from the novel perspective of feature engineering. In this thesis, malware detection aims to determine whether a given app is malicious, and familial identification aims to classify malware samples into their corresponding families. First, to overcome the low accuracy and efficiency caused by the morphological diversity of malicious code, the sensitive subgraph is constructed as the analysis model. It not only depicts sensitive behavior but is also resilient to obfuscation techniques. Based on the sensitive subgraph, for malware detection, a structure-based feature called the maximum sensitive subgraph is proposed to depict the most sensitive behavior of a given app. Based on the proposed feature, this study designs and implements DAPASA, an approach that detects Android piggybacked apps. 
DAPASA can not only detect piggybacked apps on its own but can also complement permission- and API-based approaches from the new perspective of invocation structure. For familial identification, a new feature called frequent sensitive subgraphs (fregraphs) is proposed to represent the common behaviors of malware samples that belong to the same family. This study then designs and implements FalDroid, an approach that automatically classifies Android malware into their corresponding families and selects representative malware samples in each family in accordance with fregraphs. In this way, FalDroid can effectively reduce the analytical workload and accelerate malware analysis. Next, to overcome the limitation of existing supervised learning approaches in handling unlabeled datasets, the graph structure of the sensitive subgraph is abstracted using graph embedding techniques, and a new feature called SRA is proposed to depict the similarity relationships between the structural roles of sensitive API call nodes in a graph. The SRA feature not only retains the semantic information of the graph but also transforms high-cost graph matching into an easy-to-compute similarity calculation between vectors. This study then designs and implements GefDroid, an approach that constructs a malware link network to depict the similarity relationships between all samples based on the SRA feature. In this way, unlabeled samples can be handled with unsupervised learning. Finally, to ease the labor-intensive manual feature engineering process, this study proposes techniques that summarize the existing knowledge contained in large volumes of natural language documents and generate a novel type of feature called sensitive behavior, which is represented as verb-object phrases that are easy to understand. This study designs and implements CTDroid, an automatic feature engineering system. 
By using CTDroid, a set of informative features that can be utilized for Android malware analysis is constructed from technical blogs. The four approaches are evaluated on datasets that consist of real benign apps and malware samples. The results of extensive experiments demonstrate that: DAPASA achieves good performance in detecting piggybacked apps, with a true positive rate of 95% and a false positive rate of 0.7%; FalDroid correctly classifies 94.2% of malware samples into their families, using approximately 4.6 seconds per app; GefDroid achieves high agreement (0.707-0.883 in terms of NMI) between its clustering results and the ground truth datasets; and the features extracted by CTDroid perform well for malware analysis and are more informative than those of state-of-the-art approaches.
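The abstract reports clustering agreement as NMI (normalized mutual information). The following is a minimal sketch of how such a score can be computed between ground-truth family labels and cluster assignments. It is not taken from the thesis: the function names `entropy` and `nmi` are hypothetical, and the normalization by the arithmetic mean of the two entropies is an assumption, since the abstract does not state which NMI variant was used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(truth, predicted):
    """Normalized mutual information between two labelings.

    Assumption: normalizes by the arithmetic mean of the two entropies;
    other variants (geometric mean, min, max) also exist.
    """
    if len(truth) != len(predicted):
        raise ValueError("labelings must have the same length")
    n = len(truth)
    joint = Counter(zip(truth, predicted))  # co-occurrence counts
    t_counts = Counter(truth)
    p_counts = Counter(predicted)
    # I(T;P) = sum over pairs of p(t,p) * log(p(t,p) / (p(t) * p(p)))
    mi = sum(
        (c / n) * math.log(c * n / (t_counts[t] * p_counts[p]))
        for (t, p), c in joint.items()
    )
    denom = (entropy(truth) + entropy(predicted)) / 2
    # Both labelings are constant: treat them as in perfect agreement.
    return 1.0 if denom == 0 else mi / denom


# Hypothetical toy data: family names vs. cluster ids.
families = ["famA", "famA", "famB", "famB"]
clusters = [1, 1, 0, 0]
score = nmi(families, clusters)  # clusters match families up to renaming
```

A score of 1 means the clustering reproduces the ground-truth families exactly (up to renaming of clusters), and a score of 0 means the two labelings are statistically independent.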
Subjects:	Hong Kong Polytechnic University -- Dissertations; Application software -- Development; Mobile communication systems -- Security measures
Pages:	iv, xviii, 166 pages : color illustrations
Appears in Collections:	Thesis

Access

View full-text via https://theses.lib.polyu.edu.hk/handle/200/10175
