Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/117348
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor | Faculty of Computer and Mathematical Sciences | en_US |
| dc.creator | Gong, X | en_US |
| dc.creator | Xu, Y | en_US |
| dc.creator | Zhang, S | en_US |
| dc.creator | He, C | en_US |
| dc.date.accessioned | 2026-02-13T01:57:52Z | - |
| dc.date.available | 2026-02-13T01:57:52Z | - |
| dc.identifier.issn | 0893-6080 | en_US |
| dc.identifier.uri | http://hdl.handle.net/10397/117348 | - |
| dc.language.iso | en | en_US |
| dc.publisher | Pergamon Press | en_US |
| dc.subject | Binary code similarity detection | en_US |
| dc.subject | Binary similarity analysis | en_US |
| dc.subject | Function semantic | en_US |
| dc.subject | Graph Matching Networks (GMN) | en_US |
| dc.subject | Transformer | en_US |
| dc.subject | Vulnerability detection | en_US |
| dc.title | ex2vec : enhancing assembly code semantics with end-to-end execution-aware embeddings | en_US |
| dc.type | Journal/Magazine Article | en_US |
| dc.identifier.volume | 189 | en_US |
| dc.identifier.doi | 10.1016/j.neunet.2025.107506 | en_US |
| dcterms.abstract | Binary code similarity detection (BCSD), whose goal is to identify and analyze similar or identical functions in compiled binaries, is an essential task in computer security. Recent methods leveraging deep neural networks (DNN) for numerical vector representation of code have achieved significant success. However, these methods primarily adapt techniques from masked language modeling (MLM), encoding code instructions by predicting missing values from an instruction context, which limits their ability to fully capture execution semantics. In this paper, we propose Ex2vec, an innovative end-to-end encoding method that generates high-quality embeddings rich in execution semantics for BCSD. Ex2vec employs a novel pre-training strategy that enables the model to learn the impact of assembly instructions on register states, thus mitigating the reliance on learning the frequency and co-occurrence of the instructions in the assembly context. By simulating the execution of assembly instructions, Ex2Vec accurately captures the semantic features of assembly code, as further demonstrated by Principal Component Analysis (PCA), which shows that functionally similar instructions cluster closely in the embedding space. Extensive experiments on large datasets validate that Ex2vec performs exceptionally well in binary code similarity detection, surpassing all existing state-of-the-art methods. In real-world vulnerability detection experiments, Ex2Vec exhibits the highest accuracy. | en_US |
| dcterms.accessRights | embargoed access | en_US |
| dcterms.bibliographicCitation | Neural networks, Sept 2025, v. 189, 107506 | en_US |
| dcterms.isPartOf | Neural networks | en_US |
| dcterms.issued | 2025-09 | - |
| dc.identifier.scopus | 2-s2.0-105004392532 | - |
| dc.identifier.pmid | 40339297 | - |
| dc.identifier.artn | 107506 | en_US |
| dc.description.validate | 202602 bchy | en_US |
| dc.description.oa | Not applicable | en_US |
| dc.identifier.SubFormID | G000940/2025-11 | - |
| dc.description.fundingSource | Others | en_US |
| dc.description.fundingText | This work was supported in part by the National Natural Science Foundation of China under Grant 62406268, in part by the Central Government Guides Local Science and Technology Development Special Funds, China under Grant [2018]4008, in part by the Science and Technology Platform Project of Guizhou Province, China under Grant ZSYS[2025]011, and in part by the Science and Technology Planned Project of Guizhou Province, China under Grant [2023]YB449. | en_US |
| dc.description.pubStatus | Published | en_US |
| dc.date.embargo | 2027-09-30 | en_US |
| dc.description.oaCategory | Green (AAM) | en_US |
Appears in Collections: Journal/Magazine Article
Open Access Information
Status embargoed access
Embargo End Date 2027-09-30
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.