Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/109492
DC Field | Value | Language
dc.contributor | Department of Electrical and Electronic Engineering | -
dc.creator | Liu, T | -
dc.creator | Lee, KA | -
dc.creator | Wang, Q | -
dc.creator | Li, H | -
dc.date.accessioned | 2024-11-01T08:04:37Z | -
dc.date.available | 2024-11-01T08:04:37Z | -
dc.identifier.uri | http://hdl.handle.net/10397/109492 | -
dc.language.iso | en | en_US
dc.rights | © 2024 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/ | en_US
dc.rights | The following publication T. Liu, K. A. Lee, Q. Wang and H. Li, "Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 2324-2337, 2024 is available at https://doi.org/10.1109/TASLP.2024.3385277. | en_US
dc.subject | 2D CNN | en_US
dc.subject | ResNet | en_US
dc.subject | Speaker recognition | en_US
dc.subject | Speaker verification | en_US
dc.subject | Stride configuration | en_US
dc.subject | Temporal resolution | en_US
dc.title | Golden Gemini is all you need : finding the sweet spots for speaker verification | en_US
dc.type | Journal/Magazine Article | en_US
dc.identifier.spage | 2324 | -
dc.identifier.epage | 2337 | -
dc.identifier.volume | 32 | -
dc.identifier.doi | 10.1109/TASLP.2024.3385277 | -
dcterms.abstract | Residual neural networks (ResNet) demonstrate impressive performance in automatic speaker verification (ASV). They treat the time and frequency dimensions equally, following the default stride configuration designed for image recognition, where the horizontal and vertical axes exhibit similarities. This approach ignores the fact that time and frequency are asymmetric in speech representation. We address this issue and postulate the Golden-Gemini Hypothesis, which posits the prioritization of temporal resolution over frequency resolution for ASV. The hypothesis is verified by conducting a systematic study on the impact of temporal and frequency resolutions on the performance, using a trellis diagram to represent the stride space. We further identify two optimal points, namely Golden Gemini, which serve as a guiding principle for designing 2D ResNet-based ASV models. By following the principle, a state-of-the-art ResNet baseline model gains a significant performance improvement on the VoxCeleb, SITW, and CNCeleb datasets with 7.70%/11.76% average EER/minDCF reductions, respectively, across different network depths (ResNet18, 34, 50, and 101), while reducing the number of parameters by 16.5% and FLOPs by 4.1%. We refer to it as Gemini ResNet. Further investigation reveals the efficacy of the proposed Golden Gemini operating points across various training conditions and architectures. Furthermore, we present a new benchmark, namely the Gemini DF-ResNet, using a cutting-edge model. | -
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | IEEE/ACM transactions on audio, speech, and language processing, 2024, v. 32, p. 2324-2337 | -
dcterms.isPartOf | IEEE/ACM transactions on audio, speech, and language processing | -
dcterms.issued | 2024 | -
dc.identifier.scopus | 2-s2.0-85190358173 | -
dc.description.validate | 202411 bcch | -
dc.description.oa | Version of Record | en_US
dc.identifier.FolderNumber | OA_Others | en_US
dc.description.fundingSource | Others | en_US
dc.description.fundingText | Agency for Science, Technology and Research Council Research Fund; National Natural Science Foundation of China; Shenzhen Science and Technology Research Fund Fundamental Research Key Project; Internal Project Fund from Shenzhen Research Institute of Big Data | en_US
dc.description.pubStatus | Published | en_US
dc.description.oaCategory | CC | en_US
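
The abstract's idea of a stride configuration that trades frequency resolution for temporal resolution can be illustrated with a small calculation. The sketch below is only an illustration: the function names, the input size (80 frequency bins, 200 frames), and both stride lists are assumptions made here, and the asymmetric list is a hypothetical example, not the Golden Gemini operating points reported in the paper. It also assumes each strided stage rounds the output size up, as a 3x3 convolution with padding 1 would.

```python
from math import prod


def downsampling_factors(strides):
    """Total (frequency, time) downsampling for per-stage (freq, time) strides."""
    freq_factor = prod(s[0] for s in strides)
    time_factor = prod(s[1] for s in strides)
    return freq_factor, time_factor


def output_resolution(n_freq_bins, n_frames, strides):
    """Feature-map size after all stages, assuming ceil rounding per stage."""
    freq_factor, time_factor = downsampling_factors(strides)
    # Chained ceil divisions equal one ceil division by the product.
    return -(-n_freq_bins // freq_factor), -(-n_frames // time_factor)


# Symmetric strides of the kind used for images: both axes treated equally.
# The exact values are an assumption for illustration.
SYMMETRIC = [(1, 1), (2, 2), (2, 2), (2, 2)]

# Hypothetical asymmetric strides (NOT taken from the paper): frequency is
# downsampled more aggressively so that more temporal resolution is kept.
TIME_PRIORITIZED = [(2, 1), (2, 2), (2, 1), (2, 2)]

if __name__ == "__main__":
    for name, cfg in [("symmetric", SYMMETRIC), ("time-prioritized", TIME_PRIORITIZED)]:
        print(name, downsampling_factors(cfg), output_resolution(80, 200, cfg))
```

Both lists downsample the input by the same overall factor (64), so the final feature map has the same number of elements in each case; only the split between the frequency and time axes differs.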
Appears in Collections: Journal/Magazine Article
Files in This Item:
File | Description | Size | Format
Liu_Golden_Gemini_All.pdf | | 3.48 MB | Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.