Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/111374
DC Field: Value
dc.contributor: Department of Electrical and Electronic Engineering
dc.creator: Yi, Lu
dc.identifier.uri: https://theses.lib.polyu.edu.hk/handle/200/13408
dc.language.iso: English
dc.title: Adversarial learning for speaker verification and speech emotion recognition
dc.type: Thesis
dcterms.abstract: Deep learning employs optimization algorithms to train neural networks to learn from data. Despite the remarkable success of deep learning, training deep learning models remains challenging. For instance, collecting data can be costly, and insufficient training data may limit a model's ability to generalize to unseen data. Additionally, deploying models trained on labeled data from one domain to another can lead to domain mismatch. This dissertation addresses the data sparsity and domain mismatch problems in speaker verification and speech emotion recognition.
dcterms.abstract: Speaker verification, a biometric authentication method that uses a person's voice to verify a claimed identity, suffers performance degradation when applied to unseen domains. This thesis proposes several domain adaptation frameworks to mitigate this issue. One such framework is the adversarial separation and adaptation network (ADSAN), which disentangles domain-specific and shared components from speaker embeddings to achieve domain-invariant speaker representations. Moreover, a mutual information neural estimator (MINE) is integrated into the ADSAN to better preserve speaker-discriminative information (a minimal sketch of the MINE objective appears after this record). Another proposed framework, the infomax domain separation and adaptation network (InfoMax-DSAN), applies domain adaptation directly to the speaker feature extractor, achieving an EER of 5.69% on the VOiCES Challenge 2019.
dcterms.abstract: Conventional domain adaptation methods assume a common set of speakers across domains, which is impractical for speaker verification. To address this limitation, this thesis proposes incorporating intra-speaker and between-speaker similarity distribution alignment into DSANs. While effective in reducing language mismatch, this framework is constrained to lightweight models. To enhance flexibility and scalability, a novel disentanglement approach for domain-specific features is introduced. It uses a shared frame-level feature extractor that diverges into a domain classification branch and a speaker classification branch, and it prevents gradients from the domain branch from interfering with the shared layers (see the stop-gradient sketch after this record). Experimental results demonstrate improved performance on CN-Celeb1 and feasibility with more complex models, such as residual networks.
dcterms.abstract: In speech emotion recognition, acquiring labeled data for training emotion classifiers is challenging because an utterance can convey multiple, ambiguous emotions. This data scarcity leads to overfitting. To tackle this issue, this thesis introduces a new data augmentation network called the adversarial data augmentation network (ADAN). By forcing synthetic and real samples to share a common representation in the latent space (sketched after this record), ADAN alleviates the gradient vanishing problem that often occurs in generative adversarial networks. Experimental results on the EmoDB and IEMOCAP datasets demonstrate the effectiveness of ADAN in generating emotion-rich augmented data, yielding emotion classifiers competitive with state-of-the-art systems.
dcterms.accessRights: open access
dcterms.educationLevel: Ph.D.
dcterms.extent: 1 volume (various pagings) : color illustrations
dcterms.issued: 2024
dcterms.LCSH: Automatic speech recognition
dcterms.LCSH: Emotion recognition
dcterms.LCSH: Biometric identification
dcterms.LCSH: Deep learning (Machine learning)
dcterms.LCSH: Machine learning
dcterms.LCSH: Hong Kong Polytechnic University -- Dissertations
Appears in Collections: Thesis
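
The abstract above mentions integrating a mutual information neural estimator (MINE) into the ADSAN to preserve speaker-discriminative information. The sketch below shows only the generic MINE (Donsker-Varadhan) lower bound, not the thesis's exact network; the class and function names, layer sizes, and the way the two embeddings are paired are illustrative assumptions.

```python
import math

import torch
import torch.nn as nn


class StatisticsNet(nn.Module):
    """Scores embedding pairs; trained to score joint pairs higher than shuffled ones."""

    def __init__(self, emb_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x, z], dim=-1)).squeeze(-1)


def mine_lower_bound(t_net: StatisticsNet, x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    """Donsker-Varadhan bound: E_joint[T(x, z)] - log E_marginal[exp(T(x, z'))]."""
    joint_term = t_net(x, z).mean()
    z_shuffled = z[torch.randperm(z.size(0))]      # break the pairing to emulate the product of marginals
    marginal_term = torch.logsumexp(t_net(x, z_shuffled), dim=0) - math.log(z.size(0))
    return joint_term - marginal_term              # maximize to retain shared (speaker) information
```

In a MINE-augmented setup, this bound would typically be maximized jointly with the adaptation objective so that the adapted embeddings keep information shared with the original speaker representations.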
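
The third abstract paragraph describes a shared frame-level extractor that splits into a domain branch and a speaker branch, with the domain branch's gradients kept away from the shared layers. The sketch below shows one common way to realize that in PyTorch, detaching the shared features before the domain head; the layer sizes, pooling, and module names are assumptions, and the thesis may use a different gradient-blocking mechanism.

```python
import torch
import torch.nn as nn


class TwoBranchDisentangler(nn.Module):
    """Shared frame-level extractor with speaker and domain classification branches."""

    def __init__(self, feat_dim: int = 80, hidden: int = 256,
                 n_speakers: int = 1000, n_domains: int = 2):
        super().__init__()
        self.shared = nn.Sequential(                       # shared frame-level layers
            nn.Conv1d(feat_dim, hidden, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.speaker_head = nn.Linear(hidden, n_speakers)
        self.domain_head = nn.Linear(hidden, n_domains)

    def forward(self, feats: torch.Tensor):
        # feats: (batch, feat_dim, n_frames) acoustic features
        pooled = self.shared(feats).mean(dim=-1)           # simple average pooling over frames
        spk_logits = self.speaker_head(pooled)             # speaker loss updates the shared layers
        dom_logits = self.domain_head(pooled.detach())     # detach(): domain gradients stop here
        return spk_logits, dom_logits
```

Detaching is only the simplest form of gradient isolation; gradient projection or reversal layers are alternative choices with different training dynamics.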
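
The final abstract paragraph states that ADAN forces synthetic and real samples to share a common latent representation. The sketch below illustrates that general idea only: real features and generator outputs are both encoded into a shared latent space where a critic supplies the adversarial signal. All module names, dimensions, and the hinge-style losses are assumptions, not the thesis's design.

```python
import torch
import torch.nn as nn

FEAT_DIM, LATENT_DIM, NOISE_DIM = 88, 64, 32     # assumed dimensions

encoder = nn.Sequential(nn.Linear(FEAT_DIM, LATENT_DIM), nn.ReLU(),
                        nn.Linear(LATENT_DIM, LATENT_DIM))
generator = nn.Sequential(nn.Linear(NOISE_DIM, LATENT_DIM), nn.ReLU(),
                          nn.Linear(LATENT_DIM, FEAT_DIM))
critic = nn.Sequential(nn.Linear(LATENT_DIM, LATENT_DIM), nn.ReLU(),
                       nn.Linear(LATENT_DIM, 1))


def critic_step(real_feats: torch.Tensor, noise: torch.Tensor) -> torch.Tensor:
    """Critic loss computed on the shared latent codes (hinge formulation)."""
    z_real = encoder(real_feats)                 # real samples mapped into the latent space
    z_fake = encoder(generator(noise))           # synthetic samples mapped into the same space
    return (torch.relu(1.0 - critic(z_real)).mean()
            + torch.relu(1.0 + critic(z_fake)).mean())


def generator_step(noise: torch.Tensor) -> torch.Tensor:
    """Generator tries to make its latent codes indistinguishable from real ones."""
    return -critic(encoder(generator(noise))).mean()
```

Matching distributions in a learned latent space rather than in the raw feature space is one way such designs keep the critic's gradients informative, which is consistent with the abstract's claim about mitigating vanishing gradients.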