Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/107106
| Title: | Age-invariant speaker embedding for diarization of cognitive assessments |
| Authors: | Xu, SS; Mak, MW; Wong, KH; Meng, H; Kwok, TCY |
| Issue Date: | 2021 |
| Source: | In Proceedings of 2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP), 24-27 January 2021, Hong Kong |
| Abstract: | This paper investigates an age-invariant speaker embedding approach to speaker diarization, which is an essential step towards the automatic cognitive assessments from speech. Studies have shown that incorporating speaker traits (e.g., age, gender, etc.) can improve speaker diarization performance. However, we found that age information in the speaker embeddings is detrimental to speaker diarization if there is a severe mismatch between the age distributions in the training data and test data. To minimize the detrimental effect of age mismatch, an adversarial training strategy is introduced to remove age variability from the utterance-level speaker embeddings. Evaluations on an interactive dialog dataset for Montreal cognitive assessments (MoCA) show that the adversarial training strategy can produce age-invariant embeddings and reduce diarization error rate (DER) by 4.33%. The approach also outperforms the conventional method even with less training data. |
| Keywords: | Age-invariant speaker embedding; Deep neural networks; Montreal cognitive assessments; Speaker diarization |
| Publisher: | Institute of Electrical and Electronics Engineers |
| ISBN: | 978-1-7281-6994-1 (Electronic); 978-1-7281-6995-8 (Print on Demand (PoD)) |
| DOI: | 10.1109/ISCSLP49672.2021.9362084 |
| Description: | 2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP), 24-27 January 2021, Hong Kong |
| Rights: | © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The following publication S. S. Xu, M. -W. Mak, K. H. Wong, H. Meng and T. C. Y. Kwok, "Age-Invariant Speaker Embedding for Diarization of Cognitive Assessments," 2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP), Hong Kong, 2021 is available at https://doi.org/10.1109/ISCSLP49672.2021.9362084. |
| Appears in Collections: | Conference Paper |
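
The abstract reports its results as diarization error rate (DER). As a minimal sketch, assuming the commonly used definition (missed speech plus false-alarm speech plus speaker confusion, divided by total reference speech time), DER could be computed as below. The function name and the example durations are hypothetical and are not taken from the paper; the record does not state how the authors scored their system.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """Return DER as a fraction of total reference speech time.

    All arguments are durations in the same unit (e.g. seconds):
      missed       -- reference speech the system labelled as non-speech
      false_alarm  -- non-speech the system labelled as speech
      confusion    -- speech attributed to the wrong speaker
      total_speech -- total duration of reference speech
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


# Hypothetical durations in seconds, for illustration only.
der = diarization_error_rate(
    missed=3.0, false_alarm=2.0, confusion=5.0, total_speech=100.0
)
print(f"DER = {der:.2%}")  # prints "DER = 10.00%"
```

Because false alarms are counted in the numerator but not in the denominator, DER under this definition can exceed 100%.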
Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| Xu_Age-Invariant_Speaker_Embedding.pdf | Pre-Published version | 554.11 kB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.



