Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/107106
Title: Age-invariant speaker embedding for diarization of cognitive assessments
Authors: Xu, SS 
Mak, MW 
Wong, KH
Meng, H
Kwok, TCY
Issue Date: 2021
Source: In Proceedings of 2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP), 24-27 January 2021, Hong Kong
Abstract: This paper investigates an age-invariant speaker embedding approach to speaker diarization, an essential step towards automatic cognitive assessment from speech. Studies have shown that incorporating speaker traits (e.g., age and gender) can improve speaker diarization performance. However, we found that age information in the speaker embeddings is detrimental to speaker diarization when there is a severe mismatch between the age distributions of the training and test data. To minimize the detrimental effect of this age mismatch, an adversarial training strategy is introduced to remove age variability from the utterance-level speaker embeddings. Evaluations on an interactive dialog dataset of Montreal Cognitive Assessments (MoCA) show that the adversarial training strategy can produce age-invariant embeddings and reduce the diarization error rate (DER) by 4.33%. The approach also outperforms the conventional method even with less training data.
Keywords: Age-invariant speaker embedding
Deep neural networks
Montreal cognitive assessments
Speaker diarization
Publisher: Institute of Electrical and Electronics Engineers
ISBN: 978-1-7281-6994-1 (Electronic)
978-1-7281-6995-8 (Print on Demand (PoD))
DOI: 10.1109/ISCSLP49672.2021.9362084
Description: 2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP), 24-27 January 2021, Hong Kong
Rights: © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
The following publication S. S. Xu, M. -W. Mak, K. H. Wong, H. Meng and T. C. Y. Kwok, "Age-Invariant Speaker Embedding for Diarization of Cognitive Assessments," 2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP), Hong Kong, 2021 is available at https://doi.org/10.1109/ISCSLP49672.2021.9362084.
Appears in Collections: Conference Paper

Files in This Item:
Xu_Age-Invariant_Speaker_Embedding.pdf (Pre-Published version, 554.11 kB, Adobe PDF)
Open Access Information
Status: open access
File Version: Final Accepted Manuscript

Page views: 130 (last week: 4), as of Dec 21, 2025
Downloads: 61, as of Dec 21, 2025
Scopus citations: 6, as of Dec 19, 2025
Web of Science citations: 1, as of Dec 18, 2025
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.