Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/102164
DC Field | Value | Language
dc.contributor | Department of Electrical and Electronic Engineering | en_US
dc.creator | Zuo, L | en_US
dc.creator | Mak, MW | en_US
dc.date.accessioned | 2023-10-11T01:57:56Z | -
dc.date.available | 2023-10-11T01:57:56Z | -
dc.identifier.issn | 0167-8655 | en_US
dc.identifier.uri | http://hdl.handle.net/10397/102164 | -
dc.language.iso | en | en_US
dc.publisher | Elsevier | en_US
dc.rights | © 2023 Elsevier B.V. All rights reserved. | en_US
dc.rights | © 2023. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/ | en_US
dc.rights | The following publication Zuo, L., & Mak, M.-W. (2023). Avoiding dominance of speaker features in speech-based depression detection. Pattern Recognition Letters, 173, 50–56 is available at https://doi.org/10.1016/j.patrec.2023.07.016. | en_US
dc.subject | Depression detection | en_US
dc.subject | Feature disentanglement | en_US
dc.subject | Speaker embedding | en_US
dc.subject | Speaker invariance | en_US
dc.title | Avoiding dominance of speaker features in speech-based depression detection | en_US
dc.type | Journal/Magazine Article | en_US
dc.identifier.spage | 50 | en_US
dc.identifier.epage | 56 | en_US
dc.identifier.volume | 173 | en_US
dc.identifier.doi | 10.1016/j.patrec.2023.07.016 | en_US
dcterms.abstract | The performance of speech-based depression detectors is limited by the scarcity and imbalance in depression data. We found that depression detectors could be strongly biased toward speaker features when the number of training speakers is insufficient. To address this issue, we propose a speaker-invariant depression detector (SIDD) that minimizes speaker information in the latent space. The SIDD consists of an autoencoder, a depression classifier, and a speaker-embedding projector. By incorporating speaker-embedding vectors into the autoencoder’s latent vectors, speaker information is effectively eliminated for the depression classifier. Experimental results demonstrate significant improvements achieved by minimizing speaker information, and our proposed method generally outperforms previous approaches for depression detection on the DAIC-WOZ dataset. | en_US
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | Pattern recognition letters, Sept 2023, v. 173, p. 50-56 | en_US
dcterms.isPartOf | Pattern recognition letters | en_US
dcterms.issued | 2023-09 | -
dc.identifier.eissn | 1872-7344 | en_US
dc.description.validate | 202310 bcch | en_US
dc.description.oa | Accepted Manuscript | en_US
dc.identifier.FolderNumber | a2475 | -
dc.identifier.SubFormID | 47754 | -
dc.description.fundingSource | Others | en_US
dc.description.fundingText | National Natural Science Foundation of China | en_US
dc.description.pubStatus | Published | en_US
dc.description.oaCategory | Green (AAM) | en_US
Appears in Collections: Journal/Magazine Article
Files in This Item:
File | Description | Size | Format
Zuo_Avoiding_Dominance_Speaker.pdf | Pre-Published version | 2.02 MB | Adobe PDF
Open Access Information
Status: open access
File Version: Final Accepted Manuscript

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.