Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/106888
Title: Learning mixture representation for deep speaker embedding using attention
Authors: Lin, W 
Mak, MW 
Yi, L 
Issue Date: 2020
Source: The Speaker and Language Recognition Workshop (Odyssey 2020), 1-5 November 2020, Tokyo, Japan, p. 210-214
Abstract: Almost all speaker recognition systems involve a step that converts a sequence of frame-level features to a fixed-dimensional representation. In the context of deep neural networks, it is referred to as statistics pooling. In state-of-the-art speaker recognition systems, statistics pooling is implemented by concatenating the mean and standard deviation of a sequence of frame-level features. However, a single mean and standard deviation are very limited descriptive statistics for an acoustic sequence, even with a powerful feature extractor like a convolutional neural network. In this paper, we propose a novel statistics pooling method that can produce more descriptive statistics through a mixture representation. Our method is inspired by the expectation-maximization (EM) algorithm in Gaussian mixture models (GMMs). However, unlike GMMs, the mixture assignments are given by an attention mechanism instead of the Euclidean distances between frame-level features and explicit centers. Applying the proposed attention mechanism to a 121-layer DenseNet, we achieve an EER of 1.1% on VoxCeleb1 and an EER of 4.77% on the VOiCES 2019 evaluation set.
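The pooling step described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names, the use of a softmax across components to obtain soft assignments, and the attention scores passed in as a ready-made argument are all assumptions made for illustration. In the paper the scores would come from a learned attention mechanism, which is not reproduced here.

```python
import math

def stats_pooling(frames):
    """Standard statistics pooling: concatenate the per-dimension mean and
    standard deviation of a sequence of frame-level feature vectors."""
    T, D = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / T for d in range(D)]
    std = [math.sqrt(sum((f[d] - mean[d]) ** 2 for f in frames) / T)
           for d in range(D)]
    return mean + std

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_pooling(frames, attention_scores):
    """Hypothetical mixture pooling sketch.

    attention_scores[t][k] is an assumed score for frame t and mixture
    component k. A softmax across components turns each row into soft
    assignments, playing the role of GMM responsibilities. Each component
    then contributes its own weighted mean and standard deviation.
    """
    T, D = len(frames), len(frames[0])
    K = len(attention_scores[0])
    resp = [softmax(row) for row in attention_scores]  # T x K soft assignments
    pooled = []
    for k in range(K):
        weight = sum(resp[t][k] for t in range(T))  # always > 0 after softmax
        mean = [sum(resp[t][k] * frames[t][d] for t in range(T)) / weight
                for d in range(D)]
        var = [sum(resp[t][k] * (frames[t][d] - mean[d]) ** 2
                   for t in range(T)) / weight
               for d in range(D)]
        pooled += mean + [math.sqrt(v) for v in var]
    return pooled

# Toy input: two frames with two feature dimensions each.
frames = [[1.0, 2.0], [3.0, 4.0]]
single = stats_pooling(frames)                             # length 2 * D
mixed = mixture_pooling(frames, [[0.0, 0.0], [0.0, 0.0]])  # length 2 * K * D
```

With a single component the sketch reduces to ordinary statistics pooling, which is one way to see why several components can describe a sequence in more detail than one mean and one standard deviation.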
Publisher: International Speech Communication Association (ISCA)
DOI: 10.21437/Odyssey.2020-30
Rights: © ISCA
The following publication Lin, W., Mak, M.W., Yi, L. (2020) Learning Mixture Representation for Deep Speaker Embedding Using Attention. Proc. The Speaker and Language Recognition Workshop (Odyssey 2020), 210-214 is available at https://doi.org/10.21437/Odyssey.2020-30.
Appears in Collections: Conference Paper

Files in This Item:
File: lin20c_odyssey.pdf
Size: 254.06 kB
Format: Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.