Vector-based feature representations for speech signals : from supervector to latent vector

Jiang, Y; Leung, FHF

doi:10.1109/TMM.2020.3014559

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/107152

DC Field	Value	Language
dc.contributor	Department of Electrical and Electronic Engineering	-
dc.creator	Jiang, Y	-
dc.creator	Leung, FHF	-
dc.date.accessioned	2024-06-13T01:04:14Z	-
dc.date.available	2024-06-13T01:04:14Z	-
dc.identifier.issn	1520-9210	-
dc.identifier.uri	http://hdl.handle.net/10397/107152	-
dc.language.iso	en	en_US
dc.publisher	Institute of Electrical and Electronics Engineers	en_US
dc.rights	© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	en_US
dc.rights	The following publication Y. Jiang and F. H. F. Leung, "Vector-Based Feature Representations for Speech Signals: From Supervector to Latent Vector," in IEEE Transactions on Multimedia, vol. 23, pp. 2641-2655, 2021 is available at https://doi.org/10.1109/TMM.2020.3014559.	en_US
dc.subject	Acoustic and speech signal processing	en_US
dc.subject	Gaussian supervector	en_US
dc.subject	I-vector	en_US
dc.subject	Supervector and latent vector	en_US
dc.subject	Vector-based feature representation	en_US
dc.title	Vector-based feature representations for speech signals : from supervector to latent vector	en_US
dc.type	Journal/Magazine Article	en_US
dc.identifier.spage	2641	-
dc.identifier.epage	2655	-
dc.identifier.volume	23	-
dc.identifier.doi	10.1109/TMM.2020.3014559	-
dcterms.abstract	There are two basic types of feature representations for speech signals. The first type refers to probabilistic models, such as the Gaussian mixture model (GMM). The second type refers to vector-based feature representations, such as the Gaussian supervector (GSV). Since vector-based feature representations are easier to use and process, they are more popular than probabilistic model-based feature representations. In this paper, we begin by explaining the rationale behind two widely used vector-based feature representations, viz. GSV and the i-vector, and then extend them. GSV is a supervector (SV) based on maximum a posteriori (MAP) adaptation. Its computation is simple and fast, but its dimensionality is high and fixed. The i-vector, in contrast, is a latent vector (LV) based on factor analysis (FA); although its computation can be time-consuming because of the additional model parameters, its dimensionality is adjustable. To generalize GSV, we propose the MAP SV, which is also based on MAP adaptation but can have an even higher dimensionality and thus carry more information. To boost the computational efficiency of the i-vector, we adopt the concept of the mixture of factor analyzers (MFA) and propose the MFA LV, which exhibits a similar flexibility in dimensionality but is faster to compute. The experimental results for speaker identification and verification tasks demonstrate that MAP SV can be more robust than GSV, and that MFA LV is comparable to or even better than the i-vector in effectiveness while maintaining a higher computational efficiency. With a powerful backend, GSV and MAP SV are comparable to the i-vector and MFA LV, but the latter two are more flexible in dimensionality.	-
dcterms.accessRights	open access	en_US
dcterms.bibliographicCitation	IEEE transactions on multimedia, 2021, v. 23, p. 2641-2655	-
dcterms.isPartOf	IEEE transactions on multimedia	-
dcterms.issued	2021	-
dc.identifier.scopus	2-s2.0-85099596386	-
dc.identifier.eissn	1941-0077	-
dc.description.validate	202403 bckw	-
dc.description.oa	Accepted Manuscript	en_US
dc.identifier.FolderNumber	EIE-0264	en_US
dc.description.fundingSource	Others	en_US
dc.description.fundingText	The Hong Kong Polytechnic University	en_US
dc.description.pubStatus	Published	en_US
dc.identifier.OPUS	50097027	en_US
dc.description.oaCategory	Green (AAM)	en_US
Appears in Collections:	Journal/Magazine Article
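
The abstract describes GSV as a supervector built on MAP adaptation whose dimensionality is high and fixed. As a rough illustration only, and not the authors' implementation, the sketch below shows the commonly used mean-only MAP adaptation of a diagonal-covariance GMM followed by stacking of the adapted means. The function name `gaussian_supervector`, the use of a pre-trained background GMM, and the relevance factor of 16 are assumptions made for this example; none of them comes from this record.

```python
import math

def gaussian_supervector(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a diagonal-covariance GMM, then stacking.

    frames: list of D-dimensional feature vectors for one utterance.
    weights, means, variances: parameters of an assumed background GMM
    with K components. relevance: assumed relevance factor.
    Returns a flat list of length K * D (the supervector).
    """
    K, D = len(means), len(means[0])
    n = [0.0] * K                           # soft frame counts per component
    first = [[0.0] * D for _ in range(K)]   # posterior-weighted frame sums
    for x in frames:
        # Log of (weight * Gaussian density) for every component.
        logp = [
            math.log(weights[k])
            - 0.5 * sum(
                (x[d] - means[k][d]) ** 2 / variances[k][d]
                + math.log(2.0 * math.pi * variances[k][d])
                for d in range(D)
            )
            for k in range(K)
        ]
        top = max(logp)                     # subtract the max for stability
        unnorm = [math.exp(v - top) for v in logp]
        total = sum(unnorm)
        for k in range(K):
            post = unnorm[k] / total        # component posterior for this frame
            n[k] += post
            for d in range(D):
                first[k][d] += post * x[d]
    supervector = []
    for k in range(K):
        alpha = n[k] / (n[k] + relevance)   # data-dependent interpolation weight
        for d in range(D):
            data_mean = first[k][d] / n[k] if n[k] > 0.0 else means[k][d]
            supervector.append(alpha * data_mean + (1.0 - alpha) * means[k][d])
    return supervector
```

The output length is always K * D regardless of how many frames are supplied, which is the "high and fixed" dimensionality the abstract refers to. Components that see little data stay close to the background means, because their interpolation weight is near zero.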

Files in This Item:

File	Description	Size	Format
Jiang_Vector-Based_Feature_Representations.pdf	Pre-Published version	1.16 MB	Adobe PDF

Open Access Information

Status	open access
File Version	Final Accepted Manuscript

Page views	4
Downloads	1

Citations as of Jun 30, 2024
