Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/106902
| DC Field | Value | Language |
|---|---|---|
| dc.contributor | Department of Electrical and Electronic Engineering | en_US |
| dc.creator | Lin, W | en_US |
| dc.creator | Mak, MW | en_US |
| dc.creator | Chien, JT | en_US |
| dc.date.accessioned | 2024-06-07T00:58:45Z | - |
| dc.date.available | 2024-06-07T00:58:45Z | - |
| dc.identifier.isbn | 978-1-7138-2069-7 | en_US |
| dc.identifier.uri | http://hdl.handle.net/10397/106902 | - |
| dc.description | 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020, 25-29 October 2020, Shanghai, China | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | International Speech Communication Association (ISCA) | en_US |
| dc.rights | Copyright © 2020 ISCA | en_US |
| dc.rights | The following publication Lin, W., Mak, M.-W., Chien, J.-T. (2020) Strategies for End-to-End Text-Independent Speaker Verification. Proc. Interspeech 2020, 4308-4312 is available at https://doi.org/10.21437/Interspeech.2020-2092. | en_US |
| dc.title | Strategies for end-to-end text-independent speaker verification | en_US |
| dc.type | Conference Paper | en_US |
| dc.identifier.spage | 4308 | en_US |
| dc.identifier.epage | 4312 | en_US |
| dc.identifier.doi | 10.21437/Interspeech.2020-2092 | en_US |
| dcterms.abstract | State-of-the-art speaker verification (SV) systems typically consist of two distinct components: a deep neural network (DNN) for creating speaker embeddings and a backend for improving the embeddings’ discriminative ability. The question that arises is: can we train an SV system without a backend? We believe that the backend compensates for the fact that the network is trained entirely on short speech segments. This paper shows that with several modifications to the x-vector system, DNN embeddings can be directly used for verification. The proposed modifications include: (1) a mask-pooling layer that augments the training samples by randomly masking the frame-level activations and then computing temporal statistics, (2) a sampling scheme that produces diverse training samples by randomly splicing several speech segments from each utterance, and (3) additional convolutional layers designed to reduce the temporal resolution and save computational cost. Experiments on NIST SRE 2016 and 2018 show that our method can achieve state-of-the-art performance with simple cosine similarity and requires only half the computational cost of the x-vector network. | en_US |
| dcterms.accessRights | open access | en_US |
| dcterms.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2020, 25-29 October 2020, Shanghai, China, p. 4308-4312 | en_US |
| dcterms.issued | 2020 | - |
| dc.identifier.scopus | 2-s2.0-85098160006 | - |
| dc.relation.conference | International Speech Communication Association [Interspeech] | en_US |
| dc.description.validate | 202405 bcch | en_US |
| dc.description.oa | Version of Record | en_US |
| dc.identifier.FolderNumber | EIE-0159 | - |
| dc.description.fundingSource | RGC | en_US |
| dc.description.pubStatus | Published | en_US |
| dc.identifier.OPUS | 55969000 | - |
| dc.description.oaCategory | VoR allowed | en_US |
Appears in Collections: Conference Paper
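The abstract above describes a mask-pooling layer that randomly masks frame-level activations during training and then pools temporal statistics into a fixed-length vector. The snippet below is a minimal, hypothetical PyTorch sketch of that idea, not the authors' released code: the class name `MaskStatsPooling`, the masking probability of 0.1, and the tensor sizes in the usage example are illustrative assumptions rather than values taken from the paper.

```python
# Minimal sketch of mask-pooling: randomly drop whole frames, then pool
# mean and standard deviation over time (statistics pooling).
import torch
import torch.nn as nn


class MaskStatsPooling(nn.Module):
    def __init__(self, mask_prob: float = 0.1, eps: float = 1e-5):
        super().__init__()
        self.mask_prob = mask_prob  # assumed masking ratio, not from the paper
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames) frame-level activations
        if self.training:
            # Randomly zero out entire frames to augment each training sample.
            keep = (torch.rand(x.size(0), 1, x.size(2), device=x.device)
                    > self.mask_prob).float()
            x = x * keep
        mean = x.mean(dim=2)
        std = torch.sqrt(x.var(dim=2, unbiased=False) + self.eps)
        # Concatenate mean and std to form the utterance-level representation.
        return torch.cat([mean, std], dim=1)


if __name__ == "__main__":
    pool = MaskStatsPooling(mask_prob=0.1).train()
    frames = torch.randn(4, 512, 300)   # 4 utterances, 512 channels, 300 frames
    print(pool(frames).shape)           # torch.Size([4, 1024])
```

At inference time (`pool.eval()`) the masking is skipped and the layer reduces to plain statistics pooling, so the same module can be used for both training-time augmentation and scoring.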
Files in This Item:
| File | Size | Format |
|---|---|---|
| lin20l_interspeech.pdf | 831.58 kB | Adobe PDF |
Page views: 94 (as of Nov 9, 2025)
Downloads: 43 (as of Nov 9, 2025)
SCOPUS™ citations: 4 (as of Dec 19, 2025)
Web of Science™ citations: 3 (as of Dec 18, 2025)
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.