Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/67643
DC Field | Value | Language
dc.contributor | Department of Electronic and Information Engineering | -
dc.creator | Tan, Z | -
dc.creator | Mak, MW | -
dc.date.accessioned | 2017-07-27T08:33:53Z | -
dc.date.available | 2017-07-27T08:33:53Z | -
dc.identifier.isbn | 978-9-8814-7680-7 (electronic) | -
dc.identifier.isbn | 978-1-4673-9593-9 (print on demand (PoD)) | -
dc.identifier.uri | http://hdl.handle.net/10397/67643 | -
dc.language.iso | en | en_US
dc.publisher | Institute of Electrical and Electronics Engineers | en_US
dc.subject | Deep belief networks | en_US
dc.subject | Deep learning | en_US
dc.subject | Bottleneck features | en_US
dc.subject | Denoising autoencoder | en_US
dc.subject | Speaker identification | en_US
dc.title | Bottleneck features from SNR-adaptive denoising deep classifier for speaker identification | en_US
dc.type | Conference Paper | en_US
dc.identifier.spage | 1035 | -
dc.identifier.epage | 1040 | -
dc.identifier.doi | 10.1109/APSIPA.2015.7415429 | -
dcterms.abstract | In this paper, we explore the potential of using deep learning for extracting speaker-dependent features for noise-robust speaker identification. More specifically, an SNR-adaptive denoising classifier is constructed by stacking two layers of restricted Boltzmann machines (RBMs) on top of a denoising deep autoencoder, where the top RBM layer is connected to a softmax output layer that produces the posterior probabilities of speakers, while the top RBM layer itself outputs speaker-dependent bottleneck features. Both the deep autoencoder and the RBMs are trained by contrastive divergence, followed by backpropagation fine-tuning. The autoencoder aims to reconstruct the clean spectra of a noisy test utterance, using the spectra of the noisy test utterance and its SNR as input. With this denoising capability, the output from the bottleneck layer of the classifier can be considered a low-dimensional representation of denoised utterances. These frame-based bottleneck features are then used to train an i-vector extractor and a PLDA model for speaker identification. Experimental results based on a noisy YOHO corpus show that the bottleneck features slightly outperform conventional MFCCs under low-SNR conditions and that fusion of the two features leads to further performance gains, suggesting that the two features are complementary to each other. (A schematic code sketch of the classifier topology follows this record.) | -
dcterms.bibliographicCitation | 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), Hong Kong, China, 16-19 Dec 2015, p. 1035-1040 | -
dcterms.issued | 2015 | -
dc.relation.conference | Asia-Pacific Signal and Information Processing Association (APSIPA). Summit and Conference | -
dc.identifier.rosgroupid | 2015002470 | -
dc.description.ros | 2015-2016 > Academic research: refereed > Refereed conference paper | -
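The abstract describes a concrete topology: a denoising deep autoencoder that takes a noisy spectrum and its SNR as input, two stacked RBM layers above it, a bottleneck layer whose activations serve as speaker-dependent features, and a softmax output over speaker posteriors. The following is a minimal PyTorch sketch of that topology, for illustration only: the layer widths, SPEC_DIM, BN_DIM, and the class name SNRAdaptiveDenoisingClassifier are assumptions rather than values from the paper, and the contrastive-divergence pretraining is omitted, so only the fine-tuning-stage network and the bottleneck feature read-out are shown.

```python
import torch
import torch.nn as nn

# All sizes below are illustrative assumptions, not values from the paper.
SPEC_DIM = 257     # spectrum bins per frame (assumed)
BN_DIM = 64        # bottleneck feature dimensionality (assumed)
N_SPEAKERS = 138   # the YOHO corpus has 138 speakers


class SNRAdaptiveDenoisingClassifier(nn.Module):
    """Fine-tuning-stage topology suggested by the abstract: a denoising
    autoencoder fed with a noisy spectrum plus its SNR, two stacked
    RBM-derived layers (the upper one a bottleneck), and a softmax
    output over speaker posteriors. Contrastive-divergence pretraining
    is omitted here; in the paper it initialises the weights."""

    def __init__(self):
        super().__init__()
        # Encoder half of the denoising autoencoder (decoder omitted:
        # after pretraining, only the encoder feeds the classifier).
        self.encoder = nn.Sequential(
            nn.Linear(SPEC_DIM + 1, 1024), nn.Sigmoid(),
            nn.Linear(1024, 1024), nn.Sigmoid(),
        )
        # Two RBM-derived layers; the second is the bottleneck whose
        # activations are taken as speaker-dependent features.
        self.rbm1 = nn.Sequential(nn.Linear(1024, 512), nn.Sigmoid())
        self.bottleneck = nn.Sequential(nn.Linear(512, BN_DIM), nn.Sigmoid())
        # Output layer over speakers (logits; softmax is applied in the loss).
        self.out = nn.Linear(BN_DIM, N_SPEAKERS)

    def forward(self, noisy_spec, snr):
        h = self.encoder(torch.cat([noisy_spec, snr], dim=-1))
        bn = self.bottleneck(self.rbm1(h))
        return self.out(bn), bn  # speaker logits and bottleneck features


# Extracting frame-level bottleneck features for the i-vector/PLDA back end:
model = SNRAdaptiveDenoisingClassifier()
frames = torch.randn(100, SPEC_DIM)       # 100 dummy noisy-spectrum frames
snrs = torch.full((100, 1), 6.0)          # per-frame SNR in dB (dummy value)
with torch.no_grad():
    _, bn_feats = model(frames, snrs)     # bn_feats: (100, BN_DIM)
```

In the pipeline described above, pretrained DAE/RBM weights would initialise these layers before fine-tuning, and the extracted frame-level features would replace MFCCs as input to the i-vector extractor and PLDA model.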
Appears in Collections: Conference Paper

Scopus™ citations: 4 (0 in the last week), as of Aug 14, 2020
Page views: 85 (1 in the last week), as of Oct 25, 2020

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.