Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/111711
DC Field | Value | Language
dc.contributor | Department of Electrical and Electronic Engineering | -
dc.creator | Yu, HB | -
dc.creator | Mak, MW | -
dc.date.accessioned | 2025-03-13T02:22:11Z | -
dc.date.available | 2025-03-13T02:22:11Z | -
dc.identifier.uri | http://hdl.handle.net/10397/111711 | -
dc.description | 12th Annual Conference of the International Speech Communication Association, INTERSPEECH 2011, Florence, Italy, August 27-31, 2011 | en_US
dc.language.iso | en | en_US
dc.publisher | International Speech Communication Association | en_US
dc.rights | Copyright © 2011 ISCA | en_US
dc.rights | The following publication Yu, H.-B., Mak, M.-W. (2011) Comparison of voice activity detectors for interview speech in NIST speaker recognition evaluation. Proc. Interspeech 2011, 2353-2356 is available at https://doi.org/10.21437/Interspeech.2011-61. | en_US
dc.title | Comparison of voice activity detectors for interview speech in NIST speaker recognition evaluation | en_US
dc.type | Conference Paper | en_US
dc.identifier.spage | 2353 | -
dc.identifier.epage | 2356 | -
dc.identifier.doi | 10.21437/interspeech.2011-61 | -
dcterms.abstract | Interview speech has become an important part of the NIST Speaker Recognition Evaluations (SREs). Unlike telephone speech, interview speech has substantially lower signal-to-noise ratio, which necessitates robust voice activity detection (VAD). This paper highlights the characteristics of interview speech files in NIST SREs and discusses the difficulties in performing speech/nonspeech segmentation in these files. To overcome these difficulties, this paper proposes using speech enhancement techniques as a preprocessing step for enhancing the reliability of energy-based and statistical-model-based VADs. It was found that spectral subtraction can make better use of the background spectrum than the likelihood-ratio tests in statistical-model-based VADs. A decision strategy is also proposed to overcome the undesirable effects caused by impulsive signals and sinusoidal background signals. Results on NIST 2010 SRE show that the proposed VAD outperforms the statistical-model-based VAD, the ETSI-AMR speech coder, and the ASR transcripts provided by NIST SRE Workshop. | -
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2011, p. 2353-2356 | -
dcterms.issued | 2011 | -
dc.identifier.scopus | 2-s2.0-84865791238 | -
dc.relation.conference | Conference of the International Speech Communication Association [INTERSPEECH] | -
dc.description.validate | 202503 bcch | -
dc.description.oa | Version of Record | en_US
dc.identifier.FolderNumber | OA_Others | en_US
dc.description.fundingSource | Self-funded | en_US
dc.description.pubStatus | Published | en_US
dc.description.oaCategory | VoR allowed | en_US
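The abstract describes spectral subtraction as a preprocessing step ahead of an energy-based VAD. The following is a minimal sketch of that general idea, not the authors' implementation. The function names, the frame length and hop, the use of the leading frames as a background estimate, the spectral floor, and the 6 dB threshold are all assumptions chosen for illustration; the decision strategy for impulsive and sinusoidal background signals mentioned in the abstract is not modeled here.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames (hypothetical helper)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def spectral_subtraction_vad(x, frame_len=256, hop=128, noise_frames=10,
                             floor=0.01, threshold_db=6.0):
    """Return one boolean speech/nonspeech decision per frame."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    # Assumption: the leading frames contain background only.
    noise = power[:noise_frames].mean(axis=0)

    # Power spectral subtraction, with a spectral floor so that
    # no bin becomes negative after the subtraction.
    clean = np.maximum(power - noise, floor * noise)

    # Frame energy of the enhanced signal, in dB.
    energy_db = 10.0 * np.log10(clean.sum(axis=1) + 1e-12)

    # Assumption: the threshold is set relative to the residual
    # energy of the background-only frames.
    ref_db = energy_db[:noise_frames].mean()
    return energy_db > ref_db + threshold_db
```

Calling `spectral_subtraction_vad(x)` on a sampled waveform `x` yields a boolean array with one entry per frame. Estimating the background from the first few frames is the simplest possible choice and would fail on a file that starts with speech.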
Appears in Collections: Conference Paper
Files in This Item:
File | Description | Size | Format
yu11_interspeech.pdf | | 427.52 kB | Adobe PDF
Open Access Information
Status open access
File Version Version of Record
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.