Comparison of voice activity detectors for interview speech in NIST speaker recognition evaluation

Yu, HB; Mak, MW

doi:10.21437/interspeech.2011-61

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/111711

DC Field	Value	Language
dc.contributor	Department of Electrical and Electronic Engineering	-
dc.creator	Yu, HB	-
dc.creator	Mak, MW	-
dc.date.accessioned	2025-03-13T02:22:11Z	-
dc.date.available	2025-03-13T02:22:11Z	-
dc.identifier.uri	http://hdl.handle.net/10397/111711	-
dc.description	12th Annual Conference of the International Speech Communication Association, INTERSPEECH 2011, Florence, Italy, August 27-31, 2011	en_US
dc.language.iso	en	en_US
dc.publisher	International Speech Communication Association	en_US
dc.rights	Copyright © 2011 ISCA	en_US
dc.rights	The following publication Yu, H.-B., Mak, M.-W. (2011) Comparison of voice activity detectors for interview speech in NIST speaker recognition evaluation. Proc. Interspeech 2011, 2353-2356 is available at https://doi.org/10.21437/Interspeech.2011-61.	en_US
dc.title	Comparison of voice activity detectors for interview speech in NIST speaker recognition evaluation	en_US
dc.type	Conference Paper	en_US
dc.identifier.spage	2353	-
dc.identifier.epage	2356	-
dc.identifier.doi	10.21437/interspeech.2011-61	-
dcterms.abstract	Interview speech has become an important part of the NIST Speaker Recognition Evaluations (SREs). Unlike telephone speech, interview speech has a substantially lower signal-to-noise ratio, which necessitates robust voice activity detection (VAD). This paper highlights the characteristics of interview speech files in NIST SREs and discusses the difficulties in performing speech/nonspeech segmentation in these files. To overcome these difficulties, this paper proposes using speech enhancement techniques as a preprocessing step for enhancing the reliability of energy-based and statistical-model-based VADs. It was found that spectral subtraction can make better use of the background spectrum than the likelihood-ratio tests in statistical-model-based VADs. A decision strategy is also proposed to overcome the undesirable effects caused by impulsive signals and sinusoidal background signals. Results on the NIST 2010 SRE show that the proposed VAD outperforms the statistical-model-based VAD, the ETSI-AMR speech coder, and the ASR transcripts provided by the NIST SRE Workshop.	-
dcterms.accessRights	open access	en_US
dcterms.bibliographicCitation	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2011, p. 2353-2356	-
dcterms.issued	2011	-
dc.identifier.scopus	2-s2.0-84865791238	-
dc.relation.conference	Conference of the International Speech Communication Association [INTERSPEECH]	-
dc.description.validate	202503 bcch	-
dc.description.oa	Version of Record	en_US
dc.identifier.FolderNumber	OA_Others	en_US
dc.description.fundingSource	Self-funded	en_US
dc.description.pubStatus	Published	en_US
dc.description.oaCategory	VoR allowed	en_US
Appears in Collections:	Conference Paper
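
Illustrative note: the abstract describes spectral subtraction as a preprocessing step ahead of an energy-based VAD. The Python sketch below shows that general idea only; it is not the authors' implementation and does not include the decision strategy for impulsive or sinusoidal background signals. The function names, the frame sizes, the spectral floor, the 6 dB threshold, and the assumption that the first few frames contain only background noise are all hypothetical choices made for illustration.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def spectral_subtraction_vad(x, frame_len=256, hop=128, noise_frames=10,
                             floor=0.01, threshold_db=6.0):
    """Return one boolean speech/nonspeech decision per frame.

    Assumption (not from the paper): the first `noise_frames` frames are
    nonspeech and can be used to estimate the background spectrum.
    """
    x = np.asarray(x, dtype=float)
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    mag = np.abs(np.fft.rfft(frames, axis=1))

    # Background magnitude spectrum, averaged over the leading frames.
    noise_mag = mag[:noise_frames].mean(axis=0)

    # Magnitude spectral subtraction with a small spectral floor so that
    # no bin becomes negative.
    clean_mag = np.maximum(mag - noise_mag, floor * noise_mag)

    # Energy-based decision on the enhanced spectrum, measured relative to
    # the residual energy of the leading (assumed nonspeech) frames.
    energy = (clean_mag ** 2).sum(axis=1)
    residual = energy[:noise_frames].mean() + 1e-12
    level_db = 10.0 * np.log10(energy / residual + 1e-12)
    return level_db > threshold_db
```

Calling `spectral_subtraction_vad(samples)` on a mono waveform yields a boolean array with one entry per frame. A fixed threshold is used here for simplicity; a practical detector would typically also smooth the frame decisions over time.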

Files in This Item:

File	Description	Size	Format
yu11_interspeech.pdf		427.52 kB	Adobe PDF

Open Access Information

Status	open access
File Version	Version of Record

