Do long-term acoustic-phonetic features and mel-frequency cepstral coefficients provide complementary speaker-specific information for forensic voice comparison?

Chan, RKW; Wang, BX

doi:10.1016/j.forsciint.2024.112199

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/108857

Title:	Do long-term acoustic-phonetic features and mel-frequency cepstral coefficients provide complementary speaker-specific information for forensic voice comparison?
Authors:	Chan, RKW; Wang, BX
Issue Date:	Oct-2024
Source:	Forensic science international, Oct. 2024, v. 363, 112199
Abstract:	A growing number of studies in forensic voice comparison have explored how elements of phonetic analysis and automatic speaker recognition systems may be integrated for optimal speaker discrimination performance. However, few studies have investigated the evidential value of long-term speech features using forensically relevant speech data. This paper reports an empirical validation study that assesses the evidential strength of the following long-term features: fundamental frequency (F0), formant distributions, laryngeal voice quality, mel-frequency cepstral coefficients (MFCCs), and combinations thereof. Non-contemporaneous recordings with speech style mismatch from 75 male Australian English speakers were analyzed. Results show that 1) MFCCs outperform long-term acoustic-phonetic features; 2) source and filter features do not provide considerably complementary speaker-specific information; and 3) the addition of long-term phonetic features to an MFCC-based system does not lead to meaningful improvement in system performance. Implications for the complementarity of phonetic analysis and automatic speaker recognition systems are discussed.
Keywords:	Forensic voice comparison; Likelihood-ratio; Long-term acoustic-phonetic features; Mel-frequency cepstral coefficients; Non-contemporaneous recordings; Speech style mismatch
Publisher:	Elsevier BV
Journal:	Forensic science international
ISSN:	0379-0738
EISSN:	1872-6283
DOI:	10.1016/j.forsciint.2024.112199
Rights:	© 2024 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies. © 2024. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/ The following publication Chan, R. K. W., & Wang, B. X. (2024). Do long-term acoustic-phonetic features and mel-frequency cepstral coefficients provide complementary speaker-specific information for forensic voice comparison? Forensic Science International, 363, 112199 is available at https://doi.org/10.1016/j.forsciint.2024.112199.
Appears in Collections:	Journal/Magazine Article

Files in This Item:

File	Description	Size	Format
Chan_Do_Long-term_Acoustic-phonetic.pdf	Pre-Published version	1.23 MB	Adobe PDF

Open Access Information

Status	open access
File Version	Final Accepted Manuscript

Page views

80
