Please use this identifier to cite or link to this item:
Title: Sensor fusion for audio-visual biometric authentication
Authors: Cheung, Ming-cheung
Degree: M.Phil.
Issue Date: 2005
Abstract: Although financial transactions via automatic teller machines (ATMs) have become commonplace, the security of these transactions remains a concern. In particular, the verification approach used by today's ATMs can be easily compromised because ATM cards and passwords can be lost or stolen. To overcome this limitation, a new verification approach known as biometrics has emerged. Rather than using passwords as the means of verification, biometric systems verify the identity of a person based on his or her physiological and behavioral characteristics. Numerous studies have shown that biometric systems can achieve high performance under controlled conditions. However, the performance of these systems can be severely degraded under real-world environments. For example, background noise and channel distortion in speech-based systems and variation in illumination intensity and lighting directions in face-based systems are known to be the major causes of performance degradation. To enhance the robustness of biometric systems, multimodal biometrics have been introduced. Multimodal techniques improve the robustness of biometric systems by using more than one biometric traits at the same time. Combining the information from different traits, however, is an important issue. This thesis proposes a multiple-source multiple-sample fusion algorithm to address this issue. The algorithm performs fusion at two levels: intramodal and intermodal. In intramodal fusion, the scores of multiple samples (e.g., utterances and video shots) obtained from the same modality are linearly combined, where the fusion weights are made dependent on the score distribution of the independent samples and the prior knowledge about the score statistics. More specifically, enrollment data are used to compute the mean scores of clients and impostors, which are considered to be the prior scores. During verification, the differences between the individual scores and the prior scores are used to compute the fusion weights. Because the fusion weights depend on verification data, the position of scores in the score sequences is detrimental to the final fused scores. To enhance the discrimination between client and impostor scores, this thesis proposes sorting the score sequences before fusion takes placed. Because verification performance depends on the prior scores, a technique that adapts the prior scores during verification is also developed. In intermodal fusion, the means of intramodal fused scores obtained from different modalities are fused by either linear weighted sums or support vector machines. The final fused score is then used for decision making. The intramodal multisample fusion was evaluated on the HTIMIT corpus and the 2001 NIST speaker recognition evaluation set, and the two-level fusion approach was evaluated on the XM2VTSDB audio-visual corpus. It was found that intramodal multisample fusion achieves a significant reduction in equal error rate as compared to a conventional approach in which equal weights are assigned to all scores. Further improvement can be obtained by either sorting the score sequences or adapting the prior scores. It was also found that multisample fusion can be readily combined with support vector machines for audio-visual biometric authentication. Results show that combining the audio and visual information can reduce error rates by as much as 71%.
Subjects: Hong Kong Polytechnic University -- Dissertations
Biometric identification
Multisensor data fusion
Automatic speech recognition
Pages: vi, viii, 95 leaves : ill. (some col.) ; 30 cm
Appears in Collections:Thesis

Show full item record

Page views

Last Week
Last month
Citations as of Nov 26, 2023

Google ScholarTM


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.