|Title:||Facial affect recognition: from feature engineering to deep learning|
|Authors:||Chen, Junkai|
|Advisors:||Chi, Zheru (EIE)|
|Keywords:||Human face recognition (Computer science); Pattern recognition systems|
|Issue Date:||2017|
|Publisher:||The Hong Kong Polytechnic University|
|Abstract:||Facial expression recognition is a long-standing problem that has attracted growing interest from the affective computing community. This thesis presents my research on facial affect recognition with novel hand-crafted features and with deep learning. Three main contributions are reported: (1) an effective approach with novel features for facial expression recognition in video; (2) a multi-task framework for detecting and locating pain events in video; and (3) an effective method based on a deep convolutional neural network for smile detection in the wild. In the first investigation, I propose novel features and apply multiple kernel learning to combine them for facial expression recognition in video. A new feature descriptor, Histogram of Oriented Gradients from Three Orthogonal Planes (HOG-TOP), is proposed to characterize facial appearance changes, and a new, effective geometric feature is proposed to capture facial configuration changes. The role of the audio modality in affect recognition is also explored. Multiple-feature fusion is used to combine the different features optimally. Experimental results show that the approach is robust for video-based facial expression recognition both in lab-controlled environments and in the wild, compared with other state-of-the-art methods.
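The HOG-TOP descriptor extends HOG from a single image to a spatio-temporal video volume by computing orientation histograms on three orthogonal planes (XY for appearance, XT and YT for motion texture). The sketch below illustrates the idea with NumPy only; the function names, the single central slice per plane, and the simple gradient-weighted histogram are my own simplifications, not the thesis's exact formulation.

```python
import numpy as np

def orientation_histogram(plane, n_bins=8):
    """Gradient-magnitude-weighted histogram of unsigned orientations
    for one 2-D slice, L1-normalised."""
    gy, gx = np.gradient(plane.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)        # unsigned angle in [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    return hist / (hist.sum() + 1e-8)

def hog_top(volume, n_bins=8):
    """Concatenate orientation histograms from the XY, XT and YT planes
    of a (T, H, W) video volume (central slices, for illustration)."""
    t, h, w = volume.shape
    xy = volume[t // 2]          # spatial appearance
    xt = volume[:, h // 2, :]    # horizontal motion texture
    yt = volume[:, :, w // 2]    # vertical motion texture
    return np.concatenate([orientation_histogram(p, n_bins) for p in (xy, xt, yt)])
```

With 8 bins per plane this yields a 24-dimensional descriptor per volume; a practical implementation would also tile each plane into cells, as standard HOG does.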
In the second investigation, I propose an effective multi-task framework for detecting and locating pain events in video. Histograms of Oriented Gradients computed at fiducial points (P-HOG) and HOG-TOP are used to characterize spatial features and dynamic textures from video frames and video segments, respectively. Both frame-level and segment-level detection are based on trained Support Vector Machines (SVMs). A max-pooling strategy then yields a global P-HOG and a global HOG-TOP, and an SVM with multiple kernels is trained for sequence-level pain event detection. Finally, an effective probabilistic fusion method is proposed to integrate the three tasks (frame, segment, and sequence levels) to locate pain events in video. Experimental results show that the proposed method outperforms other state-of-the-art methods in both detecting and locating pain events in video. In the third investigation, I propose an effective deep learning approach for smile detection in the wild. Deep learning can combine feature learning and classification in a single model. In this study, a deep convolutional network called Smile-CNN performs feature learning and smile detection simultaneously. I also examine the discriminative power of the features learned by the Smile-CNN model: feeding them to an SVM or AdaBoost classifier shows that the learned features are highly discriminative. Experimental results show that the proposed approach achieves promising performance in smile detection.|
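Two building blocks of the second investigation can be sketched compactly: max pooling of per-frame features into a global video descriptor, and a convex combination of precomputed kernels in the spirit of multiple kernel learning. Everything here is an illustrative assumption of mine (the fixed weight `w`, the RBF kernel, the function names); the thesis learns the kernel weights rather than fixing them.

```python
import numpy as np

def max_pool(frame_features):
    """Global video descriptor: element-wise max over per-frame feature vectors."""
    return np.max(np.asarray(frame_features), axis=0)

def rbf_kernel(X, Y, gamma=1.0):
    """Gram matrix of an RBF kernel between two sets of feature vectors."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def combined_kernel(K_phog, K_hogtop, w=0.5):
    """Convex combination of two precomputed kernels; in multiple kernel
    learning the weight w is optimised, here it is fixed for illustration."""
    return w * K_phog + (1.0 - w) * K_hogtop
```

The combined Gram matrix can then be passed to any SVM implementation that accepts precomputed kernels (e.g. scikit-learn's `SVC(kernel="precomputed")`).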
|Description:||xxii, 152 pages : color illustrations|
|PolyU Library Call No.:||[THS] LG51 .H577P EIE 2017 Chen|
|URI:||http://hdl.handle.net/10397/69913||Rights:||All rights reserved.|
|Appears in Collections:||Thesis|
Files in This Item:
|991021952841303411_link.htm||For PolyU Users||167 B||HTML||View/Open|
|991021952841303411_pira.pdf||For All Users (Non-printable)||3.21 MB||Adobe PDF||View/Open|
Citations as of Dec 11, 2017