Title: Understanding user engagement level during tasks via facial responses, eye gaze and mouse movements
Authors: Kwok, Cho Ki
Degree: M.Phil.
Issue Date: 2018
Abstract: User engagement refers to the quality of the user experience (UX) on a particular task or interface. It emphasizes the positive aspects of human-computer interaction, and the desire to work on the same task longer and repeatedly [10]. Users invest time, emotion, attention and effort when they interact with technologies, and a successful application or task should be able to engage users, instead of simply being a "job" that needs to be completed. User engagement is therefore a complex phenomenon that encompasses three different dimensions: (1) cognitive engagement, (2) emotional engagement and (3) behavioral engagement. Researchers use different ways to measure user engagement level, such as self-reporting (e.g. questionnaires), observations (e.g. speech analysis, facial expression analysis) and web analytics (e.g. click-through rate, number of site visits, time spent). Nowadays, computers are equipped with high computational power and different kinds of sensors, which make automated detection of human affect and mental state possible in a variety of situations. Using computers to "observe" human behaviors and using the observed information to detect levels of engagement could be useful in many situations, such as getting feedback for interface improvement or assuring the quality of work generated by online workers (crowdsourcing) or students (e-learning). Therefore, there has been much previous work on detecting user engagement through various means such as facial expression, mouse movement or gaze movement. However, this work is hampered by three main challenges: (1) the constraints caused by using intrusive devices, (2) the limitations of specific tasks (like gaming), which may produce user behavior different from daily computer usage, and (3) incomplete ground truth, collected through straightforward, direct survey questionnaires that capture users' self-reported numeric level of engagement and may not cover the three dimensions of engagement.
The work presented in this thesis focuses on non-intrusive visual cues, in particular, visual cues from facial expressions, eye gaze, and mouse cursor signals, for understanding users' level of engagement in human-computer interaction tasks. Addressing the first two challenges mentioned above, we conducted experiments and studied how users' facial responses, eye gaze and mouse behaviors relate to changes in engagement level during Language Learning tasks and Web Searching tasks. Non-intrusive devices, such as the mouse, Tobii eye tracker and off-the-shelf webcam, are used to capture users' behaviors in the experiment. Using Pearson's correlation, paired t-tests and single-factor (one-way) ANOVA, we select a useful feature subset from the initial feature set. From the investigation, we have a better understanding of the relationship between engagement level and user behavior. For example, the facial action unit 5 ("upper lid raiser") is useful in engagement detection. We observed that this feature is indicative because sleepy users try to keep their eyes open to avoid falling asleep.
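As an illustration of correlation-based feature filtering of the kind mentioned above, the following is a minimal sketch, not the thesis's actual procedure. The feature names, the toy values and the `min_abs_r` threshold are all hypothetical, and only the Pearson step is shown (the paired t-test and ANOVA steps are omitted).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature carries no linear information
    return cov / math.sqrt(var_x * var_y)

def select_features(features, scores, min_abs_r=0.3):
    """Keep features whose |r| with the engagement score reaches min_abs_r.

    min_abs_r is an illustrative threshold, not a value from the thesis.
    """
    selected = {}
    for name, values in features.items():
        r = pearson_r(values, scores)
        if abs(r) >= min_abs_r:
            selected[name] = r
    return selected

# Invented toy data: one value per task session for each hypothetical feature.
toy_features = {
    "webcam_feature": [0.1, 0.2, 0.3, 0.4, 0.5],
    "mouse_feature": [3.0, 1.0, 4.0, 1.0, 5.0],
    "constant_feature": [1.0, 1.0, 1.0, 1.0, 1.0],
}
toy_engagement = [1.0, 2.0, 3.0, 4.0, 5.0]  # stand-in for average UES scores

kept = select_features(toy_features, toy_engagement)
print(sorted(kept))
```

A fixed threshold on |r| is only one possible selection rule; a significance test on the correlation would be an equally plausible choice.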
To address the third challenge, we collected an engagement dataset that includes a multi-dimensional measurement of ground truth. It includes the User Engagement Scale (UES) [89], which covers the three dimensions of user engagement, as the self-reporting tool; the average UES scores can reliably represent the engagement level. It also includes the commonly used NASA Task Load Index (NASA-TLX) annotations for measuring cognitive workload. We include a further investigation into the correlation between the UES and TLX sub-scale scores. We analyze facial affect in two ways. First, we measure momentary affect through the facial action units in every frame of the facial response videos. We then move to an overall affect measurement through segment-based facial features to seek more representative features that cover the whole task period. The facial affect recognition model was extended into a real-life application to identify video viewers' emotions. We developed an asynchronous video-sharing platform with Emotars, which allow users to share their affect and experience with others without disclosing their real facial expressions and/or features. We analyze the user experience of using this platform in four different dimensions: emotion awareness, engagement, comfort and relationship. For eye gaze and mouse interaction, we make use of non-intrusive devices, i.e. mouse, Tobii eye tracker and off-the-shelf webcam, to collect eye and mouse interaction data. We investigated using mouse features for user intention prediction, or, in other words, predicting the next type of mouse interaction event. Results show that the mouse interaction features are representative of users' behavior. Finally, we group the features into three groups according to the means of data collection: (1) webcam-based features, (2) eye tracker-captured features, and (3) mouse cursor-based features.
The performance of different combinations of modalities was evaluated. We apply machine learning techniques to build user-independent models for Language Learning tasks and Web Searching tasks separately. The findings suggest that the multimodal approach outperforms unimodal approaches in our studies. Evaluation results also demonstrate the versatility of our feature set, as it achieves reasonable engagement detection performance across different tasks.
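One common way to evaluate user-independent models over modality combinations is to enumerate every non-empty subset of the feature groups and hold out one user at a time. The sketch below assumes that protocol for illustration only; the abstract does not state which splitting scheme was used, and the modality labels and user IDs are hypothetical.

```python
from itertools import combinations

# Hypothetical labels for the three feature groups named in the abstract.
MODALITIES = ("webcam", "eye_tracker", "mouse")

def modality_combinations(modalities=MODALITIES):
    """Return all non-empty subsets of the modalities, smallest first."""
    return [combo
            for k in range(1, len(modalities) + 1)
            for combo in combinations(modalities, k)]

def leave_one_user_out(user_ids):
    """Yield (held_out_user, train_indices, test_indices) per distinct user.

    Each sample index belongs to exactly one user, so the held-out user's
    samples never appear in the training split.
    """
    for held_out in sorted(set(user_ids)):
        train = [i for i, u in enumerate(user_ids) if u != held_out]
        test = [i for i, u in enumerate(user_ids) if u == held_out]
        yield held_out, train, test

# Invented example: four samples recorded from three users.
sample_users = ["u1", "u1", "u2", "u3"]
for combo in modality_combinations():
    for held_out, train_idx, test_idx in leave_one_user_out(sample_users):
        # A real pipeline would train on train_idx using only the features
        # in `combo`, then score the model on test_idx.
        pass
```

Splitting by user rather than by sample is what makes the resulting scores reflect performance on people the model has not seen.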
Subjects: Hong Kong Polytechnic University -- Dissertations
Human-computer interaction -- Measurement
Human-computer interaction -- Research
Pages: xiii, 107 pages : color illustrations
Appears in Collections: Thesis
