Title: Mining human interaction signals for human affective and cognitive state detection
Authors: Wang, Jun
Degree: Ph.D.
Issue Date: 2021
Abstract: Human-Aware AI Systems can provide timely support to humans in different situations, based on an understanding of their mental state and intentions. As a step toward developing such systems, this thesis focuses on understanding humans' affective state and cognitive processes when interacting with computers. For affective state understanding, the thesis studies mental stress, one of the most prevalent negative affective states users encounter when interacting with computers. Mental stress can harm both users' mental health and the quality of the user experience. Previous work often detects mental stress from bio-signals and physical information collected via intrusive devices, which is impractical in daily life. More recent studies have turned to non-intrusive stress detection approaches that rely on behavioral signals, especially gaze and mouse behaviors; however, the consistency of users' behavioral patterns has seldom been investigated. We propose a stress detection method that takes the consistency of gaze and mouse behaviors into account. Analysis of subjects' behaviors during our experiment reveals that when users are stressed, their eye gaze patterns become more consistent, and the proposed method can detect stress effectively in a common e-Learning evaluation task. Going one step further, we observe that most previous stress detection methods rely on knowledge of the user interface (UI) layout, which limits their generalizability, especially for tasks with dynamic UIs. We therefore propose MGAttraction, a rotation- and translation-invariant coordinate system that models the relative movement between gaze and mouse. Building on it, we develop a UI-agnostic stress detection method that works in dynamic UI environments.
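The abstract does not spell out how MGAttraction is constructed, but one way to obtain a rotation- and translation-invariant gaze-mouse coordinate is to anchor the frame at the current mouse position and align its x-axis with the mouse's movement direction, so that translating or rotating the entire UI leaves the gaze coordinates unchanged. The sketch below is an illustrative assumption along those lines, not the thesis's actual definition; the function name and conventions are hypothetical.

```python
import math

def relative_gaze_frame(mouse_prev, mouse_cur, gaze):
    """Express the gaze point in a frame anchored at the current mouse
    position, with the x-axis aligned to the mouse's movement direction.
    Translating or rotating the whole screen layout leaves the result
    unchanged, so features built on it do not depend on the UI layout.
    (Illustrative sketch; not the thesis's definition of MGAttraction.)"""
    dx = mouse_cur[0] - mouse_prev[0]
    dy = mouse_cur[1] - mouse_prev[1]
    heading = math.atan2(dy, dx)                  # mouse movement direction
    gx = gaze[0] - mouse_cur[0]                   # translate: mouse at origin
    gy = gaze[1] - mouse_cur[1]
    cos_h, sin_h = math.cos(heading), math.sin(heading)
    # rotate into the mouse-heading frame
    return (gx * cos_h + gy * sin_h, -gx * sin_h + gy * cos_h)
```

Because both the translation (subtracting the mouse position) and the rotation (undoing the mouse heading) are derived from the interaction itself, any rigid transform applied to the whole scene cancels out.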
We evaluate our method on a web search task with a dynamic UI. Using gaze locations tracked by a commercial eye-tracker, the proposed UI-agnostic method successfully detects stress and outperforms state-of-the-art methods. To improve generalizability further, we explore the feasibility of substituting webcam-based gaze estimation for the eye-tracker. The resulting system, which estimates gaze locations from webcam video, can detect mental stress with only a small loss in accuracy.
For cognitive process understanding, this thesis studies writing, one of the most common activities undertaken on a computer. Given that writing is a cognitively intensive process, it is reasonable to expect that users' age and the genre being produced affect their behavior; however, only a few studies have explored this relationship. In this thesis, eye gaze behaviors and typing dynamics in different writing stages are investigated for subjects in three age groups (children, college students, and the elderly) producing original articles in three genres (reminiscent, logical, and creative). We design both statistics-based and sequence-based features to infer the cognitive process of writing: statistics-based features model the overall gaze-typing behavior over the entire writing period, while sequence-based features model transitions in gaze-typing behavior as the writing develops. Evaluation results show that both age and article genre affect writing behavior, and that our statistics-based and sequence-based features successfully capture these differences. Beyond the writing process, this thesis also investigates summarizing, a multitasking process that requires subjects to iterate between reading/understanding and writing. We analyze users' cognitive processes during summarizing tasks, as evidenced by their eye gaze and typing features, to gain insight into different difficulty levels. Multimodal features are extracted from the different phases of summary writing: reading and understanding the source, referring back to source content, rereading the already-generated text, typing new text into the computer, and reviewing the generated text.
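To make the statistics-based vs. sequence-based distinction concrete, here is a minimal sketch. The feature names are hypothetical; the thesis's actual feature set is not listed in this abstract. Statistics-based features summarize gaze and typing behavior over the whole session, while sequence-based features count transitions between gaze-typing states as the writing develops.

```python
from collections import Counter
from statistics import mean, stdev

def statistics_features(fixation_durations, key_intervals):
    # Session-level summaries of gaze and typing behavior
    # (hypothetical feature names, for illustration only).
    return {
        "fix_dur_mean": mean(fixation_durations),
        "fix_dur_std": stdev(fixation_durations),
        "key_int_mean": mean(key_intervals),
        "key_int_std": stdev(key_intervals),
    }

def sequence_features(states):
    # Counts of transitions between gaze-typing states over time,
    # e.g. "read" -> "type" -> "review" (illustrative labels).
    return Counter(zip(states, states[1:]))
```

The two views are complementary: the first collapses time away, while the second preserves the order in which behaviors occur.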
Each phase is identified from the characteristics of gaze behaviors and typing dynamics. A classifier built on the multimodal features discriminates the difficulty level of each summary with decent performance and outperforms models built on a subset of modalities or on a single modality. The potential reasons for the strong performance of the multimodal features are also investigated. Overall, the experimental results show success in detecting mental stress and in characterizing the cognitive process of writing from gaze and hand behaviors, which suggests that behavioral signals can effectively serve human-aware AI systems in understanding users' affective states and cognitive processes.
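The abstract says phases are identified from gaze behaviors and typing dynamics but does not give the rules. A simple rule-based sketch of that idea follows; the phase labels, gaze regions, and rules are illustrative assumptions, not the thesis's actual segmentation procedure.

```python
def label_phase(gaze_region, is_typing):
    # Heuristic mapping of instantaneous behavior to a summary-writing
    # phase. 'source' = gaze on the source text, 'draft' = gaze on the
    # text generated so far. Illustrative rules only.
    if is_typing:
        return "typing"
    if gaze_region == "source":
        return "reading_source"
    if gaze_region == "draft":
        return "reviewing_draft"
    return "other"
```

A real segmenter would also smooth over time (e.g. merge brief glances) rather than label each instant independently.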
Subjects: Human-computer interaction -- Psychological aspects
User interfaces (Computer systems) -- Psychological aspects
Hong Kong Polytechnic University -- Dissertations
Pages: xxii, 201 pages: color illustrations
Appears in Collections: Thesis
