A corpus based computational model of the lexical aspect and viewpoint aspect in Chinese

Liu, Hongchao

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/86405

Title:	A corpus based computational model of the lexical aspect and viewpoint aspect in Chinese
Authors:	Liu, Hongchao
Degree:	Ph.D.
Issue Date:	2018
Abstract:	This thesis investigates lexical aspect and viewpoint aspect in Mandarin Chinese using statistical and computational methods. I first show the necessity of studying the situation types of verbs and of applying statistical methods to linguistic research. Some previous aspectual studies of Mandarin Chinese deny that it is possible or appropriate to classify verbs into different situation types. However, these studies contradict themselves through at least three strategies. The first is to assign the situation type of the whole structure to its constituent verb. The second is to use different terminology, such as "aspectual parameter", for what is in essence situation type at the lexical level. The third is to explicitly deny that verbs can be classified into situation types while implicitly performing that classification at the lexical level. Other studies are on the right track in acknowledging that situation types should be distinguished on at least two levels, the lexical and the sentential. However, none of them statistically validates the interaction between situation types and viewpoint aspects. Because of these problems in the treatment of lexical aspect and in methodology, this thesis focuses on verb situation types and their statistical validation. Based on our own intuition and previous studies, I hypothesize that aspectual markers, including ZHE, LE1, LE2, GUO, ZAI and ZHENGZAI, can distinguish different situation types. I also argue that lexical-level situation type is attached to the individual senses of a verb rather than to the verb per se, and that the situation type system is a prototype category. If the system is a prototype category, its members should cluster according to family resemblance, represented here by their ability to co-occur with different aspectual markers.
Whether a verb or verb sense can co-occur with an aspectual marker is first judged by our own intuition and then cross-validated against other annotated resources. A co-occurrence matrix is constructed with verb senses as rows and aspectual markers as columns. Family resemblance is modeled as the distance between row vectors in the vector space defined by this matrix. Hierarchical clustering then automatically generates the situation type system from the distances between members. In this way, three situation types are derived and assigned to all senses of the selected verbs. Since this situation type system is ultimately based on human intuition, corpus-based validation is necessary. All verb senses are manually linked to verbs in the Sinica Corpus, and a co-occurrence frequency matrix is built from the corpus data. Statistical methods such as multinomial logistic regression are used to validate the situation type system. The relationships between aspectual markers and the cognitive conceptual features of situation types, such as [Telic], [Durative] and [Dynamic], are also established in this way. Finally, we construct a dataset of verb senses and their situation types and evaluate classifiers on it. Using word embedding vectors and a support vector machine (SVM) classifier, a best accuracy of 72.05% is achieved.
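The clustering step summarized in the abstract (verb senses as rows, aspectual markers as columns, hierarchical clustering over row distances) can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the abstract does not state the distance metric or linkage criterion, so Euclidean distance and average linkage are assumptions here, and the sense names and 0/1 judgments in `MATRIX` are hypothetical toy values rather than the thesis's annotations.

```python
from itertools import combinations
from math import dist

# Column order of the co-occurrence matrix (the markers named in the abstract).
MARKERS = ["ZHE", "LE1", "LE2", "GUO", "ZAI", "ZHENGZAI"]

# HYPOTHETICAL toy rows: 1 = the sense can co-occur with the marker, 0 = it cannot.
# Invented for illustration only; these are not the thesis's annotations.
MATRIX = {
    "sense_a": [0, 0, 1, 0, 0, 0],
    "sense_b": [0, 0, 1, 1, 0, 0],
    "sense_c": [1, 1, 1, 1, 1, 1],
    "sense_d": [1, 1, 1, 1, 1, 0],
    "sense_e": [0, 1, 1, 1, 0, 0],
    "sense_f": [0, 1, 1, 0, 0, 0],
}


def average_linkage(a, b, rows):
    """Mean Euclidean distance over all member pairs of clusters a and b (assumed metric)."""
    pairs = [(x, y) for x in a for y in b]
    return sum(dist(rows[x], rows[y]) for x, y in pairs) / len(pairs)


def agglomerate(rows, k):
    """Bottom-up clustering: merge the closest pair of clusters until k remain.

    Ties between equally close pairs are broken by input order (min keeps the first).
    """
    clusters = [frozenset([name]) for name in rows]
    while len(clusters) > k:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], rows),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


if __name__ == "__main__":
    # k=3 mirrors the three situation types reported in the abstract.
    for cluster in agglomerate(MATRIX, k=3):
        print(sorted(cluster))
```

With these toy rows, cutting the hierarchy at three clusters groups the six senses into three pairs. A full dendrogram (for example via a library routine) would let one inspect where to cut instead of fixing `k` in advance.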
Subjects:	Hong Kong Polytechnic University -- Dissertations; Chinese language -- Dialects -- Mandarin; Chinese language -- Grammar; Chinese language -- Verb
Pages:	xii, 199 pages : illustrations
Appears in Collections:	Thesis

Access

View full-text via https://theses.lib.polyu.edu.hk/handle/200/9604
