Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/97949
Title: Linking basic lexicon to shared ontology for endangered languages: a linked data approach toward Formosan languages
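Other Title: Linking the basic lexicon of endangered languages to upper ontology: applying linked data to the study of Taiwan's Austronesian languages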
Authors: Huang, CR 
Hsieh, SK
Prévot, L
Hsiao, PY
Chang, HY
Issue Date: Jun-2018
Source: Journal of Chinese Linguistics, June 2018, v. 46, no. 2, pp. 227-268
Abstract: This paper proposes an innovative approach that links a basic lexicon (e.g., the Swadesh list) to an upper ontology as the foundation of an OntoLex interface, in order to address the challenge of building language resources for endangered languages in the linked data paradigm. A linked data approach to language resources requires existing, and preferably sizable, language resources. For endangered and other less-resourced languages, however, the scarcity of existing resources limits the possibilities and potential benefits of linking. The challenges, then, are how the construction of language resources for endangered languages can continue to thrive in the linked data paradigm, and how the linked data approach can benefit language resources for endangered languages. Our proposal requires only a bare minimum of available data, and we show with examples from Formosan languages (the Austronesian or aboriginal languages of Taiwan; Blust 2013, 20) that 1) this approach is applicable to endangered languages, and 2) in spite of the restrictions imposed by the scarcity of resources, linked linguistic data consisting of basic lexicon + upper ontology generate important new information. Comparing Swadesh lists from different languages allowed us to build a small shared ontology that reflects direct human experience and can serve as a cross-lingual conceptual core. In addition, these micro-ontologized lexicons can be used as seeds for developing a fully grown, more comprehensive, linguistically motivated ontology for each language.
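The rise of the linked data approach has posed a major challenge to the documentation of endangered languages. Building on the ontology-lexicon interface (OntoLex), this paper proposes a new approach that links a basic lexicon (such as the Swadesh list) to an upper ontology, in order to test the feasibility of the linked data method for endangered language documentation. Since the semantic turn of the Web, linked data has become the most important means of building language resources. The success of the linked data approach, however, presupposes large existing corpora or language resources to link to. When the approach is applied to endangered or other under-resourced languages, the advantages of linked data vanish for lack of existing resources to link. With the linked data paradigm dominating web research and resource construction, endangered language documentation faces the serious challenge of linking and generating new resources and new knowledge despite the disadvantage of resource scarcity. Taking the Austronesian languages of Taiwan as its object, this paper proposes a resource-linking method that requires only minimal resources, and shows that 1) the linked data approach can be applied to endangered languages, and 2) even when resources are scarce, linking a basic lexicon to an upper ontology can produce new cultural knowledge. Comparing how the Swadesh list is realized in different languages enables researchers to go further and compare, within the framework of a shared upper ontology, differences in basic conceptual systems and lived experience across languages and cultures. These core ontologies can also serve as the basis for building complete ontologies for these languages in the future.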
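The abstracts describe linking a basic lexicon to an upper ontology and comparing Swadesh lists across languages to obtain a small shared conceptual core. The Python sketch below illustrates one way such linking could be represented; it is not the authors' implementation. It uses terms from the W3C OntoLex-Lemon vocabulary (`ontolex:LexicalEntry`, `ontolex:canonicalForm`, `ontolex:writtenRep`, `ontolex:denotes`). The base IRIs, the language names `langA`/`langB`, the placeholder forms, and the concept IDs are all hypothetical, and treating the "shared core" as the intersection of linked concepts is an assumption made for illustration.

```python
"""Minimal sketch: link toy basic-lexicon entries to upper-ontology concepts
as OntoLex-style triples, then find the concepts every language lexicalizes."""

ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical base IRIs, for illustration only (not real SUMO or project IRIs).
LEX = "http://example.org/lexicon/"
UPPER = "http://example.org/upper#"

# Hypothetical toy data: (gloss, placeholder form, concept ID).
# Forms are placeholders, NOT real Formosan words; concept IDs are
# illustrative labels, not verified SUMO class names.
LEXICONS = {
    "langA": [("water", "formA1", "Water"), ("fire", "formA2", "Fire"),
              ("eye", "formA3", "Eye")],
    "langB": [("water", "formB1", "Water"), ("eye", "formB2", "Eye"),
              ("louse", "formB3", "Louse")],
}


def to_triples(lang, entries):
    """Return (subject, predicate, object) triples for one language's list."""
    triples = []
    for gloss, form, concept in entries:
        entry = f"{LEX}{lang}/{gloss}"
        form_node = f"{entry}#form"
        triples.append((entry, RDF_TYPE, ONTOLEX + "LexicalEntry"))
        triples.append((entry, ONTOLEX + "canonicalForm", form_node))
        # In real RDF the written form would be a language-tagged literal.
        triples.append((form_node, ONTOLEX + "writtenRep", form))
        # ontolex:denotes links an entry directly to an ontology entity.
        triples.append((entry, ONTOLEX + "denotes", UPPER + concept))
    return triples


def shared_core(lexicons):
    """Concepts linked from every language's list (one possible 'shared core')."""
    concept_sets = [{concept for _, _, concept in entries}
                    for entries in lexicons.values()]
    return set.intersection(*concept_sets)


if __name__ == "__main__":
    for lang, entries in LEXICONS.items():
        for s, p, o in to_triples(lang, entries):
            print(s, p, o)
    print(sorted(shared_core(LEXICONS)))  # ['Eye', 'Water']
```

In this toy data only `water` and `eye` are linked from both lists, so the intersection is `['Eye', 'Water']`; concepts linked from a single list (`Fire`, `Louse`) fall outside the core and would remain part of that language's own, larger ontology.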
Keywords: Endangered languages
Linked Data
Swadesh list
Ontology
SUMO
Formosan languages (Austronesian languages in Taiwan)
Publisher: Journal of Chinese Linguistics (JCL), The Chinese University of Hong Kong / Professor William S.-Y. Wang
Journal: Journal of Chinese Linguistics
ISSN: 0091-3723
DOI: 10.1353/jcl.2018.0009
Rights: Posted with permission of the Journal of Chinese Linguistics, the Chinese University of Hong Kong.
The Journal of Chinese Linguistics Vol.46, No.2 (June 2018): 227-268 © 2018 by The Journal of Chinese Linguistics. 0091-3723/2018/4602-0002$10: Linking basic lexicon to shared ontology for endangered languages: A linked data approach toward Formosan languages. By Chu-Ren Huang et al. All rights reserved.
The following publication Huang, C. R., Hsieh, S. K., Prévot, L., Hsiao, P. Y., & Chang, H. Y. (2018). Linking basic lexicon to shared ontology for endangered languages: a linked data approach toward Formosan languages. Journal of Chinese Linguistics, 46(2), 227-268 is available at https://www.jclhk.com.hk/jcl-2011-2020/jcl2018/.
Appears in Collections: Journal/Magazine Article

Files in This Item:
File: Huang_Linking_Basic_Lexicon.pdf
Size: 472.27 kB
Format: Adobe PDF
Open Access Information
Status: Open access
File Version: Version of Record

Page views: 42 (as of Apr 21, 2024)
Downloads: 22 (as of Apr 21, 2024)
Scopus citations: 4 (as of Apr 19, 2024)
Web of Science citations: 3 (as of Apr 25, 2024)
