Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/98200
Title: Database of Mandarin neighborhood statistics
Authors: Neergaard, K 
Xu, H 
Huang, CR 
Issue Date: May-2016
Source: In N Calzolari, K Choukri, T Declerck, S Goggi, M Grobelnik, B Maegaard, J Mariani, H Mazo, A Moreno, J Odijk & S Piperidis (Eds.), Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), p. 4032-4036. Portorož, Slovenia : European Language Resources Association (ELRA), 2016
Abstract: In the design of controlled experiments with language stimuli, researchers from psycholinguistic, neurolinguistic, and related fields require language resources that isolate variables known to affect language processing. This article describes a freely available database that provides word-level statistics for words and nonwords of Mandarin Chinese. The featured lexical statistics include subtitle corpus frequency, phonological neighborhood density, neighborhood frequency, and homophone density. The accompanying word descriptors include pinyin, ASCII phonetic transcription (SAMPA), lexical tone, syllable structure, dominant PoS, and syllable, segment, and pinyin lengths for each phonological word. It is designed for researchers particularly concerned with language processing of isolated words and is made to accommodate multiple existing hypotheses concerning the structure of the Mandarin syllable. The database is divided into multiple files according to the desired search criteria: 1) the syllable segmentation schema used to calculate density measures, and 2) whether the search is for words or nonwords. The database is open to the research community at https://github.com/karlneergaard/Mandarin-Neighborhood-Statistics.
Keywords: Lexical statistics
Phonological neighborhood density
Mandarin
Chinese
Publisher: European Language Resources Association (ELRA)
ISBN: 978-2-9517408-9-1
Description: Tenth International Conference on Language Resources and Evaluation (LREC'16), May 23-28, 2016, Portorož, Slovenia
Rights: Copyright by the European Language Resources Association
The LREC 2016 Proceedings are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/)
The following publication is available at https://aclanthology.org/L16-1636: Karl Neergaard, Hongzhi Xu, and Chu-Ren Huang. 2016. Database of Mandarin Neighborhood Statistics. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 4032–4036, Portorož, Slovenia. European Language Resources Association (ELRA).
Appears in Collections: Conference Paper

Files in This Item:
File: Huang_Database_Mandarin_Neighborhood.pdf
Size: 11.68 MB
Format: Adobe PDF
Open Access Information:
Status: open access
File Version: Version of Record

