Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/94267
DC Field | Value | Language
dc.contributor | Department of Chinese and Bilingual Studies | -
dc.creator | Liu, K | -
dc.creator | Ye, R | -
dc.creator | Zhongzhu, L | -
dc.creator | Ye, R | -
dc.date.accessioned | 2022-08-11T02:01:31Z | -
dc.date.available | 2022-08-11T02:01:31Z | -
dc.identifier.uri | http://hdl.handle.net/10397/94267 | -
dc.language.iso | en | en_US
dc.publisher | Public Library of Science | en_US
dc.rights | © 2022 Liu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. | en_US
dc.rights | The following publication Liu, K., Ye, R., Zhongzhu, L., & Ye, R. (2022). Entropy-based discrimination between translated Chinese and original Chinese using data mining techniques. PLoS one, 17(3), e0265633 is available at https://doi.org/10.1371/journal.pone.0265633 | en_US
dc.title | Entropy-based discrimination between translated Chinese and original Chinese using data mining techniques | en_US
dc.type | Journal/Magazine Article | en_US
dc.identifier.volume | 17 | -
dc.identifier.issue | 3 | -
dc.identifier.doi | 10.1371/journal.pone.0265633 | -
dcterms.abstract | The present research reports on the use of data mining techniques for differentiating between translated and non-translated original Chinese based on monolingual comparable corpora. We operationalized seven entropy-based metrics, namely character, wordform unigram, wordform bigram, wordform trigram, POS (part-of-speech) unigram, POS bigram and POS trigram entropy, from two balanced Chinese comparable corpora (translated vs non-translated) for data mining and analysis. We then applied four data mining techniques, namely Support Vector Machines (SVMs), Linear Discriminant Analysis (LDA), Random Forest (RF) and Multilayer Perceptron (MLP), to distinguish translated Chinese from original Chinese based on these seven features. Our results show that the SVM is the most robust and effective classifier, yielding an AUC of 90.5% and an accuracy rate of 84.3%. Our results affirm the hypothesis that translational language is categorically different from original language. Our research demonstrates that combining the information-theoretic indicator of Shannon's entropy with machine learning techniques can provide a novel approach for studying translation as a unique communicative activity. This study yields new insights for corpus-based studies on the translationese phenomenon in the field of translation studies. | -
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | PLoS one, 2022, v. 17, no. 3, e0265633 | -
dcterms.isPartOf | PLoS one | -
dcterms.issued | 2022 | -
dc.identifier.scopus | 2-s2.0-85126996894 | -
dc.identifier.pmid | 35324927 | -
dc.identifier.eissn | 1932-6203 | -
dc.identifier.artn | e0265633 | -
dc.description.validate | 202208 bckw | -
dc.description.oa | Version of Record | en_US
dc.identifier.FolderNumber | a1531 | en_US
dc.identifier.SubFormID | 45351 | en_US
dc.description.fundingSource | Self-funded | en_US
dc.description.pubStatus | Published | en_US
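
The abstract names seven entropy features but this record does not give their formula. As a rough illustration only, the sketch below assumes the standard Shannon entropy H = -Σ p(x) log2 p(x) over the relative frequencies of characters, wordform n-grams and POS n-grams in a text. The function names (`ngrams`, `shannon_entropy`, `entropy_features`), the toy sentence and its POS tags are hypothetical, and word segmentation and POS tagging are assumed to have been done beforehand; this is not the authors' implementation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def shannon_entropy(items):
    """Shannon entropy in bits of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty sequence carries no information
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_features(words, pos_tags):
    """Seven entropy values for one text (hypothetical feature names).

    words:    segmented wordforms, e.g. ["我", "喜欢", "读", "书"]
    pos_tags: one POS tag per wordform (the tag set is an assumption)
    """
    chars = [ch for word in words for ch in word]
    features = {"char": shannon_entropy(chars)}
    for n, label in ((1, "unigram"), (2, "bigram"), (3, "trigram")):
        features[f"word_{label}"] = shannon_entropy(ngrams(words, n))
        features[f"pos_{label}"] = shannon_entropy(ngrams(pos_tags, n))
    return features


if __name__ == "__main__":
    # Toy input; the segmentation and tags are illustrative only.
    toy_words = ["我", "喜欢", "读", "书"]
    toy_tags = ["r", "v", "v", "n"]
    for name, value in entropy_features(toy_words, toy_tags).items():
        print(f"{name}: {value:.3f}")
```

Each text would then be represented by its seven-value feature vector, which is the kind of input a classifier such as an SVM could be trained on.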
Appears in Collections: Journal/Magazine Article
Files in This Item:
File | Description | Size | Format
journal.pone.0265633.pdf | | 1.05 MB | Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record

Page views: 46 (last week: 1), as of May 12, 2024
Downloads: 31, as of May 12, 2024
Scopus citations: 7, as of May 16, 2024
Web of Science citations: 5, as of May 16, 2024
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.