Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/90390
DC Field | Value | Language
dc.contributor | Department of Chinese and Bilingual Studies | en_US
dc.contributor | Department of Computing | en_US
dc.contributor | Department of English | en_US
dc.contributor | School of Accounting and Finance | en_US
dc.creator | Wan, M | en_US
dc.creator | Xiang, R | en_US
dc.creator | Chersoni, E | en_US
dc.creator | Klyueva, N | en_US
dc.creator | Ahrens, K | en_US
dc.creator | Miao, B | en_US
dc.creator | Broadstock, D | en_US
dc.creator | Kang, J | en_US
dc.creator | Yung, A | en_US
dc.creator | Huang, CR | en_US
dc.date.accessioned | 2021-06-28T07:25:46Z | -
dc.date.available | 2021-06-28T07:25:46Z | -
dc.identifier.uri | http://hdl.handle.net/10397/90390 | -
dc.language.iso | en | en_US
dc.rights | ACL materials are Copyright © 1963–2021 ACL; other materials are copyrighted by their respective copyright holders. Materials prior to 2016 here are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 International License. Permission is granted to make copies for the purposes of teaching and research. Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). | en_US
dc.rights | The following publication Wan, M., Xiang, R., Chersoni, E., Klyueva, N., Ahrens, K., Miao, B., ... & Huang, C. R. (2019). PolyU_CBS-CFA at the FinSBD Task: Sentence Boundary Detection of Financial Data with Domain Knowledge Enhancement and Bilingual Training. In Proceedings of the First Workshop on Financial Technology and Natural Language Processing (pp. 122-129) is available at https://www.aclweb.org/anthology/W19-5521/ | en_US
dc.title | PolyU_CBS-CFA at the FinSBD task : sentence boundary detection of financial data with domain knowledge enhancement and bilingual training | en_US
dc.type | Conference Paper | en_US
dc.identifier.spage | 122 | en_US
dc.identifier.epage | 129 | en_US
dcterms.abstract | Sentence Boundary Detection is a basic requirement in Natural Language Processing and remains a challenge to language processing for specific purposes, especially with noisy source documents. In this paper, we deal with the processing of scanned financial prospectuses with a feature-oriented and knowledge-enriched approach. Feature engineering and knowledge enrichment are conducted with the participation of domain experts and for the detection of sentence boundaries in both English and French. Two versions of the detection system are implemented with a Random Forest Classifier and a Neural Network. We engineer a fused feature set of punctuation, digital number, capitalization, acronym, letter and POS tag for model fitting. For knowledge enhancement, we implement a rule-based validation by extracting a keyword dictionary from the out-of-vocabulary sequences in FinSBD’s datasets. Bilingual training on both English and French training sets is conducted to ensure the multilingual robustness of the system and to extend the relatively small training data. Without using any extra data, our system achieves fair results on both tracks in the shared task. Our results (English: F1-Mean = 0.835; French: F1-Mean = 0.86) as well as a post-task quick improvement with self-adaptive knowledge enhancement based on testing data demonstrate the effectiveness and robustness of bilingual training with multi-feature mining and knowledge enhancement for the domain-specific SBD task. | en_US
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | Proceedings of the First Workshop on Financial Technology and Natural Language Processing, Macao, China, August 2019, p. 122-129 | en_US
dcterms.issued | 2019-08 | -
dc.relation.ispartofbook | Proceedings of the First Workshop on Financial Technology and Natural Language Processing | en_US
dc.description.validate | 202106 bcvc | en_US
dc.description.oa | Version of Record | en_US
dc.identifier.FolderNumber | a0670-n20 | -
dc.description.pubStatus | Published | en_US
Appears in Collections: Conference Paper
Files in This Item:
File | Description | Size | Format
W19-5521.pdf | | 721.4 kB | Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record