Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/94416
Title: Analyzing firm reports for volatility prediction : a knowledge-driven text-embedding approach
Authors: Yang, Y
Zhang, K
Fan, Y 
Issue Date: Jan-2022
Source: INFORMS journal on computing, Jan. - Feb. 2022, v. 34, no. 1, p. 522-540
Abstract: Predicting stock return volatility is the key to investment and risk management. Traditional volatility-forecasting methods primarily rely on stochastic models. More recently, many machine-learning approaches, particularly text-mining techniques, have been implemented to predict stock return volatility, thus taking advantage of the availability of large amounts of unstructured data such as firm financial reports. Most existing studies develop simple but effective models to analyze text, such as dictionary-based matching algorithms that use a set of manually constructed keywords. However, the latent and deep semantics encoded in text are usually neglected. In this study, we build on recent progress in representation learning and propose a novel word-embedding method that incorporates external knowledge from a well-known finance-domain lexicon (the Loughran and McDonald (2011) word list), which helps us learn semantic relationships among words in firm reports for better volatility prediction. Using over 10 years of annual reports from Russell 3000 firms, we empirically show that, compared with cutting-edge benchmarks, our proposed method achieves significant improvement in terms of prediction error, for example, a 28.4% reduction on average. We also discuss the practical and methodological implications of our findings. Our financial-specific word-embedding program is available as open-source information so that researchers can use it to analyze financial reports and assess financial risks.
Summary of Contribution: Predicting stock return volatility is the key to investment and risk management. Traditional volatility-forecasting methods primarily rely on stochastic models. More recently, many machine-learning techniques, especially text-mining techniques, have been developed to predict stock return volatility, given the availability of large amounts of unstructured data such as firm annual reports. Most existing research develops simple but effective approaches, for example, manually constructing a set of keywords to analyze texts. However, the latent and deep semantics encoded in texts are usually ignored. In this research, we build on recent progress in representation learning and propose a novel word-embedding method that incorporates external knowledge from the finance-domain lexicon of Loughran and McDonald (2011), which helps us learn the semantic relationships among words in firm annual reports for better volatility prediction. We make the following contributions. First, methodologically, we are among the first to incorporate a finance-specific lexicon into representation learning for stock volatility prediction. We propose a novel knowledge-driven text-embedding model that is trained on a large amount of unstructured textual data to learn high-quality word embeddings. Our proposed approach is effective in predicting stock return volatility and can potentially have broader applications. Second, substantively, we empirically show that domain-lexicon-enhanced text representation learning significantly improves volatility-prediction performance compared with bag-of-words models and generic word embeddings. Domain knowledge combined with text learning plays a critical enabling role in understanding financial reports. Third, our method adds to the existing literature on designing financial information systems by incorporating ontology knowledge, common-sense knowledge, and general prior knowledge.
Keywords: Prediction
Machine learning
Word embedding
Knowledge
L&M dictionary
Publisher: INFORMS
Journal: INFORMS journal on computing 
ISSN: 1091-9856
EISSN: 1526-5528
DOI: 10.1287/ijoc.2020.1046
Rights: Copyright © 2021 INFORMS
This is the accepted manuscript of the following article: Yang, Y., Zhang, K., & Fan, Y. (2022). Analyzing Firm Reports for Volatility Prediction: A Knowledge-Driven Text-Embedding Approach. INFORMS Journal on Computing, 34(1), 522-540, which has been published in final form at https://doi.org/10.1287/ijoc.2020.1046
Appears in Collections: Journal/Magazine Article

Files in This Item:
File: Fan_Analyzing_Firm_Reports.pdf
Description: Pre-Published version
Size: 805.11 kB
Format: Adobe PDF
Open Access Information
Status: open access
File Version: Final Accepted Manuscript

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.