Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/108910
| Title: | Prediction system in big data analytics |
| Authors: | Tang, Wai Man |
| Degree: | Ph.D. |
| Issue Date: | 2024 |
| Abstract: | Forecasting and causality are essential to decision making and resource management, as they relate outcomes to exogenous factors or events. In addition, investment return prediction is crucial for proper risk control and management. Nowadays, applications using advanced technologies are part of our daily life, and big data can be collected more easily and at lower cost. Knowledge can be extracted to indicate important changes in time series data, where exogenous factors or events should be fit for the purpose, as they can be instantaneous or aggregated over a certain duration. Prediction and causality are key functions in data analysis, where models can be used to extract useful features and predict data trends. Feature selection and extraction are crucial methodologies in data analysis, where sequential data is transformed into suitable features for further analysis. Relevant factors or features, which embed essential information to explain the dependent variable, should be selected. This is critical to ensure useful models and accurate results. In this thesis, our work focuses on two key types of methods: conjoining spatio-temporal data for analysis by neural networks with deep learning, and novel factor subset selection in time-frequency representation. Applications in various areas are studied. Chapter 2 investigates traffic speed data for multi-timestep forecasting. Congestion speed-cycle patterns of the target road segment are correlated with those of the nearby road segments. An appropriate input subset can be selected for neural network training with deep learning when input data dimensions are minimal. Chapter 3 investigates the short-time Fourier transform (STFT), where consistent patterns are used to identify factor subsets. A multi-factor model with factors in different timeframes should be more useful and practical for forecasting future movements in a dynamic environment. Finally, Chapter 4 investigates wavelet transforms, where significant wavelet coefficients can be chosen as peaks by using the continuous wavelet transform (CWT). Causality can be established by multiple factor models. Factor subsets are selected using factors with sample lags, which are represented by selecting appropriate wavelet coefficients in terms of both time and frequency. |
| Subjects: | Big data; Data mining; Machine learning; Hong Kong Polytechnic University -- Dissertations |
| Pages: | xiii, 152 pages : color illustrations |
| Appears in Collections: | Thesis |
Access
View full-text via https://theses.lib.polyu.edu.hk/handle/200/13123