Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/109409
DC Field | Value | Language
dc.contributor | Department of Applied Mathematics | -
dc.creator | Zheng, Yangzi | -
dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/13188 | -
dc.language.iso | English | -
dc.title | Model for zero-inflated proportion data analysis | -
dc.type | Thesis | -
dcterms.abstract | The analysis and interpretation of datasets containing a substantial number of zeros has become increasingly relevant across various disciplines, including ecology and sociological studies. While zero-inflated count data have been studied extensively, models designed specifically for proportion data with a high occurrence of zeros remain relatively limited. This thesis addresses this gap by focusing on zero-inflated proportion data and proposing a novel modeling approach that distinguishes between two types of zeros present in the dataset. The primary objective is to develop a regression model that can effectively capture and differentiate these two types of zeros. The first type of zero, which corresponds to random absence, is modeled using a binomial sampling approach. This accounts for instances where the proportion is zero due to random factors or chance. The second type of zero, arising from unsuitability, is handled using a general classification indicator. This indicator identifies situations where the proportion is zero because certain conditions or factors are unsuitable. To achieve this objective, we propose both parametric and semi-parametric models, which provide flexibility and robustness in capturing the characteristics of zero-inflated proportion data. With these models, we aim to improve the understanding and analysis of datasets with a high occurrence of zeros, and to contribute methodology tailored specifically to zero-inflated proportion data. | -
dcterms.abstract | In the first part of the study, we investigate a semi-parametric model. This model comprises two components: a regression component that uses weighted least squares to account for heterogeneity, and a classification component that relies on an optimal decision rule derived from our model. To estimate the parameters based on the optimal decision rule, we employ the Nadaraya-Watson estimator, which supports the accuracy of the classification and contributes to the overall robustness of the model. The results of our investigation reveal that environmental features play a crucial role in understanding both types of zeros: those related to perfection and those resulting from random absence. With the proposed modeling approach, researchers can gain deeper insight into the factors that contribute to these different types of zeros and into the underlying processes. Furthermore, our model demonstrates superior performance in both simulated and real-world scenarios when compared with traditional methods such as the Tobit model and the zero-inflated beta regression model. By significantly reducing prediction errors, our model proves to be a valuable tool for estimation and prediction in various applications. These findings highlight the effectiveness and practicality of the semi-parametric model, enabling researchers to make more informed decisions and to understand the factors influencing both types of zeros and the positive percent rate. | -
dcterms.abstract | In the second part, our main objective is to provide a precise interpretation of the factors that influence the defective rate. In particular, we focus on the indicator part, which was left undefined in the first part but deserves more attention because it captures the covariates that distinguish the zero part from the non-zero part. Under the original model assumption, the presence of the indicator part complicates inference on the parameters. Taking inspiration from the smoothed maximum score estimator, we introduce a parametric model by replacing the indicator part with a smoothed kernel estimator. This substitution yields a continuously differentiable loss function, which greatly facilitates further analysis. As in the first part, we take heterogeneity into account and use the weighted least squares method to estimate both sets of parameters. We then establish consistency and asymptotic normality for both the regression and indicator estimators. These properties support the reliability and validity of our estimators in capturing the underlying relationships and in distinguishing between the zero and non-zero parts. | -
dcterms.accessRights | open access | -
dcterms.educationLevel | Ph.D. | -
dcterms.extent | xviii, 78 pages : illustrations | -
dcterms.issued | 2024 | -
dcterms.LCSH | Mathematics -- Data processing | -
dcterms.LCSH | Regression analysis | -
dcterms.LCSH | Hong Kong Polytechnic University -- Dissertations | -
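
The abstract names two kernel-based tools: the Nadaraya-Watson estimator and a smoothed replacement for the zero/non-zero indicator. The sketch below illustrates the general form of each in plain Python. It is not the thesis's implementation: this record does not state the kernel, the bandwidth choice, or the exact loss function, so the Gaussian kernel, the normal-CDF smoother, and all function and parameter names here are assumptions made for illustration.

```python
import math


def gaussian_kernel(u: float) -> float:
    """Standard normal density (an assumed kernel choice)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel-weighted average of ys at the query point x0.

    Observations whose x is close to x0 (relative to the bandwidth)
    receive larger weights.
    """
    weights = [gaussian_kernel((x0 - x) / bandwidth) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total


def smoothed_indicator(z, bandwidth):
    """Normal-CDF surrogate for the step function 1{z > 0}.

    As the bandwidth shrinks, the curve approaches the hard indicator
    while staying continuously differentiable in z.
    """
    return 0.5 * (1.0 + math.erf(z / (bandwidth * math.sqrt(2.0))))
```

For example, `nadaraya_watson(0.5, [0.0, 1.0], [0.0, 1.0], 1.0)` averages the two responses with equal weights because the query point is equidistant from both observations, and `smoothed_indicator(z, h)` moves from near 0 to near 1 as `z` crosses zero, with `h` controlling how sharp the transition is.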
Appears in Collections: Thesis

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.