Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/97158
| DC Field | Value | Language |
|---|---|---|
| dc.contributor | Department of Applied Mathematics | - |
| dc.creator | Feng, Jiahui | - |
| dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/12195 | - |
| dc.language.iso | English | - |
| dc.title | Association tests with incomplete covariates and high-dimensional auxiliary variables | - |
| dc.type | Thesis | - |
| dcterms.abstract | In many clinical and epidemiological studies, investigators are interested in testing for the presence of an association between an outcome variable and covariates of interest. Such analyses are often complicated by missing data. When variables of interest are missing for some subjects, it is desirable to use observed auxiliary variables, which are sometimes high-dimensional, to impute or predict the missing values to improve statistical efficiency. Although many methods have been developed for prediction using high-dimensional variables, it is challenging to perform valid inference based on the predicted values. In this dissertation, we propose novel association testing methods involving missing data with the goal of detecting relevant predictors for outcomes of interest. | - |
| dcterms.abstract | We first focus on parametric models and develop an association test for an outcome variable and a partially missing covariate, where the missing values can be predicted using a set of high-dimensional auxiliary variables. The proposed analysis consists of a model selection step and a testing step. Specifically, in the first step, we select a subset of auxiliary variables and fit a regression model of the covariate of interest against the selected features. In the second step, we perform the score test for the covariate in the outcome model under the full likelihood, which includes both the outcome model and the missing covariate model. We then extend the proposed method to a class of semiparametric transformation models for potentially right-censored survival outcomes. We propose a supremum test, where we consider multiple choices of transformation functions, perform an individual score test under each outcome model, and take the supremum of the individual test statistics as the proposed test statistic. We show that the proposed testing procedure improves the test performance when the outcome model is unknown. | - |
| dcterms.abstract | The validity and advantages of the proposed methods are demonstrated both theoretically and numerically. We establish the asymptotic properties of the proposed test statistics under regularity conditions and show the validity of the tests under data-driven model selection procedures. We evaluate the proposed methods through extensive simulation studies and show their superior performance over some existing methods. Real data analyses are carried out on major cancer genomic studies. | - |
| dcterms.accessRights | open access | - |
| dcterms.educationLevel | Ph.D. | - |
| dcterms.extent | ix, 139 pages : color illustrations | - |
| dcterms.issued | 2022 | - |
| dcterms.LCSH | Multivariate analysis | - |
| dcterms.LCSH | Missing observations (Statistics) | - |
| dcterms.LCSH | Hong Kong Polytechnic University -- Dissertations | - |

Appears in Collections: Thesis
Access
View full-text via https://theses.lib.polyu.edu.hk/handle/200/12195