Association tests with incomplete covariates and high-dimensional auxiliary variables

Feng, Jiahui

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/97158

DC Field	Value	Language
dc.contributor	Department of Applied Mathematics	-
dc.creator	Feng, Jiahui	-
dc.identifier.uri	https://theses.lib.polyu.edu.hk/handle/200/12195	-
dc.language.iso	English	-
dc.title	Association tests with incomplete covariates and high-dimensional auxiliary variables	-
dc.type	Thesis	-
dcterms.abstract	In many clinical and epidemiological studies, investigators are interested in testing for an association between an outcome variable and covariates of interest. Such analyses are often complicated by missing data. When variables of interest are missing for some subjects, it is desirable to use observed auxiliary variables, which are sometimes high-dimensional, to impute or predict the missing values and thereby improve statistical efficiency. Although many methods have been developed for prediction using high-dimensional variables, it is challenging to perform valid inference based on the predicted values. In this dissertation, we propose novel association testing methods involving missing data with the goal of detecting relevant predictors for outcomes of interest.	-
dcterms.abstract	We first focus on parametric models and develop an association test for an outcome variable and a partially missing covariate, where the missing values can be predicted using a set of high-dimensional auxiliary variables. The proposed analysis consists of a model selection step and a testing step. Specifically, in the first step, we select a subset of auxiliary variables and fit a regression model of the covariate of interest against the selected features. In the second step, we perform the score test for the covariate in the outcome model under the full likelihood, which includes both the outcome model and the missing covariate model. We then extend the proposed method to a class of semiparametric transformation models for potentially right-censored survival outcomes. We propose a supremum test, in which we consider multiple choices of transformation functions, perform an individual score test under each outcome model, and take the supremum of the individual test statistics as the proposed test statistic. We show that the proposed testing procedure improves the test performance when the outcome model is unknown.	-
dcterms.abstract	The validity and advantages of the proposed methods are demonstrated both theoretically and numerically. We establish the asymptotic properties of the proposed test statistics under regularity conditions and show the validity of the tests under data-driven model selection procedures. We evaluate the proposed methods through extensive simulation studies and show their superior performance over some existing methods. Real data analyses are carried out on major cancer genomic studies.	-
dcterms.accessRights	open access	-
dcterms.educationLevel	Ph.D.	-
dcterms.extent	ix, 139 pages : color illustrations	-
dcterms.issued	2022	-
dcterms.LCSH	Multivariate analysis	-
dcterms.LCSH	Missing observations (Statistics)	-
dcterms.LCSH	Hong Kong Polytechnic University -- Dissertations	-
Appears in Collections:	Thesis

Access

View full-text via https://theses.lib.polyu.edu.hk/handle/200/12195
