Note:
Most of the papers below are downloadable as
Technical Reports.
- Population stratification
- Genetic Association Testing
Note: all programs assume that the data contain no missing values;
if your data have missing values, please impute or remove them
first.
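The two options in the note above (remove or impute) can be sketched as follows. This is only an illustration, not part of the distributed programs: it assumes genotypes are stored as a list of rows with `None` marking a missing value, and the function names `drop_incomplete_rows` and `impute_column_means` are hypothetical. Column-mean imputation is just one simple choice; other imputation methods may suit your data better.

```python
from statistics import mean


def drop_incomplete_rows(X, y):
    """Keep only subjects whose trait and genotype row contain no None."""
    kept = [(row, yi) for row, yi in zip(X, y)
            if yi is not None and all(g is not None for g in row)]
    return [row for row, _ in kept], [yi for _, yi in kept]


def impute_column_means(X):
    """Replace each None with the mean of the observed values in its column.

    Assumes every column has at least one observed value.
    """
    n_cols = len(X[0])
    col_means = [mean(row[j] for row in X if row[j] is not None)
                 for j in range(n_cols)]
    return [[col_means[j] if row[j] is None else row[j]
             for j in range(n_cols)]
            for row in X]


# Hypothetical toy data: 3 subjects, 2 SNPs, None = missing genotype.
X = [[0, 1], [None, 2], [2, None]]
y = [1, 0, 1]
X_complete, y_complete = drop_incomplete_rows(X, y)
X_imputed = impute_column_means(X)
```

Either result can then be passed to the test functions, which expect a complete genotype matrix.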
- Basu S, Pan W, Shen X, Oetting WS (2011). Multi-locus Association Testing with Penalized Regression. submitted to Genet Epi.
R functions for score, SSU, SSUw, UminP, Sum tests.
R functions for Lasso-based LRT and Wasserman and Roeder's (Ann Stat 2009) Screen and Clean test,
R functions for Lasso-based averaging/selection tests with the score or SSU statistic.
R code to generate simulated genotypes in Table 10 and
Table 11.
- Pan W, Basu S, Shen X (2011). Adaptive Tests for Detecting Gene-Gene and Gene-Environment Interactions. submitted to Hum Hered.
R functions for (modified) adaptive Neyman's tests: aScore, aSSU, aSSUw, aSum, (aUminP--slow),
R functions for MC simulation-based adaptive UminP test,
R functions for aSum2 (with 2-directional searches),
R functions for score, SSU, SSUw, UminP, Sum tests.
- Han F, Pan W (2011). A Composite Likelihood Approach to
Latent Multivariate Gaussian Modeling of SNP Data
with Application to Genetic Association Testing. Biometrics.
R functions for composite likelihood-based tests,
R functions for maximum likelihood-based tests.
- Pan W, Shen X (2011). Adaptive Tests for Association Analysis of Rare Variants. Genet Epi.
R functions for (modified) adaptive Neyman's tests: aScore, aSSU, aSSUw, aSum, (aUminP--slow),
Simulation programs: the same as those in Basu and Pan (2011), listed below.
An example for Table 2 (case I) in Pan and Shen (2011); it is similar to those in Basu and Pan (2011) except that the causal RVs had different MAFs from those of the non-causal ones.
- Basu S, Pan W (2011). Comparison of Statistical Tests for Disease Association with Rare Variants. Genet Epi.
R functions for Sequential Sum score tests,
score, SSU, SSUw, UminP, Sum and aSum tests,
wSSU-P test,
C-alpha test,
Li and Leal's CMC test and Madsen and Browning's weighted Sum test.
Simulation programs:
- simRareSNP.R:
generate rare SNPs discretized from some latent MVN variates
with correlation structure of CS; allow adding some
non-causal SNPs which will be correlated with causal ones
if rho!=0.
An example for Tables 3-5 in Basu and Pan (2011).
- simAR1Rare2.R:
generate rare SNPs discretized from some latent MVN variates
with correlation structure of AR1; allow adding some
non-causal SNPs which are INDEPENDENT of causal ones
regardless of the value of rho; the non-causal
SNPs are also discretized from some latent MVN variates
with an AR-1 corr structure.
An example for Table 6 in Basu and Pan (2011).
- simRareCommonSNP.R:
add some independent CVs, as in Table 7 of Basu & Pan (2011).
An example for Table 7 in Basu and Pan (2011).
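The idea behind the simulation programs above (discretizing latent multivariate normal variates with a CS or AR1 correlation structure) can be sketched as follows. This is not the code in simRareSNP.R or simAR1Rare2.R. It is a minimal illustration under stated assumptions: each subject carries two independent latent vectors (one per haplotype), an allele is the minor allele when its latent value falls at or below the normal quantile of the MAF, and rho is non-negative for the CS case. All function names are hypothetical.

```python
import random
from statistics import NormalDist


def latent_cs(n_snps, rho, rng):
    """Latent N(0,1) vector with compound-symmetry correlation rho (rho >= 0)."""
    shared = rng.gauss(0.0, 1.0)  # factor shared by all SNPs
    return [rho ** 0.5 * shared + (1.0 - rho) ** 0.5 * rng.gauss(0.0, 1.0)
            for _ in range(n_snps)]


def latent_ar1(n_snps, rho, rng):
    """Latent N(0,1) vector with AR(1) correlation rho**|i-j|."""
    z = [rng.gauss(0.0, 1.0)]
    for _ in range(n_snps - 1):
        z.append(rho * z[-1] + (1.0 - rho * rho) ** 0.5 * rng.gauss(0.0, 1.0))
    return z


def simulate_genotypes(n_subjects, mafs, rho, structure="CS", seed=1):
    """Return an n_subjects x len(mafs) matrix of genotypes coded 0/1/2."""
    rng = random.Random(seed)
    draw = latent_cs if structure == "CS" else latent_ar1
    # A latent value at or below this cutoff yields a minor allele,
    # so each allele is minor with probability equal to the MAF.
    cutoffs = [NormalDist().inv_cdf(maf) for maf in mafs]
    genotypes = []
    for _ in range(n_subjects):
        hap1 = draw(len(mafs), rho, rng)
        hap2 = draw(len(mafs), rho, rng)
        genotypes.append([int(a <= c) + int(b <= c)
                          for a, b, c in zip(hap1, hap2, cutoffs)])
    return genotypes


G_cs = simulate_genotypes(200, [0.01, 0.05, 0.1], rho=0.5, structure="CS")
G_ar1 = simulate_genotypes(200, [0.01, 0.05, 0.1], rho=0.5, structure="AR1")
```

Non-causal SNPs that are independent of the causal ones could be generated with a separate call and appended column-wise; the actual programs may handle this differently.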
- Han F, Pan W (2010).
Powerful Multi-marker Association Tests:
Unifying Genomic Distance-Based Regression
and Logistic Regression
To appear in Genet Epi.
R function. Some instructions are given
at the beginning of the R function.
- Han F, Pan W (2010).
A Data-Adaptive Sum Test for Disease Association with Multiple
Common or Rare Variants.
To appear in Human Heredity.
R function. Some instructions are given
at the beginning of the R function.
- Pan W (2010).
Statistical Tests of Genetic Association in the Presence of Gene-Gene
and Gene-Environment Interactions.
Human Heredity 69, 131-142.
Note: the format of the input data for the R programs below is
somewhat unusual; in fact, there is no need to use the two files below;
you could simply create an appropriate genotype matrix X (e.g. with both
main effects and interactions), then call the function given by Pan (2009).
R function:
SumSqUs, scores, and UminP tests for logistic regression with only
main-effects, or with both main and 2-way interactions.
Note: use the input genotype score matrix X directly (without centering or
other transformation on X).
R function:
Similar to the above except the g-inverse is used for a possibly singular covariance matrix (e.g. for the score vector) when the input genotype matrix X is NOT of full rank (i.e. the SNPs are not linearly independent).
R function: to generate simulated data
as used in the paper.
An example R program to generate simulated
data and then apply the SumSqUs, scores, and UminP tests for a purely epistatic
genetic model.
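The note in this entry suggests building a genotype matrix X with both main effects and interactions yourself. One way to do that is sketched below; it is an illustration only, not the distributed R code. It assumes X is a list of rows of genotype scores and that each 2-way interaction is coded as the product of the two scores, without centering. The function name `add_pairwise_interactions` is hypothetical.

```python
from itertools import combinations


def add_pairwise_interactions(X):
    """Append the product of every pair of columns to each row of X.

    Output columns: the original main effects, then the interactions
    in the order (0,1), (0,2), ..., (1,2), ...
    """
    expanded = []
    for row in X:
        products = [row[i] * row[j]
                    for i, j in combinations(range(len(row)), 2)]
        expanded.append(list(row) + products)
    return expanded


# Hypothetical toy data: 2 subjects, 3 SNPs coded 0/1/2.
X = [[0, 1, 2], [1, 1, 0]]
X_full = add_pairwise_interactions(X)
```

With many SNPs the expanded matrix may not be of full rank, which is the situation the g-inverse version of the R function is meant for.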
- Pan W, Han F, Shen X (2010).
"Test Selection with Application to Detecting Disease Association with
Multiple SNPs".
Human Heredity 69, 120-130.
Note: Some instruction is given at the beginning of the R function.
R function
- Pan W (2010).
A Unified Framework for Detecting Genetic Association with Multiple SNPs
in a Candidate Gene or Region: Contrasting Genotype Scores and LD Patterns
between Cases and Controls.
Human Heredity 69, 1-13.
Note: Some instruction is given at the beginning of each R function.
R function:
SumSqUs, score, UminP tests.
R function:
Similar to the above SumSqUs/score/UminP tests
except that the generalized inverse (g-inv) is used
such that it works even if a covariance matrix (e.g. for the score statistic)
is singular.
R function: LRT/LRT-pc tests.
R function: similar to the above
LRT/LRT-pc tests except that one more pair of outputs (p, k) is added to
deal with a singular input genotype matrix X, where p is the p-value
and k is the number of PCs that explain a default 99% of the
variation in the original X.
R function: LDC/mLDC tests.
R function: use LDC/mLDC terms (and possibly with main effects) in logistic regression, then apply the SSUs, UminP and score tests.
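The quantity k described for the singular-X version of the LRT/LRT-pc function (the number of PCs explaining a default 99% of the variation in X) can be computed from the eigenvalues of the covariance matrix of X. The sketch below shows only that counting step and is not the distributed R code. It assumes the eigenvalues have already been obtained elsewhere, and the function name `n_components_for_variance` is hypothetical.

```python
def n_components_for_variance(eigenvalues, threshold=0.99):
    """Smallest k such that the top k eigenvalues explain >= threshold
    of the total variation."""
    # Tiny negative eigenvalues from round-off are treated as zero.
    values = sorted((max(v, 0.0) for v in eigenvalues), reverse=True)
    total = sum(values)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    running = 0.0
    for k, value in enumerate(values, start=1):
        running += value
        if running / total >= threshold:
            return k
    return len(values)


# Hypothetical eigenvalues of a nearly singular covariance matrix.
k = n_components_for_variance([6.0, 3.0, 0.95, 0.05])
```

Here the first three eigenvalues account for 99.5% of the total, so k is 3.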
- Pan W (2009).
Asymptotic tests of association with multiple SNPs in linkage
disequilibrium.
Genetic Epidemiology 33, 497-507.
Note: Some instruction is given at the beginning of each R function.
R function:
SumSqUs (i.e., SSU, SSUw), (multivariate) score, UminP tests.
R function:
Similar to the above SumSqUs/score/UminP tests
except that the generalized inverse (g-inv) is used
such that it works even if a covariance matrix (e.g. for the score statistic)
is singular.
- Zhou H, Pan W (2009).
Binomial Mixture Model-based Association Tests under Genetic Heterogeneity.
Annals of Human Genetics 73, 614-630.
Manual,
C++ program.
- Penalized Regression
- Luo C, Pan W, Shen X (2012).
A Two-Step Penalized Regression Method with Networked Predictors.
Statistics in Biosciences (a special issue on network data analysis),
4, 27-46.
Zip compressed Matlab code.
- Pan W, Xie B, Shen X. (2010).
"Incorporating Predictor Network in Penalized Regression with
Application to Microarray Data".
Biometrics 26, 501-508.
R program.
- Pan W. (2009).
"Network-Based Multiple Locus Linkage Analysis of Expression Traits".
Bioinformatics 25, 1390-1396.
R program for network-based regression,
Example R code and data for simulation set-up I:
R code for network-based regression,
R code for interpolation used for network-based regression,
R code for Lars,
(imputed) genotype data with the original 196 markers,
GPCR subnetwork,
network data
(after combining the networks for the multiple eQTL regression models
into an "expanded" single regression model).
- Clustering Analysis
- Zhou H, Pan W, Shen X (2009).
Penalized model-based clustering with unconstrained covariance matrices.
Electronic Journal of Statistics 3, 1473-1496.
Manual,
R program.
- Pan, W., Shen, X. (2007).
Penalized Model-Based Clustering with Application to Variable Selection.
Journal of Machine Learning Research 8, 1145-1164.
Manual,
C++ program,
R program,
thanks to Hui Zhou, who wrote the programs; a newer and improved version of the
R program, thanks to
Dr. Jia Li at Penn State University.
- Interval Censoring
- Pan, W. (2000)
"Smooth Estimation of the Survival for Interval Censored Data".
Statistics in Medicine, 19, 2611-2624
README,
SPlus function for NPMLE-based 2-sample tests,
SPlus function for bandwidth selection
in kernel smoothing,
SPlus function for kernel-smoother-based 2-sample tests,
SPlus function for logspline-based 2-sample tests,
C program for calculating NPMLE,
SPlus function for summarizing and drawing
the NPMLE/kernel/logspline estimate of the survival function, and an
example for its use.
- Pan, W. (2000)
"A Two-Sample Test with Interval Censored Data via Multiple Imputation".
Statistics in Medicine, 19, 1-11.
README,
SPlus function for PMDA,
SPlus function for ABB,
C program for calculating NPMLE
and imputing,
sample makefile,
generated object file of the C
program for SunOS (compressed; decompress it with gunzip).
- Pan, W. and Chappell, R. (1998)
"A Nonparametric Estimator of Survival Functions
for Arbitrarily Truncated and Censored Data".
Lifetime Data Analysis, 4, 187-202.
NPMLE (using GP),
NPMLE (using EM) and
INE for left-truncated and interval-censored data.
INE for left-truncated and
right-censored data.
A NEW SPlus program for
INE for left-truncated and
right-censored data; it also contains a function to use the nonparametric
bootstrap to calculate point-wise confidence intervals of the survival
probabilities.
- Pan, W. and Chappell, R. (1998)
"Estimating Survival Curves with Left-truncated and
Interval-censored Data via the EMS Algorithm".
Communications in Statistics -- Theory and Methods,
27, 777-793.
EMS estimator for left-truncated and
interval-censored data.