 NEW! 5/18/2015: pathway-based aSPUpath test,
available in the R package
aSPU, to be put on CRAN soon.
 Pan W, Kwak IY, Wei P (2015).
A Powerful Pathway-Based Adaptive Test for Genetic Association
With Common or Rare Variants.
Submitted to Am J Hum Genet.
An R package for the SPU
and aSPU tests.
 NEW! 10/20/2014: the SPU and aSPU tests
 Pan W, Kim J, Zhang Y, Shen X, Wei P (2014).
A Powerful and Adaptive Association Test for Rare Variants.
Genetics, 197:1081-1095.
An R package aSPU for the SPU
and aSPU tests.
 Note: the SPU and aSPU tests can be used for other high-dimensional
(or low-dimensional) non-genetic (or genetic) data, as shown in:
Kim J, Wozniak JR, Mueller BA, Shen X, Pan W (2014).
Comparison of statistical tests for group differences in brain functional
networks. NeuroImage, 101:681-694.
 Alternatively, you may want to use the R functions below directly:
R functions for permutation-based SPU/aSPU,
bootstrap-based SPU/aSPU,
and aSPU-based RV or predictor ranking.
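For orientation, the permutation-based procedure described in Pan et al. (2014) can be sketched as follows: SPU(gamma) sums the gamma-th powers of the score components, and aSPU takes the minimum p-value over a set of gamma values, calibrated with the same permutations. This is a minimal sketch in Python (the linked functions are in R), assuming no covariates and no missing values; the function names are hypothetical and are not the aSPU package's API.

```python
import random

INF = float("inf")

def score_vector(y, X):
    """U_j = sum_i (y_i - mean(y)) * X_ij (score vector under the null, no covariates)."""
    n, p = len(y), len(X[0])
    ybar = sum(y) / n
    return [sum((y[i] - ybar) * X[i][j] for i in range(n)) for j in range(p)]

def spu_stats(U, gammas):
    """SPU(gamma) = sum_j U_j^gamma; gamma = inf is taken as max_j |U_j|."""
    stats = []
    for g in gammas:
        if g == INF:
            stats.append(max(abs(u) for u in U))
        else:
            stats.append(sum(u ** g for u in U))
    return stats

def aspu_perm(y, X, gammas=(1, 2, 3, 4, INF), n_perm=200, seed=1):
    """Return ({gamma: SPU p-value}, aSPU p-value) from one set of permutations."""
    rng = random.Random(seed)
    K, B = len(gammas), n_perm
    obs = [abs(t) for t in spu_stats(score_vector(y, X), gammas)]

    # Null statistics: permute the trait, keep the genotype matrix fixed.
    y_perm = list(y)
    null = []
    for _ in range(B):
        rng.shuffle(y_perm)
        null.append([abs(t) for t in spu_stats(score_vector(y_perm, X), gammas)])

    # Permutation p-value of each SPU(gamma).
    p_spu = [(1 + sum(null[b][k] >= obs[k] for b in range(B))) / (B + 1)
             for k in range(K)]
    t_obs = min(p_spu)  # observed aSPU statistic

    # p-value of each permuted statistic relative to the other permutations,
    # then the minimum over gamma gives the null draws of the aSPU statistic.
    count = 0
    for b in range(B):
        t_b = min(
            (1 + sum(null[c][k] >= null[b][k] for c in range(B) if c != b)) / B
            for k in range(K)
        )
        if t_b <= t_obs:
            count += 1
    p_aspu = (1 + count) / (B + 1)
    return dict(zip(gammas, p_spu)), p_aspu
```

Reusing one set of permutations for both the SPU p-values and the aSPU p-value avoids a nested permutation loop; the double loop above costs on the order of n_perm squared comparisons, so n_perm is kept small in this sketch.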
Population stratification
 Liu B, Shen X, Pan W (2013).
Semi-supervised spectral clustering with application to detect population
stratification. Frontiers in Genetics, 4:215.
R functions for SSSC.
Genetic Association Testing
Note: all programs assume that there are no missing values;
if your data contain missing values, please impute or remove them
first.
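Two simple ways to meet this requirement are sketched below in Python: dropping subjects with any missing genotype, or filling each missing entry with its SNP's observed mean. This is only an illustration under stated assumptions: the genotype matrix is a list of rows, missing entries are marked with None, and the function names are hypothetical. Mean imputation is one of many possible choices and is not prescribed by the programs listed here.

```python
def drop_incomplete_rows(X, y):
    """Keep only subjects (rows) with no missing genotype; y is subset to match."""
    keep = [i for i, row in enumerate(X) if all(v is not None for v in row)]
    return [X[i] for i in keep], [y[i] for i in keep]

def impute_column_means(X):
    """Replace each missing entry (None) with the mean of the observed values in its column."""
    p = len(X[0])
    means = []
    for j in range(p):
        observed = [row[j] for row in X if row[j] is not None]
        # A column with no observed value is filled with 0.0 (an arbitrary choice).
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if row[j] is None else row[j] for j in range(p)] for row in X]
```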
 Basu S, Pan W, Shen X, Oetting WS (2011). Multilocus Association Testing with Penalized Regression. submitted to Genet Epi.
R functions for score, SSU, SSUw, UminP, Sum tests.
R functions for Lasso-based LRT and Wasserman and Roeder's (Ann Stat 2009) Screen and Clean test,
R functions for Lasso-based averaging/selection tests with the score or SSU statistic.
R code to generate simulated genotypes in Table 10 and
Table 11.
 Pan W, Basu S, Shen X (2011). Adaptive Tests for Detecting Gene-Gene and Gene-Environment Interactions. submitted to Hum Hered.
R functions for (modified) adaptive Neyman's tests: aScore, aSSU, aSSUw, aSum, (aUminPslow),
R functions for MC simulation-based adaptive UminP test,
R functions for aSum2 (with 2-directional searches),
R functions for score, SSU, SSUw, UminP, Sum tests.
 Han F, Pan W (2011). A Composite Likelihood Approach to
Latent Multivariate Gaussian Modeling of SNP Data
with Application to Genetic Association Testing. Biometrics.
R functions for composite likelihood-based tests,
R functions for maximum likelihood-based tests.
 Pan W, Shen X (2011). Adaptive Tests for Association Analysis of Rare Variants. Genet Epi.
R functions for (modified) adaptive Neyman's tests: aScore, aSSU, aSSUw, aSum, (aUminPslow),
Simulation programs: the same as those in Basu and Pan (2011) shown below.
An example for Table 2 (case I) in Pan and Shen (2011); it is also similar to those in Basu and Pan (2011) except that the causal RVs had different MAFs from those of the non-causal ones.
 Basu S, Pan W (2011). Comparison of Statistical Tests for Disease Association with Rare Variants. Genet Epi.
R functions for Sequential Sum score tests,
score, SSU, SSUw, UminP, Sum and aSum tests,
wSSUP test,
C-alpha test,
Li and Leal's CMC test and Madsen and Browning's weighted Sum test.
Simulation programs:
 simRareSNP.R:
generate rare SNPs discretized from some latent MVN variates
with correlation structure of CS; allow adding some
non-causal SNPs which will be correlated with causal ones
if rho!=0.
An example for Tables 3-5 in Basu and Pan (2011).
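The idea of discretizing latent multivariate normal variates with a compound-symmetry (CS) correlation can be sketched as follows. This Python sketch is not simRareSNP.R itself: it assumes that each haplotype is obtained by cutting a latent normal vector at the MAF quantiles and that a genotype is the sum of two independent haplotypes, and the function names are hypothetical. It requires 0 <= rho <= 1 and MAFs strictly between 0 and 1.

```python
import random
from statistics import NormalDist

def sim_haplotype_cs(cutoffs, rho, rng):
    """One haplotype: CS-correlated standard normals, cut at per-SNP thresholds."""
    shared = rng.gauss(0.0, 1.0)  # common factor; induces pairwise correlation rho
    hap = []
    for c in cutoffs:
        z = (rho ** 0.5) * shared + ((1.0 - rho) ** 0.5) * rng.gauss(0.0, 1.0)
        hap.append(1 if z < c else 0)  # minor allele with probability equal to the MAF
    return hap

def sim_genotypes_cs(n, mafs, rho, seed=1):
    """n x len(mafs) genotype matrix with entries in {0, 1, 2}."""
    rng = random.Random(seed)
    # P(Z < cutoff) = MAF for a standard normal Z.
    cutoffs = [NormalDist().inv_cdf(m) for m in mafs]
    X = []
    for _ in range(n):
        h1 = sim_haplotype_cs(cutoffs, rho, rng)
        h2 = sim_haplotype_cs(cutoffs, rho, rng)
        X.append([a + b for a, b in zip(h1, h2)])
    return X
```

Because causal and non-causal SNPs share the same latent factor in this construction, they are correlated whenever rho is nonzero, which matches the description above.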
 simAR1Rare2.R:
generate rare SNPs discretized from some latent MVN variates
with correlation structure of AR1; allow adding some
non-causal SNPs which are INDEPENDENT of causal ones
no matter what the value of rho is; the non-causal
SNPs are also discretized from some latent MVN variates
with an AR1 corr structure.
An example for Table 6 in Basu and Pan (2011).
 simRareCommonSNP.R:
add some independent CVs, as in Table 7 of Basu & Pan (2011).
An example for Table 7 in Basu and Pan (2011).
 Han F, Pan W (2010).
Powerful Multi-marker Association Tests:
Unifying Genomic Distance-Based Regression
and Logistic Regression.
To appear in Genet Epi.
R function. Some instruction is given
at the beginning of the R functions.
 Han F, Pan W (2010).
A Data-Adaptive Sum Test for Disease Association with Multiple
Common or Rare Variants.
To appear in Human Heredity.
R function. Some instruction is given
at the beginning of the R function.
 Pan W (2010).
Statistical Tests of Genetic Association in the Presence of Gene-Gene
and Gene-Environment Interactions.
Human Heredity 69, 131-142.
Note: the input data format for the R programs below is
somewhat unusual; in fact, there is no need to use the two files below;
you could simply create an appropriate genotype matrix X (e.g. with both
main effects and interactions), then call the function given by Pan (2009).
R function:
SumSqUs, scores, and UminP tests for logistic regression with only
main effects, or with both main and 2-way interactions.
Note: use the input genotype score matrix X directly (without centering or
other transformation on X).
R function:
Similar to the above except that the g-inverse is used for a possibly singular covariance matrix (e.g. for the score vector) when the input genotype matrix X is NOT of full rank (i.e. the SNPs are not linearly independent).
R function: to generate simulated data
as used in the paper.
An example R program to generate simulated
data and then apply the SumSqUs, scores, and UminP tests for a purely epistatic
genetic model.
 Pan W, Han F, Shen X (2010).
``Test Selection with Application to Detecting Disease Association with
Multiple SNPs".
Human Heredity 69, 120-130.
Note: Some instruction is given at the beginning of the R function.
R function
 Pan W (2010).
A Unified Framework for Detecting Genetic Association with Multiple SNPs
in a Candidate Gene or Region: Contrasting Genotype Scores and LD Patterns
between Cases and Controls.
Human Heredity 69, 1-13.
Note: Some instruction is given at the beginning of each R function.
R function:
SumSqUs, score, UminP tests.
R function:
Similar to the above SumSqUs/score/UminP tests
except that the generalized inverse (ginv) is used
such that it works even if a covariance matrix (e.g. for the score statistic)
is singular.
R function: LRT/LRTpc tests.
R function: similar to the above
LRT/LRTpc tests except that one more pair of output (p, k) is added to
deal with a singular input genotype matrix X, where p is the p-value
and k is the # of PCs that can explain a default 99% of the
variation in original X.
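The count k can be read off the eigenvalues of the covariance matrix of X. The Python sketch below assumes those eigenvalues have already been computed by some PCA routine; the function name and its threshold argument are hypothetical and are not taken from the R function.

```python
def n_pcs_for_variance(eigenvalues, threshold=0.99):
    """Smallest k such that the top-k eigenvalues explain at least `threshold` of the total variance."""
    # Tiny negative eigenvalues from numerical error are treated as zero.
    vals = sorted((max(v, 0.0) for v in eigenvalues), reverse=True)
    total = sum(vals)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    for k, v in enumerate(vals, start=1):
        cumulative += v
        if cumulative / total >= threshold:
            return k
    return len(vals)
```

When X is singular, the trailing eigenvalues are zero, so k is automatically smaller than the number of SNPs.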
R function: LDC/mLDC tests.
R function: use LDC/mLDC terms (and possibly with main effects) in logistic regression, then apply the SSUs, UminP and score tests.
 Pan W (2009).
Asymptotic tests of association with multiple SNPs in linkage
disequilibrium.
Genetic Epidemiology 33, 497-507.
Note: Some instruction is given at the beginning of each R function.
R function:
SumSqUs (i.e., SSU, SSUw), (multivariate) score, UminP tests.
R function:
Similar to the above SumSqUs/score/UminP tests
except that the generalized inverse (ginv) is used
such that it works even if a covariance matrix (e.g. for the score statistic)
is singular.
 Zhou H, Pan W (2009).
Binomial Mixture Model-based Association Tests under Genetic Heterogeneity.
Annals of Human Genetics 73, 614-630.
Manual,
C++ program.
Penalized Regression

Kim S, Pan W, Shen X (2013).
Network-based penalized regression with application to genomic data.
Biometrics, 69, 582-593.
Zip compressed Matlab code.
 Luo C, Pan W, Shen X (2012).
A Two-Step Penalized Regression Method with Networked Predictors.
Statistics in Biosciences (a special issue on network data analysis),
4, 27-46.
Zip compressed Matlab code.
 Pan W, Xie B, Shen X. (2010).
``Incorporating Predictor Network in Penalized Regression with
Application to Microarray Data".
Biometrics 26, 501-508.
R program.
 Pan W. (2009).
``Network-Based Multiple Locus Linkage Analysis of Expression Traits".
Bioinformatics 25, 1390-1396.
R program for network-based regression,
Example R code and data for simulation setup I:
R code for network-based regression,
R code for interpolation used for network-based regression,
R code for Lars,
(imputed) genotype data with the original 196 markers,
GPCR subnetwork,
network data
(after combining each network for each of multiple eQTL regression models
into an "expanded" single regression model).
Clustering Analysis
 Liu B, Shen X, Pan W (2014). irPCA.
R code for irPCA, an example for
Simulation 1.
 Liu B, Shen X, Pan W (2013).
Semi-supervised spectral clustering with application to detect population
stratification. Frontiers in Genetics, 4:215.
R functions for SSSC.
 Zhou H, Pan W, Shen X (2009).
Penalized model-based clustering with unconstrained covariance matrices.
Electronic Journal of Statistics 3, 1473-1496.
Manual,
R program.
 Pan, W., Shen, X. (2007).
Penalized Model-Based Clustering with Application to Variable Selection.
Journal of Machine Learning Research 22, 1145-1164.
Manual,
C++ program,
R program,
thanks to Hui Zhou who wrote the programs; a newer and improved version of the
R program, thanks to
Dr Jia Li at the Penn State U.
Interval Censoring
 Pan, W. (2000)
``Smooth Estimation of the Survival for Interval Censored Data".
Statistics in Medicine, 19, 2611-2624.
README,
S-Plus function for NPMLE-based 2-sample tests,
S-Plus function for bandwidth selection
in kernel smoothing,
S-Plus function for kernel-smoother-based 2-sample tests,
S-Plus function for logspline-based 2-sample tests,
C program for calculating NPMLE,
S-Plus function for summarizing and drawing
the NPMLE/kernel/logspline estimate of the survival function, and an
example for its use.
 Pan, W. (2000)
``A Two-Sample Test with Interval Censored Data via Multiple Imputation".
Statistics in Medicine, 19, 1-11.
README,
S-Plus function for PMDA,
S-Plus function for ABB,
C program for calculating NPMLE
and imputing,
sample makefile,
generated object file of the C
program in SunOS (in compressed form and decompress using gunzip).
 Pan, W. and Chappell, R. (1998)
``A Nonparametric Estimator of Survival Functions
for Arbitrarily Truncated and Censored Data".
Lifetime Data Analysis, 4, 187-202.
NPMLE (using GP),
NPMLE (using EM) and
INE for left-truncated and interval-censored data.
INE for left-truncated and
right-censored data.
A NEW S-Plus program for
INE for left-truncated and
right-censored data; it also contains a function to use the nonparametric
bootstrap to calculate pointwise confidence intervals of the survival
probabilities.
 Pan, W. and Chappell, R. (1998)
``Estimating Survival Curves with Left-truncated and
Interval-censored Data via the EMS Algorithm".
Communications in Statistics - Theory and Methods,
27, 777-793.
EMS estimator for left-truncated and
interval-censored data.
 Pan, W. and Chappell, R. (1998)
``Estimating survival curves with left-truncated
and interval-censored data under monotone hazards".
Biometrics, 54, 1053-1060.
C code for monotone MLE and
NPMLE (based on Turnbull's EM)
for left-truncated and
interval-censored data.