More recent
-
NEW! 6/30/2016: R package pGMGM is available for free download from CRAN.
It offers estimation of multiple graphs based on penalized Gaussian Mixture Graphical Models,
along with clustering analysis, as reported in the paper below:
Chen Gao, Yunzhang Zhu, Xiaotong Shen, and Wei Pan.
Estimation of multiple networks in Gaussian mixture models.
Electron. J. Statist.
Volume 10, Number 1 (2016), 1133-1154.
download.
-
The aSPUs and aSPUsPath tests for single trait--SNP set (e.g. in a gene) associations and single trait--pathway (i.e. a set of genes) associations,
using summary Z-statistics or p-values, are available in
the R package aSPU. The methods are described in the paper below:
Kwak I-Y, Pan W (2015).
Adaptive Gene- and Pathway-Trait Association Testing with GWAS Summary Statistics.
To appear in Bioinformatics.
-
A new R package "highmean" for the high-dimensional two-sample aSPU and
other tests is available on CRAN. The methods are described in the paper below:
Xu G, Lin L, Wei P, Pan W (2016).
An adaptive two-sample test for high-dimensional means.
Biometrika 103 (3): 609-624.
-
A new R package "prclust" for penalized regression-based clustering
is available on CRAN. The methods are described in the paper below:
Chong Wu, Sunghoon Kwon, Xiaotong Shen, Wei Pan (2016).
A New Algorithm and Theory for Penalized Regression-based Clustering.
JMLR 17(188):1-25, 2016.
http://jmlr.org/papers/v17/15-553.html
-
The aSPU test for multiple trait--single SNP associations with summary Z-statistics:
Kim J, Bai Y, Pan W (2015).
An Adaptive Association Test for Multiple Phenotypes with GWAS Summary Statistics.
Genetic Epidemiology 39:651-663. DOI 10.1002/gepi.21931
R function code for SPU/aSPU,
and example code showing how to apply it to GWAS data.
- The pathway-based aSPUpath test is
available in the R package
aSPU on CRAN.
To install it in R, use the command below:
install.packages("aSPU")
- Pan W, Kwak IY, Wei P (2015).
A Powerful Pathway-Based Adaptive Test for Genetic Association
With Common or Rare Variants.
Am J Hum Genet 97:86-98.
R package "aSPU" for the aSPU and aSPUpath tests is available on CRAN.
- The SPU and aSPU tests:
- Pan W, Kim J, Zhang Y, Shen X, Wei P (2014).
A Powerful and Adaptive Association Test for Rare Variants.
Genetics, 197:1081-1095.
R package "aSPU" for the SPU and aSPU tests is available on CRAN.
- Note: the SPU and aSPU tests can also be applied to other high-dimensional
(or low-dimensional) non-genetic (or genetic) data, as shown in:
Kim J, Wozniak JR, Mueller BA, Shen X, Pan W (2014).
Comparison of statistical tests for group differences in brain functional
networks. NeuroImage, 101:681-694.
- Alternatively, you may use the R functions below directly:
R functions for permutation-based SPU/aSPU,
bootstrap-based SPU/aSPU,
and aSPU-based RV or predictor ranking.
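As a rough illustration of how the SPU and aSPU tests work, here is a minimal base-R sketch of the permutation version, assuming a trait vector Y, a genotype matrix X, and no covariates. It uses the score vector U = X'(Y - mean(Y)), defines SPU(gamma) = sum(U^gamma) for finite gamma and SPU(Inf) = max|U|, and takes the aSPU p-value from the minimum SPU p-value, calibrated with the same permutations. The function name spu_aspu_perm and its defaults are hypothetical; this is not the code in the aSPU package.

```r
# Hypothetical sketch: permutation-based SPU and aSPU tests without covariates.
# Y: numeric trait (e.g. 0/1 case-control status); X: n x k genotype matrix.
spu_aspu_perm <- function(Y, X, pow = c(1:8, Inf), n.perm = 1000) {
  X <- as.matrix(X)
  spu <- function(y) {
    U <- as.vector(crossprod(X, y - mean(y)))      # score vector U = X'(y - ybar)
    sapply(pow, function(g) if (is.infinite(g)) max(abs(U)) else sum(U^g))
  }
  T.obs <- spu(Y)
  # Null statistics from permuted traits: one row per permutation
  T.null <- matrix(replicate(n.perm, spu(sample(Y))),
                   nrow = n.perm, byrow = TRUE)
  A.null <- abs(T.null)
  # Two-sided SPU p-values
  p.spu <- sapply(seq_along(pow), function(j) mean(A.null[, j] >= abs(T.obs[j])))
  # p-value of each permuted statistic relative to the other permutations
  p.null <- apply(A.null, 2, function(a)
    (n.perm - rank(a, ties.method = "min")) / (n.perm - 1))
  min.p.null <- apply(p.null, 1, min)
  # aSPU: compare the observed minimum p-value with its permutation distribution
  p.aspu <- (sum(min.p.null <= min(p.spu)) + 1) / (n.perm + 1)
  names(p.spu) <- paste0("SPU", pow)
  list(T = T.obs, p.spu = p.spu, p.aspu = p.aspu)
}
```

Reusing one set of permutations for every gamma is what makes the minimum p-value adjustment cheap: no second layer of permutations is needed.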
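For the summary-statistic versions (aSPUs), permutation of individual-level data is not possible, so a common strategy is to simulate the null distribution of the Z-scores directly. The sketch below assumes that, under the null, Z ~ N(0, R), where R is a SNP correlation matrix supplied by the user (e.g. estimated from a reference panel). The function name aspus_sim is hypothetical, and this is not the aSPU package implementation.

```r
# Hypothetical sketch: aSPUs test from summary Z-statistics.
# Z: vector of k Z-scores; R: k x k null correlation matrix of Z (user-supplied).
aspus_sim <- function(Z, R, pow = c(1:8, Inf), n.sim = 1000) {
  k <- length(Z)
  spu <- function(z) sapply(pow, function(g) if (is.infinite(g)) max(abs(z)) else sum(z^g))
  # Symmetric square root A of R, so that rows of E %*% A are N(0, R)
  e <- eigen(R, symmetric = TRUE)
  A <- e$vectors %*% diag(sqrt(pmax(e$values, 0)), k) %*% t(e$vectors)
  Z0 <- matrix(rnorm(n.sim * k), n.sim, k) %*% A
  A.null <- abs(matrix(apply(Z0, 1, spu), nrow = n.sim, byrow = TRUE))
  T.obs <- spu(Z)
  p.spu <- sapply(seq_along(pow), function(j) mean(A.null[, j] >= abs(T.obs[j])))
  # Null p-values of each simulated statistic, then the null minimum p-values
  p.null <- apply(A.null, 2, function(a)
    (n.sim - rank(a, ties.method = "min")) / (n.sim - 1))
  p.aspu <- (sum(apply(p.null, 1, min) <= min(p.spu)) + 1) / (n.sim + 1)
  names(p.spu) <- paste0("SPU", pow)
  list(p.spu = p.spu, p.aspu = p.aspu)
}
```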
Genetic Association Testing
Note: all programs assume that there is no missing value;
if you have missing values in your data, please impute or remove them
first.
- Park JY, Wu C, Basu S, McGue M, Pan W (2017).
Adaptive SNP-set Association Testing in Generalized Linear Mixed Models with Application to Family Studies. Submitted to Behav Genet.
Example R code.
- Basu S, Pan W, Shen X, Oetting WS (2011). Multi-locus Association Testing with Penalized Regression. Genet Epi.
R functions for score, SSU, SSUw, UminP, Sum tests.
R functions for Lasso-based LRT and Wasserman and Roeder's (Ann Stat 2009) Screen and Clean test,
R functions for Lasso-based averaging/selection tests with the score or SSU statistic.
R code to generate simulated genotypes in Table 10 and
Table 11.
- Pan W, Basu S, Shen X (2011). Adaptive Tests for Detecting Gene-Gene and Gene-Environment Interactions. Hum Hered.
R functions for (modified) adaptive Neyman's tests: aScore, aSSU, aSSUw, aSum, (aUminP--slow),
R functions for MC simulation-based adaptive UminP test,
R functions for aSum2 (with 2-directional searches),
R functions for score, SSU, SSUw, UminP, Sum tests.
- Han F, Pan W (2011). A Composite Likelihood Approach to
Latent Multivariate Gaussian Modeling of SNP Data
with Application to Genetic Association Testing. Biometrics.
R functions for composite likelihood-based tests,
R functions for maximum likelihood-based tests.
- Pan W, Shen X (2011). Adaptive Tests for Association Analysis of Rare Variants. Genet Epi.
R functions for (modified) adaptive Neyman's tests: aScore, aSSU, aSSUw, aSum, (aUminP--slow),
Simulation programs: the same as those in Basu and Pan (2011), listed below.
An example for Table 2 (case I) in Pan and Shen (2011); it is similar to those in Basu and Pan (2011), except that the causal RVs had MAFs different from those of the non-causal ones.
- Basu S, Pan W (2011). Comparison of Statistical Tests for Disease Association with Rare Variants. Genet Epi.
R functions for Sequential Sum score tests,
score, SSU, SSUw, UminP, Sum and aSum tests,
wSSU-P test,
C-alpha test,
Li and Leal's CMC test and Madsen and Browning's weighted Sum test.
Simulation programs:
- simRareSNP.R:
generate rare SNPs discretized from latent MVN variates
with a compound symmetry (CS) correlation structure; optionally add
non-causal SNPs, which will be correlated with the causal ones
if rho!=0.
An example for Tables 3-5 in Basu and Pan (2011).
- simAR1Rare2.R:
generate rare SNPs discretized from latent MVN variates
with an AR1 correlation structure; optionally add
non-causal SNPs, which are INDEPENDENT of the causal ones
regardless of the value of rho; the non-causal
SNPs are also discretized from latent MVN variates
with an AR1 correlation structure.
An example for Table 6 in Basu and Pan (2011).
- simRareCommonSNP.R:
add some independent common variants (CVs), as in Table 7 of Basu and Pan (2011).
An example for Table 7 in Basu and Pan (2011).
- Han F, Pan W (2010).
Powerful Multi-marker Association Tests:
Unifying Genomic Distance-Based Regression
and Logistic Regression.
To appear in Genet Epi.
R function. Some instructions are given
at the beginning of the R functions.
- Han F, Pan W (2010).
A Data-Adaptive Sum Test for Disease Association with Multiple
Common or Rare Variants.
To appear in Human Heredity.
R function. Some instructions are given
at the beginning of the R function.
- Pan W (2010).
Statistical Tests of Genetic Association in the Presence of Gene-Gene
and Gene-Environment Interactions.
Human Heredity 69, 131-142.
Note: the input data format for the R programs below is
somewhat unusual; in fact, there is no need to use the two files below;
you can simply create an appropriate genotype matrix X (e.g. with both
main effects and interactions), then call the function given in Pan (2009).
R function:
SumSqUs, scores, and UminP tests for logistic regression with only
main-effects, or with both main and 2-way interactions.
Note: use the input genotype score matrix X directly (without centering or
other transformation of X).
R function:
Similar to the above, except that the g-inverse is used for a possibly singular covariance matrix (e.g. of the score vector) when the input genotype matrix X is NOT of full rank (i.e. the SNPs are not linearly independent).
R function: to generate simulated data
as used in the paper.
An example R program to generate simulated
data and then apply the SumSqUs, scores, and UminP tests for a purely epistatic
genetic model.
- Pan W, Han F, Shen X (2010).
"Test Selection with Application to Detecting Disease Association with
Multiple SNPs".
Human Heredity 69, 120-130.
Note: Some instructions are given at the beginning of the R function.
R function
- Pan W (2010).
A Unified Framework for Detecting Genetic Association with Multiple SNPs
in a Candidate Gene or Region: Contrasting Genotype Scores and LD Patterns
between Cases and Controls.
Human Heredity 69, 1-13.
Note: Some instructions are given at the beginning of each R function.
R function:
SumSqUs, score, UminP tests.
R function:
Similar to the above SumSqUs/score/UminP tests
except that the generalized inverse (g-inv) is used
such that it works even if a covariance matrix (e.g. for the score statistic)
is singular.
R function: LRT/LRT-pc tests.
R function: similar to the above
LRT/LRT-pc tests, except that an extra output pair (p, k) is added to
deal with a singular input genotype matrix X, where p is the p-value
and k is the number of PCs needed to explain (by default) 99% of the
variation in the original X.
R function: LDC/mLDC tests.
R function: use LDC/mLDC terms (and possibly with main effects) in logistic regression, then apply the SSUs, UminP and score tests.
- Pan W (2009).
Asymptotic tests of association with multiple SNPs in linkage
disequilibrium.
Genetic Epidemiology 33, 497-507.
Note: Some instructions are given at the beginning of each R function.
R function:
SumSqUs (i.e., SSU, SSUw), (multivariate) score, UminP tests.
R function:
Similar to the above SumSqUs/score/UminP tests
except that the generalized inverse (g-inv) is used
such that it works even if a covariance matrix (e.g. for the score statistic)
is singular.
- Zhou H, Pan W (2009).
Binomial Mixture Model-based Association Tests under Genetic Heterogeneity.
Annals of Human Genetics 73, 614-630.
Manual,
C++ program.
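The score and SumSqU (SSU) tests for logistic regression listed above can be illustrated with a short R sketch. It assumes a binary trait Y and a genotype matrix X with no covariates, and uses the null score vector U = sum_i (Y_i - Ybar) X_i with covariance V = Ybar(1 - Ybar) sum_i (X_i - Xbar)(X_i - Xbar)'. The score test uses U'V^- U on rank(V) degrees of freedom, with MASS::ginv as the generalized inverse for a possibly singular V; the SSU statistic U'U is compared with a scaled and shifted chi-square a*chi2_d + b whose first three moments match those of its asymptotic weighted chi-square distribution. The function name score_ssu is hypothetical, and the R functions on this page may differ in details.

```r
# Hypothetical sketch: score and SSU tests for a binary trait (no covariates).
score_ssu <- function(Y, X) {
  X <- as.matrix(X)
  Xc <- scale(X, center = TRUE, scale = FALSE)     # X_i - Xbar
  ybar <- mean(Y)
  U <- as.vector(crossprod(Xc, Y - ybar))           # score vector
  V <- ybar * (1 - ybar) * crossprod(Xc)            # null covariance of U
  # Score test with a generalized inverse, so a singular V still works
  T.score <- as.numeric(t(U) %*% MASS::ginv(V) %*% U)
  df.score <- qr(V)$rank
  p.score <- pchisq(T.score, df = df.score, lower.tail = FALSE)
  # SSU test: T = U'U approximated by a * chisq_d + b (three-moment matching)
  T.ssu <- sum(U^2)
  V2 <- V %*% V
  tr1 <- sum(diag(V)); tr2 <- sum(diag(V2)); tr3 <- sum(diag(V2 %*% V))
  a <- tr3 / tr2
  d <- tr2^3 / tr3^2
  b <- tr1 - tr2^2 / tr3
  p.ssu <- pchisq((T.ssu - b) / a, df = d, lower.tail = FALSE)
  c(T.score = T.score, df.score = df.score, p.score = p.score,
    T.ssu = T.ssu, p.ssu = p.ssu)
}
```

Because the score test uses a g-inverse, duplicated or perfectly correlated SNPs lower the degrees of freedom (the rank of V) instead of causing an error.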
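The LRT-pc variant described above keeps the k principal components (PCs) that explain a default 99% of the variation in X and reports the pair (p, k). One plausible way to do this is sketched below, assuming logistic regression on the PC scores with a likelihood ratio test against the intercept-only model; the function name lrt_pc is hypothetical and the exact procedure in the R function may differ.

```r
# Hypothetical sketch of a PC-based likelihood ratio test returning (p, k).
# Y: 0/1 trait; X: n x k genotype matrix (may be rank deficient).
lrt_pc <- function(Y, X, prop = 0.99) {
  pc <- prcomp(as.matrix(X), center = TRUE, scale. = FALSE)
  # Smallest number of PCs whose cumulative share of variance reaches `prop`
  share <- cumsum(pc$sdev^2) / sum(pc$sdev^2)
  k <- which(share >= prop - 1e-12)[1]
  S <- pc$x[, seq_len(k), drop = FALSE]          # PC scores used as covariates
  fit1 <- glm(Y ~ S, family = binomial)
  fit0 <- glm(Y ~ 1, family = binomial)
  lrt <- fit0$deviance - fit1$deviance           # likelihood ratio statistic
  c(p = pchisq(lrt, df = k, lower.tail = FALSE), k = k)
}
```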
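The simulation programs above generate rare SNPs by discretizing latent multivariate normal (MVN) variates with a CS or AR1 correlation structure. A minimal sketch of one common way to do this follows; it assumes each genotype is the sum of two independent haplotypes, each obtained by thresholding a latent MVN vector at qnorm(MAF). The function name sim_rare_geno is hypothetical, and this is not the code in simRareSNP.R or simAR1Rare2.R.

```r
# Hypothetical sketch: rare-variant genotypes from thresholded latent MVN variates.
# n: sample size; maf: vector of minor allele frequencies (one per SNP);
# rho: latent correlation parameter; structure: "CS" or "AR1".
sim_rare_geno <- function(n, maf, rho, structure = c("CS", "AR1")) {
  structure <- match.arg(structure)
  k <- length(maf)
  # Latent correlation: compound symmetry (rho off the diagonal) or AR1 (rho^|i-j|)
  Sigma <- if (structure == "CS") {
    matrix(rho, k, k) + diag(1 - rho, k)
  } else {
    rho^abs(outer(1:k, 1:k, "-"))
  }
  L <- chol(Sigma)                               # Sigma = t(L) %*% L
  haplotype <- function() {
    Z <- matrix(rnorm(n * k), n, k) %*% L        # rows ~ N(0, Sigma)
    # Minor allele present with probability maf[j] for SNP j
    (Z < matrix(qnorm(maf), n, k, byrow = TRUE)) * 1
  }
  haplotype() + haplotype()                      # genotype counts 0, 1, 2
}
```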
Population stratification
- Liu B, Shen X, Pan W (2013).
Semi-supervised spectral clustering with application to detect population
stratification. Frontiers in Genetics, 4:215.
R functions for SSSC.
Penalized Regression
-
Kim S, Pan W, Shen X (2013).
Network-based penalized regression with application to genomic data.
Biometrics, 69, 582-593.
Zip compressed Matlab code.
- Luo C, Pan W, Shen X (2012).
A Two-Step Penalized Regression Method with Networked Predictors.
Statistics in Biosciences (a special issue on network data analysis),
4, 27-46.
Zip compressed Matlab code.
- Pan W, Xie B, Shen X. (2010).
"Incorporating Predictor Network in Penalized Regression with
Application to Microarray Data".
Biometrics 66, 501-508.
R program.
- Pan W. (2009).
"Network-Based Multiple Locus Linkage Analysis of Expression Traits".
Bioinformatics 25, 1390-1396.
R program for network-based regression,
Example R code and data for simulation set-up I:
R code for network-based regression,
R code for interpolation used for network-based regression,
R code for Lars,
(imputed) genotype data with the original 196 markers,
GPCR subnetwork,
network data
(after combining the networks for each of multiple eQTL regression models
into an "expanded" single regression model).
Clustering Analysis
-
Chen Gao, Yunzhang Zhu, Xiaotong Shen, and Wei Pan.
Estimation of multiple networks in Gaussian mixture models.
Electron. J. Statist.
Volume 10, Number 1 (2016), 1133-1154.
download.
R package pGMGM is available for free download from CRAN.
- Wu C, Kwon S, Shen X, Pan W (2016). A New Algorithm and Theory for
Penalized Regression-based Clustering. Journal of Machine Learning Research 17(188):1-25.
R package "prclust" available on CRAN.
- Pan W, Shen X, Liu B (2013). Cluster Analysis: Unsupervised Learning via Supervised
Learning with a Non-convex Penalty. Journal of Machine Learning Research 14:1865-1889.
R package "prclust" available on CRAN, thanks to Chong Wu; in addition to the quadratic penalty
algorithm discussed in the paper, a faster and better ADMM algorithm is also implemented.
- Liu B, Shen X, Pan W (2014). irPCA. Updated 9/28/2015!
R code for irPCA, an example for
Simulation 1, and results for
dimension reduction and
integrative loadings.
- Liu B, Shen X, Pan W (2013).
Semi-supervised spectral clustering with application to detect population
stratification. Frontiers in Genetics, 4:215.
R functions for SSSC.
- Zhou H, Pan W, Shen X (2009).
Penalized model-based clustering with unconstrained covariance matrices.
Electronic Journal of Statistics 3, 1473-1496.
Manual,
R program.
- Pan, W., Shen, X. (2007).
Penalized Model-Based Clustering with Application to Variable Selection.
Journal of Machine Learning Research 8, 1145-1164.
Manual,
C++ program,
R program,
thanks to Hui Zhou, who wrote the programs; a newer, improved version of the
R program, thanks to
Dr. Jia Li at Penn State University.
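A minimal EM sketch in the spirit of the L1-penalized model-based clustering above, assuming centered data, a Gaussian mixture with a common diagonal covariance, and the penalty lambda * sum_k sum_j |mu_kj| on the cluster means. Under these assumptions the M-step update of each mean is a soft-thresholded weighted average with threshold lambda * sigma_j^2 / sum_i tau_ik, so a variable whose means are shrunk to zero in every cluster no longer affects the clustering. The function name pen_mbc and its k-means initialization are hypothetical choices, not the authors' programs.

```r
# Hypothetical sketch of L1-penalized model-based clustering (EM algorithm).
# Assumes a common diagonal covariance diag(sigma2) and an L1 penalty on the means.
pen_mbc <- function(X, K, lambda, n.iter = 100) {
  X <- scale(as.matrix(X), center = TRUE, scale = FALSE)  # shrink means toward 0
  n <- nrow(X); p <- ncol(X)
  mu <- kmeans(X, centers = K, nstart = 5)$centers       # K x p starting means
  sigma2 <- apply(X, 2, var)
  w <- rep(1 / K, K)                                      # mixing proportions
  for (it in seq_len(n.iter)) {
    # E-step: posterior probabilities tau (n x K), computed on the log scale
    logf <- sapply(seq_len(K), function(k)
      log(w[k]) - 0.5 * colSums((t(X) - mu[k, ])^2 / sigma2) -
        0.5 * sum(log(2 * pi * sigma2)))
    tau <- exp(logf - apply(logf, 1, max))
    tau <- tau / rowSums(tau)
    # M-step: proportions, soft-thresholded means, common variances
    nk <- colSums(tau)
    w <- nk / n
    mu.tilde <- crossprod(tau, X) / nk                    # unpenalized weighted means
    mu <- sign(mu.tilde) * pmax(abs(mu.tilde) - lambda * outer(1 / nk, sigma2), 0)
    sigma2 <- sapply(seq_len(p), function(j)
      sum(tau * outer(X[, j], mu[, j], "-")^2)) / n
  }
  list(w = w, mu = mu, sigma2 = sigma2, tau = tau, cluster = max.col(tau))
}
```

With lambda = 0 this reduces to the ordinary EM algorithm for a diagonal-covariance Gaussian mixture.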
Interval Censoring
- Pan, W. (2000)
"Smooth Estimation of the Survival for Interval Censored Data".
Statistics in Medicine, 19, 2611-2624.
README,
SPlus function for NPMLE-based 2-sample tests,
SPlus function for bandwidth selection
in kernel smoothing,
SPlus function for kernel-smoother-based 2-sample tests,
SPlus function for logspline-based 2-sample tests,
C program for calculating NPMLE,
SPlus function for summarizing and drawing
the NPMLE/kernel/logspline estimate of the survival function, and an
example for its use.
- Pan, W. (2000)
"A Two-Sample Test with Interval Censored Data via Multiple Imputation".
Statistics in Medicine, 19, 1-11.
README,
SPlus function for PMDA,
SPlus function for ABB,
C program for calculating NPMLE
and imputing,
sample makefile,
compiled object file of the C
program for SunOS (compressed; decompress with gunzip).
- Pan, W. and Chappell, R. (1998)
"A Nonparametric Estimator of Survival Functions
for Arbitrarily Truncated and Censored Data".
Lifetime Data Analysis, 4, 187-202.
NPMLE (using GP),
NPMLE (using EM) and
INE for left-truncated and interval-censored data.
INE for left-truncated and
right-censored data.
A NEW SPlus program for
INE for left-truncated and
right-censored data; it also contains a function to use the nonparametric
bootstrap to calculate point-wise confidence intervals of the survival
probabilities.
- Pan, W. and Chappell, R. (1998)
"Estimating Survival Curves with Left-truncated and
Interval-censored Data via the EMS Algorithm".
Communications in Statistics -- Theory and Methods,
27, 777-793.
EMS estimator for left-truncated and
interval-censored data.
- Pan, W. and Chappell, R. (1998)
"Estimating survival curves with left-truncated
and interval-censored data under monotone hazards".
Biometrics, 54, 1053-1060.
C code for monotone MLE and
NPMLE (based on Turnbull's EM)
for left-truncated and
interval-censored data.
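Several of the programs above compute the NPMLE of a survival function with Turnbull's EM (self-consistency) algorithm. As a minimal illustration of that algorithm, the R sketch below handles closed intervals [L_i, R_i] without truncation (use R_i = Inf for a right-censored observation): it finds the innermost intervals, where the NPMLE can place mass, and iterates s_m <- (1/n) sum_i alpha_im s_m / sum_l alpha_il s_l. The function name turnbull_npmle is hypothetical, and this is not the C or S-PLUS code listed here.

```r
# Hypothetical sketch: Turnbull's self-consistency algorithm for interval-censored data.
# L, R: left and right endpoints of closed intervals [L_i, R_i] (R_i may be Inf).
turnbull_npmle <- function(L, R, tol = 1e-8, max.iter = 10000) {
  n <- length(L)
  # Innermost intervals: a left endpoint immediately followed by a right endpoint
  # in the sorted list; at ties, left endpoints sort first (closed intervals).
  pts <- c(L, R)
  type <- c(rep(0, n), rep(1, n))                # 0 = left endpoint, 1 = right endpoint
  o <- order(pts, type)
  pts <- pts[o]; type <- type[o]
  j <- which(type[-length(type)] == 0 & type[-1] == 1)
  q <- pts[j]; p <- pts[j + 1]
  # alpha[i, m] = 1 if innermost interval m lies inside [L_i, R_i]
  alpha <- (outer(L, q, "<=") & outer(R, p, ">=")) * 1
  s <- rep(1 / length(q), length(q))
  for (iter in seq_len(max.iter)) {
    denom <- as.vector(alpha %*% s)              # probability of each observed interval
    s.new <- as.vector(crossprod(alpha, 1 / denom)) * s / n
    converged <- max(abs(s.new - s)) < tol
    s <- s.new
    if (converged) break
  }
  data.frame(left = q, right = p, mass = s)
}
```

The NPMLE does not determine how mass is spread within each innermost interval, so the estimated survival function is only identified outside these intervals.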