Computer Programs

More recent

NEW! 6/30/2016: R package pGMGM is available for free download from CRAN. It estimates multiple graphs based on penalized Gaussian mixture graphical models and performs clustering analysis, as reported in the paper below:
Chen Gao, Yunzhang Zhu, Xiaotong Shen, and Wei Pan. Estimation of multiple networks in Gaussian mixture models. Electron. J. Statist. Volume 10, Number 1 (2016), 1133-1154.
The aSPUs and aSPUsPath tests are available in the R package aSPU. They test for association between a single trait and a SNP set (e.g. in a gene), and between a single trait and a pathway (i.e. a set of genes), using summary Z-statistics or p-values. The methods are described in the paper below:
Kwak I-Y, Pan W (2015). Adaptive Gene- and Pathway-Trait Association Testing with GWAS Summary Statistics. To appear in Bioinformatics.
A new R package "highmean" for high-dimensional two-sample aSPU and other tests is available on CRAN. The methods are described in the paper below:
Xu G, Lin L, Wei P, Pan W (2016). An adaptive two-sample test for high-dimensional means. Biometrika 103 (3): 609-624.
A new R package "prclust" for penalized regression-based clustering is available on CRAN. The methods are described in the paper below:
Chong Wu, Sunghoon Kwon, Xiaotong Shen, Wei Pan (2016). A New Algorithm and Theory for Penalized Regression-based Clustering. JMLR 17(188):1-25. http://jmlr.org/papers/v17/15-553.html
The aSPU test for associations between multiple traits and a single SNP, using summary Z-statistics:
Kim J, Bai Y, Pan W (2015). An Adaptive Association Test for Multiple Phenotypes with GWAS Summary Statistics. Genetic Epidemiology 39:651-663. DOI 10.1002/gepi.21931
R function code for SPU/aSPU, and example code showing how to apply it to GWAS.
The pathway-based aSPUpath test is available in the R package aSPU on CRAN. In R, use the command below to install it:
install.packages("aSPU")
Pan W, Kwak IY, Wei P (2015). A Powerful Pathway-Based Adaptive Test for Genetic Association With Common or Rare Variants. Am J Hum Genet 97:86-98.
R package "aSPU" for the aSPU and aSPUpath tests is available on CRAN.
The SPU and aSPU tests:
Pan W, Kim J, Zhang Y, Shen X, Wei P (2014). A Powerful and Adaptive Association Test for Rare Variants. Genetics, 197:1081-1095.
R package "aSPU" for the SPU and aSPU tests is available on CRAN.
Note: the SPU and aSPU tests can also be used for other high-dimensional (or low-dimensional) non-genetic (or genetic) data, as shown in:
Kim J, Wozniak JR, Mueller BA, Shen X, Pan W (2014). Comparison of statistical tests for group differences in brain functional networks. NeuroImage, 101:681-694.
Alternatively, you may want to use the R functions below directly:
R functions for permutation-based SPU/aSPU, bootstrap-based SPU/aSPU, and aSPU-based RV or predictor ranking.
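
For readers who want to see the idea behind the permutation-based SPU/aSPU functions before downloading them, here is a minimal base-R sketch. It assumes a binary or quantitative trait Y, a genotype matrix X, and no covariates, so that permuting Y is valid. The function name aspu_perm and its arguments are hypothetical and are not the interface of the aSPU package or of the linked R functions.

```r
# Hypothetical sketch (not the aSPU package code): permutation-based
# SPU(gamma) tests and the adaptive aSPU test, with no covariates.
aspu_perm <- function(Y, X, pow = c(1:8, Inf), n.perm = 1000) {
  # SPU(gamma) = sum_j U_j^gamma; SPU(Inf) = max_j |U_j|,
  # where U is the score vector under the null of no association.
  spu_stats <- function(y) {
    U <- drop(crossprod(X, y - mean(y)))
    vapply(pow, function(g) if (is.infinite(g)) max(abs(U)) else sum(U^g),
           numeric(1))
  }
  K <- length(pow)
  T_obs <- abs(spu_stats(Y))
  T_perm <- matrix(NA_real_, n.perm, K)
  for (b in seq_len(n.perm)) T_perm[b, ] <- abs(spu_stats(sample(Y)))

  # p-value of each SPU(gamma) test
  p_spu <- (colSums(sweep(T_perm, 2, T_obs, ">=")) + 1) / (n.perm + 1)

  # p-values of the permuted statistics, reusing the same permutations
  p_perm <- apply(T_perm, 2, function(v) (n.perm - rank(v)) / (n.perm - 1))
  min_p_perm <- apply(p_perm, 1, min)

  # aSPU takes the minimum p-value over gamma and calibrates it
  p_aspu <- (sum(min_p_perm <= min(p_spu)) + 1) / (n.perm + 1)
  c(setNames(p_spu, paste0("SPU", pow)), aSPU = p_aspu)
}

set.seed(3)
n <- 100
X_demo <- matrix(rbinom(n * 5, 2, 0.2), n, 5)   # simulated genotypes
Y_demo <- rbinom(n, 1, 0.5)                     # simulated null trait
res_aspu <- aspu_perm(Y_demo, X_demo, n.perm = 200)
```

The same permutations are used both for the individual SPU p-values and for calibrating the minimum p-value, which avoids a nested permutation loop.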

Genetic Association Testing
Note: all programs assume that there are no missing values; if you have missing values in your data, please impute or remove them first.
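
A minimal sketch of the two options mentioned in the note, assuming a genotype matrix X coded 0/1/2 with NA for missing calls and a trait vector Y. The helper names impute_mean and drop_incomplete are hypothetical and are not part of the programs listed here; mean imputation is only one simple choice among many.

```r
# Hypothetical helpers (not from the programs on this page).

# Option 1: replace each missing genotype by the mean of its SNP column.
impute_mean <- function(X) {
  for (j in seq_len(ncol(X))) {
    miss <- is.na(X[, j])
    if (any(miss)) X[miss, j] <- mean(X[!miss, j])
  }
  X
}

# Option 2: drop every subject with a missing trait or genotype value.
drop_incomplete <- function(Y, X) {
  keep <- complete.cases(X) & !is.na(Y)
  list(Y = Y[keep], X = X[keep, , drop = FALSE])
}

X_na <- matrix(c(0, 1, NA, 2,
                 1, NA, 0, 0), nrow = 4)   # 4 subjects, 2 SNPs
Y_na <- c(1, 0, 1, NA)
X_imp <- impute_mean(X_na)
clean <- drop_incomplete(Y_na, X_na)
```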

Park JY, Wu C, Basu S, McGue M, Pan W (2017). Adaptive SNP-set Association Testing in Generalized Linear Mixed Models with Application to Family Studies. Submitted to Behav Genet.
Example R code.
Basu S, Pan W, Shen X, Oetting WS (2011). Multi-locus Association Testing with Penalized Regression. Genet Epi.
R functions for score, SSU, SSUw, UminP, Sum tests. R functions for Lasso-based LRT and Wasserman and Roeder's (Ann Stat 2009) Screen and Clean test, R functions for Lasso-based averaging/selection tests with the score or SSU statistic.
R code to generate simulated genotypes in Table 10 and Table 11.
Pan W, Basu S, Shen X (2011). Adaptive Tests for Detecting Gene-Gene and Gene-Environment Interactions. Hum Hered.
R functions for (modified) adaptive Neyman's tests: aScore, aSSU, aSSUw, aSum, (aUminP--slow), R functions for MC simulation-based adaptive UminP test, R functions for aSum2 (with 2-directional searches), R functions for score, SSU, SSUw, UminP, Sum tests.
Han F, Pan W (2011). A Composite Likelihood Approach to Latent Multivariate Gaussian Modeling of SNP Data with Application to Genetic Association Testing. Biometrics.
R functions for composite likelihood-based tests, R functions for maximum likelihood-based tests.
Pan W, Shen X (2011). Adaptive Tests for Association Analysis of Rare Variants. Genet Epi.
R functions for (modified) adaptive Neyman's tests: aScore, aSSU, aSSUw, aSum, (aUminP--slow).
Simulation programs: these are the same as those in Basu and Pan (2011) shown below.
An example for Table 2 (case I) in Pan and Shen (2011); it is similar to those in Basu and Pan (2011), except that the causal RVs had different MAFs from those of the non-causal ones.
Basu S, Pan W (2011). Comparison of Statistical Tests for Disease Association with Rare Variants. Genet Epi.
R functions for Sequential Sum score tests, score, SSU, SSUw, UminP, Sum and aSum tests, wSSU-P test, C-alpha test, Li and Leal's CMC test and Madsen and Browning's weighted Sum test.
Simulation programs:
- simRareSNP.R: generate rare SNPs discretized from latent MVN variates with a CS correlation structure; allows adding some non-causal SNPs, which will be correlated with the causal ones if rho!=0.
  An example for Tables 3-5 in Basu and Pan (2011).
- simAR1Rare2.R: generate rare SNPs discretized from latent MVN variates with an AR1 correlation structure; allows adding some non-causal SNPs that are INDEPENDENT of the causal ones whatever the value of rho; the non-causal SNPs are also discretized from latent MVN variates with an AR1 correlation structure.
  An example for Table 6 in Basu and Pan (2011).
- simRareCommonSNP.R: add some independent CVs, as in Table 7 of Basu & Pan (2011).
  An example for Table 7 in Basu and Pan (2011).
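
The latent-variable idea behind these simulation programs can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the code in simRareSNP.R: each haplotype is obtained by dichotomizing a latent MVN vector with compound-symmetric (CS) correlation rho >= 0 at the quantile matching each SNP's MAF, and a genotype is the sum of two independent haplotypes. The function name sim_rare_snps_cs is hypothetical.

```r
# Hypothetical sketch (not simRareSNP.R): rare SNPs discretized from
# latent MVN variates with compound-symmetric correlation rho.
sim_rare_snps_cs <- function(n, mafs, rho) {
  stopifnot(rho >= 0, rho < 1, all(mafs > 0), all(mafs < 1))
  p <- length(mafs)
  one_haplotype_set <- function() {
    # A shared factor plus independent noise gives unit variances and
    # pairwise correlation rho among the p latent variables.
    shared <- rnorm(n)
    Z <- sqrt(rho) * shared + sqrt(1 - rho) * matrix(rnorm(n * p), n, p)
    # Minor allele (coded 1) when the latent value is below the MAF quantile.
    1 * sweep(Z, 2, qnorm(mafs), "<")
  }
  # Genotype = number of minor alleles across two independent haplotypes.
  one_haplotype_set() + one_haplotype_set()
}

set.seed(1)
G_sim <- sim_rare_snps_cs(n = 2000, mafs = rep(0.01, 8), rho = 0.5)
```

An AR1 structure would need a different construction of the latent vector (for example, a first-order autoregressive recursion across SNP positions); only the CS case is shown here.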
Han F, Pan W (2010). Powerful Multi-marker Association Tests: Unifying Genomic Distance-Based Regression and Logistic Regression. To appear in Genet Epi.
R function. Some instruction is given at the beginning of the R functions.
Han F, Pan W (2010). A Data-Adaptive Sum Test for Disease Association with Multiple Common or Rare Variants. To appear in Human Heredity.
R function. Some instruction is given at the beginning of the R function.
Pan W (2010). Statistical Tests of Genetic Association in the Presence of Gene-Gene and Gene-Environment Interactions. Human Heredity 69, 131-142.
Note: the format of the input data for the below R programs is somewhat strange; in fact, there is no need to use the below two files; you could simply create an appropriate genotype matrix X (e.g. with both main effects and interactions), then call the function given by Pan (2009).
R function: SumSqUs, scores, and UminP tests for logistic regression with only main effects, or with both main effects and 2-way interactions. Note: use the input genotype score matrix X directly (without centering or any other transformation of X).
R function: Similar to the above except the g-inverse is used for a possibly singular covariance matrix (e.g. for the score vector) when the input genotype matrix X is NOT of full rank (i.e. the SNPs are not linearly independent).
R function: to generate simulated data as used in the paper.
An example R program to generate simulated data and then apply the SumSqUs, scores, and UminP tests for a purely epistatic genetic model.
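
As the note for this paper suggests, one can simply build a genotype matrix X containing both main effects and interactions and pass it to the functions of Pan (2009). A minimal sketch, assuming an n x p genotype score matrix G coded 0/1/2; the helper name add_two_way_interactions is hypothetical and not part of the linked programs.

```r
# Hypothetical helper: append all pairwise products of SNP columns to the
# main-effect genotype matrix (no centering, as the note above asks).
add_two_way_interactions <- function(G) {
  p <- ncol(G)
  if (is.null(colnames(G))) colnames(G) <- paste0("snp", seq_len(p))
  if (p < 2) return(G)
  pairs <- combn(p, 2)                       # 2 x choose(p, 2) index matrix
  inter <- G[, pairs[1, ], drop = FALSE] * G[, pairs[2, ], drop = FALSE]
  colnames(inter) <- paste(colnames(G)[pairs[1, ]],
                           colnames(G)[pairs[2, ]], sep = ":")
  cbind(G, inter)
}

G_main <- matrix(c(0, 1, 2, 1,
                   1, 1, 0, 2,
                   2, 0, 1, 1), nrow = 4)    # 4 subjects, 3 SNPs
X_int <- add_two_way_interactions(G_main)
```

With many SNPs the expanded matrix is often not of full rank, which is the situation the g-inverse version of the tests is meant for.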
Pan W, Han F, Shen X (2010). "Test Selection with Application to Detecting Disease Association with Multiple SNPs". Human Heredity 69, 120-130.
Note: Some instruction is given at the beginning of the R function.
R function
Pan W (2010). A Unified Framework for Detecting Genetic Association with Multiple SNPs in a Candidate Gene or Region: Contrasting Genotype Scores and LD Patterns between Cases and Controls. Human Heredity 69, 1-13.
Note: Some instruction is given at the beginning of each R function.
R function: SumSqUs, score, UminP tests.
R function: Similar to the above SumSqUs/score/UminP tests except that the generalized inverse (g-inv) is used such that it works even if a covariance matrix (e.g. for the score statistic) is singular.
R function: LRT/LRT-pc tests.
R function: similar to the above LRT/LRT-pc tests, except that one more pair of output values (p, k) is added to deal with a singular input genotype matrix X, where p is the p-value and k is the number of PCs needed to explain a default 99% of the variation in the original X.
R function: LDC/mLDC tests.
R function: use LDC/mLDC terms (and possibly with main effects) in logistic regression, then apply the SSUs, UminP and score tests.
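
The quantity k reported by the singular-X variant of the LRT-pc test (the number of PCs explaining a given share of the variation in X) can be computed as in the sketch below. It uses prcomp from base R; the function name n_pcs_for_variance is hypothetical, and the linked R function may compute k differently.

```r
# Hypothetical sketch: smallest number of principal components whose
# cumulative variance reaches `prop` of the total variance of X.
n_pcs_for_variance <- function(X, prop = 0.99) {
  pc <- prcomp(X, center = TRUE, scale. = FALSE)
  cum_var <- cumsum(pc$sdev^2) / sum(pc$sdev^2)
  which(cum_var >= prop - 1e-12)[1]          # small slack for rounding
}

set.seed(4)
G_pc <- matrix(rbinom(200 * 2, 2, 0.3), 200, 2)
X_sing <- cbind(G_pc, G_pc[, 1] + G_pc[, 2])  # third column is redundant
k_pc <- n_pcs_for_variance(X_sing)
```

Because the third column is a linear combination of the first two, at most two PCs carry any variation in this example.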
Pan W (2009). Asymptotic tests of association with multiple SNPs in linkage disequilibrium. Genetic Epidemiology 33, 497-507.
Note: Some instruction is given at the beginning of each R function.
R function: SumSqUs (i.e., SSU, SSUw), (multivariate) score, UminP tests.
R function: Similar to the above SumSqUs/score/UminP tests except that the generalized inverse (g-inv) is used such that it works even if a covariance matrix (e.g. for the score statistic) is singular.
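
To illustrate why a generalized inverse lets the score test run when the covariance matrix of the score vector is singular, here is a minimal base-R sketch for a binary trait with no covariates. It is an assumption-laden illustration, not the linked R function: the Moore-Penrose inverse is built from an eigendecomposition, and the degrees of freedom are taken as the numerical rank of the covariance matrix. The name score_test_ginv is hypothetical.

```r
# Hypothetical sketch: multivariate score test for logistic regression
# (intercept-only null) using a Moore-Penrose generalized inverse.
score_test_ginv <- function(Y, X, tol = 1e-8) {
  Xc <- scale(X, center = TRUE, scale = FALSE)
  ybar <- mean(Y)
  U <- drop(crossprod(Xc, Y - ybar))             # score vector
  V <- ybar * (1 - ybar) * crossprod(Xc)         # its covariance under H0
  e <- eigen(V, symmetric = TRUE)
  keep <- e$values > tol * max(e$values)         # drop (near-)zero eigenvalues
  Q <- e$vectors[, keep, drop = FALSE]
  V_ginv <- Q %*% (t(Q) / e$values[keep])        # Q diag(1/lambda) Q'
  stat <- drop(t(U) %*% V_ginv %*% U)
  df <- sum(keep)
  c(stat = stat, df = df, p.value = pchisq(stat, df, lower.tail = FALSE))
}

set.seed(2)
n_sc <- 200
G_sc <- matrix(rbinom(n_sc * 2, 2, 0.3), n_sc, 2)
X_sc <- cbind(G_sc, G_sc[, 1] + G_sc[, 2])       # linearly dependent SNP scores
Y_sc <- rbinom(n_sc, 1, 0.5)
res_full <- score_test_ginv(Y_sc, G_sc)
res_sing <- score_test_ginv(Y_sc, X_sc)
```

In this example the redundant third column leaves the statistic and its degrees of freedom unchanged, which is the behavior one wants when the SNPs are not linearly independent.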
Zhou H, Pan W (2009). Binomial Mixture Model-based Association Tests under Genetic Heterogeneity. Annals of Human Genetics 73, 614-630.
Manual, C++ program.

Population stratification

Liu B, Shen X, Pan W (2013). Semi-supervised spectral clustering with application to detect population stratification. Frontiers in Genetics, 4:215.
R functions for SSSC.

Penalized Regression

Kim S, Pan W, Shen X (2013). Network-based penalized regression with application to genomic data. Biometrics, 69, 582-593.
Zip compressed Matlab code.
Luo C, Pan W, Shen X (2012). A Two-Step Penalized Regression Method with Networked Predictors. Statistics in Biosciences (a special issue on network data analysis), 4, 27-46.
Zip compressed Matlab code.
Pan W, Xie B, Shen X (2010). "Incorporating Predictor Network in Penalized Regression with Application to Microarray Data". Biometrics 26, 501-508.
R program.
Pan W (2009). "Network-Based Multiple Locus Linkage Analysis of Expression Traits". Bioinformatics 25, 1390-1396.
R program for network-based regression.
Example R code and data for simulation set-up I:
R code for network-based regression, R code for interpolation used for network-based regression, R code for Lars, (imputed) genotype data with the original 196 markers, GPCR subnetwork, network data (after combining the networks for the multiple eQTL regression models into a single "expanded" regression model).

Clustering Analysis

Chen Gao, Yunzhang Zhu, Xiaotong Shen, and Wei Pan. Estimation of multiple networks in Gaussian mixture models. Electron. J. Statist. Volume 10, Number 1 (2016), 1133-1154.
R package pGMGM is available for free download from CRAN.
Wu C, Kwon S, Shen X, Pan W (2016). A New Algorithm and Theory for Penalized Regression-based Clustering. Journal of Machine Learning Research 17(188):1-25.
R package "prclust" available on CRAN.
Pan W, Shen X, Liu B (2013). Cluster Analysis: Unsupervised Learning via Supervised Learning with a Non-convex Penalty. Journal of Machine Learning Research 14:1865-1889.
R package "prclust" available on CRAN, thanks to Chong Wu; in addition to the quadratic penalty algorithm discussed in the paper, a faster and better ADMM algorithm is also implemented.
Liu B, Shen X, Pan W (2014). irPCA. Updated 9/28/2015!
R code for irPCA, an example for Simulation 1, and results for dimension reduction and integrative loadings.
Liu B, Shen X, Pan W (2013). Semi-supervised spectral clustering with application to detect population stratification. Frontiers in Genetics, 4:215.
R functions for SSSC.
Zhou H, Pan W, Shen X (2009). Penalized model-based clustering with unconstrained covariance matrices. Electronic Journal of Statistics 3, 1473-1496.
Manual, R program.
Pan, W., Shen, X. (2007). Penalized Model-Based Clustering with Application to Variable Selection. Journal of Machine Learning Research 8, 1145-1164.
Manual, C++ program, R program, thanks to Hui Zhou who wrote the programs; a newer and improved version of the R program, thanks to Dr Jia Li at the Penn State U.

Interval Censoring

Pan, W. (2000) "Smooth Estimation of the Survival for Interval Censored Data". Statistics in Medicine, 19, 2611-2624. README, SPlus function for NPMLE-based 2-sample tests, SPlus function for bandwidth selection in kernel smoothing, SPlus function for kernel-smoother-based 2-sample tests, SPlus function for logspline-based 2-sample tests, C program for calculating NPMLE, SPlus function for summarizing and drawing the NPMLE/kernel/logspline estimate of the survival function, and an example for its use.
Pan, W. (2000) "A Two-Sample Test with Interval Censored Data via Multiple Imputation". Statistics in Medicine, 19, 1-11. README, SPlus function for PMDA, SPlus function for ABB, C program for calculating NPMLE and imputing, sample makefile, generated object file of the C program in SunOS (in compressed form; decompress using gunzip).
Pan, W. and Chappell, R. (1998) "A Nonparametric Estimator of Survival Functions for Arbitrarily Truncated and Censored Data". Lifetime Data Analysis, 4, 187-202. NPMLE (using GP), NPMLE (using EM) and INE for left-truncated and interval-censored data. INE for left-truncated and right-censored data. A NEW SPlus program for INE for left-truncated and right-censored data; it also contains a function that uses the nonparametric bootstrap to calculate point-wise confidence intervals of the survival probabilities.
Pan, W. and Chappell, R. (1998) "Estimating Survival Curves with Left-truncated and Interval-censored Data via the EMS Algorithm". Communications in Statistics -- Theory and Methods, 27, 777-793. EMS estimator for left-truncated and interval-censored data.
Pan, W. and Chappell, R. (1998) "Estimating survival curves with left-truncated and interval-censored data under monotone hazards". Biometrics, 54, 1053-1060. C code for monotone MLE and NPMLE (based on Turnbull's EM) for left-truncated and interval-censored data.