On testing genetic association with multiple traits
R package development
- We developed a R package MSKAT for testing the rare variant set association with multiple phenotypes.
- The current implementation requires the davies function in the CompQuadForm R package.
- A variation of the multivariate kernel machine regression (MKMR) is adapted to rare variant set association test.
- Ref: Maity,A., Sullivan,P.F., Tzeng,J. (2012) Multivariate Phenotype Association Analysis by Marker-Set Kernel Machine Regression. Genetic Epidemiology, 36, 686-695.
- MKMR was orginally developed for common variants and relied on resampling to compute significance p-values.
- We wrap the authors' posted R codes (http://www4.stat.ncsu.edu/~maity/software/Multivariate_Response_KMTest.txt) into a R function 'MKMR()', but add the (default) option to more efficiently compute analytical p-values using the Davies method instead of the original resampling approach.
- Two multivariate versions of SKAT are implemented.
- Ref: Wu,B., Pankow,J.S. (2016). Sequence kernel association test of multiple continuous phenotypes. Genetic Epidemiology, 40(2), 91-100.
- The Gene Association with Multiple Traits (GAMuT) is also implemented.
- Ref: Broadaway, K.A., Cutler, D.J., Duncan, R., Moore, J.L., Ware, E.B., Jhun, M.A., Bielak, L.F., Zhao, W., Smith, J.A., Peyser, P.A., Kardia, S.L.R., Ghosh, D., Epstein, M.P. (2016). A Statistical Approach for Testing Cross-Phenotype Effects of Rare Variants. The American Journal of Human Genetics, 98, 525–540.
- We wrap the authors' posted R codes (http://genetics.emory.edu/labs/epstein/software/gamut) into a R function 'gamut()'.
- We also develop an alternative implementation wrapped into a R function 'fgamut()' that is several orders of magnitude faster than gamut().
- MSKAT R packages
Several comments
1. MSKAT vs GAMuT
- GAMuT with the project matrix kernel roughly corresponds to the \(Q\) statistic in MSKAT.
- GAMuT with the linear kernel roughly corresponds to the \(Q'\) statistic in MSKAT.
2. Computing the p-value for variant set association test
- Reduce to calculating the tail probability for weighted sum of 1-DF
chi-square distributions.
- Default computation approaches implemented in GAMuT do not have accurate results at small significance levels.
- GAMuT has very liberal type I errors at extreme significance levels.
- We adopt a numerical approach to accurately computing the analytical p-values for variant set association test.
- The implementation requires the davies function in the CompQuadForm R package.
- main idea: using the Davies method with more stringent convergence criterion. Otherwise it could lead to very liberal results.
- Ref: Wu,B., Guan,W., Pankow,J.S. (2016) On efficient and accurate calculation of significance p-values for sequence kernel association test of variant set. Annals of human genetics, 80(2), 123-135.
3. Testing general (non-continuous) outcomes
- Using arguments of CLT, we can roughly treat general outcomes as continuous and apply the multi-trait testing methods. Then the test statistics and their 1-DF chi-square null distributions asymptotically hold.
- The accuracy of asymptotic approximation may not be very accurate
for finite sample sizes. For example, for binary data
- The Bernoulli variance is \(\theta(1-\theta)\) where \(\theta\) is event probabilities.
- In GAMuT, we approximate Bernoulli variance with constant normal variance.
- For binary events with a narrow range of \(\theta\), GAMuT approximation could work well.
- Binary data with extreme event probabilities \(\theta\) could show a wide range of variations. Type I error inflation could occur.
- Generally we need to be cautious when applying the method to non-continuous
outcomes.
- More computing intensive resampling/permutation p-values should be computed for verification.
4. Effects of adjusting for covariates
- The modeling framework of MSKAT is a more unified approach to naturally accounting/adjusting for (especially ancestry) covariates.
- As a result, MSKAT could have better finite sample performance than GAMuT: it
appropriately accounts for the estimation of covariates when
computing the asymptotic distributions.
- GAMuT typically uses unadjusted genotype covariance matrix.
- MSKAT appropriately uses (covariate) adjusted genotype covariance matrix, which is generally smaller than the unadjusted covariance matrix (i.e., smaller eigen-values).
- So MSKAT will have smaller p-values than GAMuT under population
stratification.
- MSKAT offers more improvements under strong population differences.
- Computationally MSKAT is much more efficient than GAMuT and scales very well to genome and exome-wide analyses.
Simulate rare variant set
- Simulate variants in a selected gene region based on the multivariate normal distribution (MVN) with compound symmetry covariance matrix \(\Sigma\).
- Assume marginal variance is one and the pairwise correlation is \(\rho\): \(\Sigma_{jj}=1, \Sigma_{kj}=\rho, \forall k\neq j\).
- Assume marginal MAFs \((\theta_1,\ldots,\theta_m)\) for the \(m\) variants.
- compute quantiles \(q_j=\Phi^{-1}(\theta_j)\).
- simulate \(Z_1\sim N(0,\Sigma)\) and \(Z_2\sim N(0,\Sigma)\) (independently).
- \(Z_1=(z_{11},\ldots,z_{1m}), Z_2=(z_{21},\ldots,z_{2m})\)
- construct the j-th variant genotype as \(g_j = I(z_{1j}\lt q_j)+I(z_{2j}\lt q_j)\).
- Typical approach is to applying SVD on \(\Sigma\) to compute
\(\Sigma^{0.5}\), which is then multiplied by independent standard normal random variables to simulate
\(Z\) from \(N(0,\Sigma)\).
- not scalable to large variant set and sample size.
- We implement a R function to very quickly simulate variants as follows.
- Simulate iid \((x_1,\ldots,x_m,x_{m+1})\) from standard normal \(N(0,1)\).
- Set \(z_j=\sqrt{1-\rho}x_j+\sqrt{\rho}x_{m+1}\), and threshold them at the quantiles.
- We can easily verify that \(var(z_j)=1, cov(z_j,z_l)=\rho, \forall j\neq l\).
- We can easily compute LD for any given variant MAFs.
- Generally for rare variants with small MAFs, the LD is small even for relatively large \(\rho\).
- For example, for \(\rho=0.5\), and \(\theta_j=0.01\), we compute LD as follows.
- let \(q_1=\Phi^{-1}(0.01)=-2.326\), compute \(\Pr(z_1\leq q_1, z_2\leq q_1)=0.00129\) (using the mvtnorm R package)
- so \(corr[I(z_1\leq q_1), I(z_2\leq q_1)] = (0.00129-0.01^2)/0.01/0.99=0.121\), and hence LD=0.015.
- With MAF=0.01, we need to have \(\rho=0.755\) to reach pairwise LD=0.1.
- For convenience, we implement a R function to compute pairwise LD based on the input \(\rho\) and MAFs \((\theta_j,\theta_l)\).
## Simulate variant set ## m number of variants ## rho compound symmetry correlation ## N population size to be simulated ## maf0,maf1 lower and upper bound for the MAF ## return an Nxm genotype matrix (assuming HWE) SimVS <- function(m=16, rho=0.5, N=2e5, maf0=0.001,maf1=0.01){ ## G sim MAFs = runif(m, maf0,maf1) q1 = qnorm(MAFs) Z1 = matrix(rnorm(N*m),N,m)*sqrt(1-rho) + rnorm(N)*sqrt(rho) Z2 = matrix(rnorm(N*m),N,m)*sqrt(1-rho) + rnorm(N)*sqrt(rho) G = t( 1*(t(Z1)<q1) + 1*(t(Z2)<q1) ) return(G) } ## compute pairwise LD ## rho compound symmetry correlation ## maf MAFs of the two variants ## return the computed LD (squared correlation) library(mvtnorm) LD <- function(rho=0.5,maf=c(0.01,0.005)){ q = qnorm(maf) sigs = matrix(rho,2,2); diag(sigs) = 1 pr = pmvnorm(rep(-Inf,2),q, sigma=sigs) a1 = (pr-maf[1]*maf[2])/sqrt(maf[1]*(1-maf[1]))/sqrt(maf[2]*(1-maf[2])) return(a1^2) } ## some example rho = seq(0,0.95, length=20) LDs = rep(0,20) for(i in 1:20){ LDs[i] = LD(rho[i], rep(0.01,2)) } plot(rho, LDs, type='b', xlab=expression(rho), ylab='LD') ## reverse solving rho for a given LD uniroot(function(x) LD(x,rep(0.01,2))-0.1, c(0,0.95))$root
Simulation study
On comment 1 and 2
Consider 1000 unrelated individuals with three outcomes: two continuous and one binary
- Two covariates
- one binary indicator \(X_1\) with event probability of 0.5.
- one continuous covariate \(X_2\) simulated from the standard normal dist.
- Traits simulated as follows
- simulate three continuous traits \((Z_1,Z_2,Z_3)\) from MVN
- means equal to \(1+0.5(X_1+X_2)\)
- marginal unit variance and pairwise correlation \(\gamma=0.25\)
- the last binary trait generated as follows
- given \(Z_3-1-0.5(X_1+X_2)=\epsilon\sim N(0,1)\), let \(p=\Phi(\epsilon), \eta = \log\frac{p}{1-p}\)
- define \(D=1\) if \(\eta+0.5(X_1+X_2) >0\) and \(D=0\) otherwise.
- can easily verify the marginal logistic model: \(\mbox{logit}\Pr(D=1)=0.5(X_1+X_2)\).
- simulate three continuous traits \((Z_1,Z_2,Z_3)\) from MVN
- Two covariates
- We simulate a set of 40 variants using the previous approach.
- MAFs \(\{\theta_j\}\) uniformly simulated from interval \((0.001,0.01)\)
- MVN correlation \(\theta=0.5\)
- Variants are equally weighted (i.e., Beta dist based weights with two parameters equal to 1).
- Simulation results
- Conduct \(4\times 10^8\) null simulations to get stable results
- Consider significance levels \(\alpha=10^{-2},\ldots,10^{-6},10^{-7},5\times 10^{-8}\).
- Results reported as the inflation ratio, defined as the empirical type I errors divided by the
nominal significance level \(\alpha\)
- value larger than 1 means inflated type I errors.
Compare three sets of methods
- Original GAMuT implementations with linear and projection kernels.
- "Corrected GAMuT": the same GAMuT test statistics, the only difference is the modified p-value calculation by using the numerical algorithm of Wu et al. (2016).
- MSKAT statistics Q and Q'.
\(\alpha\) 10-2 10-3 10-4 10-5 10-6 10-7 5x10-8 GAMuT Proj 0.81 0.70 0.60 0.43 0.47 4.69 9.38 GAMuT Linear 0.81 0.71 0.62 0.48 0.32 3.13 6.26 GAMuT Proj Corrected 0.81 0.70 0.61 0.52 0.39 0.20 0.15 GAMuT Linear Corrected 0.81 0.71 0.63 0.56 0.47 0.20 0.20 MSKAT Q 0.80 0.69 0.59 0.51 0.39 0.17 0.15 MSKAT Q' 0.80 0.70 0.62 0.55 0.46 0.20 0.20 - Some conclusions
- GAMuT has severely inflated type I errors for small significance levels of 10-7 and 5x10-8.
- MSKAT always properly controlled the type I errors.
- "Corrected GAMuT" has very similar results as MSKAT.
- linear/projection kernels correspond to Q' and Q respectively.
- "Corrected GAMuT" vs original GAMuT
- very similar results at large \(\alpha\) levels (10-2,10-3,10-4)
- the original GAMuT has conservative p-values at \(\alpha=10^{-5}\), but severely inflated p-values at \(\alpha=10^{-7},5\times 10^{-8}\).
- The following simulation results for GAMuT are based on the "Corrected GAMuT".
On comment 3
- Consider \(n\) unrelated individuals with four outcomes: two continuous and two binary
- Two covariates
- one binary indicator \(X_1\) with event probability of 0.5.
- one positive covariate \(X_2\) simulated from the log normal dist.
- Traits simulated as follows
- simulate four continuous traits from MVN
- means equal to \(1+X_1+X_2\)
- marginal unit variance and pairwise correlation \(\gamma=0.25\)
- dichotomize the last two traits at their median to produce the binary traits.
- simulate four continuous traits from MVN
- Simulate a set of 40 variants using the previous approach.
- MAFs \(\{\theta_j\}\) uniformly simulated from interval \((0.001,0.01)\)
- MVN correlation \(\theta=0.5\)
- Two covariates
- Simulation results for GAMuT
- Conduct \(4\times 10^7\) null simulations for \(n=1000,2000,5000\)
- Consider significance levels \(\alpha=10^{-2},\ldots,10^{-5},2.5\times 10^{-6}\).
- Results reported as the inflation ratio (empirical type I errors divided by the
nominal significance level \(\alpha\))
- value larger than 1 means inflated type I errors.
Results for n=1000
\(\alpha\) 10-2 10-3 10-4 10-5 2.5x10-6 10-6 GAMuT Proj 0.95 1.12 1.91 4.76 8.81 14.2 GAMuT Linear 0.94 1.21 2.31 6.03 11.5 17.9 MSKAT Q 0.94 1.15 2.13 5.97 12.7 21.6 MSKAT Q' 0.94 1.25 2.58 7.58 15.7 26.9 Results for n=2000
\(\alpha\) 10-2 10-3 10-4 10-5 2.5x10-6 10-6 GAMuT Proj 1.02 1.21 1.97 5.01 10.3 16.7 GAMuT Linear 1.02 1.30 2.32 6.10 12.2 20.1 MSKAT Q 1.01 1.22 2.05 5.51 11.8 19.5 MSKAT Q' 1.02 1.31 2.41 6.69 13.6 23.2 Results for n=5000
\(\alpha\) 10-2 10-3 10-4 10-5 2.5x10-6 10-6 GAMuT Proj 1.04 1.17 1.63 3.39 6.52 10.1 GAMuT Linear 1.05 1.23 1.83 4.01 7.37 11.3 MSKAT Q 1.04 1.17 1.64 3.47 6.74 10.5 MSKAT Q' 1.05 1.23 1.85 4.10 7.69 11.9 - Conclusions
- GAMuT only controlled type I errors well at the 0.01 level.
- GAMuT has severely inflated type I errors at small \(\alpha\) levels.
- inflation generally got worse at smaller significance levels.
- For n=1000 and n=2000, we got comparable inflations.
- With large sample size n=5000, type I errors were still
severely inflated (though reduced).
- might need really huge sample sizes for the CLT to kick in to produce accurate p-values.
- worthwhile to check/verify small p-values by resampling/permutations.
- Inflation is mainly due to the two binary traits (non-constant variances).
On comment 4
- Consider 1000 unrelated individuals with four continuous outcomes
- Two covariates
- one binary indicator \(X_1\) with event probability of 0.5.
- one continuous covariate \(X_2\) simulated from the standard normal dist.
- Traits simulated as four continuous traits from MVN
- marginal unit variance and pairwise correlation \(\gamma=0.25\)
- means equal to \(1+X_1+X_2+\sum_{j\in A}\beta_jg_j\)
- \(g_j\) is the genotype for variant \(j\), \(\beta_j\) is the effect size, and \(A\) is the set of causal variants.
- causal variant set A could be different across traits.
- Simulate a set of \(m\) variants using the previous approach.
- MVN correlation \(\theta=0.5\)
- Two covariates
- Power simulation 1
- set MAFs as follows
- simulate \((\theta_1,\ldots,\theta_m)\) uniformly from interval \((0.001,0.005)\).
- set MAF as \(\theta_j+0.01X_1\) for variant j.
- here we have population stratification.
- variants equally weighted.
- Conduct \(10^5\) simulations for \(m=20,40,80\).
- For each trait, we randomly select 4 variants as causal with effect size \(\beta_j=0.5\).
- Consider significance levels \(\alpha=10^{-5},2.5\times 10^{-6},10^{-6},10^{-7}\).
Results for \(m=80\)
\(\alpha\) 10-5 2.5x10-6 10-6 10-7 Q 0.446 0.345 0.286 0.168 GAMuT Proj 0.406 0.306 0.250 0.141 Q' 0.544 0.438 0.375 0.237 GAMuT Linear 0.509 0.403 0.339 0.206 Results for \(m=40\)
\(\alpha\) 10-5 2.5x10-6 10-6 10-7 Q 0.590 0.485 0.419 0.273 GAMuT Proj 0.556 0.448 0.382 0.239 Q' 0.649 0.546 0.479 0.325 GAMuT Linear 0.618 0.512 0.444 0.291 Results for \(m=20\)
\(\alpha\) 10-5 2.5x10-6 10-6 10-7 Q 0.755 0.668 0.607 0.454 GAMuT Proj 0.734 0.641 0.577 0.420 Q' 0.780 0.693 0.633 0.474 GAMuT Linear 0.760 0.668 0.605 0.442
- set MAFs as follows
- Power simulation 2
- set MAFs as follows
- simulate \((\theta_1,\ldots,\theta_m)\) uniformly from interval \((0.001,0.002)\).
- set MAF as \(\theta_j+0.02X_1\) for variant j.
- here we have stronger population stratification.
- variants equally weighted.
- Conduct \(10^5\) simulations for \(m=20,40,80\).
- For each trait, we randomly select 4 variants as causal with effect size \(\beta_j=0.4\).
- Consider significance levels \(\alpha=10^{-5},2.5\times 10^{-6},10^{-6},10^{-7}\).
Results for \(m=80\)
\(\alpha\) 10-5 2.5x10-6 10-6 10-7 Q 0.457 0.354 0.293 0.172 GAMuT Proj 0.359 0.261 0.206 0.105 Q' 0.596 0.491 0.423 0.268 GAMuT Linear 0.507 0.397 0.329 0.193 Results for \(m=40\)
\(\alpha\) 10-5 2.5x10-6 10-6 10-7 Q 0.574 0.462 0.393 0.252 GAMuT Proj 0.475 0.359 0.300 0.174 Q' 0.678 0.573 0.504 0.343 GAMuT Linear 0.604 0.484 0.410 0.256 Results for \(m=20\)
\(\alpha\) 10-5 2.5x10-6 10-6 10-7 Q 0.732 0.635 0.573 0.409 GAMuT Proj 0.657 0.556 0.485 0.320 Q' 0.799 0.707 0.642 0.485 GAMuT Linear 0.740 0.635 0.566 0.397 - Conclusions
- MSKAT generally has larger power than GMAuT.
- more so under stronger population differences and for larger variant sets (in terms of percentage of increase).
- This is because GMAuT uses an unadjusted genotype covariance, while MSKAT properly accounts for adjusting for covariate effects and uses an adjusted genotype covariance matrix.
- MSKAT generally has larger power than GMAuT.
- set MAFs as follows
Simulation R codes
On comment 1,2: GAMuT vs MSKAT
- Following the approach of MSKAT (Wu and Pankow, 2016) using the multiple linear regression modeling based score test, we developed a very fast implementation of GAMuT, called fgamut(), which will reproduce the p-values for the GAMuT with linear and projection kernels and also output the corresponding corrected p-values using the approach of Wu et al. (2016).
- We verify that fgamut() reproduces the results from gamut() (adapted from authors'
posted R codes).
- Sometimes we see very small differences: due to numerical rounding differences of using eigen() and svd() to compute eigen values.
- Generally fgamut() is several orders of magnitude faster than
gamut().
- especially so for large sample size and variant set.
- The gamut() occasionally gives aberrant results for very small p-values. fgamut() produces much more accurate results than gamut() at small p-values.
- MSKAT generally produces smaller p-values than (corrected) GAMuT under population stratification.
library(CompQuadForm) library(MSKAT) n=5e3; K=4; p = 4; m = 100 X = matrix(rnorm(n*p), n,p); Z = rbinom(n,1,0.5); XZ = cbind(X,Z) ## simulate genotype from MVN rho = 0.5; maf1 = 0.005; maf2 = 0.015 q1 = qnorm(maf1); q2 = qnorm(maf2) qz = q1 + Z*(q2-q1) G1 = matrix(rnorm(n*m), n,m)*sqrt(1-rho) + rnorm(n)*sqrt(rho) G2 = matrix(rnorm(n*m), n,m)*sqrt(1-rho) + rnorm(n)*sqrt(rho) G = 1*(G1<qz) + 1*(G2<qz) yrho = 0.25 Y = matrix(rnorm(n*K), n,K)*sqrt(1-yrho) + rnorm(n)*sqrt(yrho) Y[,1] = Y[,1] + rowSums(XZ) - rowSums(G[,1:3])*0.5 + G[,m]*0.7 Y[,2] = Y[,2] + rowSums(XZ) + rowSums(G[,c(1,4,5)])*0.5 - G[,m-1]*0.6 ## weighted test MSKAT(MSKAT.cnull(Y,XZ), G, W.beta=c(1,25)) fgamut(Y,G,XZ, W.beta=c(1,25)) gamut(Y,G,XZ, W.beta=c(1,25)) ## unweighted test MSKAT(MSKAT.cnull(Y,XZ), G, W.beta=c(1,1)) fgamut(Y,G,XZ, W.beta=c(1,1)) gamut(Y,G,XZ, W.beta=c(1,1)) ## timing system.time( gamut(Y,G,XZ, W.beta=c(1,25)) ) system.time( fgamut(Y,G,XZ, W.beta=c(1,25)) )
- Simulation function to verify type I errors
library(CompQuadForm) library(MSKAT) SimSize2 = function(B=1e3, n=1e3, yrho=0.25, grho=0.5, m=40, MAF0=0.001,MAF1=0.01){ Pval = matrix(1, B,6) for(ib in 1:B){ Xd = rnorm(n); Xg = rbinom(n,1,0.5); Xs = cbind(Xd,Xg) maf.1 = runif(m, MAF0,MAF1) qz = qnorm(maf.1) ## rho1 =sqrt(1-grho); rho2 = sqrt(grho) Z1 = matrix(rnorm(n*m), n,m)*rho1 + rnorm(n)*rho2 Z2 = matrix(rnorm(n*m), n,m)*rho1 + rnorm(n)*rho2 G = 1*(Z1<qz) + 1*(Z2<qz) ## eta = 1+0.5*(Xd+Xg) rho1 = sqrt(1-yrho); rho2 = sqrt(yrho) tmp = matrix(rnorm(n*3),n,3)*rho1+rnorm(n)*rho2 Ys = tmp[,1:2] + eta pr = pnorm(tmp[,3]) Yu = eta-1 + log(pr/(1-pr)) D = I(Yu>0) ## Y = cbind(Ys,D) pval1 = MSKAT(MSKAT.cnull(Y,Xs), G, W.beta=c(1,1)) pval2 = fgamut(Y,G,Xs, W.beta=c(1,1)) Pval[ib,] = c(pval1, pval2) } return(Pval) } res = SimSize2(1e3, n=1e3,yrho=0.25, grho=0.5, m=40) colMeans(res<1e-2)
On comment 3
- Type I error with mixed traits.
MixSize = function(B=1e2, n=1e3, yrho=0.25, grho=0.5, m=40, MAF0=0.001,MAF1=0.01){ ## sim data Pval = matrix(1, B,6) for(ib in 1:B){ Xd = exp(rnorm(n)) Xg = rbinom(n,1,0.5); Xs = cbind(Xd,Xg) maf.1 = runif(m, MAF0,MAF1) qz = qnorm(maf.1) ## rho1 =sqrt(1-grho); rho2 = sqrt(grho) Z1 = matrix(rnorm(n*m), n,m)*rho1 + rnorm(n)*rho2 Z2 = matrix(rnorm(n*m), n,m)*rho1 + rnorm(n)*rho2 G = 1*(Z1<qz) + 1*(Z2<qz) ## outcome sim eta = 1+Xd+Xg rho1 = sqrt(1-yrho); rho2 = sqrt(yrho) tmp = matrix(rnorm(n*4),n,4)*rho1+rnorm(n)*rho2 Yc = tmp[,1:2] + eta Yd = tmp[,3:4] + eta D = apply(Yd, 2, function(a) 1*(a>median(a))) ## Y = cbind(Yc,D) pval1 = MSKAT(MSKAT.cnull(Y,Xs), G, W.beta=c(1,1)) pval2 = fgamut(Y,G,Xs, W.beta=c(1,1)) Pval[ib,] = c(pval1, pval2) } return(Pval) } res1 = MixSize(1e3, n=1e3, yrho=0.25, grho=0.5, m=40, MAF0=0.001,MAF1=0.01) colMeans(res1<1e-2) res2 = MixSize(1e3, n=2e3, yrho=0.25, grho=0.5, m=40, MAF0=0.001,MAF1=0.01) colMeans(res2<1e-2) res5 = MixSize(1e3, n=5e3, yrho=0.25, grho=0.5, m=40, MAF0=0.001,MAF1=0.01) colMeans(res5<1e-2)
On comment 4
- Power comparison under population stratification.
library(CompQuadForm) library(MSKAT) PwrSim = function(B=1e2, n=1e3, yrho=0.25, grho=0.5, m=50, MAF0=0.001,MAF1=0.005, gps=0.01){ ## sim data Pval = matrix(1, B,6) for(ib in 1:B){ Xd = rnorm(n) Xg = rbinom(n,1,0.5); Xs = cbind(Xd,Xg) maf.1 = runif(m, MAF0,MAF1) maf.2 = outer(Xg*gps, maf.1, '+') qz = qnorm(maf.2) ## rho1 =sqrt(1-grho); rho2 = sqrt(grho) Z1 = matrix(rnorm(n*m), n,m)*rho1 + rnorm(n)*rho2 Z2 = matrix(rnorm(n*m), n,m)*rho1 + rnorm(n)*rho2 G = 1*(Z1<qz) + 1*(Z2<qz) ## outcome sim eta = 1+Xd+Xg rho1 = sqrt(1-yrho); rho2 = sqrt(yrho) tmp = matrix(rnorm(n*4),n,4)*rho1+rnorm(n)*rho2 Y = tmp + eta Y[,1] = Y[,1] + rowSums(G[,sample(1:m, size=4, rep=FALSE)])*0.5 Y[,2] = Y[,2] + rowSums(G[,sample(1:m, size=4, rep=FALSE)])*0.5 Y[,3] = Y[,3] + rowSums(G[,sample(1:m, size=4, rep=FALSE)])*0.5 Y[,4] = Y[,4] + rowSums(G[,sample(1:m, size=4, rep=FALSE)])*0.5 ## pval1 = MSKAT(MSKAT.cnull(Y,Xs), G, W.beta=c(1,1)) pval2 = fgamut(Y,G,Xs, W.beta=c(1,1)) Pval[ib,] = c(pval1, pval2) } return(Pval) } res = PwrSim(1e2, n=1e3, yrho=0.25, grho=0.5, m=40, MAF0=0.001,MAF1=0.005, gps=0.01) colMeans(res<1e-6)