This document is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

The slide set referred to in this document is “GWAS 3”.

We saw previously how a strict significance threshold (like 5e-8) is needed in GWAS in order to avoid reporting false positives (i.e., null variants that reach the threshold). On the other hand, a strict threshold makes it relatively difficult to get even the true effects to reach the threshold. In other words, we tend to have many false negatives.

Next we will study the properties of the variants and the study design that determine how likely we are to catch the true effects: we study statistical power in GWAS. (See slides 1-7.) A review article on the topic by Sham and Purcell 2014.

Statistical power of a statistical significance test is the probability that the test will reject the null hypothesis \(H_0\) (in GWAS, \(H_0\) says that \(\beta=0\)) at the given significance threshold when the data follow a specific alternative hypothesis \(H_1\). In the GWAS setting, \(H_1\) is specified by fixing the study design (total sample size or case and controls counts) and the parameters defining the variant (MAF and effect size).

To compute the P-value, we only needed to consider the null hypothesis \(H_0\). For a power analysis, we also need to define explicitly how the true effects look like, i.e., we need to quantify the alternative hypothesis \(H_1\). Of course, not all true effects are the same, and therefore power analysis is often represented as a power curve over a plausible range of parameter values.

3.1 Test statistic under the alternative

By assuming that the sampling distribution of effect size estimate \(\widehat{\beta}\) is Normal (which works well for large sample sizes and common variants), we have that \(\widehat{\beta} \sim \mathcal{N}(\beta,\textrm{SE}^2)\), where \(\beta\) is the true effect size and SE the standard error of the estimator. As we saw previously, it follows that under the null, i.e. when \(\beta=0\), the Wald’s test statistic \(z = \widehat{\beta}/\textrm{SE}\) is distributed approximately as \(z\sim \mathcal{N}(0,1)\), which result is used for deriving a P-value. The other way to get a P-value is from the chi-square distribution as \(z^2 \sim \chi^2_1\). We will often use the chi-square distribution because then we need to consider only the upper tail of the distribution to compute (the two-sided) P-value, which also makes visualization of the distibution simpler than with the Normal distribution.

What about when \(\beta \neq 0\), that is, when the variant has a non-zero effect? Then \(z\sim \mathcal{N}(\beta/\textrm{SE}, 1)\) and \(z^2 \sim \chi^2_1((\beta/\textrm{SE})^2)\), which is called a chi-square distribution with 1 degree of freedom and non-centrality parameter NCP=\((\beta/\textrm{SE})^2\). When \(\beta=0\) this reduces to the standard central chi-square distribution \(\chi^2_1 = \chi^2_1(0)\).

Example 3.1. Let’s illustrate these distributions by a simulation of GWAS results under the alternative and under the null. To save time, we don’t do regressions with genotype-phenotype data but simulate the effect estimates directly from their known distributions. First, however, we need to find the standard error, and that we do by fitting one regression model. We will visualize the distributions both for the Wald statistic (having a Normal distribution) and for its square (having a chi-square distribution).

n = 500 # individuals
p = 5000 # SNPs for both null and alternative
f = 0.5 # MAF
b.alt = 0.2 # effect size under the alternative hypothesis
x = rbinom(n, 2, f) # genotypes at 1 SNP for n ind 
y = scale( rnorm(n) ) # random phenotype normalized to have sample sd=1
se = summary( lm( y ~ x ) )$coeff[2,2] # pick SE, and assume it stays constant and independent of beta
b.hat.null = rnorm(p, 0, se) # estimates under null
b.hat.alt = rnorm(p, b.alt, se)  # estimates under alternative

# Plot observed densities of z-scores 
plot(NULL, xlim = c(-3,6), ylim = c(0,0.5), xlab = "z", 
     ylab = "density", col = "white") # empty panel for plotting
lines(density( (b.hat.null/se) ), col = "black", lwd = 2) # Wald statistic for null variants
lines(density( (b.hat.alt/se) ), col = "red", lwd = 2) # Wald statistic for alternative variants
# add theoretical densities for z-scores
x.seq = seq(-3, 6, 0.01)
lines(x.seq, dnorm(x.seq, 0, 1), col = "blue", lty = 2) # for null
lines(x.seq, dnorm(x.seq, b.alt/se, 1), col = "orange", lty = 2) # for alternative

# Plot observed densities of z^2 
plot(NULL, xlim = c(0,35), ylim = c(0,1), xlab = expression(z^2), 
     ylab = "density", col = "white") # empty panel for plotting
lines(density( (b.hat.null/se)^2 ), col = "black", lwd = 2) # chi-square stat for null variants
lines(density( (b.hat.alt/se)^2 ), col = "red", lwd = 2) # chi-square stat for alternative variants
# Let's add theoretical densities of the chi-square distributions
x.seq = seq(0, 35, 0.01)
lines(x.seq, dchisq(x.seq, df = 1, ncp = 0), col = "blue", lty = 2) # ncp=0 for null
lines(x.seq, dchisq(x.seq, df = 1, ncp = (b.alt/se)^2), col = "orange", lty = 2) # ncp = (beta/se)^2 for alternative
legend("topright", leg = c("NULL obs'd","ALT obs'd","NULL theor","ALT theor"),
       col = c("black","red","blue","orange"), 
       lty = c(1,1,2,2), lwd = c(2,2,1,1) )
# Let's add significance thresholds corresponding to 0.05 and 5e-8
# By definition, the thresholds are always computed under the null.
q.thresh = qchisq( c(0.05, 5e-8), df = 1, ncp = 0, lower = FALSE)
abline(v = q.thresh, col = c("darkgreen", "springgreen"), lty = 3)
text( q.thresh+2, c(0.4,0.4), c("P<0.05","P<5e-8") )

The theoretical distributions match well with the observed ones so we trust that we now understand the relevant parameters also of the theoretical chi-square distribution. We also see that almost the whole of the alternative distribution is to the right of the significance threhsold of 0.05 but to the left of threshold 5e-8. Thus, with these parameters, we would discover almost all variants at level 0.05 but almost none at the genome-wide significance level of 5e-8.

How to compute the exact proportion of the distribution to the right of a given threshold value?

q.thresh = qchisq(c(0.05,5e-8), df = 1, ncp = 0, lower = FALSE) # signif. thresholds in chi-square units
pchisq(q.thresh, df = 1, ncp = (b.alt/se)^2, lower = FALSE) # corresponding right tail probabilities
## [1] 0.89821524 0.01321279

So we have a probability of 90% to detect a variant at significance threshold 0.05, when effect size is 0.2 SD units of a quantitative phenotype, MAF is 50% and the sample size is 500. This probability is also called statistical power corresponding to the given parameters (significance threshold, \(\beta\), MAF and \(n\)). On the other hand, with these parameters, we only have a power of 1.3% at the genome-wide significance level 5e-8. An interpretation is that we are likely to discover 90 out of 100 variants having the parameters considered at the significance level 0.05 but only about 1 out of 100 at the level 5e-8.

Question. What is “power” under the null? This may be a confusing question since we use concept of power to describe the ability to detect non-zero effects, but, technically, we can interpret this question as asking for the probability of getting a significant result when the null hypothesis holds. And, by definition, this probability is the significance threshold \(\alpha\), and does not depend on \(n\) or MAF. Consequently, the power to detect any non-zero effect can never be less than \(\alpha\) and will get close to \(\alpha\) for tiny effects (\(\beta \approx 0\)) which are almost indistinguishable from 0.

Typically, we would like to design studies to have a good power (say at least 80%) to detect the types of variants that we are interested in. How can we do that?

3.2 Ingredients of power

The parameters affecting power are

  1. Sample size \(n\); increasing sample size increases power because it increases accuracy of effect size estimate.

  2. Effect size \(\beta\); increasing the absolute value of effect size increases power because it increases difference from the null model.

  3. Minor allele frequency \(f\); increasing MAF increases power because it increases accuracy of effect size estimate.

  4. Significance threshold \(\alpha\); increasing the threshold increases power because larger significance thresholds are easier to achieve.

  5. In case-control studies, the proportion of cases \(\phi\); moving \(\phi\) closer to 0.5 increases power because it increases accuracy of effect size estimate.

We’ll soon discuss intuition why each of these parameters affects power. But let’s first write down how these parameters and power are tied together.

For a given significance level \(\alpha\), power is determined by the non-centrality parameter NCP\(=(\beta/\textrm{SE})^2\) of the chi-square distribution. The mean of the distribution is 1+NCP; hence the larger the NCP, the larger the power, because the distribution moves to right with increasing NCP. We see that the NCP increases with \(\beta^2\) and this explains why increasing \(|\beta|\) increases power. We also see that the NCP increases as SE decreases and therefore we need to know how SE depends on \(n\), \(f\) and \(\phi\).

3.2.1 Formulas for standard errors

For the linear model \[y = \mu + x\beta + \varepsilon,\] SE of \(\widehat{\beta}\) is \[\textrm{SE}_\textrm{lin} = \frac{\sigma}{\sqrt{n \textrm{Var}(x)}} \approx \frac{\sigma}{\sqrt{2n f (1-f)}},\] where the variance of genotype \(x\) is, under Hardy-Weinberg equilibrium, approximately \(2f(1-f)\), and \(\sigma\) is the standard deviation of the error term \(\varepsilon\): \(\sigma^2 = \textrm{Var}(y) - \beta^2 \textrm{Var}(x).\) This form for SE is a direct consequence of the variance estimate of the mean-centered linear model: \(\textrm{Var}(\widehat{\beta})=\sigma^2/ \sum_{i=1}^n x_i^2 .\)

In a typical QT-GWAS, the effects of variants on the total phenotypic variance are small (< 1%) and then we can assume that the error variance \(\sigma^2 \approx \textrm{Var}(y),\) which is 1 if the phenotype is processed by quantile normalization or scaling of the residuals after regressing out the covariate effects.

For binary case-control data analyzed by logistic regression, \[\textrm{SE}_\textrm{bin} \approx \frac{1}{\sqrt{n \textrm{Var}(x) \phi (1-\phi)}} \approx \frac{1}{\sqrt{2 n f (1-f) \phi (1-\phi)}}.\] Thus, the difference to the linear model formula is that \(\sigma\) is replaced by 1 and \(n\) is replaced by an effective sample size \(n \phi (1-\phi).\) For derivation, see Appendix A of Vukcevic et al. 2012. Note: often the effective sample size is defined as \(4 n \phi (1-\phi)\), because that quantity tells that what would be the sample size in a hypothetical study that has equal number of cases and controls and whose power matches the power of our study.

Smaller SE means higher precision of the effect size estimate. Both of the formulas show how SE decreases with increasing sample size \(n\), with increasing MAF \(f\) and, for binary data, SE decreases as \(\phi\) gets closer to 0.5. These formulas work well for typical GWAS settings, but may not hold when some parameter (\(n\) or \(f\) or \(\phi(1-\phi)\)) gets close to zero. To know exactly when the formulas start to break down, it is best to do simulations. In particular, the formula may not be good for the rare variant testing (\(f<0.001\)).

Now we can write down the NCPs of additive GWAS models as \[\textrm{NCP}_\textrm{lin} = (\beta/\textrm{SE}_\textrm{lin})^2 \approx 2 f (1-f) n \beta^2/\sigma^2 \qquad \textrm{ and } \qquad \textrm{NCP}_\textrm{bin} = (\beta/\textrm{SE}_\textrm{bin})^2 \approx 2 f (1-f) n \phi(1-\phi) \beta^2.\]

Let’s next discuss how and why each parameter affects the NCPs, and hence power, the way it does.

3.2.2 Sample size

Out of the parameters affecting power, the sample size is most clearly under the control of study design. Therefore, it is the primary parameter by which we can design studies of sufficient power.

Increasing \(n\) decreases SE in the regression models in proportion to \(1/\sqrt{n}\). In the GWAS context, we can think that larger \(n\) leads to more accurate estimate of the phenotypic means in each of the genotype classes. Therefore, as \(n\) grows, we also have more accurate estimate of the phenotypic difference between the genotype classes, which means that we are better able to distinguish a true phenotypic difference between genotype groups from zero. In other words, we have a larger power to detect a genotype’s effect on the phenotype.

Example 3.2. Above we saw that with \(n=500\) (and MAF = 0.5, \(\beta=0.2\)) we had only 1% power at significance level \(\alpha=\) 5e-8. Let’s determine how large \(n\) should be to achieve 90% power.

f = 0.5
b.alt = 0.2
sigma = sqrt(1 - 2*f*(1-f)*b.alt^2) # error sd after SNP effect is accounted for (see next part for explanation)
ns = seq(500, 4000, 10) # candidate values for n
ses = sigma/sqrt(ns*2*f*(1-f)) # SEs corresponding to each candidate n
q.thresh = qchisq(5e-8, df = 1, ncp = 0, lower = F) # chi-sqr threshold corresponding to alpha = 5e-8
pwr = pchisq(q.thresh, df = 1, ncp=(b.alt/ses)^2, lower=F) # power at alpha = 5e-8 for VECTOR of SE values
plot(ns, pwr, col = "darkgreen", xlab = "n", ylab = "power", 
     main = paste0("QT sd=1; MAF=",f,"; beta=",b.alt), t = "l", lwd = 1.5)
abline(h = 0.9, lty = 2)

# Let's output the first n that gives power >= 90%
ns[min(which(pwr >= 0.9))]
## [1] 2230

So, we need \(n = 2230\) in order to have power of 90%.

3.2.3 Effect size and variance explained

When effect size is \(\beta\) and MAF is \(f\), then variance explained by the additive effect on the genotype is \(\textrm{Var}(x\beta) = \textrm{Var}(x) \beta^2 \approx 2f(1-f)\beta^2\). When total phenotypic variance of a quantitative trait is 1, then \(2f(1-f)\beta^2\) is also the proportion of the variance explained by the variant. For example, in our ongoing example setting, the variance explained by the variant is

## [1] 0.02

That is, the variant explains 2% of the variation of the phenotype. This is a very large variance explained compared to a typical common variant association with complex traits, such as BMI or height, but is more realistic for some molecular traits, such as metabolite levels, that are less complex genetically and may have larger effects from individual variants.

Example 3.3. What if we wanted to find a suitable \(n\) that gives 90% power for MAF=50% when the variant explained only 0.5% of the phenotype?

f = 0.5
y.explained = 0.005
b.alt = sqrt(y.explained / (2*f*(1-f)) ) # this is beta that explains 0.5%
sigma = sqrt(1 - y.explained) # error sd after SNP effect is accounted for
ns = seq(1000, 12000, 10) # candidate n
ses = sigma / sqrt( ns*2*f*(1-f) ) # SE corresponding to each n
q.thresh = qchisq(5e-8, df = 1, ncp = 0, lower = F) # threshold corresp alpha = 5e-8
pwr = pchisq(q.thresh, df = 1, ncp = (b.alt/ses)^2, lower = F) # power at alpha = 5e-8
plot(ns,pwr, col = "darkgreen", xlab = "n", ylab = "power", 
     main = paste0("QT sd=1; MAF=",f,"; beta=",b.alt), t = "l", lwd = 1.5)
abline( h = 0.9, lty = 2 )

# Let's output n that is the first that gives power >= 90%
ns[min(which(pwr >= 0.9))]
## [1] 9030

So, we needed to multiply the sample size by a factor of ~4. This really makes a difference in practice. It is very different to collect 2230 individuals than to collect 9030. Power calculations are important!

Could we maybe had guessed the factor of 4 without doing the actual power calculation? For a fixed \(\alpha\), power is determined by NCP. Earlier we determined the parameters that give 90% power. If we equate NCP defined by these parameters with an NCP with a new effect size and with an unknown sample size \(n_2\), we can solve for \(n_2.\) Furthermore, as \(f=0.5\) remains constant, it cancels out and we get \[ \frac{2f(1-f)\beta_1^2 n_1}{\sigma_1^2} = \frac{2f(1-f)\beta_2^2 n_2}{\sigma_2^2} \longrightarrow n_2 = \frac{\beta_1^2 n_1 \sigma_2^2}{\beta_2^2 \sigma_1^2} = n_1 \frac{0.2^2}{0.1^2} \frac{1-0.005}{1-0.02} \approx 4.0612\,\cdot n_1. \] We conclude that when the variance explained by the variant is small (say < 2%), and we drop the variance explained by a factor of \(a\), (which is the same as dropping the effect size by a factor of \(\sqrt{a}\)), we must increase the sample size approximately by a factor of \(a\) to maintain constant power. If variance explained by the variant is larger than a few percents, then it will have a bigger effect on the result, but in GWAS, typically, the variance explained remains < 1%.

3.2.4 Minor allele frequency

When other parameters than MAF