This document is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

The slide set referred to in this document is “GWAS 9”.

Consider GWAS data on \(n\)
individuals and \(p\) SNPs.
**GWAS summary statistics** can include a varying
combination of the following information, for each variant \(l\), or region \(\textrm{reg},\)

**Association statistics**\(A_l = \left(\text{EA}_l, \widehat{\beta}_{l}, \textrm{SE}_{l}, P_{l} \right)\) where EA is the effect allele for which \(\widehat{\beta}_{l}\) is reported. These statistics are of size \(4p\), and hence their proportion compared to the whole data is \(4/n\).**Information statistics**\(I_l = (\textrm{EAF}_l, \textrm{INFO}_l, \textrm{QC}_l,\ldots),\) where EAF is the effect allele frequency (sometimes given only in controls), INFO is an imputation information score, and QC includes quality control measures, such as Hardy-Weinberg P-value and missingness rate of the genotype calls. Proportion to raw data is around \(10/n\) (assuming 10 pieces of information per variant).**LD-matrix**\(R_{\textrm{reg}}\) for certain region(s) of the genome. For a region of size \(p_{\text{reg}}\), the size of \(R_{\textrm{reg}}\) is about \(\frac{1}{2}p_{\text{reg}}^2\), whereas that of the raw data is \(n p_{\text{reg}}\). Thus, the ratio is \(p_{\text{reg}}/(2n)\).

When \(n\) is of order \(10^5\), the summary statistics take only a tiny fraction of the space required by the raw data. Additionally, raw genotype-phenotye data are sensitive, personal data and cannot be shared freely, whereas usually there is no legal restrictions for sharing the GWAS summary statistics. For these reasons, large consortia, such as GIANT or CARDIoGRAMplusC4D, are distributing their results as summary statistics. Therefore there is a need for methods that can further analyze the summary statistics, e.g., in fine-mapping, in imputation, in heritability estimation or in gene-level testing. The utilization of summary statistics is reviewed by Pasaniuc & Price 2017.

**Example 9.1.** Reminder how the central association
statistics are related to each other.

If we are given

`b.est=`

\(\widehat{\beta}\) and`se=`

\(\textrm{SE}\), we can compute the P-value as`pchisq((b.est/se)^2, df=1, lower=F)`

.If we are given \(\widehat{\beta}\) and P-value

`pval`

, we can compute SE as`sqrt(b.est^2 / qchisq(pval, df=1, lower=F))`

.If we are given SE and P-value, we can compute

`b.est=`

\(\widehat{\beta}\) for the trait increasing allele as`sqrt(se^2*qchisq(pval, df=1, lower=F))`

. (We need to know which allele is the trait increasing from external information.)Assuming that no (strong) covariates have been applied, we can further infer one of sample size, MAF or case-proportion from SE given that the other two are known, according to the formulas in GWAS3. We can also estimate an SE from these study parameters, and use it with a known effect estimate to derive a P-value.

In these notes, we go through how the association summary statistics
are produced through *meta-analysis* of individual GWAS and how
some analyses, that we have so far done from raw data, could be done
using only the summary data. We end with an introduction to polygenic
scores that utilize summary statistics for prediction of phenotypes from
genetic data.

Suppose we have results from \(K\) independent GWAS on the same phenotype. (Here independent means that the samples of the GWAS are not overlapping.) Hence, for each variant \(l\), we have \(K\) sets of GWAS association statistics \(A_{kl} = \left(\widehat{\beta}_{kl}, \textrm{SE}_{kl}, P_{kl} \right).\) How could we combine these \(K\) pieces of information into a single combined estimate of the effect size, SE and P-value?

A combination of summary-level results from two or more studies into
a single estimate is called **meta-analysis** (“meta”
refers to something happening at a higher level, meta-analysis is an
“analysis of analyses”). A review of meta-analysis in GWAS by Evangelou & Ioannidis
2013.

In practice, all large GWAS are nowadays meta-analyses carried out by international consortia and a consortium may contain even over hundred individual studies. Often each study runs a GWAS and shares the summary statistics with a centralized analysis group that carries out the meta-analysis. While this approach avoids sharing sensitive individual-level genotype-phenotype data and also operates only with the light-weight summary data, it also restricts the set of possible downstream analyses, since only the marginal additive effect estimates are available. It has become clear that in order to maximize the scientific output from the consortia efforts, future meta-analyses should be designed so that all raw data will be collected in a single place. Unfortunately, this is not always easy because of legal issues related to the sharing of genotype-phenotype data.

Let’s get back to the question how do we combine \(K\) sets of GWAS association statistics on same (or at least similar) phenotype: \(A_{kl} = \left(\widehat{\beta}_{kl}, \textrm{SE}_{kl}, P_{kl} \right)\). The answer depends on what we assume about the possible variation between the true underlying effects \(\beta_{kl}\) for \(k=1,\ldots,K\).

The most common assumption is that all studies are measuring the same
underlying quantity, i.e., \(\beta_{1l}=\ldots
= \beta_{Kl}\) and there are no (noticeable) differences in
phenotype definitions and no distinct biases between the studies. This
is called the fixed-effects model because the effect size is assumed to
be the same across the studies. In this case, the statistically most
efficient unbiased linear estimator of the common effect size \(\beta\) is the **inverse-variance
weighted (IVW)** estimator (here denoted by F, Fixed-effect
estimator):

\[\begin{align} \widehat{\beta}_{l,F} &= \frac{w_{1l}\widehat{\beta}_{1l} + \ldots + w_{Kl}\widehat{\beta}_{Kl}}{w_{1l}+\ldots +w_{Kl}}\\ \text{SE}_{l,F} &= \left(w_{1l} + \ldots + w_{Kl} \right)^{-\frac{1}{2}},\quad \text{where the weight}\\ w_{kl}&= \frac{1}{\text{SE}_{kl}^2} \text{ is the inverse-variance of study } k. \end{align}\]

In statistics, the inverse of the variance is called
**precision**. With that in mind, the above formulas are
easy to remember:

The weight given to each study in IVW estimator is proportional to the precision of the study and the weights sum to 1.

The precision of the IVW estimator is the sum of the precisions of the individual studies.

**Efficiency of IVW.** It sounds intuitively reasonable
to weight each estimate by its precision, but what is the mathematical
argument behind this? Let’s consider the case of two studies and assume
that both yield unbiased estimates of the common effect size \(\beta\), with precisions \(w_i=1/\text{SE}_i^2\) for \(i=1,2\). This means that \[\text{E}\left(\widehat{\beta}_i\, |\,
\beta\right) = \beta \quad\text{ and }\quad
\text{Var}\left(\widehat{\beta}_i\right) = \text{SE}_i^2 =
\frac{1}{w_i}, \quad \text{ for } i=1,2.
\]

Consider all possible linear estimators \(t(u) = u\widehat{\beta}_1 +
(1-u)\widehat{\beta}_2\) determined by value of \(u \in[0,1]\). Estimator \(t(u)\) is unbiased for all \(u\) as \[\text{E}(t(u)\,|\,\beta) =
u\text{E}\left(\widehat{\beta}_1\,|\,\beta\right) +
(1-u)\text{E}\left(\widehat{\beta}_2\,|\,\beta\right) = u\beta +
(1-u)\beta = \beta.\] Thus, on average, any weighting scheme
gives the correct answer, and the question is, which one of these
weightings gives the most precise combined estimate (= smallest
variance). \[\text{Var}(t(u)) =
\text{Var}\left(u\widehat{\beta}_1\right) +
\text{Var}\left((1-u)\widehat{\beta}_2\right)
= u^2 \frac{1}{w_1} + (1-u)^2 \frac{1}{w_2} = u^2\left(\frac{1}{w_1} +
\frac{1}{w_2} \right)
-\frac{2}{w_2}u+\frac{1}{w_2}.
\] This is a second order polynomial with respect to \(u\) and has its minimum where the
derivative is 0, i.e., at \(u_0 =
\frac{2/w_2}{2(1/w_1 + 1/w_2)} = \frac{w_1}{w_1 + w_2},\) which
is the IVW. We conclude that IVW is **the minimum variance
unbiased linear estimator** of the fixed effect model.

**Example 9.2.** Suppose that we do a fixed-effect
meta-analysis using IVW of two studies on LDL-cholesterol where the
sample sizes of the studies are \(n_1=5,000\) and \(n_2 =10,000\). If both studies have applied
similar covariates and hence have similar error variance \(\sigma_\varepsilon^2\), then the precisions
of the studies are \(w_i =
2n_if_i(1-f_i)/\sigma^2_\varepsilon\) for \(i=1,2.\) At a SNP that has same MAF \(f\) in both studies, the precisions are
proportional to \(n_i\) and hence the
weights of the IVW are \(\frac{w_1}{w_1+w_2}=0.333\) and \(\frac{w_2}{w_1+w_2}=0.666\) and the
precision of the IVW is \(w_F = w_1 + w_2 =
2(n_1+n_2)f(1-f)/\sigma^2_\varepsilon\), which is the same as
precision from a study with \(n_1+n_2=15,000\) samples. Indeed, with
linear model, the precision from splitting the data into any subsets and
then combining them with IVW is (approximately) the same as doing a
joint analysis of all the data with separate intercept terms for each
subset (Exercise). (If subsets are small and there are covariates
involved, then random noise causes some numerical differences between
the approaches.)

**Example 9.3.** Suppose that we do IVW meta-analysis of
two studies on Parkinson’s disease where \(n_1=10,000\) of which \(3,000\) are cases (\(\phi_1=0.3\)) and \(n_2 = 6,000\) of which \(3,000\) are cases (\(\phi_2=0.5\)). Thus the effective sample
sizes are \(n_{e1}=10000\cdot 0.3\cdot
0.7=2100\) and \(n_{e2}=6000\cdot
0.5\cdot 0.5=1500.\) If we assume that the MAF of the SNP is the
same in both studies, then the precisions of the studies are \(w_i = 2n_{ei}f(1-f)\) for \(i=1,2\) and the weights of the IVW are
\(\frac{w_1}{w_1+w_2}=\frac{2100}{3600}=0.583\)
and \(\frac{w_2}{w_1+w_2}=\frac{1500}{3600}=0.417\)
and the precision of the IVW estimator is \(w_F = w_1 + w_2 = 2(n_{e1}+n_{e2})f(1-f),\)
which is the same as precision from a study with effective sample size
of \(n_{e1}+n_{e2}\). It can be shown
that by splitting the case-control data into subsets, the sum of the
effective sample sizes over the subsets is always \(\le\) the effective sample size of the
whole data (and the equality holds when the case proportion is constant
across the subsets). This gives a technical explanation why, in logistic
regression, an inclusion of a binary covariate, such as sex or
population label, causes a decrease in precision, and hence increase in
SE compared to a single joint analysis of all data without the
covariate. This follows because a use of a binary covariate is
approximately equivalent to splitting the data by the covariate value,
analyzing subsets separately and combining the results using IVW
(Exercise).

**Example 9.4.** Consider the association statistics at
SNP rs11984041 on the large vessel subtype of Ischemic Stroke in Bellenguez et
al. (2012). They reported a discovery OR 1.50 (1.25-1.79) and
replication1 OR 1.38 (1.17-1.63) and replication2 OR 1.39 (1.15-1.68),
all for the same allele. What is the combined estimate, SE and P-value
using the fixed-effects meta-analysis? (They report 1.42 (1.28-1.57),
P=1.87e-11.)

**Answer.** Let’s make a function `meta.F()`

that does the IVW meta-analysis for given estimates and SEs.

```
meta.F <- function(b.est, se){
#returns inverse-variance weighted meta-analysis estimate, SE and P-value.
b.F = sum(b.est / se^2) / sum(1 / se^2)
se.F = 1 / sqrt(sum(1 / se^2))
p.F = pchisq( (b.F / se.F)^2, df = 1, lower = F)
return(list(b.F = b.F, se.F = se.F, p.F = p.F))
}
```

With these data, we need to compute the SEs from the 95%CIs and then use IVW.

```
b.est = log(c(1.50, 1.38, 1.39)) #beta is logOR for case-control data
ci = log(matrix(c(1.25, 1.79,
1.17, 1.63,
1.15, 1.68), byrow = T, ncol = 2))
se = (ci[,2] - ci[,1])/(2*1.96) #length of 95%CI is 2*1.96*SE
meta.res = meta.F(b.est, se)
meta.res
```

```
## $b.F
## [1] 0.3513526
##
## $se.F
## [1] 0.05227737
##
## $p.F
## [1] 1.805663e-11
```

```
c(OR = exp(meta.res$b.F), low95 = exp(meta.res$b.F - 1.96*meta.res$se.F),
up95 = exp(meta.res$b.F + 1.96*meta.res$se.F), pval = meta.res$p.F)
```

```
## OR low95 up95 pval
## 1.420988e+00 1.282600e+00 1.574309e+00 1.805663e-11
```

The results match well given the accuracy of two decimals in CIs to compute SE. Let’s visualize them by a forest plot.

```
forest.plot <- function(x, intervals, labels = NULL, main = NULL, xlab = "Effect size",
pchs = rep(19,length(x)), cols = rep("black", length(x)),
cexs = rep(1,length(x))){
K = length(x)
stopifnot(nrow(intervals) == K)
plot(0, col="white", xlim = c( min(c(intervals[,1],0) - 0.05), max(c(intervals[,2],0) + 0.05)),
ylim = c(0, K+1), xlab = xlab, ylab = "", yaxt = "n",main = main)
axis(2, at = K:1, labels = labels, cex.axis = 0.8)
arrows(intervals[,1], K:1, intervals[,2], K:1,
code = 3, angle = 90, length = 0.02, col = cols)
points(x, K:1, pch = pchs, cex = cexs, col = cols)
abline(v = 0,lty = 2)
}
```

```
b.est = c(b.est, meta.res$b.F)
ci = rbind(ci, c(meta.res$b.F + c(-1,1)*1.96*meta.res$se.F))
labs = c("Discv", "Rep1", "Rep2", "Meta")
main.txt = "rs11984041 Stroke/LVD"
forest.plot(b.est, ci, labels = labs, main = main.txt, xlab = "logOR",
pchs = c(19, 19, 19, 18), cexs = c(.8, .8, .8, 1.3), cols = c(1, 1, 1, 4))
```

The plot shows that the estimates are very consistent with each
other. We also see how the uncertainty has decreased in the combined
estimate compared to the individual studies. (In practice,
visualizations are recommended to be done with existing R-packages such
as `meta`

that have many more options.)

When we talk about *heterogeneity* in meta-analysis we mean
that the true effect sizes between studies are different. How can we
assess that from the observed data? Let’s first list heterogeneity
measures that are typically reported in meta-analyses.

**Cochran’s Q.** Assume as the null hypothesis that all
\(K\) studies are measuring the same
effect \(\beta\) and that the errors
are Normally distributed: \(\widehat{\beta}_k
\sim \mathcal{N}(\beta,\text{SE}_k^2)\), for study \(k\). Then each \(z_k = (\widehat{\beta}_k - \beta)/\text{SE}_k \sim
\mathcal{N}(0,1)\), and, assuming that the studies are
independent, the sum of the squares of these \(K\) independent standard Normals has a
chi-square distribution with \(K\)
degrees of freedom: \(\sum_{k=1}^K z_k^2 \sim
\chi_K^2\). When we replace true \(\beta\) with the fixed-effect estimate
\(\widehat{\beta}_F\) from IVW method,
then we have a heterogeneity measure called **Cochran’s
Q**: \[Q =
\sum_{k=1}^K\left(\frac{\widehat{\beta}_k -
\widehat{\beta}_F}{\text{SE}_k}\right)^2
=\sum_{k=1}^K w_k \left(\widehat{\beta}_k - \widehat{\beta}_F\right)^2,
\]

that under the null hypothesis of the fixed-effect assumption has distribution \(\chi_{K-1}^2\) (where one degree of freedom is lost as we have used the data to estimate the common mean \(\widehat{\beta}_F\) to replace the true \(\beta\)). This distribution can be used to derive P-values for heterogeneity. However, when \(K\) is small (say < 5), there is little power to see any heterogeneity with this test, and when \(K\) is large (say > 100), then even a small heterogeneity becomes statistically significant.

**\(I^2\).** To change
the focus from P-value to the amount of heterogeneity, a heterogeneity
index \(I^2\) has been proposed: \[I^2 = \frac{Q-(K-1)}{Q} = 1 -
\frac{K-1}{Q}.\] This value is between 0 and 1 and often reported
as percentage (negative values are rounded up to 0). The idea is that if
\(Q\) shows only the null hypothesis
expectation of variation (which is \(K-1\)), then \(I^2=0\%\) indicating no heterogeneity,
whereas values > 50% are interpreted as moderate amount of
heterogeneity and > 75% high heterogeneity. With a small number of
studies, the uncertainty around estimate of \(I^2\) is large and little can inferred
statistically.

**Between study variance \(T^2\).** Often a model for
heterogeneity between the true effects is a two-stage hierarchical model
where the heterogeneity is defined by a variance parameter \(T^2\). We assume that first each true
effect is sampled from \((\beta_k|\beta, T^2)
\sim \mathcal{N}(\beta,T^2)\) and then our estimates are formed
by adding some noise around these values as \((\widehat{\beta}_k | \beta_k) \sim
\mathcal{N}(\beta_k, \text{SE}_k^2).\) From this model, a
commonly-used estimate for \(T^2\) is
\[\widehat{T^2} = \frac{Q-(K-1)}{\sum_{k=1}^K
w_k - \frac{\sum_{k=1}^K w_k^2}{\sum_{k=1}^K w_k}}.\]

This is on the same scale as the (squared) effect sizes, which makes it different from \(Q\) and \(I^2\) that are independent of the effect size scale.

When \(T^2>0\) in the model
formulation above, we have defined the standard
**random-effects** meta-analysis model where the effect
sizes across studies are not assumed to be exactly the same but still
they are possibly quite similar (namely, when \(T\) is small compared to the common mean
\(\beta\) of all effects). In
statistics literature, it is common to derive a P-value assuming \(\beta=0\) from such a model and call that
the random-effects model’s P-value. Such a test is not suitable for GWAS
where the relevant null hypothesis is that all effects are 0, rather
than that only their mean is 0. This issue is explained by Han
& Eskin 2011 and they also propose a modification that tests the
null hypothesis of exactly 0 effect in every study.

As a conclusion, the three quantities listed above to measure heterogeneity are often reported in meta-analyses but they are often not that informative in cases where there are only a handful of studies.

The question of heterogeneity between studies can be more flexibly defined in the Bayesian framework where a set of models with different assumptions about heterogeneity can be directly compared against each other and the interpretation of the results of the model comparison does not depend on the sample size.

Let’s remind ourselves how, in GWAS 4, we compared the model with a non-zero effect to the null model using the approximate Bayes factor, ABF.

We assumed that under the alternative hypothesis \(H_1\), there was a non-zero effect \(\beta \sim \mathcal{N}(0,\tau^2)\) whereas under the null hypothesis \(\beta=0\). Then we derived the marginal likelihoods that these models give for the observed data, and these marginal likelihoods were proportional to Normal densities:

\[\begin{align} P(\text{Data}\,|\,H_1) &= c\cdot \mathcal{N}(\widehat{\beta};0, \text{SE}^2 + \tau^2)\\ P(\text{Data}\,|\,H_0) &= c\cdot \mathcal{N}(\widehat{\beta};0, \text{SE}^2) \end{align}\]

From this we got an approximate Bayes factor in favor of \(H_1\) vs. \(H_0\) as \[ \text{ABF}_{1:0} = \frac{P(\text{Data}\,|\,H_1)}{P(\text{Data}\,|\,H_0)} = \frac{\mathcal{N}(\widehat{\beta};0, \text{SE}^2 + \tau^2)}{\mathcal{N}(\widehat{\beta};0, \text{SE}^2 )}. \]

Finally, we derived the posterior probability of \(H_1\) under the assumption that one of \(H_0\) and \(H_1\) is true and that the prior probability of it being \(H_1\) was \(p_1 = 1-p_0\): \[ P(H_1\,|\, \text{Data}) = \frac{p_1\cdot \text{ABF}_{1:0}}{p_0+p_1 \cdot \text{ABF}_{1:0}}. \]

Thus, in order to get to a probabilistic model comparison, we needed
to set values for two parameters: \(\tau\) that determines how large effects we
expect to see under \(H_1\), and \(p_1\) that determines how probable we think
that the alternative hypothesis is *a priori*, before we have
seen the data.

Let’s see how we generalize this to multiple studies. We use a multivariate Normal distribution as the prior distribution of the effect size vector \(\pmb{\beta} = (\beta_1,\ldots,\beta_K)^T \sim \mathcal{N}_K(0,\pmb{\Theta}),\) where the prior matrix \(\pmb{\Theta}\) is assumed to take the form \[\pmb{\Theta} = \tau^2 \left[ \begin{array}{cccc} 1& \theta_{12}& \ldots &\theta_{1K}\\ \theta_{12}& 1& \ldots &\theta_{2K}\\ \vdots &\vdots&\ddots &\vdots \\ \theta_{1K}& \theta_{2K}&\ldots& 1 \end{array} \right]. \]

Thus, the parameter \(\tau^2\) still defines the prior variance of any one effect size parameter, but now effect sizes from two studies may be correlated as defined by prior correlation \(\theta_{ij}\). For example,

the fixed-effect model \(H_F\) results if we set all \(\theta_{ij}=1\),

the independent-effect model \(H_I\) results if we set \(\theta_{ij}=0\),

the standard random-effect model \(H_R(\rho)\) results if we set \(\theta_{ij}=\rho\) for some value of \(\rho>0,\) where values close to 1 assume only little heterogeneity and values close to 0 assume almost independent effects; our default is \(\rho=0.9\),

the null model \(H_0\) is defined by setting \(\tau^2=0\) (and then the values of \(\theta_{ij}\) do not matter).

The likelihood function defined by the observed data is also proportional to a multivariate Normal density \(\mathcal{N}_K(\widehat{\pmb{\beta}};\pmb{\beta}, \pmb{\Sigma}).\) If we assume that the studies are independent (no overlapping samples), then \(\pmb{\Sigma}\) is simply a diagonal matrix where the diagonal is \((\text{SE}_1^2,\ldots,\text{SE}_K^2)\).

The marginal likelihood for data given the model \(m\) (defined by prior variance matrix \(\pmb{\Theta}_m\)) is \[P(\text{Data} \, | \, H_m) = c\cdot \mathcal{N}_K(\widehat{\pmb{\beta}}; \pmb{0}, \pmb{\Sigma} + \pmb{\Theta}_m ). \]

Approximate Bayes factor between any two models \(m\) and \(\ell\) is \[\text{ABF}_{m:\ell} = \frac{P(\text{Data} \, | \, H_m)}{P(\text{Data} \, | \, H_\ell) } =\frac{\mathcal{N}_K(\widehat{\pmb{\beta}}; \pmb{0}, \pmb{\Sigma} + \pmb{\Theta}_m )}{\mathcal{N}_K(\widehat{\pmb{\beta}}; \pmb{0}, \pmb{\Sigma} + \pmb{\Theta}_\ell )}. \]

If model \(m\) is given a prior probability \(p_m\) (and \(p_0+\ldots +p_K=1\)), then we can compute the posterior probability for model \(m\) as \[ P(H_m \, | \, \text{Data}) = \frac{p_m\cdot \text{ABF}_{m:0} }{\sum_{\ell=0}^K p_\ell\cdot \text{ABF}_{\ell:0}}, \] where we have computed ABFs between all models and the null model (and \(\text{ABF}_{0:0}=1\)). Thus, the posterior probability of a model is proportional to the product of the prior probability of the model and ABF of the model.

Let’s write a function `abf.mv()`

that computes ABFs and
posterior probabilities for any given set of prior matrices and prior
probabilities. First, we need a density for the multivariate normal.
(There is also a package `mvtnorm`

for that.)

```
log.dmvnorm <- function(x, mu = rep(0, length(x)), S = diag(1, length(x)) ){
#returns log of density of MV-Normal(mean = mu, var = S) at x
K = length(mu)
stopifnot(all(dim(S) == K))
stopifnot(length(x) == K)
chol.S = chol(S) #Cholesky decomposition
log.det = 2*sum(log(diag(chol.S))) #log of det(S)
inv.chol.S = solve(t(chol.S)) #inverse of cholesky^T
return(-K/2*log(2*pi) - 0.5*(log.det + crossprod(inv.chol.S %*% (x-mu))))
}
abf.mv <- function(b.est, Sigmas, prior = rep(1,length(Sigmas))){
#Returns posterior probabilities of the models listed in Sigmas by their
# total variance matrix (= sum of prior + likelihood variance matrices)
#Returns also ABFs w.r.t the first model in Sigmas.
M = length(Sigmas) #number of models
K = length(b.est) #number of studies
prior = prior/sum(prior)
log.abf = sapply(Sigmas, function(x){log.dmvnorm(b.est, S = x)})
abf = exp(log.abf - log.abf[1]) #abf w.r.t the first model
posterior = prior*abf
posterior = posterior/sum(posterior)
return(list(posterior = posterior, abf = abf))
}
```

**Example 9.5.** Let’s generate 4 data sets for 10
case-control studies where the effective sample size varies between 250
and 2500 and MAF varies between 0.4 and 0.5.

- 1st data set: all studies estimate the same effect \(\beta=0.1\).
- 2nd data set: there is heterogeneity and the true effects come from \(\mathcal{N}(0.1, 0.04^2)\).
- 3rd data set: there is heterogeneity but no correlation in effects as they come from \(\mathcal{N}(0, 0.1^2).\)
- 4th data set has a null SNP in all studies.

```
K = 10
n.eff = runif(K, 250, 2500)
f = runif(K, 0.4, 0.5)
se = 1/sqrt(2*n.eff*f*(1-f)) #SEs
w = 1/se^2 #precisions
b = 0.1 #true mean of effects in 1 and 2
B.est = cbind(rnorm(K, b, se), #fixed effects
rnorm(K, b, sqrt(se^2 + 0.04^2)), #correlated random effects
rnorm(K, rep(0,K), sqrt(se^2 + 0.1^2)), #independent effects
rnorm(K, rep(0,K), se)) #null model
```

Let’s then compare 4 models in these data sets: (1) fixed-effects, (2) correlated-effects, (3) independent-effects and (4) the null.

We specify these models by their matrices (\(\pmb{\Sigma} + \pmb{\Theta}_m\)), run
`abf.mv()`

and print the forest plot of the data as well as a
barplot of the posterior probability across the 4 competing models,
assuming the prior probability of each model is the same (= 0.25).

Let’s also print the standard heterogeneity measures: value of \(I^2\) and the P-value from Cochran’s \(Q\)-statistic.

```
Sigma = diag(se^2) #this is the variance of likelihood function -- same for all models
tau2 = 0.2^2 #prior variance of effect size -- same for all non-null models
S.fix = tau2 * matrix(1, K, K) #fixed-effect model has 1s in the correlation matrix
S.cor = tau2 * matrix(0.9, K, K) #correlated effects has corr of 0.9 as off-diagonal
diag(S.cor) = tau2 #... and corr of 1 on the diagonal
S.ind = tau2 * diag(K) #diagonal matrix for independent effects model, off-diagonals = 0
S.null = matrix(0, K, K) #null model has 0 effects
Var.matrices = list(Sigma + S.null, Sigma + S.fix, Sigma + S.cor, Sigma + S.ind)
par(mfrow = c(4,2))
for(ii in 1:ncol(B.est)){
#Standard heterogeneity measures:
b.F = sum(w*B.est[,ii]) / sum(w) #IVW estimate under fixed-effect model
Q = sum( w * (B.est[,ii] - b.F)^2 ) #Cochran's Q
pval.Q = pchisq(Q, df = K-1, lower = F)
I2 = 1 - (K-1)/Q #I^2 from Q
#Bayesian model comparison:
abf.out = abf.mv(B.est[,ii], Sigmas = Var.matrices) #by default, prior is uniform
ci = cbind(B.est[,ii] - 1.96*se, B.est[,ii] + 1.96*se) #95%CIs
forest.plot(B.est[,ii], ci, main = paste("Data set",ii), xlab = "logOR")
barplot(abf.out$posterior, ylim = c(0,1), cex.sub = 1.3,
sub = paste0("I2=",max(c(0,round(I2*100))),"% het P=",signif(pval.Q,2)),
names.arg = c("NULL", "FIX", "COR", "IND"),
col = c("gray","limegreen","orange","dodgerblue"))
}
```

The Bayesian approach gives a way to assess whether there is heterogeneity in the effect sizes by comparing correlated effects model and/or independent effects model to the fixed effect model. Above it indicates heterogeneity in data sets 2 and 3, as expected. Similarly the \(I^2\) value together with P-value from \(Q\) indicate heterogeneity in sets 2 and 3.

Extensions of the Bayesian approach to overlapping samples between studies, e.g., due to shared controls, and to the subset models, where the effect is non-zero only in particular studies, are discussed by Trochet et al. 2019.

As the final comment about heterogeneity: Whenever you have an interesting variant in a GWAS meta-analysis, make a forest plot over all the cohorts to see how the effects look like and don’t rely only on some quantitative heterogeneity measures, especially if there are only a couple of studies included.

A crucial part of all meta-analyses is to include in the analysis
*all* the data available on the particular research question. In
particular, one should never use the results of the studies to decide
which studies to include or leave out since that will obviously bias the
results of the meta-analysis. Studies can be left out because of quality
issues or differences in phenotypes, for example, but these must be
objective criteria that are not based on the results of the study in any
one SNP. In general, meta-analyses in epidemiology and social science
etc. are hampered by **publication bias** which means that
only studies reporting statistically significant results are published
whereas null studies never find their way to public. Consequently, a
meta-analysis may report a significant effect based on published studies
even though there could be another set of unpublished studies that could
show that, when all information is combined, there is no effect. The
pubication bias is less of a problem in GWAS, because GWAS results are
published simultaneously genome-wide, not separately for the
“significant” SNPs.

Meta-analysis yields a set of association statistics (\(\widehat{\beta}\), SE, P-value). Let’s look at how we can do some of the downstream analyses with these pieces of information without an access to the full raw genotype-phenotype data.

Let’s consider the joint linear model with \(p\) SNPs with the mean centered phenotype \(y\) and standardized genotypes (and then we can drop the intercept term from the model): \[\pmb{y} = \pmb{X^*} \pmb{\lambda}^* + \pmb{\varepsilon}.\] The least squares estimator and its variance are

\[\begin{align} \widehat{\pmb{\lambda}}^* &= \left(\pmb{X^*}^T\pmb{X^*}\right)^{-1} \pmb{X^*}^T\pmb{y},\\ \text{Var}\left(\widehat{\pmb{\lambda}}^*\right) &= \sigma_J^2 \left(\pmb{X^*}^T\pmb{X^*}\right)^{-1}, \end{align}\]

where

- \(\sigma_J^2 = \text{Var}(\varepsilon) = \text{Var}(y) - (\widehat{\pmb{\lambda}}^*)^T \pmb{R} \widehat{\pmb{\lambda}}^*\) is the error variance from the Joint model and \(\pmb{R}\) is the LD matrix of the SNPs in \(\pmb{X}\).

It turns out that these quantities can be written using summary data
from the marginal models \(\pmb{y}=\pmb{x}^*_l\beta_l+\pmb{\varepsilon}_l\),
since \[\begin{align}
\left(\pmb{X^*}^T\pmb{X^*}\right) &= n \pmb{R}\\
\pmb{X^*}^T\pmb{y} &= n \widehat{\pmb{\beta}}^*\\
\text{Var}(y) &= (\widehat{\beta}^*_l)^2 + \widehat{\sigma}_l^2 =
(\widehat{\beta}_l^*)^2 + 2 n f_l(1-f_l)\cdot \text{SE}_l^2,
\end{align}\] where \(f_l\) is
the MAF of SNP \(l\) and \(\text{SE}_l\) is the standard error of the
*allelic* marginal effect \(\widehat{\beta}_l\), as reported by
GWAS.

With these formulas we have that \[\begin{align} \widehat{\pmb{\lambda}}^* &= \pmb{R}^{-1} \widehat{\pmb{\beta}}^*,\\ \text{Var}\left(\widehat{\pmb{\lambda}}^*\right) &= \frac{\widehat{\sigma_J^2}}{n} \pmb{R}^{-1},\\ \widehat{\sigma_J^2} &= \text{median}_{l=1}^p\left\{(\widehat{\beta}_l^*)^2 + 2 n f_l(1-f_l)\cdot \text{SE}_l^2\right\} - (\widehat{\pmb{\beta}}^*)^T \pmb{R}^{-1} \widehat{\pmb{\beta}}^*, \end{align}\] where the median is taken over all the available SNPs in \(\pmb{X}\) and its function is to reduce noise compared to the corresponding variance estimate taken from any one \(l\). In particular, we do not need an access to raw \(\pmb{X}\) and \(\pmb{y}\) in order to do a stepwise forward selection or probabilistic fine-mapping as long as we have the marginal GWAS association statistics and the LD matrix available.

If association statistics come from an IVW fixed-effect meta-analysis, then the LD-matrix is a weighted sum of the LD-matrices of individual studies, where the weights are proportional to the sample sizes of the studies Before the meta-analysis, one must make sure that all the studies have measured the effects on the same scale in order that the fixed-effect meta-analysis of the effect sizes makes sense. Typically, this is ensured by normalizing the trait to have a variance of 1 in each study before running the GWAS.

If the association statistics are coming from a **logistic
regression model applied to case-control data**, we modify the
above formulas by setting \(\sigma_J^2=1\) and replacing the sample
size \(n\) with the effective sample
size \(n_e\). If summary data come from
one study, then \(n_e = n
\phi(1-\phi)\), where \(\phi\)
is the proportion of cases in data. If summary data are from a
meta-analysis over several studies, then \(n_e\) is the sum of the effective sample
sizes \(n_{e}^{(i)} = n^{(i)} \phi_i
(1-\phi_i)\) over individuals studies, where \(n^{(i)}\) is the total sample size (cases +
controls) of study \(i\) and \(\phi_i\) is the proportion of cases in
study \(i\). In this meta-analysis
case, the LD-matrix is a weighted sum of the LD-matrices of individual
studies, where the weights are proportional to the effective sample
sizes of the studies. This approximation works well when the effect
sizes are not very large, MAF is not very small and \(\phi\) is quite balanced, say, within
(0.2,0.8).

The idea of computing the joint model from the summary statistics was introduced by Yang et al. 2012 and has been widely used through the conditional and joint analysis (COJO) module of the software package GCTA. The same idea is used in many fine-mapping software packages such as FINEMAP.

Let’s try it with two SNPs and a quantitative trait. (Uses
`geno.2loci()`

from Section 7 to generate genotypes.)

```
n = 2000
maf = c(0.3, 0.4)
lambda = c(0.2, -0.05) #true causal effects
r = 0.5 #LD btw SNPs 1 & 2
X = geno.2loci(n, r, mafs = maf, return.geno = TRUE)
R = cor(X) #LD in data
f = colSums(X)/2/n #MAFs in data
sc = sqrt(2*f*(1-f)) #scaling constants
y = scale(X %*% lambda + rnorm(n, 0, sqrt(1 - var(X %*% lambda))))
b.est = rbind(summary(lm(y ~ X[,1]))$coeff[2,1:2],
summary(lm(y ~ X[,2]))$coeff[2,1:2]) #col1 estimate, col2 SE
b.s = b.est[,1]*sc #scaled betas
l.s.sumstat = solve(R, b.s) #computes scaled lambdas as R^-1 * b.s
sigma2.J.sumstat = median(b.s^2 + n*sc^2*b.est[,2]^2) - as.vector(t(b.s) %*% solve(R, b.s))
l.s.se.sumstat = sqrt(sigma2.J.sumstat/n * diag(solve(R))) #SEs of scaled lambdas
cbind(lambda = l.s.sumstat/sc, se = l.s.se.sumstat/sc) #show on allelic scale
```

```
## lambda se
## [1,] 0.21911598 0.03996448
## [2,] -0.05144214 0.03695407
```

```
#Compare to the joint model on raw data
summary(lm(y ~ X))$coeff[2:3,1:2]
```

```
## Estimate Std. Error
## X1 0.21883814 0.04014083
## X2 -0.05050388 0.03681416
```

Same results up to the 3rd decimal. (Binary trait case left as an exercise.)

Given that the association summary statistics from large
meta-analyses are publicly available, it would be very nice if we could
do joint models and fine-mapping by combining those statistics with
LD-information from some **reference database**, without
needing the access to the original genotypes. Indeed, this is how
GCTA-COJO analyses are done. Unfortunately, with recent large datasets,
such as the UK Biobank, it has become clear that the accuracy of the
LD-estimates must increase together with the GWAS sample size.
Otherwise, the summary statistic methods start reporting false positives
because of the inconsistency between the highly precise effect estimates
and the LD information from the reference data (Benner
et al. 2017). Hence, in general, we will need LD-information from
the same data from which the GWAS summary statistics were calculated in
order to do reliable fine-mapping and joint analysis. This is one reason
why future meta-analyses should be planned in such a way that all data
are collected in one place, and why we will need new ways to seamlessly
distribute LD-information as another type of GWAS summary
statistics.

Our goal so far has been to identify **causal variants**
that tell about the biology of the phenotype and propose ways for
targeted treatments.

Another way to utilize GWAS results is to **predict**
phenotypes. There is a difference between understanding the causes of a
phenomenon and an ability to predict the phenomenon: While understanding
typically implies a good prediction, a good prediction does not
necessarily require understanding. For example, we do not need to know
which of the two variants in high LD with each other is a causal one in
order to do a good prediction: Either of the variants will do almost
equally well when used in a prediction model, because, due to high LD,
they carry almost the same information about the genetic differences
between individuals.

Let’s consider the standard additive model for the phenotype across the whole genome: \[ y_i = \eta_i + \varepsilon_i = \sum_{k=1}^p x_{ik} \lambda_{k} + \varepsilon_i. \] If we knew the true causal effects \(\lambda_{k}\), then we could do the perfect prediction of the genetic component \(\eta_i = \sum_k x_{ik} \lambda_{k}\) for individual \(i\) given her/his genotypes. In the population, this perfect genetic prediction would explain the proportion \(h^2\) (=additive heritability) of the total phenotypic variance and this would be as good as an additive genetic prediction ever gets.

By a **polygenic score** (PGS) we mean an instance of
the additive genetic predictor defined by a set of weights \(\pmb{\alpha} = (\alpha_k)_{k=1}^p\) that
predicts the genetic component of individual \(i\) as \[
\text{PGS}_i(\pmb{\alpha}) = \sum_{k=1}^p x_{ik} \alpha_{k}.
\] Typically, the weights \(\alpha_k\) are obtained from the GWAS
summary statistics \(\widehat{\beta}_k\), possibly with some
variable selection and/or shrinkage of the effect sizes and/or
LD-adjustments to approximate the causal effects \(\lambda_k\) rather than the marginal
effects \(\beta_k\).

Such PGS can then be tested against the known phenotype values in a test cohort to see how much phenotypic variation it explains.

A guide for making PGS by Choi et al. 2020 and the corresponding online tutorial. These articles talk about developing and evaluating polygenic risk prediction models (Chatterjee et al. 2016) and personal and clinical utility of PS (Torkamani et al. 2018).

The methods to derive the best possible PGS weights \(\alpha_k\) are currently one of the hot topics in the GWAS field since the recent results have shown that the current GWAS summary statistics can already provide useful predictive discrimination for risk of several diseases (slides 17-21). Hope is that by more advanced modeling, the accuracy can be further improved.

New methods are typically compared to the two reference methods:
**P-value clumping** and LDpred. Both of these take
in the GWAS association statistics and produce PGS-weights by accounting
for LD. P-value clumping prunes away variants that are in high LD with
each other whereas LDpred is a Bayesian method that outputs estimates
for the causal effect sizes for each variant by accounting for the
LD-structure around the variant.

**P-value clumping (\(r^2\),\(d\),\(P_{\text{thr}}\)).** The simplest
PGS uses the marginal GWAS effect estimates \(\widehat{\beta}_k\) as weights. Suppose
that we have two variants in high LD. Their marginal effects are almost
the same and including them both in the PGS is likely to overestimate
the joint contribution from these two SNPs and, hence, reduce the PGS
accuracy. To avoid this, we do some **LD-pruning** meaning
that we will only include non-zero weights for SNPs that are not in high
LD with each other. For example, we may require that \(r^2<0.1\) between all pairs of variants
that have non-zero weights in PGS. In practice, such LD-pruning is
applied only within certain window size (e.g., \(d=1\)Mb) for computational reasons and
because LD decays quickly with distance in homogeneous populations.
**P-value clumping** means LD-pruning that prefers to leave
in the data the variant with the lowest GWAS P-value and prune away its
LD-friends that have higher GWAS P-values. This way our final LD-pruned
data set contains as many of the top GWAS variants (in terms of the
lowest P-values) as possible given the pruning parameter \(r^2\). Typically, there is also a P-value
threshold (between 0.05 and 5e-8) to ensure that all variants included
in PGS with non-zero weights will have GWAS P-value \(<P_{\text{thr}}\).

**Training, validation and testing.** Let’s put the
generation of the PGS-weights using P-value clumping method to the
context of typical prediction model building having three independent
data sets for each of training, validation and testing.

Training data is an existing large GWAS on the phenotype of interest where we have access to the marginal association statistics. Ideally, we would also have access to the LD information of the training data, but when this is not possible, we use external reference data from the GWAS population as an approximation to the LD in training data.

Validation data are genotype-phenotype data that we use to tune the parameters of the PGS model, namely \(r^2\) and \(P_{\text{thr}}\) (while \(d\) is often assumed fixed). This means that we will make a set of PGS for a grid of values of \(r^2\) and \(P_{\text{thr}}\), and test in the validation set how each of them performs. We choose the best performing version of the PGS as our final PGS.

Testing data are individual level genotype-phenotype data that are independent from training and validation data. Testing data are used only to test the final PGS that was chosen at the validation step. The performance of the PGS in testing data is expected to generalize to other data sets that have similar properties: same population, same phenotype etc. Note that the performance in validation data may overestimate the performance in some new data since the validation data were used for optimizing the PGS parameters. The performance in testing data does not suffer from this problem and it is therefore the final result to report about the PGS chosen.

PRSice2 is software to generate and validate PGS given the summary statistics and validation data. Also PLINK2 does P-value clumping and computes PGS.