In the previous lecture, we considered a multiple testing problem where thousands of tests were done while only a few of them were expected to be non-null. There, controlling the family-wise error rate (FWER) seemed a reasonable way to filter out a few of the most prominent candidates for being true positives from the vast set of null variables.
Let’s now consider situations where we may have a considerable number of true positives \(m = p - p_0\) among the \(p\) tested variables, e.g., \(m/p = 10\%\). Let’s start by writing up an easy way to generate realistic sets of P-values for such a situation.
Suppose that in linear regression \(y = \mu + x \beta + \varepsilon\) the true slope is \(\beta\). If \(\textrm{Var}(x) = v_x\) and the error variance is \(\textrm{Var}(\varepsilon) = \sigma^2\), then the sampling variance of \(\widehat{\beta}\) is \(v_\beta = \sigma^2/(n v_x)\), where \(n\) is the sample size, and the z-score for testing the slope is distributed as \[z=\frac{\widehat{\beta}}{\sqrt{v_\beta}}\sim \mathcal{N}\left(\frac{\beta}{\sqrt{v_\beta}},1\right). \] We can compute P-values for such z-scores using `pchisq(z^2, df = 1, lower.tail = FALSE)`, because under the null \(z \sim \mathcal{N}(0,1)\) and hence \(z^2 \sim \chi^2_1\), i.e., \(z^2\) follows the central chi-square distribution with one degree of freedom. With this method, we can easily generate P-values for \(p_0\) null variables (\(\beta = 0\)) and \(m\) non-null variables (\(\beta \neq 0\)) and test various inference methods on those P-values. Note that the P-value distribution of the non-null variables depends on the sample size \(n\), the true value \(\beta\) and the ratio of the predictor and error variances \(v_x/\sigma^2\), whereas the P-value distribution of the null variables always remains the same, namely Uniform(0,1).
Let’s generate P-values for \(p=1000\) variables that we assume each have been tested independently for a linear association with its own outcome variable. Let’s also assume that \(m=100\) of them actually do have a non-zero effect, each explaining 1% of the variance of the corresponding outcome. We follow the procedure from Lecture 0 to generate such data, and set \(\sigma^2 = 1\) and \(v_x = 1\).
set.seed(11102017)
n = 1000 # sample size
p = 1000 # number of variables to be tested
m = 100 # number of non-null tests
b = sqrt(0.01 / (1 - 0.01)) # this means each predictor explains 1% of variance; see Lecture 0.
# Generating P-values: 1,...,m are non-null effects, m + 1,...,p are null.
eff = c(rep(TRUE, m), rep(FALSE, p - m)) # indicator for non-null effects
pval = pchisq(rnorm(p, b*sqrt(n)*as.numeric(eff), 1)^2, df = 1, lower.tail = FALSE) # note multiplication by the indicator 'eff'
boxplot(-log10(pval)[eff], -log10(pval)[!eff], col = c("limegreen", "gray"),
names = c("EFFECTS", "NULL"), ylab = "-log10 Pval")
abline(h = -log10(0.05), col = "blue", lty = 2) # significance threshold 0.05
abline(h = -log10(0.05 / p), col = "red", lty = 2) # Bonferroni corrected threshold of 0.05
We see that the non-null effects tend to have smaller P-values (i.e., larger -log10 P-values) than the null effects, but there is some overlap between the distributions and therefore, from the P-values alone, we cannot make a rule that would detect all true positives without reporting any false positives.
Remember the characterization of test results according to the following table, where the columns “not null” and “null” describe the true state of the world:

| Test result | not null | null | Total |
|---|---|---|---|
| negative (not significant) | FN | TN | p - D |
| positive (significant, discovery) | TD | FD | D |
| Total | p - p0 | p0 | p |
Let’s print such a table when we first do inference based on a P-value threshold 0.05 and then use the Bonferroni correction.
alpha = 0.05
dis = (pval < alpha) # logical indicating discoveries
table(dis, !eff, dnn = c("discov","null"))
## null
## discov FALSE TRUE
## FALSE 10 851
## TRUE 90 49
dis = (p.adjust(pval, method = "bonferroni") < alpha)
table(dis, !eff, dnn = c("discov","null"))
## null
## discov FALSE TRUE
## FALSE 73 900
## TRUE 27 0
The problem with the raw P-value threshold is that there are many false discoveries (49/139 = 35% of all discoveries are false). The problem with the Bonferroni correction is that only about a quarter (27/100) of all non-zero effects are discovered. To have something in between these two extremes, we try to directly control the proportion of false discoveries among all discoveries.
Let’s define the False Discovery Proportion (FDP) as a random variable \[ \textrm{FDP}=\frac{\text{FD}}{\max\{1,D\}}=\left\{\begin{array}{ll} \frac{\text{FD}}{D}, & \text{ if }D>0,\\ 0, & \text{ if }D=0. \end{array}\right.\]
False Discovery Rate (FDR) is the expectation of FDP: \[ \textrm{FDR}=\textrm{E}(\textrm{FDP}).\] While FDP is the proportion of false discoveries among all discoveries in one particular realization of the testing set-up, FDR describes the expected behavior of this proportion over many possible data sets. By controlling FDR at a given level \(\alpha_{F}\), we will (on average) tolerate more false discoveries as the number of tests increases, as long as we also keep on making more true discoveries. If a method guarantees \(\alpha_F = 0.1\), then, in expectation, when we make 10 discoveries we allow 1 of them to be a false discovery, whereas when we make 100 discoveries we allow 10 false discoveries among them. Note the difference from FWER control, which bounds the probability of making even a single false discovery, no matter whether we are making 1, 10, 1000 or 100,000 discoveries altogether.
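Since FDR is defined as an expectation, we can also estimate it by simulation. A minimal sketch below averages the FDP over replicated data sets at the fixed P-value threshold 0.05, reusing the settings n, p, m, b and eff from above; the number of replications R = 200 is an arbitrary choice for illustration.
R = 200 # number of replicated data sets (arbitrary choice for illustration)
fdp_reps = replicate(R, {
  pv = pchisq(rnorm(p, b*sqrt(n)*as.numeric(eff), 1)^2, df = 1, lower.tail = FALSE)
  D = sum(pv < 0.05) # discoveries at the fixed threshold 0.05
  FD = sum(pv < 0.05 & !eff) # false discoveries among them
  ifelse(D > 0, FD / D, 0) # FDP, with the convention FDP = 0 when D = 0
})
mean(fdp_reps) # Monte Carlo estimate of FDR at this fixed threshold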
Given a rule for calling variables significant, the false positive rate is the rate at which truly null variables are called significant, and it is determined by the significance threshold applied to the P-values. The FDR, instead, is the rate at which variables called significant are truly null. For example, a false positive rate of 5% means that, on average, 5% of the truly null variables in the study will be called significant, whereas an FDR of 5% means that, on average, among all variables called significant, 5% are truly null. In our example above with the raw threshold 0.05, the realized false positive proportion was 49/900 ≈ 5% of the null variables, while the realized false discovery proportion was 49/139 ≈ 35% of the discoveries.
Let’s see which P-value threshold would yield a false discovery proportion (FDP) of \(\alpha_F = 0.05\) in our ongoing example. (Remember, FDP is the realized value of FD/D in a particular data set whereas FDR is the expectation of FDP when the procedure is applied over a distribution of data sets. In real life, we couldn’t know the exact FDP for any one data set, but in this simulation we can!)
sorted_pval = sort(pval) # sorted from smallest to largest
sorted_eff = eff[order(pval)] # whether each sorted pval is a non-null effect
fdp = cumsum(!sorted_eff) / (1:p) # false discovery proportion
cols = rep("red", p) # nulls will remain red
cols[sorted_eff] = "orange" # non-nulls are set orange
plot(log10(sorted_pval), fdp,
xlab = "log10 P-value", ylab = "FDP",
col = cols, cex = 0.7, pch = 20)
alpha = 0.05 # significance level
ii.fdp = max(which(fdp <= alpha))
print(paste("fdp <=",alpha,"when P-value is <=",signif(sorted_pval[ii.fdp],3)) )
## [1] "fdp <= 0.05 when P-value is <= 0.00632"
grid()
abline(v = log10(sorted_pval[ii.fdp]), col = "blue")
abline(h = alpha, col = "blue")
# shows the step where fdp <= alpha breaks
cbind(D = ii.fdp:(ii.fdp + 1),
FD = fdp[ii.fdp:(ii.fdp + 1)] * c(ii.fdp, ii.fdp + 1),
fdp = fdp[ii.fdp:(ii.fdp + 1)],
pval = sorted_pval[ii.fdp:(ii.fdp + 1)])
## D FD fdp pval
## [1,] 75 3 0.04000000 0.006315223
## [2,] 76 4 0.05263158 0.006431357
So if we had a method that controlled FDR at 0.05, here we would expect it to give about \(D=75\) discoveries of which about \(\text{FD}=3\) are false. Compare this to \(D=139\) and \(\text{FD}=49\) with an unadjusted P-value threshold of 0.05 and to \(D=27\) and \(\text{FD}=0\) with a Bonferroni adjusted threshold of \(0.05/p\).
But how could we in general control FDR given a set of P-values? A widely used method for this was formulated by Yoav Benjamini and Yosef Hochberg in 1995.
Let \(H_j\) be the null hypothesis for test \(j\) and let \(P_j\) be the corresponding P-value. Denote the ordered sequence of P-values as \(P_{(1)} \leq P_{(2)} \leq \ldots \leq P_{(p)}\) and let \(H_{(j)}\) be the hypothesis corresponding to the \(j\)th P-value. BH procedure at level \(\alpha_F\) (BH(\(\alpha_F\))) is to \[\textrm{reject the null hypotheses } H_{(1)},\ldots,H_{(k)}\textrm{, where } k \textrm{ is the largest index } j \textrm{ for which } P_{(j)} \leq \frac{j}{p}\alpha_F.\] Theorem (BH). For independent tests and for any configuration of false null hypotheses, BH(\(\alpha_F\)) controls the FDR at level \(\alpha_F\).
Original proof given in Benjamini and Hochberg, 1995, J. R. Statist. Soc. B. 57:289-300.
Intuitive explanation why BH works. Consider the P-value \(P_{(j)}\). Since \(P_{(j)}\) is the \(j\)th smallest P-value, if we draw a significance threshold at \(P_{(j)}\), we have made exactly \(j\) discoveries. On the other hand, we expect that out of all \(p_0\) null effects about \(p_0\cdot P_{(j)}\) give a P-value \(\leq P_{(j)}\). Thus, we have an approximation for the false discovery proportion at threshold \(P_{(j)}\): \[\textrm{FDP}(P_{(j)}) \approx \frac{p_0\cdot P_{(j)}}{j} \leq \frac{p \cdot P_{(j)}}{j}.\] If we simply ask for which \(j\) this estimated FDP\((P_{(j)})\) is \(\leq \alpha_F\), we get \[\frac{p \cdot P_{(j)}}{j} \leq \alpha_F\, \Leftrightarrow\, P_{(j)} \leq \frac{j}{p}\alpha_F,\] which is the condition of the BH procedure.
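To see how well this simple estimate works, here is a sketch comparing the estimate \(p \cdot P_{(j)}/j\) with the true FDP in our ongoing simulation, using sorted_pval and fdp computed earlier:
fdp_est = pmin(1, p * sorted_pval / (1:p)) # estimated FDP at each rank, capped at 1
plot(1:p, fdp, type = "l", xlab = "rank j", ylab = "FDP", ylim = c(0, 1))
lines(1:p, fdp_est, col = "blue") # the estimate is conservative since it uses p in place of p0
legend("bottomright", legend = c("true FDP", "estimate p*P(j)/j"), col = c("black", "blue"), lty = 1)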
A rigorous proof of the BH theorem is much more complicated than the intuitive argument above, as shown below.
Proof of BH. (Adapted from Aditya Guntuboyina.)
Assume we have \(p\) hypotheses and corresponding P-values \(\pmb{P}=(P_1,\ldots,P_p)\) and that BH(\(\alpha_F\)) method rejects \(k\) of these hypotheses. This means that when the P-values are in the ascending order \((P_{(1)},\ldots,P_{(p)})\), then \(k\) is the largest index that satisfies the BH-condition \[P_{(k)} \leq {\frac{k}{p}\alpha_F}.\qquad \textrm{(1)}\] We can write FDP as \(\frac{1}{\max\{k,1\}}\sum_{j \in I_0} \textrm{I}\{P_j \leq \alpha_F\cdot k / p\}\), where \(I_0\) is the set of \(p_0\) true null hypotheses, and \(\textrm{I}\) is an indicator function. \[ \textrm{FDR} = \textrm{E}(\textrm{FDP}) = \textrm{E}\left(\sum_{j \in I_0} \frac{\textrm{I}\{P_j \leq \alpha_F\cdot k / p\}}{\max\{k,1\}}\right) \qquad (2) \] The problem with trying to simplify this further is that \(P_j\) and \(k\) are not independent. The idea in this proof is to replace \(k\) with a closely related quantity \(\tilde{k}_j\) that is independent of \(P_j\).
For any \(j\), denote by \(\tilde{\pmb{P}}_j = (P_1,\ldots,P_{j-1},0,P_{j+1},\ldots,P_p)\) the sequence of P-values where \(P_j\) has been changed to 0. Denote by \(\tilde{k}_j\) the number of rejections that the BH(\(\alpha_F\)) method makes when it is applied to \(\tilde{\pmb{P}}_j\). Importantly, \(\tilde{k}_j\) depends only on \((P_i)_{i\neq j}\) and not on \(P_j\), which, by an assumption of the BH theorem, is independent of \((P_i)_{i\neq j}\).
Clearly \(k \leq \tilde{k}_j\) since the difference between the two sequences of P-values is that the element \(j\) in \(\tilde{\pmb{P}}_j\) is \(0 \leq P_j\) while all other elements are the same between the two sequences. It is also clear that \(\tilde{k}_j \geq 1\) since BH(\(\alpha_F\)) always rejects the hypothesis corresponding to P-value 0.
We next show that \[ \frac{\textrm{I}\{P_j \leq \alpha_F\cdot k / p\}}{\max\{k,1\}} \leq \frac{\textrm{I}\{P_j \leq \alpha_F\cdot \tilde{k}_j / p\}}{\tilde{k}_j},\qquad \textrm{for all } j=1,\ldots,p.\qquad (3) \]
Since the right-hand side is non-negative, the inequality (3) holds when \(\textrm{I}\{P_j \leq \alpha_F\cdot k / p\}=0.\)
Consider then the remaining case \(\textrm{I}\{P_j \leq \alpha_F\cdot k / p\} = 1\). Since \(k\) is the largest index for which the BH-condition (1) holds, \(P_j \leq P_{(k)}\) (otherwise \(P_j = P_{(l)}\) for some \(l > k\) would satisfy \(P_{(l)} \leq \alpha_F\cdot k/p \leq \alpha_F\cdot l/p\), contradicting the maximality of \(k\)), and hence the ordered sequences of P-values \(\pmb{P}\) and \(\tilde{\pmb{P}}_j\) agree on every element that is \(\geq P_j\). In particular, they agree on every element that is \(\geq P_{(k)}\). Since \(k\) was the largest index in the ordered sequence \(\pmb{P}\) for which the BH-condition (1) holds, it follows that BH(\(\alpha_F\)) rejects exactly the same hypotheses from both sequences. Thus, \(\tilde{k}_j = k\) and \(\textrm{I}\{P_j \leq \alpha_F\cdot \tilde{k}_j / p\} = \textrm{I}\{P_j \leq \alpha_F\cdot k / p\} = 1,\) which proves the inequality (3) also in this case.
Let’s get back to formula (2) and apply (3). \[\begin{eqnarray*} \textrm{FDR} &=& \textrm{E}(\textrm{FDP}) = \textrm{E}\left(\sum_{j \in I_0} \frac{\textrm{I}\{P_j \leq \alpha_F\cdot k / p\}}{\max\{k,1\}}\right) \leq \textrm{E}\left( \sum_{j \in I_0} \frac{\textrm{I}\{P_j \leq \alpha_F\cdot \tilde{k}_j / p\}}{\tilde{k}_j} \right) \\ &=& \textrm{E}\left(\textrm{E}\left( \sum_{j \in I_0} \frac{\textrm{I}\{P_j \leq \alpha_F\cdot \tilde{k}_j / p\}}{\tilde{k}_j}\,\middle|\,\tilde{k}_j \right)\right) = \textrm{E}\left( \sum_{j \in I_0} \frac{\textrm{E}(\textrm{I}\{P_j \leq \alpha_F\cdot \tilde{k}_j / p\} \,|\, \tilde{k}_j)}{\tilde{k}_j} \right)\\ &=& \textrm{E}\left( \sum_{j \in I_0} \frac{ \alpha_F \tilde{k}_j}{p \tilde{k}_j} \right) = \frac{p_0}{p}\alpha_F \leq \alpha_F. \end{eqnarray*}\] The last line follows since the sum is over \(p_0\) true null hypotheses whose P-values are uniformly distributed in (0,1) and because \(P_j\) is independent of \(\tilde{k}_j\). This proves that BH(\(\alpha_F\)) controls FDR at level \(\alpha_F\) (actually even more strictly at level \(\alpha_F \cdot p_0/p\)).\(\qquad \square\)
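As an empirical sanity check of the bound \(\alpha_F \cdot p_0/p\), the following sketch applies BH(0.05) to replicated data sets from our simulation settings and averages the realized FDP; R = 200 replications is again an arbitrary choice.
R = 200 # number of replicated data sets (arbitrary choice)
fdp_bh = replicate(R, {
  pv = pchisq(rnorm(p, b*sqrt(n)*as.numeric(eff), 1)^2, df = 1, lower.tail = FALSE)
  dis = p.adjust(pv, method = "BH") < 0.05 # BH(0.05) discoveries
  ifelse(sum(dis) > 0, sum(dis & !eff) / sum(dis), 0) # realized FDP of this data set
})
mean(fdp_bh) # should be close to 0.05 * (p - m) / p = 0.045 under independence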
As in the Holm procedure, in BH the rejection of a tested hypothesis depends not only on its P-value but also on its rank among all the P-values. We see that the critical threshold increases from the Bonferroni threshold \(\alpha_F/p\) for the smallest P-value to the unadjusted threshold \(\alpha_F\) for the largest P-value. Crucially, if any P-value \(P_{(j)}\) is below its own threshold \(j\cdot \alpha_F/p\), then ALL the hypotheses corresponding to smaller P-values are also rejected, no matter whether they are below their own thresholds.
Let’s apply BH to our current data, first manually and then by using `p.adjust`.
alpha = 0.05
ii.bh = max(which(sorted_pval <= ((1:p) * alpha) / p))
print(paste("Reject 1,...,",ii.bh," i.e, if P-value <=",signif(sorted_pval[ii.bh],3)))
## [1] "Reject 1,..., 66 i.e, if P-value <= 0.00298"
print(paste0("Discoveries:",ii.bh,
"; False Discoveries:",sum(!sorted_eff[1:ii.bh]),
"; fdp=",signif(sum(!sorted_eff[1:ii.bh]) / ii.bh, 2)))
## [1] "Discoveries:66; False Discoveries:2; fdp=0.03"
pval_bh = p.adjust(pval, method = "BH")
# These are pvals adjusted by
# (1) a factor p/rank[i] AND
# (2) the requirement that no adjusted P-value is larger than any adjusted
# P-value that comes later in the ranking of the original P-values (see home exercises).
sum(pval_bh < alpha) # should be D given above
## [1] 66
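Note that the manual step above relied on max(which(...)), which would throw an error if no P-value satisfied the BH condition. As a sketch, the whole rule can be packaged into a small function that handles this edge case (the function name bh_reject is our own, not a standard one):
bh_reject = function(pval, alpha_F) { # a minimal sketch of BH(alpha_F)
  p = length(pval)
  ord = order(pval) # ascending order of the P-values
  k = max(c(0, which(pval[ord] <= (1:p) / p * alpha_F))) # largest j with P_(j) <= j/p * alpha_F
  rejected = rep(FALSE, p)
  if (k > 0) rejected[ord[1:k]] = TRUE # reject the k hypotheses with the smallest P-values
  rejected # logical vector in the original order of 'pval'
}
sum(bh_reject(pval, alpha)) # agrees with the 66 discoveries found above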
Let’s draw a picture with the P-value threshold (blue), Bonferroni threshold (purple) and BH threshold (cyan). Let’s sort the P-values from the smallest to the largest and draw the P-values on -log10 scale.
plot(1:p, -log10(sorted_pval), pch = 3, xlab = "rank", ylab = "-log10 pval", col = cols, cex = 0.7)
grid()
abline(h = -log10(alpha), col = "blue", lwd = 2)
abline(h = -log10(alpha /p), col = "purple", lwd = 2)
lines(1:p, -log10((1:p) * alpha / p), col = "cyan", lwd = 2)
legend("topright", legend = paste(c("P=","BH","Bonferr."), alpha),
col = c("blue","cyan","purple"), lwd = 2)
The figure shows that BH quite nicely draws the threshold around the point where the non-null effects (orange) start to mix with the null effects (red), whereas Bonferroni is very stringent and the raw P-value significance level is very liberal, when the goal is to separate the orange from the red.
Exercise. Suppose that all your \(p=1000\) P-values were between \((0.01, 0.05)\), so they are not very small but still they are all at the left tail of the Uniform(0,1) distribution. How would the three methods (Bonferroni, BH, “P<0.05”) label variables significant in this case, when we control each at level \(0.05\) as in the figure above? Earlier we saw that by using a significance level of \(P\leq 0.05\) we tend to get a lot of false positives when \(p\) is large. How can our FDR method, which should keep the false discovery rate small, then agree with the traditional significance threshold in this case? Why does it conclude that there are not many false positives among these 1000 variables?
It is important to remember that the BH procedure was proven for independent tests and therefore it is not as generally applicable as the Bonferroni and Holm methods. (In practice, however, BH has been shown to be quite robust to violations of the independence assumption.) An extension of BH was proven to control FDR under all forms of dependence by Benjamini and Yekutieli (The Annals of Statistics 2001, Vol. 29, No. 4, 1165-1188).
Theorem. When the Benjamini-Hochberg procedure is conducted with \(\alpha_F/\left(\sum_{j=1}^p \frac{1}{j}\right)\) in place of \(\alpha_F\), the procedure controls the FDR at level \(\leq \alpha_F\). (Proof skipped.)
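To get a sense of the size of this correction, here is the harmonic-sum factor for our setting of \(p = 1000\) tests:
sum(1 / (1:1000)) # about 7.49, so the BY thresholds are ~7.5 times stricter than the BH thresholds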
This Benjamini-Yekutieli (BY) procedure can also be done by `p.adjust`.
alpha = 0.05
ii.by = max(which(sorted_pval <= ((1:p) * alpha / sum(1 / (1:p)) / p)))
print(paste("Reject 1,...,",ii.by," i.e, if P-value <",signif(sorted_pval[ii.by],2)))
## [1] "Reject 1,..., 44 i.e, if P-value < 0.00029"
print(paste0("Discoveries:",ii.by,
"; False Discoveries:",sum(!sorted_eff[1:ii.by]),
"; fdp=",signif(sum(!sorted_eff[1:ii.by]) / ii.by,2)))
## [1] "Discoveries:44; False Discoveries:0; fdp=0"
pval_by = p.adjust(pval, method = "BY") # pvals adjusted by the factor p/rank[i]*sum(1/(1:p)) (plus the same monotonicity step as in BH)
sum(pval_by < alpha) # should be D given above
## [1] 44
As we see, BY is more conservative than BH, which is the price to pay for the proven guarantees of the control of FDR in case of all possible dependency structures.
We have \[\textrm{FDR} = \textrm{E}(\textrm{FDP}) = \textrm{Pr}(D=0) \cdot 0 + \textrm{Pr}(D > 0) \cdot \textrm{E}\left(\frac{\text{FD}}{D}\,\middle|\, D>0 \right) = \textrm{Pr}(D > 0) \cdot \textrm{E}\left(\frac{\text{FD}}{D}\,\middle|\,D>0\right).\]
FDR weakly controls FWER. If all null hypotheses are true, then the concept of FDR coincides with FWER. To see this, note that here \(\text{FD}=D\) and therefore if \(\text{FD}=0\) then \(\textrm{FDP}=0\), and if \(\text{FD}>0\) then \(\textrm{FDP}=1\). Thus, \(\textrm{FDR} = \textrm{E}(\textrm{FDP}) = \textrm{Pr}(\text{FD}>0) = \textrm{FWER}\). This means that FDR weakly controls FWER: if all null hypotheses are true, then any method that controls FDR at level \(\alpha\) also controls FWER at level \(\alpha\). However, if some null hypotheses are false, then FDR control does not typically control FWER.
FWER controls FDR. Because \(\textrm{FDP} \leq \textrm{I}(\text{FD}>0)\), where \(\textrm{I}\) is the indicator function, by taking expectation, \[\textrm{FDR} = \textrm{E}(\textrm{FDP}) \leq \textrm{Pr}(\text{FD} > 0) = \textrm{FWER}.\] Thus, any method that controls FWER at level \(\alpha\) also controls FDR at level \(\alpha.\)
Example 2.1. Let’s see how BH controls FWER under the global null hypothesis.
p = 1000 # variables for each data set
alpha = 0.1 # target FWER
R = 1000 # replications of data set
res = matrix(NA, ncol = 3, nrow = R) # collect the number of discoveries by BH, Holm, Bonferroni
for(rr in 1:R){
# Generate P-values from the null distribution
pval = runif(p)
res[rr,] = c(sum(p.adjust(pval, method = "BH") < alpha),
sum(p.adjust(pval, method = "holm") < alpha),
sum(p.adjust(pval, method = "bonferroni") < alpha))
}
apply(res > 0, 2, mean) # what proportion of data sets report at least one discovery?
## [1] 0.102 0.097 0.097
All are close to the target FWER value \(\alpha=0.1.\)
Example 2.2. The BH method has the property that a variable's status as a discovery at level \(\alpha\) can change when a new variable is included among the tested variables.
pval = c(0.014, 0.09, 0.05, 0.16)
p.adjust(pval, method = "BH")
## [1] 0.056 0.120 0.100 0.160
p.adjust(c(pval, 0.001), method = "BH")
## [1] 0.03500000 0.11250000 0.08333333 0.16000000 0.00500000
Now the status, at FDR level \(\alpha = 0.05\), of the P-value 0.014 has changed when we added one more variable with a lower P-value. This is because the new variable is taken as a discovery and, intuitively, this inclusion then allows some more potential false discoveries to be included among the discoveries without breaking the idea of controlling FDR.
Example 2.3. Bonferroni, Holm and BH behave differently in whether P-values that are close to each other can become equal after the adjustment. Let’s demonstrate this with one tie in the middle of 4 P-values.
pval = c(0.01, 0.02, 0.02, 0.03)
rbind(pval, p.adjust(pval, "bonferroni"))
## [,1] [,2] [,3] [,4]
## pval 0.01 0.02 0.02 0.03
## 0.04 0.08 0.08 0.12
rbind(pval, p.adjust(pval, "holm"))
## [,1] [,2] [,3] [,4]
## pval 0.01 0.02 0.02 0.03
## 0.04 0.06 0.06 0.06
rbind(pval, p.adjust(pval, "BH"))
## [,1] [,2] [,3] [,4]
## pval 0.01000000 0.02000000 0.02000000 0.03
## 0.02666667 0.02666667 0.02666667 0.03
Bonferroni multiplies each P-value by the same value (here 4) and hence the tie stays as it is but does not merge with its neighbors.
Holm first multiplies the \(j\)th smallest P-value by \((p-j+1)\), i.e., the 1st value by 4 (to get 0.04), the 2nd value by 3 (to get 0.06) and the 3rd value by 2 (to get 0.04), while the 4th value stays as it is (0.03). But then it needs to make sure that the adjusted P-values are in ascending order, and it does this by replacing each adjusted P-value by the maximum of it and all adjusted P-values to its left. Hence the last 3 P-values all become 0.06.
BH first multiplies the \(j\)th smallest P-value by \((p/j)\), i.e., the 1st value by 4 (to get 0.04), the 2nd value by 2 (to get 0.04) and the 3rd value by 4/3 (to get 0.0266667), while the 4th value stays as it is (0.03). It also makes sure that the adjusted P-values are in ascending order, but it does this by replacing each adjusted P-value by the minimum of it and all adjusted P-values to its right. Hence also the 1st and 2nd P-values become 0.0266667.
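Both monotonicity steps can be written compactly with cummax and cummin; a sketch, where pval is still the sorted vector c(0.01, 0.02, 0.02, 0.03) from above:
j = 1:length(pval) # ranks of the sorted P-values
pmin(1, cummax((length(pval) - j + 1) * pval)) # Holm: multiply by (p-j+1), then running max from the left
pmin(1, rev(cummin(rev(length(pval) / j * pval)))) # BH: multiply by p/j, then running min from the right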
We say that the Holm method is a step-down procedure and the BH method is a step-up procedure. Note that in this terminology, “up” and “down” refer to the ordering of the test statistics and not to the ordering of the P-values, which is (confusingly) just the opposite!
Step-down procedures start from the hypothesis with the largest test statistic (and lowest P-value) and step down through the sequence of hypotheses in descending order of their test statistics while rejecting the hypotheses. The procedure stops at the first non-rejection and labels all remaining hypotheses as not-rejected.
Step-up procedures start from the hypothesis with the lowest test statistic value (and highest P-value) and step up through the sequence of hypotheses in ascending order of their test statistics while retaining hypotheses. The procedure stops at the first rejection and labels all remaining hypotheses as rejected.
Single step procedures are the ones where the same criterion is used for each hypothesis, independent of its rank among the test statistics. The Bonferroni correction and fixed significance level testing are examples of single step procedures.
Questions.
If you use BH(\(\alpha_F\)) method to choose the significant variables, are you guaranteed to have at most \(\alpha_F D\) false discoveries among the chosen variables, where \(D\) is the total number of discoveries made by the procedure?
If you have \(p=1000\) variables to test, when should you use FDR control and when FWER control?
What could be a situation when you would rather use BY procedure than BH procedure to control for FDR?