This text is based on pp 28-30 of Hemilä (2006).
These documents have up-to-date links to documents that are available
via the net.
Harri Hemilä
Department of Public Health
University of Helsinki,
Helsinki, Finland
harri.hemila@helsinki.fi
The most severe problems of meta-analysis are related to the
experimental similarity of the studies that are combined and their
validity. ‘Combining apples and oranges’ has been commonly used as a
metaphor to describe this problem, but often the meta-analytic mixtures
are so heterogeneous that ‘combining rotten fruits’ might sometimes be
a more appropriate way to describe the problem (Feinstein 1995).
Improper consideration of the experimental features of trials is
illustrated by an early, often-cited meta-analysis that combined the
findings of 6 small randomized trials examining the value of
anticoagulants in acute myocardial infarction (Chalmers et al. 1977);
this was Thomas Chalmers’ third meta-analysis. Of the 6 randomized
trials included, 2 contained no criteria for the diagnosis of
myocardial infarction, and in 3 others the published definitions were
so inexact that the patient populations could not be reproducibly
identified (Goldman & Feinstein 1979). The treatments also varied
from heparin alone to heparin with warfarin or warfarin derivatives and
even warfarin with optional heparin, but the modes of action of heparin
and warfarin are different and it should not be assumed that these
treatments are pharmacologically equivalent enough to be combined in a
meta-analysis (Goldman & Feinstein 1979).
In another example of inadequate consideration of the experimental
aspects of controlled trials included in meta-analyses, Bailar (1995;
MacArthur et al. 1995) discussed one meta-analysis by the Chalmers
group of 6 trials dealing with the effects of diethylstilbestrol on the
outcome of pregnancy (Goldstein et al. 1989). One of the 6 trials did
not deal with diethylstilbestrol at all, and 3 others were
methodologically flawed enough to destroy any credibility in their
reported findings. Two studies appeared to have had adequate
methodological strength, but one dealt with a series of pregnant women
from the general population, while the other was limited to pregnant
women who had diabetes. It seems questionable, at best, to pool results
from these last two studies without some thoughtful discussion (Bailar
1995). Furthermore, although Goldstein et al. had excluded one trial
based on the probable non-alternate assignment of treatment as well as
inconsistencies in the text, Bailar identified the same problems in 3
of 5 trials that Goldstein et al. did include. Considering the various
shortcomings, Bailar concluded that the ‘typical odds ratio’ calculated
by Goldstein et al. (1989) was meaningless.
In a further meta-analysis by the Chalmers group, 9 trials with 744
participants were combined to form a pooled estimate of antibiotic
prophylaxis for recurrent acute otitis media (Williams et al. 1993).
"Of those [nine] studies, 2 were on special populations, Alaskan
Eskimos and asthmatics involving 388 children. They are
nonrepresentative groups of the general population because of
differences in severity and pathogenesis. Williams et al. also included
3 small crossover trials which used sulfisoxazole (a drug not
recommended for extended use). Excluding those studies, there were only
4 RCTs with a combined population of 235 patients [in contrast to
Williams’ 744 patients]. In that group, the rate difference was only
0.067 [episodes per month; in contrast to Williams’ estimate of 0.11]"
(Cantekin 1994, 1998).
In addition to the problems in the Goldstein et al. meta-analysis
(1989), Bailar (1995) also pointed out serious shortcomings in 4 other
meta-analyses. Klein (2000) pointed out severe flaws in 4 meta-analyses
comparing psychotherapy with pharmacotherapy. Indeed, there are
numerous examples of unreliable meta-analyses.
The concept of meta-analysis seems to imply objectivity, but the
selection criteria can vary substantially when different research
groups carry out meta-analysis of the ‘same’ topic. In a
‘meta-meta-analysis’ Katerndahl and Lawler (1999) analyzed 23
meta-analyses that had examined the value of cholesterol reduction in
coronary heart disease, finding substantial variation in the
meta-analyses and their conclusions. Similarly, Prins and Buller (1996)
discussed the divergent findings in 4 meta-analyses on the preferred
dosage of aminoglycosides and concluded that "The physician can only
follow the conclusion of the meta-analysis most closely in accordance
with his or her own beliefs." The divergent and incompatible
conclusions of the first three meta-analyses on vitamin C and the
common cold were mentioned above (p 28 of Hemilä 2006).
Although many meta-analyses of controlled trials are problematic, the
problems are even greater in meta-analyses of non-experimental studies,
since there may be consistent biases in the studies included. A
meta-analysis on the association between chlorination of drinking water
and cancer risk by the Chalmers group (Morris et al. 1992) included
such severely biased studies that, independently, Bailar (1995) and
Shapiro (1997) pointed out various problems. Also, Cantor (1994)
commented that "A recent meta-analysis [by Morris et al.] … has
probably confused the situation. This exercise may have been premature
since most of the input data came from studies with (1) inadequate
control of confounding and other sources of bias, and (2) highly
limited estimates of historical exposure to drinking water
contaminants."
A meta-analysis of the association between alcohol consumption and
breast cancer by the Chalmers group (Longnecker et al. 1988) included
studies that had such severe methodological defects that the
meta-analysis was considered seriously misleading by Shapiro (1994,
1997; see also Rosenberg 1989), who published both the initial
association which led to the series of studies, and finally the large
study finding a null result. Because of the large variety of potential
biases in non-experimental studies, Shapiro (1997) considered that "The
meta-analysis of nonrandomized observational studies resembles the
attempt of a quadriplegic person to climb Mount Everest unaided."
The main limitations and challenges of meta-analysis are related to the
experimental issues. The statistical methods available for combining
P-values or data on individual studies are well established (e.g.,
Greenland 1987, 1998; Laird & Mosteller 1990; Fleiss 1993; Higgins
& Green 2005 ss 8.6 to 8.8), but there are examples of demonstrably
invalid methods of analysis even in some of the influential
meta-analyses. In one meta-analysis, the average case fatality
percentage was calculated for 6 trials, without using any weight, even
though the number of participants in the 6 trials varied from 53 to
1,427; i.e., close to 30 fold (Chalmers et al. 1977). In fact, since
the large trials found a considerably smaller benefit than the small
trials, combining the actual data instead of percentages led to a
substantially smaller difference (by 30%) between the pooled study
groups (Goldman & Feinstein 1979). The Chalmers et al. (1977)
meta-analysis was cited in a recent systematic review comparing
randomized and non-randomized trials (Kunz & Oxman 1998), without
noting its severe methodological problems, yet Oxman was an editor of
the previous edition of the Cochrane Reviewers’ Handbook, which
commented that "interpretation of results is dependent upon the
validity of the included studies" (Clarke & Oxman 2002 s 6).
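The effect of ignoring trial size can be illustrated with a minimal
sketch. The counts below are hypothetical and are not the data of
Chalmers et al. (1977); they only mimic the pattern described above, in
which small trials show a larger benefit than large trials. The function
names are likewise illustrative assumptions.

```python
# Hypothetical counts, NOT the data of Chalmers et al. (1977).
# Each tuple: (deaths_treated, n_treated, deaths_control, n_control)
TRIALS = [
    (3, 25, 7, 28),        # small trial, large apparent benefit
    (20, 200, 30, 200),    # medium-sized trial
    (100, 700, 110, 727),  # large trial, small apparent benefit
]


def unweighted_difference(trials):
    """Average the per-trial case fatality proportions with equal weight
    for every trial, then return control minus treated."""
    k = len(trials)
    treated = sum(dt / nt for dt, nt, _, _ in trials) / k
    control = sum(dc / nc for _, _, dc, nc in trials) / k
    return control - treated


def pooled_difference(trials):
    """Combine the raw counts of each arm, so that every trial
    contributes in proportion to its size, then return control minus
    treated."""
    treated = sum(t[0] for t in trials) / sum(t[1] for t in trials)
    control = sum(t[2] for t in trials) / sum(t[3] for t in trials)
    return control - treated


if __name__ == "__main__":
    print("unweighted difference:", round(unweighted_difference(TRIALS), 4))
    print("pooled difference:    ", round(pooled_difference(TRIALS), 4))
```

With these invented counts the unweighted average gives a clearly larger
difference between the groups than the pooled counts, because the
smallest trial counts as much as the largest one. Pooling raw counts is
shown here only because it is the comparison described above; standard
methods instead weight the within-trial effect estimates.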
Lack of basic arithmetic is also seen in a meta-analysis of vitamin C
and the common cold in which treatment effects on ‘duration in days’
were averaged without considering either the differences in the size of
the trials or the large variations in the cold duration in the control
groups (Chalmers 1975; see pp 36-8). Sometimes meta-analysts are not
familiar with the standard methods either; for example, in his vitamin
C common cold meta-analysis, Chalmers (1975) claimed that in the
earlier meta-analysis on vitamin C and the common cold, Pauling (1971a)
had "averaged ‘p’ values from the different studies." However, in his
statistical analysis Pauling used the well-established Fisher method of
calculating the combined P-value from several independent P-values
(1938; Sokal & Rohlf 1981; Laird & Mosteller 1990), which
cannot be described as naive ‘averaging.’ Furthermore, Chalmers himself
used the same Fisher method two years after his criticism (Chalmers et
al. 1977).
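The difference between the Fisher method and naive averaging can be
sketched as follows. Under the null hypothesis, minus twice the sum of
the natural logarithms of k independent P-values follows a chi-square
distribution with 2k degrees of freedom. The function name and the
example P-values are illustrative assumptions, not values taken from
Pauling (1971a) or Chalmers (1975).

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) is referred to a chi-square
    distribution with 2k degrees of freedom. For an even number of
    degrees of freedom the upper tail has a closed form:
        P(chi2_{2k} > X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    """
    k = len(p_values)
    if k == 0 or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("need at least one P-value, each in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = 1.0   # (X/2)^i / i! for i = 0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


if __name__ == "__main__":
    example = [0.05, 0.05]  # hypothetical P-values
    print("mean of P-values:", sum(example) / len(example))
    print("Fisher combined: ", fisher_combined_p(example))
```

For two studies that each give P = 0.05, the arithmetic mean is still
0.05, whereas the Fisher method gives a combined P-value below 0.05,
reflecting that two independent results in the same direction are
stronger evidence than either alone.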
Because many published meta-analyses were not properly performed,
several experts have been rather skeptical as to the usefulness of the method
(Meinert 1989; Spitzer 1991; Feinstein 1995; Feinstein & Horwitz
1997; Eysenck 1994; Bailar 1995, 1997a,b, 1999; Shapiro 1994, 1997).
Bailar (1995) commented that "Meta-analysis has been seized with
enthusiasm by many scientists not trained in statistics and cognate
sciences, and it is clear in conversations that many of them have
utterly unrealistic views about its scope and power." Shapiro (1997)
noted that "I think there is something profoundly amiss in the
uncritical way in which the epidemiologists, and indeed the medical
profession as a whole, have allowed themselves to be seduced by the
numerological abracadabra of meta-analysis." Meinert (1989) commented
that "There are no easy, inexpensive answers to complex questions and
attempts to substitute small trials and meta-analysis for large trials
is illusionary and detrimental to both medicine and clinical trials."
In an editorial in a major journal, Bailar (1997a) further commented
that "In my own review of selected meta-analyses, problems were so
frequent and so serious … that it was difficult to trust the overall
‘best estimates’ … I still prefer conventional narrative reviews of the
literature, a type of summary familiar to readers of the countless
review articles on important medical issues." Although such strong
comments can be understood against the background of the severe errors
in some of the products by meta-analysts, ‘meta-analysis’ as a method
will not disappear. As a statistical tool it has inherent strengths and
weaknesses which should be understood by those carrying out such
analyses, and by readers of the conclusions of meta-analyses.
Furthermore, meta-analyses have provided a large number of conclusions
that have been consistent with later findings.
The political and social consequences of meta-analysis have also
aroused concern. In particular, the Evidence-Based Medicine (EBM)
movement puts great weight on the semiofficial meta-analyses of the
Cochrane Collaboration (EBMWG 1992; Chalmers et al. 1992; Editorial
1992; Sackett 1994; Bero & Rennie 1995; Hill 2000; Cochrane
2005a). Feinstein and Horwitz (1997) concluded a thorough
critique with the comment that "The threat of official, corporate, or
private abuse will always remain, whenever any collection of
information has been prominently heralded as the ‘best available
evidence.’ A new form of dogmatic authoritarianism may then be revived
in modern medicine, but the pronouncements will come from Cochranian
Oxford rather than Galenic Rome." Shapiro (1994) was worried that
"Government departments will continue to make public health decisions,
often misguided ones, based on the results of meta-analyses." Bailar
(1995) commented that "A traditional narrative review can do much more
than estimate parameters, and the additions are critical to the
progress of science. Meta-analysis … is a poor tool for developing new
concepts, new hypotheses, and new methods of study… meta-analysis has
never been promoted as an alternative to thoughtful but unstructured
reading, but it may nevertheless carry the seeds of a diminished
respect for and a diminished role for simple browsing through the
primary literature."
References
Bailar JC (1995) The practice of meta-analysis. J Clin Epidemiol 48:149-57
Bailar JC (1999) Passive smoking, coronary heart disease, and
meta-analysis. N Engl J Med 340:958-9 * comments in: (1999);341:697-700
Bero L, Rennie D (1995) The Cochrane Collaboration: preparing,
maintaining, and disseminating systematic reviews of the effects of
health care. JAMA 274:1935-8
Cantekin EI (1994) Antibiotics to prevent acute otitis media and to
treat otitis media with effusion [letter]. JAMA 272:203-4
Cantekin EI (1998) Aggressive and ineffective therapy for otitis media.
Otorhinolaryngol Nova 8:136-47
Cantor KP (1994) Water chlorination, mutagenicity, and cancer
epidemiology [editorial]. Am J Public Health 84:1211-3
Chalmers I, Dickersin K, Chalmers TC (1992) Getting to grips with
Archie Cochrane’s agenda. BMJ 305:786-8
Chalmers TC, Matta RJ, Smith H, Kunzler AM (1977) Evidence favoring the
use of anticoagulants in the hospital phase of acute myocardial
infarction. N Engl J Med 297:1091-6
Clarke M, Oxman AD, eds (2002) Cochrane Reviewers’ Handbook 4.1.5. In:
The Cochrane Library, Issue 2, 2002. Oxford: Update Software.
Fleiss JL (1993) The statistical basis of meta-analysis. Stat Meth Med
Res 2:121-45
Goldman L, Feinstein AR (1979) Anticoagulants and myocardial
infarction: the problems of pooling, drowning, and floating. Ann Intern
Med 90:92-4
Goldstein PA, Sacks HS, Chalmers TC (1989) Hormone administration for
the maintenance of pregnancy. In: Effective Care in Pregnancy and
Childbirth. Vol. I: Pregnancy [Chalmers et al., eds]. Oxford, UK:
Oxford University Press. pp 612-23
Greenland S (1987) Quantitative methods in the review of epidemiologic
literature. Epidemiol Rev 9:1-30
Higgins JPT, Green S, eds (2005) Cochrane Handbook for Systematic
Reviews of Interventions 4.2.5 [updated May 2005]. In: The Cochrane
Library, Issue 2, 2005. Chichester, UK: John Wiley & Sons Ltd.
Katerndahl DA, Lawler WR (1999) Variability in meta-analytic results
concerning the value of cholesterol reduction in coronary heart
disease: a meta-meta-analysis. Am J Epidemiol 149:429-41
Klein DF (2000) Flawed meta-analyses comparing psychotherapy with
pharmacotherapy. Am J Psychiatry 157:1204-11 * comments in:
(2001);158:1164-6
Kunz R, Oxman AD (1998) The unpredictability paradox: review of
empirical comparisons of randomized and non-randomized clinical trials.
BMJ 317:1185-90
Laird NM, Mosteller F (1990) Some statistical methods for combining
experimental results. Int J Technol Assess Health Care 6:5-30
Longnecker MP, Berlin JA, Orza MJ, Chalmers TC (1988) A meta-analysis
of alcohol consumption in relation to risk of breast cancer. JAMA 260:652-6
* comments in: Rosenberg (1989), Shapiro (1994, 1997),
Bailar (1995)
MacArthur C, Foran PJ, Bailar JC (1995) Qualitative assessment of
studies included in a meta-analysis: DES and the risk of pregnancy
loss. J Clin Epidemiol 48:739-47
Shapiro S (1994) Meta-analysis/Shmeta-analysis. Am J Epidemiol
140:771-8 * comments in: (1994);140:779-91; (1995);142:779-92
Shapiro S (1997) Is meta-analysis a valid approach to the evaluation of
small effects in observational studies? J Clin Epidemiol 50:223-9
Sokal RR, Rohlf FJ (1981) Combining probabilities from tests of
significance. In: Biometry, 2nd edn. San Francisco: Freeman. pp 779-82
Spitzer W (1991) Meta-meth-analysis: unanswered questions about
aggregating data. J Clin Epidemiol 44:103-7
Williams RL, Chalmers TC, Stange KC, et al. (1993) Use of antibiotics
in preventing recurrent acute otitis media and in treating otitis media
with effusion: a meta-analytic attempt to resolve the brouhaha. JAMA 270:1344-51
* correction: (1993);271:430; comments in: (1994);271:430; Cantekin
(1994, 1998)