Some examples of problems with meta-analysis

by Harri Hemilä

This text is based on pp 28-30 of Hemilä (2006).
These documents have up-to-date links to documents that are available via the net.
Harri Hemilä
Department of Public Health
University of Helsinki, Helsinki, Finland


These files are at:

Version May 29, 2012

The most severe problems of meta-analysis are related to the experimental similarity of the studies that are combined and their validity. ‘Combining apples and oranges’ has been commonly used as a metaphor to describe this problem, but often the meta-analytic mixtures are so heterogeneous that ‘combining rotten fruits’ might sometimes be a more appropriate way to describe the problem (Feinstein 1995).

Improper consideration of the experimental features of trials is illustrated by an early, often-cited meta-analysis that combined the findings of 6 small randomized trials examining the value of anticoagulants in acute myocardial infarction (Chalmers et al. 1977); this was Thomas Chalmers’ third meta-analysis. Of the 6 randomized trials included, 2 contained no criteria for the diagnosis of myocardial infarction, and in 3 others the published definitions were so inexact that the patient populations could not be reproducibly identified (Goldman & Feinstein 1979). The treatments also varied from heparin alone to heparin with warfarin or warfarin derivatives and even warfarin with optional heparin, but the modes of action of heparin and warfarin are different and it should not be assumed that these treatments are pharmacologically equivalent enough to be combined in a meta-analysis (Goldman & Feinstein 1979).

As another example of inadequate attention to the experimental aspects of the controlled trials included in a meta-analysis, Bailar (1995; MacArthur et al. 1995) discussed a meta-analysis by the Chalmers group of 6 trials dealing with the effects of diethylstilbestrol on the outcome of pregnancy (Goldstein et al. 1989). One of the 6 trials did not deal with diethylstilbestrol at all, and 3 others were methodologically flawed enough to destroy any credibility in their reported findings. Two studies appeared to have had adequate methodological strength, but one dealt with a series of pregnant women from the general population, while the other was limited to pregnant women who had diabetes. It seems questionable, at best, to pool results from these last two studies without some thoughtful discussion (Bailar 1995). Furthermore, although Goldstein et al. had excluded one trial based on the probable non-alternate assignment of treatment as well as inconsistencies in the text, Bailar identified the same problems in 3 of 5 trials that Goldstein et al. did include. Considering the various shortcomings, Bailar concluded that the ‘typical odds ratio’ calculated by Goldstein et al. (1989) was meaningless.

In a further meta-analysis by the Chalmers group, 9 trials with 744 participants were combined to form a pooled estimate of antibiotic prophylaxis for recurrent acute otitis media (Williams et al. 1993). "Of those [nine] studies, 2 were on special populations, Alaskan Eskimos and asthmatics involving 388 children. They are nonrepresentative groups of the general population because of differences in severity and pathogenesis. Williams et al. also included 3 small crossover trials which used sulfisoxazole (a drug not recommended for extended use). Excluding those studies, there were only 4 RCTs with a combined population of 235 patients [in contrast to Williams’ 744 patients]. In that group, the rate difference was only 0.067 [episodes per month; in contrast to Williams’ estimate of 0.11]" (Cantekin 1994, 1998).

In addition to the problems in the Goldstein et al. meta-analysis (1989), Bailar (1995) also pointed out serious shortcomings in 4 other meta-analyses. Klein (2000) pointed out severe flaws in 4 meta-analyses comparing psychotherapy with pharmacotherapy. Indeed, there are numerous examples of unreliable meta-analyses.

The concept of meta-analysis seems to imply objectivity, but the selection criteria can vary substantially when different research groups carry out meta-analysis of the ‘same’ topic. In a ‘meta-meta-analysis’ Katerndahl and Lawler (1999) analyzed 23 meta-analyses that had examined the value of cholesterol reduction in coronary heart disease, finding substantial variation in the meta-analyses and their conclusions. Similarly, Prins and Buller (1996) discussed the divergent findings in 4 meta-analyses on the preferred dosage of aminoglycosides and concluded that "The physician can only follow the conclusion of the meta-analysis most closely in accordance with his or her own beliefs." The divergent and incompatible conclusions of the first three meta-analyses on vitamin C and the common cold were mentioned above (p 28 of Hemilä 2006).

Although many meta-analyses of controlled trials are problematic, the problems are even greater in meta-analyses of non-experimental studies, since there may be consistent biases in the studies included. A meta-analysis on the association between chlorination of drinking water and cancer risk by the Chalmers group (Morris et al. 1992) included such severely biased studies that, independently, Bailar (1995) and Shapiro (1997) pointed out various problems. Also, Cantor (1994) commented that "A recent meta-analysis [by Morris et al.] … has probably confused the situation. This exercise may have been premature since most of the input data came from studies with (1) inadequate control of confounding and other sources of bias, and (2) highly limited estimates of historical exposure to drinking water contaminants."

A meta-analysis of the association between alcohol consumption and breast cancer by the Chalmers group (Longnecker et al. 1988) included studies with such severe methodological defects that the meta-analysis was considered seriously misleading by Shapiro (1994, 1997; see also Rosenberg 1989), who had published both the initial association that led to the series of studies and, finally, the large study finding a null result. Because of the large variety of potential biases in non-experimental studies, Shapiro (1997) considered that "The meta-analysis of nonrandomized observational studies resembles the attempt of a quadriplegic person to climb Mount Everest unaided."

The main limitations and challenges of meta-analysis are related to the experimental issues. The statistical methods available for combining P-values or data on individual studies are well established (e.g., Greenland 1987, 1998; Laird & Mosteller 1990; Fleiss 1993; Higgins & Green 2005 ss 8.6 to 8.8), but there are examples of demonstrably invalid methods of analysis even in some of the influential meta-analyses. In one meta-analysis, the average case fatality percentage was calculated for 6 trials without any weighting, even though the number of participants in the 6 trials varied from 53 to 1,427; i.e., close to 30-fold (Chalmers et al. 1977). In fact, since the large trials found a considerably smaller benefit than the small trials, combining the actual data instead of percentages led to a substantially smaller difference (by 30%) between the pooled study groups (Goldman & Feinstein 1979). The Chalmers et al. (1977) meta-analysis was cited in a recent systematic review comparing randomized and non-randomized trials (Kunz & Oxman 1998), without noting its severe methodological problems, yet Oxman was an editor of the previous edition of the Cochrane Reviewers’ Handbook, which commented that "interpretation of results is dependent upon the validity of the included studies" (Clarke & Oxman 2002 s 6).
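The arithmetic point about unweighted averaging can be illustrated with a minimal sketch. The counts below are invented purely for illustration and are not the data of Chalmers et al. (1977); only the two trial sizes (53 and 1,427 participants) echo the range mentioned above. The function names are likewise hypothetical. The sketch compares the plain average of per-trial case fatality percentages with the difference obtained by combining the actual counts, which is the comparison described by Goldman & Feinstein (1979).

```python
# Each tuple: (deaths_treated, n_treated, deaths_control, n_control).
# HYPOTHETICAL counts, chosen only so that the small trial shows a large
# apparent benefit and the large trial a small one. Not real trial data.
TRIALS = [
    (2, 25, 6, 28),      # small trial: 53 participants in total
    (30, 700, 36, 727),  # large trial: 1,427 participants in total
]


def unweighted_difference(trials):
    """Average the per-trial fatality percentages, ignoring trial size,
    and return control minus treated (in percentage points)."""
    treated = [100.0 * dt / nt for dt, nt, _, _ in trials]
    control = [100.0 * dc / nc for _, _, dc, nc in trials]
    return sum(control) / len(control) - sum(treated) / len(treated)


def pooled_difference(trials):
    """Combine the raw counts of all trials, so that each trial counts in
    proportion to its size, and return control minus treated
    (in percentage points)."""
    deaths_t = sum(t[0] for t in trials)
    n_t = sum(t[1] for t in trials)
    deaths_c = sum(t[2] for t in trials)
    n_c = sum(t[3] for t in trials)
    return 100.0 * deaths_c / n_c - 100.0 * deaths_t / n_t


if __name__ == "__main__":
    print("unweighted difference:", round(unweighted_difference(TRIALS), 2))
    print("pooled difference:    ", round(pooled_difference(TRIALS), 2))
```

With these invented counts the unweighted average gives the 53-participant trial the same influence as the 1,427-participant trial, so the apparent benefit is several times larger than the pooled one. The size of the gap here is an artifact of the made-up numbers and should not be read as an estimate of the 30% discrepancy reported for the actual data. Simple pooling of raw counts is used only because it mirrors the comparison described in the text; the standard methods cited above weight within-trial comparisons instead.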

Disregard of basic arithmetic is also seen in a meta-analysis of vitamin C and the common cold in which treatment effects on ‘duration in days’ were averaged without considering either the differences in the size of the trials or the large variations in the cold duration in the control groups (Chalmers 1975; see pp 36-8 of Hemilä 2006). Sometimes meta-analysts are not familiar with the standard methods either; for example, in his vitamin C common cold meta-analysis, Chalmers (1975) claimed that in the earlier meta-analysis on vitamin C and the common cold, Pauling (1971a) had "averaged ‘p’ values from the different studies." However, in his statistical analysis Pauling used the well-established Fisher method of calculating the combined P-value from several independent P-values (Fisher 1938; Sokal & Rohlf 1981; Laird & Mosteller 1990), which cannot be described as naive ‘averaging.’ Furthermore, Chalmers himself used the same Fisher method two years after his criticism (Chalmers et al. 1977).
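To make the distinction between the Fisher method and ‘averaging’ concrete, the following is a minimal sketch of the method in its usual textbook form: for k independent P-values, the statistic -2 Σ ln(p_i) is referred to a chi-square distribution with 2k degrees of freedom. The function name and the example P-values are hypothetical and are not taken from Pauling (1971a). The sketch uses the closed-form upper tail of the chi-square distribution for an even number of degrees of freedom, so no statistical library is assumed.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P-values by the Fisher method.

    The statistic X = -2 * sum(ln p_i) follows, under the null hypothesis,
    a chi-square distribution with 2k degrees of freedom. For even degrees
    of freedom the upper tail has the closed form
        P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!
    """
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P-values must lie in the interval (0, 1]")
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = 1.0   # (x/2)**i / i! for i = 0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


if __name__ == "__main__":
    example = [0.05, 0.05]  # hypothetical P-values, for illustration only
    print("Fisher combined P:", fisher_combined_p(example))
    print("naive average:    ", sum(example) / len(example))
```

For two hypothetical studies that each give P = 0.05, the combined P-value is about 0.017, whereas the naive average would remain 0.05. Two independent, moderately significant results thus reinforce each other under the Fisher method, which averaging cannot express.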

Because many published meta-analyses were not properly performed, several experts have been rather skeptical about the usefulness of the method (Meinert 1989; Spitzer 1991; Feinstein 1995; Feinstein & Horwitz 1997; Eysenck 1994; Bailar 1995, 1997a,b, 1999; Shapiro 1994, 1997). Bailar (1995) commented that "Meta-analysis has been seized with enthusiasm by many scientists not trained in statistics and cognate sciences, and it is clear in conversations that many of them have utterly unrealistic views about its scope and power." Shapiro (1997) noted that "I think there is something profoundly amiss in the uncritical way in which the epidemiologists, and indeed the medical profession as a whole, have allowed themselves to be seduced by the numerological abracadabra of meta-analysis." Meinert (1989) commented that "There are no easy, inexpensive answers to complex questions and attempts to substitute small trials and meta-analysis for large trials is illusionary and detrimental to both medicine and clinical trials." In an editorial in a major journal, Bailar (1997a) further commented that "In my own review of selected meta-analyses, problems were so frequent and so serious … that it was difficult to trust the overall ‘best estimates’ … I still prefer conventional narrative reviews of the literature, a type of summary familiar to readers of the countless review articles on important medical issues." Although such strong comments can be understood against the background of the severe errors in some of the products by meta-analysts, ‘meta-analysis’ as a method will not disappear. As a statistical tool it has inherent strengths and weaknesses which should be understood by those carrying out such analyses, and by readers of the conclusions of meta-analyses. Furthermore, meta-analyses have provided a large number of conclusions that have been consistent with later findings.

The political and social consequences of meta-analysis have also aroused concern. In particular, the Evidence-Based Medicine (EBM) movement puts great weight on the semiofficial meta-analyses of the Cochrane Collaboration (EBMWG 1992; Chalmers et al. 1992; Editorial 1992; Sackett 1994; Bero & Rennie 1995; Hill 2000; Cochrane 2005a).  Feinstein and Horwitz (1997) concluded a thorough critique with the comment that "The threat of official, corporate, or private abuse will always remain, whenever any collection of information has been prominently heralded as the ‘best available evidence.’ A new form of dogmatic authoritarianism may then be revived in modern medicine, but the pronouncements will come from Cochranian Oxford rather than Galenic Rome." Shapiro (1994) was worried that "Government departments will continue to make public health decisions, often misguided ones, based on the results of meta-analyses." Bailar (1995) commented that "A traditional narrative review can do much more than estimate parameters, and the additions are critical to the progress of science. Meta-analysis … is a poor tool for developing new concepts, new hypotheses, and new methods of study… meta-analysis has never been promoted as an alternative to thoughtful but unstructured reading, but it may nevertheless carry the seeds of a diminished respect for and a diminished role for simple browsing through the primary literature."


Bailar JC (1995) The practice of meta-analysis. J Clin Epidemiol 48:149-57

Bailar JC (1997a) The promise and problems of meta-analysis. N Engl J Med 337:559-61

Bailar JC (1997b) Assessing assessments. Science 277:528-9

Bailar JC (1999) Passive smoking, coronary heart disease, and meta-analysis. N Engl J Med 340:958-9 * comments in: (1999);341:697-700

Bero L, Rennie D (1995) The Cochrane Collaboration: preparing, maintaining, and disseminating systematic reviews of the effects of health care. JAMA 274:1935-8

Cantekin EI (1994) Antibiotics to prevent acute otitis media and to treat otitis media with effusion [letter]. JAMA 272:203-4

Cantekin EI (1998) Aggressive and ineffective therapy for otitis media. Otorhinolaryngol Nova 8:136-47

Cantor KP (1994) Water chlorination, mutagenicity, and cancer epidemiology [editorial]. Am J Public Health 84:1211-3

Chalmers I, Dickersin K, Chalmers TC (1992) Getting to grips with Archie Cochrane’s agenda. BMJ 305:786-8

Chalmers TC (1975) Effects of ascorbic acid on the common cold: an evaluation of the evidence. Am J Med 58:532-6  ***  SEE PROBLEMS OF CHALMERS' REVIEW

Chalmers TC, Matta RJ, Smith H, Kunzler AM (1977) Evidence favoring the use of anticoagulants in the hospital phase of acute myocardial infarction. N Engl J Med 297:1091-6

Clarke M, Oxman AD, eds (2002) Cochrane Reviewers’ Handbook 4.1.5. In: The Cochrane Library, Issue 2, 2002. Oxford: Update Software.

Cochrane (2005a) The Cochrane Collaboration.

EBMWG [Evidence-Based Medicine Working Group] (1992) Evidence-Based Medicine: a new approach to teaching the practice of medicine. JAMA 268:2420-5

Editorial (1992) Cochrane’s legacy. Lancet 340:1131-2 

Eysenck HJ (1994) Meta-analysis and its problems. BMJ 309:789-92

Feinstein AR (1995) Meta-analysis: statistical alchemy for the 21st century. J Clin Epidemiol 48:71-9

Feinstein AR, Horwitz RI (1997) Problems in the “evidence” of “Evidence-Based Medicine.” Am J Med 103:529-35 

Fisher RA (1938) Statistical Methods for Research Workers, 7th edn. London: Oliver and Boyd. pp 104-6   * 1925 edition in net

Fisher RA (1948) Combining independent tests of significance. American Statistician 2(5):30

Fleiss JL (1993) The statistical basis of meta-analysis. Stat Meth Med Res 2:121-45

Goldman L, Feinstein AR (1979) Anticoagulants and myocardial infarction: the problems of pooling, drowning, and floating. Ann Intern Med 90:92-4

Goldstein PA, Sacks HS, Chalmers TC (1989) Hormone administration for the maintenance of pregnancy. In: Effective Care in Pregnancy and Childbirth. Vol. I: Pregnancy [Chalmers et al., eds]. Oxford, UK: Oxford University Press. pp 612-23

Greenland S (1987) Quantitative methods in the review of epidemiologic literature. Epidemiol Rev 9:1-30

Higgins JPT, Green S, eds (2005) Cochrane Handbook for Systematic Reviews of Interventions 4.2.5 [updated May 2005]. In: The Cochrane Library, Issue 2, 2005. Chichester, UK: John Wiley & Sons Ltd.

Hill GB (2000) Archie Cochrane and his legacy. J Clin Epidemiol 53:1189-92

Katerndahl DA, Lawler WR (1999) Variability in meta-analytic results concerning the value of cholesterol reduction in coronary heart disease: a meta-meta-analysis. Am J Epidemiol 149:429-41

Klein DF (2000) Flawed meta-analyses comparing psychotherapy with pharmacotherapy. Am J Psychiatry 157:1204-11  * comments in: (2001);158:1164-6 

Kunz R, Oxman AD (1998) The unpredictability paradox: review of empirical comparisons of randomized and non-randomized clinical trials. BMJ 317:1185-90

Laird NM, Mosteller F (1990) Some statistical methods for combining experimental results. Int J Technol Assess Health Care 6:5-30

Longnecker MP, Berlin JA, Orza MJ, Chalmers TC (1988) A meta-analysis of alcohol consumption in relation to risk of breast cancer. JAMA 260:652-6  * comments in: Rosenberg (1989), Shapiro (1994, 1997), Bailar (1995)

MacArthur C, Foran PJ, Bailar JC (1995) Qualitative assessment of studies included in a meta-analysis: DES and the risk of pregnancy loss. J Clin Epidemiol 48:739-47

Meinert CL (1989) Meta-analysis: science or religion? Cont Clin Trials 10:257S-63S

Morris RD, Audet AM, Angelillo IF, Chalmers TC, Mosteller F (1992) Chlorination, chlorination by-products, and cancer: a meta-analysis. Am J Publ Health 82:955-63 * correction: (1993);83:1257; comments in: Cantor (1994), Shapiro (1997), Bailar (1995)

Pauling L (1971a) The significance of the evidence about ascorbic acid and the common cold. Proc Natl Acad Sci USA 68:2678-81 * SEE PROBLEMS OF PAULING'S REVIEW

Prins JM, Buller HR (1996) Meta-analysis: the final answer, or even more confusion? [letter]. Lancet 348:199

Rosenberg L (1989) Meta-analysis of alcohol and risk of breast cancer [letter]. JAMA 261:383

Sackett DL (1994) Cochrane collaboration. BMJ 309:1514-5

Shapiro S (1994) Meta-analysis/Shmeta-analysis. Am J Epidemiol 140:771-8 * comments in: (1994);140:779-91; (1995);142:779-92 

Shapiro S (1997) Is meta-analysis a valid approach to the evaluation of small effects in observational studies? J Clin Epidemiol 50:223-9

Sokal RR, Rohlf FJ (1981) Combining probabilities from tests of significance. In: Biometry, 2nd edn. San Francisco: Freeman. pp 779-82

Spitzer W (1991) Meta-meth-analysis: unanswered questions about aggregating data. J Clin Epidemiol 44:103-7

Williams RL, Chalmers TC, Stange KC, et al. (1993) Use of antibiotics in preventing recurrent acute otitis media and in treating otitis media with effusion: a meta-analytic attempt to resolve the brouhaha. JAMA 270:1344-51 * correction: (1993);271:430 ; comments in: (1994);271:430 ; Cantekin (1994, 1998)

Copyright: © 2006-2009 Harri Hemilä. This text is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.  

Vitamin C and infections in animals by Harri Hemilä is licensed under a Creative Commons Attribution 1.0 Finland License.
