This text is based on pp 30-32 of Hemilä (2006).
These documents have up-to-date links to documents that are available
via the net.
Harri Hemilä
Department of Public Health
University of Helsinki,
Helsinki, Finland
harri.hemila@helsinki.fi
The internal validity of the studies included in a meta-analysis is a
relevant concern. For example, substantial baseline differences between
the treatment groups were found in therapeutic trials that used
‘non-random’ assignment of participants or ‘unblinded randomization’,
whereas ‘blinded randomization’ led to relatively similar baseline
variable levels (Chalmers et al. 1983). Because of such severe problems
with non-randomized studies, the advocates of EBM have suggested that
"If you find that the study was not randomized, we’d suggest that you
stop reading it and go on to the next article" (Sackett et al. 1997 p
94).
If such an opinion became common, it would completely transform current
medicine, since probably not even extensive literature searches would
reveal randomized trials supporting the widespread beliefs that
smoking, heavy alcohol use, and being overweight increase the risk of
poor health. It is also inappropriate to require that therapeutic
conclusions should be based solely on randomized trials. For example,
Sir Austin Bradford Hill, who designed the first modern randomized
controlled trial (Doll 1992, 1998; Yoshioka 1998; Hampton 2002;
Armitage 2003), commented that "Any belief that the controlled trial is
the only way [to study therapeutic efficacy] would mean not that the
pendulum had swung too far but that it had come right off its hook"
(Hill 1966).
In any case, because of the problems related to study quality,
Chalmers et al. (1981) proposed a quality scale to assess the validity
of trials. About two dozen further ‘quality scales’ have since been
devised. The scoring systems, however, have various shortcomings
(Higgins & Green 2005 ss 6.7 to 6.11). Scoring is based on whether
something was reported rather than whether it was done appropriately in
the study. For example, if the original investigators explicitly stated
criteria for the diagnosis of ‘congestive heart failure’ the trial is
given ‘quality points’ because of the explicit definition. However, if
‘congestive heart failure’ is defined as ‘use of digitalis’ the
evidence is of poor scientific quality and is clinically silly, but
still gets the ‘quality score points’ because of the explicit
definition (Feinstein 1995). A recent survey that requested the
technical features directly from the investigators found that, in many
cases, randomization and allocation concealment were appropriate even
though they were not properly described in the study reports. Hill et
al. (2002) therefore concluded that it is likely to be inappropriate to
characterize the quality of randomized controlled trials as ‘good’ or
‘poor’ on the basis of the published report.
Furthermore, many scores also contain items that are not directly
related to validity, such as whether a power calculation was done
(related to precision and not validity) or whether the inclusion and
exclusion criteria were clearly described (related to applicability and
not validity) (Higgins & Green 2005 s 6.7). In a recent comparison,
the summary quality scores were not significantly associated with
treatment effects, indicating that the relevant methodological aspects
should be assessed individually (Jüni et al. 1999). In another
recent meta-analysis of 276 trials, double blinding and allocation
concealment, two quality measures that are frequently used in
meta-analyses, were not associated with treatment effect (Balk et al.
2002).
It has been argued that quality scoring is based on subjective
assignment of points based on features of the studies, and quality
scoring submerges important information by combining disparate study
features into a single score (Greenland 1998). "It also introduces an
unnecessary and somewhat arbitrary subjective element into the analysis
via the scoring scheme. Quality scoring can and should be replaced by
direct categorical and regression analyses of the impact of each
quality item. Such item-specific analyses let the data, rather than the
investigator, indicate the importance of each item in determining the
estimated effect."
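
The item-specific analysis recommended in the quotation above can be illustrated with a minimal sketch. It assumes fixed-effect inverse-variance pooling of log odds ratios and contrasts trials with and without one quality item at a time. The trial data, the item names (`concealed`, `double_blind`), and the helper functions are all hypothetical and only show the shape of the calculation, not any published analysis.

```python
import math

# Hypothetical trials (invented numbers, for illustration only): each has a
# log odds ratio, its variance, and two yes/no quality items.
TRIALS = [
    {"log_or": -0.50, "var": 0.04, "concealed": True, "double_blind": True},
    {"log_or": -0.30, "var": 0.09, "concealed": True, "double_blind": False},
    {"log_or": -0.60, "var": 0.16, "concealed": False, "double_blind": True},
    {"log_or": -0.20, "var": 0.05, "concealed": False, "double_blind": False},
    {"log_or": -0.45, "var": 0.10, "concealed": True, "double_blind": True},
    {"log_or": -0.70, "var": 0.20, "concealed": False, "double_blind": False},
]


def pooled(trials):
    """Fixed-effect inverse-variance pooled log odds ratio and its variance."""
    weights = [1.0 / t["var"] for t in trials]
    total = sum(weights)
    estimate = sum(w * t["log_or"] for w, t in zip(weights, trials)) / total
    return estimate, 1.0 / total


def item_contrast(trials, item):
    """Pooled log OR with the item minus pooled log OR without it, with 95% CI."""
    est_yes, var_yes = pooled([t for t in trials if t[item]])
    est_no, var_no = pooled([t for t in trials if not t[item]])
    diff = est_yes - est_no
    half_width = 1.96 * math.sqrt(var_yes + var_no)
    return diff, (diff - half_width, diff + half_width)


if __name__ == "__main__":
    # One contrast per quality item, instead of one summed quality score.
    for item in ("concealed", "double_blind"):
        diff, (low, high) = item_contrast(TRIALS, item)
        print(f"{item}: difference {diff:+.2f} (95% CI {low:+.2f} to {high:+.2f})")
```

Each quality item gets its own contrast, so the data rather than a scoring scheme indicate which items matter. A regression on several items at once would be the natural extension, which this sketch does not attempt.
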
Shapiro (1997) commented: "Who are these meta-analysts, sitting on
high, to decide for the rest of us what is and is not good quality, and
then to measure it? Quality is best evaluated qualitatively: as opposed
to meta-analysis, in any adequate qualitative review, we require that
the author should give reasons for judging the quality of any given
study as good or bad in transparent and easily comprehensible language.
It is then up to the reader to decide whether he agrees or disagrees."
It is desirable to use a placebo in controlled trials to increase their
internal validity. However, a recent meta-analysis of studies comparing
a placebo group to a no-treatment group found that there was no placebo
effect in studies with binary outcomes and, among studies with
continuous outcomes, only those that measured pain showed evidence of
the placebo effect (Hróbjartsson & Gøtzsche 2001, 2004).
Consequently, lack of a placebo should not lead to the mechanical
exclusion of a trial from a meta-analysis, since the relevance of the
placebo depends on the topic.
Because of the various problems of ‘quality scores,’ the current
version of the Cochrane Reviewers’ Handbook suggests that "Reviewers should
avoid the use of ‘quality scores’ and undue reliance on detailed
quality assessments. It is not supported by empirical evidence, it can
be time-consuming, and it is potentially misleading" (Higgins &
Green 2005 s 6.11). Thus, it is not reasonable to employ a rigid
mechanical algorithm to discard ‘low quality score’ studies from
meta-analysis. The features related to validity should rather be
considered case by case because the relevant features depend on the
particular scientific question. One type of ‘quality scale’ was used in
the fourth meta-analysis on vitamin C and the common cold for selecting
‘high quality’ trials for deeper analysis (Kleijnen et al. 1989; see pp
38-41 in Hemilä 2006). Also, one kind of ‘quality scale’ was used in a
recent review on echinacea and the common cold when selecting two
‘best’ trials on which the conclusions were based (Caruso &
Gwaltney 2005 [see Hemilä 2005a]).
Although randomization is a feasible method of allocating participants
in most controlled trials, it seems that the problems caused by the
lack of randomization have been grossly exaggerated. For example,
Thomas Chalmers’ much-cited classic study (1983) suggesting that
‘blinded random allocation’ leads to smaller treatment effects than
‘nonrandom assignment’ was itself severely biased. The group of
‘blinded randomization’ trials contained 9 trials about beta-blockers
and 0 trials about coronary care units. In contrast, the group of
‘non-random assignment’ trials contained 1 trial about beta-blockers
and 11 trials about coronary care units. With such an extremely biased
distribution of study topics between ‘random’ and ‘non-random’
allocation groups, it is not reasonable to assume that the method of
allocation is the only reason for the difference between the findings
in the two groups, even though Chalmers et al. (1983) did so. For
example, they presented the ‘results of trials in terms of
case-fatality rates’ by the method of allocation without stratifying by
the topic of the trials; there are probably substantial baseline
differences between the participants in beta-blocker trials and
coronary care unit trials. Some other tables in Chalmers et al. (1983)
are also misleading, as pointed out earlier (Gillman & Runyan 1984).
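
Why the lack of stratification by topic matters can be shown with a small sketch. The counts below are invented for illustration and are not data from Chalmers et al. (1983); the names `COUNTS`, `crude_rate`, and `stratified_rate` are likewise hypothetical. Case-fatality is set to be identical within each topic regardless of allocation method, so any crude difference arises only from the uneven mix of topics.

```python
# Hypothetical counts (invented for illustration; not data from Chalmers et
# al. 1983): (allocation method, trial topic) -> (deaths, patients).
COUNTS = {
    ("randomized", "beta_blocker"): (90, 900),
    ("randomized", "coronary_care_unit"): (20, 100),
    ("non_random", "beta_blocker"): (10, 100),
    ("non_random", "coronary_care_unit"): (180, 900),
}


def crude_rate(counts, allocation):
    """Case-fatality rate for one allocation method, ignoring trial topic."""
    deaths = patients = 0
    for (alloc, _topic), (d, n) in counts.items():
        if alloc == allocation:
            deaths += d
            patients += n
    return deaths / patients


def stratified_rate(counts, allocation, topic):
    """Case-fatality rate for one allocation method within one trial topic."""
    deaths, patients = counts[(allocation, topic)]
    return deaths / patients


if __name__ == "__main__":
    for allocation in ("randomized", "non_random"):
        print(allocation, "crude:", crude_rate(COUNTS, allocation))
        for topic in ("beta_blocker", "coronary_care_unit"):
            print("  ", topic, stratified_rate(COUNTS, allocation, topic))
```

With these invented counts the crude case-fatality rates are 11% for the randomized group and 19% for the non-random group, although within each topic the rates are identical (10% for beta-blocker trials and 20% for coronary care unit trials). The crude difference reflects the distribution of topics, not the method of allocation.
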
In spite of the severe methodological shortcomings, the Chalmers et al.
paper (1983) has been extensively cited, e.g., by EBM proponents when
claiming that "Studies in which treatment is allocated by any method
other than randomization tend to show larger (and frequently
false-positive) treatment effects than do randomized trials" (Guyatt,
Sackett, Cook 1993), "Less rigorous studies tend to overestimate the
effectiveness of therapeutic and preventive interventions" (Oxman et
al. 1994), and "Because the potential for bias is much greater in
cohort and case-control studies than in RCTs, recommendations from
overviews combining observational studies will be much weaker" (Guyatt,
Sackett, et al. 1995). Thus, in this case the EBM advocates did not
critically read the paper they cited, although they emphasize the
importance of critical reading elsewhere (e.g., Sackett et al. 1997 pp
79-156).
The Chalmers 1983 paper was also cited in a recent systematic review
comparing randomized and non-randomized trials, which drew the
conclusion "direction of bias: overestimation of effect" (Kunz & Oxman
1998). This makes no sense considering the extremely biased
distribution of study topics between the ‘blinded randomization’ and
‘non-random assignment’ groups mentioned above. Furthermore, the
Chalmers 1983
paper was cited in the Cochrane Reviewers’ Handbook without paying
attention to its lack of validity (Clarke & Oxman 2002 ss 4.2 and
6.3), although the Handbook does comment that "Interpretation of
results is dependent upon the validity of the included studies" (Clarke
& Oxman 2002 s 6), and a guideline paper for readers of reviews
also stated that "Authors will come to correct conclusions only if they
accurately assess the validity of the primary studies on which the
review is based" (Oxman & Guyatt 1988).
A recent comparison of randomized controlled trials with observational
studies on 19 different treatments found that the estimates of
treatment effects from the controlled trials and the observational
studies were similar. In only 2 of the 19 analyses did the combined
magnitude of the treatment effect in observational studies lie outside
the 95% CI of the pooled estimate of the controlled trials (Benson
& Hartz 2000). Another analysis of 5 clinical topics also found
that the average results of observational studies were remarkably
similar to those of controlled trials (Concato et al. 2000). Both
analyses were motivated by the overemphasis on randomization
by EBM advocates. Furthermore, a recent analysis of a large set of
studies focusing on cirrhosis and hepatitis saw no difference between
nonrandomized studies and randomized trials in the ‘20-year survival of
conclusions’ derived from these studies (Poynard et al. 2002).
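
The criterion described above for Benson & Hartz (2000), whether the combined observational estimate lies outside the 95% CI of the pooled trial estimate, can be sketched as follows. This is one plausible reading only: it assumes effects on the log scale and fixed-effect inverse-variance pooling, which may differ from the method used in the original paper. The function names and the numbers in the example are hypothetical.

```python
import math


def pooled_ci(log_effects, variances, z=1.96):
    """Fixed-effect pooled log effect with its confidence limits."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    estimate = sum(w * e for w, e in zip(weights, log_effects)) / total
    half_width = z * math.sqrt(1.0 / total)
    return estimate, estimate - half_width, estimate + half_width


def outside_trial_ci(observational_log_effect, log_effects, variances):
    """True if the observational estimate falls outside the trials' 95% CI."""
    _, low, high = pooled_ci(log_effects, variances)
    return not (low <= observational_log_effect <= high)


if __name__ == "__main__":
    # Invented trial results for one treatment (log odds ratios, variances).
    trial_effects = [-0.4, -0.2, -0.3]
    trial_variances = [0.04, 0.04, 0.04]
    for observational in (-0.35, 0.10):
        flag = outside_trial_ci(observational, trial_effects, trial_variances)
        print(observational, "outside trial CI:", flag)
```

Repeating such a check once per treatment would give a count comparable in form to the "2 of the 19 analyses" reported above.
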
Balk EM, Bonis PAL, Moskowitz H, et al. (2002) Correlation of quality
measures with estimates of treatment effect in meta-analyses of
randomized controlled trials. JAMA 287:2973-82
[comments in: (2002);288:2406-9]
Benson K, Hartz AJ (2000) A comparison of observational studies and
randomized, controlled trials. N Engl J Med
342:1878-86 [comments in: (2000);342:1907-9; (2000);343:1194-7]
Caruso TJ, Gwaltney JM Jr (2005) Treatment of the common cold with
echinacea: a structured review. Clin Infect Dis 40:807-10
[comments in: Hemilä 2005a]
Chalmers TC, Celano P, Sacks HS, Smith H (1983) Bias in treatment
assignment in controlled clinical trials. N Engl J Med 309:1358-61
[comments in: Gillman & Runyan (1984)]
Chalmers TC, Smith H, Blackburn B, et al. (1981) A method for assessing
the quality of a randomized control trial. Cont Clin
Trials 2:31-49
Concato J, Shah N, Horwitz RI (2000) Randomized, controlled trials,
observational studies, and the hierarchy of research designs. N Engl J Med
342:1887-92 [comments in: (2000);342:1907-9; (2000);343:1194-7]
Clarke M, Oxman AD, eds (2002) Cochrane Reviewers’ Handbook 4.1.5. In:
The Cochrane Library, Issue 2, 2002. Oxford: Update Software.
Doll R (1992) Sir Austin Bradford Hill and the progress of medical
science. BMJ
305:1521-6
Greenland S (1998) Meta-analysis. In: Rothman & Greenland (1998)
Modern Epidemiology, 2nd edn. Boston, MA: Lippincott Williams Wilkins,
pp 643-73
Gillman MW, Runyan DK (1984) Bias in treatment assignment in controlled
clinical trials [letter]. N Engl J Med 310:1610-1
Guyatt GH, Sackett DL, Cook DJ (1993) Users’ guides to the medical
literature: how to use an article about therapy or prevention. JAMA 270:2598-601
Guyatt GH, Sackett DL, Sinclair JC, et al. (1995) Users’ guides to the
medical literature: a method for grading health care recommendations. JAMA 274:1800-4
Hampton JR (2002) Evidence-based medicine, opinion-based medicine, and
real-world medicine. Persp
Biol Med 45:549-68
Hemilä H (2005a) Echinacea, vitamin C, the common cold, and blinding
[letter]. Clin Infect Dis 41:762-3
[comments on: Caruso & Gwaltney (2005)]
Higgins JPT, Green S, eds (2005) Cochrane Handbook for Systematic
Reviews of Interventions 4.2.5 [updated May 2005]. In: The Cochrane
Library, Issue 2, 2005. Chichester, UK: John Wiley & Sons Ltd.
Hill AB (1966) Reflections on the controlled trial. Ann Rheum Dis
25:107-13
Hill CL, LaValley MP, Felson DT (2002) Discrepancy between published
report and actual conduct of randomized clinical trials. J Clin
Epidemiol 55:783-6
Hróbjartsson A, Gøtzsche PC (2001) Is the placebo powerless? An
analysis of clinical trials comparing placebo with no treatment. N Engl J Med
344:1594-602 [correction: (2001);345:304; comments in:
(2001);344:1630-2; (2001);345:1276-9; see update: J Internal
Med (2004);256:91-100]
Jüni P, Witschi A, Bloch R, Egger M (1999) The hazards of scoring
the quality of clinical trials for meta-analysis. JAMA 282:1054-60
[comments in: (1999);282:1083-5; (2000);283:1421-3]
Kunz R, Oxman AD (1998) The unpredictability paradox: review of
empirical comparisons of randomized and non-randomized clinical trials.
BMJ 317:1185-90
Oxman AD, Cook DJ, Guyatt GH, et al. (1994) Users’ guides to the
medical literature: how to use an overview. JAMA 272:1367-71
Poynard T, Munteanu M, Ratziu V, et al. (2002) Truth survival in
clinical research: an evidence-based requiem? Ann Intern Med
136:888-95 [comments in: (2002);137:932]
Sackett DL, Richardson WS, Rosenberg WS, Haynes RB (1997)
Evidence-Based Medicine: How to Practice and Teach EBM. NY: Churchill
Livingstone [book reviews: JAMA (1997);278:168-9; JAMA
(2000);284:2382-3; BMJ (1996);313:1410; Can Med Assoc J
(1997);157:788]
Shapiro S (1997) Is meta-analysis a valid approach to the evaluation of
small effects in observational studies? J Clin
Epidemiol 50:223-9
Yoshioka A (1998) Use of randomization in the Medical Research
Council’s clinical trial of streptomycin in pulmonary tuberculosis in
the 1940s. BMJ 317:1220-3