Quality of studies included

by Harri Hemilä

This text is based on pp 30-32 of Hemilä (2006).
These documents contain up-to-date links to documents that are available online.

Harri Hemilä
Department of Public Health
University of Helsinki, Helsinki, Finland
harri.hemila@helsinki.fi

Home: http://www.mv.helsinki.fi/home/hemila

These files are at: http://www.mv.helsinki.fi/home/hemila/metaanalysis

Version May 29, 2012

The internal validity of the studies included in a meta-analysis is a relevant concern. For example, Chalmers et al. (1983) reported that therapeutic trials using ‘non-random’ assignment of participants showed substantial baseline differences between the treatment groups, as did trials using ‘unblinded randomization’, whereas ‘blinded randomization’ led to relatively similar baseline variable levels. Because of such severe problems with non-randomized studies, advocates of evidence-based medicine (EBM) have suggested that "If you find that the study was not randomized, we’d suggest that you stop reading it and go on to the next article" (Sackett et al. 1997 p 94).

If such an opinion became common, it would completely transform current medicine, since even extensive literature searches would probably not reveal randomized trials supporting the widespread beliefs that smoking, heavy alcohol use, and overweight increase the risk of poor health. It is also inappropriate to require that therapeutic conclusions be based solely on randomized trials. For example, Sir Austin Bradford Hill, who designed the first modern randomized controlled trial (Doll 1992, 1998; Yoshioka 1998; Hampton 2002; Armitage 2003), commented that "Any belief that the controlled trial is the only way [to study therapeutic efficacy] would mean not that the pendulum had swung too far but that it had come right off its hook" (Hill 1966).

In any case, because of problems related to study quality, Chalmers et al. (1981) proposed a quality scale to assess the validity of trials. About two dozen further ‘quality scales’ have since been devised. The scoring systems, however, have various shortcomings (Higgins & Green 2005 ss 6.7 to 6.11). Scoring is based on whether something was reported rather than whether it was done appropriately in the study. For example, if the original investigators explicitly stated criteria for the diagnosis of ‘congestive heart failure’, the trial is given ‘quality points’ because of the explicit definition. However, if ‘congestive heart failure’ is defined as ‘use of digitalis’, the evidence is of poor scientific quality and is clinically silly, but the trial still gets the ‘quality score points’ because of the explicit definition (Feinstein 1995). A recent survey that requested the technical features directly from the investigators found that in many cases randomization and allocation concealment were appropriate although they were not properly described in the study reports. Hill et al. (2002) therefore concluded that it is likely to be inappropriate to characterize the quality of randomized controlled trials as ‘good’ or ‘poor’ on the basis of the published report.

Furthermore, many scores also contain items that are not directly related to validity, such as whether a power calculation was done (related to precision and not validity) or whether the inclusion and exclusion criteria were clearly described (related to applicability and not validity) (Higgins & Green 2005 s 6.7). In a recent comparison, the summary quality scores were not significantly associated with treatment effects, indicating that the relevant methodological aspects should be assessed individually (Jüni et al. 1999). In another recent meta-analysis of 276 trials, double blinding and allocation concealment, two quality measures that are frequently used in meta-analyses, were not associated with treatment effect (Balk et al. 2002).

It has been argued that quality scoring rests on a subjective assignment of points to features of the studies, and that it submerges important information by combining disparate study features into a single score (Greenland 1998): "It also introduces an unnecessary and somewhat arbitrary subjective element into the analysis via the scoring scheme. Quality scoring can and should be replaced by direct categorical and regression analyses of the impact of each quality item. Such item-specific analyses let the data, rather than the investigator, indicate the importance of each item in determining the estimated effect."
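
The contrast between a summary score and item-specific analysis can be illustrated with a minimal Python sketch. All trial data below are hypothetical and invented purely for illustration; the item names (`concealed`, `double_blind`, `power_calc`) and the helper functions are assumptions, not taken from any cited analysis. The sketch uses a simple unweighted comparison of means as a stand-in for a categorical analysis; a real analysis would weight trials by precision or use meta-regression.

```python
from statistics import mean

# HYPOTHETICAL trials: effect estimates (log odds ratios) and yes/no
# quality items. Every number here is invented for illustration only.
ITEMS = ("concealed", "double_blind", "power_calc")
trials = [
    {"log_or": -0.8, "concealed": False, "double_blind": True,  "power_calc": True},
    {"log_or": -0.6, "concealed": False, "double_blind": False, "power_calc": True},
    {"log_or": -0.7, "concealed": False, "double_blind": True,  "power_calc": False},
    {"log_or": -0.2, "concealed": True,  "double_blind": True,  "power_calc": False},
    {"log_or": -0.3, "concealed": True,  "double_blind": False, "power_calc": True},
    {"log_or": -0.1, "concealed": True,  "double_blind": False, "power_calc": False},
]

def summary_score(trial):
    """Add one point per reported item, as a simple quality scale would."""
    return sum(trial[item] for item in ITEMS)

def item_contrast(trials, item):
    """Mean effect in trials with the item minus mean effect in trials without it."""
    with_item = [t["log_or"] for t in trials if t[item]]
    without_item = [t["log_or"] for t in trials if not t[item]]
    return mean(with_item) - mean(without_item)

# Item-by-item contrasts keep each feature visible instead of merging them.
for item in ITEMS:
    print(item, round(item_contrast(trials, item), 3))

# The summary score gives the first and fourth trials the same value
# even though they differ on which items they satisfy.
print(summary_score(trials[0]), summary_score(trials[3]))
```

In this invented data set the first and fourth trials receive the same summary score yet differ markedly in effect, whereas the item-by-item contrast points to `concealed` as the feature most strongly associated with the difference. The summary score alone would hide that.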

Shapiro (1997) commented: "Who are these meta-analysts, sitting on high, to decide for the rest of us what is and is not good quality, and then to measure it? Quality is best evaluated qualitatively: as opposed to meta-analysis, in any adequate qualitative review, we require that the author should give reasons for judging the quality of any given study as good or bad in transparent and easily comprehensible language. It is then up to the reader to decide whether he agrees or disagrees."

It is desirable to use a placebo in controlled trials to increase their internal validity. However, a recent meta-analysis of studies comparing a placebo group to a no-treatment group found that there was no placebo effect in studies with binary outcomes and, among studies with continuous outcomes, only those that measured pain showed evidence of the placebo effect (Hrobjartsson & Gøtzsche 2001, 2004). Consequently, lack of a placebo should not lead to the mechanical exclusion of a trial from a meta-analysis, since the relevance of the placebo depends on the topic.

Because of the various problems of ‘quality scores,’ the current version of the Cochrane Reviewers’ Handbook suggests that "Reviewers should avoid the use of ‘quality scores’ and undue reliance on detailed quality assessments. It is not supported by empirical evidence, it can be time-consuming, and it is potentially misleading" (Higgins & Green 2005 s 6.11). Thus, it is not reasonable to employ a rigid mechanical algorithm to discard ‘low quality score’ studies from a meta-analysis. The features related to validity should rather be considered case by case because the relevant features depend on the particular scientific question. One type of ‘quality scale’ was used in the fourth meta-analysis on vitamin C and the common cold for selecting ‘high quality’ trials for deeper analysis (Kleijnen et al. 1989; see pp 38-41 in Hemilä 2006). Also, one kind of ‘quality scale’ was used in a recent review on echinacea and the common cold when selecting two ‘best’ trials on which the conclusions were based (Caruso & Gwaltney 2005 [see Hemilä 2005a]).

Although randomization is a feasible method of allocating participants in most controlled trials, it seems that the problems caused by the lack of randomization have been grossly exaggerated. For example, the much-cited classic study by Chalmers et al. (1983), which suggested that ‘blinded random allocation’ leads to smaller treatment effects than ‘nonrandom assignment’, was itself severely biased. The group of ‘blinded randomization’ trials contained 9 trials about beta-blockers and 0 trials about coronary care units. In contrast, the group of ‘non-random assignment’ trials contained 1 trial about beta-blockers and 11 trials about coronary care units. With such an extremely biased distribution of study topics between the ‘random’ and ‘non-random’ allocation groups, it is not reasonable to assume that the method of allocation is the only reason for the difference between the findings in the two groups, even though Chalmers et al. (1983) did so. For example, they presented the ‘results of trials in terms of case-fatality rates’ by the method of allocation without stratifying by the topic of the trials; there are probably substantial baseline differences between the participants in beta-blocker trials and coronary care unit trials. Some other tables in Chalmers et al. (1983) are also misleading, as pointed out earlier (Gillman & Runyan 1984).
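
The confounding by topic described above can be made concrete with a small Python sketch. The trial counts per topic are those given in the text; the per-topic effect sizes are hypothetical and chosen only to show the mechanism, and restricting each allocation group to these two topics is a simplifying assumption. By construction, the allocation method has no influence on the effect within a topic.

```python
# Trial counts per topic are taken from the text above.
counts = {
    "blinded_randomization": {"beta_blocker": 9, "coronary_care_unit": 0},
    "non_random": {"beta_blocker": 1, "coronary_care_unit": 11},
}

# HYPOTHETICAL apparent treatment effect per topic, identical for both
# allocation methods. These numbers are invented for illustration only.
assumed_effect = {"beta_blocker": 0.05, "coronary_care_unit": 0.20}

def crude_mean_effect(group):
    """Average effect over all trials in a group, ignoring trial topic."""
    total = sum(counts[group].values())
    weighted = sum(n * assumed_effect[topic] for topic, n in counts[group].items())
    return weighted / total

for group in counts:
    print(group, round(crude_mean_effect(group), 4))
```

Under these assumptions the unstratified averages differ between the two allocation groups even though the allocation method makes no difference within either topic. The whole crude difference comes from the mix of topics, which is why a comparison that is not stratified by topic cannot isolate the effect of the allocation method.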

In spite of the severe methodological shortcomings, the Chalmers et al. paper (1983) has been extensively cited, e.g., by EBM proponents when claiming that "Studies in which treatment is allocated by any method other than randomization tend to show larger (and frequently false-positive) treatment effects than do randomized trials" (Guyatt, Sackett, Cook 1993), "Less rigorous studies tend to overestimate the effectiveness of therapeutic and preventive interventions" (Oxman et al. 1994), and "Because the potential for bias is much greater in cohort and case-control studies than in RCTs, recommendations from overviews combining observational studies will be much weaker" (Guyatt, Sackett, et al. 1995). Thus, in this case the EBM advocates did not critically read the paper they cited, although they emphasize the importance of critical reading elsewhere (e.g., Sackett et al. 1997 pp 79-156).

The Chalmers 1983 paper was also cited in a recent systematic review comparing randomized and non-randomized trials, which drew the conclusion "direction of bias: overestimation of effect" (Kunz & Oxman 1998). That conclusion makes no sense considering the extremely biased distribution of study topics between the ‘blinded randomization’ and ‘non-random assignment’ groups mentioned above. Furthermore, the Chalmers 1983 paper was cited in the Cochrane Reviewers’ Handbook without attention to its lack of validity (Clarke & Oxman 2002 ss 4.2 and 6.3), although the Handbook does comment that "Interpretation of results is dependent upon the validity of the included studies" (Clarke & Oxman 2002 s 6), and a guideline paper for readers of reviews also stated that "Authors will come to correct conclusions only if they accurately assess the validity of the primary studies on which the review is based" (Oxman & Guyatt 1988).

A recent comparison of randomized controlled trials with observational studies on 19 different treatments found that the estimates of treatment effects from the controlled trials and the observational studies were similar. In only 2 of the 19 analyses did the combined magnitude of the treatment effect in observational studies lie outside the 95% CI of the pooled estimate of the controlled trials (Benson & Hartz 2000). Another analysis of 5 clinical topics also found that the average results of observational studies were remarkably similar to those of controlled trials (Concato et al. 2000). Both analyses were motivated by the overemphasis on randomization by EBM advocates. Furthermore, a recent analysis of a large set of studies focusing on cirrhosis and hepatitis saw no difference between nonrandomized studies and randomized trials in the ‘20-year survival of conclusions’ derived from these studies (Poynard et al. 2002).
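
The kind of check described above, whether an observational estimate falls outside the 95% CI of a pooled trial estimate, can be sketched in Python as follows. This is a minimal illustration that assumes fixed-effect inverse-variance pooling and a normal approximation; it is not claimed to be the method used by Benson & Hartz (2000). The function names and all numbers are hypothetical.

```python
import math

def pooled_estimate(estimates, std_errors):
    """Fixed-effect inverse-variance pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

def outside_95_ci(value, pooled, pooled_se):
    """True if value lies outside pooled +/- 1.96 standard errors."""
    half_width = 1.96 * pooled_se
    return value < pooled - half_width or value > pooled + half_width

# HYPOTHETICAL log relative risks from three randomized trials,
# each with the same standard error. Invented for illustration only.
rct_estimates = [-0.30, -0.10, -0.20]
rct_std_errors = [0.10, 0.10, 0.10]
pooled, pooled_se = pooled_estimate(rct_estimates, rct_std_errors)

# Two HYPOTHETICAL combined observational estimates to compare.
for observational in (-0.25, -0.40):
    print(observational, outside_95_ci(observational, pooled, pooled_se))
```

With these invented numbers, the first observational estimate falls inside the interval of the pooled trial estimate and the second falls outside it. Counting such cases across treatments gives a tally of the same form as the "2 of the 19" reported in the text.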

References

Armitage P (2003) Fisher, Bradford Hill, and randomization. Int J Epidemiol 32:925-8

Balk EM, Bonis PAL, Moskowitz H, et al. (2002) Correlation of quality measures with estimates of treatment effect in meta-analyses of randomized controlled trials. JAMA 287:2973-82 [comments in: (2002);288:2406-9]

Benson K, Hartz AJ (2000) A comparison of observational studies and randomized, controlled trials. N Engl J Med 342:1878-86 [comments in: (2000);342:1907-9; (2000);343:1194-7]

Caruso TJ, Gwaltney JM Jr (2005) Treatment of the common cold with echinacea: a structured review. Clin Infect Dis 40:807-10 * comments in: Hemilä 2005a

Chalmers TC, Celano P, Sacks HS, Smith H (1983) Bias in treatment assignment in controlled clinical trials. N Engl J Med 309:1358-61 [comments in: Gillman & Runyan (1984)]

Chalmers TC, Smith H, Blackburn B, et al. (1981) A method for assessing the quality of a randomized control trial. Cont Clin Trials 2:31-49

Concato J, Shah N, Horwitz RI (2000) Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med 342:1887-92 [comments in: (2000);342:1907-9; (2000);343:1194-7]

Clarke M, Oxman AD, eds (2002) Cochrane Reviewers’ Handbook 4.1.5. In: The Cochrane Library, Issue 2, 2002. Oxford: Update Software.

Doll R (1992) Sir Austin Bradford Hill and the progress of medical science. BMJ 305:1521-6

Doll R (1998) Controlled trials: the 1948 watershed. BMJ 317:1217-20

Greenland S (1998) Meta-analysis. In: Rothman & Greenland (1998) Modern Epidemiology, 2nd edn. Boston, MA: Lippincott Williams Wilkins, pp 643-73

Gillman MW, Runyan DK (1984) Bias in treatment assignment in controlled clinical trials [letter]. N Engl J Med 310:1610-1

Guyatt GH, Sackett DL, Cook DJ (1993) Users’ guides to the medical literature: how to use an article about therapy or prevention. JAMA 270:2598-601

Guyatt GH, Sackett DL, Sinclair MD, et al. (1995) Users’ guides to the medical literature: a method for grading health care recommendations. JAMA 274:1800-4

Hampton JR (2002) Evidence-based medicine, opinion-based medicine, and real-world medicine. Persp Biol Med 45:549-68

Hemilä H (2005a) Echinacea, vitamin C, the common cold, and blinding [letter]. Clin Infect Dis 41:762-3 * comments on: Caruso & Gwaltney (2005)

Higgins JPT, Green S, eds (2005) Cochrane Handbook for Systematic Reviews of Interventions 4.2.5 [updated May 2005]. In: The Cochrane Library, Issue 2, 2005. Chichester, UK: John Wiley & Sons Ltd.

Hill AB (1966) Reflections on the controlled trial. Ann Rheum Dis 25:107-13

Hill CL, LaValley MP, Felson DT (2002) Discrepancy between published report and actual conduct of randomized clinical trials. J Clin Epidemiol 55:783-6

Hrobjartsson A, Gøtzsche PC (2001) Is the placebo powerless? An analysis of clinical trials comparing placebo with no treatment. N Engl J Med 344:1594-602 [correction: (2001);345:304; comments in: (2001);344:1630-2; (2001);345:1276-9; see update: J Internal Med (2004);256:91-100]

Hrobjartsson A, Gøtzsche PC (2004) Placebo interventions for all clinical conditions. Cochrane Database Syst Rev (3):CD003974

Jüni P, Witschi A, Bloch R, Egger M (1999) The hazards of scoring the quality of clinical trials for meta-analysis. JAMA 282:1054-60 [comments in: (1999);282:1083-5; (2000);283:1421-3]

Kleijnen J, Riet G, Knipschild PG (1989) Vitamine C en verkoudheid; overzicht van een megadosis literatuur [in Dutch]. Ned Tijdschr Geneeskd 133:1532-5
English translation: Vitamin C and the common cold; a review of the megadose literature. In: Food Supplements and Their Efficacy. pp 21-8. Thesis for University of Limburg (1991); Netherlands; ISBN 90 900 4581 3

Kunz R, Oxman AD (1998) The unpredictability paradox: review of empirical comparisons of randomized and non-randomized clinical trials. BMJ 317:1185-90

Oxman AD, Cook DJ, Guyatt GH, et al. (1994) Users’ guides to the medical literature: how to use an overview. JAMA 272:1367-71

Oxman AD, Guyatt GH (1988) Guidelines for reading literature reviews. Can Med Assoc J 138:697-703

Poynard T, Munteanu M, Ratziu V, et al. (2002) Truth survival in clinical research: an evidence-based requiem? Ann Intern Med 136:888-95 [comments in: (2002);137:932]

Sackett DL, Richardson WS, Rosenberg W, Haynes RB (1997) Evidence-Based Medicine: How to Practice and Teach EBM. NY: Churchill Livingstone [book review: JAMA (1997);278:168-9; JAMA (2000);284:2382-3; BMJ (1996);313:1410; Can Med Assoc J (1997);157:788]

Shapiro S (1997) Is meta-analysis a valid approach to the evaluation of small effects in observational studies? J Clin Epidemiol 50:223-9

Yoshioka A (1998) Use of randomization in the Medical Research Council’s clinical trial of streptomycin in pulmonary tuberculosis in the 1940s. BMJ 317:1220-3

Copyright: © 2006-2009 Harri Hemilä. This text is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Vitamin C and infections in animals by Harri Hemilä is licensed under a Creative Commons Attribution 1.0 Finland License.
Based on a work at www.mv.helsinki.fi/home/hemila/metaanalysis