OCS Redux: International Statistical Panel Puts Observational Cohort Study Research Possibilities into Perspective, Identifies Limitations

March 2000

“Blunt Instrument”

One of the most frequent objections to carrying out randomized, controlled trials to answer questions about long-term effectiveness is that observational cohort studies could provide the answers faster, more cheaply and as a closer proxy to real-world clinical experience.

The National Institutes of Health funds many observational cohort studies, such as the men-only Multicenter AIDS Cohort Study (MACS), the Women’s Interagency HIV Study (WIHS), and the ALIVE study of injection drug users. The Centers for Disease Control and Prevention also funds some observational cohorts, such as the Adult & Adolescent Spectrum of Disease (AASD), in over 23,000 HIV-infected patients, and the HIV Outpatient Study (HOPS), in about 5,000.

Researchers, both cohort-based and trial-based, offered a number of explanations for why observational studies will be inadequate to tease out the answers to “When to start?” “What to start with?” “When to change?” and “What to change to?” Here is a collection of quotes and paraphrases from the Division of AIDS Satellite Statistical Symposium, held on January 11, 2000.

Cohort studies complement and supplement — but do not replace — randomized clinical trials.
Alvaro Muñoz, chief statistician for the MACS, which has followed 5,000 gay men since 1984, said that “Cohort studies supplement randomized clinical trials by measuring individual effectiveness and complement them by providing measures of population effectiveness.” Thus, the MACS showed that single- and dual-nucleoside therapy had time-limited effects on AIDS morbidity and mortality, and that HAART had dramatic effects. Each treatment period (the nucleoside era and the HAART era) showed reductions of 50% or more in mortality and morbidity compared to the preceding one, an effect large enough to measure in a cohort study. Smaller but still clinically significant effects (e.g., smaller than 50-100%) cannot be detected by such studies.
Cohort studies provide suggestive information for generating hypotheses and designing randomized trials.
Five ongoing European cohort studies following over 12,000 HIV-infected individuals have indicated that the dramatic benefit of HAART seen in advanced HIV infection (people entering with CD4 <200 cells/mm³) is not seen in those with higher levels (see table below).
Statistician Steve Self of the Fred Hutchinson Cancer Research Center in Seattle stated that these studies are provocative, and raise important questions, but that, “You can only get rid of that level of uncertainty by doing a randomized clinical trial.”
Cohort studies can only detect large differences.
Jim Neaton, chief statistician for the CPCRA, pointed out that “MACS hit this [the efficacy of HAART] with a blunt instrument. It didn’t need a more sensitive one because HAART had an 80% effect. But the kind of strategy treatment effects we’re trying to look for are unlikely to be as large. The methods are rigorous for evaluating the introduction of a potent regimen; you could be dead wrong with a 20% difference; with an 80% difference you’d have to be half-dead not to notice it.”
Tom Fleming, chair of the adult ACTG and CPCRA Data & Safety Monitoring Board, said, “Cohort studies are a useful approach to look at a population for prognostic factors, changing use of therapies over time, event rates evolving over time. . . . Cohort studies provide a blunt instrument: if the effect is big, you can see it (the signal is bigger than the bias inherent in the lack of randomization). You need to be sure that your ‘signal’ is greater than the factors that influence your choice of early vs. late.”
Cohort studies, unlike randomization, cannot control for unrecognized covariates.
Fleming continued, “The covariates that are recognized are the tip of the iceberg. So my ultimate answer: randomization. . . . The integrity of randomization is only preserved if you follow everyone. If missingness [losses to follow-up, or missing data points] occurs in a dependent manner, it introduces bias. . . . What if the real, unknown, dominant covariates are not captured? My idea is to randomize and maximize follow-up.”
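The "blunt instrument" argument can be made concrete with a little arithmetic. The sketch below is purely illustrative and uses invented numbers, none of which come from the symposium: a hypothetical unrecorded covariate ("frailty") triples the baseline risk of an event, and in the cohort setting frail patients are assumed to be more likely to receive treatment. The function name and all parameter values are assumptions made for this example.

```python
def crude_risk_ratio(true_rr, p_treat_if_frail, p_treat_if_robust,
                     frail_share=0.5, risk_frail=0.30, risk_robust=0.10):
    """Risk ratio (treated vs. untreated) seen when frailty is NOT recorded.

    All inputs are hypothetical. `true_rr` is the real effect of treatment;
    the p_treat_* values say how likely each kind of patient is to be treated.
    """
    # Share of the whole population in each (frailty, treatment) cell.
    treated_frail = frail_share * p_treat_if_frail
    treated_robust = (1 - frail_share) * p_treat_if_robust
    untreated_frail = frail_share * (1 - p_treat_if_frail)
    untreated_robust = (1 - frail_share) * (1 - p_treat_if_robust)

    # Average event risk in each arm; treatment multiplies risk by true_rr.
    risk_treated = true_rr * (
        treated_frail * risk_frail + treated_robust * risk_robust
    ) / (treated_frail + treated_robust)
    risk_untreated = (
        untreated_frail * risk_frail + untreated_robust * risk_robust
    ) / (untreated_frail + untreated_robust)
    return risk_treated / risk_untreated


# Cohort: frail patients are treated 70% of the time, robust patients 30%.
# Randomized trial: everyone has the same 50% chance of treatment.
for label, true_rr in [("20% benefit", 0.8), ("80% benefit", 0.2)]:
    cohort = crude_risk_ratio(true_rr, 0.7, 0.3)
    trial = crude_risk_ratio(true_rr, 0.5, 0.5)
    print(f"{label}: cohort RR = {cohort:.2f}, randomized RR = {trial:.2f}")
```

Under these made-up assumptions, the 80% benefit still looks like a clear benefit in the cohort, while the 20% benefit is swamped by the imbalance and appears as harm. Randomization balances the unrecorded covariate across arms, so the observed ratio equals the true one in both cases.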
Cohort studies cannot detect small but clinically relevant differences.
Jim Neaton said, “With the moderate effects, e.g., with two nukes versus one or zero, you’re ‘off,’ or you got the wrong answer, but with the protease inhibitors we got knocked over the head. It was a huge difference.” [He was referring to the fact that the MACS analysis, unadjusted for baseline CD4, was unable to detect significant benefits with single- or dual-nucleoside therapy. When adjusted for baseline covariates, the MACS detected the ~50% reduction in short-term mortality conferred by AZT.]
Cohort studies cannot reliably detect “equivalence.”
Amy Justice, from the Veterans’ Administration, said, “In a When-to-Start (WTS) trial, we want to know if they’re equivalent, not whether one is better, and that’s the hardest to do from a power standpoint. How small of a difference is so small we wouldn’t want to know about that?”
There is no substitute for randomization.
Glenn Satten, of the CDC, said, “Randomization is better than anything else” for answering these questions. Steve Self said, “There is no substitute for randomized clinical trials to give us answers to strategy questions that can give us an even 20% [relative risk = 1.2] effect.”
We do not have any large-scale randomized studies which can show clinically the benefit of early vs. later initiation of antiretroviral therapy.
Tom Fleming said, “We need to have all these types of studies. In particular, now, large-scale randomized trials are lacking in the landscape in treatment trials. These need to be long-term and large studies…. Here I’m an advocate of large simple trials. In cardiology the large simple trial has established itself as a very important tool. The complexities in HIV disease mean that we’re not going to be able to design these studies as simply.”
Such randomized clinical trials may need to be big enough to detect a 20%, not just a 50-100%, difference between strategies.
The size of the clinical difference to be measured has a major impact on the size and duration of a clinical trial. Tom Fleming compared two “When-to-start” sample size estimates presented by Michael Hughes (SDAC, Harvard) and Jim Neaton (CPCRA, Minnesota). He noted that, as estimated by Mike Hughes, “The difference of a relative risk of 2.0, a 100% improvement, would require 88 events. Jim Neaton calculated a 20% improvement, or a relative risk of 1.2, which would require 1,200 events.” “These are important achievable differences,” Fleming added. “We need large numbers followed long-term.”
Amy Justice added, “I’ve been to three meetings at which these issues have been discussed over the last six months. It’s not randomized trials vs. cohort studies, it’s how much do we want to spend? How large are the effect sizes we want to see? How long do we have to follow them for? Are there going to be differences in event rates early or late?”
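The article does not say how Hughes and Neaton arrived at their event counts. As a rough check on why a smaller effect demands so many more events, here is a minimal sketch using Schoenfeld's standard approximation for a two-arm comparison with equal allocation. The choice of formula, the two-sided 5% significance level, and the 90% power are all assumptions made for this example, not details reported from the symposium.

```python
from math import ceil, log
from statistics import NormalDist


def events_required(relative_risk, alpha=0.05, power=0.90):
    """Approximate number of events needed to detect `relative_risk`.

    Schoenfeld's approximation for 1:1 allocation:
        events = 4 * (z_{1-alpha/2} + z_{power})^2 / ln(relative_risk)^2
    alpha (two-sided) and power are assumed values, not taken from the article.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return ceil(4 * (z_alpha + z_power) ** 2 / log(relative_risk) ** 2)


for rr in (2.0, 1.2):
    print(f"relative risk {rr}: about {events_required(rr)} events")
```

With these assumed settings the formula gives 88 events for a relative risk of 2.0, matching the figure attributed to Hughes, and somewhat more than 1,200 for a relative risk of 1.2, in line with the rounded figure attributed to Neaton. The required number of events grows with the inverse square of the log relative risk, which is why modest effects call for much larger and longer trials.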
The duration of these studies will be question-specific.
Fleming concluded, “These questions (e.g., length of study) are study specific. I’d want these trials to be revealing some insights at 5-7 years, but for front-line, I want to know ten years [especially if we’re looking at lifelong therapy].”
Jack Killen closed the workshop by remarking, “We’re still rounding out the adult therapeutics agenda and are about to go into recompeting the pediatrics agenda. . . . An astonishing amount has been accomplished here, and there has been an astonishing amount of agreement, which I would not have imagined seeing even six months ago. We’re coming away from this that there are very important questions: this is the highest ‘new’ adult priority, and a key component in the pediatrics research agenda. These will be very difficult, very high-risk studies; randomized controlled clinical trials are optimal because of the magnitude of effects we want to measure: large ‘N’ with long-term follow-up. Cohorts complement randomized clinical trials; there are currently no on-going randomized clinical trials that directly address these needs. Some plans may be in the wings in a couple of the cooperative groups.”

“What’s next?” Killen added. “We need to assess the plans in the wings for how they relate to the scientific and operational requirements that have been defined here. Move forward with due haste but also with due care and diligence: feasibility, complexity of study designs, need for equipoise and buy-in among providers and subjects. Once again, we are breaking ground in HIV.”

Cohort Studies Suggesting Immediate HAART Confers No Extra Benefit Compared with Deferred HAART in HIV-Infected Persons CD4 >200 cells/mm³

Study	N	Reference
EuroSIDA	7,331	Phillips, 1999
Swiss Cohort	3,342	AIDS, 1999
ICONA (Italy)	805	Phillips, 1999
Frankfurt Cohort	536	Phillips, 1999
Royal Free	500	Phillips, 1999
Hemophilia Cohort