The evolution of the modern clinical
trial dates back at least to the eighteenth century [1, 2]. Lind, in
his classical study on board the Salisbury, evaluated six treatments for
scurvy in 12 patients. One of the two who were given oranges and
lemons recovered quickly and was fit for duty after 6 days. The
second was the best recovered of the others and was assigned the
role of nurse to the remaining ten patients. Several other
comparative studies were also conducted in the eighteenth and
nineteenth centuries. The comparison groups comprised literature
controls, other historical controls, and concurrent controls.
The concept of randomization was
introduced by Fisher and applied in agricultural research in 1926
[3]. Probably the first clinical
trial that used a form of random assignment of participants to
study groups was reported in 1931 by Amberson et al.
[4]. After careful matching of 24
patients with pulmonary tuberculosis into comparable groups of 12
each, a flip of a coin determined which group received sanocrysin,
a gold compound commonly used at that time. The British Medical
Research Council trial of streptomycin in patients with
tuberculosis, reported in 1948, used random numbers in the
allocation of individual participants to experimental and control
groups [5, 6].
The principle of blinding was also
introduced in the trial by Amberson et al. [4]. The participants were not aware of whether
they received intravenous injections of sanocrysin or distilled
water. In a trial of cold vaccines in 1938, Diehl and coworkers
[7] referred to the saline solution
given to the subjects in the control group as a placebo.
One of the early trials from the
National Cancer Institute of the National Institutes of Health in
1960 randomly assigned patients with leukemia to either 6-azauracil
or placebo. No treatment benefit was observed in this double-blind
trial [8].
In the past several decades, the
randomized clinical trial has emerged as the preferred method in
the evaluation of medical interventions. Techniques of
implementation and special methods of analysis have been developed
during this period. Many of the principles have their origins in
work by Hill [9–12]. For a brief history of key developments in
clinical trials, see Chalmers [13].
The original authors of this book have
spent their careers at the U.S. National Institutes of Health, in
particular, the National Heart, Lung, and Blood Institute, and/or
academia. The two new authors have been academically based
throughout their careers. Therefore, many of the examples reflect
these experiences. We also cite papers which review the history of
clinical trials development at the NIH [14–18].
The purpose of this chapter is to
define clinical trials, review the need for them, discuss timing
and phasing of clinical trials, and present an outline of a study
Fundamental Point
A properly planned and executed clinical trial
is the best experimental technique for assessing the effectiveness
of an intervention. It also contributes to the identification of
possible harms.
What Is a Clinical Trial?
We define a clinical trial as a
prospective study comparing the
effects and value of intervention(s) against a control in human beings. Note
that a clinical trial is prospective, rather than retrospective.
Study participants must be followed forward in time. They need not
all be followed from an identical calendar date. In fact, this will
occur only rarely. Each participant, however, must be followed from
a well-defined point in time, which becomes time zero or baseline
for that person in the study. This contrasts with a case-control
study, a type of retrospective observational study in which
participants are selected on the basis of presence or absence of an
event or condition of interest. By definition, such a study is not
a clinical trial. People can also be identified from medical
records or other data sources and subsequent records can be
assessed for evidence of new events. With the increasing
availability of electronic health records, this kind of research
has become more feasible and may involve many tens of thousands of
individuals. It is theoretically possible that the participants can
be identified at the specific time they begin treatment with one or
another intervention selected by the clinician, and then followed
by means of subsequent health records. This type of study is not
considered to be a clinical trial because it is unlikely that it is
truly prospective. That is, many of the participants would have
been identified after initiation of treatment and not directly
observed from the moment of initiation. Thus, at least some of the
follow-up data are retrospective. It also suffers from the major
limitation that treatment is not chosen with an element of
randomness. Thus associations between treatment and outcome are
nearly always influenced by confounding factors, some of which are
measured (and thus can be accounted for with adjustment) and others
unmeasured (that cannot be). Of course, electronic records and
registries can work effectively in collaboration with randomization
into clinical trials. As exemplified by the Thrombus Aspiration in
ST-Elevation Myocardial Infarction in Scandinavia (TASTE) trial
[19], electronic registries
greatly simplified the process of identifying and obtaining initial
information on those people eligible for the trial. As noted by
Lauer and D’Agostino [20],
however, translating this approach into other settings will not be
A clinical trial
must employ one or more intervention techniques. These may be
single or combinations of diagnostic, preventive, or therapeutic
drugs, biologics, devices, regimens, procedures, or educational
approaches. Intervention techniques should be applied to
participants in a standard fashion in an effort to change some
outcome. Follow-up of people over a period of time without active
intervention may measure the natural history of a disease process,
but it does not constitute a clinical trial. Without active
intervention the study is observational because no experiment is
being performed.
Early phase studies may be controlled
or uncontrolled. Although common terminology refers to phase I and
phase II trials, because they are sometimes uncontrolled, we will
refer to them as clinical studies. A trial, using our definition,
contains a control group
against which the intervention group is compared. At baseline, the
control group must be sufficiently similar in relevant respects to
the intervention group in order that differences in outcome may
reasonably be attributed to the action of the intervention. Methods
for obtaining an appropriate control group are discussed in Chaps.
5 and 6. Most often a new intervention is
compared with, or used along with, best current standard therapy.
Only if no such standard exists or, for several reasons discussed
in Chap. 2, is not available, is it
appropriate for the participants in the intervention group to be
compared to participants who are on no active treatment. “No active
treatment” means that the participant may receive either a placebo
or no treatment at all. Obviously, participants in all groups may
be on a variety of additional therapies and regimens, so-called
concomitant treatments, which may be either self-administered or
prescribed by others (e.g., other physicians).
For purposes of this book, only
studies in human beings
will be considered as clinical trials. Certainly, animals (or
plants) may be studied using similar techniques. However, this book
focuses on trials in people, and each clinical trial must therefore
incorporate participant safety considerations into its basic
design. Equally important is the need for, and responsibility of,
the investigator to inform fully potential participants about the
trial, including information about potential benefits, harms, and
treatment alternatives [21–24]. See
Chap. 2 for further discussion of ethical
Unlike animal studies, in clinical
trials the investigator cannot dictate what an individual should
do. He can only strongly encourage participants to avoid certain
medications or procedures which might interfere with the trial.
Since it may be impossible to have “pure” intervention and control
groups, an investigator may not be able to compare interventions,
but only intervention strategies. Strategies refer to attempts at
getting all participants to adhere, to the best of their ability,
to their originally assigned intervention. When planning a trial,
the investigator should recognize the difficulties inherent in
studies with human subjects and attempt to estimate the magnitude
of participants’ failure to adhere strictly to the protocol. The
implications of less than perfect adherence are considered in Chap.
As discussed in Chaps. 6 and 7, the ideal clinical trial is one that is
randomized and double-blind. Deviation from this standard has
potential drawbacks which will be discussed in the relevant
chapters. In some clinical trials compromise is unavoidable, but
often deficiencies can be prevented or minimized by employing
fundamental features of design, conduct, and analysis.
A number of people distinguish between
demonstrating “efficacy” of an intervention and “effectiveness” of
an intervention. They also refer to “explanatory” trials, as
opposed to “pragmatic” or “practical” trials. Efficacy or
explanatory trials refer to what the intervention accomplishes in
an ideal setting. The term is sometimes used to justify not using
an “intention-to-treat” analysis. As discussed in Chaps.
8 and 18, that is insufficient
justification. Effectiveness or pragmatic trials refer to what the
intervention accomplishes in actual practice, taking into account
inclusion of participants who may incompletely adhere to the
protocol or who for other reasons may not respond to an
intervention. Both sorts of trials may address relevant questions
and both sorts need to be properly performed. Therefore, we do not
consider this distinction between trials as important as the proper
design, conduct, and analysis of all trials in order to answer
important clinical or public health questions, regardless of the
setting in which they are done.
The SPIRIT 2013 Statement (Standard
Protocol Items: Recommendations for Interventional Trials)
[25], as well as the various
International Conference on Harmonisation (ICH) documents
[26] devote considerable attention
to the quality of trials, and the features that make for high
quality. Poorly designed, conducted, analyzed, and reported trials
foster confusion and even erroneous interpretation of results.
People have argued over what key elements deserve the most
attention versus those that expend resources better used elsewhere.
However, unless certain characteristics such as unbiased assignment
to treatment of sufficient numbers of adequately characterized
participants, objective and reasonably complete assessment of the
primary and secondary outcomes, and proper analysis are performed,
the trial may not yield interpretable results. Much of the rest of
this book expands on these issues.
Clinical Trial Phases
In this book we focus on the design
and analysis of randomized trials comparing the effectiveness and
adverse effects of two or more treatments. Several steps or phases
of clinical research, however, must occur before this comparison
can be implemented. Classically, trials of pharmaceutical agents
have been divided into phases I through IV. Studies with other
kinds of interventions, particularly those involving behavior or
lifestyle change or surgical approaches, will often not fit neatly
into those phases. In addition, even trials of drugs may not fit
into a single phase. For example, some may blend from phase I to
phase II or from phase II to phase III. Therefore, it may be easier
to think of early phase studies and late phase studies.
Nevertheless, because they are in common use, and because early
phase studies, even if uncontrolled, may provide information
essential for the conduct of late phase trials, the phases are
defined below.
A good summary of phases of clinical
trials and the kinds of questions addressed at each phase was
prepared by the International Conference on Harmonisation
[26]. Figure 1.1, taken from that
document, illustrates that research goals can overlap with more
than one study phase.

Fig. 1.1 Correlation between development phases and
types of study [26]
Thus, although pharmacology studies in
humans that examine drug tolerance, metabolism, and interactions,
and describe pharmacokinetics and pharmacodynamics, are generally
done as phase I, some pharmacology studies may be done in other
trial phases. Therapeutic exploratory studies, which look at the
effects of various doses and typically use biomarkers as the
outcome, are generally thought of as phase II. However, sometimes,
they may be incorporated into other phases. The usual phase III
trial consists of therapeutic confirmatory studies, which
demonstrate clinical usefulness and examine the safety profile. But
such studies may also be done in phase II or phase IV trials.
Therapeutic use studies, which examine the drug in broad or special
populations and seek to identify uncommon adverse effects, are
almost always phase IV (or post-approval) trials.
Phase I Studies
Although useful pre-clinical
information may be obtained from in vitro studies or animal models,
early data must also be obtained in humans. People who participate
in phase I studies generally are healthy volunteers, but may be
patients who have already tried and failed to improve on the
existing standard therapies. Phase I studies attempt to estimate
tolerability and characterize pharmacokinetics and
pharmacodynamics. They focus on questions such as bioavailability
and body compartment distribution of the drug and metabolites. They
also provide preliminary assessment of drug activity
[26]. These studies may also
assess feasibility and safety of pharmaceutical or biologic
delivery systems. For example, in gene transfer studies, the action
of the vector is an important feature. Implantable devices that
release an active agent require evaluation along with the agent to
assess whether the device is safe and delivers the agent in
appropriate doses.
Buoen et al. reviewed 105 phase I
dose-escalation studies in several medical disciplines that used
healthy volunteers [27]. Despite
the development of new designs, primarily in the field of cancer
research, most of the studies in the survey employed simple
dose-escalation approaches.
Often, one of the first steps in
evaluating drugs is to estimate how large a dose can be given
before unacceptable toxicity is experienced by patients
[28–33]. This is usually referred to as the
maximally tolerated dose. Much of the early literature has
discussed how to extrapolate animal model data to the starting dose
in humans [34] or how to step up
the dose levels to achieve the maximally tolerated dose.
In estimating the maximally tolerated
dose, the investigator usually starts with a very low dose and
escalates the dose until a prespecified level of toxicity is
obtained. Typically, a small number of participants, usually three,
are entered sequentially at a particular dose. If no specified
level of toxicity is observed, the next predefined higher dose
level is used. If unacceptable toxicity is observed in any of the
three participants, additional participants, usually three, are
treated at the same dose. If no further toxicity is seen, the dose
is escalated to the next higher dose. If additional unacceptable
toxicity is observed, then the dose escalation is terminated and
that dose, or perhaps the previous dose, is declared to be the
maximally tolerated dose. This particular design assumes that the
maximally tolerated dose occurs when approximately one-third of the
participants experience unacceptable toxicity. Variations of this
design exist, but most are similar.
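The escalation rule just described can be sketched as a small simulation. This is an illustrative sketch only, not a validated design tool: the function names, the toxicity probabilities, and the seed are hypothetical, and the code reports the dose at which escalation stopped, leaving open the choice noted above between that dose and the previous one.

```python
import random


def three_plus_three(tox_probs, rng):
    """Simulate one run of the cohort-of-three escalation rule.

    tox_probs: assumed true probability of unacceptable toxicity at each
               predefined dose level, lowest dose first (hypothetical inputs).
    rng:       a random.Random instance, so runs are reproducible.
    Returns the index of the dose at which escalation stopped, or None if
    every dose level was passed without triggering the stopping rule.
    """
    for level, p in enumerate(tox_probs):
        # First cohort of three participants at this dose.
        first = sum(rng.random() < p for _ in range(3))
        if first == 0:
            continue  # no toxicity observed: move to the next higher dose
        # Toxicity seen: treat three additional participants at the same dose.
        second = sum(rng.random() < p for _ in range(3))
        if second == 0:
            continue  # no further toxicity: escalate
        return level  # additional toxicity: escalation is terminated
    return None


def stopping_distribution(tox_probs, n_runs=10000, seed=1):
    """Proportion of simulated runs that stop at each dose level."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_runs):
        level = three_plus_three(tox_probs, rng)
        counts[level] = counts.get(level, 0) + 1
    return {level: count / n_runs for level, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical toxicity probabilities for five dose levels.
    print(stopping_distribution([0.05, 0.10, 0.20, 0.35, 0.50]))
```

Running the simulation with different assumed toxicity curves gives a feel for how often the rule stops below or above the dose at which about one-third of participants would experience unacceptable toxicity.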
Some [32, 35–37] have
proposed more sophisticated designs in cancer research that specify
a sampling scheme for dose escalation and a statistical model for
the estimate of the maximally tolerated dose and its standard
error. The sampling scheme must be conservative in dose escalation
so as not to overshoot the maximally tolerated dose by very much,
but at the same time be efficient in the number of participants
studied. Many of the proposed schemes utilize a step-up/step-down
approach; the simplest being an extension of the previously
mentioned design to allow step-downs instead of termination after
unacceptable toxicity, with the possibility of subsequent step-ups.
Further increase or decrease in the dose level depends on whether
or not toxicity is observed at a given dose. Dose escalation stops
when the process seems to have converged around a particular dose
level. Once the data are generated, a dose response model is fit to
the data and estimates of the maximally tolerated dose can be
obtained as a function of the specified probability of a toxic
response [32].
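As one illustration of the last step, suppose the fitted dose response model is a two-parameter logistic curve. That choice of model is an assumption made here for the sketch; the text does not specify one. The maximally tolerated dose is then the dose at which the fitted curve reaches the specified probability of a toxic response. The parameter values below are hypothetical, not estimates from any study.

```python
import math


def tox_probability(dose, alpha, beta):
    """Toxicity probability at a dose under an assumed logistic model."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * dose)))


def mtd_from_logistic(alpha, beta, target_tox):
    """Dose at which the logistic curve reaches the target toxicity probability.

    alpha, beta: fitted intercept and slope (hypothetical values here);
                 beta must be positive so that toxicity rises with dose.
    target_tox:  specified probability of a toxic response, between 0 and 1.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    if not 0.0 < target_tox < 1.0:
        raise ValueError("target_tox must be strictly between 0 and 1")
    logit = math.log(target_tox / (1.0 - target_tox))
    return (logit - alpha) / beta


if __name__ == "__main__":
    # Hypothetical fitted parameters on an arbitrary dose scale.
    alpha, beta = -4.0, 1.0
    for target in (0.20, 1.0 / 3.0, 0.50):
        print(target, mtd_from_logistic(alpha, beta, target))
```

The estimate moves with the target: a lower acceptable toxicity probability gives a lower maximally tolerated dose from the same fitted curve.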
Bayesian approaches have also been
developed [38, 39]. These involve methods employing continual
reassessment [35, 40] and escalation with overdose control
[41]. Bayesian methods involve the
specification of the investigators’ prior opinions about the
agent’s dose-toxicity profile, which is then used to select
starting doses and escalation rules. The most common Bayesian
phase I design is called the continual reassessment method
[35], in which the starting dose is
set to the prior estimate of the maximally tolerated dose. After
the first cohort of participants (typically of size 1, 2, or 3,
though other numbers are possible), the estimate is updated and the
next participant(s) assigned to that estimate. The process is
repeated until a prespecified number of participants have been
assigned. The dose at which a hypothetical additional participant
would be assigned constitutes the final estimate of the maximally
tolerated dose. Bayesian methods that constrain the number of total
toxicities have also been developed (escalation with overdose
control) as have designs that allow for two or more treatments
[42] and methods that allow for
incomplete follow-up of long-term toxicities (time-to-event
continual reassessment method) [43]. Many variations have been proposed. An
advantage of Bayesian phase I designs is that they are very
flexible, allowing risk factors and other sources of information to
be incorporated into escalation decisions. A disadvantage is their
complexity, leading to unintuitive dose assignment rules.
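The updating cycle of the continual reassessment method can be sketched as follows. Everything specific in this sketch is an assumption made for illustration: the one-parameter power model, the normal prior and its standard deviation, the grid approximation to the posterior, the "skeleton" of prior toxicity guesses, and the function names. Practical implementations usually add safeguards, such as not skipping dose levels, which are omitted here.

```python
import math
import random


def crm_recommend(skeleton, target, doses_given, tox_seen, prior_sd=1.0):
    """Index of the dose whose posterior mean toxicity is closest to target.

    skeleton:    prior guesses of the toxicity probability at each dose.
    doses_given: dose index received by each participant so far.
    tox_seen:    True/False toxicity outcome for each participant so far.
    Assumed model: P(toxicity at dose i) = skeleton[i] ** exp(a),
    with a ~ Normal(0, prior_sd), integrated over a fixed grid.
    """
    grid = [-4.0 + 8.0 * k / 400 for k in range(401)]
    weights = []
    for a in grid:
        log_w = -0.5 * (a / prior_sd) ** 2  # log prior, up to a constant
        for d, y in zip(doses_given, tox_seen):
            p = skeleton[d] ** math.exp(a)
            log_w += math.log(p) if y else math.log(1.0 - p)
        weights.append(math.exp(log_w))
    total = sum(weights)
    post_tox = [
        sum(w * s ** math.exp(a) for a, w in zip(grid, weights)) / total
        for s in skeleton
    ]
    return min(range(len(skeleton)), key=lambda i: abs(post_tox[i] - target))


def run_crm(true_tox, skeleton, target=0.25, n_participants=24, cohort=1, seed=0):
    """Simulate one trial; returns the final estimate of the MTD (dose index)."""
    rng = random.Random(seed)
    doses_given, tox_seen = [], []
    # Starting dose: the prior estimate of the maximally tolerated dose.
    dose = crm_recommend(skeleton, target, doses_given, tox_seen)
    while len(doses_given) < n_participants:
        for _ in range(cohort):
            doses_given.append(dose)
            tox_seen.append(rng.random() < true_tox[dose])
        # Update the estimate; the next cohort is assigned to it.
        dose = crm_recommend(skeleton, target, doses_given, tox_seen)
    # Dose a hypothetical additional participant would receive.
    return dose


if __name__ == "__main__":
    skeleton = [0.05, 0.12, 0.25, 0.40, 0.55]   # hypothetical prior guesses
    true_tox = [0.02, 0.06, 0.12, 0.25, 0.45]   # hypothetical true rates
    print(run_crm(true_tox, skeleton))
```

Because every assignment depends on the whole posterior rather than on a simple count, the reason for a particular dose choice is harder to read off than in the cohort-of-three rule, which is the complexity referred to above.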
Phase II Studies
Once a dose or range of doses is
determined, the next goal is to evaluate whether the drug has any
biological activity or effect. The comparison may consist of a
concurrent control group, historical controls, or pre-treatment
status versus post-treatment status. Because of uncertainty with
regard to dose-response, phase II studies may also employ several
doses, with perhaps four or five intervention arms. They will look,
for example, at the relationship between blood level and activity.
Genetic testing is common, particularly when there is evidence of
variation in rate of drug metabolism. Participants in phase II
studies are usually carefully selected, with narrow inclusion
criteria [26].
Although sometimes phase II studies
are used for regulatory agency approval of a product, generally
phase II studies are performed to make a decision as to whether to
further develop a new drug or device. As such, the purpose is to
refine an estimate of the probability of success in phase III.
Success depends on a variety of factors, including estimated
beneficial and adverse effects, feasibility, and event rates of the
target population. Because phase II trials by definition do not
have adequate power to define the effect on major clinical
outcomes, the estimate of treatment effect and harm may depend on
multiple inputs, including effects on biomarkers, on more common
but less definitive clinical outcomes (like unstable angina rather
than myocardial infarction) and on more minor safety signals (like
minor bleeding or modest elevation in liver function tests).
The phase II design depends on the
quality and adequacy of the phase I study. The results of the phase
II study will, in turn, be used to design the phase III trial. The
statistical literature for phase II studies, which had been rather
limited [46–52], has expanded [53, 54] and, as
with phase I studies, includes Bayesian methods [55, 56].
One of the traditional phase II
designs in cancer is based on the work of Gehan [46], which is a version of a two stage design.
In the first stage, the investigator attempts to rule out drugs
which have no or little biologic activity. For example, he may
specify that a drug must have some minimal level of activity, say,
in 20% of patients. If the estimated activity level is less than
20%, he chooses not to consider this drug further, at least not at
that maximally tolerated dose. If the estimated activity level
exceeds 20%, he will add more participants to get a better estimate
of the response rate. A typical study for ruling out a 20% or lower
response rate enters 14 participants. If no response is observed in
the first 14 participants, the drug is considered not likely to
have a 20% or higher activity level. The number of patients added
depends on the degree of precision desired, but ranges from 10 to
20. Thus, a typical cancer phase II study might include fewer than
30 people to estimate the response rate. As is discussed in Chap.
8, the precision of the estimated
response rate is important in the design of the controlled trial.
In general, phase II studies are smaller than they ought to be.
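The figure of 14 participants follows from a short calculation: if the true response rate were 20%, the chance of seeing no response in n independent participants is 0.8 to the power n. The sketch below finds the smallest n for which that chance falls to a chosen error level. The 5% level is an assumption made here, since the passage above does not state the threshold, and the function names are hypothetical. A second helper shows how the standard error of the estimated response rate shrinks as participants are added.

```python
import math


def prob_no_responses(n, response_rate):
    """Chance of zero responses among n participants at a given true rate."""
    return (1.0 - response_rate) ** n


def first_stage_size(response_rate, max_error=0.05):
    """Smallest n for which zero responses would occur with probability
    at most max_error if the true response rate were response_rate."""
    n = 1
    while prob_no_responses(n, response_rate) > max_error:
        n += 1
    return n


def response_rate_se(response_rate, n):
    """Binomial standard error of an estimated response rate."""
    return math.sqrt(response_rate * (1.0 - response_rate) / n)


if __name__ == "__main__":
    n1 = first_stage_size(0.20)
    print(n1, prob_no_responses(n1, 0.20))
    # Precision after adding 10 or 20 participants to the first stage.
    for added in (10, 20):
        print(n1 + added, response_rate_se(0.20, n1 + added))
```

With 13 participants the chance of no response at a true rate of 20% is still above 5%; with 14 it is about 4.4%, which is why a run of 14 non-responders is taken as evidence against a 20% or higher activity level.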
Some [32, 47,
57] have proposed designs which
have more stages or a sequential aspect. Others [50, 58] have
considered hybrids of phase II and phase III designs in order to
enhance efficiency. While these designs have desirable statistical
properties, the most vulnerable aspect of phase II, as well as
phase I studies, is the type of person enrolled. Usually, phase II
studies have more exclusion criteria than phase III comparative
trials. Furthermore, the outcome in the phase II study (e.g., tumor
response) may be different than that used in the definitive
comparative trial (e.g., survival). Refinements may include time to
failure [54] and unequal numbers
of participants in the various stages of the phase II study
[59]. Bayesian designs for phase
II studies require prior estimates, as was the case for phase I
studies, but differ in that they are priors of efficacy measures
for the dose or doses to be investigated rather than of toxicity
rates. Priors are useful for incorporating historical data into the
design and analysis of phase II trials. Methods are available for
continuous [60], bivariate
[60], and survival outcomes
[61]. These methods can account
not only for random variations in participant responses within
institutions but also for systematic differences in outcomes
between institutions in multicenter trials or when several control
groups are combined. They also acknowledge the fact that historical
efficacy measures of the control are estimated with error. This
induces larger sample sizes than in trials which assume efficacy of
the control to be known, but with correspondingly greater
resistance to false positive and false negative errors. Bayesian
methods can also be used in a decision-theoretic fashion to
minimize a prespecified combination of these errors for a given
sample size [62, 63].
Although not generally considered
phase II studies, some pilot (or feasibility or vanguard) studies
may serve similar functions. Particularly for studies of
non-pharmacologic interventions, these pilot studies can uncover
possible problems in implementing and assessing an intervention.
Here, we distinguish pilot studies conducted for this purpose from
those done to see if a design for a later phase trial is feasible.
For example, can participant screening and enrollment and
maintenance of adherence be successfully implemented?
Phase III/IV Trials
The phase III and phase IV trials are
the clinical trials defined earlier in the chapter. They are
generally designed to assess the effectiveness of new interventions
or existing interventions with new indications and, thereby, their
value in clinical practice. They also examine adverse effects, but,
as described below and in Chap. 12, assessment of harm in clinical
trials has limitations. The focus of most of this book is on these
late phase trials. However, many design assumptions depend on
information obtained from phase I and phase II studies, or some
combination of early phase studies.
Phase III trials of chronic conditions
or diseases often have a short follow-up period for evaluation,
relative to the period of time the intervention might be used in
practice. In addition, they focus on efficacy or effectiveness, but
knowledge of safety is also necessary to evaluate fully the proper
role of an intervention in clinical practice. A procedure or device
may fail after a few years and have adverse sequelae for the
patient. In 2014, the FDA warned that morcellation to treat uterine
fibroids by laparoscopic means, a procedure that had been used for
years, could lead to spreading of unsuspected uterine sarcoma
[64]. Thus, long-term surveillance
of an intervention believed to be effective in phase III trials is
often necessary. Such long-term studies or studies conducted after
regulatory agency approval of the drug or device are referred to as
phase IV trials. Drugs may be approved on the basis of intermediate
or surrogate outcomes or biomarkers, such as blood pressure or
cholesterol lowering. They may also be approved after relatively
short term studies (weeks or months), even though in practice, in
the case of chronic conditions, they may be taken for years or even
decades. Even late phase clinical trials are limited in size to
several hundred or thousand (at most, a few tens of thousands) of
participants. Yet the approved drugs or devices will possibly be
used by millions of people. This combination of incomplete
information about clinical outcomes, relatively short duration, and
limited size means that sometimes the balance between benefit and
harm becomes clear only when larger phase IV studies are done, or
when there is greater clinical experience. One example is some of
the cyclooxygenase 2 (COX 2) inhibitors, which had been approved
for arthritis pain but whose cardiovascular problems became apparent
only after larger trials were done. These larger trials were examining
the effects of the COX 2 inhibitors on prevention of colon cancer
in those with polyps [65,
66]. Similarly, only after they
had been on the market were thiazolidinediones, a class of drugs
used for diabetes, found to be associated with an increase in heart
failure [67].
Regulatory agency approval of drugs,
devices, and biologics may differ because, at least in the United
States, the regulations for these different kinds of interventions
are based on different laws. For example, FDA approval of drugs
depends greatly on at least one well-designed clinical trial plus
supporting evidence (often, another clinical trial). Approval of
devices relies less on clinical trial data and more on engineering
characteristics of the device, including similarity with previously
approved devices. (For further discussion of regulatory issues, see
Chap. 22.) Devices, however, are often
implanted, and unless explanted, may be present for the life of the
participant. Therefore, there are urgent needs for truly long-term
data on performance of devices in vivo. Assessment of devices also
depends, more so than drugs, on the skill of the person performing
the implantation. As a result, the results obtained in a clinical
trial, which typically uses primarily well-trained investigators,
may not provide an accurate balance of harm and benefit in general
The same caution applies to clinical
trials of procedures of other sorts, whether surgical or lifestyle
intervention, where only highly skilled practitioners are
investigators. But unlike devices, procedures may have little or no
regulatory oversight, although those paying for care often consider
the evidence.
Why Are Clinical Trials Needed?
Well-designed and sufficiently large
randomized clinical trials are the best method to establish which
interventions are effective and generally safe and thereby improve
public health. Unfortunately, a minority of recommendations in
clinical practice guidelines are based on evidence from randomized
trials, the type of evidence needed to have confidence in the
results [68]. Thus, although
trials provide the essential foundation of evidence, they do not
exist for many commonly used therapies and preventive measures.
Improving the capacity, quality and relevance of clinical trials is
a major public health priority.
Much has been written about the advent
of individualized medicine, where an intervention (usually a drug
or biologic) is used specifically in a person for whom it was
designed or who has a specific genetic marker. We may someday reach
the point where that is possible for many conditions and therapies.
But we are not there yet. With rare exceptions, the best we can
generally do is to decide to use or not use a treatment that has
been evaluated in a clinical trial in a given population. Even when
we better understand the genetic components of a condition, the
interaction with the environment usually precludes full knowledge
of a disease’s patterns and course. Therefore, almost always, a
clinical trial is the most definitive method of determining whether
an intervention has the postulated effect. Even when a drug is
designed to be used in people with selected genetic markers,
clinical trials are still commonly conducted. An example is
trastuzumab, which is beneficial in women with HER2-positive
breast cancer [69–71]. Even here, treatment is only partly
successful and can have major adverse effects. Benefits of using
pharmacogenetics in the decisions to achieve optimum dosing of
warfarin have been claimed in some studies, but not in others
[72–75]. Given the uncertain knowledge about disease
course and the usual large variations in biological measures, it is
often difficult to say on the basis of uncontrolled clinical
observation whether a new treatment has made a difference to
outcome, and if it has, what the magnitude is. A clinical trial
offers the possibility of such judgment because there exists a
control group which, ideally, is comparable to the intervention
group in every way except for the intervention being studied.
The consequences of not conducting
appropriate clinical trials at the proper time can be both serious
and costly. An example was the uncertainty as to the efficacy and
safety of digitalis in congestive heart failure. Only in the 1990s,
after the drug had been used for over 200 years, was a large
clinical trial evaluating the effect of digitalis on mortality
mounted [76]. Intermittent
positive pressure breathing became an established therapy for
chronic obstructive pulmonary disease without good evidence of
benefits. One trial suggested no major benefit from this very
expensive procedure [77].
Similarly, high concentration of oxygen was used for therapy in
premature infants until a clinical trial demonstrated that it could
cause blindness [78].
A clinical trial can determine the
incidence of adverse effects or complications of the intervention.
Few interventions, if any, are entirely free of undesirable
effects. However, drug toxicity might go unnoticed without the
systematic follow-up measurements obtained in a clinical trial of
sufficient size. The Cardiac Arrhythmia Suppression Trial
documented that commonly used anti-arrhythmic drugs were harmful in
patients who had a history of myocardial infarction, and raised
questions about routine use of an entire class of anti-arrhythmic
agents [79]. Corticosteroids had
been commonly used to treat people with traumatic brain injury.
Small clinical trials were inconclusive, and a meta-analysis of 16
trials showed no difference in mortality between corticosteroids
and control [80]. Because of the
uncertainty as to benefit, a large clinical trial was conducted.
This trial, with far more participants than the others combined,
demonstrated a significant 18% relative increase in mortality at 14
days [81] and a 15% increase at 6
months in the corticosteroid group [82]. As a result, an update of the meta-analysis
recommended against the routine use of corticosteroids in people
with head injury [83]. Niacin was
widely believed to be a safe and effective treatment to improve
lipid parameters and reduce coronary heart disease events for
patients at risk [84,
85]. The Atherothrombosis
Intervention in Metabolic Syndrome with Low HDL/High Triglycerides:
Impact on Global Health Outcomes (AIM-HIGH) trial failed to show
added benefit from long-acting niacin in 3,414 participants with
cardiovascular disease receiving statin therapy [86]. A concern with that trial was that it might
have been underpowered. The Heart Protection Study 2-Treatment of
HDL to Reduce the Incidence of Vascular Events (HPS2-THRIVE)
[87] was designed to provide
definitive information regarding the clinical effects of a
combination formulation of niacin and laropiprant, an agent to
prevent flushing side effects, on top of simvastatin. That trial of
25,673 participants also showed no reduction in the primary outcome
of vascular events, but increases in serious adverse
gastrointestinal events, infection, and onset and poor control of
In the final evaluation, an
investigator must compare the benefit of an intervention with its
other, often unwanted effects in order to decide whether, and under
what circumstances, its use should be recommended. The financial
implications of an intervention, particularly if there is limited
benefit, must also be considered. Several studies have indicated
that drug-eluting stents have somewhat less restenosis than bare-metal
stents in percutaneous coronary intervention [88, 89]. The
cost difference, however, can be considerable, especially since
more than one stent is typically inserted. The Comparison of
Age-Related Macular Degeneration Treatments Trials (CATT) showed
that ranibizumab and bevacizumab were similarly effective at the
1-year point with respect to visual acuity in people with
age-related macular degeneration [90]. Bevacizumab appeared to have more
serious adverse effects, but cost one-fortieth as much as
ranibizumab. Whether the difference in the adverse events is real
is uncertain, as another trial of the same agents in the same
population did not show it [91].
In both examples, are the added benefits or possibly fewer adverse
events, which may be defined and measured in different ways, of the
more expensive interventions worth the extra cost? Such assessments
are not statistical in nature. They must rely on the judgment of
the investigator and the medical practitioner as well as on those
who pay for medical care. Clinical trials rarely fully assess costs
of the interventions and associated patient care, which change over
time, and cannot replace clinical judgment; they can only provide
data so that decisions are evidence-based.
People suffering from or being treated
for life-threatening diseases for which there are no known
effective therapies and those caring for them often argue that
controlled clinical trials are not needed and that they have a
right to experimental interventions. Because there may be little
hope of cure or even improvement, patients and their physicians
want to have access to new interventions, even if those
interventions have not been shown to be safe and effective by means
of the usual clinical trial. They want to be in studies of these
interventions, with the expectation that they will receive the new
treatment, rather than the control (if there is a control group).
Those with the acquired immunodeficiency syndrome (AIDS) used to
make the case forcefully that traditional clinical trials are not
the sole legitimate way of determining whether interventions are
useful [92–95]. This is undeniably true, and clinical trial
researchers need to be willing to modify, when necessary, aspects
of study design or management. Many have been vocal in their
demands that once a drug or biologic has undergone some minimal
investigation, it should be available to those with
life-threatening conditions, should they desire it, even without
late phase clinical trial evidence [96]. If the patient community is unwilling to
participate in clinical trials conducted along traditional lines,
or in ways that are scientifically “pure,” trials are not feasible
and no information will be forthcoming. When the situation involves
a rare, life-threatening genetic disorder in children, what level
of evidence is needed for patients and their families, clinicians,
and regulatory authorities to approve use of new agents? When
should accelerated or “fast track” approval occur? Should there be
interim approval based on less rigid criteria, with use restricted
to specific cases and situations? When should post-approval trials
be required? The U.S. FDA approved bedaquiline for drug-resistant
tuberculosis on the basis of a randomized trial of 160 patients
with time to culture conversion as the primary outcome, even though
the study was too small to reliably detect clinical outcomes
[97, 98]. This was done because of the urgent need
for new drugs and with the requirement that a “confirmatory trial”
would be conducted. Investigators need to involve the relevant
communities or populations at risk, even though this could lead to
some compromises in design and scientific purity. Investigators
need to decide when such compromises so invalidate the results that
the study is not worth conducting. It should be noted that the
rapidity with which trial results are demanded, the extent of
community involvement, and the consequent effect on study design,
can change as knowledge of the disease increases, as at least
partially effective therapy becomes available, and as understanding
of the need for valid research designs, including clinical trials,
develops. This happened to a great extent with AIDS trials.
Although investigators should design
clinical trials using the fundamentals discussed in this book, they
must consider the context in which the trial is being conducted.
The nature of the disease or condition being studied and the
population and setting in which it is being done will influence the
outcomes that are assessed, the kind of control, the size, the
duration, and many other factors.
Clinical trials are conducted because
it is expected that they will influence practice and therefore
improve health [99–104]. Traditionally, there has been
considerable delay in adoption of evidence from trials, depending
on the direction of the results, strength of the findings, methods
of dissemination of results, and other evidence. There is indirect
evidence, though, that the results of clinical trials can affect
practice, which in turn may improve health outcomes. Ford et al.
[105] estimated that about half
of the reduction in death from coronary artery disease in the
United States between 1980 and 2000 was due to better control of
risk factors. The other half of the reduction was due to improved
treatments, most of which were based on clinical trial results. A
specific example of change in practice based on evidence from
trials and improved survival comes from a national registry in
Sweden during 1996–2007. Increased use of reperfusion therapy,
revascularization, and medications such as aspirin, beta blockers,
clopidogrel, and statins in treatment of ST segment elevation
myocardial infarction was associated with a 50% decrease in
mortality over this relatively short period [106]. In the United States, a registry that
included 350 hospitals from 2001 to 2003 showed 11% lower
in-hospital mortality for each 10% improvement in hospital-level
adherence to guideline-based treatment, with most of those
treatment recommendations based on clinical trial results.
There is no such thing as a perfect
study. However, a well thought-out, well-designed, appropriately
conducted and analyzed clinical trial is an effective tool. While
even well designed clinical trials are not infallible, they
generally provide a sounder rationale for intervention than is
obtainable by other research methods. On the other hand, poorly
designed, conducted, and reported trials can be misleading. Also,
without supporting evidence, no single study ought to be
definitive. When interpreting the results of a trial, consistency
with data from laboratory, animal, epidemiological, and other
clinical research must be considered.
Some have claimed that observational
studies provide the “correct” answer more often than not and that
therefore clinical trials are often superfluous [108, 109].
Others have pointed out that sometimes, results of observational
studies and clinical trials are inconsistent. Observational
studies, many of them large, suggested that use of antioxidants
would reduce the risk of cancer and heart disease. These agents
began to be widely used as a result. Later, large randomized
controlled trials evaluating many of the antioxidants demonstrated
no benefit or even harm [110].
Similarly, because of the results from observational studies,
hormone therapy was advocated for post-menopausal women as a way to
prevent or reduce heart disease. Results of large clinical trials
[111–113] cast considerable doubt on the findings
from the observational studies. Whether the differences are due to
the inherent limitations of observational studies (see Chap.
5) or more specifically to the
“healthy user bias” has been debated, but these and numerous other
examples [114] support the belief
that observational studies are unreliable in determining modest
intervention effects.
We believe that pitting one kind of
clinical research against another is inappropriate. Both
observational epidemiology studies, including registries, and
clinical trials have their strengths and weaknesses; both have
their place [115]. Proper
understanding of the strengths and weaknesses of clinical trials,
and how the results of well-designed and conducted trials can be
used in conjunction with other research methodologies, is by far
the best way of improving public health and scientific
Problems in the Timing of a Trial
Once drugs and procedures of unproved
clinical benefit have become part of general medical practice,
performing an adequate clinical trial becomes difficult ethically
and logistically. Some people advocate instituting clinical trials
as early as possible in the evaluation of new therapies
[116, 117]. The trials, however, must be feasible.
Assessing feasibility takes into account several factors. Before
conducting a trial, an investigator needs to have the necessary
knowledge and tools. The investigator must know something about the expected
adverse effects of the intervention and what outcomes to assess and
have the techniques to do so. Well-run clinical trials of adequate
magnitude are costly, and therefore almost always require sponsors
willing to pay for them, and should be done only when preliminary
evidence of the efficacy and harm of an intervention looks
promising enough to warrant the effort and expense involved.
Another aspect of timing is
consideration of the relative stability of the intervention. If
active research is likely to make the intended intervention
outmoded in a short time, studying such an intervention may be
inappropriate. This is particularly true in long-term clinical
trials, or studies that take many months to develop. One of the
criticisms of trials of surgical interventions has been that
surgical methods are constantly being improved. Evaluating an
operative technique of several years past, when a study was
initiated, may not reflect the current status of surgery.
These issues were raised years ago in
connection with the Veterans Administration study of coronary
artery bypass surgery [121]. The
trial showed that surgery was beneficial in subgroups of patients
with left main coronary artery disease and three vessel disease,
but not overall [121–123].
Critics of the trial argued that when the trial was started, the
surgical techniques were still evolving. Therefore, surgical
mortality in the study did not reflect what occurred in actual
practice at the end of the long-term trial. In addition, there were
wide differences in surgical mortality between the cooperating
clinics [124] that may have been
related to the experience of the surgeons. Defenders of the study
maintained that the surgical mortality in the Veterans
Administration hospitals was not very different from the national
experience at the time [125]. In
the Coronary Artery Surgery Study [126] surgical mortality was lower than in the
Veterans Administration trial, suggesting better technique. The
control group mortality, however, was also lower. Despite
continually evolving technology, including the development of
drug-eluting stents, many trials of coronary stents have been
successfully undertaken [127,
128]. The changes in stent design
and use of medications to limit stent thrombosis have been
incorporated into each new trial.
Review articles show that surgical
trials have been successfully undertaken [129, 130]
and, despite challenges, can and should be conducted
[131, 132]. While the best approach might be to
postpone a trial until a procedure has reached the point where
it is unlikely to change greatly, at least in the near term, such a
postponement will probably mean waiting until the procedure has
been widely accepted as efficacious for some indication, thus
making it difficult, if not impossible, to conduct the trial.
However, as noted by Chalmers and Sacks [133], allowing for improvements in operative
techniques in a clinical trial is possible. As in all aspects of
conducting a clinical trial, judgment must be used in determining
the proper time to evaluate an intervention.
Study Protocol
Every well-designed clinical trial
requires a protocol. The study protocol can be viewed as a written
agreement between the investigator, the participant, and the
scientific community. The contents provide the background, specify
the objectives, and describe the design and organization of the
trial. Not every detail explaining how the trial is carried out
needs to be included, provided that a comprehensive manual of
procedures contains such information. The protocol serves as a
document to assist communication among those working in the trial.
It should also be made available to others upon request. Many
protocols are now being published in on-line journals.
The protocol should be developed
before the beginning of participant enrollment and should remain
essentially unchanged except perhaps for minor updates. Careful
thought and justification should go into any changes. Major
revisions which alter the direction of the trial should be rare. If
they occur, the rationale behind such changes and the process by
which they are made need to be clearly described. An example is the
Cardiac Arrhythmia Suppression Trial, which, on the basis of
important study findings, changed intervention, participant
eligibility criteria, and sample size [134].
Numerous registries of clinical trials
now exist. The WHO International Clinical Trials Registry Platform
(ICTRP) [135] lists the registries that are
acceptable to the International Committee of Medical Journal
Editors, including ClinicalTrials.gov [136], one of the original
registries. Registration of all late phase trials and many early phase
studies is now advocated, and indeed required by many journals and
sponsors. Journals will not publish results of trials or study
design papers unless the study has been registered at one of the
many sites. The U.S. National Institutes of Health requires that
trials that it funds be registered [137], as does the Food and Drug Administration
for trials it oversees [138]. The
registry sites have, at a minimum, information about the study
population, intervention and control, response variables, and other
key elements of the study design. Reasons for registering trials
include reducing the likelihood that trial results are not
published or otherwise made known, providing a way to compare the
study design as initially described with what was published, and
allowing other researchers to determine what else is happening in
their area of interest. From the ClinicalTrials.gov registry, we
know that the majority (62%) of registered trials enroll 100 or
fewer participants, the majority of trials (66%) are single center,
and there is substantial variability in use of randomization,
blinding, and use of monitoring committees [139]. We applaud the practice of registration,
and encourage all investigators to go further by including links to
their protocols at the registry sites. See Chap. 22 for a further discussion of trial
Guidance for developing a clinical
trial protocol has been published as the Standard Protocol Items:
Recommendations for Interventional Trials (SPIRIT 2013 Statement)
[25]. Topic headings of a typical
protocol which also serve as an outline of the subsequent chapters
in this book are given below:
A. Background of the study
B. Objectives
   1. Primary question and response variable
   2. Secondary questions and response variables
   3. Subgroup hypotheses
   4. Adverse effects
C. Design of the study
   1. Study population
      (a) Inclusion criteria
      (b) Exclusion criteria
   2. Sample size assumptions and estimates
   3. Enrollment of participants
      (a) Informed consent
      (b) Assessment of eligibility
      (c) Baseline examination
      (d) Intervention allocation (e.g., randomization method)
   4. Intervention
      (a) Description and schedule
      (b) Measures of compliance
   5. Follow-up visit description and schedule
   6. Ascertainment of response variables
      (a)
      (b) Data collection
      (c) Quality control
   7. Assessment of Adverse Events
      (a) Type and frequency
      (b)
      (c)
   8. Data analysis
      (a) Interim monitoring, including data monitoring committee role
      (b) Final analysis
   9. Termination policy
D. Organization
   1. Participating investigators
      (a) Statistical unit or data coordinating center
      (b) Laboratories and other special units
      (c) Clinical center(s)
   2. Study administration
      (a) Steering committees and subcommittees
      (b) Monitoring committee
      (c) Funding organization

Definitions of eligibility criteria
Definitions of response variables
Informed Consent Form
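
The outline item on intervention allocation (e.g., randomization method) can be made concrete with a small example. What follows is a minimal sketch, in Python, of one possible approach, permuted-block randomization, in which assignments are generated in small blocks that each contain equal numbers of intervention and control assignments in random order. The function name, the block size of 4, the arm labels, and the seed are all illustrative assumptions; they are not taken from any trial described in this chapter, and an actual protocol would specify and justify its own method.

```python
import random


def permuted_block_allocation(n_participants, block_size=4,
                              arms=("intervention", "control"), seed=None):
    """Return a list of arm assignments built from randomly permuted blocks.

    Hypothetical helper for illustration only. Each block holds an equal
    number of assignments to every arm, so group sizes stay balanced
    after every completed block.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")

    rng = random.Random(seed)          # seeded so the schedule is reproducible
    per_arm = block_size // len(arms)  # assignments per arm within one block

    schedule = []
    while len(schedule) < n_participants:
        # Build one balanced block, then shuffle the order within it.
        block = [arm for arm in arms for _ in range(per_arm)]
        rng.shuffle(block)
        schedule.extend(block)

    # The final block may be only partly used.
    return schedule[:n_participants]


# Example: an allocation schedule for 12 participants in blocks of 4.
schedule = permuted_block_allocation(12, block_size=4, seed=2024)
for number, arm in enumerate(schedule, start=1):
    print(f"Participant {number:2d}: {arm}")
```

In this sketch the two groups are equal in size after every fourth participant. A fixed, small block size also means that the last assignment in a block can be deduced by anyone who knows the earlier ones, which is one reason the choice of allocation method calls for the same judgment as the other design decisions listed in the outline.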
