In clinical trials, baseline refers to
the status of a participant before the start of intervention.
Baseline data may be measured by interview, questionnaire, physical
examination, laboratory tests, and procedures. Measurement need not
be only numerical in nature. It can also mean classification of
study participants into categories based on factors such as absence
or presence of some trait or condition.
There are multiple uses of the baseline
data. First, there is a need to describe the trial participants so
the readers of the trial publications can determine to which
patient population the findings apply. Second, it is important
methodologically to present relevant baseline data by study group
in order to allow assessment of group comparability and answer the
question whether randomization generated balanced groups. Third,
baseline data are used in statistical analyses to control for any
baseline imbalances. Finally, baseline data form the basis for
subgroup analyses of the trial findings. This chapter is concerned
with these uses of the baseline data.
Fundamental Point
Relevant baseline data should be measured in
all study participants before the start of
intervention.
Uses of Baseline Data
Description of Trial Participants
Because the trial findings in a strict
sense apply only to the participants enrolled in the trial, it is
essential that they be described properly and as completely as
possible. This is done in the typical Table 1 of the results
publication. The description of baseline covariates provides
documentation that can guide cautious extrapolation of trial
findings to other populations with the same medical condition
[1]. A common limitation is that
the characteristics of the excluded participants with the study
condition are seldom reported. To the reader, it would also be
helpful to know what proportion of the study population was
excluded. In other words, what was the recruitment yield? Also of
interest would be a presentation of the reasons for exclusion.
Based on the published information, clinicians need to know to
which of their patients the findings directly apply. They also need
to know the characteristics of the excluded patients with the study
condition, so they can determine whether the trial findings can
reasonably be extrapolated.
The amount of data collected at
baseline depends on the nature of the trial and the purpose for
which the data will be used. As mentioned elsewhere, some trials
have simple protocols, which means that detailed documentation of
many baseline variables is omitted and only a few key demographic
and medical variables are ascertained. If such trials are large, it
is reasonable to expect that good balance between groups will be
achieved. Because the goals of these trials are restricted to
answering the primary question and one or two secondary questions,
the other uses for baseline data may not be necessary.
Baseline Comparability
Baseline data allow investigators and readers to evaluate
whether the study groups were comparable before intervention was
started. The assessment of comparability typically includes
pertinent demographic and socioeconomic characteristics, risk or
prognostic factors, medications, and medical history. This
assessment is necessary in both randomized and nonrandomized
trials. In assessing comparability in any trial, the
investigator can only examine relevant factors that she is aware of
and that were measured. Obviously, those which are unknown cannot
be compared. The baseline characteristics of each group should be
presented in the main results paper of every randomized trial.
Special attention should be given to factors that may influence any
benefit of the study intervention and those that may predict
adverse effects. Full attention to baseline comparability is not
always given. In a review of 206 surgical trials, only 73% reported
baseline data [2]. Moreover, more
than one quarter of those trials reported fewer than five baseline
factors. Altman and Doré, in a review of 80 published randomized
trials, noted considerable variation in the quality of the
reporting of baseline characteristics [3]. Half of those reporting continuous covariates
did not use appropriate measures of variability.
While randomization on average produces
balance between comparison groups, it does not guarantee balance in
every trial or for any specific baseline measure. Clearly,
imbalances are more common in smaller trials and they may raise
questions regarding the validity of the trial outcomes. For
example, a placebo-controlled, double-blind trial in 39
participants with mucopolysaccharidosis type VI reported that the
intervention significantly improved exercise endurance
[4]. However, at baseline the
12-min walk test, the primary outcome measure, showed the distance
walked to be 227 m in the intervention group and 381 m in
the placebo group, a substantial baseline imbalance. A double-blind
placebo-controlled trial in 341 participants with Alzheimer’s
disease evaluated three active treatments—vitamin E, a selective
monoamine oxidase inhibitor, and their combination [5]. At baseline, the Mini-Mental State
Examination (MMSE) score, a variable highly predictive of the
primary outcome, was significantly higher in the placebo group than
in the other two groups, indicating that the placebo patients were
at a lower risk. Imbalances may even exist in large studies. In the
Aspirin Myocardial Infarction Study [6], which had over 4,500 participants, the
aspirin group was at slightly higher overall risk than the placebo
group when prognostic baseline characteristics were examined.
Assessment of baseline comparability is
important in all randomized trials. In studies without
randomization, assessing baseline comparability is more problematic.
The known baseline factors may not always have been measured
accurately, and there are always other relevant factors that were
not measured or are not even known. For nonrandomized studies, in
contrast to randomized trials, one cannot assume balance in the
unmeasured covariates.
The investigator needs to look at
baseline variables in several ways. The simplest is to compare each
variable to make sure that it has a reasonably similar distribution
in each study group. Means, medians, and ranges are all convenient
measures. The investigator can also combine the variables, giving
each one an appropriate weight or coefficient, but doing this
presupposes a knowledge of the relative prognostic importance of
the variables. This kind of knowledge can come only from other
studies with very similar populations or by looking at the control
group after the present study is completed. The weighting technique
has the advantage that it can take into account numerous small
differences between groups. If imbalances between most of the
variables are in the same direction, the overall imbalance can turn
out to be large, even though differences in individual variables
are small.
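As a concrete illustration of the weighting technique, the following minimal sketch (ours, with hypothetical covariates, coefficients, and group shifts; real weights would come from a prior model such as the Coronary Drug Project model) combines three baseline variables into a single risk score per participant and compares the group means:

```python
# Sketch: combining baseline covariates into one weighted risk score.
# All variable names, distributions, and coefficients are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n = 150  # roughly one center's worth of participants

def draw_group(age_shift=0.0, sbp_shift=0.0, smoke_p=0.30):
    """Draw age (years), systolic BP (mmHg), and smoking status (0/1)."""
    return np.column_stack([
        rng.normal(60 + age_shift, 8, n),
        rng.normal(140 + sbp_shift, 15, n),
        rng.binomial(1, smoke_p, n),
    ])

control = draw_group()
# Small imbalances, each individually modest, all in the same direction:
intervention = draw_group(age_shift=1.0, sbp_shift=2.0, smoke_p=0.33)

# Hypothetical prognostic weights (e.g., coefficients from a prior model).
weights = np.array([0.04, 0.02, 0.50])

print(f"mean risk score, intervention: {(intervention @ weights).mean():.3f}")
print(f"mean risk score, control:      {(control @ weights).mean():.3f}")
# The combined score accumulates the per-variable imbalances, so the
# overall difference is easier to see than in any single covariate.
```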
In the 30-center Aspirin Myocardial
Infarction Study which involved over 4,500 participants, each
center can be thought of as a small study with about 150
participants [6]. When the baseline
comparability within each center was reviewed, substantial
differences in almost half the centers were found, some favoring
intervention and some control (Furberg, CD, unpublished data). The
difference between intervention and control groups in predicted
3-year mortality, using the Coronary Drug Project model, exceeded
20% in 5 of the 30 clinics. This analysis illustrates that fairly
large imbalances for known baseline factors can be common in
smaller studies and that they may influence the trial outcome. In
larger trials these study group differences balance out and the
unadjusted primary analyses are reliable. In secondary analyses,
adjustments can be made using regression methods with baseline
covariates. Another approach is to use propensity scores which
combine individual covariates (see Chap. 18).
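For the propensity score approach just mentioned, a minimal sketch on simulated data (hypothetical covariates; Chap. 18 covers the method itself): the score is each participant's fitted probability of assignment to intervention given baseline covariates, here estimated by logistic regression with statsmodels.

```python
# Sketch: a propensity score as one number summarizing many covariates.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 300
# Hypothetical baseline covariates: age and systolic blood pressure.
X = np.column_stack([rng.normal(60, 8, n), rng.normal(140, 15, n)])
treat = rng.binomial(1, 0.5, n)  # randomized assignment

# Model the probability of assignment to intervention given covariates.
fit = sm.Logit(treat, sm.add_constant(X)).fit(disp=False)
propensity = fit.predict(sm.add_constant(X))

# Under successful randomization the two distributions should be similar.
print(f"mean propensity, intervention: {propensity[treat == 1].mean():.3f}")
print(f"mean propensity, control:      {propensity[treat == 0].mean():.3f}")
```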
Identified imbalances do not
invalidate a randomized trial, but they may make interpretation of
results more complicated. In the North American Silver-Coated
Endotracheal Tube trial, a higher number of patients with chronic
obstructive pulmonary disease were randomized to the uncoated tube
group [7]. The accompanying
editorial [8] pointed to this
imbalance as one factor behind the lack of robustness of the
results, which indicated a reduction in the incidence of
ventilator-associated pneumonia. Chronic obstructive pulmonary
disease is a recognized risk factor for ventilator-associated
pneumonia.
It is important to know which baseline
factors may influence the trial outcomes and to determine whether
they were imbalanced and whether observed trends of imbalance
favored one group or the other. The critical baseline factors to
consider ought to be prespecified in the protocol. Reliance on
significance testing as a measure of baseline equivalence is common
[2]. Due to the often large number
of statistical tests, the challenge is to understand the meaning
and importance of observed differences whether or not they are
statistically significant. A nonsignificant baseline group
difference in the history of hemorrhagic stroke could still affect
the treatment outcome in thrombolytic trials [9].
Formal statistical testing of baseline
imbalances was common in the past. However, the consensus has
changed and the position today is that such testing should be
avoided [10–12]. When comparing baseline factors, groups can
never be shown to be identical. Only absence of “significant”
differences can be demonstrated. A review of 80 trials published in
four leading journals showed that hypothesis tests of baseline
comparability were conducted in 46 of these trials. Of a total of
600 such tests, only 24 (4%) were significant at the 5% level
[3], close to the 30 (5% of 600) that would be expected by chance
alone.
We agree that testing of baseline
imbalances should not be conducted. However, in the Results section
of the main article, we recommend that in addition to a description
of the study population, special attention should be paid to those
characteristics that are prognostically important.
Controlling for Imbalances in the Analysis
If there is concern that one or two
key prognostic factors may not “balance out” during randomization,
thus yielding imbalanced groups at baseline, the investigator may
conduct a covariate adjustment on the basis of these factors. In
unadjusted analyses in the Alzheimer trial discussed above, there
were no outcome differences among the groups. After adjustment for
the baseline difference in MMSE, all actively treated groups did
better than placebo by slowing the progression of disease. Chapter
18 reviews the advantages and
disadvantages of covariate adjustment. The point here is that, in
order to make such adjustments, the relevant characteristics of the
participants at baseline must be known and measured. A survey of 50
randomized trials showed that most trials (38 of 50) emphasized the
unadjusted comparisons [13], and
28 presented covariate-adjusted results as a backup. Of the
remaining 12 trials, 6 gave no unadjusted results. We recommend
presentation of both unadjusted (primary) and adjusted (secondary)
results.
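The following minimal sketch shows, on simulated data loosely patterned after the MMSE example above (all numbers hypothetical), why the unadjusted and covariate-adjusted estimates can differ when a strong baseline predictor is imbalanced:

```python
# Sketch: unadjusted vs. baseline-adjusted treatment effect estimates.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 170  # per group
treat = np.repeat([1, 0], n)
# Baseline score imbalanced at baseline: placebo group starts higher.
baseline = np.concatenate([rng.normal(12.0, 4, n),   # intervention
                           rng.normal(13.5, 4, n)])  # placebo
# Outcome depends strongly on baseline; the true treatment effect is +1.0.
outcome = 0.8 * baseline + 1.0 * treat + rng.normal(0, 3, 2 * n)
df = pd.DataFrame({"outcome": outcome, "treat": treat, "baseline": baseline})

unadjusted = smf.ols("outcome ~ treat", data=df).fit()
adjusted = smf.ols("outcome ~ treat + baseline", data=df).fit()
print(f"unadjusted estimate: {unadjusted.params['treat']:.2f}")  # biased down
print(f"adjusted estimate:   {adjusted.params['treat']:.2f}")    # near 1.0
# The baseline imbalance masks the effect in the unadjusted analysis;
# adjustment recovers an estimate close to the true value.
```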
Subgrouping
Often, investigators are interested
not only in the response to intervention in the total study group,
but also in the response in one or more subgroups. Particularly in
studies in which an overall intervention effect is present,
analysis of results by appropriate subgroup may help to identify
the specific population most likely to benefit from, or be harmed
by, the intervention. Subgrouping may also help to elucidate the
mechanism of action of the intervention. Definition of such
subgroups should rely only on baseline data, not data measured
after initiation of intervention (except for factors such as age or
gender which cannot be altered by the intervention). An example of
a potential problem with establishing subgroups post hoc is the
Canadian Cooperative Study Group trial of aspirin and
sulfinpyrazone in people with cerebral or retinal ischemic attacks
[14]. After noting an overall
benefit from aspirin in reducing continued ischemic attacks or
stroke, the authors observed and reported that the benefit was
restricted to men. In approving aspirin for the indication of
transient ischemic attacks in men, the U.S. Food and Drug
Administration relied on the Canadian Cooperative Study Group. A
subsequent meta-analysis of platelet-active drug trials in the
secondary prevention of cardiovascular disease concluded that the
effect is similar in men and women [15]. However, a later placebo-controlled primary
prevention trial of low-dose aspirin (100 mg on alternate
days) in women reported a favorable aspirin effect on the risk of
stroke but no overall reduction in the risks of myocardial
infarction and cardiovascular death, with perhaps some benefit in
those over 65 years of age [16]. Thus, this example
illustrates that any conclusions drawn from subgroup hypotheses not
explicitly stated in the protocol should be given less credibility
than those from hypotheses stated a priori. Retrospective subgroup
analysis should serve primarily to generate new hypotheses for
subsequent testing (Chap. 18).
One of the large active-control trials
of rosiglitazone in people with type 2 diabetes reported a
surprising increase in the risk of fractures compared to metformin
or glibenclamide, a risk that was, however, limited to women
[17]. In this case, the post hoc
observation was replicated in a subsequent trial of pioglitazone
which showed a similar gender-specific increase compared to placebo
[18]. Additionally, a
meta-analysis confirmed that this class of hypoglycemic agents
doubles the risk of fractures in women without any documented
increase in men [19]. Confirmation
of initial results is important in science.
In their review of 50 clinical trial
reports from four major medical journals, Assmann et al.
[13] noted a large variability in
the presentation of subgroup findings. Thirty-five reported
subgroup analyses. Seventeen of these limited the number of
baseline factors to one; five included seven or more factors.
Eighteen of the 35 subjected more than one outcome to subgroup
analyses; six reported on six or more outcomes. More than half of
the subgroup reports did not use statistical tests for interaction.
Such tests are critical, since they directly determine whether an
observed treatment difference in an outcome depends on the
participant’s subgroup. Additionally, it was often difficult to
determine whether the subgroup analyses were prespecified or post
hoc.
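A minimal sketch of such an interaction test on simulated data (the subgroup variable and effect sizes are hypothetical): the product term treat:female, not separate within-subgroup p-values, is what addresses whether the treatment effect differs by subgroup.

```python
# Sketch: a formal test for treatment-by-subgroup interaction.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 400
treat = rng.binomial(1, 0.5, n)
female = rng.binomial(1, 0.5, n)        # subgroup defined at baseline
y = 1.0 * treat + rng.normal(0, 3, n)   # same true effect in both subgroups

df = pd.DataFrame({"y": y, "treat": treat, "female": female})
fit = smf.ols("y ~ treat * female", data=df).fit()
print(f"interaction p-value: {fit.pvalues['treat:female']:.2f}")
# A nonsignificant interaction is consistent with a common effect, even
# when within-subgroup tests happen to differ in nominal significance.
```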
In a similar survey conducted in 72
randomized surgical trials, 54 subgroup analyses were conducted in
27 of the trials [20]. The
majority of these were post hoc. For 31 of the 54 subgroup
analyses, the investigators featured the subgroup findings in the
Summary and Conclusions of the publication.
A rapidly emerging field in medicine
is that of pharmacogenetics which holds promise for better
identification of people who may benefit more from a treatment or
who are more likely to develop serious adverse effects
[21]. Until quite recently the
focus was on a limited number of candidate genes due to the high
cost of genotyping, but as technologies have improved attention has
shifted to genome-wide association (GWA) studies of hundreds of
thousands or millions of single-nucleotide polymorphisms (SNPs)
[22]. This approach, and
cost-effective whole-genome sequencing technologies, allows
examination of the whole genome unconstrained by prior hypotheses
on genomic structure or function influencing a given trait
[23]. This has resulted in
discoveries of an enormous number of genotype-phenotype
relationships [24]. Collection of
biologic samples at baseline in large, long-term trials has emerged
as a rich source for such pharmacogenetic studies. However, the
analysis of these samples poses a statistical challenge due to the
very large number of variants tested, which is typically dealt with
by requiring very small p-values. Using a strict Bonferroni
correction, dividing the standard p < 0.05 by the million or more
genetic variants assayed yields a significance threshold of
5 × 10⁻⁸, which in turn requires very large sample sizes and
similarly large replication samples [25]. Interpretation of rare variants detected by
sequencing is even more challenging as the majority of such
variants are present in only one subject even in large studies. In
such cases functional information about the effect of the variant
on the gene and its product and other forms of experimental
evidence can be used to supplement the sequencing data
[26].
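The arithmetic behind the genome-wide threshold is a direct Bonferroni division, as this short sketch spells out:

```python
# Sketch: Bonferroni-corrected genome-wide significance threshold.
alpha = 0.05             # conventional significance level
n_variants = 1_000_000   # a million genetic variants assayed
print(alpha / n_variants)  # 5e-08, the threshold quoted above
```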
Genetic determinants of beneficial
responses to a treatment are increasingly investigated, especially
in cancer. Three cancer drugs, imatinib mesylate, trastuzumab, and
gefitinib, have documented efficacy in subsets of patients with
specific genetic variants (particularly variants in the genomes of
their tumors), while two others, irinotecan and 6-mercaptopurine,
can be toxic in standard doses in other genetically defined subsets
of patients [22]. Availability of
clinical tests for these variants allows treatment to be
cost-effective and more efficacious by limiting recommended use to
those likely to benefit. The strength with which common variants can
influence the risk determination ranges from a several-fold
increased risk compared to those without the variant to a
1,000-fold increase [27].
Replications of the findings are especially important for these
types of studies.
The identification of new genetic
variants associated with serious adverse effects is also a critical
area of investigation. The goal is to identify through genetic
testing those high-risk patients prior to initiation of treatment.
A genome-wide association study identified a SNP within the
SLCO1B1 gene on chromosome
12 linked to dose-dependent, statin-induced myopathy
[28]. Over 60% of all diagnosed
myopathy cases could be linked to the C allele of the SNP
rs4149056, which is present in 15% of the population.
Identification of C allele carriers prior to initiating therapy
could reduce myopathy while retaining treatment benefits by
targeting this group for lower doses or more frequent monitoring of
muscle-related enzymes.
The regulatory agencies are
increasingly relying on subgroup analyses of pharmacogenetic
markers. Presence of specific alleles, deficient gene products,
inherited familial conditions and patterns of drug responses can
provide important efficacy and safety information. As of 2014, the
FDA had included this type of subgroup data in the labeling of
approximately 140 different drugs [29].
The sample size requirements, the
analytic problem of multiplicity (a genome-wide panel may have over
2.5 million SNPs after imputation) and the need for replications
are discussed in Chap. 18.
What Constitutes a True Baseline Measurement?
Screening for Participants
In order to describe the study
participants accurately, baseline data should ideally reflect their
true condition. Certain information can be obtained
accurately by means of one measurement or evaluation at a baseline
interview and examination. However, for many variables, accurately
determining the participant's true state is difficult, since the
mere fact of impending enrollment in a trial, random fluctuation, or
the baseline examination itself may alter a measurement. For
example, is true blood pressure reflected by a single measurement
taken at baseline? If more than one measurement is made, which one
should be used as the baseline value? Is the average of repeated
measurements recorded over some extended period of time more
appropriate? Does the participant need to be taken off all
medications or be free of other factors which might affect the
determination of a true baseline level?
When resolving these questions, the
screening required to identify eligible potential participants, the
time and cost entailed in this identification, and the specific
uses for the baseline information must be taken into account.
In almost every clinical trial, some
sort of screening of potential participants for trial eligibility
is necessary. This may take place over more than one visit.
Screening eliminates participants who, based on the entrance
criteria, are ineligible for the study. A prerequisite for
inclusion is the participant’s willingness to comply with a
possibly long and arduous study protocol. The participant’s
commitment, coupled with the need for additional measurements of
eligibility criteria, means that intervention allocation usually
occurs later than the time of the investigator’s first contact with
the participant. An added problem may result from the fact that
discussing a study with someone or inviting him to participate in a
clinical trial may alter his state of health. For instance, people
asked to join a study of lipid-lowering agents because they had an
elevated serum LDL cholesterol at a screening examination might
change their diet on their own initiative just because of the fact
they were invited to join the study. Therefore, their serum LDL
cholesterol as determined at baseline, perhaps a month after the
initial screen, may be somewhat lower than usual. Such improvement
could occur in many potential candidates for the trial and could affect
the validity of the assumptions used to calculate sample size. As a
result of the modification in participant behavior, there may be
less room for response to the intervention. If the study calls for
a special dietary regimen, this might not be so effective at the
new, lowered LDL cholesterol level. Obviously, these changes occur
not just in the group randomized to the active intervention, but
also in the control group.
Although it may be impossible to avoid
altering the behavior of potential participants, it is often
possible to adjust for such anticipated changes in the study
design. Special care can be taken when discussing studies with
people to avoid sensitizing them. Time between invitation to join a
study and baseline evaluation should be kept to a minimum. People
who have greatly changed their eating habits between the initial
screen and baseline, as determined by a questionnaire at baseline,
can be declared ineligible to join. Alternatively, they can be
enrolled and the required sample size increased. Whatever is done,
these are expensive ways to compensate for the reduced expected
response to the intervention.
Regression Toward the Mean
Sometimes a person’s eligibility for a
study is determined by measuring continuous variables, such as
blood sugar or cholesterol level. If the entrance criterion is a
high or low value, a phenomenon referred to as “regression toward
the mean” is encountered [30].
Regression toward the mean occurs because measurable
characteristics of an individual do not have constant values but
vary. Thus, individuals have days when the measurements are on the
high side and other days when they are on the low side within their
ranges of variability. Because of this variability, although the
population mean for a characteristic may be relatively constant
over time, the locations of individuals within the population
change. Therefore, if two sets of measurements are made on
individuals within the population and the first value is
substantially larger (or smaller) than the population mean, the
second is likely to be lower (or higher) than the first.
Therefore, whenever participants are
selected from a population on the basis of the cutoff of some
measured characteristic, the mean of a subsequent measurement will
be closer to the population mean than is the first measurement
mean. Furthermore, the more extreme the initial selection criterion
(that is, the further from the population mean), the greater will
be the regression toward the mean at the time of the next
measurement. The “floor-and-ceiling effect” used as an illustration
by Schor [31] is helpful in
understanding this concept. If all the flies in a closed room near
the ceiling in the morning are monitored, then at any subsequent
time during the day more flies will be below where they started
than above. Similarly, if the flies start close to the floor, it is
more probable that they will be higher, rather than lower, at any
subsequent time.
Cutter [32] gives some nonbiological examples of
regression toward the mean. He presents the case of a series of
three successive tosses of two dice. The average of the first two
tosses is compared with the average of the second and third tosses.
If no selection or cut-off criterion is used, the average of the
first two tosses would, in the long run, be close to the average of
the second and third tosses. However, if a cut-off point is
selected which restricts the third toss to only those instances
where the average of the first and second tosses is nine or
greater, regression toward the mean will occur. The average of the
second and third tosses for this selected group will be less than
the average of the first two tosses for this group.
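A short simulation (ours; the number of repetitions is arbitrary) reproduces Cutter's dice example:

```python
# Sketch: regression toward the mean in Cutter's dice example.
import numpy as np

rng = np.random.default_rng(4)
trials = 200_000
# Each toss is the sum of two dice, so its long-run mean is 7.
tosses = rng.integers(1, 7, size=(trials, 3, 2)).sum(axis=2)

# Keep only series where the average of tosses 1 and 2 is nine or more.
selected = tosses[(tosses[:, 0] + tosses[:, 1]) / 2 >= 9]
print(f"average of tosses 1-2: {((selected[:, 0] + selected[:, 1]) / 2).mean():.2f}")
print(f"average of tosses 2-3: {((selected[:, 1] + selected[:, 2]) / 2).mean():.2f}")
# The second average falls back toward 7, the population mean, exactly
# as the text describes.
```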
As with the example of the participant
changing his diet between screening and baseline, this phenomenon
of regression toward the mean can complicate the assessment of
intervention. In another case, an investigator may wish to evaluate
the effects of an antihypertensive agent. She measures blood
pressure once at the baseline examination and enters into the study
only those people with systolic pressures over 150 mmHg. She
then gives a drug and finds on rechecking that most people have
responded with lowered blood pressures. However, when she
re-examines the control group, she finds that most of those people
also have lower pressures. Regression toward the mean is the major
explanation for the marked reduction in mean blood pressure
frequently observed early in the control group. The importance of a
control group is obvious in such situations. An investigator cannot
simply compare preintervention and postintervention values in the
intervention group. She must compare postintervention values in the
intervention group with values obtained at similar times in the
control group.
This regression toward the mean
phenomenon can also lead to a problem discussed previously. Because
of regression, the true values at baseline are less extreme than
the investigator had planned on, and there is less room for
improvement from the intervention. In the blood pressure example,
after randomization, many of the participants may have systolic
blood pressures in the low 140s or even below 140 rather than
above 150 mmHg. There may be more reluctance to use
antihypertensive agents in people with systolic pressures in the
130s, for example, than in those with higher pressures, and,
certainly, the opportunity to demonstrate the full effectiveness of
the agent may be lost.
Two approaches to reducing the impact
of regression toward the mean have been used by trials relying on
measurements with large variability, such as blood pressure and
some chemical determinations. The first is to use a more extreme
value than the entrance criterion when people are initially
screened. The second is to base eligibility on the mean of multiple
measurements, taken at the same visit or across more than one
screening visit, to achieve more stable values. In hypertension
trials with a systolic blood pressure entrance criterion of
140 mmHg, for example, only those whose second and third
measurements at the first screening visit averaged 150 mmHg or
greater would be invited to the clinic for further evaluation. The
average of two recordings at the second visit would then constitute
the baseline value for comparison with subsequent determinations.
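The following sketch simulates both approaches under assumed values for the between-person and visit-to-visit variability of systolic blood pressure (all numbers hypothetical):

```python
# Sketch: two ways to blunt regression toward the mean at screening.
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
true_sbp = rng.normal(130, 15, n)  # stable personal means

def noise():
    """Visit-to-visit measurement variation around the personal mean."""
    return rng.normal(0, 10, n)

screen1, screen2 = true_sbp + noise(), true_sbp + noise()
baseline = true_sbp + noise()      # later, independent baseline measurement

selections = {
    "single measure >= 140 (entry criterion)": screen1 >= 140,
    "single measure >= 150 (stricter cutoff)": screen1 >= 150,
    "mean of two measures >= 140": (screen1 + screen2) / 2 >= 140,
}
for name, sel in selections.items():
    print(f"{name}: screening mean {screen1[sel].mean():.1f}, "
          f"baseline mean {baseline[sel].mean():.1f}")
# Baseline means fall below screening means in every case (regression
# toward the mean), but the stricter cutoff and the averaged measures
# keep the eventual baseline values further above 140 mmHg.
```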
Interim Events
When baseline data are measured too
far in advance of intervention assignment, a study event may occur
in the interim. Participants having events in the interval
between allocation and the actual initiation of intervention
dilute the results and decrease the chances of finding a
significant difference. In the European Coronary Surgery Study,
coronary artery bypass surgery was supposed to take place within 3
months of intervention allocation [33]. However, the mean time from randomization
to surgery was 3.9 months. Consequently, of the 21 deaths in
the surgical group in the first 2 years, six occurred before
surgery could be performed. If the response, such as death, is
nonrecurring and occurs between baseline and the start of
intervention, the number of participants at risk of having the
event later is reduced. Therefore, the investigator needs to be
alert to any event occurring after baseline but before intervention
is instituted. When such an event occurs before randomization, i.e.
allocation to intervention or control, she can exclude the
participant from the study. When the event occurs after allocation,
but before start of intervention, participants should nevertheless
be kept in the study and the event counted in the analysis. Removal
of such participants from the study may bias the outcome. For this
reason, the European Coronary Surgery Study Group kept such
participants in the trial for purposes of analysis. The
inappropriateness of withdrawing participants from data analysis is
discussed more fully in Chap. 18.
Uncertainty About Qualifying Diagnosis
A growing problem in many disease
areas such as arthritis, diabetes and hypertension is finding
potential participants who are not receiving competing treatments
prior to randomization. So-called washout phases are often relied
on in order to determine “true” baseline values.
Particularly difficult are those
studies where baseline factors cannot be completely ascertained
until after intervention has begun. For optimal benefit of
thrombolytic therapy in patients with a suspected acute myocardial
infarction, treatment has to be given within hours. This means that
there is no time to wait for confirmation of the diagnosis with
development of Q-wave abnormalities on the ECG and marked increases
in serum levels of cardiac enzymes. In the Global Utilization of
Streptokinase and Tissue Plasminogen Activator of Occluded Coronary
Arteries (GUSTO) trial treatment had to be given within 6 h
[34]. To confirm the diagnosis,
the investigators had to settle for two less definitive criteria:
chest pain lasting at least 20 min and ST-segment elevations
on the ECG.
The challenge in the National
Institute of Neurological Disorders and Stroke t-PA stroke trial
was to obtain a brain imaging study and to initiate treatment
within 180 min of stroke onset. This time limit was difficult to
meet, and participant enrollment lagged. As a result of a
comprehensive process improvement program at the participating
hospitals, the time between hospital admission and initiation of
treatment was substantially reduced with increased recruitment
yield. Almost half of eligible patients admitted within
125 min of stroke onset were enrolled [35].
Even if an investigator can get
baseline information just before initiating intervention, she may
need to compromise. For instance, serum cholesterol, an important
prognostic factor, is measured in most studies of heart disease.
Serum cholesterol levels, however, are temporarily lowered during
the acute phase of a myocardial infarction, and a large number of
participants may additionally be on lipid-lowering therapy.
Therefore, in any trial enrolling people
who have just had a myocardial infarction, baseline serum
cholesterol data relate poorly to their usual levels. Only if the
investigator has data on participants from a time before the
myocardial infarction and prior to any initiation of lipid-lowering
therapy would usual cholesterol levels be known. On the other hand,
because she has no reason to expect that one group would have
greater lowering of cholesterol at baseline than the other group,
such levels can certainly tell her whether the study groups are
initially comparable.
Contamination of the Intervention
For many trials of chronic conditions,
it can be difficult to find and enroll newly diagnosed patients. To
meet enrollment goals, investigators often take advantage of
available pools of treated patients. In order to qualify such
patients, they often have to be withdrawn from their treatment. The
advantage of treatment withdrawal is that a true baseline can be
obtained. However, there are ethical issues involved with
withdrawing active treatments (Chap. 2).
An alternative may be to lower the
eligibility criteria for this group of treated patients. In the
Antihypertensive and Lipid Lowering treatment to prevent Heart
Attack Trial (ALLHAT), treated hypertensive patients were enrolled
even if their screening blood pressures were below the treatment
goal blood pressures [36]. It was
assumed that these individuals were truly hypertensive and, thus,
had elevated blood pressures prior to being given antihypertensive
medications. The disadvantage of this approach is that the true
untreated baseline values for blood pressure were unknown.
Medications that participants are
taking may also complicate the interpretation of the baseline data
and restrict the uses to which an investigator can put baseline
data. Determining the proportion of diabetic participants in a
clinical trial based on the number with elevated fasting blood
sugar or HbA1C levels at a baseline examination will
underestimate the true prevalence, because people treated with oral
hypoglycemic agents or insulin may have their laboratory values
controlled. Thus, the true prevalence of diabetes would comprise
the untreated participants with elevated blood sugar or
HbA1C plus those being treated for their diabetes
regardless of their laboratory values. Similarly, a more accurate
estimate of the prevalence of hypertension would be based on the
number of untreated hypertensive subjects at baseline plus those
receiving antihypertensive treatment.
Withdrawing treatment prior to
enrollment could introduce other potential problems. Study
participants with a supply of the discontinued medications left in
their medicine cabinet may use them during the trial and, thus,
contaminate the findings. Similarly, if they have used other
medications prescribed for the condition under study, they may also
resort to these, whether or not their use is allowed according to
the study protocol. The result may be discordant use in the study
groups. Assessing and adjusting for the concomitant drug use during
a trial can be complex. The use and frequency of use need to be
considered. All of these potential problems are much smaller in
trials of newly diagnosed patients.
Appreciating that, for many
measurements, baseline data may not reflect the participant’s true
condition at the time of baseline, investigators perform the
examination as close to the time of intervention allocation as
possible. Baseline assessment may, in fact, occur shortly after
allocation but prior to the actual start of intervention. The
advantage of such timing is that the investigator does not spend
extra time and money performing baseline tests on participants who
may turn out to be ineligible. The baseline examination then occurs
immediately after randomization and is performed not to exclude
participants, but solely as a baseline reference point. Since
allocation has already occurred, all participants remain in the
trial regardless of the findings at baseline. This reversal of the
usual order is not recommended in single-blind or unblinded
studies, because it raises the possibility of bias during the
examination. If the investigator knows to which group the
participant belongs, she may subconsciously measure characteristics
differently, depending on the group assignment. Furthermore, the
order reversal may unnecessarily prolong the interval between
intervention allocation and its actual start.
Changes of Baseline Measurement
Making use of baseline data will
usually add sensitivity to a study. For example, an investigator
may want to evaluate a new hypoglycemic agent. She can either
compare the mean change in HbA1C from baseline to some
subsequent time in the intervention group against the mean change
in the control group, or simply compare the mean HbA1C
of the two groups at the end of the study. The former method
usually is a more powerful statistical technique because it can
reduce the variability of the response variables. As a consequence,
it may permit either fewer participants to be studied or a smaller
difference between groups to be detected (see Chap. 18).
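A minimal sketch of the variance argument behind this, with hypothetical HbA1C numbers: when baseline and follow-up values are positively correlated, the change score varies less than the follow-up value alone.

```python
# Sketch: change from baseline vs. final value as the response variable.
import numpy as np

rng = np.random.default_rng(6)
n, rho, sd = 100, 0.7, 1.0  # assumed correlation and standard deviation
cov = sd**2 * np.array([[1.0, rho], [rho, 1.0]])

# Simulated baseline and follow-up HbA1C for one treated group; the
# hypothetical agent lowers follow-up values by 0.5.
base, follow = rng.multivariate_normal([8.0, 8.0], cov, size=n).T
follow = follow - 0.5

print(f"variance of final value:  {np.var(follow):.2f}")       # about sd^2
print(f"variance of change score: {np.var(follow - base):.2f}")
# Var(change) = 2*sd^2*(1 - rho), which is below sd^2 whenever rho > 0.5,
# so the same group difference stands out against less noise.
```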
Evaluation of possible unwanted
effects requires knowledge—or at least tentative ideas—about what
effects might occur. The investigator should record at baseline
those clinical or laboratory features which are likely to be
adversely affected by the intervention. Unexpected adverse effects
might be missed, but the hope is that animal studies or earlier
clinical work will have identified the important factors to be
measured.
References
1. Pocock SJ, Assmann SE, Enos LE, Kasten LE. Subgroup analysis, covariate adjustment and baseline comparisons in clinical trial reporting: current practice and problems. Stat Med 2002;21:2917–2930.
2. Hall JC, Hall JL. Baseline comparisons in surgical trials. ANZ J Surg 2002;72:567–569.
3. Altman DG, Doré CJ. Randomization and baseline comparisons in clinical trials. Lancet 1990;335:149–153.
4. Harmatz P, Giugliani R, Schwartz I, et al. Enzyme replacement therapy for mucopolysaccharidosis VI: A phase 3, randomized, double-blind, placebo-controlled, multinational study of recombinant human N-acetylgalactosamine 4-sulfatase (recombinant human arylsulfatase B or RHASB) and follow-on, open-label extension study. J Pediatr 2006;148:533–539.
5. Sano M, Ernesto C, Thomas RG, et al. for the Members of the Alzheimer's Disease Cooperative Study. A controlled trial of selegiline, alpha-tocopherol, or both as treatment for Alzheimer's Disease. N Engl J Med 1997;336:1216–1222.
6. Aspirin Myocardial Infarction Study Research Group. A randomized, controlled trial of aspirin in persons recovered from myocardial infarction. JAMA 1980;243:661–669.
7. Kollef MH, Afessa B, Anzueto A, et al. for the NASCENT Investigation Group. Silver-coated endotracheal tubes and incidence of ventilator-associated pneumonia. The NASCENT randomized trial. JAMA 2008;300:805–813.
8. Chastre J. Preventing ventilator-associated pneumonia. Could silver-coated endotracheal tubes be the answer? (Editorial) JAMA 2008;300:842–844.
9. Burgess DC, Gebski VJ, Keech AC. Baseline data in clinical trials. MJA 2003;179:105–107.
10. Senn S. Testing for baseline balance in clinical trials. Stat Med 1994;13:1715–1726.
11. Steyerberg EW, Bossuyt PMM, Lee KL. Clinical trials in acute myocardial infarction: Should we adjust for baseline characteristics? Am Heart J 2000;139:745–751.
12. Roberts C, Torgerson DJ. Understanding controlled trials. Baseline imbalance in randomised controlled trials. Br Med J 1999;319:185.
13. Assmann SF, Pocock SJ, Enos LE, Kasten LE. Subgroup analysis and other (mis)uses of baseline data in clinical trials. Lancet 2000;355:1064–1069.
14. The Canadian Cooperative Study Group. A randomized trial of aspirin and sulfinpyrazone in threatened stroke. N Engl J Med 1978;299:53–59.
15. Antiplatelet Trialists' Collaboration. Collaborative overview of randomised trials of antiplatelet therapy - I. Prevention of death, myocardial infarction, and stroke by prolonged antiplatelet therapy in various categories of patients. Br Med J 1994;308:81–106.
16. Ridker PM, Cook NR, Lee I-M, et al. A randomized trial of low-dose aspirin in the primary prevention of cardiovascular disease in women. N Engl J Med 2005;352:1293–1304.
17. Kahn SE, Haffner SM, Heise MA, et al. for the ADOPT Study Group. Glycemic durability of rosiglitazone, metformin, or glyburide monotherapy. N Engl J Med 2006;355:2427–2443.
18. Dormandy JA, Charbonnel B, Eckland DJA, et al. Secondary prevention of macrovascular events in patients with type 2 diabetes in the PROactive study (PROspective pioglitAzone Clinical Trial In macroVascular Events): A randomised controlled trial. Lancet 2005;366:1279–1289.
19. Loke YK, Singh S, Furberg CD. Long-term use of thiazolidinediones and fractures in type 2 diabetes: a meta-analysis. CMAJ 2009;180:32–39.
20. Bhandari M, Devereaux PJ, Li P, et al. Misuse of baseline comparison tests and subgroup analyses in surgical trials. Clin Orthop Relat Res 2006;447:247–251.
21. Johnson JA, Boerwinkle E, Zineh I, et al. Pharmacogenomics of antihypertensive drugs: Rationale and design of the Pharmacogenomic Evaluation of Antihypertensive Responses (PEAR) study. Am Heart J 2009;157:442–449.
22. Grant SF, Hakonarson H. Recent development in pharmacogenomics: from candidate genes to genome-wide association studies. Expert Rev Mol Diagn 2007;7:371–393.
23. Donnelly P. Progress and challenges in genome-wide association studies in humans. Nature 2008;456:728–731.
24. Lu JT, Campeau PM, Lee BH. Genotype-phenotype correlation—promiscuity in the era of next-generation sequencing. N Engl J Med 2014;371:593–596.
25. Chanock SJ, Manolio T, Boehnke M, et al., NCI-NHGRI Working Group on Replication in Association Studies. Replicating genotype-phenotype associations. Nature 2007;447:655–660.
26. MacArthur DG, Manolio TA, Dimmock DP, et al. Guidelines for investigating causality of sequence variants in human disease. Nature 2014;508:469–476.
27. Nelson MR, Bacanu S-A, Mosteller M, et al. Genome-wide approaches to identify pharmacogenetic contributions to adverse drug reactions. Pharmacogenomics J 2009;9:23–33.
28. The SEARCH Collaborative Group. SLCO1B1 variants and statin-induced myopathy—A genomewide study. N Engl J Med 2008;359:789–799.
29. The U.S. Food and Drug Administration. Drugs. Table of pharmacogenomic biomarkers in drug labeling. Updated 08/18/2014. www.fda.gov/drugs/scienceresearch/researchareas/pharmacogenetics/ucm083378.htm
30. James KE. Regression toward the mean in uncontrolled clinical studies. Biometrics 1973;29:121–130.
31. Schor SS. The floor-and-ceiling effect. JAMA 1969;207:120.
32. Cutter GR. Some examples for teaching regression toward the mean from a sampling viewpoint. Am Stat 1976;30:194–197.
33. European Coronary Surgery Study Group. Coronary-artery bypass surgery in stable angina pectoris: survival at two years. Lancet 1979;i:889–893.
34. The GUSTO Investigators. An international randomized trial comparing four thrombolytic strategies for acute myocardial infarction. N Engl J Med 1993;329:673–682. (Correction 1994;331:277).
35. Tilley BC, Lyden PD, Brott TG, et al. for the National Institute of Neurological Disorders and Stroke rt-PA Stroke Study Group. Total quality improvement method for reduction of delays between emergency department admission and treatment of acute ischemic stroke. Arch Neurol 1997;54:1466–1474.
36. Davis BR, Cutler JA, Gordon DJ, et al. for the ALLHAT Research Group. Rationale and design of the Antihypertensive and Lipid Lowering treatment to prevent Heart Attack Trial (ALLHAT). Am J Hypertens 1996;9:342–360.