© Springer International Publishing Switzerland 2015
Lawrence M. Friedman, Curt D. Furberg, David L. DeMets, David M. Reboussin and Christopher B. Granger, Fundamentals of Clinical Trials, DOI 10.1007/978-3-319-18539-2_9

9. Baseline Assessment

Lawrence M. Friedman1, Curt D. Furberg2, David L. DeMets3, David M. Reboussin4 and Christopher B. Granger5
(1)
North Bethesda, MD, USA
(2)
Division of Public Health Sciences, Wake Forest School of Medicine, Winston-Salem, NC, USA
(3)
Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA
(4)
Department of Biostatistics, Wake Forest School of Medicine, Winston-Salem, NC, USA
(5)
Department of Medicine, Duke University, Durham, NC, USA
 
In clinical trials, baseline refers to the status of a participant before the start of intervention. Baseline data may be measured by interview, questionnaire, physical examination, laboratory tests, and procedures. Measurement need not be only numerical in nature. It can also mean classification of study participants into categories based on factors such as absence or presence of some trait or condition.
There are multiple uses of baseline data. First, there is a need to describe the trial participants so that readers of the trial publications can determine to which patient population the findings apply. Second, it is methodologically important to present relevant baseline data by study group, to allow assessment of group comparability and to answer the question of whether randomization generated balanced groups. Third, baseline data are used in statistical analyses to control for any baseline imbalances. Finally, baseline data form the basis for subgroup analyses of the trial findings. This chapter is concerned with these uses of baseline data.

Fundamental Point

Relevant baseline data should be measured in all study participants before the start of intervention.

Uses of Baseline Data

Description of Trial Participants

Because the trial findings, strictly speaking, apply only to the participants enrolled in the trial, it is essential that the participants be described properly and as completely as possible. This is done in the typical Table 1 of the results publication. The description of baseline covariates provides documentation that can guide cautious extrapolation of trial findings to other populations with the same medical condition [1]. A common limitation is that the characteristics of excluded patients with the study condition are seldom reported. It would also be helpful for the reader to know what proportion of those screened was excluded, in other words, the recruitment yield, and the reasons for exclusion. Based on the published information, clinicians need to know to which of their patients the findings directly apply. They also need to know the characteristics of the excluded patients with the study condition, so they can determine whether the trial findings can reasonably be extrapolated.
The amount of data collected at baseline depends on the nature of the trial and the purpose for which the data will be used. As mentioned elsewhere, some trials have simple protocols, which means that detailed documentation of many baseline variables is omitted and only a few key demographic and medical variables are ascertained. If such trials are large, it is reasonable to expect that good balance between groups will be achieved. Because the goals of these trials are restricted to answering the primary question and one or two secondary questions, the other uses for baseline data may not be necessary.

Baseline Comparability

Baseline data allow readers to evaluate whether the study groups were comparable before intervention was started. The assessment of comparability typically includes pertinent demographic and socioeconomic characteristics, risk or prognostic factors, medications, and medical history. This assessment is necessary in both randomized and nonrandomized trials. In assessing comparability in any trial, the investigator can only examine relevant factors that she is aware of and that were measured; those which are unknown obviously cannot be compared. The baseline characteristics of each group should be presented in the main results paper of every randomized trial. Special attention should be given to factors that may influence any benefit of the study intervention and those that may predict adverse effects. Baseline comparability is not always given full attention. In a review of 206 surgical trials, only 73% reported baseline data [2], and more than one quarter of those that did reported fewer than five baseline factors. Altman and Doré, in a review of 80 published randomized trials, noted considerable variation in the quality of the reporting of baseline characteristics [3]. Half of those reporting continuous covariates did not use appropriate measures of variability.
While randomization on average produces balance between comparison groups, it does not guarantee balance in every trial or for any specific baseline measure. Clearly, imbalances are more common in smaller trials and they may raise questions regarding the validity of the trial outcomes. For example, a placebo-controlled, double-blind trial in 39 participants with mucopolysaccharidosis type VI reported that the intervention significantly improved exercise endurance [4]. However, at baseline the 12-min walk test, the primary outcome measure, showed the distance walked to be 227 m in the intervention group and 381 m in the placebo group, a substantial baseline imbalance. A double-blind placebo-controlled trial in 341 participants with Alzheimer’s disease evaluated three active treatments—vitamin E, a selective monoamine oxidase inhibitor, and their combination [5]. At baseline, the Mini-Mental State Examination (MMSE) score, a variable highly predictive of the primary outcome, was significantly higher in the placebo group than in the other two groups, indicating that the placebo patients were at a lower risk. Imbalances may even exist in large studies. In the Aspirin Myocardial Infarction Study [6], which had over 4,500 participants, the aspirin group was at slightly higher overall risk than the placebo group when prognostic baseline characteristics were examined.
Assessment of baseline comparability is important in all randomized trials. In studies without randomization, assessing baseline comparability is more problematic. The known baseline factors may not always have been measured accurately, and there are also all the factors that were not measured or are not even known. For nonrandomized studies, in contrast to randomized trials, one cannot assume balance in the unmeasured covariates.
The investigator needs to look at baseline variables in several ways. The simplest is to compare each variable to make sure that it has a reasonably similar distribution in each study group. Means, medians, and ranges are all convenient measures. The investigator can also combine the variables, giving each one an appropriate weight or coefficient, but doing this presupposes knowledge of the relative prognostic importance of the variables. Such knowledge can come only from other studies with very similar populations or from looking at the control group after the present study is completed. The weighting technique has the advantage that it can take into account numerous small differences between groups. If imbalances in most of the variables are in the same direction, the overall imbalance can turn out to be large, even though the differences in individual variables are small.
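A minimal sketch of both approaches follows, with made-up covariate values and hypothetical prognostic coefficients; none of the numbers come from the trials discussed in this chapter. The per-variable check uses standardized differences of means; the combined check applies assumed weights to form a single risk score per participant.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical baseline covariates for two groups of 150 participants:
# age (years), systolic blood pressure (mmHg), and prior MI (0/1).
def make_group(age_mean, sbp_mean, mi_rate, n=150):
    return np.column_stack([
        rng.normal(age_mean, 8, n),
        rng.normal(sbp_mean, 15, n),
        rng.binomial(1, mi_rate, n),
    ])

intervention = make_group(60, 140, 0.30)
control = make_group(61, 143, 0.35)

# Per-variable check: standardized difference of means for each covariate.
pooled_sd = np.sqrt((intervention.var(0, ddof=1) + control.var(0, ddof=1)) / 2)
print("standardized differences:",
      np.round((intervention.mean(0) - control.mean(0)) / pooled_sd, 3))

# Weighted combination into a single prognostic score. The coefficients are
# assumed for illustration; in practice they would come from a model fitted
# in a similar population, as the text notes. Many small imbalances in the
# same direction can add up to a notable difference in mean score.
coef = np.array([0.03, 0.02, 0.80])
print("mean risk score, intervention:", (intervention @ coef).mean())
print("mean risk score, control:     ", (control @ coef).mean())
```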
In the 30-center Aspirin Myocardial Infarction Study, which involved over 4,500 participants, each center can be thought of as a small study with about 150 participants [6]. When the baseline comparability within each center was reviewed, substantial differences were found in almost half the centers, some favoring intervention and some control (Furberg, CD, unpublished data). The difference between intervention and control groups in predicted 3-year mortality, using the Coronary Drug Project model, exceeded 20% in 5 of the 30 clinics. This analysis illustrates that fairly large imbalances in known baseline factors can be common in smaller studies and that they may influence the trial outcome. In larger trials these study group differences balance out and the unadjusted primary analyses are reliable. In secondary analyses, adjustments can be made using regression methods with baseline covariates. Another approach is to use propensity scores, which combine individual covariates (see Chap. 18).
Identified imbalances do not invalidate a randomized trial, but they may make interpretation of results more complicated. In the North American Silver-Coated Endotracheal Tube trial, a higher number of patients with chronic obstructive pulmonary disease were randomized to the uncoated tube group [7]. The accompanying editorial [8] pointed to this imbalance as one factor behind the lack of robustness of the results, which indicated a reduction in the incidence of ventilator-associated pneumonia. Chronic obstructive pulmonary disease is a recognized risk factor for ventilator-associated pneumonia.
It is important to know which baseline factors may influence the trial outcomes and to determine whether they were imbalanced and whether observed trends of imbalance favored one group or the other. The critical baseline factors to consider ought to be prespecified in the protocol. Reliance on significance testing as a measure of baseline equivalence is common [2]. Due to the often large number of statistical tests, the challenge is to understand the meaning and importance of observed differences whether or not they are statistically significant. A nonsignificant baseline group difference in the history of hemorrhagic stroke could still affect the treatment outcome in thrombolytic trials [9].
Formal statistical testing of baseline imbalances was common in the past. However, the consensus has changed, and the position today is that such testing should be avoided [10–12]. When comparing baseline factors, groups can never be shown to be identical; only the absence of “significant” differences can be demonstrated. A review of 80 trials published in four leading journals showed that hypothesis tests of baseline comparability were conducted in 46 of these trials. Of a total of 600 such tests, only 24 (4%) were significant at the 5% level [3], consistent with what would be expected by chance.
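A quick check makes the "expected by chance" claim concrete. Treating the 600 tests as independent (which they are not exactly, since many come from the same trial), about 0.05 × 600 = 30 significant results are expected under the null of true randomization; 24 is unremarkable:

```python
from scipy.stats import binom

n_tests, alpha, observed = 600, 0.05, 24
print("expected significant by chance:", n_tests * alpha)   # 30.0
# Probability of observing 24 or fewer "significant" baseline tests
# purely by chance, under independence.
print("P(X <= 24):", binom.cdf(observed, n_tests, alpha))
```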
We agree that testing of baseline imbalances should not be conducted. However, in the Results section of the main article, we recommend that in addition to a description of the study population, special attention should be paid to those characteristics that are prognostically important.

Controlling for Imbalances in the Analysis

If there is concern that one or two key prognostic factors may not “balance out” during randomization, thus yielding imbalanced groups at baseline, the investigator may conduct a covariate adjustment on the basis of these factors. In unadjusted analyses in the Alzheimer trial discussed above, there were no outcome differences among the groups. After adjustment for the baseline difference in MMSE, all actively treated groups did better than placebo in slowing the progression of disease. Chapter 18 reviews the advantages and disadvantages of covariate adjustment. The point here is that, in order to make such adjustments, the relevant characteristics of the participants at baseline must be known and measured. A survey of 50 randomized trials showed that most (38 of 50) emphasized the unadjusted comparisons [13], with 28 presenting covariate-adjusted results as a back-up. Of the remaining 12 trials, 6 gave no unadjusted results. We recommend presentation of both unadjusted (primary) and adjusted (secondary) results.
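The sketch below illustrates the two analyses on simulated data loosely patterned on the Alzheimer example; the effect sizes, MMSE values, and sample size are invented, not taken from the trial. The group with the lower baseline score is the active one, so the unadjusted contrast is biased downward and the adjustment recovers the simulated effect:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
group = np.repeat([0, 1], n // 2)                 # 0 = placebo, 1 = active
# Simulated baseline imbalance: placebo starts with higher (better) MMSE.
mmse0 = rng.normal(18 + 2 * (1 - group), 4)
# True model: outcome improves with baseline MMSE and with treatment (+1.5).
outcome = 0.8 * mmse0 + 1.5 * group + rng.normal(0, 4, n)

unadj = sm.OLS(outcome, sm.add_constant(group.astype(float))).fit()
adj = sm.OLS(outcome, sm.add_constant(np.column_stack([group, mmse0]))).fit()
print("unadjusted estimate:", round(unadj.params[1], 2))  # biased downward
print("adjusted estimate:  ", round(adj.params[1], 2))    # near the true 1.5
```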

Subgrouping

Often, investigators are interested not only in the response to intervention in the total study group, but also in the response in one or more subgroups. Particularly in studies in which an overall intervention effect is present, analysis of results by appropriate subgroup may help to identify the specific population most likely to benefit from, or be harmed by, the intervention. Subgrouping may also help to elucidate the mechanism of action of the intervention. Definition of such subgroups should rely only on baseline data, not on data measured after initiation of intervention (except for factors such as age or gender which cannot be altered by the intervention). An example of a potential problem with establishing subgroups post hoc is the Canadian Cooperative Study Group trial of aspirin and sulfinpyrazone in people with cerebral or retinal ischemic attacks [14]. After noting an overall benefit from aspirin in reducing continued ischemic attacks or stroke, the authors observed and reported that the benefit was restricted to men. The U.S. Food and Drug Administration relied on this trial in approving aspirin for the indication of transient ischemic attacks in men. A subsequent meta-analysis of platelet-active drug trials in the secondary prevention of cardiovascular disease concluded that the effect is similar in men and women [15]. However, a later placebo-controlled primary prevention trial of low-dose aspirin (100 mg on alternate days) in women reported a favorable aspirin effect on the risk of stroke, but no overall reduction in the risks of myocardial infarction and cardiovascular death, with perhaps benefit in those over 65 years of age [16]. Thus, this example illustrates that any conclusions drawn from subgroup hypotheses not explicitly stated in the protocol should be given less credibility than those from hypotheses stated a priori. Retrospective subgroup analysis should serve primarily to generate new hypotheses for subsequent testing (Chap. 18).
One of the large active-control trials of rosiglitazone in people with type 2 diabetes reported a surprising increase in the risk of fractures compared to metformin or glibenclamide, a risk that was, however, limited to women [17]. In this case, the post hoc observation was replicated in a subsequent trial of pioglitazone, which showed a similar gender-specific increase compared to placebo [18]. Additionally, a meta-analysis confirmed that this class of hypoglycemic agents doubles the risk of fractures in women without any documented increase in men [19]. Confirmation of initial results is important in science.
In their review of 50 clinical trial reports from four major medical journals, Assmann et al. [13] noted large variability in the presentation of subgroup findings. Thirty-five reported subgroup analyses. Seventeen of these limited the number of baseline factors to one; five included seven or more factors. Eighteen of the 35 subjected more than one outcome to subgroup analyses; six reported on six or more outcomes. More than half of the subgroup reports did not use statistical tests for interaction. Such tests are critical, since they directly address whether an observed treatment difference in an outcome depends on the participant’s subgroup. Additionally, it was often difficult to determine whether the subgroup analyses were prespecified or post hoc.
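The sketch below shows what an interaction test looks like; the data and variable names are invented, with the treatment effect constructed to be genuinely larger in one subgroup. The point is that the interaction coefficient, not a pair of within-subgroup p-values, is what answers the question:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 400
df = pd.DataFrame({"treat": rng.integers(0, 2, n),
                   "female": rng.integers(0, 2, n)})
# Invented data: treatment effect of 0.5 in men and 1.5 in women.
df["y"] = 2.0 + (0.5 + 1.0 * df["female"]) * df["treat"] + rng.normal(0, 2, n)

# The interaction term directly tests whether the treatment effect
# differs between the subgroups.
fit = smf.ols("y ~ treat * female", data=df).fit()
print("interaction estimate:", fit.params["treat:female"])
print("interaction p-value: ", fit.pvalues["treat:female"])
```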
In a similar survey of 72 randomized surgical trials, 54 subgroup analyses were reported in 27 of the trials [20]. The majority of these were post hoc. The investigators featured the outcome differences from 31 of the 54 subgroup analyses in the Summary and Conclusions of their publications.
A rapidly emerging field in medicine is pharmacogenetics, which holds promise for better identification of people who may benefit more from a treatment or who are more likely to develop serious adverse effects [21]. Until quite recently the focus was on a limited number of candidate genes due to the high cost of genotyping, but as technologies have improved attention has shifted to genome-wide association (GWA) studies of hundreds of thousands or millions of single-nucleotide polymorphisms (SNPs) [22]. This approach, along with cost-effective whole-genome sequencing technologies, allows examination of the whole genome unconstrained by prior hypotheses on genomic structure or function influencing a given trait [23]. This has resulted in discoveries of an enormous number of genotype-phenotype relationships [24]. Collection of biologic samples at baseline in large, long-term trials has emerged as a rich source for such pharmacogenetic studies. However, the analysis of these samples is a statistical challenge due to the very large number of variants tested, which is typically dealt with by requiring very small p-values. Under a strict Bonferroni correction, dividing the standard p < 0.05 by the million or more genetic variants assayed yields significance thresholds of 5 × 10⁻⁸, which in turn require very large sample sizes to reach, as well as similarly large replication samples [25]. Interpretation of rare variants detected by sequencing is even more challenging, as the majority of such variants are present in only one subject even in large studies. In such cases functional information about the effect of the variant on the gene and its product, and other forms of experimental evidence, can be used to supplement the sequencing data [26].
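The threshold arithmetic itself is a one-liner, reproducing the conventional genome-wide value cited above:

```python
# Bonferroni: divide the conventional alpha by the number of variants tested.
alpha, n_variants = 0.05, 1_000_000
print(alpha / n_variants)   # 5e-08, the genome-wide significance threshold
```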
Genetic determinants of beneficial responses to a treatment are increasingly investigated, especially in cancer. Three cancer drugs, imatinib mesylate, trastuzumab, and gefitinib, have documented efficacy in subsets of patients with specific genetic variants (particularly variants in the genomes of their tumors), while two others, irinotecan and 6-mercaptopurine, can be toxic in standard doses in other genetically defined subsets of patients [22]. Availability of clinical tests for these variants allows treatment to be more cost-effective and more efficacious by limiting recommended use to those likely to benefit. The strength with which common variants can influence risk ranges from a several-fold increase, compared to those without the variant, to a 1,000-fold increase [27]. Replication of the findings is especially important for these types of studies.
The identification of new genetic variants associated with serious adverse effects is also a critical area of investigation. The goal is to identify high-risk patients through genetic testing prior to initiation of treatment. A genome-wide association study identified a SNP within the SLCO1B1 gene on chromosome 12 linked to dose-dependent, statin-induced myopathy [28]. Over 60% of all diagnosed myopathy cases could be linked to the C allele of the SNP rs4149056, which is present in 15% of the population. Identification of C allele carriers prior to initiating therapy could reduce myopathy while retaining treatment benefits, by targeting this group for lower doses or more frequent monitoring of muscle-related enzymes.
Regulatory agencies are increasingly relying on subgroup analyses of pharmacogenetic markers. The presence of specific alleles, deficient gene products, inherited familial conditions, and patterns of drug response can provide important efficacy and safety information. As of 2014, the FDA had included this type of subgroup data in the labeling of approximately 140 different drugs [29].
The sample size requirements, the analytic problem of multiplicity (a genome-wide panel may have over 2.5 million SNPs after imputation) and the need for replications are discussed in Chap. 18.

What Constitutes a True Baseline Measurement?

Screening for Participants

In order to describe accurately the study participants, baseline data should ideally reflect the true condition of the participants. Certain information can be obtained accurately by means of one measurement or evaluation at a baseline interview and examination. However, for many variables, accurately determining the participant’s true state is difficult, since the mere fact of impending enrollment in a trial, random fluctuation or the baseline examination itself may alter a measurement. For example, is true blood pressure reflected by a single measurement taken at baseline? If more than one measurement is made, which one should be used as the baseline value? Is the average of repeated measurements recorded over some extended period of time more appropriate? Does the participant need to be taken off all medications or be free of other factors which might affect the determination of a true baseline level?
When resolving these questions, the screening required to identify eligible potential participants, the time and cost entailed in this identification, and the specific uses for the baseline information must be taken into account.
In almost every clinical trial, some sort of screening of potential participants for trial eligibility is necessary. This may take place over more than one visit. Screening eliminates candidates who, based on the entrance criteria, are ineligible for the study. A prerequisite for inclusion is the participant’s willingness to comply with a possibly long and arduous study protocol. The participant’s commitment, coupled with the need for additional measurements of eligibility criteria, means that intervention allocation usually occurs later than the time of the investigator’s first contact with the participant. An added problem may result from the fact that discussing a study with someone or inviting him to participate in a clinical trial may alter his state of health. For instance, people asked to join a study of lipid-lowering agents because they had elevated serum LDL cholesterol at a screening examination might change their diet on their own initiative, simply because they were invited to join the study. Therefore, their serum LDL cholesterol as determined at baseline, perhaps a month after the initial screen, may be somewhat lower than usual. Such improvement could occur in many potential candidates for the trial and could affect the validity of the assumptions used to calculate sample size. As a result of the modification in participant behavior, there may be less room for response to the intervention. If the study calls for a special dietary regimen, this might not be as effective at the new, lowered LDL cholesterol level. Obviously, these changes occur not just in the group randomized to the active intervention, but also in the control group.
Although it may be impossible to avoid altering the behavior of potential participants, it is often possible to adjust for such anticipated changes in the study design. Special care can be taken when discussing studies with people to avoid sensitizing them. Time between invitation to join a study and baseline evaluation should be kept to a minimum. People who have greatly changed their eating habits between the initial screen and baseline, as determined by a questionnaire at baseline, can be declared ineligible to join. Alternatively, they can be enrolled and the required sample size increased. Whatever is done, these are expensive ways to compensate for the reduced expected response to the intervention.

Regression Toward the Mean

Sometimes a person’s eligibility for a study is determined by measuring continuous variables, such as blood sugar or cholesterol level. If the entrance criterion is a high or low value, a phenomenon referred to as “regression toward the mean” is encountered [30]. Regression toward the mean occurs because measurable characteristics of an individual do not have constant values but vary. Thus, individuals have days when the measurements are on the high side and other days when they are on the low side within their ranges of variability. Because of this variability, although the population mean for a characteristic may be relatively constant over time, the locations of individuals within the population change. If two sets of measurements are made on individuals within the population, therefore, when the first value is substantially larger (smaller) than the population mean, the second is likely to be lower (higher) than the first.
Therefore, whenever participants are selected from a population on the basis of a cutoff of some measured characteristic, the mean of a subsequent measurement will be closer to the population mean than is the first measurement mean. Furthermore, the more extreme the initial selection criterion (that is, the further from the population mean), the greater will be the regression toward the mean at the time of the next measurement. The “floor-and-ceiling effect” used as an illustration by Schor [31] is helpful in understanding this concept. If one monitors all the flies in a closed room that are near the ceiling in the morning, then at any subsequent time during the day more flies will be below where they started than above. Similarly, if the flies start close to the floor, they are more likely to be higher, rather than lower, at any subsequent time.
Cutter [32] gives some nonbiological examples of regression toward the mean. He presents the case of a series of three successive tosses of two dice. The average of the first two tosses is compared with the average of the second and third tosses. If no selection or cut-off criterion is used, the average of the first two tosses would, in the long run, be close to the average of the second and third tosses. However, if a cut-off point is selected which restricts the third toss to only those instances where the average of the first and second tosses is nine or greater, regression toward the mean will occur. The average of the second and third tosses for this selected group will be less than the average of the first two tosses for this group.
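Cutter's dice example is easy to reproduce by simulation; the sketch below uses an arbitrary number of sequences and seed. Without selection the two averages agree, while conditioning on a high first average produces the regression:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
# Each of three successive tosses is the sum of two dice (mean 7).
tosses = rng.integers(1, 7, size=(n, 3)) + rng.integers(1, 7, size=(n, 3))

first_avg = tosses[:, :2].mean(axis=1)    # average of tosses 1 and 2
second_avg = tosses[:, 1:].mean(axis=1)   # average of tosses 2 and 3

# Without selection, the two averages agree in the long run (both near 7).
print(first_avg.mean(), second_avg.mean())

# Restricting to sequences whose first average is 9 or greater, the second
# average falls back toward the population mean of 7.
high = first_avg >= 9
print(first_avg[high].mean(), second_avg[high].mean())
```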
As with the example of the participant changing his diet between screening and baseline, this phenomenon of regression toward the mean can complicate the assessment of intervention. As another example, an investigator may wish to evaluate the effects of an antihypertensive agent. She measures blood pressure once at the baseline examination and enters into the study only those people with systolic pressures over 150 mmHg. She then gives a drug and finds on rechecking that most people have responded with lowered blood pressures. However, when she re-examines the control group, she finds that most of those people also have lower pressures. Regression toward the mean is the major explanation for the marked reduction in mean blood pressure frequently observed early in the control group. The importance of a control group is obvious in such situations. An investigator cannot simply compare preintervention and postintervention values in the intervention group. She must compare postintervention values in the intervention group with values obtained at similar times in the control group.
This regression toward the mean phenomenon can also lead to a problem discussed previously. Because of regression, the true values at baseline are less extreme than the investigator had planned on, and there is less room for improvement from the intervention. In the blood pressure example, after randomization, many of the participants may have systolic blood pressures in the low 140s or even below 140 rather than above 150 mmHg. There may be more reluctance to use antihypertensive agents in people with systolic pressures in the 130s, for example, than in those with higher pressures, and certainly the opportunity to demonstrate full effectiveness of the agent may be lost.
Two approaches to reducing the impact of regression toward the mean have been used by trials relying on measurements with large variability, such as blood pressure and some chemical determinations. One approach is to use a value more extreme than the entrance criterion when people are initially screened. The second is to base entrance and baseline values on the mean of multiple measurements, taken at the same visit or at more than one screening visit, to achieve more stable measurements. In a hypertension trial with an entrance criterion of systolic blood pressure of 140 mmHg, for example, only those whose second and third measurements at the first screening visit averaged 150 mmHg or greater would be invited to the clinic for further evaluation. The average of two recordings at the second visit would constitute the baseline value for comparison with subsequent determinations.
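A simulation sketch of both devices, with invented distribution parameters, shows why they help: participants passing the stricter, averaged screen are much less likely to regress below the 140 mmHg entrance criterion at baseline than those passing a single-reading screen at 140:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
true_sbp = rng.normal(130, 12, n)   # each person's underlying mean SBP

def reading():
    # One clinic measurement = true level plus visit-to-visit variability.
    return true_sbp + rng.normal(0, 8, n)

# Naive screen: a single reading of 140 mmHg or more at the first visit.
naive = reading() >= 140
# Stricter screen: the average of two first-visit readings must be >= 150.
strict = (reading() + reading()) / 2 >= 150

# Baseline value: the average of two readings at a second visit.
baseline = (reading() + reading()) / 2
print("regressed below 140, naive screen: ", (baseline[naive] < 140).mean())
print("regressed below 140, strict screen:", (baseline[strict] < 140).mean())
```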

Interim Events

When baseline data are measured too far in advance of intervention assignment, a study event may occur in the interim. Participants having events in the interval between allocation and the actual initiation of intervention dilute the results and decrease the chances of finding a significant difference. In the European Coronary Surgery Study, coronary artery bypass surgery was supposed to take place within 3 months of intervention allocation [33]. However, the mean time from randomization to surgery was 3.9 months. Consequently, of the 21 deaths in the surgical group in the first 2 years, six occurred before surgery could be performed. If the response, such as death, is nonrecurring and occurs between baseline and the start of intervention, the number of participants at risk of having the event later is reduced. Therefore, the investigator needs to be alert to any event occurring after baseline but before intervention is instituted. When such an event occurs before randomization, i.e., allocation to intervention or control, she can exclude the participant from the study. When the event occurs after allocation, but before the start of intervention, participants should nevertheless be kept in the study and the event counted in the analysis. Removal of such participants from the study may bias the outcome. For this reason, the European Coronary Surgery Study Group kept such participants in the trial for purposes of analysis. The inappropriateness of withdrawing participants from data analysis is discussed more fully in Chap. 18.

Uncertainty About Qualifying Diagnosis

A growing problem in many disease areas such as arthritis, diabetes and hypertension is finding potential participants who are not receiving competing treatments prior to randomization. So-called washout phases are often relied on in order to determine “true” baseline values.
Particularly difficult are those studies where baseline factors cannot be completely ascertained until after intervention has begun. For optimal benefit of thrombolytic therapy in patients with a suspected acute myocardial infarction, treatment has to be given within hours. This means that there is no time to wait for confirmation of the diagnosis with development of Q-wave abnormalities on the ECG and marked increases in serum levels of cardiac enzymes. In the Global Utilization of Streptokinase and Tissue Plasminogen Activator for Occluded Coronary Arteries (GUSTO) trial, treatment had to be given within 6 h [34]. To confirm the diagnosis, the investigators had to settle for two less definitive criteria: chest pain lasting at least 20 min and ST-segment elevations on the ECG.
The challenge in the National Institute of Neurological Disorders and Stroke t-PA stroke trial was to obtain a brain imaging study and to initiate treatment within 180 min of stroke onset. This time limit was difficult to meet, and participant enrollment lagged. As a result of a comprehensive process improvement program at the participating hospitals, the time between hospital admission and initiation of treatment was substantially reduced, with increased recruitment yield. Almost half of eligible patients admitted within 125 min of stroke onset were enrolled [35].
Even if an investigator can get baseline information just before initiating intervention, she may need to compromise. For instance, serum cholesterol, being an important prognostic factor, is measured in most studies of heart disease. Serum cholesterol levels, however, are temporarily lowered during the acute phase of a myocardial infarction, and, additionally, a large number of participants may be on lipid-lowering therapy. Therefore, in any trial enrolling people who have just had a myocardial infarction, baseline serum cholesterol data relate poorly to their usual levels. Only if the investigator has data on participants from a time before the myocardial infarction and prior to any initiation of lipid-lowering therapy would usual cholesterol levels be known. On the other hand, because she has no reason to expect that one group would have greater lowering of cholesterol at baseline than the other group, such levels can certainly tell her whether the study groups are initially comparable.

Contamination of the Intervention

For many trials of chronic conditions, it can be difficult to find and enroll newly diagnosed patients. To meet enrollment goals, investigators often take advantage of available pools of treated patients. In order to qualify such patients, they often have to be withdrawn from their treatment. The advantage of treatment withdrawal is that a true baseline can be obtained. However, there are ethical issues involved with withdrawing active treatments (Chap. 2).
An alternative may be to lower the eligibility criteria for this group of treated patients. In the Antihypertensive and Lipid Lowering treatment to prevent Heart Attack Trial (ALLHAT), treated hypertensive patients were enrolled even if their screening blood pressures were below the treatment goal blood pressures [36]. It was assumed that these individuals were truly hypertensive and, thus, had elevated blood pressures prior to being given antihypertensive medications. The disadvantage of this approach is that the true untreated baseline values for blood pressure were unknown.
Medications that participants are taking may also complicate the interpretation of the baseline data and restrict the uses to which an investigator can put baseline data. Determining the proportion of diabetic participants in a clinical trial based on the number with elevated fasting blood sugar or HbA1c levels at a baseline examination will underestimate the true prevalence, because people treated with oral hypoglycemic agents or insulin may have their laboratory values controlled. Thus, the true prevalence of diabetes would comprise untreated participants with elevated blood sugar or HbA1c plus those being treated for their diabetes, regardless of their laboratory values. Similarly, a more accurate estimate of the prevalence of hypertension would be based on the number of untreated hypertensive subjects at baseline plus those receiving antihypertensive treatment.
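As a toy calculation with invented counts, the correction simply adds everyone on treatment, whatever their laboratory values:

```python
# Hypothetical baseline counts from a trial of 1,000 participants.
n_total = 1000
elevated_untreated = 90   # not on hypoglycemic therapy, elevated HbA1c
treated_controlled = 80   # on therapy, HbA1c now in the normal range
treated_elevated = 30     # on therapy, HbA1c still elevated

# Counting only elevated values misses the treated-and-controlled group.
naive = (elevated_untreated + treated_elevated) / n_total      # 0.12
corrected = (elevated_untreated + treated_controlled
             + treated_elevated) / n_total                     # 0.20
print(naive, corrected)
```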
Withdrawing treatment prior to enrollment could introduce other potential problems. Study participants with a supply of the discontinued medications left in their medicine cabinet may use them during the trial and, thus, contaminate the findings. Similarly, if they have used other medications prescribed for the condition under study, they may also resort to these, whether or not their use is allowed according to the study protocol. The result may be discordant use in the study groups. Assessing and adjusting for the concomitant drug use during a trial can be complex. The use and frequency of use need to be considered. All of these potential problems are much smaller in trials of newly diagnosed patients.
Appreciating that, for many measurements, baseline data may not reflect the participant’s true condition at the time of baseline, investigators perform the examination as close to the time of intervention allocation as possible. Baseline assessment may, in fact, occur shortly after allocation but prior to the actual start of intervention. The advantage of such timing is that the investigator does not spend extra time and money performing baseline tests on participants who may turn out to be ineligible. The baseline examination then occurs immediately after randomization and is performed not to exclude participants, but solely as a baseline reference point. Since allocation has already occurred, all participants remain in the trial regardless of the findings at baseline. This reversal of the usual order is not recommended in single-blind or unblinded studies, because it raises the possibility of bias during the examination. If the investigator knows to which group the participant belongs, she may subconsciously measure characteristics differently, depending on the group assignment. Furthermore, the order reversal may unnecessarily prolong the interval between intervention allocation and its actual start.

Changes of Baseline Measurement

Making use of baseline data will usually add sensitivity to a study. For example, an investigator may want to evaluate a new hypoglycemic agent. She can either compare the mean change in HbA1C from baseline to some subsequent time in the intervention group against the mean change in the control group, or simply compare the mean HbA1C of the two groups at the end of the study. The former method usually is a more powerful statistical technique because it can reduce the variability of the response variables. As a consequence, it may permit either fewer participants to be studied or a smaller difference between groups to be detected (see Chap. 18).
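A small simulation, with invented HbA1c parameters, illustrates the gain: when baseline and follow-up values are highly correlated, change scores have much smaller variance than final values, so the same treatment effect is easier to detect:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
treat = np.repeat([0, 1], n // 2)
baseline = rng.normal(8.0, 1.2, n)                  # HbA1c at entry
# Follow-up tracks baseline closely; the active arm drops 0.4 points.
followup = baseline - 0.4 * treat + rng.normal(0, 0.6, n)

# Both contrasts estimate the same treatment effect...
final_diff = followup[treat == 1].mean() - followup[treat == 0].mean()
change = followup - baseline
change_diff = change[treat == 1].mean() - change[treat == 0].mean()
print(round(final_diff, 2), round(change_diff, 2))

# ...but the change scores are far less variable, which translates into
# greater power or a smaller required sample size.
print("variance of final values: ", round(followup.var(ddof=1), 2))
print("variance of change scores:", round(change.var(ddof=1), 2))
```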
Evaluation of possible unwanted effects requires knowledge—or at least tentative ideas—about what effects might occur. The investigator should record at baseline those clinical or laboratory features which are likely to be adversely affected by the intervention. Unexpected adverse effects might be missed, but the hope is that animal studies or earlier clinical work will have identified the important factors to be measured.
References
1. Pocock SJ, Assmann SE, Enos LE, Kasten LE. Subgroup analysis, covariate adjustment and baseline comparisons in clinical trial reporting: current practice and problems. Stat Med 2002;21:2917–2930.
2. Hall JC, Hall JL. Baseline comparisons in surgical trials. ANZ J Surg 2002;72:567–569.
3. Altman DG, Doré CJ. Randomization and baseline comparisons in clinical trials. Lancet 1990;335:149–153.
4. Harmatz P, Giugliani R, Schwartz I, et al. Enzyme replacement therapy for mucopolysaccharidosis VI: A phase 3, randomized, double-blind, placebo-controlled, multinational study of recombinant human N-acetylgalactosamine 4-sulfatase (recombinant human arylsulfatase B or RHASB) and follow-on, open-label extension study. J Pediatr 2006;148:533–539.
5. Sano M, Ernesto C, Thomas RG, et al. for the Members of the Alzheimer’s Disease Cooperative Study. A controlled trial of selegiline, alpha-tocopherol, or both as treatment for Alzheimer’s disease. N Engl J Med 1997;336:1216–1222.
6. Aspirin Myocardial Infarction Study Research Group. A randomized, controlled trial of aspirin in persons recovered from myocardial infarction. JAMA 1980;243:661–669.
7. Kollef MH, Afessa B, Anzueto A, et al. for the NASCENT Investigation Group. Silver-coated endotracheal tubes and incidence of ventilator-associated pneumonia. The NASCENT randomized trial. JAMA 2008;300:805–813.
8. Chastre J. Preventing ventilator-associated pneumonia. Could silver-coated endotracheal tubes be the answer? (Editorial) JAMA 2008;300:842–844.
9. Burgess DC, Gebski VJ, Keech AC. Baseline data in clinical trials. MJA 2003;179:105–107.
10. Senn S. Testing for baseline balance in clinical trials. Stat Med 1994;13:1715–1726.
11. Steyerberg EW, Bossuyt PMM, Lee KL. Clinical trials in acute myocardial infarction: Should we adjust for baseline characteristics? Am Heart J 2000;139:745–751.
12. Roberts C, Torgerson DJ. Understanding controlled trials. Baseline imbalance in randomised controlled trials. Br Med J 1999;319:185.
13. Assmann SF, Pocock SJ, Enos LE, Kasten LE. Subgroup analysis and other (mis)uses of baseline data in clinical trials. Lancet 2000;355:1064–1069.
14. The Canadian Cooperative Study Group. A randomized trial of aspirin and sulfinpyrazone in threatened stroke. N Engl J Med 1978;299:53–59.
15. Antiplatelet Trialists’ Collaboration. Collaborative overview of randomised trials of antiplatelet therapy - I. Prevention of death, myocardial infarction, and stroke by prolonged antiplatelet therapy in various categories of patients. Br Med J 1994;308:81–106.
16. Ridker PM, Cook NR, Lee I-M, et al. A randomized trial of low-dose aspirin in the primary prevention of cardiovascular disease in women. N Engl J Med 2005;352:1293–1304.
17. Kahn SE, Haffner SM, Heise MA, et al. for the ADOPT Study Group. Glycemic durability of rosiglitazone, metformin, or glyburide monotherapy. N Engl J Med 2006;355:2427–2443.
18. Dormandy JA, Charbonnel B, Eckland DJA, et al. Secondary prevention of macrovascular events in patients with type 2 diabetes in the PROactive study (PROspective pioglitAzone Clinical Trial In macroVascular Events): A randomised controlled trial. Lancet 2005;366:1279–1289.
19. Loke YK, Singh S, Furberg CD. Long-term use of thiazolidinediones and fractures in type 2 diabetes: a meta-analysis. CMAJ 2009;180:32–39.
20. Bhandari M, Devereaux PJ, Li P, et al. Misuse of baseline comparison tests and subgroup analyses in surgical trials. Clin Orthop Relat Res 2006;447:247–251.
21. Johnson JA, Boerwinkle E, Zineh I, et al. Pharmacogenomics of antihypertensive drugs: Rationale and design of the Pharmacogenomic Evaluation of Antihypertensive Responses (PEAR) study. Am Heart J 2009;157:442–449.
22. Grant SF, Hakonarson H. Recent development in pharmacogenomics: from candidate genes to genome-wide association studies. Expert Rev Mol Diagn 2007;7:371–393.
23. Donnelly P. Progress and challenges in genome-wide association studies in humans. Nature 2008;456:728–731.
24. Lu JT, Campeau PM, Lee BH. Genotype-phenotype correlation—promiscuity in the era of next-generation sequencing. N Engl J Med 2014;371:593–596.
25. Chanock SJ, Manolio T, Boehnke M, et al., NCI-NHGRI Working Group on Replication in Association Studies. Replicating genotype-phenotype associations. Nature 2007;447:655–660.
26. MacArthur DG, Manolio TA, Dimmock DP, et al. Guidelines for investigating causality of sequence variants in human disease. Nature 2014;508:469–476.
27. Nelson MR, Bacanu S-A, Mosteller M, et al. Genome-wide approaches to identify pharmacogenetic contributions to adverse drug reactions. Pharmacogenomics J 2009;9:23–33.
28. The SEARCH Collaborative Group. SLCO1B1 variants and statin-induced myopathy—A genomewide study. N Engl J Med 2008;359:789–799.
29. The U.S. Food and Drug Administration. Drugs. Table of pharmacogenomic biomarkers in drug labeling. Updated 08/18/2014. www.fda.gov/drugs/scienceresearch/researchareas/pharmacogenetics/ucm083378.htm
30. James KE. Regression toward the mean in uncontrolled clinical studies. Biometrics 1973;29:121–130.
31. Schor SS. The floor-and-ceiling effect. JAMA 1969;207:120.
32. Cutter GR. Some examples for teaching regression toward the mean from a sampling viewpoint. Am Stat 1976;30:194–197.
33. European Coronary Surgery Study Group. Coronary-artery bypass surgery in stable angina pectoris: survival at two years. Lancet 1979;i:889–893.
34. The GUSTO Investigators. An international randomized trial comparing four thrombolytic strategies for acute myocardial infarction. N Engl J Med 1993;329:673–682. (Correction 1994;331:277).
35. Tilley BC, Lyden PD, Brott TG, et al. for the National Institute of Neurological Disorders and Stroke rt-PA Stroke Study Group. Total quality improvement method for reduction of delays between emergency department admission and treatment of acute ischemic stroke. Arch Neurol 1997;54:1466–1474.
36. Davis BR, Cutler JA, Gordon DJ, et al. for the ALLHAT Research Group. Rationale and design of the Antihypertensive and Lipid Lowering treatment to prevent Heart Attack Trial (ALLHAT). Am J Hypertens 1996;9:342–360.