“A spider’s web”

It is to earlier diagnosis that we must look for any material improvement in our cancer cures.

—John Lockhart-Mummery, 1926

The greatest need we have today in the human cancer problem, except for a universal cure, is a method of detecting the presence of cancer before there are any clinical signs or symptoms.

—Sidney Farber, letter to Etta Rosensohn, November 1962

Lady, have you been “Paptized”?

—New York Amsterdam News, on Pap smears, 1957

The long, slow march of carcinogenesis—the methodical, step-by-step progression of early-stage lesions of cancer into frankly malignant cells—inspired another strategy to prevent cancer. If cancer truly slouched to its birth, as Auerbach suspected, then perhaps one could still intervene on that progression in its earliest stages—by attacking precancer rather than cancer. Could one thwart the march of carcinogenesis in midstep?

Few scientists had studied this early transition of cancer cells as intensively as George Papanicolaou, a Greek cytologist at Cornell University in New York. Robust, short, formal, and old-worldly, Papanicolaou had trained in medicine and zoology in Athens and in Munich and arrived in New York in 1913. Penniless off the boat, he had sought a job in a medical laboratory but had been relegated to selling carpets at the Gimbels store on Thirty-third Street to survive. After a few months of truly surreal labor (he was, by all accounts, a terrible carpet salesman), Papanicolaou secured a research position at Cornell that may have been just as surreal as carpet selling: he was assigned to study the menstrual cycle of guinea pigs, a species that neither bleeds visibly nor sheds tissue during menses. Using a nasal speculum and Q-tips, Papanicolaou had nonetheless learned to scrape off cervical cells from guinea pigs and spread them on glass slides in thin, watery smears.

The cells, he found, were like minute watch-hands. As hormones rose and ebbed in the animals cyclically, the cells shed by the guinea pig cervix changed their shapes and sizes cyclically as well. Using their morphology as a guide, he could foretell the precise stage of the menstrual cycle often down to the day.

By the late 1920s, Papanicolaou had extended his technique to human patients. (His wife, Maria, in surely one of the more grisly displays of conjugal fortitude, reportedly allowed herself to be tested by cervical smears every day.) As with guinea pigs, he found that cells sloughed off by the human cervix could also foretell the stages of the menstrual cycle in women.

But all of this, it was pointed out to him, amounted to no more than an elaborate and somewhat useless invention. As one gynecologist archly remarked, “in primates, including women,” a diagnostic smear was hardly needed to calculate the stage or timing of the menstrual cycle. Women had been timing their periods—without Papanicolaou’s cytological help—for centuries.

Disheartened by these criticisms, Papanicolaou returned to his slides. He had spent nearly a decade looking obsessively at normal smears; perhaps, he reasoned, the real value of his test lay not in the normal smear, but in pathological conditions. What if he could diagnose a pathological state with his smear? What if the years of staring at cellular normalcy had merely been a prelude to allow him to identify cellular abnormalities?

Papanicolaou thus began to venture into the world of pathological conditions, collecting slides from women with all manner of gynecological diseases—fibroids, cysts, tubercles, inflammations of the uterus and cervix, streptococcal, gonococcal, and staphylococcal infections, tubal pregnancies, abnormal pregnancies, benign and malignant tumors, abscesses and furuncles—hoping to find some pathological mark in the exfoliated cells.

Cancer, he found, was particularly prone to shedding abnormal cells. In nearly every case of cervical cancer, when Papanicolaou brushed cells off the cervix, he found “aberrant and bizarre forms” with abnormal, bloated nuclei, ruffled membranes, and shrunken cytoplasm that looked nothing like normal cells. It “became readily apparent,” he wrote, that he had stumbled on a new test for malignant cells.

Thrilled by his results, Papanicolaou published his method in an article entitled “New Cancer Diagnosis” in 1928. But the report, presented initially at an outlandish “race betterment” eugenics conference, generated only further condescension from pathologists. The Pap smear, as he called the technique, was neither accurate nor particularly sensitive. If cervical cancer was to be diagnosed, his colleagues argued, then why not perform a biopsy of the cervix, a meticulous procedure that, even if cumbersome and invasive, was considered far more precise and definitive than a grubby smear? At academic conferences, experts scoffed at the crude alternative. Even Papanicolaou could hardly argue the point. “I think this work will be carried a little further,” he wrote self-deprecatingly at the end of his 1928 paper. Then, for nearly two decades, having produced two perfectly useless inventions over twenty years, he virtually disappeared from the scientific limelight.


Between 1928 and 1950, Papanicolaou delved back into his smears with nearly monastic ferocity. His world involuted into a series of routines: the daily half-hour commute to his office with Maria at the wheel; the weekends at home in Long Island with a microscope in the study and a microscope on the porch; evenings spent typing reports on specimens with a phonograph playing Schubert in the background and a glass of orange juice congealing on his table. A gynecologic pathologist named Herbert Traut joined him to help interpret his smears. A Japanese fish and bird painter named Hashime Murayama, a colleague from his early years at Cornell, was hired to paint watercolors of his smears using a camera lucida.

For Papanicolaou, too, this brooding, contemplative period was like a personal camera lucida that magnified and reflected old experimental themes onto new ones. A decades-old thought returned to haunt him: if normal cells of the cervix changed morphologically in graded, stepwise fashion over time, might cancer cells also change morphologically in time, in a slow, stepwise dance from normal to malignant? Like Auerbach (whose work was yet to be published), could he identify intermediate stages of cancer—lesions slouching their way toward full transformation?

At a Christmas party in the winter of 1950, challenged by a tipsy young gynecologist in his lab to pinpoint the precise use of the smear, Papanicolaou verbalized a strand of thought that he had been spinning internally for nearly a decade. The thought almost convulsed out of him. The real use of the Pap smear was not to find cancer, but rather to detect its antecedent, its precursor—the portent of cancer.

“It was a revelation,” one of his students recalled. “A Pap smear would give a woman a chance to receive preventive care [and] greatly decrease the likelihood of her ever developing cancer.” Cervical cancer typically arises in an outer layer of the cervix, then grows in a flaky, superficial whirl before burrowing inward into the surrounding tissues. If asymptomatic women were sampled, Papanicolaou speculated, his test, albeit imperfect, might capture the disease at its first stages. He would, in essence, push the diagnostic clock backward—from incurable, invasive cancers to curable, preinvasive malignancies.


In 1952, Papanicolaou convinced the National Cancer Institute to launch the largest clinical trial of secondary prevention in the history of cancer using his smearing technique. Nearly every adult female resident of Shelby County, Tennessee—150,000 women spread across eight hundred square miles—was tested with a Pap smear and followed over time. Smears poured in from hundreds of sites: from one-room doctor’s offices dotted among the horse farms of Germantown to large urban community clinics scattered throughout the city of Memphis. Temporary “Pap clinics” were set up in factories and office buildings. Once collected, the samples were funneled into a gigantic microscope facility at the University of Tennessee, where framed photographs of exemplary normal and abnormal smears had been hung on the walls. Technicians read slides day and night, looking up from the microscopes at the pictures. At the peak, nearly a thousand smears were read every day.

As expected, the Shelby team found its fair share of advanced cancerous lesions in the population. In the initial cohort of about 150,000, invasive cervical cancer was found in 555 women. But the real proof of Papanicolaou’s principle lay in another discovery: astonishingly, 557 women were found to have preinvasive cancers or even precancerous changes—early-stage, localized lesions curable by relatively simple surgical procedures. Nearly all these women were asymptomatic; had they never been tested, they would never have been suspected of harboring preinvasive lesions. Notably, the average age of diagnosis of women with such preinvasive lesions was about twenty years lower than the average age of women with invasive lesions—once again corroborating the long march of carcinogenesis. The Pap smear had, in effect, pushed the clock of cancer detection backward by nearly two decades, and changed the spectrum of cervical cancer from predominantly incurable to predominantly curable.


A few miles from Papanicolaou’s laboratory in New York, the core logic of the Pap smear was being extended to a very different form of cancer. Epidemiologists think about prevention in two forms. In primary prevention, a disease is prevented by attacking its cause—smoking cessation for lung cancer or a vaccine against hepatitis B for liver cancer. In secondary prevention (also called screening), a disease is prevented by screening for its early, presymptomatic stage. The Pap smear was invented as a means of secondary prevention for cervical cancer. But if a microscope could detect a presymptomatic state in scraped-off cervical tissue, then could another means of “seeing” cancer detect an early lesion in another cancer-afflicted organ?

In 1913, a Berlin surgeon named Albert Salomon had certainly tried. A dogged, relentless champion of the mastectomy, Salomon had whisked away nearly three thousand amputated breasts after mastectomies to an X-ray room where he had photographed them after surgery to detect the shadowy outlines of cancer. Salomon had detected stigmata of cancer in his X-rays—microscopic sprinkles of calcium lodged in cancer tissue (“grains of salt,” as later radiologists would call them) or thin crustacean fingerlings of malignant cells reminiscent of the root of the word cancer.

The next natural step might have been to image breasts before surgery as a screening method, but Salomon’s studies were rudely interrupted. Abruptly purged from his university position by the Nazis in the mid-1930s, Salomon escaped the camps to Amsterdam and vanished underground—and so, too, did his shadowy X-rays of breasts. Mammography, as Salomon called his technique, languished in neglect. It was hardly missed: in a world obsessed with radical surgery, since small or large masses in the breast were treated with precisely the same gargantuan operation, screening for small lesions made little sense.

For nearly two decades, the mammogram thus lurked about in the far peripheries of medicine—in France and England and Uruguay, places where radical surgery held the least influence. But by the mid-1960s, with Halsted’s theory teetering uneasily on its pedestal, mammography reentered X-ray clinics in America, championed by pioneering radiographers such as Robert Egan in Houston. Egan, like Papanicolaou, cast himself more as an immaculate craftsman than a scientist—a photographer, really, who was taking photographs of cancer using X-rays, the most penetrating form of light. He tinkered with films, angles, positions, and exposures, until, as one observer put it, “trabeculae as thin as a spider’s web” in the breast could be seen in the images.

But could cancer be caught in that “spider’s web” of shadows, trapped early enough to prevent its spread? Egan’s mammograms could now detect tumors as small as a few millimeters, about the size of a grain of barley. But would screening women to detect such early tumors and extricating the tumors surgically save lives?


Screening trials in cancer are among the most slippery of all clinical trials—notoriously difficult to run, and notoriously susceptible to errors. To understand why, consider the odyssey from the laboratory to the clinic of a screening test for cancer. Suppose a new test has been invented in the laboratory to detect an early, presymptomatic stage of a particular form of cancer, say, the level of a protein secreted by cancer cells into the serum. The first challenge for such a test is technical: its performance in the real world. Epidemiologists think of screening tests as possessing two characteristic performance errors. The first error is overdiagnosis—when an individual tests positive in the test but does not have cancer. Such individuals are called “false positives.” Men and women who falsely test positive find themselves trapped in the punitive stigma of cancer, the familiar cycle of anxiety and terror (and the desire to “do something”) that precipitates further testing and invasive treatment.

The mirror image of overdiagnosis is underdiagnosis—an error in which a patient truly has cancer but does not test positive for it. Underdiagnosis falsely reassures patients of their freedom from disease. These men and women (“false negatives” in the jargon of epidemiology) enter a different punitive cycle—of despair, shock, and betrayal—once their disease, undetected by the screening test, is eventually uncovered when it becomes symptomatic.

The trouble is that overdiagnosis and underdiagnosis are often intrinsically conjoined, locked perpetually on two ends of a seesaw. Screening tests that strive to limit overdiagnosis—by narrowing the criteria by which patients are classified as positive—often pay the price of increasing underdiagnosis because they miss patients that lie in the gray zone between positive and negative. An example helps to illustrate this trade-off. Suppose—to use Egan’s vivid metaphor—a spider is trying to invent a perfect web to capture flies out of the air. Increasing the density of that web, she finds, certainly increases the chances of catching real flies (true positives) but it also increases the chances of capturing junk and debris floating through the air (false positives). Making the web less dense, in contrast, decreases the chances of catching real prey, but every time something is captured, chances are higher that it is a fly. In cancer, where both overdiagnosis and underdiagnosis come at high costs, finding that exquisite balance is often impossible. We want every cancer test to operate with perfect specificity and sensitivity. But the technologies for screening are not perfect. Screening tests thus routinely fail because they cannot even cross this preliminary hurdle—the rate of over- or underdiagnosis is unacceptably high.
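The seesaw can be made concrete with a small sketch. It assumes a hypothetical serum marker and invented levels for twelve people (none of these numbers come from a real test), and it treats the density of the web as a single cutoff: anyone at or above the cutoff is called positive.

```python
# Hypothetical serum-marker levels (arbitrary units). These values are
# invented for illustration and do not come from any trial or assay.
with_cancer = [4.1, 5.0, 5.6, 6.3, 7.2, 8.0]
without_cancer = [1.2, 2.0, 2.9, 3.5, 4.4, 5.8]


def sensitivity_specificity(threshold):
    """Call a level 'positive' when it is at or above the threshold."""
    true_pos = sum(level >= threshold for level in with_cancer)
    true_neg = sum(level < threshold for level in without_cancer)
    return true_pos / len(with_cancer), true_neg / len(without_cancer)


# A low threshold is the dense web; a high threshold is the sparse one.
for threshold in (3.0, 4.5, 6.0):
    sens, spec = sensitivity_specificity(threshold)
    print(f"threshold {threshold}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

With these made-up levels, the lowest cutoff catches every cancer but flags half of the healthy group, while the highest cutoff flags no healthy person but misses half of the cancers. No cutoff achieves both at once, because the two groups overlap in the gray zone.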

Suppose, however, our new test does survive this crucial bottleneck. The rates of overdiagnosis and underdiagnosis are deemed acceptable, and we unveil the test on a population of eager volunteers. Suppose, moreover, that as the test enters the public domain, doctors immediately begin to detect early, benign-appearing, premalignant lesions—in stark contrast to the aggressive, fast-growing tumors seen before the test. Is the test to be judged a success?

No; merely detecting a small tumor is not sufficient. Cancer demonstrates a spectrum of behavior. Some tumors are inherently benign, genetically determined to never reach the fully malignant state; and some tumors are intrinsically aggressive, and intervention at even an early, presymptomatic stage might make no difference to the prognosis of a patient. To address the inherent behavioral heterogeneity of cancer, the screening test must go further. It must increase survival.

Imagine, now, that we have designed a trial to determine whether our screening test increases survival. Two identical twins, call them Hope and Prudence, live in neighboring houses and are offered the trial. Hope chooses to be screened by the test. Prudence, suspicious of overdiagnosis and underdiagnosis, refuses to be screened.

Unbeknownst to Hope and Prudence, identical forms of cancer develop in both twins at the exact same time—in 1990. Hope’s tumor is detected by the screening test in 1995, and she undergoes surgical treatment and chemotherapy. She survives five additional years, then relapses and dies in 2000, five years after her original diagnosis. Prudence, in contrast, detects her tumor only when she feels a growing lump in her breast in 1999. She, too, has treatment, with some marginal benefit, then relapses and dies at the same moment as Hope in 2000.

At the joint funeral, as the mourners stream by the identical caskets, an argument breaks out among Hope’s and Prudence’s doctors. Hope’s physicians insist that she had a five-year survival: her tumor was detected in 1995 and she died in 2000. Prudence’s doctors insist that her survival was one year: Prudence’s tumor was detected in 1999 and she died in 2000. Yet both cannot be right: the twins died from the same tumor at the exact same time. The solution to this seeming paradox—called lead-time bias—is immediately obvious. Using survival as an end point for a screening test is flawed because early detection pushes the clock of diagnosis backward. Hope’s tumor and Prudence’s tumor possess exactly identical biological behavior. But since doctors detected Hope’s tumor earlier, it seems, falsely, that she lived longer and that the screening test was beneficial.
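The doctors' quarrel reduces to a few subtractions. The sketch below uses only the years given in the thought experiment; the function names are illustrative, not standard terminology from any library.

```python
# Years taken from the Hope and Prudence thought experiment.
ONSET = 1990
DEATH = 2000
diagnosed = {"Hope": 1995, "Prudence": 1999}


def survival_after_diagnosis(year_diagnosed, year_of_death=DEATH):
    """Years lived after diagnosis: the figure each twin's doctors quote."""
    return year_of_death - year_diagnosed


def lead_time(screened_dx, unscreened_dx):
    """Years by which screening moved the diagnosis earlier."""
    return unscreened_dx - screened_dx


hope = survival_after_diagnosis(diagnosed["Hope"])
prudence = survival_after_diagnosis(diagnosed["Prudence"])
gap = lead_time(diagnosed["Hope"], diagnosed["Prudence"])

print(f"Hope: {hope} years, Prudence: {prudence} year, lead time: {gap} years")
print(f"Years from onset to death, both twins: {DEATH - ONSET}")
```

Hope's apparent four-year advantage in survival is exactly the four years by which her diagnosis was advanced. Measured from the onset of disease, both twins lived the same ten years.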

So our test must now cross an additional hurdle: it must improve mortality, not survival. The only appropriate way to judge whether Hope’s test was truly beneficial is to ask whether Hope lived longer regardless of the time of her diagnosis. Had Hope lived until 2010 (outliving Prudence by a decade), we could have legitimately ascribed a benefit to the test. Since both women died at the exact same moment, we now discover that screening produced no benefit.

A screening test’s path to success is thus surprisingly long and narrow. It must avoid the pitfalls of overdiagnosis and underdiagnosis. It must steer past the narrow temptation to use early detection as an end in itself. Then, it must navigate the treacherous straits of bias and selection. “Survival,” seductively simple, cannot be its end point. And adequate randomization at each step is critical. Only a test capable of meeting all these criteria—proving mortality benefit in a genuinely randomized setting with an acceptable over- and underdiagnosis rate—can be judged a success. With the odds stacked so steeply, few tests are powerful enough to withstand this level of scrutiny and truly provide benefit in cancer.


In the winter of 1963, three men set out to test whether screening a large cohort of asymptomatic women using mammography would prevent mortality from breast cancer. All three, outcasts from their respective fields, were seeking new ways to study breast cancer. Louis Venet, a surgeon trained in the classical tradition, wanted to capture early cancers as a means to avert the large and disfiguring radical surgeries that had become the norm in the field. Sam Shapiro, a statistician, sought to invent new methods to mount statistical trials. And Philip Strax, a New York internist, had perhaps the most poignant of reasons: he had nursed his wife through the torturous terminal stages of breast cancer in the mid-1950s. Strax’s attempt to capture preinvasive lesions using X-rays was a personal crusade to unwind the biological clock that had ultimately taken his wife’s life.

Venet, Strax, and Shapiro were sophisticated clinical trialists: right at the onset, they realized that they would need a randomized, prospective trial using mortality as an end point to test mammography. Methodologically speaking, their trial would recapitulate Doll and Hill’s famous smoking trial of the 1950s. But how might such a trial be logistically run? The Doll and Hill study had been the fortuitous by-product of the nationalization of health care in Great Britain—its stable cohort produced, in large part, by the National Health Service’s “address book” of registered doctors across the United Kingdom. For mammography, in contrast, it was the sweeping wave of privatization in postwar America that provided the opportunity to run the trial. In the summer of 1944, lawmakers in New York unveiled a novel program to provide subscriber-based health insurance to groups of employees in New York. This program, called the Health Insurance Plan (HIP), was the ancestor of the modern HMO.

The HIP filled a great void in insurance. By the mid-1950s, a triad of forces—immigration, World War II, and the Depression—had brought women out of their homes to comprise nearly one-third of the total workforce in New York. These working women sought health insurance, and the HIP, which allowed its enrollees to pool risks and thereby reduce costs, was a natural solution. By the early 1960s, the plan had enrolled more than three hundred thousand subscribers spread across thirty-one medical groups in New York—nearly eighty thousand of them women.

Strax, Shapiro, and Venet were quick to identify the importance of the resource: here was a defined—“captive”—cohort of women spread across New York and its suburbs that could be screened and followed over a prolonged time. The trial was kept deliberately simple: women enrollees in the HIP between the ages of forty and sixty-four were divided into two groups. One group was screened with mammography while the other was left unscreened. The ethical standards for screening trials in the 1960s made the identification of the groups even simpler. The unscreened group—i.e., the one not offered mammography—was not even required to give consent; it could just be enrolled passively in the trial and followed over time.

The trial, launched in December 1963, was instantly a logistic nightmare. Mammography was cumbersome: a machine the size of a full-grown bull; photographic plates like small windowpanes; the slosh and froth of toxic chemicals in a darkroom. The technique was best performed in dedicated X-ray clinics, but unable to convince women to travel to these clinics (many of them located uptown), Strax and Venet eventually outfitted a mobile van with an X-ray machine and parked it in midtown Manhattan, alongside the ice-cream trucks and sandwich vendors, to recruit women into the study during lunch breaks.

Strax began an obsessive campaign of recruitment. When a subject refused to join the study, he would call, write, and call her again to persuade her to join. The clinics were honed to a machinelike precision to allow thousands of women to be screened in a day:

“Interview . . . 5 stations X 12 women per hour = 60 women. . . . Undress-Dress cubicles: 16 cubicles X 6 women per hour = 96 women per hour. Each cubicle provides one square of floor space for dress-undress and contains four clothes lockers for a total of 64. At the close of the ‘circle,’ the woman enters the same cubicle to obtain her clothes and dress. . . . To expedite turnover, the amenities of chairs and mirrors are omitted.”

Curtains rose and fell. Closets opened and closed. Chairless and mirrorless rooms let women in and out. The merry-go-round ran through the day and late into the evening. In an astonishing span of six years, the trio completed a screening that would ordinarily have taken two decades to complete.

If a tumor was detected by mammography, the woman was treated according to the conventional intervention available at the time—surgery, typically a radical mastectomy, to remove the mass (or surgery followed by radiation). Once the cycle of screening and intervention had been completed, Strax, Venet, and Shapiro could watch the experiment unfold over time by measuring breast cancer mortality in the screened versus unscreened groups.


In 1971, eight years after the study had been launched, Strax, Venet, and Shapiro revealed the initial findings of the HIP trial. At first glance, it seemed like a resounding vindication of screening. Sixty-two thousand women had been enrolled in the trial; about half had been screened by mammography. There had been thirty-one deaths in the mammography-screened group and fifty-two deaths in the control group. The absolute number of lives saved was admittedly modest, but the fractional reduction in mortality from screening—almost 40 percent—was remarkable. Strax was ecstatic: “The radiologist,” he wrote, “has become a potential savior of women—and their breasts.”
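The gap between a modest absolute benefit and a striking fractional one is easy to check. This sketch takes the death counts as reported above and assumes, for simplicity, that the sixty-two thousand women were split exactly evenly between the two arms; the text says only "about half," so the per-arm figure is an approximation.

```python
deaths_screened = 31
deaths_control = 52

# Assumption: an exactly even split of the 62,000 enrolled women.
women_per_arm = 62_000 // 2

deaths_averted = deaths_control - deaths_screened

# Fractional (relative) reduction in breast cancer deaths.
relative_reduction = deaths_averted / deaths_control

# Absolute reduction, expressed per 1,000 women in an arm.
absolute_reduction_per_1000 = deaths_averted / women_per_arm * 1000

print(f"relative reduction: {relative_reduction:.1%}")
print(f"absolute reduction: {absolute_reduction_per_1000:.2f} deaths per 1,000 women")
```

Under that assumption, twenty-one fewer deaths amount to a relative reduction of roughly 40 percent, but to fewer than one death averted per thousand women enrolled.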

The positive results of the HIP trial had an explosive effect on mammography. “Within 5 years, mammography has moved from the realm of a discarded procedure to the threshold of widespread application,” a radiologist wrote. At the National Cancer Institute, enthusiasm for screening rose swiftly to a crescendo. Arthur Holleb, the American Cancer Society’s chief medical officer, was quick to note the parallel to the Pap smear. “The time has come,” Holleb announced in 1971, “for the . . . Society to mount a massive program on mammography just as we did with the Pap test. . . . No longer can we ask the people of this country to tolerate a loss of life from breast cancer each year equal to the loss of life in the past ten years in Viet Nam. The time has come for greater national effort. I firmly believe that time is now.”

The ACS’s massive campaign was called the Breast Cancer Detection and Demonstration Project (BCDDP). Notably, this was not a trial but, as its name suggested, a “demonstration.” There was no treatment or control group. The project intended to screen nearly 250,000 women in a single year, nearly eight times the number screened by Strax in three years, in large part to show that it was possible to muscle through mammographic screening at a national level. Mary Lasker backed it strongly, as did virtually every cancer organization in America. Mammography, the “discarded procedure,” was about to become enshrined in the mainstream.


But even as the BCDDP forged ahead, doubts were gathering over the HIP study. Shapiro, recall, had chosen to randomize the trial by placing the “test women” and “control” women into two groups and comparing mortality. But, as was common practice in the sixties, the control group had not been informed of its participation in a trial. It had been a virtual group—a cohort drawn out of the HIP’s records. When a woman had died of breast cancer in the control group, Strax and Shapiro had dutifully updated their ledgers, but—trees falling in statistical forests—the group had been treated as an abstract entity, unaware even of its own existence.

In principle, comparing a virtual group to a real group would have been perfectly fine. But as the trial enrollment had proceeded in the mid-1960s, Strax and Shapiro had begun to worry whether some women already diagnosed with breast cancer might have entered the trial. A screening examination would, of course, be a useless test for such women since they already carried the disease. To correct for this, Shapiro had begun to selectively remove such women from both arms of the trial.

Removing such subjects from the mammography test group was relatively easy: the radiologist could simply ask a woman about her prior history before she underwent mammography. But since the control group was a virtual entity, there could be no virtual asking. It would have to be culled “virtually.” Shapiro tried to be dispassionate and rigorous by pulling equal numbers of women from the two arms of the trial. But in the end, he may have chosen selectively. Possibly, he overcorrected: more patients with prior breast cancer were eliminated from the screened group. The difference was small—only 434 patients in a trial of 30,000—but statistically speaking, fatal. Critics now charged that the excess mortality in the unscreened group was an artifact of the culling. The unscreened group had been mistakenly overloaded with more patients with prior breast cancer—and the excess death in the untreated group was merely a statistical artifact.

Mammography enthusiasts were devastated. What was needed, they admitted, was a fair reevaluation, a retrial. But where might such a trial be performed? Certainly not in the United States—with two hundred thousand women already enrolled in the BCDDP (and therefore not eligible for another trial), and its bickering academic community shadowboxing over the interpretation of shadows. Scrambling blindly out of controversy, the entire community of mammographers overcompensated as well. Rather than build experiments methodically on other experiments, they launched a volley of parallel trials that came tumbling out over each other. Between 1976 and 1992, enormous parallel trials of mammography were launched in Europe: in Edinburgh, Scotland, and in several sites in Sweden—Malmö, Kopparberg, Östergötland, Stockholm, and Göteborg. In Canada, meanwhile, researchers lurched off on their own randomized trial of mammography, called the Canadian National Breast Screening Study (CNBSS). As with so much in the history of breast cancer, mammographic trial-running had turned into an arms race, with each group trying to better the efforts of the others.


Edinburgh was a disaster. Balkanized into hundreds of isolated and disconnected medical practices, it was a terrible trial site to begin with. Doctors assigned blocks of women to the screening or control groups based on seemingly arbitrary criteria. Or, worse still, women assigned themselves. Randomization protocols were disrupted. Women often switched between one group and the other as the trial proceeded, paralyzing and confounding any meaningful interpretation of the study as a whole.

The Canadian trial, meanwhile, epitomized precision and attention to detail. In the summer of 1980, a heavily publicized national campaign involving letters, advertisements, and personal phone calls was launched to recruit thirty-nine thousand women to fifteen accredited centers for screening mammography. When a woman presented herself at any such center, she was asked some preliminary questions by a receptionist, asked to fill out a questionnaire, then examined by a nurse or physician, after which her name was entered into an open ledger. The ledger—a blue-lined notebook was used in most clinics—circulated freely. Randomized assignment was thus achieved by alternating lines in that notebook. One woman was assigned to the screened group, the woman on the next line to the control group, the third line to the screened, the fourth to the control, and so forth.

Note carefully that sequence of events: a woman was typically randomized after her medical history and examination. That sequence was neither anticipated nor prescribed in the original protocol (detailed manuals of instruction had been sent to each center). But that minute change completely undid the trial. The allocations that emerged after those nurse interviews were no longer random. Women with abnormal breast or lymph node examinations were disproportionately assigned to the mammography group (seventeen to the mammography group; five to the control arm, at one site). So were women with prior histories of breast cancer. So, too, were women known to be at “high risk” based on their past history or prior insurance claims (eight to mammography; one to control).
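Why an open, alternating ledger is fragile can be shown with a toy model. The sketch below is purely hypothetical: the waiting-room queue, the risk labels, and the `steer` function are invented to illustrate the mechanism, and nothing here reconstructs what actually happened at any Canadian center.

```python
ARMS = ("screened", "control")


def alternating_arm(line_number):
    """Open-ledger rule from the text: line 1 screened, line 2 control, and so on."""
    return ARMS[(line_number - 1) % 2]


def high_risk_by_arm(queue):
    """Count how many high-risk women land in each arm for a given line order."""
    counts = {"screened": 0, "control": 0}
    for line, risk in enumerate(queue, start=1):
        if risk == "high":
            counts[alternating_arm(line)] += 1
    return counts


def steer(queue):
    """Hypothetical subversion: reorder the queue so that high-risk women
    fall on screened lines whenever one is still waiting."""
    high = [w for w in queue if w == "high"]
    low = [w for w in queue if w == "low"]
    ordered = []
    for line in range(1, len(queue) + 1):
        wants_high = alternating_arm(line) == "screened"
        preferred, fallback = (high, low) if wants_high else (low, high)
        ordered.append((preferred or fallback).pop())
    return ordered


# Invented arrival order: risk is known because the examination comes first.
queue = ["high", "low", "low", "high", "low", "low"]
print(high_risk_by_arm(queue))         # arrival order
print(high_risk_by_arm(steer(queue)))  # after reordering the queue
```

In arrival order, the two high-risk women in this invented queue split one to each arm; after a simple reshuffle of who signs which line, both sit in the screened arm. The point of the toy is that once the next assignment is visible and the examination has already taken place, no elaborate conspiracy is required to tilt the groups. Had the sequence been concealed from clinic staff until after enrollment, that foreknowledge would not have existed.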

The reasons for this skew are still unknown. Did the nurses allocate high-risk women to the mammography group to confirm a suspicious clinical examination—to obtain a second opinion, as it were, by X-ray? Was that subversion even conscious? Was it an unintended act of compassion, an attempt to help high-risk women by forcing them to have mammograms? Did high-risk women skip their turn in the waiting room to purposefully fall into the right line of the allocation book? Were they instructed to do so by the trial coordinators—by their examining doctors, the X-ray technicians, the receptionists?

Teams of epidemiologists, statisticians, radiologists, and at least one group of forensic experts have since pored over those scratchy notebooks to try to answer these questions and decipher what went wrong in the trial. “Suspicion, like beauty, lies in the eye of the beholder,” one of the trial’s chief investigators countered. But there was plenty to raise suspicion. The notebooks were pockmarked with clerical errors: names changed, identities reversed, lines whited out, names replaced or overwritten. Testimonies by on-site workers reinforced these observations. At one center, a trial coordinator selectively herded her friends to the mammography group (hoping, presumably, to do them a favor and save their lives). At another, a technician reported widespread tampering with randomization, with women being “steered” into groups. Accusations and counteraccusations flew through the pages of academic journals. “One lesson is clear,” the cancer researcher Norman Boyd wrote dismissively in a summary editorial: “randomization in clinical trials should be managed in a manner that makes subversion impossible.”

But such smarting lessons aside, little else was clear. What emerged from that fog of details was a study even more imbalanced than the HIP study. Strax and Shapiro had faltered by selectively depleting the mammography group of high-risk patients. The CNBSS faltered, skeptics now charged, by succumbing to the opposite sin: by selectively enriching the mammography group with high-risk women. Unsurprisingly, the result of the CNBSS was markedly negative: if anything, more women died of breast cancer in the mammography group than in the unscreened group.

It was in Sweden, at long last, that this stuttering legacy finally came to an end. In the winter of 2007, I visited Malmö, the site for one of the Swedish mammography trials launched in the late 1970s. Perched almost on the southern tip of the Swedish peninsula, Malmö is a bland, gray-blue industrial town set amid a featureless, gray-blue landscape. The bare, sprawling flatlands of Skåne stretch out to its north, and the waters of the Øresund strait roll to the south. Battered by a steep recession in the mid-1970s, the region had economically and demographically frozen for nearly two decades. Migration into and out of the city had shrunk to an astonishingly low 2 percent for nearly twenty years. Malmö had been in limbo with a captive cohort of men and women. It was the ideal place to run a difficult trial.

In 1976, forty-two thousand women enrolled in the Malmö Mammography Study. Half the cohort (about twenty-one thousand women) was screened yearly at a small clinic outside the Malmö General Hospital, and the other half not screened—and the two groups have been followed closely ever since. The experiment ran like clockwork. “There was only one breast clinic in all of Malmö—unusual for a city of this size,” the lead researcher, Ingvar Andersson, recalled. “All the women were screened at the same clinic year after year, resulting in a highly consistent, controlled study—the most stringent study that could be produced.”

In 1988, at the end of its twelfth year, the Malmö study reported its results. Overall, 588 women had been diagnosed with breast cancer in the screened group, and 447 in the control group—underscoring, once again, the capacity of mammography to detect early cancers. But notably, at least at first glance, early detection had not translated into overwhelming numbers of lives saved. One hundred and twenty-nine women had died of breast cancer—sixty-three in the screened and sixty-six in the unscreened—with no statistically discernible difference overall.
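The phrase "no statistically discernible difference" can be checked with a rough calculation. The sketch below uses the death counts given above and assumes exactly twenty-one thousand women per arm (the text says only "about" that many). It applies a simple two-proportion z-test, which is an illustrative choice here and not necessarily the method the Malmö investigators used.

```python
import math

def two_proportion_z(events_a, n_a, events_b, n_b):
    """Return (relative risk, z statistic, two-sided p-value) for two groups."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    # Pooled event rate under the null hypothesis of no difference.
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a / p_b, z, p_value

# Deaths are from the text; the group sizes are an assumed round figure.
rr, z, p = two_proportion_z(63, 21_000, 66, 21_000)
print(f"relative risk = {rr:.3f}, z = {z:.2f}, p = {p:.2f}")
```

Under these assumptions the relative risk comes out near 0.95 and the p-value sits far above the conventional 0.05 threshold, which is consistent with the trial's overall null result.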

But there was a pattern behind the deaths. When the groups were analyzed by age, women above fifty-five years had benefited from screening, with a reduction in breast cancer deaths by 20 percent. In younger women, in contrast, screening with mammography showed no detectable benefit.

This pattern—a clearly discernible benefit for older women, and a barely detectable benefit in younger women—would be confirmed in scores of studies that followed Malmö. In 2002, twenty-six years after the launch of the original Malmö experiment, an exhaustive analysis combining all the Swedish studies was published in the Lancet. In all, 247,000 women had been enrolled in these trials. The pooled analysis vindicated the Malmö results. In aggregate, over the course of fifteen years, mammography had resulted in 20 to 30 percent reductions in breast cancer mortality for women aged fifty-five to seventy. But for women below fifty-five, the benefit was barely discernible.

Mammography, in short, was not going to be the unequivocal “savior” of all women with breast cancer. Its effects, as the statistician Donald Berry describes it, “are indisputable for a certain segment of women—but also indisputably modest in that segment.” Berry wrote, “Screening is a lottery. Any winnings are shared by the minority of women. . . . The overwhelming proportion of women experience no benefit and they pay with the time involved and the risks associated with screening. . . . The risk of not having a mammogram until after age 50 is about the same as riding a bicycle for 15 hours without a helmet.” If all women across the nation chose to ride helmetless for fifteen hours straight, there would surely be several more deaths than if they had all worn helmets. But for an individual woman who rides her bicycle helmetless to the corner grocery store once a week, the risk is so minor that some would dismiss it outright.

In Malmö, at least, this nuanced message has yet to sink in. Many women from the original mammographic cohort have died (of various causes), but mammography, as one Malmö resident described it, “is somewhat of a religion here.” On the windy winter morning that I stood outside the clinic, scores of women—some over fifty-five and some obviously younger—came in religiously for their annual X-rays. The clinic, I suspect, still ran with the same efficiency and diligence that had allowed it, after disastrous attempts in other cities, to rigorously complete one of the most seminal and difficult trials in the history of cancer prevention. Patients streamed in and out effortlessly, almost as if running an afternoon errand. Many of them rode off on their bicycles—oblivious of Berry’s warnings—without helmets.

Why did a simple, reproducible, inexpensive, easily learned technique—an X-ray image to detect the shadow of a small tumor in the breast—have to struggle for five decades and through nine trials before any benefit could be ascribed to it?

Part of the answer lies in the complexity of running early-detection trials, which are inherently slippery, contentious, and prone to error. Edinburgh was undone by flawed randomization; the BCDDP by nonrandomization. Shapiro’s trial was foiled by a faulty desire to be dispassionate; the Canadian trial by a flawed impulse to be compassionate.

Part of the answer lies also in the old conundrum of over- and underdiagnosis, although with an important twist. A mammogram, it turns out, is not a particularly good tool for detecting early breast cancer. Its false-positive and false-negative rates make it far from an ideal screening test. But the fatal flaw in mammography lies in the fact that these rates are not absolute: they depend on age. For women above fifty-five, the incidence of breast cancer is high enough that even a relatively poor screening tool can detect an early tumor and provide a survival benefit. For women between forty and fifty years, though, the incidence of breast cancer sinks to the point that a “mass” detected on a mammogram, more often than not, turns out to be a false positive. To use a visual analogy: a magnifying lens designed to make small script legible does perfectly well when the font size is ten or even six points. But then it hits a limit. At a certain font size, chances of reading a letter correctly become about the same as reading a letter incorrectly. In women above fifty-five, where the “font size” of breast cancer incidence is large enough, a mammogram performs adequately. But in women between forty and fifty, the mammogram begins to squint at an uncomfortable threshold, exceeding its inherent capacity to become a discriminating test. No matter how intensively we test mammography in this group of women, it will always be a poor screening tool.
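The dependence of a screening test's usefulness on incidence can be made concrete with Bayes' rule. The sketch below is purely illustrative: the sensitivity, specificity, and prevalence figures are hypothetical placeholders, not values from any mammography trial or from the text. It shows only the direction of the effect: holding the test fixed, the share of positive results that are true cancers falls as the disease becomes rarer.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Fraction of positive tests that are true positives (Bayes' rule)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# Hypothetical test characteristics, chosen only for illustration.
SENSITIVITY = 0.85
SPECIFICITY = 0.90

# Hypothetical prevalences standing in for a lower- and a higher-incidence group.
for label, prevalence in [("lower-incidence group", 0.002),
                          ("higher-incidence group", 0.01)]:
    ppv = positive_predictive_value(prevalence, SENSITIVITY, SPECIFICITY)
    print(f"{label}: PPV = {ppv:.3f}")
```

With the same test, the group in which the disease is more common gets a markedly higher positive predictive value. This is the arithmetic behind the "font size" analogy above.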

But the last part of the answer lies, surely, in how we imagine cancer and screening. We are a visual species. Seeing is believing, and to see cancer in its early, incipient form, we believe, must be the best way to prevent it. As the writer Malcolm Gladwell once described it, “This is a textbook example of how the battle against cancer is supposed to work. Use a powerful camera. Take a detailed picture. Spot the tumor as early as possible. Treat it immediately and aggressively. . . . The danger posed by a tumor is represented visually. Large is bad; small is better.”

But powerful as the camera might be, cancer confounds this simple rule. Since metastasis is what kills patients with breast cancer, it is, of course, generally true that the ability to detect and remove premetastatic tumors saves women’s lives. But it is also true that just because a tumor is small does not mean that it is premetastatic. Even relatively small tumors barely detectable by mammography can carry genetic programs that make them vastly more likely to metastasize early. Conversely, large tumors may inherently be genetically benign—unlikely to invade and metastasize. Size matters, in other words—but only to a point. The difference in the behavior of tumors is not just a consequence of quantitative growth, but of qualitative growth.

A static picture cannot capture this qualitative growth. Seeing a “small” tumor and extracting it from the body does not guarantee our freedom from cancer—a fact that we still struggle to believe. In the end, a mammogram or a Pap smear is a portrait of cancer in its infancy. Like any portrait, it is drawn in the hopes that it might capture something essential about the subject—its psyche, its inner being, its future, its behavior. “All photographs are accurate,” the artist Richard Avedon liked to say, “[but] none of them is the truth.”

But if the “truth” of every cancer is imprinted in its behavior, then how might one capture this mysterious quality? How could scientists make that crucial transition between simply visualizing cancer and knowing its malignant potential, its vulnerabilities, its patterns of spread—its future?

By the late 1980s, the entire discipline of cancer prevention appeared to have stalled at this critical juncture. The missing element in the puzzle was a deeper understanding of carcinogenesis—a mechanistic understanding that would explain the means by which normal cells become cancer cells. Chronic inflammation with hepatitis B virus and H. pylori initiated the march of carcinogenesis, but by what route? The Ames test proved that mutagenicity was linked to carcinogenicity, but mutations in which genes, and by what mechanism?

And if such mutations were known, could they be used to launch more intelligent efforts to prevent cancer? Instead of running larger trials of mammography, for instance, could one run smarter trials of mammography—by risk-stratifying women (identifying those with predisposing mutations for breast cancer) such that high-risk women received higher levels of surveillance? Would that strategy, coupled with better technology, capture the identity of cancer more accurately than a simple, static portrait?

Cancer therapeutics, too, had seemingly arrived at the same bottleneck. Huggins and Walpole had shown that knowing the inner machinery of the cancer cell could reveal unique vulnerabilities. But the discovery had to come from the bottom up—from the cancer cell to its therapy. “As the decade ended,” Bruce Chabner, former director of the NCI’s Division of Cancer Treatment, recalled, “it was as if the whole discipline of oncology, both prevention and cure, had bumped up against a fundamental limitation of knowledge. We were trying to combat cancer without understanding the cancer cell, which was like launching rockets without understanding the internal combustion engine.”

But others disagreed. With screening tests still faltering, with carcinogens still at large, and with the mechanistic understanding of cancer in its infancy, the impatience to deploy a large-scale therapeutic attack on cancer grew to its bristling tipping point. A chemotherapeutic poison was a poison was a poison, and one did not need to understand a cancer cell to poison it. So, just as a generation of radical surgeons had once shuttered the blinds around itself and pushed the discipline to its terrifying limits, so, too, did a generation of radical chemotherapists. If every dividing cell in the body needed to be obliterated to rid it of cancer, then so be it. It was a conviction that would draw oncology into its darkest hour.

The Emperor of All Maladies: A Biography of Cancer