13 Separation
We have been looking at some challenging aspects of computation while assuming that everything of interest is nearby. We haven’t needed to consider distance, because we’ve implicitly assumed it didn’t matter. But as systems get larger and computation speeds get faster, we increasingly have to pay attention to issues of distance.
For most of human history, the speed of communication across long distances has been identical to the speed at which people could travel those distances. The local representative of a remote organization (government, trading company, financial partnership, or the like) necessarily had to operate in a semi-independent fashion much of the time. For example, in colonial times Massachusetts had a royal governor appointed to administer the colony. Although some ministry in London would have been nominally “in charge” of affairs in Massachusetts, London was a sea voyage of many weeks away; so it would have been impractical for the governor to consult the relevant ministry officials for their opinion on each question that needed to be resolved. Instead, the governor would have to exercise his judgment locally, within a relatively broad set of policy guidelines.
In contrast, that same colonial governor who could not consult London about every question might well have chosen to consult a trusted member of his local staff on every single question that needed to be resolved. Turning to a local expert would be particularly likely if the person serving as governor were primarily a ceremonial or “figurehead” governor. Such a person operating on his own would be neither well acquainted with local conditions nor likely to make sound decisions.
A computer scientist would refer to the governor’s relationship with the remote ministry as loosely coupled. In a similar vein, we could refer to the governor’s relationship with the local advisor as tightly coupled. Although the governor could choose to have a loosely coupled relationship with the local advisor—simply consulting that person less often, or not at all—it is impossible for the governor to choose to have a tightly coupled relationship with the ministry in Britain. Distance precludes that possibility.
Distributed Systems
In our modern age of telecommunication, it is easy to imagine that distance-related concerns are only of historic interest. Sometimes it seems as though anyone can be our local contact, just a phone call away. But in the world of computers and data communication, these issues are still vividly relevant.
We start with the difference between a distributed system and a concurrent system. We have already done a fair bit of thinking about concurrency in chapter 10, as we considered the challenges of coordinating multiple processes. So what’s new here? The governor was interacting with other people whether those people were local or remote; in either case there were “people-related” communication issues to resolve. But the remote communications had additional limits. In general, distributed systems are concurrent (involving multiple processes), but a concurrent system is not necessarily distributed. What makes the difference? Autonomy and distance.
Autonomy
Autonomy means that a distributed system is made up of independently operating elements that can fail separately, so that the system has to be prepared to handle a situation in which some elements have failed.
Consider the internet as a whole. It consists of billions of different elements. Those elements include devices like phones, tablets, and laptops that people are using to request information. Those devices are collectively called clients. There are also computers that store information and make it available; as we have mentioned previously, those computers are collectively called servers. Finally, there are many kinds of specialized gadgets that make up the “plumbing” of connections and paths to connect clients with servers.
With so many distinct parts, there is never any point at which all of these elements are functioning at the same time. Something, somewhere, is always broken. But each element functions (or fails) on its own. The marvelous thing is that the internet as a whole still functions anyway. To a first approximation, it simply doesn’t matter that something, somewhere is always broken.
In one sense this is nothing new. For example, we can see something similar in the collection of different ways to travel—the transportation network. The global transportation network always has something broken somewhere, but in general people can still get from point A to point B.
Autonomy also means that different elements are owned and administered separately. If you own a device, you can decide for yourself when it is working and when it is not; you don’t (typically) give up that control simply because you connect the device to the internet. Even when devices are connected to the internet, you can still choose to turn them off or change the services they provide.
Distance
Distance means that the elements of the distributed system are sufficiently far apart that they can’t receive information simultaneously. As we have already noted, we cannot run a tightly coupled system when distance gets large enough. In contrast, in a local (nondistributed) system we may also have various communication mechanisms among processes, but we don’t need to pay attention to either their delays or their failures. Instead, interprocess communication works like just another kind of computational step.
As with the royal governor, distance implies autonomy, but autonomy does not necessarily imply distance. Very distant systems have to run independently of each other; as we have seen, there is no opportunity for the governor to consult with London on every question. But it is equally possible that another official who is working in London operates in essentially the same autonomous way as the colonial governor. Simply being nearby doesn’t require the sacrifice of autonomy.
Standards
Autonomy doesn’t imply anarchy. For two computers to be able to communicate, the two sides must have some amount of agreement about how information is represented:
• Are the bits represented by electrical impulses on a wire, by variations in radio waves, by flashes of light in a glass fiber, or by some other means?
• How is a “1” different from a “0”?
• How rapidly can either side provide bits to the other side?
• Can both sides “speak” at the same time or must they follow some kind of turn-taking?
We know that a person who speaks only English can’t communicate verbally on the phone with a person who speaks only Chinese. In a similar fashion, we can’t expect computers to communicate without some common basis for that communication.
Thus, connecting to a network requires adhering to some technical standards. In addition, there are (usually) legal and behavioral standards we are supposed to meet in order to get connected to someone else. But those standards are relatively loose and afford us a considerable degree of freedom. In our language metaphor, we may have to speak English, but then we are free to discuss what we like.
Distance Revisited
Now let’s consider the problem of distance in more detail. The Mars rovers Spirit and Curiosity offer a particularly vivid example of the challenges of distance in a distributed system. Initially, it might seem like it would be fun for someone on Earth to drive the rover around the Martian surface by remote control, like some kind of video game. Unfortunately, the physics make it impossible except in super-slow-motion. The distance between the Earth and Mars varies, but one reported time for a radio signal to reach Curiosity and return to Earth was 13 minutes and 48 seconds.
Figure 13.1 depicts an interplanetary version of the game “Marco Polo,” in which one blindfolded player calls out “Marco” and the other players immediately call out “Polo.” The player on Mars is very speedy, answering instantaneously; even so, that answer can’t begin until almost seven minutes after the blindfolded player on Earth first yells out. What does this mean for driving the rover? Any view available “now” on Earth is actually the view of almost seven minutes ago on Mars, while a control movement made “now” on Earth can’t affect the vehicle on Mars until almost seven minutes from “now.”
Figure 13.1: An interplanetary game of Marco Polo.
Of course, most of us don’t deal with the problem of remote control for Martian rovers. What is the significance of distance for more earthbound domains of interest? There are two issues. One is that light takes time to travel, and so ordinary notions of simultaneity break down. The other issue is that in the absence of communication, there is no information available about the remote party. We’ll take up each issue in turn.
Light Is Slow
The fastest response you can get from the “other side” of a communication is bounded by the speed of light, and when the distance is substantial that can be a problem. As humans we are used to light being so fast that it is essentially instantaneous. We just can’t perceive the time that light requires to travel human-size distances. But there is an interesting consequence of our success at building ever-faster computing: our machines are now fast enough that they can really “notice” issues related to the speed of light.
A nanosecond is a billionth of a second, which by human standards is a ridiculously short span of time. One handy rule of thumb is that in a nanosecond of elapsed time, light only travels about a foot. Now consider that various laptop computer specifications list their speeds in GHz, and that 1 GHz (1 billion cycles/second) is a frequency where something happens once each nanosecond. Modern computers generally have speeds well above 1 GHz, so that they have multiple things happening each nanosecond—and suddenly light doesn’t seem fast at all. Every time your laptop is taking a step in some process, it’s doing it so quickly that light couldn’t even make it from one side of your keyboard to the other.
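To make the arithmetic concrete, here is a tiny back-of-the-envelope sketch in Python, using a rounded value for the speed of light and an assumed 3 GHz machine; the numbers are illustrative, not measurements of any particular laptop.

```python
SPEED_OF_LIGHT = 3.0e8   # meters per second (rounded)
NANOSECOND = 1.0e-9      # seconds

# Distance light covers in one nanosecond: about 0.3 meters, roughly a foot.
print(SPEED_OF_LIGHT * NANOSECOND)

# Distance light covers during one cycle of an assumed 3 GHz processor:
# about 0.1 meters, less than the width of a keyboard.
cycle_time = 1 / 3.0e9
print(SPEED_OF_LIGHT * cycle_time)
```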
Naturally, this problem of “light being slow” is even worse when the distances involved are larger than the distance across a laptop, and those issues get still worse if the communication involves a lot of back-and-forth “chattiness” between the communicating parties. (Surprising as it may seem, computer scientists actually do refer to these back-and-forth interactions as being “chatty.”) In particular, it’s easy to make poor design choices when communication is local, because even many back-and-forth exchanges will still add up to only a small amount of time, perhaps imperceptible in human terms. For example, 50 exchanges that each take only 1/1000 second (that is, 1 millisecond or ms) will collectively add up to 50 ms. That’s only 1/20 of a second, and not a humanly noticeable delay in most circumstances. Successive movie frames appear at roughly that rate, exactly because that’s too fast for them to be seen as separate. But if that exact same communication pattern then happens over much longer distances, it’s easy for the accumulated time to become very noticeable for users.
For example, let’s assume that the communication is happening between Boston and New York. A plausible network round-trip time for a single exchange might well be not 1 ms but 20 ms. At that speed, the same 50 exchanges would now take a whole second, which can be a distressingly long time (“one-one-thousand” said out loud) compared to what previously seemed instantaneous.
Now let’s assume that the communication is happening between Boston and San Francisco. A plausible network round-trip time for a single exchange might well be 200 ms. Those exact same 50 exchanges at 200 ms each would take ten seconds, which would start to prompt thoughts of taking a coffee break rather than breathlessly awaiting the response.
This problem of worse performance with increased distance is not a contrived example. Indeed, I spent a decade working in the marketplace of products that helped people overcome these kinds of inefficiencies. Organizations collectively spent hundreds of millions of dollars each year on solutions to reduce the number of message exchanges required. Slowdown problems caused by distance are often both important and hard to fix.
Is Anyone There?
We just considered the problem that “light is too slow,” which is one of the problems caused by distance. The second issue caused by distance is that we cannot tell the difference between a communicating party that is slow and one that has failed. We send a message and wait for a reply. But what does it mean when we haven’t received a response? We can choose an arbitrary cutoff time to declare that the other party has failed, at least from our perspective. But there’s nothing to guarantee that we’re making that decision correctly. It’s entirely possible that the reply we’ve been awaiting arrives just a split-second after we’ve decided that the other party has failed. Then what should we do?
This second problem can be as difficult to grasp as the speed-of-light problem, and for similar reasons. In our everyday experience, we don’t usually have problems with simultaneity. We can pretty easily establish a real-time two-way communication in most cases—for example, in a face-to-face conversation or a telephone call. The difficulty with network traffic is most readily understood if we imagine a world in which the only communication mechanism is email (without acknowledgment). When sending an email, you know when you receive a reply that the other party received your initial message. But until you receive something back, you really don’t know what the situation is. Has the other party received your message, or is the message still in transit? Has your message been lost so that they don’t even know you sent something to them? If they did receive your message, are they doing anything about it? Even if they did receive your message, and they would like to do something about it, has something gone wrong? (Did they die? Are they too sick to reply? Are they too busy to reply?) Even if they did reply, perhaps something has gone wrong in the reply path. Although the reply may have been sent successfully, perhaps it is taking a long time to be delivered, or perhaps it has been lost, or perhaps in the time since you sent your message your receiving device is turned off, or not in a service area, or otherwise not able to receive the message.
We will have more to say about reliable communication in Chapter 18. For now, we can get by with just the three following observations:
- 1. In the general case, communication over distance is unreliable: messages can be lost or can get out of order.
- 2. There are ingenious techniques available that can hide the messiness of partial delivery and out-of-order delivery, as long as at least some messages are being successfully received (a small sketch of one such technique follows this list).
- 3. Those techniques are not able to hide or fix the problem of being entirely unable to communicate with the other party. As we will see, some of those kinds of problems lead to unfixable uncertainty.
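As a sketch of observation 2, here is one simple technique, expressed in Python: the sender numbers every message, and the receiver holds any early arrivals until the gaps before them have been filled, then delivers everything in the original order. The function names and the scheme itself are ours, for illustration; real protocols such as TCP combine sequence numbers with acknowledgments and retransmission.

```python
def make_in_order_receiver():
    # A toy receiver: deliver messages in the order the sender numbered them,
    # even if they arrive out of order. (Illustrative only; no retransmission.)
    expected = 0      # next sequence number we are waiting to deliver
    waiting = {}      # messages that arrived early, held until their turn

    def receive(seq_number, payload):
        nonlocal expected
        waiting[seq_number] = payload
        delivered = []
        while expected in waiting:        # deliver as far as the gaps allow
            delivered.append(waiting.pop(expected))
            expected += 1
        return delivered

    return receive

receive = make_in_order_receiver()
print(receive(1, "world"))   # []                  : held back; message 0 is missing
print(receive(0, "hello"))   # ['hello', 'world']  : gap filled, delivered in order
```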
The Order of Events
Communication in the style of mail or email is called asynchronous. When activities are synchronized, they are happening at the same time—indeed, “synchron” comes from Greek words meaning “same time.” So when something is asynchronous, that means the communication is happening “without same time,” or more colloquially, “at different times.” The underlying reality of data networking is the strange world of asynchronous communication. As was true for communication in colonial times, we simply cannot know reliably what is happening at a remote location.
In fact, the consequences are even stranger than being reduced to email. If we don’t have any kind of synchrony (same-time-ness) anywhere, we don’t have any way to determine if events are simultaneous in two different places. Instead, there’s a different clock at every different place, which puts us into the weird world of Einstein’s special relativity. Without simultaneity we can’t even get a single consistent shared time line that captures all the events of interest at all the various locations. Although each place can track the order of messages sent or received locally, a collection of distant places can’t establish a single consistent order of events based only on what each place knows locally. A simple example makes the problem clearer.
Figure 13.2 shows a pattern of communication among four people who are far apart. We see that Alice (A) sends a message to Bob (B), Charles (C) sends a message to Diane (D), and Diane sends a message to Bob. In our ordinary experience, we might well expect that all of these players can construct the same sequencing of those events. But in a distributed system without synchronized clocks, everyone’s knowledge is much more limited.
Figure 13.2: Message ordering among four widely separated people.
For example, assume that each of the four parties is on a different continent and running an ordinary local clock. They’re not using GPS to synchronize. We’ll come back to that kind of solution later. Each party can know the order of all events that happen where they are (we’ll call that local ordering). We also know that message sending is causal: any time there is some message M sent and the same message M received, the event of receiving the message happens after the event of sending the message. (If we ever had a situation where a message was received before it was sent, that would imply some weird kind of clairvoyance or time travel that’s beyond what we currently know how to do.) Together, local ordering and causality let us place some of the events into before-and-after orderings, but they don’t always tell us orderings that we might intuitively expect to know.
In this example we can walk through what each of the players knows and doesn’t know. Let’s first consider Alice. Alice knows:
• Bob receives the message after Alice sends it, and
• … that’s pretty much all that Alice knows.
So now let’s consider Charles, whose situation is much like Alice’s: Charles knows:
• Diane receives the message after Charles sends it, and
• … again, that’s pretty much all that Charles knows.
If we now look at Bob and Diane, we can see that they both know more about ordering, but still much less than we might expect in everyday life. Specifically, Bob knows:
• whether he received the message from Alice before he received the message from Diane, or vice versa.
• that each of those senders had to send the corresponding message before Bob could receive it.
But even though Bob knows the order in which he received those two messages, he doesn’t know the order in which they were sent. Bob can’t tell whether Alice sent before Diane, Diane sent before Alice, or they sent simultaneously. Bob also doesn’t know whether Diane received the message from Charles before or after she sent a message to Bob.
There is one case where it’s possible to know a little more: the message Diane sends to Bob might include some information about the message she received from Charles. In that case, Bob knows that Diane must have received the message from Charles before she sent the message to Bob, or else she wouldn’t have been able to include that information from Charles’s message. But in all other cases, Bob simply doesn’t know the ordering of actions for Diane.
Finally, let’s look at Diane. Similar to what we saw with Bob, Diane knows:
• whether she received the message from Charles before or after she sent her message to Bob (her own local ordering).
• that Charles had to send the corresponding message before Diane could receive it.
But Diane doesn’t know the ordering of the messages received by Bob, nor does she know the ordering of the messages sent by Alice and Charles.
Our everyday experience prompts us to think of the world as a linear sequence of events, with everything happening at a well-defined point that is “before,” “simultaneous with,” or “after” everything else that is of interest. But the world of concurrent processes and distributed systems is intrinsically not like that.
Reaching Agreement
The complex reality of asynchronous communication has some interesting consequences when we consider the actions of our counterpart (the other side of the conversation or message exchange—our correspondent, if you prefer). One consequence is that we can’t reliably tell the difference between a counterpart who’s slow, a counterpart who’s failed, and a counterpart who’s fully functional but isolated from us by some kind of communication failure. From our side of the conversation, they all look alike. We expect some kind of response, but it hasn’t come.
We’ll consider failure in more detail in chapter 16. Here, the core issue isn’t that failures occur; it’s that we can’t reliably judge whether a given situation is, or isn’t, some kind of failure.
Another consequence is that we can’t reliably reach agreement in a distributed system without some kind of timers. Assume for a moment that we have some kind of recipe that lets a collection of distributed processes reach agreement. Computer scientists would call such a recipe a consensus protocol. In the absence of timers, it only takes a single faulty or malicious process to break the recipe so that the other processes can’t reach agreement. This limitation is perhaps a little surprising, so let’s consider what the reasoning is.
Let’s assume that the participants are only trying to agree on the value of one bit—computer scientists would say 0 or 1, but we could also think of it as being a Yes/No decision. If we have a working consensus protocol for just one bit, we can build up much more elaborate decision systems. Our technique would be vaguely similar to what we saw previously in chapter 2 with being able to use many bits to represent any analog value. However, if we can’t build a working consensus protocol for even one bit, then we basically can’t make any decisions at all.
In a system without timers, any consensus protocol can only depend on steps taken by the individual players and on messages sent, because that’s all there is! For any consensus protocol, there must be some particular step or message at which the decision is actually made. That is, before the crucial step or message the group could have decided either “yes” or “no,” but after that particular step or message, there is only one outcome possible. Once we know what the decision point is, we can simply produce a failure right there. If that particular step or message doesn’t happen because of a failure, then the protocol is stuck.
Notice that this is an argument that is mathematical in its nature, not computational. We are effectively saying (and others have proved) that for any consensus protocol, there exists some single failure that will cause it to get stuck … unless we have some kind of a timeout.
The argument doesn’t tell us exactly how to break any particular consensus protocol. It doesn’t tell us where to look for the critical step or message. But it does let us know that we have a three-way choice:
- 1. perfect processes and communication—no failures at all! or
- 2. timers, or
- 3. an unreliable decision process.
If we turn from this theoretical discussion to practical implementations of communication in networks with failures, we find that timers are typically included in those implementations. If you aren’t aware of the theoretical issues, it’s easy to see those timers as merely shortcuts that help speed up some cases where a message has been lost. But in fact, theory shows that time and timers are critical if our distributed systems are to work at all.
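To make the role of the timer concrete, here is a minimal sketch in Python. It is not any particular protocol, and the address and message are invented for illustration; the point is simply that without the timer, the wait for a reply could last forever.

```python
import socket

# A minimal sketch (not any particular protocol). The address, port, and
# message format are made up for illustration; nothing is listening there,
# so the timer is what lets the program move on at all.
PEER = ("127.0.0.1", 9999)          # hypothetical counterpart

def ask_peer(question: bytes, timeout_seconds: float = 2.0):
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout_seconds)   # the timer: without it, we could wait forever
        s.sendto(question, PEER)
        try:
            reply, _ = s.recvfrom(1024)
            return reply                # the counterpart answered in time
        except OSError:                 # timed out (or the network reported an error)
            return None                 # slow? failed? unreachable? we cannot tell which

if ask_peer(b"are you there?") is None:
    print("No reply before the timer expired; fall back to a recovery plan.")
```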
Heartbeats
A surprisingly common reaction to the previous theoretical argument is to dismiss it, and move on to finding some practical solution to the problem. However, even simple-seeming solutions can encounter tricky problems. For example, let’s consider the idea of setting up periodic “heartbeats” between communicating parties. These are just simple messages conveying “I’m here” to the other side. When the heartbeat isn’t there any more, we decide that a failure has occurred and switch to a new arrangement. Computer scientists refer to making such a change as failing over and to the event itself as a failover. Figure 13.3 depicts the system before and after a failover.
Figure 13.3: Normal operation vs. failover.
The top pair shows the normal state of the system: two boxes that agree the left box is the leader. The bottom pair shows the situation after the left box has failed and the right box has taken over.
If we have a heartbeat system in place, does it help? Not always. How should we interpret an absence of heartbeats? One possibility is that the counterpart has failed completely and is no longer operating at all. Another possibility is that the counterpart has slowed down a great deal, and the rate of heartbeats is now very slow. A final possibility is that the counterpart is doing just fine, but some part of the path to the counterpart is broken: for example, the telephone line has been accidentally cut by a backhoe, or a power failure has knocked out critical equipment along the path, or there’s simply not enough capacity somewhere along the path and the heartbeat message is getting dropped. We know only that nothing is getting through, and so something, somewhere, has presumably failed. Unfortunately, we cannot reliably distinguish among the possibilities.
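A sketch of how such a heartbeat check might look in code is shown below; the interval and the miss limit are invented values, and the comment in the code is the whole point: the decision to fail over is only a guess.

```python
import time

HEARTBEAT_INTERVAL = 1.0   # seconds between "I'm here" messages (illustrative value)
MISSED_LIMIT = 3           # how many silent intervals we tolerate before giving up

def should_fail_over(last_heartbeat_time: float, now: float) -> bool:
    """Decide whether to fail over, given when we last heard a heartbeat.

    This is only a guess: a failed leader, a very slow leader, and a broken
    path to a perfectly healthy leader all look exactly the same from here.
    """
    silence = now - last_heartbeat_time
    return silence > MISSED_LIMIT * HEARTBEAT_INTERVAL

# Example: the last heartbeat arrived five seconds ago, so we would fail over,
# even though the leader might be fine and only the network might be down.
print(should_fail_over(last_heartbeat_time=time.time() - 5.0, now=time.time()))
```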
In the worst case, we could have a communication failure so that both parties believe that the other one has failed, even though actually both parties are doing fine. Figure 13.4 shows a bad failover caused by a communication failure.
Figure 13.4: Bad failover caused by lack of communication.
This bad failover leaves two parallel leaders, which will cause particular problems at the point where the communication failure is fixed. All the parties can now communicate with each other, but part of what they discover is that the two leaders may each have made separate, incompatible changes to the system’s state and history. Depending on what else has happened by that point, it may not be possible to merge the different versions back into a single consistent state.
A bad failover can be a particular problem if the two parties are trying to maintain some data consistently. Each party will assume that it is the only survivor and will set up another counterpart (so that the data will still be maintained even if it, too, fails). Instead of two parties cooperating to maintain a single item of data, we will have four parties (two sets of two) maintaining two separate versions of what should be a single set of data. See figure 13.5.
Figure 13.5: Unwanted duplicate copies caused by bad failover.
Do these problems mean that a heartbeat system always fails in these ways? No. But they do help us see that there are a number of subtle issues to consider when we’re concerned with failures and communication in a distributed system.
Are Earth-Size Distances Small?
We’ve spent some time examining the counterintuitive weirdness of distributed systems. Distributed systems have those weird qualities because of distance, and a common view among people who work on distributed systems is that there’s no way to eliminate those problems. For a large-enough distributed system, that analysis is almost certainly correct: if there really is no way of achieving synchronization among locations, then the weirdness of multiple clocks is unavoidable.
However, we have to be careful not to fall in love with analogies to physics, and the intellectual fun that can come from relating distributed systems to Einstein’s special relativity. It’s useful to remember that almost all interesting distributed systems are actually based on the surface of Earth—which in cosmic terms is a pretty tiny place. So we have to consider the possibility that an alternative approach to synchronized clocks and simultaneity might “only” work at planetary scale. Such an approach, despite its theoretical limitations, would nonetheless mean that essentially all practical systems can ignore distributed-system multiple-clock weirdness.
Google has built a system called Spanner that uses the Global Positioning System (GPS) and atomic clocks as a way of overcoming some of the problems of distance in a distributed system. Although GPS is most familiar as a means of identifying a receiver’s geographic position, its underpinnings include a very accurate and widely distributed clock signal. That globally available clock signal makes it possible for the elements of a distributed system to operate synchronously.
In a Spanner system, there is a common synchronized clock that has some uncertainty in it. Effectively, the relativistic issues that we’ve previously mentioned are transformed from different local clocks into a “fuzziness” of global clock time. Each systemwide clock time is a short interval rather than a single precise instant.
In Spanner, the comparison of times is no longer in terms of exact simultaneity, but instead in terms of overlap. By being less precise about time, we can build a common clock even in a distributed system. When remote elements share that common clock, and messages can be stamped with its times, it becomes easier to build distributed systems with consistent views of shared state and events.
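In code, the idea might look something like the following sketch. It is a simplification in the spirit of Spanner’s TrueTime rather than its actual interface: each clock reading is an interval, and two readings can be confidently ordered only when their intervals do not overlap.

```python
from dataclasses import dataclass

@dataclass
class FuzzyTime:
    # A clock reading with bounded uncertainty: the true time lies
    # somewhere between earliest and latest.
    earliest: float
    latest: float

def definitely_before(a: FuzzyTime, b: FuzzyTime) -> bool:
    # a is definitely earlier only if a's interval ends before b's begins.
    return a.latest < b.earliest

def order_unknown(a: FuzzyTime, b: FuzzyTime) -> bool:
    # Overlapping intervals: the system cannot say which event came first.
    return not definitely_before(a, b) and not definitely_before(b, a)

# Two writes stamped by different servers, each within a few milliseconds
# of the "true" global time (the numbers are invented for illustration).
write_one = FuzzyTime(earliest=100.000, latest=100.007)
write_two = FuzzyTime(earliest=100.012, latest=100.019)

print(definitely_before(write_one, write_two))                 # True: safe to order
print(order_unknown(write_one, FuzzyTime(100.005, 100.012)))   # True: intervals overlap
```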
It’s still too early to tell how much Spanner’s approach will influence future systems. On the one hand, it’s fairly hard and expensive work to get the GPS signals available to servers and ensure that the clock signal is accurate. On the other hand, that’s work that can be shared among a large collection of servers, rather than having to be done by every different kind of application. And that work might well be worthwhile if it means that time-related problems go away for system designers. Perhaps future distributed applications will be built on a Spanner-inspired distributed operating system.