A World without Form
I WOULD NEVER UNDERSTAND this new world until I was fully grounded in search technology. Craig Silverstein volunteered to school me and a handful of other nontechnical staff with a lunchtime talk he called "Google 7A: Search for non-majors." Craig, a former Stanford grad student with an impish grin and a sense of humor dry to the point of sere, liked to say he was employee 1.5, officially added to the payroll between Larry and Sergey when the company incorporated. Craig also provided our daily bread, wandering the halls on random afternoons wearing a beatific smile and announcing in an interrogatory falsetto, "Breaaddd?" Googlers would scramble from their cubicles to partake of the air-filled, nutrition-free loaf he had just taken from Google's breadmaking machine.
We gathered in the Ping-Pong room, which housed the dark green game table formerly in Susan's garage. The net was down, perhaps knocked over by one of our venture capitalists to clear a place for his laptop during the board meeting the day before. Craig began his lecture as we munched on red licorice and bowls of M&Ms.
"A search engine has three components," he began, writing on the whiteboard beneath a header that read "The Life of a Query."
"First, we have to collect information about what pages exist on the web, which we do through a process called crawling. Our spidering software, which we call Googlebot, jumps from link to link gathering URLs [web addresses] and data on the content that lives at each one. The crawl usually takes about a month, and once it's completed, we have a big bag of stuff that needs to be sorted into a usable list. That's called indexing."
I wrote "crawling" and "indexing" in my lab book and put boxes around them. Next I connected the boxes and turned them into a pair of Ben Franklin–style glasses, with a spider hanging from a thread where the nose would be.
"Once we have an index," Craig continued, "we assign a rank to each page based on its importance with our PageRank algorithm. PageRank is Google's secret sauce."
"Secret sauce?" I leaned forward to learn what we had that was better than all the other search engines that our founders seemed so quick to dismiss.
"PageRank looks at all the pages on the web and assigns a value to them based on who else links to them. The more credible the sites linking to them, the higher the PageRank. That's the first half of the recipe."
I wrote "PageRank" under the Ben Franklin spectacles and drew an oval around it. It looked a little like a clown mouth, so I sketched a skull around it and added some Bozo hair on the sides.
"The second half is how we determine which results are most relevant to the specific query we've received. Most of our competitors look at basic stuff like how many times a word appears on a page. We look at what we know about how sites use that term on their pages. What words appear next to it? Is it in bold type or a different font? How does the term appear in links pointing to those pages? That link analysis is really important. The words in the link pointing to a page are called anchor text."
A chain grew from a corner of my clown's mouth and fell to the bottom of the page, where an anchor suddenly appeared surrounded by grinning fish with barracuda teeth.
"How well we match the query determines our search quality," Craig went on, "which is not an exact science, since evaluating whether a query is a good match is somewhat subjective. If you searched for 'jaguar,' did you mean the car or the cat or the football team? Sometimes it's difficult to disambiguate a query like that."
I wrote down "disambiguate," and said it silently to myself three times so it would become my word. And I drew something that looked vaguely like a spotted jungle cat chasing the fish around the anchor, then added bubbles since he was underwater.
"Once we determine the order of the pages we want to show, we need to serve the results back to the user who submitted the query. That's where gwiss comes in." As he said "gwiss," Craig wrote "GWS" on the whiteboard. Beneath it he wrote "Google Web Server." "Gwiss is the software that actually interacts with users when they submit a query and when we serve results back to them. When we want to update how Google looks to users, we need to push out a new gwiss to implement those changes."
I couldn't think of what a gwiss might look like, so I sketched some Swiss cheese behind the clown head. By the time Craig had finished, I had a broader understanding of the way Google worked and a bizarre doodle to add to my collection of things not to share with my new coworkers.
Later Urs confirmed that Google had kicked butt in search quality even before Larry and Sergey left Stanford in 1998, because link analysis was an alchemist's stone for turning web dross into gold. Google's relevancy lured in early adopters and the media, but behind its beguiling look lay an arthritic infrastructure in danger of collapsing. "The ranking beat AltaVista by a mile," Urs told me, "but it was slow and we couldn't build an index reliably."
The challenge of improving Google's crawling, indexing, and serving systems was what had drawn Urs to the company. He'd figured the project would take about a year and then he'd move back to Europe. "I underestimated how much of a systems problem this whole thing was," he confessed. "We had a university system and we needed to basically rewrite the whole thing from scratch." While Google did a good job with the data it had, it collected far too little and wasn't searching through it fast enough.
Speed or scale. Pick one. When we crawled more web pages, the index got bigger and the pageranker had more data to draw upon, so we could produce more-relevant results. That attracted more users and more searches, so our audience grew. A bigger index, however, required more machines doing more processing, and more processing took more time. Adding users put more demand on the network, which, as anyone sharing an Internet connection knows, slows things down.
As they did when forced to choose a future for Google as a destination site or a technology provider, Larry and Sergey chose both. Google's quest would be to get faster even as it expanded in all directions. They went looking for others who shared their disdain for the limits imposed by nature's laws.