A World without Form

I WOULD NEVER UNDERSTAND this new world until I was fully grounded in search technology. Craig Silverstein volunteered to school me and a handful of other nontechnical staff with a lunchtime talk he called "Google 7A: Search for non-majors." Craig, a former Stanford grad student with an impish grin and a sense of humor dry to the point of sere, liked to say he was employee 1.5, officially added to the payroll between Larry and Sergey when the company incorporated. Craig also provided our daily bread, wandering the halls on random afternoons wearing a beatific smile and announcing in an interrogatory falsetto, "Breaaddd?" Googlers would scramble from their cubicles to partake of the air-filled, nutrition-free loaf he had just taken from Google's breadmaking machine.

We gathered in the Ping-Pong room, which housed the dark green game table formerly in Susan's garage. The net was down, perhaps knocked over by one of our venture capitalists to clear a place for his laptop during the board meeting the day before. Craig began his lecture as we munched on red licorice and bowls of M&Ms.

"A search engine has three components," he began, writing on the whiteboard beneath a header that read "The Life of a Query."

"First, we have to collect information about what pages exist on the web, which we do through a process called crawling. Our spidering software, which we call Googlebot, jumps from link to link gathering URLs [web addresses] and data on the content that lives at each one. The crawl usually takes about a month, and once it's completed, we have a big bag of stuff that needs to be sorted into a usable list. That's called indexing."
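
To make the crawl-then-index step concrete, here is a toy sketch. It is not Googlebot: the "web" is a hypothetical in-memory dictionary (the `WEB` table and its example.com-style URLs are invented for illustration), the crawl is a plain breadth-first walk over links, and the index is a simple inverted index mapping each word to the pages that contain it.

```python
from collections import deque

# Hypothetical miniature "web": URL -> (page text, outgoing links).
WEB = {
    "a.example": ("jaguar cars and engines", ["b.example", "c.example"]),
    "b.example": ("the jaguar is a big cat", ["a.example"]),
    "c.example": ("football scores", ["b.example", "d.example"]),
    "d.example": ("cat food reviews", []),
}

def crawl(seed):
    """Breadth-first crawl: follow links from the seed, visiting each URL once."""
    seen, queue, pages = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, links = WEB.get(url, ("", []))
        pages[url] = text  # record what lives at this address
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Indexing: map each word to the set of URLs whose text contains it."""
    index = {}
    for url, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index

pages = crawl("a.example")
index = build_index(pages)
print(sorted(index["jaguar"]))
print(sorted(index["cat"]))
```

The crawl only ever sees pages reachable by links from the seed, which is why a real crawler needs many seeds and a long time to cover the web; the index then turns the "big bag of stuff" into a lookup table that can be consulted per query.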

I wrote "crawling" and "indexing" in my lab book and put boxes around them. Next I connected the boxes and turned them into a pair of Ben Franklin–style glasses, with a spider hanging from a thread where the nose would be.

"Once we have an index," Craig continued, "we assign a rank to each page based on its importance with our PageRank algorithm. PageRank is Google's secret sauce."

"Secret sauce?" I leaned forward to learn what we had that was better than all the other search engines that our founders seemed so quick to dismiss.

"PageRank looks at all the pages on the web and assigns a value to them based on who else links to them. The more credible the sites linking to them, the higher the PageRank. That's the first half of the recipe."
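
The idea Craig describes, a page's value depending on the value of the pages linking to it, can be sketched with the textbook power-iteration form of PageRank. This is an illustrative assumption, not Google's production code: the damping factor of 0.85 is the conventional value from the published formulation, the four-page link graph is hypothetical, and pages with no outgoing links simply spread their rank evenly across all pages.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict: page -> list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere
    for _ in range(iterations):
        # Every page gets the baseline "random jump" share.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            outgoing = links[p]
            if outgoing:
                # A page splits its current rank evenly among the pages it links to.
                share = damping * rank[p] / len(outgoing)
                for q in outgoing:
                    new[q] += share
            else:
                # Dangling page (no links): spread its rank over every page.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

# Hypothetical link graph: "c" is linked to by three pages, "d" by none.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page "c" ends up on top because several pages vote for it, and "a" ranks well mainly because the highly ranked "c" links to it, which is the "credible sites linking to them" effect in miniature.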

I wrote "PageRank" under the Ben Franklin spectacles and drew an oval around it. It looked a little like a clown mouth, so I sketched a skull around it and added some Bozo hair on the sides.

"The second half is how we determine which results are most relevant to the specific query we've received. Most of our competitors look at basic stuff like how many times a word appears on a page. We look at what we know about how sites use that term on their pages. What words appear next to it? Is it in bold type or a different font? How does the term appear in links pointing to those pages? That link analysis is really important. The words in the link pointing to a page are called anchor text."
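
Anchor text works as a description of a page written by other sites. The sketch below shows one hypothetical way to turn it into a relevance signal: credit each target page with the words used in links pointing to it, then count how often the query words appear there. The link data and the simple counting score are invented for illustration; Google's actual weighting of these signals is not described here.

```python
from collections import Counter

# Hypothetical links: (source page, target page, anchor text).
LINKS = [
    ("reviews.example", "jaguar.example", "Jaguar cars"),
    ("autos.example", "jaguar.example", "jaguar dealer"),
    ("zoo.example", "bigcats.example", "jaguar facts"),
    ("zoo.example", "bigcats.example", "big cats"),
]

def anchor_text_index(links):
    """Credit each target page with the words other pages use to link to it."""
    index = {}
    for _source, target, anchor in links:
        index.setdefault(target, Counter()).update(anchor.lower().split())
    return index

def anchor_score(index, page, query):
    """Toy relevance signal: occurrences of the query words in anchors pointing at page."""
    counts = index.get(page, Counter())
    return sum(counts[word] for word in query.lower().split())

idx = anchor_text_index(LINKS)
print(anchor_score(idx, "jaguar.example", "jaguar cars"))
print(anchor_score(idx, "bigcats.example", "jaguar"))
```

Note that the score is a literal word match: "cat" would not match "cats" here, and a page can score well for words that never appear on the page itself, which is exactly what makes anchor text useful.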

A chain grew from a corner of my clown's mouth and fell to the bottom of the page, where an anchor suddenly appeared surrounded by grinning fish with barracuda teeth.

"How well we match the query determines our search quality," Craig went on, "which is not an exact science, since evaluating whether a query is a good match is somewhat subjective. If you searched for 'jaguar,' did you mean the car or the cat or the football team? Sometimes it's difficult to disambiguate a query like that."

I wrote down "disambiguate," and said it silently to myself three times so it would become my word. And I drew something that looked vaguely like a spotted jungle cat chasing the fish around the anchor, then added bubbles since he was underwater.

"Once we determine the order of the pages we want to show, we need to serve the results back to the user who submitted the query. That's where gwiss comes in." As he said "gwiss," Craig wrote "GWS" on the whiteboard. Beneath it he wrote "Google Web Server." "Gwiss is the software that actually interacts with users when they submit a query and when we serve results back to them. When we want to update how Google looks to users, we need to push out a new gwiss to implement those changes."

I couldn't think of what a gwiss might look like, so I sketched some Swiss cheese behind the clown head. By the time Craig had finished, I had a broader understanding of the way Google worked and a bizarre doodle to add to my collection of things not to share with my new coworkers.

Later Urs confirmed that Google had kicked butt in search quality even before Larry and Sergey left Stanford in 1998, because link analysis was an alchemist's stone for turning web dross into gold. Google's relevancy lured in early adopters and the media, but behind its beguiling look lay an arthritic infrastructure in danger of collapsing. "The ranking beat AltaVista by a mile," Urs told me, "but it was slow and we couldn't build an index reliably."

The challenge of improving Google's crawling, indexing, and serving systems was what had drawn Urs to the company. He'd figured the project would take about a year and then he'd move back to Europe. "I underestimated how much of a systems problem this whole thing was," he confessed. "We had a university system and we needed to basically rewrite the whole thing from scratch." While Google did a good job with the data it had, it collected far too little and wasn't searching through it fast enough.

Speed or scale. Pick one. When we crawled more web pages, the index got bigger and the pageranker had more data to draw upon, so we could produce more-relevant results. That attracted more users and more searches, so our audience grew. A bigger index, however, required more machines doing more processing, and more processing took more time. Adding users puts more demand on the network, which, as anyone sharing an Internet connection knows, slows things down.

As they did when forced to choose a future for Google as a destination site or a technology provider, Larry and Sergey chose both. Google's quest would be to get faster even as it expanded in all directions. They went looking for others who shared their disdain for the limits imposed by nature's laws.

I'm Feeling Lucky