
It is a lift, though. Sure, they were standing on the shoulders of giants such as the HITS algorithm or even Wassily Leontief's algorithm, but they applied those ideas in a very novel way, fundamentally shifting the view of the web from a mere collection of hypertext documents to an interconnected network of documents. That leap is what can be considered a lift.
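The link-as-vote idea behind that shift can be sketched as a toy PageRank-style power iteration. This is an illustrative sketch only (a hypothetical four-page graph, with the 0.85 damping factor from the original PageRank paper), not anyone's production code:

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping page -> list of pages it links to.

    Toy power iteration: each page repeatedly distributes a damped
    share of its rank along its outlinks; a (1 - damping) baseline
    keeps every page reachable.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                continue
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical four-page web: "c" is linked to by three pages,
# so it should end up ranked highest.
graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(graph)
```

The point of the example is that a page's score depends on the whole link graph, not on the page's own text in isolation; that is the "network of documents" view in miniature.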

Heck, even Twitter and Facebook led to a fundamental change in how we view the internet, namely as a medium for communication and social networking.

Compare this to PointCast or Snapchat. What fundamental shift in point of view happened because of PointCast or Snapchat?



> They viewed it as an interconnected network of documents.

That is the very premise of the WWW. There was nothing new in that.

There is an all-too-common "the victors write the history" way of looking at things. And FWIW, the notion of counting, weighting, and sorting on links is hardly a novel premise. The real schism Google brought was that they had a business case for spending far more computing resources on any given search, and on the content that served those results, which made for a much better product. When they brought custom ad serving in-house, Google's ascent began.

Every product along the way -- whether it is long gone or a thriving success -- has an influence on the technologies that follow. PointCast is very much an ancestor of RSS and other so-called push technologies, which themselves could be considered ancestors of things like Twitter. Writing it off, rewriting history based on a sort of survivorship bias, just clouds the topic.


> That is the very premise of the WWW. There was nothing new in that.

Yet their competitors insisted on treating each document as standalone until after Google popularised this idea.

At the time it very much was seen as a new idea, and the notion that it was good enough to deprecate lots of advanced search operators was so alien that it took many of us a long time to stop constructing the complicated queries encouraged by e.g. AltaVista.


> Yet their competitors insisted on treating each document as standalone until after Google popularised this idea.

That's not entirely true, but this is the simplified view that the press popularized because it made for a story that was easier to report.

When you say "insisted", that seems to imply that other search engines were opposed to the idea (whether that is what you meant is another matter). Back in the 90s there were a lot of things we wanted to do, but either we didn't have the manpower/time/money/talent, or we just hadn't figured out how to implement them efficiently.

Ideas are cheap, but implementation is what counts.

I can remember that we discussed various link-aware ranking factors long before we had even built a proper search engine, but we needed to solve a lot of more pressing problems before that was even on the agenda (like building a crawler that didn't break the internet :)).

Once it was on the agenda, the question was how to implement it at scale. Remember that this was before AWS, before Hadoop, and before a lot of other things the average developer has easy access to today. We had a finite set of machines and a finite amount of money to buy machines. And even if we could have had all the hardware in the world, we still had to figure out how to turn a lot of algorithms written for single-machine processing into distributed systems.

That being said, the fellow I shared an office with was able to crank out a sufficiently scalable PageRank-like implementation in a few weeks.

Not that this made much of a difference PR-wise (and initially, it didn't have quite the impact on the quality of search results we had hoped for). In the view of the public only Google used PageRank and PageRank was the thing that made them stand out. End of story. The fact that Google did dozens of other things much better than their competition was...well, too complicated for journalists to report.

As a side note: I think we reinvented, and implemented, various subsets of MapReduce at least a dozen times during 1998-2000. A lot of our processing systems were based around sorts and linear disk scans. Two things we knew how to do fast. Google (at some point) had the good sense to recognize that this could be turned into an infrastructure, which allowed them to spend less time building systems that could be expressed using these primitives.

They also took the time to solve things like reliably storing data without getting too hung up on traditional filesystem semantics. (When data sizes grow to the point where you have to maintain a large population of disks, you can't really trust a single disk to be there tomorrow. We did some testing with RAID systems, but were unable to find a solution that was both cheap enough and reliable enough. Not having something akin to GFS was, in retrospect, a major impediment.)
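The "sort and linear disk scan" pattern described above can be sketched as a tiny word-count pipeline. This is a toy illustration of the general pattern, not the commenter's actual system: a map step emits key/value pairs, a sort brings equal keys together, and a single linear pass reduces each run of equal keys.

```python
from itertools import groupby

def map_phase(documents):
    # Emit (key, value) pairs -- here, (word, 1) for each word.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def sort_phase(pairs):
    # In 1998 this would have been an on-disk merge sort over files
    # too big for memory; sorted() stands in for that here.
    return sorted(pairs)

def reduce_phase(sorted_pairs):
    # One linear scan: after the sort, equal keys are adjacent,
    # so each group can be reduced as it streams past.
    for key, run in groupby(sorted_pairs, key=lambda kv: kv[0]):
        yield (key, sum(v for _, v in run))

docs = ["the web is a network", "the network of the web"]
counts = dict(reduce_phase(sort_phase(map_phase(docs))))
```

The appeal of the pattern is that both phases stream sequentially from disk, which is exactly what spinning disks of that era did fast; MapReduce's contribution was packaging it as reusable, distributed infrastructure.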



