On the data smelter

Posted by Antonio (March 16, 2008)

Any self-respecting ManGeek ought to love a term I picked up from the Economist a couple of weeks ago in an article on cloud computing: data smelter. Apparently this is the moniker used for the huge data centers that Google, Amazon, Microsoft, Yahoo, and others are building on the banks of the Columbia River in Oregon. Because these centers sit next to the cheapest power available in the US, the name is a play on the aluminum smelters that have peppered the banks of the Columbia over the last hundred years. But it's also great because it hints at one of the most relevant facets of the cloud computing/web services revolution: the ability of new services to recombine data hosted by other services in novel and interesting ways. We haven't even begun to feel how transformative this loose coupling of data and processing is likely to be; today's "mash-ups" are barely at the crawling stage of what we are likely to see.

And yet, it's worth pausing for a second to think about the cost of the current smelting. The Economist piece cites the Google data center at The Dalles as requiring the power of a town of 200,000 people. Most of this wattage goes to the compute cycles that Google needs to index the world's information, and in most cases these cycles are well spent running hairy algorithms that apply the bleeding edge of computer science to extract order from chaos.

But this is not always the case. At Tabblo, for instance, a meaningful amount of our general web traffic comes from Googlebot or one of its competitors. This is despite the fact that we offer well-structured RSS equivalents that could be polled and processed far more efficiently. Ditto for all of the much bigger user-generated content sites: they too get a meaningful share of their traffic from indexing bots, while at the same time providing feeds that could give searchers just as much information using less bandwidth, fewer CPU cycles, and less overall smelting.
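To make the contrast concrete, here is a rough sketch of what polling a feed might look like, assuming a standard RSS 2.0 document. The sample feed, its URLs, and the function names are hypothetical and for illustration only. Instead of fetching and parsing every HTML page, a consumer reads one small document, keeps only the items it hasn't seen yet, and sends the standard HTTP validators (ETag / Last-Modified) so an unchanged feed can be answered with a cheap "304 Not Modified".

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed for a photo-sharing site (illustrative data only).
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example photo stories</title>
    <item>
      <title>Trip to the Dalles</title>
      <link>http://example.com/story/1</link>
      <pubDate>Sun, 16 Mar 2008 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Columbia River gorge</title>
      <link>http://example.com/story/2</link>
      <pubDate>Sat, 15 Mar 2008 09:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml):
    """Return (title, link, pubDate) tuples for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append((
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("pubDate", default=""),
        ))
    return items


def new_items(items, seen_links):
    """Keep only items whose link was not already processed on an earlier poll."""
    return [it for it in items if it[1] not in seen_links]


def conditional_headers(etag=None, last_modified=None):
    """Build HTTP conditional-GET headers from validators saved on the previous poll."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


if __name__ == "__main__":
    items = parse_items(SAMPLE_FEED)
    # Pretend story 2 was already indexed on a previous poll.
    print(new_items(items, seen_links={"http://example.com/story/2"}))
    print(conditional_headers(etag='"abc123"'))
```

A crawler that works this way touches one compact document per poll and only does real work for genuinely new items, which is the efficiency argument above in miniature.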

The few times I've read luminaries from Google talking about the semantic web in any shape or form (RDF, microformats, etc.), they always pooh-pooh it with slights like "people don't want to deal in angle brackets all day." And until I started thinking about the energy implications of these data smelters, I was inclined to agree. After all, we're all still suffering from the CORBA/DCOM hangover of the last decade, when a few vendors bamboozled the entire industry into thinking that an overwrought solution for remote inter-process data exchange was the answer to all of these coupling needs (watch the WS-* offspring for a modern-day equivalent).

But last week Yahoo made a potentially game-changing move with its pledge to support the semantic web standards (microformats, RDF, etc.) across all of its properties. As much as I tend to write off Yahoo as roadkill on the Google highway, it's clear that a few folks there are still doing good things for the net and the planet.

If the other industry heavyweights are goaded into following suit, we may end up running slightly cleaner data smelters in the near future.
