Monday, September 08, 2008

Petabyte management

A gigabyte means little today, and storage is now thought of in terabytes. Recent advances in hard drive technology have made that shift in thinking possible. But I recently ran across a quote that gives some indication of just how big a management problem petabyte database storage would present. (A petabyte is 10^15 bytes, or a quadrillion bytes.) Apparently, advances in scientific instruments for measuring and collecting data far outstrip our ability to store such data. Alex Szalay, writing a commentary for Science News, gives a good sketch of the problem:
"Today, you can scan one gigabyte of data or download it with a good computer system in a minute. But with current technologies, storing a petabyte would require about 1,500 hard disks, each holding 750 gigabytes. That means it would take almost three years to copy a petabyte database -- and cost about $1 million"[1]
Of course, if you have a moderate system with standard U.S. broadband, you could actually send a petabyte to Hong Kong by sailboat faster than you could move it over the internet. Jonathan Schwartz makes just this kind of claim, giving his explanation in the following:
"So if you had a half megabit per second internet connection, which is relatively high in the US (relatively low compared to residential bandwidth available in, say, Korea), it'd take you 16 billion seconds, or 266 million minutes, or 507 years to transmit the data. Can you sail to Hong Kong faster than that? At a full megabit, just divide the time in half. Even at a hundred megabits (about the highest, generally available, of any carrier I've seen), it's a few years. [....] So yes, at least for now, it's faster to send a petabyte of data via a sailboat than the internet (at least defined by the bandwidth to which most of us have access)."[2]
There's probably good reason to think that sending a petabyte of data is a temporary problem, as storage capacity and bandwidth are improving at quite a clip. So transmitting a petabyte probably won't be much of an issue for long. However, sensor technology will also keep advancing, so we're probably in the permanent position of always being able to collect more data than we can comfortably transmit and store. This is probably some sort of corollary to a personal computer purchasing rule I read about some years ago, which is still true today: The computer you want is always $5,000 more than you can afford.


[image] "Magnetic Data Storage: Bits per dollar in constant yr. 2000 dollars" (Accessed Sept. 8, 2008)

[1] Alex Szalay, "Preserving digital data for the future of eScience," Science News, August 30, 2008.

[2] Jonathan Schwartz, "Moving A Petabyte of Data," Jonathan's Blog, Mar 12, 2007 (Accessed Sept. 8, 2008)




At 2:18 PM, Blogger Jeffrey said...

Interesting piece. Check this out on the Greenplum blog, one of the few companies allowing cheap scale to a Petabyte Database:

