Friday, October 26, 2012

Get An 80TB Copy Of The Web

Imagine what you could do with all this. Sure, you'd have to clear up some space on your porn file server to store it all, but that shouldn't be an issue for most of us. ;)

We are interested in exploring how others might be able to interact with or learn from this content if we make it available in bulk. To that end, we would like to experiment with offering access to one of our crawls from 2011, with about 80 terabytes of WARC files containing captures of about 2.7 billion URIs. The files contain text content and any media that we were able to capture, including images, Flash, videos, etc.
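For anyone wondering what "interacting with" a pile of WARC files actually looks like: a WARC file is just a sequence of records, each one a `WARC/1.0` version line, a block of `Name: value` headers, a blank line, and then a body exactly `Content-Length` bytes long. Below is a minimal, stdlib-only Python sketch of a reader built on those assumptions. It is not a full implementation of the WARC spec (no validation, no digest checks), the function names are made up for illustration, and it assumes `.warc.gz` files are gzipped record by record, which Python's `gzip` module reads back as one continuous stream.

```python
import gzip
import io
from collections import Counter
from typing import BinaryIO, Iterator


def iter_warc_records(stream: BinaryIO) -> Iterator[tuple[dict[str, str], bytes]]:
    """Yield (headers, body) for each record in an uncompressed WARC byte stream.

    Hypothetical helper: header names are lowercased for easy lookup.
    """
    while True:
        version = stream.readline()
        if not version:
            return  # end of stream
        if not version.strip():
            continue  # skip the blank lines that separate records
        if not version.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {version!r}")
        headers: dict[str, str] = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()
        body = stream.read(int(headers["content-length"]))
        yield headers, body


def open_warc(path: str) -> BinaryIO:
    # Assumption: .warc.gz files are concatenated gzip members, one per record;
    # gzip.open decompresses them as a single stream.
    return gzip.open(path, "rb") if path.endswith(".gz") else open(path, "rb")


def count_record_types(path: str) -> Counter:
    """Hypothetical example: tally records by WARC-Type (response, request, ...)."""
    with open_warc(path) as stream:
        return Counter(h.get("warc-type", "?") for h, _ in iter_warc_records(stream))


if __name__ == "__main__":
    # A tiny hand-made record, just to show the call pattern.
    sample = (
        b"WARC/1.0\r\nWARC-Type: response\r\n"
        b"WARC-Target-URI: http://example.com/\r\n"
        b"Content-Length: 5\r\n\r\nhello\r\n\r\n"
    )
    for headers, body in iter_warc_records(io.BytesIO(sample)):
        print(headers["warc-type"], headers.get("warc-target-uri"), body)
```

Streaming one record at a time like this is the point: at roughly 80TB you are never going to load a file into memory, so anything you build on top (counting MIME types, pulling out images, indexing URIs) should stay a generator pipeline.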
