New York Times TimesMachine

Derek Gottfrid and his colleagues at the New York Times have obviously been having a lot of fun with Amazon EC2.

Their latest offering is the TimesMachine. Print subscribers can access any issue of the New York Times, dating back to Volume 1, Number 1 in 1851. Non-subscribers can take a peek at six different (and historically significant) issues, including the inaugural edition, the end of World War I, and the sinking of the Titanic.

As they explained in their blog post, they used EC2, Hadoop, and some of their own code to convert 405,000 large TIFF images, 3.3 million SGML files, and 405,000 XML files to 810,000 PNG images and 405,000 JavaScript files. This didn’t take all that long:

“By leveraging the power of AWS and Hadoop, we were able to utilize hundreds of machines concurrently and process all the data in less than 36 hours.”
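A quick back-of-the-envelope check puts those numbers in perspective. This is a sketch using only the figures quoted above, plus one assumption of my own: that each TIFF is a single scanned page. Because the 36-hour figure is an upper bound, the rate below is a floor. The post only says "hundreds of machines," so this says nothing about per-machine speed.

```python
# Figures quoted in the post. ASSUMPTION (mine, not the Times'): one TIFF = one page.
tiff_pages = 405_000
png_images = 810_000
js_files = 405_000
max_hours = 36  # "less than 36 hours", so this is an upper bound

png_per_page = png_images / tiff_pages  # PNG outputs produced per input page
js_per_page = js_files / tiff_pages     # JavaScript files produced per input page
pages_per_hour = tiff_pages / max_hours                 # minimum hourly rate
min_pages_per_second = tiff_pages / (max_hours * 3600)  # minimum per-second rate

print(f"{png_per_page:.0f} PNGs and {js_per_page:.0f} JavaScript file per page")
print(f"at least {pages_per_hour:,.0f} pages/hour ({min_pages_per_second:.3f} pages/second)")
```

In other words, every page yielded two PNG images and one JavaScript file, and the cluster worked through at least 11,250 pages an hour.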

The content itself is really interesting, but I also enjoyed being able to see each article in context, alongside the rest of that day's news. The advertising is also interesting.

Robert Scoble has more coverage, including a video interview with Derek.

