Back in 1996, when the internet was still primitive, a couple of San Francisco-based engineers set out to keep websites from vanishing after they were altered or shut down. They founded the Internet Archive, a nonprofit digital library. The purpose of the Internet Archive was to… archive the internet.
In 2001, this preserved World Wide Web collection was made available to the public via the Wayback Machine, whose simple mission was to bring “Universal Access to All Knowledge.”
The Wayback Machine still operates today, with Director Mark Graham at the helm, and anyone with internet access can search through its 347 billion web pages.
Originally kept on digital tapes, the archive’s 22 petabytes of data (one petabyte is one thousand million million bytes) now sit on a huge cluster of Linux nodes. Everything is actually stored twice, “Because we’re paranoid,” Graham told Ars Technica, so around 44 petabytes are scattered across physical data centers around the world, including in San Francisco, Amsterdam, and Egypt’s Library of Alexandria.
But what about ephemeral stuff, like Snapchat? Graham has said the team is working on adapting the Wayback Machine to the new tech environment, but that still leaves a ton of questions unanswered.
What about when government sites are emptied? Has the Wayback Machine ever been involved in court cases to provide digital evidence? What’s the best thing ever found with the Wayback Machine? Will someone be able to dig up my regrettable early Facebook photos?