Preserving websites to ensure perpetual access

The Internet Archive's Wayback Machine lets you see what websites used to look like, but it can only do that if volunteers actively save those sites and pages. Gary Price explains how to do that.

The entire idea of the Wayback Machine is preservation of that ephemeral entity we call the internet. The Internet Archive has, as of the end of September 2022, archived over 724 billion webpages going back to 1996. As information professionals are all too well aware, websites can disappear in the blink of an eye. Maybe a company, institution, or publisher no longer exists; a product changed or was discontinued; or, on a more sinister note, government censorship resulted in the takedown of a site. Even for sites that continue to exist, the text or images can change over time.

Information professionals can play an important role in preserving history by saving webpages through the Save Page Now (SPN) functionality.

This allows you to capture and archive any webpage, PDF, or material in other formats (assuming it’s not being blocked by the site owner or behind a paywall). You can archive what you’re seeing and know that a copy of it is available in the Wayback Machine. Saving material provides some peace of mind, knowing that what you read and saw is now permanently accessible and stored with a timestamp of when it was archived.

Using Save Page Now also helps create a more robust web archive for all users, not only by adding new material but also by creating new versions of previously archived material, since webpages are constantly updated while remaining at the same URL.

How to use Save Page Now

Archiving using Save Page Now is easy, fast, and free. Simply enter or paste a URL into the Save Page Now box and wait a few moments for the process to complete. You’ll soon see a Wayback Machine URL of the captured/archived page with an embedded timestamp. If you’re a registered Internet Archive user (registration is free and highly recommended), Save Page Now becomes an even more powerful service with these additional options:

  • Save Outlinks

Checking this box will capture all of the outbound links embedded in the page you’ve submitted. So, with a single click the URL you’ve submitted and all of the pages it links to are captured at the same time.

  • Save Error Page

If the page is unavailable, a capture of the error page it returns is made.

  • Save Screen shot

A static screen shot is also made.

  • Save also in my web archive

A copy of the capture is placed in your own web archive, which is included with Internet Archive registration.

  • Please email me the results

A copy of the results is sent to the address used to register.

After clicking the “Save Page” button a window appears that shows each URL as it’s being captured. If a URL is being captured for the first time, that is also noted.
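The timestamp embedded in a capture URL can be read back programmatically, which is handy when citing an archived page. The following is a minimal Python sketch, not an official Internet Archive tool. It assumes capture URLs follow the common pattern web.archive.org/web/ followed by a 14-digit timestamp (year, month, day, hour, minute, second, in UTC) and then the original URL. The function name and the sample URL are hypothetical, chosen only for illustration.

```python
import re
from datetime import datetime, timezone

# Assumed capture URL layout:
#   https://web.archive.org/web/<14-digit timestamp>/<original URL>
# An optional two-letter modifier such as "id_" may follow the timestamp.
WAYBACK_PATTERN = re.compile(
    r"^https?://web\.archive\.org/web/(\d{14})(?:[a-z]{2}_)?/(.+)$"
)


def parse_wayback_url(capture_url):
    """Return (capture time in UTC, original URL) for a Wayback capture URL."""
    match = WAYBACK_PATTERN.match(capture_url)
    if match is None:
        raise ValueError(f"not a recognized Wayback capture URL: {capture_url}")
    stamp, original = match.groups()
    captured_at = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(
        tzinfo=timezone.utc
    )
    return captured_at, original


# Hypothetical capture URL, used only to demonstrate the parsing.
captured_at, original = parse_wayback_url(
    "https://web.archive.org/web/20220930120000/https://example.com/report.pdf"
)
print(captured_at.isoformat())  # 2022-09-30T12:00:00+00:00
print(original)                 # https://example.com/report.pdf
```

Recording both the capture time and the original URL alongside a citation makes it clear which version of a page you consulted.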

Several tools can save you the few seconds it takes to copy and paste a URL into the “Save Page Now” box, such as the Wayback Machine browser extension for Chrome and Firefox. Click “Save Page Now” and the URL currently visible in your browser is transferred to the “Save Page Now” interface. The extension can also redirect you to archived versions of a URL if the live version is unavailable.
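The same kind of lookup, finding an archived version of a URL, can also be scripted. The sketch below uses Python and assumes the Internet Archive's public availability endpoint at archive.org/wayback/available, which is understood to return JSON containing an "archived_snapshots" object with a "closest" entry. The function names and the sample response are illustrative assumptions, and the sample is hand-written rather than real output.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback availability service.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build the lookup address; timestamp is an optional YYYYMMDD[hhmmss] string."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response):
    """Return the archived URL from a parsed response, or None if there is none."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(url, timestamp=None):
    """Ask the service for the closest snapshot (requires network access)."""
    with urlopen(availability_query(url, timestamp), timeout=30) as reply:
        return closest_snapshot(json.load(reply))


# Hand-written sample in the assumed response shape, not real service output.
sample = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "status": "200",
            "timestamp": "20220930120000",
            "url": "http://web.archive.org/web/20220930120000/http://example.com/",
        }
    }
}
print(closest_snapshot(sample))
```

Calling lookup("example.com") would perform the live request. It is left uncalled here so the example runs without a network connection.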

Saving a large number of pages

If you have a large number of URLs you want to archive, a batch processing option might be helpful. It couldn't be easier to use.

You’ll need your Internet Archive login and access to Google Sheets. Then place the URLs into a Google Sheet, one per line.

  • Click “Archive URLs.” At this point, Google will ask for your permission to use the Internet Archive service.
  • Depending on server load, you should soon see the service begin to archive each URL.
  • Soon thereafter, the spreadsheet is updated with the archived URL, the number of additional URLs (outlinks) archived at the same time (if that option was selected), and whether this is the first archive of the URL. Note: The “Check the URLs are Available in the Live Web” option can be used to verify whether a URL is available without archiving it.
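Before pasting a long list into the sheet, it can help to tidy it first. The Python sketch below is one possible approach and is not part of the Internet Archive service: it trims whitespace, skips blank lines, drops duplicates, and sets aside entries that do not look like full http or https addresses. The function name and the sample URLs are hypothetical, and the batch service may apply its own checks that differ from these.

```python
from urllib.parse import urlsplit


def prepare_url_list(raw_lines):
    """Return (cleaned, rejected): unique web URLs in order, plus skipped entries."""
    seen = set()
    cleaned = []
    rejected = []
    for line in raw_lines:
        candidate = line.strip()
        if not candidate:
            continue  # ignore blank lines
        try:
            parts = urlsplit(candidate)
        except ValueError:
            rejected.append(candidate)  # malformed address
            continue
        if parts.scheme not in ("http", "https") or not parts.netloc:
            rejected.append(candidate)  # missing scheme or host
            continue
        if candidate in seen:
            continue  # exact duplicate of an earlier entry
        seen.add(candidate)
        cleaned.append(candidate)
    return cleaned, rejected


# Hypothetical input, as it might be copied from a document or bookmarks file.
lines = [
    "https://example.com/a",
    "  https://example.com/a  ",
    "",
    "example.com/b",
    "http://example.org/",
]
cleaned, rejected = prepare_url_list(lines)
print("\n".join(cleaned))  # one URL per line, ready to paste into the sheet
print("Needs attention:", rejected)
```

Entries in the rejected list usually only need "https://" added in front before they can be archived.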

Bottom Line

Given the growing amount of material on the web, its ephemeral nature, other influences, and the fact that the Wayback Machine is often the only option we have to access archived web content, Save Page Now is an important tool not only for individual researchers but also for all of us who want to contribute to creating a permanent record of the web. Librarians and other information professionals should consider it a basic mission to actively incorporate preserving the web into their daily professional lives.