Capturing Web Content with Archive-It

You know what they say – once you post something online, you can’t take it down. “The internet is forever” – except when it’s not. Ever clicked on a link only to receive the pesky message “404 Error: Page Not Found”? Web records such as websites and social media are only “forever” if they are properly, and promptly, preserved.

Most Alabama state agencies maintain a website so that citizens can access content and get things done online without having to make a call or come by the office. State agencies also use websites and social media to communicate with citizens. These websites and social media pages are updated frequently, however, and may one day disappear. Websites and social media serve state agencies and citizens in the present but may also be of interest to future researchers.

The State Records Commission has identified all state agency websites as permanent records per the Records Disposition Authorities (RDAs). Yet the archivists at the ADAH (talented though we may be) cannot capture the constantly evolving websites of around 200 state agencies. Since 2005, the ADAH has used a service called Archive-It to capture state agency websites.

What is Archive-It?

Archive-It is a subscription-based web archiving service from the Internet Archive, a 501(c)(3) non-profit and digital library. The Internet Archive provides free access to archived websites and other digital artifacts to researchers, historians, and the general public.

The Internet Archive also works with over 600 libraries and other partner organizations to harvest, build, and preserve collections of digital content, such as websites, blogs, and social media sites. The Archive-It service takes “snapshots” of a website’s appearance and top-level content throughout the year through a process called web crawling.

Web crawling: How does it work?

Have you ever wondered how Google provides just the search result you need? Search engines like Google use web crawlers. A web crawler, sometimes called a spider, is software that systematically browses (or “crawls”) and automatically indexes the web.

Web crawlers are always at work. A crawler starts with a targeted, or “seed,” URL. The seed, usually a website’s home page, is the crawler’s starting address for capturing content. From there, the crawler follows links and extracts data and documents. If it comes across a new webpage, it indexes the page. If the webpage has already been indexed, the crawler determines whether re-indexing is warranted.
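For readers who like to see the idea in code, here is a minimal sketch of a crawl that starts at a seed and follows links. It is only an illustration and is not how Heritrix or Archive-It is implemented. The `crawl` function, the `LinkExtractor` class, the `SITE` dictionary, and the `agency.example` addresses are all hypothetical names invented for this example. The "website" is a small in-memory stand-in so the sketch runs without touching the internet. As a simplification, a page that has already been seen is skipped rather than checked for re-indexing.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from `seed`, staying on the seed's host.

    `fetch` is a hypothetical callable that takes a URL and returns the
    page's HTML, or None if the page cannot be retrieved.
    Returns a dict mapping each captured URL to its HTML.
    """
    host = urlparse(seed).netloc
    queue = deque([seed])
    seen = {seed}          # URLs already queued, so none is visited twice
    captured = {}

    while queue and len(captured) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:   # a dead link: nothing to capture
            continue
        captured[url] = html

        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" suffixes.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return captured


# Assumed stand-in for a live website: URL -> HTML.
SITE = {
    "https://agency.example/": '<a href="/about">About</a> <a href="/forms#top">Forms</a>',
    "https://agency.example/about": '<a href="/">Home</a> <a href="https://other.example/">Partner</a>',
    "https://agency.example/forms": '<a href="/missing">Old link</a>',
}

snapshot = crawl("https://agency.example/", SITE.get)
print(sorted(snapshot))
```

In this toy run the crawler captures the home page and the two pages it links to, ignores the link to a different host, and quietly skips the dead link. That last behavior mirrors why no crawl can promise a complete capture.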

Archive-It uses Heritrix, a web crawler developed by the Internet Archive. Heritrix crawls all the seeds provided by the ADAH simultaneously, copying and saving the information as it goes. Archived websites are stored as “snapshots” but can be read and navigated as if they were live. They are full-text searchable within seven days of capture. The Internet Archive stores a primary and back-up copy at its data centers on multiple servers.

Note: All web crawlers, including Heritrix, fall short of making a complete index. There is no guarantee that documents placed on agency websites will be captured. Documents with a permanent retention must be transmitted to the ADAH separately. 

How does the ADAH use Archive-It?

The ADAH pays a subscription to collect a certain number of URLs. To archive a website, we provide its seed URL. The ADAH crawls the websites and select social media sites of all state agencies, as well as the social media sites of Alabama Representatives and Senators. Social media crawls occur four times a year, while website crawls occur twice a year.

The ADAH has assigned descriptive metadata to each seed including website name, agency name, and short descriptions to aid access for researchers. The ADAH generates quarterly reports with statistics such as the total number of seeds crawled, the total number of documents crawled, and the total amount of data crawled in bytes.

How do I access archived websites?

Websites currently preserved by the ADAH are accessible here. If your agency’s website is not being captured, has been redesigned, or its URL has changed, please email a list of the URLs to the following:

Rachel Smith at

Becky Hebert at

Note: Universities and Local Governments are responsible for archiving snapshots of their own websites.

Imagine surfing circa 1999 and looking back on the Y2K hype, or revisiting an older version of your favorite Web site. Use the Wayback Machine to see billions of archived websites including vintage games, grab original source code from archived web pages, or visit websites that no longer exist. Simply type in a URL, select a date range, and begin surfing.

Newspaper Preservation

Guest Contributor: Mary Clare Johnson, Collections Archivist, Alabama Department of Archives and History

Many of us collect and keep newspapers and clippings as souvenirs of historical and personal importance; however, these ephemeral objects are not meant to last forever and have an expected lifespan of 50 years or less. They require special care and proper storage to outlast that lifespan.

Newspapers are usually printed on inexpensive, poor-quality paper made from unpurified wood pulp. This type of paper is chemically unstable, which causes it to become discolored, brittle, and acidic over time and eventually to disintegrate. Exposure to light, high humidity, and atmospheric pollutants hastens this disintegration. There are steps you can take, however, to preserve a beloved newspaper and lessen damage.

The first step in preserving your newspaper is to decide whether to store it folded or unfolded. In either case, it should lie flat. When making this decision, consider two questions:

  • Will unfolding pages cause damage along the fold lines?
  • Do you have enough room to store it flat?

Some experts recommend storing it unfolded, while others maintain that it should be folded in half (the way it looks when sold). Do whatever causes the least harm.

When storing your newspapers, avoid using these damaging materials:

  • Paper clips and staples, which rust and leave a stain as they deteriorate
  • Rubber bands, which degrade and stick
  • Glue or tape, as the adhesive will eventually leave stains
  • Lamination, which is irreversible and will permanently damage your newsprint

Keeping newspapers and clippings in boxes will prevent exposure to dirt, dust, and light, which cause newsprint to darken and become more brittle and the ink to fade over time. The size of the box should be close to the size of the materials it contains. It should not be made of standard cardboard, which tends to be acidic. It should be acid-free, lignin-free, buffered, and have a lid the same depth as the base. Buffered means that an alkaline (non-acidic) buffer has been added to the box to neutralize the acids given off by the newsprint so that the box will last longer. Clearly label the box with the titles and dates of the contents to prevent unnecessary handling.

If saving more than one complete newspaper, have a folder for each one that is acid-free, lignin-free, and buffered. If saving several sheets or numerous clippings, you may need more than one folder because you don’t want to overstuff the folders. In addition, some experts recommend inserting an acid-free, alkaline-buffered sheet of tissue paper between each page for further protection. Keeping pages pressed together with no buffer allows acid to spread and cause further damage to them. A cheaper alternative is acid-free tissue paper with no alkaline buffer. It reduces the risk of increasing the newspaper’s acidity but doesn’t prevent the spread of acid between pages.

Store the boxes in a cool, dry, and dark place in the main part of your house where temperatures and humidity levels stay relatively stable, such as a closet, under your bed, or a file cabinet drawer. Do not place boxes near radiators or vents. Basements, garages, and attics are not suitable because they can experience drastic temperature and humidity swings. Dampness can encourage the growth of mold and attract insects. Heat accelerates the chemical process that causes newsprint to deteriorate.

Routinely check to make sure your storage area is clean and dust-free. The more stable the environment, the longer newsprint will last. Also, make sure your storage box does not include other types of materials, such as letters, photographs, or books. The acidity of newsprint can cause permanent damage and stains to other materials.

Preserving your original newspaper is great, but remember that the content is more important than the object itself. To preserve the content and minimize handling of the original, make a high-resolution scan and store the images on your computer and a USB flash drive. Then you can print copies of the scanned images for everyday use and display. Regular copy/printer paper is more chemically stable and durable and will far outlive newsprint when stored in a stable environment. If you are concerned that scanning the newspaper will cause great harm, a library or archive can help you locate a microfilm copy or digitized version of your paper.

When it comes to display, it is best to frame a copy of your scanned newspaper and not display the original because of the damage caused by sunlight and fluorescent light. If you really want to display the original, it should be framed using acid-free backing board and kept away from windows. The frame should have special glass that blocks harmful ultraviolet (UV) light.

It is important to remember that the inherent acids in newspapers will continue to break them down slowly. If you want to ensure their long-term survival, you can consult a professional paper conservator who can neutralize these harmful acids through a process called deacidification. Available conservators can be found on the American Institute for Conservation website. Keep in mind, however, that their services will likely run into the hundreds of dollars.

While there are many threats to the survival of newsprint, proper preventative measures will help it last for many years.

Below is a list of archival quality supplies:

  • Box for clippings: Gaylord Archival Blue/Grey Barrier Board Flip-Top Document Case
  • Box for folded newspapers: Gaylord Archival Blue/Grey Barrier Board Drop-Front Deep Lid Print Box
  • Box for unfolded newspapers: Gaylord Archival Tan Barrier Board Drop-Front Newspaper/Print Box
  • Folders for clippings: Gaylord Archival Reinforced Full 1” Tab Legal Size File Folders; Gaylord Archival Reinforced Full 1” Tab Letter Size File Folders
  • Folders for folded or unfolded newspapers: Gaylord Archival Oversize Newspaper File Folders
  • Buffered tissue paper: Gaylord Archival Buffered Acid-Free Tissue
  • Unbuffered tissue paper: Gaylord Archival Unbuffered Acid-Free Tissue
  • Frame kit for clippings or: Gaylord Archival Simply Black Collection Wood Frame Kit with 1.25” Molding
  • Preservation kit for folded or unfolded newspapers: Gaylord Archival Newspaper Preservation Kit


For further information on aspects of preservation, here are some resources:

Library of Congress: Collections Care

National Park Service: Conserve O Grams

Northeast Document Conservation Center (NEDCC): Preservation Leaflets

U.S. National Archives and Records Administration: Preservation

Preserving Historic Ledgers and Books

Guest Contributor: Keri Hallford, Collections Archivist, Alabama Department of Archives and History

Are you considering wrapping books in your agency’s collection? Keeping bound records was once an easy and reliable way to reference important information quickly. In the digital age, however, this method is becoming outmoded, and books often fall into disrepair. As bound records become more delicate and harder to care for, some archivists choose to wrap books and ledgers to protect these aging materials.


Before you wrap your books, there are several questions that you need to consider:

  • Are you trying to prevent damage caused by friction as books are placed on or removed from shelves?
  • Have the cover and/or multiple pages detached?
  • Are you trying to keep a book from becoming dusty or dirty?
  • Is the book’s leather binding producing a fine powder, referred to as “red rot”? (Note: Red rot has certain health dangers associated with it, so please proceed with caution!)
  • Do your materials need water protection that your shelves are not supplying?

If your answer to any of these questions is yes, then you may want to wrap your books. Contemplate your budget for a wrapping project. Will this be an ongoing initiative? Are you only wrapping books on an “as-needed” basis? Will you do just a few books, or rows upon rows? The supply costs can add up over time.


You can tie broken books together with cotton tying tape to hold detached pieces in place until the larger binding issues can be addressed. Be sure not to draw the tie too loose or too tight, as either may cause damage to the book.

There are several materials that we recommend for wrapping. From least durable to most durable, you can use archival wrapping paper (like a craft paper in consistency, but better for the item); folder stock; or a spunbonded polyethylene, fabric-like material called Tyvek. Tyvek is chemically inert, allows the books to breathe, and is water resistant, which may help protect an item that isn’t shielded by its shelving if there’s a water leak.


Tyvek has a shiny side and a soft, matte side. Be sure to use it with the shiny side out. Think carefully about how often your books are going to be used in the future. If a book sits on the shelf most of the time, then you probably won’t need this more durable material.

To wrap a book, use the book itself as a template. Cut two strips of your chosen material to fit the length, width, and height of the book. The two strips should lie across each other perpendicularly.

Secure one strip with at least two pieces of hook and loop material (such as Velcro), and then secure the other strip in a similar fashion.


Write proper identifying information on the spine or wherever it can easily be viewed in your storage area. You may use a pencil or, for a more permanent mark, a Pigma Micron pen.

Before you close the book, slip an identification paper into the book so it can be identified if its wrapper is misplaced.

To shelve your book, consider the size of the volume and the size of the shelf. Very heavy and large books should be laid on their sides. Never pile up so many books that the bottom volume is impossible to move and its spine warps with the weight. If possible, do not allow a book to overhang its shelf. Serious damage may occur over time, especially when an item can be accidentally struck by people walking past the shelf.

Below is a handy rubric of archival quality supplies and companies from which you can purchase them. As with any supply company, buying in bulk will help save money.

  • Wrapping paper and folder stock: Perma Dur; Folder Stock
  • Cotton tape: Unbleached Cotton Tying Tape; Unbleached Tape; Cotton Tying Tape (100 yds); Cotton Tying Tape (1,000 yds)
  • Hook and loop: Velcro Velcoin
  • Pigma pen: Pigma Pens; Pigma Pen; Pigma Micron

Books and ledgers remain crucial resources, even in the Web 2.0 era. They provide both intrinsic and extrinsic information, ranging from the actual content of the text to features like binding, flyleaves, watermarks, margin notes, page layout, and the ink and script used by the creator. By taking measures to protect bound records in need of extra care, you can keep these items available to researchers for years to come.