Capturing Web Content with Archive-It

You know what they say – once you post something online, you can’t take it down. “The internet is forever” – except when it’s not. Ever clicked on a link only to receive the pesky message “404 Error: Page Not Found”? Web records such as websites and social media are only “forever” if they are properly, and promptly, preserved.

Most Alabama state agencies maintain a website so that citizens can access content and get things done online without having to make a call or come by the office. State agencies also use websites and social media to communicate with citizens. These websites and social media pages are updated frequently, however, and may one day disappear. Websites and social media serve state agencies and citizens in the present but may also be of interest to future researchers.

The State Records Commission has identified all state agency websites as permanent records per the Records Disposition Authorities (RDAs). Yet the archivists at the ADAH (talented though we may be) cannot capture the constantly evolving websites of around 200 state agencies. Since 2005, the ADAH has used a service called Archive-It to capture state agency websites.

What is Archive-It?

Archive-It is a subscription-based web archiving service from the Internet Archive, a 501(c)(3) non-profit and digital library. The Internet Archive provides free access to archived websites and other digital artifacts to researchers, historians, and the general public.

The Internet Archive also works with over 600 libraries and other partner organizations to harvest, build, and preserve collections of digital content, such as websites, blogs, and social media sites. The Archive-It service takes “snapshots” of a website’s appearance and top-level content throughout the year through a process called web crawling.

Web crawling: How does it work?

Have you ever wondered how Google provides just the search result you need? Search engines like Google use web crawlers. A web crawler, sometimes called a spider, is software that systematically browses (or “crawls”) and automatically indexes the web.

Web crawlers are always at work. A crawler starts with a targeted or “seed” URL, usually a site’s home page, which serves as its starting address for capturing content. From there, it follows links and extracts data and documents. If the crawler comes across a new webpage, it indexes the page. If the webpage has already been indexed, the crawler determines whether re-indexing is warranted.
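
For readers who like to see the idea in code, here is a minimal sketch of the seed-and-follow-links process described above. It is an illustration only, not how Heritrix or any search engine is actually implemented. The function names, the toy site, and the `agency.example` addresses are all made up for the example, and the "website" is just a small table of pages and the links they contain.

```python
from collections import deque


def crawl(seed, fetch_links, already_indexed=(), max_pages=100):
    """Breadth-first crawl that starts at a seed URL and follows links.

    fetch_links is a hypothetical helper that returns the URLs linked from
    a page; a real crawler would download and parse the page instead.
    """
    indexed = set(already_indexed)
    visited = []      # pages in the order the crawler reached them
    new_pages = []    # pages indexed for the first time on this crawl
    queue = deque([seed])
    queued = {seed}
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        if url not in indexed:
            indexed.add(url)
            new_pages.append(url)
        # A real crawler would decide here whether a known page needs re-indexing.
        for link in fetch_links(url):
            if link not in queued:
                queued.add(link)
                queue.append(link)
    return visited, new_pages


# Toy "website": each page maps to the pages it links to (all hypothetical).
SITE = {
    "https://agency.example/": [
        "https://agency.example/about",
        "https://agency.example/forms",
    ],
    "https://agency.example/about": ["https://agency.example/"],
    "https://agency.example/forms": ["https://agency.example/forms/a.pdf"],
    "https://agency.example/forms/a.pdf": [],
}

visited, new_pages = crawl(
    "https://agency.example/",
    lambda url: SITE.get(url, []),
    already_indexed=["https://agency.example/about"],
)
print(visited)
print(new_pages)
```

In this toy run the crawler reaches all four pages from the single seed, and it treats the "about" page as already indexed rather than new.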

Archive-It uses Heritrix, a web crawler developed by the Internet Archive. Heritrix crawls all the seeds provided by the ADAH simultaneously and copies and saves the information as it goes. Archived websites are stored as “snapshots” but can be read and navigated as if they were live. They are full-text searchable within seven days of capture. The Internet Archive stores a primary and back-up copy at its data centers on multiple servers.

Note: All web crawlers, including Heritrix, fall short of making a complete index. There is no guarantee that documents placed on agency websites will be captured. Documents with a permanent retention must be transmitted to the ADAH separately. 

How does the ADAH use Archive-It?

The ADAH pays a subscription to collect a certain number of URLs. To archive a website, we provide its seed URL. The ADAH crawls the websites and select social media sites of all state agencies, as well as the social media sites of Alabama Representatives and Senators. Social media site crawls occur four times a year, while website crawls occur twice a year.

The ADAH has assigned descriptive metadata to each seed including website name, agency name, and short descriptions to aid access for researchers. The ADAH generates quarterly reports with statistics such as the total number of seeds crawled, the total number of documents crawled, and the total amount of data crawled in bytes.
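
As a rough illustration of the three report statistics named above, the sketch below totals seeds, documents, and bytes from a list of per-seed crawl results. The record layout, the function name, and every number are invented for the example. They do not reflect Archive-It’s actual report format or the ADAH’s real figures.

```python
from dataclasses import dataclass


@dataclass
class CrawlResult:
    """One seed's outcome for a crawl (hypothetical record layout)."""
    seed: str
    documents: int
    size_bytes: int


def summarize(results):
    """Total the three statistics mentioned in the quarterly reports."""
    return {
        "seeds_crawled": len({r.seed for r in results}),
        "documents_crawled": sum(r.documents for r in results),
        "data_crawled_bytes": sum(r.size_bytes for r in results),
    }


# Made-up sample data for illustration only.
sample = [
    CrawlResult("https://agency-one.example/", 1200, 350_000_000),
    CrawlResult("https://agency-two.example/", 800, 150_000_000),
]
report = summarize(sample)
print(report)
```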

How do I access archived websites?

Websites currently preserved by the ADAH are accessible here. If your agency’s website is not being captured, has been redesigned, or its URL has changed, please email a list of the URLs to the following:

Rachel Smith at

Becky Hebert at

Note: Universities and local governments are responsible for archiving snapshots of their own websites.

Imagine surfing circa 1999 and looking back on the Y2K hype, or revisiting an older version of your favorite Web site. Use the Wayback Machine to see billions of archived websites including vintage games, grab original source code from archived web pages, or visit websites that no longer exist. Simply type in a URL, select a date range, and begin surfing.
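
If you would rather skip the search box, Wayback Machine addresses follow a simple pattern: a date stamp followed by the original URL. The short sketch below builds such an address. The function name and the `www.example.gov` address are placeholders chosen for the example, and the Wayback Machine generally serves the capture closest to the requested date, so the page you land on may carry a different date stamp.

```python
from datetime import date


def wayback_url(page_url, when):
    """Build a Wayback Machine address for a page as of a given date."""
    timestamp = when.strftime("%Y%m%d")  # date stamp in YYYYMMDD form
    return f"https://web.archive.org/web/{timestamp}/{page_url}"


# Placeholder URL; substitute the site you want to look up.
example = wayback_url("https://www.example.gov/", date(1999, 12, 31))
print(example)
```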

Preserving Alabama’s Musical Heritage: The Alabama State Council on the Arts Processing Project

In early June 2019, ADAH Appraisal and Collections staff teamed up to process records from the Alabama State Council on the Arts. The records came from the office of Folklife Specialist Steve Grauberger, who served on the Council from 1998 until his retirement in 2017. As part of the Council’s mission to promote arts activities in Alabama, Mr. Grauberger recorded musical performances at churches, rehearsals, conventions, and other cultural events. His most significant contributions are hundreds of hours of field recordings of gospel and other folk music.

Alabama is to gospel what Mississippi is to the blues. No other state has such a rich, active, and well-documented tradition of gospel music. Perhaps the most significant part of this collection is its field recordings from the Sand Mountain and Wiregrass regions of Alabama, two hotbeds of gospel singing. One can hear diverse musical styles in these regions, from bluegrass, to gospel, to blues, to shape-note music.

Shape-Note Music from Wiregrass and Sand Mountain

There are two shape-note systems of singing: four-shape or “Fa-So-La,” and seven-shape or “Do-Re-Mi.” Sacred Harp uses the four-shape system and is performed a cappella (voice only, without instruments). Unlike standard musical notation, Sacred Harp music uses printed shapes – ovals, diamonds, squares, and triangles – to help untrained singers read the music. Sacred Harp singers sit in a square with bass, alto, tenor, and soprano parts facing each other on each side. The National Sacred Harp Singing Convention is held in Alabama every June and draws singers from all over the United States.

Sacred Harp flourishes in both the Wiregrass and Sand Mountain regions of Alabama. Named for its tall native grass, the Wiregrass region in southeast Alabama – particularly the cities of Troy and Ozark – is the epicenter of African-American gospel. Mr. Grauberger recorded extensively in this region.

Seven-shape note music is less well known than Sacred Harp, yet extremely influential. Sand Mountain, a region spanning from northeast Alabama to northwestern Georgia, is a hub for both Sacred Harp and seven-shape note congregational hymn singing. Many popular hymns derive from this tradition, including “I’ll Fly Away,” “Victory in Jesus,” and “Standing on the Promises.” The collection features ample seven-shape field recordings and songbooks.

Gospel, Bluegrass, Blues, and Other Folk Music

Other highlights of the collection include the following field recordings:

  • Jubilee singers in the Jefferson County Quartet tradition, including but not limited to the Ensley Jubilee Singers, the Delta Aires Quartet, and the Shelby County Big Four.
  • The Sullivan Family Band: Margie and Enoch Sullivan and their family of St. Stephens, Alabama, performed bluegrass gospel for over fifty years.
  • Blues guitarist J.W. Warren, blues harmonica player David Johnson, and blues one-man band Sonny Boy King.
  • Fiddler Noah Lacy of Jackson County, Alabama.
  • The Baldwin County Polka Band.
  • Mariachi Garibaldi of Montgomery, Alabama. (Fun Fact: Mariachi Garibaldi performed at the grand opening of the Alabama Department of Archives and History’s “Alabama Voices” exhibition in February of 2014.)

Processing the Collection

Folklife field recordings from the Alabama State Council on the Arts come in a variety of formats, from digital audio tapes (DAT), to cassettes, to compact discs (CDs), with performance recordings from as early as 1927 to as recent as 2015. Most of the materials are audio recordings, but the collection also includes photographs and videos. After conducting an initial survey of the records, ADAH staff divided the field recordings by physical format and arranged them chronologically. We separated out publicity records, including radio shows and other productions created by the Council.

Explore Alabama Folklife

Researchers interested in Alabama folk culture and music can explore the Alabama Department of Archives and History’s Archive of Alabama Folk Culture (AAFC). The AAFC features fieldwork gathered by the Alabama Folklife Association, a partner program of the Alabama State Council on the Arts, and the Alabama Center for Traditional Culture, a division of the Alabama State Council on the Arts.

Researchers can access the most recent materials from the Alabama State Council on the Arts by visiting the Research Room at ADAH. While Reference staff are unable to provide access to some format types (DAT and reel-to-reel recordings, for example), they can provide access to the bulk of the collection. We recommend contacting Reference staff ahead of your visit to ensure that proper equipment is on hand.



Meet the Staff Feature: Michael Grissett

For the Record’s “Meet the Staff” feature is an opportunity for our archivists to connect directly with the community we serve.

Name: Michael Grissett

Title: Records Center Archivist

Specialties: Operational Organization, Procedure and Work Flow Design, Risk Management

How did you end up working at the Alabama Department of Archives and History?

I have had a lifelong fascination with history, geography, and other social sciences. I initially studied engineering at Auburn University but ultimately pursued my passion and graduated with a degree in history. I have been working at the State Records Center since mid-October of 2018, and I think the position is a perfect fit for me.

What is something you enjoy about working in records management?

It’s exciting to work at a critical junction in the operation of state government. While the staff at the Records Center is small relative to the other divisions within the Archives, the work here keeps us on our feet and gives me a firsthand look at the immediate and long-term challenges of preserving paper records.

What exactly does the Records Center do that is different from the bulk of the Records Management Section?

The State Records Center coordinates with state agencies to store, file, re-file, deliver, and destroy temporary records on the agencies’ behalf. The division was established in the late 1980s in response to the lack of temporary records storage options at the state level. In essence, we work with the many records that state agencies may not use on a daily basis but are nonetheless legally required to maintain. Unlike the rest of the Records Management Section, the Records Center directly handles agencies’ temporary records as they are moved and stored, from delivery to disposition.

Doesn’t electronic record creation and storage render your service unnecessary?

One may think that, but many agencies continue to create and retain a large volume of paper records. Electronic records are easier to access and require less physical storage space, but they are more susceptible to security breaches. Paper records maintained at a secure location permit sensitive information to be preserved and accessed when necessary, without the risk of electronic data breaches that can be expensive to prepare for and recover from.  

What are your hobbies when you are not at work?

I like to read about current and past events, cook, play the drums, hang out with friends, and play video games.

Newspaper Preservation

Guest Contributor: Mary Clare Johnson, Collections Archivist, Alabama Department of Archives and History

Many of us collect and keep newspapers and clippings as souvenirs of historical and personal importance; however, these ephemeral objects are not meant to last forever and have an expected lifespan of 50 years or less. They require special care and proper storage to outlast that expected lifespan.

They are usually printed on inexpensive, poor-quality paper made from unpurified wood pulp. This type of paper has a chemically unstable nature that causes it to become discolored, brittle, and acidic over time and to eventually disintegrate. Exposure to light, high humidity, and atmospheric pollutants hastens this disintegration. There are steps you can take, however, to preserve a beloved newspaper and lessen damage.

The first step in preserving your newspaper is to decide whether to store it lying flat, folded or unfolded. When thinking about this decision, consider two questions:

  • Will unfolding pages cause damage along the fold lines?
  • Do you have enough room to store it flat?

Some experts recommend storing it unfolded, while others maintain that it should be folded in half (the way it looks when sold). Do whatever causes the least harm.

When storing your newspapers, avoid using these damaging materials:

  • Paper clips and staples, which rust and leave a stain as they deteriorate
  • Rubber bands, which degrade and stick
  • Glue or tape, as the adhesive will eventually leave stains
  • Lamination, an irreversible process in which the plastic permanently damages your newsprint

Keeping newspapers and clippings in boxes will prevent exposure to dirt, dust, and light, which cause newsprint to darken and become more brittle and the ink to fade over time. The size of the box should be close to the size of the materials it contains. It should not be made of standard cardboard, which tends to be acidic. It should be acid-free, lignin-free, buffered, and have a lid the same depth as the base. Buffered means that an alkaline (non-acidic) buffer has been added to the box to neutralize the acids given off by the newsprint so that the box will last longer. Clearly label the box with the titles and dates of the contents to prevent unnecessary handling.

If saving more than one complete newspaper, have a folder for each one that is acid-free, lignin-free, and buffered. If saving several sheets or numerous clippings, you may need more than one folder because you don’t want to overstuff the folders. In addition, some experts recommend inserting an acid-free, alkaline-buffered sheet of tissue paper between each page for further protection. Keeping pages pressed together with no buffer allows acid to spread and cause further damage to them. A cheaper alternative is acid-free tissue paper with no alkaline buffer. It reduces the risk of increasing the newspaper’s acidity but doesn’t prevent the spread of acid between pages.

Store the boxes in a cool, dry, and dark place in the main part of your house where temperatures and humidity levels stay relatively stable, such as a closet, under your bed, or a file cabinet drawer. Do not place boxes near radiators or vents. Basements, garages, and attics are not suitable because they can experience drastic temperature and humidity swings. Dampness can encourage the growth of mold and attract insects. Heat accelerates the chemical process that causes newsprint to deteriorate.

Routinely check to make sure your storage area is clean and dust-free. The more stable the environment, the longer newsprint will last. Also, make sure your storage box does not include other types of materials, such as letters, photographs, or books. The acidity of newsprint can cause permanent damage and stains to other materials.

Preserving your original newspaper is great, but remember that the content is more important than the object itself. To preserve the content and minimize handling of the original, make a high-resolution scan and store the images on your computer and a USB flash drive. Then you can print copies of the scanned images for everyday use and display. Regular copy/printer paper is more chemically stable and durable and will far outlive newsprint when stored in a stable environment. If you are concerned that scanning the newspaper will cause great harm, a library or archive can help you locate a microfilm copy or digitized version of your paper.

When it comes to display, it is best to frame a copy of your scanned newspaper and not display the original because of the damage caused by sunlight and fluorescent light. If you really want to display the original, it should be framed using acid-free backing board and kept away from windows. The frame should have special glass that blocks harmful ultraviolet (UV) light.

It is important to remember that the inherent acids in newspapers will continue to break them down slowly. If you want to ensure their long-term survival, you can consult a professional paper conservator who can neutralize these harmful acids through a process called deacidification. Available conservators can be found on the American Institute for Conservation website. Keep in mind, however, that their services will likely run into the hundreds of dollars.

While there are many threats to the survival of newsprint, proper preventative measures will help it last for many years.

Below is a list of archival quality supplies:

  • Box for clippings: Gaylord Archival Blue/Grey Barrier Board Flip-Top Document Case
  • Box for folded newspapers: Gaylord Archival Blue/Grey Barrier Board Drop-Front Deep Lid Print Box
  • Box for unfolded newspapers: Gaylord Archival Tan Barrier Board Drop-Front Newspaper/Print Box
  • Folders for clippings: Gaylord Archival Reinforced Full 1” Tab Legal Size File Folders; Gaylord Archival Reinforced Full 1” Tab Letter Size File Folders
  • Folders for folded or unfolded newspapers: Gaylord Archival Oversize Newspaper File Folders
  • Buffered tissue paper: Gaylord Archival Buffered Acid-Free Tissue
  • Unbuffered tissue paper: Gaylord Archival Unbuffered Acid-Free Tissue
  • Frame kit for clippings or: Gaylord Archival Simply Black Collection Wood Frame Kit with 1.25” Molding
  • Preservation kit for folded or unfolded newspapers: Gaylord Archival Newspaper Preservation Kit



For further information on aspects of preservation, here are some resources:

Library of Congress: Collections Care

National Park Service: Conserve O Grams

Northeast Document Conservation Center (NEDCC): Preservation Leaflets

U.S. National Archives and Records Administration: Preservation