Backup and Synchronize with Amazon Glacier and SyncBackPro

Author: Swapna Naraharisetty, 2BrightSparks Pte. Ltd.

Amazon Glacier is a secure, reliable and inexpensive cloud service for data archiving and long-term storage. It is designed primarily for archiving static (or "cold") data that is not changed or accessed for long periods of time, potentially months, years or even decades. Data archived in Glacier is considered immutable: it cannot be modified, moved or copied, only downloaded or deleted. As a result, Glacier storage is not suitable for active data that changes frequently.

The basic unit of storage in Glacier is called an Archive. An archive can represent a single file, or several files can be combined and uploaded as a single archive. Each archive in Glacier has a unique, system-generated identifier (a long string of letters and numbers) that can be used as a key to retrieve the archive in the future.

Archives in Glacier are usually not accessed directly. Uploads to and downloads from Glacier are often done through Amazon S3, which maps the user-defined object name in S3 to the system-generated archive identifier in Glacier.

File backup to Amazon Glacier

SyncBackPro supports file backup to Glacier through Amazon S3 (not directly to Glacier). Once the files are copied to S3, Amazon moves them from S3 to Glacier automatically, based on the lifecycle rules configured for the S3 bucket. Please refer to the Amazon documentation for instructions on how to add a lifecycle configuration rule to a bucket.
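As a minimal illustration (not part of SyncBackPro), the Python sketch below builds a lifecycle rule in the dictionary shape that the AWS SDK for Python (boto3) accepts for `put_bucket_lifecycle_configuration`. The function name, rule ID, the `SyncBackPro/` prefix and the 30-day delay are all hypothetical; the script only prints the configuration and does not contact AWS.

```python
import json


def glacier_lifecycle_rule(prefix: str, days: int, rule_id: str) -> dict:
    """Build a lifecycle configuration that transitions objects under
    `prefix` to the GLACIER storage class `days` days after creation."""
    if days < 0:
        raise ValueError("days must be non-negative")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical values: adjust the prefix and delay to match your profile.
    config = glacier_lifecycle_rule("SyncBackPro/", 30, "archive-to-glacier")
    print(json.dumps(config, indent=2))
    # With boto3 installed and credentials configured, the same dictionary
    # could be applied to a (hypothetical) bucket like this:
    #   boto3.client("s3").put_bucket_lifecycle_configuration(
    #       Bucket="my-backup-bucket", LifecycleConfiguration=config)
```

Filtering on a prefix keeps the rule limited to the folder your profile writes to, so other data in the same bucket stays in its current storage class.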

After an object has been archived in Glacier, its storage class in S3 changes to 'GLACIER', indicating that its contents have been moved to Glacier. An index entry remains in S3, which can be used to update, delete or retrieve the archive in the future.
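The storage class, and the progress of any restore, can be read from the object's metadata in S3. The Python sketch below assumes the response dictionary returned by boto3's `head_object` call, where `StorageClass` is omitted for STANDARD objects and a `Restore` entry (for example `ongoing-request="true"`) appears once a restore has been requested. The function name and the four state labels are hypothetical.

```python
from typing import Mapping


def glacier_state(head: Mapping[str, object]) -> str:
    """Classify an object from a head_object-style response dictionary.

    Returns:
      "online"    - not in the GLACIER storage class; downloadable now
      "archived"  - in GLACIER and no restore has been requested
      "restoring" - a restore was requested but is still in progress
      "restored"  - a temporary copy is available in S3 for download
    """
    # S3 omits StorageClass for STANDARD objects, so default to STANDARD.
    if head.get("StorageClass", "STANDARD") != "GLACIER":
        return "online"
    restore = head.get("Restore")
    if restore is None:
        return "archived"
    if 'ongoing-request="true"' in str(restore):
        return "restoring"
    return "restored"


if __name__ == "__main__":
    samples = [
        {},
        {"StorageClass": "GLACIER"},
        {"StorageClass": "GLACIER", "Restore": 'ongoing-request="true"'},
        {"StorageClass": "GLACIER",
         "Restore": 'ongoing-request="false", expiry-date="Fri, 21 Dec 2012 00:00:00 GMT"'},
    ]
    for head in samples:
        print(glacier_state(head))
```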

File restore from Amazon Glacier

Although Glacier provides a cost-effective solution for storing unlimited data, retrieving data from it is slow and its pricing model is complex. Every restore request to Glacier has a retrieval delay of 3 to 5 hours before the archives are available for download. Please refer to Amazon for Glacier storage and retrieval pricing.

The data retrieval model from Glacier is a two-step process. The first step is to request Glacier to store a temporary copy of data in Amazon S3, and the second step is to download the data from S3 to your destination location.
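For readers scripting the two steps themselves, the sketch below builds the parameters for step 1 in the shape boto3's `restore_object` accepts (`RestoreRequest` with `Days` and `GlacierJobParameters.Tier`). The `Days` value controls how long the temporary copy stays in S3; the Standard tier is the one associated with the multi-hour delay described above. The function name, bucket and key are hypothetical, and the script does not contact AWS.

```python
import json

VALID_TIERS = ("Expedited", "Standard", "Bulk")


def build_restore_request(days: int, tier: str = "Standard") -> dict:
    """Step 1 parameters: ask Glacier to keep a temporary copy in S3
    for `days` days, retrieved using the given tier."""
    if days < 1:
        raise ValueError("days must be at least 1")
    if tier not in VALID_TIERS:
        raise ValueError(f"tier must be one of {VALID_TIERS}")
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}


if __name__ == "__main__":
    print(json.dumps(build_restore_request(7), indent=2))
    # With boto3 and credentials (hypothetical bucket and key):
    # Step 1: request the temporary copy.
    #   s3 = boto3.client("s3")
    #   s3.restore_object(Bucket="my-backup-bucket", Key="docs/report.doc",
    #                     RestoreRequest=build_restore_request(7))
    # Step 2: once head_object reports ongoing-request="false", download it.
    #   s3.download_file("my-backup-bucket", "docs/report.doc", "report.doc")
```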

How to restore Glacier archives using SyncBackPro

1. Run the Amazon S3 profile (the profile initially used to back up files to S3) in Restore mode.

Alternatively, you can create a new profile in SyncBackPro with the Source pointing to the directory where you want to restore files and the Destination pointing to the directory on S3 where your objects are stored. Then run this profile in Restore mode.

2. SyncBackPro sends requests to Glacier to restore the archives to S3, and an entry is recorded in the log file for each object that needs to be restored:

Failed to copy from Source: The file is stored on Glacier. A request has been made to restore the file. Please run the profile again in 3 to 5 hours.

3. After waiting 3 to 5 hours, run the same profile (from step 1) in Restore mode again. SyncBackPro downloads the restored files from S3 to the Source directory specified in your profile, or uses the restored data as the profile requires (e.g. to create a version of a file or rename a file).
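The two profile runs above can be modelled as a small simulation. This is a hypothetical sketch of the behaviour described in the steps, not SyncBackPro's actual code: the class and function names are invented, and `finish_restores` stands in for Amazon completing the restores during the 3 to 5 hour wait.

```python
from dataclasses import dataclass
from typing import List, Set

LOG_MESSAGE = ("Failed to copy from Source: The file is stored on Glacier. "
               "A request has been made to restore the file. "
               "Please run the profile again in 3 to 5 hours.")


@dataclass
class RemoteObject:
    key: str
    in_glacier: bool                 # storage class is GLACIER
    restore_requested: bool = False  # step 1 already sent
    restored: bool = False           # temporary copy now available in S3


def run_restore_profile(objects: List[RemoteObject], log: List[str],
                        local: Set[str]) -> None:
    """One run of a hypothetical profile in Restore mode."""
    for obj in objects:
        if not obj.in_glacier or obj.restored:
            local.add(obj.key)                # step 2: download from S3
        elif not obj.restore_requested:
            obj.restore_requested = True      # step 1: ask Glacier for a temporary copy
            log.append(f"{obj.key}: {LOG_MESSAGE}")
        # Otherwise the restore is still in progress; this sketch skips it.


def finish_restores(objects: List[RemoteObject]) -> None:
    """Simulate Amazon completing the requested restores."""
    for obj in objects:
        if obj.restore_requested:
            obj.restored = True


if __name__ == "__main__":
    objects = [RemoteObject("notes.txt", in_glacier=False),
               RemoteObject("photo.jpg", in_glacier=True)]
    log: List[str] = []
    local: Set[str] = set()
    run_restore_profile(objects, log, local)  # first run: photo.jpg is requested
    finish_restores(objects)                  # ... wait 3 to 5 hours ...
    run_restore_profile(objects, log, local)  # second run: photo.jpg is copied
    print(sorted(local), len(log))
```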

File backup to Amazon Glacier with SyncBackPro versioning

When Versioning (on the Copy/Delete -> Versioning settings page) is enabled, backing up files that are already archived in Glacier is also a two-step process.

In the first step, SyncBackPro sends a request to Glacier to store a temporary copy of the archives in S3. In the second step, it creates a version file from the temporary object (renaming the file and storing it in the versions folder) and replaces the temporary object with the updated file from the Source. Later, the updated file and the version file are pushed to Glacier according to the lifecycle rules set in Amazon S3.
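The versioned update can be sketched the same way. In this hypothetical Python model, `SimulatedBucket` tracks which objects are readable in S3, which are only in Glacier, and which have a restore pending; `update_with_version` requests a restore on the first run and, once the temporary copy exists, saves the old content under a versions path before overwriting it. The `versions/<key>.<stamp>` naming scheme is an assumption for illustration only, and the later lifecycle transition back to Glacier is not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class SimulatedBucket:
    online: Dict[str, bytes] = field(default_factory=dict)   # readable in S3 now
    glacier: Dict[str, bytes] = field(default_factory=dict)  # archived in Glacier only
    pending: Set[str] = field(default_factory=set)           # restore requested, not finished

    def finish_restores(self) -> None:
        """Simulate Amazon placing temporary copies in S3 after 3 to 5 hours."""
        for key in self.pending:
            self.online[key] = self.glacier.pop(key)
        self.pending.clear()


def version_key(key: str, stamp: str) -> str:
    # Hypothetical naming scheme; SyncBackPro's real version file names may differ.
    return f"versions/{key}.{stamp}"


def update_with_version(bucket: SimulatedBucket, key: str,
                        new_data: bytes, stamp: str) -> str:
    """Back up `new_data` to `key`, keeping the previous content as a version."""
    if key in bucket.glacier:
        if key not in bucket.pending:
            bucket.pending.add(key)   # step 1: request a temporary copy in S3
            return "restore requested"
        return "restore in progress"
    if key in bucket.online:
        # step 2a: copy the old content into the versions folder
        bucket.online[version_key(key, stamp)] = bucket.online[key]
    bucket.online[key] = new_data     # step 2b: overwrite with the file from Source
    return "updated"


if __name__ == "__main__":
    bucket = SimulatedBucket(glacier={"report.doc": b"old"})
    print(update_with_version(bucket, "report.doc", b"new", "20240101"))
    bucket.finish_restores()          # ... wait 3 to 5 hours ...
    print(update_with_version(bucket, "report.doc", b"new", "20240101"))
    print(sorted(bucket.online))
```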

How to create versioned backups in Glacier using SyncBackPro

Suppose you have created an Amazon S3 profile (with Versioning enabled) and have already archived a set of files in Glacier in previous profile runs. Now you want to update an archive in Glacier and keep a version of it from before the update:

1. Run the Amazon S3 profile. SyncBackPro sends a request to Glacier to store temporary copies of the archives in S3, and an entry is recorded in the log file for each object that needs to be restored:

Failed to copy from Source: The file is stored on Glacier. A request has been made to restore the file. Please run the profile again in 3 to 5 hours.

2. After 3 to 5 hours, run the same profile again. SyncBackPro then creates a version of the original object in S3 before overwriting it with the data from the Source.

The updated file and the version file are then moved to Glacier automatically, based on the lifecycle rules set for the bucket.

Conclusion

Due to its low cost, Amazon Glacier is a great storage choice for archiving data that is rarely accessed, updated, deleted or retrieved. The drawback is that the retrieval process is slow and complex. If you require fast, frequent access to your data, or want to create versioned backups using SyncBackPro, please consider using other cloud services supported by SyncBackPro (e.g. Amazon S3, Microsoft Azure, Dropbox, Box, OneDrive, Office365, etc.).