We have had many people contact us to ask if we are going to add support for Glacier, often with the comment that Glacier is probably very similar to the existing Amazon S3 service, and presuming it will be an easy thing to do.
Yes, we are looking at Glacier, but no, it is not similar to S3. Let me explain why.
First off, Glacier has clearly been designed primarily for archiving and not backup. What is the difference between archiving and backup? The main difference is that a backup usually deals with active data, i.e. files that are changing. Archiving is for securely storing static files (i.e. files that do not change) for potentially long periods of time (think years or decades). In Glacier terminology, the items you upload are referred to as archives, not files or objects.
In Glacier you cannot modify, move, or copy any file (archive) stored within it. Once you upload something to Glacier it can only be downloaded or deleted. That means the archive is immutable.
When data is uploaded to Glacier (as an archive) it is not given a filename or stored in a virtual file-system, i.e. there is no path. Once an archive has been uploaded you are given a unique ID, which is just a very long string of letters and numbers that has no meaning at all except as the key to retrieving that archive in future. So, unlike S3, Glacier cannot be used to host web sites, for example. It also means you cannot access the archives using a browser.
As Glacier identifies archives only by ID, it is not practical for different applications to share the archives they store, because there is no standard way to say what an ID refers to, e.g. the original filename. You could download an archive from Glacier, but all you would know about it is a meaningless ID. There is a way to store a description with an archive in Glacier, but that description is just free-form text and so could be anything at all. The application that uploaded the archive decides what it means.
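To make that concrete, here is a minimal sketch of the kind of bookkeeping an application would have to do for itself. Nothing here talks to Glacier: the ArchiveCatalogue class, the make_description and parse_description helpers, and the JSON layout of the description are all assumptions made up for illustration, not part of any Amazon API or of our own software. The idea is simply to keep a local index from original file paths to archive IDs, and to put the same details into the free-form description so that the application which wrote the archive can recognise it later.

```python
import json
import sqlite3


class ArchiveCatalogue:
    """Hypothetical local index mapping original file paths to opaque archive IDs."""

    def __init__(self, db_path=":memory:"):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS archives ("
            "archive_id TEXT PRIMARY KEY, path TEXT NOT NULL, size INTEGER NOT NULL)"
        )

    def record(self, archive_id, path, size):
        """Remember which file an archive ID refers to, right after the upload."""
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT INTO archives VALUES (?, ?, ?)", (archive_id, path, size)
            )

    def find(self, path):
        """Return every archive ID recorded for a path, oldest first."""
        rows = self.db.execute(
            "SELECT archive_id FROM archives WHERE path = ? ORDER BY rowid", (path,)
        ).fetchall()
        return [row[0] for row in rows]


def make_description(path, size):
    """Encode file details as JSON (an assumed convention) for the description field."""
    return json.dumps({"path": path, "size": size}, separators=(",", ":"))


def parse_description(text):
    """Return the decoded details, or None if another application wrote the text."""
    try:
        data = json.loads(text)
    except ValueError:
        return None
    return data if isinstance(data, dict) and "path" in data else None
```

A description written by some other tool (say, plain text such as "holiday photos") simply comes back as None from parse_description, which is exactly the interoperability problem described above.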
Now to the most significant difference: if you want to retrieve an archive from Glacier it can take hours to get it. Yes, hours. It’s actually an asynchronous process, i.e. the application must ask for the archive (using its unique archive ID) and then it can go and do something else. Eventually a message is sent from Glacier telling the application that the archive can now be downloaded. Getting a list of the available archives goes through the same very slow process. That means that unless the application caches all the archive IDs locally, simply getting a list of files can also take hours.
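The shape of that request-then-wait flow can be sketched as follows. This is only an illustration under stated assumptions: FakeVault is a made-up in-memory stand-in, not a real Glacier client, and its method names (initiate_retrieval, job_completed, download) are hypothetical. The sketch also polls for completion rather than waiting for the message described above, purely because polling is easier to show in a few lines.

```python
import time


class FakeVault:
    """Hypothetical in-memory stand-in for the service; not a real client."""

    def __init__(self, archives, polls_until_ready=3):
        self._archives = dict(archives)
        self._polls_until_ready = polls_until_ready
        self._jobs = {}  # job_id -> [archive_id, polls remaining]

    def initiate_retrieval(self, archive_id):
        """Ask for an archive; returns a job ID immediately, not the data."""
        if archive_id not in self._archives:
            raise KeyError(archive_id)
        job_id = f"job-{len(self._jobs) + 1}"
        self._jobs[job_id] = [archive_id, self._polls_until_ready]
        return job_id

    def job_completed(self, job_id):
        """Pretend the job gets one step closer to ready each time we ask."""
        job = self._jobs[job_id]
        if job[1] > 0:
            job[1] -= 1
        return job[1] == 0

    def download(self, job_id):
        """Hand over the bytes, but only once the job has completed."""
        archive_id, remaining = self._jobs[job_id]
        if remaining > 0:
            raise RuntimeError("retrieval job has not completed yet")
        return self._archives[archive_id]


def retrieve(vault, archive_id, poll_interval=900.0, max_polls=100, sleep=time.sleep):
    """Request an archive, wait until the job completes, then download it."""
    job_id = vault.initiate_retrieval(archive_id)
    for _ in range(max_polls):
        if vault.job_completed(job_id):
            return vault.download(job_id)
        sleep(poll_interval)  # a real application would do other work here
    raise TimeoutError(f"{job_id} did not complete after {max_polls} polls")
```

In a real application the wait between asking and downloading is measured in hours, so the job ID would need to be saved somewhere durable in case the program is closed in the meantime.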
Glacier is perfect for archiving. For example, everyone has lots and lots of files that they rarely use but need to keep a copy of (legal documents, personal photos, etc.). With Glacier you could move those files off your computer or server and into Glacier. You then have a safe off-site archive of those files. Amazon states:
Amazon Glacier is designed to provide average annual durability of 99.999999999% for an archive. The service redundantly stores data in multiple facilities and on multiple devices within each facility. To increase durability, Amazon Glacier synchronously stores your data across multiple facilities before returning SUCCESS on uploading archives. Glacier performs regular, systematic data integrity checks and is built to be automatically self-healing.
Thanks for reading, and I hope you’re still awake! Now we want to hear from you. What would you use Glacier to store? How would you use it? What sort of functionality would you expect an application using Glacier to support? Add your comments and thoughts to this thread, or email archive [at] 2brightsparks.com