Archiving vs Backup: Why They're Not the Same Thing
In casual conversation, "backup" and "archive" get used as if they mean the same thing. They do not. Both involve copying data to a second location, but they exist for different reasons, follow different rules, and store data on different kinds of media for different lengths of time. Treating one as the other is a habit that quietly costs money, recovery time, or both, and the bill usually arrives at the worst possible moment.
A backup is a short-cycle copy of live data that is held against the day something goes wrong with the original: an accidental delete, a corrupt file, a failed drive, a ransomware attack, a stolen laptop. Its job is to put you back where you were yesterday, or last week, with as little fuss as possible. An archive is a different animal. It is a long-term store of data that you are no longer working with day to day, but that you still need to keep, perhaps because the law says so, perhaps because the project might be revisited, perhaps because the data has historical or sentimental value. Where a backup is read mostly during emergencies, an archive may sit untouched for years.
This article works through the practical differences, the most common ways people get the two confused, and how to set up SyncBack profiles so each job is doing what it was designed for.
TL;DR
A backup is a recent, frequently refreshed copy used to recover live data after loss or damage. An archive is a long-term store of data that is no longer in active use but still needs to be kept. They have different retention, different access patterns, and belong on different storage. Use both, but keep them separate.
Table of Contents
- What a Backup Actually Is
- What an Archive Actually Is
- The Differences Side by Side
- How People Get This Wrong (And What It Costs)
- Real-World Examples
- How SyncBack Handles Both Jobs
- A Practical Combined Workflow
- Frequently Asked Questions
- Conclusion
What a Backup Actually Is
A backup, in any practical sense, is a copy of files that you still depend on. The whole point of running a backup is to give yourself somewhere to go back to when something happens to the working copy. The threats are well known and depressingly common: a user deletes the wrong folder, a piece of software writes garbage into a database, a hard drive fails, a laptop is stolen, ransomware encrypts a share, a faulty power supply takes out a NAS shelf. In each case the working copy becomes useless, and the only question that matters is how recent and how complete your fallback is.
Because backups exist to handle recent damage, they have to be recent themselves. A backup taken six months ago is almost worthless to a business that has been writing to its database every day since. Two metrics come up over and over in backup planning. The first is RPO, the recovery point objective: how much new data you can afford to lose between backups. The second is RTO, the recovery time objective: how long you can afford to be down while you restore. For most home users an RPO of one day and an RTO of a few hours is fine. For a busy retail database, both numbers may need to be measured in minutes. The common thread is that both numbers are short. The whole shape of a backup system, including how often it runs, how it is stored, and how restores are tested, falls out of those two numbers.
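To make the two numbers concrete, here is a minimal sketch in Python (the function and parameter names are ours, purely for illustration): given how often a backup runs and how long a test restore took, it checks whether a schedule actually meets its targets.

```python
from datetime import timedelta

def meets_targets(backup_interval: timedelta,
                  measured_restore: timedelta,
                  rpo: timedelta, rto: timedelta) -> bool:
    """Worst-case data loss is one full backup interval, because a
    failure can strike just before the next backup would have run."""
    return backup_interval <= rpo and measured_restore <= rto

# A nightly backup with a tested three-hour restore, against a
# one-day RPO and a four-hour RTO: this schedule passes.
print(meets_targets(timedelta(days=1), timedelta(hours=3),
                    rpo=timedelta(days=1), rto=timedelta(hours=4)))
```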
A good backup also keeps multiple versions of the same file. If a document was corrupted three days ago and the damage is only noticed today, a single most-recent copy will not save you. SyncBack's versioning settings exist for exactly this reason: they let you walk back through prior versions of a file rather than only the latest. Versioning is what makes a backup a backup, rather than a mirror of whatever bad state your files happen to be in this morning. Without it, a synchronisation tool simply propagates today's mess to a second location and calls it protection.
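SyncBack handles versioning internally, but the underlying idea fits in a few lines. This toy sketch (the naming scheme is ours, not SyncBack's) keeps a timestamped copy on every run instead of overwriting the previous one, which is precisely the difference between a backup and a mirror:

```python
import shutil
from datetime import datetime
from pathlib import Path

def versioned_copy(src: Path, dest_dir: Path) -> Path:
    """Copy src into dest_dir under a timestamped name, so each run
    adds a restorable version rather than replacing the last one."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%dT%H%M%S")
    target = dest_dir / f"{src.stem}.{stamp}{src.suffix}"
    shutil.copy2(src, target)   # copy2 also preserves file timestamps
    return target

# versioned_copy(Path("report.docx"), Path("D:/backup/report-versions"))
```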
The storage that holds a backup needs to be fast enough to read and write at the rate the schedule demands, reliable enough to be trusted, and ideally separate enough from the original that one disaster does not take out both. External drives, NAS shelves, and warm cloud tiers all fit the role. Cold archive tiers usually do not, because the access pattern of a backup is "frequent writes, occasional restores, fast turnaround", which is the opposite of what cold storage is built for. The benchmark most professionals work to is the 3-2-1-1-0 backup rule: at least three copies, on two kinds of media, with one offsite, one offline or immutable, and zero verification errors. That rule is written for backups, and an archive should respect it as well, but the two need to be planned independently.
One more thing worth saying clearly. Individual backup copies are not meant to live forever. A typical backup retention policy might keep daily versions for 30 days, weekly versions for three months, and monthly versions for a year, with older copies falling off the end of the rotation. That fall-off is the design, not a flaw. The job of a backup is to cover recent change. The job of keeping things forever belongs somewhere else.
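That rotation can be expressed as a simple rule. The sketch below mirrors the example policy above (dailies for 30 days, weeklies for roughly three months, monthlies for a year); the exact tier boundaries are our simplification, and real tools, SyncBack included, make them configurable:

```python
from datetime import date, timedelta

def keep(backup_date: date, today: date) -> bool:
    """Decide whether a dated backup survives the rotation.
    Dailies for 30 days, Sunday copies for ~3 months,
    first-of-month copies for a year; everything else falls off."""
    age = (today - backup_date).days
    if age <= 30:
        return True                                   # daily tier
    if age <= 90 and backup_date.isoweekday() == 7:
        return True                                   # weekly tier
    if age <= 365 and backup_date.day == 1:
        return True                                   # monthly tier
    return False                                      # off the end of the rotation

today = date(2025, 6, 1)
survivors = [today - timedelta(days=n) for n in range(400)
             if keep(today - timedelta(days=n), today)]
```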
What an Archive Actually Is
An archive solves a different problem. The data in an archive is no longer being actively edited or accessed, but it cannot simply be deleted. Maybe a regulator says you have to keep seven years of accounting records. Maybe the photographer's wedding shoot from 2018 is finished, the files have been delivered to the client, the prints have shipped, and there is no reason for the working copies to clutter the production drive, but those files still need to exist somewhere. Maybe a research project ended, the paper was published, and the raw data has to be preserved in case of audit or replication. Maybe a film production wrapped and the dailies have to live somewhere safe for the next twenty years in case a sequel goes into pre-production. In all of these cases the data has shifted from "live" to "kept", and that shift is the whole point of archiving.
Because the data is no longer active, an archive is read rarely. Once or twice a year is normal. Once in a decade is not unusual. This changes the storage maths completely. Where a backup destination needs to be reasonably fast and online all the time, an archive can sit on slow, cheap, or even offline media: LTO tape, optical, an external drive in a fire safe, a cold cloud tier such as Amazon S3 Glacier or Azure Archive Storage. Retrieval may take hours and may carry a per-gigabyte fee, and that is acceptable because retrieval is exceptional. The cloud archive storage classes guide explains the trade-offs between the various tiers if you are weighing up where to put a long-term store.
Retention works differently too. A backup retention policy talks about days, weeks, and months. An archive retention policy talks about years, often tied to a specific event such as the closure of a legal matter, the end of a tax period, the completion of a project, or the death of a copyright holder. The clock on an archive item starts ticking when the event happens, not when the file was created. Many archives are also expected to be immutable: once written, the data must not be changed or deleted by anyone, including administrators, until the retention window expires. This is what tools like S3 Object Lock, write-once-read-many (WORM) storage, and tape with the write tab broken off are for. Some industries (financial services and healthcare are the obvious examples) have explicit regulatory requirements for that immutability, and meeting them with a regular backup destination is difficult.
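To show what immutability looks like in practice, here is a hedged example using Amazon S3 Object Lock through boto3. The bucket and key names are placeholders, and the bucket must have been created with Object Lock enabled; in COMPLIANCE mode not even the account root user can delete the object or shorten the lock before the retention date passes:

```python
from datetime import datetime, timedelta, timezone
import boto3

s3 = boto3.client("s3")

# Roughly seven years of statutory retention from today.
retain_until = datetime.now(timezone.utc) + timedelta(days=7 * 365)

with open("fy2023-export.tar", "rb") as body:
    s3.put_object(
        Bucket="example-archive-bucket",          # placeholder name
        Key="finance/fy2023/fy2023-export.tar",
        Body=body,
        ObjectLockMode="COMPLIANCE",              # no deletes, no overwrites
        ObjectLockRetainUntilDate=retain_until,   # until this date passes
    )
```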
There is one more property that matters, and it tends to surprise people the first time they think about it. An archive is often the only remaining copy of the data. The whole reason to archive a finished project is to get it off the working drive and free up space. Once the originals are deleted, the archive is no longer a copy of anything. It is the source of truth. That has consequences for how an archive should be created, how it should be verified, and how many copies of it should exist. An archive that lives as a single copy on a single tape in a single building is not an archive, it is a future incident report.
The Differences Side by Side
Putting backups and archives next to each other in a single table makes the contrast easier to see at a glance. The columns below are the questions that usually decide which storage tier and which retention policy a given pile of data belongs in.
| Property | Backup | Archive |
|---|---|---|
| Purpose | Recover live data after loss or damage | Keep finished or inactive data for the long term |
| Source data | Currently in use | No longer in active use |
| Typical age of data | Days to a few months | Years to decades |
| Retention period | Days, weeks, a few months, sometimes a year | Years, often set by law or contract |
| Access frequency | Whenever a restore is needed | Rare, sometimes never |
| Acceptable restore time | Minutes to hours | Hours to days |
| Storage tier | Warm: NAS, external drives, warm cloud buckets | Cold or immutable: tape, optical, archive cloud tiers, WORM |
| Indexing | By date and source path | By project, matter, client, year, or other business key |
| Original still in production? | Yes, the live copy is still authoritative | Often deleted; the archive is the source of truth |
| Versioning | Multiple versions of each file | One canonical copy, immutable |
How People Get This Wrong (And What It Costs)
Most teams know in the abstract that backups and archives are different. The trouble starts when budget, time or storage gets tight and someone proposes a shortcut. Three patterns come up again and again.
Treating the Backup as an Archive
The most common version of this is the small business that decides "we will just keep every backup forever, and that way we have the archive too". On paper it sounds efficient. In practice it is two failures stacked on top of each other. The first failure is cost and management. Backup destinations are sized for recent data, not decades of accumulated history. Keeping every nightly copy of every file forever turns a sensibly sized backup volume into a runaway storage bill. Catalogues balloon, restore browsers slow to a crawl, and the backup software starts spending most of its time managing version graveyards rather than backing things up.
The second failure is retrieval. When an auditor asks for a single contract from 2019, the team has to dig through hundreds of backup sets to find it, because backups are organised by date and by source path, not by document or matter. An archive would have indexed that contract once, by the categories that matter to the business (client, project, legal hold), so retrieval is a search rather than an excavation. Backups are not built for that kind of lookup, and bolting an indexing layer onto a backup pipeline tends to produce something that does both jobs badly.
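One lightweight way to get that kind of index is a manifest written at archive time, keyed by the things the business actually searches on. The format below is hypothetical, our own invention for illustration rather than something SyncBack emits, but it shows why retrieval becomes a search:

```python
import json
from pathlib import Path

manifest = {
    "entries": [
        {
            "client": "Acme Ltd",                    # illustrative entry
            "matter": "2019-0042",
            "year": 2019,
            "description": "Supply contract, signed March 2019",
            "path": "acme/2019-0042/contract-signed.pdf",
            "sha256": "placeholder-recorded-at-archive-time",
        },
    ]
}
Path("archive-manifest.json").write_text(json.dumps(manifest, indent=2))

def find(manifest: dict, **keys) -> list[dict]:
    """Retrieval is a filter over business keys, not a dig through backup sets."""
    return [e for e in manifest["entries"]
            if all(e.get(k) == v for k, v in keys.items())]

hits = find(manifest, client="Acme Ltd", year=2019)
```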
There is also a quieter failure mode that only shows up over the years. Backup formats change. Vendors are bought, products are discontinued, file formats are deprecated, encryption keys are rotated and old ones forgotten. A backup taken in 2014 with a tool that no longer exists is not really an archive. It is a stack of bytes nobody can read.
Treating the Archive as a Backup
The other direction is just as painful. A photographer ships a wedding shoot to LTO tape the day after delivery and clears the working drive. A month later, a corrupt RAW file is reported. The original is gone. The tape has the same corrupt file, because nothing was ever verified between the camera and the archive. There was no backup phase in between to catch the problem while the originals still existed.
This is the core mistake: the archive captured whatever state the data was in at the moment of archiving, including any damage. If the archiving step is also the only copy that ever existed, there is no chance to spot a problem before the originals are deleted. The fix is mundane but essential: keep the data on a working drive and inside a backup rotation for a period after delivery, then archive once the data has had time to be exercised, verified, and proven good. Only then should the originals be removed.
The same pattern hits any team that uses a deep cold storage tier as a backup destination. Restoring a single deleted file from S3 Glacier Deep Archive costs money and can take up to twelve hours. That is fine if it happens once a year. It is unworkable if it happens twice a week because someone keeps deleting the wrong thing. Cold tiers exist for data you are confident you will rarely touch, and that confidence has to be earned by having a real backup in front of them.
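The asymmetry is visible in the API itself. Restoring from a deep cold tier with boto3 is a two-step, asynchronous request (bucket and key are placeholders here), which is tolerable for an annual retrieval and hopeless as a routine restore path:

```python
import boto3

s3 = boto3.client("s3")

# Step 1: ask for a temporary, readable copy of the archived object.
s3.restore_object(
    Bucket="example-archive-bucket",                  # placeholder name
    Key="projects/2018/wedding-shoot.tar",
    RestoreRequest={
        "Days": 2,                                    # how long the copy stays readable
        "GlacierJobParameters": {"Tier": "Standard"}, # up to ~12 hours for Deep Archive
    },
)

# Step 2: poll until the restore completes; only then can the object be read.
head = s3.head_object(Bucket="example-archive-bucket",
                      Key="projects/2018/wedding-shoot.tar")
print(head.get("Restore"))   # 'ongoing-request="true"' while the job is running
```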
Mixing the Two in the Same Job
The third pattern is subtler. A SyncBack profile is set up to copy a working folder to an external drive nightly, with versioning enabled and a ten-year retention rule. The intention is good (keep both recent and long-term copies in one place) but the result is that the destination is now trying to be a backup and an archive at the same time. The backup is slowed down by the weight of historical versions. The archive has no immutability, no separate index, and no retention rule keyed to any meaningful business event. Old versions of files long since deleted from the working folder linger on the destination forever, taking up space that the active backup needs.
The cleaner pattern is two profiles. One profile handles the backup: nightly, with normal retention, written to a fast destination. A second profile handles archiving: triggered manually or quarterly, written to a separate destination, with no source-side deletion and an immutable or write-once target. Each profile then has a single, clear job, and the storage used by each can be sized and budgeted on its own merits.
Real-World Examples
The contrast is easier to see in concrete situations. Three short examples from very different working environments show the same idea in action: backups are doing one job, archives are doing another, and the two never share a destination.
The Photographer with Twenty Years of Work
Active projects sit on a working SSD. SyncBackPro runs every night and copies the working SSD to a NAS, with versioning set to keep the last 90 days of changes. That is the backup, and it covers the usual problems: an accidental delete during a Lightroom catalogue cleanup, a drive failure, a corrupted edit. When a project closes (the files delivered to the client, the invoice paid, the prints sent), the photographer copies the project to a separate archive volume on bulk storage, and then to a second archive copy held offsite. Both archive copies are verified against checksums recorded at the time of the original capture. Once both archive copies have been confirmed, the project is removed from the working SSD. From that point on the archive copies are the source of truth for that work, and they are not part of the nightly backup rotation. New active projects flow through the same cycle as they come in.
The Law Firm with Active and Closed Matters
Active matters sit on a private cloud share that is backed up nightly with 30 days of versions. Closed matters are not. When a matter closes, an automated job moves it from the active share to an immutable archive store with a retention rule tied to the matter's closure date, set to whatever the firm's compliance counsel requires (often seven to ten years for litigation files, longer for certain types of matter). The archive store has its own indexing by client and matter number, so finding a closed file in 2031 is a search, not a restore. The backup pipeline never sees closed matters, and the archive store is never asked to behave like a backup. Each system is sized, budgeted and audited on its own.
The Company with Seven Years of Finance Records
The live accounting database is backed up every night and again every weekend, with daily versions kept for 30 days and weekly versions kept for a year. At year end, the database is exported in a documented, vendor-neutral format and written to an archive volume tagged with the financial year. The archive copies are held for the statutory retention period and are not touched by the regular backup rotation. When the auditors arrive in March asking for FY2023, the team pulls FY2023 from the archive, not from a five-year-old backup. The backup is doing its job (covering this week's data) and the archive is doing its job (holding the year-end snapshot that the tax authority can demand at any time within the retention window).
How SyncBack Handles Both Jobs
SyncBackPro and SyncBackSE can be set up for either role; the profile settings are what tell SyncBack which job a given run is doing. There is no "backup mode" or "archive mode" switch. There is just a profile, and the choices you make inside it decide whether you are running a backup or an archive.
A backup profile typically uses a scheduled run, daily or more often, so coverage is automatic and does not depend on someone remembering to click a button. Versioning is on, with old versions kept for as long as the recovery policy needs them and no longer. The destination sits on warm storage such as a NAS, an external drive, or a warm cloud bucket. Verification is on, so each run confirms the copy is intact. The profile usually runs in backup or mirror mode depending on whether the destination should track deletions on the source.
An archive profile looks different. It runs on demand, or on a long schedule (quarterly, annually) keyed to a project or compliance event rather than to the calendar. Source-side deletion is off, so the original is never removed by SyncBack itself; that decision is made by a human after the archive is verified. The destination sits on cold or immutable storage: a cloud archive tier, a separate volume, or write-once media. Hashes are recorded at copy time so that integrity can be checked years later, and the folder layout on the destination is designed for retrieval by project, matter, year or client rather than by source path.
Because the two profiles are separate, their retention rules do not interfere with each other. The backup destination stays sized for recent data. The archive destination grows slowly and predictably. When a problem turns up, the question of "where do I look" is decided by which profile owns the data, and there is no temptation to scroll back through a decade of nightly backups looking for one specific file.
One overlap point is worth flagging. Until the archive copies have been verified, the data is still in scope for the backup. The right time to remove a project from the working folder is not the moment the archive write completes; it is the moment the archive copies have been read back, hashed, and confirmed good. SyncBack can do the verification step as part of the archive run, but the originals should stay in the backup rotation for at least one cycle afterwards. That overlap is cheap insurance against a silent failure in the archive write.
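SyncBack can run the verification as part of the profile; for anyone scripting the final check by hand, the logic of "read back, hashed, and confirmed good" is a straight comparison. A minimal sketch (function names are ours):

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream the file through SHA-256 so large files never need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(1 << 20):
            h.update(block)
    return h.hexdigest()

def archive_verified(source_root: Path, archive_root: Path) -> bool:
    """Read every archived file back and compare hashes with the originals.
    Only a clean pass should clear the originals for deletion."""
    for src in source_root.rglob("*"):
        if src.is_file():
            dst = archive_root / src.relative_to(source_root)
            if not dst.is_file() or sha256_of(src) != sha256_of(dst):
                print(f"MISMATCH: {src}")
                return False
    return True
```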
A Practical Combined Workflow
Pulling the threads together, a workable strategy for almost any environment looks roughly like this:
- Run a real backup, every day, of everything live. Use versioning. Keep recent versions for as long as you genuinely might need to roll back, and not longer.
- When a project closes, do not just move it off the working drive into the backup destination. Archive it deliberately, to a separate destination, with at least two copies on different media.
- Keep the archive copies under the 3-2-1 rule in their own right: at least three copies, on at least two kinds of media, with at least one offsite. The archive is now your source of truth for that data, so it deserves the same protection that the live data did.
- Verify archives on a schedule, not just on creation. Bit rot is real, and ten-year-old media has a habit of going quiet right when you need it. A quick read-back hash check once a year catches problems while there is still time to do something about them (see the sketch after this list).
- Keep the two retention rules separate. Backups roll over on their own clock. Archives roll over on the compliance or business event that controls them. Never let one calendar override the other.
- Document both. A backup that nobody can restore from because the documentation is missing is not a backup, and an archive whose retention rules live only in a former employee's head is not an archive.
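For the scheduled verification mentioned in the list above, a yearly scrub can reuse the hypothetical manifest format sketched earlier: re-hash every archived file and flag anything that no longer matches the hash recorded at archive time. Again, this is a sketch under our own naming, not a description of a SyncBack feature:

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(1 << 20):
            h.update(block)
    return h.hexdigest()

def scrub(manifest_path: Path, archive_root: Path) -> list[str]:
    """Return the archived paths whose current hash no longer matches the
    hash recorded at archive time: missing files, bad media, or bit rot."""
    entries = json.loads(manifest_path.read_text())["entries"]
    return [e["path"] for e in entries
            if not (archive_root / e["path"]).is_file()
            or sha256_of(archive_root / e["path"]) != e["sha256"]]

# failures = scrub(Path("archive-manifest.json"), Path("E:/archive"))
```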
Frequently Asked Questions
Is a synchronisation copy the same as a backup?
No. A sync mirrors the current state of the source, including any damage. Without versioning, a sync that runs after a corruption simply copies the corrupted file to the destination. For more on this distinction, see Cloud Sync vs True Backup.
Can a single device hold both my backup and my archive?
Technically yes, in separate folders, but it is not advisable for anything important. A backup and an archive should not share a single point of failure, and a single device usually means a single mechanical or electrical fault can take both out. At minimum, archive copies should live on separate media from the backup, and ideally in a different physical location.
How does immutability fit in?
Immutability matters most for archives, because the goal is to keep a fixed record that nobody can alter for a defined period. It is also useful for backups, especially in the context of ransomware defence, but the mechanism is usually different: a short immutability window on recent backup copies, versus a multi-year lock on archive copies.
What about cloud sync services like OneDrive or Dropbox?
Sync services are neither backups nor archives. They are convenience tools for accessing the same files from multiple devices. They can be useful inside a wider strategy, but they should not be the only thing standing between you and data loss. For a fuller treatment, see Cloud Sync vs True Backup.
How long should I keep my archives?
It depends on what is in them. Tax and accounting records typically have statutory retention periods (often around seven years, but it varies by country). Medical and legal records often have to be kept for decades. Creative projects are usually kept indefinitely, because there is no obvious moment when they stop being useful. The retention number should come from the rule that requires the data, not from a guess.
Conclusion
A backup recovers what you almost lost. An archive holds what you no longer use but cannot afford to delete. They are not interchangeable, and trying to make one tool do both jobs is how teams end up with bloated backup volumes, missing compliance records, or worse, single-copy archives that go bad while no-one is looking.
A real data protection strategy uses both, in their proper roles, with their own retention rules and their own destinations. SyncBackPro and SyncBackSE can run the backup job and the archive job side by side; what matters is that the two profiles are configured to do what each was designed to do, and that the storage behind each profile reflects how that data is actually going to be used.
If you do not currently have a clear separation between your backup and your archive, that is the place to start. Look at your current setup, work out what is doing each job today, and then sort out the gaps. Your future self, or your future auditor, will thank you.
Download SyncBackPro and start building a strategy that handles both sides of the problem.