Explaining the Inverse Chain and Investigating Large Recovery Points

Scope

Examining how sector-level change on a protected system generates used space in the Inverse Chain

Causes and Management

Background

The Siris Inverse Chain is derived from the more traditional incremental image backup chain generated by ShadowProtect.

ShadowProtect Incremental Chain

The traditional ShadowProtect backup chain consists of a full backup file, followed by a chain of incremental image files. Although ImageManager provides ways to compress this chain, it may grow too long for practical management. Also, if a single incremental file is lost or corrupted, then the chain is no longer viable.

ShadowSnap Inverse Chain

ShadowSnap Agent establishes an Inverse Chain by converting a first full backup from ShadowProtect into a working Virtual Machine dataset, and taking a ZFS point-in-time snapshot to create the first recovery point.

When the ShadowSnap Agent requests the next incremental, ShadowProtect software writes and sends a new incremental much like it would in a traditional backup chain. But instead of archiving this incremental file, ShadowSnap then updates the Virtual Machine dataset with it, and takes another point-in-time ZFS snapshot of the updated dataset.

In this way, the backup dataset residing in the Siris storage is effectively a full backup, updated with the latest changes from the protected system. Incremental point-in-time snapshots allow the Siris to restore previous versions of the Virtual Machine dataset. ShadowSnap therefore inverts the traditional backup chain: it keeps a full backup of the most recent data, and the additional incremental space is used by the chain of historical recovery points.

The Inverse Chain allows for greater compression, rolling retention, deletion of recovery points from anywhere in the chain without interfering with integrity, and instant recovery from any given point in the chain.
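The update-then-snapshot cycle described above can be sketched as a toy model. This is only an illustration under stated assumptions: the class `InverseChain` and its method names are hypothetical and are not part of ShadowSnap or ZFS, and each snapshot is stored as a full copy of a small dictionary for readability, whereas ZFS snapshots share unchanged blocks.

```python
from dataclasses import dataclass, field


@dataclass
class InverseChain:
    """Toy model (hypothetical): one live dataset plus read-only snapshots."""
    live: dict = field(default_factory=dict)       # block index -> block contents
    snapshots: dict = field(default_factory=dict)  # recovery point name -> frozen copy

    def apply_backup(self, name, changed_blocks):
        # The transferred image is merged into the live dataset rather than
        # archived, and a point-in-time snapshot records the result.
        self.live.update(changed_blocks)
        self.snapshots[name] = dict(self.live)

    def restore(self, name):
        # Any recovery point can be read directly; no chain replay is needed.
        return dict(self.snapshots[name])

    def delete(self, name):
        # Removing a point from the middle does not affect the others.
        del self.snapshots[name]


chain = InverseChain()
chain.apply_backup("rp1", {0: "A", 1: "B", 2: "C"})  # first full backup
chain.apply_backup("rp2", {1: "B2"})                 # incremental
chain.apply_backup("rp3", {2: "C3"})                 # incremental
chain.delete("rp2")                                  # delete from mid-chain

restored_first = chain.restore("rp1")
restored_last = chain.restore("rp3")
```

In this sketch the live dataset always holds the newest full image, and the points on either side of a deleted recovery point remain restorable.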

Used Space in an Inverse Chain

These benefits are made possible by the way ZFS integrates logical volume management into a filesystem, but this comes at some cost to intuitive space reporting for individual recovery points. Because the recovery points are sparse snapshots of datasets residing in sparse-provisioned volumes dynamically managed by the ZFS storage pool, the space 'used' by each recovery point snapshot changes with the distribution of data change throughout all other recovery point snapshots in that dataset.  If a recovery point's 'used' space is defined as the amount of disk storage that would be freed by deleting it, then this space changes as adjacent snapshots are created or deleted, changing the amount of historical data unique to the given snapshot.
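To make the "space freed by deleting it" definition concrete, here is a minimal sketch that treats each snapshot as a set of block versions and counts the blocks referenced by that snapshot alone. The function name `snapshot_used` and the block labels are hypothetical; real ZFS accounting works on bytes and is more involved.

```python
def snapshot_used(snapshots, live):
    """Return {name: count of blocks referenced only by that snapshot}.

    snapshots: dict of recovery point name -> set of block-version labels
    live:      set of block-version labels in the current dataset
    """
    used = {}
    for name, blocks in snapshots.items():
        referenced_elsewhere = set(live)
        for other_name, other_blocks in snapshots.items():
            if other_name != name:
                referenced_elsewhere |= other_blocks
        used[name] = len(blocks - referenced_elsewhere)
    return used


# Three recovery points: block "b" changes before rp2, block "c" before rp3.
snapshots = {
    "rp1": {"a1", "b1", "c1"},
    "rp2": {"a1", "b2", "c1"},
    "rp3": {"a1", "b2", "c3"},
}
live = {"a1", "b2", "c3"}

used_before = snapshot_used(snapshots, live)

# Delete rp1: block "c1" is now held only by rp2, so rp2's 'used' grows.
remaining = {name: blocks for name, blocks in snapshots.items() if name != "rp1"}
used_after = snapshot_used(remaining, live)
```

Here rp2 goes from 0 unique blocks to 1 without any change to rp2 itself, which is why per-recovery-point 'used' figures are not stable.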

Misconceptions

Therefore, at the time of writing, the sizes of recovery points listed in the Siris web console do not represent used space on the device, only the size of the incremental image file transferred to create each recovery point.

Recovery points are therefore a good measure of data change on the protected system, but not of used space added to the dataset.

The amount of space used relative to the data transferred depends on how well the incremental that ShadowProtect writes matches the existing backup chain.

recoveryPoints.JPG

Example: The circled size does not indicate space used by this recovery point, only the size of the backup file sent to create this recovery point.
However, an incremental of this size does suggest an incident of high block level change on the protected system.

In the worst case scenario, if a protected server experiences an unexpected shutdown or severe backup writer conflict, the next incremental as seen by ShadowProtect may be so divergent from the recorded backup chain that it cannot be reconciled, and the chain must be restarted with a new full backup.

Where to Look

The best indication of total used space is found on the Home page, under Local Storage. Note that the "Total Protected" number under Device Information indicates the total original size of all protected volumes.

Meanwhile the 'Local Storage' listing shows the total size of each agent volume, including both the size of the original volume and the used space added by maintaining recovery point snapshots.

Causes of Large Recovery Points - Block/Sector Level Change and Journaling Disruption

Large, full-image-sized recovery points as seen in the 'Access Recovery Points' view may sometimes be Differential Merges.

New Full Backup vs. Differential Merge

If a new full backup has been taken, the total space used by an Agent will be approximately double the original size of a full image of all included volumes (as set in Advanced Options). In the case of a Differential Merge, a recovery point the size of a full backup is displayed, but the total used space does not dramatically increase.
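The distinction above can be expressed as a rough rule of thumb. The sketch below is an assumption-laden illustration, not a Datto tool: the function name, the `tolerance` parameter, and the 25% default are all hypothetical, and the sizes would have to be read manually from the Local Storage listing before and after the large recovery point appears.

```python
def classify_large_recovery_point(full_image_size, used_before, used_after,
                                  tolerance=0.25):
    """Guess what produced a full-image-sized recovery point.

    All sizes share one unit (for example GB). If total used space grew by
    roughly a whole full image, a new full backup is the likely cause;
    otherwise a Differential Merge is more likely. The tolerance value is an
    arbitrary assumption.
    """
    growth = used_after - used_before
    if growth >= full_image_size * (1 - tolerance):
        return "new full backup"
    return "differential merge"


# Hypothetical agent with a 500 GB full image.
doubled = classify_large_recovery_point(500, used_before=520, used_after=1010)
steady = classify_large_recovery_point(500, used_before=520, used_after=545)
```

A result of "differential merge" on repeated occasions would point toward the block-level or VSS-related issues discussed in the following paragraphs.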

If a system is consistently requiring Differential Merge backups, there is likely a block-level or VSS-related issue which may finally lead to a condition where a new full backup is required.

Volume Shadow Copy Service (VSS)

Often the causes of large recovery points will be found by checking running services against conflicting services such as those found in the VSS Explained article. These services can often upset ShadowSnap incremental journaling, such that it must use the Differential Merge process to find where the last incremental left off.

Disk Defragmentation

Alternatively, the protected system's System Event Log may show evidence of disk defragmentation or other maintenance taking place during a backup, previous or ongoing disk failure, previous system failure, or unexpected shutdown. These events may generate so much block-level change that a normal incremental will never reconcile with the original backup chain.

Please note that block-level changes due to defragmentation or CHKDSK operations may be disproportionately large, as the repair of each block may imply changes to multiple blocks on the disk.

ntfsError.JPG

Example: If NTFS errors do not cause backups to fail completely, they may generate very large incrementals or repeated full backups.

Causes of Large Block/Sector Level Data Change

Large incrementals can also be a direct result of data change on the protected system.

SQL Logging Issues

For instance, large incrementals result when ShadowSnap is backing up a volume used to store SQL backups, SQL or Exchange logs, or other backups of the system that are redundant with ShadowSnap backups. Imaging both a SQL database and the backup of that database backs the database up twice.

The data change generated by redundant backups can multiply if old backups are also being deleted, as both the deletion and the new backup will register as change.
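As a rough illustration of how rotation multiplies change, the sketch below compares two states of a toy 12-block volume. It assumes, following the statement above, that blocks freed by a deletion count as changed; the block labels and the helper `changed_blocks` are hypothetical.

```python
def changed_blocks(before, after):
    """Count block positions whose contents differ between two volume states."""
    if len(before) != len(after):
        raise ValueError("both states must describe the same volume size")
    return sum(1 for old, new in zip(before, after) if old != new)


# Toy volume: 4 database blocks, 4 blocks holding last night's backup file,
# and 4 free blocks (None).
before = ["db"] * 4 + ["bak1"] * 4 + [None] * 4

# Tonight a new backup file is written into free space and the old one is
# deleted. The database itself has not changed at all.
after = ["db"] * 4 + [None] * 4 + ["bak2"] * 4

churn = changed_blocks(before, after)
```

In this toy case 8 of 12 blocks register as changed even though none of the database blocks differ.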

Volume Shadow Copies

Running Volume Shadow Copies and storing them on a volume imaged by ShadowSnap has the same effect. If Volume Shadow Storage is near capacity, old shadow copies may also be deleted as new ones are made, multiplying the block-level change detected.

Database Defragmentation

The defragmentation of large databases will also translate to large block-level change. It is important that backups do not run during defragmentation, as subsequent incrementals will have difficulty matching the chain and a Differential Merge or new full backup may be necessary.

Virtual Machines

In a hypervisor environment, backup snapshots of guest OS systems stored on a volume being backed up by ShadowSnap will also generate large block level change, or trigger Differential Merges.

SAN-hosted Storage

SAN hosted storage that has been sparsely or dynamically allocated is not recommended, and backups of these volumes tend not only to be large but also difficult to recover data from, as normal incremental journaling will have difficulty tracking dynamically allocated blocks. Data stored on sparse or dynamic SAN hosts should be backed up at a file level to a NAS share on the Datto device.

Distributed File System (DFS)

When running DFS, files get staged in a temp folder while awaiting transmission over the network. This can result in large block level change. If DFS is domain-dependent, make sure to supply the correct domain credentials to allow for the transfers to occur. Also, set it up with a path that the source service can reach. Even when you set it up according to these standards, you may still experience large block level change. As an alternative to sharing, try RDC.

Inverse Chain Mechanics and Managing New Full Backups

If a new full backup is taken by the ShadowProtect software, ShadowSnap updates the backup dataset by replacement with the new full image, and that new full image becomes another recovery point snapshot in the history of the dataset. Because previous snapshot recovery points continue to referentially protect the original full backup, the Inverse Chain will be using space to hold both full backups.

In this case, there remains one continuous Inverse Chain: The new ShadowProtect backup chain is added to the space used by the historical snapshot recovery points.

In order to reduce the space used by multiple full backups, all recovery points associated with old full backups must be deleted, leaving only the most recent full backup, which should remain compatible with the next incrementals.
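The reason every old recovery point has to go can be sketched with the same kind of toy block model. The helper `total_used` and the block labels are hypothetical; the sketch only counts distinct block versions still referenced by the live dataset or by any snapshot.

```python
def total_used(snapshots, live):
    """Count distinct block versions held by the live dataset or any snapshot."""
    blocks = set(live)
    for snapshot_blocks in snapshots.values():
        blocks |= snapshot_blocks
    return len(blocks)


old_full = {"a1", "b1", "c1", "d1"}
new_full = {"a2", "b2", "c2", "d2"}

snapshots = {
    "rp1": set(old_full),                 # original full backup
    "rp2": (old_full - {"d1"}) | {"d1b"}, # incremental on the old chain
    "rp3": set(new_full),                 # new full backup
}
live = set(new_full)

with_all_points = total_used(snapshots, live)

# Deleting only rp1 frees a single block: rp2 still references most of the
# old full image.
without_rp1 = {k: v for k, v in snapshots.items() if k != "rp1"}
after_partial_delete = total_used(without_rp1, live)

# Only once every point tied to the old full is gone is that space released.
only_new = {"rp3": snapshots["rp3"]}
after_full_cleanup = total_used(only_new, live)
```

The totals in this toy run fall from 9 blocks to 8 after the partial deletion, and to 4 only after both old recovery points are removed.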

These situations can lead to a device with full storage, which is discussed specifically in the Backup Skipped Due To Not Enough Free Space article.

It is possible to roll back a new full backup if it is the most recent recovery point. Whether ShadowProtect can then generate a new incremental that reconciles into the original chain depends on the extent of the change the ShadowSnap software reads on the volume. When attempting this, be sure to prompt a Differential Merge from the Agent's Advanced Options.

Our Differential Merge technology makes it possible to reconcile a backup chain with a changed volume in many cases that would otherwise require a new full image. However, maintaining a stable backup chain remains a concern for any incremental backup solution.

Maintaining the protected machine, its disks, and its VSS writers, and keeping its OS up to date, are the main ways to prevent issues with large recovery points.

For more information, StorageCraft also provides this Guide to VSS.