Basic troubleshooting for backup error alerts

Topic

This article describes basic troubleshooting for backup errors.

Environment

  • Datto SIRIS
  • Datto ALTO
  • Datto NAS

Description

Troubleshooting should consist of the following steps:

  • Gather information
  • Research the error
  • Ensure you've met prerequisites
  • Review protected machine health
  • Test a manual backup

Gather Information

Make a note of the following information:

  • System name
  • Operating system
  • Error message
  • Time the error occurs

Log in to the Datto Device Web UI to see the full error

Open the Protect tab in the Datto Device Web UI for more information on the error. A red banner below the protected machine's name describes the error. Click Get More Info on the far right of the banner to view the error log.

For more detailed information, click Show Backup Logs > Show Agent Logs for agent-based logging, or Show Agentless Logs for agentless logging.

Figure 1: Error banner on the Protect tab

Research the error

  • Research the error in the Datto Knowledge Base. You can search by keyword, error string, or error code.
  • Knowledge Base articles document a wide variety of errors and what might cause them. The articles also include steps you can take to resolve most issues.
  • Many error messages are straightforward. For example: "Cannot connect to the host - aborting backup". You might receive this error if you scheduled a backup to run when the protected machine was powered off, or if the agent software is not running correctly on the protected machine.
  • Error messages vary depending on which backup agent software is on the protected machine and on whether the backups are agent-based or agentless.
  • Check to see if a backup has completed successfully since the time the error occurred. If a backup has completed successfully, the error may have been transient or could be due to a conflicting task on the server.
  • If a backup is currently running after you received a failure alert, it could complete successfully when it retries. When a server has VSS errors, a backup may fall back to a crash-consistent mode and then complete successfully.
  • If the steps in a Knowledge Base article do not resolve your issue, verify the health of the protected system.

Ensure you've met prerequisites

Datto publishes "Getting Started" articles that include minimum system and networking requirements for each of our products. You should check and verify that your system meets these requirements. You can search our Knowledge Base for "getting started" for a complete list. Here are some of the more commonly referenced articles:

Review protected machine health

VSS

Microsoft VSS is an essential component in successful Windows backups. Many factors affect VSS functionality, including:

  • Disk health
  • NTFS filesystem health
  • Free space
  • Disk fragmentation
  • Disk I/O
  • Disk permissions
  • Antivirus software
  • Windows updates

To check the status of VSS writers before starting a new backup:

  1. Open a Windows CMD prompt as an administrator on the protected machine.
  2. Type vssadmin list writers
  3. Press Enter.

The command lists all VSS writers on the protected system. A healthy writer reports a state of "Stable." A VSS writer can only be used by one application at a time, so a writer listed as "Waiting for completion" indicates that another application is currently using it.
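The writer check above can be automated. The following is a minimal Python sketch, not a Datto tool: the function names are hypothetical, and it assumes the English-language output layout of vssadmin, in which each writer has a "Writer name: '...'" line followed by a "State: [n] ..." line. Localized Windows installations may print different labels.

```python
import re
import subprocess


def parse_vss_writers(output):
    """Map each writer name to its reported state text, e.g. 'Stable'."""
    writers = {}
    current = None
    for raw_line in output.splitlines():
        line = raw_line.strip()
        name_match = re.match(r"Writer name: '(.+)'", line)
        if name_match:
            current = name_match.group(1)
            continue
        state_match = re.match(r"State: \[\d+\] (.+)", line)
        if state_match and current is not None:
            writers[current] = state_match.group(1).strip()
            current = None
    return writers


def unhealthy_writers(writers):
    """Return only the writers whose state is not 'Stable'."""
    return {name: state for name, state in writers.items()
            if state.lower() != "stable"}


def check_local_writers():
    """Run vssadmin on this machine (requires an elevated prompt on Windows)."""
    result = subprocess.run(["vssadmin", "list", "writers"],
                            capture_output=True, text=True, check=True)
    return unhealthy_writers(parse_vss_writers(result.stdout))
```

Calling check_local_writers() from an elevated session on the protected machine returns an empty dictionary when every writer is stable. Any entry it does return is a writer worth investigating before the next backup.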

See the following Knowledge Base articles for more information on resolving VSS issues:

Task Scheduler

  • Check scheduled tasks on the protected machine for anything that may have been running during the time the error occurred. Scheduled ShadowCopy Jobs can cause VSS conflicts during Datto device backup times.
  • Check for defragmentation applications. Sometimes these applications will disable VSS during the defragmentation process.
  • Check for other backup software that might still be running. Datto does not recommend running more than one backup solution on your protected machine; doing so will cause conflicts in most cases.
  • If the server is running Microsoft SQL Server, SQL backups can also cause VSS collisions. Take note of when these run in SQL Server Management Studio, and reschedule backups so they don't run at the same time and conflict. See Best-practices for backing up and restoring Microsoft SQL databases for more information.
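When comparing schedules, it can help to check whether two job windows overlap. The sketch below is illustrative only: the function name and the example times are hypothetical, and it assumes both windows fall within the same day and do not cross midnight.

```python
from datetime import time


def windows_overlap(start_a, end_a, start_b, end_b):
    """Return True if two same-day time windows share any moment.

    Assumes start < end for each window (no window crosses midnight).
    """
    return start_a < end_b and start_b < end_a


# Hypothetical example: a SQL backup from 01:00 to 02:00 and a
# device backup from 01:30 to 01:45 on the same night.
conflict = windows_overlap(time(1, 0), time(2, 0), time(1, 30), time(1, 45))
```

If the check reports a conflict, move one of the jobs so that the windows no longer intersect.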

Services

Server resources

  • You should have at least 1 GB free memory available during normal operation for the backup process to function without issues.
  • CPU utilization will increase when the system is taking a VSS snapshot.

Disk fragmentation

  • The VSS application freeze cannot exceed 60 seconds, or the operation will time out. Highly fragmented systems may produce errors about "flush and hold" taking too long.

Free disk space

  • You need at least 20% free space on the protected machine for snapshot copy-on-write operations.
  • Windows requires 15% free disk space for defragmentation to work properly.
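The 20% free-space guideline above can be checked with a short script. This is a minimal sketch using Python's standard shutil.disk_usage; the function names and the constant are hypothetical, and the path you pass in (for example "C:\\") is whichever volume you want to inspect.

```python
import shutil

MIN_FREE_PERCENT = 20.0  # guideline from this article for copy-on-write snapshots


def free_percent(total_bytes, free_bytes):
    """Return free space as a percentage of total capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * free_bytes / total_bytes


def volume_has_headroom(path, minimum=MIN_FREE_PERCENT):
    """Return True if the volume containing `path` meets the free-space minimum."""
    usage = shutil.disk_usage(path)
    return free_percent(usage.total, usage.free) >= minimum
```

Run volume_has_headroom once for each protected volume. A False result suggests freeing space before retrying the backup.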

Event logs

  • Review Windows Application Logs in Windows Event Viewer.
  • Filter for event sources for VSS around the time of the backup error.
  • If you find VSS errors in the event logs, research the event ID in Microsoft's documentation. Resolving VSS errors will help backups run successfully.
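The event log review above can also be done from the command line with the built-in Windows wevtutil utility. The sketch below builds such a query in Python. It assumes that the relevant events are logged in the Application log under the provider name "VSS"; the function names, the 24-hour window, and the event count are hypothetical defaults to adjust for your case.

```python
import subprocess


def build_vss_event_query(hours_back=24, max_events=20):
    """Build a wevtutil command that lists recent VSS events, newest first."""
    window_ms = hours_back * 60 * 60 * 1000
    # XPath filter: provider named 'VSS' (assumed), created within the window.
    xpath = (
        "*[System[Provider[@Name='VSS'] and "
        f"TimeCreated[timediff(@SystemTime) <= {window_ms}]]]"
    )
    return [
        "wevtutil", "qe", "Application",
        f"/q:{xpath}",
        f"/c:{max_events}",
        "/rd:true",   # reverse direction: most recent events first
        "/f:text",    # human-readable output instead of XML
    ]


def recent_vss_events(hours_back=24, max_events=20):
    """Run the query on this Windows machine and return the text output."""
    command = build_vss_event_query(hours_back, max_events)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

Set hours_back so the window covers the time of the backup error, then look up any event IDs that appear in the output.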

Test a manual backup

  • Once you have researched your error and taken steps to resolve the backup failure, manually start a backup for the agent from the Protect tab of the Datto Device Web UI.
  • If you get a different error, research and troubleshoot that error as well. If you are stuck and the backup continues to fail, you can contact Datto Technical Support.

Open a ticket with Datto Technical Support

The ticket should contain the following information:

  • The error that has occurred
  • Time of the error
  • Protected system name
  • Any Knowledge Base articles you have referenced
  • Any troubleshooting steps that you have taken

This information will help to avoid repeating steps and allow Datto Technical Support to resolve the issue as soon as possible.