Deep-Dive: Hyper-V 2016 Production Checkpoints

Checkpoints, known as Snapshots in previous versions of Windows Server, are a mechanism for capturing the state of a virtual machine (VM). A checkpoint allows a VM whose state has since changed to be reverted to the point at which the checkpoint was taken. When originally developed, Microsoft intended Snapshots/Checkpoints to be used only in development and lab environments. Nevertheless, it was common practice in many organizations to use them in production to roll back changes. For example, it has been well documented that hotfixes and patches can sometimes cause issues with production systems. Once such an issue was discovered, organizations would simply revert the VM to a previous state to fix it. This was neither supported nor recommended by Microsoft.

A major advancement in Windows Server 2016 is the release of Production Checkpoints.

Previous versions of Windows Server Hyper-V used .XML files for VM configuration, together with .BIN and .VSV files to represent VM memory and the state of VM devices, respectively, at the time of the Checkpoint. So that they are not confused with production files, these Checkpoint-specific files are stored within a separate Checkpoint File Location. New to Windows Server 2016, Microsoft has deprecated these formats and introduced the .VMCX (configuration) and .VMRS (runtime state) file formats. The last portion of the checkpoint architecture is the differencing disk. This differencing disk uses the .AVHD(X) file format and is stored in the same directory as the production .VHD(X) file. While the Checkpoint is open, all writes that occur are captured within this differencing disk. When the Checkpoint is later deleted, the blocks of data in the differencing disk are merged into the production .VHD(X); in older Hyper-V releases this merge required the VM to be powered off and then brought back online.
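To see these files for a given VM, a short sketch along the following lines may help. It assumes the Hyper-V PowerShell module on the host and an elevated session; the VM name VM01 is a placeholder, not something from the original article.

```powershell
# 'VM01' is a hypothetical VM name; substitute your own.
$vmName = 'VM01'

# Show where the VM's configuration and checkpoint files are stored.
Get-VM -Name $vmName |
    Select-Object Name, ConfigurationLocation, SnapshotFileLocation

# List the virtual disks currently attached to the VM. While a checkpoint
# is open, the attached disk is a differencing disk (.avhdx) whose
# ParentPath points at the production .vhdx.
Get-VMHardDiskDrive -VMName $vmName |
    ForEach-Object { Get-VHD -Path $_.Path } |
    Select-Object Path, VhdType, ParentPath
```

On a VM with no open checkpoint, the second command would normally report the production .VHD(X) itself with an empty ParentPath.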

Additionally, when Hyper-V 2016 is deployed on the Resilient File System (ReFS) v3.1 within Windows Server 2016, the Checkpoint process is able to leverage the Block Clone API. Because of how snapshots were performed in Server 2012 R2, for instance, Microsoft never supported creating Checkpoints on production systems. ReFS makes this much more efficient, as the existing blocks of data are never physically moved around; they are simply referenced via the metadata that ReFS employs.

Let’s take a deeper look at this problem, using SQL Server as an example. With Standard Windows Server Checkpoints, all of the disk and memory state is captured, including in-flight transactions. When you then apply the checkpoint, the application can have issues rolling back to that point in time. Production Checkpoints, by contrast, are fully supported for production applications because the technology now uses Windows backup technologies: VSS is used inside Windows guest operating systems, and a file system freeze is used on Linux, to place the application in a consistent state during the checkpoint process.

 

The illustration above shows the settings available on an individual virtual machine. All VMs created on Windows 10 or Windows Server 2016 TP 4 have Production Checkpoints enabled by default. However, you can use the checkbox to fall back to standard checkpoints when a production checkpoint cannot be taken.

To change between types of checkpoints:

  1. Right-click the VM and choose Settings.
  2. Within the Management pane, choose Checkpoints.
  3. Select either Production or Standard Checkpoints.

The example leverages PowerShell to change the checkpoint type to Standard and then initiate a checkpoint named StandardCheckpoint.
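A minimal sketch of what such a PowerShell sequence might look like, using the Set-VM and Checkpoint-VM cmdlets from the Hyper-V module. The VM name VM01 is a placeholder and not taken from the original example; run this in an elevated session on the Hyper-V host.

```powershell
# 'VM01' is a hypothetical VM name; substitute your own.
$vmName = 'VM01'

# Switch the VM to Standard checkpoints (memory and disk state are captured).
Set-VM -Name $vmName -CheckpointType Standard

# Take a checkpoint named StandardCheckpoint.
Checkpoint-VM -Name $vmName -SnapshotName 'StandardCheckpoint'

# Confirm that the checkpoint exists and inspect its type.
Get-VMSnapshot -VMName $vmName |
    Select-Object Name, SnapshotType, CreationTime
```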

As previously mentioned, Standard checkpoints capture the memory and disk state of the virtual machine, so when reverted, the VM comes back up in exactly the same state it was in when the checkpoint was initiated. As seen below, upon applying the checkpoint StandardCheckpoint, our VM comes directly back as it was before.
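Applying the checkpoint from PowerShell could be sketched as follows. This assumes a checkpoint named StandardCheckpoint already exists on a VM called VM01, which is a placeholder name.

```powershell
# 'VM01' is a hypothetical VM name; substitute your own.
$vmName = 'VM01'

# Apply (revert to) the checkpoint without the interactive confirmation prompt.
Restore-VMSnapshot -VMName $vmName -Name 'StandardCheckpoint' -Confirm:$false

# Check the VM's power state after the checkpoint has been applied.
Get-VM -Name $vmName | Select-Object Name, State
```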

To enable Production checkpoints and replay this example, we can use the GUI within Hyper-V Manager or PowerShell.

Within Hyper-V Manager, using the steps listed above, change the checkpoint type to Production and leave the checkbox unchecked; this forces Hyper-V to use Production checkpoints. Whenever you take a manual checkpoint through Hyper-V Manager with Production Checkpoints enabled, you receive a confirmation that a Production Checkpoint was used.
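The PowerShell equivalent might look like the sketch below. The VM name VM01 and the checkpoint name ProductionCheckpoint are placeholders. In the Hyper-V module, the ProductionOnly checkpoint type corresponds to leaving the fallback checkbox unchecked, while Production permits falling back to a standard checkpoint.

```powershell
# 'VM01' is a hypothetical VM name; substitute your own.
$vmName = 'VM01'

# ProductionOnly = production checkpoints with no fallback to standard.
# Use -CheckpointType Production instead to allow the fallback.
Set-VM -Name $vmName -CheckpointType ProductionOnly

# Take a checkpoint; the name is illustrative.
Checkpoint-VM -Name $vmName -SnapshotName 'ProductionCheckpoint'

# Verify the configured checkpoint type on the VM.
Get-VM -Name $vmName | Select-Object Name, CheckpointType
```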

The key difference between Standard Checkpoints and Production Checkpoints is that the Volume Shadow Copy Service (VSS) is used for Windows VMs, while Linux-based VMs flush their file system buffers to create a file-system-consistent checkpoint. These are the same technologies used by image-level backup processes, which makes it possible to checkpoint production workloads such as SQL Server, Exchange, Active Directory and SharePoint.

Below, we see that whenever this Production Checkpoint is applied, our VM is brought up in a clean state, meaning the guest operating system looks and behaves as though it had been shut down properly. After applying a Production Checkpoint, you MUST manually power the VM back on.
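Because the VM is left powered off, a scripted restore needs an explicit start step. A hedged sketch, assuming a VM named VM01 and a checkpoint named ProductionCheckpoint (both placeholders):

```powershell
# 'VM01' and 'ProductionCheckpoint' are hypothetical names.
$vmName = 'VM01'

# Apply the production checkpoint.
Restore-VMSnapshot -VMName $vmName -Name 'ProductionCheckpoint' -Confirm:$false

# A production checkpoint holds no memory state, so the VM stays off
# after the restore; power it back on explicitly.
if ((Get-VM -Name $vmName).State -eq 'Off') {
    Start-VM -Name $vmName
}
```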

Making sure that applications and workloads are recoverable when things go bump in the night is very important. Modern backup solutions leverage snapshots and checkpoints to create point-in-time restore points. In Hyper-V 2016, these backup products leverage Recovery Checkpoints. Recovery Checkpoints are application-consistent, exactly like Production Checkpoints; the main difference is that Recovery Checkpoints are initiated by the backup software. In the image below we can see that the backup software utilized a Recovery Checkpoint.
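One way to look for Recovery Checkpoints on a host is to filter Get-VMSnapshot by snapshot type, as in this sketch. It assumes the Hyper-V module and an elevated session. Recovery Checkpoints are normally short-lived, so an empty result outside a backup window would be expected.

```powershell
# List any Recovery checkpoints currently present on VMs of this host.
Get-VM |
    Get-VMSnapshot -SnapshotType Recovery |
    Select-Object VMName, Name, SnapshotType, CreationTime
```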

Production Checkpoints are new within Windows Server 2016 Hyper-V and provide a better way of creating ad-hoc roll-back points in time.
