Note: The description in the article is based on Veeam Backup & Replication 11a, licensed using Veeam Universal License (VUL), similar to Enterprise Plus.
Note: Version 12 partially changes how the Backup Copy Job works; some details are in the article Veeam Backup & Replication 12 - Backup Chain and Backup Copy format upgrade.
Backup Copy
Backups and the 3-2-1 Rule
The main purpose of backup is to protect data from disasters and failures of virtual or physical machines. Backups must also allow reliable and sufficiently fast data recovery.
It is generally recommended to follow the 3-2-1 rule: keep at least three copies of the data, on two different types of storage, with one copy off-site. We back up production data to separate storage in the same location (on-site) to allow for quick recovery. This backup is then copied to a different location (off-site), so the data remains available after a serious disaster affecting the entire data center. Cloud storage can also be used.
Briefly about Backup Copy
To create backup copies, Veeam Backup & Replication offers the Backup Copy job. It allows creating multiple instances of the same backed-up data on different storage, either in the same location or in another one. Backup copies have the same format as backups created by the Backup Job. Simply put, data is copied (synchronized) from one repository to another according to set rules.
For backup copies, we define our own retention policies to maintain the required number of restore points. We can also add GFS retention (long-term retention policies) to store full backups for archival purposes.
How it works
- Jobs are divided by backup source type (platform) and can only contain objects of the same type (VMware VM, Hyper-V VM, Windows computer, Application, etc.)
- For transfer, the source and target Veeam Data Mover are used (either on the Repository or Gateway server), possibly also the WAN accelerator
- A VM Backup Copy Job connects to the virtualization servers and gathers information about the VMs whose restore points we want to copy
First Backup Copy run
- creates a Full Backup, subsequent ones are incremental, using Forever Forward Incremental
- initially, data blocks needed to create the full backup of the latest state are copied (data may be read from multiple backup chain files)
- data is transferred to the target repository (using compression, deduplication, possibly WAN acceleration), where it is written to the backup file
Each subsequent Backup Copy run
- when a new Restore Point appears in the source repository, only incremental changes (between the previous and current job run) from the latest Restore Point (can be Full or Incremental) are copied
- if the last file is incremental, using the Forward Incremental or Forever Forward Incremental method, the Backup Copy Job works significantly faster
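The chain behaviour described above can be illustrated with a toy model. This is only a minimal sketch, not Veeam's implementation: the class name, the block dictionaries, and the `retention` parameter are assumptions made for illustration. It models the first run as a full backup, each later run as an increment, and, once the retention limit is exceeded, merges the oldest increment into the full backup so that the chain keeps a single full file.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ForeverForwardChain:
    """Toy model of a forever forward incremental chain (illustrative only)."""
    retention: int                                      # restore points to keep
    full: Dict[str, int] = field(default_factory=dict)  # blocks in the full backup
    increments: List[Dict[str, int]] = field(default_factory=list)
    has_full: bool = False

    def run(self, source_state: Dict[str, int],
            changed: Optional[Dict[str, int]] = None) -> str:
        """First run copies all blocks; later runs store only changed blocks."""
        if not self.has_full:
            self.full = dict(source_state)
            self.has_full = True
            return "full"
        self.increments.append(dict(changed or {}))
        # One full backup plus N increments gives N + 1 restore points.
        while 1 + len(self.increments) > self.retention:
            oldest = self.increments.pop(0)
            self.full.update(oldest)  # merge the oldest increment into the full
        return "incremental"

    def restore_latest(self) -> Dict[str, int]:
        """Rebuild the latest state from the full backup and all increments."""
        state = dict(self.full)
        for inc in self.increments:
            state.update(inc)
        return state


chain = ForeverForwardChain(retention=3)
chain.run({"a": 1, "b": 1})          # first run: full backup
chain.run({}, changed={"a": 2})      # increment 1
chain.run({}, changed={"b": 2})      # increment 2
chain.run({}, changed={"c": 1})      # increment 3, oldest increment is merged
print(len(chain.increments), chain.restore_latest())
```

After the fourth run the model holds one full backup and two increments, and restoring the latest point still yields the current state of all blocks.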
Copy Modes
- Immediate copy - for the selected source Backup Job, it copies each Restore Point as soon as it appears in the repository; all backups are copied, including transaction log backups
- Periodic copy - the more common mode; it copies the latest available Restore Point once per specified interval (Backup Copy Interval). At the start of the interval, it checks whether a new Restore Point is available and, if not, waits until one is created. We can select only certain machines (VMs) to be copied
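The difference between the two modes can be sketched as a small selection function. This is a simplified illustration, not how Veeam decides internally: the function name `points_to_copy` and the idea of tracking a single `last_copied` timestamp are assumptions.

```python
from datetime import datetime
from typing import List, Optional


def points_to_copy(mode: str, restore_points: List[datetime],
                   last_copied: Optional[datetime]) -> List[datetime]:
    """Return the restore points a copy run would pick up in this toy model."""
    # Restore points that appeared after the last one we already copied.
    new_points = sorted(rp for rp in restore_points
                        if last_copied is None or rp > last_copied)
    if mode == "immediate":
        return new_points        # every new restore point is copied
    if mode == "periodic":
        return new_points[-1:]   # only the latest; empty list means "wait"
    raise ValueError(f"unknown copy mode: {mode}")
```

With three nightly restore points and the first one already copied, the immediate mode returns the two newer points, while the periodic mode returns only the newest one.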
Initial synchronization (initial data transfer)
To create the initial data copy (Full Backup) in the target repository, we can use offline transfer (Seeding). This is useful when the data is too large or the link too slow for the transfer to finish over the network in time. We move the data to the target repository by other means and then use mapping (Backup Copy Job Mapping).
Note: The official guide describes the procedure where we create a backup copy using the Backup Copy Job, which we then move to the target repository. Another part of the documentation states that the job can be mapped to a backup created by the Backup Job or Backup Copy Job. Therefore, it should be possible to skip this part and directly copy a specific backup (if it contains the same content we want to be copied).
- create a Backup Copy Job (we must use the copy mode we want to use in the target)
- add the sources we want to copy
- as the target repository, select a local Backup Repository, where the files for transfer will be prepared (it cannot be the same repository where the source backups are)
- run the Backup Copy Job to create a file with a Full Backup (incremental backups may also be created and copied)
- move the backup folder (VBK, VBM, and possibly VIB files) to the target backup repository
- rescan the target repository
  - Backup Infrastructure - Backup Repositories - select Rescan on the repository
- map the backup to the Backup Copy Job
  - Home - Jobs, select Edit on the job created at the beginning
  - on the Target tab, select Map backup
  - select the backup, complete the wizard (Next), and confirm with Apply
- start the copy job
  - right-click on the job and select Sync Now
Note: At the end of the article, I added my practical experience.
Linking Backup Job to Backup Copy Job
In the Backup Job settings, we can use the Storage tab option Configure secondary destination for this job. In the Secondary Target step, we can add a specific existing Backup Copy Job. This automatically adjusts the Backup Copy Job to set the Backup Job as the data source. This ensures backup storage in the secondary repository.
Backup Copy Job
The job can be created from the Home tab menu option Backup Copy Job. We select the job type based on the platform. One job can process one or more machines of the same platform. Machines are processed in parallel.
When creating or editing a job, we go through the wizard and set various parameters on several tabs. A brief description of the items:
- Job
  - Name - each job must have a unique name
  - Copy Mode - select Immediate or Periodic
- Objects - select the jobs (workloads) whose restore points we want to copy to the target repository; the options vary by copy mode. The source object container can be selected from:
  - Infrastructure (Periodic) - (only for VM Backup Copy Job) select VMs or VM containers from the entire virtual infrastructure; the Restore Point is searched for among all backups (if VMs are backed up by multiple jobs) and the latest one is used; we can limit the search to a specific backup repository (Source)
  - Backups (Periodic) - similar to Infrastructure, but we select VMs or machines within a specific existing backup
  - Jobs (Periodic, Immediate) - select a specific job; the entire Restore Point is copied
  - Repositories (Immediate) - select a specific backup repository; all Restore Points of the same platform as the Backup Copy Job are copied
- Target
  - Backup repository - specify the target backup repository (use Map backup to map the job to existing backups)
  - Retention Policy - set retention policies (how many restore points or days to keep)
  - GFS - set long-term retention policies for archiving
  - Advanced - Maintenance - schedule regular maintenance: a check of the latest Restore Point (Perform backup files health check) and, if regular active full backups are not performed, maintenance of the Full Backup: removing data of VMs that were deleted or removed from the backup (Remove deleted items data after) and defragmenting and compacting, which creates a new full backup by copying the data and can only be used if GFS is not used (Defragment and compact full backup file)
  - Advanced - Storage - set data reduction (deduplication, compression) and encryption of backups
  - Advanced - Notifications - set notifications upon job completion by SNMP or email (can be set globally or here for the job)
  - Advanced - Scripts - run custom scripts before and/or after the job
- Data Transfer - select how the backup data is transferred
  - Direct - directly from the source to the target repository, suitable for copies in the same location
  - Through built-in WAN accelerators - deploy a WAN accelerator in both the source and target locations, used to save bandwidth during transfer
- Schedule - define a time window for copying (Backup Copy Window); we can specify prohibited hours when data transfer between repositories is not allowed (data transformation on the repository can still occur)
WAN Acceleration
- Detailed documentation: WAN Accelerators
- Veeam technology focused on Veeam jobs (backups) that optimizes data transfer to remote locations
- To use, we need two Windows servers (source and target), where we deploy the WAN Accelerator
- Performs deduplication (Global data deduplication, Variable block size deduplication), compression, multi-stream transfer, caching
- Optimizes network usage but utilizes (loads) other components (CPU, RAM, disk)
High bandwidth mode
- For links with speeds from 100 Mbps to 1 Gbps, it is recommended to use High bandwidth mode
- Does not use Global cache, uses faster compression methods, still saves bandwidth compared to direct mode (above 1 Gbps it is no longer beneficial)
- Must be enabled on both sides
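The thresholds above can be condensed into a small helper. This is a rough rule-of-thumb sketch, not an official sizing tool: the function name and the returned labels are hypothetical, and treating links below 100 Mbps as candidates for the default (Low bandwidth) mode is an assumption that follows from the ranges quoted above.

```python
def suggest_transfer_mode(link_mbps: float) -> str:
    """Suggest a Backup Copy transfer mode from the link speed in Mbps."""
    if link_mbps <= 0:
        raise ValueError("link speed must be positive")
    if link_mbps > 1000:
        # Above 1 Gbps, WAN acceleration is no longer beneficial.
        return "direct"
    if link_mbps >= 100:
        # 100 Mbps to 1 Gbps: High bandwidth mode is recommended.
        return "wan-accelerator-high-bandwidth"
    # Assumption: slower links use the default Low bandwidth mode.
    return "wan-accelerator-low-bandwidth"
```

For example, `suggest_transfer_mode(200)` returns `"wan-accelerator-high-bandwidth"`.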
WAN Accelerator
- The source Data Mover communicates with the source WAN Accelerator, which sends data to the target WAN Accelerator, which then passes it to the target Data Mover
- The source accelerator is more heavily loaded, multiple source accelerators (even from one location) can connect to one target accelerator
- The source accelerator processes jobs (VM disks) sequentially one by one
- A folder VeeamWAN with subfolders Digests and GlobalCache is created on the server
- The source accelerator creates and stores digest files for each processed VM disk and keeps 2 copies of these files; the file size is about 2% of the disk size for Low bandwidth mode (5 TB = 100 GB) and 1% for High bandwidth mode (5 TB = 50 GB)
- For Low bandwidth mode, a Global Cache is created; on the target accelerator we should allocate 10 GB for each type of operating system, and on the source it takes up about 2% of the value set for the target. Detailed information: WAN Accelerator Sizing
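The sizing figures above are simple percentages, so they can be checked with a few lines of arithmetic. The sketch below is only an estimate based on the ratios quoted in this article; the function names are hypothetical, and 1 TB is taken as 1000 GB to match the examples given above.

```python
def digest_size_gb(disk_size_tb: float, high_bandwidth: bool = False) -> float:
    """Estimate digest size in GB: about 2% (Low) or 1% (High bandwidth mode)."""
    ratio = 0.01 if high_bandwidth else 0.02
    return disk_size_tb * 1000 * ratio


def global_cache_gb(os_types: int) -> dict:
    """Estimate Global Cache sizes for Low bandwidth mode."""
    target = 10 * os_types   # 10 GB per operating system type on the target
    source = target * 0.02   # about 2% of the target value on the source
    return {"target_gb": target, "source_gb": source}


print(digest_size_gb(5))                       # Low bandwidth mode, 5 TB disk
print(digest_size_gb(5, high_bandwidth=True))  # High bandwidth mode, 5 TB disk
print(global_cache_gb(3))                      # three OS types in the backups
```

Remember that the source accelerator keeps two copies of the digest files, so the disk space reserved for them should be correspondingly larger.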
Deploying and configuring WAN Accelerator
The WAN Accelerator role can be deployed on a physical or virtual 64-bit Microsoft Windows server that has been added to the Managed Servers. The server can already host another component (role), such as Backup Proxy or Backup Repository. We add the WAN Accelerator in the Backup Infrastructure view, under the WAN Accelerators item, using the Add WAN Accelerator option.
- Server - primarily select the server to be used for the WAN Accelerator
- Traffic port - can change the default TCP port 6165
- Streams - number of connections for data transfer between accelerators; the default is 5 TCP streams. For high-latency links, it is beneficial to increase the number to achieve higher transfer speeds (full bandwidth utilization)
- High bandwidth mode
- Cache - specify the path to the folder where the service files and Global Cache data are stored. When both accelerators are in High bandwidth mode, the Global Cache is not used, but we still need to specify the size and ensure there is enough free space on the disk. The specified Cache size applies to each source accelerator
- Review - the WAN Accelerator needs two components (Veeam Data Mover - Transport and Veeam WAN Accelerator Service); we see whether they are already installed on the server or will be installed in the Apply step
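Why more streams help on high-latency links can be illustrated with the classic TCP bound of one window of data per round trip. This is a general networking approximation, not a Veeam formula: the function name, the 64 KiB window, and the 100 ms round-trip time in the example are assumptions, and real throughput also depends on deduplication, compression, and disk speed.

```python
def max_throughput_mbps(streams: int, window_bytes: int,
                        rtt_ms: float, link_mbps: float) -> float:
    """Upper bound on throughput for several parallel TCP streams."""
    # A single stream can send at most one window per round trip.
    per_stream_mbps = window_bytes * 8 / (rtt_ms / 1000) / 1e6
    # All streams together cannot exceed the link capacity.
    return min(link_mbps, streams * per_stream_mbps)


# Assumed values: 64 KiB window, 100 ms round-trip time, 200 Mbps link.
for n in (5, 20, 40):
    print(n, round(max_throughput_mbps(n, 65536, 100, 200), 1))
```

Under these assumed values, the default of 5 streams stays far below the link capacity, which is why raising the stream count can pay off on long-distance links.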
After deploying the target accelerator and assigning it to the Backup Copy Job, it is recommended to use the Populate cache function (right-click on the accelerator). Then select the repository (Source Backup Repositories); it is recommended to select a repository in the same location as the target accelerator. Veeam searches it for the various OS types and transfers their data blocks to the cache.
Practical experience creating a Backup Copy Job with offline transfer of the initial full backup (Seeding)
I needed to transfer a backup copy of a file server, which is 5 TB in size, to a remote location. Even over a 200 Mbps link, the initial copying of the full backup would take several days. So I decided to use Seeding and transfer the initial data on an external disk.
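The estimate is easy to reproduce. The following is a back-of-the-envelope sketch; the function name and the `efficiency` parameter (the share of the nominal link speed that is really achieved) are assumptions, and compression or deduplication are ignored.

```python
def transfer_hours(size_tb: float, link_mbps: float,
                   efficiency: float = 1.0) -> float:
    """Estimate transfer time in hours for size_tb terabytes over a link."""
    size_bits = size_tb * 1e12 * 8                   # decimal terabytes to bits
    seconds = size_bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600


ideal = transfer_hours(5, 200)                       # full line rate
realistic = transfer_hours(5, 200, efficiency=0.7)   # assumed 70% utilization
print(f"{ideal:.1f} h ({ideal / 24:.1f} days), "
      f"{realistic:.1f} h ({realistic / 24:.1f} days)")
```

Even at full line rate, 5 TB over 200 Mbps needs more than two days, and the link is rarely available exclusively for the backup copy.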
Splitting the file onto external disks
For the transfer, I had 2 external SSD disks of 4 TB each. The first problem was splitting the backup file across the two disks. I tried various tools, but all had problems with sizes in the TB range: either they didn't work at all or the copying was very slow. So I ended up back with Total Commander, which I had started with, but it also has one problem. Many years ago, I used it to split files onto floppy disks, but splitting a TB-sized file onto external disks didn't quite work. When splitting the file, it started creating another part even though there was no more space on the disk and then ended with an error (instead of asking for another medium or path).
I had to first split the file directly on the server and store it in a repository with enough space. I eventually chose multiple parts of 900 GB each. The splitting ran at a speed of about 180 MB/s and took more than 9 hours. Then I copied to external disks, the copying speed was about 295 MB/s and took just over 5 hours.
Very importantly, you must copy the CRC file (created by Total Commander during splitting) to both the first and the second disk. If you don't, only the files from the first disk will be used during merging: Total Commander won't know to ask for additional files and will stop. When it has the CRC file available, it knows the resulting file should be larger and will ask for the path to the next file. So you swap the external disk and continue.
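The role of the CRC file can be illustrated with a small script that splits a file into parts and verifies size and checksum when joining. This is a hedged sketch of the general technique, not a reimplementation of Total Commander: the function names and the part naming scheme are assumptions, and the parts are not compatible with Total Commander's own format.

```python
import zlib
from pathlib import Path
from typing import List, Tuple

CHUNK = 1024 * 1024  # read and write in 1 MiB chunks


def split_file(src: Path, dest_dir: Path, part_size: int) -> Tuple[List[Path], int, int]:
    """Split src into numbered parts; return (parts, CRC32, total size)."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    parts, crc, total, index = [], 0, 0, 1
    with src.open("rb") as fin:
        while True:
            part_path = dest_dir / f"{src.name}.{index:03d}"
            written = 0
            with part_path.open("wb") as fout:
                while written < part_size:
                    data = fin.read(min(CHUNK, part_size - written))
                    if not data:
                        break
                    fout.write(data)
                    crc = zlib.crc32(data, crc)  # running CRC32 of the whole file
                    written += len(data)
            if written == 0:
                part_path.unlink()               # nothing left, drop the empty part
                break
            parts.append(part_path)
            total += written
            index += 1
            if written < part_size:              # source file is exhausted
                break
    return parts, crc, total


def join_files(parts: List[Path], dest: Path, expected_crc: int, expected_size: int) -> None:
    """Join parts into dest and fail loudly if size or CRC32 do not match."""
    crc, total = 0, 0
    with dest.open("wb") as fout:
        for part in parts:
            with part.open("rb") as fin:
                for data in iter(lambda: fin.read(CHUNK), b""):
                    fout.write(data)
                    crc = zlib.crc32(data, crc)
                    total += len(data)
    if total != expected_size:
        raise ValueError(f"size mismatch: got {total}, expected {expected_size}")
    if crc != expected_crc:
        raise ValueError("CRC32 mismatch: the joined file is corrupted")
```

Because the expected size and checksum travel with the parts, a join that sees only the parts from the first disk fails with a size mismatch instead of silently producing a truncated backup file.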
The entire process of creating a Backup Copy Job with mapping
- perform Export Backup to obtain a Full Backup that includes the latest incremental backups; thanks to Fast Clone, it took 7 minutes
- split the VBK file and upload it to external disks, upload the CRC file to both disks, also copy the VBM file
- transport to the remote location
- in the repository create a folder named after the future job (Backup Copy Job) and upload (merge) the VBK and VBM files (since the repository was connected only by a 1 Gbps link, the copying speed was 115 MB/s)
- you can rename the VBM file according to the job name (Backup Copy Job)
- create a Backup Copy Job, map the transferred backup, and start it
Export Backup
Allows synthesizing a complete and independent full backup file from selected restore points in the backup repository. This creates a VBK file for a specific date (and corresponding VBM). The backup is stored in the same repository as the source (in a folder named after the VM and date).
- Home - Backups
- find the VM you want to export, right-click and select Export backup
- you can select the Restore Point (default is the latest) and complete
Backup Copy Job
- rescan the target repository
  - Backup Infrastructure - Backup Repositories - select Rescan on the repository
  - the backup we transferred will appear in Home - Backups - Disk (Imported)
  - if we didn't copy the exported backup but took the VBK and VBM files from the backup job, remove the missing backups: in the backup's Properties, use Forget - All unavailable backups
- create a Backup Copy Job
  - Home - Backup Copy Job
  - add the sources you want to copy, select the target repository in the remote location, set other parameters
- map the backup to the Backup Copy Job
  - on the Target tab, select Map backup
  - select the backup we transferred to the remote location
  - complete the wizard with Finish
- if we created the job as Disabled, enable it with Enable
- the job starts; unless we are in a period when copying is prohibited, the transfer begins
Synchronizing multiple Restore Points
I couldn't find an answer to one question, but hopefully, Veeam is intelligent enough and works correctly.
It is likely that copying data and transferring it to the remote location will take several days. Meanwhile, several new restore points will be created in the backup we want to copy. The official documentation describes the functioning of Backup Copy as copying changes from the latest restore point.
Veeam Backup & Replication copies incremental changes from this most recent restore point...
Only when creating the initial full backup can data be copied from multiple restore points to create a full backup.
During the first run of the job with mapped files of the full backup, several events are logged:
Missing restore points have been created... Copying restore point 01.02.2023 19:31:29 from backup repository ScaleOutRepo...
It states that it used the latest restore point, but there is no information on whether it also used the other incremental backups missing from the existing full backup.
WAN Acceleration and Creating fingerprints
If you use WAN accelerators, you may encounter another problem. During the first run of the job, the log shows the event Creating fingerprints for Hard disk, which in my case took 21 hours. Several times the job didn't complete within the copy interval; each time it restarted and began creating Fingerprints from the beginning.
Fingerprints are probably what the documentation refers to as Digests. The accelerator analyzes the data blocks of the files (VM disks) to be transferred and creates a digest file of these data blocks. During the next run, it creates Digests of new files and determines by comparison whether the data blocks have already been transferred. The creation occurs on the target.
This operation takes a very long time on a large disk. It can be bypassed by not using WAN acceleration (using the Direct transfer mode). Otherwise, you need to ensure that the creation of Fingerprints can finish. This may mean temporarily allowing data transfer by the job at any time and extending the periodic copy interval (e.g., to 2 days), or adjusting when the job starts and runs.
Eventually, the creation of Fingerprints was completed and they were copied to the source. Then data was read and transferred from the source, and its size was larger than any single incremental backup. So hopefully, it was the incremental data missing from the full backup already transferred to the target.
Hello, I would like to ask whether it is possible to connect a backup server using a public IP address.
A colleague and I want to back up data to a remote server, but we keep getting errors.
respond to [1]Daniel: It is certainly possible, but you will have to open the ports needed for communication between the server and the backup server (see Veeam port usage). However, it is a fairly significant security risk, and keep the capacity of the data link in mind.