Backing Up Servers with a Large Number of Files/Folders

If you have one or more servers with a large number of files/folders (and a correspondingly large amount of data) whose backups are erroring, especially when taking fulls with B4H, we recommend that you ensure the following settings are in place.

1. Make sure heartbeat is enabled on the affected FD(s) and the SD(s) they are backing up to. This is enabled by default in B4H as of 9/1/12; however, some clients or FDs added prior may not have it enabled.

To check whether the setting is present, and to add it if it is not, you'll need to be comfortable with SSHing into your FDs and SDs and editing files on them. If not, please open a support ticket with the access information and we can take a look for you.

Note: The following will interrupt jobs running on the FD and/or SDs, so make sure whatever you have running has finished before proceeding!

To enable SD heartbeat interval:
cd /opt/bacula/etc/
cp bacula-sd.conf bacula-sd.conf.bak
nano bacula-sd.conf

Then add the following line underneath the Maximum Concurrent Jobs parameter:
Heartbeat Interval = 60

If the heartbeat interval is already present, there's no need to do anything further on the SD.

It will look something like this once done:
Storage { # definition of myself
Name = someserver-sd
SDPort = 9103 # Director's port
WorkingDirectory = "/opt/bacula/working"
Pid Directory = "/opt/bacula/working"
Maximum Concurrent Jobs = 100
Heartbeat Interval = 60
}

Save your changes, and restart the bacula-sd service with: service bacula-sd restart
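
If you'd rather check for the directive before opening an editor, a small shell check along these lines may help. This is only a sketch: has_heartbeat is a hypothetical helper name (not part of Bacula), and it assumes the directive sits on its own uncommented line, as in the example above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Bacula): succeed if the given config
# file already contains an uncommented Heartbeat Interval directive.
# The file is only read, never modified.
has_heartbeat() {
    conf="$1"
    # Match the directive at the start of a line, allowing leading
    # whitespace and ignoring case. Lines starting with '#' do not match.
    grep -Eiq '^[[:space:]]*Heartbeat[[:space:]]*Interval[[:space:]]*=' "$conf"
}

# Example (path taken from the steps above):
#   if has_heartbeat /opt/bacula/etc/bacula-sd.conf; then
#       echo "Heartbeat Interval already set, nothing to do"
#   else
#       echo "Heartbeat Interval missing, add it as described above"
#   fi
```

The same check should work against bacula-fd.conf on the FD, since the directive is written the same way there.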

Now for the FD, it's very similar:

To enable FD heartbeat interval:
cd /opt/bacula/etc/
cp bacula-fd.conf bacula-fd.conf.bak
nano bacula-fd.conf

Then add the following line underneath the Maximum Concurrent Jobs parameter:
Heartbeat Interval = 60

If the heartbeat interval is already present, there's no need to do anything further on the FD.


It will look something like this once complete:
FileDaemon { # this is me
Name = someclient-fd
FDport = 9102 # where we listen for the director
FDAddress =
WorkingDirectory = /opt/bacula/working
Pid Directory = /opt/bacula/working
Maximum Concurrent Jobs = 20
Heartbeat Interval = 60
}

Save your changes, and restart the bacula-fd service with: service bacula-fd restart

2. If the heartbeat interval is already present on your FD(s) and SD(s) and you're running Linux, you may need to lower the TCP keepalive time to take full advantage of the heartbeat setting and resolve the issue.

To do this, first check what the value is in the running kernel with: sysctl net.ipv4.tcp_keepalive_time

7200s (2 hours) is the default in most RHEL distributions. We recommend lowering it to 60s. The commands below apply the change to the running kernel for testing purposes. To make sure that it takes effect for your backups, you'll want to restart the Bacula service as well. Make sure this setting is applied to both the FD and the SD that the erroring job uses prior to retesting.

To apply it to a FD:
sysctl -w net.ipv4.tcp_keepalive_time=60
service bacula-fd restart

To apply it to a SD:
sysctl -w net.ipv4.tcp_keepalive_time=60
service bacula-sd restart
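
Before retesting, it can be worth confirming that the running kernel really has the new value on both machines. The sketch below reads the same setting from /proc/sys/net/ipv4/tcp_keepalive_time, which is where Linux exposes it. keepalive_ok is a hypothetical helper name, and treating any value at or below the target as acceptable is an assumption of this example, not a rule from this article.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the TCP keepalive time is at or below
# the target number of seconds.
#   $1 = target in seconds (e.g. 60)
#   $2 = file to read; defaults to the kernel's /proc entry
keepalive_ok() {
    target="$1"
    src="${2:-/proc/sys/net/ipv4/tcp_keepalive_time}"
    # Return 2 if the value cannot be read at all.
    current=$(cat "$src") || return 2
    [ "$current" -le "$target" ]
}

# Example:
#   keepalive_ok 60 && echo "keepalive time OK" || echo "keepalive time too high"
```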

Now, proceed to run the jobs again and see if all is well. If they complete successfully, be sure to add net.ipv4.tcp_keepalive_time=60 to /etc/sysctl.conf so that the setting persists across reboots. If net.ipv4.tcp_keepalive_time already exists, but it's set to a different value, simply edit the value to 60 and save the file.     
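
If you'd prefer to script the /etc/sysctl.conf change instead of editing by hand, something like the following could work. It is a sketch under assumptions: persist_sysctl is a hypothetical helper name, it expects the key to appear on its own uncommented line if it is present at all, and it writes a .bak copy next to the file before changing anything.

```shell
#!/bin/sh
# Hypothetical helper: set key=value in a sysctl-style file.
# If the key is already present (uncommented), its value is replaced;
# otherwise a new line is appended. A .bak copy is made first.
#   $1 = file, $2 = key, $3 = value
persist_sysctl() {
    file="$1"; key="$2"; value="$3"
    cp "$file" "$file.bak"
    # Escape dots so they match literally in the patterns below.
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -Eq "^[[:space:]]*${key_re}[[:space:]]*=" "$file"; then
        # Key exists: rewrite that line with the new value.
        sed "s/^[[:space:]]*${key_re}[[:space:]]*=.*/${key}=${value}/" "$file.bak" > "$file"
    else
        # Key missing: make sure the file ends with a newline, then append.
        if [ -n "$(tail -c 1 "$file")" ]; then
            echo >> "$file"
        fi
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example (run as root):
#   persist_sysctl /etc/sysctl.conf net.ipv4.tcp_keepalive_time 60
```

Review the resulting file afterwards; the original is kept as /etc/sysctl.conf.bak in case you need to roll back.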