Check what the value is in the running kernel with: sysctl net.ipv4.tcp_keepalive_time
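As a rough sketch of the check above, the value could be compared against the 60-second target recommended in this article. The function name check_keepalive is hypothetical and the messages are illustrative only; sysctl -n prints just the value without the key name.

```shell
# Hypothetical helper: report whether a keepalive value (in seconds)
# is above the target. The target defaults to 60, the value this
# article recommends.
check_keepalive() {
    current="$1"
    target="${2:-60}"
    if [ "$current" -gt "$target" ]; then
        echo "tcp_keepalive_time is ${current}s; consider lowering it to ${target}s"
    else
        echo "tcp_keepalive_time is ${current}s; no change needed"
    fi
}

# Example usage against the running kernel:
#   check_keepalive "$(sysctl -n net.ipv4.tcp_keepalive_time)"
```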
The default in most RHEL distributions is 7200 seconds (2 hours). We recommend lowering it to 60 seconds. The sysctl -w commands below apply the change to the running kernel only, for testing purposes. To make sure that it takes effect for your backups, restart the ClusterLogics service as well. Make sure this setting is applied to both the FD and the SD that the failing job uses before retesting.
To apply it to an FD:
sysctl -w net.ipv4.tcp_keepalive_time=60
service bacula-fd restart
To apply it to an SD:
sysctl -w net.ipv4.tcp_keepalive_time=60
service bacula-sd restart
Now run the jobs again and check whether they complete. If they complete successfully, add net.ipv4.tcp_keepalive_time=60 to /etc/sysctl.conf so that the setting persists across reboots. If an entry for net.ipv4.tcp_keepalive_time already exists with a different value, change the value to 60 and save the file.
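The edit to /etc/sysctl.conf could also be scripted. The following is a minimal sketch, not an official procedure: the function name set_sysctl_conf is hypothetical, it assumes GNU sed (as shipped on RHEL), and it only touches uncommented entries for the key. It replaces an existing entry or appends one if none is present.

```shell
# Hypothetical helper: set KEY=VALUE in a sysctl-style config file.
# Replaces an existing uncommented entry for KEY, or appends a new
# line if no entry exists. Assumes GNU sed (for -i and -E).
set_sysctl_conf() {
    key="$1"
    value="$2"
    file="$3"
    # Escape the dots in the key so they match literally in the regex.
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -Eq "^[[:space:]]*${key_re}[[:space:]]*=" "$file" 2>/dev/null; then
        sed -i -E "s|^[[:space:]]*${key_re}[[:space:]]*=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example usage (as root), after making a backup copy of the file:
#   set_sysctl_conf net.ipv4.tcp_keepalive_time 60 /etc/sysctl.conf
#   sysctl -p    # reload settings from /etc/sysctl.conf
```

Taking the file path as an argument lets the helper be tried on a copy of the file before it is pointed at /etc/sysctl.conf.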