Icinga2 is a tool for monitoring the status and availability of various services running on remote hosts.
There seems to be an issue when you set up an Icinga2 Master server and try to monitor an Icinga2 Satellite client that runs Docker: if you monitor its disks using the (default) Nagios Monitoring Plugin "check_disk", the check fails immediately with:
***** Service Monitoring on icinga2masterserver *****
RemoteDisks on icinga2satellite is CRITICAL!
Info: DISK CRITICAL - /var/lib/docker/overlay2/92b9d501cf3bf52b30c6f3f3cc45535e8d6cdc35a78c674f7213c79dd9a14b7b/merged is not accessible: Permission denied
When: 2018-07-06 17:54:38 +0100
Service: Disks
Host: icinga2satellite
Not only does this fail to report the space on that mount, it also stops reporting the space on every other disk or mount. Stopping Docker clears the "error", and check_disk works again.
Warning: Don't stop Docker on a production server to test this! (Unless you really mean it!)
All my testing has been done on a small cluster of Debian Jessie and Stretch VirtualBox VMs before applying it to a production server.
The first thing you will probably investigate is the long hexadecimal ID after overlay2 in the pathname. This changes every time Docker is started. Nor is there a folder called merged at the end of it.
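Because the ID changes, there is no fixed path to exclude. One way to see which Docker overlay mounts currently exist (and so which paths check_disk is likely to trip over) is to filter /proc/mounts, which is world-readable even when the mount points themselves are not. A minimal sketch, assuming Docker's default data root of /var/lib/docker and the overlay2 storage driver; docker_overlay_mounts is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# List Docker overlay mount points from /proc/mounts-formatted input on stdin.
# Assumes Docker's default data root (/var/lib/docker) and the overlay2 driver.
# /proc/mounts fields: device, mount point, fs type, options, dump, pass.
docker_overlay_mounts() {
  awk '$3 == "overlay" && index($2, "/var/lib/docker/") == 1 { print $2 }'
}

# Usage on a live host (reading /proc/mounts does not need root):
#   docker_overlay_mounts < /proc/mounts
```

Running it before and after a Docker restart should show the IDs changing.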
You might try making the “icinga2” user on the satellite a member of the “docker” group, but this doesn't work. NB: on Debian, this user is called “nagios”.
check_disk has plenty of options for excluding paths and mount points, but these don't work either.
For example, you might think this will work:
vars.disk_ignore_eregi_path = [ "/var/lib/docker/$$" ]
… but it won't.
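Part of the reason this particular example cannot work is escaping. Assuming the usual Icinga 2 macro escaping, where $$ stands for a literal $, the pattern that reaches check_disk is the extended regular expression /var/lib/docker/$, which is anchored to the end of the path and can only match a path ending in /var/lib/docker/. The sketch below uses grep -E as a stand-in for check_disk's own regex matching (an assumption for illustration; matches is a hypothetical helper) and compares that pattern with a prefix pattern against the path from the notification above. It only shows why this pattern cannot match; it is not a claim that a corrected pattern would clear the error.

```shell
#!/usr/bin/env bash
# Illustration only: grep -E stands in for check_disk's regex matching.
# "$$" in Icinga 2 is assumed to be an escaped literal "$", so "/var/lib/docker/$$"
# reaches the plugin as the end-anchored regex "/var/lib/docker/$".

# matches REGEX PATH: succeed if PATH matches the case-insensitive extended REGEX
matches() {
  printf '%s\n' "$2" | grep -Eiq -- "$1"
}

docker_path=/var/lib/docker/overlay2/92b9d501cf3bf52b30c6f3f3cc45535e8d6cdc35a78c674f7213c79dd9a14b7b/merged

for re in '/var/lib/docker/$' '^/var/lib/docker/'; do
  if matches "$re" "$docker_path"; then
    echo "$re: match"
  else
    echo "$re: no match"
  fi
done
```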
What you really need to do is set up a custom service that excludes some filesystem types, and then apply it only to hosts running Docker.
File: /etc/icinga2/zones.d/icinga2masterserver/MyServices.conf
apply Service "DockerDisks" {
  import "generic-service"
  check_command = "disk"
  vars.disk_all = true
  vars.disk_path = "/"
  /* skip the filesystem types that trip check_disk on Docker hosts (Debian) */
  vars.disk_exclude_type = [ "overlay", "tmpfs", "nsfs" ]
  vars.disk_wfree = "40%"
  vars.disk_cfree = "30%"
  /* run the check on the satellite itself, and only on hosts tagged "docker" */
  command_endpoint = host.vars.client_endpoint
  assign where host.vars.client_endpoint && "docker" in host.vars.services
}
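To test exclusions by hand before touching the service, you can run the plugin directly on the satellite as the check user. The sketch below builds a check_disk command line roughly equivalent to the DockerDisks service (-w and -c for percent free, -A for all paths, one -X per excluded type); it is not necessarily the exact command Icinga generates. It assumes the Debian plugin path /usr/lib/nagios/plugins/check_disk and the Debian check user nagios, and by default only prints the command. The check_disk_manual function and the DRY_RUN and PLUGIN variables are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Build (and optionally run) a manual check_disk call roughly matching the
# "DockerDisks" service, to test exclusions by hand on the satellite.
# Assumptions: Debian plugin path, checks running as user "nagios",
# and DRY_RUN=1 (the default here) only printing the command.
PLUGIN=${PLUGIN:-/usr/lib/nagios/plugins/check_disk}
DRY_RUN=${DRY_RUN:-1}

check_disk_manual() {
  local args=(-w 40% -c 30% -A)
  local fstype
  for fstype in "$@"; do   # each argument is a filesystem type to exclude
    args+=(-X "$fstype")
  done
  if [ "$DRY_RUN" = 1 ]; then
    echo "sudo -u nagios $PLUGIN ${args[*]}"
  else
    sudo -u nagios "$PLUGIN" "${args[@]}"
  fi
}

# Usage:
#   check_disk_manual overlay tmpfs nsfs            # print the command
#   DRY_RUN=0 check_disk_manual overlay tmpfs nsfs  # actually run it
```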
The important bit here is:
vars.disk_exclude_type = [ "overlay", "tmpfs", "nsfs" ]
These are the types needed on Debian (Stretch and Jessie).
Use the mount command to check the filesystem types on your own OS; on CentOS, for example, I believe the type is overlayfs, not overlay.
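To list the filesystem types actually mounted on a Docker host (ideally with containers running), here is a small sketch that pulls the type field out of mount output; mount_fstypes is a hypothetical helper name. On systems with util-linux, findmnt -rn -o FSTYPE | sort -u should give the same list.

```shell
#!/usr/bin/env bash
# Print each distinct filesystem type found in `mount` output on stdin.
# Linux `mount` lines look like: DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)
mount_fstypes() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "type") { print $(i + 1); break } }' | sort -u
}

# Usage on the Docker host:
#   mount | mount_fstypes
```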
If you add these one at a time, you will see the DISK CRITICAL message change every time you restart icinga2, naming a different mount point. You can then examine that mount and exclude its filesystem type as well, until the error clears.
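When the message changes, one quick way to find which type to exclude next is to look the named mount point up in /proc/mounts. A sketch, assuming the message keeps the "DISK CRITICAL - PATH is not accessible" wording seen in the notification above; fstype_for_error is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Given a check_disk message such as
#   DISK CRITICAL - /some/path is not accessible: Permission denied
# print the filesystem type of that mount point, looked up in
# /proc/mounts-formatted text.  Usage: fstype_for_error "MESSAGE" MOUNTS_FILE
fstype_for_error() {
  local msg=$1 mounts=$2 path
  # Keep only the text between "DISK CRITICAL - " and " is not accessible"
  path=${msg#*"DISK CRITICAL - "}
  path=${path%%" is not accessible"*}
  # /proc/mounts fields: device, mount point, fs type, ...
  awk -v p="$path" '$2 == p { print $3; exit }' "$mounts"
}

# Usage, pasting the Info: line from the notification:
#   fstype_for_error "DISK CRITICAL - /var/lib/docker/overlay2/92b9d501cf3bf52b30c6f3f3cc45535e8d6cdc35a78c674f7213c79dd9a14b7b/merged is not accessible: Permission denied" /proc/mounts
```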
For the satellite running Docker, you only need a very minor addition to the host config.
File: /etc/icinga2/zones.d/icinga2masterserver/icinga2satellite.conf
# object for icinga2satellite
object Zone "icinga2satellite" {
  endpoints = [ "icinga2satellite" ]
  parent = "icinga2masterserver"
}
object Endpoint "icinga2satellite" {
  host = "192.168.10.20"
}
object Host "icinga2satellite" {
  import "generic-host"
  address = "192.168.10.20"
  vars.os = "Linux"
  vars.services = [ "docker" ]
  vars.notification["mail"] = {
    /* The UserGroup `icingaadmins` is defined in `users.conf`. */
    groups = [ "icingaadmins" ]
  }
  vars.client_endpoint = name
}
I have added the line:
vars.services = [ "docker" ]
which matches up with
assign where host.vars.client_endpoint && "docker" in host.vars.services
in MyServices.conf.
If it is your Icinga2 Master server itself that is running Docker, the default disk check against it will fail as well, because that check is set to run where
host.name == NodeName
To stop the default disk check from running against NodeName when Docker is running there, modify the default services.conf:
File: /etc/icinga2/conf.d/services.conf
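apply Service for (disk => config in host.vars.disks) {
  import "generic-service"
  check_command = "disk"
  assign where host.name != NodeName
  vars += config
}
So we've added the line:
assign where host.name != NodeName
which skips the master itself but still applies the default disk checks to any other host that defines vars.disks.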
Then add this new replacement service to your custom MyServices.conf:
/* stop broken disk check on your icinga2 server with docker */
apply Service "icingadisks" {
  import "generic-service"
  check_command = "disk"
  vars.disk_all = true
  vars.disk_path = "/"
  vars.disk_exclude_type = [ "overlay", "tmpfs", "nsfs" ]
  vars.disk_wfree = "40%"
  vars.disk_cfree = "30%"
  command_endpoint = host.vars.client_endpoint
  /* only the master's own host object */
  assign where host.name == NodeName
}
So this service runs only against “NodeName”, i.e. your Icinga2 Master server.
As usual, after creating or changing configs, run
sudo icinga2 daemon -C
to check the new config for problems, and then restart icinga2:
sudo systemctl restart icinga2
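The two steps can be chained so that icinga2 is only restarted when the configuration check passes, which avoids restarting the daemon into a broken config. A minimal sketch; validate_and_restart is a hypothetical helper name, and it relies on icinga2 daemon -C exiting non-zero when validation fails:

```shell
#!/usr/bin/env bash
# Validate the Icinga 2 configuration and restart only if validation passes.
validate_and_restart() {
  if sudo icinga2 daemon -C; then
    sudo systemctl restart icinga2
  else
    echo "icinga2 config check failed; not restarting" >&2
    return 1
  fi
}
```

On the command line the same idea is simply: sudo icinga2 daemon -C && sudo systemctl restart icinga2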
And hopefully those Docker-related DISK CRITICAL permission errors will be gone!
Links:
Docker : https://www.docker.com/
Icinga2: https://www.icinga.com/
VirtualBox: https://www.virtualbox.org/
Debian: https://www.debian.org/
CentOS: https://www.centos.org/
Posted by Dominic Mason on 01/08/2018