Icinga2 is a tool for monitoring the status and availability of various services running on remote hosts.

There seems to be an issue when you set up an Icinga2 Master server and try to monitor an Icinga2 Satellite client that runs Docker: if you monitor the disks using the (default) Nagios Monitoring Plugin "check_disk", the check fails immediately with:


***** Service Monitoring on icinga2masterserver *****

RemoteDisks on icinga2satellite is CRITICAL!

Info: DISK CRITICAL - /var/lib/docker/overlay2/92b9d501cf3bf52b30c6f3f3cc45535e8d6cdc35a78c674f7213c79dd9a14b7b/merged is not accessible: Permission denied

When: 2018-07-06 17:54:38 +0100

Service: Disks

Host: icinga2satellite


Not only does this fail to tell you the space on that disk, it also stops reporting space on any other disk or mount that exists. Stopping Docker clears the "error" and check_disk works again.

Warning: Don't stop Docker on a production server to test this! (Unless you really mean it!)

All my testing has been done on a small cluster of Debian Jessie and Stretch VirtualBox VMs before applying it to a production server.


The first thing you will probably investigate is the long hexadecimal ID after overlay2 in the pathname. This will change every time Docker is started. Nor is there a folder called merged at the end of it.

You might try making the “icinga2” user on the satellite a member of the “docker” group, but this doesn't work. (NB: on Debian, this user is called “nagios”.)

There are plenty of options in check_disk to exclude paths and mount points, but these do not work either.


For example, you might think this will work:

vars.disk_ignore_eregi_path = [ "/var/lib/docker/$$" ]

… but it won't.


What you really need to do is set up a custom service that excludes some filesystem types, then apply it only to hosts running Docker.


File: /etc/icinga2/zones.d/icinga2masterserver/MyServices.conf

apply Service "DockerDisks" {
  import "generic-service"
  check_command = "disk"
  vars.disk_all = "true"
  vars.disk_path = "/"
  vars.disk_exclude_type = [ "overlay", "tmpfs" , "nsfs" ]
  vars.disk_wfree = "40%"
  vars.disk_cfree = "30%"
  command_endpoint = host.vars.client_endpoint
  assign where host.vars.client_endpoint && "docker" in host.vars.services
}


The important bit here is:

vars.disk_exclude_type = [ "overlay", "tmpfs" , "nsfs" ]

This is on Debian (Stretch and Jessie).


Use the mount command to check the filesystem types in use on your OS. I believe on CentOS it is overlayfs, not overlay, for example.
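If the output of mount is too noisy, one way to get just the filesystem types is to read the third column of /proc/mounts. This is only a sketch and assumes a Linux host; the function name list_fstypes is made up for this post, and the optional argument is only there so you can point it at a saved copy of a mount table.

```shell
#!/bin/sh
# list_fstypes: print each filesystem type found in a mount table, once.
# Hypothetical helper name; reads /proc/mounts unless another file is given.
list_fstypes() {
    # /proc/mounts columns: device, mount point, fstype, options, dump, pass
    awk '{ print $3 }' "${1:-/proc/mounts}" | sort -u
}
```

Calling list_fstypes with no arguments on the Docker host lists every type currently mounted, which gives you the candidates for vars.disk_exclude_type.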

If you add these one at a time, you will see the DISK CRITICAL message change every time you restart icinga2; you can then examine and exclude the relevant additional filesystem types until the error clears.
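Restarting icinga2 for every attempt is slow, so it can help to run the plugin by hand on the satellite first. The sketch below just builds a check_disk command line from a list of filesystem types, using the plugin's -X (exclude type), -A (all), -w and -c options. The function name check_disk_cmd is made up, and the plugin path /usr/lib/nagios/plugins/check_disk is an assumption (it is where the Debian packages put it, as far as I know), so adjust both to suit.

```shell
#!/bin/sh
# Assumed plugin location on Debian; override CHECK_DISK if yours differs.
CHECK_DISK="${CHECK_DISK:-/usr/lib/nagios/plugins/check_disk}"

# check_disk_cmd: print a check_disk command line that excludes the given
# filesystem types. Hypothetical helper; thresholds copied from DockerDisks.
check_disk_cmd() {
    cmd="$CHECK_DISK -w 40% -c 30% -A"
    for fstype in "$@"; do
        # One -X option per filesystem type to exclude
        cmd="$cmd -X $fstype"
    done
    printf '%s\n' "$cmd"
}
```

For example, sudo -u nagios $(check_disk_cmd overlay tmpfs nsfs) runs the check as the “nagios” user on Debian, so you can add types one at a time and watch the message change without touching the Icinga2 config.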


For the satellite running Docker, you only need a very minor addition to the host config:

File: /etc/icinga2/zones.d/icinga2masterserver/icinga2satellite.conf

# object for icinga2satellite

object Zone "icinga2satellite" {
  endpoints = [ "icinga2satellite" ]
  parent = "icinga2masterserver"
}

object Endpoint "icinga2satellite" {
  host = "192.168.10.20"
}

object Host "icinga2satellite" {
  import "generic-host"
  address = "192.168.10.20"
  vars.os = "Linux"
  vars.services = ["docker" ]
  vars.notification["mail"] = {
    /* The UserGroup `icingaadmins` is defined in `users.conf`. */
    groups = [ "icingaadmins" ]
  }
  vars.client_endpoint = name
}


I have added the line:

vars.services = ["docker" ]

Which matches up with

assign where host.vars.client_endpoint && "docker" in host.vars.services

in MyServices.conf


If it is your Icinga2 Master server itself that is running Docker, then the default disk check against itself will fail, because the default service is set to run where

host.name == NodeName


To stop the default disk check against NodeName if Docker is running, modify the default services.conf

File: /etc/icinga2/conf.d/services.conf

apply Service for (disk => config in host.vars.disks) {
  import "generic-service"
  check_command = "disk"
  assign where host.name != NodeName
  vars += config
}


So we've added the line:

assign where host.name != NodeName


Then add this new, replacement, service to your custom MyServices.conf

/* stop broken disk check on your icinga2 server with docker */

apply Service "icingadisks" {
  import "generic-service"
  check_command = "disk"
  vars.disk_all = "true"
  vars.disk_path = "/"
  vars.disk_exclude_type = [ "overlay", "tmpfs" , "nsfs" ]
  vars.disk_wfree = "40%"
  vars.disk_cfree = "30%"
  command_endpoint = host.vars.client_endpoint
  assign where host.name == NodeName
}


So this service runs against “NodeName”, i.e. your Icinga2 Master server.


As usual after making new configs or changing them, do

sudo icinga2 daemon -C

to check for any problems with your new config, and then restart icinga2

sudo systemctl restart icinga2


And hopefully those DISK CRITICAL permission errors related to Docker will be gone!


Links:

Docker : https://www.docker.com/

Icinga2: https://www.icinga.com/

VirtualBox: https://www.virtualbox.org/

Debian: https://www.debian.org/

CentOS: https://www.centos.org/


Posted by Dominic Mason on 01/08/2018