If you monitor your containers with Nagios, there are some really nice checks you can enable to confirm that the containers are up, not abusing the CPU, and so on. But wouldn’t it be nice to check out what’s going on inside the container too?

Using the Nagios plugin check_docker.py, we can query the Docker API to find out the condition of the containers. It returns the usual Nagios messages, so we can actively monitor their states.

The plugin does require the Docker API to be enabled on the Docker host. Once enabled, the API can return all kinds of useful JSON data about our containers.

To enable the Docker API I followed this guide, but on my distro (Debian Buster) I had to look in /usr/lib/systemd/system for my docker.service file.

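I created the folder docker.service.d and, within it, the file override.conf containing the line below. (systemd expects this under a [Service] header, preceded by an empty ExecStart= line to clear the packaged command.)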

ExecStart=/usr/bin/dockerd -H fd:// -H tcp:// --containerd=/run/containerd/containerd.sock

Now I can visit http://localhost:2376/containers/json?all=true in a browser and see all the juicy Docker data.
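
The same query can be made from a script instead of a browser. Here is a minimal sketch in Python, assuming the API answers on localhost:2376 without TLS as in the URL above. The helper names summarize and fetch_containers are my own for illustration and are not part of Docker or the plugin; "Names" and "State" are fields the Engine API returns for each container.

```python
import json
from urllib.request import urlopen

# Assumed endpoint, matching the URL visited in the browser above.
API_URL = "http://localhost:2376/containers/json?all=true"


def summarize(containers):
    """Map each container name to its state from a /containers/json payload."""
    summary = {}
    for entry in containers:
        # The API reports names with a leading slash, e.g. "/master".
        name = entry["Names"][0].lstrip("/")
        summary[name] = entry["State"]
    return summary


def fetch_containers(url=API_URL):
    """Fetch and decode the container list from the Docker API."""
    with urlopen(url, timeout=5) as response:
        return json.load(response)
```

Calling summarize(fetch_containers()) on the Docker host should give a plain dict of container name to state, which is handy for a quick sanity check before wiring up Nagios.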

With the API now active I can install and use the Nagios plugin. You can even run it from a non-Nagios system at the command line and try it out.

$ /usr/local/bin/check_docker --connection localhost:2376 --containers standby master --status running

OK: master status is running; OK: standby status is running
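
If you are new to Nagios plugins: the printed text is only half the result. Nagios acts on the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below illustrates that convention for a status check like the one above. It is not how check_docker is implemented, and status_check is a hypothetical helper of mine.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def status_check(states, wanted, expected="running"):
    """Return (exit_code, message) for the wanted containers.

    states maps container name -> state string, e.g. {"master": "running"}.
    """
    worst = OK
    parts = []
    for name in wanted:
        actual = states.get(name)
        if actual is None:
            code, text = UNKNOWN, f"{name} not found"
        elif actual == expected:
            code, text = OK, f"{name} status is {actual}"
        else:
            code, text = CRITICAL, f"{name} status is {actual}"
        parts.append(f"{LABELS[code]}: {text}")
        # Simplification: the numerically highest code becomes the overall result.
        worst = max(worst, code)
    return worst, "; ".join(parts)
```

A real plugin would print the message and then call sys.exit with the code, which is what lets Nagios raise an alert.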

But what about inside the container?

Using our docker-compose.yml file we can add the checks we want run inside the container. If any of them fails, running the Nagios plugin with the --health flag will pick up on the failure.

docker-compose.yml – services

  master:
    container_name: master
    image: postgres:10
    volumes:
      - ./master/data:/var/lib/postgresql/data
    restart: always
    ports:
      - 5432:5432
    healthcheck:
      test: [ "CMD", "pg_isready", "-q", "-d", "postgres"]
      timeout: 45s
      interval: 10s
      retries: 10

By adding a health check section, I get Docker to run periodic tests inside the container and pass the result out to my Nagios plugin when it checks.

$ /usr/local/bin/check_docker --connection localhost:2376 --containers standby master --health

OK: standby is healthy; OK: master is healthy
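
To see the raw value behind that message, you can inspect a container through the API yourself. This is a minimal sketch, again assuming the API is reachable on localhost:2376 without TLS; health_of and fetch_health are hypothetical helper names of mine, not part of the plugin. Docker puts the result of the health check under State.Health.Status in the inspect output.

```python
import json
from urllib.request import urlopen

# Assumed API address, matching the --connection value used above.
API_BASE = "http://localhost:2376"


def health_of(inspect_data):
    """Return the health status from a /containers/<name>/json payload.

    Docker reports "starting", "healthy" or "unhealthy"; the "Health" key
    is absent when the container has no health check defined.
    """
    health = inspect_data.get("State", {}).get("Health")
    if health is None:
        return None
    return health.get("Status")


def fetch_health(name, base=API_BASE):
    """Inspect one container over the API and return its health status."""
    with urlopen(f"{base}/containers/{name}/json", timeout=5) as response:
        return health_of(json.load(response))
```

A return value of None from fetch_health("master") would mean the container has no health check configured, which is worth ruling out before trusting a quiet Nagios dashboard.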

You’ll obviously want to get more creative than the example here, but this lets me know that PostgreSQL is doing what it should.