Icinga 2 Fundamentals course

We have finished the first Icinga 2 Fundamentals course held in London. So far the course has proved very useful to our students and has clarified aspects of Icinga that they previously found confusing.

We covered some basic subjects outside the scope of the course to help some of the students become better acquainted with the Linux operating system.

The pace of the course varied: on the first day we went through a large part of the material, but we slowed down once we reached the “lab heavy” sections. Even so, we managed to achieve the course goals.

A subject that came up repeatedly during the course was the request for training on the Icinga Director, a tool many users want to incorporate into their systems.

We may add another day on that subject to the next training course.

Evaluating Prometheus Monitoring Tool

A client asked us to evaluate the Prometheus monitoring solution for its AWS infrastructure. After two days of reading about and testing Prometheus, we can say several things about the tool:

  1. The modular build of the application is confusing at first and can be challenging for someone who is used to having the core product handle all the functions (comparison, alerting, test logic, etc.), but once you adjust your way of thinking it makes sense and the logical division is easy to see.
  2. Another “shift” from the Nagios approach is the way Prometheus evaluates when and how to alert. In Nagios, and in any system that evolved from its school of thought, triggering is evaluated on the individual data check (service), whereas in Prometheus the individual check is irrelevant. The evaluation is done in the alerting logic, based on multiple criteria: node names, logical grouping, and the data point relative to its time series. You can also add arithmetic calculations for predictive alerting based on historical data.
  3. The Prometheus clients capture many data points on your remote nodes and require very simple configuration for the server to read the data. The advantage of the “pull” method (or “active”, for those coming from Nagios-derived systems) is apparent: many servers can read the data from a single client for redundancy, and you quickly become aware when a remote agent is no longer responding.
  4. A fully evolved query language allows you to build complex logic for parsing and slicing the data to present the metric you want.
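The predictive alerting mentioned in point 2 comes down to simple arithmetic on a time series. The sketch below is only an illustration in plain Python, not Prometheus code: it fits a least-squares line through recent samples and extrapolates it forward. The function names `extrapolate_linear` and `will_run_out`, and the idea of alerting on free disk space reaching zero, are assumptions made for the example.

```python
from typing import List, Tuple

# One sample: (unix timestamp in seconds, measured value)
Sample = Tuple[float, float]


def extrapolate_linear(samples: List[Sample], seconds_ahead: float) -> float:
    """Fit a least-squares line through the samples and return the value
    it predicts `seconds_ahead` seconds after the newest sample."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to fit a line")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    newest_t = max(t for t, _ in samples)
    return mean_v + slope * (newest_t + seconds_ahead - mean_t)


def will_run_out(samples: List[Sample], seconds_ahead: float) -> bool:
    """Hypothetical alert condition: the predicted value reaches zero or below."""
    return extrapolate_linear(samples, seconds_ahead) <= 0


if __name__ == "__main__":
    # Free disk space (in GB) sampled once a minute, shrinking steadily.
    free_gb = [(0.0, 100.0), (60.0, 90.0), (120.0, 80.0)]
    print(will_run_out(free_gb, 300))  # five minutes ahead
    print(will_run_out(free_gb, 600))  # ten minutes ahead
```

The point of the example is that the alert fires on the trend across the series rather than on any single check result, which is the shift away from the Nagios model described above.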

Alongside those good points (and there are more), some things seem to be lacking:

  1. The built-in interface does not update at regular intervals. To get a visualisation that keeps the graphs current you need a third-party tool; the recommended one is Grafana, which can already use Prometheus as a data backend for querying.
  2. The modular build of the product may be an issue when an internal part (such as Alertmanager) fails: no alerts will be sent, so you will not be aware of the problem, and the only indication will be the dashboard. Granted, you can define multiple Alertmanager instances to eliminate that issue, but for a small implementation it still feels like a problem.
  3. “More is Less”: the abundant metrics supplied by the client can be daunting to begin with, and understanding how to handle and use them for a basic monitoring setup can be overwhelming, causing the novice user to shy away and seek “simpler” solutions.

Many more points, both pros and cons, can be made about the system, as I am sure many in the monitoring world will point out. As a whole, Prometheus is a good, solid tool. As always, you need to consider two points when you choose a monitoring tool:

  1. What do you want to achieve?
  2. How much time do you want to invest (Time = Money)?

When those two are defined and agreed upon, Prometheus could be one of the tools for consideration.

Ansible Playbook – Icinga

While working on a presentation for the FlossUK talk about metric-led development, I encountered some issues in the Ansible playbook that stopped the Icinga instance it created from executing remote checks via NRPE.

As I am learning Ansible and wanted to make the demo work (which can be found here), I fixed the issue and got the demo working.

With Icinga's move to GitHub as its code repository, submitting the fix was an easy step, and the move makes community contributions that much easier.

I also noticed that some features I wanted were missing, so I am planning to take up those tasks (like having the Icingaweb2 playbook work on Ubuntu) and try to get them working for the benefit of everyone.

NRPE is finally adding true SSL connection security

After so many years of people pointing out that the so-called security of the NRPE agent is not really valid security, it seems that the developers working on the NRPE project have finally taken those concerns to heart and added a proper SSL security configuration method to the agent:

# These directives allow you to specify how to use SSL/TLS.

# This can be any of: SSLv2 (only use SSLv2), SSLv2+ (use any version),
# SSLv3 (only use SSLv3), SSLv3+ (use SSLv3 or above), TLSv1 (only use
# TLSv1), TLSv1+ (use TLSv1 or above), TLSv1.1 (only use TLSv1.1),
# TLSv1.1+ (use TLSv1.1 or above), TLSv1.2 (only use TLSv1.2),
# TLSv1.2+ (use TLSv1.2 or above)
# If an “or above” version is used, the best will be negotiated. So if both
# ends are able to do TLSv1.2 and use specify SSLv2, you will get TLSv1.2.


# This is for backward compatibility and is DEPRECATED. Set to 1 to enable
# ADH or 2 to require ADH. 1 is currently the default but will be changed
# in a later version.


# This lists which ciphers can be used. For backward compatibility, this
# defaults to ‘ssl_cipher_list=ALL:!MD5:@STRENGTH’ in this version but
# will be changed to something like the example below in a later version of NRPE.


# SSL Certificate and Private Key Files


# This options determines client certificate usage.
# Values: 0 = Don’t ask for or require client certificates (default)
# 1 = Ask for client certificates
# 2 = Require client certificates


# This option determines which SSL messages are send to syslog. OR values
# together to specify multiple options.

# Values: 0x00 (0) = No additional logging (default)
# 0x01 (1) = Log startup SSL/TLS parameters
# 0x02 (2) = Log remote IP address
# 0x04 (4) = Log SSL/TLS version of connections
# 0x08 (8) = Log which cipher is being used for the connection
# 0x10 (16) = Log if client has a certificate
# 0x20 (32) = Log details of client’s certificate if it has one
# -1 or 0xff or 0x2f = All of the above



This is a massive step up from the way it was before. Now we need to see whether the NRPE plugin (a.k.a. check_nrpe) or Nagios Core has also been updated to include the directives for communicating with the improved NRPE.
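The SSL logging option in the quoted configuration is a bitmask: each value enables one kind of message, and values are ORed together to combine them. As a rough illustration of how such a mask decodes, here is a small Python sketch. The flag values and meanings are taken from the comments quoted above; the names `SSL_LOG_FLAGS` and `decode_log_mask` are made up for this example and are not part of NRPE.

```python
# Flag values as listed in the quoted NRPE configuration comments.
SSL_LOG_FLAGS = {
    0x01: "startup SSL/TLS parameters",
    0x02: "remote IP address",
    0x04: "SSL/TLS version of connections",
    0x08: "cipher used for the connection",
    0x10: "whether the client has a certificate",
    0x20: "details of the client's certificate",
}


def decode_log_mask(mask: int) -> list:
    """Return the descriptions of every logging flag set in `mask`.

    Hypothetical helper: it only mirrors the bit layout described in the
    configuration comments and does not read or write any NRPE file.
    """
    return [label for bit, label in sorted(SSL_LOG_FLAGS.items()) if mask & bit]


if __name__ == "__main__":
    # Log the TLS version and the cipher of each connection: 0x04 | 0x08.
    for label in decode_log_mask(0x04 | 0x08):
        print(label)
```

Because the flags are independent bits, any combination can be expressed as a single number, which is why the configuration asks you to OR the values together.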

2016 Nagios World Conference cancellation notice


To those who were hoping to attend the Aiki Linux presentation at the 2016 Nagios World Conference: we are sorry to say that we have just received a notice from Nagios.com announcing the cancellation of the event:

“We regret to inform you that we have made the difficult decision to cancel the 2016 Nagios World Conference due to lower than anticipated attendance.  As a scheduled speaker, we appreciate the work you put into preparing for the conference and want to thank you for volunteering to speak.”

It is a sad day when a project that has long been the benchmark and the leading product in IT monitoring has fallen so far behind the times that its main conference has been cancelled due to low interest.

Icinga Camp Berlin

Speaking today at Icinga Camp Amsterdam

Icinga Camp Amsterdam 2016

Come join us if you still can.

It will be a day full of cool Icinga talks.