Engine migration

One of our customers asked us to help evaluate alternatives to the monitoring engine they currently use within their overall solution (Nagios Core 4.1.1, compiled from source). As an “easy to start” option, we advised them to migrate to Icinga 1.x, although we explained to the client that the project is no longer actively developed and is in maintenance mode only.

Understanding those limitations, and knowing that a full migration to another solution would eventually mean either writing it themselves or adopting another openly available option, we deployed the latest Icinga 1 release (Icinga 1.14.0) within their solution and asked them to benchmark the behaviour of the system:

Application            Nagios engine (s)   Icinga engine (s)
Main page, Part I      14.31               9.5
Main page, Part II     12.72               9.32
Event Console          9.71                12.1
Host Groups            21.25               14.7
Outages                7.5                 8.19
System                 9.71                8.7

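Test environment: 20,000 hosts with one service check per host. Times are in seconds (lower is better). As can be seen, the Icinga engine outperformed the Nagios engine in most cases, being faster on four of the six pages measured.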
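To put numbers on “in most cases”, the relative change per page can be computed straight from the table. The helper below is a minimal sketch; the function name compare_engines and the pipe-separated input format are our own choices, not part of the benchmark. A negative change means the Icinga engine was faster.

```bash
#!/bin/bash
# Hypothetical helper: reads "page|nagios_seconds|icinga_seconds" lines and
# prints both timings plus the percentage change from Nagios to Icinga
# (negative = Icinga faster), followed by a one-line summary.
compare_engines () {
  awk -F'|' '{
    change = ($3 - $2) / $2 * 100
    printf "%-20s %6.2f %6.2f %+6.1f%%\n", $1, $2, $3, change
    if ($3 < $2) faster++
  } END { printf "Icinga faster on %d of %d pages\n", faster, NR }'
}

# Timings copied from the benchmark table above
compare_engines <<'EOF'
Main page, Part I|14.31|9.5
Main page, Part II|12.72|9.32
Event Console|9.71|12.1
Host Groups|21.25|14.7
Outages|7.5|8.19
System|9.71|8.7
EOF
```

For the six rows above it reports Icinga faster on 4 of 6 pages, with changes ranging from -33.6% (Main page, Part I) to +24.6% (Event Console).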

Finally, NRPE is adding true SSL security

After many years of people pointing out that the so-called security of the NRPE agent is not real security (by default it relies on anonymous Diffie-Hellman, ADH, which encrypts the traffic but authenticates neither end), it seems that the developers working on the NRPE project have finally taken those concerns to heart and incorporated a proper SSL configuration method into the agent:

# SSL/TLS OPTIONS
# These directives allow you to specify how to use SSL/TLS.

# SSL VERSION
# This can be any of: SSLv2 (only use SSLv2), SSLv2+ (use any version),
# SSLv3 (only use SSLv3), SSLv3+ (use SSLv3 or above), TLSv1 (only use
# TLSv1), TLSv1+ (use TLSv1 or above), TLSv1.1 (only use TLSv1.1),
# TLSv1.1+ (use TLSv1.1 or above), TLSv1.2 (only use TLSv1.2),
# TLSv1.2+ (use TLSv1.2 or above)
# If an “or above” version is used, the best will be negotiated. So if both
# ends are able to do TLSv1.2 and you specify SSLv2+, you will get TLSv1.2.

#ssl_version=SSLv2+

# SSL USE ADH
# This is for backward compatibility and is DEPRECATED. Set to 1 to enable
# ADH or 2 to require ADH. 1 is currently the default but will be changed
# in a later version.

#ssl_use_adh=1

# SSL CIPHER LIST
# This lists which ciphers can be used. For backward compatibility, this
# defaults to ‘ssl_cipher_list=ALL:!MD5:@STRENGTH’ in this version but
# will be changed to something like the example below in a later version of NRPE.

#ssl_cipher_list=ALL:!MD5:@STRENGTH
#ssl_cipher_list=ALL:!aNULL:!eNULL:!SSLv2:!LOW:!EXP:!RC4:!MD5:@STRENGTH

# SSL Certificate and Private Key Files

#ssl_cacert_file=/etc/ssl/servercerts/ca-cert.pem
#ssl_cert_file=/etc/ssl/servercerts/nagios-cert.pem
#ssl_privatekey_file=/etc/ssl/servercerts/nagios-key.pem

# SSL USE CLIENT CERTS
# This option determines client certificate usage.
# Values: 0 = Don’t ask for or require client certificates (default)
# 1 = Ask for client certificates
# 2 = Require client certificates

#ssl_client_certs=0

# SSL LOGGING
# This option determines which SSL messages are sent to syslog. OR values
# together to specify multiple options.

# Values: 0x00 (0) = No additional logging (default)
# 0x01 (1) = Log startup SSL/TLS parameters
# 0x02 (2) = Log remote IP address
# 0x04 (4) = Log SSL/TLS version of connections
# 0x08 (8) = Log which cipher is being used for the connection
# 0x10 (16) = Log if client has a certificate
# 0x20 (32) = Log details of client’s certificate if it has one
# -1 or 0xff or 0x3f = All of the above

#ssl_logging=0x00

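Because ssl_logging is a bit mask, several of the logging options above are combined by OR-ing their values. A minimal bash sketch (the helper name ssl_logging_value is hypothetical) that prints the resulting nrpe.cfg line:

```bash
#!/bin/bash
# Hypothetical helper: OR together ssl_logging flag values (hex or decimal)
# and print them in the 0x.. form used in nrpe.cfg.
ssl_logging_value () {
  local mask=0 flag
  for flag in "$@"; do
    mask=$(( mask | flag ))   # bash arithmetic understands 0x prefixes
  done
  printf 'ssl_logging=0x%02x\n' "$mask"
}

# Startup parameters (0x01) + TLS version (0x04) + cipher in use (0x08)
ssl_logging_value 0x01 0x04 0x08    # prints ssl_logging=0x0d
```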

This is a massive step up from the way it was before. Now we need to see whether the NRPE plugin (a.k.a. check_nrpe) or Nagios Core has also been updated with the matching directives for communicating with the improved NRPE agent.

2016 Nagios World Conference cancellation notice


To those who were hoping to attend the Aiki Linux presentation at the 2016 Nagios World Conference: we are sorry to say that we have just received a notice from Nagios.com announcing the cancellation of the event:

“We regret to inform you that we have made the difficult decision to cancel the 2016 Nagios World Conference due to lower than anticipated attendance.  As a scheduled speaker, we appreciate the work you put into preparing for the conference and want to thank you for volunteering to speak.”

It is a sad day to see that a project that was for a long time the benchmark and leading product in IT monitoring has fallen so far behind the times that its main conference has been cancelled due to low interest.

Icinga: from stand-alone to Cloud oriented evolution

When you start talking to system administrators, DevOps engineers, developers or NOC personnel about open source NMS* tools, the first one that comes to mind is Nagios®. When you tell them that Icinga is a fork of Nagios, they are quick to dismiss it too, due to the reputation that Nagios carries with it.

Icinga owes its origin to Nagios’ limitations. It started as a fork of the Nagios code, and as such was encumbered with the limitations of the original design: classic data centre computing, with bare metal and manual management. It has since evolved into something quite different.

Icinga was born out of the slow progress of the Nagios project in responding to requests made by the community, and the inability of that same community to contribute, send updates and push improvements upstream to the core product. Wanting to give the community its say, Icinga started by asking for improvement ideas. Many improvements were added to the Icinga code, but the main requests coming back from users were redundancy and adaptability.

Creative solutions were tried, such as integration with CMDBs* and active-passive fail-over with data syncing, but none of them truly solved the problem inherent in Nagios: a centralized, standalone box that required constant updating to keep up with new servers or services.

The Icinga team knew they had to adapt the project to the cloud way of thinking and its need for adaptability. Thus Icinga2 and the Icinga2 API were conceived and developed. Icinga2 was designed from the ground up, forgoing the old code base and starting from a fresh view. Some concepts were carried over (hosts, services, hostgroups, servicegroups and dependencies), but the approach to them was altered:

  • The configuration format was changed to a configuration DSL, which allows quasi-code in object definitions, including conditional logic, to improve dynamic adaptation.

  • While the old Classic UI works seamlessly with the new engine, a new Web 2.0 user interface (Icinga Web 2) was designed. It can incorporate other tools that extend what the dashboard exposes, such as the ELK* stack, Graphite, NagVis and PNP4Nagios.

  • Icinga2 comes with a truly distributed design: monitoring nodes in geographically separated zones synchronise with each other and provide high availability (HA) for redundancy, with SSL-based communication securing the data transferred between nodes.

  • The Icinga2 API is a fully fledged REST API that allows nodes and services to be registered, configured and removed at run time (for example, when a cloud instance goes off-line), without modifying configuration files or restarting the main process.

  • Icinga1 was available as a preconfigured VM* image for those who wanted to test and learn how to use the system. Moving with the times, Icinga2 is also offered in forms that are easier and faster to adopt and deploy: Docker and Vagrant. The Icinga team also provides ready-to-use playbooks/recipes/manifests for the most popular configuration management tools: Puppet, Chef and Ansible.
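As an illustration of the run-time registration described above, the following bash helpers call the Icinga2 REST API with curl. This is a sketch under assumptions: the API listens on its default port 5665, an ApiUser with suitable permissions exists, and the generic-host template from the default configuration is available. API_URL, API_USER, API_PASS and the function names are placeholders, not part of Icinga.

```bash
#!/bin/bash
# Placeholder endpoint and ApiUser credentials: replace with your own.
API_URL=${API_URL:-https://localhost:5665}
API_USER=${API_USER:-root}
API_PASS=${API_PASS:-changeme}

# JSON body for a new host object based on the "generic-host" template
# (assumed to exist, as it does in the default Icinga2 configuration).
host_payload () {
  printf '{ "templates": [ "generic-host" ], "attrs": { "address": "%s" } }' "$1"
}

# register_host <name> <address>: PUT /v1/objects/hosts/<name>
# -k skips certificate verification; only acceptable in a test setup.
register_host () {
  curl -k -s -u "$API_USER:$API_PASS" -H 'Accept: application/json' \
    -X PUT "$API_URL/v1/objects/hosts/$1" -d "$(host_payload "$2")"
}

# remove_host <name>: DELETE the host; cascade=1 also removes its services.
remove_host () {
  curl -k -s -u "$API_USER:$API_PASS" -H 'Accept: application/json' \
    -X DELETE "$API_URL/v1/objects/hosts/$1?cascade=1"
}
```

For example, register_host web01 192.0.2.10 adds the host while Icinga2 keeps running, and remove_host web01 removes it again when the instance goes away. In production, drop -k and point curl at your CA certificate with --cacert.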

If you are looking to learn how to implement Icinga2 in your organization, all those developments and more topics are covered in the official training courses provided by Icinga training partners, like Aiki Linux Ltd.

Glossary:

NMS = Network Monitoring System, a software tool that enables querying the state of the nodes that exist in the company’s infrastructure (local & remote).

ELK = Elasticsearch, Logstash & Kibana stack, a combination of software tools that allows collection and aggregation of logs and data for analysis and study.

CMDB = Configuration Management Database

VM = Virtual Machine

Talking to the API

In many environments I find myself “teaching” employees how to work with a monitoring system: why Nagios sends more than one alert, why they were unable to acknowledge an issue, why they did not get a notification about an event they saw on the dashboard, what the benefit of scheduled downtime is, and many other things that come with working with Nagios.

Almost always the question comes up: “Is there an automated way to do all those things?” Well, the answer is always “Yes, and no.”

Yes: Nagios provides external commands that let you communicate with the Nagios Core process and submit actions without using the web UI.

No: each command takes a somewhat different set of parameters, and you need to know which ones to supply for which command, otherwise the request will fail.
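For reference, this is what the external-command route looks like when done by hand: each command is a single line, prefixed with a Unix timestamp, written to the Nagios command file. A minimal bash sketch for SCHEDULE_SVC_DOWNTIME, assuming the default command-file path of a source install (/usr/local/nagios/var/rw/nagios.cmd) and a user with write access to it; the function name is hypothetical.

```bash
#!/bin/bash
# Assumed default command file of a source install; override via CMD_FILE.
CMD_FILE=${CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}

# Hypothetical helper:
# schedule_svc_downtime <host> <service> <minutes> <author> <comment>
schedule_svc_downtime () {
  local now end
  now=$(date +%s)
  end=$(( now + $3 * 60 ))
  # Fields: host;service;start;end;fixed(1);trigger_id(0);duration(s);author;comment
  printf '[%s] SCHEDULE_SVC_DOWNTIME;%s;%s;%s;%s;1;0;%s;%s;%s\n' \
    "$now" "$1" "$2" "$now" "$end" "$(( $3 * 60 ))" "$4" "$5" > "$CMD_FILE"
}
```

It also illustrates the “No” above: other commands, such as ADD_SVC_COMMENT, take a different field list, which is exactly what makes a wrapper script useful.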

So I have been working on simplifying users’ access to the command CGI (cmd.cgi) and wrote a script:

#!/bin/bash
# submit_cgi_option.sh: submit a comment or downtime to Nagios through cmd.cgi

NAGURL=http://10.173.4.82/nagios/cgi-bin/cmd.cgi
SUCCESS_MSG="Your command request was successfully submitted to Nagios for processing."

usage () {
  echo "usage: submit_cgi_option.sh <Host> <Service> <Comment> <username> <password> <level>"
  echo "The script requires several parameters in order to work:"
  echo -e "\tHost     = The short hostname [e.g. lmmapp2801]."
  echo -e "\tService  = The descriptive service name [e.g. Java Core Dump]; a name containing spaces must be wrapped in single quotes ' '."
  echo -e "\tComment  = A descriptive comment pertaining to the issue."
  echo -e "\tUsername = Your Nagios interface login name."
  echo -e "\tPassword = Your Nagios interface password."
  echo -e "\tlevel    = Whether the command runs at host level or service level:"
  echo -e "\t\t2 = Service level (preferred method)"
  echo -e "\t\t1 = Host level"
}

type_host () {
  echo -e " CMD_ADD_HOST_COMMENT\t\t\t1"
  echo -e " CMD_SCHEDULE_HOST_DOWNTIME\t\t55"
}

type_svc () {
  echo -e " CMD_ADD_SVC_COMMENT\t\t\t3"
  echo -e " CMD_SCHEDULE_SVC_DOWNTIME\t\t56"
  echo -e " CMD_SCHEDULE_HOST_SVC_DOWNTIME\t86"
}

# POST one command to cmd.cgi; extra curl options (end time etc.) are passed
# as arguments. cmd_mod=2 is cmd.cgi's "commit" mode (1 only displays the
# request form), so it is always 2; host vs. service level comes from cmd_typ.
# Returns 0 only if the response page confirms the submission.
submit () {
  curl "$NAGURL" -u "$NAG_USER:$NAG_PASS" --silent --show-error \
    --data "cmd_typ=$CMD_TYPE" \
    --data cmd_mod=2 \
    --data-urlencode "host=$HOST" \
    --data-urlencode "service=$SERVICE" \
    --data-urlencode "com_data=$COMMENT" \
    --data trigger=0 \
    --data-urlencode "start_time=$STARTDATE" \
    "$@" \
    --data btnSubmit=Commit | grep -q "$SUCCESS_MSG"
}

if [ $# -ne 6 ]; then
  usage
  exit 1
fi

HOST=$1
SERVICE=$2
COMMENT=$3
NAG_USER=$4
NAG_PASS=$5
LEVEL=$6

# cmd.cgi parses dates according to date_format in nagios.cfg;
# dd-mm-yyyy assumes date_format=euro.
STARTDATE=$(date '+%d-%m-%Y %H:%M:%S')

echo "Submitting command to Nagios"

case "$LEVEL" in
  1) echo "Enter the Host command type:"
     type_host
     DOWNTIME_TYPES="55"
     TARGET="host $HOST" ;;
  2) echo "Enter the Service command type:"
     type_svc
     DOWNTIME_TYPES="56 86"
     TARGET="service $SERVICE on host $HOST" ;;
  *) usage
     exit 1 ;;
esac
read -r CMD_TYPE

if [[ " $DOWNTIME_TYPES " == *" $CMD_TYPE "* ]]; then
  echo "Enter the downtime length (in minutes):"
  read -r LENGTH
  ENDDATE=$(date --date="+$LENGTH minutes" '+%d-%m-%Y %H:%M:%S')
  ACTION="Scheduled downtime"
  # fixed=1: hours/minutes only apply to flexible downtime
  submit --data-urlencode "end_time=$ENDDATE" \
    --data fixed=1 --data hours=2 --data minutes=0
else
  ACTION="Submitted command $CMD_TYPE"
  submit
fi
STATUS=$?

if [ "$STATUS" -eq 0 ]; then
  echo "$ACTION for $TARGET"
else
  echo "Nagios did not confirm the request for $TARGET" >&2
  exit 1
fi

I have tested it and it works well, but it is still limited in the functionality it provides, so I am thinking of converting it into a PHP page that offers the same functionality; that will have to wait until I can find the time.

Announced Nagios certifications

So I logged on to the site and tried the demo, and to my disappointment it failed to work on my Linux laptop. Thinking it was an issue with my system, I contacted the test centre. I got a prompt reply, but it wasn’t something I was expecting, considering the exam I was planning to take:

“Unfortunately our system is not compatible with Linux based operating systems at this time. We do however, offer support with Windows and Mac computers.”

So, for an exam on a Linux-based application, you do not support taking it from a Linux OS? That seemed very odd to me, so I approached Nagios.com for clarification, and this was the answer I got:

“You are correct, at this time ProctorU does not support Linux OS’. I understand how that may seem odd, and apologize for any inconvenience this may cause you.

However, we spent a great deal of time selecting our certification proctoring partner, and eventually determined that ProctorU was able to offer the greatest overall value to our clients. Unfortunately one of the few downsides is that they do not support Linux OS’.

Hopefully you are able to get ahold of a Windows machine to take the certification on.”

So in essence I need to bend over backwards in order to pay money, take an exam and work with an OS I despise? Is it just me, or does this sound like bad business planning?

Nagios 3.4.0

The latest version of Nagios (Nagios Core 3.4.0) was released yesterday, so today Aiki Linux grabbed the new source code and gave it a test run.

The source code install went smoothly and quickly with no hiccups (as this was an upgrade on our test system, that was to be expected; we will also test a clean build). We noticed a new make target, “make install-exfoliation”, which installs a newer web interface (Exfoliation) that ships alongside the classic Nagios interface. You can install one or the other, as both overwrite html/stylesheets/ and html/images/ under /usr/local/nagios/share/.

It remains to be seen which UI the packagers will use when providing packages. The provided spec file does not specify which UI to install, so if you build the package yourself and want the newer UI, you will need to watch out for that.

An issue that has remained since 3.0.6 is page refresh: when viewing a page and hitting F5, the interface takes you back to the main page instead of keeping its position in the Nagios CGI display. This happens because of the PHP/HTML integration in the web UI (the CGI pages are shown inside a frame of the PHP front page), and it is easily fixed with a small code insertion (see “How do I refresh Nagios and stay on the current page”).

More testing and reviews to come later.