FlossUK 2018 – turmoil and joy.

FlossUK 2018
In our second year at FlossUK we had the “frustrating” issue of being asked to swap our talk formats: the topic we had planned as a 5-minute talk became a 20-minute talk, and the 20-minute topic became the 5-minute one.

Assaf, our member at the event, worked on the presentations until the very last minute (quite literally). One talk was cancelled because the planned speaker had a last-minute change of plans and could not make it, and the impromptu talk inserted in its place covered almost 40% of what Assaf’s talk was about, so he had to change some of his slides.

Our main talk, “Shifting the acceptance approach in a DevOps team”, went as well as could be expected and the response was encouraging, considering it was like preaching to the choir. Still, even disciples need to have the truth laid out for them, along with an explanation of why things don’t always go the way they should or the way we want.

It seems to have struck a chord with some members of the crowd, as several came to ask questions about how to make clients listen.

So far it has been a very instructive and interesting event. The talks about image forensics, Terraform and Prometheus were very informative and well presented.

Terraform output, Ansible and Icinga

Over the last several months Assaf has been (slowly) maintaining, updating and improving the Icinga2 Ansible playbooks. As he worked on them he found that he needed test hosts that could be built and torn down in quick succession, and that performing the build and shutdown by hand was too time consuming.

Enter Terraform: the wonderful tool from HashiCorp gave him the ability to provision the server and remote nodes quickly and reproducibly, ensuring that each code run is tested in a clean and consistent setup.

One issue was that the ‘hosts’ file for the Ansible run had to be changed manually each time with the IPs of the new instances (AWS is nice for these short intervals), and that slowed progress down.

We know that many people use the Ansible and Terraform combination to manage their infrastructure, but in most of the cases we found online Ansible is called from a ‘local-exec’ provisioner at the end of the execution, and thus uses the internal variables of the Terraform run. We needed an external file for the testing (to simulate the user experience and the way the roles look at the inventory), so it was important to create the inventory file in a specific way.

“Simple,” most Terraform users will say, “just use the ‘local-exec’ provisioner to write the output to a file.” They are correct, with one caveat: the entries are written in the order in which the resources are created, so if the file needs a specific order you can end up with one that is out of order.
For example, here is an output file we got when building an Icinga2 demo environment with a master and 2 nodes (webservers):

 [monitoring_servers]   # the Icinga roles need this group to know which are the master servers

This will cause Ansible to read the inventory as if we had 3 master servers, which of course is incorrect.

What the file should look like is


Do notice the group separator that is required and was added at the end. It was skipped in all the previous runs because of the order of creation. The fix was very simple: ‘depends_on’, which acts as a simple “wait” that makes a resource wait until another one is ready.
In this case we wait for the server IP to be added, and then we add the label and the IPs of the nodes.
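
The ordering requirement itself can be illustrated outside Terraform. The following is a minimal bash sketch, not the Terraform configuration used here: it writes the master group first and the node group second, whatever order the addresses became available in. The function name, the ‘webservers’ group name and the IP addresses are assumptions made for illustration only.

```bash
# write_inventory MASTER_IP NODE_IP...
# Prints an Ansible inventory with the master group first, then the nodes.
write_inventory() {
    local master="$1"
    shift
    # Master group, followed by a blank line as the group separator.
    printf '[monitoring_servers]\n%s\n\n' "$master"
    # Node group; the name is a hypothetical default and can be overridden.
    printf '[%s]\n' "${NODE_GROUP:-webservers}"
    # One line per remaining argument (the node addresses).
    printf '%s\n' "$@"
}
```

Calling it as write_inventory 10.0.0.10 10.0.0.11 10.0.0.12 > hosts (addresses are placeholders) produces a file whose first group holds only the master.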

This solution has sped up our testing quite nicely and should allow us to bring more improvements to the Ansible playbooks at a faster pace.

Microsoft’s divide and conquer

Active Directory and LDAP are two of the most used authentication tools in the world today, used by many companies and online services to authenticate and authorise users for access to the resources they provide.

We have been working on a project for a customer in which we had to use the company’s Active Directory to authenticate users to the platform’s UI. The system runs on Linux, so we configured LDAP to query the AD.

Microsoft is proud to announce that it is the largest contributor to open source projects on GitHub today, and declares that it “embraces” open source (“look, you can even run MS-SQL on Linux”).

So far everything sounds very nice and simple. Then we tried to get the list of users, and we encountered a simple, ugly, frustrating truth:

 Contributing != Collaborating 

Starting in 2003, Microsoft added a limitation to the AD configuration that prevents any other protocol querying the AD from getting a list of records longer than 1000. If you want the list of users and the company has 1002, your LDAP query will only give back either the first 1000 or the last 1000 (depending on your filters), and if you have more than 10,000 you have a big problem.

There is a “fix” posted on Microsoft TechNet that can allow you to get more than 1000 results, but it might not be suitable for everyone, as not every Linux administrator has access to, or the cooperation of, the company’s IT department to implement it.
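
Where the server-side change is not an option, one client-side approach that may be worth trying is the LDAP simple paged results control (RFC 2696), which OpenLDAP’s ldapsearch exposes through the -E pr= option. The following is only a sketch: it assumes the directory accepts the control, and the function name, URI, bind DN, search base and filter are placeholders rather than anything from the project described here.

```bash
# ad_paged_search URI BIND_DN SEARCH_BASE
# Requests user entries in pages of 1000 instead of one oversized result.
# Set DRY_RUN=1 to print the command (one argument per line) without running it.
ad_paged_search() {
    # Rebuild the positional parameters as the full ldapsearch command line.
    set -- ldapsearch -x -H "$1" -D "$2" -W -b "$3" \
        -E pr=1000/noprompt \
        '(&(objectCategory=person)(objectClass=user))' sAMAccountName
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$@"
    else
        "$@"
    fi
}
```

The page size of 1000 is chosen to stay within the limit described above; whether this is enough in a given environment would need to be tested against the actual directory.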

This behaviour persisted in the 2008 and later versions of the AD platform, so we can see that Microsoft might be “embracing” open source but is still very far from “integrating with open source”. We imagine the change needed to let these 2 tools work seamlessly with each other should not be that complex, as it was possible in the past, but it seems that the commercial aspect (e.g. forcing others to move to AD) is the prevailing thought that stops the change.

Engine migration

One of our customers asked us to help evaluate alternative monitoring engines to the one they currently use (Nagios Core 4.1.1, compiled from source) within their overall solution. As an “easy to start” option we advised them to migrate to Icinga 1.x, although we explained to the client that the project has no active development and is in maintenance mode only.

Understanding those issues, and knowing that a full migration to another solution would mean either writing it themselves or adapting to another openly available option, we implemented the latest Icinga 1 (Icinga 1.14.0) within their solution and asked them to benchmark the behaviour of the system:

Application           Nagios Engine   Icinga Engine
Main page, Part I     14.31s          9.5s
Main page, Part II    12.72s          9.32s
Event Console         9.71s           12.1s
Host Groups           21.25s          14.7s
Outages               7.5s            8.19s
System                9.71s           8.7s


Stats: 20000 hosts, 1 service check per host. As can be seen, the Icinga engine outperformed the Nagios engine in four of the six measurements; only the Event Console and Outages pages were slower.
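
For readers who want the relative differences, here is a small bash and awk sketch that computes them from the figures above. It assumes every value in the table is in seconds and that lower is better; the function name and the pipe-separated input format are made up for this example.

```bash
# benchmark_summary: reads rows of "label|nagios_seconds|icinga_seconds"
# and prints the relative change per row plus a final tally.
benchmark_summary() {
    awk -F'|' '
        {
            # Positive delta means the Icinga engine was faster on this row.
            delta = ($2 - $3) / $2 * 100
            if ($3 < $2) faster++
            printf "%-20s %6.1f%%\n", $1, delta
        }
        END { printf "Icinga faster in %d of %d measurements\n", faster, NR }
    '
}

benchmark_summary <<'EOF'
Main page Part I|14.31|9.5
Main page Part II|12.72|9.32
Event Console|9.71|12.1
Host Groups|21.25|14.7
Outages|7.5|8.19
System|9.71|8.7
EOF
```

The last line of the output is the tally: Icinga faster in 4 of 6 measurements.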

A Foray into Docker

One of our clients is attempting to break his product into microservices using Docker containers, and asked us to help build the Docker containers and images.

Having had little exposure to Docker prior to this engagement, our consultant had his doubts about being able to break the requested component out into a container in the time allocated for the task.

The online resources and learning apps that are available for free were a great help, and in a matter of 2 days he was able to provide the client with a baseline image that can serve not just the specified project but also other parts of the overall solution, with only minor alterations added to the Dockerfile.

Looking at the project now, the consultant said he will be able to commit to the timetable and might be able to add more capabilities to the containerisation plan.

The single issue we encountered, which might not be well explained or documented, is exporting, saving and importing images. It seemed logical that you could export a layer from an existing image without needing to run the container; however, that is not the case.*

When you want to apply a layer to the image and save it for further building, you must “create” a container by running the image and only then export it to a file. We have already built the image using the “docker build” command, so why not take that image and export it with the newly added layer to allow the creation of the new “base” image? It seems a bit counterproductive.
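
The export route described above can be sketched as a small bash function. This is an illustration under assumptions, not the exact commands used on the project: the function name and the image, file and tag names are hypothetical, and it uses “docker create”, which makes a container from the image without starting it.

```bash
# flatten_image IMAGE TARBALL NEW_TAG
# Creates a container from IMAGE, exports its filesystem to TARBALL and
# imports that tarball as NEW_TAG. Set DOCKER=echo to preview the commands.
flatten_image() {
    local image="$1"
    local tarball="$2"
    local new_tag="$3"
    local docker="${DOCKER:-docker}"
    local cid
    # The container is created but never started.
    cid=$("$docker" create "$image") || return 1
    "$docker" export -o "$tarball" "$cid" || return 1
    # Remove the temporary container once its filesystem is saved.
    "$docker" rm "$cid" >/dev/null
    "$docker" import "$tarball" "$new_tag"
}
```

Note that an image produced by “docker import” is a single flattened layer and does not carry over metadata such as CMD or ENV. For moving an image together with its layers, the CLI also has “docker save” and “docker load”, which operate on the image itself; whether they fit the workflow described here was not evaluated.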

We understand that running the container is used for testing and ensuring that the build was successful, but shouldn’t that be the builder’s choice? If a builder removes the CMD “echo IT works” directive at the end of the Dockerfile, it should be his choice whether to run the container or immediately export it as a new image.

Despite that small issue, we believe that Docker is a great tool, and will work to deepen our exposure to it.

*All the comments above refer to the CLI interactions. If there are tools that overcome that issue, we have not used them and as such cannot speculate on their usability.




Simple things in life

In many places we work we have to manage a client’s servers, and remembering IPs and names can be tiresome or very confusing.

SSH has a wonderful ability to configure “shortcuts” that associate names with servers and even define the login you want to use for each one. The problem is that you need to manage the file and add those entries manually (and remember to do so).

A simple bash function can help make life easier :

mkshtc () {
    # $1 = user@host, $2 = shortcut name
    _user=${1%%@*}
    _host=${1#*@}
    printf 'Host %s\n\tHostname %s\n\tUser %s\n' "$2" "$_host" "$_user" >> ~/.ssh/config
}

With this in your .bashrc file you can add new servers to the config file with a one-liner:

mkshtc admin@ glare_node_1

and the entry is added to your .ssh/config file

# cat ~/.ssh/config
Host glare_node_1
User admin

Easy and simple, and now you can use names for your SSH connections (think of it as DNS for your logins). The function can easily be expanded to include other attributes for the host, such as the port and identity file:

mkshtc () {
    # $1 = user@host, $2 = shortcut name
    # $3 = key file name under ~/.ssh (assumed to be passed as a third argument)
    _user=${1%%@*}
    _host=${1#*@}
    _file=$3
    printf 'Host %s\n\tHostname %s\n\tUser %s\n\tIdentityFile ~/.ssh/%s\n' "$2" "$_host" "$_user" "$_file" >> ~/.ssh/config
}

Just make sure it is the right one for your needs.

Icinga 2 Fundamentals course

We finished the first Icinga 2 Fundamentals course held in London. The course has proved very useful to our students and has shown them how to use Icinga in ways that were confusing before.

We covered some basic subjects outside the scope of the course to help some of the students become better acquainted with the Linux operating system.

The course pace varied: on the first day we went through a large part of the material, but we slowed down as we moved to the “lab heavy” sections. Still, we managed to overcome that and achieve the goals.

A subject that came up repeatedly in the course was the request for training on the Icinga Director, a tool many users want to incorporate into their systems.

We believe that we might add another day to the next training course on that subject …

Evaluating Prometheus Monitoring Tool

A client asked us to evaluate the Prometheus monitoring solution for its AWS infrastructure. After 2 days of reading and testing the Prometheus system, we can say several things about the tool:

  1. The modular build of the application is confusing at first and can be challenging for someone who is used to having the core product handle all the functions (comparison, alerting, test logic etc.), but once you adjust your way of thinking, it makes sense and the logical division is easy to see.
  2. Another “shift” from the Nagios approach is the way Prometheus evaluates when and how to alert. In Nagios, and any system that has evolved from its school of thought, the triggering is evaluated on the individual data check (service), whereas in Prometheus the individual check is irrelevant. The evaluation is done in the alerting logic, based on multiple dividers: node names, logical grouping and the data point relative to the time series. You can also add arithmetic calculations for predictive alerting based on historical data.
  3. The Prometheus clients capture many data points on your remote nodes and require very simple configuration for the server to read the data. The advantage of the “pull” (or “Active”, to those coming from Nagios-evolved systems) method is apparent: you can have many servers read the data from a single client for redundancy, and you quickly become aware when a remote agent is no longer responding.
  4. A fully evolved query language allows building complex logic for parsing and slicing the data to present the metric you wish to get.

Alongside those good points (and there are more), there are some areas where the tool seems to be lacking:

  1. The built-in interface does not update at regular intervals. To get a visualisation that keeps the graphs current you need a 3rd-party tool; the recommended one is Grafana, which already has the capability to use Prometheus as a data backend for querying.
  2. The modular build of the product may be an issue when an internal part (such as Alertmanager) fails: you will not be aware of the issue, as no alerts will be sent and the only indication will be the dashboard. Granted, you may define multiple Alertmanager instances to eliminate that issue, but for a small implementation that still feels like a problem.
  3. “More is Less”: the abundance of metrics supplied by the client can be daunting to begin with, and understanding how to handle and use them for a basic monitoring setup can be overwhelming, causing the novice user to shy away and seek “simpler” solutions.

There are many more points that can be made, both pros and cons, about the system, as I am sure many in the monitoring world will point out. As a whole, Prometheus is a good, solid tool, and as always you need to consider 2 points when you choose a monitoring tool:

  1. What do you want to achieve?
  2. How much time do you want to invest (Time = Money)?

When those two are defined and agreed upon, Prometheus could be one of the tools for consideration.