Production distributed system – pt. 2

Once the Galera databases were in sync and aware of each other, it was time to tackle the question of “How do we register the service?”

So it was time to work on the Consul cluster. We considered using 3 separate nodes for this cluster to add another layer of redundancy to each component, but the customer elected to run the Consul service on the same nodes as Galera. It might seem odd to have the discovery server run on the same node as the service it is monitoring, but the logic was “if the Galera node is down, then the Consul service is also degraded, and we will address them together”.

So we built a 3 node Consul service, with agents on each of the Galera nodes.

Each node was configured to join the cluster, with the 2 other nodes specified in the “start_join” directive:

{
"server": false,
"datacenter": "foo",
"data_dir": "/var/consul",
"encrypt" : "",
"log_level": "INFO",
"enable_syslog": true,
"start_join": [ "172.2.6.15","172.2.7.10" ]
}

The file was located at /etc/consul.d/client/config.json. This took care of the client/server sign up, but what about knowing whether Galera is up? Simple: we created a check that probes the backend database port and reports back. This file, aptly named galera.json, was located in the main /etc/consul.d directory:

{
  "service": {
    "name": "galeradb",
    "tags": ["icinga-galera"],
    "check": {
      "id": "mysql",
      "name": "Check mysql port listening",
      "tcp": "localhost:3306",
      "interval": "10s",
      "timeout": "1s"
    }
  }
}

This ensured that Consul checked the response of the database and reported back to the cluster in case of a failure, so that the cluster could stop handing out the failed node and direct requests to the other nodes.
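To make it clearer what the “tcp” check amounts to, here is a minimal shell sketch of the same idea: try to open a connection to the port and give up after a timeout. This is not how Consul implements it internally. The function name tcp_check is hypothetical, the passing/critical words only mirror Consul's check states, and the sketch assumes that bash and the coreutils timeout command are available.

```shell
# Hypothetical helper (not part of Consul): approximates a TCP check by
# attempting a connection and giving up after a timeout.
# usage: tcp_check HOST PORT [TIMEOUT_SECONDS]
tcp_check() {
    _host=$1
    _port=$2
    _timeout=${3:-1}
    # bash's /dev/tcp pseudo-device opens a TCP connection;
    # timeout enforces the deadline, like the "timeout" field of the check.
    if timeout "$_timeout" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$_host" "$_port" 2>/dev/null; then
        echo "passing"
    else
        echo "critical"
        return 1
    fi
}
```

Running tcp_check localhost 3306 1 on a Galera node would roughly reproduce what the check above does every 10 seconds.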

At this stage, with the backend ready, we started the Icinga installation, with 2 masters and 2 web servers in a redundant setup (that documentation is found here). Then, when we needed to point the IDO at the Galera database, we hit an issue.

We changed /etc/resolv.conf on the Icinga nodes to use the 3 Consul nodes, so that Icinga would use Consul as its DNS and be able to resolve the database name:

/**

* The db_ido_mysql library implements IDO functionality
* for MySQL.
*/

library "db_ido_mysql"

object IdoMysqlConnection "ido-mysql" {
  user = ""
  password = ""
  host = "galeradb.service.consul"
  database = "icinga"
}

But considering that many checks in the system relied on DNS resolution of external names, we were stuck on how to ensure that every lookup returned the correct IP.

So we had to connect Icinga to a name server, in our case Bind9. We built a named service on the same nodes so that we could make as few changes as possible on the Icinga servers and have the already configured DNS requests on port 53 [UDP] going to the Consul servers work for us.

A very basic named.conf:

options {
  directory "/var/named";
  dump-file "/var/named/data/cache_dump.db";
  statistics-file "/var/named/data/named_stats.txt";
  memstatistics-file "/var/named/data/named_mem_stats.txt";
  allow-query { any; };
  recursion yes;

  dnssec-enable no;
  dnssec-validation no;

/* Path to ISC DLV key */
  bindkeys-file "/etc/named.iscdlv.key";

  managed-keys-directory "/var/named/dynamic";
};

include "/etc/named/consul.conf";

Notice the inclusion of the consul.conf file; this is where the “magic” happens:

zone "consul" IN {
  type forward;
  forward only;
  forwarders { 127.0.0.1 port 8600; };
};

This file tells named to forward any request for the “consul” domain to localhost on port 8600 (Consul’s default DNS port), which returns the IP of the Galera cluster, while all other requests are resolved through external DNS as usual. For any non-Consul name that does reach Consul, it will go to the DNS of choice configured when the Consul service was built; we chose the all too familiar “8.8.8.8” (this is added at the cluster bootstrap stage):

"recursors":[
"8.8.8.8"
]
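The routing decision made by the “consul” forward zone can be pictured with a small shell sketch. It is only an illustration of the rule, not something named runs; the function name upstream_for is hypothetical, and the match is case-sensitive for brevity even though DNS names are not.

```shell
# Hypothetical helper that mirrors the decision made by the "consul" forward zone.
# usage: upstream_for NAME
upstream_for() {
    _name=${1%.}    # drop a trailing dot, if any
    case "$_name" in
        consul|*.consul) echo "127.0.0.1:8600" ;;   # Consul's DNS interface
        *)               echo "recursion" ;;        # handled by named as usual
    esac
}
```

On a live node the same thing can be checked with dig, for example dig @127.0.0.1 galeradb.service.consul +short against named, assuming dig is installed.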

So the next stage was to test the name resolution and the system's ability to survive failures.

Production distributed system – pt. 1

A customer came to AikiLinux requesting our assistance in designing and implementing a highly distributed and resilient monitoring system based on Icinga, with a planned scope of monitoring its own internal cloud service and some of the services it provides to its external customers.

As an initial step we evaluated the requirements of the cluster and also built a small scale lab for them (a master, 2 satellites and a host to monitor), and then set out to understand the network topology and the limitations that might impact performance.

The things that we found were “normal” for a large multi continent organisation:

  • remote separated data centres
  • very restrictive IT department
  • ESX resources
    … nothing we haven’t encountered before.

So we set out to design the solution and thought about which components would help us provide a truly redundant system without relying on any cloud provider service, all done in house.

The stack we ended up with was fairly simple: MariaDB Galera, HashiCorp Consul and named for the database, and a standard HA setup for Icinga itself.

The first challenge in building this system was ensuring that the Galera cluster was up and running, so we modified /etc/my.cnf.d/server.cnf:

# this is read by the standalone daemon and embedded servers
[server]

# this is only for the mysqld standalone daemon
[mysqld]
log_error=/var/log/mariadb.log
#
# * Galera-related settings
#
[galera]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
# Allow server to accept connections on all interfaces.
bind-address=0.0.0.0
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://172.32.6.15,172.31.6.15,172.33.6.15"
innodb_locks_unsafe_for_binlog = 1

## Galera Cluster Configuration
wsrep_cluster_name="icinga-galera"

and started the nodes… no sync between the master and the other nodes.

We tested several solutions, modifying the security policy and the firewall, but in the end the only way to get the cluster up and running was to disable SELinux (mind you, the server already sits behind 3 firewalls that you need to get through to gain access).

Once the nodes “saw” each other, we started testing data replication and saw that 2 nodes replicated data but the 3rd did not.

It turned out that NTP was disabled and the time difference between the servers was more than 1900 seconds. We uncommented the NTP records and made sure the clocks were in sync … and now we have replication. YAY!!

Updates and plans for the future

It has been a couple of busy months for our team at AikiLinux since FlossUK, with good things happening:

  • We have started working with Icinga on organising an Icinga Camp in Tel Aviv later this year; the provisional dates are 10-16 of December.
  • We have expanded our personnel by bringing a new person into the fold in the UK.
  • As we strive to expand our knowledge, our team members have implemented a Prometheus monitoring solution at one of our customers, and are also building a DR solution for their AWS environment based on Terraform.
  • 2 new clients started the engagement with AikiLinux :

FlossUK 2018 – turmoil and joy.

FlossUK 2018
In our 2nd year coming to FlossUK we had the “frustrating” issue of being asked to swap our talks: the topic we had planned to cover in 5 minutes became a 20 minute talk, and the topic of the 20 minute talk became the 5 minute one.

Assaf, our member at the event, worked on the presentations until the very last minute (quite literally). A talk was cancelled because the planned speaker had a last minute change of plans and could not make it, so an impromptu talk was inserted that covered almost 40% of what his intended talk was about, which required him to change some of the slides.

Our main talk, “shifting the acceptance approach in a devops team”, went as well as can be expected and the response it got was encouraging, considering it was like preaching to the choir; but even disciples need to have the truth exposed to them, with an explanation of why things don’t always go the way they should or the way we want.

It seems to have struck a chord with some members of the crowd, as some came to ask questions about how to make clients listen.

So far it has been a very instructive and interesting event; the talks about image forensics, Terraform and Prometheus were very informative and well presented.

Terraform output, Ansible and Icinga

In the last several months Assaf has been (slowly) maintaining, updating and improving the Icinga2 Ansible playbooks. As he worked on those he found that he needed the test hosts to be built and torn down in rapid repetition, and performing the build and shutdown by hand was too time consuming.

Welcome Terraform. This wonderful tool from HashiCorp has given him the ability to provision the server and remote nodes quickly and reproducibly, ensuring that each code run is tested in a clean and consistent setup.

One issue was that the ‘hosts’ file for the Ansible run had to be changed manually each time with the IPs of the new instances (AWS is nice for these short intervals), and that slowed progress down.

We know that many people use the Ansible and Terraform combination to manage their infrastructure, but in most of the cases we found online, Ansible is called from a ‘local-exec’ provisioner at the end of the execution and thus uses the internal variables of the Terraform run. As we needed an external file for the testing (to simulate the user experience and the way the roles look at the inventory), it was important to create the inventory file in a specific way.

“Simple”, most Terraform users will say, “just use the ‘local-exec’ provisioner to write the output to a file”, and they are correct, with a little caveat: if you need the file written in a specific order, the resource creation order can leave you with a file that is out of order.
For example, here is an output file we got when building an icinga2 demo environment with a master and 2 nodes (webservers):

 [monitoring_servers]   # the Icinga roles need this group to know which are the master servers
54.202.16.213
34.217.59.140
34.214.204.190

This will cause Ansible to read the file as if we have 3 master servers, and that of course is incorrect.

What the file should look like is:

[monitoring_servers]
54.202.16.213
[webservers]
34.217.59.140
34.214.204.190

Do notice the group separator that was added; it was skipped in all the previous runs due to the order of creation. The fix was very simple: ‘depends_on’, which makes a resource wait until the resources it depends on are ready.
In this case we wait for the server IP to be written, and only then add the label and the IPs of the nodes.

This solution has sped up our testing quite nicely and should allow us to bring more improvements to the Ansible playbooks at a faster pace.
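As an aside, the ordering problem can also be sidestepped by writing the whole inventory in one step once all the IPs are known. This is a different technique from the ‘depends_on’ fix we used, shown only as a minimal shell sketch; the function name render_inventory and the argument order (master first, then the web nodes) are our own assumptions.

```shell
# Hypothetical helper: prints an Ansible inventory with the groups in a fixed order.
# usage: render_inventory MASTER_IP NODE_IP [NODE_IP...]
render_inventory() {
    _master_ip=$1
    shift
    # The master group always comes first, whatever order the hosts were created in.
    printf '[monitoring_servers]\n%s\n' "$_master_ip"
    printf '[webservers]\n'
    for _ip in "$@"; do
        printf '%s\n' "$_ip"
    done
}
```

Called as render_inventory 54.202.16.213 34.217.59.140 34.214.204.190 > hosts, it produces the corrected layout shown earlier; the IPs could just as well be read from the Terraform outputs of the run.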

Microsoft’s divide and conquer

Active Directory and LDAP are the most used authentication tools in the world today, used by many companies and online services to authenticate and authorise users for access to the resources they provide.

We have been working on a project for a customer in which we had to use the company’s Active Directory to authenticate users to the UI of the platform. The system runs on Linux, so we configured LDAP to query the AD.

Microsoft proudly announces that it is the largest contributor to open source projects on GitHub today, and declares that it “embraces” open source (“look, you can even run MS-SQL on Linux”).

So far everything sounds very nice and simple. Then we tried to get the list of users and encountered a simple, ugly, frustrating truth:

 Contributing != Collaborating 

Starting in 2003, Microsoft added a limitation to the AD configuration that prevents any other protocol that queries the AD from getting a list of records longer than 1000. If you want the list of users and the company has 1002, your LDAP query will only give back either the first 1000 or the last (depending on your filters), and if you have more than 10,000, you have a big problem.

There is a “fix” posted on Microsoft TechNet that can allow you to get more than 1000 results, but it might not be suitable for everyone, as not every Linux system person has access to, or the cooperation of, the company’s IT to implement it.

This behaviour persisted in the 2008 and later versions of the AD platform, so we can see that Microsoft might be “embracing” open source but is still very far from “integrating with open source”. As far as we can imagine, the change needed to allow these 2 tools to work seamlessly with each other should not be that complex, as it was possible in the past, but it seems that the commercial aspect [e.g. forcing others to move to AD] is the prevailing thought that stops the change.

A Foray into Docker

One of our clients is attempting to break his product into microservices with Docker containerisation, and asked us to help build the Docker containers and images.

Having had little exposure to Docker prior to this engagement, our consultant had his doubts about the ability to break the requested component into a container in the time allocated for the task.

The online resources and learning apps that are available for free were a great help, and in a matter of 2 days he was able to provide the client with a baseline image that can serve not just the specified project but also other parts of the overall solution, with only minor alterations to the Dockerfile.

Looking at the project now, the consultant said he will be able to commit to the timetable and might be able to provide more capabilities to the containerisation plan.

The single issue that was encountered, and that might not be well explained or documented, is the ability to export/save/import images. It seemed logical that you could export a layer from an existing image without the need to run the container; however, that did not appear to be the case.*

When you want to apply a layer to the image and save it for further building, you must “create” a container by running the image and only then export it to a file. We have already built the image using the “docker build” command, so why not take that image and export it with the newly added layer to allow the creation of the new “base” image? Seems a bit counterproductive.

We understand that running the container is used for testing and ensuring that the build was successful, but shouldn’t that be the choice of the builder? If a builder removes the CMD “echo IT works” directive at the end of the Dockerfile, it should be his choice whether to run the container or immediately export it as a new image.

Despite that small issue, we believe that Docker is a great tool, and will work to deepen our exposure to it.

*All the comments above refer to CLI interactions; if there are tools that overcome this issue, we have not used them and as such cannot speculate on their usability.
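For readers who hit the same wall: as far as we can tell from the Docker CLI reference, docker save and docker load operate on images, while docker export and docker import operate on containers, and docker create makes a container without starting it. The sketch below is built on that reading and has not been validated against the client's project; the function names and the DOCKER variable are our own assumptions, the latter added so that the commands can be dry-run with DOCKER=echo.

```shell
# DOCKER is an assumed override so the commands can be dry-run (e.g. DOCKER=echo).
DOCKER=${DOCKER:-docker}

# Write an image (its layers and tags) to a tar archive; no container is created.
# usage: save_image IMAGE ARCHIVE
save_image() {
    "$DOCKER" save -o "$2" "$1"
}

# Load an archive produced by save_image back into the local image store.
# usage: load_image ARCHIVE
load_image() {
    "$DOCKER" load -i "$1"
}

# Flatten an image's filesystem into a tar archive. A container is created
# but never started, and it is removed once the export is done.
# usage: export_without_running IMAGE ARCHIVE
export_without_running() {
    _cid=$("$DOCKER" create "$1") || return 1
    "$DOCKER" export -o "$2" "$_cid"
    "$DOCKER" rm "$_cid" >/dev/null
}
```

The archive written by export_without_running could then be turned into a new single-layer “base” image with docker import.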

Simple things in life

In many places we work we have to manage a client’s servers, and remembering IPs and names can be tiresome or very confusing.

SSH has a wonderful ability to configure “shortcuts” that associate names with servers and even define the login you want to use for each one. The problem is that you need to manage the file and add those entries manually (or even remember to do that).

A simple bash function can help make life easier:

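mkshtc () {
# usage: mkshtc user@host alias
# split the first argument on "@" into user and host
_user=$(echo "$1" | awk '{split($1,a,"@"); print a[1]}' )
_host=$(echo "$1" | awk '{split($1,a,"@"); print a[2]}' )
# append the new entry to the ssh client config
printf 'Host %s\n\tHostname %s\n\tUser %s\n' "$2" "$_host" "$_user" >> ~/.ssh/config
}

Put this in your .bashrc file and you can add new servers to the config file with a one-liner: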

mkshtc admin@192.168.0.11 glare_node_1

and the entry is added to your .ssh/config file:

# cat ~/.ssh/config
Host glare_node_1
Hostname 192.168.0.11
User admin

Easy and simple, and now you can use names for your ssh connections (think of it as DNS for your logins). The function can easily be expanded to include other attributes for the host, like port and identity file:

mkshtc () {
# usage: mkshtc user@host alias keyfile
_user=$(echo "$1" | awk '{split($1,a,"@"); print a[1]}' )
_host=$(echo "$1" | awk '{split($1,a,"@"); print a[2]}' )
# third argument: name of the key file under ~/.ssh
_file=$3
printf 'Host %s\n\tHostname %s\n\tUser %s\n\tIdentityFile ~/.ssh/%s\n' "$2" "$_host" "$_user" "$_file" >> ~/.ssh/config
}

Just make sure it is the right one for your needs.
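One thing the function does not guard against is adding the same alias twice. A minimal sketch of a variant that refuses to do so is shown below; the name mkshtc_once and the SSH_CONFIG override (handy for trying it on a scratch file) are hypothetical additions of ours, and the user/host split uses shell parameter expansion instead of awk.

```shell
# Hypothetical variant of mkshtc: refuses to add an alias that already exists.
# SSH_CONFIG is an assumed override so the function can be tried on a scratch file.
# usage: mkshtc_once user@host alias
mkshtc_once() {
    _cfg=${SSH_CONFIG:-$HOME/.ssh/config}
    _user=${1%%@*}   # text before the first "@"
    _host=${1#*@}    # text after the first "@"
    _alias=$2
    # -x matches the whole line, -F treats the pattern as a literal string
    if [ -f "$_cfg" ] && grep -qxF "Host $_alias" "$_cfg"; then
        echo "alias $_alias already present in $_cfg" >&2
        return 1
    fi
    printf 'Host %s\n\tHostname %s\n\tUser %s\n' "$_alias" "$_host" "$_user" >> "$_cfg"
}
```

Running mkshtc_once admin@192.168.0.11 glare_node_1 twice adds the entry the first time and prints a message on the second.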