Enter the world of FinOps

After losing my previous position due to COVID-19, I was fortunate to land a position in FinOps. The title and the concept were new to me, but not the practice, which I had been doing both as a consultant and as part of my previous job.

I learned that, as with “DevOps”, someone took “Finance” and “Operations” and combined them into “FinOps”. Trying to explain this to someone outside the hi-tech industry is confusing: “I am a technological person responsible for overseeing the company’s technological expenditures and working on reducing them, while ensuring I don’t hinder productivity or innovation.” It’s that last part most people have a hard time with, as most associate R&D with unchecked spending labelled “research” (somewhat like throwing a rock into a lake and hoping to hit a fish).

But the FinOps role, much like the CISO role, reaches into SO many aspects of a company: reining in the AWS spending, working with the developers to ensure they get the tools they NEED but not necessarily the tools they want, negotiating and reviewing contracts for new services and tools, and evaluating new tools for both functionality and cost effectiveness. The FinOps person has a full plate.

All this, and the roles I held before, helped me realise that many places make a crucial mistake when they hire for the FinOps position: they hire an analyst, someone who comes from a financial background rather than a technological one, placing the emphasis on the “Fin” aspect of the title, like those who only want developers in DevOps positions.

That approach, in my opinion, is the incorrect one.
What you get is someone with a limited scope, who has no “operational” skill set, will be slow to learn the technology, and will struggle to talk to the technical people he needs to convince with his recommendations: “You can run the database on ECS with a persistent volume on EFS and a t2.medium backend, instead of an m5.large” (if you don’t understand what I said here, drop me a line).

A good FinOps person needs a solid operational background and a solid understanding of how to weigh financial implications, but the skill most tech-savvy people lack is the hard one to find – people skills.
FinOps needs to negotiate contracts and talk to sales people as well as tech people, and (I know from personal experience) a lot of us do not have that.

In summary, I am glad that I took this change in direction; I hope I will be successful in it, and that I can pass on what I learn to others.

This post was originally meant to be posted on #peerlyst, but due to the demise of that community, I posted it here.

Production distributed system – pt. 2

Once we were able to have the Galera databases sync and be aware of each other, it was time to tackle the issue of “How do we register the service?”

So it was time to work on the Consul cluster. We considered using 3 separate nodes for this cluster to add another layer of redundancy to each component, but the customer elected to run the Consul service on the same nodes as Galera. It might seem odd to have the discovery server run on the same node as the service it is monitoring, but the logic was: “if the Galera node is down, then the Consul service is also degraded, and we will address them together.”

So we built a 3-node Consul service, with agents on each of the Galera nodes.

Each node was configured to join the cluster, with the 2 other nodes specified in the “start_join” directive:

{
  "server": false,
  "datacenter": "foo",
  "data_dir": "/var/consul",
  "encrypt": "",
  "log_level": "INFO",
  "enable_syslog": true,
  "start_join": ["172.2.6.15", "172.2.7.10"]
}
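For completeness, the server side of such a cluster would carry a similar file with "server" set to true. A minimal sketch, assuming Consul's documented "bootstrap_expect" and "retry_join" options and reusing the two node addresses shown above (the third server's address isn't given in this post, so it is left out here):

```json
{
  "server": true,
  "bootstrap_expect": 3,
  "datacenter": "foo",
  "data_dir": "/var/consul",
  "encrypt": "",
  "log_level": "INFO",
  "enable_syslog": true,
  "retry_join": ["172.2.6.15", "172.2.7.10"]
}
```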

The file was located at /etc/consul.d/client/config.json. This took care of the client/server sign-up, but what about knowing whether Galera is up? Simple: we created a check that verifies the database port is answering and reports back. This file, aptly named galera.json, was located in the main /etc/consul.d directory:

{
  "service": {
    "name": "galeradb",
    "tags": ["icinga-galera"],
    "check": {
      "id": "mysql",
      "name": "Check mysql port listening",
      "tcp": "localhost:3306",
      "interval": "10s",
      "timeout": "1s"
    }
  }
}
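Under the hood, a Consul "tcp" check is just a timed connect attempt. As a rough illustration of the logic (a Python sketch, not Consul's actual implementation):

```python
import socket

def tcp_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """Pass the check if a TCP connection to host:port succeeds within
    `timeout`, mirroring the "tcp" and "timeout" fields of the service
    definition above."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Consul runs this probe every "interval" (10s here); once it fails, that node's instance is marked critical and dropped from the answers for galeradb.service.consul.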

This ensured that Consul tracked whether the database was responding and reported back to the cluster in case of a failure, allowing the service to fail over to the other nodes.

At this stage, with the backend ready, we started the Icinga installation, with 2 master and 2 web servers in a redundant configuration (that documentation is found here). Then, when we needed to point the IDO at the Galera database, we hit an issue.

We changed /etc/resolv.conf on the Icinga nodes to point at the 3 Consul nodes, so Icinga would use Consul as its DNS server and be able to resolve the database address:

/**
 * The db_ido_mysql library implements IDO functionality
 * for MySQL.
 */

library "db_ido_mysql"

object IdoMysqlConnection "ido-mysql" {
  user = ""
  password = ""
  host = "galeradb.service.consul"
  database = "icinga"
}

But considering that many of the system’s checks relied on DNS resolution of external names, we were stuck on how to ensure that every query returned the correct IP.

So we had to connect Icinga to a named server, in our case Bind9. We built a named service on the same nodes so we could make as few changes as possible on the Icinga servers, letting the already configured DNS requests on port 53 [UDP] going to the Consul servers work for us.

A very basic named.conf :

options {
  directory "/var/named";
  dump-file "/var/named/data/cache_dump.db";
  statistics-file "/var/named/data/named_stats.txt";
  memstatistics-file "/var/named/data/named_mem_stats.txt";
  allow-query { any; };
  recursion yes;

  dnssec-enable no;
  dnssec-validation no;

/* Path to ISC DLV key */
  bindkeys-file "/etc/named.iscdlv.key";

  managed-keys-directory "/var/named/dynamic";
};

include "/etc/named/consul.conf";

Notice the inclusion of the consul.conf file – this is where the “magic” happens:

zone "consul" IN {
  type forward;
  forward only;
  forwarders { 127.0.0.1 port 8600; };
};

This file tells named to handle all DNS requests itself, except those in the “consul” domain, which are forwarded to localhost on port 8600 (Consul’s default DNS port) and thus resolve to the IPs of the Galera cluster. For any name outside its domain, Consul itself falls back to the DNS of choice configured when the Consul service was built; we chose the all-too-familiar “8.8.8.8” (this is added at the cluster bootstrap stage):

"recursors": [
  "8.8.8.8"
]
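Putting the two pieces together, the path a DNS name takes in this setup can be sketched as a small Python function. This is purely illustrative (the real work is done by named and Consul, and the addresses are simply the ones used above):

```python
def forwarder_for(name: str) -> tuple[str, int]:
    """Where a query for `name` ends up in this setup:
    anything under .consul goes to the local Consul agent's DNS port,
    everything else falls through to the recursor chosen at bootstrap."""
    if name.rstrip(".").endswith(".consul"):
        return ("127.0.0.1", 8600)  # Consul's default DNS port
    return ("8.8.8.8", 53)          # the recursor configured above

print(forwarder_for("galeradb.service.consul"))  # ('127.0.0.1', 8600)
print(forwarder_for("example.com"))              # ('8.8.8.8', 53)
```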

So the next stage was to test name resolution and the system’s resilience.

FlossUK 2018 – turmoil and joy.

In our 2nd year coming to FlossUK we had the “frustrating” issue of being asked to transform our talks: the topic we had planned to cover in 5 minutes became a 20-minute talk, and the 20-minute topic became the 5-minute one.

Assaf, our member at the event, worked on the presentations till the very last minute (quite literally): another talk was cancelled because the planned speaker had a last-minute change in plans and could not make it, and the impromptu talk inserted in its place covered almost 40% of what his intended talk was about, which required him to change some of his slides.

Our main talk, “shifting the acceptance approach in a devops team”, went as well as could be expected, and the response it got was encouraging – granted, it was like preaching to the choir, but even disciples need to have the truth exposed to them and explained why things don’t always go the way they should/want.

It seems it struck a chord with some members of the crowd, as a few came over afterwards to ask how to make clients listen.

So far it has been a very instructive and interesting event; the talks about image forensics, Terraform and Prometheus were very informative and well presented.