What about GitOps?
Me again with opinions.
I want to tell my story, and capture my experience with
These are my opinions, and as always I will do my best to be as respectful and reasonable as I possibly can.
What does GitOps mean to me?
I encourage you to do your own research. I also concede that my definition of GitOps is probably flawed.
GitOps to me is using git and associated workflows to do operations for a company, team or organization.
In other words, it is now reasonable to use markup languages like YAML to reasonably represent the broader components of a team’s application. I then understand GitOps to be the practice of using Git to manage this aforementioned config (YAML).
What does Operations mean to me?
Operations is synonymous to technical chores in my mind.
These are manual tasks that are reactive in nature.
In the same way a handyman would respond to a leaky faucet, and operations engineer would respond to a production alert.
Upon evaluation of the problem (weather it be by faucet or pagerduty) the responder would then perform some manual and reactive task to mitigate the problem, and attempt to prevent it from happening again.
This metaphor is further convoluted when we explore new work. In this example we might be “installing new faucets” into a new kitchen. Or perhaps we are renovating an existing kitchen. An operations engineer is then also tasked with these types of chores – installing “new” or “renovated” items in production.
Why is Git useful to software engineers?
Git is a handy tool because of how it enables distributed teams to work on the same lines of code.
Contrary to popular belief, Git doesn’t actually measure deltas in lines of code like other source control tools do. It rather stores references to files as a set of SHA1 hashes. These “snapshots” as the book defines them, are just revisions.
In a world where software engineers are managing a lot of changes to the same file, having an ability to store, track, and manage lines of code is useful.
In the same way that IaC and Kubernetes have enabled config management, Git has enabled workflows. Webservices such as GitHub and GitLab are built around managing the metadata and process involved in using Git to reconcile lines of code (snapshots) of files.
Some consequences of these software workflows are the abilities for distributed teams to “approve” changes, and for distributed teams to have a remote source of truth for the software’s lines of code. GitHub has a well known pull request model that gives teams a workflow to accept, reject, or request changes to proposed mutations in the code.
Software engineers find value in these consequences, because it enables them to quickly work on the same code using a standardized communications and approval model. These workflows enabled software engineers to go slow, be meticulous, and work very diligently on their code.
Where has GitOps caused problems?
The notion of using a distributed software management tool to do operational chores can be paradoxical in nature.
The value of operations is to be reactive, move quick, and fix problems as they present themselves. The value of software management is to go slow, be meticulous, gain buy-in, and favor diligence over efficiency.
GitOps is an antipattern.
I spend a lot of time defining “antipattern” in my critical talk about Kubernetes The Clusterfuck in the Kubernetes Code
Not to put too fine of a point on it, however I believe that software engineering workflows, and operational tasks contradict with each other. This conflict typically ends in undesirable compromise.
One of the two sides of the equation typically end up being sacrificed in order to make the other side work.
- The safe value in software management workflows is forgone in favor of quick turnaround. Merging directly to production. Reactive management.
- The timely value in operations workflows is forgone in favor of diligence in the approval process and code review.
What do I really think is the problem?
I do believe GitOps addresses a real problem, however I also feel that it causes equal or greater problems in other ways.
I think what we are really trying to address here isn’t the same set of concerns that Git addresses for software teams. If we accept that operational work will always be reactive in nature, and will be successful by quick effective turn arounds I think what we are looking for is distributed and effective editing.
Reminding ourselves of that Kubernetes and IaC encourages config management, is Git really the best way to do distributed editing for a quick turnaround?
I believe that GitOps is to operational work as emailing PDFs was to the 1990’s. The world doesn’t need another “work-around” and set of “agreements” to do distributed editing. I am sad to say that GitOps reminds me too much of emails with attachments such as
Karen updated Thursday - Final - 1.0 - Proposal.pdf
If you really want to do “Operations” you just want a distributed text editor that edits CRDs and objects directly.
At least this way discoverability and drift would be addressed, and we wouldn’t have required the Terraform state problem with YAML files.
Sure - we can put a shiny green “accept” button on your change before it goes live. If you would like.
I don’t think you want to do “Operations”
If the thought of a distributed text editor editing production files scares you (and it should) then perhaps it’s time for us to reevaluate how much production mutating we are doing on a daily basis.
Putting a pull request in front of a reactive production change, doesn’t make it any more safe or any more of a better idea than editing directly in production.
I agree with Git. I agree with Ops.
Where you lose me is when you combine Git and Ops and hope for the best of both worlds, and ignore the consequences.
I personally don’t find value in using Git to do Ops, especially when I consider the cost of the added complexity.
I would rather see teams offer feature released config that is versioned, and unchanging. Like a proper software delivery team. I would rather see teams do reactive operations well, using tools built specifically for the problems they are facing every day. Like a distributed text editor, or distributed CRD/YAML editor.
I would prefer if our teams didn’t have a lot of reactive operations to do 🙂