Click here for the RAW (easy to read) paper!


yes. hi hello. it’s me again 🙂

today i would like to talk about what i have been calling infrastructure as software.

or more importantly - the difference between infrastructure as software compared to infrastructure as code.

note: if you would like to read the 2017 book i published on this titled “cloud native infrastructure”, feel free to check out Chapter 3.


Understand what Turing complete means

You should be very comfortable with the term Turing Complete. You can read more here.

A basic summary is any “system” that can perform the following is considered Turing complete. In other words it is a computer.

  1. Can perform logic (if, else, else if, etc)
  2. Can iterate and expand memory (for loops, while loops, recursion, linking)

So basically if there is a machine, that has a programing language that can run on it - it is considered Turing complete.

Turing complete

// example.c

if (condition) {
    doThing();
}

Not Turing complete

# example.yaml

name: "kris nóva"

What is config?

static text. not turing complete.

Examples of config

  • .ini files
  • .cfg files
  • .xml files
  • .json files
  • .yaml files
  • .toml files
  • .properties Java config

What is code?

Turing complete human-readable instructions (text) and associated tooling (interpreters, compilers, etc)

code can be anything – sure – but the important part is that we draw the line here

  • Config: not turing complete
  • Code: turing complete

Examples of code

Any sort of code with if(){} statements or for{} loops. Salt, puppet, chef, ansible, etc, etc…

  • .hcl Terraform config
  • .yaml YAML with { text/template }
  • .conf Salt files with {% if %}
  • .py YAML can embed
  • .rb YAML can embed

The point is there are a lot of these. Pull requests accepted.

The other point is these are all Turing complete! ⚠


Why this matters

This matters because traditionally computer scientists treat config very differently than we treat code.

For example if you were to research mount on a Linux server. What would you find?

  1. A mount binary (executable) somewhere on the system
  2. Default config somewhere on the system.

On a linux system it would look something like this.

[novix@alice]: ~>$ which mount
/usr/bin/mount
[novix@alice]: ~>$ file /etc/fstab
/etc/fstab: ASCII text

The takeaway here is that the instructions (source code) that create mount are not on your computer. And that the config file fstab IS on your computer.

The program contains the logic. The config files hold values, and nothing more.

The 👏 config 👏 files 👏 are 👏 not 👏 Turing 👏 complete!

Drawing on history

Let’s look at the following examples of Turing complete systems (some of which were unintentional).

  • C++ Templates (lua)
  • Nginx config
  • Microsoft Excel/Power Point
  • Magic: The Gathering
  • C++ Templates
  • Minesweeper
  • Minecraft
  • X86 MOV
  • Factorio
  • Helm templates
  • Ansible

In other words in theory you could “compile” and “run” doom.exe on tools like helm, magic the gathering, and power point.

What is software?

People write software for many reasons. Companies that participate in capitalism typically write software for profit 💰.

Software is typically Turing complete.

There are best practices when writing, maintaing, deploying, and managing software.

Here are some features that modern day programming languages give us to use as computer scientists.

  • Color coding
  • IDEs
  • Rich unit testing tools
  • Asserting libraries
  • Integration testing
  • Cross compiling
  • Debugging
  • Logging
  • Versioning
  • Refactoring
  • Upgrading (Downgrading!)
  • Flow control
  • Linting
  • Syntax control
  • Observability
  • Hinting, User experience
  • Repeatability
  • Compatibility
  • Composability
  • Inheritance
  • Polymorphism
  • Abstraction
  • Encapsulation
  • Memory management (Garbage collection)
  • Concurrency principles
  • Retry logic
  • Error handling
  • Failure domains

Review of terms

Very quickly let’s review our terms.

  • Config: Static text. not turing complete.
  • Code: “Static”. Turing complete human-readable instructions (text) and associated tooling.
  • Software: Dynamic. Turing complete human-readable instructions (text) and associated tooling.

Notice the very important difference between “code” and “software”.

Software does things at runtime, and comes with the additional paradigm of a “software engineering workflow”.

Analysis

We have re-invented “programming languages” in the form of what we call code and we use this new code to manage Infrastructure, and we have done so poorly.

Remember the large list of “features” that we get with modern day programming languages? As any person who has attempted to debug a helm chart can tell you, debugging “code” can be difficult at best.

So knowing we now have 3 options to chose from. Also knowing that we are already hard coding different “values” in “manifests” in git.

Let’s re-ask some fundamental questions with our 3 possible options.

  1. Config (static, not turing complete)
  2. Code (static, turing complete)
  3. Software(dynamic, turing compelte)

What is the best pipeline for Turing complete use cases?

Do you have if{} statements in your code? Do these if{} statements do things for your team? Do you rely on these if{} statements in production?

This is a very clear example of what I would call “A Turing Complete” use case.

So what do you pick?

  1. Config (Won’t work because we require logic)
  2. Code (Okay solution, lacking tests, features, syntax)
  3. Software (Best solution, because does everything Code does, and more!)

Is it okay to hard code infrastructure as software?

In theory whenever you have a YAML file that looks something like this.

name: my-application
image: team/my-application:latest
port: 80

and you git commit and git push these changes… aren’t you effectively already “hard coding” your application?

My point is that for some reason we as infrastructure engineers have retained the lesson of:

hard coding is bad! don’t do it!

In favor of “manifests” that fail to actually make our lives “easier”.

I am suggesting that if you and your team are comfortable writing manifests like 👆 the one above. There is not much of a difference between that, and this Go library.


type Application struct {
    Name string
    Image string
    Port int
}

func MyApplication() *Application {
    return &Application{
        Name: "my-application",
        Image: "team/my-application:latest",
        Port: 80,
    }
}

When you chose the option like the 2nd option, you also get all the wonderful features of the programming language with it!

Heck - the Go compiler will even tell you what line number your syntax error is on! 😎 Try telling that to a nested YAML project.

  • No indentation concerns
  • No human burden to substitute values
  • Free formatting
  • Testing
  • The sky is the limit

Conclusion

Both software engineers (people writing business logic) and infrastructure engineers (people managing servers, vms, network, etc) can benefit from the many uses of modern programming languages.

In fact this paper could be read by any engineer with any use case (regardless of infrastructure), and the findings are equally relevant.

Especially when they favor modern programming languages over interpolated config AKA “code”.

Engineers can now use resiliency patterns found all over robotics, the linux kernel, and kubernetes such as

  • Controllers
  • Operators
  • Lifecycle libraries
  • Polymorphism
  • Scheduling
  • Reconciling
  • Watchdog timers
  • Retries
  • Status updates
  • State updates
  • Delta management

Creating, managing, deleting, updating, versioning, and otherwise mutating this virtual machine is now an exercise in “Feature Management” as a software team, in favor of “a YAML update” as an infrastructure team.

But how do you deploy the new “software”

If you are thinking this - you are one step ahead.

We need a pattern. We need an interface. We need to get people out of YAML land!

For right now I have opened up a pull request to the ever popular yamyams repository that attempts to standardize an interface we could use that we could build tooling around.

If anyone wants to contribute or offer a better interface feel free. I just want folks to stop writing YAML.

I will try to clean this code up as much as I can and at least get a basic library/cli pattern going. Please do consider contributing if you have opinions here.


// Deployable is an interface that can be implemented
// for deployable applications.
type Deployable interface {

	// Install will attempt to install in Kubernetes
	Install(client *kubernetes.Clientset) error

	// Uninstall will attempt to uninstall in Kubernetes
	Uninstall(client *kubernetes.Clientset) error

	// Resources returns untyped struct{}s which represent your application.
	// This is the first concrete point in which we realize 
	//    that what we are doing, has no business being "generic".
	Resources() []interface{}

	//YAML() []byte
	//JSON() []byte
}

The point is that instead of building a library of code (.yaml files) we are building a library of software (.go files).

.
├── go
│   ├── # Please do this
│   ├── api.go
│   ├── database.go
│   ├── message-queue.go
│   └── my-app.go
└── yaml
    ├── # Please do NOT do this
    ├── api.yaml
    ├── database.yaml
    ├── message-queue.yaml
    └── my-app.yaml

because wants you have the former, you can also do this 😎

├── api.go
├── api_test.go
├── database.go
├── database_test.go
├── message-queue.go
├── message-queue_test.go
├── my-app.go
└── my-app_test.go

and when someone comes to you with a bug, a feature, or a regression you now can dwell on the vast landscape that is modern software engineering in favor of code mutation!

Extra credit 💰

In my experience because programming languages are much more fundamental than proprietary tools learning a programming language like Go makes you a more valuable engineer.

Which is why I suggest, and prefer “software” over “code”.