K8s Is Hard (but worth it)

Adopting Kubernetes in a mature business is difficult. It’s bigger than the shift from physical servers to VMs.

But for most workloads it’s worth it.

This isn’t as organized as I’d like it to be, but I wanted to put some thoughts together. It’s hard not to wander too deep into the tech and the organizational side while still keeping enough meat to be meaningful and not just a silly whitepaper.

Maybe I’ll revisit some time.

The Extreme Lenses

Bright Eyed

Kubernetes is the future. It’s scalable and cloud native. I can use all of these features like service mesh to make my applications better.

Grumpy

Containers are just the new hotness in the Matryoshka doll of computing. We have physical computers, with processes in the OS acting as little computers. Then we added hypervisors running virtual machines as processes, and those VMs have processes within them. So we have machines in machines in machines on a machine. When you add containers, we’ve just added one more abstract machine. If none of the previous layers solved your problem, how is this one going to?

And Kubernetes is a neat little software defined datacenter, but you get to relearn all datacenter terminology and technology as new Kubernetes words and configuration.

Let’s get real

The grumpy voice is correct in many ways. Containers are somewhat redundant and unnecessary. They just happen to be the dominant paradigm that provides idempotent systems. It could just as easily have been VM images.

But Kubernetes extends that same advantage to the broader infrastructure surrounding the application(s). Done correctly, you can have an idempotent business service.

Back to the Matryoshka Doll

A caveat with continuing to nest “computers” is that each abstraction is lossy.

Adding an OS with processes means that CPU cycles, I/O, etc. are not guaranteed. You’re in line.

A hypervisor exacerbates this. Now you have CPU and I/O scheduling underneath the OS. If you’ve ever had to troubleshoot issues with CPU READY and WAIT times in VMware, you’ll know this pain.

Then you have containers. Developers see these as “little VMs.” I frequently have to explain that a container doesn’t carry a full operating system, just a userspace. Containers have the same choke points as any other group of processes on an OS… because that’s exactly what they are. OS resource constraints and performance issues from context switching can be difficult to spot, largely because staff are used to single-purpose VMs rather than the multi-tenant physical Unix servers of old.
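
As a minimal Python sketch of that point (assuming a cgroup v2 host, which is what most current distros and Kubernetes nodes use), a process inside a container can see that its “machine” is really just a kernel-enforced slice of the host:

    from pathlib import Path

    # Minimal sketch, assuming cgroup v2: a containerized process discovering
    # that its "machine" is just a resource slice enforced by the host kernel.
    def cgroup_limits():
        # cpu.max holds "<quota> <period>" in microseconds, or "max" if unthrottled
        quota, period = Path("/sys/fs/cgroup/cpu.max").read_text().split()
        cpus = "unlimited" if quota == "max" else float(quota) / float(period)
        # memory.max holds a byte count, or "max" if unlimited
        memory = Path("/sys/fs/cgroup/memory.max").read_text().strip()
        return {"cpu_limit": cpus, "memory_limit_bytes": memory}

    if __name__ == "__main__":
        print(cgroup_limits())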

Containers give you 2 real advantages:

  1. Idempotent deployment
  2. Fast start times

You could do #1 with idempotent VM images and #2 by running multiple instances on a single OS. But containers combine those two advantages.
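
A rough Python illustration of advantage #2 (this assumes Docker is installed and the alpine image is already pulled locally; exact numbers vary by machine, but they land in fractions of a second rather than the minutes a VM boot can take):

    import subprocess
    import time

    # Rough sketch: time a full container lifecycle -- create, run a command,
    # tear down. It behaves more like starting a process than booting a machine.
    start = time.monotonic()
    subprocess.run(["docker", "run", "--rm", "alpine", "true"], check=True)
    print(f"container started and exited in {time.monotonic() - start:.2f}s")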

Yes there are things like unikernels that meet those goals. But frankly, we don’t generally have the talent to develop in this manner. Therefore there’s a weak ecosystem around them. Therefore it’s difficult to do. Wash. Rinse. Repeat.

Enter Kubernetes

Kubernetes pushes containerization into something much more useful than the two points above. It lets you define the entire environment for an application in an idempotent manner. It uses containers and software-defined networking in a way that makes your deployment not really care what the underlying infrastructure is: physical servers, VMs, Cloud services, whatever.

And it does this through a well-defined API. In many ways, it’s delivering on the promise of OpenStack, just one level of abstraction up. And that makes it apply to the Cloud just as well as on-prem.
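
As a sketch of what “defining the environment” through that API looks like, here is a small Deployment expressed with the official Kubernetes Python client (the image, names, replica count, and resource numbers are placeholders; the same object could just as easily be a YAML manifest):

    from kubernetes import client, config

    # Sketch: declare the desired state of a small web tier and hand it to the
    # API server. Where it runs -- physicals, VMs, Cloud -- is not the app's concern.
    config.load_kube_config()  # or config.load_incluster_config() inside a pod
    apps = client.AppsV1Api()

    deployment = client.V1Deployment(
        metadata=client.V1ObjectMeta(name="web", labels={"app": "web"}),
        spec=client.V1DeploymentSpec(
            replicas=3,
            selector=client.V1LabelSelector(match_labels={"app": "web"}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": "web"}),
                spec=client.V1PodSpec(
                    containers=[
                        client.V1Container(
                            name="web",
                            image="nginx:1.25",  # placeholder image
                            resources=client.V1ResourceRequirements(
                                requests={"cpu": "100m", "memory": "128Mi"},
                                limits={"cpu": "500m", "memory": "256Mi"},
                            ),
                        )
                    ]
                ),
            ),
        ),
    )

    apps.create_namespaced_deployment(namespace="default", body=deployment)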

As someone who worked on multi-server parallel computing in the late 90s, it is quite a thing to behold. It creates a proper distributed system around Unix systems in a way I never could have imagined. It addresses not only the supercomputer-like requirements of that era, but resiliency, reliability, and speed of development.

The Double Edged Sword

Kubernetes is not an opinionated framework, however.

There is a rich ecosystem of mature components that can be leveraged. But there are also many shiny toys out there that sound enabling but can be crippling over time.

The worst thing is for an application to depend on Kubernetes. Using ConfigMaps, secret stores, etc. is really a non-issue. They can easily be replaced, and the app can run anywhere else without changes to the codebase itself.
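
A Python sketch of what that looks like from the application’s side (DATABASE_URL, FEATURE_FLAGS, and the /run/secrets path are illustrative names, not anything standard): the code reads plain environment variables and files and never imports a Kubernetes client, so whether the values come from a ConfigMap, a Secret, a .env file, or systemd is invisible to it.

    import os

    # Sketch: configuration arrives as ordinary env vars and files. Kubernetes
    # can populate these from ConfigMaps and Secrets, but the code doesn't know.
    DATABASE_URL = os.environ.get("DATABASE_URL", "postgres://localhost/dev")
    FEATURE_FLAGS = [f for f in os.environ.get("FEATURE_FLAGS", "").split(",") if f]

    def read_secret(name: str, default: str = "") -> str:
        # Secrets mounted as files (one common Kubernetes pattern) or exported
        # as environment variables both work without touching the codebase.
        try:
            with open(f"/run/secrets/{name}") as fh:
                return fh.read().strip()
        except FileNotFoundError:
            return os.environ.get(name.upper(), default)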

But if you chase shiny things you can end up locked into requiring k8s for an application, or worse, depending on a particular vendor’s k8s. That can happen through application-level integration or through a configuration so complex that migrating becomes a significant effort.

Also, Kubernetes is not a Heroku-like PaaS. It exposes all the guts and requires you to understand how to wire together a reliable system; it’s very easy to build an unreliable one. It is a platform upon which an opinionated system could be built, and I have seen companies do this through integrations with their development tooling. But we haven’t seen a generalized system emerge from the community as a dominant platform.
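
To make “wiring together a reliable system” concrete, here is one small piece of it sketched with the Python client (the paths, port, and thresholds are hypothetical): readiness and liveness probes, which Kubernetes will not infer for you.

    from kubernetes import client

    # Sketch: Kubernetes won't guess how to tell a healthy pod from a wedged
    # one -- you wire that in yourself with readiness and liveness probes.
    container = client.V1Container(
        name="api",
        image="registry.example.com/api:1.0.0",  # placeholder image
        ports=[client.V1ContainerPort(container_port=8080)],
        readiness_probe=client.V1Probe(
            http_get=client.V1HTTPGetAction(path="/readyz", port=8080),
            initial_delay_seconds=5,
            period_seconds=10,
        ),
        liveness_probe=client.V1Probe(
            http_get=client.V1HTTPGetAction(path="/healthz", port=8080),
            period_seconds=15,
            failure_threshold=3,
        ),
    )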

But the most common fault I see is an application that requires every component to reside within a single cluster. A k8s cluster is a single point of failure. Anecdotally, I have seen more outages from this than from any other cause. All it takes is an upgrade gone wrong or a Public Cloud region forgetting how routing works.

Operationalization

The other issue is standing up the ecosystem around Kubernetes. It’s the boring stuff no one likes to think about until they need it.

  • Time Series metrics
  • Log aggregation
  • APM
  • Backup and recovery
  • Consistent CI/CD integration
  • Secure secret storage
  • Appropriate RBAC
  • Consistent deployment mechanisms across teams
  • Roles and responsibilities
  • Quickly identifying the correct contact for a problem
  • Regulatory and general security compliance
  • Lifecycle management of infrastructure
  • Container scanning, both running and at rest

The transition from physical servers to virtual machines left these things largely unchanged and evolving independently over time. But in a Kubernetes world, almost all are impacted.

But that is not terrible. In most cases, that organic evolution has led to a mess. This is the opportunity to architect operations rather than inherit them. And it forces the issue of culture change.

And we’re at the maturity point where all of this can be done, even for a large enterprise with 100s of development teams.

Caveats

There are workloads that are just not a good fit for Kubernetes. High-I/O data services at massive scale are just silly to run there. I know, people wrap these in k8s sometimes, pin single pods to nodes, etc. But consuming these As A Service or running them on plain virtualization with normal infrastructure around them is a much better fit.

A lot of that comes back to operations. These services take a high level of expertise to run on their own, and you rarely find that skillset paired with deep k8s expertise. Adding the complexity of k8s for the sake of k8s adds little value.

Conclusion(ish)

I’ll be honest. For a long time, Kubernetes was not ready for adoption by large companies with many development teams and decades of IT history. Mostly that was due to the operationalization concerns above.

For the past couple of years it has been… for the most part. You know, after the point it stopped being super cool.

It does take more human resources than it ideally would to support a fleet of hundreds of clusters. But I think we’re on the cusp of that changing, with the adoption of ClusterAPI and the extension of the public cloud into private facilities.