Kubernetes is expensive because most people aren't using it correctly.

To date I have personally worked for 2 companies actively using Kubernetes. Neither (in my opinion) is using it "correctly".

I'm not here to trash talk employers though. I don't think these companies are anomalies. There are certainly reasons for such usage. Not good reasons. But definitely reasons. What is perhaps most bothersome is that the potential cost savings are typically high enough to justify hiring someone specifically to manage these resources.

Putting that aside, how do companies get there? I'm sure there are other ways, but I think the biggest is this: they migrated from a solution which could not auto-scale. The end result? An intrinsic mistrust of auto-scaling (or a solution which won't support it). I read an article which stated that the average CPU usage across nodes is somewhere between 10-20%, which means that, from a CPU perspective, pods are being over-provisioned by 5-10x, and RAM is only a little better.
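
That math is simple enough to spell out. As a rough back-of-the-envelope sketch (my own illustration, assuming the over-provisioning factor is roughly the inverse of utilization):

```python
# If nodes average 10-20% CPU usage, the provisioned capacity is roughly
# 1 / utilization times what the workload actually needs.
for utilization in (0.10, 0.20):
    print(f"{utilization:.0%} average usage -> ~{1 / utilization:.0f}x over-provisioned")
# 10% -> ~10x over-provisioned, 20% -> ~5x over-provisioned
```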

The article I read acknowledged that running out of RAM is much more serious than running out of CPU, so you would expect some amount of over-provisioning there. And I would add another culprit to the mix: VM configurability. I suspect that, to help balance their own utilization, most cloud providers offer very little configurability in terms of RAM and CPU. If you want a certain amount of RAM, there is typically a minimum and maximum number of vCores available for that amount of RAM.

This inflexibility isn't helped by the fact that most cloud configurations are already egregiously oversized for most scenarios. What a lot of people fail to realize is that a containerized workload needs FAR fewer resources than a non-containerized one.

For some context on that argument, I'm currently running 40 containers on one PC. All combined, and with the base operating system thrown into the mix, they use on average less than 2% of my 3700X's CPU and about half of my 32GB of RAM.

This includes running Gitlab and a runner (which actually account for about 80% of my resource utilization), and, at my load, everything would probably run just fine on the same system with 8-12GB of RAM. Gitlab is just greedy; its usage grows over time, but I don't think it actually needs that much RAM, and... companies don't need to run Gitlab as part of their deployed services anyway. And if I cut out the OS because I was running this elsewhere, I could probably shave another GB off of that.

Cut Gitlab, SQL and Nextcloud out of the picture and I could probably run the remaining 30-something containers on a single node with 128-256MB of RAM.

Yes, my server has INCREDIBLY low usage, but the point is about auto-scaling, and 40 containers is significantly more than what most companies would deploy in a single environment. If, instead of the 3rd party applications, I were simply deploying, say, the code I run for my brother's business, and assuming it would run at some load/scale, then I would provision things this way:

Provision server(s) for the database. These servers do need to accommodate the full load, with headroom, as required by the database server. Databases don't really auto-scale. At least, the SQL database I'm currently using doesn't. Next, I would implement caching in something like Redis inside of my application and, likewise, provision the cache server(s) appropriately to handle the volume with headroom.
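
For what I mean by caching inside the application, here's a minimal cache-aside sketch in Python. It assumes the redis-py client; the host name, key format and TTL are made up for illustration:

```python
import json

import redis  # assumes the redis-py client library

# Hypothetical cache host; in practice this would point at the provisioned cache server(s).
cache = redis.Redis(host="cache.internal", port=6379, decode_responses=True)

def get_customer(customer_id: str, db_lookup) -> dict:
    """Check the cache first; fall back to the database and cache the result."""
    key = f"customer:{customer_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    record = db_lookup(customer_id)             # only hits the database on a cache miss
    cache.setex(key, 300, json.dumps(record))   # 5-minute TTL, tune for your data
    return record
```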

For absolutely everything else, I would make sure it can scale down to one or zero pods and figure out the load at peak, minimum and average usage. Then, find a VM size which divides nicely into those needs.

As an example, say I have a service which, at the low point in a day, handles 250 requests per second and needs 512MB of RAM and 2vCPUs to handle that load, averages 750 requests per second, and peaks at, say, 2000 requests per second, with the RAM and CPU requirements scaling linearly from the minimum. In that case I would try to find either a 1vCPU configuration @ 256MB of RAM or a 2vCPU @ 512MB of RAM, with the smaller being the preferred configuration.
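
Here's that sizing math written out as a quick Python sketch (the request rates and baseline resources are the ones from the example; the candidate node sizes are the two I mentioned):

```python
import math

# Baseline from the example: 250 req/s needs 2 vCPU and 512MB, scaling linearly.
BASE_RPS, BASE_VCPU, BASE_RAM_MB = 250, 2, 512

def nodes_needed(rps: float, node_vcpu: float, node_ram_mb: float) -> int:
    """How many nodes of a given size cover the raw load at a given request rate."""
    scale = rps / BASE_RPS
    return max(1, math.ceil(max(BASE_VCPU * scale / node_vcpu,
                                BASE_RAM_MB * scale / node_ram_mb)))

for label, vcpu, ram in (("1vCPU @ 256MB", 1, 256), ("2vCPU @ 512MB", 2, 512)):
    counts = {rps: nodes_needed(rps, vcpu, ram) for rps in (250, 750, 2000)}
    print(label, counts)
# Both sizes divide evenly into the min, average and peak loads, which is the point.
```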

The companies I've worked with, I suspect, would rather go with something like an 8vCPU @ 4GB of RAM or more in a scenario like this. This would of course be an insane waste of resources, but it would be very much in line with usage averages from major cloud providers as well.

Oddly though, it might actually perform worse. 

Firstly, it would perform worse at resiliency. Larger nodes mean fewer nodes, which means fewer nodes to absorb load while recovering from a failure. One hopes a node never fails, but you shouldn't bargain on it. By provisioning nodes smaller than your minimum load requires, you should always have multiple nodes running, so even after a failure at low load the remaining nodes should be able to pick up at least some of the slack automatically. And at higher loads, with more nodes running, they're more likely to be able to fully pick up the slack.

Next is cost performance. While a single larger node is generally going to be cheaper than multiple smaller nodes adding up to the same resources, with more smaller nodes you're more likely to spin resources down more often. And when you do need to scale up, you don't need to scale up by as much or for as long. In my example, my expected minimum number of pods was just 2... partly because my example requirements were so low. Ideally, I would want it at 4 or more. The more pods you have running, the higher you can set the thresholds for your auto-scaling and the more rapidly it can respond to changes in demand, thus doing a better job of minimizing costs.
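
A rough way to see why more pods let you set higher thresholds: one pod joining or leaving an N-pod deployment shifts per-pod utilization by a factor of N/(N-1). A quick sketch of that reasoning (my own math, not from the article I mentioned):

```python
# After one pod out of N fails, the survivors carry N / (N - 1) times their previous
# load. To stay under 100% after such a shift, the scale-up threshold has to sit
# below (N - 1) / N, so more pods allow a higher (hotter) threshold.
for pods in (2, 4, 8, 16):
    print(f"{pods} pods: scale-up threshold must stay below ~{(pods - 1) / pods:.0%}")
# 2 pods -> ~50%, 4 pods -> ~75%, 8 pods -> ~88%, 16 pods -> ~94%
```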

Load performance is the final big metric. If I scale up at 80% utilization and scale down at, say, 40%, and I'm handling 1000 requests per second at the moment, then (using my earlier example) I would need 2GB of RAM and 8vCPU to handle the raw load. Divide that by 80% to stay within my scaling threshold and I need 5 of my smaller (512MB/2vCPU) nodes or 2 of my massive (4GB/8vCPU) nodes.
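
That division is easy to verify (same assumed baseline as the earlier example: 250 req/s costs 2vCPU and 512MB):

```python
import math

BASE_RPS, BASE_VCPU, BASE_RAM_MB = 250, 2, 512
SCALE_UP_AT = 0.80  # scale up once utilization crosses 80%

def nodes_for_load(rps: float, node_vcpu: float, node_ram_mb: float) -> int:
    """Nodes required to keep utilization at or below the scale-up threshold."""
    scale = rps / BASE_RPS
    need_vcpu = BASE_VCPU * scale / SCALE_UP_AT      # 8 vCPU / 0.8 = 10 at 1000 req/s
    need_ram = BASE_RAM_MB * scale / SCALE_UP_AT     # 2048MB / 0.8 = 2560MB at 1000 req/s
    return math.ceil(max(need_vcpu / node_vcpu, need_ram / node_ram_mb))

for label, vcpu, ram in (("small 2vCPU @ 512MB", 2, 512), ("big 8vCPU @ 4GB", 8, 4096)):
    n = nodes_for_load(1000, vcpu, ram)
    print(f"{label}: {n} nodes -> {n * vcpu} vCPU and {n * ram}MB provisioned")
# small: 5 nodes -> 10 vCPU / 2.5GB;  big: 2 nodes -> 16 vCPU / 8GB
```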

The first thing to notice is that the smaller VM configuration is going to be better utilized. It will still be pretty close to that 80% utilization, and the overall resources provisioned are MUCH lower (2.5GB and 10vCPU vs 8GB and 16vCPU). The bigger VMs, meanwhile, will be sitting closer to the scale-down threshold, which is not a good thing in an environment that supports auto-scaling. You don't want to waste your cloud budget on cold starts.

Having lots of extra headroom is a strategy you use when you need to manually provision servers. Since you cannot react as quickly, you need servers running at lower levels of utilization, and thus you want fewer, but larger, servers. When you have auto-scaling, you want the opposite. You want to run as hot as possible over as many nodes as possible, which means running on smaller configurations.

Going back to the 80% @ 1000r/s example, we can also apply it to the earlier metrics. With the bigger VMs, after we bring up a new node on load we will be hovering at just over 40% usage (assuming it wasn't a massive spike), right back at the scale-down threshold. And if traffic climbed to, say, 79% and then one server crashed, the remaining server would not be able to handle the load. If that ever happened, the company would likely react either by making the auto-scaling kick in sooner, which would be more expensive, or by making an even bigger mistake and scaling up the hardware even further.

The smaller configuration is already running 5 VMs with 20% headroom each and an 80% max, which means that if one failed, it would have been using, at most, around 80% of one node's resources, which, incidentally, the spare capacity on the other 4 would likely be able to cover while a replacement spins up. That is just lucky math in our scenario, but it's math which explains why you want more nodes that are smaller than your use case demands rather than fewer over-provisioned ones. As load scales up you can tolerate more node failures, you'll spend less of your budget on cold starts, and you'll actually be utilizing more of the resources you're paying for. In our example, the load would need to be 4x higher before the larger nodes could tolerate such a failure.
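
Here's that failure math as a sketch (same numbers as above; it just checks whether the surviving nodes can absorb one failed node's share of the load):

```python
def survives_one_failure(nodes: int, utilization: float) -> bool:
    """True if the remaining nodes can absorb a single failed node's load.

    Each node carries `utilization` of its own capacity; after one failure the same
    total load is spread across (nodes - 1) survivors.
    """
    if nodes < 2:
        return False
    return utilization * nodes / (nodes - 1) <= 1.0

print(survives_one_failure(nodes=5, utilization=0.80))  # True:  0.80 * 5 / 4 = 1.00, just fits
print(survives_one_failure(nodes=2, utilization=0.79))  # False: 0.79 * 2 / 1 = 1.58
```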

This doesn't mean that you should always go as small as humanly possible. I'm simply trying to make the point that most companies err on the side of treating Kubernetes more like traditional VMs than like an auto-scaling solution. That mindset is one of the things which makes the cloud so expensive (though, admittedly, not the only thing). Your cloud provider is going to bill you the same whether your pods are maxing out their resources or sitting idle. From their perspective, you've got the resources allocated and they can't lease them elsewhere, so they need to bill you as though you were using them.
