Karpenter revolutionizes Kubernetes scaling with its dynamic, workload-based compute adjustment, offering superior efficiency and cost savings.
This is part 1 of a two-part series about the Kubernetes autoscaler Karpenter. We’ll discuss its benefits, limitations, our experience with it, and its potential to increase performance and cost efficiency. Check out our blogs to learn more about similar topics.
Introduction to Karpenter
Karpenter is an open-source Kubernetes autoscaler that dynamically adjusts the cluster’s compute capacity based on the workload requirements. It aims to provide a more flexible and efficient solution for managing the scaling needs of Kubernetes clusters.
Before Karpenter, Kubernetes users primarily relied on Amazon EC2 Auto Scaling groups and the Kubernetes Cluster Autoscaler (CAS) to adjust their clusters’ compute capacity dynamically. However, this approach comes with several limitations. Over the years of managing many different Kubernetes environments for our clients, we’ve come across many of them.
Karpenter offers a more flexible and diverse approach without having to create multiple node groups. It is also not tightly coupled to Kubernetes versions, making it more versatile than CAS. Karpenter simplifies node provisioning based on workload requirements and can create diverse node configurations using flexible NodePool options. It also improves pod scheduling at scale by quickly launching nodes and scheduling pods.
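To illustrate, here is a minimal NodePool sketch. This assumes the `karpenter.sh/v1beta1` API (field names differ between Karpenter releases) and an existing EC2NodeClass named `default`:

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      # Allow Karpenter to pick from both Spot and On-Demand capacity
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
      # Points at an EC2NodeClass carrying the AMI, subnet and security group settings
      nodeClassRef:
        name: default
  # Upper bound on the total capacity this NodePool may provision
  limits:
    cpu: 1000
```

A single NodePool like this can span many instance types, sizes, and zones, which is exactly the kind of diversity that would require multiple node groups with CAS.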
CAS Limitations
Cluster Autoscaler comes with many limitations, such as its dependency on node groups and slower node provisioning times. There are also challenges in mixed-architecture environments, as well as various AWS constraints.
The limitations of CAS are the result of a direct coupling with AWS Autoscaling Groups. The typical use case for an AWS Autoscaling group is to manage a homogeneous group of instances. This can be a problem when workloads require a diverse set of node configurations.
These limitations can take various forms. For example, if you want to mix On-Demand and Spot capacity and prioritise them based on availability, achieving this with CAS is possible, but it requires multiple isolated node groups and several configuration caveats, such as priority expanders¹.
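For context, the priority expander is configured through a ConfigMap that CAS expects to find under this exact name in its namespace; the node group name patterns below are made up:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    # Higher number = higher priority; values are regexes matched
    # against node group names
    50:
      - .*-spot-.*
    10:
      - .*-on-demand-.*
```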
Another good example is how CAS manages allocations of Persistent Volume Claims (PVCs) backed by EBS in AWS. When you intend to run stateful applications in Kubernetes, you’ll likely want to utilise backing storage for persistence. This is typically done by provisioning an EBS volume and attaching it to the underlying EC2 instance, allowing the Pod to use it for persistent data storage. This action is performed by the Amazon EBS CSI Driver.
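A typical setup looks roughly like this sketch (names are illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3
# Volumes are provisioned by the Amazon EBS CSI Driver
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
# Delay provisioning until a pod using the claim is scheduled,
# so the volume is created in the zone of that pod's node
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ebs-gp3
  resources:
    requests:
      storage: 20Gi
```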
Key Limitation
The problem occurs when you want to move the pod to another node, in cases such as cluster rebalancing, spot interruptions, and other events. This is because EBS volumes are zone-bound and can only be attached to EC2 instances within the availability zone they were originally provisioned in.
This is a key limitation that CAS is not able to take into account when provisioning a new node, again because it relies solely on the AWS Autoscaling group to handle node provisioning. Autoscaling groups do not allow you to choose a zone; they have their own ways of zone balancing, which are not easily suppressed. This often leads to CAS spawning a new node in a zone other than the one with the EBS volume, leaving it there, and removing it for emptiness some time later. This cycle then repeats until the correct node is coincidentally spawned.
There are many more limitations of CAS, including its performance and inability to choose a specific zone for node creation. Additionally, it cannot balance the number of nodes across zones or provision nodes based on pod priority.
Karpenter’s approach
To address the issues of CAS, Karpenter uses a different approach. Karpenter directly interacts with the EC2 Fleet API to manage EC2 instances, bypassing the need for autoscaling groups. This allows Karpenter to provision nodes quickly and adaptively, based on actual workload requirements. It’s capable of scheduling pods on a variety of instance types and architectures, optimizing resource usage and cost-efficiency. Additionally, Karpenter considers the availability zone of EBS volumes when scheduling pods, avoiding the issue of orphaned nodes.
The scheduling algorithm that Karpenter uses differs from that of CAS in many aspects and can be divided into three layers of constraints.
First Layer of Constraints
Karpenter heavily relies on a bin packing algorithm to find the optimal distribution of resources across the cluster. When there’s a need for new capacity in the cluster, Karpenter batches the pending pods and performs bin packing to find the most suitable node.
Second Layer of Constraints
The second layer is represented by Kubernetes scheduling constraints, such as resource requests, node selectors, node affinities, topology spread, pod affinity/anti-affinity, PVC allocations, and others. All of these determine the placement of a Pod during scheduling and therefore have to be accounted for when picking the right node type, size, and placement within AWS.
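As an illustrative sketch (names and values are hypothetical), a single pod spec can carry several of these constraints at once, and Karpenter has to satisfy all of them when selecting a node:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
  labels:
    app: web
spec:
  containers:
    - name: web
      image: nginx:stable
      # Resource requests drive the bin packing calculation
      resources:
        requests:
          cpu: 500m
          memory: 512Mi
  # Spread replicas of this app evenly across availability zones
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: web
```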
Third Layer of Constraints
The third layer is represented by Karpenter’s cloud provider constraints. These include instance type availability, pricing models, and availability zones. By taking these into account, Karpenter can choose the most suitable instance type and zone for each pod, optimizing resource utilization and cost-efficiency.
- Karpenter batches pending pods and then bin-packs them based on CPU, memory, and GPUs required. It also considers node overhead, VPC CNI resources required, and daemonsets that will be packed when bringing up a new node. Karpenter recommends the use of C, M, and R >= Gen 3 instance types for most generic workloads, but it can be constrained in the NodePool spec with the instance-type well-known label in the requirements section (see the sketch after this list).
- After the pods are bin-packed on the most efficient instance type (i.e., the smallest instance type that can fit the pod batch), Karpenter selects 59 other instance types. These selected types are larger than the most efficient packing. It then passes all 60 instance type options to an API called Amazon EC2 Fleet.
- The EC2 Fleet API attempts to provision the instance type based on the Price Capacity Optimized allocation strategy. For the on-demand capacity type, this is effectively equivalent to the `lowest-price` allocation strategy. For the spot capacity type, Fleet will determine an instance type that combines a low price with a low chance of being interrupted. Note that this may not give you the instance type with the strictly lowest price for spot.
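As mentioned in the first item of the list, that recommendation can be expressed through the requirements section of a NodePool. A sketch of such an excerpt, using Karpenter’s well-known instance-category and instance-generation labels:

```yaml
# NodePool requirements excerpt restricting Karpenter to
# C, M, and R instance families of generation 3 or newer
requirements:
  - key: karpenter.k8s.io/instance-category
    operator: In
    values: ["c", "m", "r"]
  - key: karpenter.k8s.io/instance-generation
    operator: Gt
    values: ["2"]
```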
By utilizing this approach, Karpenter allows you to expand the capacity of the cluster more quickly, more reliably, and potentially more cheaply.²
Is there a catch?
Well, sort of. There are some limitations to consider. As mentioned before, Karpenter considers several layers of constraints when scheduling a new node. It is designed to follow all of the constraints that you define and place the node accordingly. However, if you don’t set any, Karpenter will choose from all of the capacity that’s available. It will not implicitly distribute the workload across different AZs or different node types.
For us, this was quite a surprise when we originally launched our first workloads on Karpenter, although it makes total sense if you think about it. When you’re coming from an environment that was using CAS, the distribution of nodes was somewhat implicit.
Even if you didn’t set topology spread constraints or affinities, the nodes were still roughly balanced across AZs, because it was the Autoscaling group handling the zone balancing in the background. The same applies to CPU architectures: if you don’t explicitly constrain them, Karpenter will choose from both ARM64 and AMD64 instances, which can be troublesome for some applications.
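In other words, if you want that implicit behaviour back, you have to request it explicitly. A sketch of NodePool requirements pinning the architecture and the set of zones (zone names are illustrative):

```yaml
# NodePool requirements excerpt: pin CPU architecture and zones
requirements:
  - key: kubernetes.io/arch
    operator: In
    values: ["amd64"]
  - key: topology.kubernetes.io/zone
    operator: In
    values: ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
```

Note that restricting zones in the NodePool does not by itself balance nodes across them; for an even distribution, the workloads still need topology spread constraints.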
CAS and Karpenter co-existence
It’s worth mentioning that Cluster Autoscaler and Karpenter can co-exist within the same cluster. This allows you to gradually migrate from CAS to Karpenter or use both simultaneously, depending on the specific workloads and the needs of your applications. This flexibility makes transitioning to Karpenter less risky and allows for a period of testing and adjustment.
However, there are things to consider when running both solutions within a single environment. If you don’t clearly distinguish which workloads should be handled by which tool, e.g. using node selectors, both solutions will compete to schedule a new node. This introduces a race condition in which Karpenter has a clear advantage.
Since Karpenter can schedule nodes more quickly, it will most often win this race and provide a new node for the pending workload. CAS will still attempt to create a new node; however, it will be slower and will most likely have to remove its node after some time due to emptiness. This adds unnecessary costs to your cloud bill, since you’re running spare capacity without an actual need. It’s therefore best to set constraints for both tools.
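One way to draw that line, sketched below with made-up label and taint names, is to taint the Karpenter-managed capacity and have only the intended workloads select and tolerate it:

```yaml
# NodePool template excerpt (hypothetical names): label and taint
# the nodes that Karpenter provisions
template:
  metadata:
    labels:
      scaler: karpenter
  spec:
    taints:
      - key: scaler
        value: karpenter
        effect: NoSchedule
---
# Workload pod spec excerpt: explicitly opt in to Karpenter nodes
nodeSelector:
  scaler: karpenter
tolerations:
  - key: scaler
    operator: Equal
    value: karpenter
    effect: NoSchedule
```

Everything without the selector and toleration stays on CAS-managed node groups, so the two tools no longer compete for the same pending pods.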
Cost Savings and Client Benefits
As discussed before, Karpenter introduces many performance, reliability, and cost benefits.
In the second part of this series, I want to go deeper into our experience of rolling out Karpenter in our customer environments and the savings we were able to achieve.
1. https://aws.github.io/aws-eks-best-practices/cluster-autoscaling/#prioritizing-a-node-group-asg ↩︎
2. https://karpenter.sh/docs/faq/#how-does-karpenter-dynamically-select-instance-types ↩︎