Optimize Kubernetes Performance: Distribute CPUs Across NUMA Nodes
Is your Kubernetes application underperforming? Bottlenecks can arise from uneven CPU allocation across NUMA (Non-Uniform Memory Access) nodes. Learn how the distribute-cpus-across-numa CPUManager policy option can help boost performance.
What is distribute-cpus-across-numa and Why Should You Care?
By default, the Kubernetes CPU Manager's static policy packs CPUs onto a single NUMA node until it is full before spilling over to the next one. This can lead to performance issues, especially for parallel applications that rely on synchronized operations.
- The Problem: Imagine a parallel job where one worker is consistently slower because it has fewer CPUs available on its NUMA node than its peers do; with synchronized operations, every other worker ends up waiting for that straggler at each step.
- The Solution: The distribute-cpus-across-numa policy option ensures that CPU allocations are evenly spread across NUMA nodes. It allows application developers to create environments where no single worker suffers from NUMA effects more than another.
Who Benefits from Even CPU Distribution?
This feature is particularly valuable for:
- High-performance computing (HPC) applications
- Applications using parallel algorithms with barrier synchronization
- Workloads where consistent CPU access across NUMA nodes is critical
How distribute-cpus-across-numa Works
The distribute-cpus-across-numa policy option is implemented within the static CPUManager policy in Kubernetes. When enabled, the allocation algorithm tries to do the following:
- Even Distribution: When a container's exclusive CPU request cannot be satisfied from a single NUMA node, the CPU Manager splits the allocation as evenly as possible across NUMA nodes. For example, a container requesting 80 exclusive CPUs on a machine whose four NUMA nodes each have 36 allocatable CPUs might receive 20 CPUs from each node instead of a packed 36 + 36 + 8 split.
- Best Effort Allocation: If a perfectly even split isn't possible, the remaining CPUs are assigned so that the imbalance across NUMA nodes stays as small as possible.
Enabling the distribute-cpus-across-numa Policy Option
To enable the distribute-cpus-across-numa feature, you'll need to:
- Enable the CPUManagerPolicyBetaOptions feature gate on your kubelet.
- Set the CPUManager policy to static.
- Include distribute-cpus-across-numa in the list of CPUManager policy options in your kubelet configuration, as shown in the sketch below.
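Here is a minimal KubeletConfiguration sketch with those three settings. The reservedSystemCPUs value is a placeholder: the static policy requires a non-empty CPU reservation for system and Kubernetes daemons, so adjust it (or use kubeReserved/systemReserved) to suit your nodes.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  CPUManagerPolicyBetaOptions: true      # required while the policy option is in beta
cpuManagerPolicy: static                 # exclusive CPU pinning only happens under the static policy
cpuManagerPolicyOptions:
  distribute-cpus-across-numa: "true"    # spread exclusive CPU allocations across NUMA nodes
reservedSystemCPUs: "0"                  # placeholder reservation; the static policy needs one
```

If you are also switching the policy itself from none to static, you may additionally need to remove the kubelet's cpu_manager_state file before restarting; check the CPU management documentation for your Kubernetes version.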
Important: Enabling or disabling this feature requires a kubelet restart.
Verifying the CPU Distribution
Once the policy option is enabled, you can verify its effectiveness by:
- Deploying a pod with a nodeSelector targeting a node with multiple NUMA nodes.
- Requesting exclusive CPUs for the container.
- Verifying that the allocated CPUs are evenly distributed across the NUMA nodes. You can do this by examining the container's CPU affinity or by checking the cpu_manager_numa_allocation_spread metric. A sample pod spec is sketched below.
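As a concrete sketch (the pod name, node label value, and image are placeholders), the pod below requests whole CPUs with limits equal to requests, giving it Guaranteed QoS so the static policy pins it to exclusive CPUs:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: numa-spread-test                           # placeholder name
spec:
  nodeSelector:
    kubernetes.io/hostname: multi-numa-worker      # placeholder: a node with more than one NUMA node
  containers:
  - name: worker
    image: busybox:1.36
    command: ["sleep", "3600"]
    resources:
      requests:
        cpu: "8"                                   # whole CPUs with requests == limits -> Guaranteed QoS,
        memory: "256Mi"                            # so the static policy grants 8 exclusive CPUs
      limits:
        cpu: "8"
        memory: "256Mi"
```

Once the pod is running, inspect Cpus_allowed_list in /proc/1/status inside the container (for example via kubectl exec) and compare the assigned CPU IDs against the node's NUMA topology from lscpu or numactl --hardware to confirm that the allocation is spread across NUMA nodes.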
Upgrade and Downgrade Considerations
- This feature is opt-in, so upgrades and downgrades should not impact running workloads.
- Existing workloads will continue to run uninterrupted, with any future workloads having their CPUs allocated according to the policy in place.
Monitoring CPU Distribution Across NUMA Nodes
Kubernetes provides metrics to help you monitor the distribution of CPUs across NUMA nodes. Key metrics include:
- cpu_manager_numa_allocation_spread: Shows how CPUs are distributed across NUMA nodes. Look for a more even distribution when the policy option is enabled. One way to collect this metric is sketched below.
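These metrics are exposed on the kubelet's /metrics endpoint. One way to collect them is with Prometheus; the snippet below is a minimal sketch that assumes Prometheus runs in-cluster with a service account authorized to scrape the kubelet, so adapt the TLS and authorization settings to your environment.

```yaml
scrape_configs:
- job_name: kubelet-cpu-manager
  scheme: https
  tls_config:
    insecure_skip_verify: true                     # illustration only; configure a proper CA in production
  authorization:
    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node                                     # one scrape target per node, pointing at its kubelet
  metric_relabel_configs:
  - source_labels: [__name__]
    regex: ".*cpu_manager.*"                       # keep only CPU Manager related series
    action: keep
```

For a quick ad-hoc check, you can also run kubectl get --raw /api/v1/nodes/<node-name>/proxy/metrics and search the output for the CPU Manager metrics.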
Potential Caveats
- Node Availability: Effective CPU distribution requires sufficient CPU resources available on multiple NUMA nodes.
- Existing Workloads: Pre-existing CPU allocations might affect the ability to achieve a perfectly balanced distribution.
Conclusion
The distribute-cpus-across-numa policy option offers a powerful way to optimize Kubernetes performance for NUMA-aware applications. By distributing CPUs evenly, you can minimize bottlenecks and improve the overall efficiency of your workloads. If you're running parallel applications in Kubernetes, consider exploring this feature.