Team Isolation with Multiple NodePools¶
In enterprise environments, it's common to have multiple teams sharing a single Kubernetes cluster, or maybe even just a single team hosting very different workloads with their own requirements in the same cluster. In this module, we'll explore how to use Karpenter's NodePools to create isolated configurations for different requirements while maintaining efficient resource utilization.
Use Cases for Multiple NodePools¶
Multiple NodePools provide flexibility in your cluster configuration. Some common use cases include:
- Separate capacity for different teams: Configure teams on the same cluster to run on completely separate node capacity.
- Diverse operating systems: One team could run on nodes using AzureLinux, while another uses Ubuntu.
- Billing isolation: Use separate NodePools to track infrastructure costs by team or department.
- Different node constraints: Restrict certain teams from using specialized hardware (such as GPUs) or limit them to specific VM types.
- Custom deprovisioning settings: Apply different scaling behaviors based on workload patterns.
Note
It is recommended to create NodePools that are mutually exclusive: a pod should not match more than one NodePool. If a pod does match multiple NodePools, Karpenter uses the NodePool with the highest weight (e.g., weight: 10 in the NodePool spec).
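As a minimal sketch of how a weight can be set (the NodePool name here is illustrative, and the API version may differ depending on your Karpenter/NAP release):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: preferred-pool   # illustrative name
spec:
  weight: 10             # preferred over NodePools with a lower (or unset) weight
  # ... template, requirements, and limits as usual
```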
Note
In the approach discussed in this module, isolation happens at the node level: each team gets its own nodes. Depending on your requirements, you may not need that level of isolation, and all your workloads can run on the same nodes; in that case a single NodePool might be sufficient, and you can directly apply what we have seen in earlier modules.
Even then, you might need additional NodePools for specific workloads run by a given team, so it is still useful to understand how multiple NodePools interact in a Karpenter-enabled cluster.
Prerequisites¶
Before beginning, ensure you have:
- A running AKS cluster with Karpenter/NAP enabled
- The workshop namespace created
Exercise 1: Enforce team identification¶
Let's start with a simple use case: ensure that each deployment declares a team key in its resources' nodeSelector, and that nodes hosting team-specific workloads repel non-team workloads.
Step 1: Create a team nodepool¶
The key configurations in the team nodepool are:
- The `microsoft.com/team` key with the `Exists` operator added to the requirements array, which tells Karpenter that this NodePool will only provision nodes for workloads that present a `microsoft.com/team` key (with any value) in their nodeSelector. The value specified by the workload is added as a label on the newly created nodes.
- The taints the NodePool applies to its nodes with the `microsoft.com/team` key, which ensure that only pods with a matching toleration can be scheduled on these nodes. This provides two-way protection: the nodeSelector ensures team pods go to the right nodes, while the taints ensure non-team pods stay away from those nodes.
Together, these configurations ensure that pods arriving later with a different team value will not be scheduled on the existing nodes, and that nodes created for a specific team only accept workloads from that team.
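A hedged sketch of what such a team NodePool could look like (the NodePool name, labels, and AKSNodeClass name are assumptions, and the API versions may differ with your Karpenter/NAP release):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: team-demo                   # illustrative name
spec:
  template:
    metadata:
      labels:
        aks-karpenter: demo-team    # label used by workloads in this exercise
    spec:
      requirements:
        - key: microsoft.com/team
          operator: Exists          # pods must request a team via nodeSelector
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      taints:
        - key: microsoft.com/team
          effect: NoSchedule        # repel pods without a matching toleration
      nodeClassRef:
        group: karpenter.azure.com
        kind: AKSNodeClass
        name: default               # assumed node class name
```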
Step 2: Deploy Team-Specific Workloads¶
Now, let's verify this by deploying workloads for each team that will run on their dedicated nodes, as well as a workload that tries to deploy with the same team-demo nodeSelector but without a toleration:
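A hedged sketch of one such team deployment (the image, resource requests, and label values are assumptions; the research deployment would mirror this with its own team value):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate-team-data           # illustrative name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: inflate-team
      team: data
  template:
    metadata:
      labels:
        app: inflate-team
        team: data
    spec:
      nodeSelector:
        aks-karpenter: demo-team
        microsoft.com/team: data    # satisfies the NodePool's Exists requirement
      tolerations:
        - key: microsoft.com/team
          operator: Exists
          effect: NoSchedule        # allows scheduling on team-tainted nodes
      containers:
        - name: pause
          image: mcr.microsoft.com/oss/kubernetes/pause:3.6   # assumed image
          resources:
            requests:
              cpu: "1"
```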
The team workload deployments (data and research) contain two crucial elements for proper isolation:
- nodeSelector with a team label: This directs pods to nodes with a matching team label (similar to the `aks-karpenter: demo` selector we have used in all modules so far, but this time it is enforced)
- Matching toleration: This allows pods to run on nodes tainted for their specific team
The no-team deployment should fail to schedule because:
- It has a nodeSelector targeting team nodes (`aks-karpenter: demo-team`)
- It lacks the required toleration to overcome the team-specific taint
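The failing case can be sketched as follows (names and image are assumptions, matching the team deployment above only in its node selector):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate-no-team             # illustrative name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: inflate-team
  template:
    metadata:
      labels:
        app: inflate-team
    spec:
      nodeSelector:
        aks-karpenter: demo-team    # targets team nodes...
      # ...but there is no toleration for the microsoft.com/team taint,
      # so this pod is repelled and remains Pending
      containers:
        - name: pause
          image: mcr.microsoft.com/oss/kubernetes/pause:3.6   # assumed image
          resources:
            requests:
              cpu: "1"
```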
This demonstrates how the combination of nodeSelectors and taints/tolerations creates a complete isolation system:
- nodeSelectors ensure pods land on the right nodes
- Taints prevent pods without proper tolerations from scheduling on team-specific nodes
Step 3: Verify Team Isolation¶
Check that each workload is running on their team's dedicated nodes. You should see each of these pods on a specific node:
Looking at all the pods you deployed, which all use the `app: inflate-team` label, you should see that the one without team information remains indefinitely in Pending state:
You can also inspect the team-specific taints, team label, and instance type of each node:
Using AKS Node Viewer, this time with a different node selector (the NodePool in this exercise uses the `aks-karpenter=demo-team` label), you should nevertheless see nodes with the same naming pattern, since they come from the same NodePool.
Step 4: Cleanup¶
Exercise 2: Implement Team-Specific Provisioning Strategies¶
Different teams often have different infrastructure needs and cost sensitivities, and might be required to use specific types of nodes. Let's implement team-specific provisioning strategies, this time ensuring that each team has its own NodePool.
Step 1: Create a NodePool for each Team¶
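A hedged sketch of the two NodePools (NodePool names, labels, and the AKSNodeClass name are assumptions, and field names may differ with your Karpenter/NAP release):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: team-data
spec:
  template:
    spec:
      requirements:
        - key: microsoft.com/team
          operator: In
          values: ["team-data"]           # only data-team workloads
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]               # arm64 nodes
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]                # interruptible capacity
      taints:
        - key: microsoft.com/team
          value: team-data
          effect: NoSchedule
      nodeClassRef:
        group: karpenter.azure.com
        kind: AKSNodeClass
        name: default                     # assumed node class name
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: team-research
spec:
  template:
    spec:
      requirements:
        - key: microsoft.com/team
          operator: In
          values: ["team-research"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]           # avoid spot reclaims
      taints:
        - key: microsoft.com/team
          value: team-research
          effect: NoSchedule
      nodeClassRef:
        group: karpenter.azure.com
        kind: AKSNodeClass
        name: default
```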
Note the key changes here relevant to team scheduling:
- The requirement on the `microsoft.com/team` key now explicitly demands a team-specific value (`team-data` or `team-research`, depending on the NodePool)
- The attached taint now also carries the specific team value
Besides those, from a workload perspective the team-data NodePool provisions arm64/spot instances, whereas the team-research NodePool provisions amd64/on-demand instances. A use case for this could be the data team running Java-based stateless microservices, which can easily be stopped and restarted, while the research team hosts long-running batch jobs that should not be interrupted.
Note
In a production context, the research team would use both on-demand nodes (to avoid involuntary evictions due to spot reclaims) and the do-not-disrupt annotation demonstrated in module 2 for this kind of workload.
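For reference, that annotation goes on the pod template; a minimal sketch of the relevant Deployment fragment:

```yaml
spec:
  template:
    metadata:
      annotations:
        karpenter.sh/do-not-disrupt: "true"   # Karpenter will not voluntarily disrupt pods carrying this annotation
```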
Step 2: Deploy Team-Specific Workloads¶
Let's deploy workloads to verify it works as expected.
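A hedged sketch of the data team's deployment for this exercise (names and image are assumptions; the research deployment mirrors this with `team-research` values):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate-team-data             # illustrative name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: inflate-team-data
  template:
    metadata:
      labels:
        app: inflate-team-data
    spec:
      nodeSelector:
        microsoft.com/team: team-data # routes to the team-data NodePool
      tolerations:
        - key: microsoft.com/team
          value: team-data            # tolerates only the data team's taint
          effect: NoSchedule
      containers:
        - name: pause
          image: mcr.microsoft.com/oss/kubernetes/pause:3.6   # assumed image
          resources:
            requests:
              cpu: "1"
```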
Step 3: Verify each team is now running on different types of nodes¶
Let's wait until nodes are correctly created:
Now let's verify the pods are running on different types of nodes based on their team assignment. This command will show each pod, the node it's running on, and details about that node's instance type and architecture:
You should see that:
- Data team pod is running on arm64 spot instances
- Research team pod is running on amd64 on-demand instances
You can also verify with AKS Node Viewer:
Step 4: Cleanup¶
Best Practices for Team Management with Karpenter¶
- Use bidirectional protection with nodeSelectors and taints: Implement both mechanisms to ensure pods go to appropriate nodes and nodes reject inappropriate pods.
- Define team-specific provisioning strategies: Customize infrastructure based on each team's unique requirements.
- Use explicit team labeling: Standardize identification of team resources for clarity and management.
- Create mutually exclusive NodePools: Avoid conflicts when Karpenter selects which NodePool to use.
- Set appropriate resource limits per team: Control costs and capacity allocation on a team-by-team basis.
- Monitor team resource utilization: Track usage patterns to optimize costs and performance.
Conclusion¶
In this module, you've learned how to use Karpenter NodePools to create isolated environments for different teams while maintaining efficient resource utilization. Key takeaways include:
- Creating team-specific NodePools with nodeSelectors and taints for bidirectional isolation
- Configuring different infrastructure strategies for different teams' requirements
Karpenter allows you to provide customized infrastructure experiences for multiple colocated groups while still benefiting from efficiency and automation. This approach enables cost optimization, security isolation, and tailored infrastructure on a per-team basis.
