Run LeaderWorkerSet
This page shows how to leverage Kueue’s scheduling and resource management capabilities when running LeaderWorkerSet.
We demonstrate how to schedule LeaderWorkerSets where a group of Pods constitutes a unit of admission represented by a Workload. This allows you to scale LeaderWorkerSets up and down group by group.
This integration is based on the Plain Pod Group integration.
This guide is for serving users who have a basic understanding of Kueue. For more information, see Kueue’s overview.
Before you begin
- The leaderworkerset.x-k8s.io/leaderworkerset integration is enabled by default.
- For Kueue v0.15 and earlier, learn how to install Kueue with a custom manager configuration and ensure that you have the leaderworkerset.x-k8s.io/leaderworkerset integration enabled, for example:

  apiVersion: config.kueue.x-k8s.io/v1beta2
  kind: Configuration
  integrations:
    frameworks:
    - "leaderworkerset.x-k8s.io/leaderworkerset"

  Pod integration requirements: Since Kueue v0.15, you don’t need to explicitly enable the "pod" integration to use the "leaderworkerset.x-k8s.io/leaderworkerset" integration. For Kueue v0.14 and earlier, the "pod" integration must be explicitly enabled. See Run Plain Pods for configuration details.
- Check Administer cluster quotas for details on the initial Kueue setup; a minimal sketch follows this list.
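For reference, a minimal quota setup backing the user-queue LocalQueue used in the examples below could look like the following sketch. The flavor name, quota values, and namespace are illustrative assumptions, and the sketch targets the kueue.x-k8s.io/v1beta1 API; adjust it to your Kueue release and cluster.

apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cluster-queue
spec:
  namespaceSelector: {} # match Workloads from all namespaces
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 9
      - name: "memory"
        nominalQuota: 36Gi
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: user-queue
  namespace: default
spec:
  clusterQueue: cluster-queue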
Running a LeaderWorkerSet admitted by Kueue
When running a LeaderWorkerSet on Kueue, take into consideration the following aspects:
a. Queue selection
The target local queue should be specified in the metadata.labels section of the LeaderWorkerSet configuration.
metadata:
  labels:
    kueue.x-k8s.io/queue-name: user-queue
b. Configure the resource needs
The resource needs of the workload can be configured in the containers of spec.leaderWorkerTemplate.leaderTemplate and spec.leaderWorkerTemplate.workerTemplate.
spec:
  leaderWorkerTemplate:
    leaderTemplate:
      spec:
        containers:
        - resources:
            requests:
              cpu: "100m"
    workerTemplate:
      spec:
        containers:
        - resources:
            requests:
              cpu: "100m"
c. Scaling
You can scale a LeaderWorkerSet up or down by changing its .spec.replicas.
The unit of scaling is an LWS group: changing the number of replicas creates or deletes entire groups of Pods. On scale-up, each newly created group of Pods is held back by a scheduling gate until its corresponding Workload is admitted.
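For example, assuming the sample LeaderWorkerSet below and an LWS version that exposes the scale subresource, a scale-up and a quick check of the scheduling gates could look like this (alternatively, edit .spec.replicas directly):

# Scale from 2 to 3 groups; the new group's Pods stay gated until their Workload is admitted
kubectl scale leaderworkerset/nginx-leaderworkerset --replicas=3

# Gated Pods report a SchedulingGated status and a non-empty spec.schedulingGates field
kubectl get pods -o custom-columns=NAME:.metadata.name,GATES:.spec.schedulingGates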
Example
Here is a sample LeaderWorkerSet:
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: nginx-leaderworkerset
  labels:
    app: nginx
    kueue.x-k8s.io/queue-name: user-queue
spec:
  replicas: 2
  leaderWorkerTemplate:
    leaderTemplate:
      spec:
        containers:
        - name: nginx-leader
          image: registry.k8s.io/nginx-slim:0.27
          resources:
            requests:
              cpu: "100m"
          ports:
          - containerPort: 80
    size: 3
    workerTemplate:
      spec:
        containers:
        - name: nginx-worker
          image: registry.k8s.io/nginx-slim:0.27
          resources:
            requests:
              cpu: "200m"
          ports:
          - containerPort: 80

You can create the LeaderWorkerSet using the following command:
kubectl create -f sample-leaderworkerset.yaml
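After creation, you can check that Kueue created a Workload per group and that the Pods were released from their scheduling gates once admitted (output format and Workload names vary by Kueue version):

# Expect one Workload per LWS group (two for replicas: 2)
kubectl get workloads -n default

# Once admitted, the Pods are no longer SchedulingGated
kubectl get pods -n default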
Configure Topology Aware Scheduling
For performance-sensitive workloads like large-scale inference or distributed training, you may require the Leader and Worker pods to be co-located within a specific network topology domain (e.g., a rack or a data center block) to minimize latency.
Kueue supports Topology Aware Scheduling (TAS) for LeaderWorkerSet by reading annotations from the Pod templates. To enable this:
- Configure the cluster for Topology Aware Scheduling (a minimal setup sketch follows this list).
- Add the kueue.x-k8s.io/podset-required-topology annotation to both the leaderTemplate and the workerTemplate.
- Add the kueue.x-k8s.io/podset-group-name annotation to both the leaderTemplate and the workerTemplate with the same value. This ensures that the Leader and Workers are scheduled in the same topology domain.
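A minimal cluster-side TAS setup might look like the sketch below. The topology level labels match the example that follows, but the Topology and ResourceFlavor names, the nodeLabels selector, and the API version (kueue.x-k8s.io/v1beta1 here) are assumptions; follow the Topology Aware Scheduling documentation for your Kueue release. The tas-user-queue LocalQueue used in the example would then point at a ClusterQueue that uses this flavor.

apiVersion: kueue.x-k8s.io/v1beta1
kind: Topology
metadata:
  name: default
spec:
  levels:
  - nodeLabel: "cloud.provider.com/topology-block"
  - nodeLabel: "cloud.provider.com/topology-rack"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: tas-flavor
spec:
  nodeLabels:
    cloud.provider.com/node-group: tas # assumed node label marking TAS-capable nodes
  topologyName: default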
Example: Rack-Level Co-location
The following example uses the podset-group-name annotation to ensure that the Leader and all Workers are scheduled within the same rack (represented by the cloud.provider.com/topology-rack label).
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: nginx-leaderworkerset
  labels:
    app: nginx
    kueue.x-k8s.io/queue-name: tas-user-queue
spec:
  replicas: 2
  leaderWorkerTemplate:
    leaderTemplate:
      metadata:
        annotations:
          # Require the leader to be in the topology domain
          kueue.x-k8s.io/podset-required-topology: "cloud.provider.com/topology-rack"
          # Identify the group to ensure co-location with workers
          kueue.x-k8s.io/podset-group-name: "lws-group"
      spec:
        containers:
        - name: nginx-leader
          image: registry.k8s.io/nginx-slim:0.27
          resources:
            requests:
              cpu: "100m"
          ports:
          - containerPort: 80
    size: 3
    workerTemplate:
      metadata:
        annotations:
          # Require workers to be in the same topology domain
          kueue.x-k8s.io/podset-required-topology: "cloud.provider.com/topology-rack"
          # Identify the group to ensure co-location with the leader
          kueue.x-k8s.io/podset-group-name: "lws-group"
      spec:
        containers:
        - name: nginx-worker
          image: registry.k8s.io/nginx-slim:0.27
          resources:
            requests:
              cpu: "200m"
              nvidia.com/gpu: "1"
          ports:
          - containerPort: 80

When replicas is greater than 1 (as in the example above where replicas: 2), the topology constraints apply to each replica individually. This means that for each replica, the Leader and its Workers will be co-located in the same topology domain (e.g., rack), but different replicas may be assigned to different topology domains.
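To verify placement after admission, you can compare each Pod’s node with that node’s rack label; within one replica, the Leader and its Workers should share the same cloud.provider.com/topology-rack value:

# Show which node each Pod landed on
kubectl get pods -o wide

# Show the rack label of each node
kubectl get nodes -L cloud.provider.com/topology-rack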