Use the computing power of ACS in ACK Pro clusters

Container Compute Service (ACS) is integrated into Container Service for Kubernetes (ACK), which allows you to use the computing power of ACS in ACK Pro clusters. This topic describes how to do so.

How to use the computing power of ACS in ACK Pro clusters

Container Compute Service (ACS) is a cloud computing service that provides container compute resources that comply with the container specifications of Kubernetes. ACS adopts a layered architecture to implement Kubernetes control and computing power. The compute resources layer schedules and allocates resources to pods. The Kubernetes control layer manages workloads, such as Deployments, Services, StatefulSets, and CronJobs.

The computing power of ACS can be implemented in Kubernetes clusters by using virtual nodes. This way, Kubernetes clusters are empowered with high elasticity and are no longer limited by the computing capacity of cluster nodes. After you use ACS to take over infrastructure management for pods, the Kubernetes cluster no longer needs to schedule or launch individual pods. In addition, the Kubernetes cluster no longer needs to be concerned about the resources of underlying VMs. ACS can meet the resource requirements of pods at any time.

Container Service for Kubernetes (ACK) is one of the first services in the world to participate in the Certified Kubernetes Conformance Program. ACK provides high-performance management for containerized applications and is integrated with the virtualization, storage, network, and security capabilities provided by Alibaba Cloud. ACK simplifies cluster setup and scaling so that you can focus on developing and managing containerized applications.

Before you can create ACS pods in an ACK Pro cluster, you must deploy virtual nodes in the cluster. If you need to scale out your ACK Pro cluster, you can create ACS pods on virtual nodes, without the need to plan the resource capacities of the virtual nodes. ACS pods can communicate with pods on physical nodes in the cluster. We recommend that you deploy long-lived applications whose workloads periodically fluctuate on virtual nodes. This improves resource utilization, reduces resource costs, and accelerates the scaling process. When the workload of your application decreases, you can remove pods from virtual nodes to reduce resource costs. Pods on virtual nodes run in a secure and isolated environment that is built on top of ACS. In this case, a pod is referred to as an ACS pod. For more information, see ACK cluster overview.

Prerequisites

To use the computing power of ACS in ACK Pro clusters, you must first activate the required cloud services and grant the required permissions.
- Activate Container Service for Kubernetes, assign default roles to ACS, and activate other required cloud services. For more information, see Create an ACK managed cluster.
- Log on to the ACS console. Follow the on-screen instructions to activate ACS.
- An ACK Pro cluster that runs Kubernetes 1.26 or later is created. For more information, see Create an ACK managed cluster. For more information about how to update a cluster, see Manually upgrade ACK clusters.

You must install a specific version of the ACK Virtual Node component based on the Kubernetes version of your ACK Pro cluster. The following table describes the version mapping details.
Kubernetes version	ACK Virtual Node version
≥ 1.26	≥ v2.13.0
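
The version mapping above can be checked in a script before you install or update the component. The following Python sketch is illustrative only. The helper names `parse_version` and `meets_requirements` are hypothetical, and the sketch assumes that version strings are dotted numbers with an optional leading `v` and an optional build suffix after a hyphen.

```python
# Minimum versions taken from the mapping table above.
MIN_KUBERNETES = (1, 26)
MIN_VIRTUAL_NODE = (2, 13, 0)


def parse_version(text):
    """Convert 'v2.13.0' or '1.28.3-aliyun.1' into a tuple of ints.

    Assumption: anything after the first hyphen is a build suffix and is ignored.
    """
    core = text.strip().lstrip("v").split("-")[0]
    return tuple(int(part) for part in core.split("."))


def meets_requirements(kubernetes_version, virtual_node_version):
    """Return True if both versions satisfy the documented minimums."""
    # Only the major and minor numbers matter for the Kubernetes check.
    kubernetes = parse_version(kubernetes_version)[:2]
    # Pad short versions such as 'v2.13' so that they compare as (2, 13, 0).
    virtual_node = (parse_version(virtual_node_version) + (0, 0))[:3]
    return kubernetes >= MIN_KUBERNETES and virtual_node >= MIN_VIRTUAL_NODE
```

Comparing tuples of integers instead of raw strings avoids ranking v2.9.1 above v2.13.0, which a plain string comparison would do.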

Install ACK Virtual Node to implement the computing power of ACS

The computing power of ACS can be implemented in ACK clusters by using virtual nodes. This way, Kubernetes clusters are empowered with high elasticity and are no longer limited by the computing capacity of cluster nodes. The following steps describe how to install the ACK Virtual Node component, which provides the virtual nodes.

Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, click Add-ons.
On the Core Components tab, select ACK Virtual Node and click Install to install the component or click Update to update the component to the required version.
If the console prompts you to activate and grant permissions to ACS when you install ACK Virtual Node, follow the on-screen instructions to activate and grant permissions to ACS. Click OK.
After you install the component, choose Nodes > Nodes in the left-side navigation pane of the cluster details page. By default, the names of virtual nodes are prefixed with virtual-kubelet-.

Example: use ACS CPU compute power

After you install the required version of ACK Virtual Node or update the component to the required version as described in the Prerequisites section, you can create ACS pods and elastic container instances.

Note

When you schedule pods to virtual nodes, if you do not specify the compute class of the pods, elastic container instances are prioritized for pod scheduling by default.

To implement the computing power of ACS in an ACK cluster, perform the following steps:

Configure node selectors, affinity and anti-affinity rules, ResourcePolicies, or the alibabacloud.com/acs: "true" label to schedule pods to virtual nodes. For more information, see Node affinity scheduling.
Note
The alibabacloud.com/acs: "true" label does not apply to ACK Serverless clusters. It applies to the following clusters: ACK managed clusters, ACK dedicated clusters, ACK One registered clusters, and ACK Edge clusters.
When you create an ACS pod, add the alibabacloud.com/compute-class: <compute class> label to the pod to specify the compute class of the pod. For more information about the compute classes of ACS pods, see ACS pod overview.
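
The labels described in the steps above can also be added to a manifest programmatically. The following Python sketch is a minimal illustration, not an official tool. The function names `acs_pod_labels` and `add_acs_labels` are hypothetical, and the manifest is assumed to be an `apps/v1` Deployment loaded as a plain dict. The label keys and default values are the ones used in this topic.

```python
import copy


def acs_pod_labels(compute_class="general-purpose", compute_qos="default",
                   use_acs_label=True):
    """Build the ACS pod labels as a plain dict."""
    labels = {
        "alibabacloud.com/compute-class": compute_class,
        "alibabacloud.com/compute-qos": compute_qos,
    }
    if use_acs_label:
        # Label values are strings, so this is the text "true", not a boolean.
        labels["alibabacloud.com/acs"] = "true"
    return labels


def add_acs_labels(deployment, **label_options):
    """Return a copy of a Deployment dict with ACS labels on its pod template."""
    updated = copy.deepcopy(deployment)  # leave the caller's manifest untouched
    metadata = updated["spec"]["template"].setdefault("metadata", {})
    metadata.setdefault("labels", {}).update(acs_pod_labels(**label_options))
    return updated
```

Because kubectl also accepts JSON manifests, the returned dict can be written out with `json.dumps` and applied with `kubectl apply -f`. Pass `use_acs_label=False` when you schedule pods by using a nodeSelector or a ResourcePolicy instead of the pod label.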

The following section describes how to create a Deployment whose pods are scheduled to virtual nodes.

Create a Deployment.

Important

If you schedule pods to virtual nodes by using the alibabacloud.com/acs: "true" pod label, the WaitForFirstConsumer StorageClass is not supported. Therefore, when you use ACS pods that are mounted with disks in ACK clusters, use the nodeSelector or create a ResourcePolicy to schedule pods to virtual nodes. For more information about configuring a ResourcePolicy, see ACK Pro clusters support colocated scheduling of ECS instances and ACS computing power.

NodeSelector

Run the following command to query the labels of a virtual node. Replace virtual-kubelet-cn-hangzhou-k in the following command with the actual virtual node name.

kubectl get node virtual-kubelet-cn-hangzhou-k -oyaml

The following expected output displays only the content related to labels:

apiVersion: v1
kind: Node
metadata:
  labels:
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: virtual-kubelet-cn-hangzhou-k
    kubernetes.io/os: linux
    kubernetes.io/role: agent
    service.alibabacloud.com/exclude-node: "true"
    topology.diskplugin.csi.alibabacloud.com/zone: cn-hangzhou-k
    topology.kubernetes.io/region: cn-hangzhou
    topology.kubernetes.io/zone: cn-hangzhou-k
    type: virtual-kubelet # Each virtual node has this label. If you want to schedule a pod to a virtual node, you can configure this label as the node selector of the pod. 
  name: virtual-kubelet-cn-hangzhou-k
spec:
  taints:
  - effect: NoSchedule
    key: virtual-kubelet.io/provider
    value: alibabacloud

Create a YAML file named nginx.yaml based on the following content to provision two pods:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      name: nginx
      labels:
        app: nginx 
        alibabacloud.com/compute-class: general-purpose # The compute class of the ACS pod. Default value: general-purpose.
        alibabacloud.com/compute-qos: default # The QoS class of the ACS pod. Default value: default.
    spec:
      nodeSelector:
        type: virtual-kubelet # The node selector used to select a virtual node.
      tolerations:
      - key: "virtual-kubelet.io/provider" # The toleration used to tolerate virtual nodes. 
        operator: "Exists"
        effect: "NoSchedule"
      containers:
      - name: nginx
        image: registry.openanolis.cn/openanolis/nginx:1.14.1-8.6
        resources:
          limits:
            cpu: 2
          requests:
            cpu: 2

Deploy an NGINX application and query the pods.

Run the following command to deploy an NGINX application:
```
kubectl apply -f nginx.yaml 
```

Run the following command to check whether the NGINX application is deployed:

kubectl get pods -o wide

Expected results:

NAME                    READY   STATUS    RESTARTS   AGE   IP               NODE                            NOMINATED NODE   READINESS GATES
nginx-9cdf7bbf9-s****   1/1     Running   0          36s   10.0.6.68        virtual-kubelet-cn-hangzhou-j   <none>           <none>
nginx-9cdf7bbf9-v****   1/1     Running   0          36s   10.0.6.67        virtual-kubelet-cn-hangzhou-k   <none>           <none>

The result shows that the two pods are deployed on nodes that have the type=virtual-kubelet label, which is specified by the nodeSelector parameter in the Deployment configurations.

Schedule pods based on pod labels

Create a file named nginx.yaml and copy the following content to the file:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx 
        alibabacloud.com/acs: "true" # Use the compute power of ACS.
        alibabacloud.com/compute-class: general-purpose # The compute class of the ACS pod. Default value: general-purpose.
        alibabacloud.com/compute-qos: default # The QoS class of the ACS pod. Default value: default.
    spec:
      containers:
      - name: nginx
        image: registry.openanolis.cn/openanolis/nginx:1.14.1-8.6
        resources:
          limits:
            cpu: 2
          requests:
            cpu: 2

Deploy an NGINX application and query the pods.

Run the following command to deploy an NGINX application:
```
kubectl apply -f nginx.yaml 
```

Run the following command to check whether the NGINX application is deployed:

kubectl get pods -o wide

Expected results:

NAME                    READY   STATUS    RESTARTS   AGE   IP               NODE                            NOMINATED NODE   READINESS GATES
nginx-9cdf7bbf9-s****   1/1     Running   0          36s   10.0.6.68        virtual-kubelet-cn-hangzhou-j   <none>           <none>
nginx-9cdf7bbf9-v****   1/1     Running   0          36s   10.0.6.67        virtual-kubelet-cn-hangzhou-k   <none>           <none>

The result shows that the two pods are deployed on virtual nodes, as requested by the alibabacloud.com/acs: "true" pod label in the Deployment configurations.

Check whether ACS pods are created for the NGINX applications.

Run the following command to query the details of a pod created for the NGINX application:

kubectl describe pod nginx-9cdf7bbf9-s****

The following expected output displays only the key information:

Annotations:      ProviderCreate: done
                  alibabacloud.com/client-token: edf29202-54ac-438e-9626-a1ca007xxxxx
                  alibabacloud.com/instance-id: acs-2ze008giupcyaqbxxxxx
                  alibabacloud.com/pod-ephemeral-storage: 30Gi
                  alibabacloud.com/pod-use-spec: 2-4Gi
                  alibabacloud.com/request-id: A0EF3BF3-37E7-5A07-AC2D-68A0CFCxxxxx
                  alibabacloud.com/schedule-result: finished
                  alibabacloud.com/user-id: 14889995898xxxxx
                  kubernetes.io/pod-stream-port: 10250
                  kubernetes.io/preferred-scheduling-node: virtual-kubelet-cn-hangzhou-j/1
                  kubernetes.io/resource-type: serverless

The output shows that the configurations of the pod include the alibabacloud.com/instance-id: acs-2ze008giupcyaqbxxxxx annotation, which indicates that the pod is an ACS pod.
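
The same check can be scripted against the JSON form of a pod, for example the output of `kubectl get pod <pod name> -o json`. The following Python sketch is illustrative only. The function names are hypothetical, and the `acs-` prefix test is an assumption inferred from the sample output above, not a documented guarantee.

```python
import json

INSTANCE_ID_ANNOTATION = "alibabacloud.com/instance-id"


def acs_instance_id(pod):
    """Return the instance ID annotation of a pod dict, or None if it is absent."""
    annotations = pod.get("metadata", {}).get("annotations") or {}
    return annotations.get(INSTANCE_ID_ANNOTATION)


def is_acs_pod(pod):
    """Treat a pod as an ACS pod when its instance ID starts with 'acs-'.

    Assumption: the prefix is taken from the sample output in this topic.
    """
    instance_id = acs_instance_id(pod)
    return bool(instance_id) and instance_id.startswith("acs-")


def is_acs_pod_json(text):
    """Convenience wrapper for the JSON text printed by kubectl."""
    return is_acs_pod(json.loads(text))
```

For the pod shown above, `acs_instance_id` would return `acs-2ze008giupcyaqbxxxxx` and `is_acs_pod` would return `True`. A pod without the annotation yields `None` and `False`.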

Example: use ACS GPU compute power

The procedure for using ACS GPU compute power is similar to that for using ACS CPU compute power. However, you also need to ensure that the scheduling components meet the version requirements and add some additional configurations.

Configure the component

You must install a specific version of the kube-scheduler component based on the Kubernetes version of your ACK Pro cluster. The following table describes the version mapping details.

The cluster must run Kubernetes 1.26 or later. Minimum scheduler version for each Kubernetes version:

Kubernetes version	Scheduler version
1.31	v1.31.0-aliyun.6.8.4.8f585f26 and later
1.30	v1.30.3-aliyun.6.8.4.946f90e8 and later
1.28	v1.28.12-aliyun-6.8.4.b27c0009 and later
1.26	v1.26.3-aliyun-6.8.4.4b180111 and later

How to activate

The feature of using ACS GPU compute power in ACK clusters is in invitational preview. To use this feature, submit a ticket.

How to use this feature

...
labels:
  # Add labels to request ACS GPU resources.
  alibabacloud.com/compute-class: gpu     # Set to gpu if GPU compute power is used.
  alibabacloud.com/compute-qos: default   # The QoS class, which is the same as for regular ACS compute power.
  alibabacloud.com/gpu-model-series: GN8IS  # The GPU model. Specify the actual model that you use.
...

Note

For more information about the relationship between ACS compute classes and QoS classes, see Relationship between compute classes and QoS classes.
For more information about the supported GPU models for gpu-model-series, see Specify GPU models and driver versions for ACS GPU-accelerated pods.
The alibabacloud.com/acs: "true" label does not apply to ACK Serverless clusters. It applies to the following clusters: ACK managed clusters, ACK dedicated clusters, ACK One registered clusters, and ACK Edge clusters.

The following section shows three examples of using GPU compute power.

NodeSelector

Create a GPU-accelerated workload based on the following content.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dep-node-selector-demo
  labels:
    app: node-selector-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: node-selector-demo
  template:
    metadata:
      labels:
        app: node-selector-demo
        # The ACS attributes.
        alibabacloud.com/compute-class: gpu
        alibabacloud.com/compute-qos: default
        alibabacloud.com/gpu-model-series: example-model  # The GPU model. Specify the actual model that you use, such as T4.
    spec:
      # The specified label.
      nodeSelector:
        type: virtual-kubelet
      # The taint to be tolerated.
      tolerations:
      - key: "virtual-kubelet.io/provider" # The toleration used to tolerate virtual nodes.
        operator: "Exists"
        effect: "NoSchedule"
      containers:
      - name: node-selector-demo
        image: registry-cn-hangzhou.ack.aliyuncs.com/acs/stress:v1.0.4
        command:
        - "sleep"
        - "1000h"
        resources:
          limits:
            cpu: 1
            memory: 1Gi
            nvidia.com/gpu: "1"
          requests:
            cpu: 1
            memory: 1Gi
            nvidia.com/gpu: "1"

ResourcePolicy

Create a GPU-accelerated workload based on the following content.

apiVersion: scheduling.alibabacloud.com/v1alpha1
kind: ResourcePolicy
metadata:
  name: dep-rp-demo
  namespace: default
spec:
  selector:
    app: dep-rp-demo
  units:
  - resource: acs
    podLabels:
      alibabacloud.com/compute-class: gpu
      alibabacloud.com/compute-qos: default
      alibabacloud.com/gpu-model-series: example-model  # The GPU model. Specify the actual model that you use, such as T4.
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dep-rp-demo
  labels:
    app: dep-rp-demo
  annotations:
    resourcePolicy: "dep-rp-demo"  # The name of the ResourcePolicy.
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dep-rp-demo
  template:
    metadata:
      labels:
        app: dep-rp-demo
    spec:
      containers:
      - name: demo
        image: registry-cn-hangzhou.ack.aliyuncs.com/acs/stress:v1.0.4
        command:
        - "sleep"
        - "1000h"
        resources:
          limits:
            cpu: 1
            memory: 1Gi
            nvidia.com/gpu: "1"
          requests:
            cpu: 1
            memory: 1Gi
            nvidia.com/gpu: "1"

For more information about how to use ResourcePolicies to schedule resources, see Resource scheduling based on custom priorities.

Schedule pods based on pod labels

Create a GPU-accelerated workload based on the following content.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dep-node-selector-demo
  labels:
    app: node-selector-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: node-selector-demo
  template:
    metadata:
      labels:
        app: node-selector-demo
        # The ACS attributes.
        alibabacloud.com/acs: "true" # Use the compute power of ACS.
        alibabacloud.com/compute-class: gpu
        alibabacloud.com/compute-qos: default
        alibabacloud.com/gpu-model-series: example-model  # The GPU model. Specify the actual model that you use, such as T4.
    spec:
      containers:
      - name: node-selector-demo
        image: registry-cn-hangzhou.ack.aliyuncs.com/acs/stress:v1.0.4
        command:
        - "sleep"
        - "1000h"
        resources:
          limits:
            cpu: 1
            memory: 1Gi
            nvidia.com/gpu: "1"
          requests:
            cpu: 1
            memory: 1Gi
            nvidia.com/gpu: "1"

Run the following command to query the status of the GPU-accelerated workload:

kubectl get pod dep-node-selector-demo-9cdf7bbf9-s**** -oyaml

The following expected output displays only the key information:

    phase: Running

    resources:
      limits:
        #other resources
        nvidia.com/gpu: "1"
      requests:
        #other resources
        nvidia.com/gpu: "1"

Examples of using ACS GPU-HPN pods in ACK

You can use ACS GPU-HPN pods in the same way as using ACS CPU-accelerated pods. Make sure that the following requirements are met:

You can use ACS GPU-HPN pods only in ACK managed clusters, ACK One registered clusters, and ACK One Kubernetes clusters for distributed Argo workflows.
You need to first purchase GPU-HPN capacity reservations and associate them with your clusters.
You need to update and configure the kube-scheduler and ACK Virtual Node components. The required component versions are in invitational preview. To update to these versions, submit a ticket.

Procedure

...     
labels:
  # Add labels to request ACS GPU resources.
  alibabacloud.com/compute-class: gpu-hpn     #Set to gpu-hpn.
  alibabacloud.com/compute-qos: default   #The QoS class.
...

Note

For more information about compute classes and QoS classes, see Mappings between compute classes and computing power QoS classes.
For more information about other ACS pod parameters, see ACS Pod.

You can configure the Kubernetes NodeSelector to schedule pods to GPU-HPN nodes.

Important

When you configure ACS GPU-HPN pods, take note of the following parameters:

alibabacloud.com/compute-class: gpu-hpn: Specify the compute class.
alibabacloud.com/node-type: reserved: Specify reserved nodes.
Configure requests and limits based on the actual model.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dep-node-selector-demo
  labels:
    app: node-selector-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: node-selector-demo
  template:
    metadata:
      labels:
        app: node-selector-demo
        # ACS attributes.
        alibabacloud.com/compute-class: gpu-hpn
        alibabacloud.com/compute-qos: default
    spec:
      # Specify GPU-HPN reserved nodes.
      nodeSelector:
        alibabacloud.com/node-type: reserved
      containers:
      - name: node-selector-demo
        image: registry-cn-hangzhou.ack.aliyuncs.com/acs/stress:v1.0.4
        command:
        - "sleep"
        - "1000h"
        resources:
          limits:
            cpu: 1
            memory: 1Gi
            nvidia.com/gpu: "1" # Specify the resource name based on the actual model.
          requests:
            cpu: 1
            memory: 1Gi
            nvidia.com/gpu: "1" # Specify the resource name based on the actual model.
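
The parameters called out for GPU-HPN pods can be verified before a manifest is applied. The following Python sketch is a minimal illustration under stated assumptions: the function name `gpu_hpn_problems` is hypothetical, the manifest is an `apps/v1` Deployment loaded as a plain dict, and only the label, nodeSelector, and presence of requests and limits are checked. It does not validate resource names against a specific GPU model.

```python
def gpu_hpn_problems(deployment):
    """Return a list of problems; an empty list means the checks passed."""
    template = deployment["spec"]["template"]
    labels = template.get("metadata", {}).get("labels", {})
    pod_spec = template.get("spec", {})
    problems = []

    # The compute class must be declared as a pod label.
    if labels.get("alibabacloud.com/compute-class") != "gpu-hpn":
        problems.append(
            "pod label alibabacloud.com/compute-class must be gpu-hpn")

    # GPU-HPN pods are pinned to reserved nodes through the nodeSelector.
    node_selector = pod_spec.get("nodeSelector", {})
    if node_selector.get("alibabacloud.com/node-type") != "reserved":
        problems.append(
            "nodeSelector alibabacloud.com/node-type must be reserved")

    # Every container should declare both requests and limits.
    for container in pod_spec.get("containers", []):
        resources = container.get("resources", {})
        if not resources.get("requests") or not resources.get("limits"):
            name = container.get("name", "<unnamed>")
            problems.append(f"container {name} must set requests and limits")

    return problems
```

Returning a list of messages rather than raising on the first failure lets you report every misconfiguration in one pass. For the Deployment above, the function returns an empty list.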

Run the following command to query the status of the GPU-accelerated workload:

kubectl get pod dep-node-selector-demo-9cdf7bbf9-s**** -oyaml

Expected output (key information):

    phase: Running

    resources:
      limits:
        #other resources
        nvidia.com/gpu: "1"
      requests:
        #other resources
        nvidia.com/gpu: "1"
