Container Service for Kubernetes: Ray on ACK

Last Updated: Mar 26, 2025

Ray is an open source framework for managing, executing, and optimizing AI workloads. It uses a unified, flexible framework to orchestrate infrastructure resources for workloads such as data processing, model training, and model serving. Its simple API lets developers write parallel processing and distributed computing code efficiently, without managing the complex configuration of the underlying infrastructure. Ray supports several programming paradigms, including parallel processing, the actor model, and distributed object storage, and you can use it to build extensible AI and Python applications.

Introduction to Ray

Ray is an open source, unified framework for scaling AI and Python applications. Its API simplifies distributed computing so that you can efficiently develop parallel processing and distributed Python applications. Ray is widely adopted in the machine learning sector. The unified computing framework of Ray consists of three layers: Ray AI libraries, Ray Core, and Ray clusters. For more information about Ray, see Ray.

Introduction to KubeRay

KubeRay is an open source Kubernetes operator that simplifies the deployment and management of Ray applications on Kubernetes. It provides a declarative Kubernetes API specialized for running Ray clusters and defines three custom resources: RayCluster, RayJob, and RayService. These resources help you run various workloads on Kubernetes conveniently.

Ray on ACK

Container Service for Kubernetes (ACK) is one of the first services in the world to participate in the Certified Kubernetes Conformance Program. ACK provides high-performance management of containerized applications and supports lifecycle management for enterprise-class containerized applications. You can use KubeRay to create Ray clusters in ACK clusters in the same way you create ACK clusters in the cloud.

  • Your Ray cluster can work with Simple Log Service, Managed Service for Prometheus, and Tair (Redis OSS-compatible) to improve log management, observability, and availability.

  • You can use the Ray autoscaler and the ACK autoscaler together to scale computing resources on demand.
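
As a rough illustration of the second point, the following Python sketch builds a RayCluster manifest that opts in to the Ray autoscaler and bounds one worker group. The field names (`enableInTreeAutoscaling`, `minReplicas`, `maxReplicas`, `workerGroupSpecs`) come from the open source KubeRay RayCluster API. The function name, resource name, image tag, and replica bounds are placeholders chosen for this example and are not ACK defaults. Typically the Ray autoscaler adjusts the number of worker pods within these bounds, and a node-level autoscaler adds nodes when pods cannot be scheduled.

```python
import json


def autoscaling_raycluster(name, image, min_workers, max_workers):
    """Return a RayCluster manifest (as a dict) with the Ray autoscaler enabled.

    The function and its arguments are hypothetical helpers for this sketch.
    """
    if not 0 <= min_workers <= max_workers:
        raise ValueError("expected 0 <= min_workers <= max_workers")
    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayCluster",
        "metadata": {"name": name},
        "spec": {
            # Ask KubeRay to run the Ray autoscaler for this cluster.
            "enableInTreeAutoscaling": True,
            "headGroupSpec": {
                "rayStartParams": {},
                "template": {"spec": {"containers": [
                    {"name": "ray-head", "image": image},
                ]}},
            },
            "workerGroupSpecs": [{
                "groupName": "workers",
                # Start at the lower bound; the autoscaler may grow the group
                # up to maxReplicas.
                "replicas": min_workers,
                "minReplicas": min_workers,
                "maxReplicas": max_workers,
                "rayStartParams": {},
                "template": {"spec": {"containers": [
                    {"name": "ray-worker", "image": image},
                ]}},
            }],
        },
    }


if __name__ == "__main__":
    # Placeholder name, image tag, and bounds.
    manifest = autoscaling_raycluster("demo-cluster", "rayproject/ray:2.9.0", 1, 5)
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, which accepts JSON as well as YAML.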

For more information about how to install the KubeRay operator, see Install Kuberay-Operator.

Kuberay-Operator

To quickly deploy and manage Ray clusters, we recommend that you install KubeRay in your ACK cluster from the Add-ons page of the ACK console. ACK provides the Kuberay-Operator component that is developed based on the open source KubeRay operator. KubeRay allows your Ray clusters to leverage the capabilities of ACK, such as scheduling, elastic quotas, and priority-based resource scheduling. In addition, you can integrate your Ray clusters with Alibaba Cloud services, such as Simple Log Service, Managed Service for Prometheus, and Object Storage Service (OSS).

After you install Kuberay-Operator in your ACK cluster from the Add-ons page of the ACK console, ACK automatically installs and manages Kuberay-Operator. In addition, ACK creates the RayCluster, RayJob, and RayService resources on the data plane of the cluster.

Custom resources

  • RayCluster

    You can create a RayCluster to build a Ray cluster on pods in ACK clusters. A Ray cluster consists of a head pod and several worker pods. For more information about the RayCluster custom resource, see RayCluster Configuration.

  • RayJob

    A RayJob (in K8sJobMode) manages a RayCluster and a Kubernetes batch job. The RayCluster builds a Ray cluster on Kubernetes pods to provide computing resources. The Kubernetes batch job runs the ray job submit command to submit a Ray job to the RayCluster. For more information about the RayJob custom resource, see RayJob Configuration.

  • RayService

    A RayService manages a RayCluster and Ray Serve applications. The RayCluster is used to build a Ray cluster on Kubernetes pods to provide computing resources. The Ray Serve applications are deployed in the Ray cluster for model deployment and inference.
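
To make the relationship between the three custom resources more concrete, the following Python sketch builds one minimal manifest of each kind around the same cluster definition (a head pod plus one worker group). The field names (`headGroupSpec`, `workerGroupSpecs`, `entrypoint`, `submissionMode`, `rayClusterSpec`, `serveConfigV2`, `rayClusterConfig`) follow the open source KubeRay API. The helper functions, resource names, image tag, entrypoint script, and Serve import path are hypothetical placeholders, and a real deployment would normally also set resource requests and limits.

```python
import json

RAY_IMAGE = "rayproject/ray:2.9.0"  # placeholder image tag for this sketch

# Hypothetical Ray Serve config; my_app:deployment is not a real module.
SERVE_CONFIG = """\
applications:
  - name: demo
    import_path: my_app:deployment
    route_prefix: /
"""


def pod_template(container_name):
    """Pod template with a single Ray container."""
    return {"spec": {"containers": [{"name": container_name, "image": RAY_IMAGE}]}}


def cluster_spec(workers=2):
    """Head pod plus one worker group: the part all three resources share."""
    return {
        "headGroupSpec": {"rayStartParams": {}, "template": pod_template("ray-head")},
        "workerGroupSpecs": [{
            "groupName": "workers",
            "replicas": workers,
            "rayStartParams": {},
            "template": pod_template("ray-worker"),
        }],
    }


def custom_resource(kind, name, spec):
    return {"apiVersion": "ray.io/v1", "kind": kind,
            "metadata": {"name": name}, "spec": spec}


def ray_cluster(name):
    # RayCluster: the cluster definition is the whole spec.
    return custom_resource("RayCluster", name, cluster_spec())


def ray_job(name, entrypoint):
    # RayJob: a cluster definition plus the command to submit as a Ray job.
    return custom_resource("RayJob", name, {
        "entrypoint": entrypoint,
        "submissionMode": "K8sJobMode",
        "rayClusterSpec": cluster_spec(),
    })


def ray_service(name, serve_config):
    # RayService: a cluster definition plus the Ray Serve applications to deploy.
    return custom_resource("RayService", name, {
        "serveConfigV2": serve_config,
        "rayClusterConfig": cluster_spec(),
    })


if __name__ == "__main__":
    for manifest in (
        ray_cluster("demo-cluster"),
        ray_job("demo-job", "python /home/ray/samples/job.py"),  # placeholder path
        ray_service("demo-service", SERVE_CONFIG),
    ):
        print(json.dumps(manifest, indent=2))
```

Note that the embedded cluster definition sits under a different key in each kind: it is the whole `spec` of a RayCluster, `rayClusterSpec` in a RayJob, and `rayClusterConfig` in a RayService.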

Shared responsibilities for Ray on ACK

When you use KubeRay to run Ray workloads in ACK clusters, you must follow the principle of shared responsibility. The following content describes the shared responsibility model for Ray on ACK:

Responsibilities of Alibaba Cloud

Kuberay-Operator is managed by ACK. ACK provides security protection for Kuberay-Operator:

  • ACK ensures that images used by Kuberay-Operator comply with security hardening standards to prevent potential vulnerabilities.

  • ACK ensures the stability and availability of Kuberay-Operator.

  • ACK maintains the Kuberay-Operator versions to ensure availability.

  • ACK enables the management of the RayCluster, RayJob, and RayService custom resources for Kuberay-Operator.

Responsibilities of customers

When you use the RayCluster, RayJob, and RayService custom resources to deploy and manage Ray clusters and applications in ACK clusters, you are responsible for the security protection and configuration updates of Ray applications.

  • You must follow the best practices for Ray cluster protection.

  • You must update and maintain the container images used to deploy Ray head pods and worker pods.

  • You must update and maintain the Ray versions of Ray head pods and worker pods.

  • You must properly configure the resource requirements of Ray clusters, including the requirements for CPU, GPU, and memory resources.

  • You must monitor the status of Ray applications and ensure the availability of Ray applications.
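
Some of these checks can be automated before a manifest is applied. The following Python sketch is one possible pre-flight review of a RayCluster manifest, not an ACK or KubeRay tool: it reports Ray containers that lack CPU or memory limits and flags head and worker groups that use different images, which often indicates mismatched Ray versions. The function names, image names, and sample values are hypothetical, and the sketch assumes the Ray container is listed first in each pod template.

```python
def ray_containers(manifest):
    """Yield (group_label, container) for the head and each worker group.

    Assumes the Ray container is the first container in each pod template.
    """
    spec = manifest["spec"]
    groups = [("head", spec["headGroupSpec"])]
    groups += [(g["groupName"], g) for g in spec.get("workerGroupSpecs", [])]
    for label, group in groups:
        yield label, group["template"]["spec"]["containers"][0]


def review(manifest, required=("cpu", "memory")):
    """Return a list of human-readable problems found in the manifest."""
    problems = []
    images = set()
    for label, container in ray_containers(manifest):
        images.add(container.get("image"))
        limits = container.get("resources", {}).get("limits", {})
        for resource in required:
            if resource not in limits:
                problems.append(f"{label}/{container['name']}: no {resource} limit")
    if len(images) > 1:
        names = ", ".join(sorted(str(i) for i in images))
        problems.append(f"head and worker groups use different images: {names}")
    return problems


# Hypothetical manifest: the worker group lacks a memory limit and uses a
# different image from the head pod.
SAMPLE = {
    "spec": {
        "headGroupSpec": {"template": {"spec": {"containers": [{
            "name": "ray-head",
            "image": "registry.example.com/ray:v1",
            "resources": {"limits": {"cpu": "2", "memory": "8Gi"}},
        }]}}},
        "workerGroupSpecs": [{
            "groupName": "gpu-workers",
            "template": {"spec": {"containers": [{
                "name": "ray-worker",
                "image": "registry.example.com/ray:v2",
                "resources": {"limits": {"cpu": "8", "nvidia.com/gpu": "1"}},
            }]}},
        }],
    }
}

if __name__ == "__main__":
    for problem in review(SAMPLE):
        print(problem)
```

Passing `required=("cpu", "memory", "nvidia.com/gpu")` extends the same check to GPU limits for clusters in which every group is expected to request GPUs.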

For more information, see Shared responsibility model.