KubeRay and Model Deployment

Vendor Information

KubeRay is open source, and Anyscale owns the open-source product. Anyscale is a unified compute platform that makes it easy to develop, deploy, and manage scalable AI and Python applications using Ray.

License Information

The KubeRay application itself is open source.

Business Purpose

The KubeRay platform is used for machine learning model deployment.

Usage Information

The KubeRay platform is intended for model deployment by a limited group of people. The platform was initially tested by deploying deep learning (DL) models. We also tested LLMs: they can be deployed, but the platform has issues handling heavy traffic. Users must log in through Okta authentication.

:warning: Deploying LLMs is expensive. TODO: cost optimization

Deployment Model

This application is deployed on the ace-test EKS cluster. KubeRay itself is open source and provided by Anyscale.

  1. All deployments are managed by us in the same way as other EKS applications (Okta + ALB).
  2. The EKS deployment configurations (image, GPU, CPU, etc.) are managed by a Helm chart.
  3. User groups are managed by Okta.
  4. One master and three replica Redis pods are deployed in the same namespace to ensure fault tolerance.
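As a rough illustration of items 2 and 4, the sketch below builds the kind of RayCluster resource that a Helm chart could render. It is not our actual chart. The cluster name, group name, image tag, resource sizes, and Redis address are hypothetical placeholders. The `ray.io/ft-enabled` annotation and the `RAY_REDIS_ADDRESS` variable follow the annotation-style GCS fault-tolerance setup described in the KubeRay documentation; the exact mechanism depends on the KubeRay version in use.

```python
import json


def build_ray_cluster(image, worker_replicas, worker_gpus, redis_address):
    """Return a RayCluster manifest as a plain dict (all values are illustrative)."""

    def pod_template(container_name, limits, env=None):
        # Minimal pod template with a single Ray container.
        container = {
            "name": container_name,
            "image": image,
            "resources": {"limits": limits},
        }
        if env:
            container["env"] = env
        return {"spec": {"containers": [container]}}

    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayCluster",
        "metadata": {
            "name": "example-cluster",  # hypothetical name
            "namespace": "kuberay",
            # Annotation-style switch for GCS fault tolerance (version dependent).
            "annotations": {"ray.io/ft-enabled": "true"},
        },
        "spec": {
            "headGroupSpec": {
                "rayStartParams": {"dashboard-host": "0.0.0.0"},
                "template": pod_template(
                    "ray-head",
                    {"cpu": "2", "memory": "8Gi"},  # placeholder sizes
                    # Point the head at the external Redis used for recovery.
                    env=[{"name": "RAY_REDIS_ADDRESS", "value": redis_address}],
                ),
            },
            "workerGroupSpecs": [
                {
                    "groupName": "gpu-workers",  # hypothetical group name
                    "replicas": worker_replicas,
                    "minReplicas": worker_replicas,
                    "maxReplicas": worker_replicas,
                    "rayStartParams": {},
                    "template": pod_template(
                        "ray-worker",
                        {
                            "cpu": "4",
                            "memory": "16Gi",
                            "nvidia.com/gpu": str(worker_gpus),
                        },
                    ),
                }
            ],
        },
    }


# Placeholder image and Redis address; substitute the values from the Helm chart.
manifest = build_ray_cluster("rayproject/ray:2.9.0", 2, 1, "redis:6379")
print(json.dumps(manifest, indent=2))
```

In this sketch the image, CPU, and GPU settings live in the pod templates, which is where Helm values would typically be substituted, and the Redis address is the only link between the Ray head and the Redis pods.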

High Level Design

The first diagram shows the overall architecture for KubeRay:

kuberay_access_pattern.png

Inside the ace-test/kuberay namespace:

kuberay_namespace.png

There are four kinds of pods:

  1. KubeRay operator: controls Ray cluster deployment
  2. Ray head: schedules worker pods
  3. Ray worker: runs Ray applications
  4. Redis pods: ensure fault tolerance
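When inspecting the namespace, the four pod kinds listed above can usually be told apart by their labels. The sketch below is a minimal example of that idea. KubeRay sets `ray.io/node-type` to `head` or `worker` on Ray pods. The operator label is assumed to be the default from the upstream operator Helm chart, and the `app: redis` label is a hypothetical placeholder for however our Redis pods are labelled.

```python
from collections import Counter

# Assumed default label from the upstream kuberay-operator Helm chart.
OPERATOR_LABEL = ("app.kubernetes.io/name", "kuberay-operator")
# Hypothetical label; replace with the label actually set on the Redis pods.
REDIS_LABEL = ("app", "redis")


def classify_pod(labels):
    """Map a pod's label dict to one of the four pod kinds."""
    node_type = labels.get("ray.io/node-type")
    if node_type == "head":
        return "ray-head"
    if node_type == "worker":
        return "ray-worker"
    if labels.get(OPERATOR_LABEL[0]) == OPERATOR_LABEL[1]:
        return "kuberay-operator"
    if labels.get(REDIS_LABEL[0]) == REDIS_LABEL[1]:
        return "redis"
    return "unknown"


# Illustrative label sets: one operator, one head, two workers,
# and four Redis pods (one master plus three replicas).
example_pods = [
    {"app.kubernetes.io/name": "kuberay-operator"},
    {"ray.io/node-type": "head", "ray.io/cluster": "example-cluster"},
    {"ray.io/node-type": "worker", "ray.io/cluster": "example-cluster"},
    {"ray.io/node-type": "worker", "ray.io/cluster": "example-cluster"},
] + [{"app": "redis"} for _ in range(4)]

counts = Counter(classify_pod(labels) for labels in example_pods)
for kind, count in sorted(counts.items()):
    print(f"{kind}: {count}")
```

The real labels can be listed with `kubectl get pods -n kuberay --show-labels` and fed into the same function.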

Components

AWS Components

  1. Helm Chart
  2. Ray Docker Image

GitHub Components

Check out the repository

  1. Helm chart, ingress
  2. Python Ray Application

Support Information

The ML Engineering team owns the project in ACE.

How to report issues: TBD

All infrastructure issues must be initiated through the ML Engineering team.

Additional Information

  1. Pod restarts: DL models recover without problems, but LLMs take a long time to recover.
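Because LLM deployments can take a long time to come back after a pod restart, a client or smoke test may want to wait for readiness before sending traffic again. The following is a minimal sketch of such a wait loop with exponential backoff, not something the platform provides. It assumes the application is served with Ray Serve, whose HTTP proxy exposes a `/-/healthz` endpoint; the timeout and delay values are illustrative.

```python
import time
import urllib.error
import urllib.request


def serve_is_healthy(url, timeout=5.0):
    """Return True if a GET on the health URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and non-2xx answers all count as not ready.
        return False


def wait_until_ready(
    probe,
    max_wait_s=900.0,
    first_delay_s=2.0,
    max_delay_s=60.0,
    sleep=time.sleep,
    clock=time.monotonic,
):
    """Call probe() with exponential backoff until it succeeds or time runs out.

    Returns True as soon as probe() is truthy, and False once the next
    wait would go past the deadline.
    """
    deadline = clock() + max_wait_s
    delay = first_delay_s
    while True:
        if probe():
            return True
        if clock() + delay > deadline:
            return False
        sleep(delay)
        # Double the delay after each failed attempt, up to max_delay_s.
        delay = min(delay * 2, max_delay_s)
```

A caller could use it as `wait_until_ready(lambda: serve_is_healthy("http://<serve-host>:8000/-/healthz"))`, where the host and port are placeholders for the actual endpoint behind the ALB. The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.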