---
title: "Introducing Kubeflow - A Composable, Portable, Scalable ML Stack Built for Kubernetes"
date: 2017-12-21
slug: introducing-kubeflow-composable
url: /blog/2017/12/Introducing-Kubeflow-Composable
author: >
  Jeremy Lewi (Google),
  David Aronchick (Google)
---

## Kubernetes and Machine Learning
Kubernetes has quickly become the hybrid solution for deploying complicated workloads anywhere. While it started with just stateless services, customers have begun to move complex workloads to the platform, taking advantage of the rich APIs, reliability, and performance that Kubernetes provides. One of the fastest growing use cases is using Kubernetes as the deployment platform of choice for machine learning.

Building any production-ready machine learning system involves various components, often mixing vendors and hand-rolled solutions. Connecting and managing these services for even moderately sophisticated setups introduces huge barriers of complexity to adopting machine learning. Infrastructure engineers will often spend a significant amount of time manually tweaking deployments and hand-rolling solutions before a single model can be tested.

Worse, these deployments are so tied to the clusters they have been deployed to that these stacks are immobile, meaning that moving a model from a laptop to a highly scalable cloud cluster is effectively impossible without significant re-architecture. All these differences add up to wasted effort and create opportunities to introduce bugs at each transition.



## Introducing Kubeflow
To address these concerns, we’re announcing the creation of the Kubeflow project, a new open source GitHub repo dedicated to making using ML stacks on Kubernetes easy, fast and extensible. This repository contains:  

- JupyterHub to create & manage interactive Jupyter notebooks
- A TensorFlow [Custom Resource Definition](/docs/concepts/api-extension/custom-resources/) (CRD) that can be configured to use CPUs or GPUs, and adjusted to the size of a cluster with a single setting
- A TF Serving container

Because this solution relies on Kubernetes, it runs wherever Kubernetes runs. Just spin up a cluster and go!
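As an illustration, a distributed training job using the TensorFlow CRD could be submitted directly with kubectl. This is only a sketch: the `tensorflow.org/v1alpha1` API version, `TfJob` kind, and replica fields reflect the project's early alpha API, and `example-job` is a placeholder name, so consult the repo for the current schema:

```
# Submit a hypothetical TfJob with one master, two workers, and one
# parameter server; field names follow the early alpha CRD schema.
cat <<EOF | kubectl apply -f -
apiVersion: "tensorflow.org/v1alpha1"
kind: "TfJob"
metadata:
  name: "example-job"
spec:
  replicaSpecs:
  - replicas: 1
    tfReplicaType: MASTER
  - replicas: 2
    tfReplicaType: WORKER
  - replicas: 1
    tfReplicaType: PS
EOF
```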



## Using Kubeflow
Let's suppose you are working with two different Kubernetes clusters: a local [minikube](https://github.com/kubernetes/minikube) cluster and a [GKE cluster with GPUs](https://docs.google.com/forms/d/1JNnoUe1_3xZvAogAi16DwH6AjF2eu08ggED24OGO7Xc/viewform?edit_requested=true), and that you have two [kubectl contexts](/docs/tasks/access-application-cluster/configure-access-multiple-clusters/#define-clusters-users-and-contexts) defined, named `minikube` and `gke`.
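Before proceeding, it's worth confirming that both contexts actually exist (a quick sanity check; context names will vary with your own setup):

```
# List all configured contexts; expect to see minikube and gke
kubectl config get-contexts

# Show which context is currently active
kubectl config current-context
```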



First we need to initialize our [ksonnet](https://github.com/ksonnet) application and install the Kubeflow packages. (To use ksonnet, you must first install it on your operating system; the instructions for doing so are [here](https://github.com/ksonnet/ksonnet).)


```
ks init my-kubeflow
cd my-kubeflow
ks registry add kubeflow \
  github.com/google/kubeflow/tree/master/kubeflow
ks pkg install kubeflow/core
ks pkg install kubeflow/tf-serving
ks pkg install kubeflow/tf-job
ks generate core kubeflow-core --name=kubeflow-core
```


We can now define [environments](https://ksonnet.io/docs/concepts#environment) corresponding to our two clusters.  

```
kubectl config use-context minikube
ks env add minikube

kubectl config use-context gke
ks env add gke
```
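If you want to double-check the result, ksonnet can list the environments it now knows about (the output will reflect your own cluster addresses):

```
# Show the environments registered for this ksonnet application
ks env list
```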

And we’re done! Now just create the environments on your cluster. First, on minikube:  

```
ks apply minikube -c kubeflow-core
```

And to create it on our multi-node GKE cluster for quicker training:  

```
ks apply gke -c kubeflow-core
```
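Once the apply completes, the core components should show up as pods in the cluster. (Pod names such as `tf-hub-0` come from the Kubeflow manifests; exact names may differ by version.)

```
# Verify that the Kubeflow core pods are running on the GKE cluster
kubectl --context=gke get pods
```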

By making it easy to deploy the same rich ML stack everywhere, drift and rewriting between these environments are kept to a minimum.

To access either deployment, you can execute the following command:

```
kubectl port-forward tf-hub-0 8100:8000
```