
To try this yourself on a Kubernetes cluster, simply download the binaries for the official [Apache Spark 2.3 release][13]. For example, below we run a simple Spark application to compute the mathematical constant Pi across five Spark executors, each running in a separate pod. Please note that this requires a cluster running Kubernetes 1.7 or above, a [kubectl][14] client that is configured to access it, and the necessary [RBAC rules][9] for the default namespace and service account.
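If your cluster enforces RBAC, a minimal setup might look like the sketch below; the service account name `spark` and the `edit` cluster role are illustrative choices rather than requirements. You can then point the driver at the account with `--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark`.

```
# Create a dedicated service account for Spark driver pods (the name is arbitrary)
$ kubectl create serviceaccount spark

# Allow it to create and manage executor pods in the default namespace
$ kubectl create clusterrolebinding spark-role \
   --clusterrole=edit \
   --serviceaccount=default:spark \
   --namespace=default
```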



```
$ kubectl cluster-info

Kubernetes master is running at https://xx.yy.zz.ww

# Replace <spark-image> with a Spark 2.3 container image built for Kubernetes.
$ bin/spark-submit \
   --master k8s://https://xx.yy.zz.ww \
   --deploy-mode cluster \
   --name spark-pi \
   --class org.apache.spark.examples.SparkPi \
   --conf spark.executor.instances=5 \
   --conf spark.kubernetes.container.image=<spark-image> \
   --conf spark.kubernetes.driver.pod.name=spark-pi-driver \
   local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
```

To watch Spark resources that are created on the cluster, you can use the following kubectl command in a separate terminal window.



```
$ kubectl get pods -l 'spark-role in (driver, executor)' -w

NAME                                               READY   STATUS    RESTARTS   AGE
spark-pi-driver                                    1/1     Running   0          14s
spark-pi-da1968a859653d6bab93f8e6503935f2-exec-1   0/1     Pending   0          0s
```


The results can be streamed during job execution by running:



```
$ kubectl logs -f spark-pi-driver
```

When the application completes, you should see the computed value of Pi in the driver logs.
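For the SparkPi example, the line to look for resembles the following; the digits here are illustrative, and the estimate varies from run to run since Pi is computed by random sampling:

```
Pi is roughly 3.141592
```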

In Spark 2.3, we're starting with support for Spark applications written in Java and Scala, along with resource localization from a variety of data sources including HTTP, GCS, HDFS, and more. We have also paid close attention to failure and recovery semantics for Spark executors to provide a strong foundation to build upon in the future. Get started with [the open-source documentation][15] today.
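As an illustration of resource localization, the application jar itself can live at a remote URI instead of inside the container image; in Spark 2.3 such dependencies are fetched by an init-container before the driver starts. The application name, class, and URL below are hypothetical:

```
$ bin/spark-submit \
   --master k8s://https://xx.yy.zz.ww \
   --deploy-mode cluster \
   --name my-app \
   --class com.example.MyApp \
   --conf spark.kubernetes.container.image=<spark-image> \
   https://example.com/jars/my-app.jar
```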

### Get Involved

There's lots of exciting work to be done in the near future. We're actively working on features such as dynamic resource allocation, in-cluster staging of dependencies, support for PySpark & SparkR, support for Kerberized HDFS clusters, client mode, and interactive execution environments for popular notebooks. For people who fell in love with the Kubernetes way of managing applications declaratively, we've also been working on a [Kubernetes Operator][16] for spark-submit, which allows users to declaratively specify and submit Spark Applications.
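As a taste of that declarative style, a SparkApplication manifest for the Pi example might look roughly like the sketch below. The API group, version, and field names depend on the operator release you install, so treat this as illustrative rather than authoritative:

```
# Requires the Spark operator and its CRDs to be installed in the cluster.
$ cat <<EOF | kubectl apply -f -
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
spec:
  type: Scala
  mode: cluster
  image: <spark-image>
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
  executor:
    instances: 5
EOF
```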

And we're just getting started! We would love for you to get involved and help us evolve the project further.  

Huge thanks to the Apache Spark and Kubernetes contributors, spread across multiple organizations, who spent many hundreds of hours working on this effort. We look forward to seeing more of you contribute to the project and help it grow.

