How to Submit Spark jobs with Spark on YARN and Oozie

Published on 23 August 2016 in - 1 minute read - Last modified on 06 March 2021

How to run Spark jobs with Spark on YARN? This often requires trial and error in order to make it work.

Hue is leveraging Apache Oozie to submit the jobs. It focuses on the yarn-client mode, as Oozie is already running the spark-summit command in a MapReduce2 task in the cluster. You can read more about the Spark modes here.

Here is how to get started successfully:

PySpark

Simple script with no dependency.

Script with a dependency on another script (e.g. hello imports hello2).

For more complex dependencies, like Panda, have a look at this documentation.

Jars (Java or Scala)

Add the jars as File dependency and specify the name of the main jar:

Another solution is to put your jars in the ‘lib’ directory in the workspace (‘Folder’ icon on the top right of the editor).

The latest Hue is improving the user experience and will provide an even simpler solution in Hue 4.

If you have any questions, feel free to comment here or on the hue-user list or @gethue!

Share on Facebook Share on Twitter

How to Submit Spark jobs with Spark on YARN and Oozie

PySpark

Jars (Java or Scala)

More recent stories

Integrating Trino Editor in Hue: Supporting Data Mesh and SQL Federation

Discover the power of Apache Ozone using the Hue File Browser

Hue community 2023