How to Submit Spark jobs with Spark on YARN and Oozie

Published on 23 August 2016 - 1 minute read - Last modified on 06 March 2021

How do you run Spark jobs with Spark on YARN? Getting this to work often takes some trial and error.

Hue leverages Apache Oozie to submit the jobs. This post focuses on the yarn-client mode, as Oozie already runs the spark-submit command in a MapReduce2 task in the cluster. You can read more about the Spark modes here.

Here is how to get started successfully:

PySpark

Simple script with no dependency.
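
As an illustration, a dependency-free script could look like the sketch below. The file name `hello.py`, the `square` helper and the sample numbers are assumptions made for this example, not the script used in the original post. When `pyspark` cannot be imported (for instance on a laptop without Spark), it falls back to computing the same result locally so the logic can still be checked.

```python
"""hello.py: a minimal PySpark script with no external dependency (illustrative)."""


def square(x):
    # Plain Python function that Spark will apply to every element.
    return x * x


def main():
    numbers = range(1, 6)
    try:
        # pyspark is provided by the cluster when the script runs through spark-submit.
        from pyspark import SparkContext
    except ImportError:
        # No Spark available here: compute locally so the logic can still be checked.
        print(sorted(square(n) for n in numbers))
        return

    sc = SparkContext(appName="hello")
    try:
        # Distribute the numbers, square them on the executors, collect the result.
        print(sorted(sc.parallelize(numbers).map(square).collect()))
    finally:
        sc.stop()


if __name__ == "__main__":
    main()
```

Keeping the Spark-specific part inside `main()` makes the helper easy to test without a cluster.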

Script with a dependency on another script (e.g. hello imports hello2).
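
Outside of Hue, the case where hello imports hello2 corresponds roughly to a `spark-submit` call that ships the second script with `--py-files`. The sketch below only builds and prints such a command line for YARN client mode. The function name `spark_submit_command` and the file names are assumptions for illustration, and the exact command that Hue and Oozie assemble may differ.

```python
import shlex


def spark_submit_command(main_script, py_files=()):
    """Build a spark-submit argument list for YARN client mode (illustrative helper)."""
    cmd = ["spark-submit", "--master", "yarn", "--deploy-mode", "client"]
    if py_files:
        # --py-files takes a comma-separated list of .py, .zip or .egg files
        # that are made importable by the main script.
        cmd += ["--py-files", ",".join(py_files)]
    cmd.append(main_script)
    return cmd


# hello.py imports hello2, so hello2.py is shipped alongside it.
command = spark_submit_command("hello.py", py_files=["hello2.py"])
print(" ".join(shlex.quote(arg) for arg in command))
```

Building the command as a list rather than a single string keeps each argument separate, which avoids quoting problems if it is later passed to `subprocess.run`.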

For more complex dependencies, like Pandas, have a look at this documentation.


Jars (Java or Scala)

Add the jars as File dependencies and specify the name of the main jar.
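
For readers curious about what such a job looks like on the Oozie side, the sketch below generates a Spark action element in Python. It assumes the `uri:oozie:spark-action:0.1` schema and its `master`, `mode`, `name`, `class` and `jar` elements. The application name, class name and jar name are hypothetical, `${jobTracker}` and `${nameNode}` are conventional workflow placeholders, and the XML that Hue actually generates may differ, so check it against your Oozie version.

```python
import xml.etree.ElementTree as ET

# Assumed schema version; newer Oozie releases ship later versions.
SPARK_ACTION_NS = "uri:oozie:spark-action:0.1"


def spark_action(name, main_class, main_jar):
    """Build an Oozie <spark> action element for YARN client mode (illustrative)."""
    spark = ET.Element("spark", xmlns=SPARK_ACTION_NS)
    # Children are added in the order the schema expects them.
    for tag, text in [
        ("job-tracker", "${jobTracker}"),  # placeholder resolved by the workflow
        ("name-node", "${nameNode}"),      # placeholder resolved by the workflow
        ("master", "yarn"),
        ("mode", "client"),
        ("name", name),
        ("class", main_class),             # main class inside the jar
        ("jar", main_jar),                 # name of the main jar
    ]:
        ET.SubElement(spark, tag).text = text
    return spark


# Hypothetical application, class and jar names.
action = spark_action("MySpark", "org.example.Hello", "hello.jar")
print(ET.tostring(action, encoding="unicode"))
```

The jar itself still has to be reachable by the job, which is what the File dependency provides.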

Another solution is to put your jars in the ‘lib’ directory in the workspace (‘Folder’ icon on the top right of the editor).


The latest Hue is improving the user experience and will provide an even simpler solution in Hue 4.

If you have any questions, feel free to comment here or on the hue-user list or @gethue!
