Use the Spark Action in Oozie

Update, September 2016: this post has been superseded by a newer guide.

Hue offers a notebook for Hadoop and Spark, but the following steps will guide you through executing a Spark Action from the Oozie Editor.

Run job in Spark Local Mode

To submit a job locally, Spark Master can be one of the following:

  • local: Run Spark locally with one worker thread.
  • local[k]: Run Spark locally with k worker threads.
  • local[*]: Run Spark with as many worker threads as logical cores on your machine.

Insert the Mode as client and provide a local or HDFS jar path in the Jars/py files field. You also need to specify the App name, the Main class of the Jar and arguments (if any) by clicking on the ARGUMENTS+ button.


Note: Spark’s local mode doesn’t run with Kerberos.
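Behind the scenes, the Editor generates an Oozie workflow.xml containing a Spark action. A minimal sketch of what a local-mode action could look like (the workflow name, app name, class and jar path below are hypothetical placeholders):

```xml
<workflow-app name="SparkLocalDemo" xmlns="uri:oozie:workflow:0.5">
  <start to="spark-node"/>
  <action name="spark-node">
    <!-- Spark action: master/mode correspond to the Spark Master and Mode fields in the Editor -->
    <spark xmlns="uri:oozie:spark-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <master>local[*]</master>
      <mode>client</mode>
      <name>MySparkApp</name>
      <class>com.example.MyMainClass</class>
      <jar>${nameNode}/user/hue/oozie/my-app.jar</jar>
      <arg>inputArg</arg>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Spark action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
```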

Run job on Yarn

To submit a job on a YARN cluster, change Spark Master to yarn-cluster, Mode to cluster, and give the complete HDFS path of the Jar in the Jars/py files field.


Similarly, to submit a job in yarn-client mode, change Spark Master to yarn-client and Mode to client, keeping the rest of the fields the same as above. The Jar path can be local or on HDFS.
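In workflow.xml terms, only the <master>, <mode> and <jar> elements change between these modes. A sketch of the yarn-cluster variant of the action (names and paths are hypothetical):

```xml
<spark xmlns="uri:oozie:spark-action:0.1">
  <job-tracker>${jobTracker}</job-tracker>
  <name-node>${nameNode}</name-node>
  <master>yarn-cluster</master>
  <mode>cluster</mode>
  <name>MySparkApp</name>
  <class>com.example.MyMainClass</class>
  <!-- yarn-cluster requires a full HDFS path to the jar -->
  <jar>${nameNode}/user/hue/oozie/my-app.jar</jar>
</spark>
```

For yarn-client, <master> would be yarn-client and <mode> client, and the jar path could be local instead of on HDFS.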



Additional Spark action properties can be set by clicking the settings button at the top-right corner before you submit the job.
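These settings typically end up either as <configuration> properties of the action or in its <spark-opts> element, which Oozie passes straight through to spark-submit. An illustrative example (the values here are hypothetical):

```xml
<spark-opts>--driver-memory 1G --executor-memory 2G --num-executors 4 --conf spark.eventLog.enabled=true</spark-opts>
```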


Note: If you see the error "Required executor memory (xxxxMB) is above the max threshold…", increase yarn.scheduler.maximum-allocation-mb in the YARN configuration and restart the YARN service from CM.
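That property lives in yarn-site.xml (Cloudera Manager exposes it under the YARN service configuration). For example, to raise the cap to 8 GB (the value is illustrative; pick one that fits your nodes):

```xml
<property>
  <!-- Maximum container memory YARN will allocate, in MB -->
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>
```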

The next version is going to include HUE-2645, which will make the UI simpler and more intuitive. As usual, feel free to comment on the hue-user list or @gethue!


  1. XiaoBendan 3 years ago

    I have a shell program like this to submit my spark task:
    if [ "$#" -ne 1 ]; then
      echo "Param \"Day\" required!"
      exit 1
    fi
    DAY=$1
    echo "Solve ID for Day[$DAY], beginning…"
    echo $DAY

    export SPARK_CLASSPATH="$HIVE_HOME/conf:$HBASE_HOME/conf/:$HBASE_HOME/hbase-client.jar:$HBASE_HOME/hbase-protocol.jar:$HBASE_HOME/lib/htrace-core.jar:$HBASE_HOME/lib/htrace-core-3.1.0-incubating.jar" &&
    spark-submit --class "com.gridsum.aud.AudienceIdentitySolverApp" --driver-cores 2 --driver-memory 2G --master yarn-client --executor-memory 15G --conf spark.shuffle.memoryFraction=0.50 --executor-cores 3 --num-executors 14 aud-id-recognize-1.0-SNAPSHOT-jar-with-dependencies.jar solveId.yml $DAY

    But how can I transplant this task to Oozie with Hue?

    • XiaoBendan 3 years ago

      I don't know how to make the export and spark-submit work in the same action.

    • Hue Team 3 years ago

      It should work, and/or you could try a Shell action with --proxy-user USERNAME to run it as the user you want.

  2. XiaoBendan 3 years ago

    How can I edit or delete my comment?? There is something wrong!!!

  3. Ben 3 years ago

    Does this work in Hue 3.7? CDH 5.4.8? I am having problems with it in yarn-client and yarn-cluster modes.

    • Hue Team 3 years ago

      It was tested in CDH 5.5/5.7; the Spark Action in Oozie is still experimental from what we saw.

      • Ben 3 years ago

        I tested the Spark Action in CDH 5.5.2, and it works. It was just CDH 5.4.8 where it didn’t.

        • Hue Team 3 years ago

          Thanks for reporting!

  4. Vincent 3 years ago

    I tried in CDH 5.5.2 to launch a Spark program (org.apache.oozie.example.SparkFileCopy) in the workflow. It seems to work, because the Spark job executes well, but my workflow doesn't finish; the status stays "suspended" all the time.

    I tried a "Dry run" too and I have the same issue.

    Do you have any idea?


    • Hue Team 3 years ago

      Did you look at the Oozie logs? Do you have any YARN worker? Any memory lacking?

  5. Vincent 3 years ago

    Yes, you are right, I have this error in the Oozie logs, but I don't see why it doesn't work for this oozie-spark job, because other jobs work fine:

    Caused by: org.apache.hadoop.ipc.RemoteException( Permission denied: user=mapred, access=READ, inode="/user/history/done_intermediate/vincent.moreno/job_1457994974205_1851-1462378126124-vincent.moreno-oozie%3Alauncher%3AT%3Dspark%3AW%3DSpark%3AA%3Dspark%2Dd909%3AID%3D000-1462378163221-1-0-SUCCEEDED-root.vincent_dot_moreno-1462378132610.jhist":vincent.moreno:supergroup:-rwxrwx---

  6. Vincent 3 years ago

    I tried to change the permissions on the folder during the job execution (776) and the job runs fine. Do you have any idea why the files don't have the right permissions?
    Thanks a lot.

  7. fslan 3 years ago

    Hi All,

    I got an error when I tried to run a Spark job with Oozie in the HUE interface. The error is below; I have been searching for this error on the internet but haven't found any useful information yet. Has any of you seen this error or do you know how to solve it? If so, please help. Thanks.


    >>> Invoking Spark class now >>>

    Intercepting System.exit(1)

    <<< Invocation of Main class completed <<<

    Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], exit code [1]

    Oozie Launcher failed, finishing Hadoop job gracefully

    Oozie Launcher, uploading action data to HDFS sequence file: hdfs://localhost:8020/user/training/oozie-oozi/0000021-160531235012261-oozie-oozi-W/spark-c3a2--spark/action-data.seq

    Oozie Launcher ends

    • Hue Team 3 years ago

      In the Job Browser, if you look at the logs of the Oozie launcher, what do you see?

  8. yashwanth 3 years ago

    I uploaded the jar file in HDFS and tried to run it. But, I got the exception below

    Warning: Local jar /user/yxr6907/sparkhbase-0.0.1-SNAPSHOT.jar does not exist, skipping.

    I also tried copying the same jar into my local folder and giving the local path. But, I got the same error.

    Am I doing anything wrong here?

  9. Abhishek Gupta 3 years ago

    Hi, I am able to run a Spark job from the command line using the command below.
    spark-submit \
    --master yarn-client \
    --class com.vw.hy.classname \
    --properties-file /etc/path/spark.conf \
    --files /etc/path/ \
    --conf "spark.executor.extraJavaOptions=-Dconfig.resource=application.conf" \
    --conf "" \
    --driver-java-options -Dconfig.file=/etc/path/application.conf \
    When I create the Oozie workflow, where do I put these options like conf, driver-java-options and files? I couldn't find where to put them in HUE either.

  10. Miles Y. 3 years ago

    Looks like CDH 5.7 / Hue 3.9 is yet to implement the following (mainly around log4j config):

    * Separate file upload list in the UI for --files and --properties-file, in particular for an app-specific config file; this seems to be added in your latest implementation?

    * The Spark Action also doesn't seem to support overriding Spark properties like the command line does: [--conf "" --conf ""]. Adding them to the UI Properties field () seems to have no effect, nor does adding the above verbatim to the "Options list" field (); the former elicits the message "Warning: Ignoring non-spark config property" in driver stderr.


  11. Thomas 3 years ago


    I'm running a Spark job on a YARN cluster. The workflow starts but the Spark job stays in the "ACCEPTED" status with "The application might not be running yet or there is no Node Manager or Container available". This is the only thing running on the EMR cluster; there are no log files for the Spark job and no errors in the workflow logs.

    • Hue Team 3 years ago

      Do you have at least one YARN node manager up?

      • Thomas 3 years ago

        Yes, I've got a little bit further now. It went to the running state but now it fails at 5% with the error java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.internal.SessionState'. I did some testing and it seems it gives the error the moment I use something from spark.sql. Has anyone had this issue before? I've run this job as a step on an EMR and with a spark-submit, no problem.

        • Saurab 2 years ago

          I am facing the same issue. Is your issue resolved with spark-sql?

  12. Ashish Kumar Singh 2 years ago


    I want to invoke the Spark Scala Kafka consumer through an Oozie spark action:
    spark-submit --class org.sabre.consumer.PSSConsumer \
    --master yarn-cluster \
    --driver-memory 4G \
    --executor-memory 3G \
    Consumer-0.0.1-SNAPSHOT.jar bqrhlc130:9092,bqrhlc140:9092 air_sell_segment_pss /data/airsell/segment/landing/pss

    where the first parameter is the list of brokers "bqrhlc130:9092,bqrhlc140:9092",
    the second parameter is the name of the topic "air_sell_segment_pss",
    and the third parameter is the landing directory "/data/airsell/segment/landing/pss".

    Please help me out with where and how to put the first, second and third parameters in the workflow.
    I am using the below.

    This is my workflow.xml:


    sparkopts=--driver-memory 1G --executor-memory 1G --num-executors 3 --conf spark.eventLog.dir=${nameNode}/user/spark/applicationHistory --conf spark.yarn.historyServer.address=http:${nameNode}:18088 --conf spark.eventLog.enabled=true

  13. naveen 2 years ago


    We have around 50 sources. We have one Spark jar file created to execute the process for all the sources at different frequencies. Each time, the jar file accepts one source and one frequency.

    We have to take the source name and frequency information from an HBASE table.

    We would like to schedule the job using the HUE dashboard. How do we pass those values dynamically in the HUE dashboard by creating a single workflow? Kindly help.

  14. maxiulin 2 years ago

    Hello, Hue team, please tell me: can I use Spark 2 in Oozie via Hue?

  15. naveen 2 years ago

    Hi, I am not getting the Oozie workflow in my project.

  16. Small Wong 1 year ago

    How to run a job in Spark 2 with Oozie?
