Use the Spark Action in Oozie

Update, September 2016: this post has been superseded by a newer guide.

Hue offers a notebook for Hadoop and Spark, but the following steps will guide you through executing a Spark Action from the Oozie Editor.

Run job in Spark Local Mode

To submit a job locally, Spark Master can be one of the following:

  • local: Run Spark locally with one worker thread.
  • local[k]: Run Spark locally with k worker threads.
  • local[*]: Run Spark with as many worker threads as logical cores on your machine.

Insert the Mode as client and provide a local or HDFS jar path in the Jars/py files field. You also need to specify the App name, the Main class of the Jar and arguments (if any) by clicking on the ARGUMENTS+ button.


Note: Spark’s local mode doesn’t run with Kerberos.
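Behind the scenes, the Editor generates an Oozie workflow.xml containing a Spark action. A minimal sketch of what a local-mode action could look like (the workflow name, app name, class and jar path below are hypothetical placeholders):

```xml
<workflow-app name="SparkLocalDemo" xmlns="uri:oozie:workflow:0.5">
  <start to="spark-node"/>
  <action name="spark-node">
    <!-- Spark action: master/mode correspond to the Spark Master and Mode fields in the Editor -->
    <spark xmlns="uri:oozie:spark-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <master>local[*]</master>
      <mode>client</mode>
      <name>MySparkApp</name>
      <class>com.example.MyMainClass</class>
      <jar>${nameNode}/user/hue/oozie/my-app.jar</jar>
      <arg>inputArg</arg>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Spark action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
```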

Run job on Yarn

To submit a job on a YARN cluster, change Spark Master to yarn-cluster, Mode to cluster, and give the complete HDFS path of the Jar in the Jars/py files field.


Similarly, to submit a job in yarn-client mode, change Spark Master to yarn-client and Mode to client, keeping the rest of the fields the same as above. The Jar path can be local or on HDFS.
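In workflow.xml terms, only the <master>, <mode> and <jar> elements change between these modes. A sketch of the yarn-cluster variant of the action (names and paths are hypothetical):

```xml
<spark xmlns="uri:oozie:spark-action:0.1">
  <job-tracker>${jobTracker}</job-tracker>
  <name-node>${nameNode}</name-node>
  <master>yarn-cluster</master>
  <mode>cluster</mode>
  <name>MySparkApp</name>
  <class>com.example.MyMainClass</class>
  <!-- yarn-cluster requires a full HDFS path to the jar -->
  <jar>${nameNode}/user/hue/oozie/my-app.jar</jar>
</spark>
```

For yarn-client, <master> would be yarn-client and <mode> client, and the jar path could be local instead of on HDFS.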



Additional Spark action properties can be set by clicking the settings button at the top-right corner before you submit the job.
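These settings typically end up either as <configuration> properties of the action or in its <spark-opts> element, which Oozie passes straight through to spark-submit. An illustrative example (the values here are hypothetical):

```xml
<spark-opts>--driver-memory 1G --executor-memory 2G --num-executors 4 --conf spark.eventLog.enabled=true</spark-opts>
```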


Note: If you see the error "Required executor memory (xxxxMB) is above the max threshold…", increase yarn.scheduler.maximum-allocation-mb in the YARN configuration and restart the YARN service from CM.
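That property lives in yarn-site.xml (Cloudera Manager exposes it under the YARN service configuration). For example, to raise the cap to 8 GB (the value is illustrative; pick one that fits your nodes):

```xml
<property>
  <!-- Maximum container memory YARN will allocate, in MB -->
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>
```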

The next version is going to include HUE-2645, which will make the UI simpler and more intuitive. As usual, feel free to comment on the hue-user list or @gethue!


  1. XiaoBendan 3 years ago

    I have a shell program like this to submit my spark task:
    if [ "$#" -ne 1 ]; then
      echo "Param \"Day\" required!"
      exit 1
    fi
    DAY=$1
    echo "Solve ID for Day[$DAY], beginning…"
    echo $DAY

    export SPARK_CLASSPATH="$HIVE_HOME/conf:$HBASE_HOME/conf/:$HBASE_HOME/hbase-client.jar:$HBASE_HOME/hbase-protocol.jar:$HBASE_HOME/lib/htrace-core.jar:$HBASE_HOME/lib/htrace-core-3.1.0-incubating.jar" &&
    spark-submit --class "com.gridsum.aud.AudienceIdentitySolverApp" --driver-cores 2 --driver-memory 2G --master yarn-client --executor-memory 15G --conf spark.shuffle.memoryFraction=0.50 --executor-cores 3 --num-executors 14 aud-id-recognize-1.0-SNAPSHOT-jar-with-dependencies.jar solveId.yml $DAY

    But how can I transplant this task to Oozie with Hue?

    • XiaoBendan 3 years ago

      I don't know how to make the export and spark-submit work in the same action.

    • Hue Team 3 years ago

      It should work, and/or you could try a Shell action with --proxy-user USERNAME to run it as the user you want.

  2. XiaoBendan 3 years ago

    How can I edit or delete my comment?? There is something wrong!!!

  3. Ben 3 years ago

    Does this work in Hue 3.7? CDH 5.4.8? I am having problems with it in yarn-client and yarn-cluster modes.

    • Hue Team 3 years ago

      It was tested in CDH 5.5/5.7; the Spark Action in Oozie is still experimental from what we saw.

      • Ben 3 years ago

        I tested the Spark Action in CDH 5.5.2, and it works. It was just CDH 5.4.8 where it didn’t.

        • Hue Team 3 years ago

          Thanks for reporting!

  4. Vincent 3 years ago

    I tried in CDH 5.5.2 to launch a Spark program (org.apache.oozie.example.SparkFileCopy) in the workflow. It seems to work, because the Spark job executes well, but my workflow doesn't finish; the status stays "suspended" all the time.

    I tried a "Dry run" too and I have the same issue.

    Do you have any idea?


    • Hue Team 3 years ago

      Did you look at the Oozie logs? Do you have any YARN worker? Any memory lacking?

  5. Vincent 3 years ago

    Yes, you are right, I have this error in the Oozie logs, but I don't see why it doesn't work for this oozie-spark job, because other jobs work fine:

    Caused by: org.apache.hadoop.ipc.RemoteException( Permission denied: user=mapred, access=READ, inode="/user/history/done_intermediate/vincent.moreno/job_1457994974205_1851-1462378126124-vincent.moreno-oozie%3Alauncher%3AT%3Dspark%3AW%3DSpark%3AA%3Dspark%2Dd909%3AID%3D000-1462378163221-1-0-SUCCEEDED-root.vincent_dot_moreno-1462378132610.jhist":vincent.moreno:supergroup:-rwxrwx---

  6. Vincent 3 years ago

    I tried to change the permissions on the folder during the job execution (776) and the job runs fine. Do you have any idea why the files don't have the right permissions?
    Thanks a lot.

  7. fslan 3 years ago

    Hi All,

    I got an error when I tried to run a Spark job with Oozie in the HUE interface. The error is below; I have been searching for this error on the internet but haven't found any useful information yet. Has any of you seen this error or do you know how to solve it? If so, please help. Thanks.


    >>> Invoking Spark class now >>>

    Intercepting System.exit(1)

    <<< Invocation of Main class completed <<<

    Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], exit code [1]

    Oozie Launcher failed, finishing Hadoop job gracefully

    Oozie Launcher, uploading action data to HDFS sequence file: hdfs://localhost:8020/user/training/oozie-oozi/0000021-160531235012261-oozie-oozi-W/spark-c3a2--spark/action-data.seq

    Oozie Launcher ends

    • Hue Team 3 years ago

      In the Job Browser, if you look at the logs of the Oozie launcher, what do you see?

  8. yashwanth 3 years ago

    I uploaded the jar file in HDFS and tried to run it. But, I got the exception below

    Warning: Local jar /user/yxr6907/sparkhbase-0.0.1-SNAPSHOT.jar does not exist, skipping.

    I also tried copying the same jar into my local folder and giving the local path. But, I got the same error.

    Am I doing anything wrong here?

  9. Abhishek Gupta 3 years ago

    Hi, I am able to run a Spark job from the command line using the command below.
    spark-submit \
    --master yarn-client \
    --class com.vw.hy.classname \
    --properties-file /etc/path/spark.conf \
    --files /etc/path/ \
    --conf "spark.executor.extraJavaOptions=-Dconfig.resource=application.conf" \
    --conf "" \
    --driver-java-options -Dconfig.file=/etc/path/application.conf \
    When I create the Oozie workflow, where do I put these options like conf, driver-java-options and files? I couldn't find where to put them in HUE either.

  10. Miles Y. 3 years ago

    Looks like CDH 5.7 / Hue 3.9 is yet to implement the following (mainly around log4j config):

    * Separate file upload list in the UI for --files and --properties-file, in particular for an app-specific config file; this seems to be added in your latest implementation?

    * The Spark Action also doesn't seem to support overriding Spark properties like the command line does: [--conf "" --conf ""]. Adding them to the UI Properties field () seems to have no effect, nor does adding the above verbatim to the "Options list" field (); the former elicits the message "Warning: Ignoring non-spark config property" in driver stderr.


  11. Thomas 3 years ago


    I'm running a Spark job on a YARN cluster. The workflow starts but the Spark job stays in the "ACCEPTED" status with "The application might not be running yet or there is no Node Manager or Container available". This is the only thing running on the EMR cluster; there are no log files for the Spark job and no errors in the workflow logs.

    • Hue Team 3 years ago

      Do you have at least one YARN node manager up?

      • Thomas 3 years ago

        Yes, I've got a little bit further now. It went to the running state but now it fails at 5% with the error java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.internal.SessionState'. I did some testing and it seems it gives the error the moment I use something from spark.sql. Has anyone had this issue before? I've run this job as a step on an EMR and with a spark-submit, no problem.

        • Saurab 2 years ago

          I am facing the same issue. Is your issue resolved with spark-sql?

  12. Ashish Kumar Singh 2 years ago


    I want to invoke the Spark Scala Kafka consumer through an Oozie spark action:
    spark-submit --class org.sabre.consumer.PSSConsumer \
    --master yarn-cluster \
    --driver-memory 4G \
    --executor-memory 3G \
    Consumer-0.0.1-SNAPSHOT.jar bqrhlc130:9092,bqrhlc140:9092 air_sell_segment_pss /data/airsell/segment/landing/pss

    where the first parameter is the list of brokers "bqrhlc130:9092,bqrhlc140:9092",
    the second parameter is the name of the topic "air_sell_segment_pss",
    and the third parameter is the landing directory "/data/airsell/segment/landing/pss".

    Please help me out with where and how to put the first, second and third parameters in the workflow.
    I am using the below.

    This is my workflow.xml:


    sparkopts=--driver-memory 1G --executor-memory 1G --num-executors 3 --conf spark.eventLog.dir=${nameNode}/user/spark/applicationHistory --conf spark.yarn.historyServer.address=http:${nameNode}:18088 --conf spark.eventLog.enabled=true

  13. naveen 2 years ago


    We have around 50 sources. We have one Spark jar file created to execute the process for all the sources at different frequencies. Each time, the jar file accepts one source and one frequency.

    We have to take the source name and frequency information from an HBASE table.

    We would like to schedule the job using the HUE dashboard. How do we pass those values dynamically in the HUE dashboard by creating a single workflow? Kindly help.

  14. maxiulin 2 years ago

    Hello, Hue team, please tell me: can I use Spark 2 in Oozie via Hue?

  15. naveen 2 years ago

    Hi, I am not getting the Oozie workflow in my project.

  16. Small Wong 1 year ago

    How to run a job in Spark 2 with Oozie?
