New Apache Oozie Workflow, Coordinator & Bundle Editors

Oozie was one of the first major apps in Hue. We are continuously investing in making it better and just made a major leap forward in its editor (the improvements to the Dashboard are covered in another post).

This revamp of the Oozie Editor brings a new look and requires much less knowledge of Oozie! Workflows now support dozens of new functionalities and take just a few clicks to set up!

 

 

The files used in the videos come with the Oozie Examples.

In the new interface, only the most important properties of an action need to be filled in, and quick links are offered for verifying paths and other jobs. Hive and Pig script files are parsed in order to extract their parameters, which are then directly proposed with autocomplete. The advanced functionality of an action is available in a new kind of popup with much less friction, as it simply overlaps the current node.
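As an illustration, this kind of parameter extraction can be sketched with a simple regular expression. This is a hypothetical sketch, not Hue's actual parser (which also has to deal with comments, quoting, and dialect differences):

```python
import re

# Hypothetical sketch of pulling ${...} parameters out of a Pig or Hive
# script so they can be proposed with autocomplete. Hue's real parser is
# more thorough; this only scans for the EL-style ${name} syntax.
PARAM_RE = re.compile(r'\$\{(\w+)\}')

def extract_parameters(script_text):
    """Return the unique ${name} parameters found, in order of appearance."""
    seen = []
    for name in PARAM_RE.findall(script_text):
        if name not in seen:
            seen.append(name)
    return seen

script = "rows = LOAD '${input}'; STORE rows INTO '${output}';"
print(extract_parameters(script))  # ['input', 'output']
```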

 

new-oozie

New Editor

New Editor (edit mode)

Old Editor

 

Two new actions have been added:

  • HiveServer2
  • Spark

new-spark-hs2-actions

And the user experience of Pig and Sub-workflow actions has been simplified.

 

Decision node support has been improved, and copying an existing action is now just a matter of drag & drop. New layouts are possible, as the ‘ok’ and ‘end’ destinations of each node can be changed individually.
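In the generated workflow.xml, a decision node is a switch over EL predicates. A hedged sketch of what one might look like (node names and the path are hypothetical):

```xml
<decision name="check-input">
  <switch>
    <!-- Take the 'process' branch only when the input directory exists -->
    <case to="process">${fs:exists('/user/demo/input')}</case>
    <default to="end"/>
  </switch>
</decision>
```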

oozie-avanced-action-options

 

Coordinators have been vastly improved! The notion of Oozie datasets is no longer needed. The editor pulls the parameters of your workflow and offers three types of inputs:

  • parameters: a constant or an Oozie EL function, like time
  • input path: parameterize an input path dependency and wait for it to exist, e.g.
  • output path: like an input path, but it does not need to exist for the job to start
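Under the hood, the editor still generates standard Oozie coordinator XML: an "input path" becomes a dataset plus an input event. A hedged sketch of what such a generated coordinator might look like (names, dates, and paths are hypothetical):

```xml
<coordinator-app name="daily-coord" frequency="${coord:days(1)}"
                 start="2015-09-21T00:00Z" end="2015-12-31T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <!-- The 'input path' dependency: one directory per day -->
    <dataset name="logs" frequency="${coord:days(1)}"
             initial-instance="2015-09-21T00:00Z" timezone="UTC">
      <uri-template>${nameNode}/data/logs/${YEAR}/${MONTH}/${DAY}</uri-template>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="wf_input" dataset="logs">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>${nameNode}/user/demo/my-workflow</app-path>
    </workflow>
  </action>
</coordinator-app>
```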

oozie-new-coordinator

 

The dreaded UTC time zone format is now directly provided, either by the calendar or with some helper widgets.
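For reference, Oozie expects times as UTC strings like 2015-09-21T11:32Z. A minimal sketch of the conversion the widgets perform behind the scenes (the helper name is ours, not Hue's):

```python
from datetime import datetime, timedelta, timezone

# Oozie wants coordinator start/end times in UTC, e.g. 2015-09-21T11:32Z.
OOZIE_TIME_FORMAT = '%Y-%m-%dT%H:%MZ'

def to_oozie_utc(dt):
    """Convert an aware datetime to the UTC string format Oozie expects."""
    return dt.astimezone(timezone.utc).strftime(OOZIE_TIME_FORMAT)

# 09:00 in a UTC+2 zone becomes 07:00 UTC:
local = datetime(2015, 9, 21, 9, 0, tzinfo=timezone(timedelta(hours=2)))
print(to_oozie_utc(local))  # 2015-09-21T07:00Z
```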

oozie-new-submit-popup

 

Sum-up

In addition to providing a friendlier end-user experience, this new architecture opens the door to innovation.

First, it makes it easy to add new Oozie actions to the editor. But most importantly, workflows are persisted using the new Hue document model, meaning their import/export is simplified and will soon be available directly from the UI. This model also enables generating workflows by simply dragging & dropping saved Hive, Pig, or Spark jobs directly into a workflow. No need to manually duplicate your queries on HDFS!

This also opens the door to one-click scheduling of any job saved in Hue, as coordinators are now much simpler to use. While we continue to polish the new editor, the Dashboard section of the app will see a major revamp next!

 

As usual, feel free to comment on the hue-user list or @gethue!

 

Note

Old workflows are not automatically converted to the new format. Hue will try to import them for you, and will open them in the old editor in case of problems.

oozie-import-try

A new import / export is planned for Hue 4. It will let you export workflows in both XML and Hue’s JSON format, and import from Hue’s format.

42 Comments

  1. dale 2 years ago

    I have a question regarding SSH…. How do you issue an SSH command that requires a user’s password.
    E.g. User & Host: ssh [email protected]
    Ssh command: ls /home/admin

    There is no prompt for a password therefore the job fails.

  2. Adam 2 years ago

    I’ve set up a workflow that takes an input. Next i’ve set that coordinator to run every five minutes.
    I’ve also passed the workflow parameter as a “input path” with the input as /asdf/${YEAR}file.tsv
    When i submit coordinator i get this error:
    “The frequency of the workflow parameter %s cannot be guessed from the frequency of the coordinator. It so needs to be specified manually.

    None”

    What am I doing wrong here?

    • Hue Team 2 years ago

      Could you specify the frequency of the “input path” in its advanced option? e.g. https://www.dropbox.com/s/lp0uk6724xcafk8/Screenshot%20from%202015-09-21%2011%3A32%3A55.png?dl=0

      Note that using 5 minutes as the frequency in Oozie is not recommended, as it is pretty short

      • Vadivel 12 months ago

        I am getting the same error “The frequency of the workflow parameter cannot be guessed from the frequency of the coordinator. It so needs to be specified manually.

        None”

        I am unable to find the shared screenshot . I selected “Same frequency” as that coordinator.

        • Author
          Hue Team 12 months ago

          What frequency did you put on the cron (e.g. 0 0 * * *) of the coordinator?

          • Sandy 2 weeks ago

            Am facing the same problem, can you please help ?

  3. Sonu 2 years ago

    Getting the below error while submitting a spark job. Can you please help me in adding credentials, Thanks
    Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Delegation Token can be issued only with kerberos or web authentication.

    • Hue Team 2 years ago

      Are you using Kerberos on your cluster?
      What Spark job is it? Does it work with spark-submit?

      • Sonu 2 years ago

        Yes, I’m using Kerberos.
        It Works With Spark Submit. Only from hue oozie, i’m getting this error.

        • Sai 2 years ago

          You have to do the following to run a spark action on a kerb cluster

          1. Change Spark Master to ‘yarn-cluster’ and mode to ‘cluster’
          2. Give complete HDFS path (append ${namenode} to path)
          3. Increase ‘yarn.scheduler.maximum-allocation-mb’ (if necessary) in Yarn config and restart yarn

  4. Henry 2 years ago

    In Hue 3.7.0, I tried to parameterize the script executed by a Pig action in the Oozie workflow but got a “Main class [org.apache.oozie.action.hadoop.PigMain], exit code [2]” error.

    Here’s what my Pig script looks like:

    rows = LOAD ‘${input}’;

    And here’s what I set up Params in the Oozie workflow:

    input=${input}

    Any ideas why it didn’t work? Thanks for your answers.

    • Hue Team 2 years ago

      You need to look at the logs on the Oozie workflow and Pig action.

    • Henry 2 years ago

      After modifying the Pig script and Params setting in the Oozie workflow, I’ve solved the problem.

      Here’s what my Pig script looks like:

      rows = LOAD ‘$INPUT’

      and here’s what the Params setting in the Oozie workflow looks like:

      argument -param
      argument INPUT=${input}

  5. Vivek 1 year ago

    I am new to Hadoop. I am trying to create a workflow in HUE using a Pig script. The script works properly and gives output when run from the Query Editor. But when I adds the same script in the Workflow and submits, it shows in Job Browser that a oozie workflow is being created. After that in the Workflow editor window it status as “Kill” and also the output is not getting generated. Can you please help me?
    Thanks in advance!

    • Author
      Hue Team 1 year ago

      What do you see in the logs of the jobs?

      • Vivek 1 year ago

        I am getting below errors in the logs..

        2016-08-03 20:21:56,372 WARN PigActionExecutor:523 – SERVER[quickstart.cloudera] USER[cloudera] GROUP[-] TOKEN[] APP[Pig_EmpFilter_WorkFlow] JOB[0000007-160731144110844-oozie-oozi-W] ACTION[[email protected]] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.PigMain], exit code [2]

        2016-08-03 20:21:57,858 INFO ActionEndXCommand:520 – SERVER[quickstart.cloudera] USER[cloudera] GROUP[-] TOKEN[] APP[Pig_EmpFilter_WorkFlow] JOB[0000007-160731144110844-oozie-oozi-W] ACTION[[email protected]] ERROR is considered as FAILED for SLA

        • Author
          Hue Team 1 year ago

          Could you look in Job Browser for the jobs of the Pig program?

          • Vivek 1 year ago

            I have looked into Job Browser but there is no data for Pig Program. It shows only oozie task data and shows status as Succeeded.

  6. Vivek 1 year ago

    Hey.. I kind of found a workaround.. I re extracted the CDH 5.7 machine’s zip and not facing this issue in the new extracted machine.. But don’t know why it was with the old one..
    Thanks for the response.

  7. Jenny 1 year ago

    One suggestion, could you have a native English speaker to record the video? It’s hard for non-English speaker to understand someone with strong accent even though his voice is nice.

  8. Adam 1 year ago

    Where is the “import action” button like in the old editor. I am trying to import a job from the job designer and I cannot find a way to do this in the new editor. I can only pick workflows from the sub workflow button. The old editor allowed me to import a job from the job designer.

    • Author
      Hue Team 1 year ago

      It will come back officially in Hue 4 with the new Editor and Workflow manager.

  9. Devyani 1 year ago

    is there any video to demonstrate how to set up workflows for pig scripts using Oozie in hue? I have to run pig scripts in series and after which need to run hive scripts in parallel. I could make out having hive scripts but in my case hive scripts use output datasets from pig scripts so trying to find out. I am using CDH 4.7.0

    • Author
      Hue Team 1 year ago

      Not yet, but it will be soon possible to just drag & drop saved Pig scripts into a workflow, similarly to gethue.com/drag-drop-saved-hive-queries-into-your-workflows/

  10. naveen 10 months ago

    How to create parameterized workflows for different actions? Could you please provide me some examples

    • Author
      Hue Team 10 months ago

      Did you have a look at the ${output} parameter in the video?

  11. yuvakumar r 8 months ago

    Hi, I am unable to create new CO-ORDINATOR, or run existing CO-ORDINATORs from Oozie. It was working till yesterday. Am getting the error “OSError at /oozie/editor/coordinator/edit/ [Errno 28] No space left on device” & “OSError at /oozie/editor/coordinator/new/ [Errno 28] No space left on device” for existing co-ordinator or when trying to create new co-ordinator. Can you please suggest solutions.

    Thanks

    • Author
      Hue Team 8 months ago

      You have no space left on the device where Hue is installed/running from, as the error says.

      • yuvakumar r 8 months ago

        Hi thanks, the hue is installed in one of the node on aws cluster. Am able to run workflows, other hive jobs etc. only am facing issue with co-ordinator? Can you suggest me what I can do to resolve this..

        Thanks

        • Author
          Hue Team 8 months ago

          That’s very odd… Are you sure there is space left on the volume of the node where Hue is installed? Or where the Hue database is installed?

      • yuvakumar r 8 months ago

        Also, can you tell me how to check the total disk space and free space available? I tried df -i command, and it shows 75 % space available? Please find herewith the error trace from hue page:

        OSError at /oozie/editor/coordinator/new/
        [Errno 28] No space left on device
        Request Method: GET
        Request URL: http://xx.xx.xx.xx:8888/oozie/editor/coordinator/new/?workflow=0e0e1fc1-cd1d-30f5-0704-7c64faf6261a
        Django Version: 1.6.10
        Exception Type: OSError
        Exception Value:
        [Errno 28] No space left on device
        Exception Location: /usr/local/hue/build/env/lib/python2.6/site-packages/Mako-0.8.1-py2.6.egg/mako/template.py in _compile_module_file, line 674
        Python Executable: /usr/local/hue/build/env/bin/python2.6
        Python Version: 2.6.6
        Python Path:
        [‘/usr/local/hue/build/env/bin’,
        ‘/usr/local/hue/apps/about/src’,
        ‘/usr/local/hue/apps/beeswax/gen-py’,…….

        Thanks for your help

        • Author
          Hue Team 8 months ago

          You can also do df -h and look for either ‘/’ or ‘/usr’ mounts. Another way of checking is to create a file in /usr/local/hue/ and write something to it, like dd if=/dev/urandom of=file.txt bs=1048576 count=10

  12. yuvakumar r 8 months ago

    Looks like the used % is 100% and has only 5.5Mb, if am correct. Can you please help how or which files that I can remove, with the locations..

    Thank you and appreciate your prompt reply

    Thanks

    • Author
      Hue Team 8 months ago

      Glad you found that out! For which ones to remove, that you have to investigate yourself doing something like https://www.cyberciti.biz/faq/how-do-i-find-the-largest-filesdirectories-on-a-linuxunixbsd-filesystem/ and cleaning up accordingly… and sorry we cannot help you further on this issue, good luck!

      • yuvakumar r 8 months ago

        Hi, thank you very much for your support, I found that there are some files which were in deleted state, but was occupying space in HUE / supervisor, so I manually killed those process using kill -9 , now am able to create / access co-ordinators.

        Commands for deleted files : sudo lsof +L1
        Thanks again

        Yuva

  13. Hemant Jain 8 months ago

    Hello Sir/Mam,

    I have installed hue on my local ubantu system and installed hadoop muti cluster system on two system.

    Hadoop Version : 2.7.3
    Hue Version : 3.12.0
    Ozzie Version : 4.3.0

    I am facing issue when I am running sqoop job process from mysql to import data from hdfs file. I am getting following error.

    Caused by: java.net.NoRouteToHostException: No Route to Host from Developer4/127.0.0.1 to cm:10020 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see: http://wiki.apache.org/hadoop/NoRouteToHost

    HDFS url hdfs://master:9000

    My /etc/hosts file like

    192.168.1.149 master
    127.0.0.1 developer4
    192.168.1.161 slave

    Please suggest me where I am doing wrong even ozzie command for start and stop command work properly on command line.

  14. Gourav 7 months ago

    I have existing Oozie job zip file.
    1. Workflow.xml (it has two actions one is java action and other is hive, Hive action is expecting output from first action as parameters)
    2. coordinator.xml
    3. process.hql
    4. lib directory (it has jar file).
    Please suggest how can I create a job in Hue console using existing workflow.xml.

    Thanks in advance.
