How-to: Import a Pre-existing Oozie Workflow into Hue

Published on 20 January 2013 - 5 minutes read - Last modified on 19 April 2021

Hue is an open-source web interface for Apache Hadoop, packaged with CDH, that focuses on improving the overall experience for the average user. The Apache Oozie application in Hue provides an easy-to-use interface for building workflows and coordinators. Basic management of workflows and coordinators is available through the dashboards, with operations such as killing, suspending, or resuming a job.

Prior to Hue 2.2 (included in CDH 4.2), there was no way to manage, within Hue, workflows that were created outside of Hue. As of Hue 2.2, a pre-existing Oozie workflow can be imported by its XML definition.

## How to Import a Workflow

Importing a workflow is pretty straightforward. All it requires is the workflow definition file and access to the Oozie application in Hue. Follow these steps to import a workflow:

  1. Go to Oozie Editor/Dashboard > Workflows and click the “Import” button.
  2. Provide, at minimum, a name and a workflow definition file.
  3. Click “Save”. This redirects you to the workflow builder, where a blue message near the top reads “Workflow imported”.

[<img alt="" src="" width="600" height="212" />][6]

## How It Works

The definition file describes a workflow well enough for Hue to infer its structure. It also provides the majority of the attributes associated with a node, with the exception of some resource references. Resource reference handling is detailed in the following paragraphs.

A workflow is imported into Hue by uploading its XML definition. Its nodes are transformed into Django serialized objects, which Hue then loads:

[<img alt="" src="" width="600" height="88" />][7]

**Workflow transformation pipeline (Without hierarchy resolution)**
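The transformation step of this pipeline can be illustrated in a few lines of Python using only the standard library. This is a minimal sketch, not Hue's implementation (Hue performs this step with XSLT): it walks the children of `workflow-app` and emits one object per node in the `<django-objects>`/`<object>`/`<field>` layout that Django's XML serializer uses. The `oozie.<node type>` model labels, the placeholder primary key, and the single `name` field are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Control-flow skeleton of the fs-test workflow, trimmed for the example.
WORKFLOW_XML = """<workflow-app xmlns="uri:oozie:workflow:0.4" name="fs-test">
  <start to="Fs" />
  <action name="Fs">
    <fs><mkdir path="/tmp/testfs" /></fs>
    <ok to="end" />
    <error to="kill" />
  </action>
  <kill name="kill"><message>Action failed</message></kill>
  <end name="end" />
</workflow-app>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def node_model(node):
    """Hypothetical label: 'oozie.<action type>' for actions, 'oozie.<tag>' otherwise."""
    kind = local_name(node.tag)
    if kind == "action":
        # The action type is the first child that is not a transition link.
        for child in node:
            if local_name(child.tag) not in ("ok", "error"):
                kind = local_name(child.tag)
                break
    return "oozie." + kind


def to_serialized_objects(definition):
    """Transform each workflow node into a Django-style serialized <object>."""
    root = ET.fromstring(definition)
    out = ET.Element("django-objects", version="1.0")
    for node in root:
        # Placeholder primary key (assumption); real keys come from saving.
        obj = ET.SubElement(out, "object", model=node_model(node), pk="0")
        field = ET.SubElement(obj, "field", name="name", type="CharField")
        # <start> has no name attribute in Oozie, so fall back to its tag.
        field.text = node.get("name", local_name(node.tag))
    return out


serialized = to_serialized_objects(WORKFLOW_XML)
print(ET.tostring(serialized, encoding="unicode"))
```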

**Workflow Definitions Transformation**

Workflow definitions follow a general form, which makes them easy to transform. There are several kinds of nodes, each with its own representation, but they share patterns that simplify the task of transforming the definition XML:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.4" name="fs-test">
  <start to="Fs" />
  <action name="Fs">
    <fs>
      <delete path="${nameNode}${output}/testfs" />
      <mkdir path="${nameNode}${output}/testfs" />
      <mkdir path="${nameNode}${output}/testfs/source" />
      <move source="${nameNode}${output}/testfs/source" target="${nameNode}${output}/testfs/renamed" />
      <chmod path="${nameNode}${output}/testfs/renamed" permissions="700" dir-files="false" />
      <touchz path="${nameNode}${output}/testfs/new_file" />
    </fs>
    <ok to="end" />
    <error to="kill" />
  </action>
  <kill name="kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end" />
</workflow-app>
```

Nodes are children of the root element `workflow-app`. Each node has a unique representation that differs from the others at least in its name. Every action is defined by an `action` element with a unique name. Its immediate children are the action-type element (such as `fs`) and the transition links (`ok` and `error`). The children of the action-type element are the various properties of the action. The `start`, `end`, `fork`, `decision`, `join`, and `kill` nodes each have their own transformation, whereas actions are transformed using a general Extensible Stylesheet Language Transformation, or [XSLT][8].
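To make this node layout concrete, here is a short Python sketch (standard library only) that reads a workflow the way the paragraph above describes: control nodes are recognized by tag, and for an `action` element the name comes from its attribute, the first non-link child is the action type, and that element's children are the action's properties. The helper `describe_action` and its returned dictionary are hypothetical, not part of Hue.

```python
import xml.etree.ElementTree as ET

ACTION_XML = """<workflow-app xmlns="uri:oozie:workflow:0.4" name="fs-test">
  <start to="Fs" />
  <action name="Fs">
    <fs>
      <delete path="${nameNode}${output}/testfs" />
      <mkdir path="${nameNode}${output}/testfs" />
      <touchz path="${nameNode}${output}/testfs/new_file" />
    </fs>
    <ok to="end" />
    <error to="kill" />
  </action>
  <end name="end" />
</workflow-app>"""

LINK_TAGS = {"ok", "error"}
# Control nodes that get their own transformation rather than an action XSLT.
CONTROL_NODES = {"start", "end", "fork", "decision", "join", "kill"}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def describe_action(action):
    """Return the name, type, and property elements of one <action> node."""
    type_element = next(c for c in action if local_name(c.tag) not in LINK_TAGS)
    return {
        "name": action.get("name"),
        "type": local_name(type_element.tag),
        "properties": [(local_name(p.tag), dict(p.attrib)) for p in type_element],
    }


root = ET.fromstring(ACTION_XML)
for node in root:
    kind = local_name(node.tag)
    if kind in CONTROL_NODES:
        print("control node:", kind)
    elif kind == "action":
        print("action:", describe_action(node))
```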

Most attributes are not unique to a single action type. For instance, the Hive and Sqoop actions both have the `prepare` attribute. Hue provides an XSLT for every action type, but each one only imports the shared attribute transformations and defines transformations for the attributes unique to that action. In the XSLT below, the Sqoop action is defined by importing all of the general fields and defining the Sqoop-specific fields:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:workflow="uri:oozie:workflow:0.1"
                xmlns:sqoop="uri:oozie:sqoop-action:0.2"
                version="1.0"
                exclude-result-prefixes="workflow sqoop">

  <!-- Shared field transformations, reused by other action types -->
  <xsl:import href="../nodes/fields/archives.xslt" />
  <xsl:import href="../nodes/fields/files.xslt" />
  <xsl:import href="../nodes/fields/job_properties.xslt" />
  <xsl:import href="../nodes/fields/job_xml.xslt" />
  <xsl:import href="../nodes/fields/params.xslt" />
  <xsl:import href="../nodes/fields/prepares.xslt" />

  <xsl:template match="sqoop:sqoop">
    <object model="oozie.sqoop" pk="0">
      <xsl:call-template name="archives" />
      <xsl:call-template name="files" />
      <xsl:call-template name="job_properties" />
      <xsl:call-template name="job_xml" />
      <xsl:call-template name="params" />
      <xsl:call-template name="prepares" />
      <!-- Sqoop-specific field: the command element becomes script_path -->
      <field name="script_path" type="CharField">
        <xsl:value-of select="*[local-name()='command']" />
      </field>
    </object>
  </xsl:template>

  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" />
</xsl:stylesheet>
```

The XSLT above imports transformation definitions for the archives, files, job properties, job XML, params, and prepares attributes. If a Sqoop action definition were transformed by this XSLT, the resulting XML would take the following form:

```xml
<object model="oozie.sqoop" pk="0">
  <field name="archives" type="TextField">…</field>
  <field name="files" type="TextField">…</field>
  <field name="job_properties" type="TextField">…</field>
  <field name="job_xml" type="TextField">…</field>
  <field name="params" type="TextField">…</field>
  <field name="prepares" type="TextField">…</field>
  <field name="script_path" type="CharField">…</field>
</object>
```

**Workflow Structure Resolution**

The structure of the workflow is created after the nodes are imported. Internally, the workflow hierarchy is represented as a set of “links” between nodes. The workflow definition references the next nodes in the graph through the `ok`, `error`, and `start` tags, and these references are used to create transitions. The following snippet shows a transition to a node named `end` and an error transition to a node named `kill`:

```xml
<ok to="end" />
<error to="kill" />
```
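The link-building step can be sketched the same way. This Python example (standard library only) collects `(source, destination, kind)` tuples from the `start`, `ok`, and `error` tags described above. The tuple shape and the `resolve_links` name are assumptions for illustration, and transitions for `fork`, `join`, and `decision` nodes, which are expressed with other tags, are deliberately left out.

```python
import xml.etree.ElementTree as ET

WORKFLOW_XML = """<workflow-app xmlns="uri:oozie:workflow:0.4" name="fs-test">
  <start to="Fs" />
  <action name="Fs">
    <fs><mkdir path="/tmp/testfs" /></fs>
    <ok to="end" />
    <error to="kill" />
  </action>
  <kill name="kill"><message>Action failed</message></kill>
  <end name="end" />
</workflow-app>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def resolve_links(definition):
    """Return (source, destination, kind) tuples for start/ok/error transitions."""
    root = ET.fromstring(definition)
    links = []
    for node in root:
        kind = local_name(node.tag)
        if kind == "start":
            # <start> has no name; its 'to' attribute points at the first node.
            links.append(("start", node.get("to"), "start"))
        elif kind == "action":
            for child in node:
                link_kind = local_name(child.tag)
                if link_kind in ("ok", "error"):
                    links.append((node.get("name"), child.get("to"), link_kind))
    return links


for source, destination, kind in resolve_links(WORKFLOW_XML):
    print(f"{source} --{kind}--> {destination}")
```

Because every link refers to its destination only by name, resolving links after all nodes have been imported means each destination already exists when its link is created.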

Workflow definitions do not contain resources, such as the jar file used when running a MapReduce action. Because this information is not in the workflow definition, Hue intentionally leaves it out of the transformation, so users must update any resource-specific information within the imported actions.

[<img alt="" src="" width="600" height="684" />][9]

**An imported workflow. Note that its resource information is missing.**
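To see in advance which imported actions will need resource details filled in, the definition can be scanned for them. The Python sketch below (standard library only) flags `map-reduce` actions, since the jar they run is not part of the workflow definition. Treating only `map-reduce` as resource-dependent, the sample workflow, and the `actions_needing_resources` name are simplifying assumptions for the example, not Hue behavior.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-action workflow: a MapReduce job followed by an fs cleanup.
WORKFLOW_XML = """<workflow-app xmlns="uri:oozie:workflow:0.4" name="mr-test">
  <start to="wordcount" />
  <action name="wordcount">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
    </map-reduce>
    <ok to="cleanup" />
    <error to="kill" />
  </action>
  <action name="cleanup">
    <fs><delete path="${nameNode}/tmp/wc" /></fs>
    <ok to="end" />
    <error to="kill" />
  </action>
  <kill name="kill"><message>Action failed</message></kill>
  <end name="end" />
</workflow-app>"""

# Action types whose resources (e.g. the MapReduce jar) are not in the
# definition; limiting this set to map-reduce is an assumption.
RESOURCE_ACTIONS = {"map-reduce"}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def actions_needing_resources(definition):
    """List (action name, action type) pairs that need resources supplied by hand."""
    root = ET.fromstring(definition)
    flagged = []
    for node in root:
        if local_name(node.tag) != "action":
            continue
        action_type = local_name(node[0].tag)  # first child is the action type
        if action_type in RESOURCE_ACTIONS:
            flagged.append((node.get("name"), action_type))
    return flagged


print(actions_needing_resources(WORKFLOW_XML))
```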

## Summary and Next Steps

Hue can manage workflows with its dynamic [workflow builder][10] and now, officially, can import predefined workflows into its system. Another benefit of parsing the XML definition is that it enables all workflows to be displayed as a graph in the dashboard:

[<img alt="" src="" width="600" height="307" />][11]

**Dashboard graph of an imported workflow**

The workflow import process is good, but not perfect yet. Ideally, resources would be located on the system and validated before import, or could optionally be [provided][12].

Have any suggestions? Feel free to tell us what you think via [hue-user][13].
