Season II: 5. Bundle Oozie coordinators with Hue

Hue provides a great Oozie UI that lets you use Oozie without typing any XML. In Tutorial 3, we demonstrated how to use an Oozie coordinator to schedule a daily top 10 of restaurants. Now let's imagine that we want to compute both a top 10 and a top 100. How can we do this? One solution is to use Oozie bundles.

Workflow and Coordinator updates

Bundles are a way to group coordinators together into a set. This set is easier to manage as a single unit and can be parameterized too.

 

The first step is to replace 10 by a variable ${n} in our Hive script:

CREATE TABLE top_cool AS
SELECT r.business_id, name, SUM(cool) AS coolness, '${date}' as `date`
FROM review r JOIN business b
ON (r.business_id = b.business_id)
WHERE categories LIKE '%Restaurants%'
AND `date` = '${date}'
GROUP BY r.business_id, name
ORDER BY coolness DESC
LIMIT ${n}

Then, in the workflow, we add a parameter to the Hive action: n=${n}. You can test the workflow by submitting it and providing 10 for the value of n.
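
For readers curious about what the editor produces, here is a rough sketch of a workflow definition with such a Hive action. It is an illustration, not the exact XML that Hue generates: the workflow and action names, the script file name top_cool.hql, the schema versions and the ${jobTracker} / ${nameNode} variables are all assumptions. A real Hive action typically also needs a reference to hive-site.xml, which is left out here.

```xml
<!-- Sketch only: names, script path and schema versions are assumptions -->
<workflow-app name="top_cool" xmlns="uri:oozie:workflow:0.4">
    <start to="hive-top"/>
    <action name="hive-top">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- Hypothetical file holding the Hive script shown above -->
            <script>top_cool.hql</script>
            <!-- Each param becomes a variable usable in the script -->
            <param>date=${date}</param>
            <param>n=${n}</param>
        </hive>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The ${n} on the right-hand side of the param is resolved by Oozie from the job properties at submission time, which is why the submit dialog asks for a value of n.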

 

We now need to tell the Coordinator to supply a value for n. For testing purposes, going to Step #5 of the editor and adding a ‘Workflow properties’ entry named ‘n’ with value ‘10’ would produce the same result as in Tutorial 1. In practice, these properties are mostly used for entering constants and EL functions that directly provide a value to the workflow.
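
As a hedged illustration, a ‘Workflow properties’ entry corresponds to a property in the configuration block of the coordinator's workflow element. The sketch below assumes a daily frequency, and its ${start_date}, ${end_date} and ${wf_application_path} variables are hypothetical placeholders; the date handling from Tutorial 3 is left out.

```xml
<!-- Sketch only: variables and schema version are assumptions -->
<coordinator-app name="daily_top" frequency="${coord:days(1)}"
                 start="${start_date}" end="${end_date}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">
    <action>
        <workflow>
            <app-path>${wf_application_path}</app-path>
            <configuration>
                <!-- The date property from Tutorial 3 is omitted here -->
                <property>
                    <!-- Constant handed to the workflow as ${n} -->
                    <name>n</name>
                    <value>10</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
```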

 

Bundle Editor

Let's create a new Bundle named ‘daily_tops’ with a kickoff date of 20121201. On the left panel, click on ‘Add’ in the Coordinator section. Select our ‘daily_top’ coordinator and add a property named ‘n’ with value ‘10’.

 

Add the same coordinator again and this time pick ‘10’ for the value of ‘n’. Repeat with ‘n’ set to ‘100’.
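
For reference, a bundle assembled this way corresponds roughly to the definition sketched below. This is an assumption-laden illustration rather than Hue's actual output: it shows one entry for a top 10 and one for a top 100, and the entry names, the ${coordinator_path} variable, the kick-off time format and the schema version are hypothetical.

```xml
<!-- Sketch only: entry names, path variable and schema version are assumptions -->
<bundle-app name="daily_tops" xmlns="uri:oozie:bundle:0.2">
    <controls>
        <kick-off-time>2012-12-01T00:00Z</kick-off-time>
    </controls>
    <!-- Same coordinator application, parameterized with n = 10 -->
    <coordinator name="daily_top_10">
        <app-path>${coordinator_path}</app-path>
        <configuration>
            <property>
                <name>n</name>
                <value>10</value>
            </property>
        </configuration>
    </coordinator>
    <!-- Same coordinator application, parameterized with n = 100 -->
    <coordinator name="daily_top_100">
        <app-path>${coordinator_path}</app-path>
        <configuration>
            <property>
                <name>n</name>
                <value>100</value>
            </property>
        </configuration>
    </coordinator>
</bundle-app>
```

Each coordinator entry points at the same application and differs only in its configuration, which is what makes the set manageable as one unit.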

 

Bundle Dashboard

You are now ready to submit the bundle! You can follow the overall progress in the Bundle dashboard. Bundles can be stopped, killed and re-run. Clicking on an instantiation links to the corresponding coordinator, which in turn links to its generated workflows.

 

Sum-up

Of course, more efficient solutions exist than those in our simplified example. In practice, Bundles are great for parameterizing non-date variables like market names (e.g. US, France). Another use case is to group together a series of coordinators in order to make them easier to manage (e.g. start, stop, re-run). Note that the video was recorded with the latest version of Hue, which includes HUE-1546.

 

Hue comes with a full set of Workflow/Coordinator/Bundle examples, ready to be submitted or copied. Hue can even be used with only its Oozie UI Dashboard, making it a breeze to manage Oozie in your browser.

Next, we will see how to browse our Yelp data in HBase! As usual, feel free to comment on the hue-user list or @gethue!
