Apache Pig Editor in Hue 2.3

In the previous installment of the demo series about Hue — the open source Web UI that makes Apache Hadoop easier to use — you learned how to analyze data with Hue using Apache Hive via Hue’s Beeswax and Catalog applications. In this installment, we’ll focus on using the new editor for Apache Pig in Hue 2.3.

Complementing the Hive and Cloudera Impala editors, the Pig editor is a convenient starting point for exploring data and interacting with Hadoop. This new application lets you write, edit, and run Pig scripts interactively from the browser. Features include:

  • Support for UDFs and for parameters with default values
  • Autocompletion of Pig keywords, aliases, and HDFS paths
  • Syntax highlighting
  • One-click script submission
  • Progress, result, and logs display
  • Interactive single-page application
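As a rough illustration of the first feature, here is a small self-contained Pig Latin sketch that is not taken from the demo. In Pig Latin itself, a parameter is referenced as $NAME and can be given a default with %default, while DEFINE binds a short alias to a UDF class. The input path /tmp/business_cities.tsv, its two-column tab-separated layout, and the parameter names INPUT and TOP_N are hypothetical; UPPER is one of Pig's built-in functions, used here so that no extra jar has to be registered.

```pig
-- Hypothetical parameters: INPUT and TOP_N are illustrative names.
-- %default supplies the value used when none is given at submission time.
%default INPUT '/tmp/business_cities.tsv'
%default TOP_N 5

-- DEFINE gives a short alias to a UDF class; UPPER ships with Pig.
-- A custom UDF packaged in a jar would first need a REGISTER statement.
DEFINE ToUpper org.apache.pig.builtin.UPPER();

-- Assumed input: tab-separated lines of (name, city), read by the default PigStorage loader.
business = LOAD '$INPUT' AS (name: CHARARRAY, city: CHARARRAY);

-- Normalize city names so that "Phoenix" and "PHOENIX" fall into the same group.
normalized = FOREACH business GENERATE name, ToUpper(city) AS city;

business_group = GROUP normalized BY city;
business_by_city = FOREACH business_group GENERATE group AS city, COUNT(normalized) AS ct;

top = ORDER business_by_city BY ct DESC;

-- $TOP_N is substituted textually before the script is parsed.
top_n = LIMIT top $TOP_N;

DUMP top_n;
```

On the command line, running pig -param TOP_N=10 script.pig (script.pig being a placeholder file name) would override the default of 5.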

Here’s a short video demoing its capabilities and ease of use:

The demo data comes from the previous Hive and Metastore demo, which produced the cleaned business file used here.

Here is the Pig script used and explained in this demo. It loads the Yelp business file that was converted in the previous demo, counts the businesses in each city, and returns the 25 cities with the most businesses:

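-- Load the cleaned Yelp business file produced in the previous demo.
-- No USING clause, so Pig's default loader (PigStorage, tab-delimited) is used.
business =
  LOAD '/user/hive/warehouse/business/yelp_academic_dataset_business_clean.json'
  AS (business_id: CHARARRAY, categories: CHARARRAY, city: CHARARRAY, full_address: CHARARRAY,
      latitude: FLOAT, longitude: FLOAT, name: CHARARRAY, neighborhoods: CHARARRAY,
      open: BOOLEAN, review_count: INT, stars: FLOAT, state: CHARARRAY, type: CHARARRAY);

-- One group (a bag of businesses) per city.
business_group =
  GROUP business
  BY city;

-- Number of businesses in each city.
business_by_city =
  FOREACH business_group
  GENERATE group, COUNT(business) AS ct;

-- Cities with the most businesses first.
top =
  ORDER business_by_city
  BY ct DESC;

-- Keep the first 25 rows and print them.
top_25 = LIMIT top 25;

DUMP top_25;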
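The script above ranks cities by how many businesses they contain and never reads review_count. If the goal is instead to rank individual restaurants by number of reviews, a sketch along the following lines could work. It assumes, without having checked the data, that every restaurant has the string Restaurants somewhere in its categories field. MATCHES is Pig's regular-expression operator and must match the whole string, hence the leading and trailing .* in the pattern.

```pig
-- Variant sketch: rank individual restaurants by review_count.
-- Assumption: restaurants carry the substring 'Restaurants' in categories.
business =
  LOAD '/user/hive/warehouse/business/yelp_academic_dataset_business_clean.json'
  AS (business_id: CHARARRAY, categories: CHARARRAY, city: CHARARRAY, full_address: CHARARRAY,
      latitude: FLOAT, longitude: FLOAT, name: CHARARRAY, neighborhoods: CHARARRAY,
      open: BOOLEAN, review_count: INT, stars: FLOAT, state: CHARARRAY, type: CHARARRAY);

-- Rows whose categories field is null evaluate to null here and are dropped.
restaurants =
  FILTER business
  BY categories MATCHES '.*Restaurants.*';

-- Keep only the columns needed for the ranking.
restaurant_reviews =
  FOREACH restaurants
  GENERATE name, city, review_count;

-- Most reviewed first.
most_reviewed =
  ORDER restaurant_reviews
  BY review_count DESC;

top_25_restaurants = LIMIT most_reviewed 25;

DUMP top_25_restaurants;
```

Restaurants with the same review_count may come back in any order; adding a second sort key (for example BY review_count DESC, name) would make the output deterministic.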

What’s Next?

New features like support for Python UDFs and better integration with Apache Oozie and File Browser are on the way. As usual, we welcome all feedback!


  1. Savitri 3 years ago

    WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher – Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
    2016-08-09 07:06:23,505 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher – job job_1470662351891_0140 has failed! Stop running all dependent jobs
    2016-08-09 07:06:23,505 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher – 100% complete
    2016-08-09 07:06:23,581 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats – ERROR: Deserialization error: Cannot instantiate: org.apache.pig.piggybank.evaluation.xml.XPathAll
    2016-08-09 07:06:23,581 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil – 1 map reduce job(s) failed!
    2016-08-09 07:06:23,582 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats – Script Statistics:

    I am getting the above error while executing a Pig script from the Pig editor in Hue 3.10.

    Please help!

    • Author
      Hue Team 3 years ago

      You would need to look at the log of the specific MapReduce job spawned by Pig (job_1470662351891_0140 in the output above).
