This post covers the new release of Hue, the open source web-based interface that makes Apache Hadoop easier to use. The release is included in CDH4.2.
Hue lets you interact with Hadoop services from within your browser without having to go to a command-line interface. It features a file browser for HDFS, an Apache Oozie Application for creating workflows of data processing jobs, a job designer/browser for MapReduce, Apache Hive and Cloudera Impala query editors, a Shell, and a collection of Hadoop APIs.
The goal of this release was to add a set of new features and improve the user experience. Read on for a list of the major changes (from 304 commits).
Oozie Application
With the Oozie Application you can chain jobs and schedule them repeatedly without having to write XML anymore. Workflow and Coordinator management got extra focus and now covers all of Oozie's functionality:
- The workflow editor supports Drag & Drop and was restyled.
- The coordinator page was redesigned with a wizard and data can be specified by range.
- The dashboard displays any workflow as a graph and refreshes itself dynamically.
- All the Oozie actions are supported (addition of Sub-workflow, DistCp, Email, Fs, Generic).
- Forks can be converted to decision nodes.
- A read-only user can access the dashboard.
- A workflow or a coordinator can be resubmitted from specific steps.
- Existing XML workflow definitions can be imported.
- The dashboard provides direct access to task logs of any action.
Drag & Drop Workflow Editor
Workflow Dashboard
Coordinator Wizard
Rerun a Workflow (left) or a Coordinator (right)
Beeswax/Hive Query Editor
A number of user experience improvements make it simpler to query your data with SQL:
- Multiple databases are supported (addressing one of the most popular requests, HUE-535).
- The query editor is bigger, has line numbers, and highlights the lines that contain errors.
- Running queries show their logs via Ajax and let you scroll through them.
- The query results page has a horizontal scroll bar and a quick column name lookup for jumping to a particular column when there are many.
Query Editor
Wide result page with column lookup
Impala Editor
Impala can now be queried from a new interface. More features will be supported once Impala reaches general availability (GA).
Cloudera Impala query
FileBrowser
FileBrowser lets you navigate and manage HDFS files in a UI. Its front end was totally redesigned and new filesystem operations were added. You do not need to use the hadoop fs command anymore!
- Bulk operations: delete multiple files, or change their permissions or owner, at once
- Bulk operations can be applied recursively or not (e.g. chmod a folder recursively)
- Upload archives (e.g. upload multiple files at once, like the Oozie sharelib)
- Create a file and edit it
Bulk editing
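To give a feel for what a bulk permission change involves underneath a UI, here is a minimal Python sketch that builds one WebHDFS SETPERMISSION request URL per selected path. It is an illustration only, not Hue's actual implementation: the host name, port, user, and the chmod_urls helper are all assumptions made for the example. Note that SETPERMISSION acts on a single path, so a recursive change would also require listing each directory's children.

```python
from urllib.parse import quote, urlencode

# Hypothetical NameNode WebHDFS endpoint; replace with your own host and port.
WEBHDFS_BASE = "http://namenode.example.com:50070/webhdfs/v1"


def chmod_urls(paths, octal_mode, user):
    """Build one WebHDFS SETPERMISSION URL per path (each is sent with HTTP PUT).

    chmod_urls is a hypothetical helper for this example, not a Hue function.
    """
    query = urlencode(
        {"op": "SETPERMISSION", "permission": octal_mode, "user.name": user}
    )
    # quote() keeps "/" intact and escapes characters such as spaces.
    return [f"{WEBHDFS_BASE}{quote(path)}?{query}" for path in paths]


# Example: the same mode applied to two selected directories.
urls = chmod_urls(["/user/bob/data", "/user/bob/logs"], "750", "bob")
for url in urls:
    print("PUT", url)
```

The sketch only constructs the URLs; sending them requires an HTTP client and a reachable cluster.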
JobBrowser
JobBrowser lists MapReduce jobs with their statistics and statuses. It was prettified and now supports MR2 jobs running over YARN:
- MR2 jobs and their logs can be browsed.
- Job logs can be accessed with one click.
- Other apps like Beeswax and Oozie can now show the MR2 logs.
MR1/MR2 job list and direct log access
UserAdmin
Groups and Hue permissions can be assigned to users through the UserAdmin application. Access to Hue applications can be customized for each user. (For example, Bob can only see and use the Oozie and Impala applications.) The application has been restyled and simplified, and is no longer accessible by default to non-superusers:
- HDFS home of first/new/imported users is created automatically.
- LDAP support now has wildcard search, user import by wildcard expression and group syncing by distinguished name.
Group permission editing
Desktop
Desktop is the core library of Hue and every application is built on top of it. In this release, the user experience has been improved with more informative errors (now with stack traces and line numbers) and new status messages (such as when critical services like Oozie are down).
On the technical side, users can now upload files to a federated cluster, some XSS vulnerabilities were fixed, and database greenlet support was introduced for better performance. Hue now fully supports transactional databases like MySQL InnoDB and PostgreSQL.
Hue is also internationalized and available in English, French, German, Spanish, Portuguese (and Brazilian Portuguese), Korean, Japanese and simplified Chinese.
Conclusion
With this 2.2 release, a big part of the Hadoop user experience gap has been filled.
The next 2.3 release will target users who wish to better leverage the multiple query solutions in CDH (Beeswax/Hive, Impala, and Pig). A new document model (HUE-950) would make each query (e.g. a Hive query) searchable and shareable with your colleagues, or importable into an Oozie workflow without any duplication. A trash and a source control versioning system (HUE-951) are also under discussion, as is Oozie bundle integration (HUE-869).
Many past features and bugs were discussed on the hue-user list, so feel free to chime in! A Hue meetup group was created and it would be a pleasure to meet you in person and see how analyzing your data with Hadoop could be made easier.