Filter, Sort and Browse Hive Partitions with Hue’s Metastore App

Filter, Sort and Browse Hive Partitions with Hue’s Metastore App

With the latest Hue release, the metastore offers better controls over partitioned Hive tables. Partitioning Hive tables is a great strategy to improve query performance for Hive-managed tables by avoiding full-table scans.

However, partitioning is also useful for external tables where the data may already reside in HDFS or be managed by a service other than Hive. In these cases, the partition location may not conform to the default dynamic Hive partition location (which takes the table’s base location and appends a name=value path token for each partition), and can take any valid data path as the location for the partition.

Take for example an external table called “blog” created with the following partition scheme:

CREATE TABLE blog (title STRING, body STRING, pubdate DATE) PARTITIONED BY (dy STRING, dm STRING, dd STRING, dh STRING);

We can continue to alter the table as needed to add data at specific partition locations:

ALTER TABLE blog ADD PARTITION (dy='2015', dm='2015-01', dd='2015-01-01', dh='2015-01-01 00') LOCATION '/user/jennykim/2015/01/01/00';
ALTER TABLE blog ADD PARTITION (dy='2015', dm='2015-01', dd='2015-01-01', dh='2015-01-01 12') LOCATION '/user/jennykim/2015/01/01/12';
ALTER TABLE blog ADD PARTITION (dy='2015', dm='2015-01', dd='2015-01-02', dh='2015-01-02 00') LOCATION '/user/jennykim/2015/01/02/00';
ALTER TABLE blog ADD PARTITION (dy='2015', dm='2015-01', dd='2015-01-02', dh='2015-01-02 12') LOCATION '/user/jennykim/2015/01/02/12';
ALTER TABLE blog ADD PARTITION (dy='2015', dm='2015-01', dd='2015-01-03', dh='2015-01-03 00') LOCATION '/user/jennykim/2015/01/03/00';
ALTER TABLE blog ADD PARTITION (dy='2015', dm='2015-01', dd='2015-01-03', dh='2015-01-03 12') LOCATION '/user/jennykim/2015/01/03/12';
ALTER TABLE blog ADD PARTITION (dy='2015', dm='2015-01', dd='2015-01-04', dh='2015-01-04 00') LOCATION '/user/jennykim/2015/01/04/00';
ALTER TABLE blog ADD PARTITION (dy='2015', dm='2015-01', dd='2015-01-04', dh='2015-01-04 12') LOCATION '/user/jennykim/2015/01/04/12';

Regardless of a table’s partition locations, Hue’s metastore now enables you to browse all the partitions in the table, by clicking the “Show Partitions” link from the table view. By default, the partitions view will sort the partitions in reverse order by name (or newest first, if partitioned by date) and display the first 250 partitions.

If you’re searching for a specific set of partitions however, you can now filter on a partition value by clicking the “Add a filter” link, selecting and specifying the filter parameter, then clicking the “Filter” button! Note that multiple partition filters can be added as needed to refine your query, and you can also disable the default sort order to retrieve partition results in alphabetically ascending order.

Filter, Sort and Browse Hive Partitions with Hue’s Metastore from The Hue Team on Vimeo.

Finally, you can view the data files in any partition by clicking the “View Partition Files” link which will take you to the filebrowser for that partition’s location.

Metastore Partition View

Partition File Browser

Hue offers the flexibility to seamlessly work with your Hive data as-is. Feel free to comment on the hue-user list or @gethue!

0 Comments

Leave a reply

Your email address will not be published. Required fields are marked *

*