Install Hue 3 on Pivotal HD 3.0

This post was originally published as "Install Hue 3 with Pivotal HD 3.0" by Christian Tzolov of @Pivotal.

The latest Hadoop distributions from Pivotal (PHD3.0) and Hortonworks (HDP2.2) ship with support for Hue 2.6. Unfortunately, Hue 2.6 is quite old and does not provide an RDBMS UI. The RDBMS view is useful for Pivotal because it offers a friendly web interface for running ad hoc HAWQ SQL queries. This feature is demoed in the video below:

[Embedded video: the Hue RDBMS view running ad hoc HAWQ queries]

Below I will show how to install the latest Hue 3.7.1 on PHD3.0 and how to use it with HAWQ.

Disclaimer: This is experimental work. It is not thoroughly tested and will not be supported in the future. The article expresses the author's own opinion.

 

Installing Hue 3.7.1 on Pivotal HD 3.0

The following Hue 3.7.1 RPMs are built with Apache BigTop. You can download a compressed bundle with all Hue RPMs: hue-all-3.7.1-1.el6.x86_64.zip. It contains the following packages:
hue-common
hue-spark – requires additional services
hue-server
hue-sqoop
hue-rdbms
hue-doc
hue-pig
hue-search
hue-zookeeper
hue-beeswax
hue-hbase
hue-impala – required due to issue HUE-2492


Uncompress the hue-all.zip bundle on the Ambari node (a sketch follows) and install the packages as explained below. For some packages the dependency check needs to be disabled (e.g. use rpm -i with the --nodeps option).
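
A minimal unpacking sketch (the extraction path is an assumption; the bundle may also extract directly into the current directory):

# Unpack the downloaded bundle on the Ambari node
cd /tmp
unzip hue-all-3.7.1-1.el6.x86_64.zip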


# External dependencies
sudo yum -y install cyrus-sasl-gssapi cyrus-sasl-plain libxml2 libxslt zlib python sqlite python-psycopg2
# Hue packages
sudo yum -y install ./hue-common-3.7.1-1.el6.x86_64.rpm
sudo yum -y install ./hue-server-3.7.1-1.el6.x86_64.rpm
sudo yum -y install ./hue-rdbms-3.7.1-1.el6.x86_64.rpm
sudo yum -y install ./hue-zookeeper-3.7.1-1.el6.x86_64.rpm
sudo yum -y install ./hue-pig-3.7.1-1.el6.x86_64.rpm
sudo yum -y install ./hue-hbase-3.7.1-1.el6.x86_64.rpm
sudo yum -y install ./hue-beeswax-3.7.1-1.el6.x86_64.rpm
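# If the beeswax dependency check fails, fall back to rpm with --nodeps, e.g.: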
(sudo rpm -i --nodeps hue-beeswax-3.7.0+cdh5.3.3+180-1.cdh5.3.3.p0.8.el6.x86_64.rpm)
sudo yum -y install ./hue-sqoop-3.7.1-1.el6.x86_64.rpm
sudo yum -y install ./hue-impala-3.7.1-1.el6.x86_64.rpm


Start Hue:
sudo /etc/init.d/hue start
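
To verify that Hue started, a quick check (assumes Hue's default port 8888):

# Expect an HTTP status line from the Hue web server
curl -I http://localhost:8888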
Open the Hue UI on port 8888: http://<Ambari-Host>:8888 (e.g. http://ambari.localdomain:8888):


[Screenshot: Hue login page]

The first login will ask for a username and a password. Make sure to set the username to hue! Pick a password of your choice.


Next, configure Hue. Edit /etc/hue/conf/hue.ini as explained in Appendix A and apply the RDBMS table view workaround as explained in Appendix B.

Restart Hue:
sudo /etc/init.d/hue restart


Enable HAWQ remote access


Enable remote access from the Ambari node to HAWQ. On the HAWQ master (phd3.localdomain) open the master's pg_hba.conf file:
sudo vi /data/hawq/master/gpseg-1/pg_hba.conf


and add the following line. Replace the IP with the address of your Ambari node.
host    all     gpadmin    <Add your AmbariHost IP here>/32    trust
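
For example, with a hypothetical Ambari node address of 192.168.56.101 (substitute your own):

host    all     gpadmin    192.168.56.101/32    trust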


Restart the HAWQ Service (using Ambari):
[Screenshot: Ambari service actions, Restart HAWQ]
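
To verify that remote access works, a quick check from the Ambari node (assuming the psql client is installed there; host, port, user and database match the [librdbms] settings in Appendix A):

# Should print the HAWQ/PostgreSQL version string if pg_hba.conf permits the connection
psql -h phd3.localdomain -p 5432 -U gpadmin -d postgres -c "SELECT version();"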


Start HBase Thrift Server


Hue communicates with the HBase service via Thrift. To start the server on the HBase master node (phd2.localdomain) run:
sudo nohup /usr/bin/hbase thrift start &
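
By default the HBase Thrift server listens on port 9090, which is the port referenced in the [hbase] section of hue.ini (Appendix A). A quick check:

# Confirm the Thrift server is listening on its default port
netstat -lnt | grep 9090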


Hadoop Proxy configuration


To allow Hue (and the Hive, HCat and Oozie services it talks to) to impersonate end users, you have to enable the following Hadoop proxy-user settings. In Ambari, from the Dashboard view select the HDFS service and then the 'Configs' tab. Type 'proxy' in the search field and press Enter. Change or add the properties to match the values below (the equivalent core-site.xml entries are sketched after the table):


[Screenshot: HDFS proxyuser properties in Ambari]

Property name                     Value
hadoop.proxyuser.hcat.groups      *
hadoop.proxyuser.hcat.hosts       *
hadoop.proxyuser.hive.groups      *
hadoop.proxyuser.hive.hosts       *
hadoop.proxyuser.hue.groups       *
hadoop.proxyuser.hue.hosts        *
hadoop.proxyuser.oozie.groups     *
hadoop.proxyuser.oozie.hosts      *
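
Under the hood these properties live in core-site.xml, which Ambari manages for you; a sketch of what the hue entries look like there (shown for reference only, do not edit the file by hand on an Ambari-managed cluster):

<!-- Allow the hue user to impersonate any user from any host -->
<property>
  <name>hadoop.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hue.groups</name>
  <value>*</value>
</property>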


Save the modified configuration and restart the affected services (HDFS, YARN and MapReduce). Tip: follow Ambari's restart suggestions.

 

Appendix A: HUE Configuration
Hue configuration is in the /etc/hue/conf/hue.ini file. Run 'sudo /etc/init.d/hue restart' after any configuration modification.

The following service deployment topology is being used for this particular hue.ini configuration:


ambari.localdomain: Ambari, Hue, Nagios, Ganglia
phd1.localdomain: HAWQ SMaster, NameNode, HiveServer2, Hive Metastore, ResourceManager, WebHCat Server, DataNode, HAWQ Segment, RegionServer, NodeManager, PXF
phd2.localdomain: App Timeline Server, History Server, HBase Master, Oozie Server, SNameNode, Zookeeper Server, DataNode, HAWQ Segment, RegionServer, NodeManager, PXF
phd3.localdomain: HAWQ Master, DataNode, HAWQ Segment, RegionServer, NodeManager, PXF


Only the modifications from the default Hue configuration properties are listed below.

 

###########################################################################
# General configuration for core Desktop features (authentication, etc)
###########################################################################
[desktop]
 # Set this to a random string, the longer the better.
 # This is used for secure hashing in the session store.
 secret_key=bozanovakozagoza
 # Time zone name
 time_zone=Europe/Amsterdam
 # Comma separated list of apps to not load at server startup.
 # e.g.: pig,zookeeper
 app_blacklist=impala,indexer
###########################################################################
# Settings for the RDBMS application
###########################################################################
[librdbms]
 # The RDBMS app can have any number of databases configured in the databases
 # section. A database is known by its section name
 # (IE sqlite, mysql, psql, and oracle in the list below).
 [[databases]]
   # mysql, oracle, or postgresql configuration.
   [[[postgresql]]]
     # Name to show in the UI.
     nice_name="HAWQ"
     name=postgres
     engine=postgresql
     host=phd3.localdomain
     port=5432
     user=gpadmin
     password=
###########################################################################
# Settings to configure your Hadoop cluster.
###########################################################################
[hadoop]
 # Configuration for HDFS NameNode
 # ------------------------------------------------------------------------
 [[hdfs_clusters]]
   [[[default]]]
     # Enter the filesystem uri
     fs_defaultfs=hdfs://phd1.localdomain:8020
     # Use WebHdfs/HttpFs as the communication mechanism.
     # Domain should be the NameNode or HttpFs host.
     # Default port is 14000 for HttpFs.
     webhdfs_url=http://phd1.localdomain:50070/webhdfs/v1
 # Configuration for YARN (MR2)
 # ------------------------------------------------------------------------
 [[yarn_clusters]]
   [[[default]]]
     # Enter the host on which you are running the ResourceManager
     resourcemanager_host=phd1.localdomain
     # The port where the ResourceManager IPC listens on
     resourcemanager_port=8030
     # URL of the ResourceManager API
     resourcemanager_api_url=http://phd1.localdomain:8088
     # URL of the HistoryServer API
     history_server_api_url=http://phd2.localdomain:19888
###########################################################################
# Settings to configure liboozie
###########################################################################
[liboozie]
 # The URL where the Oozie service runs on. This is required in order for
 # users to submit jobs. Empty value disables the config check.
 oozie_url=http://phd2.localdomain:11000/oozie
###########################################################################
# Settings to configure Beeswax with Hive
###########################################################################
[beeswax]
 # Host where HiveServer2 is running.
 # If Kerberos security is enabled, use fully-qualified domain name (FQDN).
 hive_server_host=phd1.localdomain
 # Port where HiveServer2 Thrift server runs on.
 hive_server_port=10000
 # Choose whether Hue uses the GetLog() thrift call to retrieve Hive logs.
 # If false, Hue will use the FetchResults() thrift call instead.
 use_get_log_api=false
 # Set a LIMIT clause when browsing a partitioned table.
 # A positive value will be set as the LIMIT. If 0 or negative, do not set any limit.
 browse_partitioned_table_limit=250
 # A limit to the number of rows that can be downloaded from a query.
 # A value of -1 means there will be no limit.
 # A maximum of 65,000 is applied to XLS downloads.
 download_row_limit=10000
 # Thrift version to use when communicating with HiveServer2
 thrift_version=5
###########################################################################
# Settings to configure the Zookeeper application.
###########################################################################
[zookeeper]
 [[clusters]]
   [[[default]]]
     # Zookeeper ensemble. Comma separated list of Host/Port.
     # e.g. localhost:2181,localhost:2182,localhost:2183
     host_ports=phd2.localdomain:2181
     # The URL of the REST contrib service (required for znode browsing)
     rest_url=http://phd2.localdomain:9998
###########################################################################
# Settings to configure HBase Browser
###########################################################################
[hbase]
 # Comma-separated list of HBase Thrift servers for clusters in the format of '(name|host:port)'.
 # Use full hostname with security.
 hbase_clusters=(Cluster|phd2.localdomain:9090)
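
To sanity-check the webhdfs_url configured in the [hadoop] section above, a hedged probe using WebHDFS's LISTSTATUS operation:

# Should return a JSON FileStatuses listing for the HDFS root directory
curl -s "http://phd1.localdomain:50070/webhdfs/v1/?op=LISTSTATUS&user.name=hue"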

Appendix B: RDBMS view doesn’t show tables workaround

Credits to Scott Kahler for this workaround!

Edit /usr/lib/hue/desktop/libs/librdbms/src/librdbms/server/postgresql_lib.py. Replace the cursor.execute() statements in the get_tables() and get_columns() methods as shown below:
def get_tables(self, database, table_names=[]):
  # Doesn't use database and only retrieves tables for database currently in use.
  cursor = self.connection.cursor()
  #cursor.execute("SELECT table_name FROM information_schema.tables WHERE table_schema='%s'" % database)
  cursor.execute("SELECT table_name FROM information_schema.tables WHERE table_schema NOT IN ('hawq_toolkit','information_schema','madlib','pg_aoseg','pg_bitmapindex','pg_catalog','pg_toast')")
  self.connection.commit()
  return [row[0] for row in cursor.fetchall()]

def get_columns(self, database, table):
  cursor = self.connection.cursor()
  #cursor.execute("SELECT column_name FROM information_schema.columns WHERE table_schema='%s' and table_name='%s'" % (database, table))
  cursor.execute("SELECT column_name FROM information_schema.columns WHERE table_name='%s' AND table_schema NOT IN ('hawq_toolkit','information_schema','madlib','pg_aoseg','pg_bitmapindex','pg_catalog','pg_toast')" % table)
  self.connection.commit()
  return [row[0] for row in cursor.fetchall()]
You can automate the update like this:
sudo sed -i "s/='%s'\" % database/NOT IN ('hawq_toolkit','information_schema','madlib','pg_aoseg','pg_bitmapindex','pg_catalog','pg_toast')\"/g" /usr/lib/hue/desktop/libs/librdbms/src/librdbms/server/postgresql_lib.py
sudo sed -i "s/table_schema='%s' and table_name='%s'\" % (database, table)/table_name='%s' AND table_schema NOT IN ('hawq_toolkit','information_schema','madlib','pg_aoseg','pg_bitmapindex','pg_catalog','pg_toast')\" % table/g" /usr/lib/hue/desktop/libs/librdbms/src/librdbms/server/postgresql_lib.py
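
To confirm the substitutions took effect, a quick check:

# Both rewritten queries should now contain the schema blacklist
grep -n "NOT IN" /usr/lib/hue/desktop/libs/librdbms/src/librdbms/server/postgresql_lib.py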
Restart Hue:
sudo /etc/init.d/hue restart
