Note: as of January 2015 in Hue master or CDH5.4, this post is deprecated by Automatic High Availability with Hue and Cloudera Manager.
Very few projects within the Hadoop umbrella have as much end user visibility as Hue. Thus, it is useful to add a degree of fault tolerance to deployments. This blog post describes how to achieve a higher level of availability (HA) by placing several Hue instances behind a load balancer.
This tutorial demonstrates how to set up high availability by:
- Installing Hue 2.3 on two nodes in a three-node RedHat 5 cluster.
- Managing all Hue instances via Cloudera Manager 4.7.
- Load balancing using HA Proxy 1.4. In reality, any load balancer with sticky sessions should work.
Hue should be installed on two of the three nodes. To have Cloudera Manager automatically install Hue, follow the “Parcel Install via Cloudera Manager” section. To install manually, follow the “Package Install” section.
Parcel Install via Cloudera Manager
For more information on Parcels, see Managing Parcels.
- From Cloudera Manager, click on “Hosts” in the menu. Then, go to the “Parcels” section.
- Find the latest CDH parcel and click “Download”.
- Once the parcel has finished downloading, click “Distribute”.
- Once the parcel has finished distributing, click “Activate”.
Package Install
- Download the yum repository RPM.
- Install the yum repository using “sudo yum --nogpgcheck localinstall cloudera-cdh-4-0.x86_64.rpm”. For more information, see Installing CDH4.
- Install Hue on each node by running “sudo yum install hue”. For more information on installing Hue, see CDH documentation.
Managing Hue through Cloudera Manager
Cloudera Manager provides management of the Hue servers on each node. Add two Hue services using the directions below. For more information on managing services, see the Cloudera Manager documentation.
- Go to “Services -> All Services” in the menu.
- Click “Actions -> Add a Service”.
- Select “Hue” and follow the steps on the screen. NOTE: Choose a different host for each Hue service.
- Ensure that the “Jobsub Examples and Templates Directory” configuration points to different directories in HDFS for each Hue service. It can be changed by going to Services -> <hue service>. In the menu, go to Configuration -> View and Edit. Then, click on “Hue Server”. “Jobsub Examples and Templates Directory” should be at the bottom of the page.
Image 1: Cloudera Manager handling two Hue services.
HA Proxy Installation/Configuration
- Download and unzip the binary distribution of HA Proxy 1.4 on the node that doesn’t have Hue installed.
- Add the following HA Proxy configuration to /tmp/hahue.conf:
    global
        daemon
        nbproc 1
        maxconn 100000
        log 127.0.0.1 local6 debug

    defaults
        option http-server-close
        mode http
        timeout http-request 5s
        timeout connect 5s
        timeout server 10s
        timeout client 10s

    listen Hue 0.0.0.0:80
        log global
        mode http
        stats enable
        balance source
        server hue1 servera.cloudera.com:8888 cookie ServerA check inter 2000 fall 3
        server hue2 serverb.cloudera.com:8888 cookie ServerB check inter 2000 fall 3
- Start HA Proxy:
haproxy -f /tmp/hahue.conf
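Once HA Proxy is running, it can be worth confirming that each Hue instance and the load balancer actually answer HTTP requests. The following is a minimal Python sketch and not part of the original setup: the helper name is_reachable and the default timeout are illustrative assumptions, and it only checks that some HTTP response comes back, not that Hue itself is healthy.

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=5.0):
    """Return True if an HTTP request to `url` receives any HTTP response.

    Hypothetical helper for illustration; it does not inspect the page body.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status (4xx/5xx).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar problems.
        return False
```

With the hostnames from the configuration above, one might call is_reachable("http://servera.cloudera.com:8888/") and is_reachable("http://serverb.cloudera.com:8888/") for the two Hue servers, then repeat the call against port 80 on the node where HA Proxy was started.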
The key configuration options are balance and server in the listen section. When balance is set to source, HA Proxy hashes the client's IP address, so a given client reaches the same server on every request as long as the set of available servers does not change. If that server goes down, the client's requests are automatically sent to another active server. This stickiness is necessary because Hue stores session information in process memory. Each server line defines one server to be used for load balancing and takes the form:
server <name> <address>[:port] [settings ...]
In the configuration above, the server “hue1” is available at “servera.cloudera.com:8888” and “hue2” is available at “serverb.cloudera.com:8888”. Both servers are health-checked every two seconds (inter 2000) and are declared down after three consecutive failed checks (fall 3). In this example, HA Proxy is configured to bind to “0.0.0.0:80”. Thus, Hue should now be available at “http://serverc.cloudera.com”, where serverc is the node running HA Proxy.
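To make the interaction between source balancing and health checks more concrete, here is a small Python simulation. It is a simplified sketch and not HA Proxy's actual algorithm: the Backend class and pick_backend function are hypothetical names, the modulo over the client IP stands in for HA Proxy's real source hash, and the rise value of 2 is an assumption because the configuration above does not set it.

```python
import ipaddress


class Backend:
    """Tracks the health state of one server line (for example 'hue1')."""

    def __init__(self, name, fall=3, rise=2):
        self.name = name
        self.fall = fall    # consecutive failed checks before marking down
        self.rise = rise    # consecutive passed checks before marking up (assumed)
        self.up = True
        self._streak = 0    # consecutive results that contradict the current state

    def record_check(self, ok):
        """Feed in the result of one health check (True = passed)."""
        if ok == self.up:
            # Result agrees with the current state, so any streak is broken.
            self._streak = 0
            return
        self._streak += 1
        threshold = self.rise if ok else self.fall
        if self._streak >= threshold:
            self.up = ok
            self._streak = 0


def pick_backend(client_ip, backends):
    """Map a client IP to one of the backends that are currently up."""
    alive = [b for b in backends if b.up]
    if not alive:
        return None
    # Simplified stand-in for a source hash: same IP, same alive set, same pick.
    key = int(ipaddress.ip_address(client_ip))
    return alive[key % len(alive)]


hue1, hue2 = Backend("hue1"), Backend("hue2")
servers = [hue1, hue2]

chosen = pick_backend("10.0.0.7", servers)
print(chosen.name)                       # same answer on every call while both are up

for _ in range(3):                       # three failed checks in a row
    chosen.record_check(False)
print(chosen.up)                         # the chosen server is now marked down
print(pick_backend("10.0.0.7", servers).name)  # the client fails over to the other one
```

The sketch shows why a client keeps its server only while the set of live servers is stable: once a server is marked down, the same client IP maps to a different server, and any session state held only in the failed process is no longer reachable.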
Hue can be load balanced easily as long as each client is consistently directed to the same server (i.e., sticky sessions). Load balancing can improve performance, but the primary goal is high availability. Multiple Hue instances can also be easily managed through Cloudera Manager. For true high availability, Hue also needs to be configured to use HA MySQL, PostgreSQL, or Oracle.
Coming up, there will be a blog post on JobTracker HA with Hue. Have any suggestions? Feel free to tell us what you think through hue-user.