Hue performance tuning guide

Last updated March 29th, 2017

Here is a list of tips for tuning the performance of Hue:

General Performance

  1. By following this guide, each Hue instance will support 100+ concurrent users by default. Each additional Hue instance behind the load balancer increases capacity by another 100 concurrent users. If you are not using Cloudera Manager, you can manually set up NGINX or Apache in front of Hue.
  2. Move the database from the default SQLite to another backend such as MySQL, PostgreSQL, or Oracle, which handle locking better than SQLite. Hue should not be run on SQLite in an environment with more than one concurrent user. Read more in "Using an External Database for Hue Using Cloudera Manager".
  3. There are some memory fragmentation issues in Python that manifest in Hue, so check the memory usage of Hue periodically. Browsing an HDFS directory with many files, downloading a query result, and copying HDFS files are all costly operations memory-wise.
  4. Upgrade to later versions of Hue. There are significant performance gains available in every release.
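
As one way to act on the memory tip above, the following is a minimal sketch of a periodic memory check. It assumes a Linux host, where the kernel exposes a process's resident memory as the VmRSS field of /proc/<pid>/status. The function names and the limit are hypothetical and are not part of Hue.

```python
import re


def parse_rss_kib(status_text):
    """Return the VmRSS value (in KiB) found in /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current resident memory of a Linux process by PID."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_rss_kib(handle.read())


def exceeds_limit(rss_kib, limit_mib):
    """True when a known RSS value is above the chosen limit (hypothetical threshold)."""
    return rss_kib is not None and rss_kib > limit_mib * 1024
```

Calling read_rss_kib with the PID of the Hue server process from a cron job or monitoring agent, and alerting when exceeds_limit returns True, would give a rough picture of how memory grows after the costly operations listed above.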

Query Editor Performance

  1. Compare the performance of the Hive Query Editor in Hue with the exact same query run in a beeline shell against the exact same HiveServer2 instance that Hue is pointing to. This will determine whether Hue or HiveServer2 needs to be investigated.
  2. Check the logging configuration for HiveServer2 by going to the Hive service Configuration in Cloudera Manager. Search for HiveServer2 Logging Threshold and make sure that it is not set to DEBUG or TRACE. If it is, set it to INFO or a less verbose level.
  3. Configure dedicated HiveServer2 instances for each Hue instance, separate from the HiveServer2 instances used by other third-party tools or clients, or configure Hue to point to multiple HiveServer2 instances behind a load balancer.
  4. Tune the query timeouts for HiveServer2 (in hive-site.xml) and Impala (in the hue_safety_valve or hue.ini); see "Query Life Cycle".
  5. Downloading query results past a few thousand rows will lag and significantly increase CPU and memory usage in Hue. For this reason, results are truncated until further improvements are made.
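
For the first tip in this list, the beeline side of the comparison can be timed with a small script. The sketch below assumes beeline is on the PATH and uses its -u (JDBC URL) and -e (query) options. The helper names are hypothetical, and the JDBC URL and query are placeholders supplied on the command line.

```python
import subprocess
import sys
import time


def beeline_command(jdbc_url, query):
    """Build the argument list for running a single query through beeline."""
    return ["beeline", "-u", jdbc_url, "-e", query]


def time_command(args):
    """Run a command, discard its output, and return (exit_code, elapsed_seconds)."""
    start = time.monotonic()
    completed = subprocess.run(
        args, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return completed.returncode, time.monotonic() - start


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (placeholders): python time_beeline.py <jdbc_url> <query>
    code, seconds = time_command(beeline_command(sys.argv[1], sys.argv[2]))
    print(f"beeline exited with {code} after {seconds:.1f}s")
```

If the elapsed time here is close to what the Hue editor shows for the same query, HiveServer2 is the more likely place to investigate. A much faster beeline run points at Hue instead. Note that the measurement includes beeline's own startup and connection time.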

Feel free to ask any questions about the architecture or usage of the server in the comments, via @gethue, or on the hue-user list!