Last update May 1st 2016
Here is a list of tips for tuning the performance of Hue:
General Hue Performance
- By following this guide, each Hue instance supports 50+ concurrent users by default. Each additional Hue instance behind the load balancer adds capacity for roughly 50 more concurrent users. If not using Cloudera Manager, you can manually set up NGINX or Apache in front of Hue.
- Increasing the cherry_py thread count to 100 also works in critical cases. In Cloudera Manager, navigate to the Hue service Configuration, click Main in the Category section on the left side, look for Hue Web Server Threads, enter an appropriate value, then save and restart. This lessens the probability of thread exhaustion and of Hue just “hanging”.
- Move the database from the default SQLite DB to another database backend such as MySQL, PostgreSQL or Oracle, which handle locking better. Hue should not be run on SQLite in an environment with more than 1 concurrent user. Read more about using an External Database for Hue Using Cloudera Manager.
- There are some memory fragmentation issues in Python that manifest in Hue. Check the memory usage of Hue periodically. Browsing an HDFS directory with many files, downloading a query result, and copying HDFS files are costly operations memory-wise.
- Upgrade to later versions of Hue. There are significant performance gains available in every release.
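As a rough sizing sketch based on the rule of thumb above (about 50 concurrent users per Hue instance), the number of instances to place behind the load balancer could be estimated as follows. The function name and the constant are illustrative assumptions, not Hue settings, and real capacity depends on the workload.

```python
import math

# Rule of thumb from this guide: roughly 50 concurrent users per Hue instance.
# Illustrative constant only; this is not a Hue configuration key.
USERS_PER_INSTANCE = 50


def hue_instances_needed(concurrent_users, users_per_instance=USERS_PER_INSTANCE):
    """Estimate how many Hue instances to run behind the load balancer."""
    if concurrent_users < 0 or users_per_instance <= 0:
        raise ValueError("user counts must be non-negative and capacity positive")
    # Round partial instances up, and always keep at least one instance.
    return max(1, math.ceil(concurrent_users / users_per_instance))


if __name__ == "__main__":
    for users in (30, 50, 120, 400):
        print(users, "users ->", hue_instances_needed(users), "instance(s)")
```

Treat the result as a starting point and confirm it by monitoring thread and memory usage on each instance.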
Hive Query Editor Performance
- Compare the performance of the Hive Query Editor in Hue with the exact same query in a beeline shell against the exact same HiveServer2 instance that Hue is pointing to. This will determine whether Hue or HiveServer2 needs to be investigated.
- Check the logging configuration for HiveServer2 by going to Hive service Configuration in Cloudera Manager. Search for HiveServer2 Logging Threshold and make sure that it is not set to DEBUG or TRACE. If it is, drop the logging level to INFO at a minimum.
- Configure individual dedicated HiveServer2 instances for each Hue instance separate from HiveServer2 instances used by other 3rd party tools or clients, or configure Hue to point to multiple HS2 instances behind a Load Balancer.
- Tune the query timeouts for HiveServer2 (in hive-site.xml) and for Impala (in the hue_safety_valve or hue.ini):
[impala]
# If QUERY_TIMEOUT_S > 0, the query will be timed out (i.e. canceled) if Impala does not do any work
# (compute or send back results) for that query within QUERY_TIMEOUT_S seconds.
query_timeout_s=600
- Downloading query results beyond a few thousand rows will lag and significantly increase CPU and memory usage in Hue. This is why results are truncated until further improvements are made.
Oozie Dashboard Performance
The Oozie dashboard is populated by performing REST API calls to the Oozie service. These calls also gather log information from /var/log/oozie on the Oozie server. If the number of logs and size of the logs in that directory get too large, they can have a drastic performance impact on the Oozie Dashboard in Hue.
If the following command takes 30 or more seconds to complete, then the log volume is likely the cause of the issue:
curl --output /tmp/hue_oozie_log.json --negotiate --user : "http://<oozie_host>:11000/oozie/v1/job/<id>?timezone=America%2FLos_Angeles&show=log&user.name=hue&len=-1&doAs=<your_user>"
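The same check can be scripted. The sketch below times a log fetch and compares it with the 30 second threshold mentioned above. All function names are hypothetical, and the plain HTTP fetch is an assumption that only fits a cluster without Kerberos; it does not replicate curl's --negotiate option.

```python
import sys
import time
import urllib.request

SLOW_THRESHOLD_S = 30  # threshold taken from the text above


def time_call(fetch):
    """Run fetch() and return the elapsed wall-clock time in seconds."""
    start = time.monotonic()
    fetch()
    return time.monotonic() - start


def is_log_fetch_slow(elapsed_s, threshold_s=SLOW_THRESHOLD_S):
    """True when the fetch took at least threshold_s seconds."""
    return elapsed_s >= threshold_s


def fetch_oozie_log(url):
    """Plain HTTP GET of the job log URL (assumes no Kerberos/SPNEGO)."""
    with urllib.request.urlopen(url) as response:
        response.read()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the same URL that is used in the curl command.
    elapsed = time_call(lambda: fetch_oozie_log(sys.argv[1]))
    verdict = "slow, check the Oozie logs" if is_log_fetch_slow(elapsed) else "ok"
    print("fetched in %.1fs: %s" % (elapsed, verdict))
```

Running it before and after a log cleanup gives a simple way to compare the two timings.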
To improve performance, perform the following:
- Purge any logs in /var/log/oozie that are older than 7 days. Then re-run the curl command above to see if performance has improved.
- If performance has still not improved, further reduce the log retention time or the Oozie log level. Change the log level to ERROR for less logging.
- Decrease the total number of days worth of logs that are kept.
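Before purging, it can help to see which files fall outside the retention window. The following is a minimal sketch that only lists candidates and deletes nothing. The function name is hypothetical, and using the file modification time as the age criterion is an assumption.

```python
import os
import time


def find_old_logs(log_dir, max_age_days=7, now=None):
    """Return sorted paths of regular files in log_dir last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    old = []
    with os.scandir(log_dir) as entries:
        for entry in entries:
            # Skip directories and symlinks; compare the modification time only.
            if not entry.is_file(follow_symlinks=False):
                continue
            if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                old.append(entry.path)
    return sorted(old)


if __name__ == "__main__":
    log_dir = "/var/log/oozie"  # directory named in the text above
    if os.path.isdir(log_dir):
        for path in find_old_logs(log_dir):
            print(path)
```

Review the printed list before removing anything, and lower max_age_days to preview the effect of a shorter retention period.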
Backend pagination in the latest versions of Hue dramatically improved the performance of the dashboard pages.