Hi Big Data Explorers,
In this latest update of Hue, the intelligent editor for SQL developers and analysts, the focus was on the Editor and security. More than 1570 commits went in on top of 3.11! Go grab the tarball release and give it a spin!
The Hue editor keeps getting better with these major improvements:
The number of rows returned is displayed so you can quickly see the size of the dataset. If the database engine does not provide the number of rows, Hue estimates the value and appends a plus sign, e.g. 100+.
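As a rough illustration of the display rule above, here is a minimal sketch of how such a label could be built. The function name `format_row_count` and the `is_estimate` flag are hypothetical and are not taken from the Hue code base.

```python
def format_row_count(count, is_estimate=False):
    """Return the label shown next to a result set, e.g. '42' or '100+'.

    Hypothetical helper: `is_estimate` is assumed to be True when the
    database engine did not report an exact number of rows.
    """
    if count < 0:
        raise ValueError("row count cannot be negative")
    # An estimate gets a trailing plus sign to signal "at least this many".
    return f"{count}+" if is_estimate else str(count)


print(format_row_count(42))         # exact count reported by the engine
print(format_row_count(100, True))  # estimated count, more rows may exist
```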
This popup offers a quick way to see a sample of the data and other statistics on databases, tables, and columns. You can open the popup from the SQL Assist or with a right-click on any SQL object (table, column, function…). In this release, it also opens faster and caches the data.
The footer provides direct links to the metastore page or to the table in the table Assist.
The rendering of the items was rewritten and optimized. You should not experience any lag on databases with thousands of columns.
The SQL Formatter has a new and smarter algorithm that will make your queries look pretty with a single click!
Timeline and Pivot Graphing
These visualizations are convenient for plotting chronological data, or for data where subsets of rows share the same attribute and should be stacked together.
Creating an External Table
The improved support for S3 introduced the possibility of directly creating an external table in HDFS or S3.
Read more about the SQL improvements here.
Automated S3 Configuration
When using Cloudera Manager, Hue now automatically inherits the S3 credentials if they are configured.
Regular users do not automatically get access to the S3 Browser and autocomplete. They need the “File Browser S3 permission” in the Hue User Admin to be added to one of their groups.
Read more about S3 configuration here.
New Security Improvements
Many security options have been added in order to help administrators enforce and manage secure Hue installations.
Fixed Arbitrary host header acceptance
Hue no longer accepts arbitrary Host headers. You can now set the host/domain names that the Hue server is allowed to serve.
[desktop]
allowed_hosts="*.domain"
# your own fqdn example: allowed_hosts="*.hadoop.cloudera.com"
# or specific example: allowed_hosts="hue1.hadoop.cloudera.com,hue2.hadoop.cloudera.com"
Note: if you get a “Bad Request (400)” error when hosting Hue in an AWS cluster, you might need to set the value to ‘*’ to allow clients outside the network to access Hue.
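To make the idea of host matching concrete, the sketch below checks a Host header against a comma-separated list of patterns like the ones above. It is illustrative only: the function `host_is_allowed` is hypothetical, it uses shell-style wildcards from Python's `fnmatch` module, and it is not the matching logic that Hue or its web framework actually runs.

```python
from fnmatch import fnmatchcase


def host_is_allowed(host_header, allowed_hosts):
    """Return True if the Host header matches one of the allowed patterns.

    Hypothetical helper: `allowed_hosts` is a comma-separated string such as
    "hue1.hadoop.cloudera.com,hue2.hadoop.cloudera.com" or "*.hadoop.cloudera.com".
    """
    # Drop an optional ":port" suffix and normalise case.
    # (IPv6 literals are not handled in this sketch.)
    hostname = host_header.split(":", 1)[0].lower()
    patterns = [p.strip().lower() for p in allowed_hosts.split(",") if p.strip()]
    return any(fnmatchcase(hostname, pattern) for pattern in patterns)


print(host_is_allowed("hue1.hadoop.cloudera.com:8888", "*.hadoop.cloudera.com"))
print(host_is_allowed("evil.example.org", "*.hadoop.cloudera.com"))
print(host_is_allowed("anything.example.org", "*"))
```

A request whose Host header fails such a check is the kind of request that would be rejected with a “Bad Request (400)” response.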
Fixed sessionid and csrftoken with HttpOnly Flag
If the HttpOnly flag is included in the HTTP response header, the cookie cannot be accessed through client-side script, so the browser will not reveal it to a third party. This helps mitigate the risk of cross-site scripting. A cookie with this attribute is called an HTTP-only cookie. Any information contained in an HTTP-only cookie is less likely to be disclosed to a hacker or a malicious website.
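For readers who have not seen the flag on the wire, this small sketch builds a Set-Cookie value with the HttpOnly attribute using only Python's standard `http.cookies` module. The helper name `build_session_cookie` and the cookie value are made up for illustration; Hue sets its own cookies through its web framework, not through this code.

```python
from http.cookies import SimpleCookie


def build_session_cookie(name, value):
    """Return a Set-Cookie header value carrying the HttpOnly attribute."""
    cookie = SimpleCookie()
    cookie[name] = value
    # HttpOnly asks the browser to hide the cookie from client-side scripts.
    cookie[name]["httponly"] = True
    cookie[name]["path"] = "/"
    return cookie[name].OutputString()


header_value = build_session_cookie("sessionid", "abc123")
print("Set-Cookie:", header_value)
```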
SASL Support for
SASL mechanisms support integrity and privacy protection of the communication channel after successful authentication.
In the Thrift SASL library, sasl_max_buffer support is already implemented. The sasl_max_buffer setting in hue.ini provides a bigger, configurable buffer size that allows support for
[desktop]
# This property specifies the maximum size of the receive buffer in bytes in thrift sasl communication (default 2 MB).
sasl_max_buffer=2 * 1024 * 1024
Introducing Request HTTP Pool in Hue
The Requests Session object persists certain parameters across requests. It also persists cookies across all requests made from the Session instance and uses urllib3’s connection pooling. Because Hue makes several requests to the same host:port, this change lets the underlying TCP connection be reused, which can result in a significant performance increase.
CACHE_SESSION = requests.Session()
CACHE_SESSION.mount('http://', requests.adapters.HTTPAdapter(pool_connections=conf.CHERRYPY_SERVER_THREADS.get(), pool_maxsize=conf.CHERRYPY_SERVER_THREADS.get()))
CACHE_SESSION.mount('https://', requests.adapters.HTTPAdapter(pool_connections=conf.CHERRYPY_SERVER_THREADS.get(), pool_maxsize=conf.CHERRYPY_SERVER_THREADS.get()))
The new Content-Security-Policy HTTP response header helps you reduce XSS risks on modern browsers by declaring, via an HTTP header, which dynamic resources are allowed to load. (For more reading: https://content-security-policy.com/)
[desktop]
secure_content_security_policy="script-src 'self' 'unsafe-inline' 'unsafe-eval' *.google-analytics.com *.doubleclick.net *.mathjax.org data:;img-src 'self' *.google-analytics.com *.doubleclick.net http://*.tile.osm.org *.tile.osm.org *.gstatic.com data:;style-src 'self' 'unsafe-inline';connect-src 'self';child-src 'self' data:;object-src 'none'"
# In HUE 3.11 and higher it is enabled by default.
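A policy value like the one above is a list of directives separated by semicolons, each followed by its allowed sources. The sketch below assembles such a string from a dictionary, which can make a long policy easier to review. The helper `build_csp` is hypothetical and the directive set is a shortened example, not the Hue default.

```python
def build_csp(directives):
    """Join {directive: [sources]} into a Content-Security-Policy value."""
    parts = []
    for name, sources in directives.items():
        # Each directive is its name followed by space-separated sources.
        parts.append(f"{name} {' '.join(sources)}")
    # Directives are separated by semicolons, as in the hue.ini value.
    return ";".join(parts)


policy = build_csp({
    "script-src": ["'self'", "'unsafe-inline'"],
    "img-src": ["'self'", "data:"],
    "object-src": ["'none'"],
})
print(policy)
```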
It is now easy to receive an email notification once a workflow execution completes. The workflow submission popup now shows the “Send completion email” checkbox.
Extended Dashboard Filtering
Just start typing in the text field to get the list of jobs whose Name or Submitter partially matches the text. In the picture below, the text sh partially matches the names of all four jobs. Note that the filter is applied to all the jobs, not just the ones on the current page.
To find a single job among thousands of submitted jobs, enter its complete ID.
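The partial matching on Name and Submitter described above can be sketched as a simple substring filter over the full job list. Everything here is an assumption for illustration: the function `filter_jobs`, the `name` and `submitter` keys, the sample jobs, and the choice to ignore case are not taken from the Hue code.

```python
def filter_jobs(jobs, text):
    """Return jobs whose name or submitter contains `text` (case-insensitive)."""
    needle = text.strip().lower()
    if not needle:
        # An empty filter keeps every job.
        return list(jobs)
    return [
        job for job in jobs
        if needle in job["name"].lower() or needle in job["submitter"].lower()
    ]


# Hypothetical sample data standing in for the complete list of jobs.
all_jobs = [
    {"name": "Shell-example", "submitter": "alice"},
    {"name": "daily-refresh", "submitter": "bob"},
    {"name": "hive-load", "submitter": "shawn"},
]

for job in filter_jobs(all_jobs, "sh"):
    print(job["name"], job["submitter"])
```

Filtering the complete list before paginating is what lets matches from every page show up, rather than only those on the page currently displayed.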
Read more about the Oozie improvements here.
The next iteration will keep improving the SQL editing and data discovery experience. Hue 4 will get real, with the goal of becoming the equivalent of “Excel for Big Data”. The current apps are being unified into the new Editor, and the whole of Hue will become a single app that provides the best data analytics user experience for Hadoop on-prem or in the cloud.