Browsing ADLS data, querying it with SQL and exporting the results back in Hue 4.2

Published on 20 November 2017 in Version 4 - 4 minutes read - Last modified on 06 March 2021

We’re happy to present Microsoft Azure Data Lake Store (ADLS) integration in Hue with Hue 4.2 release. Similarly to the S3 integration, Hue can be setup to read and write to a configured ADLS, and users can directly query from and save data to ADLS without any intermediate moving / copying to HDFS.

For a detailed video walkthrough of file manipulation using ADLS in Hue, have a look:

In case you missed the one for S3, here is the link to the post.

 

Exploring ADLS in Hue’s file browser

Once Hue is successfully configured to connect to ADLS, we can view all accessible folders within the account by clicking on the ADLS root. From here, we can view the existing keys (both directories and files) and create, rename, move, copy, or delete existing directories and files. Additionally, we can directly upload files to ADLS.

Create Hive Tables Directly From ADLS

Hue’s table browser import wizard can create external Hive tables directly from files in ADLS. This allows ADLS data to be queried via SQL from Hive or Impala, without moving or copying the data into HDFS or the Hive Warehouse. To create an external Hive table from ADLS, navigate to the table browser, select the desired database and then click the plus icon in the upper right. Select a file using the file picker and browse to a file on ADLS.

Choose your input files’ delimiter and press next. Keep unchecked “Store in Default location” if you want the file to stay intact on ADLS, update the column definition options and finally click “Submit” when you’re ready to create the Hive table. Once created, you should see the newly created table details in the table browser.

Save Query Results to ADLS

Now that we have created external Hive tables created from our ADLS data, we can jump into either the Hive or Impala editor and start querying the data directly from ADLS seamlessly. These queries can join tables and objects that are backed either by ADLS, HDFS, or both. Query results can then easily be saved back to ADLS.

 

ADLS Configuration in Hue

Hue’s file browser can now allow users to explore, manage, and upload data in an ADLS, in addition to HDFS and S3.

In order to add an ADLS account to Hue, you’ll need to configure Hue with valid ADLS credentials, including the client ID, client secret and tenant ID.
These keys can securely stored in a script that outputs the actual access key and secret key to stdout to be read by Hue (this is similar to how Hue reads password scripts). In order to use script files, add the following section to your hue.ini configuration file:



[adls]

[[azure_accounts]]

[[[default]]]

client_id_script=/path/to/client_id_script.sh

client_secret_script=/path/to/client_secret_script.sh

tenant_id_script=/path/to/tenant_id_script.sh

[[adls_clusters]]

[[[default]]]

fs_defaultfs=adl://.azuredatalakestore.net

webhdfs_url=https://.azuredatalakestore.net

Alternatively (but not recommended for production or secure environments), you can set the client_secret value in plain-text:



[adls]

[[azure_accounts]]

[[[default]]]

client_id_script=/path/to/client_id_script.sh

client_secret_script=/path/to/client_secret_script.sh

tenant_id_script=/path/to/tenant_id_script.sh

[[adls_clusters]]

[[[default]]]

fs_defaultfs=adl://.azuredatalakestore.net

webhdfs_url=https://.azuredatalakestore.net

Alternatively (but not recommended for production or secure environments), you can set the client_secret value in plain-text:



[adls]

[[azure_account]]

[[[default]]]

client_id=adlsclientid

client_secret=adlsclientsecret

tenant_id=adlstenantid

[[adls_clusters]]

[[[default]]]

fs_defaultfs=adl://.azuredatalakestore.net

webhdfs_url=https://.azuredatalakestore.net

Integrating Hadoop with ADLS

In addition to configuring Hue with your ADLS credentials, Hadoop will also need to be configured with the ADLS authentication credentials in order to read from and save to ADLS. This can be done by setting the following properties in your core-site.xml file:





fs.adl.oauth2.access.token.provider.type

ClientCredential





fs.adl.oauth2.refresh.url

https://login.microsoftonline.com//oauth2/token





fs.adl.oauth2.client.id

adlsclientid





fs.adl.oauth2.credential

adlsclientsecret



With Hue and Hadoop configured, we can verify that Hue is able to successfully connect to ADLS by restarting Hue and checking the configuration page. You should not see any errors related to ADLS, and you should notice an additional option in the menu from the main navigation.

For performance numbers on ADLS, please read: A look at ADLS Performance.
As always, if you have any questions, feel free to comment here or on the hue-user list or @gethue!


comments powered by Disqus

More recent stories

26 June 2024
Integrating Trino Editor in Hue: Supporting Data Mesh and SQL Federation
Read More
03 May 2023
Discover the power of Apache Ozone using the Hue File Browser
Read More