Oozie workflow credentials with a Hive action with Kerberos

Published on 02 April 2014 - 2 minutes read - Last modified on 06 March 2021

When Hadoop security is enabled and you schedule jobs that use Hive (or Pig or HBase), you might have hit this error:



Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: GSS initiate failed

Indeed, in order to use an Oozie Hive action with the Hive metastore server when Kerberos is enabled, you need to use HCatalog credentials in your workflow.

Here is a demo, with a kerberized cluster and a MySQL Hive metastore, showing how it works. We create a Hive script that lists the tables and performs an operation requiring the HCat credential. Please find all the used and generated configurations here.

Hue fills in the parameters automatically for you. Just tick the credentials required by your workflow action and Hue will:

  • Dynamically pull the details of the available credentials from the cluster
  • Configure the credentials in the workflow for you

Then don’t forget to tick the HCat credential in the advanced properties of the Hive action. You can select several credentials if you ever need to.
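For reference, here is a minimal sketch of what a workflow with an HCat credential can look like once the credential is attached to the Hive action. It is not the exact XML generated by Hue: the workflow and action names, the script name script.sql, the schema versions and the principal hive/hue.com@EXAMPLE.COM are assumptions for illustration, while the metastore URI and hive-config.xml come from the configuration shown in this post.

```xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="hive-hcat-wf">
  <!-- Declares the credential; type "hcat" maps to the class registered in oozie-site.xml -->
  <credentials>
    <credential name="hcat" type="hcat">
      <property>
        <name>hcat.metastore.uri</name>
        <value>thrift://hue.com:9083</value>
      </property>
      <property>
        <!-- Assumed principal: replace with the metastore principal of your cluster -->
        <name>hcat.metastore.principal</name>
        <value>hive/hue.com@EXAMPLE.COM</value>
      </property>
    </credential>
  </credentials>

  <start to="hive-action"/>

  <!-- The cred attribute is what attaches the credential to the action -->
  <action name="hive-action" cred="hcat">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <job-xml>hive-config.xml</job-xml>
      <!-- Assumed script name -->
      <script>script.sql</script>
    </hive>
    <ok to="end"/>
    <error to="kill"/>
  </action>

  <kill name="kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Several credentials can be attached to one action by listing their names in the cred attribute, separated by commas.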

And that’s it! Submit the workflow and check its output: you will see the list of tables and the result of the second query!

As usual, feel free to comment on the hue-user list or @gethue!

Note:

Hive should not access the metastore database directly via JDBC, or it will bypass the protection.

So do not include a hive-config.xml with this type of configuration in the Job XML property of the Hive action:





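<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://hue.com:3306/hive1?useUnicode=true&amp;characterEncoding=UTF-8</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive1</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive1</value>
</property>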



Use this one instead, which goes through the metastore Thrift server:





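<property>
  <name>hive.metastore.local</name>
  <value>false</value>
</property>

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://hue.com:9083</value>
</property>

<property>
  <name>hive.metastore.sasl.enabled</name>
  <value>true</value>
</property>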



Note:

When the job tries to connect to MySQL, you might hit this missing jar problem:



Caused by: org.datanucleus.store.rdbms.datasource.DatastoreDriverNotFoundException: The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.

To solve it, download the MySQL connector jar from http://dev.mysql.com/downloads/connector/j/ and point HiveServer2 to it with:





<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/share/java//mysql-connector-java.jar</value>
</property>



Note:

To activate the credentials in Oozie itself, update this property in oozie-site.xml:





<property>
  <name>oozie.credentials.credentialclasses</name>
  <value>
    hcat=org.apache.oozie.action.hadoop.HCatCredentials,
    hbase=org.apache.oozie.action.hadoop.HbaseCredentials
  </value>
</property>

 



