Oozie workflow credentials with a Hive action with Kerberos

Published on 02 April 2014 in Scheduling - 2 minutes read - Last modified on 04 February 2020

When using Hadoop security and scheduling jobs that use Hive (or Pig, HBase), you might have hit this error:



Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: GSS initiate failed

Indeed, in order to use an Oozie Hive action with the Hive metastore server when Kerberos is enabled, you need to use HCatalog credentials in your workflow.

Here is a demo, with a kerberized cluster and a MySQL Hive metastore, showing how it works. We create a Hive script that lists the tables and performs an operation requiring the HCat credential. Please find all the used and generated configurations here.

Hue fills in the parameters automatically for you: just check the required credentials on your workflow action and Hue will:

  • Pull dynamically the available credentials details from the cluster
  • Configure the credentials in workflows for you

Then don’t forget to check the HCat credential in the Hive action’s advanced properties. You can check multiple credentials if you ever need to.

And that’s it! Submit the workflow and check its output: you will see the list of tables and the result of the second query!
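Under the hood, checking the credential makes Hue generate a credentials section in the workflow XML and tag the action with it. A minimal sketch of what the generated workflow.xml looks like (the host, Kerberos realm, and script name below are assumptions, not values from this demo):

```xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="hive-kerberos-demo">
  <credentials>
    <!-- The 'hcat' credential type must be enabled in oozie-site.xml (see the note below) -->
    <credential name="hcat" type="hcat">
      <property>
        <name>hcat.metastore.uri</name>
        <value>thrift://hue.com:9083</value>
      </property>
      <property>
        <name>hcat.metastore.principal</name>
        <value>hive/hue.com@EXAMPLE.COM</value>
      </property>
    </credential>
  </credentials>
  <start to="hive-node"/>
  <!-- The cred attribute attaches the credential to this action -->
  <action name="hive-node" cred="hcat">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>show_tables.sql</script>
    </hive>
    <ok to="end"/>
    <error to="kill"/>
  </action>
  <kill name="kill">
    <message>Hive action failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

At submission time, Oozie uses the credential definition to obtain a delegation token from the metastore and hands it to the launched job.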

As usual feel free to comment on the hue-user list or @gethue!

Note:

Hive should not access the metastore database directly via JDBC, or it will bypass the Kerberos protection.

Include a hive-config.xml in the Job XML property of the Hive action. Instead of this type of configuration, which connects straight to the database:


<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://hue.com:3306/hive1?useUnicode=true&amp;characterEncoding=UTF-8</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive1</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive1</value>
</property>

Use this one, which goes through the metastore server with SASL enabled:


<property>
  <name>hive.metastore.local</name>
  <value>false</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://hue.com:9083</value>
</property>
<property>
  <name>hive.metastore.sasl.enabled</name>
  <value>true</value>
</property>
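This thrift-based hive-config.xml is then referenced from the Hive action itself. A sketch of how the action points at it via the job-xml element (the action and script names are assumptions; the file must be in the workflow directory on HDFS):

```xml
<action name="hive-node" cred="hcat">
  <hive xmlns="uri:oozie:hive-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <!-- The Job XML property set in Hue becomes this element -->
    <job-xml>hive-config.xml</job-xml>
    <script>show_tables.sql</script>
  </hive>
  <ok to="end"/>
  <error to="kill"/>
</action>
```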

Note:

When the job tries to connect to MySQL, you might hit this missing-jar problem:



Caused by: org.datanucleus.store.rdbms.datasource.DatastoreDriverNotFoundException: The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.

To solve it, simply download the MySQL JDBC connector from http://dev.mysql.com/downloads/connector/j/ and point HiveServer2 at it with:


<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/share/java//mysql-connector-java.jar</value>
</property>

Note:

To activate the credentials in Oozie itself, update this property in oozie-site.xml:


<property>
  <name>oozie.credentials.credentialclasses</name>
  <value>
    hcat=org.apache.oozie.action.hadoop.HCatCredentials,
    hbase=org.apache.oozie.action.hadoop.HbaseCredentials
  </value>
</property>
