When using Hadoop security and scheduling Hive (or Pig, HBase) jobs, you might have run into this error:
Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: GSS initiate failed
Indeed, in order to use an Oozie Hive action with the Hive metastore server when Kerberos is enabled, you need to use HCatalog credentials in your workflow.
Here is a demo, with a kerberized cluster and a MySQL-backed Hive metastore, showing how it works. We create a Hive script that lists the tables and performs an operation requiring the HCat credential (a minimal sketch of such a script is shown below). Please find all the used and generated configurations here.
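For illustration, the script can be as simple as this (the file name and the table name are placeholders; any query that goes through the metastore will exercise the credential):

-- hive-script.sql: minimal sketch of the demo script
-- 'sample_07' is a placeholder table name, e.g. from the Hue examples
SHOW TABLES;
SELECT COUNT(*) FROM sample_07;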
Hue fills in the parameters automatically for you: just check the credentials required on your workflow action and Hue will:
- Dynamically pull the details of the available credentials from the cluster
- Configure the credentials in the workflow for you
Then don’t forget to check the HCat credential in the Hive action advanced properties. You can check multiple credentials if you ever need to.
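For reference, here is a minimal sketch of the kind of workflow.xml that gets generated (the workflow and action names, the Kerberos principal and the script name are illustrative placeholders; the metastore URI matches the hive-config.xml shown later in this post):

<workflow-app xmlns="uri:oozie:workflow:0.4" name="hive-credentials-demo">
  <credentials>
    <credential name="hcat" type="hcat">
      <property>
        <name>hcat.metastore.uri</name>
        <value>thrift://hue.com:9083</value>
      </property>
      <property>
        <name>hcat.metastore.principal</name>
        <value>hive/hue.com@EXAMPLE.COM</value>
      </property>
    </credential>
  </credentials>
  <start to="hive-action"/>
  <action name="hive-action" cred="hcat">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <job-xml>hive-config.xml</job-xml>
      <script>hive-script.sql</script>
    </hive>
    <ok to="end"/>
    <error to="kill"/>
  </action>
  <kill name="kill">
    <message>Hive action failed</message>
  </kill>
  <end name="end"/>
</workflow-app>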
And that’s it! Submit the workflow and check its output: you will see the list of tables and the result of the second query!
As usual feel free to comment on the hue-user list or @gethue!
Note:
Hive should not access the metastore database directly via JDBC, or it will bypass the Kerberos protection.
If you include a hive-config.xml in the Job XML property of the Hive action, it should not contain this type of configuration:
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hue.com:3306/hive1?useUnicode=true&amp;characterEncoding=UTF-8</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive1</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive1</value>
  </property>
</configuration>
Use this one instead:
<configuration>
  <property>
    <name>hive.metastore.local</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hue.com:9083</value>
  </property>
  <property>
    <name>hive.metastore.sasl.enabled</name>
    <value>true</value>
  </property>
</configuration>
Note:
When the job tries to connect to MySQL, you might hit this missing jar problem:
Caused by: org.datanucleus.store.rdbms.datasource.DatastoreDriverNotFoundException: The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
To solve it, simply download the MySQL JDBC connector jar from http://dev.mysql.com/downloads/connector/j/ and have HiveServer2 point to it with:
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/share/java/mysql-connector-java.jar</value>
</property>
Note:
To activate the credentials in Oozie itself, update this property in oozie-site.xml:
<property>
  <name>oozie.credentials.credentialclasses</name>
  <value>
    hcat=org.apache.oozie.action.hadoop.HCatCredentials,
    hbase=org.apache.oozie.action.hadoop.HbaseCredentials
  </value>
</property>