Hive Query editor with HiveServer2 and Sentry

Hive Query editor with HiveServer2 and Sentry

Hue provides a Web interface for submitting Hive queries. Hue had its own server to service Hive queries called Beeswax. The more sophisticated and robust service, Apache HiveServer2, is supported as of Hue 2.5.

Beeswax Hive Editor

Thanks to HiveServer2 integration, Hue is getting the benefits from Sentry (How to configure Sentry Video). In addition to the security provided, Hue’s interface becomes more consistent. For example, a user without permissions on a database or table won’t see it in the query editor or in the Metastore app.

HiveServer2 also provides performant access to the Metastore.

On top of this, the Beeswax Hive UI is a Web editor for increasing the productivity:

  • Syntax highlighting and auto completion
  • Submit several queries and check they progress later
  • UDF integration
  • Multiple queries execution
  • Select and send a fraction of a query
  • Download or save the query results
  • Navigate through the metadata

Hue 2.x

We recommend to use the latest version of Hue (2.5). Have Hue point to HiveServer2 by updating the Beeswax section in the hue.ini:

[beeswax]
  beeswax_server_host=<FQDN of Beeswax server>
  server_interface=hiveserver2
  beeswax_server_port=10000

Hue 3.x

Hue 3 does not bundle Beeswaxd anymore, and is configured by default to use HiveServer2. If HiveServer2 is not on the same machine as Hue update hue.ini with:

[beeswax]
 hive_server_host=<FQDN of HiveServer2>

Other Hive specific settings (e.g. security, impersonation) are read from a local /etc/hive/conf/hive-site.xml. We recommend to keep this one in exact sync with the original Hive one (or put Hue and Hive on the same machine).


Note
:

If you are using Hive 0.12 or later, Hue needs to have HUE-1561 (or use Hue 3.0 or later).

With Sentry: Hue 2.x or 3.x

Hue will automatically work with a HiveServer2 configured with Sentry.

Notice that HiveServer2 impersonation (described below) should be turned off in case of Sentry. Permissions of the impersonated user (e.g. ‘bob’) will be used instead of the ones of the ‘hue’ user. Also we need the warehouse permissions to be owned by hive:hive with 770 so that only super users in hive group can read, write.

HiveServer2 needs to be using strong authentication like Kerberos/LDAP for Sentry to work.

Troubleshooting without Sentry

org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=hive, access=WRITE, inode="/user/test/data":test:supergroup:drwxr-xr-x

By default HiveServer2 now owns the Hive warehouse (default ‘/user/hive/warehouse’), meaning the data files need to belong to the ‘hive’ user. If you get this error when creating a table, change the permission of the data directory (here /user/test/data) to ‘write’ for everybody or revert HiveServer2 to the old Beeswax behavior by authorizing ‘hive’ to impersonate the user.

Adding ‘hive’ as a Hadoop proxy user and edit your hive-site.xml:

 <property>
   <name>hive.server2.enable.doAs</name>
   <value>true</value>
 </property>

Then restart HiveServer2:

sudo service hive-server2 restart

Another common error when using YARN is:

Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.

It means that the HADOOP_MAPRED_HOME environment variable is not set to:

export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

HADOOP_HOME could also be wrong.

TTransportException('Could not start SASL: Error in sasl_client_start (-4) SASL(-4): no mechanism available: No worthy mechs found',)

Hue is missing a SASL lib in your system.

HiveServer2 supports 3 authentication modes specified by the ‘hive.server2.authentication’ in hive-site.xml:

  • NOSASL
  • NONE (default)
  • KERBEROS

Only NOSASL does not require SASL, so you either switch to it or install the missing packages.

Hue will pick the value from its local /etc/hive/conf/hive-site.xml so make sure it is synced with the original hive-site.xml (manually or via CM Beeswax safety valve).

e.g.

<property>
<name>hive.server2.authentication</name>
<value>NOSASL</value>
</property>


Error while compiling statement: FAILED: RuntimeException org.apache.hadoop.security.AccessControlException: Permission denied: user=admin, access=WRITE, inode="/tmp/hive-hive":hive:hdfs:drwxr-xr-x at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:234) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:214) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:158)

The Hive HDFS workspace ‘/tmp/hive-hive’ would need to be set to 1777 permissions.

Troubleshooting with Sentry

AuthorizationException: User 'hue/test.com' does not have privileges to execute 'CREATE' on: default.sample_08"

The user ‘hue’ is not configured in Sentry and have not the CREATE table permission.

Conclusion

Hue provides a great environment for executing Hive queries in a friendly UI. Beeswaxd was a great service but has been deprecated in favor of HiveServer2. HiveServer2 offers more stability and security.

As a side note, if you are looking for even faster SQL queries, we encourage you to test the Impala Editor!

If you have questions or feedback, feel free to contact the Hue community on hue-user or @gethue.com!

20 Comments

  1. xueqinshuang 3 years ago

    [25/Nov/2015 13:55:47 +0800] hive_server2_lib INFO Retrying with a new session because for hue of TGetSchemasResp(status=TStatus(errorCode=0, errorMessage=’Invalid SessionHandle: SessionHandle [02e79dd5-472b-464b-bf6c-f32874682c8f]’, sqlState=None, infoMessages=[‘*org.apache.hive.service.cli.HiveSQLException:Invalid SessionHandle: SessionHandle [02e79dd5-472b-464b-bf6c-f32874682c8f]:12:11’, ‘org.apache.hive.service.cli.session.SessionManager:getSession:SessionManager.java:318’, ‘org.apache.hive.service.cli.CLIService:getSchemas:CLIService.java:312’, ‘org.apache.hive.service.cli.thrift.ThriftCLIService:GetSchemas:ThriftCLIService.java:530’, ‘org.apache.hive.service.cli.thrift.TCLIService$Processor$GetSchemas:getResult:TCLIService.java:1373’, ‘org.apache.hive.service.cli.thrift.TCLIService$Processor$GetSchemas:getResult:TCLIService.java:1358’, ‘org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39’, ‘org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39’, ‘org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56’, ‘org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285’, ‘java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1142’, ‘java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:617’, ‘java.lang.Thread:run:Thread.java:745′], statusCode=3), operationHandle=None)
    [25/Nov/2015 13:55:47 +0800] hive_server2_lib INFO Opening session THandleIdentifier(secret=’mm\xfcW\x8c\xceHp\x99U\x96\x03\x16\x1f\xa7\x02′, guid=’\x8e\xb3\xa6\xf8\x8c\[email protected])\xaeH\xd6\xa7\x94\xdf\xd2\x87’)
    [25/Nov/2015 13:55:47 +0800] api WARNING Autocomplete data fetching error None.None: (1062, “Duplicate entry ‘0’ for key ‘PRIMARY'”)

    • Hue Team 3 years ago

      This just means that Hue needs to create a new session to impala, is the error consistent?

      • xueqinshuang 3 years ago

        i did not configure impala ,
        and now there is a new question where operate the hadoop file :SecurityException: Failed to obtain user group information: org.apache.hadoop.security.authorize.AuthorizationException: User: hue is not allowed to impersonate hue (error 403)

        • Hue Team 3 years ago

          You need to set Hue as a proxy user:
          http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cdh_ig_cdh_hue_configure.html?scroll=topic_15_4#topic_15_4_1_unique_1

          : Add to core-site.xml:

          hadoop.proxyuser.hue.hosts
          *
          hadoop.proxyuser.hue.groups
          *

          • xueqinshuang 3 years ago

            i had do this ,but it not take effect. the error still appear
            i met two problem :
            1, i can not operate the hadoop file,i try all solution but not take effect
            2 ,i can access all database but default,the error is :
            Error while compiling statement: FAILED: SemanticException Call From ip1/ip1 to ip2:9000 failed on connection exception: java.net.ConnectException: Connection refused;
            select * from DBS ;
            Default Hive database | hdfs://ip1:9000/user/hive/warehouse
            ip1 is namenode ip2 is datanode , hue install ip2

          • Hue Team 3 years ago

            Did you install the cluster yourself? Which version?

  2. xueqinshuang 3 years ago

    i can access all database but default,the error is :
    Error while compiling statement: FAILED: SemanticException Call From ip1/ip1 to ip2:9000 failed on connection exception: java.net.ConnectException: Connection refused;
    select * from DBS ;
    Default Hive database | hdfs://ip1:9000/user/hive/warehouse
    ip1 is namenode ip2 is datanode , hue install ip2

  3. shuang 3 years ago

    en ,my workmate install the cluster, hadoop 2.5.1

    • Hue Team 3 years ago

      any chance your colleague can help you out on this? 🙂

  4. Edgars 3 years ago

    I use Hue 3.9 + Ambari + Ranger. I cant see any databases in Hue -> Query Editors -> Hive (“No databases or tables found”). I grant access to some databases via Ranger, but still the same. Metastore tables works well.

    • Hue Team 3 years ago

      Does Ranger picks up the impersonated user like Sentry? If not everything will be seen as the ‘hue’ user.

      • Edgars 3 years ago

        I think yes, but I am not sure how to check it. If I am logged in as a ‘hive’ or ‘hue’ users in Hue, I can list the databases. When I relog to user ‘admin’ there are no databases. All mentioned users have the same access (the same policies in Ranger).

        • Edgars 3 years ago

          as hive.server2.authentication I am using “none”

  5. Edgars 3 years ago

    sorry for triplepost. Users ‘hue’, ‘hive’ and ‘admin’ have access to databases. Rest users do not. They are all in the same groups and has the same policies configured.

    • Hue Team 3 years ago

      What error do you see? Or do they just don’t see the SQL apps?
      Are they in the default group which has the permissions too? http://gethue.com/how-to-manage-permissions-in-hue/

      • Edgars 3 years ago

        I dont see any errors. The users just don’t see the Hive databases via “Query Editors” -> “Hive”. The only users that can list the databases are: hue, hive, admin.

        • Hue Team 3 years ago

          So you need to configure the Hive authorization properly and GRANT the Database permissions to everybody

Leave a reply to xueqinshuang Click here to cancel the reply

Your email address will not be published. Required fields are marked *

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.