Quick Task: How to query Apache Druid analytic database

Quick Task: How to query Apache Druid analytic database

Self-service exploratory analytics is one of the most common use cases of the Hue users. While deeply integrated with Apache Impala and Apache Hive, Hue also lets you take advantage of its smart editor and assistants with any databases. In this tutorial, let’s see how to query Apache Druid.

Apache Druid is an “OLAP style” database.

If not already running, it is easy to get Druid downloaded and started. In our case we will just query the provided Wikipedia data sample.

Administrator

First, let’s make sure that Hue can talk to Druid via the pydruid SqlAlchemy connector. Either make sure it is in the global Python environment via a usual ‘pip install’ or install it in the Hue virtual environment.

./build/env/bin/pip install pydruid

Note: Make sure the version is equal or more to 0.4.1 if not you will get a “Can’t load plugin: sqlalchemy.dialects:druid”.

In the hue.ini configuration file, now let’s add the interpreter. Here ‘druid-host.com’ would be the machine where Druid is running.

[notebook]
[[interpreters]]
[[[druid]]]
name = Druid
interface=sqlalchemy
options='{"url": "druid://druid-host.com:8082/druid/v2/sql/"}'

And now restart Hue.

User

And that’s it, now open-up http://127.0.0.1:8000/hue/editor/?type=pydruid (replace host or port of your actual Hue) and you can start querying!

SELECT countryName, count(*) t
FROM druid.wikipedia
GROUP BY countryName
ORDER BY t DESC
LIMIT 100

As usual feel free to comment here or to send feedback to the hue-user list or @gethue!