Analyse Apache logs and build your own Web Analytics dashboard with Hadoop and Solr

20 June 2014 in Querying - 1 minute read

Hue 3.6 (and the upcoming CDH 5.1) ships with a dynamic dashboard builder for Search. We presented the new interface in the previous Search episode.

Here is the second part! We show how to index Apache log data and recreate the same dashboard in a few clicks. In the video, we use real Apache logs from demo.gethue.com, our live Hadoop cluster.

For those wishing to skip to the end, a log file ready to be ingested is available here.

As explained in the How to Proxy Hue blog post, we get an Apache log entry for every page view. We retrieve the logs from the production machine and download the script that cleans them up, extracts the Solr schema fields and geolocates each page view.
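The cleanup script itself is not reproduced here, but its core step, turning one raw log line into a document whose keys match Solr schema fields, can be sketched as below. This is a minimal sketch assuming the logs use Apache's standard "combined" format; the field names (client_ip, method, url, code, bytes, ...) are hypothetical and must match whatever schema your collection actually uses. Geolocation is deliberately left out because it needs an external GeoIP database.

```python
import re
from datetime import datetime, timezone

# Apache "combined" log format:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<client_ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<code>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Turn one combined-format log line into a dict of (hypothetical) Solr fields.

    Returns None for lines that do not match, so one bad line does not
    abort the whole file.
    """
    match = COMBINED.match(line)
    if match is None:
        return None
    doc = match.groupdict()
    # Apache timestamps look like 20/Jun/2014:10:15:32 +0000
    ts = datetime.strptime(doc.pop("time"), "%d/%b/%Y:%H:%M:%S %z")
    # Solr date fields expect ISO 8601 in UTC with a trailing "Z"
    doc["time"] = ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    doc["code"] = int(doc["code"])
    # "-" means no response body was sent
    doc["bytes"] = 0 if doc["bytes"] == "-" else int(doc["bytes"])
    return doc
```

For example, `parse_line('203.0.113.7 - - [20/Jun/2014:10:15:32 +0000] "GET /search/ HTTP/1.1" 200 5123 "-" "Mozilla/5.0"')` yields a dict with `client_ip` set to `203.0.113.7`, `code` set to `200` and `time` set to `2014-06-20T10:15:32Z`. Normalizing timestamps to UTC at parse time keeps date facets consistent when logs come from servers in different time zones.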

With this new indexer library, we can now install the Hue Search examples without any manual steps. Upcoming features will include automatic geolocation at query time, indexing of Hive or HBase tables and maybe a Morphline editor (mainly to get rid of the Python step and to allow ingesting gigabytes of data or more).
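Hue's indexer drives the indexing for you, but for readers curious about what the final step looks like at the HTTP level, here is a minimal sketch that builds a POST of JSON documents to a Solr collection's update handler. It assumes a Solr 4.x or later instance whose /update handler accepts JSON bodies; the base URL and the collection name `log_analytics_demo` below are hypothetical placeholders, not taken from this post. The function only builds the request, so it can be inspected before anything is sent.

```python
import json
import urllib.request


def build_update_request(solr_url, collection, docs, commit=True):
    """Build (but do not send) a POST that indexes `docs` into `collection`.

    `solr_url` is the Solr base URL, e.g. http://localhost:8983/solr
    (an assumed local install). With commit=True, Solr is asked to make
    the new documents searchable right away.
    """
    url = f"{solr_url.rstrip('/')}/{collection}/update"
    if commit:
        url += "?commit=true"
    # Solr's JSON update format accepts a plain array of documents
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Usage, against a hypothetical local collection: `urllib.request.urlopen(build_update_request("http://localhost:8983/solr", "log_analytics_demo", docs))`. For large files it is usually better to send documents in batches and commit once at the end rather than committing on every request.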

As usual, feel free to send any feedback to the hue-user list or @gethue!


comments powered by Disqus

More recent stories

10 February 2020
The Hue SQL Query Experience for your Data Warehouse
Read More
28 January 2020
10 years of Data Querying Experience Evolution with Hue
Read More
05 December 2019
Hue 4.6 and its improvements are out!
Read More