Analyse Apache logs and build your own Web Analytics dashboard with Hadoop and Solr

Published on 20 June 2014 - 1 minute read - Last modified on 06 March 2021

Hue (3.6 or upcoming CDH5.1) ships with a dynamic dashboard builder for search. We presented the new interface in the previous Search episode.

Here is the second part! We show how to index Apache log data and recreate the same dashboard in a few clicks. The video uses real Apache logs from demo.gethue.com, the live Hadoop cluster.

For those wishing to skip to the end, a log file ready to be ingested is available here.

As explained in the How to Proxy Hue blog post, Apache records a log entry for every page view. We retrieve the logs from the production machine and download the script that cleans them up, extracts the Solr schema fields, and geolocates each page view.
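To give an idea of what the cleanup step involves, here is a minimal sketch in Python. It is not the script used in the video: it assumes the logs are in Apache's "combined" format, and the function name `parse_line` and the output field names (`time`, `code`, `url`, and so on) are hypothetical, chosen for illustration rather than taken from the actual Solr schema. Geolocation is left out, since it needs an external IP database.

```python
import re
from datetime import datetime, timezone

# Assumed input: Apache "combined" log format, i.e.
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Turn one log line into a flat dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    raw = match.groupdict()

    # The request field normally looks like: GET /path HTTP/1.1
    parts = raw["request"].split()
    method, url, protocol = parts if len(parts) == 3 else (None, None, None)

    # Normalize the timestamp to UTC in the ISO 8601 form Solr date fields expect.
    local_time = datetime.strptime(raw["time"], "%d/%b/%Y:%H:%M:%S %z")
    utc_time = local_time.astimezone(timezone.utc)

    return {
        # Hypothetical field names, not the real schema.
        "client_ip": raw["host"],
        "time": utc_time.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "method": method,
        "url": url,
        "protocol": protocol,
        "code": int(raw["status"]),
        # Apache writes "-" when no body was sent.
        "bytes": 0 if raw["size"] == "-" else int(raw["size"]),
        "referer": raw["referer"],
        "user_agent": raw["user_agent"],
    }


if __name__ == "__main__":
    sample = (
        '127.0.0.1 - - [20/Jun/2014:10:15:32 -0700] "GET /search/ HTTP/1.1" '
        '200 2326 "http://demo.gethue.com/" "Mozilla/5.0"'
    )
    print(parse_line(sample))
```

Each resulting dict could then be written out as a CSV row or JSON document for indexing. Lines that do not match the pattern come back as `None`, so they can be counted or skipped instead of stopping the run.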

With this new indexer library, we can now install the Hue search examples without any manual steps. Upcoming features will include automatic geolocation at query time, indexing of Hive or HBase tables, and maybe a Morphline editor (basically, anything that gets rid of the Python part and allows ingesting gigabytes of data or more).

As usual, feel free to send any feedback on the hue-user list or @gethue!

