Analyse Apache logs and build your own Web Analytics dashboard with Hadoop and Solr

Hue (3.6 or upcoming CDH5.1) ships with a dynamic dashboard builder for search. We presented the new interface in the previous Search episode.

Here is the second part! We show how to index Apache log data and recreate the same dashboard in a few clicks. The video uses real Apache logs from demo.gethue.com, the live Hadoop cluster.

For those wishing to skip to the end, a log file ready to be ingested is available here.

As explained in the How to Proxy Hue blog post, we get an Apache log entry for every page view. We retrieve the logs from the production machine, then download the script that cleans them up, extracts the Solr schema fields and geolocates each page view.
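To make the clean-up step more concrete, here is a minimal sketch of parsing one Apache log line into a flat record. It is not the script from this post: it assumes the standard Apache "combined" log format, and the field names (`ip`, `time`, `method`, `url`, `protocol`, `code`, `bytes`, `referer`, `user_agent`) are hypothetical, not the actual Solr schema. Geolocation is left out because it needs an external IP database.

```python
import re
from datetime import datetime, timezone

# Assumed input: Apache "combined" log format, i.e.
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]*)" '
    r'(?P<code>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields (hypothetical names), or None if the line is malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # drop lines that do not follow the expected format
    record = match.groupdict()
    # Convert the Apache timestamp to the UTC ISO 8601 form used by Solr date fields.
    ts = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["time"] = ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    record["code"] = int(record["code"])
    # Apache writes "-" when no response body was sent.
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    return record


if __name__ == "__main__":
    sample = (
        '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
        '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
    )
    print(parse_line(sample))
```

Each record could then be written out as a CSV or JSON row for indexing. Returning `None` for malformed lines lets the caller skip them without aborting the whole run.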

With this new indexer library, the Hue search examples can now be installed without any manual steps. Upcoming features will include automatic geolocation at query time, indexing of Hive or HBase tables and maybe a Morphline editor (basically everything needed to get rid of the Python part and to allow ingesting gigabytes of data or more).

As usual, feel free to send any feedback on the hue-user list or to @gethue!

3 Comments

  1. kay 4 months ago

    I thought the new HUE Search was a logstash/kibana alternative. Unfortunately, it is not.

    Do you have plans to enhance the current functionality to do live log indexing without the Python scripts?

  2. Author
    Hue Team 4 months ago

    Yes, it is listed at the end of the blog post, and nobody is offering everything in a single UI yet ;)

    If you want more indexing possibilities, have a look at the list of indexing methods (real time, HBase…): http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/Search/Cloudera-Search-User-Guide/csug_flume_nrt_index_ref.html

    In the meantime, check out the search UI. We have received great feedback so far!
