Hue (3.6 or upcoming CDH5.1) ships with a dynamic dashboard builder for search. We presented the new interface in the previous Search episode.
Here is the second part! We show how to index Apache log data and recreate the same dashboard in a few clicks. In this video, we are using real Apache logs from demo.gethue.com, the live Hadoop cluster:
For those wishing to skip to the end, a log file ready to be ingested is available here.
As explained in the How to Proxy Hue blog post we are getting Apache logs for every page view. We retrieve the logs from the production machine and download the script that is going to clean them up, extract the Solr schema fields and geolocalize each page.
With this new indexer library we can now install the Hue search examples without any manual steps. Next features will include automatic geolocalization at query time, indexing of Hive or HBase tables and maybe a Morphline editor (basically all for getting rid of the Python part and allowing data ingestion of gigabytes or more).