You can now view Snappy compressed Avro files in Hue through the File Browser! Here is a quick guide on how to get setup with Snappy and Avro.
Make sure Hue is stopped before installing.
Install the snappy system packages on your system. They can either be downloaded from https://code.google.com/p/snappy/ or, preferably, installed via your package management system (e.g.
yum install snappy-devel).
Install the python-snappy package via ‘pip’ from the Hue home (cd /usr/lib/hue or /opt/cloudera/parcels/CDH/lib/hue):
yum install gcc gcc-c++ python-devel snappy-devel build/env/bin/pip install -U setuptools build/env/bin/pip install python-snappy
Once Snappy and python-snappy have been installed, the File Browser will automatically detect and view Snappy compressed Avro files. Here is a quick video demonstrating this!
Note: In this demo, we are using Avro files found in this github (1).
It turns out that
python-snappy is not compatible with the python library called
snappy. If you see this error, uninstall
[03/Sep/2015 06:56:34 -0700] views WARNING Could not read avro file at //user/cconner/test_snappy.avro Traceback (most recent call last): File "/usr/lib//lib/hue/apps/filebrowser/src/filebrowser/views.py", line 701, in _read_avro data_file_reader = datafile.DataFileReader(fhandle, io.DatumReader()) File "/usr/lib//lib/hue/build/env/lib/python2.6/site-packages/avro-1.7.6-py2.6.egg/avro/datafile.py", line 240, in _init_ raise DataFileException('Unknown codec: %s.' % self.codec) DataFileException: Unknown codec: snappy. [03/Sep/2015 06:56:34 -0700] middleware INFO Processing exception: Failed to read Avro file.: Traceback (most recent call last): File "/usr/lib//lib/hue/build/env/lib/python2.6/site-packages/Django-1.4.5-py2.6.egg/django/core/handlers/base.py", line 111, in get_response response = callback(request, \*callback_args, \**callback_kwargs) File "/usr/lib//lib/hue/apps/filebrowser/src/filebrowser/views.py", line 168, in view return display(request, path) File "/usr/lib//lib/hue/apps/filebrowser/src/filebrowser/views.py", line 573, in display read_contents(compression, path, request.fs, offset, length) File "/usr/lib//lib/hue/apps/filebrowser/src/filebrowser/views.py", line 663, in read_contents contents = _read_avro(fhandle, path, offset, length, stats) File "/usr/lib//lib/hue/apps/filebrowser/src/filebrowser/views.py", line 716, in _read_avro raise PopupException(_("Failed to read Avro file.")) PopupException: Failed to read Avro file.
We hope this helps you look at the inputs and outputs of MapReduce jobs, Hive queries, and Pig scripts. Have any suggestions? Feel free to tell us what you think through hue-user or @gethue!
- Reading and writing Avro files from the command line - http://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/