You can now view Snappy-compressed Avro files in Hue through the File Browser! Here is a quick guide on how to get set up with Snappy and Avro.
Tutorial
Installation
- Make sure Hue is stopped before installing.
- Install the snappy system packages on your system. They can either be downloaded from https://code.google.com/p/snappy/ or, preferably, installed via your package management system (e.g. yum install snappy-devel).
- Install the python-snappy package via ‘pip’ from the Hue home (cd /usr/lib/hue or /opt/cloudera/parcels/CDH/lib/hue):
    yum install gcc gcc-c++ python-devel snappy-devel
    build/env/bin/pip install -U setuptools
    build/env/bin/pip install python-snappy
- Start Hue!
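Before starting Hue, it can be worth confirming that the interpreter in Hue's virtualenv can actually use the codec. The following is a minimal sketch, not part of Hue: the helper name snappy_round_trips and the file name check_snappy.py are made up for illustration, and it only assumes that python-snappy exposes compress and decompress functions.

```python
# Hypothetical helper script, e.g. saved as check_snappy.py and run from the
# Hue home with: build/env/bin/python check_snappy.py


def snappy_round_trips(module, sample=b"hue avro snappy check"):
    """Return True if module.compress/decompress round-trip the sample."""
    try:
        return module.decompress(module.compress(sample)) == sample
    except Exception:
        # Missing functions or a failing codec both count as "not usable".
        return False


if __name__ == "__main__":
    try:
        import snappy
    except ImportError:
        print("snappy module is not importable from this environment")
    else:
        print("snappy round-trip OK: %s" % snappy_round_trips(snappy))
```

If the script reports that the module is not importable, the pip step above most likely ran against a different Python environment than the one Hue uses.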
Demo
Once Snappy and python-snappy have been installed, the File Browser will automatically detect and display Snappy-compressed Avro files. Here is a quick video demonstrating this!
Note: In this demo, we are using Avro files found in this GitHub repository (1).
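To see which codec a given Avro file declares, you can read it from the file header. The sketch below is not how Hue's File Browser does its detection; it is a standalone Python 3 illustration based on the Avro object container layout (the magic bytes Obj\x01 followed by a metadata map that may contain an avro.codec entry). The function name read_avro_codec is hypothetical.

```python
import io

AVRO_MAGIC = b"Obj\x01"


def _read_long(stream):
    """Decode one Avro long (zigzag-encoded variable-length integer)."""
    shift = 0
    value = 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise ValueError("unexpected end of Avro header")
        byte = raw[0]
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (value >> 1) ^ -(value & 1)


def _read_bytes(stream):
    """Read a length-prefixed byte string (used for both keys and values)."""
    length = _read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of Avro header")
    return data


def read_avro_codec(stream):
    """Return the codec name declared in an Avro container file header."""
    if stream.read(4) != AVRO_MAGIC:
        raise ValueError("not an Avro object container file")
    metadata = {}
    while True:
        count = _read_long(stream)
        if count == 0:
            break  # a zero count terminates the metadata map
        if count < 0:
            count = -count
            _read_long(stream)  # block size in bytes, not needed here
        for _ in range(count):
            key = _read_bytes(stream).decode("utf-8")
            metadata[key] = _read_bytes(stream)
    # Files without an avro.codec entry are uncompressed ("null" codec).
    return metadata.get("avro.codec", b"null").decode("utf-8")
```

For a local copy of a file, opening it in binary mode and passing the handle to read_avro_codec should return "snappy" for the kind of file discussed here.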
Note
It turns out that python-snappy is not compatible with the Python library called snappy. If you see this error, uninstall snappy:
[03/Sep/2015 06:56:34 -0700] views WARNING Could not read avro file at //user/cconner/test_snappy.avro
Traceback (most recent call last):
  File "/usr/lib//lib/hue/apps/filebrowser/src/filebrowser/views.py", line 701, in _read_avro
    data_file_reader = datafile.DataFileReader(fhandle, io.DatumReader())
  File "/usr/lib//lib/hue/build/env/lib/python2.6/site-packages/avro-1.7.6-py2.6.egg/avro/datafile.py", line 240, in __init__
    raise DataFileException('Unknown codec: %s.' % self.codec)
DataFileException: Unknown codec: snappy.
[03/Sep/2015 06:56:34 -0700] middleware INFO Processing exception: Failed to read Avro file.: Traceback (most recent call last):
  File "/usr/lib//lib/hue/build/env/lib/python2.6/site-packages/Django-1.4.5-py2.6.egg/django/core/handlers/base.py", line 111, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib//lib/hue/apps/filebrowser/src/filebrowser/views.py", line 168, in view
    return display(request, path)
  File "/usr/lib//lib/hue/apps/filebrowser/src/filebrowser/views.py", line 573, in display
    read_contents(compression, path, request.fs, offset, length)
  File "/usr/lib//lib/hue/apps/filebrowser/src/filebrowser/views.py", line 663, in read_contents
    contents = _read_avro(fhandle, path, offset, length, stats)
  File "/usr/lib//lib/hue/apps/filebrowser/src/filebrowser/views.py", line 716, in _read_avro
    raise PopupException(_("Failed to read Avro file."))
PopupException: Failed to read Avro file.
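If you hit the error above and are not sure which package is providing the snappy module, a quick look at where it was loaded from can help. This is a rough diagnostic sketch rather than anything Hue ships: describe_snappy is a hypothetical helper, and the check assumes that python-snappy provides callable compress and decompress functions while a conflicting package of the same name would not.

```python
# Run with Hue's interpreter (build/env/bin/python) so the same
# site-packages directory is inspected.


def describe_snappy(module):
    """Return (path, has_codec_functions) for an already imported module."""
    path = getattr(module, "__file__", None)
    has_codec = all(callable(getattr(module, name, None))
                    for name in ("compress", "decompress"))
    return path, has_codec


if __name__ == "__main__":
    try:
        import snappy
    except ImportError as exc:
        print("import snappy failed: %s" % exc)
    else:
        print("snappy loaded from %s, codec functions present: %s"
              % describe_snappy(snappy))
```

If the codec functions are reported as missing, the module on the path is probably not python-snappy. Removing the conflicting package from the same environment (for example with build/env/bin/pip uninstall snappy) and restarting Hue is the fix described above.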
Conclusion
We hope this helps you look at the inputs and outputs of MapReduce jobs, Hive queries, and Pig scripts. Have any suggestions? Feel free to tell us what you think through hue-user or @gethue!
References:
- Reading and writing Avro files from the command line - http://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/