Visualize Snappy compressed Avro files

You can now view Snappy compressed Avro files in Hue through the File Browser! Here is a quick guide on how to get set up with Snappy and Avro.

Tutorial

Installation

  1. Make sure Hue is stopped before installing.
  2. Install the Snappy system packages. They can be downloaded from https://code.google.com/p/snappy/ or, preferably, installed through your package manager (e.g. yum install snappy-devel).
  3. Install the build dependencies, then install the python-snappy package with pip from the Hue home (cd /usr/lib/hue or /opt/cloudera/parcels/CDH/lib/hue):

     yum install gcc gcc-c++ python-devel snappy-devel

     build/env/bin/pip install -U setuptools
     build/env/bin/pip install python-snappy
  4. Start Hue!
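
Before starting Hue, it can be useful to confirm that the python-snappy bindings are usable from Hue's Python environment. The following is a minimal sketch, not part of Hue: the function name snappy_round_trip_ok is hypothetical, and the only library calls it relies on are snappy.compress and snappy.decompress, which python-snappy exposes at module level. It assumes you run it with the interpreter inside the Hue virtualenv (for example build/env/bin/python).

```python
def snappy_round_trip_ok(payload=b"hue snappy check"):
    """Return True if the importable `snappy` module can round-trip `payload`.

    Hypothetical helper for illustration; it is not part of Hue.
    """
    try:
        import snappy  # provided by the python-snappy package
    except Exception:
        # Module missing, or its native library failed to load.
        return False
    try:
        # python-snappy exposes compress() and decompress() at module level.
        return snappy.decompress(snappy.compress(payload)) == payload
    except Exception:
        # A different package named `snappy`, or a broken installation.
        return False


if __name__ == "__main__":
    print("python-snappy usable: %s" % snappy_round_trip_ok())
```

If the script reports False, the File Browser will most likely be unable to decode Snappy compressed Avro files either.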

Demo

Once Snappy and python-snappy have been installed, the File Browser will automatically detect and display Snappy compressed Avro files. Here is a quick video demonstrating this!

Note: This demo uses Avro files found in this GitHub repository (1).
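
To see which codec an Avro file uses outside of Hue, you can read the avro.codec entry from the file header. The sketch below is not how Hue does it; it is a standalone Python 3 illustration based on the Avro object container file layout (the 4-byte magic "Obj" followed by byte 1, then a metadata map of string keys to byte values). The names read_long, read_bytes, and read_avro_codec are hypothetical.

```python
import io

AVRO_MAGIC = b"Obj\x01"


def read_long(stream):
    """Decode one Avro long (variable-length, zigzag encoded)."""
    shift = 0
    result = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of Avro header")
        value = byte[0]
        result |= (value & 0x7F) << shift
        if not value & 0x80:
            break
        shift += 7
    # Undo the zigzag encoding to recover the signed value.
    return (result >> 1) ^ -(result & 1)


def read_bytes(stream):
    """Read a length-prefixed byte string."""
    length = read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of Avro header")
    return data


def read_avro_codec(stream):
    """Return the codec named in an Avro container header, e.g. 'snappy'."""
    if stream.read(4) != AVRO_MAGIC:
        raise ValueError("not an Avro object container file")
    metadata = {}
    while True:
        count = read_long(stream)
        if count == 0:
            break  # a zero count ends the metadata map
        if count < 0:
            count = -count
            read_long(stream)  # block size in bytes, not needed here
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            metadata[key] = read_bytes(stream)
    # When avro.codec is absent, the file is uncompressed ("null").
    return metadata.get("avro.codec", b"null").decode("utf-8")
```

For a local copy of a file, a call such as read_avro_codec(open("test_snappy.avro", "rb")) should return "snappy" when the file was written with the Snappy codec; the file name here is only an example.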

 

Note

The python-snappy package is not compatible with the separate Python library named snappy. If you see the error below, uninstall snappy:

[03/Sep/2015 06:56:34 -0700] views WARNING Could not read avro file at //user/cconner/test_snappy.avro
Traceback (most recent call last):
  File "/usr/lib//lib/hue/apps/filebrowser/src/filebrowser/views.py", line 701, in _read_avro
    data_file_reader = datafile.DataFileReader(fhandle, io.DatumReader())
  File "/usr/lib//lib/hue/build/env/lib/python2.6/site-packages/avro-1.7.6-py2.6.egg/avro/datafile.py", line 240, in __init__
    raise DataFileException('Unknown codec: %s.' % self.codec)
DataFileException: Unknown codec: snappy.
[03/Sep/2015 06:56:34 -0700] middleware INFO Processing exception: Failed to read Avro file.: Traceback (most recent call last):
  File "/usr/lib//lib/hue/build/env/lib/python2.6/site-packages/Django-1.4.5-py2.6.egg/django/core/handlers/base.py", line 111, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib//lib/hue/apps/filebrowser/src/filebrowser/views.py", line 168, in view
    return display(request, path)
  File "/usr/lib//lib/hue/apps/filebrowser/src/filebrowser/views.py", line 573, in display
    read_contents(compression, path, request.fs, offset, length)
  File "/usr/lib//lib/hue/apps/filebrowser/src/filebrowser/views.py", line 663, in read_contents
    contents = _read_avro(fhandle, path, offset, length, stats)
  File "/usr/lib//lib/hue/apps/filebrowser/src/filebrowser/views.py", line 716, in _read_avro
    raise PopupException(_("Failed to read Avro file."))
PopupException: Failed to read Avro file.
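
Because both packages install a module named snappy, it can help to check which one Python actually imports. The following is a rough diagnostic sketch, not part of Hue or of either library: describe_snappy_module is a hypothetical name, and the check assumes only that python-snappy provides module-level compress and decompress functions. Run it with the interpreter from the Hue virtualenv.

```python
def describe_snappy_module():
    """Report where `snappy` is imported from and which functions are missing.

    Hypothetical diagnostic helper; it only inspects the imported module.
    """
    expected = ("compress", "decompress")
    try:
        import snappy
    except Exception as exc:
        # Nothing importable under the name `snappy`, or it failed to load.
        return {
            "importable": False,
            "location": None,
            "missing": list(expected),
            "error": repr(exc),
        }
    # Functions that python-snappy provides and a conflicting package may lack.
    missing = [name for name in expected if not hasattr(snappy, name)]
    return {
        "importable": True,
        "location": getattr(snappy, "__file__", None),
        "missing": missing,
        "error": None,
    }


if __name__ == "__main__":
    info = describe_snappy_module()
    for key in ("importable", "location", "missing", "error"):
        print("%s: %s" % (key, info[key]))
```

A non-empty missing list suggests that the imported module is not python-snappy, and the location value shows which installed package to remove.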

Conclusion

We hope this helps you look at the inputs and outputs of MapReduce jobs, Hive queries, and Pig scripts. Have any suggestions? Feel free to tell us what you think through hue-user or @gethue!

References:

  1. Reading and writing Avro files from the command line – http://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/

3 Comments

  1. Stefano 3 years ago

    Really helpful.

    This is what I actually ran for CDH 5.4 on CentOS 6.6:

    yum install gcc gcc-c++ python-devel snappy-devel
    cd /opt/cloudera/parcels/CDH/lib/hue
    build/env/bin/pip install -U setuptools
    build/env/bin/pip install python-snappy

    • Hue Team 3 years ago

      Cool! thanks!
