Visualize Snappy compressed Avro files

Published on 20 May 2014 - 2 minutes read - Last modified on 06 March 2021

You can now view Snappy-compressed Avro files in Hue through the File Browser! Here is a quick guide on how to get set up with Snappy and Avro.

Tutorial

Installation

  1. Make sure Hue is stopped before installing.

  2. Install the Snappy system packages. They can either be downloaded from https://code.google.com/p/snappy/ or, preferably, installed via your package management system (e.g. yum install snappy-devel).

  3. Install the python-snappy package via ‘pip’ from the Hue home (cd /usr/lib/hue or /opt/cloudera/parcels/CDH/lib/hue):

    yum install gcc gcc-c++ python-devel snappy-devel
    build/env/bin/pip install -U setuptools
    build/env/bin/pip install python-snappy

  4. Start Hue!
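
One way to confirm that step 3 worked is to ask Hue's own Python interpreter whether the module it imports as snappy can actually compress and decompress data. The sketch below is not part of Hue: the function name check_python_snappy and its status strings are made up for illustration, and it assumes Python 2.7 or later. A "wrong-library" result would suggest that a different package named snappy is installed instead of python-snappy.

```python
import importlib


def check_python_snappy(module_name="snappy"):
    """Return a short status string for the module imported as `module_name`.

    The status strings are illustrative, not something Hue or
    python-snappy defines.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return "missing"

    # python-snappy exposes compress() and decompress() at module level.
    compress = getattr(module, "compress", None)
    decompress = getattr(module, "decompress", None)
    if not (callable(compress) and callable(decompress)):
        return "wrong-library"

    # Round-trip a small payload to make sure the native bindings work.
    sample = b"hue snappy avro " * 64
    if decompress(compress(sample)) != sample:
        return "broken"
    return "ok"


if __name__ == "__main__":
    print(check_python_snappy())
```

Saved as, say, check_snappy.py (a hypothetical file name), this could be run from the Hue home with build/env/bin/python check_snappy.py, assuming the virtualenv's interpreter sits next to the pip used above.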

Demo

Once Snappy and python-snappy have been installed, the File Browser will automatically detect and display Snappy-compressed Avro files. Here is a quick video demonstrating this!

Note: In this demo, we are using Avro files found in this GitHub repository (1).
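
To check ahead of time whether a given Avro file is Snappy compressed, the codec name can be read from the file header. Per the Avro specification, an object container file starts with the magic bytes Obj\x01 followed by a metadata map, and the avro.codec entry of that map names the codec (a missing entry means null). The following is a minimal sketch written for Python 3. It is not how Hue detects the codec, and the function names are illustrative.

```python
import io

AVRO_MAGIC = b"Obj\x01"


def read_long(stream):
    """Decode one Avro long (zigzag-encoded, variable-length integer)."""
    shift = 0
    raw = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise ValueError("unexpected end of Avro header")
        value = byte[0]
        raw |= (value & 0x7F) << shift
        if not value & 0x80:  # high bit clear: last byte of this number
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1)  # undo zigzag encoding


def read_bytes(stream):
    """Read a length-prefixed byte string (Avro `bytes` and `string`)."""
    length = read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of Avro header")
    return data


def read_avro_codec(stream):
    """Return the codec name stored in an Avro container file header."""
    if stream.read(4) != AVRO_MAGIC:
        raise ValueError("not an Avro object container file")
    metadata = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count ends the metadata map
            break
        if count < 0:  # negative count: a byte size follows, unused here
            count = -count
            read_long(stream)
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            metadata[key] = read_bytes(stream)
    return metadata.get("avro.codec", b"null").decode("utf-8")


if __name__ == "__main__":
    # Hand-built header with one metadata entry: avro.codec = snappy.
    sample = (AVRO_MAGIC + b"\x02" + b"\x14avro.codec" + b"\x0csnappy"
              + b"\x00" + b"\x00" * 16)
    print(read_avro_codec(io.BytesIO(sample)))
```

For a real file, pass a binary file object opened on a local copy, for example open(path, "rb"), where path is whatever location you downloaded the file to.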

 

Note

python-snappy is not compatible with a separate Python library that is also named snappy. If you see the error below, uninstall snappy:



    [03/Sep/2015 06:56:34 -0700] views WARNING Could not read avro file at //user/cconner/test_snappy.avro
    Traceback (most recent call last):
      File "/usr/lib//lib/hue/apps/filebrowser/src/filebrowser/views.py", line 701, in _read_avro
        data_file_reader = datafile.DataFileReader(fhandle, io.DatumReader())
      File "/usr/lib//lib/hue/build/env/lib/python2.6/site-packages/avro-1.7.6-py2.6.egg/avro/datafile.py", line 240, in __init__
        raise DataFileException('Unknown codec: %s.' % self.codec)
    DataFileException: Unknown codec: snappy.
    [03/Sep/2015 06:56:34 -0700] middleware INFO Processing exception: Failed to read Avro file.: Traceback (most recent call last):
      File "/usr/lib//lib/hue/build/env/lib/python2.6/site-packages/Django-1.4.5-py2.6.egg/django/core/handlers/base.py", line 111, in get_response
        response = callback(request, *callback_args, **callback_kwargs)
      File "/usr/lib//lib/hue/apps/filebrowser/src/filebrowser/views.py", line 168, in view
        return display(request, path)
      File "/usr/lib//lib/hue/apps/filebrowser/src/filebrowser/views.py", line 573, in display
        read_contents(compression, path, request.fs, offset, length)
      File "/usr/lib//lib/hue/apps/filebrowser/src/filebrowser/views.py", line 663, in read_contents
        contents = _read_avro(fhandle, path, offset, length, stats)
      File "/usr/lib//lib/hue/apps/filebrowser/src/filebrowser/views.py", line 716, in _read_avro
        raise PopupException(_("Failed to read Avro file."))
    PopupException: Failed to read Avro file.

Conclusion

We hope this helps you look at the inputs and outputs of MapReduce jobs, Hive queries, and Pig scripts. Have any suggestions? Feel free to tell us what you think through hue-user or @gethue!

References:

  1. Reading and writing Avro files from the command line - http://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/
