Season II: 8. How to transfer data from Hadoop with Sqoop 2

Published on 08 November 2013 in Querying / Tutorial - 2 minutes read - Last modified on 04 February 2020

Note: Sqoop2 has since been replaced; see https://gethue.com/importing-data-from-traditional-databases-into-hdfshive-in-just-a-few-clicks/

 

Apache Sqoop is a great tool for moving data (in files or databases) into or out of Hadoop. Hue 3 added a new app that makes Sqoop2 easier to use.

In this final episode of season 2 of the Hadoop Tutorial series (the previous one was about Search), let’s see how simple it is to export our Yelp results into a MySQL table!

Sqoop2 currently supports only Comma Separated Values (CSV) files. Moreover, on export Sqoop2 currently requires String constants to be enclosed in single quotes.
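To illustrate that requirement, here is a minimal Python sketch that turns one row of values into a line Sqoop2 can export: strings wrapped in single quotes, numbers left bare, fields joined by commas. The helper names, the backslash escaping of embedded quotes and backslashes, and writing missing values as `NULL` are assumptions for illustration; check them against the Sqoop2 version you run.

```python
from numbers import Number
from typing import Iterable, Optional, Union


def quote_string(s: str) -> str:
    """Wrap a string constant in single quotes for a Sqoop2 export.

    Assumption: embedded backslashes and single quotes are escaped with a
    backslash; check this against the Sqoop2 version you run.
    """
    escaped = s.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def to_sqoop2_csv_line(values: Iterable[Optional[Union[str, int, float]]]) -> str:
    """Hypothetical helper: format one row as a comma-separated line."""
    fields = []
    for v in values:
        if v is None:
            fields.append("NULL")  # assumed null marker
        elif isinstance(v, Number):
            fields.append(str(v))  # numbers stay unquoted
        else:
            fields.append(quote_string(str(v)))  # strings get single quotes
    return ",".join(fields)


if __name__ == "__main__":
    print(to_sqoop2_csv_line(["Some Cafe", 4.5, 12]))  # 'Some Cafe',4.5,12
    print(to_sqoop2_csv_line(["Joe's Diner", None]))  # 'Joe\'s Diner',NULL
```

In the tutorial itself this formatting is done by the Pig script; the sketch only makes the expected line shape explicit.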

We save the results of our data analysis in this format with a Pig script written in the Pig Editor. Then, as detailed in the video, we specify an export job and set its input path to the output of our previous Pig job. The data is on HDFS, and the path can be either a single file or a directory.
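Because the input path may point to a single file or to a directory, a directory has to be expanded into the data files it contains. The sketch below shows that idea on the local filesystem with `pathlib`, standing in for HDFS. Skipping names that start with `_` or `.` (such as a `_SUCCESS` marker or `.crc` checksum files) follows the usual Hadoop convention; the function name `list_input_files` is hypothetical and this is not Sqoop2's own code.

```python
import tempfile
from pathlib import Path
from typing import List


def is_hidden(p: Path) -> bool:
    """Hadoop convention: names starting with '_' or '.' are not data files."""
    return p.name.startswith(("_", "."))


def list_input_files(path: str) -> List[Path]:
    """Hypothetical helper: resolve an export input path to data files.

    A single file is used as-is; a directory is expanded to its visible
    regular files (e.g. part-m-00000), sorted by name.
    """
    p = Path(path)
    if p.is_file():
        return [p]
    if p.is_dir():
        return sorted(f for f in p.iterdir() if f.is_file() and not is_hidden(f))
    raise FileNotFoundError(f"input path does not exist: {path}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name in ("part-m-00000", "part-m-00001", "_SUCCESS", ".part-m-00000.crc"):
            (Path(d) / name).write_text("1,2,3,4\n")
        print([f.name for f in list_input_files(d)])  # ['part-m-00000', 'part-m-00001']
```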

We previously created a MySQL table ‘stats’ with this SQL script; this table will store the exported data. Here are the properties of our job, which are explained in more depth in the previous Sqoop2 App blog post.

Table name: yelp_cool_test

Input directory: /user/hdfs/test_sqoop

Connector: mysql

JDBC Driver Class: com.mysql.jdbc.Driver

JDBC Connection String: jdbc:mysql://hue.com/test
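Before clicking ‘Save & Execute’, it can help to double-check these properties. The following sketch is not part of Hue or Sqoop2: it simply collects the values above in a dataclass and splits the JDBC connection string into host and database with `urllib.parse`. The class name `ExportJob` and the checks it performs are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Tuple
from urllib.parse import urlparse


@dataclass
class ExportJob:
    """Hypothetical container for the export job properties listed above."""

    table_name: str
    input_directory: str
    connector: str
    jdbc_driver_class: str
    jdbc_connection_string: str

    def database_target(self) -> Tuple[str, str]:
        """Split a jdbc:<subprotocol>://host/db string into (host, db)."""
        prefix = "jdbc:"
        if not self.jdbc_connection_string.startswith(prefix):
            raise ValueError("JDBC connection string must start with 'jdbc:'")
        # Dropping the 'jdbc:' prefix leaves a URL that urlparse understands.
        url = urlparse(self.jdbc_connection_string[len(prefix):])
        database = url.path.lstrip("/")
        if not url.hostname or not database:
            raise ValueError("expected jdbc:<subprotocol>://host/database")
        return url.hostname, database


job = ExportJob(
    table_name="yelp_cool_test",
    input_directory="/user/hdfs/test_sqoop",
    connector="mysql",
    jdbc_driver_class="com.mysql.jdbc.Driver",
    jdbc_connection_string="jdbc:mysql://hue.com/test",
)
print(job.database_target())  # ('hue.com', 'test')
```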

Then click ‘Save & Execute’, and here we go: the data is now available in MySQL!

 

mysql> select * from yelp_cool_test limit 2;
+---+---+---+---+
| a | b | c | d |
+---+---+---+---+
| 1 | 2 | 3 | 4 |
| 2 | 3 | 4 | 5 |
+---+---+---+---+
2 rows in set (0.00 sec)
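Conceptually, the export reads each CSV line from the input path and inserts it as a row of the target table. The sketch below reproduces that locally, with Python's built-in `sqlite3` standing in for MySQL and the `csv` module configured for single-quoted strings. It illustrates the idea only and is not how Sqoop2 is implemented; the function name `export_lines` is hypothetical, and the four integer columns `a` to `d` are assumed from the query output above.

```python
import csv
import sqlite3
from typing import Iterable


def export_lines(conn: sqlite3.Connection, table: str, lines: Iterable[str]) -> int:
    """Insert Sqoop2-style CSV lines into `table` and return the row count.

    NULL markers are not translated here; they would arrive as the text 'NULL'.
    """
    # quotechar="'" matches the single-quoted string constants;
    # escapechar handles backslash-escaped quotes inside strings.
    reader = csv.reader(lines, quotechar="'", escapechar="\\")
    rows = [row for row in reader if row]  # skip blank lines
    if not rows:
        return 0
    placeholders = ",".join("?" * len(rows[0]))
    # The table name is interpolated, so only pass trusted identifiers.
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yelp_cool_test (a INTEGER, b INTEGER, c INTEGER, d INTEGER)")
    export_lines(conn, "yelp_cool_test", ["1,2,3,4\n", "2,3,4,5\n"])
    print(conn.execute("SELECT * FROM yelp_cool_test LIMIT 2").fetchall())
    # [(1, 2, 3, 4), (2, 3, 4, 5)]
```

SQLite's INTEGER column affinity converts the numeric text fields to integers, which is why the query returns numbers rather than strings.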

Data stored in Hive or HBase cannot yet be exported natively by Sqoop2. A current (less efficient) workaround is to dump it into an HDFS directory with Hive or Pig and then run a similar Sqoop export.
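For that workaround, one detail to watch is that Hive's default text format separates fields with `\x01` (Ctrl-A) and writes NULL as `\N`, whereas Sqoop2 expects commas and single-quoted strings. Below is a minimal conversion sketch, assuming the dump uses Hive's default text format and that the column types are known in advance. The `NULL` output marker and the backslash escaping are assumptions to verify against your Sqoop2 version, and `convert_hive_line` is a hypothetical name.

```python
from typing import List

HIVE_FIELD_SEP = "\x01"  # Hive's default field delimiter (Ctrl-A)
HIVE_NULL = "\\N"        # Hive's default text representation of NULL


def convert_hive_line(line: str, column_types: List[str]) -> str:
    """Convert one Hive text-dump line into a Sqoop2-style CSV line.

    column_types holds 'string' or a numeric type name for each column.
    """
    fields = line.rstrip("\n").split(HIVE_FIELD_SEP)
    if len(fields) != len(column_types):
        raise ValueError(f"expected {len(column_types)} fields, got {len(fields)}")
    out = []
    for value, col_type in zip(fields, column_types):
        if value == HIVE_NULL:
            out.append("NULL")  # assumed Sqoop2 null marker
        elif col_type == "string":
            # Same assumed quoting rule as for the Pig output: single quotes,
            # with backslashes and embedded quotes escaped.
            escaped = value.replace("\\", "\\\\").replace("'", "\\'")
            out.append("'" + escaped + "'")
        else:
            out.append(value)  # numeric values are copied unquoted
    return ",".join(out)


if __name__ == "__main__":
    print(convert_hive_line("Some Cafe\x014.5\x01\\N\n", ["string", "double", "int"]))
    # 'Some Cafe',4.5,NULL
```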

 

As usual, if you have questions or feedback, feel free to contact the Hue community on hue-user or @gethue.com!

Thank you for watching season 2!

