Note: Sqoop2 is now replaced by https://gethue.com/importing-data-from-traditional-databases-into-hdfshive-in-just-a-few-clicks/
Apache Sqoop is a great tool for moving data (in files or databases) in or out of Hadoop. In Hue 3, a new app was added for making Sqoop2 easier to use.
In this final episode (previous one was about Search) of the season 2 of the Hadoop Tutorial series let’s see how simple it becomes to export our Yelp results into a MySql table!
Sqoop2 currently only Comma Separated Values files. Moreover, Sqoop2 currently require on export for String constants to be enclosed in single quotes.
We are going to save our data analysis into this format with a Pig script with the Pig Editor. Then, as detailed in the video we specify an export job, set the input path as the output of our previous Pig job. The data is in on HDFS and the path can either be a single file or a directory.
We previously created a MySql table ‘stats’ with this SQL script. This table is going to store the exported data. Here are the properties of our job. They are explained in more depth in the previous Sqoop2 App blog post.
Table name: yelp_cool_test
Input directory: /user/hdfs/test_sqoop
Connector: mysql
JDBC Driver Class : com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://hue.com/test
Then click ‘Save & Execute’, and here we go, the data is now available in MySql!
mysql> select * from yelp_cool_test limit 2;
+--+--+--+--+
| a | b | c | d |
+--+--+--+--+
| 1 | 2 | 3 | 4 |
| 2 | 3 | 4 | 5 |
+--+--+--+--+
2 rows in set (0.00 sec)
Data stored in Hive or HBase can not be sqooped natively yet by Sqoop2. A current (less efficient) workaround would be to dump it to a HDFS directory with Hive or Pig and then do a similar Sqoop export.
As usual, if you have questions or feedback, feel free to contact the Hue community on hue-user or @gethue.com!
Thank you for watching this season 2!