Season II: 8. How to transfer data from Hadoop with Sqoop 2

Apache Sqoop is a great tool for moving data (in files or databases) in or out of Hadoop. In Hue 3, a new app was added to make Sqoop2 easier to use.

In this final episode of Season 2 of the Hadoop Tutorial series (the previous one was about Search), let’s see how simple it becomes to export our Yelp results into a MySQL table!

Sqoop2 currently supports only Comma Separated Values (CSV) files. Moreover, on export, Sqoop2 currently requires string constants to be enclosed in single quotes.
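For example, a row that Sqoop2 would accept on export could look like this (hypothetical values; note the single quotes around the string constant):

1,'Scottsdale',3,4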

We are going to save our data analysis in this format with a Pig script in the Pig Editor. Then, as detailed in the video, we specify an export job and set the input path to the output of our previous Pig job. The data is on HDFS, and the path can be either a single file or a directory.
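As a sketch (the relation name, input path, and schema here are hypothetical), the end of such a Pig script could look like this:

-- load the Yelp analysis results computed earlier (hypothetical path and schema)
results = LOAD '/user/hdfs/yelp_results' AS (a:int, b:int, c:int, d:int);

-- store them as CSV so that Sqoop2 can export them
STORE results INTO '/user/hdfs/test_sqoop' USING PigStorage(',');

If the relation contained chararray fields, their values would also need to be wrapped in single quotes, per the requirement above.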

We previously created a MySQL table ‘stats’ with this SQL script (a minimal sketch follows the job properties below); this table is going to store the exported data. Here are the properties of our job; they are explained in more depth in the previous Sqoop2 App blog post.

Table name: yelp_cool_test
Input directory: /user/hdfs/test_sqoop
Connector: mysql
JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://hue.com/test
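A minimal sketch of what that table’s DDL could look like, assuming the table name from the job properties and the four integer columns visible in the query output below:

-- a sketch only: column names and types inferred from the query output below
CREATE TABLE yelp_cool_test (
  a INT,
  b INT,
  c INT,
  d INT
);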

Then click ‘Save & Execute’, and here we go: the data is now available in MySQL!

mysql> select * from yelp_cool_test limit 2;
+------+------+------+------+
| a    | b    | c    | d    |
+------+------+------+------+
|    1 |    2 |    3 |    4 |
|    2 |    3 |    4 |    5 |
+------+------+------+------+
2 rows in set (0.00 sec)

Data stored in Hive or HBase cannot yet be sqooped natively by Sqoop2. A current (less efficient) workaround is to dump it to an HDFS directory with Hive or Pig and then run a similar Sqoop export, as sketched below.
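For instance (assuming HCatalog is available and a hypothetical Hive table name), a Pig script launched with ‘pig -useHCatalog’ could do the dump:

-- read the Hive table through HCatalog (hypothetical table name)
hive_data = LOAD 'yelp_stats' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- dump it to HDFS as CSV, then run the same Sqoop2 export on this directory
STORE hive_data INTO '/user/hdfs/test_sqoop' USING PigStorage(',');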

As usual, if you have questions or feedback, feel free to contact the Hue community on hue-user or @gethue.com!

Thank you for watching Season 2!

14 Comments

  1. abhishek kumar 4 years ago

    The JDBC driver is not accepting the mysql-connector 5.1.21, and an error is being displayed for Sqoop.

  2. ronnie 2 years ago

    After running a Sqoop2 import job, my data is split into a few files, but the file names are not part-r-000x. Any idea how to fix it?

  3. ronnie 2 years ago

    Is it possible to automate a Sqoop2 job by creating a workflow, like Sqoop1 does?

    • Hue Team 2 years ago

      Yes, you can use Oozie Coordinators and Bundles!

  4. k v vittal hebbar 2 years ago

    Hi,
    Can we use the Sqoop provided in Hue for transferring spatial data (feature class) into HDFS?

    vittal

    • Hue Team 2 years ago

      Hue provides a Sqoop1 action in the Oozie Workflow Editor and a Sqoop2 UI, which sits on top of Sqoop2 itself and provides the same functionality. So if you can do it with Sqoop, you can do it with Hue!

  5. lida 1 year ago

    Which version of Hue can manage Sqoop 1.4.6?

    • Hue Team (Author) 1 year ago

      Sqoop1 has worked in the Oozie Editor with the Sqoop action since Hue 2.

      • lida 1 year ago

        Thank you!

  6. Abhinav 1 year ago

    I get the following stack trace on running the sqoop2 job

    java.lang.NoClassDefFoundError: org/codehaus/jackson/map/JsonMappingException
    org.apache.hadoop.mapreduce.Job.getJobSubmitter(Job.java:1291)
    org.apache.hadoop.mapreduce.Job.submit(Job.java:1302)
    org.apache.sqoop.submission.mapreduce.MapreduceSubmissionEngine.submitToCluster(MapreduceSubmissionEngine.java:274)
    org.apache.sqoop.submission.mapreduce.MapreduceSubmissionEngine.submit(MapreduceSubmissionEngine.java:255)
    org.apache.sqoop.driver.JobManager.start(JobManager.java:288)
    org.apache.sqoop.handler.JobRequestHandler.startJob(JobRequestHandler.java:380)
    org.apache.sqoop.handler.JobRequestHandler.handleEvent(JobRequestHandler.java:116)
    org.apache.sqoop.server.v1.JobServlet.handlePutRequest(JobServlet.java:96)
    org.apache.sqoop.server.SqoopProtocolServlet.doPut(SqoopProtocolServlet.java:79)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:646)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:723)
    org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:622)
    org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:291)
    org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:574)

    I placed the jackson-core-2.8.1.jar file in /var/lib/sqoop2 but continued to get the same error. Tried system restart too.

    • Hue Team (Author) 1 year ago

      Could you put it in a ‘lib’ directory in the workspace of the workflow? Or point to it by adding an ‘oozie.libpath’ property that points to its path on HDFS.
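      For example, in the workflow’s job properties (the HDFS path here is hypothetical):

      # hypothetical path to the directory holding the uploaded jar
      oozie.libpath=/user/hdfs/lib/jackson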

  7. Janle 1 year ago

    Sqoop error: “Could not get connectors”. Any idea how to fix it?

  8. Jason 3 months ago

    Hello, when creating a link using Sqoop2, I always receive an error that the JDBC driver can’t be found. Why, and how can I solve it?
