Move data in & out your Hadoop cluster with the Sqoop UI

Move data in & out your Hadoop cluster with the Sqoop UI

Hi Ice Cream lovers,

Hue, the open source Big Data UI, has a brand new application that enables transferring data between relational databases and Hadoop. This new application is driven by Sqoop 2 and has several user experience improvements to boot.

Sqoop is a batch data migration tool for transferring data between traditional databases and Hadoop. The first version of Sqoop is a heavy client that drives and oversees data transfer via MapReduce. In Sqoop 2, the majority of the work was moved to a server that a thin client communicates with. Also, any client can communicate with the Sqoop 2 server over its JSON-REST protocol. Sqoop 2 was chosen instead of its predecessors because of its client-server design.

Importing from MySQL to HDFS

The following is the canonical import job example sourced from http://sqoop.apache.org/docs/1.99.2/Sqoop5MinutesDemo.html. In Hue, this can be done in 3 easy steps:

Environment

  • CDH 4.4 or Hue 3.0.0
  • MySQL 5.1

First, make sure that Sqoop2 is up and running and the Hue points to it in its hue.ini:

###########################################################################
# Settings to configure Sqoop
###########################################################################

[sqoop]
  # Sqoop server URL
  server_url=http://sqoop2.com:12000/sqoop

Troubleshooting

If the new job button is not appearing, Sqoop2 is probably not starting. Make sure the MySql or other DB connectors are in the /usr/lib/sqoop/lib directory of Sqoop2. Make sure you have these properties in the Sqoop2 Server configuration:

org.apache.sqoop.repository.schema.immutable=false
org.apache.sqoop.connector.autoupgrade=true
org.apache.sqoop.framework.autoupgrade=true

1. Create a Connection

In the Sqoop app, the connection manager is available from the “New Job” wizard. To get to the new job wizard, click on “New Job”. There may be a list of connections available if a few have been created before. For the purposes of this demo, we’ll go through the process of creating a new connection. Click “Add a new connection” and fill in the blanks with the data below. Then click save to return to the “New Job” wizard!

Connection Parameter                  Value

Name                                  mysql-connection-demo 

JDBC Driver Class                     com.mysql.jdbc.Driver 

JDBC Connection String                jdbc:mysql://hue-demo/demo

Username                              demo

Password                              demo

Connection form values.

2. Create a Job

After creating a connection, follow the wizard and fill in the blanks with the information below.

Job Wizard Parameter              Value

Name                              mysql-import-job-demo

Type                              IMPORT

Connection                        mysql-connection-demo

Table name                        test

Storage Type                      HDFS

Output format                     TEXT_FILE

Output directory                  /tmp/mysql-import-job-demo

Job wizard form values.

3. Save and Submit the Job

At the end of the Job wizard, click “Save and Run”! The job should automagically start after that and the job dashboard will be displayed. As the job is running, a progress bar below the job listing will be dynamically updated. Links to the HDFS output via the File Browser and Map Reduce logs via Job Browser will be available on the left hand side of the job edit page.

Sum Up

The new Sqoop application enables batch data migration from a more traditional databases to Hadoop and vice versa through Hue. Using Hue, a user can move data between storage systems in a distributed fashion with the click of a button.

I’d like to send out a big thank you to the Sqoop community for the new client-server design!

Both projects are undergoing heavy development and are welcoming external contributions! Have any suggestions? Feel free to tell us what you think through hue-user or @gethue​!

30 Comments

  1. abhishek 4 years ago

    the error that shows when i run the sqoop job is that it says that my login is not s superuser of hue

  2. jctampere 4 years ago

    what can I enter for fields table SQL statement, table column names, partition column name? I have a table with over millions rows in postpresql, I want to transfer the data partly with SQL statement, how to do that? Where to find some instructions?

  3. jctampere 4 years ago

    is the swoop-user group still active? I saw the post are dated 2012.

    I tried to enter “SELECT * FROM traffic WHERE timestamp>=’2014-07-28′ ${CONDITIONS}” in the Table SQL statement field, without adding ${CONDITIONS}, it will give me the error, so I added to the statement.
    The job can’t started, it gives error: “GENERIC_JDBC_CONNECTOR_0002: Unable to execute the SQL statement”
    I also tried to add the partition column name, but no success.

  4. wu 4 years ago

    error:Can’t load specified driver
    i can’t create a connection.

  5. pravee 4 years ago

    Hi hue team,

    Only if sqoop 2 server is up, we can view sqoop UI or else sqoop 1.4.5 is enough?

    • Hue Team 4 years ago

      The Sqoop UI is for Sqoop2 only! Sqoop1 works only in the Oozie workflows.

  6. kerm 3 years ago

    I’m trying to make a connection with ms sql server, i have downloaded the driver jdbc of mssql server and place it in /usr/lib/sqoop/lib but when I create a job in sqoop it saying that the JDBC criver class is not found.
    What should I do to fix this?
    Thank you

    • Hue Team 3 years ago

      If you do ‘jar -tf your_jar_file.jar | grep your_class_name’ does it find it?

      • kerm 3 years ago

        Well thank you for replying
        When i try this commande: jar -tf sqljdbc4.jar | grep com.microsoft.sqlserver.jdbc.SQLServerDriver
        I get the following output:
        com/microsoft/sqlserver/jdbc/SQLServerDriver.class
        com/microsoft/sqlserver/jdbc/SQLServerDriverBooleanProperty.class
        com/microsoft/sqlserver/jdbc/SQLServerDriverIntProperty.class
        com/microsoft/sqlserver/jdbc/SQLServerDriverPropertyInfo.class
        com/microsoft/sqlserver/jdbc/SQLServerDriverStringProperty.class

        But when I try to past the class name in hue-sqoop it does not recognize it

        • Hue Team 3 years ago

          Hum, is the Sqoop shell working fine? If not maybe ask on the Sqoop user list

          • kerm 3 years ago

            the shell is working, I was able to tranfer a table from sql server to hive but when I try it in the sqoop tranfer tool in hue it doesn’t recognize JDBC Driver Class

  7. Sid 3 years ago

    How can I use the quest software to import the data from oracle to sqoop2 in Hue

    they mentioned I need to use –direct parameter

    In which command I have to mention that and where

    Thanks

    • Hue Team 3 years ago

      Have you tried using the Oracle JDBC Driver? You need to have the JAR installed and specify the JDBC connection to something like jdbc:oracle:thin://@xxxx.yyyy.zzzzz:1521/foobar

  8. Sid 3 years ago

    what will be the driver class to create connection in sqoop2

    i’m trying to move data from oracle to hdfs

    created new connection ..i placed a ojdbc_6g.jar into /var/sqoop/lib

    whats the next procedure ?

    while creating a connection i’m getting this error : Can’t connect to the database with given credentials: No suitable driver found for jdbc:oracle:thin:xxxxxxxxxxxxxxxx

  9. Sid 3 years ago

    yes .How can I configure with sqoop2 client

  10. Sid 3 years ago

    Warning: /cloudera/opt/cloudera/parcels/CDH-5.3.5-1.cdh5.3.5.p0.4/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
    Please set $ACCUMULO_HOME to the root of your Accumulo installation.
    15/07/23 13:45:07 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.3.5
    Enter password:
    15/07/23 13:45:14 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
    15/07/23 13:45:14 INFO manager.SqlManager: Using default fetchSize of 1000
    15/07/23 13:45:14 INFO tool.CodeGenTool: Beginning code generation
    15/07/23 13:45:14 ERROR manager.SqlManager: Error executing statement: java.sql.SQLRecoverableException: IO Error: The Network Adapter could not establish the connection
    java.sql.SQLRecoverableException: IO Error: The Network Adapter could not establish the connection
    at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:548)
    at oracle.jdbc.driver.PhysicalConnection.(PhysicalConnection.java:564)
    at oracle.jdbc.driver.T4CConnection.(T4CConnection.java:251)
    at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:29)
    at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:563)
    at java.sql.DriverManager.getConnection(DriverManager.java:571)
    at java.sql.DriverManager.getConnection(DriverManager.java:215)
    at org.apache.sqoop.manager.OracleManager.makeConnection(OracleManager.java:327)
    at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
    at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:736)
    at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:759)
    at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:269)
    at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:240)
    at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:226)
    at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:295)
    at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1833)
    at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1645)
    at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
    Caused by: oracle.net.ns.NetException: The Network Adapter could not establish the connection
    at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:436)
    at oracle.net.resolver.AddrResolution.resolveAndExecute(AddrResolution.java:451)
    at oracle.net.ns.NSProtocol.establishConnection(NSProtocol.java:897)
    at oracle.net.ns.NSProtocol.connect(NSProtocol.java:271)
    at oracle.jdbc.driver.T4CConnection.connect(T4CConnection.java:1663)
    at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:385)
    … 25 more
    Caused by: java.net.UnknownHostException: MTLWHSDBDEV01-VIP.arifleet.com
    at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1246)
    at java.net.InetAddress.getAllByName(InetAddress.java:1162)
    at java.net.InetAddress.getAllByName(InetAddress.java:1098)
    at oracle.net.nt.TcpNTAdapter.connect(TcpNTAdapter.java:114)
    at oracle.net.nt.ConnOption.connect(ConnOption.java:130)
    at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:402)
    … 30 more
    15/07/23 13:45:14 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: No columns to generate for ClassWriter
    at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1651)
    at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)

    • Hue Team 3 years ago

      You seem to have some sqoop specific questions, I would recommend to post on the Sqoop user list instead

  11. Igor 3 years ago

    I use your ‘Try HUE demo’ website (http://demo.gethue.com/sqoop/#link/edit)
    ‘Class not found’ error message appears when I try to create New Connection according to this tutorial instructions.

    • Hue Team 3 years ago

      Hi Igor,
      for security reasons we don’t allow that on demo.gethue.com. You can try and download a quickstart VM from Cloudera to play with it.

  12. Aleks 3 years ago

    I’m trying to connect to MS SQL Server with Windows-Authentication. Should I only add integratedSecurity=true as JDBC Connection Properties? Should I set Login-Name and Password in this case?

    • Hue Team 3 years ago

      I would recommend to ask this one on the Sqoop2 user list!

  13. bigdata.gdb 2 years ago

    Where can we find the var/lib/sqoop on the hue UI. I checked in file browser but it doesn’t show sqoop inside lib.

Leave a reply

Your email address will not be published. Required fields are marked *

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.