Season II: 2. Execute Hive queries and schedule them with Oozie

In the previous episode, we saw how to transfer file data into Apache Hadoop. In order to query the data easily, the next step is to create some Hive tables. This will enable quick interaction through high-level languages like SQL and Pig.

We experiment with the SQL queries, then parameterize them and insert them into a workflow in order to run them together in parallel. Including Hive queries in an Oozie workflow is a pretty common use case, with recurrent pitfalls as seen on the user group. With Hue, we can do it in a few clicks.

Get prepared

First, based on the data from the previous episode, we create two tables in the Hive Metastore, using the Metastore app and its create table wizard. Then it is time to study the data!

We previously created the Hive tables in http://gethue.com/tutorial-analyzing-data-with-hue-and-hive/

Hive

Goal: we want to get the 10 coolest restaurants for a given day.

Let’s open the Beeswax Hive Editor and explore the range of dates that we have:

SELECT DISTINCT `date` FROM review ORDER BY `date` DESC;

Notice that you need to use backticks around date in order to use it as a column name, since date is a reserved word in Hive.

The data is a bit old, so let’s pick 2012-12-01 as our target date. We can join the two tables in order to get the name of each restaurant and its average ‘cool’ score for that day. Submit this parameterized query and enter 2012-12-01 when prompted for the date:

SELECT r.business_id, name, AVG(cool) AS coolness
FROM review r JOIN business b
ON (r.business_id = b.business_id)
WHERE categories LIKE '%Restaurants%'
AND `date` = '$date'
GROUP BY r.business_id, name
ORDER BY coolness DESC
LIMIT 10

We have a good Hive query. Let’s create a result table ‘top_cool’ that will contain the top 10:

CREATE TABLE top_cool AS
SELECT r.business_id, name, SUM(cool) AS coolness, '$date' as `date`
FROM review r JOIN business b
ON (r.business_id = b.business_id)
WHERE categories LIKE '%Restaurants%'
AND `date` = '$date'
GROUP BY r.business_id, name
ORDER BY coolness DESC
LIMIT 10

Later, replace ‘CREATE TABLE top_cool AS’ with ‘INSERT INTO TABLE top_cool’ in the Hive script, since we want to create the table only the first time:

INSERT INTO TABLE top_cool
SELECT r.business_id, name, SUM(cool) AS coolness, '${date}' as `date`
FROM review r JOIN business b
ON (r.business_id = b.business_id)
WHERE categories LIKE '%Restaurants%'
AND `date` = '${date}'
GROUP BY r.business_id, name
ORDER BY coolness DESC
LIMIT 10

Hive action in Apache Oozie

The video also starts here.

First, we create a new workflow and add a Hive action. We need to specify which SQL script we want to run; this script needs to be uploaded to HDFS. In our case, we open up the ‘workspace’ of the workflow, create a new file, and copy and paste the query. We then pick the query file as the ‘Script name’.

Important

Then comes a crucial step: our Hive action needs to talk to the Hive Metastore, and so it must know its location. This is done by copying /etc/hive/conf/hive-site.xml to HDFS as ‘hive-conf.xml’, including it as a ‘File’ resource, and telling Oozie to use it as the ‘Job XML’ configuration.
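
For reference, the Hive action that ends up in the generated workflow.xml looks roughly like the sketch below. This is a minimal sketch only: the ‘top_cool.sql’ script name and the node names are assumptions for this example, while ${jobTracker} and ${nameNode} are the usual workflow properties.

<workflow-app name="top_cool-wf" xmlns="uri:oozie:workflow:0.4">
  <start to="hive-top-cool"/>
  <action name="hive-top-cool">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <!-- the hive-site.xml copied to HDFS, so Hive can find the Metastore -->
      <job-xml>hive-conf.xml</job-xml>
      <script>top_cool.sql</script>
      <!-- the 'date' parameter used by the script -->
      <param>date=${date}</param>
      <!-- ship the configuration file alongside the action -->
      <file>hive-conf.xml#hive-conf.xml</file>
    </hive>
    <ok to="end"/>
    <error to="kill"/>
  </action>
  <kill name="kill">
    <message>Hive action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>

Hue builds all of this from the form fields; we never have to write the XML by hand.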

Note: when using a demo VM or a pseudo-distributed cluster (everything on one machine), you might hit the error explained in the ‘Warning!’ section of the HCatalog post.

Note: when using a real cluster, as the workflow is going to run somewhere on the cluster, we need the Metastore to be remote. A remote Metastore can be contacted from any other host.
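
Concretely, ‘remote’ means the hive-site.xml shipped with the action points to the Metastore’s Thrift URI rather than to a local (e.g. embedded Derby) database. A sketch of the relevant property, with an example host name:

<!-- in hive-conf.xml: the host name below is just an example -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore-host.example.com:9083</value>
</property>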

Let’s specify that we are using a ‘date’ parameter in the Hive script. In our case, we add the parameter to the Hive action:

date=${date}

Then we save the workflow, fill in the date when prompted, and watch the dynamic progress of the workflow! The output of the query will appear when you click on the ‘View the logs’ button on the action graph. In practice, INSERT or LOAD DATA would be used instead of SELECT in order to persist the calculation.

You can now monitor the workflow in the dashboard and stop or rerun it.

Note:

If you see the error below, it means that the input file or the destination directory of the table is not writable by your user (or by the ‘hive’ user if you are using HiveServer2):

 Failed with exception copyFiles: error while moving files!!!
 FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

Sum-up

Hive queries can easily be tested in the Beeswax Hive Editor before being inserted into an Oozie workflow, all without touching the command line.

One of the Hue 3 goals is to remove the duplication of the Hive script on HDFS and the manual creation of the Hive action. With the new document model, one will be able to refer to the saved Hive query in Beeswax and create the action with just one click.

Creating a workflow lets you group this script with others and run them atomically. Another advantage is that you can then execute the workflow repetitively (e.g. run a query every day at midnight) with an Oozie coordinator, as sketched below.
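
As a preview, such a daily schedule boils down to a coordinator definition along the lines of this sketch, where ${wfAppPath}, the start/end dates and the time zone are placeholders:

<coordinator-app name="top_cool-coord" frequency="${coord:days(1)}"
                 start="2012-12-01T00:00Z" end="2013-01-01T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">
  <action>
    <workflow>
      <app-path>${wfAppPath}</app-path>
      <configuration>
        <property>
          <!-- pass the day being processed to the workflow's 'date' parameter -->
          <name>date</name>
          <value>${coord:formatTime(coord:nominalTime(), 'yyyy-MM-dd')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>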

This is what we will cover in the next episode!

38 Comments

  1. Karan 3 years ago

    I have already uploaded hive-site.xml to HDFS but am still getting this error, don’t know why???

    java.io.FileNotFoundException: hive-site.xml (Permission denied)
    at java.io.FileOutputStream.open(Native Method)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:110)
    at org.apache.oozie.action.hadoop.HiveMain.setUpHiveSite(HiveMain.java:169)
    at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:199)
    at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:39)
    at org.apache.oozie.action.hadoop.HiveMain.main(HiveMain.java:66)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:227)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

    • Hue Team 3 years ago

      Could you rename it to hive-conf.xml?

  2. Elia 2 years ago

    I renamed the file to hive-conf.xml but I had another error:
    Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [10001]
    I also copied the full path to hive-conf.xml and it still doesn’t work.

    • Hue Team 2 years ago

      This error is a generic error from Oozie. Could you see if you have more details in the stderr section of the logs?

      • Elia 2 years ago

        Thank you for replying.
        Here are the stderr logs:

        Log Length: 1242
        SLF4J: Class path contains multiple SLF4J bindings.
        SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
        SLF4J: Found binding in [jar:file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/filecache/70/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
        SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
        SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
        log4j:ERROR Could not find value for key log4j.appender.CLA
        log4j:ERROR Could not instantiate appender named "CLA".
        log4j:ERROR Could not find value for key log4j.appender.CLA
        log4j:ERROR Could not instantiate appender named "CLA".
        log4j:ERROR Could not find value for key log4j.appender.CLA
        log4j:ERROR Could not instantiate appender named "CLA".
        Logging initialized using configuration in /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/cloudera/appcache/application_1429789345553_0001/container_1429789345553_0001_01_000002/hive-log4j.properties
        FAILED: SemanticException [Error 10001]: Line 3:5 Table not found 'review'
        Intercepting System.exit(10001)
        Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [10001]

        • Hue Team 2 years ago

          This error is typical of not having the hive-site.xml in the path of the workflow. It is fixed by copying /etc/hive/conf/hive-site.xml as ‘hive-conf.xml’ on HDFS, including it as a ‘File’ resource, and telling Oozie to use it as the ‘Job XML’ configuration.

          Could you share a screenshot of your action?

          • uuball 2 years ago

            How do I include it as a ‘File’ resource and tell Oozie to use it as the ‘Job XML’ configuration?

            Which configuration file should I modify?

          • Hue Team 2 years ago

            You specify the path to the file in ‘Job XML’ in the workflow properties, or in the action properties if there is one.

  3. abdul 2 years ago

    Hi,

    I am trying to move data from Hive table A to Hive table B using a simple Hive query, which runs successfully in the Hive editor, but when it is scheduled via an Oozie workflow it shows map 0% and reduce 0% in the logs.

    I don’t know whether it is stuck or what; can anyone please help me?

    Some information:
    I have installed Cloudera CDH 5
    using MapReduce (not the YARN service)
    Table A contains: 15 records and 3 columns
    Table B: expected records are 5
    Hive query:
    Select col1, Count(A) from TableA
    group by col1

    Any help is appreciated

    • Hue Team 2 years ago

      Did you look more at the Oozie job logs? Is the hive-site.xml that was included in the Hive action correct?

      More information in the MapReduce logs? Did you directly look at the JobTracker and TaskTracker logs?

      FYI: running Hive queries is now much simpler with the new HiveServer2 action and Hue 3.8: http://gethue.com/new-apache-oozie-workflow-coordinator-bundle-editors/

      • abdul 2 years ago

        Hi Team,

        1) Oozie job logs: I checked the Oozie job logs and they do not show any error. PFB sample log info.
        [email protected]:/var/log/oozie$ tail -n 30 oozie-cmf-oozie-OOZIE_SERVER-master.log.out
        2015-05-18 11:44:46,086 INFO org.apache.oozie.service.PauseTransitService: SERVER[master] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Acquired lock for [org.apache.oozie.service.PauseTransitService]
        2015-05-18 11:44:46,109 INFO org.apache.oozie.service.PauseTransitService: SERVER[master] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Released lock for [org.apache.oozie.service.PauseTransitService]
        2015-05-18 11:45:43,492 INFO org.apache.oozie.action.hadoop.Hive2ActionExecutor: SERVER[master] USER[hdfs] GROUP[-] TOKEN[] APP[My_Workflow] JOB[0000000-150518111043137-oozie-oozi-W] ACTION[[email protected]] checking action, hadoop job ID [job_201505181109_0004] status [RUNNING]
        2015-05-18 11:45:44,553 INFO org.apache.oozie.service.StatusTransitService$StatusTransitRunnable: SERVER[master] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Acquired lock for [org.apache.oozie.service.StatusTransitService]
        2015-05-18 11:45:44,555 INFO org.apache.oozie.service.StatusTransitService$StatusTransitRunnable: SERVER[master] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Running coordinator status service from last instance time = 2015-05-18T06:14Z
        2015-05-18 11:45:44,561 INFO org.apache.oozie.service.StatusTransitService$StatusTransitRunnable: SERVER[master] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Running bundle status service from last instance time = 2015-05-18T06:14Z
        2015-05-18 11:45:44,564 INFO org.apache.oozie.service.StatusTransitService$StatusTransitRunnable: SERVER[master] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Released lock for [org.apache.oozie.service.StatusTransitService]
        2015-05-18 11:45:46,113 INFO org.apache.oozie.service.PauseTransitService: SERVER[master] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Acquired lock for [org.apache.oozie.service.PauseTransitService]
        2015-05-18 11:45:46,131 INFO org.apache.oozie.service.PauseTransitService: SERVER[master] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Released lock for [org.apache.oozie.service.PauseTransitService]
        2015-05-18 11:46:43,655 INFO org.apache.oozie.service.CoordMaterializeTriggerService$CoordMaterializeTriggerRunnable: SERVER[master] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] CoordMaterializeTriggerService – Curr Date= 2015-05-18T06:21Z, Num jobs to materialize = 0
        2015-05-18 11:46:43,661 INFO org.apache.oozie.service.CoordMaterializeTriggerService$CoordMaterializeTriggerRunnable: SERVER[master] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Released lock for [org.apache.oozie.service.CoordMaterializeTriggerService]
        2015-05-18 11:46:44,565 INFO org.apache.oozie.service.StatusTransitService$StatusTransitRunnable: SERVER[master] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Acquired lock for [org.apache.oozie.service.StatusTransitService]
        2015-05-18 11:46:44,568 INFO org.apache.oozie.service.StatusTransitService$StatusTransitRunnable: SERVER[master] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Running coordinator status service from last instance time = 2015-05-18T06:15Z
        2015-05-18 11:46:44,574 INFO org.apache.oozie.service.StatusTransitService$StatusTransitRunnable: SERVER[master] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Running bundle status service from last instance time = 2015-05-18T06:15Z
        2015-05-18 11:46:44,587 INFO org.apache.oozie.service.StatusTransitService$StatusTransitRunnable: SERVER[master] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Released lock for [org.apache.oozie.service.StatusTransitService]
        2015-05-18 11:46:46,132 INFO org.apache.oozie.service.PauseTransitService: SERVER[master] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Acquired lock for [org.apache.oozie.service.PauseTransitService]
        2015-05-18 11:46:46,146 INFO org.apache.oozie.service.PauseTransitService: SERVER[master] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Released lock for [org.apache.oozie.service.PauseTransitService]

        2) hive-site.xml: Yes, I think it is correct. The reason for my assumption is that I have 4 queries in my script. PFB Hive script
        CREATE EXTERNAL TABLE TEST.test1(EmployeeId int,EmpName STRING)
        ROW FORMAT DELIMITED
        FIELDS TERMINATED BY '\t'
        LINES TERMINATED BY '\n'
        STORED AS TEXTFILE;

        LOAD DATA INPATH "/user/admin/hive-oozietest/test.txt" OVERWRITE INTO TABLE TEST.test1;

        CREATE EXTERNAL TABLE TEST.TestHiveOutput1 (EmpId int)
        ROW FORMAT DELIMITED
        FIELDS TERMINATED BY ','
        LINES TERMINATED BY '\n'
        STORED AS TEXTFILE;

        INSERT INTO TABLE TEST.TestHiveOutput1
        SELECT EmployeeId FROM TEST.test1
        Where EmployeeId = 1
        group by EmployeeId

        The first 3 HiveQL statements execute successfully and I get this error on the last statement, which inserts one Hive table’s output into another Hive table.

        3) MapReduce logs: Yes, I have checked the MapReduce logs. PFB logs
        stderr logs

        SLF4J: Class path contains multiple SLF4J bindings.
        SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
        SLF4J: Found binding in [jar:file:/mapred/local/taskTracker/distcache/-3757786545146709436_1735967141_1358240049/master/user/oozie/share/lib/lib_20150514111513/hive2/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
        SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
        SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
        Connecting to jdbc:hive2://master:10000/default
        Connected to: Apache Hive (version 1.1.0-cdh5.4.0)
        Driver: Hive JDBC (version 1.1.0-cdh5.4.0)
        Transaction isolation: TRANSACTION_REPEATABLE_READ
        No rows affected (0.397 seconds)
        INFO : Loading data to table test.test1 from hdfs://master:8020/user/admin/hive-oozietest/test.txt
        INFO : Table test.test1 stats: [numFiles=1, numRows=0, totalSize=89, rawDataSize=0]
        No rows affected (0.943 seconds)
        No rows affected (0.173 seconds)
        INFO : Number of reduce tasks is set to 0 since there's no reduce operator
        INFO : Starting Job = job_201505181109_0005, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201505181109_0005
        INFO : Kill Command = /opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/hadoop/bin/hadoop job -kill job_201505181109_0005
        INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
        INFO : 2015-05-18 11:36:20,864 Stage-1 map = 0%, reduce = 0%
        INFO : 2015-05-18 11:37:21,570 Stage-1 map = 0%, reduce = 0%
        INFO : 2015-05-18 11:38:22,060 Stage-1 map = 0%, reduce = 0%

        4) I have tried with HiveServer2; it shows the same error.

        5) In the Hive logs I found the following error:
        2015-05-18 11:36:11,441 INFO org.apache.hadoop.hive.ql.exec.Utilities: local path = hdfs://master:8020/tmp/hive/hdfs/63862683-73a1-4552-8473-c63c744ff654/hive_2015-05-18_11-36-01_393_1841444089606252292-3/-mr-10004/c000f600-f9d1-46f7-bb2e-75021d51294f/reduce.xml
        2015-05-18 11:36:11,441 INFO org.apache.hadoop.hive.ql.exec.Utilities: Open file to read in plan: hdfs://master:8020/tmp/hive/hdfs/63862683-73a1-4552-8473-c63c744ff654/hive_2015-05-18_11-36-01_393_1841444089606252292-3/-mr-10004/c000f600-f9d1-46f7-bb2e-75021d51294f/reduce.xml
        2015-05-18 11:36:11,482 INFO org.apache.hadoop.hive.ql.exec.Utilities: File not found: File does not exist: /tmp/hive/hdfs/63862683-73a1-4552-8473-c63c744ff654/hive_2015-05-18_11-36-01_393_1841444089606252292-3/-mr-10004/c000f600-f9d1-46f7-bb2e-75021d51294f/reduce.xml
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1929)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1870)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1850)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1822)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:546)
        at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)

        2015-05-18 11:36:11,482 INFO org.apache.hadoop.hive.ql.exec.Utilities: No plan file found: hdfs://master:8020/tmp/hive/hdfs/63862683-73a1-4552-8473-c63c744ff654/hive_2015-05-18_11-36-01_393_1841444089606252292-3/-mr-10004/c000f600-f9d1-46f7-bb2e-75021d51294f/reduce.xml
        2015-05-18 11:36:11,493 INFO org.apache.hadoop.hive.ql.log.PerfLogger:
        2015-05-18 11:36:11,493 INFO org.apache.hadoop.hive.ql.exec.Utilities: PLAN PATH = hdfs://master:8020/tmp/hive/hdfs/63862683-73a1-4552-8473-c63c744ff654/hive_2015-05-18_11-36-01_393_1841444089606252292-3/-mr-10004/c000f600-f9d1-46f7-bb2e-75021d51294f/map.xml
        2015-05-18 11:36:11,493 INFO org.apache.hadoop.hive.ql.io.CombineHiveInputFormat: Total number of paths: 1, launching 1 threads to check non-combinable ones.
        2015-05-18 11:36:11,540 INFO org.apache.hadoop.hive.ql.io.CombineHiveInputFormat: CombineHiveInputSplit creating pool for hdfs://master:8020/user/hive/warehouse/test.db/test1; using filter path hdfs://master:8020/user/hive/warehouse/test.db/test1
        2015-05-18 11:36:11,565 INFO org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1
        2015-05-18 11:36:11,615 INFO org.apache.hadoop.hive.ql.io.CombineHiveInputFormat: number of splits 1
        2015-05-18 11:36:11,628 INFO org.apache.hadoop.hive.ql.io.CombineHiveInputFormat: Number of all splits 1
        2015-05-18 11:36:11,628 INFO org.apache.hadoop.hive.ql.log.PerfLogger:
        2015-05-18 11:36:12,614 INFO org.apache.hadoop.hive.ql.exec.Task: Starting Job = job_201505181109_0005, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201505181109_0005
        2015-05-18 11:36:12,616 INFO org.apache.hadoop.hive.ql.exec.Task: Kill Command = /opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/hadoop/bin/hadoop job -kill job_201505181109_0005
        2015-05-18 11:36:20,823 INFO org.apache.hadoop.hive.ql.exec.Task: Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
        2015-05-18 11:36:20,890 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
        2015-05-18 11:36:20,893 INFO org.apache.hadoop.hive.ql.exec.Task: 2015-05-18 11:36:20,864 Stage-1 map = 0%, reduce = 0%
        2015-05-18 11:37:21,570 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
        2015-05-18 11:37:21,570 INFO org.apache.hadoop.hive.ql.exec.Task: 2015-05-18 11:37:21,570 Stage-1 map = 0%, reduce = 0%

        • Donato 2 years ago

          Hi Abdul and Hi Team, I have the same error.
          What was wrong? Did you solve it?
          Thanks.

          • tao 2 years ago

            Hi Donato, I have the same error.
            Did you solve it?
            Thanks.

      • tao 2 years ago

        Hi Abdul and Hi Team, did you solve this issue?
        What was wrong?
        Thanks.

  4. Arjun 2 years ago

    Hello,
    I have a question related to parameter passing: what if I want to pass the Oozie workflow id into the Hive script rather than the date? I know we have wf:id(), but how do we set it?
    Also, is there any way to pass a parameter from one node to another? Like an output string of one action into another node?

  5. Himanshu 2 years ago

    Hi Abraham
    Nice article.
    What is the best practice around Oozie Exception/Error handling?
    We have Hive actions within Oozie workflows and find that the errors are not logged with enough detail. We need more of a stack trace and more context around each failure.
    Any suggestions?
    Thanx
    Himanshu

    • Hue Team 2 years ago

      We recommend clicking on the ‘Log’ icon of the action; it will directly open the launcher with all the stdout and stderr logs.

  6. wangmeng 2 years ago

    Nice production! I used Oozie in Hue to execute a Hive query (select count(1)) and it ran successfully. But the Oozie UI did not return the query result. Can Oozie not return query results? Thanks.

    • Hue Team 2 years ago

      Thanks! You cannot see the results directly in the Oozie app but you can save them back into Hive and/or HDFS or transform them!

  7. Arjun 2 years ago

    Hue team, need help!!! 🙂

    I have a Hive query that has 6-7 joins inside it. I decided to use hive.exec.parallel (for parallel MapReduce) to speed up the process. The query takes about 6 minutes when executed from the Hive CLI, but when I try the same through Oozie it takes ages and I am not sure if it completes or not. I have added hive.exec.parallel as a property and can see it in the workflow.xml as below.

    <property>
      <name>hive.exec.parallel</name>
      <value>true</value>
    </property>

    I use YARN and was wondering if it has anything to do with it? As of now I execute the script as an SSH action so as to use the Hive CLI on the edge node. Is there a way I can use the Hive action and get this completed?

    If I put
    set hive.exec.parallel=true
    inside the script, I believe it does the same job, right?

  8. Manish 2 years ago

    Need help!!

    I am trying to run a shell script in Oozie using Hue. However, it is not running and gives an exception.

    My Shell script file:

    #! /bin/bash
    hive -e 'desc emptable;'

    =======================================

    I also added the same script file in the FILE option of the script action.

    =======================================

    It gives the exception: Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]

    =========================================

    I also tried with the hive-site file added in the FILE option, but it did not work.

  9. Manish 2 years ago

    In Oozie, I have used a Hive action in Hue, and in the same action I used the parameter options to supply a date parameter. Here I want to provide dynamic date parameters, such as yesterday’s date and the day before yesterday. How can I generate those dates, and how can I pass them as parameters?

    My HQL is :
    CREATE TABLE IF NOT EXISTS tmp_table as
    select * from emptable
    where day>=${fromdate} and day<=${todate}

    My HiveServer action contains:
    a. HQL script
    b. Two parameter options, one for each date, like fromdate = , todate =
    c. Added file option for the HQL script.

    What I tried:
    I created two separate shell scripts which return a date.
    One of the shell scripts is
    #! /bin/bash

    FROM_DAY=$(date +%Y-%m-%d -d "-1 day")
    echo "$FROM_DAY"

    and the Hive action parameter becomes fromdate = /user/manishs/Oozie/filter.sh

    However, this approach is not working and gives exceptions:
    Error: Error while compiling statement: FAILED: ParseException line 4:11 cannot recognize input near '/' 'user' '/' in expression specification (state=42000,code=40000)

    NOTE: If I pass a date such as fromdate = '2015-08-01', it works and gives results.

    My question is: how can I pass a dynamic date to a HiveServer action as a parameter?
    Or is there any way to pass a dynamic filter to a query?

    • Hue Team 2 years ago

      There are 2 ways:
      – use a coordinator to specify the date in the format you need to the workflow
      – specify a workflow EL function https://oozie.apache.org/docs/3.2.0-incubating/WorkflowFunctionalSpec.html#a4.2.2_Basic_EL_Functions in the workflow properties for the variable FROM_DAY, e.g. ${wf:timestamp()}

      http://gethue.com/new-apache-oozie-workflow-coordinator-bundle-editors/

      • Manish 2 years ago

        As suggested, I have used
        ${coord:formatTime(coord:dateOffset(coord:nominalTime(), -1, 'DAY'), 'yyyy-MM-dd')}
        And also added hive-site.xml as a file.
        I got the exceptions:
        Error while compiling statement: FAILED: SemanticException org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out (state=42000,code=40000)

        • Hue Team 2 years ago

          This error seems more like a problem with your hive-site.xml or the Hive Metastore (it times out when trying to talk to it).

  10. Manish 2 years ago

    In Oozie, I have used a Hive action in Hue. I want to redirect the output of the query to a file. How can I generate that file?

    My HQL is :
    select * from emptable

    My HiveServer Action contains:
    a. HQL script
    b. Added file hive-site.xml.

    My question is: how can I redirect the output of a query to a file?

  11. Sonu 2 years ago

    I’m trying to run hive -e 'select * from abc' from a Hue Oozie shell script in a Kerberized environment. It works fine if I run it at the command prompt, but if I run it from Oozie I get the below error; I even tried passing credentials.
    Failed with exception java.io.IOException:org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:7388)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:525)

    • Hue Team 2 years ago

      We would recommend using the Hive action instead (the Shell probably does not pick up the ticket of the user if there is no kinit on the node for this user). And the credentials boxes to check on the action will be automatic if you are using the latest Hue.

  12. dma 2 years ago

    Hi,
    I am trying to execute a Hive script with a simple select statement from a Hive table in an Oozie workflow, but I get an error:
    Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [40000]

    I took a look at the logs and nothing seems to be at the error level of the logger. I’ve configured Oozie with oozie.use.system.libpath = true and pointed to a hive-conf.xml. All files involved (script and configs) are on my local HDFS user path.

    Any hints or suggestions are very helpful

    Thanks

    • Hue Team 2 years ago

      Did you look at the MapReduce logs and the Hive logs? We also recommend using the HiveServer2 action instead of Hive; it is simpler to use.

  13. pranaya 1 year ago

    Hi Abdul,

    I am getting the same error. Did you find a solution?

  14. Mark T 1 year ago

    I followed the steps shown in the video above but when I submit the job I receive an error saying “java.net.UnknownHostException: irxvlndchad1.corp.irco.com.namenode.host”. I am trying to query a hive table and return a sample of that data. I am able to see the information in the hive view but am having trouble scheduling the job. How would I be able to resolve this? I have my host specified in my hue.ini file and hosts file.

    • Mark T 1 year ago

      I found the error, I needed to modify the name of my host in the hue.ini file. I had hdfs://irxvlndchad1.corp.irco.com.namenode.host:8020 when I should have had hdfs://irxvlndchad1.corp.irco.com:8020

  15. Alejandro Vargas 1 year ago

    Hello,

    I’m executing the Oozie workflow; I’ve installed Derby on the quickstart VM.
    The workflow executes without errors; it takes about 20 minutes to complete.
    All the stages complete without errors.
    But it does not insert any records.

    • Hue Team 1 year ago

      Do you use the Hive action? If yes, you need to add /etc/hive/conf/hive-site.xml into the lib directory of the workspace of the workflow.

      Or just use the Hive2 action instead and it will work out of the box.
