Apache Hive comes with a lot of built-in UDFs, but what happens when you need a “special one”? This post is about how to get started with a custom Hive UDF from compilation to execution in no time.
Our goal is to create a UDF that transforms its input to upper case. All the code is available in our public repository of Hadoop examples and tutorials.
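To make the goal concrete, here is a minimal sketch of the logic such a UDF boils down to. It is an illustration under stated assumptions, not the code from the repository: the class name MyUpperSketch is hypothetical, and it works on plain String values so that it compiles without any Hive jars. A real Hive UDF of this simple kind would typically extend org.apache.hadoop.hive.ql.exec.UDF and expose an evaluate method working on Hadoop types such as Text.

```java
import java.util.Locale;

// Hypothetical stand-in for the UDF logic; not the repository's MyUpper class.
// A real simple Hive UDF would extend org.apache.hadoop.hive.ql.exec.UDF,
// which is left out here so the sketch compiles without the Hive jars.
final class MyUpperSketch {

    // Mirrors the shape of a UDF evaluate method: one value in, one value out.
    // NULL column values arrive as null, so they are passed through unchanged.
    public String evaluate(final String s) {
        if (s == null) {
            return null;
        }
        // Locale.ROOT keeps the result independent of the JVM's default locale.
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        MyUpperSketch udf = new MyUpperSketch();
        System.out.println(udf.evaluate("hue"));  // prints HUE
        System.out.println(udf.evaluate(null));   // prints null
    }
}
```

The null check is the one part that is easy to forget: without it, a NULL value in the column would make the function throw instead of returning NULL.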
If you want to go even faster, the UDF is already precompiled here.
If not, check out the code:
git clone https://github.com/romainr/hadoop-tutorials-examples.git
cd hadoop-tutorials-examples/hive-udf
And compile the UDF (Java and Hive need to be installed):
javac -cp $(ls /usr/lib/hive/lib/hive-exec*.jar):/usr/lib/hadoop/hadoop-common.jar org/hue/udf/MyUpper.java
jar -cf myudfs.jar -C . .
Or use Maven with our pom.xml, which automatically pulls in the dependent jars.
Register the UDF in the Hive Editor
Then open Beeswax in Hue, the Hadoop UI, and click on the ‘Settings’ tab.
In File Resources, upload myudfs.jar, then pick the jar file and point to its path, e.g.:
Make the UDF (User Defined Function) available by registering it under the name myUpper, backed by the class compiled above, org.hue.udf.MyUpper:
That’s it! Just test it on one of the Hue example tables:
SELECT myUpper(description) FROM sample_07 LIMIT 10
We are using the most common type of UDF. If you want to learn about the other types in more depth, some great resources such as Hadoop: The Definitive Guide are available. Note that adding a jar loads it for the entire session, so you don’t need to load it again. Next time we will demo how to create a Python UDF for Hive!
If you did not register the UDF as explained above, you will get this error:
error while compiling statement: failed: parseexception line 1:0 cannot recognize input near 'myupper' '' ''