Hive UDF that queries a Hive table

I developed a Hive UDF in Java that works correctly. The function returns the best match between its input and a column in a Hive table, so it looks like this simplified pseudocode:

    class myudf extends udf {
        evaluate(Text input) {
            getNewHiveConnection(); // I want to replace this with getCurrentHiveUserConnetion();
            executeHiveQuery(input);
            return something;
        }
    }

My question is: if this function is called by Hive, why do I need to connect to Hive in my code? Can I reuse the connection of the user who is calling my function?

2 answers

If you want to return the closest match from an entire column in the query, you can treat it as a kind of aggregation and write a Hive UDAF: https://cwiki.apache.org/confluence/display/Hive/GenericUDAFCaseStudy

There's also a pretty handy tutorial: http://beekeeperdata.com/posts/hadoop/2015/08/17/hive-udaf-tutorial.html
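
To make the aggregation idea concrete, here is a minimal plain-Java sketch of the per-row logic. It only mirrors the iterate / merge / terminate steps that a Hive `GenericUDAFEvaluator` goes through, and it deliberately does not depend on Hive. The class name `BestMatchSketch` is hypothetical, and Levenshtein edit distance is an assumed stand-in for whatever "best match" means in your case. In a real UDAF this state would live in an aggregation buffer, as described in the case study linked above.

```java
import java.util.Arrays;

// Hypothetical sketch: "best match" expressed as an aggregation over rows.
// Not a Hive class; it only imitates the UDAF lifecycle for illustration.
class BestMatchSketch {
    private final String input;                    // value to match against the column
    private String best = null;                    // closest candidate seen so far
    private int bestDistance = Integer.MAX_VALUE;  // its distance to the input

    BestMatchSketch(String input) {
        this.input = input;
    }

    // Called once per row: keep the candidate if it is closer than the current best.
    void iterate(String candidate) {
        if (candidate == null) {
            return;
        }
        int d = editDistance(input, candidate);
        if (d < bestDistance) {
            bestDistance = d;
            best = candidate;
        }
    }

    // Combine a partial result that was computed over another slice of the rows.
    void merge(BestMatchSketch other) {
        if (other.best != null && other.bestDistance < bestDistance) {
            bestDistance = other.bestDistance;
            best = other.best;
        }
    }

    // Final result; null if no non-null row was seen.
    String terminate() {
        return best;
    }

    // Levenshtein distance, the assumed similarity measure (two-row dynamic programming).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // Two partial aggregations over made-up rows, merged at the end.
        BestMatchSketch left = new BestMatchSketch("hadop");
        for (String row : Arrays.asList("hive", "hadoop")) {
            left.iterate(row);
        }
        BestMatchSketch right = new BestMatchSketch("hadop");
        for (String row : Arrays.asList("spark", "pig")) {
            right.iterate(row);
        }
        left.merge(right);
        System.out.println(left.terminate()); // prints "hadoop"
    }
}
```

The point of this shape is that Hive feeds the column values to the function row by row, so the function never has to open its own connection to read the table.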


Yes, you can make the UDF permanent. For example:

 CREATE FUNCTION MatchFinder as 'com.mycompany.packagex.myudf' using jar 'hdfs:///an_HDFS_directory/my_jar_name.jar'; 

This makes your function permanent, and anyone can call it. In this case the jar file is stored on HDFS for easy access, but there are other options.

See the Hive wiki for more details.
