I would write a simple UDF for this purpose. You must have hive-exec in your build path.
For example, with Maven:
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>0.8.1</version>
    </dependency>
A simple raw implementation would look like this:
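    package com.myexample;

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.IntWritable;

    public class SubArraySum extends UDF {

        public IntWritable evaluate(ArrayList<Integer> list, IntWritable from, IntWritable to) {
            IntWritable result = new IntWritable(-1);
            if (list == null || list.size() < 1) {
                return result;
            }
            int m = from.get();
            int n = to.get();
            // The listing is cut off here. The rest is a reconstruction inferred from
            // the sample results: sum indices m (inclusive) to n (exclusive), 0-based.
            if (m < 0 || n > list.size() || m > n) {
                return result; // invalid range: keep the -1 sentinel
            }
            int sum = 0;
            for (int i = m; i < n; i++) {
                sum += list.get(i);
            }
            result.set(sum);
            return result;
        }
    }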
Then create a jar and load it into the Hive shell:
    hive> add jar /home/user/jar/myjar.jar;
    hive> create temporary function subarraysum as 'com.myexample.SubArraySum';
Now you can use it to calculate the sum of the array that you have.
For example, suppose you have an input file with tab-separated columns:
    1	0,1,2,3,4
    2	5,6,7,8,9
Load it into a table:
    hive> create external table mytable (
            id int,
            nums array<int>
          )
          ROW FORMAT DELIMITED
          FIELDS TERMINATED BY '\t'
          COLLECTION ITEMS TERMINATED BY ','
          STORED AS TEXTFILE
          LOCATION '/user/hadoopuser/hive/input';
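To make the two delimiters concrete, here is a rough plain-Java illustration of how one line of the input file maps onto the `id` and `nums` columns. It only mimics the splitting described by the DDL; it is not how Hive's SerDe is implemented, and the class and method names (`RowFormatSketch`, `parseId`, `parseNums`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: mimics how the delimiters in the DDL split one text line.
// This is not Hive code; all names here are made up for the example.
class RowFormatSketch {

    // Parses the id column: the text before the first tab.
    static int parseId(String line) {
        return Integer.parseInt(line.split("\t", 2)[0].trim());
    }

    // Parses the array column: the text after the first tab, split on commas.
    static List<Integer> parseNums(String line) {
        String[] fields = line.split("\t", 2);        // FIELDS TERMINATED BY '\t'
        List<Integer> nums = new ArrayList<>();
        for (String item : fields[1].split(",")) {    // COLLECTION ITEMS TERMINATED BY ','
            nums.add(Integer.parseInt(item.trim()));
        }
        return nums;
    }

    public static void main(String[] args) {
        String line = "1\t0,1,2,3,4";
        System.out.println(parseId(line) + " " + parseNums(line)); // prints "1 [0, 1, 2, 3, 4]"
    }
}
```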
Run the following queries:
    hive> select * from mytable;
    1	[0,1,2,3,4]
    2	[5,6,7,8,9]
Sum the elements in the range [m, n), where m = 1 and n = 3 (0-based indices, m inclusive, n exclusive):
    hive> select subarraysum(nums, 1, 3) from mytable;
    3
    13
or
    hive> select sum(subarraysum(nums, 1, 3)) from mytable;
    16
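If you want to sanity-check the range semantics without a Hive session, the same summation can be sketched in plain Java. This is a minimal sketch, assuming 0-based indices with the start inclusive and the end exclusive, which is what the results 3 and 13 above imply. The class name `SubArraySumSketch`, the method `sumRange`, and the choice to return -1 for an invalid range are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch (no Hive dependencies) of the range sum used in the queries.
// Assumption: indices are 0-based, `from` is inclusive and `to` is exclusive.
class SubArraySumSketch {

    // Returns the sum of list[from..to), or -1 for a null/empty list or an invalid range.
    static int sumRange(List<Integer> list, int from, int to) {
        if (list == null || list.isEmpty()) {
            return -1;
        }
        if (from < 0 || to > list.size() || from > to) {
            return -1;
        }
        int sum = 0;
        for (int i = from; i < to; i++) {
            sum += list.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumRange(Arrays.asList(0, 1, 2, 3, 4), 1, 3)); // 1 + 2 = 3
        System.out.println(sumRange(Arrays.asList(5, 6, 7, 8, 9), 1, 3)); // 6 + 7 = 13
    }
}
```

The two per-row values add up to 16, matching the `sum(...)` query.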