You can use the custom map reduction feature in Hive.
With the following:
add file /some/path/identity.pl; add file /some/path/collect.pl; from ( from trace_input MAP id, lat, lon, ts USING './identity.pl' as id, lat, lon, ts CLUSTER BY id) map_output REDUCE id, lat, lon, ts USING './collect.pl' as id, list
trace_input contains your trace data as described above:
create table trace_input(id string, lat string, lon string, ts string) row format delimited fields terminated by '\t' stored as textfile ;
identity.pl is a simple script to unload each line (there may also be a script to select only lat, long fields):
#!/usr/bin/perl while (<STDIN>) { print; }
collect.pl (example here ) is a simple script that collects consecutive lines with the same object identifier, saves the remainder of each line and unloads the line with the identifier and a comma-separated list (tab delimiter).
The cluster proposal ensures that the reducers get the correctly sorted input needed to build the script.
Custom script output is divided into STRING column tabs.
Running the query will result in the following output:
1 X11,X12,T11,X21,X22,T12,X31,X22,T13 2 X11,X12,T21,X21,X22,T22
You can change the script map to restrict the columns and / or change the script shortening to add results or separate lat, lon from ts, etc.
If this form is sufficient, you can insert it directly into the results table by adding to it before insertion:
from ( from trace_input MAP id, lat, lon, ts USING './identity.pl' as id, lat, lon, ts CLUSTER BY id) map_output INSERT overwrite table trace_res REDUCE id, lat, lon, ts USING './collect.pl';
Fields will be converted from string fields according to the trace_res schema as needed.
If you use collection types like me, you can also do something like:
create table trace_res as select sq.id, split(sq.list,",") from ( from ( from trace_input MAP id, lat, lon, ts USING './identity.pl' as id, lat, lon, ts CLUSTER BY id) map_output REDUCE id, lat, lon, ts USING './collect.pl' as (id int, list string) ) sq;
This second field in the created table will be a list of all lat, lon, ts; but will probably have a more complex table.