Here is one example in which I use a partitioned Hive ORC table as input:
hadoop jar /usr/hdp/2.2.4.12-1/hadoop-mapreduce/hadoop-streaming-2.6.0.2.2.4.12-1.jar \
-libjars /usr/hdp/current/hive-client/lib/hive-exec.jar \
-Dmapreduce.task.timeout=0 -Dmapred.reduce.tasks=1 \
-Dmapreduce.job.queuename=default \
-file RStreamMapper.R RStreamReducer2.R \
-mapper "Rscript RStreamMapper.R" -reducer "Rscript RStreamReducer2.R" \
-input /apps/hive/warehouse/asv.db/rtd_430304_fnl2 \
-output /user/Abhi/MRExample/Output \
-inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat \
-outputformat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Here /apps/hive/warehouse/asv.db/rtd_430304_fnl2 is the path to the backend ORC data storage location for the Hive table. Beyond that, I only need to supply the appropriate jars for streaming and for Hive.
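Since the job depends on both jars being present, it can help to check them before launching. The following is only a minimal sketch: missing_jars is a hypothetical helper name, not part of Hadoop or Hive, and the jar paths are simply the ones from the command above, so adjust them for your own cluster.

```shell
#!/bin/sh
# Hypothetical helper: print every path argument that is not an existing regular file.
missing_jars() {
    for jar in "$@"; do
        [ -f "$jar" ] || printf '%s\n' "$jar"
    done
}

# Jar paths copied from the command above (assumed to match your HDP install).
STREAMING_JAR=/usr/hdp/2.2.4.12-1/hadoop-mapreduce/hadoop-streaming-2.6.0.2.2.4.12-1.jar
HIVE_EXEC_JAR=/usr/hdp/current/hive-client/lib/hive-exec.jar

# Collect the missing paths and report them on stderr before submitting the job.
missing=$(missing_jars "$STREAMING_JAR" "$HIVE_EXEC_JAR")
if [ -n "$missing" ]; then
    echo "Missing jars:" >&2
    echo "$missing" >&2
fi
```

If nothing is printed, both jars were found and the hadoop jar command can be run as shown.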