(An even more basic question than "Difference between Pig and Hive? Why have both?")
I have a data processing pipeline written as several Java map-reduce jobs on Hadoop (my own custom classes, derived from Hadoop's Mapper and Reducer). The pipeline is a series of basic operations such as joining, inverting, sorting, and grouping. My code is involved and not very generic.
What are the pros and cons of continuing with this admittedly development-intensive approach versus porting everything to Pig/Hive with several UDFs? Which jobs would I not be able to express? Would I suffer a performance degradation (we work with hundreds of TB)? Would I lose the ability to tweak and debug my code while maintaining it? Would I be able to pipeline some of the jobs as Java map-reduce and use their output as input to my Pig/Hive jobs?
From Twitter's experience: a Pig script is roughly 5% of the code of a native map/reduce job, written in roughly 5% of the time. However, the resulting queries typically take between 110% and 150% of the time the native map/reduce job would take. Of course, where a routine is highly performance-sensitive, they still hand-code the native map/reduce directly.
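As a rough illustration of that brevity claim (a sketch with hypothetical file and field names, not Twitter's actual script), a group-and-count that would require a full Mapper, Reducer, and driver class in Java fits in a few lines of Pig Latin:

```pig
-- Load tab-separated log records (hypothetical schema).
logs = LOAD 'input/logs.tsv' AS (user:chararray, url:chararray);

-- Group by user and count page views per user.
by_user = GROUP logs BY user;
counts  = FOREACH by_user GENERATE group AS user, COUNT(logs) AS views;

-- Emit the heaviest users first.
ordered = ORDER counts BY views DESC;
STORE ordered INTO 'output/user_counts';
```

The equivalent hand-written Java map-reduce needs separate map, reduce, and driver code for both the grouping and the sorting stages, which is where the "5% of the code" figure comes from.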
In the end, Pig compiles down to MapReduce anyway.

So in most cases it makes sense to start with Pig/Hive and drop down to hand-written MapReduce only where it is really needed.
There is a benchmark from 2009 that found Pig to be about 1.5 times slower than plain MapReduce. It is expected that higher-level tools built on top of Hadoop perform slower than plain MapReduce; however, getting optimal performance out of MapReduce requires an advanced user willing to write a lot of boilerplate code (e.g. binary comparators).
It is also worth mentioning an API called Pangool (disclaimer: I am one of its authors), which aims to replace the plain Hadoop MapReduce API by making many common things easier to code (secondary sort, reduce-side joins). Pangool imposes an insignificant performance overhead (around 5% in early benchmarks) while retaining the flexibility of the raw MapRed API.