(An even more basic question than "Difference between Pig and Hive? Why have both?")
I have a data processing pipeline written as several Java map-reduce jobs on Hadoop (my own custom classes, derived from Hadoop's Mapper and Reducer). The pipeline is a series of basic operations such as joining, inverting, sorting, and grouping. My code is involved and not very generic.
What are the pros and cons of continuing with this admittedly development-intensive approach versus porting everything to Pig/Hive with several UDFs? Which jobs would I not be able to express? Would I suffer a performance degradation (we work with hundreds of TB)? Would I lose the ability to tweak and debug my code while maintaining it? Would I be able to pipeline some of the jobs as Java map-reduce and use their output as input to my Pig/Hive jobs?
From Twitter's experience: a Pig script is roughly 5% of the code of a native map/reduce job, written in roughly 5% of the time. However, the resulting queries typically take between 110% and 150% of the time the native map/reduce job would take. Of course, where a routine is highly performance-sensitive, they still hand-code the native map/reduce directly.
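As a rough illustration of that brevity claim (a sketch with hypothetical file and field names, not Twitter's actual script), a group-and-count that would require a full Mapper, Reducer, and driver class in Java fits in a few lines of Pig Latin:

```pig
-- Load tab-separated log records (hypothetical schema).
logs = LOAD 'input/logs.tsv' AS (user:chararray, url:chararray);

-- Group by user and count page views per user.
by_user = GROUP logs BY user;
counts  = FOREACH by_user GENERATE group AS user, COUNT(logs) AS views;

-- Emit the heaviest users first.
ordered = ORDER counts BY views DESC;
STORE ordered INTO 'output/user_counts';
```

The equivalent hand-written Java map-reduce needs separate map, reduce, and driver code for both the grouping and the sorting stages, which is where the "5% of the code" figure comes from.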
In the end, Pig compiles down to MapReduce anyway.

So in most cases it makes sense to start with Pig/Hive and drop down to hand-written MapReduce only where it is really needed.
There is a benchmark from 2009 that found Pig to be about 1.5 times slower than plain MapReduce. It is expected that higher-level tools built on top of Hadoop perform slower than plain MapReduce; however, getting optimal performance out of MapReduce requires an advanced user willing to write a lot of boilerplate code (e.g. binary comparators).
It is also worth mentioning an API called Pangool (disclaimer: I am one of its authors), which aims to replace the plain Hadoop MapReduce API by making many common things easier to code (secondary sort, reduce-side joins). Pangool imposes an insignificant performance overhead (around 5% in early benchmarks) while retaining the flexibility of the raw MapRed API.