File formats that can be read using PIG

What file formats can be read with PIG?

How can I store them in different formats? Say we have a CSV file and I want to save it as an MXL file: how can this be done? Whenever we use the STORE command, it creates a directory and stores the file as part-m-00000. How can I change the file name and overwrite the directory?

1 answer

What file formats can be read with PIG? How can I store them in different formats?

There are several built-in methods for loading and storing, but they are limited:

  • BinStorage - "binary" storage
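  • PigStorage - loads and stores data that is delimited by some character (such as a tab or a comma)
  • TextLoader - loads data line by line (i.e., delimited by the newline character)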

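piggybank is a library of community-contributed user-defined functions. It includes a number of loading and storing methods, among them an XML loader, but no XML storer.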


Say we have a CSV file and I want to save it as an MXL file, how can this be done?

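I assume you mean XML here... Storing XML is a bit awkward in Hadoop, because the output is split into one file per reducer (or per mapper in a map-only job), so how do you know where to put the root tag? Producing well-formed XML most likely requires some kind of post-processing.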
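The post-processing step could look roughly like the sketch below. It is only an illustration under several assumptions: the part files have already been copied to a local directory (for example with hadoop fs -get), each line holds one complete XML fragment, and the function name merge_parts_to_xml and the root tag items are made up for this example.

```python
import glob
import os


def merge_parts_to_xml(output_dir, merged_path, root_tag="items"):
    """Concatenate local part-* files into one XML file under a single root.

    Hypothetical helper: assumes every non-blank line is a complete fragment.
    """
    # Sort so part-m-00000, part-m-00001, ... are merged in a stable order.
    part_files = sorted(glob.glob(os.path.join(output_dir, "part-*")))
    with open(merged_path, "w", encoding="utf-8") as merged:
        merged.write("<%s>\n" % root_tag)
        for part in part_files:
            with open(part, encoding="utf-8") as handle:
                for line in handle:
                    line = line.strip()
                    if line:  # skip blank lines
                        merged.write("  " + line + "\n")
        merged.write("</%s>\n" % root_tag)
    return len(part_files)
```

The root tag is written exactly once, around all fragments, which is the part that the individual tasks cannot do on their own.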

One thing you can do is write a UDF that converts your columns into an XML string:

B = FOREACH A GENERATE customudfs.DataToXML(col1, col2, col3);

For example, say col1, col2, col3 are "foo", 37, "lemons", respectively. Your UDF can output the string "<item><name>foo</name><num>37</num><fruit>lemons</fruit></item>".
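The answer does not show the body of customudfs.DataToXML. The string-building logic might look something like this plain Python sketch; the function name data_to_xml and the tag names are assumptions taken from the example output. Pig can run Python UDFs through Jython, but the registration and output-schema steps are left out here.

```python
from xml.sax.saxutils import escape


def data_to_xml(name, num, fruit):
    """Render one record as a single-line <item> element (illustrative only)."""
    # escape() replaces &, < and > so field values cannot break the markup.
    return "<item><name>%s</name><num>%s</num><fruit>%s</fruit></item>" % (
        escape(str(name)), escape(str(num)), escape(str(fruit)))


print(data_to_xml("foo", 37, "lemons"))
# prints <item><name>foo</name><num>37</num><fruit>lemons</fruit></item>
```

Escaping matters because a value such as "a&b" would otherwise produce XML that does not parse.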


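Whenever we use the STORE command, it creates a directory and stores the file as part-m-00000, how can I change the file name and overwrite the directory?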

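You can't change the name of the output file to something other than part-m-00000. That's just how Hadoop works. If you want a different name, rename the file after the fact with something like hadoop fs -mv output/part-m-00000 newoutput/myoutputfile. This could be done with a bash script that runs the Pig script and then executes this command.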
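The answer suggests bash for the wrapper; the same idea can be sketched in Python. This is a rough illustration only: it assumes the pig and hadoop commands are on the PATH, and the function names build_commands and run_and_rename, as well as the script name myscript.pig in the usage note, are hypothetical.

```python
import subprocess


def build_commands(pig_script, part_file, target):
    """Return the two commands: run the Pig script, then move the part file."""
    return [
        ["pig", pig_script],
        ["hadoop", "fs", "-mv", part_file, target],
    ]


def run_and_rename(pig_script, part_file, target):
    """Run both commands in order, stopping at the first failure."""
    for command in build_commands(pig_script, part_file, target):
        # check_call raises CalledProcessError if a command exits non-zero,
        # so the move is skipped when the Pig job fails.
        subprocess.check_call(command)
```

A call such as run_and_rename("myscript.pig", "output/part-m-00000", "newoutput/myoutputfile") would run the job and then perform the move from the answer.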
