How to append to an ORC file

We have a requirement to append to existing ORC files. I tried to do this, but nothing worked. In addition, ORC's org.apache.hadoop.hive.ql.io.orc.WriterImpl does not have an append API. Is it possible to append to ORC files? (More specifically, using Java.)

+5
1 answer

ORC data files are divided into independent stripes; each stripe is created in one atomic step. See the official documentation for more details.

I do not believe that you can append directly to an existing file on the fly. That would mean leaving a corrupt stripe (and hence a corrupt file) if the write failed.

But you can:

  • create a new ORC data file for each reducer (it will contain stripes 1..N, depending on the actual data volume versus the orc.stripe.size property);
  • then "merge" these data files, together with the existing files, using Hive V0.14 and higher.
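
As a rough sketch of those two steps in HiveQL: the table, partition, and column names (events, events_staging, dt, id, payload) are hypothetical and used only for illustration, and the statements assume the target table is stored as ORC. The merge is assumed here to be done with ALTER TABLE ... CONCATENATE, which supports ORC files from Hive 0.14 onward and merges at the stripe level.

```sql
-- Hypothetical names throughout; assumes `events` is STORED AS ORC.

-- Step 1: write the new rows as new ORC files (one or more per reducer)
-- next to the files that already exist in the partition.
INSERT INTO TABLE events PARTITION (dt = '2015-01-01')
SELECT id, payload
FROM events_staging
WHERE dt = '2015-01-01';

-- Step 2 (Hive 0.14 and higher): merge the partition's ORC files,
-- new and existing, into fewer and larger files.
ALTER TABLE events PARTITION (dt = '2015-01-01') CONCATENATE;
```

Because existing files are never modified in place, a failed write in step 1 leaves at worst an unused new file rather than a damaged existing one.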
+4