How to check file format on HDFS?

Given an HDFS path, how can I determine which format the file is in (text, SequenceFile, or Parquet)?

3 answers

I don't think this is easy to do unless all the files in HDFS follow some naming convention, for example .txt for text, .seq for SequenceFile, and .parquet for Parquet files.
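If the files do follow such a convention, the check reduces to a lookup on the extension. Below is a minimal sketch in plain Java (11+, no Hadoop dependency). It assumes the .txt/.seq/.parquet convention described above. The class name ExtensionFormat and the example paths are hypothetical.

```java
import java.util.Locale;
import java.util.Map;

// Sketch: guess a file's format from its extension.
// Assumes the hypothetical naming convention .txt / .seq / .parquet is followed.
class ExtensionFormat {

    // Hypothetical convention table; adjust to whatever your cluster actually uses.
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "txt", "TEXT",
            "seq", "SEQUENCE",
            "parquet", "PARQUET");

    // Returns the format name, or "UNKNOWN" if the extension is missing or unmapped.
    static String guess(String path) {
        // Look only at the last path component so dots in directory names are ignored.
        String name = path.substring(path.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return "UNKNOWN";
        }
        String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, "UNKNOWN");
    }

    public static void main(String[] args) {
        System.out.println(guess("hdfs://namenode:8020/data/part-00000.parquet")); // PARQUET
        System.out.println(guess("/data/logs/events.seq"));                         // SEQUENCE
        System.out.println(guess("/data/v1.2/readme"));                             // UNKNOWN
    }
}
```

This only trusts the name, so a mislabeled file is reported as whatever its extension claims.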

However, you can check a file manually:

  • HDFS cat: hdfs dfs -cat /path/to/file | head to check whether it is a text file.

  • Parquet head: parquet-tools head [option ...] /path/to/file

  • Or write a program that reads the file and checks its format.
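Writing such a program mostly means reading the first few bytes and comparing them with each format's magic number: SequenceFiles begin with "SEQ" followed by a version byte, Parquet files begin with "PAR1", and ORC files begin with "ORC". The sketch below (plain Java 11+) classifies a byte prefix and otherwise falls back to a rough text heuristic (no NUL bytes, valid UTF-8). That heuristic is an assumption, not something Hadoop defines. The class and method names are hypothetical. With the Hadoop client on the classpath, the InputStream could come from FileSystem.get(conf).open(new Path(...)), which returns an FSDataInputStream. That wiring is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: guess a file's format from its leading "magic" bytes.
class FormatSniffer {

    enum Format { SEQUENCE, PARQUET, ORC, TEXT, UNKNOWN }

    // Number of leading bytes to inspect (an arbitrary choice, not a Hadoop constant).
    private static final int SAMPLE_SIZE = 4096;

    // Reads up to SAMPLE_SIZE bytes from the stream and classifies them.
    static Format detect(InputStream in) throws IOException {
        return classify(in.readNBytes(SAMPLE_SIZE));
    }

    static Format classify(byte[] head) {
        if (head.length == 0) return Format.UNKNOWN;          // empty file: nothing to go on
        if (startsWith(head, "SEQ")) return Format.SEQUENCE;  // "SEQ" + version byte
        if (startsWith(head, "PAR1")) return Format.PARQUET;  // also repeated at end of file
        if (startsWith(head, "ORC")) return Format.ORC;
        return looksLikeText(head) ? Format.TEXT : Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, String magic) {
        byte[] m = magic.getBytes(StandardCharsets.US_ASCII);
        return data.length >= m.length && Arrays.equals(Arrays.copyOf(data, m.length), m);
    }

    // Heuristic only: no NUL bytes and the sample decodes as UTF-8.
    private static boolean looksLikeText(byte[] data) {
        for (byte b : data) {
            if (b == 0) return false;
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        // endOfInput = false, so a multi-byte character cut off by the sample limit is not an error.
        return !decoder.decode(ByteBuffer.wrap(data), CharBuffer.allocate(data.length), false).isError();
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "id,name\n1,alice\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(detect(new ByteArrayInputStream(sample))); // TEXT
    }
}
```

Because only the first bytes are inspected, a plain-text file that happens to begin with "SEQ" or "ORC" would be misreported, and compressed text (for example gzip) will usually come back as UNKNOWN.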


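Use hdfs dfs -cat /path/to/file | head and look at the start of the output:

1) For an ORC file, the output starts with "ORC".

2) For a Parquet file, the output starts with "PAR1".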

3)
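Because a Parquet file also ends with "PAR1", preceded by a 4-byte little-endian footer length, a stricter check can look at both ends of the file. Below is a sketch that assumes the file has first been copied locally (for example with hdfs dfs -get). The class name ParquetCheck is hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: check the Parquet magic at both ends of a local file.
// Assumed layout: "PAR1" | data | footer | 4-byte little-endian footer length | "PAR1"
class ParquetCheck {

    private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    static boolean isParquet(String localPath) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(localPath, "r")) {
            long len = f.length();
            // Smallest possible layout: magic + footer length + magic = 12 bytes.
            if (len < 12) return false;

            byte[] head = new byte[4];
            f.readFully(head);

            byte[] tail = new byte[8]; // footer length (4 bytes) + trailing magic (4 bytes)
            f.seek(len - 8);
            f.readFully(tail);

            if (!Arrays.equals(head, MAGIC)
                    || !Arrays.equals(Arrays.copyOfRange(tail, 4, 8), MAGIC)) {
                return false;
            }
            // The footer must fit between the two magic markers (read as an unsigned int).
            long footerLen = ByteBuffer.wrap(tail, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt() & 0xFFFFFFFFL;
            return footerLen <= len - 12;
        }
    }

    public static void main(String[] args) throws IOException {
        for (String p : args) {
            System.out.println(p + " -> " + (isParquet(p) ? "Parquet" : "not Parquet"));
        }
    }
}
```

Checking the tail as well catches files that merely start with "PAR1", at the cost of one extra seek.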


String extension = FilenameUtils.getExtension("hdfs://filepath"); (FilenameUtils is from Apache Commons IO.) This works with Hadoop 2.5.2.

