I need to stream data from an HDFS directory using Spark Streaming.
JavaDStream<String> lines = ssc.textFileStream("hdfs://ip:8020/directory");
The above does a good job of monitoring the HDFS directory for new files, but it is limited to a single directory level: it does not monitor subdirectories.
I came across the following posts that mention adding a depth parameter to this API:
https://mail-archives.apache.org/mod_mbox/spark-reviews/201502.mbox/%3C20150220121124.DBB5FE03F7@git1-us-west.apache.org%3E
https://github.com/apache/spark/pull/2765
The problem is that in Spark version 1.6.1 (which I checked) this parameter is not present, so I can't use it, and I don't want to change the Spark source either.
JavaDStream<String> lines = ssc.textFileStream("hdfs://ip:8020/*/*/*/");
Some posts on Stack Overflow mention using the glob syntax above, but that doesn't work either.
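To make the "depth" idea concrete, here is a minimal sketch of a depth-limited directory listing in plain Java. It is not Spark's API: it assumes a local filesystem via `java.nio.file` instead of HDFS so that it runs standalone, and the class `NestedDirs` and helper `listDirsToDepth` are hypothetical names. A possible workaround along these lines would be to list the nested directories up front and open one `textFileStream` per directory, with the caveat that directories created after startup would not be picked up.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class NestedDirs {
    // Hypothetical helper (not part of Spark): returns the root itself plus
    // every directory located at most maxDepth levels below it.
    public static List<Path> listDirsToDepth(Path root, int maxDepth) throws IOException {
        // Files.walk visits the root at depth 0 and stops descending at maxDepth.
        try (Stream<Path> walk = Files.walk(root, maxDepth)) {
            return walk.filter(Files::isDirectory)
                       .sorted()
                       .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small throwaway tree: root/a/b/c and root/x.
        Path root = Files.createTempDirectory("stream-root");
        Files.createDirectories(root.resolve("a/b/c"));
        Files.createDirectories(root.resolve("x"));

        // With maxDepth = 2, a/b/c (depth 3) is excluded from the listing.
        for (Path dir : listDirsToDepth(root, 2)) {
            System.out.println(dir);
        }
    }
}
```

Each returned path could then be passed as a separate monitored directory. For HDFS, the same traversal would have to be done through the Hadoop filesystem client rather than `java.nio.file`.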
Am I missing something?