I am trying to read / monitor text files in a Hadoop file system (HDFS) directory. However, I noticed that every .txt entry inside this directory is itself a directory, as in the following example:
/crawlerOutput/b6b95b75148cdac44cd55d93fe2bbaa76aa5cccecf3d723c5e47d361b28663be-1427922269.txt/_SUCCESS
/crawlerOutput/b6b95b75148cdac44cd55d93fe2bbaa76aa5cccecf3d723c5e47d361b28663be-1427922269.txt/part-00000
/crawlerOutput/b6b95b75148cdac44cd55d93fe2bbaa76aa5cccecf3d723c5e47d361b28663be-1427922269.txt/part-00001
I would like to read all the data inside the part files. I am trying to use the following code:
val testData = ssc.textFileStream("/crawlerOutput/*/*")
Unfortunately, it fails with an error saying that /crawlerOutput/*/* does not exist. Does textFileStream accept wildcards? What can I do to solve this problem?