I am creating one RDD from 2 CSV files like this:
val combineRDD = sc.textFile("D://release//CSVFilesParellel//*.csv")
Then I want to define custom partitioning on this RDD so that each partition contains exactly one file. That way each partition, i.e. one CSV file, is processed on one node for faster data processing.
Is it possible to write a custom partitioner based on the size of a file, the number of lines in a file, or the end-of-file character of a file?
How do I achieve this?
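To make the goal concrete, here is a minimal sketch of one possible direction, not a confirmed solution. Instead of partitioning by file size or line count, it keys each file by its path: `sc.wholeTextFiles` yields one `(path, content)` record per file, and a custom `Partitioner` then gives every path its own partition. The names `FilePartitioner` and `OneFilePerPartition` and the `local[*]` master are assumptions made for illustration.

```scala
import org.apache.spark.{Partitioner, SparkConf, SparkContext}

// Hypothetical partitioner: every distinct file path gets its own partition index.
class FilePartitioner(paths: Seq[String]) extends Partitioner {
  private val index: Map[String, Int] = paths.zipWithIndex.toMap
  override def numPartitions: Int = math.max(paths.size, 1)
  // Unknown keys fall back to partition 0.
  override def getPartition(key: Any): Int = index.getOrElse(key.toString, 0)
}

object OneFilePerPartition {
  def main(args: Array[String]): Unit = {
    // Assumption: local master, for trying the idea out on one machine.
    val conf = new SparkConf().setAppName("one-file-per-partition").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // One (path, wholeFileContent) record per CSV file.
    val files = sc.wholeTextFiles("D://release//CSVFilesParellel//*.csv")

    // Collect the file paths on the driver so the partitioner knows them up front.
    val paths = files.keys.collect().toSeq

    // After this shuffle, each partition holds the record of exactly one file.
    val perFile = files.partitionBy(new FilePartitioner(paths))
    println(s"partitions: ${perFile.partitions.length}")

    // Example per-file work: count the lines of each file (mapValues keeps the partitioning).
    perFile.mapValues(content => content.split("\n").length)
      .collect()
      .foreach { case (path, n) => println(s"$path: $n lines") }

    sc.stop()
  }
}
```

Note that `wholeTextFiles` loads each file into memory as a single string, so this only suits files that fit comfortably on one executor, and collecting the keys triggers an extra pass over the input.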
The structure of one file looks something like this:
00-00
Time (in seconds) Measure1 Measure2 Measure3 ..... Measuren
0
0.25
0.50
0.75
1
...
3600
1. The first line of the file contains hours and minutes. Each file contains data for 1 hour, i.e. 3600 seconds.
2. The time column advances in steps of 0.25 seconds, i.e. 4 samples per second, one every 250 ms.
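To illustrate the file layout described above, here is a small sketch of how one file's content could be parsed once it is available as a single string. It uses plain Scala with no Spark dependency. The names `HourFile` and `HourFileParser` are hypothetical, and the code assumes that fields in the data rows are comma-separated and all numeric, which the listing above does not confirm.

```scala
// Hypothetical container for one parsed file.
final case class HourFile(
  hour: Int,
  minute: Int,
  header: String,
  rows: Vector[Vector[Double]]
)

object HourFileParser {
  def parse(content: String): HourFile = {
    // Split into non-empty, trimmed lines.
    val lines = content.split("\\r?\\n").map(_.trim).filter(_.nonEmpty).toVector
    require(lines.size >= 2, "expected a time-stamp line and a header line")

    // First line holds hours and minutes, e.g. "00-00"; accept '-' or ':' as separator.
    val Array(hh, mm) = lines(0).split("[-:]").map(_.trim)

    // Second line is the column header; the rest are data rows.
    // Assumption: comma-separated numeric fields.
    val rows = lines.drop(2).map(_.split(",").map(_.trim.toDouble).toVector)

    HourFile(hh.toInt, mm.toInt, lines(1), rows)
  }
}

object HourFileParserDemo {
  def main(args: Array[String]): Unit = {
    // Made-up sample content in the assumed layout.
    val sample = "00-00\nTime,Measure1\n0,1.5\n0.25,2.5\n"
    val parsed = HourFileParser.parse(sample)
    println(s"${parsed.hour}:${parsed.minute}, ${parsed.rows.size} data rows")
  }
}
```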
Vinay Joglekar