Once a CSV is loaded via read.csv, it is quite simple to use multicore, segue, etc. to work with the data in it. Reading it in, however, is currently quite slow.
I realize that it would be better to use MySQL, etc.
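To make the setup concrete, here is a minimal sketch of the workflow described above. It is an illustration under assumptions, not my actual code: the temporary file, the column names, the chunk count, and the core count are all hypothetical, and it uses the parallel package (bundled with R from 2.14 onward), whose mclapply has the same calling convention as the one in multicore.

```r
# Sketch only: the file, columns, and chunk/core counts are made up.
library(parallel)  # on R 2.13, multicore provides mclapply instead

# Write a small stand-in CSV so the example is self-contained.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:1000, value = rnorm(1000)), path,
          row.names = FALSE)

# Serial step: a single read.csv call; this is the slow part on big files.
df <- read.csv(path)

# Parallel step: split the rows into chunks and process them across cores.
# mclapply relies on forking, so mc.cores > 1 works on Linux/macOS only.
chunks <- split(df, rep(1:4, length.out = nrow(df)))
sums <- mclapply(chunks, function(d) sum(d$value), mc.cores = 2)
total <- Reduce(`+`, sums)
```

The second half spreads across cores easily; the single read.csv call is the part that does not.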
Assume an AWS 8xl cluster compute instance running R 2.13.
The specs are as follows:
Cluster Compute Eight Extra Large specifications:
88 EC2 Compute Units (Eight-core 2 x Intel Xeon)
60.5 GB of memory
3370 GB of instance storage
64-bit platform
I/O Performance: Very High (10 Gigabit Ethernet)
Any thoughts / ideas are much appreciated.
parallel-processing r csv bigdata