You should use the randomSplit method:
def randomSplit(weights: Array[Double], seed: Long = Utils.random.nextLong): Array[RDD[T]]
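For example, to split an RDD roughly 70/30 into a training and a test set (a minimal sketch; it assumes an existing SparkContext named sc, and the 70/30 weights and seed are just illustrative):

// assumes an existing SparkContext `sc`
val data = sc.parallelize(1 to 1000)

// weights need not sum to 1; they are normalized internally
val Array(training, test) = data.randomSplit(Array(0.7, 0.3), seed = 42L)

println(s"training: ${training.count()}, test: ${test.count()}")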
Here is its implementation in Spark 1.0:
def randomSplit(weights: Array[Double], seed: Long = Utils.random.nextLong): Array[RDD[T]] = {
  val sum = weights.sum
  val normalizedCumWeights = weights.map(_ / sum).scanLeft(0.0d)(_ + _)
  normalizedCumWeights.sliding(2).map { x =>
    new PartitionwiseSampledRDD[T, T](this, new BernoulliSampler[T](x(0), x(1)), seed)
  }.toArray
}
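As the implementation shows, the weights are normalized by their sum, so they do not have to add up to 1. Each split is then produced by Bernoulli sampling over the cumulative weight ranges, per partition, so the resulting split sizes are only approximately proportional to the weights rather than exact.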