How to convert RDD[(String, String)] to RDD[Array[String]]?

I am trying to add the file name to every record in a file. I thought that if each RDD element were an Array instead of a tuple, this would be easy to do.

Help with converting the RDD type, or with solving the underlying problem, would be much appreciated!

With the (String, String) type:

 scala> myRDD.first()(1)
 <console>:24: error: (String, String) does not take parameters
        myRDD.first()(1)

With the Array[String] type:

 scala> myRDD.first()(1)
 res1: String = abcdefgh

My function:

 import scala.util.matching.Regex

 def appendKeyToValue(x: Array[Array[String]]): Unit = {
   for (i <- 0 to (x.length - 1)) {
     val key = x(i)(0)
     val pattern = new Regex("\\.")
     val key2 = pattern.replaceAllIn(key, "|")
     val tempvalue = x(i)(1)
     val finalval = tempvalue.split("\n")
     for (ab <- 0 to (finalval.length - 1)) {
       // I am trying to append the file name to each record in the file
       val result = key2 + "|" + finalval(ab)
     }
   }
 }
1 answer

If you have an RDD[(String, String)], you can access the first field of the first tuple by calling

 val firstTupleField: String = myRDD.first()._1 

If you want to convert RDD[(String, String)] to RDD[Array[String]] , you can do the following

 val arrayRDD: RDD[Array[String]] = myRDD.map(x => Array(x._1, x._2)) 

You can also use a pattern-matching anonymous function (a partial function literal) to destructure the tuples:

 val arrayRDD: RDD[Array[String]] = myRDD.map { case (a,b) => Array(a, b) } 
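The same destructuring can address the original goal of prefixing every record with its file name, without converting to arrays first. The following is only a minimal sketch: `appendKeyToRecords` and the sample data are hypothetical names introduced for illustration, and a plain Scala `Seq` stands in for the RDD so that the snippet runs without Spark.

```scala
// Hypothetical helper: turns one (fileName, fileContent) pair into
// one "fileName|record" string per line of the content.
def appendKeyToRecords(pair: (String, String)): Seq[String] = pair match {
  case (fileName, content) =>
    // Mirror the question's replacement of "." with "|" in the key.
    val key = fileName.replace(".", "|")
    content.split("\n").toSeq.map(record => key + "|" + record)
}

// Plain Scala collection standing in for the contents of myRDD (assumed data).
val sample = Seq(("data.txt", "abc\ndef"), ("log.csv", "xyz"))

// flatMap flattens the per-file record lists into a single list of records.
val tagged = sample.flatMap(appendKeyToRecords)
tagged.foreach(println)
```

Assuming myRDD is an RDD[(String, String)] of (file name, file content) pairs, calling myRDD.flatMap(appendKeyToRecords) should produce an RDD[String] in the same way, with one element per record.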