How to sum values and group them by key value in Scala Map List?
I have a list of maps:
val list = List(
Map("id" -> "A", "value" -> 20, "name" -> "a"),
Map("id" -> "B", "value" -> 10, "name" -> "b"),
Map("id" -> "A", "value" -> 5, "name" -> "a"),
Map("id" -> "C", "value" -> 1, "name" -> "c"),
Map("id" -> "D", "value" -> 60, "name" -> "d"),
Map("id" -> "C", "value" -> 3, "name" -> "c")
)
I want to summarize valueand group them by value in the idmost efficient way so that it becomes the following:
Map(A -> 25, B -> 10, C -> 4, D -> 60)
Also using foldLeft:
list.foldLeft(Map[String, Int]().withDefaultValue(0))((res, v) => {
val key = v("id").toString
res + (key -> (res(key) + v("value").asInstanceOf[Int]))
})
UPDATE: from reduceLeft:
(Map[String, Any]().withDefaultValue(0) :: list).reduceLeft((res, v) => {
val key = v("id").toString
res + (key -> (res(key).asInstanceOf[Int] + v("value").asInstanceOf[Int]))
})
By the way, if you look at the definition reduceLeft, you will see that it uses the same foldLeft:
def reduceLeft[B >: A](f: (B, A) => B): B =
if (isEmpty) throw new UnsupportedOperationException("empty.reduceLeft")
else tail.foldLeft[B](head)(f)
UPDATE 2: c parand reduce: The problem here is to distinguish the value of the result map from the initial value of the map. I have chosen contains("id").
list.par.reduce((a, b) => {
def toResultMap(m: Map[String, Any]) =
if (m.contains("id"))
Map(m("id").toString -> m("value")).withDefaultValue(0)
else m
val aM = toResultMap(a)
val bM = toResultMap(b)
aM.foldLeft(bM)((res, v) =>
res + (v._1 -> (res(v._1).asInstanceOf[Int] + v._2.asInstanceOf[Int])))
})
A) , :
scala> list.groupBy(_("id")).mapValues(_.map(_("value").asInstanceOf[Int]).sum)
res14: scala.collection.immutable.Map[Any,Int] = Map(D -> 60, A -> 25, C -> 4, B -> 10)
list.groupBy(_("id")).par.... , , .
.par , map(_"value").sum ( ) , . N= , N , par, , .
B) , ( ), "" groupBy :
val m = scala.collection.mutable.Map[String, Int]() withDefaultValue(0)
for (e <- list; k = e("id").toString) m.update(k, m(k) + e("value").asInstanceOf[Int])
C) :
val m = new scala.collection.concurrent.TrieMap[String, Int]()
for (e <- list.par; k = e("id").toString) {
def replace = {
val v = m(k)
m.replace(k, v, v + e("value").asInstanceOf[Int]) //atomic
}
m.putIfAbsent(k, 0) //atomic
while(!replace){} //in case of conflict
}
scala> m
res42: scala.collection.concurrent.TrieMap[String,Int] = TrieMap(B -> 10, C -> 4, D -> 60, A -> 25)
import scalaz._; import Scalaz._
scala> list.map(x => Map(x("id").asInstanceOf[String] -> x("value").asInstanceOf[Int]))
.par.reduce(_ |+| _)
res3: scala.collection.immutable.Map[String,Int] = Map(C -> 4, D -> 60, A -> 25, B -> 10)
, , "+".
, :
def time[T](n: Int)(f: => T) = {
val start = System.currentTimeMillis()
for(i <- 1 to n) f
(System.currentTimeMillis() - start).toDouble / n
}
Scala 2.12 REPL JDK8 MacBook Pro 2,3 Intel Core i7. - JVM.
1) time(100000){...}, :
`par.groupBy.par.mapValues` = 0.13861 ms
`groupBy.par.mapValues` = 0.07667 ms
`most parallelized` = 0.06184 ms
`scalaz par.reduce(_ |+| _)` = 0.04010 ms //same for other reduce-based implementations, mentioned here
`groupBy.mapValues` = 0.00212 ms
`for` + `update` with mutable map initialization time = 0.00201 ms
`scalaz suml` = 0.00171 ms
`foldLeft` from another answer = 0.00114 ms
`for` + `update` without mutable map initialization time = 0.00105
, foldLeft .
2)
scala> val newlist = (1 to 1000).map(_ => list).reduce(_ ++ _)
newList time(1000){...}:
`scalaz par.reduce(_ |+| _)` = 1.422 ms
`foldLeft`/`for` = 0.418 ms
`groupBy.par.mapValues` = 0.343 ms
groupBy.par.mapValues .
3) , :
scala> implicit class RichInt(i: Int){ def ++ (i2: Int) = { Thread.sleep(1); i + i2}}
defined class RichInt
list time(1000):
`foldLeft` = 7.742 ms
`most parallelized` = 3.315 ms
.
:
8 . [1] + ... + [1] [1 + ... + 1]:
time(([1] + [1]) + ([1] + [1]) + ([1] + [1]) + ([1] + [1])
=> ([1 +1] + [1 +1]) + ([1 + 1] + [1 + 1])
=> [1 + 1 + 1 + 1] + [1 + 1 + 1 + 1])
= (1 + 1 + 1 + 1) + (2 + 2) + 4 = 12
(N = 8) = 8/2 + 2 * 8/4 + 4 * 8/8 = 8 * (1/2 + 2/4 + 4/8) = 8 * log2 (8)/2 = 12
:
, , 2. O(NlogN), , foldLeft O(N). O(N), Map-Reduce Big Data, , - .
, , - , 6 ( O(1) ) - - , " ". , reduce . - , , (. 2).
Scala 2.13, K)(f:A=>B)(reduce:(B,B)=>B):scala.collection.immutable.Map[K,B] rel="nofollow noreferrer"> groupMapReduce ( ) groupBy mapValues reduce:
// val list = List(Map("id" -> "A", "value" -> 20, "name" -> "a"), Map("id" -> "B", "value" -> 10, "name" -> "b"), Map("id" -> "A", "value" -> 5, "name" -> "a"), Map("id" -> "C", "value" -> 1, "name" -> "c"), Map("id" -> "D", "value" -> 60, "name" -> "d"), Map("id" -> "C", "value" -> 3, "name" -> "c"))
list.groupMapReduce(_("id"))(_("value").asInstanceOf[Int])(_ + _)
// Map("A" -> 25, "B" -> 10, "C" -> 4, "D" -> 60)
:
group"id" (_("id")) ( MapReduce)map"", Int (_("value").asInstanceOf[Int]) ( Map Reduce)reduce(_ + _), ( groupMap Reduce).
list.groupBy(_("id")).mapValues(_.map(_("value").asInstanceOf[Int]).reduce(_ + _)).toMap