"A", "value" -> 20, "n...">

How to sum values ​​and group them by key value in Scala Map List?

I have a list of maps:

val list = List(
  Map("id" -> "A", "value" -> 20, "name" -> "a"),
  Map("id" -> "B", "value" -> 10, "name" -> "b"),
  Map("id" -> "A", "value" -> 5, "name" -> "a"),
  Map("id" -> "C", "value" -> 1, "name" -> "c"),
  Map("id" -> "D", "value" -> 60, "name" -> "d"),
  Map("id" -> "C", "value" -> 3, "name" -> "c")
)

I want to summarize valueand group them by value in the idmost efficient way so that it becomes the following:

Map(A -> 25, B -> 10, C -> 4, D -> 60)
+5
source share
4 answers

Also using foldLeft:

list.foldLeft(Map[String, Int]().withDefaultValue(0))((res, v) => {
  val key = v("id").toString
  res + (key -> (res(key) + v("value").asInstanceOf[Int]))
})

UPDATE: from reduceLeft:

(Map[String, Any]().withDefaultValue(0) :: list).reduceLeft((res, v) => {
  val key = v("id").toString
  res + (key -> (res(key).asInstanceOf[Int] + v("value").asInstanceOf[Int]))
})

By the way, if you look at the definition reduceLeft, you will see that it uses the same foldLeft:

  def reduceLeft[B >: A](f: (B, A) => B): B =
    if (isEmpty) throw new UnsupportedOperationException("empty.reduceLeft")
    else tail.foldLeft[B](head)(f)

UPDATE 2: c parand reduce: The problem here is to distinguish the value of the result map from the initial value of the map. I have chosen contains("id").

list.par.reduce((a, b) => {
  def toResultMap(m: Map[String, Any]) =
    if (m.contains("id"))
      Map(m("id").toString -> m("value")).withDefaultValue(0)
    else m
  val aM = toResultMap(a)
  val bM = toResultMap(b)
  aM.foldLeft(bM)((res, v) =>
    res + (v._1 -> (res(v._1).asInstanceOf[Int] + v._2.asInstanceOf[Int])))
})
+5
source

A) , :

scala> list.groupBy(_("id")).mapValues(_.map(_("value").asInstanceOf[Int]).sum)
res14: scala.collection.immutable.Map[Any,Int] = Map(D -> 60, A -> 25, C -> 4, B -> 10)

list.groupBy(_("id")).par.... , , .

.par , map(_"value").sum ( ) , . N= , N , par, , .

B) , ( ), "" groupBy :

val m = scala.collection.mutable.Map[String, Int]() withDefaultValue(0)
for (e <- list; k = e("id").toString) m.update(k, m(k) + e("value").asInstanceOf[Int])

C) :

val m = new scala.collection.concurrent.TrieMap[String, Int]()
for (e <- list.par; k = e("id").toString) {
    def replace = {           
       val v = m(k)
       m.replace(k, v, v + e("value").asInstanceOf[Int]) //atomic
    }
    m.putIfAbsent(k, 0) //atomic
    while(!replace){} //in case of conflict
}

scala> m
res42: scala.collection.concurrent.TrieMap[String,Int] = TrieMap(B -> 10, C -> 4, D -> 60, A -> 25)

D) (, , ), scalaz :

import scalaz._; import Scalaz._
scala> list.map(x => Map(x("id").asInstanceOf[String] -> x("value").asInstanceOf[Int]))
    .par.reduce(_ |+| _)
res3: scala.collection.immutable.Map[String,Int] = Map(C -> 4, D -> 60, A -> 25, B -> 10)

, , "+".


, :

def time[T](n: Int)(f: => T) = {
  val start = System.currentTimeMillis()
  for(i <- 1 to n) f
  (System.currentTimeMillis() - start).toDouble / n
}

Scala 2.12 REPL JDK8 MacBook Pro 2,3 Intel Core i7. - JVM.

1) time(100000){...}, :

`par.groupBy.par.mapValues` = 0.13861 ms
`groupBy.par.mapValues` = 0.07667 ms
`most parallelized` = 0.06184 ms    
`scalaz par.reduce(_ |+| _)` = 0.04010 ms //same for other reduce-based implementations, mentioned here
`groupBy.mapValues` = 0.00212 ms
`for` + `update` with mutable map initialization time = 0.00201 ms
`scalaz suml` = 0.00171 ms      
`foldLeft` from another answer = 0.00114 ms
`for` + `update` without mutable map initialization time = 0.00105

, foldLeft .

2)

 scala> val newlist = (1 to 1000).map(_ => list).reduce(_ ++ _)

newList time(1000){...}:

 `scalaz par.reduce(_ |+| _)` = 1.422 ms
 `foldLeft`/`for` = 0.418 ms
 `groupBy.par.mapValues` = 0.343 ms

groupBy.par.mapValues .

3) , :

scala> implicit class RichInt(i: Int){ def ++ (i2: Int) = { Thread.sleep(1); i + i2}}
defined class RichInt

list time(1000):

`foldLeft` = 7.742 ms
`most parallelized` = 3.315 ms

.


:

8 . [1] + ... + [1] [1 + ... + 1]:

time(([1] + [1]) + ([1] + [1]) + ([1] + [1]) + ([1] + [1]) 
   => ([1 +1] + [1 +1]) + ([1 + 1] + [1 + 1]) 
   => [1 + 1 + 1 + 1] + [1 + 1 + 1 + 1]) 
 = (1 + 1 + 1 + 1) +  (2 + 2) + 4 = 12

(N = 8) = 8/2 + 2 * 8/4 + 4 * 8/8 = 8 * (1/2 + 2/4 + 4/8) = 8 * log2 (8)/2 = 12

:

time (N) = N * log2 (N) / 2

, , 2. O(NlogN), , foldLeft O(N). O(N), Map-Reduce Big Data, , - .

, , - , 6 ( O(1) ) - - , " ". , reduce . - , , (. 2).

+7

" ", , , - scalaz suml, Monoid; Monoid Map , . Map[String, Any] - (, Map("A" → 20)).

import scalaz._, Scalaz._
list.map{m => 
  Map(m("id").asInstanceOf[String] → m("value").asInstanceOf[Int])
}.suml
+3

Scala 2.13, K)(f:A=>B)(reduce:(B,B)=>B):scala.collection.immutable.Map[K,B] rel="nofollow noreferrer"> groupMapReduce ( ) groupBy mapValues reduce:

// val list = List(Map("id" -> "A", "value" -> 20, "name" -> "a"), Map("id" -> "B", "value" -> 10, "name" -> "b"), Map("id" -> "A", "value" -> 5, "name" -> "a"), Map("id" -> "C", "value" -> 1, "name" -> "c"), Map("id" -> "D", "value" -> 60, "name" -> "d"), Map("id" -> "C", "value" -> 3, "name" -> "c"))
list.groupMapReduce(_("id"))(_("value").asInstanceOf[Int])(_ + _)
// Map("A" -> 25, "B" -> 10, "C" -> 4, "D" -> 60)

:

  • group "id" (_("id")) ( MapReduce)

  • map "", Int (_("value").asInstanceOf[Int]) ( Map Reduce)

  • reduce (_ + _), ( groupMap Reduce).

, :

list.groupBy(_("id")).mapValues(_.map(_("value").asInstanceOf[Int]).reduce(_ + _)).toMap
0
source

All Articles