Performance difference in toString.map and toString.toArray.map

Question

Performance difference in toString.map and toString.toArray.map

While coding Euler's problems, I came across what, in my opinion, is bizarre:

The toString.map method is slower than toString.toArray.map.

Here is an example:

def main(args: Array[String]) { def toDigit(num : Int) = num.toString.map(_ - 48) //2137 ms def toDigitFast(num : Int) = num.toString.toArray.map(_ - 48) //592 ms val startTime = System.currentTimeMillis; (1 to 1200000).map(toDigit) println(System.currentTimeMillis - startTime) }

Should the method display a string backup of the map on an array on String? Why is there such a noticeable difference? (Note that increasing the number even causes a stack overflow in the case without an array).

+4

performance string scala

Vinicius seufitele May 21 '12 at 11:27

source share

2 answers

You are being fooled by a lack of memory. The toDigit version creates more intermediate objects, but if you have a lot of memory, the GC will not be affected much (and everything will work faster). For example, if instead of creating 1.2 million numbers, I create 12k 100x per line, I get roughly the same time for the two methods. If I create 5-digit numbers of 1.2 thousand 1000x in a row, I find that toDigit about 5% faster.

Given that the toDigit method creates an immutable collection that is better when everything else is equal, because it is easier to reason with, and given that everything else is equal for all but very complex tasks, I think the library is as it should be.

When trying to improve performance, of course, you need to keep in mind all kinds of tricks; one of them is that arrays have better memory characteristics for collections of known length than the fancy collections in the Scala library. In addition, you need to know that a card is not the fastest way to achieve a result; if you really want it to be fast you should

 final def toDigitReallyFast(num: Int, accum: Long = 0L, iter: Int = 0): Array[Byte] = { if (num==0) { val ans = new Array[Byte](math.max(1,iter)) var i = 0 var ac = accum while (i < ans.length) { ans(ans.length-i-1) = (ac & 0xF).toByte ac >>= 4 i += 1 } ans } else { val next = num/10 toDigitReallyFast(next, (accum << 4) | (num-10*next), iter+1) } }

which on my machine is 4 times faster than any other. And you can get almost 3 times faster if you leave everything long and pack the results into an array instead of using 1 to N :

 final def toDigitExtremelyFast(num: Int, accum: Long = 0L, iter: Int = 0): Long = { if (num==0) accum | (iter.toLong << 48) else { val next = num/10 toDigitExtremelyFast(next, accum | ((num-10*next).toLong<<(4*iter)), iter+1) } } // loop, instead of 1 to N map, for the 1.2k number case { var i = 10000 val a = new Array[Long](1201) while (i<=11200) { a(i-10000) = toDigitReallyReallyFast(i) i += 1 } a }

As with many things, performance tuning is highly dependent on what you want to do. On the contrary, the design of the library should balance many different problems. I really think it is worth noting that the library is not optimal in performance, but this is not one of those cases when IMO; flexibility is worth using common use cases.

+2

Rex kerr May 21 '12 at 17:50

source share

ron · Accepted Answer · 2012-05-21T12:13:47+0000

Original

Maybe because toString.map uses WrappedString implicitly, and toString.toArray.map uses WrappedArray implicitly to resolve the map .

Let's see the map as defined in the TraversableLike :

 def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That = { val b = bf(repr) b.sizeHint(this) for (x <- this) b += f(x) b.result }

WrappedString uses the StringBuilder constructor as:

 def +=(x: Char): this.type = { append(x); this } def append(x: Any): StringBuilder = { underlying append String.valueOf(x) this }

Calling String.valueOf for Any uses the Java Object.toString in Char instances, perhaps by first inserting a box. These additional operations may cause a difference in speed compared to the supposedly shorter code tracks of the Array constructor.

This assumption, however, would have to measure.

Edit

After the revision, the common point is still there, but I was referring to incorrect implications, as the toDigit methods return an Int sequence (or the like) and not a translated string, as I read incorrectly.

toDigit uses LowPriorityImplicits.fallbackStringCanBuildFrom[T]: CanBuildFrom[String, T, immutable.IndexedSeq[T]] , with T = Int , which simply jumps to the general IndexedSeq constructor.

toDigitFast uses a direct array of implicit type CanBuildFrom[Array[_], T, Array[T]] , which is undeniably faster.

Passing the following CBF to toDigit explicitly makes these two methods equal:

 object FastStringToArrayBuild { def canBuildFrom[T : ClassManifest] = new CanBuildFrom[String, T, Array[T]] { private def newBuilder = scala.collection.mutable.ArrayBuilder.make() def apply(from: String) = newBuilder def apply() = newBuilder } }

Performance difference in toString.map and toString.toArray.map

More articles: