Why is this Scala code slow?

I run the following Scala code:

```scala
import scala.util.parsing.json._
import scala.io._

object Main {
  def jsonStringMap(str: String) = JSON.parseFull(str) match {
    case Some(m: Map[_, _]) => m collect {
      // If this doesn't match, we'll just ignore the value
      case (k: String, v: String) => (k, v)
    } toMap
    case _ => Map[String, String]()
  }

  def main(args: Array[String]) {
    val fh = Source.fromFile("listings.txt")
    try {
      fh.getLines map(jsonStringMap) foreach { v => println(v) }
    } finally {
      fh.close
    }
  }
}
```

On my machine, it takes ~3 minutes on a file from http://sortable.com/blog/coding-challenge/ . The equivalent Haskell and Ruby programs I wrote take less than 4 seconds. What am I doing wrong?

I tried the same code without the `map(jsonStringMap)` step and it was very fast, so is the JSON parser just very slow?
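One way to confirm where the time goes is to time the parse step in isolation, with the lines already in memory so I/O is excluded. A minimal sketch (the sample record below is hypothetical stand-in data, not a line from the real `listings.txt`):

```scala
import scala.util.parsing.json.JSON

object ParseTimer {
  // Hypothetical sample records standing in for lines of listings.txt.
  val lines = List.fill(1000)("""{"title": "Camera X100", "manufacturer": "Example", "price": "99.99"}""")

  def main(args: Array[String]) {
    // Time only the parsing work; the lines are already in memory.
    val t0 = System.nanoTime()
    val parsed = lines.map(JSON.parseFull)
    val t1 = System.nanoTime()
    println("Parsed " + parsed.size + " lines in " + (t1 - t0) / 1e6 + " ms")
  }
}
```

If this alone accounts for nearly all the runtime, the parser, not the file handling, is the bottleneck.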

It seems likely that the default JSON parser is just very slow. I tried https://github.com/stevej/scala-json , and while it brings the time down to 35 seconds, that is still a lot slower than Ruby.

Now I am using https://github.com/codahale/jerkson , which is even faster: my program now runs in about 6 seconds on my data, only about 3 seconds slower than Ruby, and that gap is probably just JVM startup.
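For reference, the Jerkson version of `jsonStringMap` might look like the sketch below (assumes Jerkson and its Jackson dependency are on the classpath; `parse[A]` is Jerkson's deserialization entry point):

```scala
import com.codahale.jerkson.Json.parse
import scala.io.Source

object Main {
  // Jerkson deserializes straight into the target type, so no
  // pattern-matching pass over a parsed tree is needed.
  // Caveat: unlike the original `collect`, this throws if a record
  // contains non-string values instead of silently dropping them.
  def jsonStringMap(str: String): Map[String, String] =
    parse[Map[String, String]](str)

  def main(args: Array[String]) {
    val fh = Source.fromFile("listings.txt")
    try {
      fh.getLines map(jsonStringMap) foreach println
    } finally {
      fh.close
    }
  }
}
```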

+7
3 answers

A quick look at the scala-user archive shows that no one is doing serious work with the JSON parser in the standard Scala library.

See http://groups.google.com/group/scala-user/msg/fba208f2d3c08936

It seems the parser made it into the standard library at a time when Scala attracted less attention and did not face the expectations it does today.

+8

Use Jerkson. Jerkson uses Jackson, which is consistently the fastest JSON library on the JVM, especially for streaming reads and writes of large documents.
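If you want to skip the wrapper, you can also call Jackson directly from Scala. A sketch using the Jackson 1.x coordinates that Jerkson wrapped at the time (note the backticks around `type`, a Scala keyword):

```scala
import org.codehaus.jackson.map.ObjectMapper
import org.codehaus.jackson.`type`.TypeReference

object JacksonDirect {
  // Reuse a single ObjectMapper: it is thread-safe and
  // constructing one per record would dominate the runtime.
  val mapper = new ObjectMapper()

  // The anonymous subclass captures the generic type for Jackson.
  val mapType = new TypeReference[java.util.Map[String, String]] {}

  def jsonStringMap(str: String): java.util.Map[String, String] =
    mapper.readValue(str, mapType)
}
```

This yields a `java.util.Map`; part of what Jerkson adds is deserializing into Scala collection types directly.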

+3

Using my JSON library, I get almost instant parsing of both files:

```scala
import com.github.seanparsons.jsonar._
import scala.io.Source

def parseLines[T](file: String, transform: (Iterator[String]) => T): T = {
  val log = Source.fromFile(file)
  val logLines = log.getLines()
  try {
    transform(logLines)
  } finally {
    log.close
  }
}

def parseFile(file: String) = parseLines(file, (iterator) => iterator.map(Parser.parse(_)).toList)

parseFile("products.txt")
parseFile("listings.txt")
```

However, as mentioned above, it would be more useful to parse the whole thing as a single JSON array rather than as many separate one-line documents.
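A minimal sketch of that idea with the standard-library parser: join the per-line objects into one JSON array and parse it in a single call (the sample lines here are hypothetical; in practice they would come from the file):

```scala
import scala.util.parsing.json.JSON

object ArrayParse {
  // Hypothetical sample lines; each is a complete JSON object.
  val lines = List("""{"a": "1"}""", """{"b": "2"}""")

  // Wrap the objects in brackets to form one JSON array,
  // then parse the whole document at once.
  val parsed: Option[Any] = JSON.parseFull(lines.mkString("[", ",", "]"))
}
```

This gives one parse call instead of one per line; it does not make a slow parser faster, but it avoids per-line overhead and yields a single parsed structure to work with.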

+2
