Scala - high heap usage when calling XML.loadFile on a large number of files in local scope

I am trying to build an object tree from a large number of XML files. However, when I run the following code on about 2000 XML files (ranging from 100 KB to 200 MB), with the code that builds the object tree commented out, heap usage grows to 8-9 GB. I would expect the memory footprint to be minimal here, because the code does not keep any references: it just creates an Elem and discards it. Heap usage stays the same even after forcing a full GC.

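import java.io.{BufferedInputStream, File, FileInputStream}
import java.util.zip.GZIPInputStream
import scala.xml.XML

// Load every gzipped XML file in the directory.
def addDir(dir: File): Unit = {
  dir.listFiles.filter(file => file.getName.endsWith("xml.gz")).foreach { gzipFile =>
    addGzipFile(gzipFile)
  }
}

// Decompress and parse one file; the resulting Elem is not kept anywhere.
def addGzipFile(gzipFile: File): Unit = {
  val is = new BufferedInputStream(new GZIPInputStream(new FileInputStream(gzipFile)))
  val xml = XML.load(is)
  // parse xml and create object tree
  is.close()
}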

My JVM options: -server -d64 -Xmx16G -Xss16M -XX:+DoEscapeAnalysis -XX:+UseCompressedOops

And the output of jmap -histo looks like this:

 num     #instances         #bytes  class name
----------------------------------------------
   1:      67501390     1620033360  scala.collection.immutable.$colon$colon
   2:      37249187     1254400536  [C
   3:      37287806     1193209792  java.lang.String
   4:      37200976      595215616  scala.xml.Text
   5:      18600485      595215520  scala.xml.Elem
   6:       3420921       82102104  scala.Tuple2
   7:        213938       58213240  [I
   8:       1140334       36490688  scala.collection.mutable.ListBuffer
   9:       2280468       36487488  scala.runtime.ObjectRef
  10:       1140213       36486816  scala.collection.Iterator$$anon$24
  11:       1140210       36486720  scala.xml.parsing.FactoryAdapter$$anonfun$startElement$1
  12:       1140210       27365040  scala.collection.immutable.Range$$anon$2
...
Total     213412869     5693850736

A test program that loads the same generated XML file repeatedly:

import java.io._
import scala.xml.XML

object XMLLoadHeap {

  val filename = "test.xml"

  // Parse the file into an Elem, print its root label, and drop the tree.
  def addFile(): Unit = {
    val is = new BufferedInputStream(new FileInputStream(filename))
    val xml = XML.load(is)
    is.close()
    println(xml.label)
  }

  // Write a test document with 100000 <bar/> children under <foo>.
  def createXMLFile(): Unit = {
    val out = new FileWriter(filename)
    out.write("<foo>\n")
    (1 to 100000) foreach (i => out.write("  <bar baz=\"boom\"/>\n"))
    out.write("</foo>\n")
    out.close()
  }

  // args(0) is the number of times to load the file.
  def main(args: Array[String]): Unit = {
    println("XMLLoadHeap")
    createXMLFile()
    (1 to args(0).toInt) foreach { i =>
      println("processing " + i)
      addFile()
    }
  }

}

JVM options for this run: -Xmx128m -XX:+HeapDumpOnOutOfMemoryError -verbose:gc

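The behavior may depend on the XML files themselves. A 200MB XML file on a 64-bit JVM may need around 3G of heap once it is loaded as a tree. If that is the problem, consider a pull parser such as XMLEventReader.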
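To illustrate the pull-parsing idea, here is a minimal sketch that walks a document event by event instead of building a scala.xml.Elem tree. It deliberately swaps in the JDK's StAX cursor API (javax.xml.stream) rather than scala.xml.pull.XMLEventReader, so that it needs no scala-xml dependency; the object name `StreamingCount` and the element-counting task are assumptions made for illustration only.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}

// Hypothetical example object, not part of the original post.
object StreamingCount {

  // Counts start tags per element name while streaming through the input.
  // Only the current parser event is held in memory, not the whole document.
  def countElements(in: InputStream): Map[String, Int] = {
    val reader = XMLInputFactory.newInstance().createXMLStreamReader(in)
    val counts = scala.collection.mutable.Map.empty[String, Int].withDefaultValue(0)
    try {
      while (reader.hasNext) {
        if (reader.next() == XMLStreamConstants.START_ELEMENT)
          counts(reader.getLocalName) += 1
      }
    } finally {
      reader.close() // releases parser resources; does not close `in`
    }
    counts.toMap
  }

  def main(args: Array[String]): Unit = {
    val sample = "<foo><bar baz=\"boom\"/><bar baz=\"boom\"/></foo>"
    println(countElements(new ByteArrayInputStream(sample.getBytes("UTF-8"))))
  }
}
```

For the gzipped files in the question, the same function could be fed the `BufferedInputStream(new GZIPInputStream(...))` chain instead of the in-memory sample; the caller is still responsible for closing that stream.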

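Otherwise, run with -Xmx4G -XX:+HeapDumpOnOutOfMemoryError and analyze the resulting heap dump with a tool such as MAT. 4G should be enough to parse the largest XML file, and by the time the heap is exhausted there should be enough retained objects to show what is preventing GC. Most likely it is some object holding on to the parsed XML objects.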
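A cheaper first check than a heap dump is to log used heap after each file and see whether it keeps growing or whether only certain files cause a jump. The sketch below is an assumption-laden illustration: `HeapLog`, `usedHeapMb` and `withHeapLog` are hypothetical names, and `System.gc()` is only a request, so the numbers are approximate.

```scala
// Hypothetical helper, not part of the original post.
object HeapLog {

  // Approximate used heap in MB, measured after requesting a GC.
  def usedHeapMb(): Long = {
    val rt = Runtime.getRuntime
    System.gc() // a hint only; the JVM may ignore it
    (rt.totalMemory - rt.freeMemory) / (1024L * 1024L)
  }

  // Runs `body` on each item and records the used heap afterwards.
  def withHeapLog[A](items: Seq[A])(body: A => Unit): Seq[(A, Long)] =
    items.map { item =>
      body(item)
      val used = usedHeapMb()
      println(s"$item: $used MB used after GC request")
      (item, used)
    }
}
```

With the code from the question this could be called as `HeapLog.withHeapLog(dir.listFiles.toSeq)(addGzipFile)`. If the reported value climbs steadily even though nothing is kept, a heap dump is the next step for finding the retaining reference.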
