2010-04-01

Scala performance - garbage collection analysis

Scala runs on top of JVM which means that your code is compiled into bytecode. That in turn means that JVM uses garbage collection as your Scala program is running - garbage collector (GC) cleans up the memory when objects you allocated in your code become obsolete.

When tuning your application for performance, you may discover that it is the garbage collector that slows things down. For instance, garbage collector may occupy 50% of the CPU, which means that your program spends rather little of its time doing things you want it to do and a lot of its time cleaning up garbage. Then - you should diagnose problems and optimize. However, you can almost always speed up your application by optimizing garbage collection.

Note: I'm running Scala 2.8 on Windows 7 with a 32-bit JRE version 1.6.0_17.

Note: This article applies to any JVM language, not only Scala. By the end of the article I will present a Scala sample though.


Accessing garbage collector performance data


First, we need to collect GC data.

Before you run your app, you should set a certain Java option that will provide you with garbage collection information. How you set it is up to you - I will show how to set it via command line in Windows:

set JAVA_OPTS=-verbosegc

-verbose:gc should also be valid.

Now, when you run your application:

scala me.m1key.approximation.Launcher

... JVM will print something similar to output:

0.074: [GC 886K->173K(5056K),  0.0015737 secs]
0.100: [GC 1066K->128K(5056K), 0.0007356 secs]
0.124: [GC 1009K->183K(5056K), 0.0006837 secs]
0.154: [GC 1079K->130K(5056K), 0.0008542 secs]
0.181: [GC 1021K->148K(5056K), 0.0006898 secs]
0.210: [GC 1039K->241K(5056K), 0.0009012 secs]

You should run your Scala application from command line to see it. Of course, the result will vary depending on the program you run and it's always a bit different for each run - plus different JVM versions may have different formats of this data.

Now, before we analyze this, I would like to say that you might want to have this printed to a file instead of standard output (you probably do). In that case, this option will do it:

set JAVA_OPTS=-Xloggc:gc.log

And should you need more data, you can use -XX:+PrintGCDetails.

set JAVA_OPTS=-XX:+PrintGCDetails -Xloggc:gc.log

And you will also get this information:

Heap
 def new generation   total 960K, used 117K [0x04180000, 0x04280000, 0x04660000)
  eden space 896K,   6% used [0x04180000, 0x0418fa98, 0x04260000)
  from space 64K,  86% used [0x04260000, 0x0426dcb0, 0x04270000)
  to   space 64K,   0% used [0x04270000, 0x04270000, 0x04280000)
 tenured generation   total 4096K, used 324K [0x04660000, 0x04a60000, 0x08180000)
   the space 4096K,   7% used [0x04660000, 0x046b1118, 0x046b1200, 0x04a60000)
 compacting perm gen  total 12288K, used 7394K [0x08180000, 0x08d80000, 0x0c180000)
   the space 12288K,  60% used [0x08180000, 0x088b8908, 0x088b8a00, 0x08d80000)
No shared spaces configured.

Here you can see each generation and how much space it used.

Analyzing GC data


The three most interesting values are throughput, pauses and footprint.

  • throughput - the time spent not garbage collecting, i.e. your program doing useful things
  • pause - the time spent garbage collecting
  • footprint - the amount of memory your program took

Let's see.

0.074: [GC 886K->173K(5056K),  0.0015737 secs]

  • 0.074 - how long the program lived at that point, in seconds
  • GC - indicates it is a minor collection (Full GC means major collection)
  • 886K - the combined size of live objects before garbage collection
  • 173K - the combined size of live objects after garbage collection
  • 5056K - the footprint, i.e. total available space, excluding permanent objects
  • 0.0015737 - indicates how long this garbage collection took

What is the difference between minor and major collections? Well, major collections are slower as they check all objects if they should be garbage collected; minor just check the young objects.

OK, you can see the footprint here, but that's basically it - throughput and pause you'd have to calculate yourself which is a tedious task. It's quite easy to write a program to analyze those values though. There is a Perl script included in this rather lengthy article on improving Java application performance by Nagendra Nagarajayya and J. Steven Mayer. But there is also a freeware GUI tool you might use.

It is called GC Viewer and you can obtain it from the tagtraum.com website. This tool feeds on the log file that we generated previously. Below you can see a screenshot from this application.

GC Viewer screenshot

Note: When I used -XX:+PrintGCDetails, GC Viewer had some problems.


Speeding up your application


Once you have discovered that it is indeed the garbage collector your JVM runs that slows down your application, you should optimize. More, I think it's almost always worthwhile to see if you can speed up your program. But how?

You must realize that there are different kinds of garbage collectors (which and how many depends on your JVM version). You may decide that another garbage collector would be better for your program. Sun Docs provide a description of collectors and information on choosing them.

Obviously, there is a lot of parameters you can set to, e.g. set throughput ratio goal for your application (-XX:GCTimeRatio=<ratio>), maximum pause goal (-XX:MaxGCPauseMillis=<ms>) etc. You can find a list of those parameters on the tagtraum.com page. You might want to consider tuning those parameters.

You should take a separate look at your generations - perhaps the problem lies there and you can fix it by resizing generation sizes (for instance, the bigger the young generation, the less often minor collections happen - but the more often major collections will).

Update 2010-04-02: The following statement in bold is true - unless all objects die young before they make it to the tenured generation. Thanks goes to Vincent (see comments below).

Also, you can try to optimize your code. However, bear in mind that turning elegant code into disastrous chaotic madness to save precious milliseconds may not always prove worthwhile.


Scala


I played around a bit with my Scala program that approximates square roots. I managed to reduce the time it runs almost by two. These are the parameters I used:

set JAVA_OPTS=-XX:+PrintGCDetails -Xloggc:gc.log
-Xmx16m -Xms16m -Xmn2m

Further increasing the size of the new generation (-Xmn) decreased the number of collections and therefore increased the application speed. But then your program needs more memory.

Decreasing the new generation size means less memory needed, but collections run a lot, your application is slower and throughput decreases.

That works here this way because in my application objects live a short life and then die quickly. If it was otherwise, decreasing the new generation size would push objects to the adult stage earlier and they would be checked for garbage collection less often.


Read on


  1. Improving Java Application Performance and Scalability by Reducing Garbage Collection Times and Sizing Memory by Nagendra Nagarajayya and J. Steven Mayer
  2. Tuning Garbage Collection with the 5.0 Java[tm] Virtual Machine
  3. List of JVM flags for GC tuning
  4. GC Portal by Alka Gupta
  5. Java Performance Tuning - by buying this book after clicking this link you help me run this website

Download source code

PS. When investigating performance, you may also want to try out those commands:

-Xrunhprof (for methods profiling)
-Xrunhprof:cpu=samples (similar but simpler)
-Xprof (a heap/CPU profiling tool)
-Xaprof (allocation profiling tool)

PPS. There are more graphical tools and some of them might be pluggable to your IDE. However, when using the command line you can manipulate GC parameters in an easier manner (but that's my opinion).