2010-04-01

Scala performance - garbage collection analysis

Scala runs on top of JVM which means that your code is compiled into bytecode. That in turn means that JVM uses garbage collection as your Scala program is running - garbage collector (GC) cleans up the memory when objects you allocated in your code become obsolete.

When tuning your application for performance, you may discover that it is the garbage collector that slows things down. For instance, garbage collector may occupy 50% of the CPU, which means that your program spends rather little of its time doing things you want it to do and a lot of its time cleaning up garbage. Then - you should diagnose problems and optimize. However, you can almost always speed up your application by optimizing garbage collection.

Note: I'm running Scala 2.8 on Windows 7 with a 32-bit JRE version 1.6.0_17.

Note: This article applies to any JVM language, not only Scala. By the end of the article I will present a Scala sample though.


Accessing garbage collector performance data


First, we need to collect GC data.

Before you run your app, you should set a certain Java option that will provide you with garbage collection information. How you set it is up to you - I will show how to set it via command line in Windows:

set JAVA_OPTS=-verbosegc

-verbose:gc should also be valid.

Now, when you run your application:

scala me.m1key.approximation.Launcher

... JVM will print something similar to output:

0.074: [GC 886K->173K(5056K),  0.0015737 secs]
0.100: [GC 1066K->128K(5056K), 0.0007356 secs]
0.124: [GC 1009K->183K(5056K), 0.0006837 secs]
0.154: [GC 1079K->130K(5056K), 0.0008542 secs]
0.181: [GC 1021K->148K(5056K), 0.0006898 secs]
0.210: [GC 1039K->241K(5056K), 0.0009012 secs]

You should run your Scala application from command line to see it. Of course, the result will vary depending on the program you run and it's always a bit different for each run - plus different JVM versions may have different formats of this data.

Now, before we analyze this, I would like to say that you might want to have this printed to a file instead of standard output (you probably do). In that case, this option will do it:

set JAVA_OPTS=-Xloggc:gc.log

And should you need more data, you can use -XX:+PrintGCDetails.

set JAVA_OPTS=-XX:+PrintGCDetails -Xloggc:gc.log

And you will also get this information:

Heap
 def new generation   total 960K, used 117K [0x04180000, 0x04280000, 0x04660000)
  eden space 896K,   6% used [0x04180000, 0x0418fa98, 0x04260000)
  from space 64K,  86% used [0x04260000, 0x0426dcb0, 0x04270000)
  to   space 64K,   0% used [0x04270000, 0x04270000, 0x04280000)
 tenured generation   total 4096K, used 324K [0x04660000, 0x04a60000, 0x08180000)
   the space 4096K,   7% used [0x04660000, 0x046b1118, 0x046b1200, 0x04a60000)
 compacting perm gen  total 12288K, used 7394K [0x08180000, 0x08d80000, 0x0c180000)
   the space 12288K,  60% used [0x08180000, 0x088b8908, 0x088b8a00, 0x08d80000)
No shared spaces configured.

Here you can see each generation and how much space it used.

Analyzing GC data


The three most interesting values are throughput, pauses and footprint.

  • throughput - the time spent not garbage collecting, i.e. your program doing useful things
  • pause - the time spent garbage collecting
  • footprint - the amount of memory your program took

Let's see.

0.074: [GC 886K->173K(5056K),  0.0015737 secs]

  • 0.074 - how long the program lived at that point, in seconds
  • GC - indicates it is a minor collection (Full GC means major collection)
  • 886K - the combined size of live objects before garbage collection
  • 173K - the combined size of live objects after garbage collection
  • 5056K - the footprint, i.e. total available space, excluding permanent objects
  • 0.0015737 - indicates how long this garbage collection took

What is the difference between minor and major collections? Well, major collections are slower as they check all objects if they should be garbage collected; minor just check the young objects.

OK, you can see the footprint here, but that's basically it - throughput and pause you'd have to calculate yourself which is a tedious task. It's quite easy to write a program to analyze those values though. There is a Perl script included in this rather lengthy article on improving Java application performance by Nagendra Nagarajayya and J. Steven Mayer. But there is also a freeware GUI tool you might use.

It is called GC Viewer and you can obtain it from the tagtraum.com website. This tool feeds on the log file that we generated previously. Below you can see a screenshot from this application.

GC Viewer screenshot

Note: When I used -XX:+PrintGCDetails, GC Viewer had some problems.


Speeding up your application


Once you have discovered that it is indeed the garbage collector your JVM runs that slows down your application, you should optimize. More, I think it's almost always worthwhile to see if you can speed up your program. But how?

You must realize that there are different kinds of garbage collectors (which and how many depends on your JVM version). You may decide that another garbage collector would be better for your program. Sun Docs provide a description of collectors and information on choosing them.

Obviously, there is a lot of parameters you can set to, e.g. set throughput ratio goal for your application (-XX:GCTimeRatio=<ratio>), maximum pause goal (-XX:MaxGCPauseMillis=<ms>) etc. You can find a list of those parameters on the tagtraum.com page. You might want to consider tuning those parameters.

You should take a separate look at your generations - perhaps the problem lies there and you can fix it by resizing generation sizes (for instance, the bigger the young generation, the less often minor collections happen - but the more often major collections will).

Update 2010-04-02: The following statement in bold is true - unless all objects die young before they make it to the tenured generation. Thanks goes to Vincent (see comments below).

Also, you can try to optimize your code. However, bear in mind that turning elegant code into disastrous chaotic madness to save precious milliseconds may not always prove worthwhile.


Scala


I played around a bit with my Scala program that approximates square roots. I managed to reduce the time it runs almost by two. These are the parameters I used:

set JAVA_OPTS=-XX:+PrintGCDetails -Xloggc:gc.log
-Xmx16m -Xms16m -Xmn2m

Further increasing the size of the new generation (-Xmn) decreased the number of collections and therefore increased the application speed. But then your program needs more memory.

Decreasing the new generation size means less memory needed, but collections run a lot, your application is slower and throughput decreases.

That works here this way because in my application objects live a short life and then die quickly. If it was otherwise, decreasing the new generation size would push objects to the adult stage earlier and they would be checked for garbage collection less often.


Read on


  1. Improving Java Application Performance and Scalability by Reducing Garbage Collection Times and Sizing Memory by Nagendra Nagarajayya and J. Steven Mayer
  2. Tuning Garbage Collection with the 5.0 Java[tm] Virtual Machine
  3. List of JVM flags for GC tuning
  4. GC Portal by Alka Gupta
  5. Java Performance Tuning - by buying this book after clicking this link you help me run this website

Download source code

PS. When investigating performance, you may also want to try out those commands:

-Xrunhprof (for methods profiling)
-Xrunhprof:cpu=samples (similar but simpler)
-Xprof (a heap/CPU profiling tool)
-Xaprof (allocation profiling tool)

PPS. There are more graphical tools and some of them might be pluggable to your IDE. However, when using the command line you can manipulate GC parameters in an easier manner (but that's my opinion).

15 comments:

  1. "the bigger the young generation, the less often minor collections happen - but the more often major collections will)"

    I'm not sure this is true, I think it would depend on your average object lifetime.

    If no object exists for more than a few milliseconds, why would there be more major collections? They should never make it to the old generation to begin with.

    ReplyDelete
  2. Hey Vincent, thanks for your comment.

    Well, if an object lives a long life it will enter another generation (tenured). Then it is no longer subject to minor collections.

    But you're right. If they all die young enough - then major collections will not run at all. I will update the article.

    Best regards.

    ReplyDelete
  3. I think this is a very interesting subject to study, as more languages are created (and migrate) to the JVM, there may be vastly different amounts and types of garbage generated for each.

    I wonder if there is an easy way to see the average object lifetime, and compare scala programs to java programs.

    My thought is scala will generate more short term garbage, but I don't have any proof.

    The G1 Collector has some interesting properties, and though it was originally intended as a 'concurrent' collector (which usually sacrifices overall througput), its certainly possible it could be even faster for scala like programs.

    ReplyDelete
  4. Vincent, to partially answer your question, you might be interested in seeing the output generated by your program when the JVM is given the hprof option.

    java -Xrunhprof[:options] your_program

    It will generate a report for heap usage. If you ask it to generate a binary report (Xrunhprof:format=b), you can analyze it with Eclipse Memory Analyzer.

    I ran this for my Scala program and I guess the result was similar to the one you predicted. Quite a lot of Scala garbage, it even detected two potential memory leaks.

    Perhaps you'd like to try out this program yourself. It has quite a few options and you might find what you are looking for.
    http://www.eclipse.org/mat/

    ReplyDelete
  5. Good garbage optimization is better then no garbage optimization.
    Can't argue on that one :)

    ReplyDelete
  6. Hi thanks for brief overview about the same. Enjoyed it. I am facing another issue. Its PermGen space after redeployment. a well known issue on the web forums. Can you please suggest me how to look for details on the same ? Or can you point to some place Where i can get a little more help ?

    ReplyDelete
  7. jigarshah.net I am not sure mau be this can help you with PermGen problem, see http://www.tomcatexpert.com/blog/2010/04/06/tomcats-new-memory-leak-prevention-and-detection

    ReplyDelete
  8. Thanks for all your replies.

    @jigarshah.net
    You have these parameters you can pass on to the server:
    -XX:PermSize=
    -XX:MaxPermSize=
    As far as I know, the server is responsible for this, it has a memory leak of some sort. I found more information here:
    http://www.jroller.com/agileanswers/entry/preventing_java_s_java_lang

    "The message is a symptom of an incomplete garbage collection sweep where resources are not properly released upon unload/restart. [...] the finger of blame has been pointed at CGLIB, Hibernate, Tomcat, and even Sun's JVM."

    ReplyDelete
  9. It depends on the scope and complexity of the project. However, as a general rule, we charge about £40-£60 for the labour involved +VAT and for the parts needed to complete the project.
    have a peek at this web-site

    ReplyDelete
  10. In the end however the improvement of independent, underground water and sewage frameworks dispensed with open sewage trench and cesspools. Plumbing repair

    ReplyDelete
  11. There are many plumbing services in Chicago the choice would limit to budget and locality although rates wouldn't vary that much you don't want to be fooled. Plumbing service

    ReplyDelete
  12. a plumbing contractor, you must check whether he is licensed or not. Only the licensed persons are trustworthy and can easily do any critical plumbing task easily. Every house owner would like to get the best professional to take care of any emergency situations. plumbing tools

    ReplyDelete
  13. Hence a specialized financing company that has experience in various types of equipment is required to seek garbage truck financing. garbage disposal reviews

    ReplyDelete
  14. كيف تحقق النجاح بالنسبة لخدمات كشف تسربات المياه بجدة حتي نكون علي اقتناع تام بأننا شركة كشف تسربات بجدة تقدم خدمات لا مثيل لها من خلال فنين يمتلكون القدرة علي اكتشاف جميع انواع التسريبات من خلال تدريبهم العالي علي الاجهزة الحديثة التي تساعد علي ذلك بدون هدم او تكسير كل ذلك سوف تجدونه عندما تتواصلوا مع شركة كشف تسربات المياه بجدة والتي تقدم كل ما لديها لمساعدتكم

    ReplyDelete
  15. Whenever you find leakage in your kitchen water pipe or a block in your basin, you must call the plumber for further assistance. Since, these tasks are quite difficult for a normal person to perform;professional contractors

    ReplyDelete