Thursday, November 4, 2010

Taming the Java Garbage Collector

Tuning the garbage collection in JVM is one of those things that developers often tend to ignore and overlook. But if done properly, it can save you hundreds of megabytes worth precious memory, without making a significant impact on the application performance. Also GC tuning becomes an absolute necessity in certain applications due to various QoS requirements such as real-time processing, low response time and high throughput. So today I'm going to discuss a little bit about GC in JVM and how to properly tune up GC for best application performance. My discussion is entirely focused on Java 5 and 6, so if you are using an older JVM you are at the wrong place.
Before we jump into the discussion on GC tuning we need to have a basic understanding on the various concepts associated with GC. So here goes...
What is garbage collection?
Garbage collection is the process of finding objects which are no longer reachable from references in the executing code, and reclaiming the memory occupied by such objects. An object which is reachable by at least one reference in the executing code is considered to be 'live'. Objects which cannot be reached by any references are called 'dead' objects. So the process of garbage collection can be also defined as the process of finding dead objects in the memory and reclaiming the memory used by them. In general, a garbage collector is responsible for 3 tasks:
  1. Allocating memory for new objects
  2. Ensuring that any referenced objects (live objects) remain in memory
  3. Recovering memory used by objects that are no longer reachable (dead objects)
The following characteristics govern the notion of a 'good' garbage collector:
  • Safe (never reclaims a live object)
  • Comprehensive (a dead object does not remain unclaimed for more than a small number of collection cycles)
  • Efficient (does not introduce too long pauses)
  • Defragmentation (does not leave the memory fragments all over the place)
  • Scalable (allocation and deallocation scales well in multithreaded systems)
It is not possible and necessary for a garbage collector to display all these desirable characteristics at once. Usually each garbage collector has its own strengths and weaknesses.
GC Classification
Garbage collectors can be classified based on a number of factors.
Serial vs Parallel:
In serial collection only one collection occurs at a time (even with multiple CPU cores). In parallel collection, the task of collection is divided into subtasks and executed in parallel, possibly on multiple CPUs. This speeds up collection but is more complex and leads to potential fragmentation.
Stop-the-world vs Concurrent:
Stop-the-world collectors suspend the entire application during collections. Concurrent collectors run concurrently with the application (there could be occasional stop-the-world collections). With concurrent collection, freeze times are shorter but it has to operate on the objects which are being used by the running application. This adds more overhead on the collector and requires more CPU power and heap.
Compacting vs Non-compacting vs Copying:
Compacting collectors arrange all the live objects together in contiguous memory blocks. Then the remaining space can be considered free. This way the collection is slow but the allocations are faster. Non-compacting collectors free dead objects in-place. This leads to faster collections but also a recipe for fragmentation. Copying collectors copy (in contrast to moving) all the live objects to a different area in the memory. Then the source area can be considered free. This leads to slower and expensive collections but provides better allocation performance.
Generational Collection
Modern garbage collectors follow a scheme known as 'generational collection'. With this approach, the heap memory is allocated to two or more regions known as generations. A generation is a block of memory which contains objects of a certain age. In many implementations there are two generations, one for young (new and reasonably new) objects and one for old objects. Young generation is usually much smaller compared to the old generation. Generational collection allows employing different GC algorithms in different generations. This enables selecting a suitable algorithm based on the maturity of the objects.
Generational collection makes use of an interesting characteristic of applications, often referred to as the "generational hypothesis". This hypothesis states:
  • Most allocated objects are not referenced for long (they die young - sometimes also stated as infant mortality)
  • Few references from older objects to younger objects exist
Based on this hypothesis, modern garbage collectors run collections on the young generation frequently. This is fast and efficient because the young generation is small and likely to contain many dead objects. Objects that survive some number of young generation collections are moved (tenured) to the old generation. Because old generation is much larger, its occupancy grows very slowly. Therefore old generation collections are infrequent. But when they do occur, they tend to take a much longer time to complete.
In new HotSpot JVMs, the garbage collector divides the heap into 3 generations:
  • Young generation - contains newly allocated objects
  • Old generation - objects that has survived some number of young gen collections and some very large objects that were directly allocated in old gen
  • Permanent generation (perm gen) - contains classes, methods and associated descriptors which are managed by the JVM
The young generation is further divided into 3 regions. The larger division is known as the Eden. This is where almost all the new object allocations take place (under special circumstances, large objects may get allocated in the old generation). The other smaller spaces are known as survivor spaces. One of the survivor spaces are always kept empty until the next young generation collection.
When an old generation fills up a full collection (major collection) is performed. All generations are collected during a full collection. First the young generation is collected using the young generation collection algorithm. Then the old generation collection algorithm is run on the old generation and permanent generation. If compaction occurs, each generation is compacted separately. During a full collection if the old generation is too full to accept tenured objects from the young generation, the old generation collection algorithm is run on the entire heap (except with CMS collector).
Available Garbage Collectors
HotSpot JVM contains 3 garbage collectors:
  1. Serial collector
  2. Parallel collector
  3. Concurrent mark-sweep collector
In addition to that there is a special enhanced version of the parallel collector known as the parallel compacting collector. Let's see how each of these collectors work.
Serial Collector (Mark-Sweep-Compact collector)
This is the collector used by default on Java HotSpot client JVM. It is a serial, stop-the-world, copying collector. Because it is serial and operates in the stop-the-world mode it is not a very efficient collector.
Young generation collection:
  • Live objects in Eden are copied to the empty survivor space (objects that won't fit are directly copied to the old generation)
  • Live objects in the occupied survivor space are copied to the empty survivor space (relatively old objects are copied to old generation)
  • If the free survivor space becomes full during the process, remaining live objects in Eden and occupied survivor space are tenured
  • At this point Eden and the occupied survivor space contains only dead objects and so can be considered empty - The previously free survivor space contains some live objects
  • The two survivor spaces switch roles
Old/Permanent generation collection:
  • A two phase Mark-and-sweep algorithm is used to clean up dead objects
  • After that a sliding compaction (live objects are slided towards the beginning of the generation) is performed to compact the generations
Parallel Collector (Throughput Collector)
This is very similar to the serial collector in many ways. In fact the only notable difference is that parallel collector uses multiple threads to perform the young generation collection. Other than that it uses the same algorithms as the serial collector. The number of threads used for collection is equal to the number of CPUs available. Because of the parallel collection feature, this collector usually results in much shorter pauses and higher application throughput. However note that the old generation collection is still carried out using a single thread in serial fashion. This is the default collector used in Java HotSpot server JVM.
Parallel Compacting Collector
This is an enhanced version of the parallel collector. It uses multiple threads to perform the old generation collection as well. The old generation collection divides the generations into regions and operate on individual regions in parallel. The algorithm used for old generation collection is also slightly different from what's used in serial and parallel collectors.
Concurrent Mark-Sweep Collector (CMS Collector)
While the parallel collectors give prominence to application throughput, this collector gives prominence to low response time. It uses the same young generation collection algorithm as the parallel collectors. But the old generation collection is performed concurrently with the application instead of going to stop-the-world mode (at least most of the time). A collection cycle starts with a short pause known as the initial mark. This identifies the initial set of live objects directly reachable from the application code. Then during the concurrent marking phase, collector marks all live objects transitively reachable from the initially marked set. Because this happens concurrently with the application not all live objects get marked up. To handle this, the application stops again for a second pause for the remark phase. Remark phase is often run using multiple threads for efficiency. After this marking process a concurrent sweep phase is initiated.
CMS collector is not a compacting collector. Therefore it uses a set of free-lists when it comes to allocation. Therefore the allocation overhead is higher. Also CMS collector is best suited for large heaps. Because collection happens concurrently, the old generation will continue to grow even during collection. So the heap should be large enough to accommodate that growth. Another issue with CMS is floating garbage. That is objects considered as live may become garbage towards the end of the collection cycle. These will not get immediately cleaned up but will definitely get cleaned up during the next collection cycle. CMS collector requires lot of CPU power as well.
Unlike other collectors, CMS collector does not wait till the old generation becomes full to start a collection. Instead it starts collecting early so it can avoid old generation getting filled up to the capacity. If the old generation gets filled up before CMS kicks in, it resorts to the serial stop-the-world collection mode used by serial and parallel collectors. To avoid this CMS uses some statistics regarding previous collection times and the time taken to fill up the old generation. CMS also kicks in if the old generation occupancy exceeds a predefined threshold known as the initiating occupancy fraction. This is a configurable parameter and defaults to 68%.
There is a special mode of operation for the CMS collector known as the incremental mode. In the incremental mode, concurrent collection cycles are broken down into smaller chunks. Therefore during a concurrent collection cycle, the collector will suspend itself to give full CPU to the running application. This in turns reduces the impact of long concurrent collection phases. This mode is particularly useful in cases where the number of CPUs is small.
Selecting the Appropriate Collector
Most of the time JVM is smart enough to select the right collector for the situation by analyzing the system configuration (see the next section). But since the JVM has no knowledge of the application requirements, sometimes the JVM chosen collector will not be good enough. When it comes to manually selecting a collector for an application, there are no hard and fast rules that say a particular collector is suitable for a given scenario. So there are only a set of general guidelines and recommendations. Most of the time they will work, but there could be exceptions. Hope you find them useful:
  • Serial collector is the collector of choice for most applications that run on client-style machines and do not have low pause requirements. On modern hardware this collector can handle a wide range of applications with heaps as small as 64MB.
  • If the application has a small data set select the serial collector.
  • If the application has no low pause requirements and runs on a machine with a single CPU select the serial collector.
  • Parallel collector is best for applications that run on machines with multiple CPUs and do not have low pause requirements.
  • If application throughput is the main consideration and slightly longer pauses are acceptable select the parallel collector.
  • The parallel compacting collector can be used in almost any scenario where the parallel collector seems suitable. In fact the parallel compacting collector is almost always preferred over the parallel collector. However with this collector a single application can hog the CPU for an extended period of time. Therefore it's not very suitable in cases where multiple JVMs reside on a single large machine.
  • CMS collector should be the collector of choice whenever there is a requirement for low pauses and low response time. However this collector makes use of lot of CPU resources. Therefore impact on CPU usage must be carefully evaluated. Usually applications that have large sets of long-lived data (a large old generation), which run on machines with multiple CPUs, tend to benefit from this collector. Web servers and most interactive applications often fall into the category of CMS benefactors.
  • If it is needed to run an application with low pause requirements on a machine which doesn't have too many CPU resources, consider using the CMS incremental mode.
Note: Command line options needed to enable different collectors are given at the end of the post
Ergonomics
Modern JVMs has the ability to select a suitable garbage collector, heap size and JVM type by looking at the host platform and the OS. In addition to that, a new way of dynamically tuning memory management has been introduced to the parallel collectors. This way, a user can specify the desired behavior and the collector dynamically tunes the sizes of the heap regions in an attempt to achieve the requested behavior. The combination of platform-dependent default selections and dynamic GC tuning is referred to as ergonomics.
The ergonomics default selections are done based on the 'class' of the machine. A machine is considered to be server-class if it has:
  • 2 or more processors
  • 2 GB or more physical memory
This applies to all platforms except for 32-bit machines running Windows.
If the machine is classified as non server-class, ergonomics will choose the following settings:
  • HotSpot client JVM
  • serial collector
  • initial heap size of 4 MB
  • maximum heap size of 64 MB
For server-class machines it is slightly complicated. However in general it is something like:
  • HotSpot server JVM
  • parallel collector
  • initial heap size of 1/64 th of physical memory
  • maximum heap size of 1/4 th of physical memory under an upper limit of 1 GB
One can run the HotSpot client JVM on a server-class machine by explicitly enabling the -client command line option. In that case ergonomics will select the serial collector. Also when the collector selected by ergonomics is not the parallel collector the initial heap size and maximum heap size will be 4MB and 64MB respectively.
Sizing the Heap
There are lots of options available when it comes to sizing the heap, its regions and generations. Most of the time we don't have to meddle with these settings, but there could be exceptional situations where we have to tune them up to obtain optimal performance and avoid out of memory errors. The most common settings related to heap size are the initial heap size and maximum heap size which are set by -Xms and -Xmx command line options respectively. If you set a lower value to the initial heap size than the maximum heap size, JVM will try to start with a heap of initial size and then grow the heap as and when required. If you set equal values to both parameters, JVM will start with the specified maximum heap size.
Another important parameter is the NewRatio value which governs the ratio between the old generation size and young generation size. On most server class systems this defaults to 8. That means the old generation is 8 times larger than the young generation. If the application tends to do lot of new allocations then reducing this ratio to a reasonable value may not be a bad idea. Reducing this ratio will generally result in less minor collections. The size of the young generation can be further controlled by the NewSize and MaxNewSize options.
We can get the JVM to dynamically grow or shrink the generations based on how they are utilized by setting the MinHeapFreeRatio and MaxHeapFreeRatio parameters. These parameters try to impose some restrictions on the amount of free heap space in each generation. If the free space percentage in a given generation is about to drop below the MinHeapFreeRatio, JVM will expand the generation to meet the lower limit. Similarly generations will be contracted if the free space percentage crosses the MaxHeapRatio.
Command Line Options
Collector selection
  • Serial collector: -XX:+UseSerialGC
  • Parallel collector: -XX:+UseParallelGC
  • Parallel compacting collector: -XX:+UseParallelOldGC (combine with -XX:+UseParallelGC)
  • CMS collector: -XX:+UseConcMarkSweepGC
Parallel collector settings
  • Parallel GC thread count: -XX:ParallelGCThreads=n
  • Desired maximum pause length: -XX:MaxGCPauseMilis=n
  • Throughput (percentage of CPU time spent on application - defaults to 99): -XX:GCTimeRatio=n
CMS collector settings
  • Enable incremental mode: -XX:+CMSIncrementalMode
  • Parallel GC thread count: -XX:+ParallelGCThreads=n
  • Old gen occupancy threshold that triggers collections: -XX:CMSInitiatingOccupancyFraction=n
Heap sizing options
  • Initial size: -Xms
  • Maximum size: -Xmx
  • Initial size of the new generation: -XX:NewSize=n
  • Maximum size of the perm gen space: -XX:MaxPermSize=n
  • Ratio between old and new generation sizes: -XX:NewRatio=n
Debug options
  • Print basic GC info: -XX:+PrintGC
  • Print verbose GC info: -XX:+PrintGCDetails
  • Print details with time: -XX:+PrintGCTimeStamps
References

7 comments:

Unknown said...

If fed up of GC,

Check this out
http://www.dzone.com/links/r/terracotta_fairy_brings_big_memory_for_java_users.html

Anonymous said...

Oh very beautiful article which I was wondering for years. Already bookmarked in my browser.
Thanks for sharing.

javin paul said...

Excellent article thanks for sharing important information .Indeed its very useful for any one want to understand. Thanks for sharing information

Javin
5 tips on writing equals method in Java

Mayur Shiralkar said...

Gr8 article.. Keep t up..

Mayur Shiralkar said...

Gr8 article.. keep it up..

Sandeep said...

I would like to add that Garbage collection in Java can involve some algorithms like Mark and Sweep, stop and copy etc.

Kamal said...

Really useful article. Thanks a lot for sharing.

Regards,
Kamal Mettananda