Modern garbage collectors follow a scheme known as 'generational collection'. With this approach, the heap memory is divided to two or more regions known as generations. A generation is a block of memory which contains objects of a certain age category. In many implementations there are two generations, one for young (new and reasonably new) objects and one for old objects. Young generation is usually much smaller compared to the old generation. Generational collection allows employing different GC algorithms in different generations. This enables selecting a suitable algorithm based on the maturity of the objects.
Generational collection makes use of an interesting characteristic of applications, often referred to as the "generational hypothesis". This hypothesis states:
Based on this hypothesis, modern garbage collectors run collections on the young generation frequently. This is fast and efficient because the young generation is small and likely to contain many dead objects. Objects that survive some number of young generation collections are moved (tenured) to the old generation. Because old generation is much larger, its occupancy grows very slowly. Therefore old generation collections are infrequent. But when they do occur, they tend to take a much longer time to complete.
The young generation is further divided into 3 regions. The larger division is known as the Eden. This is where almost all the new object allocations take place (under special circumstances, large objects may directly get allocated in the old generation). The other smaller spaces are known as survivor spaces. One of the survivor spaces are always kept empty until the next young generation collection.
When an old generation fills up a full collection (major collection) is performed. All available generations are collected during a full collection. First the young generation is collected using the young generation collection algorithm. Then the old generation collection algorithm is run on the old generation and permanent generation. If compaction should be done, each generation is compacted separately. During a full collection if the old generation is too full to accept tenured objects from the young generation, the old generation collection algorithm is run on the entire heap (except with CMS collector).
Available Garbage Collectors
HotSpot JVM contains 3 garbage collectors:
- Serial collector (mark-sweep-compact collector)
- Parallel collector (throughput collector)
- Concurrent mark-sweep collector
In addition to that there is a special enhanced version of the parallel collector known as the parallel compacting collector. Let's see how each of these collectors work.
Serial Collector (Mark-Sweep-Compact collector)
This is the collector used by default on Java HotSpot client JVM. It is a serial, stop-the-world, copying collector. Because it is serial and operates in the stop-the-world mode it is not a very efficient collector.
Young generation collection is performed by following these steps:
- Live objects in Eden are copied to the empty survivor space (objects that won't fit are directly copied to the old generation)
- Live objects in the occupied survivor space are copied to the empty survivor space (relatively old objects are copied to old generation)
- If the free survivor space becomes full during the process, remaining live objects in Eden and occupied survivor space are tenured
- At this point Eden and the occupied survivor space contains only dead objects and so can be considered empty - The previously free survivor space contains some live objects
- The two survivor spaces switch roles
Old/Permanent generation collection:
- A two phase Mark-and-sweep algorithm is used to clean up dead objects
- After that a sliding compaction (live objects are slided towards the beginning of the generation) is performed to compact the generations
Parallel Collector (Throughput Collector)
This is very similar to the serial collector in many ways. In fact the only notable difference is that parallel collector uses multiple threads to perform the young generation collection. Other than that it uses the same algorithms as the serial collector. The number of threads used for collection is equal to the number of CPUs available. Because of the parallel collection feature, this collector usually results in much shorter pauses and higher application throughput. However, note that the old generation collection is still carried out using a single thread in serial fashion. This is the default collector used in Java HotSpot server JVM.
Parallel Compacting Collector
This is an enhanced version of the parallel collector. It uses multiple threads to perform the old generation collection as well. The old generation collection divides the generations into regions and operate on individual regions in parallel. The algorithm used for old generation collection is also slightly different from what's used in serial and parallel collectors.
Concurrent Mark-Sweep Collector (CMS Collector)
While the parallel collectors give prominence to application throughput, this collector gives prominence to low response time. It uses the same young generation collection algorithm as the parallel collectors. But the old generation collection is performed concurrently with the application instead of going to stop-the-world mode (at least most of the time). A collection cycle starts with a short pause known as the initial mark. This identifies the initial set of live objects directly reachable from the application code. Then during the concurrent marking phase, collector marks all live objects transitively reachable from the initially marked objects. Because this happens concurrently with the application not all live objects get marked up. To handle this, the application stops again for a second pause for the remark phase. Remark phase is often run using multiple threads for efficiency. After this marking process a concurrent sweep phase is initiated.
CMS collector is not a compacting collector. Therefore it uses a set of free-lists when it comes to allocation. Hence the allocation overhead is higher. Also CMS collector is best suited for large heaps. Because collection happens concurrently, the old generation will continue to grow even during collection. So the heap should be large enough to accommodate that growth. Another issue with CMS is floating garbage. That is objects considered as live may become garbage towards the end of the collection cycle. These will not get immediately cleaned up but will definitely get cleaned up during the next collection cycle. CMS collector requires lot of CPU power as well.
Unlike other collectors, CMS collector does not wait till the old generation becomes full to start a collection. Instead it starts collecting early, so it can avoid old generation getting filled up to the capacity. If the old generation gets filled up before CMS kicks in, it resorts to the serial, stop-the-world collection mode used by serial and parallel collectors. To avoid this CMS uses some statistics regarding previous collection times and the time taken to fill up the old generation. CMS also kicks in if the old generation occupancy exceeds a predefined threshold known as the initiating occupancy fraction. This is a configurable parameter in HotSpot JVM and defaults to 68%.
There is a special mode of operation for the CMS collector known as the incremental mode. In the incremental mode, concurrent collection cycles are broken down into smaller chunks. Therefore during a concurrent collection cycle, the collector will suspend itself to give full CPU to the running application. This in turns reduces the impact of long concurrent collection phases. This mode is particularly useful in cases where the number of CPUs available is small.
Selecting the Appropriate Collector
Most of the time JVM is smart enough to select the right collector for the situation by analyzing the system configuration (see the next section). But since the JVM has no knowledge of the application requirements, sometimes the JVM chosen collector may not be good enough. When it comes to manually selecting a collector for an application, there are no hard and fast rules that say a particular collector is suitable for a given scenario. So there are only a set of general guidelines and recommendations. Most of the time they will work, but there could be exceptions (trial-and-error is the order of the day). Hope you find them useful:
- Serial collector is the collector of choice for most applications that run on client-style machines and do not have low pause requirements. On modern hardware this collector can handle a wide range of applications with heaps as small as 64 MB.
- If the application has a small data set, select the serial collector.
- If the application has no low pause requirements and runs on a machine with a single CPU, select the serial collector.
- Parallel collector is best for applications that run on machines with multiple CPUs and do not have low pause requirements.
- If application throughput is the main consideration and slightly longer pauses are acceptable, select the parallel collector.
- The parallel compacting collector can be used in almost any scenario where the parallel collector seems suitable. In fact the parallel compacting collector is almost always preferred over the parallel collector. However with this collector a single application can hog the CPU for an extended period of time. Therefore it's not very suitable in cases where multiple JVMs reside on a single large machine.
- CMS collector should be the collector of choice whenever there is a requirement for low pauses and low response time. However this collector makes use of lot of CPU resources. Therefore impact on CPU usage must be carefully evaluated. Usually applications that have large sets of long-lived data (in other words, a large old generation), which run on machines with multiple CPUs, tend to benefit from this collector. Server applications and most interactive programs often fall into the category of CMS benefactors.
- If it is needed to run an application with low pause requirements on a machine which doesn't have too many CPU resources, consider using the CMS incremental mode.
Note: Command line options needed to enable different collectors are given towards the end of the article
Modern JVMs has the ability to select a suitable garbage collector, heap size and JVM type by looking at the host platform and the OS. In addition to that, starting from Java 5, a new way of dynamically tuning memory management has been introduced to the parallel collectors. This way, a user can specify the desired behavior and the collector dynamically tunes the sizes of the heap regions in order to provide the requested behavior. The combination of platform-dependent default selections and dynamic GC tuning is referred to as ergonomics.
The ergonomics default selections are done based on the 'class' of the machine. A machine is considered to be server-class if it has:
- 2 or more processors
- 2 GB or more physical memory
This applies to all platforms except for 32-bit machines running Windows.
If the machine is classified as non server-class, ergonomics will choose the following settings:
- HotSpot client JVM
- serial collector
- initial heap size of 4 MB
- maximum heap size of 64 MB
For server-class machines it is slightly complicated. However in general it is something like:
- HotSpot server JVM
- parallel collector
- initial heap size of 1/64 th of physical memory
- maximum heap size of 1/4 th of physical memory under an upper limit of 1 GB
One can run the HotSpot client JVM on a server-class machine by explicitly enabling the -client command line option. In that case ergonomics will select the serial collector. Also when the collector selected by ergonomics is not the parallel collector the initial heap size and maximum heap size will be 4 MB and 64 MB respectively.
Sizing the Heap
There are lots of options available when it comes to sizing the heap, its regions and generations. Most of the time we don't have to meddle with these settings, but there could be exceptional situations where we have to tune them up to obtain optimal performance and avoid out of memory errors. The most common settings related to heap size are the initial heap size and maximum heap size which are set by -Xms and -Xmx command line options respectively. If you set a lower value to the initial heap size than the maximum heap size, JVM will try to start with a heap of initial size and then grow the heap as and when required. If you set equal values to both parameters, JVM will start with the specified maximum heap size.
Another important parameter is the NewRatio value which governs the ratio between the old generation size and young generation size. On most server class systems this defaults to 8. That means the old generation is 8 times larger than the young generation. If the application tends to do lot of new allocations, then reducing this ratio to a reasonable value may not be a bad idea. Reducing this ratio will generally result in less minor collections. The size of the young generation can be further controlled by the NewSize and MaxNewSize options.
We can get the JVM to dynamically grow or shrink the generations based on how they are utilized by setting the MinHeapFreeRatio and MaxHeapFreeRatio parameters. These parameters try to impose some restrictions on the amount of free heap space in each generation. If the free space percentage in a given generation is about to drop below the MinHeapFreeRatio, JVM will expand the generation to meet the lower limit. Similarly generations will be contracted if the free space percentage crosses the MaxHeapRatio.
Command Line Options
- Serial collector: -XX:+UseSerialGC
- Parallel collector: -XX:+UseParallelGC
- Parallel compacting collector: -XX:+UseParallelOldGC (combine with -XX:+UseParallelGC)
- CMS collector: -XX:+UseConcMarkSweepGC
Parallel collector settings
- Parallel GC thread count: -XX:ParallelGCThreads=n
- Desired maximum pause length: -XX:MaxGCPauseMilis=n
- Throughput (percentage of CPU time spent on application - defaults to 99): -XX:GCTimeRatio=n
CMS collector settings
- Enable incremental mode: -XX:+CMSIncrementalMode
- Parallel GC thread count: -XX:+ParallelGCThreads=n
- Old gen occupancy threshold that triggers collections: -XX:CMSInitiatingOccupancyFraction=n
Heap sizing options
- Initial size: -Xms
- Maximum size: -Xmx
- Initial size of the new generation: -XX:NewSize=n
- Maximum size of the perm gen space: -XX:MaxPermSize=n
- Ratio between old and new generation sizes: -XX:NewRatio=n
- Print basic GC info: -XX:+PrintGC
- Print verbose GC info: -XX:+PrintGCDetails
- Print details with time: -XX:+PrintGCTimeStamps