JVM performance insights

Luiz Fernando Teston
Team Lead - Java Core @ Flow Traders

The Java Virtual Machine was initially perceived as slow. However, since the 1990s the JVM, its platform and its ecosystem have had time to evolve and mature. Given its popularity, it is only natural that good tooling has developed around it. Choosing Java as a language or the JVM as a platform brings far more advantages than just "running anywhere".

This article shows some of the tooling available for gathering insights into the behavior of an application, helping to find evidence of where a bottleneck comes from. Java as a platform offers a diverse range of profilers, and Flight Recorder has been natively available in the standard JVM since version 11. Depending on the profiler chosen, the collected data can also be visualized as a Flame Graph, which makes it easier to read and to spot bottlenecks quickly.

The tooling presented here can be a good alternative for environments that lack good observability or that are not performance-sensitive enough to have it. Very often, attaching an agent or running an Application Performance Monitor (APM) is not possible or viable in the short term.

Profiling information, what to look for

When troubleshooting, it helps to know a few things:

How are the computer resources used over time?
Where is time spent?
How does the application behave when serving too many clients?
How much time does the application need to respond?

The computer resources in question are usually memory and CPU. It is important to know whether there are enough resources, or even how much of a resource is needed. Profilers usually help with this by showing memory and CPU usage graphs, garbage collection behavior and so on.
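As a rough illustration of the kind of numbers behind those graphs, the sketch below reads heap usage, system load and garbage collection totals through the standard java.lang.management beans. The class name ResourceSnapshot and the output format are made up for this example; a real profiler collects far more detail.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;

public class ResourceSnapshot {

    /** Builds a one-line summary of heap, CPU load and GC activity for this JVM. */
    public static String snapshot() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();

        long gcCount = 0;
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both values are -1 when the collector does not report them.
            gcCount += Math.max(0, gc.getCollectionCount());
            gcMillis += Math.max(0, gc.getCollectionTime());
        }

        return String.format(
                "heapUsedMiB=%d heapMaxMiB=%d cpus=%d loadAvg=%.2f gcCount=%d gcMillis=%d",
                heap.getUsed() >> 20,
                heap.getMax() >> 20,        // stays -1 when no maximum is defined
                os.getAvailableProcessors(),
                os.getSystemLoadAverage(),  // negative when the platform cannot provide it
                gcCount,
                gcMillis);
    }

    public static void main(String[] args) throws InterruptedException {
        // Print a few samples, one per second, to see the values change over time.
        for (int i = 0; i < 3; i++) {
            System.out.println(snapshot());
            Thread.sleep(1000);
        }
    }
}
```

Printing such a line periodically is only a poor man's monitor, but it shows that the raw data is available from inside any JVM without extra components.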

Collecting thread stack traces over time gives a view of where the time is spent.
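The idea can be sketched in a few lines of plain Java: take a snapshot of all thread stacks at a fixed interval and count which method is on top most often. This is a naive sampler written for illustration only (the class StackSampler and the busy-worker thread are assumptions, not part of any tool mentioned here), and Thread.getAllStackTraces() is itself a safepoint-based mechanism, so it is not what a production profiler would do.

```java
import java.util.HashMap;
import java.util.Map;

public class StackSampler {

    /**
     * Takes {@code samples} snapshots of all live threads, {@code intervalMillis} apart,
     * and counts how often each method appears as the top frame.
     */
    public static Map<String, Integer> sampleTopFrames(int samples, long intervalMillis)
            throws InterruptedException {
        Map<String, Integer> hits = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
                StackTraceElement[] stack = e.getValue();
                // Skip the sampling thread itself and threads with no Java frames.
                if (e.getKey() == Thread.currentThread() || stack.length == 0) {
                    continue;
                }
                String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                hits.merge(top, 1, Integer::sum);
            }
            Thread.sleep(intervalMillis);
        }
        return hits;
    }

    public static void main(String[] args) throws InterruptedException {
        // A deliberately busy thread so that there is something to observe.
        Thread busy = new Thread(StackSampler::burnCpu, "busy-worker");
        busy.setDaemon(true);
        busy.start();

        sampleTopFrames(50, 20).entrySet().stream()
                .sorted((a, b) -> Integer.compare(b.getValue(), a.getValue()))
                .limit(5)
                .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }

    private static void burnCpu() {
        double x = 0;
        while (!Thread.currentThread().isInterrupted()) {
            x += Math.sqrt(x + 1);
        }
    }
}
```

Methods with the highest counts are where threads were most often found, which is the same signal a sampling profiler presents in a more complete form.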

The behavior when serving too many clients is usually assessed by taking measurements under high throughput. Depending on the application, it can be difficult to reproduce production behavior for most scenarios.

The time needed to respond, called latency, is usually influenced by how saturated the machine is. One example of saturation is when most of the memory is in use: chances are that garbage collection will take over and latency will suffer.
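Latency is usually reported as percentiles rather than as an average, because a few slow responses (for instance during a garbage collection pause) are easily hidden by a mean. The following is a minimal sketch, assuming a hypothetical in-process workload and the nearest-rank percentile method; the class name LatencyPercentiles is made up for this example.

```java
import java.util.Arrays;
import java.util.Random;

public class LatencyPercentiles {

    /** Returns the value at the given percentile (0, 100] using the nearest-rank method. */
    public static long percentile(long[] sortedValues, double percentile) {
        int rank = (int) Math.ceil(percentile * sortedValues.length / 100.0);
        return sortedValues[Math.max(0, rank - 1)];
    }

    /** Runs the task repeatedly and returns the sorted per-call latencies in nanoseconds. */
    public static long[] measure(Runnable task, int iterations) {
        long[] nanos = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            task.run();
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        return nanos;
    }

    public static void main(String[] args) {
        // Hypothetical workload: allocate and sort a small array on every call.
        Runnable task = () -> {
            int[] data = new Random(42).ints(10_000).toArray();
            Arrays.sort(data);
        };
        long[] sorted = measure(task, 1_000);
        System.out.printf("p50=%dus p99=%dus max=%dus%n",
                percentile(sorted, 50) / 1_000,
                percentile(sorted, 99) / 1_000,
                sorted[sorted.length - 1] / 1_000);
    }
}
```

A large gap between the median and the 99th percentile is often the first hint that something such as garbage collection or contention is interfering with part of the requests.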

Profilers can show even more useful information, such as thread contention, various JVM internals and stack traces from a given point in time. During troubleshooting, understanding the behavior of the application over time in this way can be easier than simply looking at logs.

Naturally, collecting profiling information has a cost in application performance, exported files and so on. It is usually not a good option to keep profiling on all the time, but when struggling to find a bottleneck, it can provide valuable information.

Java and its virtual machines

Initially, Java was positioned as a programming language targeting embedded platforms. Over time, the focus shifted to the server side. Defining a virtual machine with a stack-based instruction set made it easier for compiler writers to target it. As its popularity grew, bottlenecks were discovered and addressed. Because Java had a well-specified instruction set, it was possible for other providers to develop their own versions of the Java Virtual Machine.

Besides Sun Microsystems, other vendors had their own JVMs, including IBM and Microsoft, among others. The objective was to keep compatibility, and this was the case most of the time. One Swedish company, Appeal Virtual Machines, created a JVM called JRockit, which became known for being fast. In an interesting chain of events, this company was bought by BEA Systems in 2002.

BEA Systems acquired a few companies back then and gained traction with its SOA platform, thanks to the solid application servers and development tools it had gathered over time. BEA was in turn bought by Oracle in 2008. With this, Oracle became stronger in the Java ecosystem, and in 2009 it reached an agreement to buy Sun Microsystems, a deal completed in early 2010. Two of the most advanced JVMs were now owned by Oracle. In 2010, Oracle shared its intention of merging both JVMs, which was definitely not an easy statement to make.

What is available nowadays in the standard HotSpot JVM is the result of more than two decades of continuous evolution, drawing on the knowledge of a succession of companies. JRockit features were incrementally added to the standard JVM.

Flight Recorder initially as an internal profiler

The best way to approach performance optimization is to have good measurements. Aware of that, Appeal developed its own profiler to measure and optimize its own JVM. JRockit Flight Recorder turned out to be so good that it was distributed together with the JVM, so that users could troubleshoot the performance of their own applications.

The tooling also helped to justify the JVM to Sun, which at the time required each JVM licensee to provide some "added value". The companion tool, JRockit Mission Control, was able to read the Flight Recorder information and provide good analysis with graphs and so on.

This tool was intended for production usage with an "always on" approach, which means that special care was taken not to interfere too much with the performance of the running application. Most of the widely used profilers rely on safepoints, which interferes much more with the performance of the application. One example of a profiler in this category is VisualVM.

While those profilers are very useful for troubleshooting on a developer's machine, profilers that make use of the internal AsyncGetCallTrace API are usually less intrusive under production-like load. This category includes both Flight Recorder and Honest Profiler.

One capability of Flight Recorder is extracting information from a running JVM for later analysis, which makes it possible to enable it only when needed. This is not a unique feature of Flight Recorder, but what makes it a really interesting option is the fact that it is already bundled with the JVM (starting with the long-term support version 11, it is even available to non-paying users). From an operations perspective, it is easier to run commands that are already installed on the production machines than to install additional components.
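Besides the command-line route (the -XX:StartFlightRecording JVM option, or jcmd with JFR.start and JFR.dump against a running process), a recording can also be controlled from code through the jdk.jfr API that ships with JDK 11 and later. The sketch below is one possible way to do it, assuming a made-up workload and a temporary output file; only the sampling event is enabled to keep the example small, whereas a real recording would normally use one of the bundled configurations.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import jdk.jfr.Recording;

public class FlightRecordingDemo {

    /** Records while {@code workload} runs and writes the result to {@code target}. */
    public static Path record(Runnable workload, Path target) throws IOException {
        try (Recording recording = new Recording()) {
            // "jdk.ExecutionSample" is the built-in periodic stack sample of Java threads.
            recording.enable("jdk.ExecutionSample").withPeriod(Duration.ofMillis(20));
            recording.start();
            workload.run();
            recording.stop();
            recording.dump(target); // the file can be opened later in a JFR-capable tool
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("demo-recording", ".jfr");
        record(FlightRecordingDemo::busyWork, target);
        System.out.println("Wrote " + Files.size(target) + " bytes to " + target);
    }

    // Hypothetical CPU-bound workload, only here to give the recorder something to sample.
    private static void busyWork() {
        double x = 0;
        for (int i = 0; i < 50_000_000; i++) {
            x += Math.sqrt(i);
        }
        if (x < 0) {
            System.out.println(x); // uses the result so the loop is less likely to be removed
        }
    }
}
```

The resulting .jfr file has the same format as one produced from the command line, so it can be analyzed with the same tools.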

For applications running on an older JVM, a viable option for gathering Flight Recording information on version 8 is the Liberica JVM, which backported this support from OpenJDK 11 into its Java 8 builds.

Flight Recording, back in the day and nowadays

The author of this article, while troubleshooting slowness during peak usage for a big client, had problems finding out why some specific methods of the software were slow. At the time, the JVM in use was (luckily) BEA JRockit, so Flight Recorder information could be gathered. After staring at the graphs of JRockit Mission Control for a few minutes, the slow spot of the method could be easily identified, and a simple query optimization could be done based on the information from the JVM. Note that this happened more than 10 years ago, when APMs and observability were not yet a hot topic.

BEA JRockit Mission Control 1.0 from the early days (from a presentation at Code One in 2019)

Nowadays, capturing a flight recording is easy: it can be started together with the application or attached while the application is running. Very often, the author uses it to record execution during development and tests for later analysis.

VisualVM, among other profilers, supports reading Flight Recorder files these days

Flame Graphs

Brendan Gregg is an Australian performance engineer currently working at Netflix, but with a long record of contributions to observability which started during his time at Sun Microsystems. He wrote some really good books on performance analysis that are highly recommended.

One of his best contributions to this space is a visualization tool called Flame Graphs. These are simply another way of visualizing a set of stack traces collected over time. With snapshots taken over time, it is only natural that the more samples there are, the more difficult the data becomes to analyze.

Flame Graphs aggregate those snapshots in such a way that their common parts are grouped together. Even naively aiming at the big chunks gives a good chance of finding the part responsible for a bottleneck.
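The aggregation step can be sketched as follows: identical stacks are collapsed into a single line of semicolon-separated frames followed by a sample count, which is the "folded" text format that Brendan Gregg's flame graph scripts take as input. The samples and method names below are invented for illustration, and the class FoldedStacks is not part of any tool.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FoldedStacks {

    /**
     * Collapses stack samples into "root;...;leaf" keys with their sample counts.
     * Each sample lists frames from the outermost caller to the innermost callee.
     */
    public static Map<String, Integer> fold(List<List<String>> samples) {
        Map<String, Integer> folded = new TreeMap<>();
        for (List<String> frames : samples) {
            folded.merge(String.join(";", frames), 1, Integer::sum);
        }
        return folded;
    }

    public static void main(String[] args) {
        // Hypothetical samples; the method names are made up for illustration.
        List<List<String>> samples = List.of(
                List.of("main", "handleRequest", "loadUser", "queryDatabase"),
                List.of("main", "handleRequest", "loadUser", "queryDatabase"),
                List.of("main", "handleRequest", "loadUser", "queryDatabase"),
                List.of("main", "handleRequest", "render"),
                List.of("main", "housekeeping"));

        fold(samples).forEach((stack, count) -> System.out.println(stack + " " + count));
        // prints:
        // main;handleRequest;loadUser;queryDatabase 3
        // main;handleRequest;render 1
        // main;housekeeping 1
    }
}
```

In the rendered graph, the width of each box is proportional to these counts, so the queryDatabase path in this made-up data would take up three fifths of the picture.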

Flame Graphs as a tool to avoid guessing

When dealing with complex code, it is easy to blame the "ugly parts" and start refactoring things that apparently don't make sense. Refactoring is a good practice and code should be kept as clean as possible. However, during performance troubleshooting it is difficult to predict the behavior of the code in question, or even how the refactored code will behave.

The author of this article has spotted performance bottlenecks on several occasions by using Flame Graphs on collected profiling data. However, more rewarding than that was the fact that sharing this knowledge with team members produced even better performance improvements, since they could then spot bottlenecks by themselves.

Very often, a code change guided by Flame Graph data turns out to be highly accurate. For example, a method that shows up prominently on the Flame Graph might not be necessary at all, or its result could be cached to avoid repeated calls.
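As an illustration of the caching option, here is a minimal memoizing wrapper built on ConcurrentHashMap.computeIfAbsent. It is a sketch under simplifying assumptions: the cache is unbounded and never invalidated, the loader must not return null, and the "expensive" operation is a stand-in invented for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class CachedLookup<K, V> {

    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    public CachedLookup(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Calls the loader at most once per key; later calls return the stored value. */
    public V get(K key) {
        return cache.computeIfAbsent(key, loader);
    }

    public static void main(String[] args) {
        AtomicInteger expensiveCalls = new AtomicInteger();
        // Hypothetical expensive operation, standing in for the method seen in the Flame Graph.
        CachedLookup<String, Integer> lengths = new CachedLookup<>(s -> {
            expensiveCalls.incrementAndGet();
            return s.length();
        });
        for (int i = 0; i < 1_000; i++) {
            lengths.get("flame");
            lengths.get("graph");
        }
        System.out.println("expensive calls: " + expensiveCalls.get()); // prints "expensive calls: 2"
    }
}
```

Whether caching is safe depends on how often the underlying data changes, so it is worth profiling again after the change to confirm that the wide frame has actually shrunk.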

In this image from the Netflix blog post Java in Flames, it is possible to see file IO as the cause of the bottleneck in this profiling data. The frames on the right, in green, starting with "Lorg/Mozilla/javascript/gen/file" are clearly the biggest time consumers. Following this feedback, the next step is to avoid those calls, cache their results or optimize the operation.

Where to go from here?

Getting familiar with both Flight Recording and Flame Graphs can deepen your knowledge of the subject. The best way to do so is to get your hands dirty. Some references are left both as source material and as a reference for some of the commands.

With regard to Flight Recording, an exercise for the reader is to capture Flight Recorder information via java arguments or jcmd. Opening such a recording in a profiler and understanding the information that is available is a nice next step.

With regard to Flame Graphs, getting familiar with Brendan Gregg's website and learning how to export them from profiler agents, or even how to generate them from Flight Recorder information, is a nice next step. Experimenting with them during performance optimizations can turn out to be a good addition to your tool belt.
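For the last route, from Flight Recorder information to a Flame Graph, the sketch below reads an existing .jfr file with the jdk.jfr.consumer API and prints one "frame;frame;frame count" line per distinct stack, which is the folded format flame graph scripts typically accept. It assumes that the recording contains jdk.ExecutionSample events and that the file path is passed as the first argument; the class name JfrToFolded is made up, and dedicated converters handle many more details (threads, line numbers, other event types).

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedFrame;
import jdk.jfr.consumer.RecordedStackTrace;
import jdk.jfr.consumer.RecordingFile;

public class JfrToFolded {

    /** Counts identical stacks among the execution samples of a recording. */
    public static Map<String, Integer> fold(Path jfrFile) throws IOException {
        Map<String, Integer> folded = new TreeMap<>();
        try (RecordingFile file = new RecordingFile(jfrFile)) {
            while (file.hasMoreEvents()) {
                RecordedEvent event = file.readEvent();
                RecordedStackTrace stack = event.getStackTrace();
                boolean isSample = "jdk.ExecutionSample".equals(event.getEventType().getName());
                if (!isSample || stack == null) {
                    continue;
                }
                List<String> frames = new ArrayList<>();
                for (RecordedFrame frame : stack.getFrames()) {
                    frames.add(frame.getMethod().getType().getName()
                            + "." + frame.getMethod().getName());
                }
                // JFR lists the innermost frame first; the folded format wants the root first.
                Collections.reverse(frames);
                folded.merge(String.join(";", frames), 1, Integer::sum);
            }
        }
        return folded;
    }

    public static void main(String[] args) throws IOException {
        fold(Path.of(args[0])).forEach((stack, count) -> System.out.println(stack + " " + count));
    }
}
```

Redirecting the output to a text file gives something that can be handed to a flame graph renderer, which is a convenient way to compare recordings taken before and after an optimization.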

The tools presented in this article are a good option for enabling and disabling observability at will, gathering more complete data than is usually available via APMs, with minor performance impact. When properly used and understood, they help to eliminate slow spots, from development all the way to production.