
Monday 8 December 2014

Solving Java Memory Regressions with Zero Overhead and High Accuracy

Customer trust is Salesforce’s highest priority. Our customers trust us with their data and count on our software platform to perform reliably. They also expect our applications and architecture to deliver the fastest, most responsive user experience. That is why performance is at the top of our priorities.
The Salesforce Performance Engineering team is tasked with ensuring that the platform and its SaaS applications perform at the highest level. The team continually runs extensive performance tests, monitors and analyzes the results, and resolves any regressions found. Even a degradation of a few percentage points is not allowed to reach production.
The performance testing is done in the form of workloads. A workload is a repeatable load test consisting of a set of user requests that exercise specific features or functionality (for example, the Apex cache, Visualforce pages, or Chatter feeds). A given workload is run periodically, usually daily, on the latest code version available at that time. We achieve repeatability and high accuracy through full automation of the run, the data collection, and the data analysis. The performance engineering team relies mostly on open source tools (e.g., JMeter for generating load) and tools developed in-house (e.g., for test automation orchestration, data collection, and results processing).

Memory Allocations Heavily Influence Application Performance

Application performance depends on many factors, including the architecture of the system, the algorithms used to implement a given piece of functionality, the efficiency of the code and database queries, the caching system, the database, and so on. Among these factors, object allocations play an important role in the performance of Java applications, and of any other application running on a VM that manages application memory. An increase in the number and/or size of allocated objects may require more operations from the application code, and a higher object allocation rate usually increases the memory-management overhead imposed by the host VM.
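To make that last point concrete, here is a small self-contained sketch (GcOverheadDemo is an illustrative name, not code from our platform) that uses the standard GarbageCollectorMXBean API to observe how a burst of short-lived allocations turns into garbage-collection work; raising the array size or the iteration count raises the allocation rate and, with it, the reported GC count and time:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative sketch: observe how a burst of short-lived allocations
// translates into garbage-collection work reported by the JVM.
public class GcOverheadDemo {
    public static void main(String[] args) {
        long sink = 0;
        for (int i = 0; i < 5_000_000; i++) {
            // Each iteration allocates a short-lived 1 KB array; a larger
            // array or more iterations means a higher allocation rate.
            byte[] temp = new byte[1024];
            temp[i & 1023] = (byte) i;
            sink += temp[0];
        }
        long gcCount = 0, gcTimeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            gcCount += gc.getCollectionCount();
            gcTimeMs += gc.getCollectionTime();
        }
        // 'sink' is printed so the JIT cannot remove the loop entirely.
        System.out.println("GC count=" + gcCount + ", GC time=" + gcTimeMs + " ms, sink=" + sink);
    }
}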

Solving Memory Regressions in a Complex Application

While detecting memory regressions is a relatively easy task, finding the root cause of an increase in memory allocations is usually a very hard problem to tackle. A number of commercial and open source tools aim to help solve it. Commercial tools like YourKit can track object allocations by instrumenting the application’s bytecode; the instrumentation is done by an agent attached to the JVM at startup. Another approach to solving memory regressions is to take heap dumps while the application runs and inspect them later with tools like Eclipse Memory Analyzer (MAT) or YourKit. In addition, ThreadMXBean, which is exposed through JMX MBeans, can be used to estimate the amount of memory allocated in a given transaction; this usually requires embedding an instrumentation framework in the application that collects this data and records it in the logs for every transaction the application executes (a sketch of this technique follows).
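As a hedged illustration of the ThreadMXBean approach: on HotSpot JVMs, the com.sun.management extension of ThreadMXBean exposes a per-thread allocation counter, so the bytes allocated by a transaction can be estimated by sampling the counter before and after the transaction runs on the same thread. AllocationMeter and its methods below are illustrative names; the JDK APIs are real:

import java.lang.management.ManagementFactory;

// Illustrative sketch: estimate bytes allocated by a transaction using the
// HotSpot-specific com.sun.management.ThreadMXBean allocation counter.
public class AllocationMeter {
    private static final com.sun.management.ThreadMXBean THREADS =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    // Runs the transaction on the current thread and returns an estimate
    // of the number of bytes it allocated.
    public static long allocatedBytes(Runnable transaction) {
        long threadId = Thread.currentThread().getId();
        long before = THREADS.getThreadAllocatedBytes(threadId);
        transaction.run();
        return THREADS.getThreadAllocatedBytes(threadId) - before;
    }

    public static void main(String[] args) {
        long bytes = allocatedBytes(() -> {
            double[] scratch = new double[102_400]; // ~800 KB
            scratch[0] = 1.0;
        });
        System.out.println("Estimated bytes allocated: " + bytes);
    }
}

In a production instrumentation framework, a measurement like this would be recorded in the logs for each transaction, as described above.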

Collecting Information About Allocated Objects with Zero Overhead

If these approaches do not help, what can we do to solve memory allocation regressions? Let’s summarize what we need to succeed:
  1. We want to record all object allocations and their associated parameters (e.g., object type and number of bytes allocated) during the run of our workload.
  2. We do not want the workload to be impaired by overhead, whether from bytecode instrumentation or from a profiling agent collecting data about the allocated objects.
  3. We need to ensure high accuracy of the results collected in test experiments.
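One way to meet all three requirements, consistent with the heap dump comparison shown in the illustration below, is to capture heap dumps at well-defined checkpoints of the run and compare them offline: the workload runs with no agent attached and no bytecode instrumentation, and the cost of dumping the heap is paid only at the checkpoint, outside the measured steady state. As a minimal sketch (assuming a HotSpot JVM; the HeapDumper class and file names are illustrative), a dump can be triggered programmatically through the JDK’s HotSpotDiagnosticMXBean:

import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

// Illustrative sketch: trigger a heap dump programmatically at a checkpoint.
// An equivalent dump can also be taken externally, e.g. with the jmap tool.
public class HeapDumper {
    public static void dump(String hprofPath, boolean liveObjectsOnly) throws Exception {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // When liveObjectsOnly is true, the JVM performs a full GC first so
        // the dump contains only reachable objects. The target file must not
        // already exist, or dumpHeap fails.
        diagnostics.dumpHeap(hprofPath, liveObjectsOnly);
    }

    public static void main(String[] args) throws Exception {
        dump("baseline.hprof", true);
    }
}

The resulting .hprof files from the base and regressed runs can then be compared in a tool such as Eclipse MAT, as in the illustration below.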

Illustration:

Consider a simple Java application that allocates double arrays wrapped in a class MyDoubleArray. In the base run, the application allocates 2000 MyDoubleArray objects, each containing a double[102400] array. The regressed version of the code allocates the same number of MyDoubleArray objects, but with each double array larger by 100 elements (i.e., double[102500]). A comparison of the heap dumps produced by the algorithm discussed above shows this difference in the amount of memory occupied by double[] (see the image below). Note that all other objects have the same counts and occupy the same space in the two heap dumps.
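A minimal sketch of such a test application (MyDoubleArray matches the class named above; the remaining names and the exact structure are illustrative):

// Illustrative sketch of the test application described above. Change
// ARRAY_LENGTH from 102_400 to 102_500 to reproduce the regressed run.
public class AllocationDemo {
    static final int OBJECT_COUNT = 2000;
    static final int ARRAY_LENGTH = 102_400; // base run; regressed run: 102_500

    static class MyDoubleArray {
        final double[] data;
        MyDoubleArray(int length) {
            data = new double[length];
        }
    }

    public static void main(String[] args) {
        MyDoubleArray[] holder = new MyDoubleArray[OBJECT_COUNT];
        for (int i = 0; i < OBJECT_COUNT; i++) {
            holder[i] = new MyDoubleArray(ARRAY_LENGTH);
        }
        // Keep the objects reachable so a heap dump taken here contains them.
        System.out.println("Allocated " + holder.length + " MyDoubleArray objects");
    }
}

The 100 extra double elements add about 800 bytes per object (100 × 8 bytes), or roughly 1.6 MB across the 2000 objects, which is exactly the kind of difference the heap dump comparison surfaces.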

[Image: comparison of the base and regressed heap dumps, showing the larger total size occupied by double[] in the regressed run while all other objects match]
By Applogic IT Solutions India Pvt Ltd
                                                                
