JAX London Blog

Improving Java Performance: Clear Details on Java Collection ‘Clear()’ API

Taking a tour through the clear() API

Jul 25, 2023

Many of us are already familiar with the clear() API in the Java Collections Framework. In this post, let’s discuss the purpose of this API, the performance impact of using it, and what happens inside the JVM when it is invoked.

What does the clear() API do?

The clear() API is declared in the Java Collection interface and is implemented by the concrete classes that implement that interface: ArrayList, TreeSet, Stack, etc. When this method is invoked, it removes all the elements that are present in the data structure.

How does ArrayList’s clear() method work in Java?

In this post, let’s focus on the ‘ArrayList’ implementation of the clear() method. The implementations in other data structures are quite similar. Internally, ‘ArrayList’ holds an Object array, i.e., an ‘Object[]’, as a member variable. When you add records to the ‘ArrayList’, they are stored in this ‘Object[]’. When you invoke the ‘clear()’ API on the ‘ArrayList’, all the references held in this ‘Object[]’ are removed, but the array itself is not released. Let’s say we create an ‘ArrayList’ and add 1,000,000 (1 million) integers to it. When the ‘clear()’ method is invoked, all 1 million entries in the underlying ‘Object[]’ are removed. However, the now-empty ‘Object[]’, still sized for 1 million or more entries, remains in memory and consumes space unnecessarily.

Creating ArrayList example

It’s always easier to learn with an example. Let’s explore the ‘clear()’ API with a simple ‘ClearNoDemo’ class, which performs the following operations:

  • We create a ‘myList’ object whose type is ‘ArrayList’.
  • We add 1 million ‘Long’ wrapper objects (the values 0 up to 1 million) to ‘myList’ in a loop.
  • We put the thread to sleep for 10 seconds so that we can capture the heap dump for our discussion.
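The operations listed above can be sketched roughly as follows. This is a minimal reconstruction from the description, not the original listing: the ‘populate’ helper and the pause length read from the first command-line argument are assumptions added here so that the sketch can also be run quickly.

```java
import java.util.ArrayList;

class ClearNoDemo {

    // A static field keeps the list reachable while the thread sleeps.
    static ArrayList<Long> myList = new ArrayList<>();

    // Adds the Long values 0 .. count-1 to myList.
    static void populate(int count) {
        for (int counter = 0; counter < count; ++counter) {
            myList.add(Long.valueOf(counter));
        }
    }

    public static void main(String[] args) throws InterruptedException {
        populate(1_000_000);
        System.out.println("Records added: " + myList.size());

        // Pause so that a heap dump can be captured (assumed argument, in seconds).
        long pauseSeconds = args.length > 0 ? Long.parseLong(args[0]) : 0;
        Thread.sleep(pauseSeconds * 1000);
    }
}
```

Running it as `java ClearNoDemo 10` gives the 10-second window described above.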

We ran this program and, while it was sleeping, captured a heap dump from it using the open source yCrash script. We captured the heap dump so that we could study how objects are stored in memory. A heap dump is basically a binary file that contains information such as: What objects reside in memory? What is their size? Who is referencing them? What values do they hold? Since a heap dump is a binary file that is not human-readable, we analyzed it using the heap dump analysis tool HeapHero. The report generated by the tool can be found here. Below is the Dominator Tree section from the report, which displays the largest objects in the application:

Fig 1: ‘ArrayList’ without invoking ‘clear()’ API (heap report by HeapHero)

 

You will notice that our ‘myList’ object is reported as the largest object because we created 1 million ‘Long’ objects and stored them in it. You may also notice that the ‘myList’ object has a child object ‘elementData’ whose type is ‘Object[]’. This is the actual ‘Object[]’ in which the 1 million records are stored. The report shows this ‘Object[]’ occupying 27.5 MB of memory, a figure that in a dominator tree includes the ‘Long’ objects the array keeps alive. This analysis confirms that the objects we add are stored in the internal ‘Object[]’.

List#clear() API example

Now, we have created a slightly modified version of the above program where we are invoking the ‘clear()’ API on the ‘ArrayList’.

Here are the operations we are performing in this ‘ClearDemo’ class:

  • We create a ‘myList’ object whose type is ‘ArrayList’.
  • We add 1 million ‘Long’ wrapper objects (the values 0 up to 1 million) to ‘myList’ in a loop.
  • We remove the objects from ‘myList’ using the ‘clear()’ API.
  • We put the thread to sleep for 10 seconds so that we can capture the heap dump for our discussion.
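A rough sketch of such a class, under the same assumptions as before (the ‘populate’ helper and the pause argument are ours, not part of the original listing), could look like this:

```java
import java.util.ArrayList;

class ClearDemo {

    // A static field keeps the list reachable while the thread sleeps.
    static ArrayList<Long> myList = new ArrayList<>();

    // Adds the Long values 0 .. count-1 to myList.
    static void populate(int count) {
        for (int counter = 0; counter < count; ++counter) {
            myList.add(Long.valueOf(counter));
        }
    }

    public static void main(String[] args) throws InterruptedException {
        populate(1_000_000);

        // Remove every element; the list object itself stays referenced.
        myList.clear();
        System.out.println("Size after clear(): " + myList.size());

        // Pause so that a heap dump can be captured (assumed argument, in seconds).
        long pauseSeconds = args.length > 0 ? Long.parseLong(args[0]) : 0;
        Thread.sleep(pauseSeconds * 1000);
    }
}
```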

When you invoke the ‘clear()’ API, the references to all 1 million ‘Long’ objects stored in the ‘Object[]’ are removed, which makes those objects eligible for garbage collection. However, the ‘Object[]’ itself remains in memory. To confirm this theory, we ran this program and captured the heap dump using the open source yCrash script while the program was sleeping. We analyzed the heap dump using the heap dump analysis tool HeapHero. The report generated by the tool can be found here.

Below is the Dominator Tree section from the report that displays the largest objects in the application:

Fig 2: ‘ArrayList’ after invoking ‘clear()’ API (heap report by HeapHero)

 

You’ll notice that our ‘myList’ object is still reported as the largest object. You can also see that the ‘myList’ object has a child object ‘elementData’ whose type is ‘Object[]’. This ‘Object[]’ now has 0 entries (i.e., no elements in it), but its array size is still 1 million+. Because this empty array of 1 million+ slots is still present, it occupies 4.64 MB of memory. This analysis confirms that even though the objects are removed by invoking the ‘clear()’ API, the underlying ‘Object[]’ with its 1 million+ slots continues to exist, consuming memory unnecessarily.
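The 1 million+ array size and the 4.64 MB figure are consistent with the way ‘ArrayList’ grows its backing array. The sketch below is only a back-of-the-envelope estimate. It assumes the growth rule used by ‘ArrayList’ in JDK 8 and later (a default-constructed list starts at a capacity of 10 on the first add and then grows by about 1.5x per resize) and a typical 64-bit HotSpot JVM with compressed references (4 bytes per array slot, a 16-byte array header, 24 bytes per ‘Long’). These sizes are assumptions that vary with the JVM and its flags, and the class and method names are hypothetical.

```java
import java.util.Locale;

class CapacityEstimate {

    // Assumed sizes for a 64-bit HotSpot JVM with compressed references.
    static final long ARRAY_HEADER_BYTES = 16;
    static final long REFERENCE_BYTES = 4;
    static final long LONG_OBJECT_BYTES = 24;

    // Capacity of the backing array after adding elementCount items one by one
    // to a default-constructed ArrayList: starts at 10, then grows by old + old/2.
    static int capacityAfterAdds(int elementCount) {
        if (elementCount <= 0) {
            return 0;
        }
        int capacity = 10;
        while (capacity < elementCount) {
            capacity += capacity >> 1;
        }
        return capacity;
    }

    // Approximate size of an Object[] with the given capacity (alignment padding ignored).
    static long arrayBytes(int capacity) {
        return ARRAY_HEADER_BYTES + REFERENCE_BYTES * capacity;
    }

    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        int count = 1_000_000;
        int capacity = capacityAfterAdds(count);
        long emptyArray = arrayBytes(capacity);
        long fullList = emptyArray + LONG_OBJECT_BYTES * count;

        System.out.println("capacity after 1M adds: " + capacity);
        System.out.println(String.format(Locale.ROOT, "array alone: %.2f MiB", toMiB(emptyArray)));
        System.out.println(String.format(Locale.ROOT, "array plus Longs: %.2f MiB", toMiB(fullList)));
    }
}
```

Under these assumptions the estimate comes to a capacity of 1,215,487 slots, about 4.64 MiB for the empty array and about 27.5 MiB once the 1 million ‘Long’ objects are counted, which lines up with the figures in the two heap reports.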

Note: Refer to the ‘Memory Impact’ section below to learn what kind of performance impact your application will experience when invoking ‘clear()’ API.

Assigning List to null example

To make our study even more interesting, we created a slightly modified version of the above program in which we assign ‘null’ to ‘myList’ instead of invoking the ‘clear()’ API to remove the objects from the ‘ArrayList’.

Here are the operations we are performing in this ‘ClearNullDemo’ class:

  • We create a ‘myList’ object whose type is ‘ArrayList’.
  • We add 1 million ‘Long’ wrapper objects (the values 0 up to 1 million) to ‘myList’ in a loop.
  • We assign ‘null’ to the list instead of using the ‘clear()’ API.
  • We put the thread to sleep for 10 seconds so that we can capture the heap dump for our discussion.
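As with the earlier programs, the following is a hedged sketch of what such a class might look like, not the original listing. The ‘populate’ helper and the pause argument are again our own assumptions.

```java
import java.util.ArrayList;

class ClearNullDemo {

    // Non-final so that the reference can be dropped later.
    static ArrayList<Long> myList = new ArrayList<>();

    // Adds the Long values 0 .. count-1 to myList.
    static void populate(int count) {
        for (int counter = 0; counter < count; ++counter) {
            myList.add(Long.valueOf(counter));
        }
    }

    public static void main(String[] args) throws InterruptedException {
        populate(1_000_000);

        // Drop the only reference; the list and its backing array become
        // eligible for garbage collection.
        myList = null;
        System.out.println("myList is null: " + (myList == null));

        // Pause so that a heap dump can be captured (assumed argument, in seconds).
        long pauseSeconds = args.length > 0 ? Long.parseLong(args[0]) : 0;
        Thread.sleep(pauseSeconds * 1000);
    }
}
```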

When you assign ‘null’ to ‘myList’, the ‘ArrayList’ and its underlying ‘Object[]’ become eligible for garbage collection, provided nothing else references them. Once collected, they no longer exist in memory. To confirm this theory, we ran this program and captured the heap dump using the open source yCrash script while the program was sleeping.

We analyzed the heap dump using the heap dump analysis tool HeapHero. The report generated by the tool can be found here. Below is the Dominator Tree section from the report that displays the largest objects in the application. You may notice that our ‘myList’ object is not even present in the list (as it was garbage collected from memory). This is in total contrast to the earlier two example programs.

 

Memory Impact

Fig 3: Memory occupied by ArrayList

 

The above chart shows the memory occupied by the ‘ArrayList’.

  • When the ‘ArrayList’ holds 1 million ‘Long’ records, it occupies 27.5 MB.
  • After the ‘clear()’ API is invoked, it continues to occupy 4.64 MB because the underlying empty ‘Object[]’ remains in memory.
  • On the other hand, when the list is assigned to ‘null’, the ‘ArrayList’ gets garbage collected and doesn’t occupy any memory.

Thus, from the memory perspective, it’s a prudent decision to assign the ‘ArrayList’ to ‘null’ instead of invoking the ‘clear()’ API.

Processing Time Impact

The source code of the ‘clear()’ method in the JDK shows that this method loops through all the elements in the underlying ‘Object[]’ and assigns ‘null’ to each of them. This is a time-consuming process, especially on a collection that has a lot of elements, like our example of 1 million elements. In such circumstances, assigning ‘null’ to the ‘ArrayList’ variable would be more performant.
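To illustrate, here is a simplified, self-contained sketch of an array-backed list whose ‘clear()’ follows the same pattern as the JDK 8 ‘ArrayList’: null out each used slot, then reset the size. The class ‘SimpleArrayList’ and its ‘capacity()’ method are hypothetical and exist only for this illustration. It is not the JDK source, and newer JDK releases differ in detail.

```java
import java.util.Arrays;

class SimpleArrayList<E> {

    private Object[] elementData = new Object[10]; // backing array
    private int size;                              // number of slots in use

    void add(E element) {
        if (size == elementData.length) {
            // Grow by roughly 50% when the array is full.
            elementData = Arrays.copyOf(elementData, size + (size >> 1));
        }
        elementData[size++] = element;
    }

    // Nulls every used slot so the elements can be garbage collected.
    // The work is proportional to size, and the array is not shrunk.
    void clear() {
        for (int i = 0; i < size; i++) {
            elementData[i] = null;
        }
        size = 0;
    }

    int size() {
        return size;
    }

    int capacity() {
        return elementData.length;
    }

    public static void main(String[] args) {
        SimpleArrayList<Long> list = new SimpleArrayList<>();
        for (long value = 0; value < 1_000; value++) {
            list.add(value);
        }
        int capacityBefore = list.capacity();
        list.clear();
        System.out.println("size after clear: " + list.size());
        System.out.println("capacity unchanged: " + (list.capacity() == capacityBefore));
    }
}
```

The loop in ‘clear()’ touches one slot per stored element, whereas dropping the reference to the whole list is a single assignment. That difference is what the processing time argument rests on.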

When to use Collection#clear() API?

This raises the question of whether we should avoid the ‘clear()’ API altogether because of its memory and processing impact. Although I would vote for that option, there are scenarios in which the clear() API has its place:

  • Passing by reference: If you are passing a Collection object as a reference to other parts of the code, then assigning ‘null’ to it can result in the famous ‘NullPointerException’ when that code uses it. To avoid that exception, you may use the ‘clear()’ API.
  • Collection size is small: If you are creating only a few collection instances and their size is very small (say, only 10 or 20 elements each), then invoking the ‘clear()’ API or assigning ‘null’ might not make much difference.
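The first scenario can be illustrated with a small sketch. The class ‘SharedListSketch’, the field ‘pendingJobs’ and the method names are hypothetical and chosen only for this example. ‘pendingCount()’ stands in for the "other parts of the code" that use the shared list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SharedListSketch {

    // A list that other parts of the code read through this shared reference.
    static List<String> pendingJobs = new ArrayList<>();

    // Stands in for other code that uses the shared list.
    static int pendingCount() {
        return pendingJobs.size();
    }

    // Returns true if reading the list fails after the reference is set to null.
    static boolean failsAfterNull() {
        pendingJobs = new ArrayList<>(Arrays.asList("job-1", "job-2"));
        pendingJobs = null;
        try {
            pendingCount();
            return false;
        } catch (NullPointerException e) {
            return true;
        } finally {
            pendingJobs = new ArrayList<>(); // restore a usable list
        }
    }

    // Returns the count seen by other code after clear(): the list is empty but still usable.
    static int countAfterClear() {
        pendingJobs = new ArrayList<>(Arrays.asList("job-1", "job-2"));
        pendingJobs.clear();
        return pendingCount();
    }

    public static void main(String[] args) {
        System.out.println("NullPointerException after null: " + failsAfterNull());
        System.out.println("count after clear(): " + countAfterClear());
    }
}
```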

Conclusion

I hope that in this post, you have learned about clear() API and its performance impacts in detail.
