String concatenation is a topic most developers know well, yet it is often overlooked when it comes to code optimization. Code that works does not necessarily work well. Sometimes "it works" is enough, but ignoring performance usually leads to unsatisfactory results. Being a good developer means paying attention to the little things as well, and string concatenation is one of them.
The easiest way to concatenate two strings is to use the '+' sign. It's one line of code, straight and simple, and, most importantly, it works. Then we forget that the concatenation runs many times inside a loop, and after a while we start wondering why that part of our application keeps getting slower. The only thing that has changed is the amount of data fetched from the database and processed by our “for” loop.
Strings in Java are immutable, even though the code makes it look as if text is appended to an existing String. Under the hood, the compiler and the JVM do all the necessary work to make concatenation easy for developers.
Once a String is created, it can't be changed. Still, you don’t encounter any problems while using it, right? Well, some say it's slow, mostly because a new memory block is allocated with every addition.
If you only have to concatenate two strings, you will be OK, since only one additional object is created and the Java compiler optimizes the expression for you. This is explained in more detail later in the text. Now imagine you have 10000+ strings that need to be fetched from the database, parsed and then concatenated. This can result in 10000+ newly instantiated objects and a lot of executed operations.
Each time '+=' is called, a new String is created, which means that a block of memory is allocated to hold the old value of 'result' plus the value of the 'letter' variable. After that, the 'result' variable must be reassigned. Because every step copies everything accumulated so far, the total work grows quadratically with the number of strings. We also end up leaving more work for the garbage collector, since we have created a lot of objects that are no longer in use.
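As a minimal sketch of the pattern described above: the names 'result' and 'letter' follow the text, while the class name, method name and sample array are illustrative assumptions.

```java
class PlusConcatDemo {
    // Joins all letters with '+=': every iteration copies the old 'result'
    // plus the new 'letter' into a freshly allocated String.
    static String concatWithPlus(String[] letters) {
        String result = "";
        for (String letter : letters) {
            result += letter; // allocates a new String and reassigns 'result'
        }
        return result;
    }

    public static void main(String[] args) {
        String[] letters = {"a", "b", "c", "d"};
        System.out.println(concatWithPlus(letters)); // prints "abcd"
    }
}
```

The output is correct, but every intermediate value ("a", "ab", "abc") is a separate String that becomes garbage right after the next iteration.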
StringBuilder was created to reduce this overhead. StringBuilder is mutable, which means that calling its append() method with a string as a parameter concatenates in place, without creating a new String at every step, while still producing the same result. The only extra object created is the StringBuilder instance (together with its internal buffer, which is replaced when it fills up). We can instantiate StringBuilder in three ways:
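A minimal sketch, assuming the three ways meant here are the no-argument constructor, the constructor taking an initial capacity, and the constructor taking an initial String. The class, method and variable names are illustrative.

```java
class BuilderConstructorsDemo {
    static StringBuilder[] createBuilders() {
        // Empty builder with the default capacity of 16 characters.
        StringBuilder empty = new StringBuilder();
        // Empty builder with an explicit initial capacity.
        StringBuilder sized = new StringBuilder(100);
        // Builder pre-filled with a String; capacity is 16 plus the String's length.
        StringBuilder seeded = new StringBuilder("Hello");
        return new StringBuilder[] {empty, sized, seeded};
    }

    public static void main(String[] args) {
        for (StringBuilder sb : createBuilders()) {
            System.out.println("content='" + sb + "' capacity=" + sb.capacity());
        }
    }
}
```

If the approximate size of the final String is known in advance, passing it as the initial capacity avoids the intermediate buffer reallocations.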
The constructor with no parameters creates a StringBuilder with a buffer size of 16. This is the default capacity, meaning the builder can hold 16 characters before the buffer needs to grow. When the buffer is full, StringBuilder allocates a new array with twice the capacity of the previous one plus 2 additional positions.
If the current buffer has a capacity of 16, the new one will have a capacity of 34 (16 * 2 + 2). That is a lot better than regular string concatenation using the '+' sign, which allocates a new String each time it's called.
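The growth step can be observed with capacity(). This is a small sketch with illustrative names. The default capacity of 16 is documented, while the exact size after growing is an implementation detail of the JDK in use.

```java
class CapacityGrowthDemo {
    // Returns the capacity before and after appending one character
    // more than the default buffer can hold.
    static int[] capacitiesBeforeAndAfter() {
        StringBuilder sb = new StringBuilder();
        int before = sb.capacity();
        sb.append("12345678901234567"); // 17 characters, one more than the default capacity
        int after = sb.capacity();
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] caps = capacitiesBeforeAndAfter();
        System.out.println(caps[0]); // 16, the documented default
        // 34 is expected on OpenJDK (16 * 2 + 2); other implementations
        // only need to provide room for the 17 appended characters.
        System.out.println(caps[1]);
    }
}
```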
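For simple string concatenation outside of “for” or “while” loops, you don't need to use StringBuilder. The Java compiler will jump right in and optimize your '+' concatenation. If all of the substrings building the final String are known at compile time, the whole expression is folded into a single constant. Otherwise, the compiler replaces the '+' operators in the expression with a single StringBuilder (compilers targeting Java 9 and later emit an equivalent invokedynamic-based call instead). This is known as static string concatenation optimization, and the StringBuilder form has been available since Java 5.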
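The compile-time case can be checked with a reference comparison, because constant String expressions are interned. The sketch below uses illustrative names and is not taken from the original example application.

```java
class ConstantFoldingDemo {
    static final String PREFIX = "Hello, "; // a compile-time constant

    // All operands are constants, so the compiler folds this
    // into the single literal "Hello, World!".
    static String folded() {
        return PREFIX + "World" + "!";
    }

    // 'name' is only known at run time, so the concatenation
    // is performed when the method is called.
    static String runtime(String name) {
        return PREFIX + name + "!";
    }

    public static void main(String[] args) {
        // Reference comparison on purpose: both sides are the same interned constant.
        System.out.println(folded() == "Hello, World!"); // prints true
        System.out.println(runtime("World").equals("Hello, World!")); // prints true
    }
}
```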
Loops are different, since the Java compiler doesn't know what the final String will look like. The compiler will still replace '+' in the loop body with a StringBuilder, but a new StringBuilder is instantiated on every iteration. The best solution is to create one StringBuilder just before the loop and call append() inside the loop.
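A minimal sketch of the recommended pattern, with illustrative class, method and variable names:

```java
class BuilderLoopDemo {
    // Joins all letters using a single StringBuilder created before the loop.
    static String concatWithBuilder(String[] letters) {
        StringBuilder result = new StringBuilder(); // created once, outside the loop
        for (String letter : letters) {
            result.append(letter); // appends into the existing buffer
        }
        return result.toString(); // the final String is created only here
    }

    public static void main(String[] args) {
        String[] letters = {"a", "b", "c", "d"};
        System.out.println(concatWithBuilder(letters)); // prints "abcd"
    }
}
```

Only the builder, its internal buffer (replaced occasionally as it grows) and the final String are allocated, regardless of how many elements the loop processes.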
To back up everything stated above, we created a small example application and ran micro-benchmarks using JMH (Java Microbenchmark Harness). It’s an open-source tool that helps developers measure the performance of small parts of their application.
We used the following configuration of our JMH test:
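A configuration along these lines could look like the sketch below. Only the batch sizes (100 and 200) come from the text. The single-shot mode, the iteration counts, the fork count, the time unit and all class, field and method names are assumptions made for illustration, and the JMH library must be on the classpath.

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

@BenchmarkMode(Mode.SingleShotTime)            // assumed: time of a single (batched) operation
@OutputTimeUnit(TimeUnit.MILLISECONDS)         // assumed time unit
@Warmup(iterations = 5, batchSize = 100)       // iteration count is hypothetical
@Measurement(iterations = 10, batchSize = 200) // iteration count is hypothetical
@Fork(1)                                       // hypothetical number of forked JVMs
@State(Scope.Benchmark)
public class ConcatBenchmark {

    private String[] letters; // hypothetical sample data

    @Setup
    public void setUp() {
        letters = new String[] {"a", "b", "c", "d", "e", "f", "g", "h"};
    }

    @Benchmark
    public String plusOperator() {
        String result = "";
        for (String letter : letters) {
            result += letter;
        }
        return result; // returning the value keeps the JIT from discarding the work
    }

    @Benchmark
    public String stringBuilder() {
        StringBuilder result = new StringBuilder();
        for (String letter : letters) {
            result.append(letter);
        }
        return result.toString();
    }
}
```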
It’s good to run a couple of warm-up iterations so that we don’t get random results. We enlarged the batch size to 100 for warm-ups and 200 for the actual measurement, since our example string array is not that big. This option tells JMH that one operation consists of N invocations of the benchmark method, in our case 100 and 200.
The benchmark mode we chose measures the time of a single operation. The mode can be changed to show other kinds of measurements, from the average time it takes the benchmark method to execute to how many operations the code can complete in a given amount of time (throughput).
The fork option sets the number of separate execution environments, that is, how many fresh JVM processes the benchmark is run in.
The score column shows the time taken to execute an operation. In our test case lower is better, and the result table shows how much faster StringBuilder does its job.
The code is available on GitHub.