In part 1 of this series, we introduced a barebones string implementation in C++ using the basic_string
class and explored small string optimization with our string32
class. Small string optimization in string32
improves performance by avoiding dynamic memory allocation for strings with 15 or fewer characters. In this article, we will run benchmarks to demonstrate the advantages of this optimization. We will also compare our implementations to the performance of the libc++ std::string
implementation in the end.
The code was compiled with clang and we used the Google Benchmark library to run the benchmark tests. More information on how the library runs the benchmark tests can be found here. The key metric that we are focusing on in this article is computation time for the string implementations.
The tests were run on a 14-inch MacBook Pro with the following specifications:
Processor
8-core Apple M1 Pro (2.06 - 3.22 GHz)
14-core GPU
RAM
16 GB
Storage
257 GB (Available) / 512 GB SSD
Benchmark results
Constructor initialization
In our initial test, we compared the construction times of basic_string
and string32
across various sizes, starting from 1 and doubling each time until reaching 4096 characters.
The findings revealed that for string sizes between 1 and 8, string32
exhibited significantly faster initialization, outperforming basic_string
by approximately 3 to 4 times. However, for strings 16 characters and larger, string32
's performance declined, matching or even falling behind basic_string
. This shift at size 16 suggests that dynamic allocation, required for larger strings, is the primary factor for the slowdown.
To further validate this hypothesis, we conducted additional tests with finer size variations. The results clearly showed a performance drop at the 16-character mark, aligning with the threshold for Small String Optimization (SSO). Therefore, we can reasonably conclude that string32
's superior performance with smaller sizes is due to SSO.
For larger string sizes, string32
's marginally inferior performance can likely be attributed to its need for an additional conditional check to determine whether the string is small or large, and an extra operation to set the corresponding flag. While branch prediction might mitigate the impact of the conditional check during our benchmarks, these extra steps could result in a slight overhead, making string32
marginally slower than basic_string
for large strings.
size() and c_str()
The next methods we benchmarked were size()
and c_str()
. This evaluation followed the same procedure as the constructor test, examining string sizes ranging from 1 to 4096 characters. The performance of size()
and c_str()
exhibited a much more uniform pattern across both basic_string
and string32
, unlike the variation observed in constructor initialization.
The methods indicated a constant time complexity. While the graphs appear to have some spikes, looking at the low standard error and standard deviation it is clear that the time taken remains practically constant regardless of string size.
Interestingly, both methods for both string classes maintained a consistent average runtime of approximately 0.3 nanoseconds. This number is noteworthy because it matches the duration of a single clock cycle on the MacBook's high-performance core, calculated as 1 divided by 3.22 GHz (clock speed), which approximates to 0.3 nanoseconds. These findings align well with our expectations. Since both size()
and c_str()
are simple constant read operations for basic_string
and string32
, they should logically translate into single processor instructions.
Comparing to std::string
In our concluding benchmarks, we compare basic_string
and string32
against the libc++ implementation of std::string
. The expectation is not for string32
to match the performance of std::string
, as there are decades of optimizations and improvements not included in our relatively simple string32
class. However, by comparing the patterns in the performance trends, we can gain some insight on whether we are on the right track with our SSO implementation.
Constructor initialization
Examining the chart for string construction with sizes up to 4096, we notice a similar trend in performance between std::string
and string32
, especially at larger sizes. This similarity is probably due to dynamic allocation being the primary performance bottleneck for large strings, a necessary step both the string implementations share. Despite having a similar pattern, std::string
has a significant improvement in performance from size 256 onwards, and this is likely due to large string optimizations that apply to exceptionally large strings. However, this is out of the scope of this article. More relevantly, there are distinct trend variations for smaller sizes, particularly at 32 and below, which warrants further analysis.
Taking a closer look, we can see that the trends of string32
and std::string
look similar, except that the std::string
trend line appears to be displaced to the right. Notably, std::string
's performance declines at size 23, unlike string32
's at size 16. This implies that std::string
's SSO effectively handles sizes up to 22.
size() and c_str()
The benchmarks for size()
and c_str()
methods in std::string produced consistent results with basic_string
and string32
. The average execution time of approximately 0.3 nanoseconds suggests that both methods also complete in a single CPU clock cycle.
What’s next?
In this article we can conclude that SSO provides a significant improvement in performance for construction of smaller strings. In our own benchmarking, it was an improvement factor of 3-4 times, but the exact numbers will differ depending on the system and type of tests run.
A notable discovery is std::string
's SSO capacity to accommodate strings up to size 22 efficiently. For applications frequently handling strings within the 16 to 22 size range, this characteristic of std::string could yield significant performance improvements. In our next article, we will delve into replicating this behavior in our string implementation, aiming to support SSO for strings up to size 22.