A very nice blog post went by yesterday titled “The Little Things: The Missing Performance in std::vector”.

You should read it, but in case you don’t, the premise is that std::vector::push_back is wasteful when you’re appending a bunch of things to a vector, since you can reserve space for them all but it keeps checking if it needs to allocate more space on every loop iteration. The author calls for a std::vector::push_back_unchecked method that appends under the assumption that space is already allocated. This exposes Undefined Behaviour if you do it wrong, of course, and the author argues that this is both necessary and good. And I agree.

This blog, and my efforts in the C++ language space, are all around reducing the negative impact of Undefined Behaviour and eliminating memory-safety bugs. So the above may seem counterintuitive at first glance.

I recently argued that the nature of the C++ standard library is basically to expose Undefined Behaviour:

This is the nature of bare C++; it is as close to the hardware as possible for historical and good reasons. The C++ standard intentionally leaves no room for a lower level language (except hardware assembly), and it exposes all the complexity of the compiler through the std lib (e.g. type_traits). This is an “assembly language for systems programming”.

And in that same vein, adding std::vector::push_back_unchecked would expose the nature of the conceptual C++ machine, and allow authors to control that machine more effectively.

Undefined Behaviour is Good.

Chandler Carruth argues for Undefined Behaviour from a slightly different perspective in his CppCon 2016 talk, Garbage In, Garbage Out: Arguing about Undefined Behavior With Nasal Demons. Undefined Behaviour gives us performance, as we can see clearly from the push_back_unchecked proposal.

But that sentence above is not complete without a qualification which usually gets missed in the C++ world. Completed, it reads:

Undefined Behaviour is Good when it is Encapsulated.

I believe std::vector should expose that Undefined Behaviour along with, and because it exposes, many other Undefined Behaviours. However that also implies that std::vector is not suitable for use in most application code. It is a building block on which performant APIs can be built which encapsulate the inherent unsafety of its own API that leaks Undefined Behaviour all throughout.

FromIterator and Collect

Subspace provides the FromIterator concept and then implements this concept for sus::Vec and std::vector as well as all the types in the standard containers library.

Like most of the Subspace library, this is a reimplementation of the Rust FromIterator trait. If not already familiar with the trait and its many uses, the key points for us here are:

  • You can construct a container from another container, or an iterator.
  • The construction happens in a single atomic step, from the perspective of application logic, which is typically the iterator collect method.
    • See also the Rust collect for more code examples that will look similar in C++ but aren’t in the Subspace docs (yet).

By providing the ability to construct a container like a vector in an atomic operation from an arbitrary set of inputs, we provide the ability for the library to give a safe and simple API abstraction around a powerful operation which can be implemented entirely on top of APIs that expose Undefined Behaviour. Because of the atomic nature, the use of those unsafe APIs is fully encapsulated from the application layer, allowing the use of them in a controlled manner. Much like in how “safe Rust” applications are built on top of a stdlib full of unsafe Rust, yet retain memory safety for themselves, this creates the opportunity to build a safer security-critical C++ application without sacrificing performance. In fact, as we’ll see, we can gain performance.

Performance Optimizing FromIterator and Collect

The benchmarks provided in the aforementioned “Missing Performance in std::vector” article showed on Clang 15:

  • A 5x improvement when building a small to medium vector from a simple integer mapping.
  • A 1.3x improvement when building a large vector of the same.
  • A nearly 3x improvement when building a small to medium vector from a simple mapping function that unwraps a struct.
  • A 1.1x improvement when building a large vector of the same.

I was curious how FromIterator would compare with sus::Vec. If push_back_unchecked existed, the implementation of FromIterator for std::vector could absolutely make use of the method, with no risk of introducing Undefined Behaviour and memory safety bugs into the application as a result. For sus::Vec we have full access to the underlying data structure so we can do all the unsafe things internally that we want, and we should be able to see the same impact as with push_back_unchecked. Given that, how does FromIterator compare to the naive v.reserve(..); for (..) { v.push_back(..) } approach?

I copied the benchmarks from the above article and added them to the Subspace nanobench test suite.

At first, it was slow. Really slow. Vec does not define that it is empty after a move, instead the moved-from Vec is left invalid and Vec checks for use-after-move. Additionally, Vec tracks outstanding iterators and terminates if mutated instead of invalidating the iterator and proceeding with Undefined Behaviour. But the inner loop of FromIterator was checking these things over and over again. So I added some internal methods to break apart checks and implementation, and to split up methods for better inlining. Then I used local variables to avoid indirections through this-pointers.

The PR containing the changes and the benchmarks is https://github.com/chromium/subspace/pull/337.

Here’s the results of building sus::Vec via FromIterator compared to building a std::vector, on my M1 Mac with Clang 18:

  • A nearly 4.5x improvement when building a small to medium vector from a simple integer mapping.
  • A 1.7x improvement when building a large vector of the same.
  • A 1.5-2x improvement when building a small to medium vector from a simple mapping function that unwraps a struct.
  • A 1.03x improvement when building a large vector of the same.

Comparing FromIterator on std::vector is done to verify the iterator approach isn’t changing the nature of the operation, and indeed we see that putting the push_back loop into the collect() operation has no visible impact. But the structure enables something more.

benchmark n std::vector std::vector + FromIterator sus::Vec + FromIterator std::vector / sus::Vec
copy + mult 1,000 426.00 +- 0.85 ns 452.67 +- 2.7 ns 98.45 +- 0.1 ns 432.7%
copy + mult 100,000 39.144 +- 0.04 us 37.95 +- 0.04 us 8.75 +- 0.41 us 447.4%
copy + mult 10,000,000 6.14 ms +- 0.04 ms 6.33 +- 0.03 ms 3.67 +- 0.03 ms 167.3%
to_index 1,000 351.07 +- 0.35 ns 350.30 +- 0.35 ns 176.97 +- 1.24 ns 198.4%
to_index 100,000 31.20 +- 0.031 us 31.20 +- 0.031 us 20.56 +- 0.88 us 151.8%
to_index 10,000,000 7.83 +- 0.031 ms 8.19 +- 0.057 ms 7.65 +- 0.038 ms 102.4%

The last column shows the relative speedup of the sus::Vec + FromIterator approach is compared to std::vector.

We have a similar speed improvement to what we saw by using push_back_unchecked, and without any Undefined Behaviour exposed to application developers.