I read a blog post
about #embed
and initializer lists today, and was pleasantly surprised to find a reference to
P1144R6 Object relocation in terms of move plus destroy which I was previously
unaware of. The Subspace library makes use of
trivially relocating objects when possible, and it’s nice to see work continuing toward bringing
this idea into the core language.
Let’s take a look at trivially relocatable types in C++.
- Trivially Relocatable for Libraries
- Clang and libc++
- Subspace
- Opting into trivially relocatable
- Using trivially relocatable
- Thanks
Relocatable objects are a key property of Rust types and allow for efficient code generation. When
an object is relocatable, it means that the combination of “move to destination” and
“destroy source” can be combined into a single memcpy()
. And LLVM loves to optimize
memcpy()
s, so this leads to some really good code generation.
In Rust, all moves are relocations. And any use of an lvalue that is not Copy
will move
(relocate) the object.
#[derive(Copy, Clone)]
struct Copy(i32);
struct NoCopy(i32);
fn main() {
let c = Copy(0);
let n = NoCopy(0);
let _c2 = c; // Copies `c` via memcpy().
let _n2 = n; // Moves `n` to `n2` via memcpy().
// `n` is no longer accessible, and will not be dropped, aka its
// "destructor" will be skipped in C++ terms.
// let _n3 = n; // Does not compile.
}
C++ makes the story of moving and relocation more tricky in a few ways:
- When an object
A
is moved from (i.e. another objectB
accesses the objectA
as an rvalue reference), theA
object may end up in any possible state as defined byB
. The only hard requirement is that it’s valid to runA
’s destructor, and itsoperator=()
if it has one (what is called a “valid but undefined state”). This makes it very difficult to understand what moving any object does. As it’s all defined by the receiver, it can even mean different things with the same types but in different places. - Because of the above, a moved-from object still has to be destroyed, so if it was an lvalue, C++ has to keep it around and you can keep referring to it, possibly erroneously, after it was moved from. This makes relocating impossible in the general case without significantly changing the well-established concepts of rvalue references and destructors.
- Using an lvalue generates an “lvalue reference” to it. When needed, C++ will automatically copy from an lvalue reference for types that support copying. But C++ is unable to promote an “lvalue reference” into an “rvalue reference” which is how the language expresses something that can be moved from. Thus constructing or assigning from an lvalue that can’t be copied will simply fail to compile instead of moving from the lvalue.
struct Copy {
Copy() = default;
Copy(const Copy&) = default;
};
struct NoCopy {
NoCopy() = default;
NoCopy(NoCopy&&) = default;
};
int main() {
Copy c;
NoCopy n;
Copy c2 = c; // Automatically copies from the implied `Copy&` reference.
NoCopy n2 = n; // Cannot copy from `NoCopy&` and will not choose to move it.
}
We can see the implied NoCopy&
in the error produced by GCC, which tries to call NoCopy(x)
where
x
is a NoCopy&
. The overload resolution promotes the NoCopy&
to a const NoCopy&
in order to
match the copy constructor. There’s no similar promotion from NoCopy&
to NoCopy&&
which would
match the move constructor, so overload resolution fails to find something callable.
<source>:15:17: error: use of deleted function 'constexpr NoCopy::NoCopy(const NoCopy&)'
15 | NoCopy n2 = n; // Cannot copy from `NoCopy&` and will not choose to move it.
| ^
The auto_ptr type attempted to play with moving
on assignment without involving rvalue references and std::move()
but to do so it had to do a move
when copying, which breaks const correctness, was confusing, and quite incompatible with generic
code. The type would advertise it’s copyable (by implementing copy operations) but would move
instead. A language level support for moving on assignment when unable to copy could produce a much
better result. It seems unlikely to happen due to risks of breaking backward source-compatability,
though ideally it would only kick in at times where code would not compile at all today.
The one case where a type is trivially relocatable in the language today is when it is trivially
copyable, as this is defined to allow the type to be copied with memcpy()
. From that we derive
that a type can be, more precisely, trivially relocatable if it is
trivially-movable and trivially-destructible.
Trivial destructors are a no-op, and trivially moving depends only on trivial operations, which
perform the same function as a trivial copy. This handles things like primitives and structs of
primitives, but we’d like to handle more cases.
Trivially Relocatable for Libraries
While we can’t expect that a = b
could do a relocation in C++, as the author of a library like
Subspace, we can still perform relocation instead of
move + destroy inside our library types.
Clang and libc++
libc++ does similar things. Clang provides the [[clang::trivial_abi]]
attribute for marking a
type as trivial for the purposes of calls. This allows the destructor of a temporary object to be
moved into a function callee, allowing its contents to be passed by value instead of by reference
and avoiding the associated dereferences inside the callee.
@Quuxplusone proposed the builtin
__is_trivially_relocatable(T)
to Clang in 2018, which was in
review for 2 years 🤯, and not accepted. He wrote a blog post about it also.
Then in 2021, @ssbr re-proposed the __is_trivially_relocatable(T)
builtin, by making it refer to types that are annotated by the
Clang attribute [[clang::trivial_abi]]
.
Since std::unique_ptr
is marked [[clang::trivial_abi]]
,
this also makes it considered as trivially-relocatable. This more narrow implementation of
__is_trivially_relocatable(T)
was ultimately merged in early 2022.
Extending the definition of [[clang::trivial_abi]]
in this way should allow the libc++ library to
then perform relocations instead of move + destroy for annotated types. However the
implementation of this in std::vector by
@ssbr has not yet received the necessary support to land, about a year
later now, for reasons that are not clear to me.
Subspace
In subspace we have a public library-only implementation of a concept similar to the proposed
__libcpp_is_trivially_relocatable
for libc++. We call that
concept sus::mem::relocate_by_memcpy<T>
at this time, though names are subject to change until explicitly stabilized.
The implementation of the concept looks like the following:
template <class... T>
concept relocate_by_memcpy = __private::relocate_by_memcpy_helper<T...>::value;
With its inner helper implementation:
template <class... T>
struct relocate_by_memcpy_helper final
: public std::integral_constant<
bool,
(... &&
(!std::is_volatile_v<std::remove_all_extents_t<T>>
&& sus::mem::data_size_of<std::remove_all_extents_t<T>>()
&& (relocatable_tag<std::remove_all_extents_t<T>>::value(0)
|| (std::is_trivially_move_constructible_v<std::remove_all_extents_t<T>> &&
std::is_trivially_move_assignable_v<std::remove_all_extents_t<T>> &&
std::is_trivially_destructible_v<std::remove_all_extents_t<T>>)
#if __has_extension(trivially_relocatable)
|| __is_trivially_relocatable(std::remove_all_extents_t<T>)
#endif
)
)
)
> {};
The inner helper trait uses std::remove_all_extents_t<T>
everywhere instead of just T
and that
is so that we produce the same answer for T
and T[]
and T[][]
, etc. Conceptually we can ignore
it here, and we will strip it out in the snippets below.
Let’s walk though what the inner helper trait is doing from the bottom to the top.
First, we ask the compiler if the type is trivially relocatable, if it can tell us. This refers
to types annotated as [[clang::trivial_abi]]
in the Clang compiler thanks to the work of
@ssbr mentioned above. So T
is relocate_by_memcpy
if:
__is_trivially_relocatable(T)
Second, as we noted above any type that is trivially-movable and trivially-destructible
can be relocated by memcpy() operation instead of the move + destroy operation. This is why T
is relocate_by_memcpy
if:
(std::is_trivially_move_constructible_v<T> &&
std::is_trivially_move_assignable_v<T> &&
std::is_trivially_destructible_v<T>)
We require move-constructing and move-assigning to both be trivial as trivially relocatable must
mean the type can be trivially relocated under both scenarios. This is also brought up in P1144R6 since
std::pmr::vector
is not trivially relocatable under move-assignment.
Next we start to get into our library definitions of relocatability with relocatable_tag<T>
. The
tag this is looking for is generated by a tool that the Subspace library provides for marking a type
as trivially relocatable. We’ll get back to how that works later. For now it’s enough to say that
our type T
has opted into being trivially relocatable in a manner that depends only on standard
C++ and thus works across all compilers. So T
is relocate_by_memcpy
if the author has explcitly
declared it to be so, which is found through the relocatable_tag
helper:
relocatable_tag<T>::value(0)
Any single one of the above conditions tells Subspace that T
may be trivially relocatable. But
regardless of which was true, the type T
must also have a non-zero “data size”, which is checked
by sus::mem::data_size_of<T>()
.
sus::mem::data_size_of<T>()
Data size
The “data size” of a type is an idea introduced by @ssbr and described
in the documentation
for sus::mem::data_size_of<T>()
. It was then used in his Rust RFC to describe this same
concept to Rust, in order to allow Rust to relocate C++ objects soundly.
We’ll try to describe it here simply. C++ has a concept of the size of an object, which is returned
by sizeof()
. The size of an object may include padding, and of particular interest here is the
tail padding.
struct S {
int32_t a; // 4 bytes.
int8_t b; // 1 byte.
// 3 bytes of tail padding.
};
For the above type, sizeof(S)
is 8. Why is there tail padding making the size 8
instead of 5? The answer is alignment and arrays. In particular, incrementing a pointer by the size
of a type must produce another pointer that is properly aligned for that type. Another way to put
it is that the position of each object in an array must be aligned, so the size must be a multiple
of its alignment.
S arr[] = { S(1, 1), S(2, 2), S(3, 3) };
S* p = &arr[0]; // A well-aligned pointer.
p += 1; // Increments the pointer by the size of `S`. Must be well-aligned again.
In this case the alignment of S
is going to match the alignment of its most-restricted member,
which is the int32_t
. The size and alignment of int32_t
is 4, so the alignment of S
is 4.
Then we know the size of S
must be a multiple of 4 (its alignment), that is greater than or
equal to 5 (the size taken by its members). The result is that the size of S
is 8,
with 3 bytes of tail padding.
Given that we understand the size of a type, the “data size” is the actual size of the data
inside the type excluding tail padding. So the data size of S
above would be 5, which is
the number of bytes occupied by its members int32_t a
and int8_t b
.
Why do we care about data size?
We care about “data size” because when the “data size” of a type differs from its “size”, it becomes a potentially overlapping type.
It’s very common to write code that will memcpy()
a type based on its size. Something like:
S s;
S s2;
memcpy(&s2, &s, sizeof(S));
This works great! Until it doesn’t! There’s one way this could go wrong in older versions of C++, and a new way for this to go wrong in C++20.
Base classes
The empty base class optimization allows a class type’s size to be treated as 0 when it is inherited from. Typically C++ does not have zero-sized types. Every object must have a unique address, which means it must have a size of at least 1 byte.
struct S {};
static_assert(sizeof(S) == 1);
But the language relaxes this specifically for a base class. Here the size of T
is still 1, even
though the size of S
is also 1:
struct S {};
struct T : public S {};
static_assert(sizeof(T) == 1);
And to be clear that it’s the base class which has a zero size, in the following, the size
of T
becomes 4, which is the size of its member. There is no extra byte for its base class S
:
struct S {};
struct T : public S { int32_t a; };
static_assert(sizeof(T) == 4);
This presents the first case where the data size of an object matters. If we were to
memcpy(&s, &from, sizeof(S))
into an S*
but it so happens that the object is a T
, we would
overwrite one byte of T::a
with garbage!
The Clang and GCC (but not MSVC 19) compilers have taken this further, and will generally make use of the tail padding in any base class that is not a Standard-Layout type.
Recall our earlier example of a struct with tail padding. We mark the b
member as private
in
order to make the type not Standard-Layout:
struct S {
int32_t a;
private:
int8_t b;
// 3 bytes of tail padding.
};
static_assert(sizeof(S) == 8);
Then if we make a subclass of S
, the compiler is entitled to place members into the tail padding
of S
, as we see in T
below:
struct T : S {
int8_t c;
// 2 bytes of tail padding.
};
static_assert(sizeof(S) == 8);
static_assert(sizeof(T) == 8);
The size of T
is the same as S
because the member c
has been placed inside the tail padding of
the base class S
. If we were to memcpy(&s, &from, sizeof(S))
into an S*
that is pointing to
a T
subclass, we would overwrite one byte of T::c
with garbage, producing a memory safety bug!
However if we memcpy(&s, &from, sus::mem::data_size_of<S>())
,
then we copy only 5 bytes (the int32_t a
and int8_t b
) into s
and avoid clobbering any
subclass members that may exist in its tail padding.
The [[no_unique_address]]
attribute
C++20 introduces the [[no_unique_address]]
attribute which can appear on a class member
declaration. It tells the compiler to allow the member’s tail padding to be used by later
members.
This new attribute allows us to compose a new type with a member S
and which makes use of the tail
padding in S
. Here we use the same non-Standard-Layout type S
from the previous section.
We add another member below S
but the size of T
is again no larger than S because the
member T::c
has been located in the tail padding of S
.
struct T {
[[no_unique_address]] S s;
int8_t c;
// 2 bytes of tail padding.
};
static_assert(sizeof(S) == 8);
static_assert(sizeof(T) == 8);
The side effect of the [[no_unique_address]]
attribute is that memcpy()
can do the wrong thing
again. If we memcpy(&s, &from, sizeof(S))
into an S
that is the embedded T::s
member,
we will copy the tail padding of from
into T::c
, clobbering its value with garbage and causing
a memory safety bug.
Here again, if we memcpy(&s, &from, sus::mem::data_size_of<S>())
then we copy
only 5 bytes (the int32_t a
and int8_t b
) into s
and avoid clobbering anything
in its tail padding.
Since this behaviour is all implementation defined, we have unfortunate differences between
compilers, which can make testing for behaviour tricky. As in the
base class example, MSVC 19
does not make use of the tail padding in S
and the size of T
will be 12 as a result. This is even true
when using the compiler-specific [[msvc::no_unique_address]]
attribute instead.
Sorry but memcpy() with sizeof(T) is dangerous
The outcome of the above is that memcpy(dest, src, sizeof(T))
is dangerous in generic code, and
even more so in C++20. The more correct thing to do is
memcpy(dest, src, sus::mem::data_size_of<T>())
.
The Limits of Data size in a Library
For some types, we cannot determine a data size. In particular, a union’s data size is dynamic depending on its active member. If we could enumerate the members of a union, we could use the maximum data size of all its members, but that is beyond the scope of what a library can achieve unfortunately.
For that reason, the implementation of sus::mem::data_size_of<T>()
on a union type
returns 0.
And since a data size of 0 is returned for an unknown data size, sus::mem::relocate_by_memcpy
requires a non-zero data size.
Volatile and trivially relocatable
Popping the stack, we were walking through our implementation of sus::mem::relocate_by_memcpy
.
The first condition checked is if the type is volatile
. A volatile
type can not be copied
byte-by-byte with memcpy()
without introducing the chance of tearing. So these types
are strictly excluded. This also helps with defining trivial relocatable for classes with volatile
members as we will see.
Opting into trivially relocatable
In Clang, a type may opt into being trivially relocatable with the class annotation
[[clang::trivial_abi]]
. However there are two important things missing from this attribute:
- The attribute only works on Clang. A standard library should work well across all compilers.
- The attribute can’t take template parameters into account.
While [[clang::trivial_abi]]
works for std::unique_ptr<T>
, it does so because the data member of
the class is T*
and a pointer is always trivially relocatable. A simple example where we can not
use the attribute would be:
template <class T>
struct S {
T t;
};
Here the struct S
is trivially relocatable if T
is. But marking the type with
[[clang::trivial_abi]]
could introduce Undefined Behaviour when T
is not, as it imples the
object’s this
pointer may change during the object’s lifetime and this would break any
self-referential or external pointers managed by and pointing to the object itself. For example, the
unfortunately common “Observer” or “Client” pointer patterns where the destructor unsets pointers
to itself.
So this is where the proposal
P1144R6
does something important by adding a boolean expression to the [[trivially_relocatable]]
attribute that is not present in [[clang::trivial_abi]]
.
This makes it possible to conditionally apply the attribute to different template instantiations of
a template type when opting into being trivially relocatable explicitly.
However, the proposal also automatically infers trivially relocatable from its members, so we wouldn’t need the attribute at all in the above example. We need a more complex example to see when this matters:
template <class T>
struct S : InheritMyMoveOperations<T>, InheritMyDestructor<T> {
};
Here neither InheritMyMoveOperations
nor InheritMyDestructor
have enough information to
determine if the type S
should be trivially relocatable. The ability for S
to be trivially
relocated must be determined based on the interaction of the move operations and destructor, of
which only the author of S
has the full picture. A similar problem occurs for data
members that provide implementation details and which may not define their own move operations or
destructors, as is commonly done with a union member.
In this example S
is trivially relocatable, since the destructor is a no-op after being moved
from. However, this is something that can only be determined by manually reading the implementation;
the compiler cannot directly determine this based on the types of its data members. Thus it is up to the
author to vouch for the type S
being trivially relocatable:
template <class T>
struct S {
S() : t(T()) {}
S(S&& o) : moved_from(o.moved_from) {
if (!o.moved_from) {
new(&t) T(sus::move(o.t));
o.t.~T();
o.moved_from = true;
}
}
~S() {
// No-op if we were moved from.
if (!moved_from)
t.~T();
}
bool moved_from = false;
union {
T t;
};
}
};
We would like to use something like the [[trivially_relocatable]]
attribute on S
to opt into
being trivially relocatable based on the knowledge of our implementation:
template <class T>
struct [[trivially_relocatable(sus::mem::relocate_by_memcpy<T>)]] S {
...
};
The Subspace library provides a mechanism to do so, but at the library level instead of in the compiler. We provide 4 macros that can opt a class type into being trivially relocatable.
sus_class_trivially_relocatable(unsafe_fn, types…)
By using the sus_class_trivially_relocatable()
macro in a
class definition, the class is marked unconditionally as trivially relocatable. It receives as
arguments a list of types, which should typically be the types of all data members in the class in
order to assert that they are all trivially relocatable as well.
This is similar to the [[clang::trivial_abi]]
attribute, and whenever it appears it would be ideal
to also mark the class [[clang::trivial_abi]]
. By specifying both, the type will:
- Be trivial for the purpose of passing under Clang.
- Be opted into trivial relocation in the Subspace library across all compilers.
The macro must receive the unsafe_fn
marker type as its first parameter to indicate
that this requires careful scrutiny. The author declares that move + destroy can be done through
memcpy()
and it is up to them to get that correct. If the move constructor, move assignment, or
the destructor must run for correctness, this would introduce bugs and possibly memory safety
bugs and Undefined Behaviour.
Since the macro requires that the types are trivially relocatable, it makes sense to use in non-template classes. Typically the type of every non-static data member would be passed to the macro.
struct sus_if_clang([[clang::trivial_abi]]) S {
Thing<int> thing;
int i;
sus_class_trivially_relocatable(
unsafe_fn,
decltype(thing),
decltype(i));
};
sus_class_trivially_relocatable_unchecked(unsafe_fn)
The simplest but most risky macro is sus_class_trivially_relocatable_unchecked()
. This macro is like
sus_class_trivially_relocatable()
but without the additional help of the assertion
against the member types. When using this macro, the type should also be annotated with the
[[clang::trivial_abi]]
attribute.
struct sus_if_clang([[clang::trivial_abi]]) S {
Thing<int> thing;
int i;
sus_class_trivially_relocatable_unchecked(unsafe_fn);
};
sus_class_trivially_relocatable_if_types(unsafe_fn, types…)
The format of the sus_class_trivially_relocatable_if_types()
macro is just like
sus_class_trivially_relocatable()
but if any type given to the macro is not trivially relocatable,
the containing class will also not be.
Specifically, this allows a type to opt into being trivially relocatable if all of its members are trivially relocatable, including template parameter types, and to avoid incorrectly being marked trivially relocatable if any member is not.
This macro is probably only worth using in a template, as otherwise the types are either known to
be trivially relocatable or to not, and the sus_class_trivially_relocatable()
macro
could be used in the former case. And since the condition can evaluate to false, the use of
[[clang::trivial_abi]]
on such a class type would be a bug.
template <class T>
struct S {
Thing<T> thing;
T t;
sus_class_trivially_relocatable_if_types(
unsafe_fn,
decltype(thing),
decltype(t));
};
The behaviour of sus_class_trivially_relocatable_if_types()
is much like the extensions to
the compiler proposed in
P1144R6.
sus_class_trivially_relocatable_if(unsafe_fn, bool)
The sus_class_trivially_relocatable_if()
macro receives a boolean argument that will be
constant evaluated and used to determine if the type is ultimately marked as trivially relocatable
or not. This is useful when the condition is more complex than just whether the members of the type
are themselves trivially relocatable, but the caller can make use of
sus::mem::relocate_by_memcpy<T>
to check members as well.
Since the condition can evaluate to false, the use of [[clang::trivial_abi]]
on such a class type
would be a bug.
template <class T>
struct S {
Thing<T> thing;
T t;
sus_class_trivially_relocatable_if(
unsafe_fn,
StuffAbout<T> &&
sus::mem::relocate_by_memcpy<decltype(thing)> &&
sus::mem::relocate_by_memcpy<decltype(t)>);
};
The sus_class_trivially_relocatable_if()
macro is most similar to the proposed
[[trivially_relocatable(bool)]]
attribute in
P1144R6.
Using trivially relocatable
Wow that turned into a lot more text than I thought it would. Finally we can talk about what all of this machinery is for. How we use trivial relocation in Subspace.
Since we can’t change the language itself, we can’t use trivial relocation unless we have control over the execution of the destructor. As such, Subspace provides or does the following.
swap(a, b)
template <class T>
requires(sus::mem::Move<T>)
constexpr void swap(T& lhs, T& rhs) noexcept;
The sus::mem::swap(a, b)
function will swap the contents of a
and b
by using memcpy()
if the
objects’ type is sus::mem::relocate_by_memcpy()
. This is mentioned in P1144R6 to show
large binary-size improvements for any algorithm that is
implemented with swap, and we can
see the same improvements from Subspace’s
sus::mem::swap()
. Of course, this function will only copy sus::mem::data_size_of<T>()
many
bytes to avoid clobbering unrelated types in the process.
sus::Vec
The sus::Vec
(technically sus::containers::Vec
) type will avoid moving each element in the
vector’s storage when resizing if the types are sus::mem::relocate_by_memcpy()
. When that is the
case it will simply realloc()
to resize the memory which copies the contents of the memory to the
new allocation. And sometimes this doesn’t need to copy at all if the allocation did not move!
This is also mentioned in P1144R6 where it claims a 3x speedup for the same.
Future work
We’ll continue to take advantage of trivially relocatable whenever possible. In general it requires two considerations to be useful:
- Control over the source object’s lifetime, to avoid running its destructor after relocating.
- A preexisting object in the final destination, to avoid having to run a constructor before relocating.
As such, outside of swap, this mostly comes up in containers. We should be able to leverage it again in structures like a flat hash map, in sorting, or inserting into a vector.
Thanks
I really want to thank @Quuxplusone for his work on
P1144R6 as I was able to
generate both fixes and improvements to the Subspace library while writing this blog post and
considering his proposal work. I hope that it will make its way into the language in a way
that’s maximally useful (with the boolean argument in [[trivially_relocatable]]
).
I also want to thank @ssbr for his keen insights on “data size” which
have provided for a sound implementation of trivial relocation in sus::mem::swap()
.
And thanks to @zetafunction for his suggestions and review of this post.