Move semantics in C++ and Rust: The case for destructive moves

For value-oriented programming languages, move semantics present a big step forward in both optimization and representing uniqueness invariants. C++ has chosen the path of non-destructive moves, where moved-from variables are still usable (albeit usually in an unspecified state). Rust, on the other hand, uses destructive moves, where the moved-from variable can no longer be used. I’ll introduce both approaches in a little more detail and present some issues with non-destructive moves. Finally, I will present what C++ could have looked like with destructive moves.

Move semantics in C++ (simplified)

Value categories

  • An lvalue is, simply put, a variable; a memory address with a name.
  • An xvalue is like an lvalue, but we declare that the resources that this variable owns may be transferred to a new owner.
  • A prvalue is a temporary value without a name.
  • A glvalue (mixed) is either an lvalue or an xvalue.
  • An rvalue (mixed) is either an xvalue or a prvalue.
auto i = std::string{"value categories"};
// `i` is an lvalue
// `std::move(i)` is an xvalue
// `static_cast<std::string&&>(i)` is an xvalue
// `std::string{"value categories"}` is a prvalue (pure rvalue)
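
These categories can be checked at compile time. Here is a minimal sketch, assuming C++17: `decltype((expr))` yields `T&` for an lvalue, `T&&` for an xvalue, and plain `T` for a prvalue. The names `Category`, `category_of` and `CATEGORY_OF` are made up for this illustration; they are not part of the standard library.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical helper names, introduced only for this sketch.
enum class Category { lvalue, xvalue, prvalue };

// Classify the type produced by decltype((expr)).
template <typename T>
constexpr Category category_of() {
    if constexpr (std::is_lvalue_reference_v<T>) {
        return Category::lvalue;   // decltype((expr)) is T&
    } else if constexpr (std::is_rvalue_reference_v<T>) {
        return Category::xvalue;   // decltype((expr)) is T&&
    } else {
        return Category::prvalue;  // decltype((expr)) is plain T
    }
}

// The double parentheses make decltype look at the expression, not the declared type.
#define CATEGORY_OF(expr) category_of<decltype((expr))>()

inline void check_value_categories() {
    std::string i{"value categories"};
    static_assert(CATEGORY_OF(i) == Category::lvalue);
    static_assert(CATEGORY_OF(std::move(i)) == Category::xvalue);
    static_assert(CATEGORY_OF(static_cast<std::string&&>(i)) == Category::xvalue);
    static_assert(CATEGORY_OF(std::string{"value categories"}) == Category::prvalue);
}
```

Nothing is actually moved here: `decltype` does not evaluate its operand, so `i` keeps its contents.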

rvalue references

struct MyData
{
std::string data1;
std::string data2;
MyData() noexcept = default;
// this is (basically) what the compiler will generate for you
// never write these by hand unless you're managing resources
// copy constructor
MyData(const MyData& other)
: data1{other.data1}
, data2{other.data2}
{}
// copy assignment
MyData& operator=(const MyData& other) {
data1 = other.data1;
data2 = other.data2;
return *this;
}
// move constructor
MyData(MyData&& other) noexcept
: data1{std::move(other.data1)}
, data2{std::move(other.data2)}
{}
// move assignment
MyData& operator=(MyData&& other) noexcept {
data1 = std::move(other.data1);
data2 = std::move(other.data2);
return *this;
}
};

Classes that manage resources, like std::vector<T>, std::string, will usually do the following in their move constructors: instead of allocating new memory, they will take the already allocated buffer from the rvalue they’re being constructed from, and leave some valid value in its stead. Move assignment will usually simply swap the allocated resources with the rvalue, where they will be freed with the moved-from rvalue after the assignment call.

template <typename T>
class almost_vector {
T* buffer = nullptr;
T* data_end = nullptr;
T* buffer_end = nullptr;
public:
almost_vector() noexcept = default;
almost_vector(const almost_vector& other)
{
// allocate buffer, copy elements
}
almost_vector& operator=(const almost_vector& other) {
// allocate new buffer, copy elements
// swap the buffers
// deallocate the old buffer
return *this;
}
// the move constructor will do something like this
almost_vector(almost_vector&& other) noexcept
{
std::swap(buffer, other.buffer);
std::swap(data_end, other.data_end);
std::swap(buffer_end, other.buffer_end);
}
// move assignment will do something like this
almost_vector& operator=(almost_vector&& other) noexcept {
std::swap(buffer, other.buffer);
std::swap(data_end, other.data_end);
std::swap(buffer_end, other.buffer_end);
return *this;
}
};
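
The same buffer hand-over can be observed with the real std::vector. This is only a sketch: the function name `move_steals_buffer` is made up, and the pointer comparison relies on the way the mainstream standard libraries implement the move constructor.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: reports whether move construction reused the heap buffer.
inline bool move_steals_buffer() {
    std::vector<int> source{1, 2, 3};
    const int* original_buffer = source.data();

    // No allocation and no element copies: the target adopts the source's buffer.
    std::vector<int> target(std::move(source));

    // The moved-from vector is left empty; the elements now live in target.
    return target.data() == original_buffer && source.empty() && target.size() == 3;
}

inline void check_buffer_hand_over() {
    assert(move_steals_buffer());
}
```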

There is a non-intuitive side to rvalue references. For example, variables that are rvalue references become lvalues when used in expressions! Also, when you write std::move(data), the expression actually does nothing on its own; it is merely a cast to an rvalue reference.

void foo(std::string data);
void bar() {
std::string data;
std::string&& data_ref = std::move(data);
foo(data_ref); // data_ref has a name, so it is an lvalue: this will copy!
foo(std::move(data_ref)); // this moves
}
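
To see which overload a named rvalue reference really selects, here is a small sketch. The overload pair `picked` and the two helper functions are hypothetical names for this illustration only.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical overload pair that reports which reference kind was chosen.
inline std::string picked(const std::string&) { return "lvalue overload"; }
inline std::string picked(std::string&&) { return "rvalue overload"; }

inline std::string pass_named_reference() {
    std::string data{"payload"};
    std::string&& data_ref = std::move(data);
    return picked(data_ref);             // a named reference is an lvalue expression
}

inline std::string pass_recast_reference() {
    std::string data{"payload"};
    std::string&& data_ref = std::move(data);
    return picked(std::move(data_ref));  // the cast turns it into an xvalue again
}

inline void check_named_rvalue_reference() {
    assert(pass_named_reference() == "lvalue overload");
    assert(pass_recast_reference() == "rvalue overload");
}
```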

There exists a third kind of reference in C++ aside from lvalue references and rvalue references: the forwarding reference. In function templates, T&& becomes a forwarding reference instead of an rvalue reference when T is a template parameter deduced for that function, and auto&& is always a forwarding reference. Forwarding references preserve the value category of the expression they’re initialized with, and std::forward carries that category along when they are passed on to other functions.

std::string baz(std::string);
template <typename T>
struct Templated {
// t is an rvalue reference
void foo(T&& t) {}
// u is a forwarding reference
template <typename U>
void bar(U&& u) {
// forward to another function
// x is a forwarding reference
auto&& x = baz(std::forward<U>(u));
}
};
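
What std::forward buys us is easiest to see next to a version that leaves it out. A minimal sketch; `seen_as`, `relay` and `relay_without_forward` are names I made up for it.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical overload pair reporting the value category that arrives.
inline std::string seen_as(const std::string&) { return "lvalue"; }
inline std::string seen_as(std::string&&) { return "rvalue"; }

// u is a forwarding reference; std::forward restores the caller's value category.
template <typename U>
std::string relay(U&& u) {
    return seen_as(std::forward<U>(u));
}

// Without std::forward, u is just a named variable and therefore always an lvalue.
template <typename U>
std::string relay_without_forward(U&& u) {
    return seen_as(u);
}

inline void check_forwarding() {
    std::string text{"forward me"};
    assert(relay(text) == "lvalue");
    assert(relay(std::move(text)) == "rvalue");  // only a cast, nothing is moved from
    assert(relay(std::string{"temporary"}) == "rvalue");
    assert(relay_without_forward(std::string{"temporary"}) == "lvalue");
}
```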

std::move

void foo(T&&);
void bar() {
T value;
// this is a noop
std::move(value);
foo(std::move(value));
// this is the same thing, only cryptic
T value2;
foo(static_cast<T&&>(value2));
}
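
A quick way to convince yourself that std::move is only a cast is to discard its result and look at the variable afterwards. A sketch with made-up function names:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: content of the variable after a bare, discarded std::move.
inline std::string after_bare_move() {
    std::string value{"still here"};
    static_cast<void>(std::move(value));  // a cast only: no constructor or assignment runs
    return value;
}

// Hypothetical helper: the move happens once a move constructor receives the xvalue.
inline std::string after_move_construction() {
    std::string value{"still here"};
    std::string target(std::move(value));  // the move constructor does the real work
    return target;
}

inline void check_move_is_a_cast() {
    assert(after_bare_move() == "still here");
    assert(after_move_construction() == "still here");
}
```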

Moved-from states

The C++ standard library chooses to keep the moved-from variables in a valid, but unspecified state; this means that we can reuse the variable, we just cannot rely on its contents.

For user-declared types, the only real requirement is that the destructor of a moved-from variable must run without causing any issues for the rest of the program. Any invariants of the type may be broken, and calling any functions on such a variable can cause undefined behavior. It is just a matter of convention (and convenience) that we usually don’t write types that behave this way.
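
For standard library types, "valid but unspecified" means that operations without preconditions (assignment, clear(), the destructor) are still fine after a move. A small sketch with a made-up function name:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: reuse a std::string after it has been moved from.
inline std::string reuse_after_move() {
    std::string name{"a string that is long enough to be heap allocated"};
    std::vector<std::string> sink;
    sink.push_back(std::move(name));  // name is now valid, but its content is unspecified

    // Do not read name here and expect a particular value.
    name = "fresh value";             // assignment has no preconditions, so this is fine
    return name;
}

inline void check_reuse_after_move() {
    assert(reuse_after_move() == "fresh value");
}
```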

Move semantics in Rust

In Rust, a move is always a plain bitwise copy of the value, and it is destructive. After we move from a variable (even if only on some code paths), that variable becomes unusable: the compiler rejects any later use of it.

#[derive(Clone)]
struct MyData {
boxed_uint: Box<u64>,
data: String,
}
fn foo(_data: MyData) {
// do something with _data
}
fn bar() {
let data = MyData{
boxed_uint: Box::new(42),
data: "".to_owned()
};
foo(data.clone()); // we copy here
if random_bool() {
foo(data); // we move here
}
// foo(data); // ERROR: use of moved value
}

The reason why bit copies are always enough for a move operation in Rust is that Rust does not support self-referential structs in its safe subset. The borrowing rules in Rust make it impossible for a struct to borrow from its own fields (unless you reach for raw pointers and unsafe). Such structs would require their move operations to adjust these references after moving the resources from the original object, but without them, there is no real need to execute arbitrary code on moves.

// you cannot do this in Rust
// C++
struct SelfReferential {
std::array<char, 1'000> data;
char* cursor = nullptr;
SelfReferential() noexcept : data{}, cursor{&data[0]} {}
SelfReferential(SelfReferential&& other)
: data{other.data}
, cursor{&(data[0]) + (other.cursor - &(other.data[0]))}
{}
// copy constructor, assignment operators omitted
};

The Clone and Copy traits

#[derive(Clone)]
struct MyData {
boxed_uint: Box<u64>,
data: String,
}
/* derive(Clone) will generate
something semantically identical to this
impl Clone for MyData {
#[inline]
fn clone(&self) -> MyData {
MyData {
boxed_uint: self.boxed_uint.clone(),
data: self.data.clone(),
}
}
}
*/

There exist types where copying by default is desirable (like integers, bools, floats, tuples of integers, arrays of integers, etc.). These types can be marked with the Copy trait, which makes them copy-by-default (and thus impossible to move). By convention, only types that are inexpensive to copy are marked with this trait.

Where non-destructive moves fail

Weaker invariants for resource management

A raw pointer in C++ can stand for any of the following:

  1. Nothing (nullptr)
  2. An address of a single object in owned dynamically allocated memory
  3. An address of a single object in non-owned memory
  4. An address of an array of objects in owned dynamically allocated memory
  5. An address of an array of objects in non-owned memory

Because of this semantic ambiguity, references are usually preferred in modern C++, because they always point to one valid object, where we always know that we don’t own it (both are possible to break, but breaking the first assumption is undefined behavior and breaking the second breaks every reasonable C++ convention). There exist alternatives for other scenarios from this list as well.

While C++ has been able to improve this situation in non-owning contexts, it still has the same billion-dollar mistake ingrained in its core smart pointers: both unique_ptr and shared_ptr can be nullptr.

With non-destructive moves, this is a necessity. There exists no other real option for a moved-from state other than nullptr for smart pointers: if they kept the original pointer in them, unique_ptr would free the same memory twice, and shared_ptr would have more references than it tracks. If they assigned a random address, we would access (and delete) random memory. Finally, an explicit marker for moved-from states would be exactly nullptr, but slower.
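
The null moved-from state is easy to observe. A minimal sketch; the struct `MovedFromPointers` and the function `inspect_moved_from_pointers` are hypothetical names used only here.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical result type for this sketch.
struct MovedFromPointers {
    bool unique_is_null;
    bool shared_is_null;
    long remaining_owners;
};

inline MovedFromPointers inspect_moved_from_pointers() {
    auto unique = std::make_unique<int>(42);
    auto shared = std::make_shared<int>(42);

    auto new_unique = std::move(unique);  // ownership transferred, unique is reset to null
    auto new_shared = std::move(shared);  // no reference count change, shared becomes empty

    return {unique == nullptr, shared == nullptr, new_shared.use_count()};
}

inline void check_moved_from_pointers() {
    const MovedFromPointers state = inspect_moved_from_pointers();
    assert(state.unique_is_null);
    assert(state.shared_is_null);
    assert(state.remaining_owners == 1);
}
```

Both moved-from pointers still compile and can still be dereferenced by mistake, which is exactly the hole that destructive moves close.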

Thanks to destructive moves, Rust’s smart pointers (Box, the counterpart of unique_ptr, and Arc, the equivalent of shared_ptr) always hold dynamically allocated memory. This invariant lets us prevent many possible errors at compile time instead of relying on conventions (like never passing nullptr smart pointers) or runtime checks everywhere. For situations where we actually want nullable pointers, we have the very explicit Option<Box<T>>, Option<&T> and Option<Arc<T>>, where we always have to check for the presence of the value explicitly (and have nice built-in ways of handling those situations).

Non-destructive move operations may fail (if you consider OOM errors recoverable by default)

With destructive moves (or by treating OOM errors as unrecoverable), C++ could realistically mandate that all of its move constructors are noexcept. While there theoretically exist other potential failures when moving objects with arbitrary code, I haven’t seen any convincing examples where types with other kinds of move errors are worth complicating the language over.

Move semantics become complicated

Containers become complicated

Take push_back on a std::vector. It has to:

  1. Potentially increase the size of the buffer to fit the new element
  2. Move or copy the new element in its new place.

To achieve strong exception guarantees when increasing its size and copying elements, push_back will:

  1. Allocate a new buffer
  2. Copy all elements into the new buffer
  3. Swap the old buffer with the new one
  4. Free the memory of the old buffer

Done this way, if any of the copy operations fail, the container still has its original buffer with all of its elements. For fallible move operations, achieving strong exception guarantees is impossible this way:

  1. Allocate a new buffer
  2. Move all the elements into the new buffer
  3. In the middle, a move operation fails
  4. We can’t move the already moved objects back, because that could fail too

For this reason, only vectors containing objects with noexcept move constructors will use move semantics when resizing their internal buffers. If you forget to mark your move constructors noexcept, you lose a lot of the optimization you thought you were getting by implementing them.
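
The effect of a forgotten noexcept can be measured by counting copies and moves during a reallocation. This is a sketch, assuming C++17; the types `ThrowingMove` and `NoexceptMove` and the function `force_reallocation` are made up for it, and the exact counts reflect how the mainstream standard libraries implement vector growth.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical type whose author forgot noexcept on the move constructor.
struct ThrowingMove {
    static inline int copies = 0;
    static inline int moves = 0;
    ThrowingMove() = default;
    ThrowingMove(const ThrowingMove&) { ++copies; }
    ThrowingMove(ThrowingMove&&) { ++moves; }
};

// Same type, but with a noexcept move constructor.
struct NoexceptMove {
    static inline int copies = 0;
    static inline int moves = 0;
    NoexceptMove() = default;
    NoexceptMove(const NoexceptMove&) { ++copies; }
    NoexceptMove(NoexceptMove&&) noexcept { ++moves; }
};

// std::move_if_noexcept encodes the decision: copy unless the move cannot throw.
static_assert(std::is_same_v<
    decltype(std::move_if_noexcept(std::declval<ThrowingMove&>())), const ThrowingMove&>);
static_assert(std::is_same_v<
    decltype(std::move_if_noexcept(std::declval<NoexceptMove&>())), NoexceptMove&&>);

// Build two elements in place, then force the vector to reallocate its buffer.
template <typename T>
void force_reallocation() {
    std::vector<T> elements;
    elements.reserve(2);
    elements.emplace_back();  // constructed in place: no copy, no move
    elements.emplace_back();
    elements.reserve(elements.capacity() + 1);  // both elements go to a new buffer
}

inline void check_reallocation_strategy() {
    force_reallocation<ThrowingMove>();
    assert(ThrowingMove::copies == 2 && ThrowingMove::moves == 0);

    force_reallocation<NoexceptMove>();
    assert(NoexceptMove::copies == 0 && NoexceptMove::moves == 2);
}
```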

C++ with destructive moves

void foo(std::string x);
void bar() {
std::string data {"Important stuff"};
if (random_bool()) {
foo(move data);
} else {
// do nothing
}
// ERROR: cannot use potentially moved-from variable
// foo(data);
// data's destructor will run if it hasn't been moved from here
}

Operator move

struct Movable {
std::string data;
std::string data2;
Movable() = default;
// default move constructor; always noexcept
// the argument's destructor is not called after this
Movable(Movable&& other)
: data {move other.data}
, data2 {move other.data2}
{}
// default assignment for movable types
Movable& operator=(Movable other) {
data = move other.data;
data2 = move other.data2;
// after (partially) moving from a variable's members
// the destructor is only called for non-moved-from members
return *this;
}
};
struct NotMovable {
std::string data;
std::string data2;
NotMovable() = default;
// declaring a copy constructor still disables move semantics
NotMovable(const NotMovable&) = default;
// default assignment for non-movable types
NotMovable& operator=(const NotMovable& other) {
data = other.data;
data2 = other.data2;
return *this;
}
};

Operator ref_move

This is what a destructive move-based swap function could look like:

// enable if T is movable
template <typename T>
void swap(T& lhs, T& rhs) noexcept {
T temp {ref_move lhs};
ref_move(&lhs) rhs; // place the move into lhs
move(&rhs) temp; // place the move into rhs
}

rvalue references

If we gave up both of these and trusted the compiler to optimize away extra moves, we could use different syntax (such as move T(T& other)) for move constructors and do away with rvalue and forwarding references entirely.

Solving nondestructive move’s issues with destructive moves

  1. We never need to allocate memory for moved-from objects. Move operations are always noexcept and cannot fail.
  2. We only have two value categories: lvalues and rvalues. There exist no moved-from states. Overall, move semantics become less complicated.
  3. Containers can always move when types are movable. Upholding strong exception guarantees becomes easier.
