Avoiding single-threaded memory access bugs with Rust (for C++ developers)

Radek Vít
5 min read · Mar 7, 2021


In a previous article, I showed how Rust prevents us from introducing race conditions and invalid memory access to our code in multi-threaded contexts. In this article, we will look at several kinds of memory access bugs in single-threaded C++ and how Rust prevents us from making these mistakes.

Returning references to temporaries

Returning references to temporaries causes the callers of our functions to access invalid memory, and either crash their application or, worse, overwrite random memory and cause a hard-to-debug error later. References can also be hidden inside classes we may return, whether they’re our own or from the standard library.

A great new standard library type from C++17, std::string_view, lets us pass references to strings or parts of strings cheaply and efficiently. Unless you want to gain ownership of a std::string, you very likely want to pass this type instead of const std::string& to your functions. The downside of std::string_view is that we have to make sure the memory it points to stays alive at least as long as the view does. If we return a string view of a temporary string, we get undefined behavior when we try to read from it.

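#include <string>
#include <string_view>
std::string_view get_string(char c) noexcept {
    // clang and MSVC emit a warning, gcc is fine with this
    // the temporary string is destroyed before the caller sees the view
    return std::string(42, c);
}
std::string_view get_string2(char c) noexcept {
    auto x = std::string(42, c);
    // only clang warns here, MSVC and gcc are fine with this
    // x is destroyed on return, so the view dangles
    return x;
}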

Rust’s type &str is the equivalent of std::string_view: it is an unowned slice of a string. Rust’s borrow checker will ensure that the underlying data lives long enough.

fn get_string<'a>(c: char) -> &'a str {
    let string: String = std::iter::repeat(c).take(42).collect();
    // error[E0515]:
    // cannot return reference to local variable `string`
    &string
}
// This is how to do this correctly
fn get_string2(c: char) -> String {
    std::iter::repeat(c).take(42).collect()
}
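Returning a &str is still perfectly fine when the data it points to outlives the call. As a minimal sketch, assume a hypothetical helper first_word (not part of the original example) that returns a slice of its argument rather than of a local variable:

```rust
// Hypothetical helper, introduced only for illustration.
// The returned slice borrows from `text`, so the compiler ties its
// lifetime to the argument instead of to a local variable.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

fn main() {
    let owned = String::from("hello borrow checker");
    let word = first_word(&owned);
    // `owned` is still alive here, so `word` is a valid view into it.
    println!("{}", word);
}
```

Because the slice borrows from the caller's string, the borrow checker would reject any attempt to drop or mutate owned while word is still in use.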

Short lifetimes

In C++, we have to make sure that the long-lived references we store inside structs and classes stay valid for as long as we’re using them. In trivial cases, this is easy to achieve. In nontrivial cases, mistakes happen easily and trigger undefined behavior (UB).

In C++ code, these bugs are often very similar to (or even the same as) returning references to temporaries. In the example below, the workers are created inside a lambda that returns a TempAgency referring to them. To avoid returning references to temporaries as described in the previous section, we’d have to create the workers in the enclosing scope and resort to delayed initialization of our TempAgency (with std::optional or std::unique_ptr).

#include <vector>
#include <string>
#include <optional>
#include <functional>
class Worker {
public:
    void work() { diary += "I have done some work. "; }
private:
    std::string diary {"I was just born. Neat. "};
};
class TempAgency {
public:
    TempAgency(std::vector<std::reference_wrapper<Worker>> workers)
        : workers {std::move(workers)} {}
    void do_work() {
        for (auto&& worker: workers) {
            worker.get().work();
        }
    }
private:
    // the agency doesn't get ownership of the workers
    // it just borrows them
    std::vector<std::reference_wrapper<Worker>> workers;
};
int main() {
    TempAgency agency = [] {
        Worker amanda;
        Worker bob;
        return TempAgency {{{amanda}, {bob}}};
    }();
    // Bob and Amanda no longer live, we write to invalid memory
    agency.do_work();
}

Rust’s borrow checker exists to prevent exactly these cases. The things we reference must outlive structs that reference them.

struct Worker {
    diary: String,
}
impl Worker {
    fn new() -> Worker {
        Worker {
            diary: "I was just born. Neat. ".to_string(),
        }
    }
    fn work(&mut self) { self.diary += "I have done some work. "; }
}
struct TempAgency<'a> {
    // notice how we have to name the lifetime of these references
    pub workers: Vec<&'a mut Worker>,
}
impl TempAgency<'_> {
    fn work(&mut self) {
        self.workers.iter_mut().for_each(|worker| worker.work());
    }
}
fn main() {
    let mut agency = {
        // ERROR: `amanda` does not live long enough
        let mut amanda = Worker::new();
        // ERROR: `bob` does not live long enough
        let mut bob = Worker::new();
        TempAgency {
            workers: vec![&mut amanda, &mut bob],
        }
    };
    agency.work();
}
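One way to make this compile is to declare the workers in a scope that outlives the agency. The sketch below repeats the types from the example and adds a hypothetical run_agency function (an assumption for illustration, not part of the original code) that also returns the diaries, so we can see that the borrowed workers were really modified:

```rust
struct Worker {
    diary: String,
}

impl Worker {
    fn new() -> Worker {
        Worker {
            diary: "I was just born. Neat. ".to_string(),
        }
    }
    fn work(&mut self) {
        self.diary += "I have done some work. ";
    }
}

struct TempAgency<'a> {
    workers: Vec<&'a mut Worker>,
}

impl TempAgency<'_> {
    fn work(&mut self) {
        self.workers.iter_mut().for_each(|worker| worker.work());
    }
}

// Hypothetical helper: the workers are declared before the agency,
// so they outlive every reference the agency holds.
fn run_agency() -> (String, String) {
    let mut amanda = Worker::new();
    let mut bob = Worker::new();
    {
        let mut agency = TempAgency {
            workers: vec![&mut amanda, &mut bob],
        };
        agency.work();
    } // the agency is gone here, so the mutable borrows have ended
    (amanda.diary, bob.diary)
}

fn main() {
    let (amanda_diary, bob_diary) = run_agency();
    println!("{}\n{}", amanda_diary, bob_diary);
}
```

The inner block is there only to make the end of the borrow visible; the compiler would also accept the code without it, since the agency is not used after agency.work().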

References to container contents

It is sometimes convenient to store a reference to something in our function. It may be expensive to look up the referenced element, or it may just help us to stop repeating ourselves. When we store a reference to the contents of a container, however, there are always operations on that container that invalidate the reference. We have to rely on manually checking that all of our references stay valid until we have stopped using them.

#include <vector>
#include <string>
#include <iostream>
int main() {
    std::vector<std::string> my_stuff (64, "Stuff");
    const auto& first_thing = my_stuff.front();
    // ...
    for (int i = 0; i < 128; ++i) my_stuff.push_back("More Stuff");
    // ...
    // first_thing now points to invalid memory
    std::cout << first_thing << '\n';
}

In Rust, the borrow checker prevents us from borrowing the vector mutably while we have it borrowed immutably. Not only are we guaranteed that first_thing doesn’t change, we are also guaranteed that the vector itself won’t be changed for as long as the borrow is in use!

fn main() {
    let mut my_stuff = vec!["Stuff".to_string(); 64];
    let first_thing = my_stuff.first().unwrap();
    // ...
    for _ in 0..50 {
        // ERROR:
        // cannot borrow `my_stuff` as mutable because
        // it is also borrowed as immutable
        my_stuff.push("More Stuff".to_string());
    }
    // ...
    // FIXME: move this before the for loop
    println!("{}", first_thing);
}
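Besides moving the println! before the loop, there are other ways to satisfy the borrow checker here. The sketch below shows two of them under the same setup, wrapped in a hypothetical grow_and_report function (the name and the return value are assumptions made for illustration): cloning the element so no borrow is kept, and borrowing again only after the vector has stopped changing.

```rust
// Hypothetical helper, introduced only for illustration.
fn grow_and_report() -> (String, usize) {
    let mut my_stuff = vec!["Stuff".to_string(); 64];
    // Clone the first element so that no borrow of the vector is kept.
    let first_thing = my_stuff[0].clone();
    for _ in 0..50 {
        // Fine: nothing borrows `my_stuff` at this point.
        my_stuff.push("More Stuff".to_string());
    }
    // Borrowing again after the mutation is also fine: this reference
    // is created only once the vector has stopped changing.
    let first_again = my_stuff.first().unwrap();
    assert_eq!(&first_thing, first_again);
    (first_thing, my_stuff.len())
}

fn main() {
    let (first, len) = grow_and_report();
    println!("{} ({} elements)", first, len);
}
```

Cloning costs an allocation, so it is a trade-off; re-borrowing after the mutation is free, but only works when we don't need the value while the vector is being modified.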

Tricky lifetime extensions

In C++, when storing the result of an expression in a const T& or a T&&, the temporary object’s lifetime is automatically extended. This allows us to iterate over temporaries like this in range-based for loops (the iterated-over expression is stored in an auto&& variable):

#include <string>
#include <vector>
std::vector<std::string> get_huge_data() {
    return {"aaaaaaa", "bbbbbbb", "cccccc"};
}
int main() {
    for (const auto& str: get_huge_data()) {
        // do something with str
    }
}

This lifetime extension unfortunately only applies to the result of the whole expression, not to any intermediate temporaries (C++23 changes this for the range expression of a range-based for loop, but earlier standards do not). If our expression gets a reference from a temporary object, our reference will point to invalid data! This means that if we iterated over the characters of the last string from get_huge_data, that string would have already been freed.

#include <string>
#include <vector>
std::vector<std::string> get_huge_data() {
    return {"aaaaaaa", "bbbbbbb", "cccccc"};
}
int main() {
    // clang and MSVC emit a warning, gcc does nothing
    // we iterate over a freed std::string
    for (auto c: get_huge_data().back()) {
        // do something with c
    }
}

Thanks to the borrow checker, we don’t even need to look up lifetime extension rules in Rust. We can simply try compiling this: if the temporary lives long enough, we are happy, and if it doesn’t, we get a compile error about a temporary value being dropped while it is still borrowed.

fn get_huge_data() -> Vec<String> {
    vec!["aaaaaa".to_owned(),
         "bbbbbb".to_owned(),
         "cccccc".to_owned()]
}
fn main() {
    // this is OK, lifetime gets extended
    for c in get_huge_data().last().unwrap().chars() {
        println!("{}", c);
    }
}
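For contrast, here is a sketch of a case where the temporary does not live long enough. The failing line is left as a comment so that the block still compiles; last_len is a hypothetical helper added for illustration.

```rust
fn get_huge_data() -> Vec<String> {
    vec!["aaaaaa".to_owned(),
         "bbbbbb".to_owned(),
         "cccccc".to_owned()]
}

// Hypothetical helper, introduced only for illustration.
fn last_len() -> usize {
    // This would NOT compile once `last` is used on a later line
    // (error[E0716]: temporary value dropped while borrowed):
    //
    //     let last = get_huge_data().last().unwrap();
    //     println!("{}", last);
    //
    // The temporary vector is dropped at the end of the `let` statement.
    // Binding it to a variable keeps it alive for the whole scope.
    let data = get_huge_data();
    let last = data.last().unwrap();
    last.len()
}

fn main() {
    println!("{}", last_len());
}
```

The important difference from C++ is that the mistake is reported at compile time, together with a suggestion to bind the temporary to a variable, instead of silently reading freed memory.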
