References and Borrowing
Ownership, boxes, and moves provide a foundation for safely programming with the heap. However, move-only APIs can be inconvenient to use. For example, say you want to read some strings twice:
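A minimal sketch of such a program, assuming a greet function that takes ownership of two String arguments (the names follow the discussion and the error message in this section). The rejected line is left commented out so that the sketch compiles; uncommenting it should reproduce the error discussed next.

```rust
fn main() {
    let m1 = String::from("Hello");
    let m2 = String::from("world");
    greet(m1, m2); // m1 and m2 are moved into greet here
    // let s = format!("{} {}", m1, m2); // rejected: m1 and m2 were moved
}

// Assumed signature: both parameters are owned Strings.
fn greet(g1: String, g2: String) {
    println!("{} {}!", g1, g2);
    // g1 and g2 are dropped here, at the end of greet
}
```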
In this example, calling greet moves the data from m1 and m2 into the parameters of greet. Both strings are dropped at the end of greet, and therefore cannot be used within main. If we try to read them, as in the operation format!(..), then that would be undefined behavior. The Rust compiler therefore rejects this program with the same error we saw last section:
error[E0382]: borrow of moved value: `m1`
--> test.rs:5:30
(...rest of the error...)
This move behavior is extremely inconvenient. Programs often need to use a string more than once. An alternative greet could return ownership of the strings, like this:
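One way this alternative might look, as a sketch: greet hands both strings back in a tuple. The names m1_again and m2_again are illustrative assumptions.

```rust
fn main() {
    let m1 = String::from("Hello");
    let m2 = String::from("world");
    // Ownership goes into greet and comes back out through the return value.
    let (m1_again, m2_again) = greet(m1, m2);
    let s = format!("{} {}", m1_again, m2_again);
    println!("{}", s);
}

fn greet(g1: String, g2: String) -> (String, String) {
    println!("{} {}!", g1, g2);
    (g1, g2) // hand ownership back to the caller
}
```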
However, this style of program is quite verbose. Rust provides a concise style of reading and writing without moves through references.
References Are Non-Owning Pointers
A reference is a kind of pointer. Here's an example of a reference that rewrites our greet program in a more convenient manner:
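A sketch of the reference-based version, assuming the same variable names as before. The comments mark where the borrows happen and what remains usable afterwards.

```rust
fn main() {
    let m1 = String::from("Hello");
    let m2 = String::from("world");
    greet(&m1, &m2); // note the ampersands: m1 and m2 are borrowed, not moved
    // greet's stack frame is gone, but no heap data was deallocated,
    // so m1 and m2 can still be read here.
    let s = format!("{} {}", m1, m2);
    println!("{}", s);
}

fn greet(g1: &String, g2: &String) {
    // g1 points to m1 on the stack, which in turn points to the heap data.
    println!("{} {}!", g1, g2);
}
```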
The expression &m1 uses the ampersand operator to create a reference to (or "borrow") m1. The type of the greet parameter g1 is changed to &String, meaning "a reference to a String".
Observe at L2 that there are two steps from g1 to the string "Hello". g1 is a reference that points to m1 on the stack, and m1 is a String containing a box that points to "Hello" on the heap.
While m1 owns the heap data "Hello", g1 does not own either m1 or "Hello". Therefore after greet ends and the program reaches L3, no heap data has been deallocated. Only the stack frame for greet disappears. This fact is consistent with our Box Deallocation Principle. Because g1 did not own "Hello", Rust did not deallocate "Hello" on behalf of g1.
References are non-owning pointers, because they do not own the data they point to.
Dereferencing a Pointer Accesses Its Data
The previous examples using boxes and strings have not shown how Rust "follows" a pointer to its data. For example, the println! macro has mysteriously worked for both owned strings of type String, and for string references of type &String. The underlying mechanism is the dereference operator, written with an asterisk (*). For example, here's a program that uses dereferences in a few different ways:
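A sketch of such a program, with the variable names r1, r2, and x taken from the surrounding text. The helper function deref_demo is a hypothetical wrapper added so that the values can be checked.

```rust
// Hypothetical helper: returns the three values read through dereferences.
fn deref_demo() -> (i32, i32, i32) {
    let mut x: Box<i32> = Box::new(1);
    let a: i32 = *x; // *x reads the heap value, so a = 1
    *x += 1;         // *x on the left side modifies the heap value to 2

    let r1: &Box<i32> = &x; // r1 points to x on the stack
    let b: i32 = **r1;      // two dereferences reach the heap value

    let r2: &i32 = &*x; // r2 points to the heap value directly
    let c: i32 = *r2;   // so one dereference is enough to read it

    (a, b, c)
}

fn main() {
    let (a, b, c) = deref_demo();
    println!("a = {}, b = {}, c = {}", a, b, c);
}
```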
Observe the difference between r1 pointing to x on the stack, and r2 pointing to the heap value 2.
You probably won't see the dereference operator very often when you read Rust code. Rust implicitly inserts dereferences and references in certain cases, such as calling a method with the dot operator. For example, this program shows two equivalent ways of calling the i32::abs (absolute value) and str::len (string length) functions:
fn main() {
    let x: Box<i32> = Box::new(-1);
    let x_abs1 = i32::abs(*x); // explicit dereference
    let x_abs2 = x.abs();      // implicit dereference
    assert_eq!(x_abs1, x_abs2);

    let r: &Box<i32> = &x;
    let r_abs1 = i32::abs(**r); // explicit dereference (twice)
    let r_abs2 = r.abs();       // implicit dereference (twice)
    assert_eq!(r_abs1, r_abs2);

    let s = String::from("Hello");
    let s_len1 = str::len(&s); // explicit reference
    let s_len2 = s.len();      // implicit reference
    assert_eq!(s_len1, s_len2);
}
This example shows implicit conversions in three ways:
- The i32::abs function expects an input of type i32. To call abs with a Box<i32>, you can explicitly dereference the box like i32::abs(*x). You can also implicitly dereference the box using method-call syntax like x.abs(). The dot syntax is syntactic sugar for the function-call syntax.
- This implicit conversion works for multiple layers of pointers. For example, calling abs on a reference to a box r: &Box<i32> will insert two dereferences.
- This conversion also works in the opposite direction. The function str::len expects a reference &str. If you call len on an owned String, then Rust will insert a single borrowing operator. (In fact, there is a further conversion from String to str!)
We will say more about method calls and implicit conversions in later chapters. For now, the important takeaway is that these conversions are happening with method calls and some macros like println!. We want to unravel all the "magic" of Rust so you can have a clear mental model of how Rust works.
Rust Avoids Simultaneous Aliasing and Mutation
Pointers are a powerful and dangerous feature because they enable aliasing. Aliasing is accessing the same data through different variables. On its own, aliasing is harmless. But combined with mutation, we have a recipe for disaster. One variable can "pull the rug out" from another variable in many ways, for example:
- By deallocating the aliased data, leaving the other variable to point to deallocated memory.
- By mutating the aliased data, invalidating runtime properties expected by the other variable.
- By concurrently mutating the aliased data, causing a data race with nondeterministic behavior for the other variable.
As a running example, we are going to look at programs using the vector data structure, Vec. Unlike arrays, which have a fixed length, vectors have a variable length because they store their elements in the heap. For example, Vec::push adds an element to the end of a vector, like this:
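A small sketch of a push, using the variable name vec from the text. The helper push_four is a hypothetical wrapper so the result can be inspected.

```rust
// Hypothetical helper: builds a three-element vector and pushes a fourth.
fn push_four() -> Vec<i32> {
    let mut vec: Vec<i32> = vec![1, 2, 3];
    vec.push(4); // appends 4 to the end of the vector
    vec
}

fn main() {
    println!("{:?}", push_four());
}
```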
The macro vec! creates a vector with the elements between the brackets. The vector vec has type Vec<i32>. The syntax <i32> means the elements of the vector have type i32.
One important implementation detail is that vec allocates a heap array of a certain capacity. We can peek into Vec's internals and see this detail for ourselves:
Notice that the vector has a length (len) of 3 and a capacity (cap) of 3. The vector is at capacity. So when we do a push, the vector has to create a new allocation with larger capacity, copy all the elements over, and deallocate the original heap array. In the diagram above, the array 1 2 3 4 is in a (potentially) different memory location than the original array 1 2 3.
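The length and capacity can also be observed from code with Vec::len and Vec::capacity. The sketch below only assumes that the capacity after the push is at least 4; the exact growth amount is an implementation detail, and the helper name is hypothetical.

```rust
// Hypothetical helper: returns (len, capacity) before and after a push.
fn len_cap_around_push() -> ((usize, usize), (usize, usize)) {
    let mut vec: Vec<i32> = vec![1, 2, 3];
    let before = (vec.len(), vec.capacity());
    // If there is no spare capacity, this push allocates a larger heap array.
    vec.push(4);
    let after = (vec.len(), vec.capacity());
    (before, after)
}

fn main() {
    let (before, after) = len_cap_around_push();
    println!("before push: len = {}, cap = {}", before.0, before.1);
    println!("after push:  len = {}, cap = {}", after.0, after.1);
}
```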
To tie this back to memory safety, let's bring references into the mix. Say we created a reference to a vector's heap data. Then that reference can be invalidated by a push, as simulated below:
Initially, vec points to an array with 3 elements on the heap. Then num is created as a reference to the third element, as seen at L1. However, the operation vec.push(4) resizes vec. The resize will deallocate the previous array and allocate a new, bigger array. In the process, num is left pointing to invalid memory. Therefore at L3, dereferencing *num reads invalid memory, causing undefined behavior.
In more abstract terms, the issue is that the vector vec is both aliased (by the reference num) and mutated (by the operation vec.push(4)). So to avoid these kinds of issues, Rust follows a basic principle:
Pointer Safety Principle: data should never be aliased and mutated at the same time.
Data can be aliased. Data can be mutated. But data cannot be both aliased and mutated. For example, Rust enforces this principle for boxes (owned pointers) by disallowing aliasing. Assigning a box from one variable to another will move ownership, invalidating the previous variable. Owned data can only be accessed through the owner — no aliases.
However, because references are non-owning pointers, they need different rules than boxes to ensure the Pointer Safety Principle. By design, references are meant to temporarily create aliases. In the rest of this section, we will explain the basics of how Rust ensures the safety of references through the borrow checker.
References Change Permissions on Paths
The core idea behind the borrow checker is that variables have three kinds of permissions on their data:
- Read (R): data can be copied to another location.
- Write (W): data can be mutated in-place.
- Own (O): data can be moved or dropped.
These permissions don't exist at runtime, only within the compiler. They describe how the compiler "thinks" about your program before the program is executed.
By default, a variable has read/own permissions (RO) on its data. If a variable is annotated with let mut, then it also has the write permission (W). The key idea is that references can temporarily remove these permissions.
To illustrate this idea, let's look at the permissions on a variation of the program above that is actually safe. The push has been moved after the println!. The permissions in this program are visualized with a new kind of diagram. The diagram shows the changes in permissions on each line.
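The safe variation might look like the sketch below. The comments summarize the permission changes that the text describes for each line; they are annotations for the reader, not compiler output, and the helper name is hypothetical.

```rust
// Hypothetical helper wrapping the safe variation of the program.
fn safe_push() -> Vec<i32> {
    let mut vec: Vec<i32> = vec![1, 2, 3]; // vec: +R +W +O
    let num: &i32 = &vec[2];               // vec loses W and O; num and *num gain R and O
    println!("Third element is {}", *num); // last use of num: vec regains W and O
    vec.push(4);                           // allowed: vec is no longer borrowed
    vec
}

fn main() {
    println!("{:?}", safe_push());
}
```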
Let's walk through each line:
- After let mut vec = (...), the variable vec has been initialized. It gains +R+W+O permissions (the plus sign indicates gain).
- After let num = &vec[2], the data in vec has been borrowed by num. Three things happen:
  - The borrow removes WO permissions from vec (the slash indicates loss). vec cannot be written or owned, but it can still be read.
  - The variable num has gained RO permissions. num is not writable (the missing W permission is shown as a dash ‒) because it was not marked let mut.
  - The path *num has gained RO permissions.
- After println!(...), num is no longer in use, so vec is no longer borrowed. Therefore:
  - vec regains its WO permissions.
  - num and *num have lost all of their permissions.
- After vec.push(4), vec is no longer in use, and it loses all of its permissions.
Next, let's explore a few nuances of the diagram. First, why do you see both num and *num? Because accessing data through a reference is different from manipulating the reference itself. For example, say we declared a reference to a number with let mut:
Notice that x_ref has the W permission, while *x_ref does not. That means we can assign x_ref to a different reference (e.g. x_ref = &y), but we cannot mutate the pointed data (e.g. *x_ref += 1).
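The difference between x_ref and *x_ref can be sketched as follows. The variable y and the helper name are assumptions added for illustration; the rejected mutation is commented out so the sketch compiles.

```rust
// Hypothetical helper: reads through x_ref before and after reassigning it.
fn reassign_reference() -> (i32, i32) {
    let x = 0;
    let y = 1;
    let mut x_ref = &x; // x_ref has W (declared with let mut); *x_ref does not
    let first = *x_ref;
    x_ref = &y;         // allowed: the reference itself is reassigned
    // *x_ref += 1;     // rejected: the pointed data cannot be mutated through &
    let second = *x_ref;
    (first, second)
}

fn main() {
    println!("{:?}", reassign_reference());
}
```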
Note: you might wonder why *num and *x_ref have the O permission, since references are non-owning pointers. That's because the vector contains numbers of type i32, and i32 is a copyable type. We will discuss this difference more next section in "Copying vs. Moving Out of a Collection".
More generally, permissions are defined on paths and not just variables. A path is anything you can put on the left-hand side of an assignment. Paths include:
- Variables, like a.
- Dereferences of paths, like *a.
- Array accesses of paths, like a[0].
- Fields of paths, like a.0 for tuples or a.field for structs (discussed next chapter).
- Any combination of the above, like *((*a)[0].1).
Second, why do paths lose permissions when they become unused? Because some permissions are mutually exclusive. If num = &vec[2], then vec cannot be mutated or dropped while num is in use. But that doesn't mean it's invalid to use num for more time. For example, if we add another print to the above program, then num simply loses its permissions later:
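A sketch of that extended program, assuming the same vec and num as before. The wording of the second print and the helper name are made up for illustration.

```rust
// Hypothetical helper: num is used twice before the vector is mutated.
fn push_after_two_prints() -> Vec<i32> {
    let mut vec: Vec<i32> = vec![1, 2, 3];
    let num: &i32 = &vec[2];
    println!("Third element is {}", *num);
    println!("Again, the third element is {}", *num); // num's last use is now here
    vec.push(4); // vec regains W only after the second print
    vec
}

fn main() {
    println!("{:?}", push_after_two_prints());
}
```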
The Borrow Checker Finds Permission Violations
Recall the Pointer Safety Principle: data should not be aliased and mutated. The goal of these permissions is to ensure that data cannot be mutated if it is aliased. Creating a reference to data ("borrowing" it) causes that data to be temporarily read-only until the reference is no longer used.
Rust uses these permissions in its borrow checker. The borrow checker looks for potentially unsafe operations involving references. Let's return to the unsafe program we saw earlier, where push invalidates a reference. This time we'll add another aspect to the permissions diagram:
Any time a path is used, Rust expects that path to have certain permissions depending on the operation. For example, the borrow &vec[2] requires that vec is readable. Therefore the R permission is shown between the operation & and the path vec. The letter is filled-in because vec has the read permission at that line.
By contrast, the mutating operation vec.push(4) requires that vec is readable and writable. Both R and W are shown. However, vec does not have write permissions (it is borrowed by num). So the letter W is hollow, indicating that the write permission is expected but vec does not have it.
If you try to compile this program, then the Rust compiler will return the following error:
error[E0502]: cannot borrow `vec` as mutable because it is also borrowed as immutable
 --> test.rs:4:1
  |
3 | let num: &i32 = &vec[2];
  |                  --- immutable borrow occurs here
4 | vec.push(4);
  | ^^^^^^^^^^^ mutable borrow occurs here
5 | println!("Third element is {}", *num);
  |                                 ---- immutable borrow later used here
The error message explains that vec cannot be mutated while the reference num is in use. That's the surface-level reason. The underlying issue is that num could be invalidated by push. Rust catches that potential violation of memory safety.
Mutable References Provide Unique and Non-Owning Access to Data
The references we have seen so far are read-only immutable references (also called shared references). Immutable references permit aliasing but disallow mutation. However, it is also useful to temporarily provide mutable access to data without moving it.
The mechanism for this is mutable references (also called unique references). Here's a simple example of a mutable reference with the accompanying permissions changes:
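A sketch of such an example, reusing vec and num from the earlier programs. The helper name bump_third is hypothetical, and the comments paraphrase the permission changes discussed in this section.

```rust
// Hypothetical helper: increments the third element through a mutable reference.
fn bump_third() -> Vec<i32> {
    let mut vec: Vec<i32> = vec![1, 2, 3];
    let num: &mut i32 = &mut vec[2]; // vec loses all permissions while num is in use
    *num += 1;                       // *num has W, so vec[2] can be mutated
    println!("Third element is {}", *num); // last use of num
    println!("Vector is now {:?}", vec);   // vec is usable again
    vec
}

fn main() {
    bump_third();
}
```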
A mutable reference is created with the &mut operator. The type of num is written as &mut i32. Compared to immutable references, you can see two important differences in the permissions:
- When num was an immutable reference, vec still had R permissions. Now that num is a mutable reference, vec has lost all permissions while num is in use.
- When num was an immutable reference, the path *num only had RO permissions. Now that num is a mutable reference, *num has also gained the W permission.
The first observation is what makes mutable references safe. Mutable references allow mutation but prevent aliasing. The borrowed path vec becomes temporarily unusable, so it is effectively not an alias.
The second observation is what makes mutable references useful. vec[2] can be mutated through *num. For example, *num += 1 mutates vec[2]. Note that *num has the W permission, but num does not. num refers to the mutable reference itself, e.g. num cannot be reassigned to a different mutable reference.
Mutable references can also be temporarily "downgraded" to read-only references. For example:
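A sketch of a downgrade, using the names num and num2 from the text that follows. The helper function is a hypothetical wrapper.

```rust
// Hypothetical helper: reads the same element through num and through num2.
fn downgrade() -> (i32, i32) {
    let mut vec: Vec<i32> = vec![1, 2, 3];
    let num: &mut i32 = &mut vec[2];
    let num2: &i32 = &*num; // reborrow: *num loses W but keeps R while num2 is in use
    println!("{} {}", *num, *num2); // both paths are readable
    (*num, *num2)
}

fn main() {
    downgrade();
}
```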
In this program, the borrow &*num removes the W permission from *num but not the R permission, so println!(..) can read both *num and *num2.
Permissions Are Returned At The End of a Reference's Lifetime
We said above that a reference changes permissions while it is "in use". The phrase "in use" is describing a reference's lifetime, or the range of code spanning from its birth (where the reference is created) to its death (the last time(s) the reference is used).
For example, in this program, the lifetime of y starts with let y = &x, and ends with let z = *y:
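A sketch of a program with this shape. The final x += z line and the helper name are assumptions, added to show a write to x once the lifetime of y has ended.

```rust
// Hypothetical helper: x is written only after the last use of y.
fn lifetime_demo() -> i32 {
    let mut x = 1;
    let y = &x; // lifetime of y begins: x loses W
    let z = *y; // last use of y: lifetime of y ends
    x += z;     // x has W again, so this write is allowed
    x
}

fn main() {
    println!("{}", lifetime_demo());
}
```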
The W permission on x is returned to x after the lifetime of y has ended, like we have seen before.
In the previous examples, a lifetime has been a contiguous region of code. However, once we introduce control flow, this is not necessarily the case. For example, here is a function that capitalizes the first character in a vector of ASCII characters:
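A sketch of such a function, with v and c named as in the discussion that follows. The function name ascii_capitalize and the message in the else-block are assumptions, and the sketch assumes v is non-empty (indexing v[0] panics otherwise).

```rust
// Hypothetical reconstruction; assumes v has at least one element.
fn ascii_capitalize(v: &mut Vec<char>) {
    let c = &v[0]; // c borrows from *v, so *v loses W while c is in use
    if c.is_ascii_lowercase() {
        let up = c.to_ascii_uppercase(); // last use of c in this branch
        v[0] = up; // *v has regained W, so the write is allowed
    } else {
        // c is not used in this branch, so *v regained W on entry
        println!("Already capitalized: {:?}", v);
    }
}

fn main() {
    let mut v = vec!['h', 'e', 'l', 'l', 'o'];
    ascii_capitalize(&mut v);
    println!("{:?}", v);
}
```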
The variable c has a different lifetime in each branch of the if-statement. In the then-block, c is used in the expression c.to_ascii_uppercase(). Therefore *v does not regain the W permission until after that line.
However, in the else-block, c is not used. *v immediately regains the W permission on entry to the else-block.
Data Must Outlive All Of Its References
The last safety property is that data must outlive any references to it. For example, let's say we tried to return a reference to data inside a function:
fn return_a_string() -> &String {
    let s = String::from("Hello world");
    let s_ref = &s;
    s_ref
}
Rust will refuse to compile this program. It will give you a somewhat mysterious error message:
error[E0106]: missing lifetime specifier
 --> test.rs:1:25
  |
1 | fn return_a_string() -> &String {
  |                         ^ expected named lifetime parameter
  |
  = help: this function's return type contains a borrowed value, but there is no value for it to be borrowed from
We will explain more about named lifetime parameters in Chapter 10.3, "Validating References with Lifetimes". For now, you can see the underlying safety issue from this simulation:
At L1, s_ref points to a variable s within the stack frame of return_a_string. When return_a_string ends, s and its heap data are deallocated. At L2, s_ref has been returned and is now s_main, which points to freed memory. So far, the program is actually safe: no undefined behavior has happened.
However, at L3 we try to actually use the deallocated pointer s_main by reading it in the println!. That read of freed memory is undefined behavior. In sum, it's unsafe to return a reference to data on a function's stack frame, and to then use that reference. The reference cannot outlive the data.
As a more interesting example, let's say we tried to add a reference to a vector of references:
fn add_ref(v: &mut Vec<&i32>, n: i32) {
    let r = &n;
    v.push(r);
}
Rust will also reject this function, but with a different error:
error[E0597]: `n` does not live long enough
 --> src/lib.rs:2:13
  |
1 | fn add_ref(v: &mut Vec<&i32>, n: i32) {
  |                        - let's call the lifetime of this reference `'1`
2 |     let r = &n;
  |             ^^ borrowed value does not live long enough
3 |     v.push(r);
  |     --------- argument requires that `n` is borrowed for `'1`
4 | }
  | - `n` dropped here while still borrowed
The argument n only lives for the duration of add_ref. However, the reference r is being pushed into v, and v lives longer than add_ref. Therefore Rust complains that the data (n) does not outlive all of its references (r).
If this function were allowed, we could call add_ref like this:
At L1, by pushing &n into v, the vector now contains a reference to data within the frame for add_ref. However, when add_ref returns, its frame is deallocated. Therefore the reference in the vector points to deallocated memory. Using the reference by printing v[0] violates memory safety.
Summary
References provide the ability to read and write data without consuming ownership of it. References are created with borrows (& and &mut) and used with dereferences (*), often implicitly.
However, references can be easily misused. Rust's borrow checker enforces a system of permissions that ensures references are used safely:
- All variables can read, own, and (optionally) write their data.
- Creating a reference will transfer permissions from the borrowed path to the reference.
- Permissions are returned once the reference's lifetime has ended.
- Data must outlive all references that point to it.
In this section, it probably feels like we've described more of what Rust cannot do than what Rust can do. That is intentional! One of Rust's core features is allowing you to use pointers without garbage collection, while also avoiding undefined behavior. Understanding these safety rules now will help you avoid frustration with the compiler later.