While learning Rust you encounter the “borrow checker” and the concept of ownership. The borrow checker automatically performs checks you’d probably be doing in your head anyway when programming in other non-garbage-collected languages. If you don’t think about ownership when coding in C++, C or Pascal, for instance, you may end up debugging strange behavior or segmentation faults when you run your programs.
Rust Ownership Rules
The Rust compiler enforces these rules at compile time:
- By default, values on the heap may only bind to one variable at a time
- Value reassignment / copying moves ownership by default, even for stack allocated data
- Passing variables to functions moves ownership by default
- Pass-by-reference, known as “borrowing”, prevents ownership moves
- Variables are immutable by default
Other non-garbage-collected languages encourage good practices that point the programmer in the direction of these rules, but they enforce them only partly, if at all.
Comparing to C, C++ and Pascal
Here’s a comparison between Rust, C, C++ and Pascal. If you’re familiar with any of them, you’ll see how Rust improves on the way they work. It shifts work the programmer would otherwise do mentally onto the compiler. Unlike the programmer, the compiler is perfectly consistent and doesn’t lose track of bookkeeping details. The downside is that at times the strictness of the Rust compiler doesn’t provide much benefit, but generally it is just enforcing a clean approach you probably should use anyhow.
1. By default, values on the heap may only bind to one variable at a time
This is similar to, but not the same as, the intent of the C++ unique_ptr<> smart pointer. C++ will not silently transfer ownership to ‘bg2’; instead it refuses to compile the following code:
unique_ptr<BigObject> bg = make_unique<BigObject>("Object A");
cout << bg->name << "\n";
// Since bg is a unique_ptr, you cannot assign it to another variable
// unless you use 'move' to move ownership.
auto bg2 = bg;
cout << "bg2: " << bg2->name << "\n";
cpp_example.cpp: In function ‘void owning_data()’:
cpp_example.cpp:66:13: error: use of deleted function ‘std::unique_ptr<_Tp, _Dp>::unique_ptr(const std::unique_ptr<_Tp, _Dp>&) [with _Tp = BigObject; _Dp = std::default_delete<BigObject>]’
66 | auto bg2 = bg;
| ^~
Not a very direct explanation of what’s wrong, but you learn eventually…
This compiles:
unique_ptr<BigObject> bg = make_unique<BigObject>("Object A");
cout << bg->name << "\n";
// Since bg is a unique_ptr, you cannot assign it to another variable
// unless you use 'move' to move ownership.
auto bg2 = move(bg);
cout << "bg2: " << bg2->name << "\n";
// Now 'bg' has been moved to 'bg2' and holds nullptr. This unfortunately compiles, and typically causes a seg fault when run.
// Rust would not allow a similar situation to arise because a similar statement wouldn't compile.
cout << "BG: " << bg->name << "\n";
The last line compiles too, but dereferencing the now-empty ‘bg’ is undefined behavior and will typically cause a segmentation fault when run.
Like C++, the Rust compiler will catch the double ownership. But it moves ownership for you on assignment without complaint, and then rejects any later use of the original variable. Rust will generate a program that doesn’t seg fault, or nothing if it cannot. For instance:
let v:Vec<i32> = Vec::new();
let v2 = v;
// Will not compile, ownership has moved to v2
println!("hello world! {}", v.len());
error[E0382]: borrow of moved value: `v`
--> ex.rs:10:30
|
6 | let v:Vec<i32>=Vec::new();
| - move occurs because `v` has type `Vec<i32>`, which does not implement the `Copy` trait
7 | let v2 = v;
| - value moved here
...
10 | println!("hello world! {}", v.len());
Notice the code is simpler and the compiler message much clearer. You probably don’t want to “fix” this code. Instead, consider it an improvement over getting a crash when you run a program that you will wish hadn’t compiled.
The unique_ptr is just one of the smart pointer types C++ provides. You can use a shared_ptr to explicitly allow multiple ownership of a value. Using shared_ptr everywhere lets you program in a more Python- or Java-like style; deallocation only happens when the very last reference to the shared value goes out of scope. This may sound ideal, but C++ doesn’t enforce that you only use smart pointers to bind variables to heap data, so you can still end up with crashes or worse unless you exercise a lot of discipline and consistency in your code.
In addition to smart pointers, C++ provides move semantics and the ‘move()’ function to move ownership around. But it’s all opt-in and not enforced by the compiler, even though moves happen automatically in some places (such as returning a local variable). You sometimes need to use ‘move()’ in C++ to mimic the behavior you get by default with the Rust ownership model.
Rust allows C++ shared_ptr-like behavior through an “atomically reference counted” pointer, ‘Arc’. You wrap a value in an ‘Arc’, and it counts the references to that value so you can use it in more than one place at once; the value is freed when the count drops to zero. (‘Rc’ is the non-atomic equivalent for single-threaded code.) Importantly, using Arc isn’t the default in Rust; you have to really intend to use it. Code looks nicer without it. It’s a superficial thing, but it matters.
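A minimal sketch of the Rust side, using only the standard library: ‘Arc::clone’ copies the pointer and increments the count, and ‘Arc::strong_count’ reports how many owners currently exist. The helper name ‘owner_counts’ is made up for illustration.

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical helper: the reference count while two owners exist,
// and again after one of them is dropped.
fn owner_counts() -> (usize, usize) {
    // The Vec lives on the heap behind a reference-counted pointer.
    let a: Arc<Vec<i32>> = Arc::new(vec![1, 2, 3]);
    // Cloning an Arc copies the pointer and bumps the count; the Vec is not copied.
    let b = Arc::clone(&a);
    let while_shared = Arc::strong_count(&a);
    drop(b); // one owner gone, so the count goes down
    let after_drop = Arc::strong_count(&a);
    // When `a` goes out of scope at the end of this function, the count reaches
    // zero and the Vec is freed.
    (while_shared, after_drop)
}

fn main() {
    println!("{:?}", owner_counts()); // prints (2, 1)

    // The atomic count is what makes sharing the same data across threads safe.
    let data = Arc::new(vec![1, 2, 3]);
    let worker_data = Arc::clone(&data);
    let handle = thread::spawn(move || worker_data.iter().sum::<i32>());
    println!("sum from thread = {}", handle.join().unwrap()); // prints 6
    println!("main still owns {:?}", data);
}
```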
2. Value reassignment / copying moves ownership by default in Rust
Most languages do shallow copies by default: using the assignment operator ‘=’ or ‘:=’ copies all values on the stack from one variable to another. In Pascal, and to an extent C, you make heavy use of the stack and expect all data to get duplicated when you do a copy.
With Pascal in particular, although you have access to the heap, you often don’t need to consider the difference between deep and shallow copies because all your data is on the stack.
Pascal uses the ‘:=’ operator and C and C++ use ‘=’, but the behavior is the same. (C++ classes can define their own copy operations; std::vector, for example, copies its heap data too.) Rust is the same: it copies everything on the stack for the variable being copied but nothing on the heap; if the variable contains a pointer, just the pointer is copied.
Additionally, with Rust, as explained before, any pointer to heap data also carries ownership. That ownership is not duplicated but transferred, and the compiler checks any later use of the original variable. C and Pascal have no such checks. C++ prevents you from copying unique_ptr values without move().
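One way to observe that a move is a shallow copy is to compare the address of a Vec’s heap buffer, obtained with the standard ‘as_ptr()’ method, before and after the move. This is a small sketch; the function name is hypothetical.

```rust
// Hypothetical helper: true if the Vec's heap buffer is at the same address
// before and after a move.
fn move_keeps_heap_buffer() -> bool {
    let v: Vec<i32> = vec![10, 20, 30];
    let before = v.as_ptr(); // address of the elements on the heap
    let w = v; // move: only the pointer, length and capacity are copied
    // println!("{}", v.len()); // would not compile: `v` has been moved
    let after = w.as_ptr();
    before == after
}

fn main() {
    // Same address: the element data was not duplicated, only ownership moved.
    println!("same heap buffer: {}", move_keeps_heap_buffer()); // prints true
}
```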
In contrast to C and Pascal, Rust by default tracks ownership and won’t let a copy on the stack between variables share ownership, unless the variable’s type implements the ‘Copy’ trait. Primitive types such as integers, floats, bool and char already implement ‘Copy’ so they behave sanely out of the box. Any more complex type has to opt in to copying by value by implementing ‘Copy’, which is only possible when all of its fields are ‘Copy’ too; a struct holding a Vec or String can’t be ‘Copy’.
struct Customer{
    customer_id:i32,
    reward_points:i32
}
fn main(){
    let c:Customer = Customer{customer_id: 0, reward_points : 10};
    println!("Hello {}, you have {}", c.customer_id, c.reward_points);
    let d = c;
    println!("This is a copy of c {}",c.customer_id);
}
This fails with
error[E0382]: borrow of moved value: `c`
--> ex.rs:12:36
|
9 | let c:Customer = Customer{customer_id: 0, reward_points : 10};
| - move occurs because `c` has type `Customer`, which does not implement the `Copy` trait
10 | println!("Hello {}, you have {}", c.customer_id, c.reward_points);
11 | let d = c;
| - value moved here
12 | println!("This is a copy of c {}",c.customer_id);
| ^^^^^^^^^^^^^ value borrowed here after move
If your Rust data structures are entirely on the stack and implement the ‘Copy’ trait, you won’t find the ownership rules intruding, and Rust will seem to function similarly to C or Pascal. By default, however, you’ll get a compilation error if you access a value through the original variable after assigning it to another: it’s a signal that you may not want to be copying data around unnecessarily, so you must implement the ‘Copy’ trait to indicate you want copies.
The same program, but with Copy implemented on Customer:
#[derive(Copy,Clone)]
struct Customer{
    customer_id:i32,
    reward_points:i32
}
fn main(){
    let c:Customer = Customer{customer_id: 0, reward_points : 10};
    println!("Hello {}, you have {}", c.customer_id, c.reward_points);
    let d = c;
    println!("This is a copy of c {}",c.customer_id);
}
This builds.
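With ‘Copy’ in place, ‘d’ is an independent copy rather than a second owner of the same data. The sketch below (the helper name is hypothetical) changes only the copy and leaves ‘c’ untouched.

```rust
#[derive(Copy, Clone)]
struct Customer {
    customer_id: i32,
    reward_points: i32,
}

// Hypothetical helper: copies a Customer, changes only the copy, and returns both.
fn copy_then_modify() -> (Customer, Customer) {
    let c = Customer { customer_id: 0, reward_points: 10 };
    let mut d = c; // bitwise copy because Customer is Copy; `c` stays valid
    d.reward_points += 5; // modifies the copy only
    (c, d)
}

fn main() {
    let (c, d) = copy_then_modify();
    println!("customer {}: c has {}, d has {}", c.customer_id, c.reward_points, d.reward_points);
    // prints: customer 0: c has 10, d has 15
}
```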
3. Function Arguments Capture Ownership and Allow “Borrowing” as an alternative
Pass By Value and Reference in C and Pascal
With C and Pascal you may pass arguments by value or by reference (in C, by passing a pointer); the default in both is pass-by-value. With Pascal in particular, you rarely need pass-by-reference, and so you don’t suffer the side effects of a function changing a variable’s value while the caller is still using it. Programming this way is a choice, not something the language enforces.
type
  Piece = (pawn, knight, bishop, rook, king, queen, empty);
  ChessBoard = array[1..8,1..8] of Piece;
function move_piece(board:ChessBoard; x1,y1,x2,y2:integer):ChessBoard;
begin
  board[x2,y2] := board[x1,y1];
  board[x1,y1] := empty;
  result := board;
end;
This won’t change the board you pass in. Instead it returns a copy of the board with the changes. Often this behavior is preferred as it keeps reasoning about code simple: There are no side-effects. Many data structures are small enough that the overhead of copying them doesn’t matter much.
If you used the above function in the middle of a chess engine, though, you might want better performance. You could strategically make some functions take a reference to the board instead:
procedure move_piece(var board:ChessBoard; x1,y1,x2,y2:integer);
begin
  board[x2,y2] := board[x1,y1];
  board[x1,y1] := empty;
end;
The ‘var’ keyword makes ‘board’ a variable parameter: instead of making a copy, Pascal passes in a reference to the caller’s board data structure, so changes made inside the procedure affect the original. The reference is small, only the size of a pointer, so it also takes less time to pass than copying the whole board. C++ is similar:
enum piece {knight,pawn,bishop,rook,king,queen,empty};
struct ChessBoard{piece square[8][8];};
// Pass by value. Returns a 'new' copy of the original board. The new board has the moved piece
ChessBoard new_board_move_piece(int x1, int y1, int x2, int y2, ChessBoard board){
    board.square[x2-1][y2-1] = board.square[x1-1][y1-1];
    board.square[x1-1][y1-1] = empty;
    return board;
}
// Alters the board you pass in.
void move_piece(int x1, int y1, int x2, int y2, ChessBoard & board){
    board.square[x2-1][y2-1] = board.square[x1-1][y1-1];
    board.square[x1-1][y1-1] = empty;
}
When you pass by reference, any changes you make to the passed data structure are made to the original (since there’s no copy). You can make the parameter read-only with ‘const’, but that wouldn’t be of use in this example, as then you couldn’t change the board.
C++ Transferring Ownership When Passing Parameters
With C++ you can pass unique_ptr values to a function by value. The compiler will force you to use the move() function to transfer ownership, so you’re aware the variable won’t be usable in the caller afterward unless the function returns it.
// Here the return from the function automatically transfers ownership back.
// But you must 'move' ownership in first.
unique_ptr<BigObject> change_data(unique_ptr<BigObject> bg){
    for(int i=0;i<bg->DATA_SIZE;i++) bg->data[i] = i * 9 % 17 -3;
    bg->name += " changed while holding unique ownership";
    return bg;
}
To call this function you pass move(bg2). Look at where ‘bg3’ is created:
unique_ptr<BigObject> bg = make_unique<BigObject>("Object A");
cout << bg->name << "\n";
// Since bg is a unique_ptr, you cannot assign it to another variable
// unless you use 'move' to move ownership.
auto bg2 = move(bg);
cout << "bg2: " << bg2->name << "\n";
// You have to explicitly 'move' the unique_ptr to the function
// but the 'return' in the function moves it back to the calling scope.
// Rust does the move by default, you don't have to tell it to move anything.
auto bg3 = change_data(move(bg2));
cout << "bg3: " << bg3->name << "\n";
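For comparison, a rough Rust counterpart of ‘change_data’ might look like this. The ‘BigObject’ struct here, with a ‘name’ String and a ‘data’ Vec, is an assumed stand-in, since the C++ definition isn’t shown. Passing ‘bg’ by value moves it in and returning it moves it back out, with no ‘move()’ call anywhere.

```rust
// Hypothetical stand-in for the C++ BigObject.
struct BigObject {
    name: String,
    data: Vec<i32>,
}

// Takes ownership of `bg` (a move, no explicit move() needed) and hands it back.
fn change_data(mut bg: BigObject) -> BigObject {
    for (i, d) in bg.data.iter_mut().enumerate() {
        *d = i as i32 * 9 % 17 - 3;
    }
    bg.name.push_str(" changed while holding unique ownership");
    bg
}

fn main() {
    let bg = BigObject { name: String::from("Object A"), data: vec![0; 4] };
    let bg2 = bg; // ownership moves by default
    let bg3 = change_data(bg2); // moved in, then moved back out by the return
    // println!("{}", bg2.name); // would not compile: `bg2` was moved
    println!("bg3: {} {:?}", bg3.name, bg3.data);
}
```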
Borrowing and Ownership in Rust
With C and C++ you use ‘&’ where Pascal uses ‘var’ to pass variables by reference to a function (in C, ‘&’ takes the address of a variable so you can pass a pointer; in C++, it marks a reference parameter). Rust also uses ‘&’, but calls this “borrowing” rather than pass-by-reference. There’s a good reason for that. With C++ and Pascal it’s helpful to think of passing by reference as borrowing a value for the duration of a function call. If you keep in mind where the variable came from, you can remember to clean it up there as well, and remember to avoid changing the data inside the function unless you really need to. You have to do some mental bookkeeping to decide which area of code “owns” the variable. In most languages this ownership is entirely unchecked and unenforced.
Rust, by contrast, enforces single ownership. Any time you borrow a variable with ‘&’ in Rust, you cannot dispose of that value in the borrower’s scope, or move it out to transfer its ownership. That’s because the borrower doesn’t own the value. References in Rust don’t confer ownership; they sidestep it by restricting what you can do with borrowed values. In addition, references are immutable by default (you need ‘&mut’ to modify through one), whereas in C, C++ and Pascal they are mutable unless specifically qualified as const.
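A short sketch of what a borrower can and can’t do. The function names are hypothetical; ‘&Vec<i32>’ is used instead of the more idiomatic ‘&[i32]’ only so that the commented-out ‘push’ line fails for the reason stated in its comment.

```rust
// Shared borrow: the function can read the Vec but neither modify nor take it.
fn total(v: &Vec<i32>) -> i32 {
    // v.push(4);       // would not compile: cannot mutate through a shared `&` reference
    // let taken = *v;  // would not compile: cannot move out of a borrowed value
    v.iter().sum()
}

// Mutable borrow: temporary write access, but still no ownership.
fn add_item(v: &mut Vec<i32>, item: i32) {
    v.push(item);
}

fn main() {
    let mut nums = vec![1, 2, 3];
    let t = total(&nums);
    add_item(&mut nums, 4);
    // `nums` is still owned by main, so it is usable here and freed at the end of main.
    println!("total before push = {}, now {:?}", t, nums); // prints 6, then [1, 2, 3, 4]
}
```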
If you don’t use ‘&’ when passing a variable to a function in Rust, ownership is transferred unless the variable’s type implements the ‘Copy’ trait. So Rust nudges you toward borrowing the variable instead, or forces you to decide whether you really want pass-by-value by implementing ‘Copy’. This guides the programmer toward code that is fast (less pass-by-value) but safe (no multiple ownership or ownership contention).
Types that use heap memory (like ‘Vec’) can implement the ‘Clone’ trait, whose ‘clone()’ method explicitly makes a deep copy of their contents. You could use clone() to make a totally new Vec and pass that by value if you wanted to keep ownership of the original. However, most of the time you’d just borrow the original Vec.
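A small sketch contrasting the two approaches; ‘consume’ and ‘count’ are hypothetical helpers. The clone gets its own heap buffer, so moving it into a function leaves the original usable, while borrowing avoids the copy altogether.

```rust
// Hypothetical helper that takes ownership; the Vec is freed when it returns.
fn consume(v: Vec<i32>) -> usize {
    v.len()
}

// Hypothetical helper that only borrows; the caller keeps ownership.
fn count(v: &[i32]) -> usize {
    v.len()
}

fn main() {
    let original = vec![1, 2, 3];
    let duplicate = original.clone(); // deep copy: a new heap buffer, same elements
    println!("separate buffers: {}", original.as_ptr() != duplicate.as_ptr()); // prints true
    let n = consume(duplicate); // `duplicate` moves in; `original` is untouched
    let m = count(&original); // the usual approach: just borrow
    println!("{} {} {:?}", n, m, original); // prints 3 3 [1, 2, 3]
}
```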
In Rust:
#[derive(Copy,Clone, Debug)]
enum Piece{Knight, Bishop, Rook, King, Queen, Pawn, Empty}
//#[derive(Copy,Clone)]
struct Board{
    squares:[Piece;64]
}
fn move_piece(mut board:Board, from:(usize,usize), to:(usize,usize)){
    board.squares[to.0 + 8*to.1] = board.squares[from.0 + from.1*8];
    board.squares[from.0 + from.1*8] = Piece::Empty;
}
fn main(){
    let mut board:Board = Board{squares:[ Piece::Empty;64]};
    board.squares[1 + 2 *8] = Piece::Pawn;
    move_piece(board,(1,2), (1,4));
    // Error: `board` was moved into move_piece() on the line above.
    println!("Board is sized {}",board.squares.len());
}
Any use of ‘board’ after the call to ‘move_piece()’ results in an ownership error, because the board was moved into the function. You could implement the ‘Copy’ trait on Board (uncomment the derive line) so that a copy of the board is passed in instead. But if you do that, your changes to the board won’t be reflected unless you return a board:
fn move_piece(mut board:Board, from:(usize,usize), to:(usize,usize))->Board{
    board.squares[to.0 + 8*to.1] = board.squares[from.0 + from.1*8];
    board.squares[from.0 + from.1*8] = Piece::Empty;
    board
}
Then you can get a new copy of the board.
let new_board = move_piece(board,(1,2), (1,4));
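This works, but copies the board data into the function and back out. You could instead skip implementing the ‘Copy’ trait and pass ownership back. (For a fixed-size array like this one, a move is also a bitwise copy of the stack data, so the gain is mainly that ownership is explicit.) The code looks the same where you use it: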
board = move_piece(board,(1,2), (1,4));
Now ownership has been transferred back to the calling scope by the function’s return value. This works, but it makes it clunky to also return anything else. In this example you might want to modify the board directly and return an ‘illegal move’ error or a similar status.
Avoid Ownership by Borrowing
I have talked about borrowing a bit already. This example shows why borrowing can be better than moving ownership around. Rather than passing ownership back and forth you can “borrow” without taking ownership using the ‘&’ reference.
// A minimal Status type (assumed, since it isn't defined elsewhere); Board and Piece are as above.
#[derive(Debug)]
enum Status{Good, Illegal}
fn move_piece(board:&mut Board, from:(usize,usize), to:(usize,usize))->Status{
    board.squares[to.0 + 8*to.1] = board.squares[from.0 + from.1*8];
    board.squares[from.0 + from.1*8] = Piece::Empty;
    Status::Good
}
fn main(){
    let mut board:Board = Board{squares:[ Piece::Empty;64]};
    board.squares[1 + 2 *8] = Piece::Pawn;
    let status = move_piece(&mut board,(1,2), (1,4));
    println!("Status {:?}, board is sized {}", status, board.squares.len());
}
So now, instead of using the function’s return value to hand back ownership, you avoid taking ownership of the board in the first place. Notice you had to qualify the parameter type with ‘&mut’ to make the borrowed board mutable, and write ‘&mut board’ at the call site. Borrowed references are immutable by default.
The usual way to write functions that modify a data structure is as methods in the ‘impl’ block of a struct, as you would with classes in other languages. Moving ‘move_piece’ into the impl of the Board struct simplifies the code a bit too: when you call ‘board.move_piece(...)’, Rust borrows the board for you, so you can stop worrying about ownership at the call site. The method takes ‘&mut self’, a mutable borrow of the Board, which shows it is going to mutate something on ‘self’. (Taking ‘mut self’ by value would instead move or copy the board into the method, and the caller would never see the change.) The calling code is simpler now.
// Piece and Status are the same types used in the earlier examples.
struct Board{
    squares:[Piece;64]
}
impl Board{
    fn move_piece(&mut self, from:(usize,usize), to:(usize,usize))->Status{
        // In a real program we'd check if the square was free and if not return an
        // 'Illegal' status....
        self.squares[to.0 + 8*to.1] = self.squares[from.0 + from.1*8];
        self.squares[from.0 + from.1*8] = Piece::Empty;
        Status::Good
    }
}
fn main(){
    let mut board:Board = Board{squares:[ Piece::Empty;64]};
    board.squares[1 + 2 *8] = Piece::Pawn;
    // Rust borrows `board` mutably for this call; no `&mut` needed here.
    let status = board.move_piece((1,2), (1,4));
    println!("Board is sized {}",board.squares.len());
}
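Filling in the check mentioned in the comment, a complete version might look like the following sketch. The ‘Status’ variants, the ‘index’ helper and the simplified legality rule (a move is illegal if the source square is empty or the destination is occupied) are assumptions for illustration, not real chess rules.

```rust
#[allow(dead_code)] // most piece kinds are unused in this small example
#[derive(Copy, Clone, Debug, PartialEq)]
enum Piece { Knight, Bishop, Rook, King, Queen, Pawn, Empty }

// Hypothetical status type for the result of a move.
#[derive(Debug, PartialEq)]
enum Status { Good, Illegal }

struct Board {
    squares: [Piece; 64],
}

impl Board {
    fn new() -> Board {
        Board { squares: [Piece::Empty; 64] }
    }

    // Hypothetical helper: converts (column, row) to an index into the flat array.
    fn index(pos: (usize, usize)) -> usize {
        pos.0 + pos.1 * 8
    }

    // Borrows the board mutably; the caller keeps ownership.
    fn move_piece(&mut self, from: (usize, usize), to: (usize, usize)) -> Status {
        let (f, t) = (Board::index(from), Board::index(to));
        // Simplified legality check (an assumption, not chess rules).
        if self.squares[f] == Piece::Empty || self.squares[t] != Piece::Empty {
            return Status::Illegal;
        }
        self.squares[t] = self.squares[f];
        self.squares[f] = Piece::Empty;
        Status::Good
    }
}

fn main() {
    let mut board = Board::new();
    board.squares[Board::index((1, 2))] = Piece::Pawn;
    let first = board.move_piece((1, 2), (1, 4)); // the pawn moves
    let second = board.move_piece((1, 2), (1, 4)); // source square is now empty
    println!("{:?} {:?} {:?}", first, second, board.squares[Board::index((1, 4))]);
    // prints: Good Illegal Pawn
}
```

Because ‘move_piece’ only borrows the board, the function can both mutate it in place and return a status, which is awkward when ownership has to be handed back through the return value.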
Moving Ownership to an Outer Scope
In Rust, if you create a data structure in a function and return it, its ownership is moved out of that function:
fn numbers()->Vec<i32>{
    let mut nums :Vec<i32> = Vec::new();
    for n in 1..835{
        nums.push(n*3);
    }
    nums
}
The return value isn’t deep-copied. The ‘Vec’ struct itself (its heap pointer, length and capacity) is shallow-copied and its ownership transferred, while the element data stays where it is on the heap. This is the simple, common case where you want a function to generate and return data quickly and safely.
C and Pascal have no such concept of ownership and struggle with this basic operation. You’d need to allocate memory inside the function with ‘malloc’ (C) or ‘New’ (Pascal) and return a pointer to the data. Then, later, you’d need to remember to deallocate it with ‘free’ or ‘Dispose’. This pattern can be the source of a lot of mistakes. You might instead return a copy, which would not cause a crash or error, but would be slow with large data.
Modern C++ has move semantics for common standard data structures like ‘vector’ so you can do something similar to Rust:
vector<int> numbers(){
    vector<int> nums;
    for (int n=1; n<835; n++) nums.push_back(n * 3);
    return nums;
}
Here the vector is moved when you return it (or the copy is elided entirely), so its elements are not copied. The data stays on the heap and only the vector’s internal pointer changes hands.
But C++ doesn’t always handle ownership moves automatically, and you have to think more about them. For instance, if you allocate a user-defined data structure on the heap, you’d need to wrap it in a unique_ptr or shared_ptr to get automatic cleanup. In a simple case where you make a unique_ptr instead of a vector inside a function, the move happens on return. Again, the simple way to write things in C++ isn’t very safe. In Rust it’s the reverse: the simplest way is usually the safest.
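As a sketch of the Rust side, the function below returns a user-defined struct that owns heap data, with no smart pointer around it. ‘Report’ and the ‘DROPS’ counter are hypothetical; the ‘Drop’ implementation only counts destructions, to show the value is freed exactly once, by its final owner.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counter of how many Report values have been destroyed.
static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical user-defined type that owns heap data (a String and a Vec).
struct Report {
    title: String,
    values: Vec<i32>,
}

impl Drop for Report {
    // Runs when a Report is freed; here it just bumps the counter.
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Builds a Report and moves it out to the caller; no wrapper, no explicit free.
fn make_report() -> Report {
    let values: Vec<i32> = (1..835).map(|n| n * 3).collect();
    Report { title: String::from("multiples of 3"), values }
}

fn main() {
    {
        let r = make_report();
        println!("{}: {} values", r.title, r.values.len()); // multiples of 3: 834 values
    } // `r` goes out of scope here and is dropped exactly once
    println!("drops = {}", DROPS.load(Ordering::SeqCst)); // prints drops = 1
}
```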