Learning Rust
🌱 notes 🌱
One of my 2024 batch goals at Recurse is to learn a bit of Rust.
The plan
Nicole has a good blog post on why Rust is hard to learn and she later released YARR to help new Rustaceans get up to speed quickly.
- work through three introductory Rust resources
- Yet Another Rust Resource (see Nicole’s introductory post for some context on this ‘yet another’ resource)
- The Rust Book
- Rustlings
- implement a small project in Rust
Getting started
- Initial set up: Getting started - Rust Programming Language
- Rustup: Installation - The rustup book
- rustup installs rustc, cargo, rustup and other standard tools in Cargo’s bin directory
- Rust By Example
- a collection of examples that illustrate Rust concepts and standard libraries
Additional resources
Progress
Notes: Working through YARR
Control flow
if-else
- no ternary (?) operator: as in Go, if-else is idiomatic
- syntax: parentheses around the condition are optional and usually considered un-idiomatic
- if-else is an expression: it evaluates to the last expression of whichever branch runs
- corollary: if used thus, all branches must have the same type
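A minimal sketch of if-else used as a value. The function name sign_label is made up for illustration; the point is only that every branch yields the same type.

```rust
// Hypothetical helper (not from YARR): classifies an integer.
fn sign_label(n: i32) -> &'static str {
    // `if` is an expression, so its value can be bound directly.
    // Every branch must produce the same type (&'static str here).
    let label = if n < 0 {
        "negative"
    } else if n == 0 {
        "zero"
    } else {
        "positive"
    };
    label
}

fn main() {
    println!("{}", sign_label(-3));
    println!("{}", sign_label(0));
    println!("{}", sign_label(7));
}
```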
loop expressions: loop, while, for
- the body of a while or for loop must result in ()
- emit values via break
- assign a loop-generated value using loop
- while and for aren’t guaranteed to hit the break statement, so only loop can break with a value
compiler won’t like:
let x = for count in 0..3 {
    if count > 1 {
        break count * 2;
    }
};
compiler fine with:
let mut count = 0;
let x = loop {
    if count > 1 {
        break count * 2;
    }
    count += 1;
};
pattern-match with if-let and while-let
let value = Some(42); // pretend value actually comes from a map
// check if a variant satisfies the match
{
    if let Some(inner) = value {
        println!("inner was {inner}");
    } else {
        println!("this is the failure case");
    }
}
let values = vec![1, 2, 3, 4, 5];
let mut iter = values.iter();
while let Some(v) = iter.next() {
    println!("v = {v}");
}
Functions
fn function_name(args) -> ReturnType { ... }
- omitting the return type is equivalent to -> () and the function returns the unit type
- similar to returning void in C
- final expression in the function body becomes the returned value; that expression is written without the trailing punctuation (; or ,) that would normally end it
- if no such expression is evaluated, () is returned
Fibonacci example:
fn main() {
    let n = 10;
    let x = fibonacci(n);
    println!("Fibonacci({n}) = {x}");
}
fn fibonacci(n: u32) -> u32 {
    if n < 2 {
        return n;
    }
    fibonacci(n - 1) + fibonacci(n - 2)
}
Memory management
- no runtime garbage collector
- no default reference counting
- can explicitly opt in to it, though (e.g. with Rc or Arc)
- you control allocation and deallocation
- modulo safety rules: can’t deallocate and then reuse
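To make the "opt in to reference counting" point concrete, here is a small sketch using std::rc::Rc. The function name count_owners and the string contents are illustrative only.

```rust
use std::rc::Rc;

// Returns the strong reference count observed at three points in time.
fn count_owners() -> (usize, usize, usize) {
    let shared = Rc::new(String::from("treasure map"));
    let before = Rc::strong_count(&shared); // one owner

    // Rc::clone copies the pointer and bumps the count; the String is not duplicated.
    let second = Rc::clone(&shared);
    let during = Rc::strong_count(&shared); // two owners

    drop(second); // giving up one handle decrements the count
    let after = Rc::strong_count(&shared);

    (before, during, after)
}

fn main() {
    let (before, during, after) = count_owners();
    println!("owners: {before} -> {during} -> {after}");
}
```

The String itself is freed only when the last Rc handle goes away.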
Stack and Heap
stack: ordered data frames
heap: open field of memory (not a traditional heap data structure)
Stack: Controlled by program’s execution:
- call frames (pushed on for function calls) + the local variables of those functions
Once the function ends, its call frame is popped off the stack and its local variables can no longer be referenced.
Heap: Like an open field; essentially unbounded. Variables can live as long as they’re not deallocated. Memory that’s allocated on the heap MUST be deliberately deallocated later.
- A bit less efficient than memory allocated on the stack: the TL;DR is that variables on the stack are likely in the CPU cache when related code is executed; not true for heap memory, where a slow fetch is required since the CPU’s ability to predict what you’ll need in that scenario is limited.
In relation to memory management of other languages, Rust sits between C and Go or Python.
- C: malloc, free, realloc
- can get buffer overflows and use-after-free issues
- Go: automated (tracing) garbage collection
- Python: automated reference counting
Rust: the power without the pain :) via “ownership” tracking and “lifetimes” plus automatic deallocation.
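A rough sketch of what "automatic deallocation" looks like in practice: a value is dropped when its owner goes out of scope. The Tracked type and its counter are hypothetical, added only so the drop can be observed.

```rust
use std::cell::Cell;

// Hypothetical type that bumps a shared counter when it is dropped.
struct Tracked<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

// Returns the drop count seen inside the inner scope and after it ends.
fn drops_inside_and_after_scope() -> (u32, u32) {
    let drops = Cell::new(0);
    let inside;
    {
        // Heap-allocate the value; `_boxed` owns it.
        let _boxed = Box::new(Tracked { drops: &drops });
        inside = drops.get(); // the value is still alive here
    } // `_boxed` goes out of scope: Drop runs and the heap memory is freed
    (inside, drops.get())
}

fn main() {
    let (inside, after) = drops_inside_and_after_scope();
    println!("drops inside scope: {inside}, after scope: {after}");
}
```

No free call appears anywhere; the compiler inserts the cleanup at the closing brace.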
References
- reference to variable x: &x
- to access the underlying value, dereference x: *x
- Rust will automatically dereference references so, usually, can omit *
Here, the output for each call to println! will be the same:
let x = 10;
let ref_x: &u32 = &x;
println!("x = {}", *ref_x);
println!("x = {}", ref_x);
Pointers
- underlying representation same as reference
- but! a pointer is just a memory address and doesn’t reference another variable
- as a result, Rust doesn’t make guarantees about what a pointer points to, so unsafe Rust is required to dereference one
- not used for general purposes in Rust but can use unsafe Rust if needed for specific task
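A small sketch of raw pointers, assuming nothing beyond the standard library. Creating a raw pointer is safe; only dereferencing it needs an unsafe block. The function name is made up for illustration.

```rust
// Reads through a *const pointer, then writes through a *mut pointer.
fn read_and_write_through_raw_pointers() -> (u32, u32) {
    let mut x: u32 = 10;

    // Creating a raw pointer from a reference is safe.
    let p_const: *const u32 = &x;
    // Dereferencing is not: the compiler can't check that the address is valid.
    let before = unsafe { *p_const };

    let p_mut: *mut u32 = &mut x;
    unsafe {
        *p_mut += 1;
    }

    (before, x)
}

fn main() {
    let (before, after) = read_and_write_through_raw_pointers();
    println!("before = {before}, after = {after}");
}
```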
Heap allocation
Examples where needed:
- size unknown at compile time
- define independently of scope
Allocation methods: boxed values, vecs, other collection types
Boxed values
- generic: Box<u32> is a u32 that’s heap-allocated
- constructor: Box::new
- compiler can often infer type, else use type annotations:
let x: Box<u32> = Box::new(42);
let y = Box::<f64>::new(4.2);
- to move a value back onto the stack, out of the box, dereference it:
let x: Box<u32> = Box::new(42);
let y = *x; // y is on the stack
Vecs
- list, dynamically sized
- Rust’s arrays are of fixed size
- iterables
let parrots = vec!["Shivers", "Tweety", "Dinner"];
for parrot in parrots.iter() {
    println!("{} says hi.", parrot);
}
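To contrast fixed-size arrays with dynamically sized Vecs, a short sketch (the function name grow_and_shrink and the extra parrot are illustrative):

```rust
// Returns the length after a push, the popped element, and the final length.
fn grow_and_shrink() -> (usize, Option<&'static str>, usize) {
    // An array's length is part of its type and cannot change.
    let fixed: [&str; 3] = ["Shivers", "Tweety", "Dinner"];

    // A Vec owns a growable buffer on the heap.
    let mut parrots: Vec<&str> = fixed.to_vec();
    parrots.push("Polly");
    let grown = parrots.len();

    // pop removes and returns the last element, or None if the Vec is empty.
    let last = parrots.pop();

    (grown, last, parrots.len())
}

fn main() {
    let (grown, last, remaining) = grow_and_shrink();
    println!("grew to {grown}, popped {last:?}, {remaining} left");
}
```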
Other collection types
- see std::collections docs for all available types and when to use each
- sample: Vec, HashMap, BTreeMap, HashSet, BTreeSet
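As a rough illustration of picking a collection by need, this sketch uses a BTreeMap for counts with sorted keys and a HashSet for uniqueness. The function names and sample text are made up.

```rust
use std::collections::{BTreeMap, HashSet};

// Word counts, keyed by word; BTreeMap iterates its keys in sorted order.
fn count_words(text: &str) -> BTreeMap<&str, usize> {
    let mut counts = BTreeMap::new();
    for word in text.split_whitespace() {
        // entry() inserts 0 for a new word, then we increment in place.
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}

// Number of distinct words; a HashSet keeps each value at most once.
fn unique_count(text: &str) -> usize {
    text.split_whitespace().collect::<HashSet<_>>().len()
}

fn main() {
    let text = "polly wants polly cracker";
    for (word, n) in count_words(text) {
        println!("{word}: {n}");
    }
    println!("unique words: {}", unique_count(text));
}
```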
Ownership and lifetimes
Ownership and the borrow checker distinguish the Rust language from other languages, generally.
Recall that Rust has no garbage collector. The compiler tracks when memory should be allocated and deallocated to ensure that references remain valid. It does so by tracking variables’ lifetimes and ownership.
- a value has a unique owner at any given time
- ownership tracks when memory is valid and when it is dropped
- lifetime is the time during which references to the variable are valid
Gedanken experiments:
- What happens if variable x is defined in an outer scope, then initialized in an inner scope where a reference to x is assigned to a new variable y?
- Spoiler: The compiler will not be happy. Why?
- What happens if a function where a variable x is defined and initialized attempts to return a reference to x: &x?
- Spoiler: The compiler will not be happy. Why?
In the above scenarios, the borrow checker comes into play. In C, the examples would compile, then lead to use-after-free errors. So, let’s talk about a concept integral to the borrow checker: lifetimes.
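Here is one reading of the two thought experiments, sketched as code. The rejected versions are left as comments, since they don't compile; the functions below them are hypothetical fixes, not the only possible ones.

```rust
// Scenario 1 (rejected): a reference outlives the value it points to.
//
//     let y;
//     {
//         let x = 5;
//         y = &x;        // x is dropped at the end of this block...
//     }
//     println!("{y}");   // ...so y would dangle here
//
// One fix: make the referent live at least as long as the reference.
fn reference_to_longer_lived_value() -> u32 {
    let x = 5; // x now lives in the outer scope
    let y;
    {
        y = &x; // fine: x outlives every use of y
    }
    *y
}

// Scenario 2 (rejected): returning a reference to a local variable.
//
//     fn dangling() -> &u32 {
//         let x = 10;
//         &x             // x is gone once the function returns
//     }
//
// One fix: return the value itself so ownership moves to the caller.
fn owned_value() -> u32 {
    let x = 10;
    x
}

fn main() {
    println!("{}", reference_to_longer_lived_value());
    println!("{}", owned_value());
}
```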
Lifetimes
- every reference is a borrow
- each borrow has a lifetime: variable creation to destruction
- lifetimes can be named; generally as 'a but descriptive names okay, too
fn example<'a>(x: &'a u32) {
    let y: &'a u32 = &x;
}
- <'a> declares a generic lifetime parameter
- here, the parameter x is a reference with the lifetime 'a
- the concrete lifetime only becomes known at each call site, from the argument passed to the generic function
- 'static means the lifetime is the duration of the program, often used for string constants:
let msg: &'static str = "hello, world";
- a lifetime can be explicitly provided anywhere a type annotation is provided for a reference
- usual for structs, enums, and other data structures containing references
- generally, not usual for other functions because of lifetime elision
- lifetime elision: whenever it’s permissible to let the compiler make a rules-based guess at the lifetime
- eg, the above would idiomatically be:
fn example(x: &u32) { let y: &u32 = &x; }
- getting started with lifetimes:
- omit by default
- compiler will complain if needed, then try adding them
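Two cases where the compiler does ask for explicit lifetimes, as a hedged sketch: a function returning one of two borrowed inputs, and a struct that stores a reference. The names longer, Excerpt and first_word are invented for this example.

```rust
// Two reference parameters and a returned reference: the elision rules
// can't tell which input the output borrows from, so it must be annotated.
fn longer<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

// A struct that stores a reference must name that reference's lifetime.
struct Excerpt<'a> {
    text: &'a str,
}

// One reference parameter: elision works, '_ just marks the borrow.
fn first_word(sentence: &str) -> Excerpt<'_> {
    Excerpt {
        text: sentence.split_whitespace().next().unwrap_or(""),
    }
}

fn main() {
    println!("{}", longer("ahoy", "hi"));
    println!("{}", first_word("hello, world").text);
}
```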
Ownership
A value gets a new owner (it is moved) when it’s passed by value, unless its type implements the Copy trait. For example, a Vec<i32> used as the iterable of a for loop cannot be used after the for loop.
- Pass by value: the value itself is handed to the callee; for Copy types this is a copy and the original stays usable, otherwise it is a move and the original binding can no longer be used
- Pass by reference: a reference to the value is passed; the callee borrows the value and the caller keeps ownership
Checking whether a type is Copy, using a small generic helper is_copy with a T: Copy bound (defined by hand, not part of the standard library):
is_copy::<u32>();
Generally, primitive types “are Copy”, and both tuples and arrays whose elements “are Copy” are also Copy.
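A sketch of both ideas together. The is_copy helper here is an assumption about how such a check can be written (an empty generic function with a Copy bound), and borrow_then_reuse is an invented name.

```rust
// Hypothetical helper: a call compiles only if T implements Copy.
fn is_copy<T: Copy>() {}

// Iterating over &values borrows the Vec, so it is still usable afterwards.
fn borrow_then_reuse() -> (i32, usize) {
    let values: Vec<i32> = vec![1, 2, 3];
    let mut sum = 0;
    for v in &values {
        sum += *v;
    }
    // `for v in values` would have moved the Vec into the loop,
    // and the `values.len()` below would no longer compile.
    (sum, values.len())
}

fn main() {
    is_copy::<u32>();
    is_copy::<(u32, bool)>(); // tuple of Copy types
    is_copy::<[u8; 4]>(); // array of Copy types
    // is_copy::<Vec<i32>>(); // would not compile: Vec is not Copy

    let n: u32 = 5;
    let m = n; // a copy, so n is still usable
    println!("n = {n}, m = {m}");

    let (sum, len) = borrow_then_reuse();
    println!("sum = {sum}, len = {len}");
}
```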
Closures
Rust closures enable anonymous functions. Examples for annotated and inferred types of inline closures:
let y: u32 = 10;
let annotated = |x: u32| -> u32 { x + y };
let inferred = |x| x + y;
Syntax notes:
- pipes around the parameter list, followed by
- the expression for the desired return value,
- where a no-arg closure (||) is an empty param list
Closures can reference values from their outer scope. They can also capture the outer values and use them; the captured var remains valid in its original scope. Example:
let mut count = 0;
let mut increment = || {
    count += 1;
    count
};
println!("count is {}", increment());
println!("count is {}", increment());
println!("count is {}", increment());
println!("count after calling increment 3x is {}", count); // still valid!
Closures can be returned from functions. If any outer scope variables are captured by such a returned closure, they’ll need to be moved into the closure.
Functions that return closures return an impl of a trait. More about traits later. For now, consider a trait an interface: defines what can be done, but doesn’t specify the type to which it applies.
A returned closure can impl one of three traits: Fn, FnMut, FnOnce. These traits have a hierarchy of sorts: Fn can be used as FnMut or FnOnce; likewise, FnMut can be used as FnOnce. The inverse is not true.
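The hierarchy can be poked at with three tiny functions, one per trait bound. This is only a sketch; call_fn, call_fn_mut, call_fn_once and demo are names made up here.

```rust
fn call_fn(f: impl Fn() -> u32) -> u32 {
    f() + f() // Fn: callable any number of times through a shared borrow
}

fn call_fn_mut(mut f: impl FnMut() -> u32) -> u32 {
    f(); // FnMut: each call may mutate captured state
    f()
}

fn call_fn_once(f: impl FnOnce() -> u32) -> u32 {
    f() // FnOnce: only guaranteed to be callable once
}

fn demo() -> (u32, u32, u32, u32) {
    let base = 10;
    // Only reads `base`, so it implements Fn (and therefore FnMut and FnOnce).
    // It captures a shared reference, so the closure itself is Copy.
    let read_only = || base + 1;
    let a = call_fn(read_only);
    let b = call_fn_mut(read_only);
    let c = call_fn_once(read_only);

    let mut count = 0;
    // Mutates `count`, so it implements FnMut and FnOnce, but not Fn.
    let counter = || {
        count += 1;
        count
    };
    let d = call_fn_mut(counter);
    // call_fn(counter) would not compile: FnMut can't stand in for Fn.

    (a, b, c, d)
}

fn main() {
    println!("{:?}", demo());
}
```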
Examples: functions returning closures
Print message:
fn print_msg<'a>(msg: &'a str) -> impl Fn() + 'a {
    let printer = move || { // move ownership of msg to printer closure
        println!("{msg}");
    };
    printer
}
fn main() {
    let f = print_msg("msg: hello, world"); // nothing printed yet
    f(); // invoke the function, ie the closure returned by print_msg
}
nb: the lifetime must be assigned explicitly in print_msg (string slice, &str), but not in make_counter (primitive type, u32)
Make counter:
fn make_counter() -> impl FnMut() -> u32 {
    let mut count = 0;
    let increment = move || {
        count += 1;
        count
    };
    increment
}
fn main() {
    let mut counter = make_counter();
    println!("count is {}", counter());
    println!("count is {}", counter());
    println!("count is {}", counter());
}
Structs
Data is structured with structs: a named grouping of fields. A struct can also have methods.
struct PirateShip {
    captain: String,
    crew: Vec<String>,
    treasure: f64,
}
impl PirateShip {
    pub fn count_treasure(&self) -> f64 {
        // some computations probably
        self.treasure
    }
    pub fn mutiny(&mut self) {
        if !self.crew.is_empty() {
            // replace the captain with one of the crew
            self.captain = self.crew.pop().unwrap();
        } else {
            println!("there's no crew to perform mutiny");
        }
    }
}
let blackbeard = "Blackbeard".to_owned();
let crew = vec!["Scurvy".to_owned(), "Rat".to_owned(), "Polly".to_owned()];
let ship = PirateShip {
    captain: blackbeard,
    crew,
    treasure: 64.0,
};
Aside: to_owned
- the to_owned method takes a reference to a string (&str) and creates an owned string (String)
- do this, or call clone on an existing String, to avoid worrying about lifetimes
Indeed, strings are more complicated to work with in Rust. More on that later.
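A tiny sketch of the borrowed-versus-owned distinction. The function name shout is invented for illustration.

```rust
// Takes a borrowed string slice and returns a new, owned String.
fn shout(name: &str) -> String {
    let mut owned: String = name.to_owned(); // allocates an owned copy
    owned.push('!'); // an owned String can be mutated and grown
    owned
}

fn main() {
    let literal: &str = "Blackbeard"; // borrowed, lives in the binary
    let loud = shout(literal);
    let twin = loud.clone(); // clone duplicates an existing String
    println!("{literal} / {loud} / {twin}");
}
```

Because shout returns an owned String, the caller doesn't need to reason about how long the input slice lives.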
Enums
In comparison to other languages, enums in Rust are relatively powerful: they’re important tools in structuring data and programs idiomatically.
In Rust, enums capture more than just a constant: each variant of the enum can also have data. Rust enums are like tagged unions in C.
enum Result {
    Ok(i32),
    Err(String),
}
fn divide_in_two(n: i32) -> Result {
    if n % 2 == 0 {
        Result::Ok(n / 2)
    } else {
        Result::Err(format!("cannot divide {n} into two equal integers"))
    }
}
fn main() {
    let n = 100;
    match divide_in_two(n) {
        Result::Ok(half) => println!("{n} divided in two is {half}"),
        Result::Err(msg) => println!("error: {msg}"),
    }
}
Modules
Essentially, enable more maintainable code; also, hide implementation details.
Can store modules in separate files; then, declare the module where needed.
// math.rs
pub fn add(x: u32, y: u32) -> u32 {
    x + y
}
// main.rs
pub mod math;
fn main() {
    println!("add 5 and 6 using the math module to get {}", math::add(5, 6));
}
- Public modules: anyone consuming the crate can use the module and its pub members.
- Private modules: accessible only to themselves and their descendants.
Syntax to bring an item from a module into scope:
use std::collections::HashMap;
- reference to super::thing gets thing from parent module
- reference to crate::thing gets thing from root of containing crate (the crate you’re in)
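A sketch of path resolution using inline modules so it fits in one file. The module and function names (shapes, square, unit, double) are hypothetical.

```rust
// Private function at the top level of the file.
fn double(n: u32) -> u32 {
    n * 2
}

mod shapes {
    pub fn unit() -> u32 {
        1
    }

    pub mod square {
        pub fn area(side: u32) -> u32 {
            // super:: reaches the parent module (shapes).
            side * side * super::unit()
        }

        pub fn doubled_area(side: u32) -> u32 {
            // Two levels up from shapes::square is the top of the file.
            // When this file is the crate root, crate::double names the same function.
            super::super::double(area(side))
        }
    }
}

fn main() {
    use shapes::square;
    println!("area = {}", square::area(3));
    println!("doubled = {}", square::doubled_area(3));
}
```

Private items such as double are visible to descendant modules, which is why the nested module can call it.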
Unit tests
Idiomatic approach: Create a child module tests that imports from its parent (use super::*) to test things. Why? Structured as such, these will compile only if a flag is enabled for tests, so they’re not included in release builds.
Use the #[cfg(test)] attribute to gate the test module and #[test] to flag test functions:
pub fn plus(x: i32, y: i32) -> i32 {
    x + y
}
#[cfg(test)]
mod tests {
    use super::*;
    #[test]
    fn one_test_for_plus() {
        let x = 10;
        let y = 20;
        let expected = 30;
        assert_eq!(plus(x, y), expected, "err msg: x and y don't add up!");
    }
    #[test]
    fn another_test_for_plus() {
        let x = 2_000_000_000;
        let y = &x;
    }
}
Integration tests
- integration tests can only consume the public API provided by your code
- any file in the tests/ directory will be treated as an integration test
Example: plus function in a crate. Create tests/my_test.rs:
use my_library::plus;
#[test]
fn test_addition() {
    assert_eq!(plus(10, 20), 30);
}
Doc tests
Write tests in your documentation!
- put a docstring on a function, module, etc. with ///
- if markdown style code blocks are used, the test will be compiled and run on cargo test
- ! side effect: doc code examples automatically break the build if they’re out of date
Example doc test for the `plus` function
/// Adds together two numbers; doesn't handle rollover.
///
/// ```
/// use my_library::plus;
/// assert_eq!(30, plus(10, 20));
/// ```
pub fn plus(x: i32, y: i32) -> i32 {
    x + y
}
Notes: Working through Rustlings
Sparse notes: These exercises were practical tests of knowledge picked up from other resources.
Options
- an enum with two variants: Some(value) and None
- notice the None variant: when there is no valid value, return a well defined, unambiguous variant (vs null or 0, etc)
- a common implementation: query the presence of a value and take action by pattern-matching on the result; the None arm handles the case where no value is present
To illustrate (from rust docs):
fn divide(numerator: f64, denominator: f64) -> Option<f64> {
    if denominator == 0.0 {
        None
    } else {
        Some(numerator / denominator)
    }
}
// The return value of the function is an option
let result = divide(2.0, 3.0);
// Pattern match to retrieve the value
match result {
    // The division was valid
    Some(x) => println!("Result: {x}"),
    // The division was invalid
    None => println!("Cannot divide by 0"),
}
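The "query the presence of a value" pattern shows up with map lookups, since HashMap::get returns an Option. A minimal sketch; the function treasure_of and the ship data are made up.

```rust
use std::collections::HashMap;

// Looks up a ship by name and describes the result.
fn treasure_of(ships: &HashMap<&str, u32>, name: &str) -> String {
    // get returns Option<&u32>: Some if the key exists, None otherwise.
    match ships.get(name) {
        Some(amount) => format!("{name} holds {amount} doubloons"),
        None => format!("no ship named {name}"),
    }
}

fn main() {
    let mut ships = HashMap::new();
    ships.insert("Revenge", 64);
    println!("{}", treasure_of(&ships, "Revenge"));
    println!("{}", treasure_of(&ships, "Pearl"));
}
```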