Learning Rust

🌱 notes 🌱

One of my 2024 batch goals at Recurse is to learn a bit of Rust.

The plan

Nicole has a good blog post on why Rust is hard to learn and she later released YARR to help new Rustaceans get up to speed quickly.

Getting started

Additional resources

Progress


Notes: Working through YARR

Control flow

if-else

  • no ternary (? :) operator: as in Go, if-else is idiomatic
  • syntax: parentheses around the condition are optional and usually considered un-idiomatic
  • if-else is an expression: it evaluates to the value of the last expression in the branch taken
    • corollary: if used thus, all branches must have the same type
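
A minimal sketch of if-else standing in for a ternary. The function name parity is made up for illustration.

```rust
// if-else is an expression: the branch taken supplies the value,
// so both branches must have the same type (&'static str here).
fn parity(n: i32) -> &'static str {
    if n % 2 == 0 { "even" } else { "odd" }
}
```

The whole if-else is the final expression of the function body, so its value is also the function's return value.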

loop expressions: loop, while, for

  • loop body must evaluate to ()
  • emit a value from a loop via break
  • to assign a loop-generated value, use loop: only loop accepts break with a value
    • while and for aren’t guaranteed to hit the break statement, so they always evaluate to ()

compiler won’t like:

let x = for count in 0..3 {
    if count > 1 {
        break count * 2;
    }
};

compiler fine with:

let mut count = 0;

let x = loop {
    if count > 1 {
        break count * 2;
    }

    count += 1;
};

pattern-match with if let and while let

let value = Some(42); // pretend value actually comes from a map

// check if a variant satisfies the match
{
  if let Some(inner) = value {
    println!("inner was {inner}");
  } else {
    println!("this is the failure case");
  }
}
let values = vec![1,2,3,4,5];
let mut iter = values.iter();

while let Some(v) = iter.next() {
  println!("v = {v}");
}

Functions

fn name_of_function(args) -> <return-type> {...}
  • omitting the return type is equivalent to -> () and the function returns the unit type
    • similar to returning void in C
  • final expression in the function body becomes the returned value; that expression is written without a trailing ;
    • if the body has no such final expression, () is returned
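
A small sketch of the two cases above; add_one and bump are made-up names.

```rust
// Final expression, no trailing semicolon: its value is returned.
fn add_one(x: i32) -> i32 {
    x + 1
}

// No return type, and the body ends with a statement: the function returns ().
fn bump(total: &mut i32) {
    *total += 1;
}
```

Adding a semicolon after x + 1 would turn it into a statement, and the compiler would complain that the function returns () instead of i32.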

Fibonacci example:

fn main() {
  let n = 10;
  let x = fibonacci(n);
  println!("Fibonacci({n}) = {x}");
}

fn fibonacci(n: u32) -> u32 {
  if n < 2 {
    return n;
  }

  fibonacci(n-1) + fibonacci(n-2)
}

Memory management

  • no runtime garbage collector
  • no default reference counting
    • can explicitly request it, though (eg, via Rc or Arc)
  • you control allocation and deallocation
    • modulo safety rules: can’t deallocate and then reuse
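
A minimal sketch of opting in to reference counting with std::rc::Rc. The function name count_handles is made up; it only reports the strong count at two points.

```rust
use std::rc::Rc;

// Returns the strong reference count while a second handle is alive,
// and again after that handle has gone out of scope.
fn count_handles() -> (usize, usize) {
    let first = Rc::new(String::from("shared"));
    let during = {
        let second = Rc::clone(&first); // explicit opt-in: a second owner
        Rc::strong_count(&second)
    }; // `second` is dropped here, so the count goes back down
    let after = Rc::strong_count(&first);
    (during, after)
}
```

The String is freed only when the last Rc handle is dropped; Arc is the thread-safe counterpart.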

Stack and Heap

stack: ordered data frames

heap: open field of memory (not a traditional heap data structure)

Stack: Controlled by program’s execution:

  • call frames (pushed on for function calls) + the local variables of those functions

Once the function ends, its call frame is popped off the stack and its local variables can no longer be referenced.

Heap: Like an open field; essentially unbounded. Variables can live as long as they’re not deallocated. Memory that’s allocated on the heap MUST be deliberately deallocated later.

  • A bit less efficient than memory allocated on the stack: the TL;DR is that variables on the stack are likely in the CPU cache when related code is executed; not true for heap memory, where a slow fetch is required since the CPU’s ability to predict what you’ll need in that scenario is limited.

In relation to memory management of other languages, Rust sits between C and Go or Python.

  • C: malloc, free, realloc
    • can get buffer overflows and use-after-free issues
  • Go: automated (tracing) garbage collection
  • Python: automated reference counting

Rust: the power without the pain :) via “ownership” tracking and “lifetimes” plus automatic deallocation.
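
To see the automatic deallocation happen, here is a sketch that counts drops. Tracked is a hypothetical type, and implementing the Drop trait is just a way to observe the moment a value is cleaned up.

```rust
use std::cell::Cell;

// Hypothetical type that records each time one of its values is dropped.
struct Tracked<'a> {
    drops: &'a Cell<u32>,
}

impl<'a> Drop for Tracked<'a> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

// Returns the drop count seen inside the scope and after it ends.
fn drops_inside_and_after_scope() -> (u32, u32) {
    let drops = Cell::new(0);
    let inside = {
        let _boxed = Box::new(Tracked { drops: &drops }); // heap allocation
        drops.get() // `_boxed` is still alive here
    }; // `_boxed` goes out of scope: Drop runs and the heap memory is freed
    (inside, drops.get())
}
```

No free call appears anywhere: the owner going out of scope is what triggers the cleanup.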

References

  • reference to variable x: &x
  • to access the underlying value, dereference the reference: if r = &x, then *r
    • Rust will automatically dereference references in many contexts (eg, method calls), so * can usually be omitted

Here, the output for each call to println! will be the same:

let x = 10;
let ref_x: &u32 = &x;
println!("x = {}", *ref_x);
println!("x = {}", ref_x);

Pointers

  • underlying representation same as reference
  • but! a raw pointer (*const T, *mut T) is just a memory address, with no guarantee that it points at a live value
    • as a result, Rust doesn’t make guarantees about what a pointer can do, so unsafe Rust is required to dereference one (creating one is safe)
  • not used for general purposes in Rust, but available via unsafe Rust if needed for a specific task
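
A minimal sketch of the safe/unsafe split for raw pointers; the function name is made up.

```rust
// Creating a raw pointer is safe; dereferencing it requires `unsafe`.
fn read_through_raw_pointer() -> u32 {
    let x: u32 = 10;
    let ptr: *const u32 = &x; // a reference coerces to a raw pointer
    // SAFETY: `ptr` was just created from a reference to `x`,
    // and `x` is still alive at this point.
    unsafe { *ptr }
}
```

Inside the unsafe block it is on me, not the compiler, to make sure the address is still valid.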

Heap allocation

Examples where needed:

  • size unknown at compile time
  • define independently of scope

Allocation methods: boxed values, vecs, other collection types

Boxed values

  • generic: Box<u32> is a u32 that’s heap-allocated
  • constructor: Box::new
  • compiler can often infer type, else use type annotations:
let x: Box<u32> = Box::new(42);
let y = Box::<f64>::new(4.2);
  • to move a value back onto the stack, out of the box, dereference it:
let x: Box<u32> = Box::new(42);
let y = *x; // y is on the stack

Vecs

  • list, dynamically sized
    • Rust’s arrays are of fixed size
  • iterables
let parrots = vec!["Shivers", "Tweety", "Dinner"];
for parrot in parrots.iter() {
    println!("{} says hi.", parrot);
}

Other collection types

  • see std::collections docs for all available types and when to use each
    • sample: Vec, HashMap, BTreeMap, HashSet, BTreeSet

Ownership and lifetimes

Ownership and the borrow checker distinguish the Rust language from other languages, generally.

Recall that Rust has no garbage collector. The compiler tracks when memory should be allocated and deallocated to ensure that references remain valid. It does so by tracking variables’ lifetimes and ownership.

  • a value has a unique owner at any given time
    • ownership tracks when memory is valid and when it is dropped
  • lifetime is the time during which references to the variable are valid

Gedanken experiments:

  • What happens if a reference variable x is declared in an outer scope, then assigned, in an inner scope, a reference to a variable y that is defined in that inner scope, and x is used after the inner scope ends?
    • Spoiler: The compiler will not be happy. Why?
  • What happens if a function where a variable x is defined and initialized attempts to return a reference to x: &x?
    • Spoiler: The compiler will not be happy. Why?
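
For the second scenario, a sketch of the rejected function (left as a comment so the rest compiles) and two ways around it. All function names are made up.

```rust
// Rejected by the compiler: `x` is dropped when the function returns,
// so a returned reference to it would dangle.
//
// fn dangling() -> &u32 {
//     let x = 10;
//     &x
// }

// One way out: return the value itself, handing it to the caller.
fn owned() -> u32 {
    let x = 10;
    x
}

// Another: borrow from the caller, so the referent outlives the call.
fn borrowed(x: &u32) -> &u32 {
    x
}
```

In both working versions the value lives at least as long as anything that refers to it, which is the property the borrow checker enforces.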

In the above scenarios, the borrow checker comes into play. In C, the examples would compile, then lead to use-after-free errors. So, let’s talk about a concept integral to the borrow checker: lifetimes.

Lifetimes

  • every reference is a borrow
  • each borrow has a lifetime: variable creation to destruction
  • lifetimes can be named; generally as 'a but descriptive names okay, too
fn example<'a>(x: &'a u32) {
  let y: &'a u32 = &x;
}
  • <'a> declares a generic lifetime parameter
  • here, the parameter x is a reference with the lifetime 'a
  • the concrete lifetime only becomes known from the argument passed at each call of the generic function
    • 'static means the lifetime is the duration of the program, often used for string constants:
    let msg: &'static str = "hello, world";
    
  • a lifetime can be explicitly provided anywhere a type annotation is provided for a reference
    • usual for structs, enums, and other data structures containing references
    • generally, not usual for functions because of lifetime elision
  • lifetime elision: whenever it’s permissible to let the compiler make a rules-based guess at the lifetime
    • eg, the above would idiomatically be:
    fn example(x: &u32) {
      let y: &u32 = &x;
    }
    
  • getting started with lifetimes:
    • omit by default
    • compiler will complain if needed, then try adding them

Ownership

A value gets a new owner (it is moved) when it’s passed by value, unless its type implements the Copy trait. For example, a Vec<i32> used directly as the iterable of a for loop cannot be used after the for loop.

  • Pass by value: the value itself is handed to the function; in Rust this moves ownership to the function, or passes a copy if the type is Copy
  • Pass by reference: a reference to the value is passed to the function; the function accesses the caller’s value, which keeps its owner
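
A sketch contrasting a move with a borrow, using a Vec<i32> (which is not Copy). The function names are made up.

```rust
// Takes ownership: the caller's Vec is moved into `v`.
fn sum_owned(v: Vec<i32>) -> i32 {
    v.into_iter().sum()
}

// Borrows: the caller keeps ownership.
fn sum_borrowed(v: &[i32]) -> i32 {
    v.iter().sum()
}

fn ownership_demo() -> (i32, usize, i32) {
    let values = vec![1, 2, 3];
    let a = sum_borrowed(&values); // `values` is only borrowed
    let len = values.len();        // so it is still usable here
    let b = sum_owned(values);     // `values` is moved into the function
    // values.len();               // would not compile: use after move
    (a, len, b)
}
```

The same applies to the for loop case: iterating over &values borrows, iterating over values moves.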

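Checking whether a type is Copy. Here is_copy is a user-defined helper with a T: Copy bound, not a standard-library function, so the call compiles only for Copy types: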

is_copy::<u32>();

Generally, primitive types “are Copy”, and both tuples and arrays whose elements “are Copy” are also Copy.
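
The is_copy call above needs a definition to compile. One way to write such a helper, as an assumption about its shape rather than the original source:

```rust
// Hypothetical helper: it does nothing at runtime, but a call only
// compiles if the type argument implements Copy.
fn is_copy<T: Copy>() {}

fn copy_checks() {
    is_copy::<u32>();          // primitive
    is_copy::<(u32, bool)>();  // tuple of Copy types
    is_copy::<[u8; 4]>();      // array of Copy types
    // is_copy::<String>();    // would not compile: String is not Copy
    // is_copy::<Vec<i32>>();  // would not compile: Vec is not Copy
}
```

The check happens entirely at compile time, through the trait bound.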

Closures

Rust closures enable anonymous functions. Examples for annotated and inferred types of inline closures:

    let y: u32 = 10;
    let annotated = |x: u32| -> u32 { x + y };
    let inferred = |x| x + y;

Syntax notes:

  • pipes around the parameter list, followed by
  • the expression for the desired return value,
  • where a no-arg closure (||) is an empty param list

Closures can reference values from their outer scope. They can also capture the outer values and use them; the captured var remains valid in its original scope. Example:

    let mut count = 0;
    let mut increment = || {
        count += 1;
        count
    };

    println!("count is {}", increment());
    println!("count is {}", increment());
    println!("count is {}", increment());
    println!("count after calling increment 3x is {}", count); // still valid!

Closures can be returned from functions. If any outer scope variables are captured by such a returned closure, they’ll need to be moved into the closure.

Functions that return closures return an impl of a trait. More about traits later. For now, consider a trait an interface: it defines what can be done, but doesn’t specify the type to which it applies.

A returned closure implements at least one of three traits: Fn, FnMut, FnOnce. These traits have a hierarchy of sorts: a closure that is Fn can also be used as FnMut or FnOnce; likewise, an FnMut closure can be used as FnOnce. The inverse is not true.
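
A sketch of the three closure traits and the hierarchy between them. All names here are made up; each helper asks for the least demanding trait it needs.

```rust
fn call_once(f: impl FnOnce() -> String) -> String {
    f()
}

fn call_twice_mut(mut f: impl FnMut() -> u32) -> u32 {
    f();
    f()
}

fn call_twice(f: impl Fn() -> u32) -> u32 {
    f() + f()
}

fn closure_trait_demo() -> (String, u32, u32, u32) {
    let name = String::from("Polly");
    let consume = move || name; // gives away its capture: FnOnce only
    let mut count = 0;
    let increment = || { count += 1; count }; // mutates its capture: FnMut
    let seven = || 7; // captures nothing: Fn

    (
        call_once(consume),
        call_twice_mut(increment),
        call_twice(seven),
        call_twice_mut(seven), // an Fn closure is accepted where FnMut is expected
    )
}
```

Passing increment to call_twice, or consume to call_twice_mut, would be rejected: that is the "inverse is not true" part.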

Examples: functions returning closures

Print message:

fn print_msg<'a>(msg:&'a str) -> impl Fn() + 'a {
    let printer = move || { // move ownership of msg to printer closure
        println!("{msg}");
    };
    printer
}

fn main() {
  let f = print_msg("msg: hello, world"); // nothing printed yet
  f(); // invoke the function, ie the closure returned by print_msg
}

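nb: the lifetime must be named explicitly in print_msg because the returned closure captures a reference (string slice, &str); make_counter (below) needs none because its closure owns the value it captures (primitive type, u32)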

Make counter:

fn make_counter() -> impl FnMut() -> u32 {
    let mut count = 0;
    let increment = move || {
        count += 1;
        count
    };
    increment
}

fn main() {
  let mut counter = make_counter();

  println!("count is {}", counter());
  println!("count is {}", counter());
  println!("count is {}", counter());
}

Structs

Data is structured with structs: a named grouping of fields. A struct can also have methods.

struct PirateShip {
    captain: String,
    crew: Vec<String>,
    treasure: f64,
}

impl PirateShip {
    pub fn count_treasure(&self) -> f64 {
        // some computations probably
        self.treasure
    }

    pub fn mutiny(&mut self) {
        // replace the captain with one of the crew, if there is any crew
        if let Some(new_captain) = self.crew.pop() {
            self.captain = new_captain;
        } else {
            println!("there's no crew to perform mutiny");
        }
    }
}

let blackbeard = "Blackbeard".to_owned();
let crew = vec!["Scurvy".to_owned(), "Rat".to_owned(), "Polly".to_owned()];
let ship = PirateShip {
    captain: blackbeard,
    crew,
    treasure: 64.0,
};

Aside: to_owned

  • the to_owned method takes a reference to a string (&str) and creates an owned string (String)
    • do this, or call clone on an existing String, to avoid worrying about lifetimes
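
A small sketch of both options. Parrot, make_parrot and clone_name are hypothetical names; the point is that a struct owning its String needs no lifetime parameter.

```rust
// Hypothetical struct: it owns its name, so no lifetime annotation is needed.
struct Parrot {
    name: String,
}

fn make_parrot(name: &str) -> Parrot {
    // Copy the borrowed text into a new, owned String.
    Parrot { name: name.to_owned() }
}

fn clone_name(parrot: &Parrot) -> String {
    // An independent copy; `parrot` keeps its own String.
    parrot.name.clone()
}
```

The trade-off is an extra allocation per copy, which is usually fine while learning.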

Indeed, strings are more complicated to work with in Rust. More on that later.

Enums

In comparison to other languages, enums in Rust are relatively powerful: they’re important tools in structuring data and programs idiomatically.

In Rust, enums capture more than just a constant: each variant of the enum can also have data. Rust enums are like tagged unions in C.

enum Result {
  Ok(i32),
  Err(String),
}

fn divide_in_two(n: i32) -> Result {
  if n % 2 == 0 {
    Result::Ok(n / 2)
  } else {
    Result::Err(format!("cannot divide {n} into two equal integers"))
  }
}

fn main() {
  let n = 100;
  match divide_in_two(n) {
    Result::Ok(half) => println!("{n} divided in two is {half}"),
    Result::Err(msg) => println!("error: {msg}"),
  }
}

Modules

Essentially, enable more maintainable code; also, hide implementation details.

Can store modules in separate files; then, declare the module where needed.

// math.rs
pub fn add(x: u32, y: u32) -> u32 {
  x + y
}
// main.rs
pub mod math;
fn main() {
  println!("add 5 and 6 using the math module to get {}", math::add(5,6));
}
  • Public modules: anyone consuming the crate can use the module and its pub members.
  • Private modules: accessible only to themselves and their descendants.

Syntax to bring an item from a module into scope:

use std::collections::HashMap;
  • reference to super::thing gets thing from parent module
  • reference to crate::thing gets thing from root of containing crate (the crate you’re in)
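
A sketch of visibility and super:: using inline modules, which behave like modules kept in separate files. The module and function names are made up.

```rust
mod ship {
    pub fn crew_size() -> u32 {
        hidden_hands() + 1
    }

    // Private: visible inside `ship` and its child modules only.
    fn hidden_hands() -> u32 {
        2
    }

    pub mod galley {
        // `super::` reaches the parent module, including its private items.
        pub fn meals_per_day() -> u32 {
            super::hidden_hands() * 3
        }
    }
}

use ship::galley::meals_per_day; // bring one item into scope

fn provisions() -> (u32, u32) {
    (ship::crew_size(), meals_per_day())
}
```

Calling ship::hidden_hands() from provisions would not compile, since the function is private to ship.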

Unit tests

Idiomatic approach: Create a child module tests that imports the parent’s items (use super::*) to test them. Why? Marked with #[cfg(test)], the module compiles only when building tests, so it’s not included in release builds.

Use #[cfg(test)] to flag the test module and #[test] to flag each test function:

pub fn plus(x: i32, y: i32) -> i32 {
  x + y
}

#[cfg(test)]
mod tests {
  use super::*;
  #[test]
  fn one_test_for_plus() {
    let x = 10;
    let y = 20;
    let expected = 30;
    assert_eq!(plus(x, y), expected, "err msg: x and y don't add up!");
  }

  #[test]
  fn another_test_for_plus() {
    let x = 2_000_000_000;
    let y = &x;

  }
}

Integration tests

  • integration tests can only consume the public API provided by your code
  • any file in the tests/ directory will be treated as an integration test

Example: plus function in a crate. Create tests/my_test.rs:

use my_library::plus;

#[test]
fn test_addition() {
    assert_eq!(plus(10, 20), 30);
}

Doc tests

Write tests in your documentation!

  • put a docstring on a function, module, etc. with ///
  • if markdown style code blocks are used, the code in them will be compiled and run as tests on cargo test
  • ! side effect: doc code examples automatically fail the tests if they’re out of date

Example doc test for the `plus` function:
/// Adds together two numbers; doesn't handle overflow.
///
/// ```
/// use my_library::plus;
/// assert_eq!(30, plus(10, 20));
/// ```
pub fn plus(x: i32, y: i32) -> i32 {
    x + y
}

Notes: Working through Rustlings

Sparse notes: These exercises were practical tests of knowledge picked up from other resources.

Options

Option type

  • an enum with Some and None variants
    • notice the None variant: a value is pattern-matched against the valid variants; when there is no value, the result is a well defined, unambiguous None (vs null or 0, etc)
  • a common use: query the presence of a value and take action based on pattern-matching it against the variants; a missing value triggers the action associated with the None case

To illustrate (from the Rust docs):

fn divide(numerator: f64, denominator: f64) -> Option<f64> {
    if denominator == 0.0 {
        None
    } else {
        Some(numerator / denominator)
    }
}

// The return value of the function is an option
let result = divide(2.0, 3.0);

// Pattern match to retrieve the value
match result {
    // The division was valid
    Some(x) => println!("Result: {x}"),
    // The division was invalid
    None    => println!("Cannot divide by 0"),
}