
How to train your Python

🌱 notes 🌱

Python is the de facto language of machine learning. Its expressive yet efficient syntax is the gift that keeps on giving. This is particularly true for matrix manipulation, once Python’s numerous libraries are added into the mix. Who among us can function without at least installing NumPy, Matplotlib, SciPy, Pandas, and PyTorch? These and other useful packages are conveniently available via the Python Package Index (PyPI).

Yet, for all of Python’s efficiency and convenience, there’s no single, idiomatic approach to version and dependency management. Just getting started means (soft-)locking into a choice without having a definitive decision tree at hand.

Where to begin?

That this is problematic is widely acknowledged, and many others have shared insightful commentary on the situation. I encourage you to read their work, since I don’t go into the arguments for and against the various options here. A few current (mid-2024) examples:

UPDATE 2024.11

The following remains relevant, but I’ve since shifted from Rye to uv, and expect that to stick, as uv is officially committed to being Python’s Cargo. Charlie Marsh of Astral recently gave a talk about that at Jane Street in NY.


On the Rye-ght path

NB: Rye vs. uv?

When I recently picked up Python again in earnest to work on machine learning, the alternatives I came across were:

  1. venv: Python’s built-in virtual environment tool, plus pip for package/dependency management
  2. virtualenv: third-party alternative to venv for creating isolated environments
  3. Docker: containerize to inherently maintain strict isolation between projects
  4. Anaconda: reputedly auto-includes (too?) many dependencies + fragile(?) package manager
  5. Poetry: focuses on making it easy to package up / distribute your own Python modules
  6. Rye: to Python as Cargo is to Rust

After dabbling a bit, here’s what I’m going with for now:

  • for my own new projects, I initialize and sync Rye
  • for collaborative work, I simply use venv/pip with a Rye-managed Python

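Rye-managed projects (I’ll denote these as rye-proj) have a pyproject.toml that declares the project’s dependencies and required Python version. This toml file is created when the project is initialized with rye init and updated by rye add <package name> when adding packages to the project; rye sync then updates the lock files and installs the declared packages into the project’s virtual environment.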

The code for any rye-proj is still vanilla Python and can be run after Rye is uninstalled. The workflow to transition to using pip would be:

  • pip install packages
    • the first time you do this, there wouldn’t normally be a requirements.txt file so pip install -r requirements.txt won’t work, unfortunately
  • pip freeze > requirements.txt

…and run as usual. (Rye’s pyproject.toml will be ignored.)
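pip freeze essentially records each installed distribution as a pinned name==version line, which is why its output can be fed back to pip install -r later. A rough sketch of the same idea using the standard-library importlib.metadata, assuming ordinary installs only: the real pip freeze also handles editable and VCS installs and leaves out some of its own tooling packages, so treat this as an illustration rather than a replacement. The helper name frozen_requirements is hypothetical.

```python
from importlib.metadata import distributions


def frozen_requirements() -> list[str]:
    """Approximate `pip freeze`: one name==version line per installed distribution."""
    lines = set()  # a set, in case the same distribution is found on two paths
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs with missing metadata
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    # Print rather than overwrite requirements.txt; redirect with > if wanted
    print("\n".join(frozen_requirements()))
```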

Rye is to Python as Cargo is to Rust

Rye manages the Python version and a project’s package dependencies. It’s blazing fast because it’s written in Rust, and using it is very straightforward. (Working on both rye-proj and not-rye-proj is slightly less straightforward; more on that below.)

Installing Rye

  • during installation, you’ll normally need to make a few choices:
    • package installation via pip-tools or uv: I’m using pip-tools (for now) as uv tripped me up (PyTorch/NumPy versions), though I plan to switch once there are fewer compatibility issues
    • whether a Rye-managed version of Python should be used for non-rye-proj
      • if not, the user-managed Python will be used outside of rye-proj: I’m still deciding on this, and I’m switching it up regularly to get a sense of how that choice impacts workflow.
curl -sSf https://rye.astral.sh/get | bash
This script will automatically download and install rye (latest) for you.
######################################################################## 100.0%
Welcome to Rye!

This installer will install rye to /Users/msyvr/.rye
This path can be changed by exporting the RYE_HOME environment variable.

Details:
  Rye Version: 0.39.0
  Platform: macos (aarch64)

✔ Continue? · yes
✔ Select the preferred package installer · pip-tools (slow, higher compatibility)
✔ What should running `python` or `python3` do when you are not inside a Rye managed project? · Run the old default Python (provided by your OS, pyenv, etc.)
Installed binary to /Users/msyvr/.rye/shims/rye
Bootstrapping rye internals
Downloading [email protected]
Checking checksum
Unpacking
Downloaded [email protected]
Updated self-python installation at /Users/msyvr/.rye/self

All done!

The options selected during installation are recorded in ~/.rye/config.toml which, per the above installation, currently looks like:

[behavior]
use-uv = false

To use Rye-managed Python on non-rye-proj (no package management, though), update ~/.bashrc to include:

source "$HOME/.rye/env"

and run:

rye config --set-bool behavior.global-python=true

to update ~/.rye/config.toml to enable global shims; the toml file now looks like:

[behavior]
use-uv = false
global-python = true

Reading up on how Rye’s global shims work helps with switching efficiently in and out of rye-proj repos.

Vanilla Python + venv for environment isolation + pip for package mgmt

Typical workflow for a non-rye-proj:

brew upgrade python3
pip3 --version # checking that pip is available
mkdir newproject
cd newproject
git init
python3 -m venv .venv
ls -a -l # this should verify that .git and .venv subdirectories were created
source .venv/bin/activate
pip install numpy pandas matplotlib torch torchaudio torchvision plotnine # install only the ones needed
pip freeze > requirements.txt

optional, only if using a notebook:

pip install notebook # installs Jupyter Notebook
pip freeze > requirements.txt
jupyter notebook # then open the browser at localhost:8888 (127.0.0.1:8888), the default address

and, to leave the venv:

deactivate
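To confirm which interpreter is actually running, whether from the activated .venv or from a notebook cell, compare sys.prefix with sys.base_prefix: in an environment created by venv they differ, while outside one they are the same. A small sketch; the helper name in_virtualenv is my own.

```python
import sys


def in_virtualenv() -> bool:
    """True inside a venv: sys.prefix points at the venv, sys.base_prefix at the base install."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside venv:", in_virtualenv())
```

Running it before source .venv/bin/activate and again after is a quick way to see the switch happen.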

Rye experiment log

Rye is still pretty new, and rough edges appear every now and again. When running into challenges that have no obvious resolution, it’s worth looking through the issues and PRs in Rye’s GitHub repo.

So far, I’ve bumped into the following surprises:

  1. installing PyTorch after installing NumPy hits an issue with the NumPy version… and the issue persists after updating pyproject.toml with the PyTorch-required NumPy version
  2. after uninstalling Rye, the shell still tries to run a Rye-managed Python (which no longer exists)
    • this seems to be resolved by source ~/.bash_profile, even though that file has no Rye-related references
  3. running a non-Rye project that uses setuptools stumbles
    • it looks like a known issue
    • under time pressure, I resolved this by uninstalling Rye (customizing the PATH may work?)

Extra steps to fully uninstall Rye

Start with:

rye self uninstall
✔ Do you want to uninstall rye? · yes
Done!

Don't forget to remove the sourcing of $HOME/.rye/env from your shell config.

Of course, update your shell config per the uninstaller’s message (don’t forget to source it or start a fresh terminal session). A full uninstall will also require:

rm -rf ~/.rye

It’s good practice to remove stray files, but there’s also a practical reason: if that directory is kept, its contents (including config.toml) persist into a new install (at least, that’s currently the case). Previous Rye settings may then surprise you if they don’t align with the options selected during reinstallation.

Uninstalled Rye: path still maps to a Rye Python

Calling python after uninstalling Rye as above attempts to run the (now non-existent) Python in the Rye shims directory, even though which python3 points at the currently valid python3 install:

  • this occurs both inside and outside of Rye projects
*** $  which python
*** $  which python3
/opt/homebrew/bin/python3
*** $  python
-bash: /Users/msyvr/.rye/shims/python: No such file or directory
*** $  python3
-bash: /Users/msyvr/.rye/shims/python3: No such file or directory

Sourcing ~/.bash_profile redirects to the correct, current Python path (note that this was required even though ~/.bashrc, which included the Rye path info, had already been sourced):

*** $  source ~/.bash_profile
Welcome to a fresh (or refreshed) terminal window. Happy coding :)
To see the list of aliases, type 'alias' and hit enter.
*** $  python3
Python 3.12.5 (main, Aug  6 2024, 19:08:49) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>