How to train your Python
🌱 notes 🌱
Python is the de facto language of machine learning. Its efficient syntax is a gift for implementing complex math and manipulating arrays. Numerous libraries further boost Python’s utility: who among us can function without at least installing NumPy, matplotlib, pandas, and PyTorch? Most are convenient to access via the Python Package Index (PyPI).
As one might expect, there’s a flip side to all this efficiency and convenience: Python doesn’t come with a single idiomatic approach to version and dependency management, so things can get… unwieldy. Fast.
Where to begin?
That this is problematic is widely acknowledged, and many others have shared insightful commentary on the situation. I encourage you to read their work as I don’t go into the arguments for/against the various options here. A few current (mid-2024) examples:
- Explaining why the Python installation process is such a mess
- Python Packaging, One Year Later: A Look Back at 2023 in Python Packaging | Chris Warrick
- Poetry vs. Docker caching: Fight!
- Rye: A Vision Continued | Armin Ronacher's Thoughts and Writings
- … + discussion: Rye: A Vision Continued | Hacker News
On the Rye-ght path
When I recently picked up Python again in earnest to work on machine learning, the alternatives I came across were:
- Python’s built-in version management tool, venv, plus pip for package/dependency management.
- virtualenv, alternative to venv: version management
- Docker: containerize to inherently maintain strict isolation between projects
- Anaconda: reputedly auto-includes (too?) many dependencies + fragile(?) package manager
- Poetry: focuses on making it easy to package up / distribute your own Python modules
- Rye: to Python as Cargo is to Rust
After dabbling a bit, here’s what I’m going with for now:
- for my own new projects, I initialize and sync Rye
- for collaborative work, I simply use venv/pip with a Rye-managed Python
Rye-managed projects (I’ll denote these as rye-proj) have a pyproject.toml that manages dependencies and version. This TOML file is created when the project is initialized with rye init. Running rye add <package name> records a new dependency in it, and rye sync then updates the lockfiles and the project’s virtual environment to match.
The code for any rye-proj is still vanilla Python and can be run after Rye is uninstalled. The workflow to transition to using pip would be:
- pip install packages
  - the first time you do this, there wouldn’t normally be a requirements.txt file so pip install -r requirements.txt won’t work, unfortunately
- pip freeze > requirements.txt
…and run as usual. (Rye’s pyproject.toml will be ignored.)
Rye is to Python as Cargo is to Rust
Rye manages Python versions and installed packages. It’s blazing fast because it’s written in Rust, and using it is very straightforward. (Slightly less straightforward is working on both rye-proj and not-rye-proj; more on that below.)
- during installation, you normally need to make a few choices:
  - tool installation via pip-tools or uv: I use pip-tools as it has greater compatibility
  - whether a Rye-managed version of Python should be used for non-rye-proj
    - if not, the user-managed Python will be used outside of rye-proj: I’m still deciding on this, and I’m switching it up regularly to get a sense of how that choice impacts workflow.
curl -sSf https://rye.astral.sh/get | bash
This script will automatically download and install rye (latest) for you.
######################################################################## 100.0%
Welcome to Rye!
This installer will install rye to /Users/msyvr/.rye
This path can be changed by exporting the RYE_HOME environment variable.
Details:
Rye Version: 0.39.0
Platform: macos (aarch64)
✔ Continue? · yes
✔ Select the preferred package installer · pip-tools (slow, higher compatibility)
✔ What should running `python` or `python3` do when you are not inside a Rye managed project? · Run the old default Python (provided by your OS, pyenv, etc.)
Installed binary to /Users/msyvr/.rye/shims/rye
Bootstrapping rye internals
Downloading [email protected]
Checking checksum
Unpacking
Downloaded [email protected]
Updated self-python installation at /Users/msyvr/.rye/self
All done!
The options selected during installation are recorded in ~/.rye/config.toml which, per the above installation, currently looks like:
[behavior]
use-uv = false
To use Rye-managed Python on non-rye-proj (no package management, though), update ~/.bashrc to include:
source "$HOME/.rye/env"
and run:
rye config --set-bool behavior.global-python=true
This updates ~/.rye/config.toml to enable global shims; the TOML file now looks like:
[behavior]
use-uv = false
global-python = true
Reviewing the details on global shims is helpful for becoming efficient at switching in and out of rye-proj repos.
Vanilla Python + venv for version mgmt + pip for package mgmt
Typical workflow for a non-rye-proj:
brew upgrade python3
pip3 --version # checking that pip is available
mkdir newproject
cd newproject
git init
python3 -m venv .venv
ls -a -l # this should verify that .git and .venv subdirectories were created
source .venv/bin/activate
pip install numpy pandas matplotlib torch torchaudio torchvision plotnine # install only the ones needed
pip freeze > requirements.txt
optional, only if using a notebook:
pip install notebook # installs jupyter notebooks
pip freeze > requirements.txt
jupyter notebook # then, navigate to browser (default) localhost:8888 (127.0.0.1:8888)
and, to leave the venv:
deactivate
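When switching between activated and deactivated shells, it is easy to lose track of which interpreter is actually running. The following is a minimal sketch of a check, assuming the environment was created with venv as in the workflow above; the function name is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtual environment:", in_virtualenv())
```

Running this before and after source .venv/bin/activate should show the interpreter path move into the project’s .venv directory.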
Rye experiment log
Rye is still pretty new and rough edges appear every now and again. When running into challenges that seem not to have an obvious resolution, it’s worth taking a look at issues and PRs in Rye’s GitHub repo.
So far, I’ve bumped into the following surprises:
- installing PyTorch after installing NumPy hits an issue with NumPy version… and the issue persists after updating pyproject.toml with the PyTorch-required NumPy version
- uninstalling Rye still tries to access a rye-python (which no longer exists)
  - this seems to be resolved by source ~/.bash_profile despite no rye-related references
- running a non-Rye project that uses setuptools stumbles
  - it looks like a known issue
  - under time pressure, I resolved this by uninstalling Rye (customizing the PATH may work?)
Extra steps to fully uninstall Rye
Start with:
rye self uninstall
✔ Do you want to uninstall rye? · yes
Done!
Don't forget to remove the sourcing of $HOME/.rye/env from your shell config.
Of course, update your shell config per the uninstaller’s message (don’t forget to source it or start a fresh terminal session). A full uninstall will also require:
rm -rf ~/.rye
It’s good practice to remove stray code but, also, keeping that directory means the contents will persist after a new install (or, at least, that’s currently the case). Previous Rye settings may surprise you if they don’t align with options selected during the reinstallation.
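To double-check that nothing was missed, a rough sketch like the following can list what is left behind. It assumes the default install location (~/.rye) and a handful of common shell config file names; both are assumptions, as is the function name, so adjust them to match your setup.

```python
from pathlib import Path

# Common shell config files; an assumption, not an exhaustive list.
DEFAULT_CONFIGS = (".bashrc", ".bash_profile", ".profile", ".zshrc")


def rye_leftovers(home: Path, shell_configs: tuple[str, ...] = DEFAULT_CONFIGS) -> list[str]:
    """List what a Rye uninstall may have left behind under `home`."""
    leftovers = []
    if (home / ".rye").exists():
        leftovers.append(f"{home / '.rye'} still exists")
    for name in shell_configs:
        config = home / name
        if not config.is_file():
            continue
        lines = config.read_text(errors="replace").splitlines()
        for number, line in enumerate(lines, 1):
            # Any mention of .rye counts, e.g. the line sourcing $HOME/.rye/env.
            if ".rye" in line:
                leftovers.append(f"{config}:{number}: {line.strip()}")
    return leftovers


if __name__ == "__main__":
    for item in rye_leftovers(Path.home()):
        print(item)
```

An empty result suggests the directory is gone and none of the checked config files mention it.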
Uninstalled Rye: path still maps to a Rye Python
Calling python after uninstalling Rye as above attempts to use the (now non-existent) python on the Rye path (though which indicates the currently-valid python3 install…)
- this occurs both in and outside of rye projects
*** $ which python
*** $ which python3
/opt/homebrew/bin/python3
*** $ python
-bash: /Users/msyvr/.rye/shims/python: No such file or directory
*** $ python3
-bash: /Users/msyvr/.rye/shims/python3: No such file or directory
Sourcing ~/.bash_profile redirects to the correct/current python path (note that this was required despite ~/.bashrc - which included the rye path info - having already been sourced):
*** $ source ~/.bash_profile
Welcome to a fresh (or refreshed) terminal window. Happy coding :)
To see the list of aliases, type 'alias' and hit enter.
*** $ python3
Python 3.12.5 (main, Aug 6 2024, 19:08:49) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
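One possible explanation for the mismatch (an assumption, not something verified here) is that bash remembers where it previously found a command, while which performs a fresh PATH lookup each time. If so, clearing bash’s cache with hash -r may be worth trying. The sketch below does the fresh-lookup side in Python: it reports where python3 resolves to now and flags PATH entries that point at directories that no longer exist, such as a removed shims directory. The function name is hypothetical.

```python
import os
import shutil


def missing_path_entries(path: str) -> list[str]:
    """Return the PATH entries that do not exist as directories."""
    return [
        entry
        for entry in path.split(os.pathsep)
        if entry and not os.path.isdir(entry)
    ]


if __name__ == "__main__":
    # shutil.which searches PATH afresh, much like the which command.
    print("python3 resolves to:", shutil.which("python3"))
    for entry in missing_path_entries(os.environ.get("PATH", "")):
        print("missing PATH entry:", entry)
```

If a Rye shims directory shows up as a missing entry, the current shell session is still carrying the old PATH, which is consistent with a fresh terminal (or re-sourced config) fixing things.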