While Python continues to be the language of choice for ML projects, I'm increasingly seeing Rust show up in packages I use. Hugging Face's fast tokenizers are written in Rust to speed up training and tokenization; they say it "Takes less than 20 seconds to tokenize a GB of text on a server's CPU." I recently used Polars, which is also written in Rust, for a quick task where I wanted to run a SQL query over 20 GB of CSV files, and it took only about 60 seconds on my laptop. Rust offers a way of bypassing Python's GIL to write parallelizable code, in contrast to packages like NumPy that have traditionally been written in C/C++ for performance reasons.
How does one turn Rust code into something you can call from Python? It turns out I've already done something similar: compiling Rust to WebAssembly and using it in a JavaScript project. The process for Python is much the same. I followed Calling Rust from Python using PyO3 and re-created the Fibonacci numbers experiment from my previous post (apparently I'm not the only one with this idea). It's a toy example, intended only to show how easily Rust can be leveraged; it ignores more complex data types and any actual multi-threaded code. Perhaps that will be the topic of a future post.
The function in Rust looks like this:
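(A minimal sketch in the spirit of the PyO3 tutorial rather than a verbatim listing; the naive recursive implementation and the `fib_rs` module name are placeholders, and the module-function signature varies slightly across PyO3 versions.)

```rust
use pyo3::prelude::*;

/// Naive recursive Fibonacci, deliberately unoptimized so the
/// comparison with pure Python stays apples to apples.
#[pyfunction]
fn fib(n: u64) -> u64 {
    if n < 2 {
        return n;
    }
    fib(n - 1) + fib(n - 2)
}

/// The function name here must match the library name in Cargo.toml;
/// `fib_rs` is just a placeholder.
#[pymodule]
fn fib_rs(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(fib, m)?)?;
    Ok(())
}
```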
And the Python code that calls it and times it:
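(Again a sketch: the `fib_rs` import assumes the extension has been built into the environment, e.g. with maturin, and the timing uses `time.perf_counter`.)

```python
import time

import fib_rs  # the compiled Rust extension (placeholder module name)


def fib_py(n: int) -> int:
    """Pure-Python version of the same naive recursion."""
    if n < 2:
        return n
    return fib_py(n - 1) + fib_py(n - 2)


n = 35

start = time.perf_counter()
fib_rs.fib(n)
print(f"Rust:   {time.perf_counter() - start:.2f} s")

start = time.perf_counter()
fib_py(n)
print(f"Python: {time.perf_counter() - start:.2f} s")
```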
Using the same n=35 as in my JavaScript experiment, I'm seeing about 0.05 seconds for Rust vs. 7.31 seconds for pure Python. The likelihood of wanting or needing this in a Python project seems greater than with JavaScript, and my guess is that Rust's presence in the Python ecosystem will only keep growing.