Porting Python projects to Rust

tags = [, , , ]

I’ve recently been working on porting some of my Python code to rust, both for performance reasons, and because of the strong typing in the language. As a fan of Haskell, I also just really enjoy using the language.

Porting any large project to a new language can be a challenge. There is a temptation to do a rewrite from the ground-up in idiomatic rust and using all new fancy features of the language.

Porting in one go

However, this is a bit of a trap:

  • It blocks other work. It can take a long time to finish the rewrite, during which time there is no good place to make other bug fixes/feature changes. If you make the change in the python branch, then you may also have to patch the in-progress rust fork.
  • No immediate return on investment. While the rewrite is happening, all of the investment in it is sunk costs.
  • Throughout the process, you can only run the tests for subsystems that have already been ported. It’s common to find subtle bugs later in code ported early.
  • Understanding existing code, porting it and making it idiomatic rust all at the same time takes more time and post-facto debugging.

Iterative porting

Instead, we’ve found that it works much better to take an iterative approach. One of the hidden gems of rust is the excellent PyO3 crate, which allows creating python bindings for rust code in a way that is several times less verbose and less painful than C or SWIG. Because of rust’s strong ownership model, it’s also really hard to muck up e.g. reference counts when creating Python bindings for rust code.

We port individual functions or classes to rust one at a time, starting with functionality that doesn’t have dependencies on other python code and gradually working our way up the call stack.

Each subsystem of the code is converted to two matching rust crates: one with a port of the code to pure rust, and one with python bindings for the rust code. Generally multiple python modules end up being a single pair of rust crates.

The signature for the pure Rust code follow rust conventions, but the business logic is mostly ported as-is (just in rust syntax) and the signatures of the python bindings match that of the original python code.

This then allows running the original python tests to verify that the code still behaves the same way. Changes can also immediately land on the main branch.

A subsequent step is usually to refactor the rust code to be more idiomatic - all the while keeping the tests passing. There is also the potential to e.g. switch to using external rust crates (with perhaps subtly different behaviour), or drop functionality altogether.

At some point, we will also port the tests from python to rust, and potentially drop the python bindings - once all the caller’s have been converted to rust.

Example

For example, imagine I have a Python module janitor/mail_filter.py with this function:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
def parse_plain_text_body(text):
   lines = text.splitlines()

   for i, line in enumerate(lines):
       if line == 'Reply to this email directly or view it on GitHub:':
           return lines[i + 1].split('#')[0]
       if (line == 'For more details, see:'
               and lines[i + 1].startswith('https://code.launchpad.net/')):
           return lines[i + 1]
       try:
           (field, value) = line.split(':', 1)
       except ValueError:
           continue
       if field.lower() == 'merge request url':
           return value.strip()
   return None

Porting this to rust naively (in a crate I’ve called “mailfilter”) it might look something like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
pub fn parse_plain_text_body(text: &str) -> Option<String> {
     let lines: Vec<&str> = text.lines().collect();

     for (i, line) in lines.iter().enumerate() {
         if line == &"Reply to this email directly or view it on GitHub:" {
             return Some(lines[i + 1].split('#').next().unwrap().to_string());
         }
         if line == &"For more details, see:"
             && lines[i + 1].starts_with("https://code.launchpad.net/")
         {
             return Some(lines[i + 1].to_string());
         }
         if let Some((field, value)) = line.split_once(':') {
             if field.to_lowercase() == "merge request url" {
                 return Some(value.trim().to_string());
             }
         }
     }
     None
 }

Bindings are created in a crate called mailfilter-py, which looks like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
use pyo3::prelude::*;

 #[pyfunction]
 fn parse_plain_text_body(text: &str) -> Option<String> {
     janitor_mail_filter::parse_plain_text_body(text)
 }

 #[pymodule]
 pub fn _mail_filter(py: Python, m: &PyModule) -> PyResult<()> {
     m.add_function(wrap_pyfunction!(parse_plain_text_body, m)?)?;

     Ok(())
 }

The metadata for the crates is what you’d expect. mailfilter-py uses PyO3 and depends on mailfilter.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
[package]
 name = "mailfilter-py"
 version = "0.0.0"
 authors = ["Jelmer Vernooij <jelmer@jelmer.uk>"]
 edition = "2018"

 [lib]
 crate-type = ["cdylib"]

 [dependencies]
 janitor-mail-filter = { path = "../mailfilter" }
 pyo3 = { version = ">=0.14", features = ["extension-module"]}

I use python-setuptools-rust to get the python ecosystem to build the python bindings. Here is what setup.py looks like:

1
2
3
4
5
6
7
8
9
#!/usr/bin/python3
from setuptools import setup
from setuptools_rust import RustExtension, Binding

setup(
        rust_extensions=[RustExtension(
        "janitor._mailfilter", "crates/mailfilter-py/Cargo.toml",
        binding=Binding.PyO3)],
)

And of course, setuptools-rust needs to be listed as a setup requirement in pyproject.toml or setup.cfg.

After that, we can replace the original python code with a simple import and verify that the tests still run:

1
from ._mailfilter import parse_plain_text_body

Of course, not all bindings are as simple as this. Iterators in particular are more complicated, as is code that has a loose idea of ownership in python. But I’ve found that the time investment is usually well worth the ability to land changes on the development head early and often.

I’d be curious to hear if people have had success with other approaches to porting Python code to Rust. If you do, please leave a comment.

Go Top