What it takes for a package to be added to Python

Reviewing the TOML parser, a new addition to Python 3.11

14 March 2022

The next version of Python, 3.11, is slated to include a TOML parser in the standard library (PEP 680). This is primarily motivated by PEP 518, which specifies that a project’s build-system requirements be declared in pyproject.toml. Having a TOML parser native to Python removes an awkward third-party dependency from the build process, since build tools must read that file before anything else is installed.

The package tomli has been selected to become the new module and will be renamed to tomllib. This TOML parser is a small, pure-Python package with an interface inspired by the design of the standard library’s json module.
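To make the json comparison concrete, here is a minimal sketch of the parsing interface. It assumes Python 3.11+ (where the module ships as tomllib) and falls back to the third-party tomli package on older versions. The TOML document itself is made up for illustration.

```python
import io

# Prefer the standard-library module on Python 3.11+, fall back to tomli otherwise.
try:
    import tomllib
except ModuleNotFoundError:  # Python < 3.11
    import tomli as tomllib

# An illustrative TOML document (not taken from any real project).
doc = """
[project]
name = "example"
version = "0.1.0"
dependencies = ["requests"]
"""

# loads() parses a str into plain dicts and lists, mirroring json.loads().
config = tomllib.loads(doc)
print(config["project"]["name"])  # example

# load() takes a file object opened in *binary* mode, unlike json.load().
config_from_file = tomllib.load(io.BytesIO(doc.encode("utf-8")))
print(config == config_from_file)  # True
```

One difference from json is worth knowing: load() expects a file opened with "rb", so a text-mode file object is rejected.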

What makes tomli worthy of ordainment into the hallowed grounds of the Python interpreter? I decided to review the package to see what I could learn.

The directory structure uses a src layout

The package recently moved to a src layout, but I’m not sure of the motivation behind this decision. This structure puts the bulk of the functional code under tomli/src/tomli.

├─ src/
│  ├─ tomli/
│  │  ├─ __init__.py
├─ tests/
│  ├─ __init__.py
├─ pyproject.toml
├─ README.md
├─ .bumpversion.cfg
├─ .gitignore
├─ .flake8

The __init__ file is used to surface module exports

The __init__.py file sets the module-level variables __all__ and __version__.

__all__ explicitly lists the public names that from tomli import * will export.

Setting __version__ means that you can check the installed version with print(tomli.__version__).
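A self-contained way to see both variables in action is to build a throwaway module in memory. The module name demo_pkg and its contents are hypothetical. They are only loosely modeled on a package __init__.py and are not tomli’s actual source.

```python
import sys
import types

# Hypothetical __init__.py source for an illustrative module.
INIT_SOURCE = '''
__all__ = ("loads", "TOMLDecodeError")
__version__ = "0.0.1"

class TOMLDecodeError(ValueError):
    """Raised on invalid input."""

def loads(s):
    return {}

def debug_dump():
    # Public-looking name, but NOT listed in __all__.
    return "not exported by a star-import"
'''

# Build the module in memory and register it so `import` can find it.
demo_pkg = types.ModuleType("demo_pkg")
exec(INIT_SOURCE, demo_pkg.__dict__)
sys.modules["demo_pkg"] = demo_pkg

# A star-import binds only the names listed in __all__.
namespace = {}
exec("from demo_pkg import *", namespace)
exported = sorted(name for name in namespace if name != "__builtins__")
print(exported)  # ['TOMLDecodeError', 'loads']

# __version__ is an ordinary attribute on the imported module.
import demo_pkg
print(demo_pkg.__version__)  # 0.0.1
```

Without __all__, the star-import would also have pulled in debug_dump, because only underscore-prefixed names are skipped by default.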

Uses bump2version to search and replace version strings

This new-to-me CLI bumps the version string at every location in the package where it appears. Seems useful. I was expecting to find it in the CI/CD pipeline, but it isn’t used in the GitHub Actions workflow. Maybe it is only run by hand before a new release.
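As a rough mental model of what the tool automates, here is a hypothetical Python sketch. It increments one part of a MAJOR.MINOR.PATCH string, then replaces the old string in every tracked file. The function names, file contents, and version numbers are made up, and this is not how bump2version itself is implemented.

```python
def bump(version: str, part: str) -> str:
    """Increment one part of a MAJOR.MINOR.PATCH version, resetting the lower parts."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


def bump_files(files: dict[str, str], current: str, part: str) -> tuple[str, dict[str, str]]:
    """Return the new version and the updated file contents (held in memory here)."""
    new = bump(current, part)
    # Plain substring replacement: it would also hit any unrelated string
    # that happens to equal the old version, so a real tool needs more care.
    updated = {path: text.replace(current, new) for path, text in files.items()}
    return new, updated


# Hypothetical files that each embed the version string.
files = {
    ".bumpversion.cfg": "[bumpversion]\ncurrent_version = 1.2.3\n",
    "src/pkg/__init__.py": '__version__ = "1.2.3"\n',
}
new_version, files = bump_files(files, "1.2.3", "minor")
print(new_version)  # 1.3.0
print(files["src/pkg/__init__.py"], end="")  # __version__ = "1.3.0"
```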

The build system uses Flit

Flit is an older build system focused on making it easy to publish packages to PyPI. According to the Flit docs, it helped motivate the introduction of pyproject.toml and the build-system specification.
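For context, the build-system specification is the [build-system] table in pyproject.toml. It tells a build frontend such as pip which backend to install and call. The snippet below parses a typical flit_core configuration. This is an illustrative example, not copied from tomli’s own pyproject.toml. It again assumes tomllib on Python 3.11+, with tomli as a fallback.

```python
# Prefer the standard-library module on Python 3.11+, fall back to tomli otherwise.
try:
    import tomllib
except ModuleNotFoundError:  # Python < 3.11
    import tomli as tomllib

# Illustrative [build-system] table using Flit's backend.
PYPROJECT = """
[build-system]
requires = ["flit_core >=3.2,<4"]
build-backend = "flit_core.buildapi"
"""

build_system = tomllib.loads(PYPROJECT)["build-system"]
# A frontend first installs these requirements (typically in an isolated environment)...
print(build_system["requires"])       # ['flit_core >=3.2,<4']
# ...then imports this module and calls its PEP 517 hooks, such as build_wheel.
print(build_system["build-backend"])  # flit_core.buildapi
```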

Uses the Atheris library from Google for fuzzing

Fuzzing is a testing strategy where generated inputs are fed to the parser and monitored for failures. With Atheris, input generation is dynamic and coverage-guided: the fuzzer instruments the program and traces the execution of each input to inform the generation of the next one. I still don’t fully understand how this works. At a conceptual level, though, the fuzzer seeks out different paths through the program’s execution graph to unearth weird, unexpected, and buggy behavior.

Cool, I’ve never used this testing approach before, but it sounds useful for specific use cases.