What it takes for a package to be added to Python
Reviewing the TOML parser, a new addition to Python 3.11
The next version of Python, 3.11, has been announced, and one of the changes is to incorporate a TOML parser into the standard library. This is primarily motivated by PEP 518, which specifies that a project's build requirements are declared in pyproject.toml. Having a TOML parser native to Python removes an awkward dependency from the build process.
The package tomli has been selected to become the new module and will be renamed to tomllib. This TOML parser is a small, pure-Python package with an interface inspired by the design of the native json parser.
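To make the json comparison concrete, here is a minimal sketch of parsing a pyproject.toml fragment. The fragment itself is an illustrative example (a typical Flit build-system table), and the fallback import assumes the third-party tomli package is installed on Python versions older than 3.11.

```python
import json

try:
    import tomllib  # standard library from Python 3.11 onward
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: tomli is installed on older Pythons

# Illustrative pyproject.toml content, not copied from any real project.
PYPROJECT = """
[build-system]
requires = ["flit_core>=3.2,<4"]
build-backend = "flit_core.buildapi"
"""

# loads() takes a str and returns plain dicts and lists, like json.loads().
config = tomllib.loads(PYPROJECT)
print(config["build-system"]["build-backend"])

# The equivalent call with the json module has the same shape.
same = json.loads('{"build-system": {"build-backend": "flit_core.buildapi"}}')
print(same["build-system"]["build-backend"])
```

For files, tomllib.load() expects a file object opened in binary mode ("rb"). The module only reads TOML; it has no counterpart to json.dumps().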
What makes tomli worthy of ordainment into the hallowed grounds of the Python interpreter? I decided to review the package to see what I could learn.
The directory structure uses a src layout
The package recently moved to a src layout, but I’m not sure of the motivation behind this decision. This structure puts the bulk of the functional code under tomli/src/tomli.
tomli/
├─ src/
│ ├─ tomli/
│ │ ├─ __init__.py
├─ tests/
│ ├─ __init__.py
├─ pyproject.toml
├─ README.md
├─ CHANGELOG.md
├─ .bumpversion.cfg
├─ .gitignore
├─ .flake8
The __init__ file is used to surface module exports
The init file sets the variables __all__ and __version__.
The __all__ variable makes the package's public names explicit and controls what from tomli import * brings in.
The __version__ variable means that you can check the installed version with print(tomli.__version__).
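A small sketch of how these two variables behave. The module demo_pkg and everything inside it are hypothetical, built in memory so the example is self-contained; it is not tomli's actual __init__ file.

```python
import sys
import types

# Build a throwaway module in memory. "demo_pkg" and its contents are made up.
demo = types.ModuleType("demo_pkg")
exec(
    '__all__ = ("loads",)\n'
    '__version__ = "0.0.1"\n'
    "def loads(s): return s\n"
    "def internal(): pass\n",
    demo.__dict__,
)
sys.modules["demo_pkg"] = demo

# A star import only picks up the names listed in __all__.
namespace = {}
exec("from demo_pkg import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)

# __version__ is an ordinary module attribute.
print(demo.__version__)
```

Without __all__, the star import would also have pulled in internal, since it does not start with an underscore.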
Use bump2version to search and replace version strings
This new-to-me CLI can be used to bump the version string at every location in the package. Seems useful. I was expecting to find it being used in CI/CD, but it does not appear in the GitHub Actions workflow. Maybe it is only used by hand before a new release.
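The underlying idea is a search and replace of the current version string across a list of files. The sketch below illustrates that idea only; it is not how bump2version is implemented, and the file names and contents are hypothetical stand-ins held in memory.

```python
def bump_patch(version: str) -> str:
    """Return the version with its patch component incremented."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


# Hypothetical file contents; a real tool would read these from disk.
files = {
    "pyproject.toml": 'version = "1.2.3"\n',
    "src/demo/__init__.py": '__version__ = "1.2.3"\n',
}

current = "1.2.3"
new = bump_patch(current)

# Replace the old version string everywhere it appears.
updated = {path: text.replace(current, new) for path, text in files.items()}

for path, text in updated.items():
    print(path, "->", text.strip())
```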
The build system uses Flit
Flit is an older build system focused on easily publishing to PyPI. According to the Flit docs, it motivated the introduction of pyproject.toml and the build system spec.
Uses the Atheris library from Google for fuzzing
Fuzzing is a testing strategy where generated strings are fed to the parser and monitored for failures. The string generation is dynamic and guided by code coverage. The fuzzer inserts program instrumentation and traces the execution of each input to inform the generation of the next input. I still don’t fully understand how this works, but at a conceptual level, the fuzzer seeks out various paths through an execution graph to unearth weird, unexpected, and buggy behavior.
Cool, I’ve never used this testing approach before, but it sounds useful for specific use cases.
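To get a feel for the coverage-guided loop, here is a toy sketch in pure Python. It is not how Atheris works internally (Atheris builds on libFuzzer and instruments bytecode); this version uses sys.settrace line tracing as a stand-in for instrumentation, and the names ALPHABET, run_with_coverage, mutate, and fuzz are all made up for illustration.

```python
import random
import sys

try:
    import tomllib  # standard library from Python 3.11 onward
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: tomli is installed on older Pythons

# Bytes that are likely to produce interesting TOML syntax.
ALPHABET = b"abc123=[]{}\"'.,#\n -_"


def run_with_coverage(data: bytes):
    """Parse data; return (set of executed (file, line) pairs, unexpected error or None)."""
    covered = set()

    def tracer(frame, event, arg):
        if event == "line":
            covered.add((frame.f_code.co_filename, frame.f_lineno))
        return tracer

    error = None
    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        tomllib.loads(data.decode("utf-8", errors="replace"))
    except tomllib.TOMLDecodeError:
        pass  # rejecting invalid TOML is the expected outcome
    except Exception as exc:  # anything else would be a finding worth reporting
        error = exc
    finally:
        sys.settrace(previous)
    return covered, error


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Insert, delete, or overwrite a single byte."""
    buf = bytearray(data)
    action = rng.randrange(3)
    if action == 0 or not buf:
        buf.insert(rng.randrange(len(buf) + 1), rng.choice(ALPHABET))
    elif action == 1:
        del buf[rng.randrange(len(buf))]
    else:
        buf[rng.randrange(len(buf))] = rng.choice(ALPHABET)
    return bytes(buf)


def fuzz(seed_input: bytes, iterations: int = 300, seed: int = 0):
    rng = random.Random(seed)
    corpus = [seed_input]
    seen, _ = run_with_coverage(seed_input)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        covered, error = run_with_coverage(candidate)
        if error is not None:
            crashes.append((candidate, error))
        new_lines = covered - seen
        if new_lines:  # this input reached code that no earlier input did
            corpus.append(candidate)
            seen |= new_lines
    return corpus, seen, crashes


corpus, seen, crashes = fuzz(b'key = "value"\n')
print(len(corpus), "inputs kept,", len(seen), "lines covered,", len(crashes), "unexpected errors")
```

The key step is the new_lines check: inputs that reach previously unseen code are kept and mutated further, which is what steers the search toward unexplored paths instead of generating purely random strings.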
https://opensource.googleblog.com/2020/12/announcing-atheris-python-fuzzer.html