Thousands of Debian packages updated from their upstream Git repository
The Debian Janitor is an automated system that commits fixes for (minor) issues in Debian packages that can be fixed by software. It gradually started proposing merges in early December. The first set of changes sent out ran lintian-brush on sid packages maintained in Git. This post is part of a series about the progress of the Janitor.
Linux distributions like Debian fulfill an important function in the FOSS ecosystem - they are system integrators that take existing free and open source software projects and adapt them where necessary to work well together. They also make it possible for users to install more software in an easy and consistent way and with some degree of quality control and review.
One of the consequences of this model is that the distribution package often lags behind upstream releases. This is especially true for distributions that have tighter integration and standardization (such as Debian). New upstream code is often only imported irregularly because it is a manual process: the package has to be updated, and somebody has to make sure that it still works well together with the rest of the system.
The process of importing a new upstream used to be (well, back when I started working on Debian packages) fairly manual and something like this:
- Go to the upstream’s homepage, find the tarball and signature and verify the tarball
- Make modifications so the tarball matches Debian’s format
- Diff the original and new upstream tarballs and figure out whether changes are reasonable and which require packaging changes
- Update the packaging, changelog, build and manually test the package
However, there have been developments over the last decade that make it easier to import new upstream releases into Debian packages.
Uscan and debian QA watch
Uscan and debian/watch have been around for a while and make it possible to find upstream tarballs.
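A debian/watch file usually states the watch file format version, followed by a URL and a pattern whose capture group matches the upstream version number in the tarball file name.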
The QA watch service regularly polls all watch locations in the archive and makes the information available, so it’s possible to know which packages have changed without downloading each one of them.
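To make the mechanism a bit more concrete, here is a minimal Python sketch of what a watch-style check boils down to: a pattern with a capture group is matched against the file names found at the upstream download location, and the highest version wins. The pattern, the file names and the `newest_version` helper are hypothetical, and the comparison only handles plain dotted integers; it is not uscan's actual algorithm.

```python
import re


def version_key(version):
    """Turn '1.10.2' into (1, 10, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def newest_version(pattern, links):
    """Return the highest version captured by `pattern` among `links`, or None."""
    regex = re.compile(pattern)
    versions = []
    for link in links:
        match = regex.fullmatch(link)
        if match:
            versions.append(match.group(1))
    return max(versions, key=version_key, default=None)


# Hypothetical pattern and directory listing, for illustration only.
PATTERN = r"example-(\d+(?:\.\d+)*)\.tar\.gz"
LINKS = ["example-1.2.0.tar.gz", "example-1.10.1.tar.gz", "README.txt"]

print(newest_version(PATTERN, LINKS))
```

Comparing the parts as integers matters here: a plain string comparison would rank 1.2.0 above 1.10.1.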
Git is fairly ubiquitous nowadays, and most upstream projects and packages in Debian use it. There are still exceptions that do not use any version control system or that use a different version control system, but they are becoming increasingly rare.
DEP-12 specifies a file format with metadata about the upstream project that a package was based on. Particularly relevant for our case is that it has fields for the location of the upstream version control repository.
debian/upstream/metadata files look something like this:
    ---
    Repository: https://www.dulwich.io/code/dulwich/
    Repository-Browse: https://www.dulwich.io/code/dulwich/
While DEP-12 is still a draft, it has already been widely adopted - there are about 10000 packages in Debian that ship a debian/upstream/metadata file with Repository information.
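Reading the Repository field back out of such a file is straightforward. The sketch below is a simplification that assumes only flat `Key: value` lines; real debian/upstream/metadata files are YAML and may contain nested values, so a proper YAML parser would be the safer choice in practice. The `read_upstream_metadata` helper is a name made up for this example.

```python
def read_upstream_metadata(text):
    """Parse flat 'Key: value' lines from a debian/upstream/metadata file.

    Simplification: only top-level scalar fields are handled; nested YAML
    structures are not supported by this sketch.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the YAML document marker and comments.
        if not line or line == "---" or line.startswith("#"):
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


EXAMPLE = """\
---
Repository: https://www.dulwich.io/code/dulwich/
Repository-Browse: https://www.dulwich.io/code/dulwich/
"""

metadata = read_upstream_metadata(EXAMPLE)
print(metadata.get("Repository"))
```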
The Autopkgtest standard and associated tooling provide a way to run a defined set of tests against an installed package. This makes it possible to verify that a package is working correctly as part of the system as a whole. ci.debian.net regularly runs these tests against Debian packages to detect regressions.
The Vcs-Git headers in debian/control are the equivalent of the Repository field in debian/upstream/metadata, but for the packaging repositories (as opposed to the upstream ones).
They’ve been around for a while and are widely adopted, as can be seen from zack’s stats.
The vcswatch service, which regularly polls packaging repositories to see whether they have changed, makes it a lot easier to consume this information in a usable way.
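As an illustration of how a tool might consume the Vcs-Git header, the following Python sketch pulls the field out of the source stanza of a debian/control file and splits off the optional `-b branch` and `[subpath]` parts. The control file contents and the `parse_vcs_git` helper are hypothetical, and the parsing is deliberately naive (no support for folded field values).

```python
import re


def parse_vcs_git(control_text):
    """Return (url, branch, subpath) from the source stanza, or None."""
    # The source stanza is the first paragraph of debian/control.
    source_stanza = control_text.strip().split("\n\n", 1)[0]
    for line in source_stanza.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "vcs-git":
            break
    else:
        return None

    value = value.strip()
    subpath = None
    # An optional "[path]" suffix names a subdirectory of the repository.
    bracket = re.search(r"\s*\[([^\]]+)\]\s*$", value)
    if bracket:
        subpath = bracket.group(1)
        value = value[:bracket.start()]

    parts = value.split()
    if not parts:
        return None
    url = parts[0]
    branch = None
    # An optional "-b name" selects a branch other than the default one.
    if "-b" in parts[1:-1]:
        branch = parts[parts.index("-b") + 1]
    return url, branch, subpath


# Hypothetical control file, for illustration only.
CONTROL = """\
Source: example
Maintainer: Jane Doe <jane@example.com>
Vcs-Git: https://salsa.debian.org/example-team/example.git -b debian/sid
Vcs-Browser: https://salsa.debian.org/example-team/example

Package: example
Architecture: all
"""

print(parse_vcs_git(CONTROL))
```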
Over the last couple of years, Debian has slowly been converging on a single build tool - debhelper’s dh interface.
Being able to rely on a single build tool makes it easier to write code to update packaging when upstream changes require it.
Debhelper (and its helpers) can increasingly figure out how to do the Right Thing in many cases without being explicitly configured. This makes packaging less effort, but also means that it’s less likely that importing a new upstream version will require updates to the packaging.
With all of these improvements in place, it actually becomes feasible in a lot of situations to update a Debian package to a new upstream version automatically. Of course, this requires that all of this information is available, so it won’t work for all packages. In some cases, the packaging for the older upstream version might not apply to the newer upstream version.
The Janitor has attempted to import a new upstream Git snapshot and a new upstream release for every package in the archive where a debian/watch file or debian/upstream/metadata file is present.
These are the steps it uses:
- Find new upstream version
  - If release, use debian/watch - or maybe tagged in upstream repository
  - If snapshot, use debian/upstream/metadata’s Repository field
  - If neither is available, use guess-upstream-metadata from upstream-ontologist to guess the upstream Repository
- Merge upstream version into packaging repository, possibly importing tarballs using pristine-tar
- Update the changelog file to mention the new upstream version
- Run some checks to ensure there are no unintentional changes, e.g.:
  - Scan diff between old and new for surprising license changes
    - Today, abort if there are any - in the future, maybe update debian/copyright
  - Check for obvious compatibility breaks - e.g. sonames changing
- Attempt to update the packaging to reflect upstream changes
  - Refresh patches
- Attempt to build the package with deb-fix-build, to deal with any missing dependencies
- Run the autopkgtests with deb-fix-build to deal with missing dependencies, and abort if any tests fail
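The first step, deciding where to look for a new upstream version, can be sketched in a few lines of Python. This is only one possible reading of the list above, not the Janitor's actual code; the function name, its parameters and the returned labels are all invented for this example.

```python
def pick_version_source(mode, has_watch, has_repository):
    """Decide where to look for a new upstream version.

    mode is "release" or "snapshot". has_watch says whether debian/watch
    exists; has_repository says whether debian/upstream/metadata has a
    Repository field. The returned labels are hypothetical.
    """
    if mode == "release" and has_watch:
        return "debian/watch"
    if mode == "release" and has_repository:
        return "tags in upstream repository"
    if mode == "snapshot" and has_repository:
        return "debian/upstream/metadata Repository"
    # Nothing usable was declared, so fall back to guessing.
    return "guess-upstream-metadata"


for mode, has_watch, has_repository in [
    ("release", True, False),
    ("snapshot", False, True),
    ("snapshot", False, False),
]:
    print(mode, "->", pick_version_source(mode, has_watch, has_repository))
```

The later steps (merge, checks, build, autopkgtest) are sequential and each one can abort the run, so they would naturally follow as a chain of calls after this decision.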
When run over all packages in unstable (sid), this process works for a surprising number of them.
For fresh-releases (imports of upstream releases), the Janitor processed all packages maintained in Git for which QA watch reports new releases (about 11,000). About 2300 of these packages were updated, and about 4000 were unchanged.
For fresh-snapshots (imports of the latest Git commit from upstream), the Janitor processed all packages maintained in Git (about 26,000). Of these, 5100 packages were updated, and for 2100 there was nothing to do, i.e. there were no upstream commits since the last Debian upload.
As can be seen, this works for a surprising fraction of packages. It’s possible to get the numbers up even higher by improving the tooling, the autopkgtests and the metadata that is provided by packages.
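Expressed as rough shares of the packages processed, the figures above work out as follows. The inputs are the approximate counts quoted in this post, so the resulting percentages are approximate as well.

```python
# Approximate counts quoted in this post.
RESULTS = {
    "fresh-releases": {"processed": 11000, "updated": 2300, "unchanged": 4000},
    "fresh-snapshots": {"processed": 26000, "updated": 5100, "unchanged": 2100},
}


def share(part, whole):
    """Format part/whole as a whole-number percentage."""
    return f"{part / whole:.0%}"


for suite, counts in RESULTS.items():
    print(
        suite,
        "updated:", share(counts["updated"], counts["processed"]),
        "unchanged:", share(counts["unchanged"], counts["processed"]),
    )
```

That is roughly a fifth of the processed packages updated in each suite.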
Using these packages
All the packages that have been built can be accessed from the Janitor APT repository. More information can be found at https://janitor.debian.net/fresh, but in short - run:
    echo deb "[arch=amd64 signed-by=/usr/share/keyrings/debian-janitor-archive-keyring.gpg]" \
        https://janitor.debian.net/ fresh-snapshots main | sudo tee /etc/apt/sources.list.d/fresh-snapshots.list
    echo deb "[arch=amd64 signed-by=/usr/share/keyrings/debian-janitor-archive-keyring.gpg]" \
        https://janitor.debian.net/ fresh-releases main | sudo tee /etc/apt/sources.list.d/fresh-releases.list
    sudo curl -o /usr/share/keyrings/debian-janitor-archive-keyring.gpg https://janitor.debian.net/pgp_keys
    sudo apt update
And then you can install packages from the fresh-snapshots (upstream git snapshots) or fresh-releases suites on a case-by-case basis by running something like:
    sudo apt install -t fresh-snapshots r-cran-roxygen2
Most packages are updated based on information provided by vcswatch and QA watch, but it’s also possible for upstream repositories to call a web hook to trigger a refresh of a package.
These packages were built against unstable, but should in almost all cases also work for testing.
Of course, since these packages are built automatically without human supervision, it’s likely that some of them will have bugs that would otherwise have been caught by the maintainer.
Note on the single build tool: I’m not saying that a monoculture is great here, but it does help distributions.