Jelmer Vernooĳ

Debian Janitor: How to Contribute Lintian-Brush Fixers

meta = {'category': debian, 'date': Thu 15 October 2020}

The Debian Janitor is an automated system that commits fixes for (minor) issues in Debian packages that can be fixed by software. It gradually started proposing merges in early December. The first set of changes sent out ran lintian-brush on sid packages maintained in Git. This post is part of a series about the progress of the Janitor.

Lintian-brush can currently fix about 150 different issues that lintian can report, but that’s still a small fraction of the more than a thousand different types of issues that lintian can detect.

If you’re interested in contributing a fixer script to lintian-brush, there is now a guide that describes all steps of the process:

how to identify lintian tags that are good candidates for automated fixing
creating test cases
writing the actual fixer

For more information about the Janitor’s lintian-fixes efforts, see the landing page.

Debian Janitor: Expanding Into Improving Multi-Arch

meta = {'category': debian, 'date': Sat 19 September 2020}

As of dpkg 1.16.2 and apt 0.8.13, Debian has full support for multi-arch. To quote from the multi-arch implementation page:

Multiarch lets you install library packages from multiple architectures on the same machine. This is useful in various ways, but the most common is installing both 64 and 32-bit software on the same machine and having dependencies correctly resolved automatically. In general you can have libraries of more than one architecture installed together and applications from one architecture or another installed as alternatives.

The Multi-Arch specification describes a new Multi-Arch header which can be used to indicate how to resolve cross-architecture dependencies.

The existing Debian Multi-Arch hinter is a version of dedup.debian.net that compares binary packages between architectures and suggests fixes to resolve multi-arch problems. It provides hints as to what Multi-Arch fields can be set, allowing the packages to be safely installed in a Multi-Arch world. The full list of almost 10,000 hints generated by the hinter is available at https://dedup.debian.net/static/multiarch-hints.yaml.

Recent versions of lintian-brush now include a command called apply-multiarch-hints that downloads and locally caches the hints and can apply them to a package maintained in Git. For example, to apply multi-arch hints to autosize.js:

 $ debcheckout autosize.js
 declared git repository at https://salsa.debian.org/js-team/autosize.js.git
 git clone https://salsa.debian.org/js-team/autosize.js.git autosize.js ...
 Cloning into 'autosize.js'...
 [...]
 $ cd autosize.js
 $ apply-multiarch-hints
 Downloading new version of multi-arch hints.
 libjs-autosize: Add Multi-Arch: foreign.
 node-autosize: Add Multi-Arch: foreign.
 $ git log -p
 commit 3f8d1db5af4a87e6ebb08f46ddf79f6adf4e95ae (HEAD -> master)
 Author: Jelmer Vernooĳ <jelmer@debian.org>
 Date:   Fri Sep 18 23:37:14 2020 +0000

     Apply multi-arch hints.
     + libjs-autosize, node-autosize: Add Multi-Arch: foreign.

     Changes-By: apply-multiarch-hints

 diff --git a/debian/changelog b/debian/changelog
 index e7fa120..09af4a7 100644
 --- a/debian/changelog
 +++ b/debian/changelog
 @@ -1,3 +1,10 @@
 +autosize.js (4.0.2~dfsg1-5) UNRELEASED; urgency=medium
 +
 +  * Apply multi-arch hints.
 +    + libjs-autosize, node-autosize: Add Multi-Arch: foreign.
 +
 + -- Jelmer Vernooĳ <jelmer@debian.org>  Fri, 18 Sep 2020 23:37:14 -0000
 +
  autosize.js (4.0.2~dfsg1-4) unstable; urgency=medium

    * Team upload
 diff --git a/debian/control b/debian/control
 index 01ca968..fbba1ae 100644
 --- a/debian/control
 +++ b/debian/control
 @@ -20,6 +20,7 @@ Architecture: all
  Depends: ${misc:Depends}
  Recommends: javascript-common
  Breaks: ruby-rails-assets-autosize (<< 4.0)
 +Multi-Arch: foreign
  Description: script to automatically adjust textarea height to fit text - NodeJS
   Autosize is a small, stand-alone script to automatically adjust textarea
   height to fit text. The autosize function accepts a single textarea element,
 @@ -32,6 +33,7 @@ Package: node-autosize
  Architecture: all
  Depends: ${misc:Depends}
   , nodejs
 +Multi-Arch: foreign
  Description: script to automatically adjust textarea height to fit text - Javascript
   Autosize is a small, stand-alone script to automatically adjust textarea
   height to fit text. The autosize function accepts a single textarea element,
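What apply-multiarch-hints does to debian/control in the transcript above can be approximated with plain text processing. The sketch below is a simplified, hypothetical illustration, not lintian-brush’s implementation (which uses a format-preserving control file editor): it adds a Multi-Arch field to the named binary package paragraphs that don’t have one yet, placing it just before Description as in the diff above. The function name add_multi_arch is made up for this example.

```python
import re


def add_multi_arch(control: str, packages: set, value: str = "foreign") -> str:
    """Add 'Multi-Arch: <value>' to the named binary package paragraphs.

    Hypothetical helper. Paragraphs are split on blank lines; the new field
    goes just before the Description field, or after the last line of the
    paragraph if there is no Description. Paragraphs that already have a
    Multi-Arch field are left alone.
    """
    result = []
    for para in control.split("\n\n"):
        lines = para.split("\n")
        package = re.search(r"^Package:\s*(\S+)", para, re.MULTILINE)
        has_field = re.search(r"^Multi-Arch:", para, re.MULTILINE)
        if package and package.group(1) in packages and not has_field:
            idx = next((i for i, line in enumerate(lines)
                        if line.startswith("Description:")), None)
            if idx is None:
                # No Description: insert before any trailing empty lines.
                idx = len(lines)
                while idx > 0 and lines[idx - 1] == "":
                    idx -= 1
            lines.insert(idx, f"Multi-Arch: {value}")
        result.append("\n".join(lines))
    return "\n\n".join(result)
```

Running it twice gives the same result as running it once, which is the property you want from a tool that may be re-run across the archive.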

The Debian Janitor also has a new multiarch-fixes suite that runs apply-multiarch-hints across packages in the archive and proposes merge requests. For example, you can see the merge request against autosize.js here.

Debian Janitor: All Packages Processed with Lintian-Brush

meta = {'category': debian, 'date': Sat 12 September 2020}

On 12 July 2019, the Janitor started fixing lintian issues in packages in the Debian archive. Now, a little over a year later, it has processed every one of the almost 28,000 packages at least once.

As discussed two weeks ago, this has resulted in roughly 65,000 total changes, made to a total of almost 17,000 packages. For about 4,500 of the remaining packages, lintian-brush could not make any improvements. The rest (about 6,500) could not be processed for one of many reasons: for example, they have not yet migrated off alioth, use uncommon formatting that can’t be preserved, or fail to build for one reason or another.

Now that the entire archive has been processed, packages are prioritized based on the likelihood of a change being made to them successfully.

Over the course of its existence, the Janitor has slowly gained support for a wider variety of packaging methods. For example, it can now edit the templates for some of the generated control files. Many of the packages that the janitor was unable to propose changes for the first time around are expected to be correctly handled when they are reprocessed.

If you’re a Debian developer, you can find the list of improvements made by the janitor in your packages by going to https://janitor.debian.net/m/.

Debian Janitor: The Slow Trickle from Git Repositories to the Debian Archive

meta = {'category': debian, 'date': Sat 29 August 2020}

Last week’s blog post documented how there are now over 30,000 lintian issues that have been fixed in git packaging repositories by the Janitor.

It’s important to note that any fixes from the Janitor that make it into a Git packaging repository will also need to be uploaded to the Debian archive. This currently requires that a Debian packager clones the repository and builds and uploads the package.

Until a change makes it into the archive, users of Debian will unfortunately not see the benefits of improvements made by the Janitor.

82% of the 30,000 changes from the Janitor that have made it into a Git repository have not yet been uploaded, although changes do slowly trickle in as maintainers make other changes to packages and upload them along with the lintian fixes from the Janitor. This is not just true for changes from the Janitor, but for all sorts of other smaller improvements as well.

However, the process of cloning and building git repositories and uploading the resulting packages to the Debian archive is fairly time-consuming – and it’s probably not worth the time of developers to follow up every change from the Janitor with a labour-intensive upload to the archive.

It would be great if it were easier to trigger uploads from Git commits. Projects like tag2upload will hopefully help, and make it more likely that changes end up in the Debian archive.

The majority of packages do get at least one new source version upload per release, so most changes will eventually make it into the archive.

Debian Janitor: > 60,000 Lintian Issues Automatically Fixed

meta = {'category': debian, 'date': Sat 22 August 2020}

Scheduling Lintian Fixes

To determine which packages to process, the Janitor looks at the import of lintian output across the archive that is available in UDD [1]. It prioritizes packages with the most, and the most severe, issues that it has fixers for.
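As a rough illustration of this kind of prioritization, the sketch below scores each package by the lintian tags it has that a fixer exists for, weighting more severe tags higher. The severity weights, the FIXABLE_TAGS set and the (package, tag, severity) input shape are all assumptions made for this example; the Janitor’s real scoring is not described here and likely differs.

```python
from collections import defaultdict

# Hypothetical weights per lintian severity; not the Janitor's real values.
SEVERITY_WEIGHT = {"error": 5.0, "warning": 3.0, "info": 1.0, "pedantic": 0.5}

# Hypothetical subset of tags that have a lintian-brush fixer.
FIXABLE_TAGS = {
    "priority-extra-is-replaced-by-priority-optional",
    "vcs-field-uses-insecure-uri",
    "debian-watch-uses-insecure-uri",
}


def prioritize(issues):
    """Rank source packages by their weighted count of fixable lintian issues.

    `issues` is an iterable of (package, tag, severity) tuples, for example
    rows from a query against the lintian data in UDD.
    """
    scores = defaultdict(float)
    for package, tag, severity in issues:
        if tag in FIXABLE_TAGS:
            scores[package] += SEVERITY_WEIGHT.get(severity, 1.0)
    # Highest score first; ties broken by name for a stable order.
    return sorted(scores, key=lambda pkg: (-scores[pkg], pkg))
```

Packages whose tags have no fixer never enter the ranking at all, which keeps the number of repositories that need to be cloned down.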

Once a package is selected, the Janitor clones the packaging repository and runs lintian-brush on it. Lintian-brush provides a framework for applying a set of “fixers” to a package: it runs each fixer in a pristine version of the repository and handles most of the heavy lifting.

The Inner Workings of a Fixer

Each fixer is just an executable which gets run in a clean checkout of the package, and can make changes there. Most of the fixers are written in Python or shell, but they can be in any language.

The contract for fixers is pretty simple:

If the fixer exits with a non-zero status, its changes are reverted and the fixer is considered to have failed
If it exits with zero and made changes, then it should write a summary of its changes to standard output
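The contract above can be sketched as a small driver. This is a hypothetical illustration rather than lintian-brush’s actual code: run_fixer and FixerFailed are made-up names, and the snapshot-and-copy-back rollback stands in for the version-control-based revert a real tool would more likely use.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


class FixerFailed(Exception):
    """Raised when a fixer exits with a non-zero status."""


def run_fixer(fixer: str, tree: Path) -> str:
    """Run the executable `fixer` inside `tree`, reverting the tree on failure.

    Returns the fixer's standard output, i.e. its summary of the changes.
    """
    fixer_path = str(Path(fixer).resolve())
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = Path(tmp) / "snapshot"
        # Keep a pristine copy so that a failed run can be rolled back.
        shutil.copytree(tree, snapshot, symlinks=True)
        proc = subprocess.run(
            [fixer_path], cwd=tree, capture_output=True, text=True)
        if proc.returncode != 0:
            # Contract: a non-zero exit means failure, so undo all changes.
            shutil.rmtree(tree)
            shutil.copytree(snapshot, tree, symlinks=True)
            raise FixerFailed(
                proc.stderr.strip() or f"{fixer} exited with {proc.returncode}")
        return proc.stdout
```

Because fixers are arbitrary executables, the driver only relies on the exit status and standard output, which is what makes it possible to write fixers in Python, shell or any other language.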

If a fixer is uncertain about the changes it has made, it should report so on standard output using a pseudo-header. By default, lintian-brush will discard any changes with uncertainty but if you are running it locally you can still apply them by specifying --uncertain.

The summary message on standard out will be used for the commit message and (possibly) the changelog message, if the package doesn’t use gbp dch.
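Before the summary can serve as a commit message, any pseudo-headers have to be separated from it. The sketch below assumes pseudo-headers are trailing lines of the form “Name: value”, such as a Certainty line; the exact header names and placement lintian-brush uses are assumptions here, so check the lintian-brush README before relying on them. parse_fixer_output and should_apply are hypothetical names.

```python
import re

# Assumed pseudo-header shape: "Name: value" with a capitalised Name.
HEADER_RE = re.compile(r"^([A-Z][A-Za-z-]*): (.*)$")


def parse_fixer_output(output):
    """Split fixer stdout into (summary, pseudo_headers)."""
    lines = output.rstrip("\n").split("\n")
    headers = {}
    # Pseudo-headers are assumed to be trailing lines; consume them from the end.
    while lines and (match := HEADER_RE.match(lines[-1])):
        headers[match.group(1)] = match.group(2)
        lines.pop()
    return "\n".join(lines).strip(), headers


def should_apply(headers, allow_uncertain=False):
    """Keep a change unless it is marked uncertain (compare --uncertain)."""
    certainty = headers.get("Certainty", "certain")
    return allow_uncertain or certainty == "certain"
```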

Example Fixer

Let’s look at an example. The package priority “extra” has been deprecated since Debian Policy 4.0.1 (released August 2017); see Policy 2.5 “Priorities”. Instead, most packages should use the “optional” priority.

Lintian will warn when a package uses the deprecated “extra” value for the “Priority” field; the associated tag is priority-extra-is-replaced-by-priority-optional. Lintian-brush has a fixer script that can automatically replace “extra” with “optional”.

On systems that have lintian-brush installed, the source for the fixer lives in /usr/share/lintian-brush/fixers/priority-extra-is-replaced-by-priority-optional.py, but here is a copy of it for reference:

#!/usr/bin/python3

from debmutate.control import ControlEditor
from lintian_brush.fixer import report_result, fixed_lintian_tag

with ControlEditor() as updater:
    for para in updater.paragraphs:
        if para.get("Priority") == "extra":
            para["Priority"] = "optional"
            fixed_lintian_tag(
                para, 'priority-extra-is-replaced-by-priority-optional')

report_result("Change priority extra to priority optional.")

This fixer is written in Python and uses the debmutate library to easily modify control files while preserving formatting, or to back out if formatting cannot be preserved.
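To show what preserving formatting means in practice, here is a standard-library-only sketch of the same transformation. It is a simplified, hypothetical stand-in for debmutate, not its API: it rewrites only the value of single-line “Priority: extra” fields, leaves every other byte of the file untouched, and raises instead of editing when a Priority field is folded onto further lines, mirroring the idea of backing out rather than reformatting.

```python
import re


class FormattingNotPreserved(Exception):
    """Raised when the edit cannot be made without reformatting the file."""


# Single-line "Priority: extra", keeping the original spacing around the value.
# Field names are matched case-sensitively here for simplicity.
PRIORITY_EXTRA_RE = re.compile(r"^(Priority:[ \t]*)extra([ \t]*)$", re.MULTILINE)


def replace_priority_extra(control):
    """Return (new_text, number_of_fields_changed); other bytes are untouched."""
    lines = control.splitlines(keepends=True)
    for i, line in enumerate(lines):
        folded = i + 1 < len(lines) and lines[i + 1][:1] in (" ", "\t")
        if line.startswith("Priority:") and folded:
            # A multi-line Priority field is unusual; back out rather than guess.
            raise FormattingNotPreserved(f"line {i + 1}: folded Priority field")
    return PRIORITY_EXTRA_RE.subn(r"\1optional\2", control)
```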

All the current fixers come with tests, e.g. for this particular fixer the tests can be found here: https://salsa.debian.org/jelmer/lintian-brush/-/tree/master/tests/priority-extra-is-replaced-by-priority-optional.

For more details on writing new fixers, see the README for lintian-brush.

For more details on debugging them, see the manual page.

Successes by fixer

Here is a list of the fixers currently available, with the number of successful merges/pushes per fixer:

Lintian Tag	Previously merged/pushed	Ready but not yet merged/pushed
uses-debhelper-compat-file	4906	4161
upstream-metadata-file-is-missing	4281	3841
package-uses-old-debhelper-compat-version	4256	3617
upstream-metadata-missing-bug-tracking	2438	2995
out-of-date-standards-version	2062	2936
upstream-metadata-missing-repository	1936	2987
trailing-whitespace	1720	2295
insecure-copyright-format-uri	1791	1093
package-uses-deprecated-debhelper-compat-version	1391	1287
vcs-obsolete-in-debian-infrastructure	872	782
homepage-field-uses-insecure-uri	527	1111
vcs-field-not-canonical	850	655
debian-changelog-has-wrong-day-of-week	224	376
debian-watch-uses-insecure-uri	314	242
useless-autoreconf-build-depends	112	428
priority-extra-is-replaced-by-priority-optional	315	194
debian-rules-contains-unnecessary-get-orig-source-target	35	428
tab-in-license-text	125	320
debian-changelog-line-too-long	186	190
debian-rules-sets-dpkg-architecture-variable	69	166
debian-rules-uses-unnecessary-dh-argument	42	182
package-lacks-versioned-build-depends-on-debhelper	125	95
unversioned-copyright-format-uri	43	136
package-needs-versioned-debhelper-build-depends	127	50
binary-control-field-duplicates-source	34	134
renamed-tag	73	69
vcs-field-uses-insecure-uri	14	109
uses-deprecated-adttmp	13	91
debug-symbol-migration-possibly-complete	12	88
copyright-refers-to-symlink-license	51	48
debian-control-has-unusual-field-spacing	33	66
old-source-override-location	32	62
out-of-date-copyright-format	20	62
public-upstream-key-not-minimal	43	30
older-source-format	17	54
custom-compression-in-debian-source-options	12	57
copyright-refers-to-versionless-license-file	29	39
tab-in-licence-text	33	31
global-files-wildcard-not-first-paragraph-in-dep5-copyright	28	33
out-of-date-copyright-format-uri	9	50
field-name-typo-dep5-copyright	29	29
copyright-does-not-refer-to-common-license-file	13	42
debhelper-but-no-misc-depends	9	45
debian-watch-file-is-missing	11	41
debian-control-has-obsolete-dbg-package	8	40
possible-missing-colon-in-closes	31	13
unnecessary-testsuite-autopkgtest-field	32	9
missing-debian-source-format	7	33
debhelper-tools-from-autotools-dev-are-deprecated	9	29
vcs-field-mismatch	8	29
debian-changelog-file-contains-obsolete-user-emacs-setting	33	0
patch-file-present-but-not-mentioned-in-series	24	9
copyright-refers-to-versionless-license-file	22	9
debian-control-has-empty-field	25	6
missing-build-dependency-for-dh-addon	10	20
obsolete-field-in-dep5-copyright	15	13
xs-testsuite-field-in-debian-control	20	7
ancient-python-version-field	13	12
unnecessary-team-upload	19	5
misspelled-closes-bug	6	16
field-name-typo-in-dep5-copyright	1	20
transitional-package-not-oldlibs-optional	4	17
maintainer-script-without-set-e	9	11
dh-clean-k-is-deprecated	4	14
no-dh-sequencer	14	4
missing-vcs-browser-field	5	12
space-in-std-shortname-in-dep5-copyright	6	10
xc-package-type-in-debian-control	4	11
debian-rules-missing-recommended-target	4	10
desktop-entry-contains-encoding-key	1	13
build-depends-on-obsolete-package	4	9
license-file-listed-in-debian-copyright	1	12
missing-built-using-field-for-golang-package	9	4
unused-license-paragraph-in-dep5-copyright	4	7
missing-build-dependency-for-dh_command	6	4
comma-separated-files-in-dep5-copyright	3	6
systemd-service-file-refers-to-var-run	4	5
copyright-not-using-common-license-for-apache2	3	5
debian-tests-control-autodep8-is-obsolete	2	6
dh-quilt-addon-but-quilt-source-format	2	6
no-homepage-field	3	5
font-packge-not-multi-arch-foreign	1	6
homepage-in-binary-package	1	4
vcs-field-bitrotted	1	3
built-using-field-on-arch-all-package	2	1
copyright-should-refer-to-common-license-file-for-apache-2	1	2
debian-pyversions-is-obsolete	3	0
debian-watch-file-uses-deprecated-githubredir	1	1
executable-desktop-file	1	1
skip-systemd-native-flag-missing-pre-depends	1	1
vcs-field-uses-not-recommended-uri-format	1	1
init.d-script-needs-depends-on-lsb-base	1	0
maintainer-also-in-uploaders	1	0
public-upstream-keys-in-multiple-locations	1	0
wrong-debian-qa-group-name	1	0
Total	29656	32209

Footnotes

[1]	temporarily unavailable due to Debian bug #960156 – but the Janitor is relying on historical data

Debian Janitor: 8,200 changes landed so far

meta = {'category': debian, 'date': Sat 15 August 2020}

The bot has been submitting merge requests for about seven months now. The rollout has happened gradually across the Debian archive, and the bot is now enabled for all packages maintained on Salsa, GitLab, GitHub and Launchpad.

There are currently over 1,000 open merge requests, and close to 3,400 merge requests have been merged so far. Direct pushes are enabled for a number of large Debian teams, with about 5,000 direct pushes to date. That covers about 11,000 lintian tags of varying severities (about 75 different varieties) fixed across Debian.

Improvements to Merge Proposals by the Janitor

meta = {'category': debian, 'date': Sat 08 August 2020}

Since the original post, merge proposals created by the janitor now include the debdiff between a build with and without the changes (showing the impact on the binary packages), in addition to the merge proposal diff (which shows the impact on the source package).

New merge proposals also include a link to the diffoscope diff between a vanilla build and the build with changes. Unfortunately these can be a bit noisy for packages that are not reproducible yet, due to the difference in build environment between the two builds.

This is part of the effort to keep the changes from the janitor high-quality.

The rollout surfaced some bugs in lintian-brush; these have been either fixed or mitigated (e.g. by disabling specified fixers).

The Debian Janitor

meta = {'category': debian, 'date': Tue 03 December 2019}

There are a lot of small changes that can be made to the Debian archive to increase the overall quality. Many of these changes are small and have just minor benefits if they are applied to just a single package. Lintian encourages maintainers to fix these problems by pointing out the common ones.

Most of these issues are trivially fixable. Fixing them by hand is in general an inefficient use of human time, and it takes a lot of effort to keep up with them. This is something that can clearly be automated.

Several tools (e.g. onovy’s mass tool, and the lintian-brush tool that I’ve been working on) go a step further and (for a subset of the issues reported by lintian) fix the problems for you, where they can. Lintian-brush can currently fix most instances of close to 100 lintian tags.

Thanks to the Vcs-* fields set by many packages and the APIs provided by hosting platforms like Salsa, it is now possible to proactively attempt to fix these issues.

The Debian Janitor is a tool that will run lintian-brush across the entire archive, and propose fixes to lintian issues via pull request.

Objectives

The aim of Debian Janitor is to take some drudge work away from Debian maintainers where possible, so they can spend their time on more important packaging work. Its purpose is to make automated changes quick and easy to apply, with minimal overhead for package maintainers. It is essentially a bit of infrastructure to run lintian-brush across all of the archive.

The actions of the bot are restricted to a limited set of problems for which obviously correct actions can be taken. It is not meant to automate all packaging, or even to cover automating all instances of the issues it knows about.

The bot is designed to be conservative and to delight with consistently correct fixes, rather than proposing possibly incorrect fixes and hoping for the best. Considerable effort has gone into preventing the janitor from creating pull requests with incorrect changes: these take valuable time away from maintainers, the package doesn’t actually improve (since the merge request is rejected), and they make it likelier that future pull requests from the Debian Janitor bot are ignored or rejected.

In short: The janitor is meant to propose correct changes if it can, and back off otherwise.

Design

The Janitor finds package sources in version control systems from the Vcs-* control fields in Debian source packages. If the packaging branch is hosted on a hosting platform that the Janitor has a presence on, it will attempt to run lintian-brush on the packaging branch and (if there are any changes made) build the package and propose a merge. It is based on silver-platter and currently has support for:

The Janitor is driven from the lintian and vcswatch tables in UDD. It queries for packages that are affected by any of the lintian tags that lintian-brush has a fixer script for. This way it can limit the number of repositories it has to process.

Ensuring quality

There are a couple of things I am doing to make sure that the Debian Janitor delights rather than annoys.

High quality changes

Lintian-brush has end-to-end tests for its fixers.

In order to make sure that merge requests are useful and high-value, the bot will only propose changes from lintian-brush that:

successfully build in a chroot and pass autopkgtest and piuparts;
are not completely trivial - e.g. only stripping whitespace

Changes for a package will also be reviewed by a human before they make it into a pull request.

One open pull request per package

If the bot created a pull request previously, it will attempt to update the current request by adding new commits (and updating the pull request description). It will remove and fix the branch when the pull request conflicts because of new upstream changes.

In other words, it will only create a single pull request per package and will attempt to keep that pull request up to date.

Gradual rollout

I’m slowly enabling pull requests for interested maintainers, before opening it up to the entire archive. This should help catch any widespread issues early.

Providing control

The bot will be upfront about its pull requests and try to avoid overwhelming maintainers with pull requests by:

Clearly identifying any merge requests it creates as being made by a bot. This should allow maintainers to prioritize contributions from humans.
Limiting the number of open proposals per maintainer. It starts by opening a single merge request and won’t open additional merge requests until the first proposal has a response
Providing a way to opt out of future merge requests; just a reply on the merge request is sufficient.
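The per-maintainer limit and the opt-out can be expressed as a small gate that runs before a new merge request is opened. The sketch below is hypothetical: the Proposal record, its fields and the may_open_proposal name are invented for illustration, and the real Janitor policy may track more state.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    """Hypothetical record of a merge request the bot has opened."""
    maintainer: str
    is_open: bool
    has_response: bool


def may_open_proposal(maintainer: str, proposals: list, opted_out: set) -> bool:
    """Decide whether the bot may open another merge request for `maintainer`."""
    if maintainer in opted_out:
        # The maintainer asked not to receive further merge requests.
        return False
    for p in proposals:
        if p.maintainer == maintainer and p.is_open and not p.has_response:
            # Wait for a reaction to the existing proposal first.
            return False
    return True
```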

Any comments on merge requests will also still be reviewed by a human.

Current state

Debian janitor is running, generating changes and already creating merge requests (albeit under close review). Some examples of merge requests it has created:

Using the janitor

The janitor can process any package that’s maintained in Git and has its Vcs-Git header set correctly (you can use vcswatch to check this).

If you’re interested in receiving pull requests early, leave a comment below. Eventually, the janitor should get to all packages, though it may take a while with the current number of source packages in the archive.

By default, Salsa does not send notifications when a new merge request is created for one of the repositories you maintain. Make sure you have notifications enabled in your Salsa profile, by ticking “New Merge Requests” for the packages you care about.

You can also see the number of open merge requests for a package repository on QA - it’s the ! followed by a number in the pull request column.

It is also possible to download the diff for a particular package (if it’s been generated) ahead of the janitor publishing it:

 $ curl https://janitor.debian.net/api/lintian-fixes/pkg/PACKAGE/diff

E.g. for i3-wm, look at https://janitor.debian.net/api/lintian-fixes/pkg/i3-wm/diff.
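The same request can be made from a script. The sketch below only uses the URL pattern shown above and Python’s standard urllib; whether that endpoint is still served, and what it returns for packages without a generated diff, aren’t covered here, so treating a 404 as “no diff yet” is an assumption. janitor_diff_url and fetch_janitor_diff are hypothetical names.

```python
from typing import Optional
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

# URL pattern taken from the curl example above.
BASE_URL = "https://janitor.debian.net/api/lintian-fixes/pkg"


def janitor_diff_url(package: str) -> str:
    """Build the diff URL for a source package."""
    return f"{BASE_URL}/{quote(package, safe='')}/diff"


def fetch_janitor_diff(package: str, timeout: float = 30.0) -> Optional[str]:
    """Download the diff; return None on a 404 (assumed to mean 'no diff yet')."""
    try:
        with urlopen(janitor_diff_url(package), timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        if err.code == 404:
            return None
        raise
```

For example, fetch_janitor_diff("i3-wm") requests the same URL as the i3-wm link above.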

Future Plans

The current set of supported hosting platforms covers the bulk of packages in Debian that is maintained in a VCS. The only other 100+ package platform that’s unsupported is dgit. If you have suggestions on how best to submit git changes to dgit repositories (BTS bugs with patches? or would that be too much overhead?), let me know.

The next-largest platform that is currently missing is Bitbucket, but only about 15 packages in unstable are hosted there.

At the moment, lintian-brush can fix close to 100 lintian tags. It would be great to add fixers for more common issues.

The janitor should probably be more tightly integrated with other pieces of Debian infrastructure, e.g. Jenkins for running jobs or linked to from the tracker or lintian.debian.org.

More information

See the FAQ on the homepage.

If you have any concerns about these roll-out plans, have other ideas or questions, please let me know in the comments.

Silver Platter

meta = {'category': vcs, 'date': Thu 11 April 2019}

Making changes across the open source ecosystem is very hard; software is hosted on different platforms and in many different version control repositories. Not being able to make bulk changes slows down the rate of progress. For example, instead of being able to actively run a script that strips out an obsolete header field (say “DM-Upload-Allowed”) across all Debian packages, we make the linter warn about the deprecated header and wait as all developers manually remove it.

Silver Platter

Silver-platter is a new tool that aids in making automated changes across different version control repositories. It provides a common command-line interface and API that is not specific to a single version control system or hosting platform, so that it’s easy to propose changes based on a single script across a large set of repositories.

The tool will check out a repository, run a user-specified script that makes changes to the repository, and then either push those changes to the upstream repository or propose them for merging.

It’s specifically built so that it can be run in a shell loop over many different repository URLs.
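The same kind of batch run can also be driven from Python. The sketch below only uses the svp run form and the --dry-run and --diff options that appear in the offlineimap example in this post; build_svp_command and run_batch are hypothetical helper names, and the script path and repository URLs are placeholders.

```python
import subprocess
import sys


def build_svp_command(script: str, url: str, dry_run: bool = False) -> list:
    """Build an `svp run` invocation for one repository."""
    cmd = ["svp", "run"]
    if dry_run:
        # Show the diff without creating a merge proposal.
        cmd += ["--dry-run", "--diff"]
    return cmd + [script, url]


def run_batch(script: str, urls: list, dry_run: bool = True) -> dict:
    """Run the script against each repository; return exit codes per URL."""
    results = {}
    for url in urls:
        proc = subprocess.run(build_svp_command(script, url, dry_run))
        results[url] = proc.returncode
        if proc.returncode != 0:
            print(f"svp failed for {url}", file=sys.stderr)
    return results
```

Defaulting to a dry run keeps an accidental invocation from opening real pull requests.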

Example

As an example, you could use the following script (fix-fsf-address.sh) to update the FSF address in copyright headers:

 #!/bin/sh

 perl -i -pe \
 'BEGIN{undef $/;} s/Free Software
 ([# ]+)Foundation, Inc\., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA/Free Software
 \1Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301  USA/smg' *

 echo "Update FSF postal address."

Say you wanted to create a merge proposal with these changes against offlineimap. First, log into GitHub (this needs to be done once per hosting site):

 $ svp login https://github.com

To see what the changes would be without actually creating the pull request, do a dry-run:

 $ svp run --dry-run --diff ./fix-fsf-address.sh https://github.com/offlineimap/offlineimap
 Merge proposal created.
 Description: Update FSF postal address.

 === modified file 'offlineimap.py'
 --- upstream/offlineimap.py 2018-03-04 03:28:30 +0000
 +++ proposed/offlineimap.py 2019-04-06 21:07:25 +0000
 @@ -14,7 +14,7 @@
  #
  #    You should have received a copy of the GNU General Public License
  #    along with this program; if not, write to the Free Software
 -#    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 +#    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301  USA

  import os
  import sys

 === modified file 'setup.py'
 --- upstream/setup.py       2018-05-01 01:48:26 +0000
 +++ proposed/setup.py       2019-04-06 21:07:25 +0000
 @@ -19,7 +19,7 @@
  #
  #    You should have received a copy of the GNU General Public License
  #    along with this program; if not, write to the Free Software
 -#    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 +#    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301  USA

  import os
  from distutils.core import setup, Command

Then, create the actual pull request by running:

 $ svp run ./fix-fsf-address.sh https://github.com/offlineimap/offlineimap
 ...
 Reusing existing repository https://github.com/jelmer/offlineimap
 Merge proposal created.
 URL: https://github.com/OfflineIMAP/offlineimap/pull/609
 Description: Update FSF postal address.

This would create a new commit with the updated postal address (if any files were changed) and the commit message Update FSF postal address. You can see the resulting pull request here.

Debian-specific operations

To make working with Debian packaging repositories easier, Silver Platter comes with a wrapper (debian-svp) specifically for Debian packages.

This wrapper allows specifying package names to refer to packaging branches; packaging URLs are retrieved from the Vcs-Git header in a package. For example:

 $ debian-svp run ~/fix-fsf-address.sh offlineimap

to fix the same issue in the offlineimap package.
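Looking up the packaging branch comes down to reading the Vcs-Git field. The sketch below shows one way to split that field into a repository URL, an optional branch and an optional subdirectory, assuming the common “URL [-b BRANCH] [SUBDIR]” shape (check Debian Policy’s description of the Vcs-* fields for the authoritative syntax). It is not how debian-svp does it internally, and parse_vcs_git is a hypothetical name.

```python
import re
from typing import NamedTuple, Optional


class VcsGit(NamedTuple):
    url: str
    branch: Optional[str]
    subpath: Optional[str]


VCS_GIT_RE = re.compile(
    r"^(?P<url>\S+)"                    # repository URL
    r"(?:\s+-b\s+(?P<branch>\S+))?"     # optional "-b BRANCH"
    r"(?:\s+\[(?P<subpath>[^\]]+)\])?"  # optional "[SUBDIR]"
    r"\s*$"
)


def parse_vcs_git(value: str) -> VcsGit:
    """Split a Vcs-Git field value into URL, branch and subdirectory."""
    m = VCS_GIT_RE.match(value.strip())
    if m is None:
        raise ValueError(f"unrecognised Vcs-Git value: {value!r}")
    return VcsGit(m.group("url"), m.group("branch"), m.group("subpath"))
```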

(Of course, you wouldn’t normally fix upstream issues like this in the Debian package but forward them upstream instead)

There is also a debian-svp lintian-brush subcommand that will invoke lintian-brush on a packaging branch.

Supported technologies

Silver-Platter currently supports the following hosting platforms:

GitHub

GitLab instances (for example Salsa or GNOME gitlab)

Launchpad (both Git and Bazaar repositories through Breezy)

It works in one of three modes:

propose: Always create a pull request with the changes

push: Directly push changes back to the original branch

attempt-push: Attempt push, and fall back to propose if the current user doesn’t have permission to push to the repository or the branch.
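The three modes can be summarised in a few lines of control flow. The sketch below is hypothetical: publish, PermissionDenied and the push/propose callables stand in for whatever the hosting-platform layer provides, and are not Silver Platter’s actual API.

```python
from typing import Callable


class PermissionDenied(Exception):
    """Hypothetical error raised when the user may not push to the branch."""


def publish(mode: str, push: Callable[[], str], propose: Callable[[], str]) -> str:
    """Publish changes using 'push', 'propose' or 'attempt-push'.

    `push` and `propose` perform the action and return a short description,
    for example a branch or merge request URL.
    """
    if mode == "propose":
        return propose()
    if mode == "push":
        return push()
    if mode == "attempt-push":
        try:
            return push()
        except PermissionDenied:
            # No push access: fall back to proposing the change instead.
            return propose()
    raise ValueError(f"unknown mode: {mode!r}")
```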

Installation

There is a Silver Platter repository on GitHub. Silver Platter is also available as a Debian package in unstable (not buster).

More information

For a full list of svp subcommands, see svp(1).

Breezy evolves

meta = {'category': bazaar, 'date': Sun 24 March 2019}

Last month Martin, Vincent and I finally released version 3.0.0 of Breezy, a little over a year after we originally forked Bazaar.

When we started working on Breezy, it was mostly as a way to keep Bazaar working going forward, in a world where Python 2 has mostly disappeared in favour of Python 3.

Improvements

Since then, we have also made other improvements. In addition to Python 3 support, Breezy comes with the following other major changes:

Batteries Included

Breezy bundles most of the common plugins. This makes the installation of Breezy much simpler (pip install brz), and prevents possible issues with API incompatibility that plagued Bazaar.

Bundled plugins include: grep, git, fastimport, propose, upload, stats and parts of bzrtools.

>120 fixed bugs

Since Bazaar 2.7, lots of bugs in the Bazaar code base have been fixed (over 120 as of March 2019). We’ve also started an effort to go through all bugs in the Bazaar bug tracker to see whether they also apply to Breezy.

Native Git Support

Breezy now supports the Git file formats as a first class citizen; Git support is included in Breezy itself, and should work just as well as regular Bazaar format repositories.

Improved abstractions

Bazaar has always had a higher-level API that could be used for version control operations, and which was implemented for the Bazaar, Git and Subversion formats.

As part of the work to support the Git format natively, we have changed the API to remove Bazaar-specific artefacts, like the use of file ids. Inventories (a Bazaar concept) are now also an implementation detail of the bzr formats, and not a concept that is visible in the API or UI.

In the future, I hope the API will be useful for tools that want to make automated changes to any version controlled resource, whether that be Git, Bazaar, Subversion or Mercurial repositories.

Lintian Brush

meta = {'category': debian, 'date': Sun 09 December 2018}

With Debian packages now widely being maintained in Git repositories, there has been an uptick in the number of bulk changes made to Debian packages. Several maintainers are running commands over many packages (e.g. all packages owned by a specific team) to fix common issues in packages.

Examples of changes being made include:

Updating the Vcs-Git and Vcs-Browser URLs after migrating from alioth to salsa

Stripping trailing whitespace in various control files

Updating e.g. homepage URLs to use https rather than http

Most of these can be fixed with simple sed or perl one-liners.
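For comparison, here is roughly what two of these one-liner changes could look like in Python. This is a hypothetical sketch, not taken from any existing script: the function names are made up, and the https rewrite blindly assumes the site serves https, which a careful script should check first.

```python
import re
from pathlib import Path


def strip_trailing_whitespace(path: Path) -> bool:
    """Remove trailing spaces/tabs from every line; return True if the file changed."""
    old = path.read_text(encoding="utf-8")
    new = re.sub(r"[ \t]+$", "", old, flags=re.MULTILINE)
    if new != old:
        path.write_text(new, encoding="utf-8")
    return new != old


def use_https_homepage(path: Path) -> bool:
    """Rewrite "Homepage: http://" to "Homepage: https://" in a control file.

    Assumes the homepage is reachable over https; this is not checked here.
    """
    old = path.read_text(encoding="utf-8")
    new = re.sub(r"^(Homepage:\s*)http://", r"\1https://", old, flags=re.MULTILINE)
    if new != old:
        path.write_text(new, encoding="utf-8")
    return new != old
```

Returning whether anything changed makes it easy for a wrapper to skip committing when a file was already clean.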

Some of these scripts are publicly available, for example:

The R packaging team’s routine-update,

Ondřej Nový’s onovy-mass repository.

Lintian-Brush

Lintian-Brush is both a simple wrapper around a set of these kinds of scripts and a repository for these scripts, with the goal of making it easy for any Debian maintainer to run them.

The lintian-brush command-line tool is a simple wrapper that runs a set of “fixer scripts”, and for each:

Reverts the changes made by the script if it failed with an error

Commits the changes to the VCS with an appropriate commit message

Adds a changelog entry (if desired)
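The per-fixer loop can be sketched in Python. This is a hypothetical illustration, not lintian-brush’s code: instead of reverting through the VCS, it snapshots the files before each fixer and restores them on failure, and where lintian-brush would commit and add a changelog entry, it just records the fixer’s summary.

```python
import subprocess
from pathlib import Path
from typing import Dict, List, Optional, Sequence, Tuple


def snapshot(tree: Path) -> Dict[str, bytes]:
    """Record the contents of every regular file under `tree`."""
    return {
        str(p.relative_to(tree)): p.read_bytes()
        for p in tree.rglob("*")
        if p.is_file()
    }


def restore(tree: Path, snap: Dict[str, bytes]) -> None:
    """Put the files under `tree` back to the state captured in `snap`."""
    for p in list(tree.rglob("*")):
        if p.is_file() and str(p.relative_to(tree)) not in snap:
            p.unlink()  # file created by the failed fixer
    for rel, data in snap.items():
        target = tree / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)


def run_fixers(
    tree: Path, fixers: Sequence[Tuple[str, List[str]]]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Run each fixer in `tree`; map name -> (status, summary).

    Status is "failed", "unchanged" or "changed".
    """
    results: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, argv in fixers:
        before = snapshot(tree)
        proc = subprocess.run(argv, cwd=tree, capture_output=True, text=True)
        if proc.returncode != 0:
            restore(tree, before)  # a failing fixer leaves no trace
            results[name] = ("failed", None)
        elif snapshot(tree) == before:
            results[name] = ("unchanged", None)  # nothing to commit
        else:
            # A real tool would commit here, using stdout as the message.
            results[name] = ("changed", proc.stdout.strip())
    return results
```

Each fixer is just a command line run in the package root, so fixers can be written in any language.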

The tool also provides some basic infrastructure for testing that these scripts do what they should, and e.g. don’t have unintended side-effects.

The idea is that it should be safe, quick and unobtrusive to run lintian-brush, and get it to opportunistically fix lintian issues and to leave the source tree alone when it can’t.

Example

For example, running lintian-brush on the package talloc fixes two minor lintian issues:

 % debcheckout talloc
 declared git repository at https://salsa.debian.org/samba-team/talloc.git
 git clone https://salsa.debian.org/samba-team/talloc.git talloc ...
 Cloning into 'talloc'...
 remote: Enumerating objects: 2702, done.
 remote: Counting objects: 100% (2702/2702), done.
 remote: Compressing objects: 100% (996/996), done.
 remote: Total 2702 (delta 1627), reused 2601 (delta 1550)
 Receiving objects: 100% (2702/2702), 1.70 MiB | 565.00 KiB/s, done.
 Resolving deltas: 100% (1627/1627), done.
 % cd talloc
 talloc% lintian-brush
 Lintian tags fixed: {'insecure-copyright-format-uri', 'public-upstream-key-not-minimal'}
 % git log
 commit 0ea35f4bb76f6bca3132a9506189ef7531e5c680 (HEAD -> master)
 Author: Jelmer Vernooĳ <jelmer@debian.org>
 Date:   Tue Dec 4 16:42:35 2018 +0000

     Re-export upstream signing key without extra signatures.

     Fixes lintian: public-upstream-key-not-minimal
     See https://lintian.debian.org/tags/public-upstream-key-not-minimal.html for more details.

  debian/changelog                |   1 +
  debian/upstream/signing-key.asc | 102 +++++++++++++++---------------------------------------------------------------------------------------
  2 files changed, 16 insertions(+), 87 deletions(-)

 commit feebce3147df561aa51a385c53d8759b4520c67f
 Author: Jelmer Vernooĳ <jelmer@debian.org>
 Date:   Tue Dec 4 16:42:28 2018 +0000

     Use secure copyright file specification URI.

     Fixes lintian: insecure-copyright-format-uri
     See https://lintian.debian.org/tags/insecure-copyright-format-uri.html for more details.

  debian/changelog | 3 +++
  debian/copyright | 2 +-
  2 files changed, 4 insertions(+), 1 deletion(-)

Script Interface

A fixer script is run in the root directory of a package, where it can make changes it deems necessary, and write a summary of what it’s done for the changelog (and commit message) to standard out.

If a fixer cannot provide any improvements, it can simply leave the working tree untouched; lintian-brush will not create any commits for it or update the changelog. If it exits with a non-zero exit code, it is assumed to have failed: it will be listed as such, and its changes will be reset rather than committed.
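Putting that interface together, a fixer for the insecure-copyright-format-uri tag from the talloc example might look roughly like this. It is a hypothetical Python sketch, not the fixer that ships with lintian-brush; the exact URI pattern and the `fix()` helper are assumptions for illustration. It exits successfully and prints nothing when there is nothing to fix, leaving the tree untouched.

```python
import re
from pathlib import Path

# Assumed pattern: the machine-readable copyright format URI over plain http.
INSECURE = re.compile(
    r"^(Format:\s*)http://(www\.debian\.org/doc/packaging-manuals/copyright-format/1\.0/?)",
    re.MULTILINE,
)


def fix(package_root: Path) -> str:
    """Edit debian/copyright in place; return changelog text ("" if nothing done)."""
    path = package_root / "debian" / "copyright"
    if not path.exists():
        return ""  # nothing to do, so leave the tree alone
    old = path.read_text(encoding="utf-8")
    new = INSECURE.sub(r"\1https://\2", old)
    if new == old:
        return ""
    path.write_text(new, encoding="utf-8")
    return "Use secure copyright file specification URI."


if __name__ == "__main__":
    # Fixers run from the package root; the summary goes to standard out.
    summary = fix(Path.cwd())
    if summary:
        print(summary)
```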

In addition, tests can be added for fixers by providing various before and after source package trees, to verify that a fixer script makes the expected changes.
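A before/after test boils down to running the fixer on a copy of one tree and comparing the result with another. Here is a hypothetical Python sketch of such a harness; for brevity it takes the fixer as a Python callable rather than running a script, which is an assumption of this example and not how lintian-brush’s test infrastructure is necessarily organised.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Dict


def tree_contents(root: Path) -> Dict[str, bytes]:
    """Map each relative file path under `root` to its bytes."""
    return {
        p.relative_to(root).as_posix(): p.read_bytes()
        for p in root.rglob("*")
        if p.is_file()
    }


def check_fixer(fixer: Callable[[Path], object], before: Path, after: Path) -> bool:
    """Run `fixer` on a copy of the `before` tree and compare it with `after`."""
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / "pkg"
        shutil.copytree(before, work)  # never modify the test data itself
        fixer(work)
        return tree_contents(work) == tree_contents(after)
```

Comparing whole trees, rather than individual files, also catches unintended side-effects such as a fixer touching or creating files it shouldn’t.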

For more details, see the documentation on writing new fixers.

Availability

lintian-brush is currently available in unstable and testing. See man lintian-brush(1) for an explanation of the command-line options.

Fixer scripts are included that can fix (some of the instances of) 34 lintian tags.

Feedback would be great if you try lintian-brush - please file bugs in the BTS, or propose pull requests with new fixers on salsa.

Breezy: Forking Bazaar

meta = {'category': bazaar, 'date': Mon 08 January 2018}

A couple of months ago, Martin and I announced a friendly fork of Bazaar, named Breezy.

It’s been 5 years since I wrote a Bazaar retrospective and around 6 since I seriously contributed to the Bazaar codebase.

Goals

We don’t have any grand ambitions for Breezy; the main goal is to keep Bazaar usable going forward. Your open source projects should still be using Git.

The main changes we have made so far come down to fixing a number of bugs and bundling useful plugins. Bundling plugins makes setting up an environment simpler and eliminates the API compatibility issues that plagued external plugins in the Bazaar world.

Perhaps the biggest effort in Breezy is porting the codebase to Python 3, allowing it to be used once Python 2 goes EOL in 2020.

A fork

Breezy is a fork of Bazaar and not just a new release series.

Bazaar upstream has been dormant for the last couple of years anyway - we don’t lose anything by forking.

We’re forking because it gives us the independence to make some of the changes we deem necessary and that are otherwise hard to make for an established project. For example, we’re now bundling plugins, taking an axe to a large number of APIs and dropping support for older platforms.

A fork also means independence from Canonical; there is no CLA for Breezy (a hindrance for Bazaar) and we can set up our own infrastructure without having to chase down Canonical staff for web site updates or the installation of new packages on the CI system.

More information

Martin gave a talk about Breezy at PyCon UK this year.

Breezy bugs can be filed on Launchpad. For the moment, we are using the Bazaar mailing list and the #bzr IRC channel for any discussions and status updates around Breezy.

Xandikos, a lightweight Git-backed CalDAV/CardDAV server

meta = {'category': foss, 'date': Sat 06 May 2017}

For the last couple of years, I have self-hosted my calendar and address book data. Originally I just kept local calendars and address books in Evolution, but later I moved to a self-hosted CalDAV/CardDAV server and a plethora of clients.

CalDAV/CardDAV

CalDAV and CardDAV are standards for accessing, managing, and sharing calendar and address book information, based on the iCalendar and vCard formats respectively. They are built atop the WebDAV standard, which is itself a set of extensions to HTTP.

CalDAV and CardDAV essentially store iCalendar (.ics) and vCard (.vcf) files using WebDAV, but they provide some extra guarantees (e.g. files must be well-formed) and some additional methods for querying the data. For example, it is possible to retrieve all events between two dates with a single HTTP query, rather than the client having to check all the calendar files in a directory.
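As an illustration of such a query, here is a Python sketch that builds the XML body of a CalDAV calendar-query REPORT selecting all events in a time range (the request structure comes from RFC 4791). The helper names are hypothetical, and the sketch only builds the body; it doesn’t talk to a server.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

DAV = "DAV:"
CALDAV = "urn:ietf:params:xml:ns:caldav"
ET.register_namespace("D", DAV)
ET.register_namespace("C", CALDAV)


def _utc(dt: datetime) -> str:
    """Format a datetime in the UTC form used by CalDAV time-range filters."""
    return dt.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def calendar_query(start: datetime, end: datetime) -> bytes:
    """Build a calendar-query REPORT body selecting VEVENTs between start and end."""
    query = ET.Element(f"{{{CALDAV}}}calendar-query")
    prop = ET.SubElement(query, f"{{{DAV}}}prop")
    ET.SubElement(prop, f"{{{DAV}}}getetag")
    ET.SubElement(prop, f"{{{CALDAV}}}calendar-data")  # return the event data too
    filt = ET.SubElement(query, f"{{{CALDAV}}}filter")
    vcal = ET.SubElement(filt, f"{{{CALDAV}}}comp-filter", name="VCALENDAR")
    vevent = ET.SubElement(vcal, f"{{{CALDAV}}}comp-filter", name="VEVENT")
    ET.SubElement(vevent, f"{{{CALDAV}}}time-range", start=_utc(start), end=_utc(end))
    return ET.tostring(query, encoding="utf-8", xml_declaration=True)
```

The body would be sent as an HTTP REPORT request, with a `Depth: 1` header, to the URL of the calendar collection; the server then does the filtering instead of the client downloading every .ics file.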

CalDAV and CardDAV are (unnecessarily) complex, in large part because they are built on top of WebDAV. Being able to use regular HTTP and WebDAV clients is quite neat, but results in extra complexity. In addition, because the standards are so large, clients and servers end up only implementing parts of them.

However, CalDAV and CardDAV have one big redeeming quality: they are the dominant standards for synchronising calendar events and addressbooks, and are supported by a wide variety of free and non-free applications. They’re the status quo, until something better comes along. (and hey, at least there is a standard to begin with)

Calypso

I have tried a number of servers over the years. In the end, I settled on Calypso.

Calypso started out as a friendly fork of Radicale, with some additional improvements. I like Calypso because:

it is quite simple, understandable, and small (sloccount reports 1700 LOC)
it stores plain .vcf and .ics files
it stores history in Git
it is easy to set up, e.g. no database dependencies
it is written in Python

Using regular files and keeping history in Git is useful, because it makes it much easier to see what is happening whenever something breaks. If something were to go wrong (e.g. a client decides to remove all server-side entries), it’s easy to recover by rolling back history using Git.

However, there are some downsides to Calypso as well.

It doesn’t have good test coverage, making it harder to change (especially in a way that doesn’t break some clients), though there are some recent efforts to make e.g. external spec compliance tests like caldavtester work with it.

Calypso’s CalDAV/CardDAV/WebDAV implementation is a bit ad-hoc. The only WebDAV REPORTs it implements are calendar-multiget and addressbook-multiget. Support for properties has been added as new clients request them. The logic for replying to DAV requests is mixed with the actual data store implementation.

Because of this, it can be hard to get going with some clients and sometimes tricky to debug.

Xandikos

After attempting to fix a number of issues in Calypso, I kept running into problems with the way its code is structured. This is only fixable by rewriting significant chunks of it, so I opted to write a new server instead.

The goals of Xandikos are along the same lines as those of Calypso, to be a simple CalDAV/CardDAV server for personal use:

easy to set up; at the moment, just running xandikos -d $HOME/dav --defaults is enough to start a new server
use of plain .ics/.vcf files for storage
history stored in Git

But additionally:

clear separation between protocol implementation and storage
be well tested
standards complete
standards compliant

Current status

The CalDAV/CardDAV implementation of Xandikos is mostly complete, but there are still a number of outstanding issues.

In particular:

lack of authentication support; setting up authentication support in uwsgi or a reverse proxy is one way of working around this
there is no useful UI for users accessing the DAV resources via a web browser
incomplete test coverage

Xandikos has been tested with the following clients:

Trying it

To run Xandikos, follow the instructions on the homepage:

./bin/xandikos --defaults -d $HOME/dav

A server should now be listening on localhost:8080 that you can access with your favorite client.

The Samba Buildfarm

meta = {'category': samba, 'date': Sun 08 February 2015}

Portability has always been very important to Samba. Nowadays Samba is mostly used on top of Linux, but Tridge developed the early versions of his SMB implementation on a Sun workstation.

A few years later, when the project was being picked up, it was ported to Linux and eventually to a large number of other free and non-free Unix-like operating systems.

Initially regression testing on different platforms was done manually and ad-hoc.

Once Samba had support for a larger number of platforms, including numerous variations and optional dependencies, making sure that it would still build and run on all of these became a non-trivial process.

To make it easier to find platform-specific regressions in the Samba codebase, Tridge put together a system to automatically build Samba regularly on as many platforms as possible. So, in Spring 2001, the build farm was born - this was a couple of years before other tools like buildbot came around.

The Build Farm

The build farm is a collection of machines around the world that are connected to the internet, with as wide a variety of platforms as possible. In 2001, it wasn’t feasible to just have a single beefy machine or a cloud account on which we could run virtual machines with AIX, HPUX, Tru64, Solaris and Linux so we needed access to physical hardware.

The build farm runs as a single non-privileged user, which has a cron job set up that runs the build farm worker script regularly. Originally the frequency was every couple of hours, but soon we asked machine owners to run it as often as possible. The worker script is as short as it is simple. It retrieves a shell script from the main build farm repository with instructions to run and after it has done so, it uploads a log file of the terminal output to samba.org using rsync and a secret per-machine password.
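The worker itself is a short shell script; purely as an illustration of its two steps, here is a hypothetical Python translation. The function names, host, rsync module and password-file location are placeholders, not the build farm’s real configuration; the upload step only builds the rsync command line (using rsync’s daemon syntax and `--password-file` option) rather than running it.

```python
import subprocess
from pathlib import Path
from typing import List


def run_build_script(script: Path, log: Path) -> int:
    """Run the downloaded build instructions, capturing all terminal output."""
    with log.open("wb") as out:
        # stderr is merged into stdout so the log reads like the terminal did.
        proc = subprocess.run(["sh", str(script)], stdout=out, stderr=subprocess.STDOUT)
    return proc.returncode


def upload_command(log: Path, host: str, module: str, password_file: Path) -> List[str]:
    """Build (but don't run) an rsync command that uploads the log.

    The daemon module and per-machine password file are placeholders.
    """
    return [
        "rsync",
        f"--password-file={password_file}",
        str(log),
        f"{host}::{module}/",
    ]
```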

Some build farm machines are dedicated, but over the years there have also been a large number that just ran as a separate user account on a machine that was tasked with something else. Most build farm machines are hosted by Samba developers (or their employers), but we’ve also had community volunteers who were happy to add an extra user with an extra cron job on their machine, and for a while companies like SourceForge and HP provided dedicated porter boxes that ran the build farm.

Of course, there are some security issues with this way of running things. Arbitrary shell code is downloaded from a host claiming to be samba.org and run. If the machine is shared with other (sensitive) processes, some information about those processes might leak into the logs.

Our web page has a section about adding machines for new volunteers, with a long list of warnings.

Since then, various other people have been involved in the build farm. Andrew Bartlett started contributing to the build farm in July 2001, working on adding tests. He gradually took over as the maintainer in 2002, and various others (Vance, Martin, Mathieu) have contributed patches and helped out with general admin.

In 2005, tridge added a script to automatically send out an e-mail to the committer of the last revision before a failed build. This meant it was no longer necessary to bisect through build farm logs on the web to find out who had broken a specific platform when; you’d just be notified as soon as it happened.

The web site

Once the logs are generated and uploaded to samba.org using rsync, the web site at http://build.samba.org/ is responsible for making them accessible to the world. Initially there was a single perl file that would take care of listing and displaying log files, but over the years the functionality has been extended to do much more than that.

Initial extensions to the build farm added support for viewing per-compiler and per-host builds, to allow spotting trends. Another addition was searching logs for common indicators of running out of disk space.
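A check like the disk-space one amounts to scanning a log for known error messages. Here is a hypothetical Python sketch; the two patterns are the standard strerror texts for ENOSPC and EDQUOT on Linux, but the list is illustrative, not the build farm’s actual set.

```python
import re
from typing import Iterable, List

# Messages commonly printed when a disk or quota fills up (illustrative list).
DISK_FULL_PATTERNS = [
    re.compile(r"No space left on device"),
    re.compile(r"Disk quota exceeded"),
]


def disk_full_lines(log_lines: Iterable[str]) -> List[int]:
    """Return 1-based line numbers of log lines that suggest a full disk."""
    return [
        number
        for number, line in enumerate(log_lines, start=1)
        if any(p.search(line) for p in DISK_FULL_PATTERNS)
    ]
```

Flagging these lines lets a failed build be reported as a machine problem rather than blamed on the last commit.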

Over time, we also added more samba.org projects to the build farm. At the moment there are about a dozen.

In a sprint in 2009, Andrew Bartlett and I changed the build farm to store machine and build metadata in a SQLite database rather than parsing all recent build log files every time their results were needed.
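To illustrate the idea, here is a Python sketch using the standard sqlite3 module. The schema is hypothetical and much simpler than whatever the build farm actually uses; the point is that the latest result per host, tree and compiler can be answered by one query instead of re-parsing log files.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema: one row per uploaded build.
SCHEMA = """
CREATE TABLE IF NOT EXISTS build (
    id INTEGER PRIMARY KEY,
    host TEXT NOT NULL,
    tree TEXT NOT NULL,
    compiler TEXT NOT NULL,
    revision TEXT NOT NULL,
    age INTEGER NOT NULL,   -- upload time, seconds since the epoch
    status TEXT NOT NULL    -- e.g. 'ok' or 'failed'
);
CREATE INDEX IF NOT EXISTS build_lookup ON build (host, tree, compiler, age);
"""

# Latest result per (host, tree, compiler), computed by SQLite itself.
LATEST = """
SELECT b.host, b.tree, b.compiler, b.status
FROM build AS b
JOIN (
    SELECT host, tree, compiler, MAX(age) AS age
    FROM build
    GROUP BY host, tree, compiler
) AS latest
  ON b.host = latest.host AND b.tree = latest.tree
 AND b.compiler = latest.compiler AND b.age = latest.age
ORDER BY b.host, b.tree, b.compiler
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_build(conn: sqlite3.Connection, host: str, tree: str, compiler: str,
                 revision: str, age: int, status: str) -> None:
    conn.execute(
        "INSERT INTO build (host, tree, compiler, revision, age, status) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (host, tree, compiler, revision, age, status),
    )


def latest_results(conn: sqlite3.Connection) -> List[Tuple[str, str, str, str]]:
    return conn.execute(LATEST).fetchall()
```

Letting the database do the aggregation means a summary page doesn’t need to load every build record into the application.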

In a follow-up sprint a year later, we converted most of the code to Python. We also added a number of extensions; most notably, linking the build result information with version control information so we could automatically email the exact people that had caused the build breakage, and automatically notifying build farm owners when their machines were not functioning.
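The core of “email the exact people” is finding who committed between the last good build and the first bad one. Here is a hypothetical Python sketch of that step, assuming a linear history ordered oldest first; the names are made up and it ignores merges.

```python
from typing import List, Sequence, Tuple

Commit = Tuple[str, str]  # (revision id, committer e-mail)


def suspects(history: Sequence[Commit], last_good: str, first_bad: str) -> List[str]:
    """Committers of revisions after `last_good`, up to and including `first_bad`.

    `history` is assumed to be linear and ordered oldest first.
    """
    revisions = [rev for rev, _ in history]
    start = revisions.index(last_good) + 1
    end = revisions.index(first_bad) + 1
    seen: List[str] = []
    for _, committer in history[start:end]:
        if committer not in seen:
            seen.append(committer)  # keep order, drop duplicates
    return seen
```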

autobuild

Sometime in 2011, all committers started using the autobuild script to push changes to the master Samba branch. This script enforces a full build and testsuite run for each commit that is pushed; if the build or any part of the testsuite fails, the push is aborted. This alone massively reduced the number of problematic changes that were pushed, making us less reliant on the build farm to make us aware of issues.
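The gating idea can be sketched in a few lines of Python. This is a hypothetical illustration, not Samba’s autobuild script: the build, test and push commands are placeholders supplied by the caller.

```python
import subprocess
from typing import Sequence


def gate(steps: Sequence[Sequence[str]], cwd: str = ".") -> bool:
    """Run each build/test step in order; return False at the first failure."""
    for argv in steps:
        if subprocess.run(list(argv), cwd=cwd).returncode != 0:
            return False
    return True


def push_if_green(steps: Sequence[Sequence[str]], push_cmd: Sequence[str],
                  cwd: str = ".") -> bool:
    """Only push when every step succeeded; otherwise abort the push."""
    if not gate(steps, cwd):
        return False
    return subprocess.run(list(push_cmd), cwd=cwd).returncode == 0
```

For example, `push_if_green([["make"], ["make", "test"]], ["git", "push"])` would only push if both make invocations succeed.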

The rewrite also introduced some time bombs into the code. The way we called out to our ORM caused the code to fetch all build summary data from the database every time the summary page was generated. Initially this was not a problem, but as the table grew to 100,000 rows, the build farm became so slow that it was frustrating to use.

Analysis tools

Over the years, various special build farm machines have also been used to run extra code analysis tools, like static code analysis, lcov, valgrind or various code quality scanners.

Summer of Code

Over the last couple of years, the build farm has been running happily and hasn’t changed much.

This summer one of our summer of code students, Krishna Teja Perannagari, worked on improving the look of the build farm - updating it to the current Samba house style - as well as various performance improvements in the Python code.

Jenkins?

The build farm still works reasonably well, though it is clear that various other tools that have had more developer attention have caught up with it. If we had to reinvent the build farm today, we would probably end up using an off-the-shelf tool like Jenkins, which wasn’t around 14 years ago. We would also be able to get away with using virtual machines for most of our workers.

Non-Linux platforms have become less relevant in the last couple of years, though we still care about them.

The build farm in its current form works well enough for us, and I think porting to Jenkins - with the same level of platform coverage - would take quite a lot of work and have only limited benefits.

(Thanks to Andrew Bartlett for proofreading the draft of this post.)

Autonomous Shard Distributed Databases

meta = {'category': misc, 'date': Fri 22 August 2014}

Distributed databases are hard. Distributed databases where you don’t have full control over what shards run which version of your software are even harder, because it becomes near impossible to deal with fallout when things go wrong. For lack of a better term (is there one?), I’ll refer to these databases as Autonomous Shard Distributed Databases.

Distributed version control systems are an excellent example of such databases. They store file revisions and commit metadata in shards (“repositories”) controlled by different people.

Because of the nature of these systems, it is hard to weed out corrupt data if all shards ignorantly propagate broken data. There will be different people on different platforms running the database software that manages the individual shards.

This makes it hard - if not impossible - to deploy software updates to all shards of a database in a reasonable amount of time (though a Chrome-like update mechanism might help here, if that was acceptable). This has consequences for the way in which you have to deal with every change to the database format and model.

(e.g. imagine introducing a modification to the Linux kernel Git repository that required everybody to install a new version of Git).

Defensive programming and a good format design from the start are essential.

Git and its database format do really well in all of these regards. As I wrote in my retrospective, Bazaar has made a number of mistakes in this area, and that was a major source of user frustration.

I propose that every autonomous shard distributed database should aim for the following:

For the “base” format, keep it as simple as you possibly can. (KISS)

The simpler the format, the smaller the chance of mistakes in the design that have to be corrected later. Similarly, it reduces the chances of mistakes in any implementation(s).

In particular, there is no need for every piece of metadata to be a part of the core database format.

(in the case of Git, I would argue that e.g. “author” might as well be a “meta-header”)

Corruption should be detected early and not propagated. This means there should be good tools to sanity check a database, and ideally some of these checks should be run automatically during everyday operations - e.g. when pushing changes to others or receiving them.

If corruption does occur, there should be a way for as much of the database as possible to be recovered.

A couple of corrupt objects should not render the entire database unusable.
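Content-addressed stores like Git make early detection cheap: a receiver can rehash each object and compare it with its id. Here is a Python sketch using Git’s blob hashing (SHA-1 over the header “blob <size>” plus a NUL byte, followed by the data); the `receive_blobs` helper is hypothetical. It sets corrupt objects aside instead of failing the whole transfer, so a few bad objects don’t make the rest unusable.

```python
import hashlib
from typing import Dict, List, Tuple


def git_blob_id(data: bytes) -> str:
    """SHA-1 object id Git assigns to a blob with these contents."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def receive_blobs(incoming: Dict[str, bytes]) -> Tuple[Dict[str, bytes], List[str]]:
    """Accept blobs whose content matches their id; set the rest aside."""
    good: Dict[str, bytes] = {}
    corrupt: List[str] = []
    for object_id, data in incoming.items():
        if git_blob_id(data) == object_id:
            good[object_id] = data
        else:
            corrupt.append(object_id)  # don't store or propagate it
    return good, corrupt
```

As a sanity check, `git_blob_id(b"")` returns `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the well-known id of Git’s empty blob.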

There should be tools for low-level access of the database, but the format and structure should be also documented well enough for power users to understand it, examine and extract data.

No “hard” format changes (where clients /have/ to upgrade to access the new format).

Not all users will instantly update to the latest and greatest version of the software. The lifecycle of enterprise Linux distributions is long enough that it might take three or four years for the majority of users to upgrade.

Keep performance data like indexes in separate files. This makes it possible for older software to still read the data, albeit at a slower pace, and/or generate older format index files.
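Here is a hypothetical Python sketch of that approach. Each index version lives in its own file (the `index.v2.json` naming and the JSON format are made up for illustration), so older and newer clients can coexist. A client that finds no index it understands, or a corrupt one, falls back to scanning the primary data and writes its own.

```python
import json
from pathlib import Path
from typing import Dict, List

INDEX_VERSION = 2  # hypothetical: the index version this client understands


def build_index(records: List[dict]) -> Dict[str, int]:
    """Slow path: scan the primary data and map each key to its position."""
    return {rec["key"]: pos for pos, rec in enumerate(records)}


def load_index(data_dir: Path, records: List[dict]) -> Dict[str, int]:
    """Use this version's index file if present and readable, else rebuild it."""
    path = data_dir / f"index.v{INDEX_VERSION}.json"
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, ValueError):
        pass  # missing or corrupt index: the data itself is still readable
    index = build_index(records)
    path.write_text(json.dumps(index), encoding="utf-8")
    return index
```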

New shards of the database should replicate the entire database if at all possible; having more copies of the data can’t hurt if other shards go away or get corrupted.

Having the data locally available also means users get quicker access to more data.

Extensions to the database format that require hard format changes (think e.g. submodules) should only impact databases that actually use those extensions.

Leave some room for structured arbitrary metadata, which gets propagated but which not all clients need to understand and can safely ignore.

(Think of fields like “Signed-Off-By”, “Reviewed-By”, “Fixes-Bug”, etc. in Git commit metadata headers, or the revision metadata fields in Bazaar.)
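A hypothetical Python sketch of what “propagate but safely ignore” means for a client: it interprets the fields it knows and carries every other line through verbatim, so nothing is lost when the data is passed on. The field names in `KNOWN_FIELDS` are just examples.

```python
from typing import Dict, List, Tuple

KNOWN_FIELDS = {"Signed-Off-By", "Reviewed-By"}  # what this client understands


def parse_trailers(lines: List[str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Split "Key: value" metadata lines into understood fields and opaque ones."""
    known: Dict[str, List[str]] = {}
    opaque: List[str] = []
    for line in lines:
        key, sep, value = line.partition(":")
        if sep and key.strip() in KNOWN_FIELDS:
            known.setdefault(key.strip(), []).append(value.strip())
        else:
            opaque.append(line)  # kept verbatim, never dropped
    return known, opaque


def serialize_trailers(known: Dict[str, List[str]], opaque: List[str]) -> List[str]:
    """Write the metadata back out, including lines this client didn't understand."""
    out = [f"{key}: {value}" for key, values in known.items() for value in values]
    return out + opaque
```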
