Porcelain in Dulwich

"Porcelain" is the term usually used in the Git world to refer to the user-facing parts, as opposed to the lower layers: the plumbing.

For a long time, I have resisted the idea of including a porcelain layer in Dulwich. The main reason for this is that I don't consider Dulwich a full reimplementation of Git in Python. Rather, it's a library that Python tools can use to interact with local or remote Git repositories, without any extra dependencies.

Dulwich has always shipped a 'dulwich' binary, but that's never been more than a basic test tool - never a proper tool for end users. It was a mistake to install it by default.

I don't think there's a point in providing a dulwich command-line tool that has the same behaviour as the C Git binary. It would just be slower and less mature. I haven't come across any situation where it didn't make sense to just directly use the plumbing.

However, Python programmers using Dulwich seem to think of Git operations in terms of porcelain rather than plumbing. Several convenience wrappers for Dulwich have sprung up, but none of them is very complete. So rather than relying on external modules, I've added a "porcelain" module to Dulwich in the porcelain branch, which provides a porcelain-like Python API for Git.

At the moment, it just implements a handful of commands but that should improve over the next few releases:

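from dulwich import porcelain

# Create a new repository at the given path
r = porcelain.init("/path/to/repo")
# Record a commit in that repository with the given message
porcelain.commit(r, "Create a commit")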


Migrating packaging from Bazaar to Git

A while ago I migrated most of my packages from Bazaar to Git. The rest of the world has decided to use Git for version control, and I don't have enough reason to stubbornly stick with Bazaar and make it harder for myself to collaborate with others.

So I'm moving away from a workflow I know and have polished over the last few years - including the various bzr plugins and other tools involved. Trying to do the same thing using git is frustrating and time-consuming, but I'm sure that will improve with time. In particular, I haven't found a good way to merge in a new upstream release (from a tarball) while referencing the relevant upstream commits, like bzr merge-upstream can. Is there a good way to do this? What helper tools can you recommend for maintaining a Debian package in git?

Having been upstream for bzr-git earlier, I used its git-remote-bzr implementation to do the conversions of the commits and tags:

% git clone bzr::/path/to/bzr/foo.bzr /path/to/git/foo.git

One of my last contributions to bzr-git was a bzr git-push-pristine-tar-deltas subcommand, which will export all bzr-builddeb-style pristine-tar metadata to a pristine-tar branch in a Git repository that can be used by pristine-tar directly or through something like git-buildpackage.

Once you have created a git clone of your bzr branch, it should be a matter of running bzr git-push-pristine-tar-deltas with the target git repository and the Debian package name:

% cd /path/to/bzr/foo.bzr
% bzr git-push-pristine-tar-deltas /path/to/git/foo.git foo
% cd /path/to/git/foo.git
% git branch
*  master


Samba 4 and OpenChange daily Ubuntu packages

Daily builds

As of a month ago there are Ubuntu archives with fresh packages of Samba 4 and OpenChange, built on a daily basis from the latest upstream revision.

This means that it is now possible to run a version of Samba 4 that is less than 24 hours old, without having to know how to extract source code from the version control system that upstream is using, without having to know how to build and install an application from source, but perhaps most importantly: without having to go through the tedious process of manually updating the source code and rebuilding.

OpenChange is tightly coupled to Samba 4, so installing a new version of OpenChange usually involves installing a new version of Samba 4 as well. To make matters more confusing, the two projects use different version control systems (Samba 4 is in Git, while OpenChange is in Subversion) and different build systems (Samba 4 uses waf, OpenChange uses autoconf and make).

I have been involved in Samba 4 and OpenChange as an upstream developer and more recently also as a packager for both Debian and Ubuntu.

As an upstream developer for both these projects it is important for me that users can easily run the development versions. It makes it possible for interested users to confirm the fixes for issues they have reported and to test new features. The more users run the development version, the more confident I can be as a developer that doing a release will not cause any unexpected surprises.

As a packager it is useful to know when there are upstream changes that are going to break my package with the next release.


The daily builds work using so-called recipes which describe how to build a Debian source package from a set of Bazaar branches. For example, the Samba 4 recipe looks like this:

# bzr-builder format 0.2 deb-version 4.0.0~alpha14~bzr{revno}~ppa{revno:packaging}+{revno:debian}
merge debian lp:~samba-team/samba/unstable
merge packaging lp:~samba-team/samba/4.0-ppa-maverick

This dictates that a source package should be built by taking the upstream Samba branch and merging the Debian packaging and some recipe-specific tweaking. The last bit on the first line indicates the version string to be used when generating a changelog entry for the daily build.
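
As a rough illustration of how such a version template gets filled in, here is a minimal Python sketch. It is not bzr-builder's own code: the function name expand_deb_version and the revision numbers are made up for the example, and it assumes that {revno} refers to the base branch while {revno:NAME} refers to the branch merged under that name.

```python
import re


def expand_deb_version(template, revnos):
    """Fill in {revno} and {revno:NAME} placeholders in a version template.

    revnos maps a branch name to its revision number; the key None stands
    for the base branch (an assumption made for this sketch).
    """
    def replace(match):
        name = match.group(1)  # None for a plain {revno}
        return str(revnos[name])

    return re.sub(r"\{revno(?::([^}]+))?\}", replace, template)


template = "4.0.0~alpha14~bzr{revno}~ppa{revno:packaging}+{revno:debian}"
# Hypothetical revision numbers, purely for illustration.
version = expand_deb_version(
    template, {None: 12345, "packaging": 7, "debian": 210})
print(version)  # 4.0.0~alpha14~bzr12345~ppa7+210
```

Because every placeholder is a revision number, a new revision in any of the three branches produces a new version string for the next daily build.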

Every night Launchpad (through bzr-builder) merges these branches and attempts to build the resulting source package, e-mailing me in case of build problems. Generally I fix issues that come up by committing directly to upstream VCS or to the Debian packaging branch. There is no overhead in maintaining the daily build after I've set it up.

For more information on creating source package recipes, see getting started.


The entire toolchain that does the daily package builds for Ubuntu is Free Software, and I have contributed to various bits of that toolchain over the years. It's exciting to see everything come together.


Launchpad consists of multiple pillars - one of those pillars is Soyuz, which I hack on as part of my day job at Canonical. Soyuz is responsible for the archive management and package building. Debian source packages (a combination of upstream source code and packaging metadata) get uploaded by users and then built for various architectures on our buildfarm and published to the Ubuntu archive or to users' personal package archives.


Another pillar of Launchpad is Launchpad-code, which is responsible for the hosting and management of version control branches. Launchpad users can either host their branches on Launchpad directly or mirror branches (either native Bazaar branches or branches in a foreign format such as Subversion, Git or Mercurial). The mirroring of native and foreign branches happens using standard Bazaar APIs. In the case of Samba and OpenChange we import the branches of the upstream projects (Samba is in Git, OpenChange is in Subversion) and the packaging for both projects is in Bazaar.

Launchpad-code calls out to Bazaar to do the actual mirroring. Over the last few years I have done a lot of work to improve Bazaar's support for foreign branches, in particular on supporting Subversion, Git and Mercurial. As the code mirroring in Launchpad is one of the biggest users of bzr-svn and bzr-git, it has helped find some of the more obscure bugs in those plugins, to the point where there are only a handful of issues with Git imports and Subversion imports left.

bzr-git and dulwich

bzr-git provides transparent access to Git repositories from within Bazaar and is built on top of Dulwich. Dulwich is a Python library that provides access to the Git file formats and protocols and is completely independent of Bazaar. James Westby originally started it and I adopted it for bzr-git and further extended it. There are now several other projects that use it as well, including hg-git and rabbitvcs. Apart from James and myself, almost two dozen other people have contributed to it so far.
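
To give a flavour of the file formats involved: Git names every object by the SHA-1 of a short header followed by the object's content. The sketch below computes the id of a blob using only the Python standard library. It does not use Dulwich's API, and the helper name git_blob_id is invented for this example.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the hex object id Git assigns to a blob with this content."""
    # A blob is hashed as: b"blob <length in bytes>\0" followed by the content.
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


# The empty blob has a well-known id.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```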

bzr-svn and subvertpy

bzr-svn provides transparent access to Subversion repositories in Bazaar. When I grew frustrated with the existing Subversion Python bindings for various reasons, I decided to create independent Python bindings for Subversion from scratch. These bindings have since been split out into a separate project - subvertpy - and other projects have since also started using them, e.g. hgsubversion and basie.

Using the daily builds

To use the Samba 4 and OpenChange daily builds (Ubuntu Maverick only for now), run:

$ apt-add-repository ppa:samba-team/ppa
$ apt-add-repository ppa:openchange/daily-builds

Currently Playing: Karnivool - Themata


Linux.Conf.Au 2010 - Day 3 - Wednesday

I went to Jonathan Corbet's yearly update of the status of the Linux kernel. He talked about the various big changes that went into the kernel over the last year as well as the development processes. The Linux kernel is probably one of the largest open source projects, and very healthy - there are a lot of individuals and companies contributing to it. With this size come a few interesting challenges in coping with the flow of changes into Linus' tree. Their current processes seem to deal with this quite well, and don't seem to need a lot of major changes at the moment.

His talk also included the obligatory list of features that landed in the last year. The only one that really matters to me is the Nouveau driver, which I'm looking forward to trying out.

The second talk I went to in the morning was Selena Deckelmann's overview of the Open Source database landscape. She mentioned there are new projects started daily, but it was still a bit disappointing not to see TDB up there.

After lunch Rob gave a talk about Subunit, introducing the ideas behind the Subunit protocol as well as presenting an overview of the tools that are available for it and the projects that have been Subunitized so far. It's exciting to see the Subunit universe slowly growing; I wasn't aware of some of the projects that are using it. The recently announced testrepository also looks interesting, even though it is still very rudimentary at the moment.

In the evening Tridge, Rusty, Andrew, Jeremy, AJ and I participated in the hackoff as the "Samba Team".

The hackoff was a lot of fun, and consisted of 6 problems, each of which involved somehow decoding the data file for the problem and extracting a short token from it in one way or another, which was required to retrieve the next problem. We managed to solve 4 problems in the hour that the organizers had allocated, and finished first because we were a bit quicker in solving the 4th problem than the runners-up. No doubt the fact that we were the largest team had something to do with this.

I hung out with some of the awesome Git and GitHub developers in the Malthouse in the evening, and talked about Dulwich, Bazaar and Launchpad ("No really, I am not aware of any plans to add Git support to Launchpad.").


Reconciling the Samba 3 and Samba 4 source code trees

While a few of us have been working very hard on Samba 4 to allow it to rock your socks off as an Active Directory Domain Controller, some of the other Samba developers have been working just as hard on improving the existing Samba 3 codebase and adding features to that. This situation has caused tension between developers as well as technical problems in the past - code with the same purpose is being developed in parallel, libraries diverge because features are only added in one branch and not in the other, one codebase is considered "obsolete" by some and the other is considered only a playground for experimental features by others.

As of yesterday, we now have the two codebases living in one and the same git branch. This should make it a lot easier for the two to use the same libraries. Better yet, it should allow us to reconcile the copies of various libraries that exist in both codebases, all of which have diverged to some degree in the last few years.

After a few problems came up merging the two branches the easy way (they both have a directory called "source" and git doesn't deal well with renaming them to "source3" and "source4" respectively), we decided to replay the history of both branches. This has the disadvantage that all existing branches that are based on the Samba 3 and Samba 4 branches will have to be rebased against the new master branch, but it also means we keep the ability to run "git log" inside of our source directories and have it work right.

Other than the fact that this makes it possible to share more code between the two codebases, one of the ideas we have is also to see if it is possible to provide an Active Directory DC by glueing the best bits of Samba 3 and Samba 4 together (aka "Franky") before they are eventually merged completely.

Currently Playing: Phideaux - Formaldehyde


Git cutting corners

My relationship with git is still one of love and hate. It cuts corners to increase performance in a couple of places and that can be really bloody annoying.

For example, Jerry renamed one of the top-level directories in Samba 3 (revision 9f672c26d63955f613088489c6efbdc08b5b2d14). Git will skip rename detection in this revision because of the number of files it affects, thus causing the output of "git log <path>" of this particular directory to be useless.
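
To show why a cut-off like this exists at all, here is a small Python sketch of similarity-based rename detection. It is not Git's actual algorithm: the function name detect_renames, the limit value and the use of difflib for scoring are assumptions made for illustration. The point is that every deleted file has to be compared against every added file, so the work grows with the product of the two counts, and a tool that cares about speed gives up beyond some size.

```python
from difflib import SequenceMatcher


def detect_renames(deleted, added, limit=400, threshold=0.5):
    """Pair deleted paths with added paths that have similar content.

    deleted and added map path -> file content. Returns a dict mapping
    old path -> new path, or None if the change is too large to examine.
    """
    # Pairwise comparison costs len(deleted) * len(added) similarity checks,
    # so bail out when that product gets too big (hypothetical cut-off).
    if len(deleted) * len(added) > limit * limit:
        return None

    renames = {}
    unmatched = dict(added)
    for old_path, old_content in deleted.items():
        best_path, best_score = None, threshold
        for new_path, new_content in unmatched.items():
            score = SequenceMatcher(None, old_content, new_content).ratio()
            if score >= best_score:
                best_path, best_score = new_path, score
        if best_path is not None:
            renames[old_path] = best_path
            del unmatched[best_path]
    return renames


# Made-up file names and content, purely for illustration.
old = {"source/smbd/server.c": "int main(void) { return 0; }\n"}
new = {"source3/smbd/server.c": "int main(void) { return 0; }\n"}
print(detect_renames(old, new))
print(detect_renames(old, new, limit=0))  # too many files for the limit
```

When the function gives up and returns None, the caller only sees unrelated deletions and additions, which is exactly what makes the history of a renamed directory appear to start at the rename.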

I'm the first to admit "bzr log" on directories and files in large history projects is painfully slow, but at least it gets the output right.

Currently Playing: Brandi Carlile - The Story


Ohloh - Statistics on Free Software projects

Ohloh is a nice web 2.0 site that contains stats on various Free Software projects. At the moment, they only support Subversion, CVS and Git. They're open to feature requests though. If enough people ask for it, hopefully they'll support Bazaar at some point.