
Autonomous Shard Distributed Databases

Distributed databases are hard. Distributed databases where you don't have full control over which shards run which version of your software are even harder, because it becomes nearly impossible to deal with the fallout when things go wrong. For lack of a better term (is there one?), I'll refer to these databases as Autonomous Shard Distributed Databases.

Distributed version control systems are an excellent example of such databases. They store file revisions and commit metadata in shards ("repositories") controlled by different people.

Because of the nature of these systems, it is hard to weed out corrupt data once shards have blindly propagated it. Different people on different platforms will be running the database software that manages the individual shards.

This makes it hard - if not impossible - to deploy software updates to all shards of a database in a reasonable amount of time (though a Chrome-like update mechanism might help here, if that were acceptable). This has consequences for how you have to deal with every change to the database format and model.

(e.g. imagine introducing a modification to the Linux kernel Git repository that required everybody to install a new version of Git).

Defensive programming and a good format design from the start are essential.

Git and its database format do really well in all of these regards. As I wrote in my retrospective, Bazaar has made a number of mistakes in this area, and that was a major source of user frustration.

I propose that every autonomous shard distributed database should aim for the following:

  • For the "base" format, keep it as simple as you possibly can. (KISS)

    The simpler the format, the smaller the chance of mistakes in the design that have to be corrected later. Similarly, it reduces the chances of mistakes in any implementation(s).

    In particular, there is no need for every piece of metadata to be a part of the core database format.

    (in the case of Git, I would argue that e.g. "author" might as well be a "meta-header")

  • Corruption should be detected early and not propagated. This means there should be good tools to sanity check a database, and ideally some of these checks should be run automatically during everyday operations - e.g. when pushing changes to others or receiving them.

  • If corruption does occur, there should be a way for as much of the database as possible to be recovered.

    A couple of corrupt objects should not render the entire database unusable.

    There should be tools for low-level access to the database, but the format and structure should also be documented well enough for power users to understand it and to examine and extract data.

  • No "hard" format changes (where clients /have/ to upgrade to access the new format).

    Not all users will instantly update to the latest and greatest version of the software. The lifecycle of enterprise Linux distributions is long enough that it might take three or four years for the majority of users to upgrade.

  • Keep performance data like indexes in separate files. This makes it possible for older software to still read the data, albeit at a slower pace, and/or to generate index files in the older format.

  • New shards of the database should replicate the entire database if at all possible; having more copies of the data can't hurt if other shards go away or get corrupted.

    Having the data locally available also means users get quicker access to more data.

  • Extensions to the database format that require hard format changes (think e.g. submodules) should only impact databases that actually use those extensions.

  • Leave some room for structured arbitrary metadata, which gets propagated but which clients that don't understand it can safely ignore.

    (think of fields like "Signed-Off-By", "Reviewed-By" and "Fixes-Bug" in Git commit metadata headers, or the revision metadata fields in Bazaar)


Using Propellor for configuration management

For a while, I've been wanting to set up configuration management for my home network. With half a dozen servers, a VPS and a workstation, it is not big, but it is large enough to make it annoying to manually log into each machine for network-wide changes.

Most of the servers I have are low-end ARM machines, each responsible for a couple of tasks. Most of my machines run Debian or something derived from Debian. Oh, and I'm a member of the declarative school of configuration management.


Propellor caught my eye earlier this year. Unlike some other configuration management tools, it doesn't come with its own custom language but is written in Haskell, which I am already familiar with. It's also fairly simple, declarative, and seems to do most of the handful of things that I need.

Propellor is essentially a Haskell application that you customize for your site. It works very similarly to e.g. xmonad, where you write a bit of Haskell configuration code that uses the upstream library code. When you run the application, it builds a binary from your code and the upstream libraries.

Each host on which Propellor is used keeps a clone of the site-local Propellor git repository in /usr/local/propellor. Every time propellor runs (either because of a manual "spin", or from a cronjob it can set up for you), it fetches updates from the main site-local git repository, compiles the Haskell application and runs it.


Propellor was surprisingly easy to set up. Running propellor creates a clone of the upstream repository under ~/.propellor with a README file and some example configuration. I copied config-simple.hs to config.hs, updated it to reflect one of my hosts and within a few minutes I had a basic working propellor setup.

You can use ./propellor <host> to trigger a run on a remote host.

At the moment I have propellor working for some basic things - having certain Debian packages installed, a specific network configuration, mail setup, basic Kerberos configuration and certain SSH options set. This took surprisingly little time to set up, and it's been great being able to take full advantage of Haskell.

Propellor comes with convenience functions for dealing with some commonly used packages, such as Apt, SSH and Postfix. For a lot of the other packages, you'll have to roll your own for now. I've written some extra code to make Propellor deal with Kerberos keytabs and Dovecot, which I hope to submit upstream.

I don't have a lot of experience with other Free Software configuration management tools such as Puppet and Chef, but for my use case Propellor works very well.

The main disadvantage of Propellor for me so far is that it needs to build itself on each machine it runs on. This is fine for my workstation and high-end servers, but it is somewhat more problematic on e.g. my Raspberry Pis. Compilation takes a while, and the Haskell compiler and the libraries it needs amount to 500MB worth of disk space on the tiny root partition.

In order to work with Propellor, some Haskell knowledge is required. The Haskell in the configuration file is reasonably easy to understand if you keep it simple, but once the compiler spits out error messages, I suspect you'll have a hard time without any Haskell knowledge.

Propellor relies on having a central repository with the configuration that it can pull from as root. Unlike Joey, I am wary of publishing the configuration of my home network, and I don't have a highly available local git server set up.


The state of distributed bug trackers

A whopping 5 years ago, LWN ran a story about distributed bug trackers. This was during the early waves of distributed version control adoption, and so everybody was looking for other things that could benefit from decentralization.

TL;DR: Not much has changed since.

The potential benefits of a distributed bug tracker are similar to those of a distributed version control system: ability to fork any arbitrary project, easier collaboration between related projects and offline access to full project data.

The article discussed a number of systems, including Bugs Everywhere, ScmBug, DisTract, DITrack, ticgit and ditz. The conclusion of our favorite grumpy editor at the time was that all of the available distributed bug trackers were still in their infancy.

All of these piggyback on a version control system somehow - either by reusing the VCS database, by storing their data along with the source code in the tree, or by adding custom hooks that communicate with a central server.

Only ScmBug had been somewhat widely deployed at the time, but its homepage gives me a blank page now. Of the trackers reviewed by LWN, Bugs Everywhere is the only one that is still around and somewhat active today.

In the years since the article, a handful of new trackers have come along. Two new version control systems - Veracity and Fossil - come with the kitchen sink included and so feature a built-in bug tracker and wiki.

There is an extension for Mercurial called Artemis that stores issues in an .issues directory that is colocated with the Mercurial repository.

The other new tracker that I could find (though it also hasn't changed since 2009) is SD. It uses its own distributed database technology for storing bug data, called Prophet, and doesn't rely on a VCS. One of its nice features is that it supports importing bugs from foreign trackers.

Some of these provide the benefits you would expect of a distributed bug tracker. Unfortunately, all those I've looked at fail to provide even the basic functionality I would want in a bug tracker. More so than with a version control system, regular users interact with a bug tracker: they report bugs and provide comments and feedback on fixes. All of the systems I tried make these actions a lot harder than your average Bugzilla or Mantis instance does - they provide a limited web UI or no web interface at all.

Update: LWN later also published articles on SD and on Fossil. Other interesting links are Eric Sink's article on distributed bug tracking (Eric works at SourceGear, who develop Veracity) and the dist-bugs mailing list.


Quantified Self

Dear lazyweb,

I've been reading about what the rest of the world seems to be calling "quantified self". In essence, it is tracking of personal data like activity, usually with the goal of data-driven decision making. Or to take a less abstract common example: counting the number of steps you take each day to motivate yourself to take more. I wish it'd been given a less annoying woolly name but this one seems to have stuck.

There are a couple of interesting devices available that track sleep, activity and overall health. Probably the best known are the FitBit and the jazzed-up armband pedometers like the Jawbone UP and the Nike FuelBand. Unfortunately, all existing devices seem to integrate with cloud services somehow, rather than giving the user direct access to their data. Apart from the usual privacy concerns, this means that it is hard to do your own data crunching or create a dashboard that combines data from multiple sources.

Has anybody found any devices that don't integrate with the cloud and just provide raw data access?


Porcelain in Dulwich

"porcelain" is the term that is usually used in the Git world to refer to the user-facing parts. This is opposed to the lower layers: the plumbing.

For a long time, I have resisted the idea of including a porcelain layer in Dulwich. The main reason for this is that I don't consider Dulwich a full reimplementation of Git in Python. Rather, it's a library that Python tools can use to interact with local or remote Git repositories, without any extra dependencies.

Dulwich has always shipped a 'dulwich' binary, but that has never been more than a basic test tool - never a proper tool for end users. It was a mistake to install it by default.

I don't think there's a point in providing a dulwich command-line tool that has the same behaviour as the C Git binary. It would just be slower and less mature. I haven't come across any situation where it didn't make sense to just directly use the plumbing.

However, Python programmers using Dulwich seem to think of Git operations in terms of porcelain rather than plumbing. Several convenience wrappers for Dulwich have sprung up, but none of them is very complete. So rather than relying on external modules, I've added a "porcelain" module to Dulwich in the porcelain branch, which provides a porcelain-like Python API for Git.

At the moment, it just implements a handful of commands but that should improve over the next few releases:

from dulwich import porcelain

r = porcelain.init("/path/to/repo")
porcelain.commit(r, "Create a commit")



From LWN's weekly edition:

Documentation is the sort of thing that will never be great unless someone from outside contributes it (since the developers can never remember which parts are hard to understand).

Avery Pennarun


Back to blogging

Hello Internet. After a long silence and several fights with Serendipity I am back.

The contents of my old Serendipity install have been migrated to reStructuredText in Pelican. Among other things, this means I can now get rid of the last PHP install I had left on my server.


Last day at Canonical

This Friday will be my last day at Canonical.

It has been a lot of fun working here. There is so much I have learned in the last three years. I'm going to miss my colleagues.

Over the last couple of months I have slowly stepped down from my involvement in Bazaar and the Bazaar packaging in Debian and Ubuntu. I would like to stay involved in Ubuntu, but we will see how that goes.

I'm taking some time off until the end of the year to see the world and hack, before starting something new in February.


Mumble and bluetooth

Mumble is an open source, low-latency, high quality voice chat application that we're using at Canonical, and which Samba has recently also adopted.

After I busted the cable of my wired headset, and inspired by Matt's post about Mumble and Bluetooth, I bought a cheap Nokia Bluetooth headset that works with my laptop as well as my phone. Using BlueMan, I even found that the headset worked out of the box on Maverick.

A nice feature in Mumble is that it is controllable using D-Bus. Blueman supports running an arbitrary command when the answer button is pressed. Combining these two features, it is possible to automatically toggle mute when the answer button is pressed:

#!/usr/bin/python
import dbus

# Connect to Mumble's D-Bus service on the session bus
bus = dbus.SessionBus()
mumble = bus.get_object("net.sourceforge.mumble.mumble", "/")

# Toggle self-mute
is_muted = mumble.isSelfMuted()
mumble.setSelfMuted(not is_muted)

To use this script, set its path in the configuration tab for the "Headset" plugin in blueman.

The only remaining problem with this setup is that I can't keep the headset on during my work day, as I don't have a way to put it in standby mode automatically. This means that my battery runs out pretty quickly, even when nothing is happening on Mumble.

Currently Playing: Red Sparowes - Finally As That Blazing Sun Shone


Working from home

For about 6 months now I've been working for Canonical on the Soyuz component of Launchpad. Like most other engineers at Canonical I don't work at the office but from a desk at home, as our nearest office is in London, not really a feasible commuting distance. I do work regular hours during work days and stay in touch with my colleagues using IRC and voice over IP.

I did have some experience working on contracts and study assignments from home previously, but working a fulltime regular job has turned out to be a bigger challenge. It seems easy enough. No travel time, every day is casual Friday, being able to listen to obscure death metal all day without driving coworkers crazy. Awesome, right?

Well, not entirely. I can't say I wasn't warned beforehand (I was) but I still ran head-first into some of the common mistakes.


I can work well by myself and I appreciate the occasional solitude, but it does get kinda lonely when you're physically sitting by yourself for 8 hours a day, five days a week.

Fortunately we regularly have sprints at different locations around the world and, apart from appealing to the travel junkie in me, that brings some essential face time with coworkers. Electronic communication mechanisms such as mailing lists, IRC, Skype and, more recently, mumble also help make the rest of the company feel closer, but it's still very different from being able to talk to people at the water cooler (the point of which, btw, still escapes me. What's wrong with proper cold tap water?).

What also seems to help is going into the city and meeting up with others for lunch, or even just to get groceries.

Concentration, work times

One of the nice things about working at home is that you're quite flexible in planning your days; it's possible to interrupt work to run an errand if necessary. The downside is that it is also really easy to get distracted, and there's one thing I do very well: procrastinate. Initially I got distracted quite often and then worked into the evening to make up for the lost time. The result was that, while I only spent 8 hours doing actual work, it felt like having been at work for 12 hours and having lost all my spare time. Or as a friend accurately summarized it: working at home is all about boundaries.

This is at least partially related to the fact that I am a compulsive multi-tasker; I always do several things at once and context-switch every minute or so (prompted by e.g. having to wait for code to compile), including checking email and responding to conversations on IRC and Google Talk. This, among other things, has the effect that I respond quite slowly in IRC/IM conversations; if you've ever chatted with me you've probably noticed it. Multi-tasking has always worked well for me - despite research suggesting otherwise - because software development always involves a lot of waiting (for VCSes, compilers, testsuites, ...).

Recently I've tried to eliminate some of the other distractions by signing out of Skype, Empathy (Google Talk, MSN, etc.) and Google Reader completely, and only checking email a couple of times per day.

Feeling productive

What has perhaps surprised me most of all was how essential the satisfaction of getting something done is. After spending about a day staring at Python code it's important for your mood to have accomplished something. This appears to be a vicious circle, as lack of progress kills the fun of work, which kills motivation, which causes a lack of progress.

I am hard core, so during my first months I used my lunch breaks and evenings to hack on other free software projects, triaging bug reports that had come in or reviewing patches. Despite the fact that this is indeed technically a break from Launchpad, it didn't (surprise!) seem to work as well as stepping away from the computer completely. Also, it turns out that spending 14 hours a day programming doesn't make you all that much more productive than working a couple of hours less.

What I've discovered recently is that getting at least one branch done by the end of each day, even if it's just by fixing a trivial bug, helps tremendously in giving me some sense of accomplishment. Julian also wrote a blog post with some useful hints on feeling productive a while ago.

What is your experience working from home? Any good tips?

Currently Playing: Sieges Even - Unbreakable


Thoughts on the Nokia N900

Yesterday I started writing up a quick review of my new phone/tablet, a Nokia N900. Unfortunately it has since broken down, due to what appears to be a hardware issue. It does still work to a certain degree, as the LEDs flash and I can access it as a USB mass-storage device, but the screen hasn't shown anything since I woke up this morning. The timing could not have been worse, as I'm currently abroad and having a phone (as well as a GPS) would be useful, especially considering the ash cloud situation in Iceland.

Update: 4 weeks after dropping my phone off at the Nokia care center I have received a new N900; they weren't able to tell me what was wrong with the old one.

Anyway, my thoughts about the N900. Please note that I haven't used a lot of other smartphones, so I can't really compare this with Android phones or iPhones.

The good

The integration between the different components of the phone is really well done. Nokia has invested heavily in Telepathy, which is used for all voice and text messaging, and it shows. Skype and regular telephony are very nicely integrated with telepathy, albeit through proprietary daemons. There is one global status for all protocols. Skype and SIP conferencing work well.

The address book is another thing that is very nicely integrated with the rest of the device. It combines IM addresses, phone numbers, email addresses and other information for a contact and there are third party applications that can help keep that information up to date.

Both the Ovi Maps app and its data are free, and it's possible to copy the maps onto the device so you don't need to stream them all from the internet when you're abroad. It's pretty quick compared to e.g. a Garmin GPS, but it lacks features: it can't load GPX files, show points of interest or store tracks.

The music player is neat and automatically indexes all audio and video files that are copied onto the device. The camera and photo manager work well, and are reasonably snappy.

The web browser displays all pages I've accessed so far without rendering issues, including flash.

There are quite a lot of neat third-party free software applications available - eCoach, Hermes, grr, mBarcode, Mauku, tubes, tuner, fahrplan.

The bad

Maemo is Linux-based and has a lot of similarities to your average Debian-derived distribution. Despite that, it contains a lot of proprietary software. In particular, the Skype and POTS plugins for telepathy, the address book and things like Ovi Maps are proprietary. This means it's impossible for me to fix little annoyances (see below) in these packages, but more importantly, it's impossible for others to fix these little annoyances or port these apps to the desktop and other devices (where there's an even larger pool of potential bug fixers).

The GPS doesn't work very well. This might be because the hardware is substandard, but I suspect it has more to do with the fact that if it doesn't find a signal within 30 seconds, it switches itself off. I assume this is to save energy, but this behaviour cannot be switched off anywhere. A workaround is to close and reopen Ovi Maps every 15 seconds or so, but that's pretty annoying. I'm sure somebody with access to the source (I don't have it!) could fix it.

I'm sure the calendaring app is nice, but it lacks support for synchronization with Google calendar, which makes it unusable for me.

The standard email client does work, but it doesn't scale well; it will lock up at times while trying to index mailboxes. I've tried using Claws Mail for a while, but its interface is just too cumbersome - in general, but in particular on such a small screen.

The battery lifetime sucks. If I'm lucky I can make it through a single day with a fully charged battery, with only mild usage. The form factor is also still a minor issue for me, but I doubt that bit can be fixed in software.

Overall I'm pretty happy with the N900, although I'm not sure if I would pick it over an Android phone again.


Linux.Conf.Au 2010 - Day 3 - Wednesday

I went to Jonathan Corbet's yearly update of the status of the Linux kernel. He talked about the various big changes that went into the kernel over the last year as well as the development processes. The Linux kernel is probably one of the largest open source projects, and very healthy - there are a lot of individuals and companies contributing to it. With this size comes a few interesting challenges coping with the flow of changes into Linus' tree. Their current processes seem to deal with this quite well, and don't seem to need a lot of major changes at the moment.

His talk also included the obligatory list of features that landed in the last year. The only one that really matters to me is the Nouveau driver, which I'm looking forward to trying out.

The second talk I went to in the morning was Selena Deckelmann's overview of the Open Source database landscape. She mentioned that new projects are started daily, but it was still a bit disappointing not to see TDB up there.

After lunch Rob gave a talk about Subunit, introducing the ideas behind the Subunit protocol and presenting an overview of the tools available for it and the projects that have been Subunitized so far. It's exciting to see the Subunit universe slowly growing; I wasn't aware of some of the projects that are using it. The recently announced testrepository also looks interesting, even though it is still very rudimentary at the moment.

In the evening Tridge, Rusty, Andrew, Jeremy, AJ and I participated in the hackoff as the "Samba Team".

The hackoff was a lot of fun, and consisted of 6 problems, each of which involved somehow decoding the data file for the problem and extracting a short token from it in one way or another, which was required to retrieve the next problem. We managed to solve 4 problems in the hour that the organizers had allocated, and finished first because we were a bit quicker in solving the 4th problem than the runners-up. No doubt the fact that we were the largest team had something to do with this.

I hung out with some of the awesome Git and Github developers in the Malthouse in the evening, and talked about Dulwich, Bazaar and Launchpad ("No really, I am not aware of any plans to add Git support to Launchpad.").


Linux.Conf.Au 2010 - Day 2 - Tuesday

On Tuesday we had our "Launchpad" mini-conf, which featured talks from Launchpad developers and users alike. It wasn't necessarily about hosting projects on Launchpad, but rather about how various projects could benefit from Launchpad.

I popped out of the Launchpad track for a bit to attend Andrew's talk about the current status of Samba 4. He did a nice job of summarizing the events of the last year, the most important one of course being the support for DC synchronization. I'm proud we've finally managed to pull this off - and hopefully we'll actually have a beta out next year. We have been saying "maybe next year" for almost 4 years now when people asked us for an estimated release date.

At the end of the day Aaron and I gave a quick introduction to Launchpad's code imports and code reviews.


Linux.Conf.Au 2010 - Day 1

Linux.Conf.Au has a reputation for being one of the best FLOSS conferences in the world, and it more than lived up to my high expectations. The last one I attended was also in New Zealand, but further south - in Dunedin.

Day 1 - Monday

As usual the actual conference was preceded by two days of miniconfs. On the first day I attended some of the talks in the Open Languages track.

mwhudson gave a talk about PyPy - Python implemented in Python. He discussed the reasons for doing what they do and the progress they've made so far. Like so many of the alternative Python implementations, one of the main things holding them back is the lack of support for extensions written in C for CPython.

Rusty gave a quick tutorial to talloc after lunch ("it's a shame K&R didn't think of this!") and explained why it's so great.

In the afternoon I caught some of the talks in the distro summit track. Both of the sessions that I attended happened to be Ubuntu-related - first Dustin gave a quick introduction to the components of Launchpad, followed by a talk from Lucas about the relationship between Ubuntu and Debian. There was a discussion afterwards about interoperability between the various hosting sites and bug trackers. Several audience members questioned the relevance of Debian and suggested everything should just switch to Launchpad, but this seemed to be founded in ignorance. (For the record, none were actually Launchpad developers).


Build from branch

At the moment I am returning home after three very productive and awesome weeks in Wellington, Sydney and Strasbourg.

I spent the first week in the West Plaza in Wellington, working together with fellow Launchpad developers on getting the basics of building from branches working. We eventually managed to get something working at the end of Friday afternoon. We split the work up at the beginning of the week and then worked on it in pairs for a couple of days before integrating all work on Friday. At the end of the week William managed to get a basic source package build from recipe through the queue.

Pair-programming with Jono and Michael was very educational, I suspect I'll be a fair bit quicker when I get back to hacking on Launchpad by myself. It's scary to see how some people can make the changes that would take me a full day in a mere hour.

Tim picked up my initial work on support for Mercurial imports and completed and landed it during the sprint. Since the rollout on Wednesday it is possible to request Mercurial imports on Launchpad. Most imports (e.g. mutt, dovecot, hg) seem to work fine, with the main exception being really large Mercurial repositories such as OpenJDK. This is because of (known) scaling issues that will be fixed in one of the next releases of bzr-hg.

This was the first time I was back in Wellington since 2006, and the weather this year was exactly as I remembered it; showers and wind, with the occasional day of sunshine. For a capital the city centre is quite small, but it has its charm and the view from the various hills around the bay is amazing.

On the weekend I met up with Andrew and Kirsty and we did some hiking around Wellington (where the weather allowed it).