Posts tagged 'buildlog'

The upstream ontologist

The Debian Janitor is an automated system that commits fixes for (minor) issues in Debian packages that can be fixed by software. It gradually started proposing merges in early December. The first set of changes sent out ran lintian-brush on sid packages maintained in Git. This post is part of a series about the progress of the Janitor.

The upstream ontologist is a project that extracts metadata about upstream projects in a consistent format. It does this with a combination of heuristics and reading ecosystem-specific metadata files, such as Python’s setup.py, rust’s Cargo.toml as well as e.g. scanning README files.

Supported Data Sources

It will extract information from a wide variety of sources, including:

Supported Fields

Fields that it currently provides include:

  • Homepage: homepage URL
  • Name: name of the upstream project
  • Contact: contact address of some sort of the upstream (e-mail, mailing list URL)
  • Repository: VCS URL
  • Repository-Browse: Web URL for viewing the VCS
  • Bug-Database: Bug database URL (for web viewing, generally)
  • Bug-Submit: URL to use to submit new bugs (either on the web or an e-mail address)
  • Screenshots: List of URLs with screenshots
  • Archive: Archive used - e.g. SourceForge
  • Security-Contact: e-mail or URL with instructions for reporting security issues
  • Documentation: Link to documentation on the web:
  • Wiki: Wiki URL
  • Summary: one-line description of the project
  • Description: longer description of the project
  • License: Single line license description (e.g. “GPL 2.0”) as declared in the metadata[1]
  • Copyright: List of copyright holders
  • Version: Current upstream version
  • Security-MD: URL to markdown file with security policy

All data fields have a “certainty” associated with them (“certain”, “confident”, “likely” or “possible”), which gets set depending on how the data was derived or where it was found. If multiple possible values were found for a specific field, then the value with the highest certainty is taken.

Interface

The ontologist provides a high-level Python API as well as two command-line tools that can write output in two different formats:

For example, running guess-upstream-metadata on dulwich:

 % guess-upstream-metadata
 <string>:2: (INFO/1) Duplicate implicit target name: "contributing".
 Name: dulwich
 Repository: https://www.dulwich.io/code/
 X-Security-MD: https://github.com/dulwich/dulwich/tree/HEAD/SECURITY.md
 X-Version: 0.20.21
 Bug-Database: https://github.com/dulwich/dulwich/issues
 X-Summary: Python Git Library
 X-Description: |
   This is the Dulwich project.
   It aims to provide an interface to git repos (both local and remote) that
   doesn't call out to git directly but instead uses pure Python.
 X-License: Apache License, version 2 or GNU General Public License, version 2 or later.
 Bug-Submit: https://github.com/dulwich/dulwich/issues/new

Lintian-Brush

lintian-brush can update DEP-12-style debian/upstream/metadata files that hold information about the upstream project that is packaged as well as the Homepage in the debian/control file based on information provided by the upstream ontologist. By default, it only imports data with the highest certainty - you can override this by specifying the —uncertain command-line flag.

[1]Obviously this won’t be able to describe the full licensing situation for many projects. Projects like scancode-toolkit are more appropriate for that.

comments.

The Buildlog Consultant

The Debian Janitor is an automated system that commits fixes for (minor) issues in Debian packages that can be fixed by software. It gradually started proposing merges in early December. The first set of changes sent out ran lintian-brush on sid packages maintained in Git. This post is part of a series about the progress of the Janitor.

Reading build logs

Build logs for Debian packages can be quite long and difficult for a human to read. Anybody who has looked at these logs trying to figure out why a build failed will have spent time scrolling through them and skimming for certain phrases (lines starting with “error:” for example). In many cases, you can spot the problem in the last 10 or 20 lines of output – but it’s also quite common that the error is somewhere at the beginning of many pages of error output.

The buildlog consultant

The buildlog consultant project attempts to aid in this process by parsing sbuild and non-Debian (e.g. the output of “make”) build logs and trying to identify the key line that explains why a build failed. It can then either display this specific line, or a fragment of the log around surrounding the key line.

Classification

In addition to finding the key line explaining the failure, it can also classify and parse the error in many cases and return a result code and some metadata.

For example, in a failed build of gnss-sdr that has produced 2119 lines of output, the reason for the failure is that log4cpp is missing – which is on line 641:

634
635
636
637
638
639
640
641
642
643
644
645
646
647
 -- Required GNU Radio Component: ANALOG missing!
 -- Could NOT find GNURADIO (missing: GNURADIO_RUNTIME_FOUND)
 -- Could NOT find PkgConfig (missing: PKG_CONFIG_EXECUTABLE)
 -- Could NOT find LOG4CPP (missing: LOG4CPP_INCLUDE_DIRS
 LOG4CPP_LIBRARIES)

 CMake Error at CMakeLists.txt:593 (message):
   *** Log4cpp is required to build gnss-sdr

 -- Configuring incomplete, errors occurred!
 See also "/<<PKGBUILDDIR>>/obj-x86_64-linux-gnu/CMakeFiles/
 CMakeOutput.log".
 See also "/<<PKGBUILDDIR>>/obj-x86_64-linux-gnu/CMakeFiles/
 CMakeError.log".

In this case, the buildlog consultant can both figure out line was problematic and what the problem was:

 % analyse-sbuild-log build.log
 Failed stage: build
 Section: build
 Failed line: 641:
   *** Log4cpp is required to build gnss-sdr
 Error: Missing dependency: Log4cpp

Or, if you’d like to do something else with the output, use JSON output:

 % analyse-sbuild-log --json build.log
 {"stage": "build", "section": "Build", "lineno": 641, "kind": "missing-dependency", "details": {"name": "Log4cpp""}}

How it works

The consultant does some structured parsing (most notably it can parse the sections from a sbuild log), but otherwise is a large set of carefully crafted regular expressions and heuristics. It doesn’t always find the problem, but has proven to be fairly accurate. It is constantly improved as part of the Debian Janitor project, and that exposes it to a wide variety of different errors.

You can see the classification and error detection in action on the result codes page of the Janitor.

Using the buildlog consultant

You can get the buildlog consultant from either pip or Debian unstable (package: python3-buildlog-consultant ).

The buildlog consultant comes with two scripts – analyse-build-log and analyse-sbuild-log, for analysing build logs and sbuild logs respectively.

comments.