the ultimate debian database

Post on 29-May-2015

DESCRIPTION

Some comments about the sources of data stored in the Ultimate Debian Database

TRANSCRIPT

The Ultimate Debian Database

Israel Herraiz

<israel.herraiz@upm.es>

Davis, CA, July 26th 2012

Download these slides at http://slideshare.net/herraiz/the-ultimate-debian-database

1 / 25

Outline

1. Debian: what is it and sources of data

2. The UDD: what is it and where to get it

3. What has been done and what we can do

2 / 25

1. Debian: what is it and sources of data

3 / 25

Debian

• GNU/Linux software distribution

• Goal: to deliver an entirely and exclusively free distribution

• Maintained by volunteers

• Bureaucratic organization (policies, constitution, social contract)

• Release when ready

• > 10 years history

• > 500 MSLOC

• > 15k packages

4 / 25

Debian Releases

5 / 25

6 / 25

Debian Source Packages

7 / 25

Source and Binary Packages

• A source package generates one or more binary packages

[Diagram of the octave example, with labels: octave, octave-core, octave-doc, liboctave, liboctave-dev]

8 / 25

Package uploads

• There are no repositories like in other software projects

• Although developers may privately use version control systems

• When a bug is fixed, a new version is uploaded

• Uploads == commits

9 / 25

Source: octave

Section: math

Priority: extra

Maintainer: Debian Octave Group <pkg-octave-devel@lists.alioth.debian.org>

Uploaders: Thomas Weber <tweber@debian.org>, Sébastien Villemot <sebastien.villemot@ens.fr>

DM-Upload-Allowed: yes

Build-Depends: gfortran, debhelper (>= 9), automake, dh-autoreconf, texinfo ….

Standards-Version: 3.9.3

Homepage: http://www.octave.org/

Vcs-Git: git://git.debian.org/git/pkg-octave/octave.git

Vcs-Browser: http://git.debian.org/?p=pkg-octave/octave.git

Source Packages metadata
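
The stanza above uses the "Field: value" control-file layout, where a line starting with whitespace continues the previous field. A minimal sketch of reading such a stanza in Python follows. The helper names parse_stanza and split_list are hypothetical, the stanza is abridged from the slide, and a real project would more likely rely on an existing Debian control-file parser than on this hand-rolled one.

```python
def parse_stanza(text):
    """Parse one control-file stanza into a dict of field name -> value."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines inside this toy input
        if line[0] in " \t" and current is not None:
            # Continuation line: append it to the previous field.
            fields[current] += " " + line.strip()
        else:
            name, _, value = line.partition(":")
            current = name.strip()
            fields[current] = value.strip()
    return fields


def split_list(value):
    """Split a comma-separated field such as Uploaders into items."""
    return [item.strip() for item in value.split(",") if item.strip()]


# Abridged copy of the stanza shown on the slide.
STANZA = """\
Source: octave
Section: math
Priority: extra
Maintainer: Debian Octave Group <pkg-octave-devel@lists.alioth.debian.org>
Uploaders: Thomas Weber <tweber@debian.org>,
 Sébastien Villemot <sebastien.villemot@ens.fr>
Standards-Version: 3.9.3
Vcs-Git: git://git.debian.org/git/pkg-octave/octave.git
"""

source = parse_stanza(STANZA)
uploaders = split_list(source["Uploaders"])
```

Splitting on the first colon only is what keeps URL values such as the Vcs-Git field intact.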

10 / 25

Package: octave

Priority: extra

Section: math

Installed-Size: 4760

Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>

Architecture: amd64

Version: 3.6.1-1ubuntu1ppa1~precise1

Recommends: gnuplot, libatlas3gf-base

Replaces: octave3.2

Suggests: octave-info, octave-doc, octave-htmldoc

Depends: libamd2.2.0 (>= 1:3.4.0), libarpack2 (>= 2.1), …

Conflicts: octave3.2

Filename: pool/main/o/octave/octave_3.6.1-1ubuntu1ppa1~precise1_amd64.deb

Size: 1746050

MD5sum: 2c431556d6cf98fd8a341e865ac63058

SHA1: b333c49e6f6cb7d4445378020dfffdb5a1626de7

Description: GNU Octave language for numerical computations…

Binary Packages metadata
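
Relationship fields such as Depends and Recommends are comma-separated lists of package names, each optionally followed by a version constraint in parentheses. The sketch below shows one way to split them into (name, operator, version) tuples. The function name parse_relations is hypothetical, and the sketch deliberately ignores alternatives written with "|" and architecture qualifiers, so it is an illustration and not a complete parser.

```python
import re

# One relation: a package name, optionally followed by "(op version)".
RELATION = re.compile(
    r"^(?P<name>[a-z0-9][a-z0-9+.\-]*)"
    r"(?:\s*\((?P<op><<|<=|=|>=|>>)\s*(?P<version>[^)\s]+)\))?$"
)


def parse_relations(value):
    """Split a relationship field into (name, operator, version) tuples."""
    result = []
    for item in value.split(","):
        match = RELATION.match(item.strip())
        if match is None:
            continue  # skip anything this simple pattern does not cover
        result.append((match["name"], match["op"], match["version"]))
    return result


# Values taken from the slide (the elided tail of Depends is left out).
depends = parse_relations("libamd2.2.0 (>= 1:3.4.0), libarpack2 (>= 2.1)")
recommends = parse_relations("gnuplot, libatlas3gf-base")
```

Unversioned relations come back with None for both the operator and the version, which makes them easy to tell apart from constrained ones.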

11 / 25

12 / 25

Debian Popcon: Tracking Installations

• Popularity: total install counts

• Recent Use (< 30 days)

• Old Use (beyond 30 days)

• Data collected daily

• Users voluntarily opt in

• Source of bias

13 / 25

Debian Bugs

• People find bugs in binary packages

• ~500 bugs per month

• But bugs are linked to source packages

• Bugs can be:

  • Accepted and solved in Debian

  • Rejected

  • Forwarded to upstream

• Everything else is similar to other bug tracking systems

  • Life cycle, comments, severity levels…

14 / 25

2. The UDD: what is it and where to get it

15 / 25

Research work: main paper (at MSR 2010)

16 / 25

Other papers at MSR 2010

17 / 25

What is the UDD?

• PostgreSQL database with all the information from the sources described so far

• http://udd.debian.org

• New dumps available every two days

• ~ 500 MB bz2

• Used for some Debian internal services

• Schema too complex and too big for a slide

• Technical detail: you need a Debian-based system to load the dump of the UDD

18 / 25

Debian sources of data

• Sources / Packages metadata

• Bugs

  • including *all* archived bugs

  • 1995-96-97

• Carnivore

• Debtags

• Popularity Contest

• DEHS

• Lintian

• Migrations to testing

• Uploads

  • All the way back to 1998!

• New packages queue

• Translations status

• Orphaned packages

• Screenshots

19 / 25

20 / 25

Bear in mind!

• You can also obtain the source code of the packages

• Easy to automate

• And the modifications done by the Debian maintainers

• So add product metrics to the set of data sources

• But this is not included in the UDD

21 / 25

3. What has been done and

what we can do

22 / 25

What kind of questions does Debian answer with the UDD?

• High priority packages that have Release Candidate blocker bugs

• Developers with very buggy and/or outdated packages

• Who uploaded this package to the unstable release?

• Who reported the RC bugs since the last release?
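
Questions like these reduce to SQL joins over the package, bug and upload tables. The sketch below illustrates two of them against an in-memory SQLite database standing in for the PostgreSQL UDD. Everything in it is an assumption made for illustration: the table and column names (packages, bugs, uploads, changed_by, done) are simplified and may not match the real schema, the rows are invented, and treating required/important/standard as "high priority" is a choice made here. The one Debian-specific fact relied on is that release-critical bugs are those with severity serious, grave or critical.

```python
import sqlite3

# SQLite stand-in for the UDD; schema and rows are hypothetical toy data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE packages (package TEXT, source TEXT, priority TEXT);
CREATE TABLE bugs (id INTEGER, source TEXT, severity TEXT, done INTEGER);
CREATE TABLE uploads (source TEXT, version TEXT, changed_by TEXT, date TEXT);

INSERT INTO packages VALUES
  ('alpha', 'alpha', 'required'),
  ('beta-tools', 'beta', 'optional'),
  ('gamma', 'gamma', 'standard');
INSERT INTO bugs VALUES
  (1, 'alpha', 'grave', 0),
  (2, 'beta', 'serious', 0),
  (3, 'gamma', 'minor', 0),
  (4, 'gamma', 'critical', 1);
INSERT INTO uploads VALUES
  ('alpha', '1.0-1', 'Alice Example <alice@example.org>', '2012-01-10'),
  ('alpha', '1.0-2', 'Bob Example <bob@example.org>', '2012-03-02');
""")

# High priority packages that still have open release-critical bugs.
rc_packages = [row[0] for row in conn.execute("""
    SELECT DISTINCT p.package
    FROM packages AS p
    JOIN bugs AS b ON b.source = p.source
    WHERE p.priority IN ('required', 'important', 'standard')
      AND b.severity IN ('serious', 'grave', 'critical')
      AND b.done = 0
    ORDER BY p.package
""")]

# Who made the most recent upload of a given source package?
last_uploader = conn.execute(
    "SELECT changed_by FROM uploads WHERE source = ? "
    "ORDER BY date DESC LIMIT 1",
    ("alpha",),
).fetchone()[0]
```

Bugs are joined through the source package because, as noted earlier, bugs are found in binary packages but linked to source packages.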

23 / 25

Some questions solved in the literature

• The popularity bias

• http://oa.upm.es/9585/

• Open source projects get more bug reports if they are popular

• The actual number of bugs is not related to the number of bugs reported

• So more reported bugs actually mean more quality

• Well, at least more people who decide to use the software

24 / 25

The popularity bias

[Figure: Log(Bugs) plotted against Log(installations); plot label: Required packages]
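
The relationship in the plot can be checked numerically by correlating the logarithm of the bug count with the logarithm of the installation count. The sketch below is one way to do that with the standard library only. The function names are hypothetical, and the sample pairs are synthetic values chosen to follow an exact power law; they are not taken from the UDD or from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def log_log_correlation(pairs):
    """Correlate log10(installations) with log10(bugs).

    Pairs with a zero count are dropped, since their logarithm is undefined.
    """
    usable = [(inst, bugs) for inst, bugs in pairs if inst > 0 and bugs > 0]
    xs = [math.log10(inst) for inst, _ in usable]
    ys = [math.log10(bugs) for _, bugs in usable]
    return pearson(xs, ys)


# Synthetic (installations, bugs) pairs; the last one is dropped (zero installs).
SAMPLE = [(10, 1), (100, 10), (1000, 100), (0, 5)]
r = log_log_correlation(SAMPLE)
```

With real data, the installation counts would come from Popularity Contest and the bug counts from the bug tracking data, both of which are among the UDD sources listed earlier.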

25 / 25

Summary

• Packages and sources metadata

• And source code

• Bugs

• All the way back to 1995-96-97!

• Popularity contest

• Maintainers activity (uploads)

• All the way back to 1998!

• And much more….

• Now, what do you think we can do with this?
