Proposal: Aegis to manage Linux kernel development

Peter Miller (millerp@canb.auug.org.au)
Fri, 26 Mar 1999 12:27:53 +0100


Purpose of this Posting

Recently there have been discussions about how to manage the
Linux kernel sources, rapidly side-tracking into how CVS isn't
sufficiently capable to do the job. These discussions appear in
numerous places on the Internet, and have even appeared in more
public forums, such as the recent Linux Expo.

I would like to suggest a candidate for serious consideration:

Aegis

This post is rather long, and I apologize in advance if you feel
this topic is an inappropriate use of linux-kernel bandwidth.
While it is a "meta" issue, about management of the kernel
sources, rather than about the kernel itself, no other forum
would appear more appropriate.

Summary for the Impatient

Source management is not enough. The Linux kernel is more than
the aggregation of its source files. A tool which supports the
software development process for large teams is required.

Aegis supports large teams and large projects. Aegis is designed
around change sets. Aegis is designed around repository security
(availability, integrity and confidentiality). Aegis' distributed
development uses this existing mature functionality to keep two
or more repositories synchronized.

Aegis supports multiple repositories, multiple lines of
development, multiple distributed copies of repositories,
disconnected operation, and is security conscious.

Aegis is licensed under the GNU GPL.

Aegis is mature software. It is 8 years old. It has users all
around the world. It is actively being maintained and enhanced.

Aegis is easy to use. It -is- big, it -does- have a lot of
functionality, but the essential process can be learned in less
than a day.

Aegis is available from

http://www.canb.auug.org.au/~millerp/aegis/

Please download it, plus one of the template projects, to get a
feel for the environment. If you would like more information,
there is also a Reference Manual and User Guide available from
the same place.

Source Management is not sufficient

In looking for a better way to manage the Linux kernel sources,
it is necessary to look beyond the obvious and perennial file
bashing, to see if there could be a larger picture.

In writing software, there is one basic underlying activity,
repeated again and again:

edit, build, test, check, commit

Different textbooks and tools will call the various steps
different things, like

edit, make, Unit Test, Peer Review, check-in

and for single-person projects, some of these steps are so
abbreviated as to be almost invisible, especially when you simply
jump in and edit the files in the master source directly.

And the activities are rarely so pure: usually there are
iterations and backtracking, which also serve to obscure the
underlying commonality of software development. The review step,
in particular, often moves around a great deal.

For the maintainer of an Internet project, the activities are
remarkably similar:

edit: apply an incoming patch,
build: build it (which also serves to make sure it is
consistent with itself and the rest of the project),
test: make sure it works (does the thing right),
review: make sure it is appropriate (does the right thing),
commit: yes, I'll accept this

The term ``source management'' carries with it a focus on the
source files, but the activities outlined above only talk about
files indirectly! Source management alone is not enough.

Tools like RCS and SCCS concentrate exclusively on single files.
CVS also concentrates on files, but only at a slightly higher
level.

Enter the Change Set

One of the most obvious things about the software development
process outlined above is that it is about *sets* of files.
You almost always edit several files to fix a bug or add a new
feature; you then build them to stitch them together into the
project; you test them as a set; if there is a review, they will
be reviewed as a set; and you commit them together.

A project makes progress by applying a series of these change
sets, so tracking them is the only way to re-create self-
consistent prior versions of the project.
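
In the abstract, a change set is little more than a description
plus the list of files it touches and the versions they move
between. Purely as an illustration (this is no particular tool's
format, and the file names and numbers are invented):

    change 42: "fix scheduler starvation under heavy load"
        kernel/sched.c           1.17 -> 1.18
        include/linux/sched.h    1.4  -> 1.5
        Documentation/sched.txt  (new file)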

Software developers, however, frequently work on several changes
at once. Figuring out where one change set stops and another
starts requires a modicum of discipline. The fuzziness of the
boundaries often serves to obscure the underlying presence of
change sets.

But are change sets enough? Change sets are, after all, a way
of aggregating the right versions of sets of *files*, and the
software development process above only mentions change sets
indirectly.

What Could be More than Change Sets?

For many developers, even those working in large companies and in
large teams, change sets are the best tool they have. They work,
day in and day out, with change sets. And they get the job done.

But take a look, for a moment, at what the project maintainer
does:
if the patch doesn't apply cleanly, don't accept it
if the patch doesn't build, don't accept it
if the patch doesn't test OK, don't accept it
if the patch doesn't look right, don't accept it
else commit

Stepping back a bit, you will notice that these apply equally
to work within a software house. How often have we all seen
stuff which was allowed to skip one of the validations, only to
get yanked and re-fixed later?

The next step in improving the development process is automating
the tracking of these steps, to make sure each one has been done.
Some tools merely beep at you if you skip a step; others make
the validations mandatory before a commit may occur. Mandatory
things usually get developers riled up, and prevent introduction
of the tools.

But these validations are done for a purpose: they are there to
catch stuff-ups *before* they reach the repository. They exist
to defend the quality of the product. They are not arbitrary
rules, they are just checking that we are doing the things we
say we are doing already.

The pay-back for such a tool is to detect such process blunders
before they introduce defects into the project. Fixing them
before they are committed is less effort than fixing them after
they are released (if we are to believe cumulative experience
*and* the numerous studies).

Let's look at the maintainer's role again for a moment. Those
first 3 steps (patch, build, test) can be automated. I would not
suggest for a moment that the commit should be unconditional!
Thus, the 4th step, the code review, is the essential work
of the maintainer. The pay-back of this is also clear - less
mindless tedium for the maintainer.
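
As a sketch of what that automation could look like (the script
is entirely hypothetical -- the file names, test script and build
command are placeholders for whatever a given project actually
uses):

    #!/bin/sh
    # Hypothetical acceptance gate; every name here is a placeholder.
    patch -p1 --dry-run < incoming.diff || exit 1   # must apply cleanly
    patch -p1 < incoming.diff           || exit 1
    make                                || exit 1   # must build
    ./run-tests.sh                      || exit 1   # must pass the tests
    # ...the review, and the decision to commit, stay with the maintainer.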

What ELSE Could be More than Change Sets?

Most folks are not convinced by any of this. It's just a crock.
They can do it perfectly well manually. They *have* been doing
it manually for a decade or more - with more flexibility, too!

Working in a team comes with a number of costs. The most
obvious cost is that you need to manage the interactions between
the developers. It becomes rapidly obvious that they can't all
just leap into the source tree and edit the files directly,
because pretty much instantly nothing compiles for anyone.
And the change sets are obfuscated beyond redemption.

That's what work areas are for - they've been re-invented
thousands of times, and have been called zillions of different
names (e.g. sand boxes), but they all do the same thing: Each
developer gets their own work area, and they leave the master
source alone. They do all their work there, and only when they
are ready to commit do files get modified in the master source.

Notice the strong correlation between work areas and change
sets? Different tools make this correlation weaker or stronger,
depending on what they are trying to achieve. The basic concept,
however, is that change sets have meaning even after the files
are committed, whereas a work area is where change sets are
created and reproduced.

A tool which seeks to do more than just manage files, or
even change sets, needs to address work areas, too. This is
particularly true when one of the validations (build, test or
review) *fail*. You don't want the master source polluted.

Work areas are only half the story though. Teams almost
immediately lead to the next problem: file conflicts. No matter
how you implement file locking, at some point you have to merge
the competing edits. Different tools do this at different points
in the software development process, but they all do it.

The tool needs to track file versions in work areas, so you know
whether a file is still up-to-date (that is, whether someone has
committed a competing edit ahead of you). This isn't a big problem,
because change sets
must record file versions anyway. If the file isn't up-to-date,
you need a 3-way merge to bring it up-to-date (and you have the
3 versions - the one copied, the one in the work area, and the
one most recently committed). Most tools prevent commit from
occurring if the file needs to be merged. (You could prevent
build and test, too, but that's a bit too officious - there are
often good reasons for working with outdated sources.)
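
The 3-way merge itself needs nothing exotic; the stock diff3 tool
can do it, given those three versions (the file names here are
just labels for the three copies described above):

    # mine     = the file as edited in my work area
    # original = the version I copied when the change set started
    # latest   = the version most recently committed by someone else
    diff3 -m mine original latest > merged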

Software Configuration Management

``Nuh, uh. No way! I've tried BarfCase and it always crashed /
went far too slowly / harassed me. Not going there!''

This is a common reaction to tools which attempt to do more than
baby-sit files. On the whole, it's a very reasonable reaction,
considering what some of them do to you and your system.

However, SCM is the correct term (in the textbooks, anyway) for
looking after the process and not just the files. To look after
more you need to actually track the progress of change sets as
they work their way through the process. Some tools are *very*
invasive about this, and some are more subtle.

There are things the SCM tool needs to know to do its job:

* when a change set is created (this often implies the creation
of a work area)

* when a file is added to a change set, so the version can be
recorded (this often implies a copy into the work area)

* when files are created or deleted or renamed as part of a
change set.

* the results of building the change set (so it can warn, or
refuse, if a commit is tried against a failed build).

* the results of testing the change set (so it can warn, or
refuse, if a commit is tried against a failed test).

* the results of reviewing the change set (so it can warn, or
refuse, if a commit is tried against a failed review).

* when the change set is committed or abandoned (i.e. when it
is finished)

None of these things are new. All of us are doing all of
them already. Sometimes, some of the steps are pretty short,
but they are all there.
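
One way to picture what a tool does with this information: each
change set moves through a small state machine, and the events
above are its transitions. Roughly (the state names here are
generic, not any particular tool's):

    new -> being developed -> being reviewed -> being integrated
        -> completed   (or abandoned at any point along the way)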

Distributed Development

Once you have change sets, you have the basics of distributed
development. You can use their information about files and file
versions to package them up and sling them across the net.

But what do you do when you are the recipient of a change set?
There is no way you are going to apply the damn thing to your
repository sight unseen. You are going to check it all the ways
you can: you will build it, you will test it, you will review it,
and maybe decide to commit it. You need *process*.

Even when you are working alone, when you are the only user on
a single PC, participation in a distributed development project
is a -team- activity, and you need an SCM tool which is designed
for working in teams. Source management alone is not enough.

Aegis

Aegis is a software configuration management system. It does
all of the above and more besides, but it delegates as much
as possible, so as to give you access to the other development
tools you need...

* the build step is watched, but what it does, and what tool
you use to do it, is up to you. Yes, you can use make.

* file merges are watched, but what it does, and what tool you
use to do it, is up to you.

* the test step is watched, but what it does, and what tool you
use to do it, is up to you. It's also optional.

* the review step is watched, but what it does, and what tool
you use to do it, is up to you.

* the commit step is watched, but what it does, and what tool
you use to do it, is up to you. Yes, you can use RCS. Yes,
you can use SCCS.

Aegis does all this, but introduces a bare minimum of commands.
Most of them perform functions developers are already intimately
familiar with, and the rest have an obvious purpose in a process
like the one described above. Some of them are described here:

aenc (new change), aedb (develop begin) - used to create a
change set, and create its work area.

aecp (copy files) - analogous to RCS ``co'', used to copy
files into the change set, and remember the version.

aeb (build) - used to run the build tool of your choice, and
wait for the exit status.

aed (diff) - used to see the differences between the baseline
and the change set.

aede (develop end) - used to say the change set is ready for
review.

aerpass (review pass) - used to say a change set has passed review.

The commands are different (e.g. aeb vs make, aecp vs co) but the
activities are familiar. Aegis is easy to use - believe it or
not, you've just seen all of the *routine* commands necessary
for a developer to submit a change (there are only a couple
more routine commands for change set integrators, and they are
often automated).
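
Strung together, a routine change might look something like the
following. The project name, change number and file name are
invented for the example, and the option spellings are the
abbreviated forms I remember -- the User Guide and Reference
Manual give the exact ones:

    aenc -p kernel                # describe a new change set
    aedb -c 42 -p kernel          # develop begin: creates the work area
    aecp -c 42 -p kernel sched.c  # copy a file in, remembering its version
    vi sched.c                    # edit with whatever tools you like
    aeb -c 42 -p kernel           # build, with Aegis watching the result
    aed -c 42 -p kernel           # diff against the baseline
    aede -c 42 -p kernel          # develop end: hand it over for review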

One more command... The aedist command is used to package change
sets for sending, and unpackage them on receipt.

aedist -send -change N | mail linus

will take change set N and mail it somewhere. Easy. To apply
it at the other end (I use MH in this example) you simply say

show | aedist -receive

The change set will be unpacked into a separate work area, built,
and tested (if tests are enabled). If the change set has no
problems, it will then stop and wait for review. Similar things
can be done with aedist for web servers and clients.
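
The same mechanism works without a mail reader: if an incoming
message has been saved to a file (the file name here is invented),
it can be fed straight in on standard input:

    aedist -receive < linus-0042.txt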

Where to from Here?

Can Aegis do the job? I believe that it can, but you should
not take my word for it! Download a copy and start playing.
Get a feel for it. You can get Aegis from

http://www.canb.auug.org.au/~millerp/aegis/

If you would like to read some manuals, there are PostScript
copies of the User Guide and Reference Manual available for
download from the same place.

Once you have Aegis installed, download one of the template
projects, available from the same place. These template projects
get you up and running very quickly. (They also exercise the
distributed development functionality to do so: your first taste.)

In order to have an informed discussion of the merits of Aegis,
it is necessary for a number of people to download Aegis and
try it out, and also to try distributing change sets with it.

Once this has happened, it will be possible to discuss whether
or not it is suitable for Linux kernel development, and if so,
how to implement it.

I look forward to your thoughtful comments and suggestions.

Regards
Peter Miller E-Mail: millerp@canb.auug.org.au
/\/\* WWW: http://www.canb.auug.org.au/~millerp/
Disclaimer: The opinions expressed here are personal and do not necessarily
reflect the opinion of my employer or the opinions of my colleagues.
