Re: revision control for the kernel (BitKeeper)

Larry McVoy (lm@bitmover.com)
Mon, 22 Feb 1999 01:05:25 -0800


: It would be nice, if it was possible to not have the full kernel with its
: history in the archive/repository, but only the versions since a certain
: date/revision, say 2.0.0 or 2.2.0 e.g.

Here's the reply to this question, which was discussed on the
kernel@bitmover.com alias.

To: kernel@bitmover.com
From: lm@bitmover.com (Larry McVoy)
Subject: Re: revision control for the kernel (BitKeeper)
Date: Sun, 21 Feb 1999 19:44:35 -0800

Linus Torvalds <torvalds@transmeta.com>:
: Then you come back, and you have created various new patches while you
: were away (making versions 2.3.129 and 2.3.130), so you just synch it all
: back (and now the complete version contains everything from 2.2.x to
: 2.3.130)
:
: With the collapsed version, you wouldn't have version information, but
: hey, that's cool, you just wanted a temporary smaller repository anyway.

I went off and did the numbers, which is what I should have done up
front. I can do better than that and with one heck of a lot less work.
Here are the numbers. Suppose you went through a big source base such
as the kernel, which had 10 years of history. For each file, dig out
each of the following sizes:

    gotten file (i.e., checked-out version)
    revision history file (RCS or SCCS, they are about the same)
    gzip -4 < revision history file

You will get ratios like

    3.8     gotten file
    7.65    SCCS file
    1       gzipped SCCS file
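
As a rough illustration of how the three sizes above might be gathered for
one file, here is a minimal Python sketch. It is not the script used for
these numbers; the function name size_ratios and the stand-in data are
assumptions, and a real run would read the checked-out file and its SCCS
file from disk. Python's gzip module at compresslevel=4 stands in for
"gzip -4".

```python
import gzip


def size_ratios(gotten: bytes, history: bytes, level: int = 4) -> dict:
    """Return the three sizes, normalized so the gzipped history file is 1.0."""
    gz_size = len(gzip.compress(history, compresslevel=level))
    return {
        "gotten file": len(gotten) / gz_size,
        "history file": len(history) / gz_size,
        "gzipped history file": 1.0,
    }


if __name__ == "__main__":
    # Stand-in data only: real input would be a source file and its s.file.
    gotten = b"static int foo(int x) { return x + 1; }\n" * 500
    history = gotten * 2  # pretend the history is twice the checked-out size
    for name, ratio in size_ratios(gotten, history).items():
        print(f"{ratio:8.2f}  {name}")
```

Summing the raw sizes over every file in the tree before normalizing would
give a whole-tree ratio rather than a per-file one.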

So what does that tell us? The top-of-trunk checked-out version is
approximately 1/2 the size of the entire revision control file. So even if
I pruned all the way up to the top of trunk, we would get only a 50% savings.

On the other hand, if we gzip -4 the SCCS file, we get almost a 4x space
savings, or a space savings of 75% instead of 50%, to put it in the same
terms as above.
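
A savings percentage depends on which size is taken as the baseline, so the
small Python sketch below works through the arithmetic for each pairing of
the three ratios quoted above. The helper name savings is an assumption made
for illustration; only the three ratios come from this message.

```python
# Ratios as quoted in the message (gzipped SCCS file = 1).
GOTTEN, SCCS, GZIPPED = 3.8, 7.65, 1.0


def savings(new_size: float, old_size: float) -> float:
    """Fraction of space saved by storing new_size instead of old_size."""
    return 1.0 - new_size / old_size


if __name__ == "__main__":
    pairs = [
        ("gotten file vs. SCCS file", GOTTEN, SCCS),
        ("gzipped SCCS vs. gotten file", GZIPPED, GOTTEN),
        ("gzipped SCCS vs. SCCS file", GZIPPED, SCCS),
    ]
    for label, new, old in pairs:
        print(f"{label:30s} {savings(new, old):6.1%} saved, {old / new:.2f}x smaller")
```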

The cool things are
- I can give you guys all the revision history
- the size on disk is 1/2 that of the pruned SCCS file and 1/4 that
  of the uncompressed SCCS file (this is a win in terms of disk I/O)
- it is way less work for me (I am particularly fond of this point)

So, sorry to rattle your cage about sizes. I should have run these
numbers first, but they point very strongly towards compression.
I just went and did some timings to make sure gzip is fast enough.
I can uncompress a 1.2 MB file (from 0.2 MB -> 1.2 MB) in 100 milliseconds
(i.e., an effective rate of 12 MB/sec) on a 300 MHz K6. Going the other
way is slower, about 4 MB/sec. But that's reasonable: it means that a
reasonably high-end machine can keep up with the disk arm.
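
For anyone wanting to repeat this kind of timing, a minimal Python sketch
follows. It is an assumption-laden stand-in, not the test that was run here:
it uses Python's gzip module at level 4 rather than the gzip binary, the
function name gzip_rates is made up, and the repetitive sample data will
compress better than real SCCS files do. Rates are in MB/sec of uncompressed
data, matching the figures above.

```python
import gzip
import time


def gzip_rates(data: bytes, level: int = 4) -> tuple:
    """Return (compress MB/s, decompress MB/s), both per uncompressed MB."""
    mb = len(data) / 1e6
    t0 = time.perf_counter()
    packed = gzip.compress(data, compresslevel=level)
    t1 = time.perf_counter()
    unpacked = gzip.decompress(packed)
    t2 = time.perf_counter()
    if unpacked != data:
        raise ValueError("gzip round trip did not reproduce the input")
    # Guard against a zero interval on very small inputs.
    return mb / max(t1 - t0, 1e-9), mb / max(t2 - t1, 1e-9)


if __name__ == "__main__":
    # About 1.2 MB of stand-in text; a real test would read an SCCS file.
    line = b"static int foo(struct bar *b) { return b->x; }\n"
    sample = (line * (1_200_000 // len(line) + 1))[:1_200_000]
    comp, decomp = gzip_rates(sample)
    print(f"compress:   {comp:8.1f} MB/sec")
    print(f"decompress: {decomp:8.1f} MB/sec")
```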

So this long rant says compression rocks. Did I miss anything?
