Re: using bitkeeper to backport subsystems?

From: Larry McVoy (lm@bitmover.com)
Date: Tue Jul 23 2002 - 17:46:48 EST


On Tue, Jul 23, 2002 at 12:38:54PM -0600, Eric W. Biederman wrote:
> > > With all due respect to Larry and the bk team, I think you'll
> > > find determining 'needed changesets' in this case is a _hard_ problem.
> >
> > Thanks, we agree completely. It's actually an impossible problem
> > for a program since it requires semantic knowledge of the content
> > under revision control. And even then the program can get it wrong
> > (think about a change which shortens the depth of the stack followed by
> > a change that won't work with the old stack depth, now you export that
> > to the other tree and it breaks yet it worked in the first tree).
>
> The last time this was suggested, the idea was to look at how far back into
> the repository (up to a given limit) a current changeset could apply, with all
> of its current dependencies.
>
> But beyond that I suspect it would be easier to declare lack of dependencies.

[I'm going to argue with you up here, mostly just to explain how and why
BK works, but you can skip down to the next section and you'll see we have
a fair amount of agreement]

All of what you are saying makes perfect sense in a centralized system
like CVS, Perforce, Subversion, whatever. The reason it makes sense is
that there is exactly one copy of the truth, you can manipulate it in
the one location which has it, and that's that.

Sort of sounds like all those tools are better than BitKeeper, given
that description, right? Because BitKeeper is distributed, there is no
one place that you can do anything and force it upon everyone else.
You can only do things to your local revision history in a recorded
way and propagate that to anyone else that wants it.

Again, sounds like the distributed nature of BK is causing all sorts
of problems, so why not just toss it? Centralized systems manage 99.9%
or more of the world's source, so they must be good enough. Maybe not.
It is the peer to peer nature of BK which allows all sorts of things to
work, from mundane stuff like performance (you operate against a local
copy of the history) to more complex things like work flow (it's trivial
to mimic Rational's unified change management system with a series of
repositories) to practical things like working both at home and at
work and not losing data.

Whether you agree or disagree with the value of the distributed nature
of BitKeeper, that's a basic part of how it works and it can be tweaked
but not thrown out. Consider it a "limitation" of the BitKeeper design.

OK, so now think about what you are asking. You want to move
changesets around out of order. Please explain to me how you are
going to synchronize two trees when you've done that. Right now,
we can use the fact that there is a strong ordering to do fast and
lightweight synchronization. Do an strace of a pull from bkbits.net when
there is nothing to pull and count the bytes that go across the wire.
It's tiny, probably about 5-6KB. Now do the same thing with CVS: the
amount of data is proportional to the number of files in the tree, i.e.,
dramatically more.
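
If you want to try it, something along these lines gives a rough number
(the repository URL is a guess on my part, and the awk byte counter is
crude since it also counts local file I/O):

        # no-op BitKeeper pull (hypothetical repo URL): trace I/O, count bytes
        strace -f -e trace=read,write -o bk.trace \
                bk pull bk://linux.bkbits.net/linux-2.5
        awk '/read\(|write\(/ { n += $NF } END { print n, "bytes" }' bk.trace

        # same measurement against an already up-to-date CVS checkout
        strace -f -e trace=read,write -o cvs.trace cvs -q update
        awk '/read\(|write\(/ { n += $NF } END { print n, "bytes" }' cvs.trace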

The reason we can do what we do is that a changeset actually implies the
existence of all the changesets which came before it. As soon as we do
the out of order stuff, we can no longer depend on that. The openlogging
kernel tree has 12,000 changesets in it. If I can't depend on ordering,
do you want me to compare all 12,000 to see if I need to update anything?
Or should I start doing the file by file comparison that CVS does?
No thanks, that sucks, we can quantify exactly how much it sucks and it
is too much.
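
In case it helps, here is a cartoon of the difference, with each side's
changeset keys kept in a flat file, oldest first (the file names are made
up and this is not BK's real wire protocol, just the shape of the problem):

        # ordered world: the tip key implies everything before it, so one
        # key goes across and only the new stuff comes back
        tip=`tail -1 my_keys`
        awk -v tip="$tip" 'found { print } $0 == tip { found = 1 }' their_keys

        # unordered world: nothing is implied, so the only safe thing is a
        # full set comparison -- all 12,000 keys get compared before any
        # real work starts
        sort my_keys > mine.sorted
        sort their_keys > theirs.sorted
        comm -13 mine.sorted theirs.sorted      # keys they have that I don't

The second form is the one whose cost grows with the size of the history,
which is exactly the CVS-style behavior I'm complaining about.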

I'm not saying "no, we won't fix it", I'm saying "understand why it is
the way it is and then suggest a fix". In other words, don't throw
the baby out with the bath water.

> drivers/net and drivers/ide are completely separate subtrees. At
> least not until you get ATA over IP. And even then the dependency
> is with the IP layer.
>
> Maybe independence should be shown by putting each independent chunk
> into its own repository. And then building a working kernel tree
> would just be a matter of checking out all of the parallel
> repositories, into the appropriate location. Then the global tree
> can just remember which version of all of the subtrees it was
> tested with last.

Whoohoo! Agree completely, and we're building this, we call them nested
repositories and they work pretty much exactly as you describe. However,
even there we do get into problems. Here's how: suppose that you have a
nested repo for include/ppc and another for arch/ppc. You make a change
in both and you commit a sort of "super changeset" which binds those
changes together because one won't work without the other. Now you go
to pull the include/ppc directory for some reason and it will force you
to pull the arch/ppc directory. So the dependencies are reduced but
can still creep across the boundaries. Skipping that forced pull isn't
an option, because we can all agree we have to have some way to say
"these changes which span these subrepositories must move as a unit".
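
To make that concrete, you can think of a super changeset as a record of
which component changesets belong together (a sketch of the nested design,
not final syntax; the paths and keys below are made up):

        # product repository with two component (nested) repositories
        linux/
            include/ppc/        # component repo A
            arch/ppc/           # component repo B

        # a "super changeset" committed in linux/ pins the component tips
        # that were created together (made-up keys):
        #   include/ppc  -> changeset key K1
        #   arch/ppc     -> changeset key K2
        #
        # pulling include/ppc up through K1 drags arch/ppc up through K2,
        # because the super changeset says they only make sense as a unit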

> Given that a fully independent program is likely to break because of
> a buggy libc (whose exact version I have no business depending upon),
> I think the insistence on global dependencies is just plain silly;
> you can never find the entire set of dependencies.

Agreed. And it's unlikely anyone would take explicit action to bind
their app to libc unless they really did need at least glibc 2.3 for
some (probably bad, IMHO) reason. So you'd be OK there.

By the way, we have customers who maintain (large) embedded Linux
distributions in BitKeeper with literally hundreds of unrelated
repositories, e.g., one for the kernel, one for gcc, one for make,
etc. They are the motivation behind the nested stuff, so they don't
have to maintain scripts which do things like

        for i in `cat list_of_repos`
        do cd ~/ws/$i && bk pull
        done

> So Larry please cope with the fact that perfect dependency modeling is
> impossible, and set up a method that works in the real world.

It's high on our list and we are working on it. I get a little touchy
about it because some of what people say they want just won't work in
a distributed system, but as you suggested, there is more than one way
to do it and the nested stuff is a good start.

-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 