Re: diff format

Larry McVoy (lm@bitmover.com)
Thu, 25 Feb 1999 11:03:12 -0800


: > Folks, I need to get your read on something quickly. I'd like to use the
: > diff -n format for BitKeeper patches. I suspect that many will have an
: > issue with this because these patches are pretty useless to human beings.
: > Linus especially will not like this.
:
: It's not about "like".
:
: I will totally refuse to touch _any_ diff format that does not have
: context.

That's great because that's exactly what I want :-)
But I suspect that you meant something else. So let's hash this out,
because it is crucial to the correct operation of the system. The short
summary is that I believe that you will be happy.

The problem that I'm solving is that of having a distributed source base,
with changes happening in parallel all over the world. More specifically,
what I am doing is making a change idempotent - no matter how many
times it gets passed around, the change should be present in the tree
exactly once. If this is not an invariant, then the whole model fails -
as the number of people working on the tree goes, the time spent merging
goes to 100%.

So the goal is to make a particular go into the tree exactly once, if it
goes in at all.

In order to reach that goal, changes need to be identifiable and immutable.
If you don't like a change, as is frequently the case, then the remedy it
to make another change on top of the previous change. That's what peope
do when they edit the diffs. But you can not edit the diffs and apply the
diffs as **one change**. It has to be two changes - the original, immutable
change, plus the second change. If it works like that, then you can automate
the task of figuring out if a change is in Linus' tree. If it doesn't work
like that, you have no choice but to go read the code and see if it is in
there - there is not a prayer of automating it because you can't tell how
much has been changed by someone editing the diffs.

Another way of saying this is that changes are nodes in a graph. Once a
node is created, you can't change the node, you have to add another node
which changes the previous node. Another invariant is that the parent
child relationship of two nodes is immutable.

All that said, you still want your context diffs. You can have them.
The diff -n is just what the system uses to pass the changes between
repositories. *Nothing* goes into a repository before you have a
chance to look at the diffs, any diffs you want - context, unified,
sdiff, graphical file merge, whatever. What you lose is the ability
to screen in your mail box. If you must have this, I can emulate this.

Also, what I meant by "the system can apply this automatically" with
which Alan took issue, is this: the system can get a patch, determine
that the state of the tree from which the patch was generated is
present in the local tree (or not, in which case it fails the patch),
and graft this patch into copies of the local revision control files.
That's all. Let's take an example. Suppose Alan and Linus are
4 patches apart. Linus sends out a patch containing all 4 patches.
Alan gets that, runs it through takepatch (this is the part that can
be automatic). What is the state of Alan's tree? The tree itself is
unmodified - including the files changed by the patch. The changes
are "next" to the tree like so. Suppose that panic.c is one of the
changes.

Unchanged stuff:
/usr/src/linux/kernel/panic.c: unchanged in Alan's tree
/usr/src/linux/kernel/SCCS/s.panic.c: unchanged in Alan's tree

Stuff created as a result of the takepatch:
/usr/src/linux/PENDING/1999-03-09.01: the patch file
/usr/src/linux/RESYNC/kernel/SCCS/s.panic.c: has Linus's changes
/usr/src/linux/RESYNC/kernel/SCCS/r.panic.c: shows what's new

So what takepatch does is this: it looks in the local revision control files
and makes sure, for each delta, that the parent of the delta is present.
So if Linus sent out version 1.100 of panic.c and Alan was at 1.98, the
patch will fail, and it fails for all files in the patch, not just the
panic.c file. In other words, patches are atomic (and have two phase
commit - I save the patch in PENDING first).

If Alan had 1.99, then the patch would create a new s.panic.c with version
1.100 in it. That file should be bit for bit identical with the file
Linus sent. The r.panic.c is a little bread crumb which says what needs
to be done to resolve this file. In this case, there are no conflicts,
but suppose Alan already had a 1.100 and a 1.101 which are different that
Linus's 1.100. In that case, the file would be branched and the r.panic.c
would say

merge deltas 1.100 1.99 1.99.1.2

in other words, the right version, the greatest common ancestor version, and
the left version.

At this point, you can run resolve and decide if you want to apply the patch.
Resolve gives you the option to look at the files any way you want and I'll
make it do whatever you want so that you like it. I have everything I need
to generate any sort of diff you want, including threeway diffs so that you
can see who did each change. And this is the place where I can emulate
you editing the diffs.

This is a long message and I'm afraid I'll put you all to sleep if I go on.
Does any of this make sense?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/