Re: Kernel SCM saga..

From: Linus Torvalds
Date: Fri Apr 08 2005 - 18:29:27 EST




On Fri, 8 Apr 2005, Rajesh Venkatasubramanian wrote:
>
> Although directory changes are tracked using change-sets, there
> seems to be no easy way to answer "give me the diff corresponding to
> the commit (change-set) object <sha1>". That will be really helpful to
> review the changes.

Actually, it is very easy indeed. Here's what you do:

- look up the commit object ("cat-file commit <sha1>")

This object starts out with "tree <sha1>", followed by a list of
parent commit objects: "parent <sha1>"

Remember the tree object (it defines what the tree looks like at
the time of the commit). Pick the parent object you want to diff
against (normally the first one).

Also, print the checking messages at the end of the commit object.

- look up the parent object ("cat-file commit <parentsha1>")

Here you have the same kind of object, but this time you don't care
about going deeper, you just pick up the tree <sha1> that describes
the tree at the parent.

- look up the two tree objects. Unlike a commit object, a tree object
is a binary data blob, but the format is an _extremely_ simple table
of thse guys:

<ascii octal filemode> <space> <pathname> <NUL character> <20-byte sha1>

and the reason it's binary is really that that way "git" doesn't end
up having any issues with strange pathnames. If you want to have spaces
and newlines in your pathname, go wild.

In particular, the tree object is also _sorted_ by the pathname. This
makes things simple, because you now have to sorted trees, and the
first thing you do is just walk the two trees in lock-step, which is
trivial thanks to the sorted nature of the tree "array".

So now you have three cases:
- you have the same name, and the same sha1

ignore it - the file didn't change, you don't even have to look
at the contents (although if the file mode changed you might
want to note that)

- you have the same name in parent and child tree lists, but the
sha differs. Now you just need to do a "cat-file" on both of the
SHA1 values, and do a "diff -u" between them.

- you have the filename in only parent or only child. Do a
"create" or "delete" diff with the content of the sha1 file.

See? Very efficient. For any files that didn't change, you didn't have to
do anything at all - you didn't even have to look at their data.

Also note that the above algorithm really works for _any_ two commit
points (apart for the two first steps, which are obviously all about
finding the parent tree when you want to diff against a predecessor).

It doesn't have to be parent and child. Pick any commit you have. And pick
them in the other order, and you'll automatically get the reverse diff.

You can even do diffs between unrelated projects this way if you use the
shared sha1 directory model, although that obviously doesn't tend to be
all that sensible ;)

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/