Re: GFS2: Pull request (merge window)

From: Linus Torvalds
Date: Fri May 05 2017 - 17:08:05 EST


On Fri, May 5, 2017 at 1:28 PM, Bob Peterson <rpeterso@xxxxxxxxxx> wrote:
>
> I asked around, but nobody could tell me what went wrong. Strangely,
> this command:
>
> git log --oneline --right-only origin/master...FETCH_HEAD --stat
>
> doesn't show this, but this one does:
>
> git diff --stat --right-only origin/master...FETCH_HEAD

So the fundamental difference between "git log" and "git diff" is that
one is a "set operation" on the commits in question, and the other is
fundamentally an "operation between two endpoints".

And that's why "git log" will always show the "right" thing - because
even in the presence of complex history, there's no ambiguity about
which commits are part of the new set, and which are in the old set.
So "git log" just does a set difference, and shows the commits in one
set but not the other.
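
To make the set difference concrete, here is a minimal sketch in Python
over a toy commit graph. The commit names and the helper functions are
made up for illustration; this models the idea, not how git implements it.

```python
# Toy commit graph: each commit maps to its list of parents.
# All commit names are invented for this sketch.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"],        # tip of "upstream"
    "X": ["B"],
    "M": ["X", "C"],   # back-merge of upstream into the topic branch
    "Y": ["M"],        # tip of "topic"
}

def reachable(tip):
    """Return every commit reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen

def right_only(left, right):
    """Commits reachable from right but not from left (a plain set difference)."""
    return reachable(right) - reachable(left)

print(sorted(right_only("C", "Y")))  # ['M', 'X', 'Y']
```

Even with the back-merge M in the history, the answer is well defined:
it is simply the commits on the topic side that upstream does not have.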

But "git diff", because it is fundamentally about two particular
points in history, can have a hard time once you have complex history:
what are the two points?

In particular, what "git diff origin/master...FETCH_HEAD" means is really:

- find the common point (git calls it "merge base" because the common
point is also used for merging) between the two commits (origin/master
and FETCH_HEAD)

- do the diff from that common point to the end result (FETCH_HEAD)

and for linear history that is all very obvious and unambiguous.
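
The two steps above can be sketched as follows, assuming a toy history
with no merges where every commit carries a full snapshot of the tree.
The names (PARENTS, TREES, three_dot_diff) and the file contents are
hypothetical; this is only a model of the idea.

```python
# Toy history without merges; commit names, paths and contents are invented.
PARENTS = {"A": [], "B": ["A"], "C": ["B"], "T1": ["B"], "T2": ["T1"]}
TREES = {
    "A": {"fs.c": "v1"},
    "B": {"fs.c": "v2"},
    "C": {"fs.c": "v2", "block.c": "v1"},   # upstream moved on after the fork
    "T1": {"fs.c": "v3"},
    "T2": {"fs.c": "v3", "glock.c": "v1"},  # tip of the topic branch
}

def reachable(tip):
    """Return every commit reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen

def merge_bases(a, b):
    """Common ancestors that are not ancestors of another common ancestor."""
    common = reachable(a) & reachable(b)
    return sorted(c for c in common
                  if not any(c != o and c in reachable(o) for o in common))

def three_dot_diff(left, right):
    """Model of a three-dot diff: diff from the merge base to right."""
    bases = merge_bases(left, right)
    assert len(bases) == 1, "this sketch only handles a single merge base"
    old, new = TREES[bases[0]], TREES[right]
    return {path: (old.get(path), new.get(path))
            for path in sorted(old.keys() | new.keys())
            if old.get(path) != new.get(path)}

print(merge_bases("C", "T2"))     # ['B']
print(three_dot_diff("C", "T2"))
```

The upstream-only change to block.c does not show up, because the diff
starts at the fork point B rather than at the upstream tip C.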

But once you have non-linear history, and particularly once you have
back-merges (ie you're not just merging work that is uniquely your own
from multiple of your *own* branches, but you're also doing merges of
upstream code), the notion of that "common point" is no longer
unambiguous. There is not necessarily any *one* common base, there can
be multiple points in history that are common between the two
branches, but are distinct points of history (ie one is not an
ancestor of another).
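
A small criss-cross history shows how two distinct merge bases can
arise. This is a sketch over an invented graph (U* for upstream, T* for
the topic branch); the merge_bases helper is a naive model, not git's
actual algorithm.

```python
# Criss-cross toy history: each side has merged the other's earlier commit.
# All commit names are invented for this sketch.
PARENTS = {
    "O": [],
    "U1": ["O"],          # upstream work
    "T1": ["O"],          # topic work
    "U2": ["U1", "T1"],   # upstream merged the topic branch
    "T2": ["T1", "U1"],   # topic back-merged upstream
}

def reachable(tip):
    """Return every commit reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen

def merge_bases(a, b):
    """Common ancestors that are not ancestors of another common ancestor."""
    common = reachable(a) & reachable(b)
    return sorted(c for c in common
                  if not any(c != o and c in reachable(o) for o in common))

print(merge_bases("U2", "T2"))  # ['T1', 'U1']
```

O is also a common ancestor, but it is dropped because it is an ancestor
of both U1 and T1. Neither U1 nor T1 is an ancestor of the other, so
both remain as merge bases.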

And since a diff is fundamentally about just two end-points ("what are
the differences between these two points in the history"), "git diff"
fundamentally cannot handle that case without help.

So "git diff" will pick the first of the merge bases it finds, and
just use that. Which even in the presence of more complex history will
sometimes work by luck, but more often just means that you'll see
differences that aren't all from your tree, but some of them came from
the *other* common point(s).

For example, after doing the pull, I can then do:

git merge-base --all HEAD^ HEAD^2

to see the merge bases of the merge in HEAD. In this case, because of
your back-merge, there's two of them (with more complex history, there
can be more still):

f9fe1c12d126 rhashtable: Add rhashtable_lookup_get_insert_fast
69eea5a4ab9c Merge branch 'for-linus' of git://git.kernel.dk/linux-block

and because "git diff" will just pick the first one, you will
basically have done

git diff f9fe1c12d126..FETCH_HEAD

and if you then look at the *set* of changes (with "git log" of that
range), you'll see why that diff also ends up containing those block
changes (because they came in from that other merge base: commit
69eea5a4ab9c that had that linux-block merge).
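
The effect can be reproduced on a toy model. In the sketch below, N
stands in for the rhashtable commit and K for the linux-block merge; all
names, paths and contents are invented, and K is simplified to a single
commit that adds block.c.

```python
# Toy model of a topic branch that forked from N and back-merged K.
# Commit names, paths and contents are invented for this sketch.
PARENTS = {
    "O": [],
    "N": ["O"],           # stand-in for the rhashtable commit
    "K": ["O"],           # stand-in for the linux-block merge
    "U": ["N", "K"],      # upstream tip, contains both
    "G1": ["N"],          # topic work based on N
    "G2": ["G1", "K"],    # back-merge of K into the topic branch
    "G3": ["G2"],         # topic tip (what would be in FETCH_HEAD)
}
BASE = {"core.c": "v1"}
TREES = {
    "N": {**BASE, "rhashtable.c": "v1"},
    "K": {**BASE, "block.c": "v1"},
    "G3": {**BASE, "rhashtable.c": "v1", "block.c": "v1", "gfs2.c": "v2"},
}

def reachable(tip):
    """Return every commit reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen

def merge_bases(a, b):
    """Common ancestors that are not ancestors of another common ancestor."""
    common = reachable(a) & reachable(b)
    return sorted(c for c in common
                  if not any(c != o and c in reachable(o) for o in common))

def changed_paths(old, new):
    """Paths whose content differs between two snapshots."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))

for base in merge_bases("U", "G3"):
    print(base, changed_paths(TREES[base], TREES["G3"]))
# K ['gfs2.c', 'rhashtable.c']
# N ['block.c', 'gfs2.c']
```

Whichever base gets picked, the two-point diff drags in a file that the
topic branch never touched: block.c when diffing from N, rhashtable.c
when diffing from K. Only gfs2.c is actually the topic's own work.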

Now, doing a *merge* in git will take _all_ of those merge bases into
account, and do something *much* more complicated than just a two-way
diff. It will internally first create a single merge base (by
recursively merging up all the other merge bases into a new internal
commit), and then using that single merge base it will then do a
normal three-way merge of the two branches.

"git diff" doesn't do that kind of complicated operation, and although
it *could* do that merge base recursive merging dance, the problem
would be what to do about conflicts (which "git merge" obviously can
also have, but with git merge you have that whole manual conflict
resolution case).

So once you have complex history that isn't just about merging your
own local changes from other local branches, you'll start hitting this
situation.

Visualizing the history with "gitk" for those cases is often a great
way to see why there's no single point that can be diffed against.

But once you *do* have that kind of complex history, you're also
expected to have the expertise to handle it:

> So I created a temporary local branch and used git merge to
> generate a correct diffstat.

That's the correct thing to do. Hopefully the above explains *why*
it's the correct thing to do.

(Although to be honest, I also am pretty used to parsing the wrong
diffs, and people sometimes just send me the garbage diffstat and say
"I don't know what happened", and I'll figure it out and can still
validate that the garbage diffstat they sent me is what I too get if I
do just a silly "git diff" without taking merge bases into account).

Linus