Re: Linux 2.6.24-rc6

From: Linus Torvalds
Date: Thu Dec 20 2007 - 23:41:45 EST




On Thu, 20 Dec 2007, Linus Torvalds wrote:
>
> It only happened for a few files that had lots of repeated lines - so that
> the diff could literally be done multiple different ways - and in fact,
> the file that caused the problems really had a bogus commit that
> duplicated *way* too much data, and caused lots of #define's to exist
> twice.

Here's the example of this kind of behaviour: in the 2.6.26-rc5 tree the
file drivers/video/mbx/reg_bits.h has the #defines for

/* DINTRS - Display Interrupt Status Register */
/* DINTRE - Display Interrupt Enable Register */

duplicated twice due to commit ba282daa919f89c871780f344a71e5403a70b634
("mbxfb: Improvements and new features") by Raphael Assenat mistakenly
adding another copy of the same old set of defines that we already got
added once before by commit fb137d5b7f2301f2717944322bba38039083c431
("mbxfb: Add more registers bits access macros").

Now, that was a mistake - and one that probably happened because Rafael or
more likely Andrew Morton used GNU patch with its insane defaults (which
is to happily apply the same patch that adds things twice, because it
doesn't really care if the context matches or not).

But what that kind of thing causes is that when you create a patch of the
end result, it can show the now new duplicate lines two different (but
equally valid) ways: it can show it as an addition of the _first_ set of
lines, or it can show it as an addition of the _second_ set of lines. They
are the same, after all.

Now, it doesn't really matter which way you choose to show it, although
because of how "git diff" finds similarities, it tends to prefer to show
the second set of identical lines as the "new" ones. Which is generally
reasonable.

However, that interacted really badly with the new git logic that said
that "if the two files end in the same sequence, just ignore the common
tail of the file", because the latter copy of the identical lines would
now show up as _part_ of that common tail, so the lines that the git diff
machinery would normally like to show up as "new" did in fact end up being
considered uninteresting, because they were part of an idential tail.

So now "git diff" would happily pick _earlier_ lines as the new ones, and
it would still be a conceptually valid diff, but because we had trimmed
the tail of the file, that conceptually valid diff no longer had the
expected shared context at the end.

And while it's a bit embarrassing, I'm really rather happy that both GNU
patch and "git apply" actually refused to apply the patch. It may have
been "conceptually correct" (ie it did really contain all of the changes!)
but because it lacked the expected context it really wasn't a good patch.

That was a rather long-winded explanation of what happened, mainly because
it was all very unexpected to me, and I had personally mistakenly thought
the git optimization was perfectly valid and actually had to go through
the end result to see what was going on.

Anyway, the diff on kernel.org should be all ok now, and mirrored out too.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/