Re: 2.6.24-rc4-git5: Reported regressions from 2.6.23

From: Linus Torvalds
Date: Sun Dec 09 2007 - 20:58:47 EST




On Sun, 9 Dec 2007, Alan Cox wrote:
>
> The one off regression is probably not one off, but this is IDE so
> actually it's quite probable it's a single broken firmware.
>
> The alternative is that you cripple just about every user of various
> other standards compliant devices and controllers whose hardware we
> finally fixed.

Alan, you're so full of shit that it's not even funny.

Have you even *read* the thread?

Tejun already reported that this apparently gets fixed _properly_ with the
more extensive cleanups and fixes that are pending for 2.6.25.

In other words, the stuff you call so critically important (yet we've been
able to live without it until now!) is apparently simply NOT YET READY.
It's breaking things.

In this case, Tejun seems to be right on the money. I also agree 100%
with him when he says

"Blacklist takes time to develop and temporary blacklist for just one
release doesn't sound like a good idea."

because if we create some blacklist for that one reported device, not only
is it likely going to be wrong (it's almost never just one firmware or one
chip that has a particular issue), but we tend to create these blacklists
and later realize that we shouldn't have blacklisted things at all, we
should just have done things differently.

For examples of that, see the NCQ blacklist that was just _us_ doing
things wrong (over-reacting to things we shouldn't care about), and
there's currently another totally unrelated discussion on a very similar
thing wrt libata and the ACPI startup commands for an unused controller
port.

> Finally you need to remember that the 'regression' is caused by the fact
> we now do the _right_ thing both in terms of 'old IDE' and specs.

.. and what the hell does that matter? If the code doesn't work, it
doesn't work, and you might as well point to some random scribblings done
by a three-year-old on toilet paper rather than any "specs".

Real life matters more. Regressions matter more.

We apparently do have a full fix, but it seems to be too invasive for
2.6.24, which means that the thing that currently DOES NOT WORK and
causes regressions should be reverted, so that 2.6.24 is at least no worse
than 2.6.23 (and all earlier kernels) in this respect.

And then we should just hope that the more complete fix that Tejun has
doesn't cause any issues on its own. I would suggest that if you care so
deeply about this issue, you press Fedora into putting Tejun's tree into
Fedora testing, and get that thing tested out extensively.

So the fact is, we have a way forward, but we should *not* take steps
backwards just because you want to push something out that isn't quite
ready. We should revert the change that causes the current trouble, safe
in the knowledge (or at least "strong hope") that we have a way forward
that makes *both* 2.6.24 and 2.6.25 be continual improvements.

We used to allow regressions. It was really painful. It's hard to debug
things when things sometimes break. It's much better to have a nice
constant monotonic improvement.

It's better for users, but it's much better also for developers, even if
you may be frustrated right now because some new code effectively gets
shut down until it works for everybody.

Linus