Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

From: Linus Torvalds
Date: Thu May 24 2007 - 22:43:20 EST




On Fri, 25 May 2007, Nigel Cunningham wrote:
>
> To answer the question, I guess the answer is that although they're
> different creatures, they have similarities. This is one of them, which
> is why I could make the mistake I did. Nothing in the issue being
> discussed was unique to suspend-to-ram. Perhaps we (or at least I) focus
> too much on the similarities, but that doesn't mean they're not there.

I agree that the current bug is not unique to STR. In fact, I think Romano
tested both STD and STR, and both had the same bug with the 60s timeout.

But what irritates me is that STR really shouldn't have _had_ that bug at
all. The only reason STR had the same bug as STD was exactly the fact that
the two features are too closely inter-twined in the kernel.

That irritates me hugely. We had a bug we should never had had! We had a
bug because people are sharing code that shouldn't be shared! We had a bug
because of code that makes no sense in the first place!

I agree that disk snapshotting is much harder. If we had a bug just in
that part, I wouldn't mind it so much. Getting hard problems wrong isn't
something you should be ashamed of. What I mind is that the _easier_
problem got infected by all the bugs from the _harder_ issue. That just
makes me really really angry and frustrated.

Look at it this way: if you designed a CPU, and you made the integer
code-path share everything with the floating point side, because "addition
is addition", and as a result the latency for the simple arithmetic and
logical ops in integer ALU was four cycles, what would you be?

You'd be a moron, that's what.

And that is _exactly_ what the current STD/STR code does. It says "suspend
is suspend" and tries to share the same pipeline, even though the two
operations are totally different, and share nothing but the name people
use for it (and even the name is really pretty weak, and I've tried to
get people to use some other name for STD).

So yes,the two things _do_ share the problem, but they really really
shouldn't. There's no reason to think that they should. And it drives me
absolutely bonkers that people seem to have such a hard time seeing that.

That said, I think freezing is crap even for snapshotting/suspend-to-disk,
but the point of the above rant is to show how insane it is to think that
problems and complexity in one area should translate into problems and
complexity in another area.

And if the snapshot people want to screw up their snapshots with freezing,
I don't actually care that much. I'd much rather have working STR. As it
is now, they're now _both_ broken.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/