Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

From: Nigel Cunningham
Date: Thu May 24 2007 - 23:01:09 EST


Hi.

On Thu, 2007-05-24 at 19:41 -0700, Linus Torvalds wrote:
>
> On Fri, 25 May 2007, Nigel Cunningham wrote:
> >
> > To answer the question, I guess the answer is that although they're
> > different creatures, they have similarities. This is one of them, which
> > is why I could make the mistake I did. Nothing in the issue being
> > discussed was unique to suspend-to-ram. Perhaps we (or at least I) focus
> > too much on the similarities, but that doesn't mean they're not there.
>
> I agree that the current bug is not unique to STR. In fact, I think Romano
> tested both STD and STR, and both had the same bug with the 60s timeout.
>
> But what irritates me is that STR really shouldn't have _had_ that bug at
> all. The only reason STR had the same bug as STD was exactly the fact that
> the two features are too closely inter-twined in the kernel.
>
> That irritates me hugely. We had a bug we should never had had! We had a
> bug because people are sharing code that shouldn't be shared! We had a bug
> because of code that makes no sense in the first place!
>
> I agree that disk snapshotting is much harder. If we had a bug just in
> that part, I wouldn't mind it so much. Getting hard problems wrong isn't
> something you should be ashamed of. What I mind is that the _easier_
> problem got infected by all the bugs from the _harder_ issue. That just
> makes me really really angry and frustrated.
>
> Look at it this way: if you designed a CPU, and you made the integer
> code-path share everything with the floating point side, because "addition
> is addition", and as a result the latency for the simple arithmetic and
> logical ops in integer ALU was four cycles, what would you be?
>
> You'd be a moron, that's what.
>
> And that is _exactly_ what the current STD/STR code does. It says "suspend
> is suspend" and tries to share the same pipeline, even though the two
> operations are totally different, and share nothing but the name people
> use for it (and even the name is really pretty weak, and I've tried to
> get people to use some other name for STD).

I think I get what you're trying to say, but I also think you're either
overstating your case ("...totally different and share nothing but the
name...") or underestimating the similiarity - they both need (albeit
for different reasons) to do the cpu hotplugging, driver suspending
(yeah, there are similarities and differences there) and irq disabling.
That's _some_ similarity. Apart from that, yeah - they are totally
different.

Re the name, we discussed changing the name of Suspend2 on IRC a night
or two ago. We came to the conclusion that, for better or for worse,
it's too well known now. I can see your logic in wanting to
differentiate them, but I seem to be a bit stuck :\. Push some more.
Maybe we'll get there anyway :) Maybe you can get rid of that horrible,
unpronounceable 'swsusp' name while you're at it? :)

> So yes,the two things _do_ share the problem, but they really really
> shouldn't. There's no reason to think that they should. And it drives me
> absolutely bonkers that people seem to have such a hard time seeing that.
>
> That said, I think freezing is crap even for snapshotting/suspend-to-disk,
> but the point of the above rant is to show how insane it is to think that
> problems and complexity in one area should translate into problems and
> complexity in another area.

Does that imply that you'd prefer to see filesystem checkpointing code,
that you think freezing can be done better, or do you have some other
solution that hasn't occurred to me?

> And if the snapshot people want to screw up their snapshots with freezing,
> I don't actually care that much. I'd much rather have working STR. As it
> is now, they're now _both_ broken.

Fair enough :).

Nigel

Attachment: signature.asc
Description: This is a digitally signed message part