Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)

From: Pavel Machek
Date: Wed Apr 25 2007 - 17:44:56 EST


Hi!

> > Can I get you on IRC somewhere? No, I do not think I'm a moron, and
> > yes, I need to suspend^Wsnapshot the devices before, so I have that in
> > the snapshot. Of course, I'll need to resume^Wrestore the devices
> > before writing snapshot. That's okay, it does not take long.
>
> You do NOT need to "suspend" the devices, and that's the whole point.
>
> You may want to save the device info somewhere, BUT THAT IS SOMETHING
> TOTALLY DIFFERENT!
>
> This is *exactly* the confusion I'm talking about. The STD and STR
> codepaths try to use the same function for two TOTALLY DIFFERENT things.
>
> STR actually wants to "suspend".
>
> STD actually wants to "atomic snapshot", and it must not allow allocations
> or anything like that, because the whole snapshot image should be done
> atomically as one event. But it should *not* suspend, because that device
> may actually be needed afterwards.
>
> So not the same thing at all.

Not the same... but they are still related. "freeze" (for atomic
snapshot) is actually subset of "suspend"... freeze needs DMAs off and
saved state, and you need DMAs off and saved state for "suspend".

So it is actually correct to do "suspend" when you want "freeze"; it
is just slow. That's why they only differ in parameter these days.

> So here's what "suspend()" wants:
> - suspend() - preparatory work, can error our, can delay, can park the
> disk, etc etc.
> - suspend_late() - called late, with interrupts disabled, should actually
> suspend if the early suspend didn't do it already
>
> And here is what "snapshot()" wants:
> - prepare_to_snapshot() (for memory allocation)

Lets call this "freeze"?

> - snapshot() - called late, with interrupts disabled, save state.

> and there is absolutely _zero_ overlap between them. There just isn't
> anything in common. Yes, both are two-phase (for the simple reason that
> both want an "atomic" part), but there's really no real overlap.

As I tried to explain, you can use suspend() to stop DMAs and save
state, and that's enough to get sane snapshot. (You do cli() before
doing snapshot, that helps with irqs).

> Just trying to *make* them be the same operations is just going to
> introduce flags that then cause them to be totally different *and*
> confusing and generate bugs. It also means that people do one of them, and
> "it works" for that case, and the other case is totally broken, but it's
> not obvious, because doing one means that the system _thinks_ that you did
> both!
>
> In the very unlikely case that some driver actually *wants* to use the
> same function for snapshots and suspending, that driver could just go
> ahead and _use_ the same function pointer. But now, as things are set up,
> we force a total confusion on drivers by calling them through the same
> interface for two totally different things.

Ok ok, we can do

suspend(PMSG_SUSPEND) -> suspend()
suspend(PMSG_FREEZE) -> freeze()

. We'll need to do big search&replace over the kernel etc. But if you
think it helps with confusion...

I'd still like to keep people using same method for both. It means
suspend path gets more testing, even when some stuff it does is not
strictly neccessary for freeze.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/