Re: [PATCH] - race-free suspend. Was: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

From: Neil Brown
Date: Wed Jun 02 2010 - 07:02:48 EST


On Wed, 2 Jun 2010 02:12:10 -0700
Arve Hjønnevåg <arve@xxxxxxxxxxx> wrote:

> 2010/6/2 Neil Brown <neilb@xxxxxxx>:
> > On Wed, 2 Jun 2010 00:05:14 -0700
> > Arve Hjønnevåg <arve@xxxxxxxxxxx> wrote:
> >
> >> On Tue, Jun 1, 2010 at 10:32 PM, Neil Brown <neilb@xxxxxxx> wrote:
> >> > On Tue, 1 Jun 2010 12:50:01 +0200 (CEST)
> >> > Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >> >
> >> >> On Tue, 1 Jun 2010, Neil Brown wrote:
> >> >> >
> >> >> > I think you have acknowledged that there is a race with suspend - thanks.
> >> >> > Next step was "can it be closed".
> >> >> > You seem to suggest that it can, but you describe it as a "work around"
> >> >> > rather than a "bug fix"...
> >> >> >
> >> >> > Do you agree that the race is a "bug", and therefore it is appropriate to
> >> >> > "fix" it assuming an acceptable fix can be found (which I think it can)?
> >> >>
> >> >> If we can fix it, yes we definitely should do and not work around it.
> >> >>
> >> >> Thanks,
> >> >>
> >> >>     tglx
> >> >
> >> > OK.
> >> > Here is my suggestion.
> >> >
> >> > While I think this patch would actually work, and hope the ugly aspects are
> >> > reasonably balanced by the simplicity, I present it primarily as a base for
> >> > improvement.
> >> > The important part is to present how drivers and user-space can co-operate
> >> > to avoid losing wake-events. The details of what happens in the kernel are
> >> > certainly up for discussion (as is everything else really of course).
> >> >
> >> > The user-space suspend daemon avoids losing wake-events by using
> >> > fcntl(F_OWNER) to ensure it gets a signal whenever any important wake-event
> >> > is ready to be read by user-space. This may involve:
> >> > - the one daemon processing all wake events
> >>
> >> Wake up events are not all processed by one daemon.
> >
> > Not with your current user-space code, no. Are you saying that you are not
> > open to any significant change in the Android user-space code? That would
> > make the situation a lot harder to resolve.
> >
>
> Some wakeup events like an incoming phone call may be handled by a
> vendor supplied daemon that I do not have the source code for. And, no
> I'm not open to a change that would require all wakeup events to go to
> a single process.

Ahh.. Well I have no answer for the "I must support a closed-source app"
card that has not been heard 1000 times already.

My proposal doesn't require all wakeup events to go through one single
process - it was just one of (at least) 3 options.

>
> >>
> >> > - Both the suspend daemon and the main event handling daemon opening any
> >> >   given device that delivers wake events (this should work with input
> >> >   events ... unless grabbing is needed)
> >>
> >> Not all wakeup events are broadcast like input events so they cannot
> >> be read by both daemons. Not that this really matters, since reading
> >> the event from the suspend daemon does not mean that it has been
> >> delivered to and processed by the other daemon.
> >
> > There would still need to be some sort of communication between the
> > suspend daemon and any event daemon to ensure that the events had been
> > processed. This could be very light weight interaction. The point though is
> > that with this patch it becomes possible to avoid races. Possible is better
> > than impossible.
> >
>
> We already have a solution. I don't think rejecting our solution but
> merging a worse solution should be the goal.
>
> >>
> >> > - The event handling daemon giving the suspend-daemon's pid as F_OWNER, and
> >> >   using poll/select to get the events itself.
> >>
> >> I don't like the idea of using signals for this. Without the hack Alan
> >> Stern suggested, you will temporarily block suspend if the wakeup
> >> event happened before the suspend daemon thread made it to the kernel,
> >> but abort suspend if it happened right after.
> >
> > I'm not sure why that difference matters. But I'm also not sure that it is
> > true.
> > When any wakeup event happens, a signal will be delivered to the suspend
> > daemon.
> > This will interrupt a pending suspend request, or a sleep, or whatever else
> > the daemon is doing.
> > It can then go back to waiting for a good time to suspend, and then try to
> > suspend again.
> >
>
> This is inferior to the solution that is in the android kernel and the
> suspend blocker patchset. Both suspend as soon as possible and do not
> require signal handlers that modify the argument to your kernel call.
>

The solutions in the android kernel and the suspend blocker patchset both
share one fairly fatal flaw - they are not being accepted upstream.
I am trying to find a minimal suitable solution that does not share that
flaw.
I do not know yet if it does or not, but as it is fixing a real (design) bug,
I feel it has some chance. Of course if it doesn't meet your need, then
that becomes irrelevant....

And there is no requirement to modify any arguments in any signal handlers.
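
To illustrate the signal side (a rough sketch only, not part of the patch:
watch_wake_fd, on_sigio and wake_event_pending are names invented here, and I
am assuming the owner is set with fcntl's F_SETOWN and that O_ASYNC is enabled
on the descriptor), the handler needs to do nothing more than set a flag:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for O_ASYNC on glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical flag: set by the handler, checked by the suspend daemon
 * before it asks the kernel to suspend. */
volatile sig_atomic_t wake_event_pending;

/* The handler only records that a wake-event arrived.  It does not touch
 * the arguments of any system call. */
void on_sigio(int signo)
{
    (void)signo;
    wake_event_pending = 1;
}

/* Ask the kernel to send SIGIO to 'owner' whenever 'fd' has data queued.
 * 'owner' may be this process or another one (e.g. the suspend daemon).
 * Returns 0 on success, -1 on failure with errno set. */
int watch_wake_fd(int fd, pid_t owner)
{
    struct sigaction sa;
    int flags;

    assert(fd >= 0);

    /* Install the handler in the calling process. */
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigio;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGIO, &sa, NULL) < 0)
        return -1;

    /* Name the process that receives SIGIO for this descriptor. */
    if (fcntl(fd, F_SETOWN, owner) < 0)
        return -1;

    /* Turn on signal-driven I/O while keeping the other status flags. */
    flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC) < 0 ? -1 : 0;
}
```

In the third option above, the event handling daemon would call something like
watch_wake_fd(fd, suspend_daemon_pid) on each device it polls, so the signal
goes to the suspend daemon while the events are still read by whoever owns the
descriptor.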

> >
> >>
> >> >
> >> > When 'mem' is written to /sys/power/state, suspend_prepare waits in an
> >> > interruptible wait until any wake-event that might have been initiated before
> >> > the suspend was requested, has had a chance to be queued for user-space and
> >> > trigger kill_fasync.
> >>
> >> And what happens if you are not waiting when this happens?
> >
> > I'm not sure I understand the question. Could you explain it please?
> >
>
> If the thread is not already in the kernel how does your signal
> handler abort suspend.

setjmp / longjmp. This is the time-honoured method for allowing a signal to
break the flow of a program and re-start somewhere else.
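
Something along these lines is what I have in mind. It is a sketch under
assumptions, not tested against the patch: try_suspend, write_mem, wake_seen
and the callback signature are all invented for illustration, and I use the
sigsetjmp/siglongjmp variants so that the signal mask is restored after the
jump out of the handler.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static sigjmp_buf retry_point;           /* where the handler jumps back to */
static volatile sig_atomic_t can_jump;   /* true only while retry_point is valid */
volatile sig_atomic_t wake_seen;         /* a wake signal arrived; caller clears it */

static void on_wake_signal(int signo)
{
    (void)signo;
    wake_seen = 1;
    if (can_jump)
        siglongjmp(retry_point, 1);      /* abandon the suspend attempt */
}

/* Assumed suspend request: write "mem" to an fd open on /sys/power/state. */
int write_mem(int fd)
{
    return write(fd, "mem", 3) == 3 ? 0 : -1;
}

/* Try to suspend once via request_suspend(fd).
 * Returns 0 if the request ran to completion with no wake signal,
 * 1 if a wake signal arrived (before or during the request),
 * -1 if the handler could not be installed. */
int try_suspend(int signo, int (*request_suspend)(int), int fd)
{
    struct sigaction sa;

    assert(request_suspend != NULL);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_wake_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) < 0)
        return -1;

    if (sigsetjmp(retry_point, 1) != 0) {
        /* We got here from the handler: the signal broke the flow. */
        can_jump = 0;
        return 1;
    }
    can_jump = 1;

    /* A signal that lands between this check and the write is caught by
     * the handler, which jumps back above, so the check cannot be raced. */
    if (!wake_seen)
        request_suspend(fd);

    can_jump = 0;
    return wake_seen ? 1 : 0;
}
```

The daemon would loop: on a return of 1 it clears wake_seen, waits until the
event has been dealt with, and calls try_suspend() again.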

>
> > Either the initial event happens late enough to abort/resume the suspend, or
> > the signal happens early enough to abort the suspend, or alert the daemon not
> > to do a suspend. Either way we don't get stuck in suspend.
> >
> >
> >>
> >> > Currently this wait is a configurable time after the last wake-event was
> >> > initiated. This is hackish, but simple and probably adequate.
> >>
> >> Waiting after a wake event is the same as suspend_block_timeout. This
> >> is useful when passing events through layers of code that do not
> >> block suspend, but we should strive to avoid it. Other people had much
> >> stronger objections to this, which is why it is not included in the
> >> last suspend blocker patchset.
> >
> > Absolutely agree. The idea of waiting was just a simple way to present
> > code that actually could work. There are doubtlessly better ways and I
> > assume they have been implemented in the suspend-blocker code.
> > We just need some way to wait for the suspend-block count to reach zero, with
> > some confidence that this amount of time is limited.
> >
> > (though to be honest ... the incredible simplicity of waiting a little while
> > is very compelling.... :-))
> >
>
> Sure, but forcing that as the only way to prevent suspend is taking it too far.
>
> >>
> >> It also does not work for drivers that need to block suspend for more
> >> than a few seconds. For instance the gpio keypad matrix driver needs
> >> to block suspend while keys are pressed so it can scan the keypad.
> >
> > I cannot imagine why it would take multiple seconds to scan a keypad.
> > Can you explain that?
> >
> > Do you mean while keys are held pressed?
>
> Yes.
>
> > Maybe you don't get a wake-up event
> > on key-release?
>
> We should.
>
> > In that case your user-space daemon could block suspend
> > while there are any pressed keys.... confused.
> >
>
> The user-space daemon should not need to know which keys are in a
> matrix. We also have other drivers that need to block suspend. For
> instance, some devices need to block suspend while connected to a USB
> host.

And this decision (to block suspend) really needs to be made in the driver,
not in userspace?

You could get those drivers to return EBUSY from PM_SUSPEND_PREPARE (which
would need to be a configurable option), but then I guess you have no way to
wait for the device to become non-busy.

If user-space really cannot tell if the driver is busy or not, then I would
suggest that the driver is fairly poorly designed.

It would seem then that a user-space requested suspend is not sufficient for
your needs even if we remove the race window, as you have drivers that want
to avoid suspend indefinitely, and that "need to avoid suspend" status is not
visible from user-space.
It is a pity that this extra requirement was not clear from your introduction
to the "Opportunistic suspend support" patch.

If that be the case, I'll stop bothering you with suggestions that can never
work.
Thanks for your time,
NeilBrown