Re: Attempted summary of suspend-blockers LKML thread

From: Rafael J. Wysocki
Date: Thu Aug 05 2010 - 19:44:10 EST


On Friday, August 06, 2010, Arve Hjønnevåg wrote:
> 2010/8/5 Rafael J. Wysocki <rjw@xxxxxxx>:
> > On Thursday, August 05, 2010, Arve Hjønnevåg wrote:
> >> 2010/8/4 Rafael J. Wysocki <rjw@xxxxxxx>:
> >> > On Thursday, August 05, 2010, Arve Hjønnevåg wrote:
> >> >> On Wed, Aug 4, 2010 at 1:56 PM, Matthew Garrett <mjg59@xxxxxxxxxxxxx> wrote:
> >> >> > On Wed, Aug 04, 2010 at 10:51:07PM +0200, Rafael J. Wysocki wrote:
> >> >> >> On Wednesday, August 04, 2010, Matthew Garrett wrote:
> >> >> >> > No! And that's precisely the issue. Android's existing behaviour could
> >> >> >> > be entirely implemented in the form of binary that manually triggers
> >> >> >> > suspend when (a) the screen is off and (b) no userspace applications
> >> >> >> > have indicated that the system shouldn't sleep, except for the wakeup
> >> >> >> > event race. Imagine the following:
> >> >> >> >
> >> >> >> > 1) The policy timeout is about to expire. No applications are holding
> >> >> >> > wakelocks. The system will suspend providing nothing takes a wakelock.
> >> >> >> > 2) A network packet arrives indicating an incoming SIP call
> >> >> >> > 3) The VOIP application takes a wakelock and prevents the phone from
> >> >> >> > suspending while the call is in progress
> >> >> >> >
> >> >> >> > What stops the system going to sleep between (2) and (3)? cgroups don't,
> >> >> >> > because the voip app is an otherwise untrusted application that you've
> >> >> >> > just told the scheduler to ignore.
> >> >> >>
> >> >> >> I _think_ you can use the just-merged /sys/power/wakeup_count mechanism to
> >> >> >> avoid the race (if pm_wakeup_event() is called at 2)).
> >> >> >
> >> >> > Yes, I think that solves the problem. The only question then is whether
> >> >>
> >> >> How? By passing a timeout to pm_wakeup_event when the network driver
> >> >> gets the packet or by passing 0. If you pass a timeout it is the same
> >> >> as using a wakelock with a timeout and should work (assuming the
> >> >> timeout you picked is long enough). If you don't pass a timeout it
> >> >> does not work, since the packet may not be visible to user-space yet.
> >> >
> >> > Alternatively, pm_stay_awake() / pm_relax() can be used.
> >> >
> >>
> >> Which makes the driver and/or network stack changes identical to using
> >> wakelocks, right?
> >
> > Please refer to the Matthew's response.
> >
> >> >> > it's preferable to use cgroups or suspend fully, which is pretty much up
> >> >> > to the implementation. In other words, is there a reason we're still
> >> >>
> >> >> I have seen no proposed way to use cgroups that will work. If you
> >> >> leave some processes running while other processes are frozen you run
> >> >> into problems when a frozen process holds a resource that a running
> >> >> process needs.
> >> >>
> >> >>
> >> >> > having this conversation? :) It'd be good to have some feedback from
> >> >> > Google as to whether this satisfies their functional requirements.
> >> >> >
> >> >>
> >> >> That is "this"? The merged code? If so, no it does not satisfy our
> >> >> requirements. The in kernel api, while offering similar functionality
> >> >> to the wakelock interface, does not use any handles which makes it
> >> >> impossible to get reasonable stats (You don't know which pm_stay_awake
> >> >> request pm_relax is reverting).
> >> >
> >> > Why is that a problem (out of curiosity)?
> >> >
> >>
> >> Not having stats or not knowing what pm_relax is undoing? We need
> >> stats to be able to debug the system.
> >
> > You have the stats in struct device and they are available via sysfs.
> > I suppose they are insufficient, but I'd like to know why exactly.
> >
>
> Our wakelock stats currently have
> (name,)count,expire_count,wake_count,active_since,total_time,sleep_time,max_time
> and last_change. Not all of these are equally important (total_time is
> most important followed by active_since), but you only have count.
> Also as discussed before, many wakelocks/suspendblockers are not
> associated with a struct device.

OK

How much of that is used in practice and what for exactly?

Do you _really_ have to debug the wakelocks in drivers that much?

> >> If the system does not suspend
> >> at all or is awake for too long, the wakelock stats tells us which
> >> component is at fault. Since pm_stay_awake and pm_relax does not
> >> operate on a handle, you cannot determine how long it prevented
> >> suspend for.
> >
> > Well, if you need that, you can add a counter of "completed events" into
>
> We need more than that (see above).
>
> > struct dev_pm_info and a function similar to pm_relax() that
> > will update that counter. I don't think anyone will object to that change.
> >
>
> What about adding a handle that is passed to all three functions?

I don't think that will fly at this point.

> >> >> The proposed in user-space interface
> >> >> of calling into every process that receives wakeup events before every
> >> >> suspend call
> >> >
> >> > Well, you don't really need to do that.
> >> >
> >>
> >> Only if the driver blocks suspend until user-space has read the event.
> >> This means that for android to work we need to block suspend when
> >> input events are not processed, but a system using your scheme needs a
> >> pm_wakeup_event call when the input event is queued. How to you switch
> >> between them? Do we add separate ioctls in the input device to enable
> >> each scheme? If someone has a single threaded user space power manager
> >> that also reads input event it will deadlock if you block suspend
> >> until it reads the input events since you block when reading the wake
> >> count.
> >
> > Well, until someone actually tries to implement a power manager in user space
> > it's a bit vague.
> >
>
> Not having clear rules for what the drivers should do is a problem.
> The comments in your code seem to advocate using timeouts instead of
> overlapping pm_stay_awake/pm_relax sections. I find this
> recommendation strange given all the opposition to
> wakelock/suspendblocker timeouts.

There's no recommendation either way.

> But more importantly, calling
> pm_wakeup_event with a timeout of 0 is incompatible with the android
> user space code,

Which I don't find really relevant, sorry.

> and I would prefer that the kernel interfaces would
> encourage drivers to block suspend until user space has consumed the
> event, which works for the android user space, instead of just long
> enough to work with a hypothetical user space power manager.

Well, that are your personal preferences, which I respect. I also have some
personal preferences that are not necessarily followed by the kernel code.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/