Re: [PATCH 1/2] sched/swait: allow swake_up() to return

From: Peter Xu
Date: Fri Nov 10 2017 - 02:10:25 EST


On Thu, Nov 09, 2017 at 11:23:03AM +0100, Peter Zijlstra wrote:
> On Thu, Nov 09, 2017 at 05:18:53PM +0800, Peter Xu wrote:
> > Let swake_up() to return whether any of the waiters is waked up. One use
> > case of it would be:
> >
> > 	if (swait_active(wq)) {
> > 		swake_up(wq);
> > 		// do something when waiter is waked up
> > 		waked_up++;
> > 	}
>
> The word is 'woken', and no that doesn't work. All it says is that there
> was a waiter, not that you were the one to wake it. Another concurrent
> wakeup might have done so.

Yes. Or, IIUC, the waiter could call finish_swait() and remove itself
from the list before being woken.
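
To make that ordering concrete, here is a rough userspace sketch (not
kernel code; every toy_* name is made up for illustration, and the
"queue" is just a counter). It replays one unlucky interleaving by hand
and shows the statistic being bumped even though nobody was woken:

```c
#include <stdbool.h>

/* Hypothetical stand-in for a wait queue: just counts queued waiters. */
struct toy_wq {
	int nr_waiters;
};

/* Unlocked peek, playing the role of swait_active(). */
static bool toy_active(const struct toy_wq *q)
{
	return q->nr_waiters > 0;
}

/* Waiter gives up and dequeues itself, as finish_swait() would. */
static void toy_finish_wait(struct toy_wq *q)
{
	if (q->nr_waiters > 0)
		q->nr_waiters--;
}

/* Wakes at most one waiter; reports whether it found one. */
static bool toy_wake_up(struct toy_wq *q)
{
	if (q->nr_waiters == 0)
		return false;
	q->nr_waiters--;
	return true;
}

/*
 * Replays one ordering by hand: the waker sees a waiter, the waiter
 * leaves, then the waker wakes nobody. Returns the statistic as the
 * check-then-wake pattern would count it.
 */
static int toy_replay_race(bool *really_woken)
{
	struct toy_wq q = { .nr_waiters = 1 };
	int waked_up = 0;

	if (toy_active(&q)) {			/* waker: check passes */
		toy_finish_wait(&q);		/* waiter: leaves in between */
		*really_woken = toy_wake_up(&q);	/* waker: nothing to wake */
		waked_up++;			/* counted anyway */
	}
	return waked_up;
}
```

In this replay the counter ends up at 1 while *really_woken is false,
which is the kind of inaccuracy I was worried about.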

>
> >
> > Logically it's possible that when reaching swake_up() the wait queue is
> > not active any more, and here doing something like waked_up++ would be
> > inaccurate. To correct it, we need an atomic version of it.
> >
> > With this patch, we can simply re-write it into:
> >
> > 	if (swake_up(wq)) {
> > 		// do something when waiter is waked up
> > 		waked_up++;
> > 	}
> >
> > After all we are checking swait_active() inside swake_up() too.
>
> We're not in fact; you've been staring at old code; see commit:
>
> 35a2897c2a30 ("sched/wait: Remove the lockless swait_active() check in swake_up*()")

I thought the tree was new enough, but obviously I was wrong...
Thanks for the pointer.

>
>
> Also, you're changing the interface relative to the regular wait
> interface. The two should be similar wherever possible.

Indeed.

I came across this when reading kvm_vcpu_wake_up(), where it only
affects a statistic, which may not be that critical. However, I don't
know whether there is any other real use case where we would like to
know exactly whether a call to [s]wake_up() really did something or
just returned as a NOP.

Anyway, please let me know if you think the same change to wake_up()
would be meaningful; otherwise I can drop this patch and post another
KVM-only one to clean up the redundant callers of swait_active(),
since even though the list check was dropped from swake_up() in
35a2897c2a30, swake_up_locked() still does it.
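
As a rough illustration of why the outer check looks redundant, here is
a userspace model using a pthread mutex (an assumption-laden sketch,
not the kernel implementation; the model_* names are hypothetical and
a counter stands in for the waiter list). The emptiness check happens
under the lock, so the caller can simply use the return value:

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a simple wait queue. */
struct model_wq {
	pthread_mutex_t lock;
	int nr_waiters;		/* stands in for the waiter list */
};

/*
 * Caller must hold q->lock. Checks for emptiness itself, in the same
 * spirit as the list check in swake_up_locked().
 */
static bool model_wake_up_locked(struct model_wq *q)
{
	if (q->nr_waiters == 0)
		return false;	/* nobody queued, nothing done */
	q->nr_waiters--;	/* "wake" exactly one waiter */
	return true;
}

/* Takes the lock, so the check and the wake-up cannot be separated. */
static bool model_wake_up(struct model_wq *q)
{
	bool woken;

	pthread_mutex_lock(&q->lock);
	woken = model_wake_up_locked(q);
	pthread_mutex_unlock(&q->lock);
	return woken;
}
```

With one waiter queued, the first model_wake_up() call returns true and
a second one returns false, so a caller counting wake-ups from the
return value would count exactly one.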

And now that I know about 35a2897c2a30, I do think that calling
swait_active() before swake_up() is not good, since that call is made
without the lock as well, just like the check that existed before
35a2897c2a30.

(I am not 100% sure whether I fully understand the problem mentioned
in 35a2897c2a30, but I think it's the memory barrier in the
lock/unlock that matters.)

Thanks,

--
Peter Xu