Re: [PATCH] RFC: console: hack up console_lock more v3
From: Peter Zijlstra
Date: Thu May 09 2019 - 09:38:38 EST
On Thu, May 09, 2019 at 03:06:09PM +0200, Daniel Vetter wrote:
> On Thu, May 9, 2019 at 2:31 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > On Thu, May 09, 2019 at 02:09:03PM +0200, Daniel Vetter wrote:
> > > Fix this by creating a printk_safe_up() which calls wake_up_process
> > > outside of the spinlock. This isn't correct in full generality, but
> > > good enough for console_lock:
> > >
> > > - console_lock doesn't use interruptible or killable or timeout down()
> > > calls, hence an up() is the only thing that can wake up a process.
> >
> > Wrong :/ Any task can be woken at any random time. We must, at all
> > times, assume spurious wakeups will happen.
>
> Out of curiosity, where do these come from? I know about the races
> where you need to recheck on the waiter side to avoid getting stuck,
> but didn't know about this. Are these earlier (possibly spurious)
> wakeups that got held up and delayed for a while, then hit the task
> much later when it's already continued doing something else?
Yes, this. So they all more or less have the form:

        CPU0                    CPU1

        enqueue_waiter()
                                done = true;
                                if (waiters)
        for (;;) {
          if (done)
            break;

          ...
        }
        dequeue_waiter()

        do something else again

                                  wake_up_task
        <gets wakeup>

The wake_q thing made the above much more common, but we've had it
forever.
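
The practical consequence is that a waiter has to treat every wakeup as a
hint and recheck its condition. A minimal userspace sketch of that rule,
assuming POSIX threads stand in for kernel tasks (the names done, waiter()
and run_waiter_demo() are made up for illustration and are not from the
patch):

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool done;

/* Waiter: recheck 'done' after every wakeup, whatever caused it. */
static void *waiter(void *arg)
{
	int *wakeups = arg;

	pthread_mutex_lock(&lock);
	while (!done) {
		pthread_cond_wait(&cond, &lock);
		(*wakeups)++;	/* may count wakeups that arrive while !done */
	}
	assert(done);		/* the loop only exits once the condition holds */
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* Hypothetical driver: one stray signal, then the real one. */
static bool run_waiter_demo(int *wakeups)
{
	pthread_t t;

	*wakeups = 0;
	done = false;
	if (pthread_create(&t, NULL, waiter, wakeups))
		return false;

	/*
	 * Stray signal with the condition still false. If the waiter is
	 * already sleeping it wakes, sees !done and goes back to sleep.
	 */
	pthread_mutex_lock(&lock);
	pthread_cond_signal(&cond);
	pthread_mutex_unlock(&lock);

	/* The real wakeup: set the condition, then signal. */
	pthread_mutex_lock(&lock);
	done = true;
	pthread_cond_signal(&cond);
	pthread_mutex_unlock(&lock);

	pthread_join(t, NULL);
	return done;
}
```

How many wakeups the waiter sees depends on scheduling, so the sketch only
relies on the loop exiting with done set.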
> Or even
> more random, and even if I never put a task on a wait list or anything
> else, ever, it can get woken spuriously?
I had patches that did that on purpose, but no.
> > Something like the below might work.
>
> Yeah that looks like the proper fix. I guess semaphores are uncritical
> enough that we can roll this out for everyone. Thanks for the hint.
It's actually an optimization that we never did because semaphores are
so uncritical :-)
The thing is, by delaying the wakeup until after we've released the
spinlock, the waiter will not contend on the spinlock the moment it
wakes.
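
A rough userspace sketch of that ordering, assuming a pthread mutex plays
the role of the semaphore's spinlock and a condition variable plays the role
of wake_up_process(). All names here (struct sem_like, sem_like_up(), ...)
are hypothetical; this is not the kernel patch:

```c
#include <assert.h>
#include <pthread.h>

struct sem_like {
	pthread_mutex_t lock;
	pthread_cond_t wake;
	unsigned int count;
};

static void sem_like_down(struct sem_like *s)
{
	pthread_mutex_lock(&s->lock);
	/* Loop: a wakeup alone does not prove count is non-zero. */
	while (s->count == 0)
		pthread_cond_wait(&s->wake, &s->lock);
	s->count--;
	pthread_mutex_unlock(&s->lock);
}

static void sem_like_up(struct sem_like *s)
{
	pthread_mutex_lock(&s->lock);
	s->count++;
	pthread_mutex_unlock(&s->lock);
	/*
	 * Wake only after dropping the lock, so the woken thread does not
	 * immediately block on a lock the waker still holds.
	 */
	pthread_cond_signal(&s->wake);
}

static void *downer(void *arg)
{
	sem_like_down(arg);
	return NULL;
}

/* Hypothetical driver: one down() racing with one up(). */
static unsigned int run_sem_demo(void)
{
	static struct sem_like s = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.wake = PTHREAD_COND_INITIALIZER,
		.count = 0,
	};
	pthread_t t;
	int ret = pthread_create(&t, NULL, downer, &s);

	assert(ret == 0);
	sem_like_up(&s);
	pthread_join(t, NULL);
	return s.count;
}
```

This is only safe because sem_like_down() rechecks count under the lock; a
waiter that trusted the wakeup itself would break with this ordering.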