Re: Memory barrier needed with wake_up_process()?

From: Peter Zijlstra
Date: Fri Sep 02 2016 - 15:20:40 EST


On Fri, Sep 02, 2016 at 02:10:13PM -0400, Alan Stern wrote:
> Paul, Peter, and Ingo:
>
> This must have come up before, but I don't know what was decided.
>
> Isn't it often true that a memory barrier is needed before a call to
> wake_up_process()? A typical scenario might look like this:
>
> CPU 0
> -----
> for (;;) {
>     set_current_state(TASK_INTERRUPTIBLE);
>     if (signal_pending(current))
>         break;
>     if (wakeup_flag)
>         break;
>     schedule();
> }
> __set_current_state(TASK_RUNNING);
> wakeup_flag = 0;
>
>
> CPU 1
> -----
> wakeup_flag = 1;
> wake_up_process(my_task);
>
> The underlying pattern is:
>
> CPU 0                       CPU 1
> -----                       -----
> write current->state        write wakeup_flag
> smp_mb();
> read wakeup_flag            read my_task->state
>
> where set_current_state() does the write to current->state and
> automatically adds the smp_mb(), and wake_up_process() reads
> my_task->state to see whether the task needs to be woken up.
>
> The kerneldoc for wake_up_process() says that it has no implied memory
> barrier if it doesn't actually wake anything up. And even when it
> does, the implied barrier is only smp_wmb, not smp_mb.
>
> This is the so-called SB (Store Buffer) pattern, which is well known to
> require a full smp_mb on both sides. Since wake_up_process() doesn't
> include smp_mb(), isn't it correct that the caller must add it
> explicitly?
>
> In other words, shouldn't the code for CPU 1 really be:
>
> wakeup_flag = 1;
> smp_mb();
> wake_up_process(task);
>

No, it doesn't need to do that. try_to_wake_up() does the right thing.

It does:

smp_mb__before_spinlock();
raw_spin_lock_irqsave(&p->pi_lock, flags);

Now, smp_mb__before_spinlock() is a bit of an odd duck. If you look at
its comment, it says:

/*
 * Despite its name it doesn't necessarily has to be a full barrier.
 * It should only guarantee that a STORE before the critical section
 * can not be reordered with LOADs and STOREs inside this section.
 * spin_lock() is the one-way barrier, this LOAD can not escape out
 * of the region. So the default implementation simply ensures that
 * a STORE can not move into the critical section, smp_wmb() should
 * serialize it with another STORE done by spin_lock().
 */
#ifndef smp_mb__before_spinlock
#define smp_mb__before_spinlock()	smp_wmb()
#endif


So by default it ends up being:

WMB
LOCK

This is sufficient to order the prior store against the later load, as
required. Note that a spinlock acquire _must_ imply a store (we need to
mark the lock as taken), so the wmb orders the prior store against the
lock store. And since taking the lock must imply an ACQUIRE, the later
load cannot move up before the lock is taken.
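
For anyone who wants to see the SB outcome concretely, here is a toy
userspace model. It is only a sketch, not kernel code. It assumes each
CPU has a store buffer, so a store becomes visible to the other CPU only
at a later FLUSH step, and it enumerates every interleaving of the two
CPUs. The names explore() and lost_outcomes() are made up for this
illustration. "Ordered" stands for anything that makes the store visible
before the load is issued, be that smp_mb() or, per the argument above,
WMB + LOCK.

```c
#include <stdbool.h>

/* Steps each modeled CPU takes. FLUSH is when its store reaches memory. */
enum step { STORE, FLUSH, LOAD };

struct cpu {
	enum step seq[3];	/* the order in which this CPU's steps happen */
	int pos;		/* index of the next step to run */
	int loaded;		/* value its LOAD observed */
};

/*
 * mem[0] stands for the sleeper's task state, written by CPU 0.
 * mem[1] stands for wakeup_flag, written by CPU 1.
 * Each CPU loads the variable the other one writes.
 * Returns the number of interleavings in which both loads saw 0,
 * which is the lost-wakeup outcome.
 */
static int explore(const struct cpu c[2], const int mem[2])
{
	int lost = 0;

	if (c[0].pos == 3 && c[1].pos == 3)
		return c[0].loaded == 0 && c[1].loaded == 0;

	for (int i = 0; i < 2; i++) {
		struct cpu nc[2];
		int nmem[2];

		if (c[i].pos == 3)
			continue;	/* this CPU has finished */

		nc[0] = c[0];
		nc[1] = c[1];
		nmem[0] = mem[0];
		nmem[1] = mem[1];

		switch (nc[i].seq[nc[i].pos++]) {
		case STORE:	/* goes to the private store buffer only */
			break;
		case FLUSH:	/* the store becomes globally visible */
			nmem[i] = 1;
			break;
		case LOAD:	/* read the other CPU's variable from memory */
			nc[i].loaded = nmem[1 - i];
			break;
		}
		lost += explore(nc, nmem);
	}
	return lost;
}

int lost_outcomes(bool waker_ordered)
{
	/* CPU 0: set_current_state() supplies the full barrier. */
	struct cpu c[2] = {
		{ { STORE, FLUSH, LOAD }, 0, -1 },
		{ { STORE, FLUSH, LOAD }, 0, -1 },
	};
	const int mem[2] = { 0, 0 };

	if (!waker_ordered) {
		/* Worst case for CPU 1: the store drains after the load. */
		c[1].seq[1] = LOAD;
		c[1].seq[2] = FLUSH;
	}
	return explore(c, mem);
}
```

In this model lost_outcomes(true) is 0, while lost_outcomes(false) is 3:
the three interleavings in which CPU 1 loads the state before CPU 0's
store is flushed and CPU 1's own store drains last.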


Now, PowerPC defines smp_mb__before_spinlock() as smp_mb(), because the
PowerPC ACQUIRE is a bit of an exception. If you want more details, I'm
sure Paul or I can dredge them up :-)
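
The override mechanism is just the #ifndef default from the comment
quoted earlier. Below is a minimal userspace sketch of it. It assumes
C11 fences as rough stand-ins for the kernel barriers (they are not
exact equivalents). DEMO_ARCH_FULL_BARRIER, the two counters and
demo_wake_path() are hypothetical names used only for this demo.

```c
#include <stdatomic.h>

/* Demo-only counters recording which barrier flavour was executed. */
int full_barriers;
int write_barriers;

/* Rough userspace stand-ins; the kernel's real barriers are arch-specific. */
static inline void smp_mb(void)
{
	atomic_thread_fence(memory_order_seq_cst);
	full_barriers++;
}

static inline void smp_wmb(void)
{
	atomic_thread_fence(memory_order_release);
	write_barriers++;
}

/* An arch header comes first and may supply a stronger definition. */
#ifdef DEMO_ARCH_FULL_BARRIER		/* hypothetical build switch */
#define smp_mb__before_spinlock()	smp_mb()
#endif

/* Generic fallback, mirroring the default quoted earlier. */
#ifndef smp_mb__before_spinlock
#define smp_mb__before_spinlock()	smp_wmb()
#endif

/* Shape of the wake-up path: barrier first, then the lock would be taken. */
void demo_wake_path(void)
{
	smp_mb__before_spinlock();
	/* raw_spin_lock_irqsave() would follow here in the kernel */
}
```

Built without the switch, demo_wake_path() runs the smp_wmb() stand-in.
Built with -DDEMO_ARCH_FULL_BARRIER it runs the smp_mb() one, which
corresponds to the PowerPC choice.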