Re: [PATCH] kernel: fix data race in put_pid

From: Peter Zijlstra
Date: Fri Sep 18 2015 - 09:52:09 EST


On Fri, Sep 18, 2015 at 03:28:44PM +0200, Oleg Nesterov wrote:
> On 09/18, Peter Zijlstra wrote:
> >
> > On Thu, Sep 17, 2015 at 08:09:19PM +0200, Oleg Nesterov wrote:
> >
> > > I need to recheck, but afaics this is not possible. This optimization
> > > is fine, but probably needs a comment.
> >
> > For sure, this code doesn't make any sense to me.
>
> So yes, after a sleep I am starting to agree that in theory this fast-path
> check is wrong. I'll write another email..

This other mail will include a patch adding comments to pid.c ? That
code didn't want to make sense to me this morning.

> > As an alternative patch, could we not do:
> >
> > void put_pid(struct pid *pid)
> > {
> > struct pid_namespace *ns;
> >
> > if (!pid)
> > return;
> >
> > ns = pid->numbers[pid->level].ns;
> > if ((atomic_read(&pid->count) == 1) ||
> > atomic_dec_and_test(&pid->count)) {
> >
> > + smp_read_barrier_depends(); /* ctrl-dep */
>
> Not sure... Firstly it is not clear what this barrier pairs with. And I
> have to admit that I can not understand if _CTRL() logic applies here.
> The same for atomic_read_ctrl().

The control dependency barrier pairs with the full barrier of
atomic_dec_and_test.

So the two put_pid() instances:

CPU0 CPU1

pid->foo = 1;
atomic_dec_and_test() == false atomic_read_ctrl() == 1
kmem_cache_free(pid)

CPU0 will modify a pid field and decrement, but not reach 0.
CPU1 finds we're the last, but must also be able to observe our foo
store such that we can rest assured it is complete before we free the
storage.

The freeing of pid, on CPU1, is stores, these must not happen before we
satisfy the freeing condition, iow a load-store barrier, which is what
the control dependency provides.

> OK, please forget about put_pid() for the moment. Suppose we have
>
> X = 1;
> synchronize_sched();
> Y = 1;
>
> Or
> X = 1;
> call_rcu_sched( func => { Y = 1; } );
>
>
>
> Now. In theory this this code is wrong:
>
> if (Y) {
> BUG_ON(X == 0);
> }
>
> But this is correct:
>
> if (Y) {
> rcu_read_lock_sched();
> rcu_read_unlock_sched();
> BUG_ON(X == 0);
> }
>
> So perhaps something like this
>
> /*
> * Comment to explain it is eq to read_lock + read_unlock,
> * in a sense that this guarantees a full barrier wrt to
> * the previous synchronize_sched().
> */
> #define rcu_read_barrier_sched() barrier()
>
> make sense?
>
>
> And again, I simply can't understand if this code
>
> if (READ_ONCE_CTRL(Y))
> BUG_ON(X == 0);
>
> to me it does _not_ look correct in theory.

So control dependencies provide a load-store barrier. Your examples
above rely on a load-load barrier; BUG_ON(X == 0) is a load.

kmem_cache_free() OTOH is stores (we must modify the free list).
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/