Re: [PATCH] kernel: fix data race in put_pid

From: Peter Zijlstra
Date: Fri Sep 18 2015 - 04:57:32 EST

Next message: Charles Keepax: "Re: [PATCH 1/2] mfd: Fixup clients of multi_reg_write/register_patch"
Previous message: Chao Yu: "[PATCH 3/3] f2fs: disallow switch extent_cache option dynamically"
In reply to: Dmitry Vyukov: "Re: [PATCH] kernel: fix data race in put_pid"
Next in thread: Peter Zijlstra: "Re: [PATCH] kernel: fix data race in put_pid"

On Thu, Sep 17, 2015 at 08:09:19PM +0200, Oleg Nesterov wrote:
> On 09/17, Dmitry Vyukov wrote:
> >
> > I can update the patch description, but let me explain it here first.
>
> Yes thanks.
>
> > Here is the essence of what happens:
>
> Aha, so you really meant that 2 put_pid's can race with each other,
>
> > // thread 1
> > 1: pid->foo = 1; // foo is the first word of pid object
> > // then it does put_pid
> > 2: atomic_dec_and_test(&pid->count) // decrements count to 1 and returns false, so the function returns
> >
> > // thread 2
> > // executes put_pid
> > 3: atomic_load(&pid->count); // returns 1, so proceed to kmem_cache_free
> > // then kmem_cache_free does:
> > 4: *(void**)pid = head->freelist;
> > 5: head->freelist = (void*)pid;
> >
> > This can be executed as:
> >
> > 4: *(void**)pid = head->freelist;
> > 1: pid->foo = 1; // foo is the first word of pid object
> > 2: atomic_dec_and_test(&pid->count) // decrements count to 1 and returns false, so the function returns
> > 3: atomic_load(&pid->count); // returns 1, so proceed to kmem_cache_free
> > 5: head->freelist = (void*)pid;
>
> Unless I am totally confused, everything is simpler. We can forget
> about the hoisting, freelist, etc.
>
> Thread 2 can see the result of atomic_dec_and_test(), but not the
> result of "pid->foo = 1". In this case it can free the object which
> can be re-allocated _before_ STORE(pid->foo) completes. Of course,
> this would be really bad.
>
> I need to recheck, but afaics this is not possible. This optimization
> is fine, but probably needs a comment.

For sure, this code doesn't make any sense to me.
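
To make Dmitry's reordered sequence (4, 1, 2, 3, 5) concrete, here is a
single-threaded replay of it. This is only a sketch: struct toy_pid,
struct toy_cache and replay_reordered() are made-up names, the freelist
is a toy, and nothing here claims the reordering can actually happen on
real hardware. It just shows what the object looks like if it does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy object: the first word doubles as the freelist link. */
struct toy_pid {
	uintptr_t foo;	/* first word of the object */
	int count;	/* plain int, this replay is single-threaded */
};

/* Hypothetical toy allocator cache with a singly linked freelist. */
struct toy_cache {
	void *freelist;
};

/*
 * Replay steps 4, 1, 2, 3, 5 in the reordered sequence and return the
 * link word left in the freed object.
 */
static uintptr_t replay_reordered(struct toy_cache *head, struct toy_pid *pid)
{
	/* 4: the free path's link store, hoisted above the count check */
	pid->foo = (uintptr_t)head->freelist;

	/* 1: thread 1's late store to the first word */
	pid->foo = 1;

	/* 2: thread 1 drops its reference, count goes 2 -> 1 */
	pid->count--;

	/* 3: thread 2 sees count == 1 and proceeds with the free */
	if (pid->count == 1) {
		/* 5: the object is published on the freelist */
		head->freelist = pid;
	}

	return pid->foo;
}
```

With a previous freelist entry in place, the object ends up on the
freelist with its link word set to 1 instead of the old head, so the
chain behind it is lost.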

> We rely on delayed_put_pid()
> called by RCU. And note that nobody can write to this pid after it
> is removed from the rcu-protected list.
>
> So I think this is false alarm, but I'll try to recheck tomorrow, it
> is too late for me today.

As an alternative patch, could we not do:

void put_pid(struct pid *pid)
{
	struct pid_namespace *ns;

	if (!pid)
		return;

	ns = pid->numbers[pid->level].ns;
	if ((atomic_read(&pid->count) == 1) ||
	     atomic_dec_and_test(&pid->count)) {

+		smp_read_barrier_depends(); /* ctrl-dep */

		kmem_cache_free(ns->pid_cachep, pid);
		put_pid_ns(ns);
	}
}

That would upgrade the atomic_read() path to a full READ_ONCE_CTRL(),
and thereby avoid any of the kmem_cache_free() stores from leaking out.
And it's free, except on Alpha. Whereas the atomic_read_acquire() will
generate a full memory barrier on a whole bunch of archs.
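
For anyone who wants to poke at the ordering outside the kernel, a
user-space C11 sketch of the same put path follows. C11 gives no
ordering guarantee for control dependencies, so this models the
atomic_read_acquire() variant, not the smp_read_barrier_depends() one.
struct obj and obj_put() are made-up names, and the kernel's
atomic_dec_and_test() is fully ordered, which acq_rel here only
approximates.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct pid: a payload word plus a refcount. */
struct obj {
	int foo;
	atomic_int count;
};

/*
 * Drop one reference. Returns true when the caller held the last
 * reference and may free the object.
 *
 * As in put_pid(), the fast path skips the atomic RMW when we are the
 * only holder, so count stays at 1 in that case.
 */
static bool obj_put(struct obj *o)
{
	/* acquire: the caller's free cannot move before this load */
	if (atomic_load_explicit(&o->count, memory_order_acquire) == 1)
		return true;

	/* acq_rel: publish our earlier stores, observe everyone else's */
	return atomic_fetch_sub_explicit(&o->count, 1,
					 memory_order_acq_rel) == 1;
}
```

Starting from a count of 2, the first obj_put() returns false and leaves
the count at 1; the second takes the fast path and returns true without
touching the count.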