Re: [PATCH v6] taskstats: fix data-race

From: Dmitry Vyukov
Date: Wed Oct 23 2019 - 08:40:10 EST


On Wed, Oct 23, 2019 at 2:16 PM Andrea Parri <parri.andrea@xxxxxxxxx> wrote:
>
> On Mon, Oct 21, 2019 at 01:33:27PM +0200, Christian Brauner wrote:
> > When assigning and testing taskstats in taskstats_exit() there's a race
> > when writing and reading sig->stats when a thread-group with more than
> > one thread exits:
> >
> > cpu0:
> > thread catches fatal signal and whole thread-group gets taken down
> > do_exit()
> > do_group_exit()
> > taskstats_exit()
> > taskstats_tgid_alloc()
> > The task reads sig->stats without holding sighand lock.
> >
> > cpu1:
> > task calls exit_group()
> > do_exit()
> > do_group_exit()
> > taskstats_exit()
> > taskstats_tgid_alloc()
> > The task takes sighand lock and assigns new stats to sig->stats.
> >
> > The first approach used smp_load_acquire() and smp_store_release().
> > However, after having discussed this it seems that the data dependency
> > for kmem_cache_alloc() would be fixed by WRITE_ONCE().
> > Furthermore, the smp_load_acquire() would only manage to order the stats
> > check before the thread_group_empty() check. So it seems just using
> > READ_ONCE() and WRITE_ONCE() will do the job and I wanted to bring this
> > up for discussion at least.
>
> Mmh, the RELEASE was intended to order the memory initialization in
> kmem_cache_zalloc() with the later ->stats pointer assignment; AFAICT,
> there is no data dependency between such memory accesses.

I agree. This needs smp_store_release. The latest version that I
looked at contained:
    smp_store_release(&sig->stats, stats_new);
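
To make the ordering concrete, here is a minimal userspace sketch of the
publish pattern using C11 atomics. It is only an analogue, not the kernel
code: struct stats, shared_stats, publish_stats(), read_version() and
demo_publish() are hypothetical names, calloc() stands in for
kmem_cache_zalloc(), and the memory_order_release store stands in for
smp_store_release().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct taskstats; not the kernel definition. */
struct stats {
	int version;
	long counter;
};

/* Hypothetical stand-in for sig->stats; starts out NULL. */
static _Atomic(struct stats *) shared_stats;

/* Writer: fully initialize the object, then publish the pointer. */
static struct stats *publish_stats(void)
{
	struct stats *new_stats = calloc(1, sizeof(*new_stats));

	if (!new_stats)
		return NULL;
	new_stats->version = 9;

	/*
	 * Release ordering: the zeroing done by calloc() and the version
	 * store above cannot be reordered after this pointer store.
	 * There is no data dependency that would give this ordering for free.
	 */
	atomic_store_explicit(&shared_stats, new_stats, memory_order_release);
	return new_stats;
}

/* Reader: returns the version, or -1 if nothing has been published yet. */
static int read_version(void)
{
	struct stats *s = atomic_load_explicit(&shared_stats,
					       memory_order_acquire);

	return s ? s->version : -1;
}

/*
 * Single-threaded smoke test. In real use the writer and the reader
 * run on different threads; the allocation is intentionally not freed.
 */
static int demo_publish(void)
{
	assert(read_version() == -1);
	if (!publish_stats())
		return -1;
	return read_version();
}
```

With a plain (or WRITE_ONCE-style) store instead of the release store, a
reader on another CPU could observe the new pointer before the
initialization of the object it points to.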

> Correspondingly, the ACQUIRE was intended to order the ->stats pointer
> load with later, _independent_ dereferences of the same pointer; the
> latter are, e.g., in taskstats_exit() (but not thread_group_empty()).

How can these later loads be completely independent of the pointer
value? They need to obtain the pointer value from somewhere, and this
can only be done by loading it. If a thread loads a pointer and then
dereferences that pointer, that's a data/address dependency, and we
assume this is now covered by READ_ONCE.
Or can these later loads of the pointer also race with the store? If
so, I think they also need to use READ_ONCE (rather than turning this
earlier pointer load into an acquire).
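
A minimal userspace sketch of the dependent-load case, again with C11
atomics and hypothetical names (struct stats, shared_stats,
read_counter(), demo_dependency()). memory_order_consume is the closest
C11 analogue of READ_ONCE followed by a dependent dereference; current
compilers implement it as acquire, so this shows the shape of the code
rather than the exact kernel semantics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct taskstats. */
struct stats {
	long counter;
};

/* Hypothetical stand-in for sig->stats; starts out NULL. */
static _Atomic(struct stats *) shared_stats;

/*
 * Load the shared pointer exactly once into a local, then dereference
 * only that local. The address of s->counter is computed from the
 * loaded value, so the dereference carries an address dependency on
 * the load. Re-reading shared_stats later would be a separate racy
 * load and would need its own marked access.
 */
static long read_counter(void)
{
	struct stats *s = atomic_load_explicit(&shared_stats,
					       memory_order_consume);

	if (!s)
		return -1;
	return s->counter;
}

/* Single-threaded smoke test; the writer side uses a release store. */
static long demo_dependency(void)
{
	static struct stats one = { .counter = 42 };

	assert(read_counter() == -1);
	atomic_store_explicit(&shared_stats, &one, memory_order_release);
	return read_counter();
}
```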


> Looking again, I see that fill_tgid_exit()'s dereferences of ->stats
> are protected by ->siglock: maybe you meant to rely on such a critical
> section pairing with the critical section in taskstats_tgid_alloc()?
>
> That memcpy(-, tsk->signal->stats, -) at the end of taskstats_exit()
> also bugs me: could these dereferences of ->stats happen concurrently
> with other stores to the same memory locations?
>
> Thanks,
> Andrea