Re: KCSAN: data-race in exit_signals / prepare_signal

From: Marco Elver
Date: Mon Oct 21 2019 - 08:15:50 EST


On Mon, 21 Oct 2019 at 14:00, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
>
> On 10/21, Christian Brauner wrote:
> >
> > This traces back to Oleg fixing a race between a group stop and a thread
> > exiting before it notices that it has a pending signal or is in the middle of
> > do_exit() already, causing group stop to get wacky.
> > The original commit to fix this race is
> > commit d12619b5ff56 ("fix group stop with exit race") which took sighand
> > lock before setting PF_EXITING on the thread.
>
> Not really... sig_task_ignored() didn't check task->flags until the recent
> 33da8e7c81 ("signal: Allow cifs and drbd to receive their terminating signals").
> But I think this doesn't matter, see below.
>
> > If the race really matters and given how tsk->flags is currently accessed
> > everywhere the simple fix for now might be:
> >
> > diff --git a/kernel/signal.c b/kernel/signal.c
> > index c4da1ef56fdf..cf61e044c4cc 100644
> > --- a/kernel/signal.c
> > +++ b/kernel/signal.c
> > @@ -2819,7 +2819,9 @@ void exit_signals(struct task_struct *tsk)
> > cgroup_threadgroup_change_begin(tsk);
> >
> > if (thread_group_empty(tsk) || signal_group_exit(tsk->signal)) {
> > + spin_lock_irq(&tsk->sighand->siglock);
> > tsk->flags |= PF_EXITING;
> > + spin_unlock_irq(&tsk->sighand->siglock);
>
> Well, exit_signals() tries to avoid ->siglock in this case....
>
> But this doesn't matter. IIUC the problem is not that exit_signals() sets
> PF_EXITING, the problem is that it writes to tsk->flags and kasan detects
> the data race.
>
> For example, freezable_schedule() which sets PF_FREEZER_SKIP can equally
> "race" with sig_task_ignored() or with ANY other code which checks this
> task's flags.
>
> I think this is WONTFIX.

If taking the spinlock is unnecessary (which AFAIK it probably is) and
there are no other writers to this flag, you will still need a
WRITE_ONCE(tsk->flags, tsk->flags | PF_EXITING) to avoid the
data-race.

However, if it is possible that there are concurrent writers setting
other bits in flags, you need to ensure that the bits are set
atomically (unless it's ok to lose some bits here). This can only be
done via 1) taking siglock, or 2) via e.g. atomic_or(...), however,
flags is not atomic_t ...

-- Marco