Re: [syzbot] [kernel?] WARNING in signal_wake_up_state
From: Linus Torvalds
Date: Tue Jan 09 2024 - 14:06:12 EST
Oleg/Eric, can you make any sense of this?
On Tue, 9 Jan 2024 at 10:18, syzbot
<syzbot+c6d438f2d77f96cae7c2@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> The issue was bisected to:
>
> commit f9010dbdce911ee1f1af1398a24b1f9f992e0080
Hmm. This smells more like a "that triggers the problem" than a cause.
Because the warning itself is
> WARNING: CPU: 1 PID: 5069 at kernel/signal.c:771 signal_wake_up_state+0xfa/0x120 kernel/signal.c:771
That's
    lockdep_assert_held(&t->sighand->siglock);
at the top of the function, with the call trace being
> signal_wake_up include/linux/sched/signal.h:448 [inline]
just a wrapper setting 'state'.
> zap_process fs/coredump.c:373 [inline]
That's zap_process() that does a
    for_each_thread(start, t) {
and then does a
    signal_wake_up(t, 1);
on each thread.
> zap_threads fs/coredump.c:392 [inline]
And this is zap_threads(), which does
    spin_lock_irq(&tsk->sighand->siglock);
    ...
    nr = zap_process(tsk, exit_code);
Strange. The sighand->siglock is definitely taken.
The for_each_thread() must be hitting a thread with a different
sighand, but it's basically a
    list_for_each_entry_rcu(..)
walking over the tsk->signal->thread_head list.
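To make the suspected failure mode concrete, here is a small userspace model, not kernel code. The `model_*` types and `model_zap_process()` are hypothetical names invented for this sketch. It walks a thread list the way zap_process() does and counts the threads whose own sighand lock is not held, which is roughly the condition lockdep_assert_held() would complain about.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sighand_struct: only tracks lock state. */
struct model_sighand {
    bool siglock_held;
};

/* Hypothetical stand-in for struct task_struct. */
struct model_task {
    struct model_sighand *sighand;
    struct model_task *next;    /* next thread in the group, NULL at the end */
};

/*
 * Walk every thread starting at 'start' and count those whose sighand
 * lock is not held. In this model the caller "takes the lock" by setting
 * siglock_held on start->sighand before calling.
 */
static int model_zap_process(struct model_task *start)
{
    int violations = 0;

    for (struct model_task *t = start; t != NULL; t = t->next) {
        if (t->sighand == NULL || !t->sighand->siglock_held)
            violations++;
    }
    return violations;
}
```

With every thread pointing at the same locked sighand the count is zero. It only becomes non-zero if some thread on the list points at a different sighand, which is the case that should not be possible.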
But if CLONE_THREAD is set (so that we share that 'tsk->signal'), then
we always require that CLONE_SIGHAND is also set:
    if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
        return ERR_PTR(-EINVAL);
so we most definitely should have the same ->sighand if we have the
same ->signal. And that's true very much for that vhost_task_create()
case too.
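The flag dependency can be sketched as a standalone userspace check. This is a model only: `model_check_clone_flags()` and the `MODEL_` constants are hypothetical names for this sketch, with the flag values copied from the Linux UAPI headers. The second test (CLONE_SIGHAND requiring CLONE_VM) is not quoted in this mail and is included on the assumption that it matches the neighbouring check in copy_process().

```c
#include <errno.h>

/* Values mirror CLONE_VM, CLONE_SIGHAND and CLONE_THREAD from the UAPI
 * headers; the MODEL_ prefix marks them as local to this sketch. */
#define MODEL_CLONE_VM      0x00000100UL
#define MODEL_CLONE_SIGHAND 0x00000800UL
#define MODEL_CLONE_THREAD  0x00010000UL

/*
 * Hypothetical helper: returns 0 if the flag combination is accepted,
 * -EINVAL otherwise.
 */
static int model_check_clone_flags(unsigned long clone_flags)
{
    /* Sharing ->signal (CLONE_THREAD) requires sharing ->sighand. */
    if ((clone_flags & MODEL_CLONE_THREAD) &&
        !(clone_flags & MODEL_CLONE_SIGHAND))
        return -EINVAL;

    /* Assumption: sharing ->sighand in turn requires a shared VM. */
    if ((clone_flags & MODEL_CLONE_SIGHAND) &&
        !(clone_flags & MODEL_CLONE_VM))
        return -EINVAL;

    return 0;
}
```

So in this model CLONE_THREAD alone is rejected, and a thread can only join the group with CLONE_THREAD | CLONE_SIGHAND | CLONE_VM, which is why the same ->signal should imply the same ->sighand.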
So as far as I can see, that bisected commit does add a new case of
threaded signal handling, but in no way explains the problem.
Is there some odd exit race? The thread is removed with
    list_del_rcu(&p->thread_node);
in __exit_signal -> __unhash_process(), and despite the RCU
annotations, all these parts seem to hold the right locks too (ie
sighand->siglock is held by __exit_signal too), so I don't even see
any delayed de-allocation issue or anything like that.
Thus bringing in Eric/Oleg to see if they see something I miss.
Original email at
https://lore.kernel.org/all/000000000000a41b82060e875721@xxxxxxxxxx/
for your pleasure.
Linus