Re: [PATCH v3] signal: break out of wait loops on kthread_stop()

From: Jason A. Donenfeld
Date: Mon Jul 11 2022 - 16:18:38 EST

Hi Eric,

Thanks for the review.

On Mon, Jul 11, 2022 at 01:57:55PM -0500, Eric W. Biederman wrote:
> For kthread_stop clearing TIF_NOTIFY_SIGNAL seems pointless as the
> process has exited.

Alright, I'll get rid of the clear there.

> I wonder if we should clear TIF_NOTIFY_SIGNAL in kthread_parkme.
> Ordinarily TIF_NOTIFY_SIGNAL gets cleared when the target process
> calls get_signal. Which doesn't happen for kernel threads.
> So I am not certain what the best pattern is, but my sense is that
> keeping things as close to how TIF_NOTIFY_SIGNAL is processed
> in the rest of the kernel seems the least error prone pattern.
> So I am thinking the code should do something like:
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 544fd4097406..c80e8d44e405 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -266,6 +266,7 @@ static void __kthread_parkme(struct kthread *self)
> * changin from TASK_PARKED and us failing the
> * wait_task_inactive() in kthread_park().
> */
> + clear_notify_signal();
> set_special_state(TASK_PARKED);
> if (!test_bit(KTHREAD_SHOULD_PARK, &self->flags))
> break;

That seems at least semantically correct for how parkers usually work.

> I am trying to think about how things interact if two threads call
> kthread_park. If the target thread is not responsible for clearing
> TIF_NOTIFY_SIGNAL it feels like two threads could get into a race. A
> race where one thread sees the target thread has parked itself right
> after another thread calls kthread_park and sets TIF_NOTIFY_SIGNAL.
> Given the nature of kthread_park that scenario is probably completely
> ridiculous, but it is all I can think of at the moment to demonstrate
> my concerns.

Right, it's a bit of a stretch to consider two threads racing on
kthread_park(), and were it to happen, maybe the right response would
be, "take a mutex so you don't race for that weird case." But it still
seems semantically correct to clear the flag as your diff does.

> In practice kthread_park is pretty specialized. Do you have any
> evidence that we need to break out of interruptible loops to make it
> more reliable. Perhaps it is best just to handle kthread_stop, and
> leave kthread_park for when it actually matters. Which would ensure
> there are examples that people care about to help sort through exactly
> how the code should behave in the kthread_park case.

Oh, okay. I guess I can do that and we'll address the park case sometime
later if it ever comes up? But actually, upon very very cursory look, it
might actually be an issue now... The uses of it currently break down

Core kernel things:
- kernel/stop_machine.c
- kernel/smpboot.c
- kernel/cpu.c

- drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
- drivers/gpu/drm/msm/adreno/adreno_device.c
- drivers/gpu/drm/scheduler/sched_main.c
- drivers/net/wireless/mediatek/mt76/util.h
- drivers/firmware/psci/psci_checker.c
- drivers/md/raid5-cache.c
- fs/btrfs/disk-io.c

btrfs makes calls to _interruptible() functions (though I'm not sure how
the context stack traces work out). The drm scheduler appears to use
wait_event_interruptible() in the first couple lines of the thread's
loop, though it's wait condition does check for kthread_should_stop().

So I don't know...

> Which I guess my long way of saying I think you can just change
> kthread_stop to say:
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 544fd4097406..52e9b3432496 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -704,6 +704,7 @@ int kthread_stop(struct task_struct *k)
> kthread = to_kthread(k);
> set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
> kthread_unpark(k);
> + set_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
> wake_up_process(k);
> wait_for_completion(&kthread->exited);
> ret = kthread->result;

Okay. I'll constrain it to just kthread_stop(). But please file away in
the back of your mind the potential for kthread_park() to be problematic
down the line, in case we have to fix that later.