Re: [PATCH 1/4] workqueue: Reap workers via kthread_stop() and remove detach_completion

From: Lai Jiangshan
Date: Wed Jul 24 2024 - 20:11:57 EST


Hello Marc

Thank you for the report.

On Wed, Jul 24, 2024 at 12:19 AM Marc Hartmayer <mhartmay@xxxxxxxxxxxxx> wrote:

> Hi Lai,
>
> a bisect of a regression in our CI on s390x led to this patch. The bug
> is pretty easy to reproduce (currently, I only tested it on s390x - will
> try to test it on x86 as well):

I couldn't reproduce it on x86 after testing for only 30 minutes,
but it can certainly happen on x86 in theory.

>
> 1. Start a Linux QEMU/KVM guest with 2 cores using this patch and enable
> `panic_on_warn=1` for the guest kernel.
> 2. Run the following command in the KVM guest:
>
> $ dd if=/dev/zero of=/dev/null & while : ; do chcpu -d 1; chcpu -e 1; done
>
> 3. Wait for the crash. e.g.:
>
> 2024/07/23 18:01:21 [M83LP63]: [ 157.267727] ------------[ cut here ]------------
> 2024/07/23 18:01:21 [M83LP63]: [ 157.267735] WARNING: CPU: 21 PID: 725 at kernel/workqueue.c:3340 worker_thread+0x54e/0x558


> @@ -3330,7 +3338,6 @@ static int worker_thread(void *__worker)
> ida_free(&pool->worker_ida, worker->id);
> worker_detach_from_pool(worker);
> WARN_ON_ONCE(!list_empty(&worker->entry));
> - kfree(worker);
> return 0;
> }

After this change, the condition "!list_empty(&worker->entry)" can
legitimately be true: a dying worker may reach this check while it is
still on the cull_list, waiting to be reaped by reap_dying_workers(),
so the warning fires on a valid state.
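A minimal userspace model of that ordering, in C and for illustration
only: the list helpers below are simplified re-implementations of the
kernel's list_head API, and struct worker and exit_check_fires() are
hypothetical stand-ins, not workqueue code. The model assumes the only
thing that matters is whether the worker's exit-path check runs before
or after the reaper unlinks it from the cull list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

/* Insert e right after head (same argument order as the kernel's list_add). */
static void list_add(struct list_head *e, struct list_head *head)
{
	e->next = head->next;
	e->prev = head;
	head->next->prev = e;
	head->next = e;
}

/* Unlink e and make it point to itself, so list_empty(e) becomes true. */
static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

/* Hypothetical worker model: only the field the WARN_ON_ONCE() looks at. */
struct worker {
	struct list_head entry;
};

/*
 * Returns true if "!list_empty(&worker->entry)" would be true at the
 * worker's exit-path check, for the given ordering of worker vs. reaper.
 */
static bool exit_check_fires(bool worker_exits_first)
{
	struct list_head cull_list;
	struct worker *w = malloc(sizeof(*w));
	bool fired;

	if (!w)
		abort();
	INIT_LIST_HEAD(&cull_list);
	INIT_LIST_HEAD(&w->entry);
	list_add(&w->entry, &cull_list);	/* worker is dying, queued for reaping */

	if (worker_exits_first) {
		fired = !list_empty(&w->entry);	/* exit path runs while still queued */
		list_del_init(&w->entry);	/* reaper unlinks it afterwards */
	} else {
		list_del_init(&w->entry);	/* reaper unlinks it first */
		fired = !list_empty(&w->entry);
	}

	assert(list_empty(&cull_list));	/* either way, the reaper empties the list */
	free(w);			/* the reaper, not the worker, frees it */
	return fired;
}
```

In this model exit_check_fires(true), the ordering described above,
returns true, while exit_check_fires(false) returns false: the check
depends only on timing, not on a real bug.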

I will remove the WARN_ON_ONCE().

Thanks
Lai