Re: [PATCH][RFC] workqueue: Fix kernel panic on CPU hot-unplug

From: Tejun Heo
Date: Fri Feb 02 2024 - 12:29:31 EST


Hello, Helge.

On Fri, Feb 02, 2024 at 09:41:38AM +0100, Helge Deller wrote:
> In a second step I extended your patch to print the present
> and online CPUs too. Below is the relevant dmesg part.
>
> Note, that on parisc the second CPU will be activated later in the
> boot process, after the kernel has the inventory.
> This I think differs vs x86, where all CPUs are available earlier
> in the boot process.
> ...
> [ 0.000000] XXX workqueue_init_early: possible_cpus=ffff present=0001 online=0001
..
> [ 0.228080] XXX workqueue_init: possible_cpus=ffff present=0001 online=0001
..
> [ 0.263466] XXX workqueue_init_topology: possible_cpus=ffff present=0001 online=0001

So, what's bothersome is that when the wq_dump.py script prints each CPU's
pwq, it only prints for CPUs 0 and 1. The for_each_possible_cpu() drgn
helper reads cpu_possible_mask from the kernel and iterates over it, so this
most likely indicates that at some point cpu_possible_mask becomes 0x3
instead of the 0xffff used during boot, which is problematic.
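
To make the mask arithmetic concrete, here is a minimal standalone sketch
in plain Python (not drgn). The helper name cpus_in_mask() is made up for
illustration. It only decodes an integer mask into the CPU numbers whose
bits are set, which is what iterating over a possible mask boils down to:

```python
def cpus_in_mask(mask):
    """Return the CPU numbers whose bits are set in an integer cpumask value.

    Hypothetical helper for illustration; not part of drgn or wq_dump.py.
    """
    cpus = []
    cpu = 0
    while mask:
        if mask & 1:
            cpus.append(cpu)
        mask >>= 1
        cpu += 1
    return cpus


# Mask printed by the boot-time debug messages.
boot_possible = cpus_in_mask(0xffff)
# Mask that would explain a dump covering only two CPUs.
later_possible = cpus_in_mask(0x3)

print(len(boot_possible), "CPUs at boot:", boot_possible)
print(len(later_possible), "CPUs later:", later_possible)
```

A mask of 0x3 decodes to CPUs 0 and 1 only, which matches what the script
printed, while 0xffff covers CPUs 0 through 15.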

Can you please sprinkle more printks to find out whether and when the
cpu_possible_mask changes during boot?
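
Once the extra printks are in, something like the following could help
spot the transition in a long dmesg. This is only a rough sketch: it
assumes every debug line keeps the "XXX <function>: possible_cpus=<hex>"
shape of the output quoted above, and the names LINE_RE and
possible_mask_changes() are invented here:

```python
import re

# Assumed line shape, taken from the debug output quoted in this thread, e.g.
# "[ 0.228080] XXX workqueue_init: possible_cpus=ffff present=0001 online=0001"
# Commas are accepted in the hex value in case the mask is printed in
# comma-separated 32-bit groups.
LINE_RE = re.compile(
    r"\[\s*(?P<ts>[\d.]+)\]\s+XXX\s+(?P<where>\w+):\s+"
    r"possible_cpus=(?P<possible>[0-9a-fA-F,]+)"
)


def possible_mask_changes(lines):
    """Yield (timestamp, location, old_mask, new_mask) each time the
    possible_cpus value differs from the previous matching line."""
    prev = None
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not one of the debug lines
        mask = int(m.group("possible").replace(",", ""), 16)
        if prev is not None and mask != prev:
            yield (m.group("ts"), m.group("where"), prev, mask)
        prev = mask


# The three samples quoted above; all of them report ffff.
sample = [
    "[ 0.000000] XXX workqueue_init_early: possible_cpus=ffff present=0001 online=0001",
    "[ 0.228080] XXX workqueue_init: possible_cpus=ffff present=0001 online=0001",
    "[ 0.263466] XXX workqueue_init_topology: possible_cpus=ffff present=0001 online=0001",
]

for ts, where, old, new in possible_mask_changes(sample):
    print(f"{ts} {where}: possible_cpus {old:#x} -> {new:#x}")
```

With only the three lines quoted so far it reports nothing, since the mask
is still 0xffff at workqueue_init_topology(). So the change, if there is
one, has to happen after that point.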

Thanks.

--
tejun