Re: [PATCH] watchdog/softlockup: Fix softlockup_stop_all() hungtask bug

From: Petr Mladek
Date: Wed Sep 22 2021 - 10:32:11 EST


On Wed 2021-09-22 17:52:02, Jinhui Guo wrote:
> If NR_CPUS equal to 1, it would trigger hungtask, it can be
> triggered by follow command:
> echo 0 > /proc/sys/kernel/watchdog
> echo 1 > /proc/sys/kernel/watchdog
> The hungtask stack:
> __schedule
> schedule
> schedule_timeout
> __wait_for_common
> softlockup_stop_fn
> lockup_detector_reconfigure
> proc_watchdog_common
> proc_watchdog
> proc_sys_call_handler
> vfs_write
> ksys_write
> The watchdog_allowed_mask is completely cleared when the
> watchdog is disabled. But the macro for_each_cpu() assume
> all masks are "1" when macro NR_CPUS equal to 1. It makes
> watchdog_allowed_mask not work at all.
>
> Fixes: be45bf5395e0 ("watchdog/softlockup: Fix cpu_stop_queue_work() double-queue bug")
>
> Cc: Petr Mladek <pmladek@xxxxxxxx>
> Reported-by: kernel test robot <lkp@xxxxxxxxx>
> Signed-off-by: Jinhui Guo <guojinhui@xxxxxxxxxx>
> ---
> arch/x86/include/asm/smp.h | 5 +++++
> include/linux/cpumask.h | 5 +++--
> 2 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
> index 630ff08532be..f5d3ca5696b3 100644
> --- a/arch/x86/include/asm/smp.h
> +++ b/arch/x86/include/asm/smp.h
> @@ -21,7 +21,12 @@ DECLARE_PER_CPU_READ_MOSTLY(int, cpu_number);
>
> static inline struct cpumask *cpu_llc_shared_mask(int cpu)
> {
> +#ifdef CONFIG_SMP
> return per_cpu(cpu_llc_shared_map, cpu);
> +#else
> + /* cpu_llc_shared_map is not defined while !CONFIG_SMP */
> + return cpu_all_mask;
> +#endif

This looks dangerous. The cpumask returned can be modified,
for example, in

static void remove_siblinginfo(int cpu)
{
[...]
for_each_cpu(sibling, cpu_llc_shared_mask(cpu))
cpumask_clear_cpu(cpu, cpu_llc_shared_mask(sibling));
[...]
cpumask_clear_cpu(cpu, cpu_sibling_setup_mask);
[...]
}

With you patch, this code would clear "cpu_all_mask" which
is wrong!

We need to fix this another way. I see few possibilities:

1. Define cpu_llc_shared_map even with NR_CPUS == 1.

2. Do not compile the problematic code with NR_CPUS == 1.
It seems to be needed for CPU hotplug so it does not
make sense to have it.

3. Define for_each_cpu_optimized() that would behave the same
way as the non-patched for_each_cpu(). It can be
used in remove_siblinginfo().

4. Do not patch for_each_cpu(). Instead, add for_each_cpu_safe()
that would work correctly even with NR_CPUS == 1.
It can than be used in watchdog.c.


IMHO, the 2nd solution would be the best if the change
is simple.

The 3rd solution is my other favorite. It would keep
the default for_each_cpu() safe. People might use
for_each_cpu_optimized() only when they know what
they are doing. It will be easy to find these
locations.

Best Regards,
Petr